NetBSD Problem Report #35224

From pitz@nepal.rz.uni-konstanz.de  Sat Dec  9 21:49:33 2006
Return-Path: <pitz@nepal.rz.uni-konstanz.de>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
	by narn.NetBSD.org (Postfix) with ESMTP id 88F9E63BA6D
	for <gnats-bugs@gnats.NetBSD.org>; Sat,  9 Dec 2006 21:49:33 +0000 (UTC)
Message-Id: <200612092032.kB9KWBh3007479@nepal.rz.uni-konstanz.de>
Date: Sat, 9 Dec 2006 21:32:11 +0100 (CET)
From: stephan.pietzko@uni-konstanz.de
Reply-To: stephan.pietzko@uni-konstanz.de
To: gnats-bugs@NetBSD.org
Subject: kernel hangs in mclpl after heavy net load in the sparc64 port (eventually also other ports)
X-Send-Pr-Version: 3.95

>Number:         35224
>Category:       kern
>Synopsis:       daemons freeze in mclpl condition after lot of net traffic from netbsd-2 through netbsd-3
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Dec 09 21:50:00 +0000 2006
>Last-Modified:  Mon Dec 11 16:15:01 +0000 2006
>Originator:     Charlie Root
>Release:        NetBSD-3
>Organization:
        Admin of several NetBSD servers at the University of Konstanz
>Environment:
                I use the GENERIC kernel on sparc64 on a Sun Netra T1. This server is
                an HTTP mirror for big files. I followed the netbsd-2 branch and always had
                the same problem; now I follow the netbsd-3 branch and still have the problem.
System: NetBSD nepal 3.1_STABLE NetBSD 3.1_STABLE (GENERIC) #0: Mon Nov 13 01:16:33 CET 2006 root@nepal:/usr/obj/sys/arch/sparc64/compile/GENERIC sparc64
Architecture: sparc64
Machine: sparc64
>Description:
                The server is under heavy network load (50-100% of a 100BaseT interface all the time), and the daemon
                freezes after a few days in an unkillable state (mclpl in top or ps). I followed
                netbsd-2 and netbsd-3 without any difference. I have the same problem with apache, thttpd
                and lighttpd. I have to reboot the machine every few days. If the traffic is very high, I
                have to reboot twice a day.
>How-To-Repeat:
                I think you just have to produce constant heavy network usage. I don't know how to include more data or
                information about this problem. I have no core dump or anything similar. These days the server is
                serving a new Wikipedia DVD image and the machine is crashing once a day. I can give
                developers an account on this machine if someone would like to verify the problem directly on
                this server.
>Fix:


>Audit-Trail:
From: Martin Husemann <martin@duskware.de>
To: stephan.pietzko@uni-konstanz.de
Cc: gnats-bugs@NetBSD.org
Subject: Re: kern/35224: kernel hangs in mclpl after heavy net load in the sparc64 port (eventually also other ports)
Date: Sat, 9 Dec 2006 23:48:33 +0100

 On Sat, Dec 09, 2006 at 09:50:00PM +0000, stephan.pietzko@uni-konstanz.de wrote:
 > >Synopsis:       daemons freeze in mclpl condition after lot of net traffic from netbsd-2 through netbsd-3

 This sounds like a mbuf leak. Check netstat output, maybe you can spot where
 the mbuf are lingering.

 > the machine is crashing once a day

 Is this related? If not, please file a separate PR.

 What kind of crash is it? A panic should print a message before rebooting, we
 need at least that to even start thinking about this.

 Martin

From: Pavel Cahyna <pavel@NetBSD.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/35224: kernel hangs in mclpl after heavy net load in the sparc64 port (eventually also other ports)
Date: Sat, 9 Dec 2006 23:53:05 +0100

 On Sat, Dec 09, 2006 at 10:50:02PM +0000, Martin Husemann wrote:
 >  This sounds like a mbuf leak. Check netstat output, maybe you can spot where
 >  the mbuf are lingering.

 Preferably netstat -mssv with a kernel built with "options MBUFTRACE"

From: Stephan Pietzko <stephan.pietzko@uni-konstanz.de>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
	netbsd-bugs@netbsd.org
Subject: Re: kern/35224: kernel hangs in mclpl after heavy net load in the sparc64 port (eventually also other ports)
Date: Sun, 10 Dec 2006 10:01:18 +0100

 Pavel Cahyna <pavel@NetBSD.org> wrote

 >  >  This sounds like a mbuf leak. Check netstat output, maybe you can spot where
 >  >  the mbuf are lingering.
 >  Preferably netstat -mssv with a kernel built with "options MBUFTRACE"

 ok - the server is now running with this option - so i wait for net
 traffic and the next mclpl condition.

 Can i do anything else to trace the problem?

 tnx Stephan Pietzko

From: Martin Husemann <martin@duskware.de>
To: Stephan Pietzko <stephan.pietzko@uni-konstanz.de>
Cc: gnats-bugs@NetBSD.org
Subject: Re: kern/35224: kernel hangs in mclpl after heavy net load in the sparc64 port (eventually also other ports)
Date: Sun, 10 Dec 2006 11:37:09 +0100

 On Sun, Dec 10, 2006 at 10:01:18AM +0100, Stephan Pietzko wrote:
 > Can i do anything else to trace the problem?

 What does sysctl kern.mbuf.nmbclusters say? You could try increasing
 that (in case this is not a leak but just high mbuf cluster demand).

 Martin
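
One way to tell the two cases apart (a leak versus plain high demand) is to
compare in-use counts across successive snapshots. The following is a minimal
sketch of that idea, not a diagnostic from the NetBSD tree: the function name,
the result strings and the sample numbers are all hypothetical, and the rule
"never drops and ends higher" is only a rough heuristic.

```python
def classify_usage(samples, limit):
    """Classify a series of in-use counts taken from successive snapshots.

    samples: in-use counts, oldest first (hypothetical values, e.g. cluster
             counts read from periodic netstat output).
    limit:   the configured ceiling, e.g. the value of kern.mbuf.nmbclusters.
    """
    if len(samples) < 2:
        return "insufficient data"

    # A leak tends to show up as a count that never goes down between
    # snapshots and ends higher than it started.
    never_drops = all(later >= earlier
                      for earlier, later in zip(samples, samples[1:]))
    if never_drops and samples[-1] > samples[0]:
        return "possible leak"

    # Otherwise, hitting the ceiling while the count still rises and falls
    # points at demand exceeding the configured limit.
    if max(samples) >= limit:
        return "demand reaches limit"

    return "no clear trend"


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    print(classify_usage([120, 260, 410, 640, 900], limit=1024))
    print(classify_usage([300, 1024, 280, 1010, 250], limit=1024))
```

If the count keeps fluctuating but touches the limit, raising the limit is the
cheaper experiment; a count that only ever grows is better chased with
MBUFTRACE.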

From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@NetBSD.org, gnats-admin@NetBSD.org,
	netbsd-bugs@NetBSD.org, stephan.pietzko@uni-konstanz.de
Subject: Re: kern/35224: kernel hangs in mclpl after heavy net load in the sparc64 port (eventually also other ports)
Date: Sun, 10 Dec 2006 13:04:43 +0100

 On Sun, Dec 10, 2006 at 10:40:02AM +0000, Martin Husemann wrote:
 >  On Sun, Dec 10, 2006 at 10:01:18AM +0100, Stephan Pietzko wrote:
 >  > Can i do anything else to trace the problem?
 >  
 >  What does sysctl kern.mbuf.nmbclusters say? You could try increasing
 >  that (in case this is not a leak but just high mbuf cluster demand).

 Yes. I've noticed this with several public http servers. On occasion
 I have several connections with a full send queue to the same client which
 don't make any progress. I think a combination of a broken ADSL setup and
 the IE browser causes this: for some reason the ADSL link can't pass some big
 packets, and IE opens a new connection and sends the same request when
 it doesn't get the data fast enough (and the timeout is short).
 Fortunately such setups seem to be less and less frequent (maybe since
 Ethernet/ADSL and USB/ADSL bridges are being replaced by ADSL NAT/routers).

 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 ans d'experience feront toujours la difference
 --

From: Stephan Pietzko <stephan.pietzko@uni-konstanz.de>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
	netbsd-bugs@netbsd.org
Subject: Re: kern/35224: kernel hangs in mclpl after heavy net load in the sparc64 port (eventually also other ports)
Date: Sun, 10 Dec 2006 11:49:57 +0100

 Martin Husemann <martin@duskware.de> wrote

 >  On Sun, Dec 10, 2006 at 10:01:18AM +0100, Stephan Pietzko wrote:
 >  > Can i do anything else to trace the problem?
 >  What does sysctl kern.mbuf.nmbclusters say? You could try increasing
 >  that (in case this is not a leak but just high mbuf cluster demand).

 root@nepal:/root> sysctl kern.mbuf.nmbclusters
 kern.mbuf.nmbclusters = 1024

 That's an idea, but first I will try to reproduce the problem with
 options         MBUFTRACE # Debuging the mclpl problem
 and change only one setting at a time.

 tnx Stephan Pietzko

From: Stephan Pietzko <stephan.pietzko@uni-konstanz.de>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
	netbsd-bugs@netbsd.org
Subject: Re: kern/35224: kernel hangs in mclpl after heavy net load in the sparc64 port (eventually also other ports)
Date: Mon, 11 Dec 2006 17:14:42 +0100

 Pavel Cahyna <pavel@NetBSD.org> wrote

 >  >  This sounds like a mbuf leak. Check netstat output, maybe you can spot where
 >  >  the mbuf are lingering.
 >  Preferably netstat -mssv with a kernel built with "options MBUFTRACE"

 Ok - after one day i had the same problem again.

 The httpd daemon hangs in mclpl condition again.
 lsof said for several connections 
 '0t0  TCP no PCB, CANTSENDMORE, CANTRCVMORE'

 hth Stephan
 ----------------------------------------------------------------------
 And netstat -mssv says:
 (if it is interesting, i have the output from every 20min)
 root@nepal:/var/tmp> cat netstat.200612111700
 2818 mbufs in use:
         2652 mbufs allocated to data
         166 mbufs allocated to packet headers
 4176 calls to protocol drain routines
                                              small        ext    cluster
            route               inuse             0          0          0
                                claims           31          0          0
                                releases         31          0          0
              arp               inuse             0          0          0
                                claims        70979          0          0
                                releases      70979          0          0
             unix               inuse             0          0          0
                                claims        14176         27         27
                                releases      14176         27         27
        internet6               inuse             0          0          0
                                claims            3          0          0
                                releases          3          0          0
              tcp               inuse           166          0          0
                                claims        32254          0          0
                                releases      32088          0          0
              tcp rx            inuse           254         56         56
                                claims    161841104      12655      12655
                                releases  161840850      12599      12599
              tcp tx            inuse          2398        511        378
                                claims   1193613013  380746746  263702503
                                releases 1193610615  380746235  263702125
              udp               inuse             0          0          0
                                claims          372          0          0
                                releases        372          0          0
                                              small        ext    cluster
              udp rx            inuse             0          0          0
                                claims       337730          2          2
                                releases     337730          2          2
              udp tx            inuse             0          0          0
                                claims          370          0          0
                                releases        370          0          0
         internet rx            inuse             0          0          0
                                claims    162295296      17445      17445
                                releases  162295296      17445      17445
         internet tx            inuse             0          0          0
                                claims    285747856          0          0
                                releases  285747856          0          0
              lo0               inuse             0          0          0
                                claims           48          0          0
                                releases         48          0          0
             hme0 rx            inuse             0          0          0
                                claims    162772341      19318      19318
                                releases  162772341      19318      19318
             hme0 tx            inuse             0          0          0
                                claims    852431570  296636379  198341873
                                releases  852431570  296636379  198341873
          unknown data          inuse             0          0          0
                                claims   1070662544      19318      19318
                                releases 1070662544      19318      19318
                                              small        ext    cluster
          unknown header        inuse             0          0          0
                                claims    285745681          0          0
                                releases  285745681          0          0
          unknown soname        inuse             0          0          0
                                claims        40926          0          0
                                releases      40926          0          0
          unknown soopts        inuse             0          0          0
                                claims           70          0          0
                                releases         70          0          0
          unknown control       inuse             0          0          0
                                claims           18          0          0
                                releases         18          0          0
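
The report above can be reduced to the owners that still hold mbufs. The
following is a minimal sketch, assuming the three-column layout shown above
(small, ext, cluster); the function names and the regular expression are
hypothetical helpers, and SAMPLE is an excerpt of the output quoted in this PR.

```python
import re

# Excerpt of the netstat -mssv output quoted above.
SAMPLE = """\
              tcp               inuse           166          0          0
                                claims        32254          0          0
                                releases      32088          0          0
              tcp rx            inuse           254         56         56
                                claims    161841104      12655      12655
                                releases  161840850      12599      12599
              tcp tx            inuse          2398        511        378
                                claims   1193613013  380746746  263702503
                                releases 1193610615  380746235  263702125
             hme0 tx            inuse             0          0          0
                                claims    852431570  296636379  198341873
                                releases  852431570  296636379  198341873
"""

# Optional owner name, then a counter kind, then three numeric columns.
ROW = re.compile(
    r"^\s*(.*?)\s*\b(inuse|claims|releases)\s+(\d+)\s+(\d+)\s+(\d+)\s*$")


def parse_mbuftrace(text):
    """Return {owner: {kind: (small, ext, cluster)}} for each owner."""
    owners = {}
    current = None
    for line in text.splitlines():
        match = ROW.match(line)
        if not match:
            continue  # summary lines and repeated column headers
        name, kind = match.group(1), match.group(2)
        if name:
            current = name  # only the "inuse" row carries the owner name
        if current is None:
            continue
        counts = tuple(int(match.group(i)) for i in (3, 4, 5))
        owners.setdefault(current, {})[kind] = counts
    return owners


def holders(owners):
    """Owners with at least one non-zero inuse counter."""
    return {name: rec["inuse"] for name, rec in owners.items()
            if any(rec["inuse"])}


def consistent(rec):
    """True if inuse equals claims minus releases in every column."""
    return all(c - r == u for c, r, u in
               zip(rec["claims"], rec["releases"], rec["inuse"]))


if __name__ == "__main__":
    for name, (small, ext, cluster) in holders(parse_mbuftrace(SAMPLE)).items():
        print(f"{name}: small={small} ext={ext} cluster={cluster}")
```

In this snapshot only tcp, tcp rx and tcp tx have non-zero inuse counters, and
their small counts (166 + 254 + 2398) add up to the 2818 mbufs in use reported
at the top of the output.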
