NetBSD Problem Report #35224
From pitz@nepal.rz.uni-konstanz.de Sat Dec 9 21:49:33 2006
Return-Path: <pitz@nepal.rz.uni-konstanz.de>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
by narn.NetBSD.org (Postfix) with ESMTP id 88F9E63BA6D
for <gnats-bugs@gnats.NetBSD.org>; Sat, 9 Dec 2006 21:49:33 +0000 (UTC)
Message-Id: <200612092032.kB9KWBh3007479@nepal.rz.uni-konstanz.de>
Date: Sat, 9 Dec 2006 21:32:11 +0100 (CET)
From: stephan.pietzko@uni-konstanz.de
Reply-To: stephan.pietzko@uni-konstanz.de
To: gnats-bugs@NetBSD.org
Subject: kernel hangs in mclpl after heavy net load in the sparc64 port (eventually also other ports)
X-Send-Pr-Version: 3.95
>Number: 35224
>Category: kern
>Synopsis: daemons freeze in mclpl condition after lot of net traffic from netbsd-2 through netbsd-3
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sat Dec 09 21:50:00 +0000 2006
>Last-Modified: Mon Dec 11 16:15:01 +0000 2006
>Originator: Charlie Root
>Release: NetBSD-3
>Organization:
Admin of several NetBSD servers at the University of Konstanz
>Environment:
I use the GENERIC kernel on sparc64 on a Sun Netra T1. This server is
an HTTP mirror for big files. I followed the netbsd-2 branch and always had
the same problem; now I follow the netbsd-3 branch and still have the problem.
System: NetBSD nepal 3.1_STABLE NetBSD 3.1_STABLE (GENERIC) #0: Mon Nov 13 01:16:33 CET 2006 root@nepal:/usr/obj/sys/arch/sparc64/compile/GENERIC sparc64
Architecture: sparc64
Machine: sparc64
>Description:
The server is under heavy network load (50-100% of a 100BaseT interface all the time), and after
a few days the daemon freezes in an unkillable condition (mclpl in top or ps). I followed
netbsd-2 and netbsd-3 without any difference. I have the same problem with apache, thttpd
and lighttpd. I have to reboot the machine every few days. If the traffic is very high I
have to reboot twice a day.
>How-To-Repeat:
I think you just have to produce constant heavy network usage. I don't know how to include more data or
information about this problem. I have no core dump or anything else. These days the server is
serving a new Wikipedia DVD image and the machine is crashing once a day. I can give
developers an account on this machine if someone would like to verify the problem directly on
this server.
>Fix:
>Audit-Trail:
From: Martin Husemann <martin@duskware.de>
To: stephan.pietzko@uni-konstanz.de
Cc: gnats-bugs@NetBSD.org
Subject: Re: kern/35224: kernel hangs in mclpl after heavy net load in the sparc64 port (eventually also other ports)
Date: Sat, 9 Dec 2006 23:48:33 +0100
On Sat, Dec 09, 2006 at 09:50:00PM +0000, stephan.pietzko@uni-konstanz.de wrote:
> >Synopsis: daemons freeze in mclpl condition after lot of net traffic from netbsd-2 through netbsd-3
This sounds like a mbuf leak. Check netstat output, maybe you can spot where
the mbuf are lingering.
> the machine is crashing once a day
Is this related? If not, please file a separate PR.
What kind of crash is it? A panic should print a message before rebooting, we
need at least that to even start thinking about this.
Martin
From: Pavel Cahyna <pavel@NetBSD.org>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/35224: kernel hangs in mclpl after heavy net load in the sparc64 port (eventually also other ports)
Date: Sat, 9 Dec 2006 23:53:05 +0100
On Sat, Dec 09, 2006 at 10:50:02PM +0000, Martin Husemann wrote:
> This sounds like a mbuf leak. Check netstat output, maybe you can spot where
> the mbuf are lingering.
Preferably netstat -mssv with a kernel built with "options MBUFTRACE"
From: Stephan Pietzko <stephan.pietzko@uni-konstanz.de>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org
Subject: Re: kern/35224: kernel hangs in mclpl after heavy net load in the sparc64 port (eventually also other ports)
Date: Sun, 10 Dec 2006 10:01:18 +0100
Pavel Cahyna <pavel@NetBSD.org> wrote
> > This sounds like a mbuf leak. Check netstat output, maybe you can spot where
> > the mbuf are lingering.
> Preferably netstat -mssv with a kernel built with "options MBUFTRACE"
OK, the server is now running with this option, so I will wait for net
traffic and the next mclpl condition.
Can i do anything else to trace the problem?
tnx Stephan Pietzko
From: Martin Husemann <martin@duskware.de>
To: Stephan Pietzko <stephan.pietzko@uni-konstanz.de>
Cc: gnats-bugs@NetBSD.org
Subject: Re: kern/35224: kernel hangs in mclpl after heavy net load in the sparc64 port (eventually also other ports)
Date: Sun, 10 Dec 2006 11:37:09 +0100
On Sun, Dec 10, 2006 at 10:01:18AM +0100, Stephan Pietzko wrote:
> Can i do anything else to trace the problem?
What does sysctl kern.mbuf.nmbclusters say? You could try increasing
that (in case this is not a leak but just high mbuf cluster demand).
Martin
From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@NetBSD.org, gnats-admin@NetBSD.org,
netbsd-bugs@NetBSD.org, stephan.pietzko@uni-konstanz.de
Subject: Re: kern/35224: kernel hangs in mclpl after heavy net load in the sparc64 port (eventually also other ports)
Date: Sun, 10 Dec 2006 13:04:43 +0100
On Sun, Dec 10, 2006 at 10:40:02AM +0000, Martin Husemann wrote:
> On Sun, Dec 10, 2006 at 10:01:18AM +0100, Stephan Pietzko wrote:
> > Can i do anything else to trace the problem?
>
> What does sysctl kern.mbuf.nmbclusters say? You could try increasing
> that (in case this is not a leak but just high mbuf cluster demand).
Yes. I've noticed this with several public http servers. On occasion
I have several connections with a full send queue to the same client which
don't make any progress. I think a combination of a broken ADSL setup and
the IE browser causes this: for some reason the ADSL link can't pass some big
packets, and IE opens a new connection and sends the same request when
it doesn't get the data fast enough (and the timeout is short).
Fortunately such setups seem to be less and less frequent (maybe since
ethernet/ADSL and USB/ADSL bridges are being replaced by ADSL NAT/routers).
--
Manuel Bouyer <bouyer@antioche.eu.org>
NetBSD: 26 years of experience will always make the difference
--
From: Stephan Pietzko <stephan.pietzko@uni-konstanz.de>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org
Subject: Re: kern/35224: kernel hangs in mclpl after heavy net load in the sparc64 port (eventually also other ports)
Date: Sun, 10 Dec 2006 11:49:57 +0100
Martin Husemann <martin@duskware.de> wrote
> On Sun, Dec 10, 2006 at 10:01:18AM +0100, Stephan Pietzko wrote:
> > Can i do anything else to trace the problem?
> What does sysctl kern.mbuf.nmbclusters say? You could try increasing
> that (in case this is not a leak but just high mbuf cluster demand).
root@nepal:/root> sysctl kern.mbuf.nmbclusters
kern.mbuf.nmbclusters = 1024
That's an idea, but first I will try to reproduce the problem with
options MBUFTRACE # Debugging the mclpl problem
and then change only one setting at a time.
tnx Stephan Pietzko
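
For a sense of scale, here is a rough sketch of how much memory the reported limit of 1024 clusters corresponds to, and what raising it would cost. It assumes 2048-byte clusters (the usual NetBSD MCLBYTES value); that figure is not stated anywhere in this report and should be checked for the port in question. The function name is purely illustrative.

```python
# Assumption: 2048-byte mbuf clusters (the usual NetBSD MCLBYTES).
# This value does not appear in the report; verify it on the target port.
MCLBYTES = 2048


def cluster_pool_bytes(nmbclusters, mclbytes=MCLBYTES):
    """Upper bound on the memory the mbuf cluster pool may hold."""
    return nmbclusters * mclbytes


current = 1024  # kern.mbuf.nmbclusters as reported by sysctl above

# Show the current limit next to two candidate increases.
for n in (current, 2 * current, 4 * current):
    mib = cluster_pool_bytes(n) / 2**20
    print(f"nmbclusters={n:5d} -> {mib:.0f} MiB")
```

Under that assumption the current pool is capped at about 2 MiB, so doubling or quadrupling the limit is cheap in memory terms. Assuming the usual NetBSD mechanisms apply to this system, the limit can be raised at run time with `sysctl -w kern.mbuf.nmbclusters=2048` or at build time with `options NMBCLUSTERS=2048`; the value 2048 is only an example.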
From: Stephan Pietzko <stephan.pietzko@uni-konstanz.de>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org
Subject: Re: kern/35224: kernel hangs in mclpl after heavy net load in the sparc64 port (eventually also other ports)
Date: Mon, 11 Dec 2006 17:14:42 +0100
Pavel Cahyna <pavel@NetBSD.org> wrote
> > This sounds like a mbuf leak. Check netstat output, maybe you can spot where
> > the mbuf are lingering.
> Preferably netstat -mssv with a kernel built with "options MBUFTRACE"
OK, after one day I had the same problem again.
The httpd daemon hangs in the mclpl condition again.
For several connections lsof reported
'0t0 TCP no PCB, CANTSENDMORE, CANTRCVMORE'
hth Stephan
----------------------------------------------------------------------
And netstat -mssv says:
(If it is of interest, I have the output from every 20 minutes.)
root@nepal:/var/tmp> cat netstat.200612111700
2818 mbufs in use:
        2652 mbufs allocated to data
        166 mbufs allocated to packet headers
4176 calls to protocol drain routines
                                small         ext     cluster
route           inuse               0           0           0
                claims             31           0           0
                releases           31           0           0
arp             inuse               0           0           0
                claims          70979           0           0
                releases        70979           0           0
unix            inuse               0           0           0
                claims          14176          27          27
                releases        14176          27          27
internet6       inuse               0           0           0
                claims              3           0           0
                releases            3           0           0
tcp             inuse             166           0           0
                claims          32254           0           0
                releases        32088           0           0
tcp rx          inuse             254          56          56
                claims      161841104       12655       12655
                releases    161840850       12599       12599
tcp tx          inuse            2398         511         378
                claims     1193613013   380746746   263702503
                releases   1193610615   380746235   263702125
udp             inuse               0           0           0
                claims            372           0           0
                releases          372           0           0
                                small         ext     cluster
udp rx          inuse               0           0           0
                claims         337730           2           2
                releases       337730           2           2
udp tx          inuse               0           0           0
                claims            370           0           0
                releases          370           0           0
internet rx     inuse               0           0           0
                claims      162295296       17445       17445
                releases    162295296       17445       17445
internet tx     inuse               0           0           0
                claims      285747856           0           0
                releases    285747856           0           0
lo0             inuse               0           0           0
                claims             48           0           0
                releases           48           0           0
hme0 rx         inuse               0           0           0
                claims      162772341       19318       19318
                releases    162772341       19318       19318
hme0 tx         inuse               0           0           0
                claims      852431570   296636379   198341873
                releases    852431570   296636379   198341873
unknown data    inuse               0           0           0
                claims     1070662544       19318       19318
                releases   1070662544       19318       19318
                                small         ext     cluster
unknown header  inuse               0           0           0
                claims      285745681           0           0
                releases    285745681           0           0
unknown soname  inuse               0           0           0
                claims          40926           0           0
                releases        40926           0           0
unknown soopts  inuse               0           0           0
                claims             70           0           0
                releases           70           0           0
unknown control inuse               0           0           0
                claims             18           0           0
                releases           18           0           0
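
To make it easier to spot where the mbufs are lingering, a report like the one above can be reduced to the owners whose "inuse" counters are non-zero. The following is a minimal sketch, not part of netstat: the function names are hypothetical, the sample is an excerpt of the figures above, and the parser assumes each owner appears as an "inuse" row followed by "claims" and "releases" rows with three counters each.

```python
import re

# Excerpt of the netstat -mssv report above (TCP owners plus the hme0 tx path).
SAMPLE = """\
tcp inuse 166 0 0
claims 32254 0 0
releases 32088 0 0
tcp rx inuse 254 56 56
claims 161841104 12655 12655
releases 161840850 12599 12599
tcp tx inuse 2398 511 378
claims 1193613013 380746746 263702503
releases 1193610615 380746235 263702125
hme0 tx inuse 0 0 0
claims 852431570 296636379 198341873
releases 852431570 296636379 198341873
"""

# "<owner> inuse a b c", or a continuation row "claims a b c" / "releases a b c".
ROW = re.compile(
    r"^(?:(?P<owner>.+?)\s+)?(?P<kind>inuse|claims|releases)"
    r"\s+(?P<small>\d+)\s+(?P<ext>\d+)\s+(?P<cluster>\d+)$"
)


def parse_mbuftrace(text):
    """Return {owner: {"inuse"|"claims"|"releases": (small, ext, cluster)}}."""
    owners, current = {}, None
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m is None:
            continue  # totals and "small ext cluster" header rows
        if m.group("kind") == "inuse":
            current = m.group("owner")
        if current is None:
            continue  # counter row seen before any owner name
        counts = tuple(int(m.group(k)) for k in ("small", "ext", "cluster"))
        owners.setdefault(current, {})[m.group("kind")] = counts
    return owners


def holders(owners):
    """Owners still holding mbufs, after checking inuse == claims - releases."""
    held = {}
    for name, c in owners.items():
        diff = tuple(a - b for a, b in zip(c["claims"], c["releases"]))
        if diff != c["inuse"]:
            raise ValueError(f"inconsistent counters for {name}")
        if any(c["inuse"]):
            held[name] = c["inuse"]
    return held


if __name__ == "__main__":
    for name, (small, ext, cluster) in holders(parse_mbuftrace(SAMPLE)).items():
        print(f"{name:8s} small={small} ext={ext} cluster={cluster}")
```

In this snapshot the only owners with non-zero "inuse" counters are tcp, tcp rx and tcp tx. Their small-mbuf counts (166 + 254 + 2398) add up to the 2818 mbufs reported in use, and the hme0 driver holds none. A single snapshot cannot tell a leak from connections that are merely stalled, so comparing several of the 20-minute captures would be the natural next step.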
Copyright © 1994-2007
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.