NetBSD Problem Report #40462

From jarle@festningen.uninett.no  Fri Jan 23 17:10:10 2009
Return-Path: <jarle@festningen.uninett.no>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
	by narn.NetBSD.org (Postfix) with ESMTP id A1C4D63BAB8
	for <gnats-bugs@gnats.NetBSD.org>; Fri, 23 Jan 2009 17:10:10 +0000 (UTC)
Message-Id: <20090123171007.716D480B90@festningen.uninett.no>
Date: Fri, 23 Jan 2009 18:10:07 +0100 (CET)
From: jarle@uninett.no
Reply-To: jarle@uninett.no
To: gnats-bugs@gnats.NetBSD.org
Subject: bnx0: Double mbuf allocation failure!
X-Send-Pr-Version: 3.95

>Number:         40462
>Category:       kern
>Synopsis:       bnx0: Double mbuf allocation failure!
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri Jan 23 17:15:00 +0000 2009
>Closed-Date:    Tue Jul 21 06:34:08 +0000 2020
>Last-Modified:  Tue Jul 21 06:34:08 +0000 2020
>Originator:     Jarle Greipsland
>Release:        NetBSD 5.0_BETA
>Organization:

>Environment:


System: NetBSD nyc.uninett.no 5.0_BETA NetBSD 5.0_BETA (NYC) #1: Thu Jan  8 16:38:34 CET 2009 jarle@nyc.uninett.no:/usr/obj/sys/arch/i386/compile/NYC i386
Architecture: i386
Machine: i386
>Description:
The system is an IBM e-series rack server with two built-in ethernet
ports.  The ports are driven by the bnx driver.  It had been running for
some time using only one network port, and I was trying to bring up the
second port when it panicked.  I gathered some console log output, and tried
again.  Same panic.

Console log:

panic: bnx0: Double mbuf allocation failure!
fatal breakpoint trap in supervisor mode
trap type 1 code 0 eip c0302a5c cs 8 eflags 246 cr2 80699a0 ilevel 6
Stopped in pid 0.2 (system) at  netbsd:breakpoint+0x4:  popl    %ebp
db{0}> 
db{0}> trace
breakpoint(c03ced3b,cc418ed8,c0400bc0,10,5,cbebb012,0,c02ecd48,19a,c37e9a00) at netbsd:breakpoint+0x4
panic(c03e9af8,cc48aea4,cc418f0e,cc418f0c,cc418f08,ce493000,cc490004,dbaa,dba1,cbeb6800) at netbsd:panic+0x1b8
bnx_rx_intr(cc490000,cc493000,84,60000,8,0,cc490004,c2a30480,0,6) at netbsd:bnx_rx_intr+0x3e4
bnx_intr(cc490000,0,0,0,cbca7c80,0,c2c5b680,c0106dad,c2a30480,cbc9eca8) at netbsd:bnx_intr+0xe5
intr_biglock_wrapper(c2a30480,cbc9eca8,0,0,0,0,0,0,0,0) at netbsd:intr_biglock_wrapper+0x1f
DDB lost frame for netbsd:Xintr_ioapic_level1+0xad, trying 0xcc418f74
Xintr_ioapic_level1() at netbsd:Xintr_ioapic_level1+0xad
--- interrupt ---
--- switch to interrupt stack ---
x86_mwait(0,0,0,c025e152,cbca7c80,cbca4ec0,cbc9ed2c,c024d006,cbca7c80,0) at netbsd:x86_mwait+0xc
x86_cpu_idle_mwait(cbca7c80,0,c0400bc0,cbca7c80,c024cf20,cbca7c80,0,c01002e1,cbca7c80,0) at netbsd:x86_cpu_idle_mwait+0x44
idle_loop(cbca7c80,0,c01002cd,0,c01002cd,0,0,0,0,0) at netbsd:idle_loop+0xe6
db{0}> ps
 PID           PPID     PGRP        UID S   FLAGS LWPS          COMMAND    WAIT
 1221           651     1221          0 2  0x4000    1         ifconfig
 1255           585     1255       1000 2  0x4000    1              top  select
 1170           761     1170       1003 2  0x4000    1             bash  ttyraw
 761           1003     1003       1003 2   0x100    1             sshd  select
 1003           739     1003          0 2  0x4101    1             sshd   netio
 902            371      371         12 2  0x4100    1           pickup  kqueue
 651            760      651          0 2  0x4000    1             tcsh   pause
 760            775      760       1000 2  0x4000    1             bash    wait
 775            741      741       1000 2  0x4000    1            xterm  select
 741            678      741       1000 2  0x4000    1              csh   pause
 678            354      354       1000 2   0x100    1             sshd  select
 354            739      354          0 2  0x4100    1             sshd   netio
 739              1      739          0 2       0    1             sshd  select
 585            559      585       1000 2  0x4000    1             bash    wait
 559            571      571       1000 2   0x100    1             sshd  select
 571              1      571          0 2  0x4101    1             sshd   netio
 452              1      452          0 2  0x4000    1            getty  ttyraw
 409            371      371         12 2  0x4100    1             qmgr  kqueue
 404              1      404          0 2  0x4000    1            getty  ttyraw
 405              1      405          0 2  0x4000    1            getty  ttyraw
 406              1      406          0 2  0x4000    1            getty  ttyraw
 394              1      394          0 2       0    1             cron nanoslp
 360              1      360          0 2       0    1            inetd  kqueue
 371              1      371          0 2  0x4100    1           master  kqueue
 255              1      255         15 2   0x100    1             ntpd   pause
 154              1      154         14 2   0x100    5            named       *
 119              1      119          0 2       0    1          syslogd
 1                0        1          0 2  0x4001    1             init    wait
*0               -1        0          0 2 0x20002   49           system       *
db{0}> mach cpu 1
using CPU 1
db{0}> trace
_kernel_lock(1,d1796f00,d15fcc1c,c0277b93,c29d1800,90,1001,d1796f00,90,0) at netbsd:_kernel_lock+0xb5
soo_ioctl(d1796f00,c0906911,d1388948,0,0,d1629d00,0,c02a34e0,c3761028,c02a34e0) at netbsd:soo_ioctl+0x7c
sys_ioctl(d1629d00,d15fcd00,d15fcd28,3,c0906911,bfbfe220,bbbd7d68,ffffffff,0,0) at netbsd:sys_ioctl+0x13d
syscall(d15fcd48,b3,ab,1f,1f,3,bfbfe328,bfbfe2c8,bfbfe220,5) at netbsd:syscall+0xa9
db{0}> reboot 4

[ ... second panic starts here ... ]
panic: bnx0: Double mbuf allocation failure!
fatal breakpoint trap in supervisor mode
trap type 1 code 0 eip c0302a5c cs 8 eflags 246 cr2 bb5faffc ilevel 6
Stopped in pid 0.2 (system) at  netbsd:breakpoint+0x4:  popl    %ebp
db{0}> trace
breakpoint(c03ced3b,cc418ed8,c0400bc0,10,5,d0f6488c,0,c02ecd48,1d,c374a800) at netbsd:breakpoint+0x4
panic(c03e9af8,cc48aea4,cc418f0e,cc418f0c,cc418f08,ce493000,cc490004,25,24,cbe0a800) at netbsd:panic+0x1b8
bnx_rx_intr(cc490000,cc493000,84,60000,8,3f8,cc490004,c2a30480,0,6) at netbsd:bnx_rx_intr+0x3e4
bnx_intr(cc490000,800,30c10000,0,0,0,c2c5b680,c0106dad,c2a30480,cbc9eca8) at netbsd:bnx_intr+0xe5
intr_biglock_wrapper(c2a30480,cbc9eca8,0,0,0,0,0,0,0,0) at netbsd:intr_biglock_wrapper+0x1f
DDB lost frame for netbsd:Xintr_ioapic_level1+0xad, trying 0xcc418f74
Xintr_ioapic_level1() at netbsd:Xintr_ioapic_level1+0xad
--- interrupt ---
--- switch to interrupt stack ---
x86_mwait(0,0,0,c025e152,cbca7c80,cbca4ec0,cbc9ed2c,c024d006,cbca7c80,0) at netbsd:x86_mwait+0xc
x86_cpu_idle_mwait(cbca7c80,0,c0400bc0,cbca7c80,c024cf20,cbca7c80,0,c01002e1,cbca7c80,0) at netbsd:x86_cpu_idle_mwait+0x44
idle_loop(cbca7c80,0,c01002cd,0,c01002cd,0,0,0,0,0) at netbsd:idle_loop+0xe6
db{0}> mach cpu 1
using CPU 1
db{0}> trace
x86_mwait(0,0,0,c025e152,cbcaa2a0,cbca4dc0,cc422d20,c024d006,cbcaa2a0,0) at netbsd:x86_mwait+0xc
x86_cpu_idle_mwait(cbcaa2a0,0,c2a16000,0,cc422da0,0,cbcaa2a0,c2a16000,0,c024cf20) at netbsd:x86_cpu_idle_mwait+0x44
idle_loop(0,c024cf20,cbcaa2a0,c01002d0,0,c01002cd,0,c01002cd,0,0) at netbsd:idle_loop+0xe6
Bad frame pointer: 0xcbcaa2a0
db{0}> reboot 4


>How-To-Repeat:
Bring up two bnx interfaces on a system?

>Fix:


>Release-Note:

>Audit-Trail:
From: "Michael L. Hitch" <mhitch@lightning.msu.montana.edu>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/40462: bnx0: Double mbuf allocation failure!
Date: Fri, 23 Jan 2009 12:20:36 -0700 (MST)

 On Fri, 23 Jan 2009, jarle@uninett.no wrote:

 >> Description:
 > The system is an IBM e-series rack server with two built-in ethernet
 > ports.  The ports are driven by the bnx driver.  It had been running for
 > some time using only one network port, and I was trying to bring up the
 > second port when it panicked.  I gathered some console log output, and tried
 > again.  Same panic.
 ..
 >> How-To-Repeat:
 > Bring up two bnx interfaces on a system?
 >
 >> Fix:

    I think this occurs because each bnx device attempts to allocate a large
 number of receive buffers from the mbuf cluster pool.  The default size
 is too small for two bnx devices, and the bnx driver appears to be the 
 only one I've seen that panics when it can't allocate the receive buffers.

    You can override this with the NMBCLUSTERS kernel config option (which
 requires rebuilding the kernel).  Also, enabling the GATEWAY option will
 usually double the default NMBCLUSTERS (but also requires a kernel 
 rebuild).

    I think in the past I've been able to patch the nmbclusters value in
 the kernel using gdb.

    I also seem to recall some comments or discussion about increasing the 
 default NMBCLUSTERS, but I don't think anything ever came of that.

 --
 Michael L. Hitch			mhitch@montana.edu
 Computer Consultant
 Information Technology Center
 Montana State University	Bozeman, MT	USA
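
The arithmetic behind this diagnosis can be sketched in a few lines of C.
This is an illustrative model only, not code from the bnx driver: struct
cluster_pool, cluster_alloc() and fill_rx_ring() are hypothetical names, and
the pool limit and ring size a caller passes in are made-up numbers, not the
real NMBCLUSTERS default or the real bnx receive ring size.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a shared, fixed-limit mbuf cluster pool. */
struct cluster_pool {
    int limit;  /* stands in for NMBCLUSTERS (assumed value) */
    int used;   /* clusters currently handed out */
};

/* Take one cluster from the pool; return 0 on success, -1 if exhausted. */
static int cluster_alloc(struct cluster_pool *p)
{
    assert(p != NULL);
    if (p->used >= p->limit)
        return -1;
    p->used++;
    return 0;
}

/*
 * Try to fill one interface's receive ring with ring_size buffers.
 * Returns the number of buffers actually obtained, which is smaller
 * than ring_size when the shared pool runs dry first.
 */
static int fill_rx_ring(struct cluster_pool *p, int ring_size)
{
    int filled = 0;

    while (filled < ring_size && cluster_alloc(p) == 0)
        filled++;
    return filled;
}
```

With an assumed limit of 1024 and two interfaces each asking for 600
buffers, the first call to fill_rx_ring() returns 600 and the second only
424; the second interface cannot populate its ring until the limit is
raised to at least 1200.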

From: Jarle Greipsland <jarle@uninett.no>
To: gnats-bugs@NetBSD.org, mhitch@lightning.msu.montana.edu
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
	netbsd-bugs@netbsd.org
Subject: Re: kern/40462: bnx0: Double mbuf allocation failure!
Date: Fri, 23 Jan 2009 20:48:17 +0100 (CET)

 "Michael L. Hitch" <mhitch@lightning.msu.montana.edu> writes:
 >     I think this occurs because each bnx device attempts to allocate a large
 >  number of receive buffers from the mbuf cluster pool.  The default size
 >  is too small for two bnx devices, and the bnx driver appears to be the 
 >  only one I've seen that panics when it can't allocate the receive buffers.
 >  
 >     You can override this with the NMBCLUSTERS kernel config option (which
 >  requires rebuilding the kernel).  Also, enabling the GATEWAY option will
 >  usually double the default NMBCLUSTERS (but also requires a kernel 
 >  rebuild).
 Thanks for the advice.  I'll find a suitable service window and
 try out a kernel with an increased setting for the NMBCLUSTERS
 option and see what happens.
 					-jarle

From: David Young <dyoung@pobox.com>
To: jarle@uninett.no, gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
	netbsd-bugs@netbsd.org
Subject: Re: kern/40462: bnx0: Double mbuf allocation failure!
Date: Fri, 23 Jan 2009 14:52:06 -0600

 On Fri, Jan 23, 2009 at 05:15:01PM +0000, jarle@uninett.no wrote:
 > >Number:         40462
 > >Category:       kern
 > >Synopsis:       bnx0: Double mbuf allocation failure!
 > >Confidential:   no
 > >Severity:       serious
 > >Priority:       medium
 > >Responsible:    kern-bug-people
 > >State:          open
 > >Class:          sw-bug
 > >Submitter-Id:   net
 > >Arrival-Date:   Fri Jan 23 17:15:00 +0000 2009
 > >Originator:     Jarle Greipsland
 > >Release:        NetBSD 5.0_BETA
 > >Organization:
 > 	
 > >Environment:
 > 	
 > 	
 > System: NetBSD nyc.uninett.no 5.0_BETA NetBSD 5.0_BETA (NYC) #1: Thu Jan  8 16:38:34 CET 2009 jarle@nyc.uninett.no:/usr/obj/sys/arch/i386/compile/NYC i386
 > Architecture: i386
 > Machine: i386
 > >Description:
 > The system is an IBM e-series rack server with two built-in ethernet
 > ports.  The ports are driven by the bnx driver.  It had been running for
 > some time using only one network port, and I was trying to bring up the
 > second port when it panicked.  I gathered some console log output, and tried
 > again.  Same panic.

 Ultimately, you are going to have to increase NMBCLUSTERS, but the
 patch in <ftp://cuw.ojctech.com/users/netbsd-082c59a0/bnx.add_buf>
 should stop the panic.  Please test.

 Dave

 -- 
 David Young             OJC Technologies
 dyoung@ojctech.com      Urbana, IL * (217) 278-3933
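
The patch itself is not included in this report, so its contents are not
reproduced here.  One common way for a network driver to avoid panicking on
allocation failure, sketched below under that caveat, is to obtain the
replacement buffer before handing a received packet up the stack and, if
that allocation fails, to drop the packet and leave the old buffer posted in
the ring.  All names (struct rx_ring, rx_complete(), alloc_ok(),
alloc_fail()) and sizes are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define RING_SLOTS 4    /* hypothetical, tiny ring for illustration */

struct rx_ring {
    void *buf[RING_SLOTS];   /* buffer currently posted in each slot */
    unsigned long dropped;   /* packets dropped for lack of a buffer */
};

/* Hypothetical allocator hook: returns NULL when the pool is exhausted. */
typedef void *(*alloc_fn)(void);

/* Sample allocators for demonstration: one succeeds, one always fails. */
static void *alloc_ok(void)   { return malloc(2048); }
static void *alloc_fail(void) { return NULL; }

/*
 * Handle a received packet sitting in `slot`.
 * Returns the filled buffer to pass up the stack, or NULL if the packet
 * was dropped and its buffer was left in the ring for reuse.
 */
static void *rx_complete(struct rx_ring *r, size_t slot, alloc_fn alloc)
{
    assert(r != NULL && slot < RING_SLOTS);

    void *fresh = alloc();
    if (fresh == NULL) {
        /* No replacement available: recycle the old buffer, drop the packet. */
        r->dropped++;
        return NULL;
    }

    void *filled = r->buf[slot];
    r->buf[slot] = fresh;     /* ring slot is never left empty */
    return filled;
}
```

The trade-off is that the system stays up but incoming packets are lost for
as long as the pool is exhausted, which is why raising NMBCLUSTERS is still
needed in the end.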

From: Jarle Greipsland <jarle@uninett.no>
To: gnats-bugs@NetBSD.org, mhitch@lightning.msu.montana.edu
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
	netbsd-bugs@netbsd.org
Subject: Re: kern/40462: bnx0: Double mbuf allocation failure!
Date: Tue, 27 Jan 2009 21:16:21 +0100 (CET)

 Jarle Greipsland <jarle@uninett.no> writes:
 > Thanks for the advice.  I'll find a suitable service window and
 > try out a kernel with an increased setting for the NMBCLUSTERS
 > option and see what happens.
 I haven't been able to test anything on the computer that
 originally experienced the problem, but I was allowed to play
 with a fairly similar server with ethernet ports of identical
 make and model, and have both managed to reproduce the problem
 and try out the two suggested fixes/patches.

 The patch offered by David Young prevented the system from
 panicking, although the kernel would still warn:

 WARNING: mclpool limit reached; increase NMBCLUSTERS

 and network connectivity was interrupted.  Toggling the interfaces
 up and down a few times restored connectivity, but it was soon
 interrupted again, preceded by more NMBCLUSTERS warnings.  Once I
 also noticed a kernel message of the form:

 Error filling RX chain: rx_bd[0x????]!

 (I forgot to write down the hex number in brackets).

 Increasing the NMBCLUSTERS option in the kernel configuration
 file seemed to solve the problem though.  I could no longer make
 the system panic, and network connectivity was not interrupted.

 					-jarle
 -- 
 "It takes thought to right software"
 				-- parke@star.enet.dec.com

From: Fredrik Pettai <pettai@nordu.net>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/40462: bnx0: Double mbuf allocation failure!
Date: Tue, 24 Nov 2009 10:46:37 +0100

 Hi,

 This case still seems to be open?  We just hit a problem with the same
 error output on NetBSD 4.0 (and XEN).

 I wonder if Dave's patch is the right way to go if your domUs are  
 reliant on NFS.
 If the interface stops working, then your systems end up in a limbo
 state instead of coming up fresh after a reboot.

 NMBCLUSTERS is already raised to 65536.

 /P

State-Changed-From-To: open->feedback
State-Changed-By: jdolecek@NetBSD.org
State-Changed-When: Fri, 17 Jul 2020 07:31:16 +0000
State-Changed-Why:
Seems there were some changes in driver which should cause this problem
to not happen. Can you confirm whether this (or something similar) still
happens?


From: Jarle Greipsland <jarle@norid.no>
To: gnats-bugs@netbsd.org, jdolecek@NetBSD.org
Cc: 
Subject: Re: kern/40462 (bnx0: Double mbuf allocation failure!)
Date: Tue, 21 Jul 2020 08:16:54 +0200 (CEST)

 jdolecek@NetBSD.org writes:
 > Synopsis: bnx0: Double mbuf allocation failure!
 > 
 > State-Changed-From-To: open->feedback
 > State-Changed-By: jdolecek@NetBSD.org
 > State-Changed-When: Fri, 17 Jul 2020 07:31:16 +0000
 > State-Changed-Why:
 > Seems there were some changes in driver which should cause this problem
 > to not happen. Can you confirm whether this (or something similar) still
 > happens?

 I no longer have access to the original system, so I can not tell
 for sure.  But I do have a slightly different IBM eSeries system
 from about the same period, and there I can use both the built-in
 bnx ports without problems.  So, the problem has probably been
 fixed, I'd say.
 					-jarle
 --
 "Avoiding precedents does not mean nothing should ever be done.
  It only means that nothing should ever be done for the first time."
 				-- Sir Humphrey Appleby K.C.B.

State-Changed-From-To: feedback->closed
State-Changed-By: jdolecek@NetBSD.org
State-Changed-When: Tue, 21 Jul 2020 06:34:08 +0000
State-Changed-Why:
Reported apparently fixed. Thanks for report.


>Unformatted:
