NetBSD Problem Report #47229

From www@NetBSD.org  Wed Nov 21 13:17:33 2012
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
	by www.NetBSD.org (Postfix) with ESMTP id 76DE663DCB2
	for <gnats-bugs@gnats.NetBSD.org>; Wed, 21 Nov 2012 13:17:33 +0000 (UTC)
Message-Id: <20121121131731.48E5963DCB2@www.NetBSD.org>
Date: Wed, 21 Nov 2012 13:17:31 +0000 (UTC)
From: pettai@nordu.net
Reply-To: pettai@nordu.net
To: gnats-bugs@NetBSD.org
Subject: Using multiple bnx interfaces on quadcard makes amd64 systems unresponsive
X-Send-Pr-Version: www-1.0

>Number:         47229
>Category:       kern
>Synopsis:       Using multiple bnx interfaces on quadcard makes amd64 systems unresponsive
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Nov 21 13:20:01 +0000 2012
>Closed-Date:    Mon Jul 19 04:54:17 +0000 2021
>Last-Modified:  Mon Jul 19 04:54:17 +0000 2021
>Originator:     Fredrik Pettai
>Release:        NetBSD 6.0 (amd64)
>Organization:
NORDUnet A/S
>Environment:
NetBSD statler 6.0 NetBSD 6.0 (GENERIC) #0: Thu Oct 18 01:14:47 CEST 2012  root@statler:/usr/obj/sys/arch/amd64/compile/GENERIC amd64
>Description:
This problem was reported in 2010 for NetBSD 5_STABLE, but was never fixed. See:
http://mail-index.netbsd.org/current-users/2010/09/09/msg014270.html
http://mail-index.netbsd.org/port-amd64/2010/08/25/msg001232.html

This is still an issue on NetBSD 6.0 amd64.

# dmesg
Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
    2006, 2007, 2008, 2009, 2010, 2011, 2012
    The NetBSD Foundation, Inc.  All rights reserved.
Copyright (c) 1982, 1986, 1989, 1991, 1993
    The Regents of the University of California.  All rights reserved.

NetBSD 6.0 (GENERIC) #0: Thu Oct 18 01:14:47 CEST 2012
        root@waldorf.nordu.net:/usr/obj/sys/arch/amd64/compile/GENERIC
total memory = 8179 MB
avail memory = 7926 MB
timecounter: Timecounters tick every 10.000 msec
timecounter: Timecounter "i8254" frequency 1193182 Hz quality 100
Dell Inc. PowerEdge R610
mainbus0 (root)
cpu0 at mainbus0 apid 32: Intel(R) Xeon(R) CPU           E5640  @ 2.67GHz, id 0x206c2
cpu1 at mainbus0 apid 34: Intel(R) Xeon(R) CPU           E5640  @ 2.67GHz, id 0x206c2
cpu2 at mainbus0 apid 50: Intel(R) Xeon(R) CPU           E5640  @ 2.67GHz, id 0x206c2
cpu3 at mainbus0 apid 52: Intel(R) Xeon(R) CPU           E5640  @ 2.67GHz, id 0x206c2
cpu4 at mainbus0 apid 33: Intel(R) Xeon(R) CPU           E5640  @ 2.67GHz, id 0x206c2
cpu5 at mainbus0 apid 35: Intel(R) Xeon(R) CPU           E5640  @ 2.67GHz, id 0x206c2
cpu6 at mainbus0 apid 51: Intel(R) Xeon(R) CPU           E5640  @ 2.67GHz, id 0x206c2
cpu7 at mainbus0 apid 53: Intel(R) Xeon(R) CPU           E5640  @ 2.67GHz, id 0x206c2
ioapic0 at mainbus0 apid 0: pa 0xfec00000, version 20, 24 pins
ioapic1 at mainbus0 apid 1: pa 0xfec80000, version 20, 24 pins
acpi0 at mainbus0: Intel ACPICA 20110623
acpi0: X/RSDT: OemId <DELL  ,PE_SC3  ,00000001>, AslId <DELL,00000001>
acpi0: SCI interrupting at int 9
timecounter: Timecounter "ACPI-Fast" frequency 3579545 Hz quality 1000
hpet0 at acpi0: high precision event timer (mem 0xfed00000-0xfed00400)
timecounter: Timecounter "hpet0" frequency 14318180 Hz quality 2000
WHEA (PNP0C33) at acpi0 not configured
PMI0 (ACPI000D) at acpi0 not configured
SPK (PNP0C01) at acpi0 not configured
attimer1 at acpi0 (TMR, PNP0100): io 0x40-0x5f irq 0
COMA (PNP0501) at acpi0 not configured
COMB (PNP0501) at acpi0 not configured
MBIO (PNP0C01) at acpi0 not configured
NIPM (IPI0001) at acpi0 not configured
PEHB (PNP0C02) at acpi0 not configured
VTD (PNP0C02) at acpi0 not configured
ipmi0 at mainbus0
pci0 at mainbus0 bus 0: configuration mode 1
pci0: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
pchb0 at pci0 dev 0 function 0: vendor 0x8086 product 0x3403 (rev. 0x13)
ppb0 at pci0 dev 1 function 0: vendor 0x8086 product 0x3408 (rev. 0x13)
ppb0: PCI Express 2.0 <Root Port of PCI-E Root Complex>
pci1 at ppb0 bus 1
pci1: i/o space, memory space enabled, rd/line, wr/inv ok
bnx0 at pci1 dev 0 function 0: Broadcom NetXtreme II BCM5709 1000Base-T
bnx0: Ethernet address 84:2b:2b:fc:21:d6
bnx0: interrupting at ioapic1 pin 4
brgphy0 at bnx0 phy 1: BCM5709 10/100/1000baseT PHY, rev. 8
brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
bnx1 at pci1 dev 0 function 1: Broadcom NetXtreme II BCM5709 1000Base-T
bnx1: Ethernet address 84:2b:2b:fc:21:d8
bnx1: interrupting at ioapic1 pin 16
brgphy1 at bnx1 phy 1: BCM5709 10/100/1000baseT PHY, rev. 8
brgphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
ppb1 at pci0 dev 3 function 0: vendor 0x8086 product 0x340a (rev. 0x13)
ppb1: PCI Express 2.0 <Root Port of PCI-E Root Complex>
pci2 at ppb1 bus 2
pci2: i/o space, memory space enabled, rd/line, wr/inv ok
bnx2 at pci2 dev 0 function 0: Broadcom NetXtreme II BCM5709 1000Base-T
bnx2: Ethernet address 84:2b:2b:fc:21:da
bnx2: interrupting at ioapic1 pin 0
brgphy2 at bnx2 phy 1: BCM5709 10/100/1000baseT PHY, rev. 8
brgphy2: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
bnx3 at pci2 dev 0 function 1: Broadcom NetXtreme II BCM5709 1000Base-T
bnx3: Ethernet address 84:2b:2b:fc:21:dc
bnx3: interrupting at ioapic1 pin 10
brgphy3 at bnx3 phy 1: BCM5709 10/100/1000baseT PHY, rev. 8
brgphy3: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
ppb2 at pci0 dev 7 function 0: vendor 0x8086 product 0x340e (rev. 0x13)
ppb2: PCI Express 2.0 <Root Port of PCI-E Root Complex>
pci3 at ppb2 bus 4
pci3: i/o space, memory space enabled, rd/line, wr/inv ok
ppb3 at pci0 dev 9 function 0: vendor 0x8086 product 0x3410 (rev. 0x13)
ppb3: PCI Express 2.0 <Root Port of PCI-E Root Complex>
pci4 at ppb3 bus 5
pci4: i/o space, memory space enabled, rd/line, wr/inv ok
vendor 0x8086 product 0x342e (interrupt system, revision 0x13) at pci0 dev 20 function 0 not configured
vendor 0x8086 product 0x3422 (interrupt system, revision 0x13) at pci0 dev 20 function 1 not configured
vendor 0x8086 product 0x3423 (interrupt system, revision 0x13) at pci0 dev 20 function 2 not configured
uhci0 at pci0 dev 26 function 0: vendor 0x8086 product 0x2937 (rev. 0x02)
uhci0: interrupting at ioapic0 pin 17
usb0 at uhci0: USB revision 1.0
uhci1 at pci0 dev 26 function 1: vendor 0x8086 product 0x2938 (rev. 0x02)
uhci1: interrupting at ioapic0 pin 18
usb1 at uhci1: USB revision 1.0
ehci0 at pci0 dev 26 function 7: vendor 0x8086 product 0x293c (rev. 0x02)
ehci0: interrupting at ioapic0 pin 19
ehci0: BIOS has given up ownership
ehci0: EHCI version 1.0
ehci0: companion controllers, 2 ports each: uhci0 uhci1
usb2 at ehci0: USB revision 2.0
ppb4 at pci0 dev 28 function 0: vendor 0x8086 product 0x2940 (rev. 0x02)
ppb4: PCI Express 1.0 <Root Port of PCI-E Root Complex>
pci5 at ppb4 bus 3
pci5: i/o space, memory space enabled, rd/line, wr/inv ok
mfi0 at pci5 dev 0 function 0mfi0: interrupting at ioapic0 pin 16
mfi0: logical drives 2, version 12.10.0-0025, 1024MB RAM
scsibus0 at mfi0: 64 targets, 8 luns per target
uhci2 at pci0 dev 29 function 0: vendor 0x8086 product 0x2934 (rev. 0x02)
uhci2: interrupting at ioapic0 pin 21
usb3 at uhci2: USB revision 1.0
uhci3 at pci0 dev 29 function 1: vendor 0x8086 product 0x2935 (rev. 0x02)
uhci3: interrupting at ioapic0 pin 20
usb4 at uhci3: USB revision 1.0
ehci1 at pci0 dev 29 function 7: vendor 0x8086 product 0x293a (rev. 0x02)
ehci1: interrupting at ioapic0 pin 21
ehci1: EHCI version 1.0
ehci1: companion controllers, 2 ports each: uhci2 uhci3
usb5 at ehci1: USB revision 2.0
ppb5 at pci0 dev 30 function 0: vendor 0x8086 product 0x244e (rev. 0x92)
pci6 at ppb5 bus 6
pci6: i/o space, memory space enabled
vga0 at pci6 dev 3 function 0: vendor 0x102b product 0x0532 (rev. 0x0a)
wsdisplay0 at vga0 kbdmux 1: console (80x25, vt100 emulation)
wsmux1: connecting to wsdisplay0
drm at vga0 not configured
ichlpcib0 at pci0 dev 31 function 0: vendor 0x8086 product 0x2918 (rev. 0x02)
timecounter: Timecounter "ichlpcib0" frequency 3579545 Hz quality 1000
ichlpcib0: 24-bit timer
ichlpcib0: TCO (watchdog) timer configured.
piixide0 at pci0 dev 31 function 2: Intel 82801I Serial ATA Controller (ICH9) (rev. 0x02)
piixide0: bus-master DMA support present
piixide0: primary channel configured to native-PCI mode
piixide0: using ioapic0 pin 23 for native-PCI interrupt
atabus0 at piixide0 channel 0
piixide0: secondary channel configured to native-PCI mode
atabus1 at piixide0 channel 1
isa0 at ichlpcib0
tpm0 at isa0 iomem 0xfed40000-0xfed44fff irq 7: device 0x0000104a rev 0x4e
com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, working fifo
com1 at isa0 port 0x2f8-0x2ff irq 3: ns16550a, working fifo
pckbc0 at isa0 port 0x60-0x64
pckbd0 at pckbc0 (kbd slot)
pckbc0: using irq 1 for kbd slot
wskbd0 at pckbd0: console keyboard, using wsdisplay0
pcppi0 at isa0 port 0x61
midi0 at pcppi0: PC speaker
sysbeep0 at pcppi0
attimer1: attached to pcppi0
acpicpu0 at cpu0: ACPI CPU
acpicpu0: C1: FFH, lat   1 us, pow  1000 mW
acpicpu0: C3: FFH, lat  96 us, pow   350 mW
coretemp0 at cpu0: thermal sensor, 1 C resolution
acpicpu1 at cpu1: ACPI CPU
coretemp1 at cpu1: thermal sensor, 1 C resolution
acpicpu2 at cpu2: ACPI CPU
coretemp2 at cpu2: thermal sensor, 1 C resolution
acpicpu3 at cpu3: ACPI CPU
coretemp3 at cpu3: thermal sensor, 1 C resolution
acpicpu4 at cpu4: ACPI CPU
acpicpu5 at cpu5: ACPI CPU
acpicpu6 at cpu6: ACPI CPU
acpicpu7 at cpu7: ACPI CPU
timecounter: Timecounter "clockinterrupt" frequency 100 Hz quality 0
scsibus0: waiting 2 seconds for devices to settle...
uhub0 at usb3: vendor 0x8086 UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
uhub1 at usb0: vendor 0x8086 UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 2 ports with 2 removable, self powered
uhub2 at usb1: vendor 0x8086 UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub2: 2 ports with 2 removable, self powered
uhub3 at usb2: vendor 0x8086 EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
uhub3: 4 ports with 4 removable, self powered
uhub4 at usb5: vendor 0x8086 EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
uhub4: 4 ports with 4 removable, self powered
uhub5 at usb4: vendor 0x8086 UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub5: 2 ports with 2 removable, self powered
sd0 at scsibus0 target 0 lun 0: <DELL, PERC H700, 2.10> disk fixed
sd0: fabricating a geometry
sd0: 69376 MB, 69376 cyl, 64 head, 32 sec, 512 bytes/sect x 142082048 sectors
sd0: fabricating a geometry
sd1 at scsibus0 target 1 lun 0: <DELL, PERC H700, 2.10> disk fixed
sd1: fabricating a geometry
sd1: 558 GB, 571776 cyl, 64 head, 32 sec, 512 bytes/sect x 1170997248 sectors
sd1: fabricating a geometry
uhub6 at uhub3 port 3: vendor 0x0424 product 0x2514, class 9/0, rev 2.00/0.00, addr 2
uhub6: multiple transaction translators
uhub6: 3 ports with 3 removable, self powered
uhidev0 at uhub0 port 2 configuration 1 interface 0
uhidev0: Avocent USB Composite Device-0, rev 1.10/0.00, addr 2, iclass 3/1
ukbd0 at uhidev0
wskbd1 at ukbd0 mux 1
wskbd1: connecting to wsdisplay0
uhidev1 at uhub0 port 2 configuration 1 interface 1
uhidev1: Avocent USB Composite Device-0, rev 1.10/0.00, addr 2, iclass 3/1
ums0 at uhidev1: 3 buttons and Z dir
wsmouse0 at ums0 mux 0
atapibus0 at atabus0: 2 targets
cd0 at atapibus0 drive 0: <TEAC DVD-ROM DV-28SW, 10092412141201, R.2A> cdrom removable
cd0: 32-bit data port
cd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100)
cd0(piixide0:0:0): using PIO mode 4, Ultra-DMA mode 5 (Ultra/100) (using DMA)
Kernelized RAIDframe activated
pad0: outputs: 44100Hz, 16-bit, stereo
audio0 at pad0: half duplex, playback, capture
boot device: sd0
root on sd0a dumps on sd0b
root file system type: ffs
wsdisplay0: screen 1 added (80x25, vt100 emulation)
wsdisplay0: screen 2 added (80x25, vt100 emulation)
wsdisplay0: screen 3 added (80x25, vt100 emulation)
wsdisplay0: screen 4 added (80x25, vt100 emulation)
mfi0: normal state on 'mfi0:0' (online)
mfi0: normal state on 'mfi0:1' (online)
ipmi0: version 2.0 interface KCS iobase 0xca8/8 spacing 4
>How-To-Repeat:
Grab a standard Dell R610 server (comes with a Broadcom NetXtreme II BCM5709 quad-port card by default).
Install NetBSD 6.0/amd64 on it, connect at least two bnx interfaces to the network, and send/receive traffic on them.
The system will become totally unresponsive after a day or two.
>Fix:

>Release-Note:

>Audit-Trail:
From: Bernd Ernesti <netbsd@lists.veego.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/47229: Using multiple bnx interfaces on quadcard makes
 amd64 systems unresponsive
Date: Thu, 22 Nov 2012 07:52:38 +0100

 On Wed, Nov 21, 2012 at 01:20:02PM +0000, pettai@nordu.net wrote:
 [..]

 > >How-To-Repeat:
 > Grab a standard Dell R610 server (comes with a Broadcom NetXtreme II BCM5709 quad-port card by default).
 > Install NetBSD 6.0/amd64 on it, connect at least two bnx interfaces to the network, and send/receive traffic on them.
 > The system will become totally unresponsive after a day or two.

 You mean access via network or on the console of the system?

 Bernd

From: Fredrik Pettai <pettai@nordu.net>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@NetBSD.org, gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org
Subject: Re: kern/47229: Using multiple bnx interfaces on quadcard makes amd64 systems unresponsive
Date: Thu, 22 Nov 2012 09:39:55 +0100

 >>> How-To-Repeat:
 >> The system will become totally unresponsive after a day or two.
 > 
 > You mean access via network or on the console of the system?

 Both. Nothing works, except a power cycle...

State-Changed-From-To: open->feedback
State-Changed-By: jdolecek@NetBSD.org
State-Changed-When: Fri, 17 Jul 2020 07:25:26 +0000
State-Changed-Why:
I might have actually fixed this problem in rev 1.99:

revision 1.99
date: 2020-07-14 17:37:40 +0200;  author: jdolecek;  state: Exp;  lines: +3 -4;  commitid: 1TC0KGWiSfp8U3gC;
make bnx_wk (used to trigger bnx_alloc_pkts()) part of softc instead
of using a static variable, so it's independent for each adapter

Can you please confirm whether the problem happens with up-to-date -current?


From: "Jaromir Dolecek" <jdolecek@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/47229 CVS commit: src/sys/dev/pci
Date: Fri, 17 Jul 2020 10:56:15 +0000

 Module Name:	src
 Committed By:	jdolecek
 Date:		Fri Jul 17 10:56:15 UTC 2020

 Modified Files:
 	src/sys/dev/pci: if_bnx.c

 Log Message:
 if bnx_tx_encap() fails because mbuf is too fragmented or too long,
 drop the mbuf instead of wedging the TX queue forever; found by code inspection

 this is quite unlikely scenario since it requires mbuf chain consisting of
 more than 8 frags and m_defrag() failing, so probably unrelated to PR kern/47229


 To generate a diff of this commit:
 cvs rdiff -u -r1.104 -r1.105 src/sys/dev/pci/if_bnx.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: Fredrik Pettai <pettai@sunet.se>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, netbsd-bugs@netbsd.org, gnats-admin@netbsd.org,
        "jdolecek@netbsd.org" <jdolecek@NetBSD.org>,
        Fredrik Pettai <pettai@nordu.net>
Subject: Re: kern/47229 (Using multiple bnx interfaces on quadcard makes amd64
 systems unresponsive)
Date: Wed, 22 Jul 2020 06:48:34 +0200

 >
 > Can you please confirm whether the problem happens with up-to-date -current?

 Ok, thx
 I’ll have a look at it in August when I return from my holiday

State-Changed-From-To: feedback->closed
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Mon, 19 Jul 2021 04:54:17 +0000
State-Changed-Why:
1-year feedback timeout, assume fixed


>Unformatted:

Copyright © 1994-2020 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.