NetBSD Problem Report #28361

From woods@building.weird.com  Fri Nov 19 21:04:13 2004
Return-Path: <woods@building.weird.com>
Received: from building.weird.com (building.weird.com [204.92.254.24])
	by narn.netbsd.org (Postfix) with ESMTP id 6813F251EC1
	for <gnats-bugs@gnats.netbsd.org>; Fri, 19 Nov 2004 21:04:12 +0000 (UTC)
Message-Id: <m1CVFvL-0024fvC@building.weird.com>
Date: Fri, 19 Nov 2004 16:04:07 -0500 (EST)
From: "Greg A. Woods" <woods@weird.com>
Reply-To: "Greg A. Woods" <woods@planix.com>
To: gnats-bugs@netbsd.org
Subject: bge(4) locks up on AlphaServer ES40 when any significant traffic is transmitted
X-Send-Pr-Version: 3.95

>Number:         28361
>Category:       port-alpha
>Synopsis:       bge(4) locks up on AlphaServer ES40 when any significant traffic is transmitted
>Confidential:   no
>Severity:       critical
>Priority:       medium
>Responsible:    thorpej
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri Nov 19 21:05:00 +0000 2004
>Closed-Date:    Tue Oct 13 15:52:54 +0000 2020
>Last-Modified:  Tue Oct 13 15:52:54 +0000 2020
>Originator:     Greg A. Woods
>Release:        NetBSD-current (2.99.10) 2004/11/15
>Organization:
Planix, Inc.; Toronto, Ontario; Canada
>Environment:
System: NetBSD  2.99.10 NetBSD 2.99.10 (TSUNAMI) #0: Thu Nov 18 15:27:36 EST 2004  root@woffi.planix.com:/m5/netbsd-current/src/sys/arch/alpha/compile/obj.alpha/TSUNAMI alpha
Architecture: alpha
Machine: alpha
>Description:

	The bge(4) driver, when used with a DEGXA-TX card on an
	AlphaServer ES40, seems to lock up and go catatonic when one
	attempts to send or receive any significant amount of traffic
	through it.

	It will work sufficiently for a basic "ping", but it dies
	immediately (i.e. seemingly on the first packet) with the likes
	of "ttcp" or "ping -f".

	Given the "netstat -i|-I" results it appears the interface
	receives some/all of the packets but they're never handed to the
	application.
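
	As a rough sanity check on that reading, the counters from the
	"netstat -i -I bge0" and "netstat -b -I bge0" output in the
	How-To-Repeat transcript can be turned into average packet
	sizes.  This is only a sketch: the counters also include the
	earlier ping traffic, and it assumes Ibytes/Obytes count whole
	frames.

```python
# Counters copied from the netstat output in the How-To-Repeat transcript.
ipkts, ierrs, opkts = 512, 19, 356
ibytes, obytes = 755254, 25282

avg_in = ibytes / ipkts    # mean bytes per received packet
avg_out = obytes / opkts   # mean bytes per transmitted packet

print(f"avg input packet:  {avg_in:.0f} bytes")
print(f"avg output packet: {avg_out:.0f} bytes")
print(f"input error rate:  {ierrs / ipkts:.1%}")
```

	An average input size close to the 1500-byte MTU would be
	consistent with full-size data segments from the ttcp sender
	reaching the interface, while the small average output size
	would fit ACKs and pings.  It does not by itself show where
	the packets are being lost.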

	Note also that it spits out some warnings on the console when it
	is first configured:  "bge0: pcistate failed to revert"

	Also, attempts to ifconfig the wm0 card after the bge0 device is
	hung result in the following panic:

	[console]<@> # ifconfig wm0 inet 10.11.11.2 netmask 255.255.255.0 up
	wm0: unable to load rx DMA map 1, error = 35
	panic: wm_add_rxbuf
	Stopped in pid 49.1 (ifconfig) at       netbsd:cpu_Debugger+0x4:        ret     zero,(ra)
	db> trace
	cpu_Debugger() at netbsd:cpu_Debugger+0x4
	panic() at netbsd:panic+0x1f8
	wm_add_rxbuf() at netbsd:wm_add_rxbuf+0x4dc
	wm_init() at netbsd:wm_init+0x490
	ether_ioctl() at netbsd:ether_ioctl+0xac
	wm_ioctl() at netbsd:wm_ioctl+0x90
	ifioctl() at netbsd:ifioctl+0x434
	soo_ioctl() at netbsd:soo_ioctl+0xf8
	sys_ioctl() at netbsd:sys_ioctl+0x12c
	syscall_plain() at netbsd:syscall_plain+0xc4
	XentSys() at netbsd:XentSys+0x5c
	--- syscall (54) ---
	--- user mode ---
	db>

	About this panic, Jason Thorpe speculated:

	> The bge driver is probably gobbling up all of the SGMAP
	> resources...
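
	For what it is worth, error 35 is EAGAIN in NetBSD's errno
	numbering, which would fit a resource-exhaustion explanation,
	and the boot messages show wm0 and bge0 both attached to pci1.
	The following toy model only illustrates the speculated failure
	mode.  ToySgmap, its pool size and its load/unload methods are
	hypothetical names made up for this sketch; it is not the
	bus_dma(9) interface or the actual alpha SGMAP code.

```python
EAGAIN_NETBSD = 35  # NetBSD's errno value for EAGAIN

class ToySgmap:
    """Hypothetical fixed-size pool of scatter/gather map entries."""

    def __init__(self, entries):
        self.free = entries

    def load(self, pages):
        """Reserve `pages` entries; return 0 or an errno-style code."""
        if pages > self.free:
            return EAGAIN_NETBSD
        self.free -= pages
        return 0

    def unload(self, pages):
        """Give `pages` entries back to the pool."""
        self.free += pages

sgmap = ToySgmap(entries=64)  # arbitrary size, for illustration only

# A misbehaving driver keeps loading maps and never unloads them.
leaked = 0
while sgmap.load(1) == 0:
    leaked += 1

# A second driver sharing the pool now cannot load a single rx buffer.
error = sgmap.load(1)
print(f"leaked={leaked} free={sgmap.free} error={error}")
```

	In this model the second load fails as soon as the pool is
	empty, which is the shape of the "unable to load rx DMA map 1,
	error = 35" message above.  Whether bge(4) really exhausts the
	SGMAP on this machine is not established by this report.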


	I can make lots of other information available, and I can grant
	temporary console access to this machine to anyone who can help
	fix the bug!

	I do need the fix to work on NetBSD-1.6.x, but for now I can
	test any kernel that'll work with a 1.6 userland (and if really
	necessary I could do a full re-install on one or more of the
	other currently unused drives).

>How-To-Repeat:

P00>>>boot -file netbsd-cur dkc100
(boot dkc100.1.0.102.0 -file netbsd-cur -flags A)
block 0 of dkc100.1.0.102.0 is a valid boot block
reading 15 blocks from dkc100.1.0.102.0
bootstrap code read in
base = 200000, image_start = 0, image_bytes = 1e00(7680)
initializing HWRPB at 2000
initializing page table at 3fb54000
initializing machine state
setting affinity to the primary CPU
jumping to bootstrap code

NetBSD/alpha 1.6.2_STABLE FFS Primary Bootstrap
Jumping to entry point...

NetBSD/alpha 1.6.2_STABLE Secondary Bootstrap, Revision 1.13
(woods@building, Wed Sep 22 19:07:04 EDT 2004)

VMS PAL rev: 0x4006800010162
OSF PAL rev: 0x400690002015c
Switch to OSF PAL code succeeded.

Boot file: netbsd-cur
Boot flags: A
3670688+386744 [221112+137599]=0x436790

Entering netbsd-cur at 0xfffffc00003012e0...
Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004
    The NetBSD Foundation, Inc.  All rights reserved.
Copyright (c) 1982, 1986, 1989, 1991, 1993
    The Regents of the University of California.  All rights reserved.

NetBSD 2.99.10 (TSUNAMI) #0: Thu Nov 18 15:27:36 EST 2004
        root@woffi.planix.com:/m5/netbsd-current/src/sys/arch/alpha/compile/obj.alpha/TSUNAMI
AlphaServer ES40, 666MHz, s/n NI94900217
8192 byte page size, 4 processors.
total memory = 16384 MB
(7080 KB reserved for PROM, 16377 MB used by NetBSD)
avail memory = 16088 MB
mainbus0 (root)
cpu0 at mainbus0: ID 0 (primary), 21264A-14
cpu0: Architecture extensions: 307<PAT,MVI,CIX,FIX,BWX>
cpu1 at mainbus0: ID 1, 21264A-14
cpu1: processor off-line; multiprocessor support not present in kernel
cpu2 at mainbus0: ID 2, 21264A-14
cpu2: processor off-line; multiprocessor support not present in kernel
cpu3 at mainbus0: ID 3, 21264A-14
cpu3: processor off-line; multiprocessor support not present in kernel
tsc0 at mainbus0: 21272 Core Logic Chipset, Cchip rev 0
tsc0: 8 Dchips, 2 memory buses of 32 bytes
tsc0: arrays present: 4096MB (split), 4096MB (split), 4096MB (split), 4096MB (split), Dchip 0 rev 1
tsp0 at tsc0
tsp0: window 2: 0/base 3ff00000/mask 5300000 reinitialized
tsp0: window 3: 0/base fff00000/mask 5800000 reinitialized
pci0 at tsp0 bus 0
pci0: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
vga0 at pci0 dev 1 function 0: ATI 3D Rage II+ (rev. 0x9a)
wsdisplay0 at vga0 kbdmux 1
wsmux1: connecting to wsdisplay0
ahc0 at pci0 dev 2 function 0: Adaptec 3960D Ultra160 SCSI adapter
ahc0: interrupting at dec 6600 irq 12
ahc0: aic7899: Ultra160 Wide Channel A, SCSI Id=7, 32/253 SCBs
scsibus0 at ahc0: 16 targets, 8 luns per target
ahc1 at pci0 dev 2 function 1: Adaptec 3960D Ultra160 SCSI adapter
ahc1: interrupting at dec 6600 irq 13
ahc1: aic7899: Ultra160 Wide Channel B, SCSI Id=7, 32/253 SCBs
scsibus1 at ahc1: 16 targets, 8 luns per target
isp0 at pci0 dev 3 function 0: QLogic Dual Port FC-AL and 2Gbps Fabric HBA
isp0: interrupting at dec 6600 irq 16
isp0: bad execution throttle of 0- using 16
scsibus2 at isp0: 256 targets, 8 luns per target
tlp0 at pci0 dev 4 function 0: DECchip 21143 Ethernet, pass 3.0
tlp0: interrupting at dec 6600 irq 20
tlp0: DEC DE500-BA, Ethernet address 08:00:2b:c4:b5:26
tlp0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
sio0 at pci0 dev 7 function 0: Acer Labs M1543 PCI-ISA Bridge (rev. 0xc3)
Acer Labs M5229 UDMA IDE Controller (IDE mass storage, interface 0xfa, revision 0xc1) at pci0 dev 15 function 0 not configured
Acer Labs M5237 USB 1.1 Host Controller (USB serial bus, interface 0x10, revision 0x03) at pci0 dev 19 function 0 not configured
isa0 at sio0
com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, working fifo
com0: console
com1 at isa0 port 0x2f8-0x2ff irq 3: ns16550a, working fifo
pckbc0 at isa0 port 0x60-0x64
pcppi0 at isa0 port 0x61
spkr0 at pcppi0
isabeep0 at pcppi0
fdc0 at isa0 port 0x3f0-0x3f7 irq 6 drq 2
mcclock0 at isa0 port 0x70-0x71: mc146818 or compatible
tsp1 at tsc0
tsp1: window 2: 0/base 3ff00000/mask 5200000 reinitialized
tsp1: window 3: 0/base fff00000/mask 5400000 reinitialized
pci1 at tsp1 bus 0
pci1: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
wm0 at pci1 dev 1 function 0: Intel i82542 1000BASE-X Ethernet, rev. 3
wm0: interrupting at dec 6600 irq 24
wm0: Ethernet address 00:d0:b7:82:33:b0
wm0: 1000baseSX, 1000baseSX-FDX, auto
isp1 at pci1 dev 2 function 0: QLogic Dual Port FC-AL and 2Gbps Fabric HBA
isp1: interrupting at dec 6600 irq 28
isp1: bad execution throttle of 0- using 16
scsibus3 at isp1: 256 targets, 8 luns per target
esiop0 at pci1 dev 4 function 0: Symbios Logic 53c895 (ultra2-wide scsi)
esiop0: using on-board RAM
esiop0: interrupting at dec 6600 irq 36
scsibus4 at esiop0: 16 targets, 8 luns per target
bge0 at pci1 dev 6 function 0: Broadcom BCM5703X Gigabit Ethernet
bge0: interrupting at dec 6600 irq 44
bge0: ASIC BCM5703 A2 (0x1002), Ethernet address 00:08:02:91:89:ae
ukphy0 at bge0 phy 1: Generic IEEE 802.3u media interface
ukphy0: BCM5703 1000BASE-T media interface (OUI 0x001018, model 0x0016), rev. 2
ukphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
fd0 at fdc0 drive 0: 1.44MB, 80 cyl, 2 head, 18 sec
Kernelized RAIDframe activated
scsibus0: waiting 2 seconds for devices to settle...
scsibus1: waiting 2 seconds for devices to settle...
scsibus2: waiting 2 seconds for devices to settle...
scsibus3: waiting 2 seconds for devices to settle...
scsibus4: waiting 2 seconds for devices to settle...
cd0 at scsibus0 target 4 lun 0: <TOSHIBA, CD-ROM XM-5701TA, 0557> cdrom removable
cd0: sync (100.00ns offset 8), 8-bit (10.000MB/s) transfers
sd0 at scsibus1 target 0 lun 0: <COMPAQ, BF01864663, 3B07> disk fixed
sd0: 17365 MB, 7001 cyl, 20 head, 254 sec, 512 bytes/sect x 35565080 sectors
sd0: sync (25.00ns offset 63), 16-bit (80.000MB/s) transfers, tagged queueing
sd1 at scsibus1 target 1 lun 0: <COMPAQ, BF01864663, 3B07> disk fixed
sd1: 17365 MB, 7001 cyl, 20 head, 254 sec, 512 bytes/sect x 35565080 sectors
sd1: sync (25.00ns offset 63), 16-bit (80.000MB/s) transfers, tagged queueing
sd2 at scsibus1 target 2 lun 0: <COMPAQ, BF03685A35, HPB7> disk fixed
sd2: 34732 MB, 31310 cyl, 4 head, 567 sec, 512 bytes/sect x 71132000 sectors
sd2: sync (12.50ns offset 63), 16-bit (160.000MB/s) transfers, tagged queueing
sd3 at scsibus1 target 3 lun 0: <COMPAQ, BF03685A35, HPB7> disk fixed
sd3: 34732 MB, 31310 cyl, 4 head, 567 sec, 512 bytes/sect x 71132000 sectors
sd3: sync (12.50ns offset 63), 16-bit (160.000MB/s) transfers, tagged queueing
sd4 at scsibus1 target 4 lun 0: <COMPAQ, BF03685A35, HPB7> disk fixed
sd4: 34732 MB, 31310 cyl, 4 head, 567 sec, 512 bytes/sect x 71132000 sectors
sd4: sync (12.50ns offset 63), 16-bit (160.000MB/s) transfers, tagged queueing
sd5 at scsibus2 target 1 lun 0: <APPLE, Xserve RAID, 1.21> disk fixed
sd5: 1402 GB, 179526 cyl, 128 head, 128 sec, 512 bytes/sect x 2941353984 sectors
sd6 at scsibus3 target 1 lun 0: <APPLE, Xserve RAID, 1.21> disk fixed
sd6: 1402 GB, 179526 cyl, 128 head, 128 sec, 512 bytes/sect x 2941353984 sectors
sd2: no disk label
sd3: no disk label
sd4: no disk label
root on sd1a dumps on sd1b
root file system type: ffs
WARNING: preposterous clock chip time
 -- CHECK AND RESET THE DATE!
/etc/rc.conf is not configured.  Multiuser boot aborted.

  N O T I C E :  Please do not use the console except to run shutdown!

We recommend creating a non-root account and using su(1) for root access.
Terminal type is wsvt25m.
chmod: /tmp: Read-only file system
We recommend creating a non-root account and using su(1) for root access.
[console]<@> # uname -a
NetBSD  2.99.10 NetBSD 2.99.10 (TSUNAMI) #0: Thu Nov 18 15:27:36 EST 2004  root@woffi.planix.com:/m5/netbsd-current/src/sys/arch/alpha/compile/obj.alpha/TSUNAMI alpha
[console]<@> # ifconfig bge0
bge0: flags=8802<BROADCAST,SIMPLEX,MULTICAST> mtu 1500
        capabilities=7<IP4CSUM,TCP4CSUM,UDP4CSUM>
        enabled=0<>
        address: 00:08:02:91:89:ae
        media: Ethernet autoselect (1000baseT full-duplex,master)
        status: active
[console]<@> # ifconfig bge0 inet 10.10.10.2 netmask 255.255.255.0 up
bge0: pcistate failed to revert
bge0: pcistate failed to revert
[console]<@> # ping 10.10.10.1
PING 10.10.10.1 (10.10.10.1): 48 data bytes
64 bytes from 10.10.10.1: icmp_seq=0 ttl=64 time=0.381 ms
64 bytes from 10.10.10.1: icmp_seq=1 ttl=64 time=0.145 ms
64 bytes from 10.10.10.1: icmp_seq=2 ttl=64 time=0.238 ms
64 bytes from 10.10.10.1: icmp_seq=3 ttl=64 time=0.160 ms
64 bytes from 10.10.10.1: icmp_seq=4 ttl=64 time=0.247 ms
64 bytes from 10.10.10.1: icmp_seq=5 ttl=64 time=0.178 ms
64 bytes from 10.10.10.1: icmp_seq=6 ttl=64 time=0.260 ms
64 bytes from 10.10.10.1: icmp_seq=7 ttl=64 time=0.194 ms
64 bytes from 10.10.10.1: icmp_seq=8 ttl=64 time=0.129 ms
64 bytes from 10.10.10.1: icmp_seq=9 ttl=64 time=0.209 ms
64 bytes from 10.10.10.1: icmp_seq=10 ttl=64 time=0.146 ms
^C
----10.10.10.1 PING Statistics----
11 packets transmitted, 11 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 0.129/0.208/0.381/0.072 ms
[console]<@> # ifconfig bge0
bge0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        capabilities=7<IP4CSUM,TCP4CSUM,UDP4CSUM>
        enabled=0<>
        address: 00:08:02:91:89:ae
        media: Ethernet autoselect (1000baseT full-duplex,master)
        status: active
        inet 10.10.10.2 netmask 0xffffff00 broadcast 10.10.10.255
[console]<@> # ttcp -v -r -s
ttcp-r: buflen=8192, nbuf=2048, align=16384/0, port=5001  tcp
ttcp-r: socket
ttcp-r: accept from 10.10.10.1
load: 0.06  cmd: ttcp 31 [netio] 0.00u 0.02s 0% 264k
load: 0.14  cmd: ttcp 31 [netio] 0.00u 0.02s 0% 264k
load: 0.12  cmd: ttcp 31 [netio] 0.00u 0.02s 0% 264k
load: 0.07  cmd: ttcp 31 [netio] 0.00u 0.02s 0% 264k
^C
[console]<@> # ping 10.10.10.1
PING 10.10.10.1 (10.10.10.1): 48 data bytes
^C
----10.10.10.1 PING Statistics----
11 packets transmitted, 0 packets received, 100.0% packet loss
[console]<@> # ifconfig bge0
bge0: flags=8c43<UP,BROADCAST,RUNNING,OACTIVE,SIMPLEX,MULTICAST> mtu 1500
        capabilities=7<IP4CSUM,TCP4CSUM,UDP4CSUM>
        enabled=0<>
        address: 00:08:02:91:89:ae
        media: Ethernet autoselect (1000baseT full-duplex,master)
        status: active
        inet 10.10.10.2 netmask 0xffffff00 broadcast 10.10.10.255
[console]<@> # netstat -i -I bge0
Name  Mtu   Network       Address              Ipkts Ierrs    Opkts Oerrs Colls
bge0  1500  <Link>        00:08:02:91:89:ae      512    19      356     0     0
bge0  1500  10.10.10/24   10.10.10.2             512    19      356     0     0
[console]<@> # netstat -b -I bge0
Name  Mtu   Network       Address               Ibytes     Obytes
bge0  1500  <Link>        00:08:02:91:89:ae     755254      25282
bge0  1500  10.10.10/24   10.10.10.2            755254      25282
[console]<@> # 

>Fix:

	unknown

>Release-Note:

>Audit-Trail:

Responsible-Changed-From-To: port-alpha-maintainer->thorpej
Responsible-Changed-By: thorpej@NetBSD.org
Responsible-Changed-When: Tue, 13 Oct 2020 15:52:54 +0000
Responsible-Changed-Why:
Take.


State-Changed-From-To: open->closed
State-Changed-By: thorpej@NetBSD.org
State-Changed-When: Tue, 13 Oct 2020 15:52:54 +0000
State-Changed-Why:
I have recently put lots of traffic through a bge on a DS25.  Please file
a new bug if this continues to be an issue.


>Unformatted:

Copyright © 1994-2017 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.