NetBSD Problem Report #55974

From hannken@eis.cs.tu-bs.de  Thu Feb  4 10:12:28 2021
Return-Path: <hannken@eis.cs.tu-bs.de>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 6D4341A9217
	for <gnats-bugs@gnats.NetBSD.org>; Thu,  4 Feb 2021 10:12:28 +0000 (UTC)
Message-Id: <20210204101219.BA20ECBAE9@builder.isf.cs.tu-bs.de>
Date: Thu,  4 Feb 2021 11:12:19 +0100 (MET)
From: hannken@eis.cs.tu-bs.de
Reply-To: hannken@eis.cs.tu-bs.de
To: gnats-bugs@NetBSD.org
Subject: Broadcom NetXtreme II BCM5708 stopped working
X-Send-Pr-Version: 3.95

>Number:         55974
>Category:       kern
>Synopsis:       Broadcom NetXtreme II BCM5708 stopped working
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    jdolecek
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Feb 04 10:15:00 +0000 2021
>Closed-Date:    Sat Feb 13 11:04:01 +0000 2021
>Last-Modified:  Sat Feb 13 11:04:01 +0000 2021
>Originator:     Juergen Hannken-Illjes
>Release:        NetBSD 9.99.77
>Organization:

>Environment:
System: NetBSD poweredge.isf.cs.tu-bs.de 9.99.77 NetBSD 9.99.77 (generic-dlog.amd64) #5: Wed Feb  3 16:21:58 MET 2021  hannken@builder.isf.cs.tu-bs.de:/work/build/obj/obj.amd64/sys/arch/amd64/compile/generic-dlog.amd64 amd64
Architecture: x86_64
Machine: amd64
>Description:
Since MSI/MSI-X got enabled for if_bnx with this commit:

Module Name:	src
Committed By:	jdolecek
Date:		Sun Jul 12 19:05:32 UTC 2020

Modified Files:
	src/sys/dev/pci: if_bnx.c if_bnxvar.h

Log Message:
	enable MSI/MSI-X if supported by adapter

	tested MSI-X with Broadcom NetXtreme II BCM5709 1000Base-T

the Dell PowerEdge 2950 with two Broadcom NetXtreme II BCM5708 1000Base-T
seems to lose tx interrupts and its watchdog fires.

[  68.1828359] bnx0: Watchdog timeout -- resetting!
[  88.6042909] bnx0: Watchdog timeout -- resetting!
[ 119.0265230] bnx0: Watchdog timeout -- resetting!
[ 145.4484562] bnx0: Watchdog timeout -- resetting!

Dmesg before is:

[     1.017306] pci4 at ppb3 bus 7
[     1.017306] pci4: i/o space, memory space enabled, rd/line, wr/inv ok
[     1.017306] bnx0 at pci4 dev 0 function 0: Broadcom NetXtreme II BCM5708 1000Base-T
[     1.017306] bnx0: Ethernet address 00:24:e8:67:4b:db
[     1.017306] bnx0: ASIC BCM5708 B2 (0x57081020)
[     1.017306] bnx0: PCI-X 64bit 133MHz
[     1.017306] bnx0: B/C (4.6.0); Bufs (RX:2;TX:2); Flags (MFW); MFW (ipms 1.6.0)
[     1.017306] bnx0: Coal (RX:6,6,18,18; TX:20,20,80,80)
[     1.017306] brgphy0 at bnx0 phy 1: BCM5708C 1000BASE-T media interface, rev. 6
[     1.017306] brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
[     1.017306] bnx0: interrupting at ioapic0 pin 16

while dmesg after is:

[     1.017262] pci4 at ppb3 bus 7
[     1.017262] pci4: i/o space, memory space enabled, rd/line, wr/inv ok
[     1.017262] bnx0 at pci4 dev 0 function 0: Broadcom NetXtreme II BCM5708 1000Base-T
[     1.017262] bnx0: Ethernet address 00:24:e8:67:4b:db
[     1.017262] bnx0: ASIC BCM5708 B2 (0x57081020)
[     1.017262] bnx0: PCI-X 64bit 133MHz
[     1.017262] bnx0: B/C (4.6.0); Bufs (RX:2;TX:2); Flags (MFW); MFW (ipms 1.6.0)
[     1.017262] bnx0: Coal (RX:6,6,18,18; TX:20,20,80,80)
[     1.017262] brgphy0 at bnx0 phy 1: BCM5708C 1000BASE-T media interface, rev. 6
[     1.017262] brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
[     1.017262] bnx0: interrupting at msi0 vec 0

Pcictl dump gives (in the MSI case):

PCI Message Signaled Interrupt
Message Control register: 0x0081
MSI Enabled: on
Multiple Message Capable: no (1 vector)
Multiple Message Enabled: off (1 vector)
64 Bit Address Capable: on
Per-Vector Masking Capable: off
Extended Message Data Capable: off
Extended Message Data Enable: off
Message Address (lower) register: 0xfee00000
Message Address (upper) register: 0x00000000
Message Data register: 0x0064
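
The register values in the dump above can be cross-checked by hand. The
following is a minimal sketch, not part of the original report, that decodes
the Message Control value using the standard PCI MSI capability bit layout
and the Message Address/Data pair using the x86 local APIC convention. The
function and field names are hypothetical and chosen only to mirror the
pcictl labels.

```python
# Illustrative decoder for the MSI registers shown in the pcictl dump.
# Names are hypothetical; the bit positions follow the PCI MSI capability
# layout and the x86 local APIC message format.

def decode_msi_control(reg: int) -> dict:
    """Split a 16-bit MSI Message Control value into its fields."""
    mmc = (reg >> 1) & 0x7  # log2 of the vectors the device can request
    mme = (reg >> 4) & 0x7  # log2 of the vectors software has enabled
    return {
        "msi_enabled": bool(reg & 0x0001),
        "vectors_capable": 1 << mmc,  # encodings 6 and 7 are reserved
        "vectors_enabled": 1 << mme,
        "addr64_capable": bool(reg & 0x0080),
        "per_vector_masking": bool(reg & 0x0100),
        "ext_msg_data_capable": bool(reg & 0x0200),
        "ext_msg_data_enabled": bool(reg & 0x0400),
    }


def decode_x86_msi_message(addr_lo: int, data: int) -> dict:
    """Decode the x86 (local APIC) view of an MSI address/data pair."""
    return {
        "targets_lapic": (addr_lo >> 20) == 0xFEE,
        "dest_apic_id": (addr_lo >> 12) & 0xFF,
        "vector": data & 0xFF,
        "delivery_mode": (data >> 8) & 0x7,  # 0 means fixed delivery
        "level_triggered": bool(data & 0x8000),
    }


if __name__ == "__main__":
    print(decode_msi_control(0x0081))
    print(decode_x86_msi_message(0xFEE00000, 0x0064))
```

With the values from this machine, the sketch reports MSI enabled, a single
vector, 64-bit addressing capable, and a fixed-delivery, edge-triggered
message for vector 100 aimed at APIC ID 0, which agrees with the fields
pcictl printed.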
>How-To-Repeat:
Install a -current kernel and try to use the network.
>Fix:
unknown

>Release-Note:

>Audit-Trail:

Responsible-Changed-From-To: kern-bug-people->jdolecek
Responsible-Changed-By: hannken@NetBSD.org
Responsible-Changed-When: Thu, 04 Feb 2021 10:18:07 +0000
Responsible-Changed-Why:
Over to committer.
Please fix or revert; I can test patches, as this is a test machine.


From: "John D. Baker" <jdbaker@consolidated.net>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/55974: Broadcom NetXtreme II BCM5708 stopped working
Date: Thu, 4 Feb 2021 10:58:36 -0600 (CST)

 Also seen on IBM System X3650:

 The watchdog messages were only ever written to the console, so I don't
 have those, but they appear the same as already shown in this PR.

 "dmesg.boot" excerpts:

 ppb10 at pci0 dev 28 function 0: Intel 63xxESB PCI Express Port #1 (rev. 0x09)
 ppb10: PCI Express capability version 1 <Root Port of PCI-E Root Complex> x1 @ 2.5GT/s
 pci11 at ppb10 bus 2
 pci11: i/o space, memory space enabled, rd/line, wr/inv ok
 ppb11 at pci11 dev 0 function 0: ServerWorks BCM5714/BCM5715 Integral PCI-E to PCI-X Bridge (rev. 0xc3)
 ppb11: PCI Express capability version 1 <PCI-E to PCI/PCI-X Bridge>
 pci12 at ppb11 bus 3
 pci12: i/o space, memory space enabled, rd/line, wr/inv ok
 bnx0 at pci12 dev 0 function 0: Broadcom NetXtreme II BCM5708 1000Base-T
 bnx0: autoconfiguration error: /x/current/src/sys/dev/pci/if_bnx.c(716): Management firmware enabled but not running!
 bnx0: Ethernet address 00:1a:64:c5:52:c0
 bnx0: ASIC BCM5708 B2 (0x57081020)
 bnx0: PCI-X 64bit 133MHz
 bnx0: B/C (3.4.4); Bufs (RX:2;TX:2); Flags (MFW); MFW (NOT RUNNING!)
 bnx0: Coal (RX:6,6,18,18; TX:20,20,80,80)
 brgphy0 at bnx0 phy 1: BCM5708C 1000BASE-T media interface, rev. 6
 brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
 bnx0: interrupting at msi0 vec 0

 ppb12 at pci0 dev 28 function 1: Intel 63xxESB PCI Express Port #2 (rev. 0x09)
 ppb12: PCI Express capability version 1 <Root Port of PCI-E Root Complex> x1 @ 2.5GT/s
 pci13 at ppb12 bus 5
 ppb13 at pci13 dev 0 function 0: ServerWorks BCM5714/BCM5715 Integral PCI-E to PCI-X Bridge (rev. 0xc3)
 ppb13: PCI Express capability version 1 <PCI-E to PCI/PCI-X Bridge>
 pci14 at ppb13 bus 6
 pci14: i/o space, memory space enabled, rd/line, wr/inv ok
 bnx1 at pci14 dev 0 function 0: Broadcom NetXtreme II BCM5708 1000Base-T
 bnx1: autoconfiguration error: /x/current/src/sys/dev/pci/if_bnx.c(716): Management firmware enabled but not running!
 bnx1: Ethernet address 00:1a:64:c5:52:c2
 bnx1: ASIC BCM5708 B2 (0x57081020)
 bnx1: PCI-X 64bit 133MHz
 bnx1: B/C (3.4.4); Bufs (RX:2;TX:2); Flags (MFW); MFW (NOT RUNNING!)
 bnx1: Coal (RX:6,6,18,18; TX:20,20,80,80)
 brgphy1 at bnx1 phy 1: BCM5708C 1000BASE-T media interface, rev. 6
 brgphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
 bnx1: interrupting at msi1 vec 0

 Problem is compounded by the fact that this machine only runs -current
 via netboot/NFS root.

 I don't have any dmesg from a working -current as the machine is only
 rarely powered up (due to heat production/power consumption).

 I have a number of other machines with bnx(4) interfaces (mostly Dell),
 so they will probably behave similarly but I have no space to set them
 up for testing.

 There seems to be a systemic problem with PCI-X devices and/or the
 MSI/MSI-X support in NetBSD (see kern/55115).  Perhaps in each case
 the PCI-X device is behind a PCIe<->PCI-X bridge?

 -- 
 |/"\ John D. Baker, KN5UKS               NetBSD     Darwin/MacOS X
 |\ / jdbaker[snail]consolidated[flyspeck]net  OpenBSD            FreeBSD
 | X  No HTML/proprietary data in email.   BSD just sits there and works!
 |/ \ GPGkeyID:  D703 4A7E 479F 63F8 D3F4  BD99 9572 8F23 E4AD 1645

From: "Jonathan A. Kollasch" <jakllsch@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/55974 CVS commit: src/sys/dev/pci
Date: Sat, 13 Feb 2021 01:51:24 +0000

 Module Name:	src
 Committed By:	jakllsch
 Date:		Sat Feb 13 01:51:24 UTC 2021

 Modified Files:
 	src/sys/dev/pci: if_bnx.c

 Log Message:
 Revert bnx(4) to INTx interrupts.

 Should fix PR kern/55974.

 This driver does not yet do the special MSI and MSI-X setup that the
 chip apparently requires.


 To generate a diff of this commit:
 cvs rdiff -u -r1.105 -r1.106 src/sys/dev/pci/if_bnx.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: open->closed
State-Changed-By: hannken@NetBSD.org
State-Changed-When: Sat, 13 Feb 2021 11:04:01 +0000
State-Changed-Why:
Fixed with Rev. 1.106 of if_bnx.c -- Thanks.


>Unformatted:
