NetBSD Problem Report #53265

From gson@gson.org  Sun May  6 12:03:53 2018
Return-Path: <gson@gson.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 58E0B7A272
	for <gnats-bugs@gnats.NetBSD.org>; Sun,  6 May 2018 12:03:53 +0000 (UTC)
Message-Id: <20180506120347.44B6898B44B@guava.gson.org>
Date: Sun,  6 May 2018 15:03:47 +0300 (EEST)
From: gson@gson.org (Andreas Gustafsson)
Reply-To: gson@gson.org (Andreas Gustafsson)
To: gnats-bugs@NetBSD.org
Subject: panic in bnx_detach() on shutdown
X-Send-Pr-Version: 3.95

>Number:         53265
>Category:       kern
>Synopsis:       panic in bnx_detach() on shutdown
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    msaitoh
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun May 06 12:05:00 +0000 2018
>Closed-Date:    Tue May 08 04:15:34 +0000 2018
>Last-Modified:  Wed May 09 14:55:01 +0000 2018
>Originator:     Andreas Gustafsson
>Release:        NetBSD-current, source date 2018.05.04.14.15.41
>Organization:

>Environment:
System: NetBSD 
Architecture: x86_64
Machine: amd64
>Description:

Seen on the serial console shutting down an 8-core amd64 machine
running a fresh -current:

[ 63426.6135600] syncing disks... done
[ 63428.0241184] cd0: detached
[ 63428.0608678] brgphy3: detached
[ 63428.0983665] brgphy2: detached
[ 63428.1358664] brgphy1: detached
[ 63428.1733649] brgphy0: detached
[ 63428.2108644] atapibus0: detached
[ 63428.2442055] uhub5: detached
[ 63428.2842215] uhub4: detached
[ 63428.3142328] uhub2: detached
[ 63428.3542487] uhub1: detached
[ 63428.3842605] uhub0: detached
[ 63428.4242764] com1: detached
[ 63428.4642922] bnx3: detached
[ 63428.5043087] bnx2: detached
[ 63428.5443239] Skipping crash dump on recursive panic
[ 63428.5943436] panic: kernel diagnostic assertion "c->c_cpu->cc_lwp == curlwp || c->c_cpu->cc_active != c" failed: file "/tmp/bracket/build/2018.05.04.14.15\
.41-amd64-debug-baremetal/src/sys/kern/kern_timeout.c", line 318
[ 63428.8344384] cpu7: Begin traceback...
[ 63428.8744542] vpanic() at netbsd:vpanic+0x16f
[ 63428.9344780] ch_voltag_convert_in() at netbsd:ch_voltag_convert_in
[ 63429.0045057] callout_destroy() at netbsd:callout_destroy+0x75
[ 63429.0745334] bnx_detach() at netbsd:bnx_detach+0xbb
[ 63429.1345572] config_detach() at netbsd:config_detach+0x121
[ 63429.2045849] config_detach_all() at netbsd:config_detach_all+0x97
[ 63429.2746126] cpu_reboot() at netbsd:cpu_reboot+0x19a
[ 63429.3346364] sys_reboot() at netbsd:sys_reboot+0x85
[ 63429.3946602] syscall() at netbsd:syscall+0x208
[ 63429.4446800] --- syscall (number 208) ---
[ 63429.4946998] 74ed2443ebda:
[ 63429.5347157] cpu7: End traceback...
[ 63429.5847356] rebooting...

No harm done, but it's a bug nonetheless...

>How-To-Repeat:

Only happened once so far.

>Fix:

>Release-Note:

>Audit-Trail:
From: Masanobu SAITOH <msaitoh@execsw.org>
To: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org,
 gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Cc: msaitoh@execsw.org
Subject: Re: kern/53265: panic in bnx_detach() on shutdown
Date: Mon, 7 May 2018 19:09:01 +0900

 On 2018/05/06 21:05, Andreas Gustafsson wrote:
 >> Number:         53265
 >> Category:       kern
 >> Synopsis:       panic in bnx_detach() on shutdown
 >> Confidential:   no
 >> Severity:       non-critical
 >> Priority:       low
 >> Responsible:    kern-bug-people
 >> State:          open
 >> Class:          sw-bug
 >> Submitter-Id:   net
 >> Arrival-Date:   Sun May 06 12:05:00 +0000 2018
 >> Originator:     Andreas Gustafsson
 >> Release:        NetBSD-current, source date 2018.05.04.14.15.41
 >> Organization:
 > 
 >> Environment:
 > System: NetBSD
 > Architecture: x86_64
 > Machine: amd64
 >> Description:
 > 
 > Seen on the serial console shutting down an 8-core amd64 machine
 > running a fresh -current:
 > 
 > [ 63426.6135600] syncing disks... done
 > [ 63428.0241184] cd0: detached
 > [ 63428.0608678] brgphy3: detached
 > [ 63428.0983665] brgphy2: detached
 > [ 63428.1358664] brgphy1: detached
 > [ 63428.1733649] brgphy0: detached
 > [ 63428.2108644] atapibus0: detached
 > [ 63428.2442055] uhub5: detached
 > [ 63428.2842215] uhub4: detached
 > [ 63428.3142328] uhub2: detached
 > [ 63428.3542487] uhub1: detached
 > [ 63428.3842605] uhub0: detached
 > [ 63428.4242764] com1: detached
 > [ 63428.4642922] bnx3: detached
 > [ 63428.5043087] bnx2: detached
 > [ 63428.5443239] Skipping crash dump on recursive panic
 > [ 63428.5943436] panic: kernel diagnostic assertion "c->c_cpu->cc_lwp == curlwp || c->c_cpu->cc_active != c" failed: file "/tmp/bracket/build/2018.05.04.14.15\
 > .41-amd64-debug-baremetal/src/sys/kern/kern_timeout.c", line 318
 > [ 63428.8344384] cpu7: Begin traceback...
 > [ 63428.8744542] vpanic() at netbsd:vpanic+0x16f
 > [ 63428.9344780] ch_voltag_convert_in() at netbsd:ch_voltag_convert_in
 > [ 63429.0045057] callout_destroy() at netbsd:callout_destroy+0x75
 > [ 63429.0745334] bnx_detach() at netbsd:bnx_detach+0xbb
 > [ 63429.1345572] config_detach() at netbsd:config_detach+0x121
 > [ 63429.2045849] config_detach_all() at netbsd:config_detach_all+0x97
 > [ 63429.2746126] cpu_reboot() at netbsd:cpu_reboot+0x19a
 > [ 63429.3346364] sys_reboot() at netbsd:sys_reboot+0x85
 > [ 63429.3946602] syscall() at netbsd:syscall+0x208
 > [ 63429.4446800] --- syscall (number 208) ---
 > [ 63429.4946998] 74ed2443ebda:
 > [ 63429.5347157] cpu7: End traceback...
 > [ 63429.5847356] rebooting...
 > 
 > No harm done, but it's a bug nonetheless..

   Even if you run "shutdown -h", the machine does not halt; it reboots instead.


 >> How-To-Repeat:
 > 
 > Only happened once so far.
 > 
 >> Fix:
 > 

   How often does it panic on shutdown? Could you test the following patch
 to verify the problem is fixed?

 ---------------------------
 - Fix a bug that bnx(4) panics on shutdown. Reported by Andreas Gustafsson in
    PR#53265.
 - Make sure not to re-arm the callout when we are about to detach. Same as
    if_bge.c rev. 1.292.
 - Use pci_intr_establish_xname().
 ---------------------------
 Index: if_bnxvar.h
 ===================================================================
 RCS file: /cvsroot/src/sys/dev/pci/if_bnxvar.h,v
 retrieving revision 1.6
 diff -u -p -r1.6 if_bnxvar.h
 --- if_bnxvar.h	1 Jul 2014 17:11:35 -0000	1.6
 +++ if_bnxvar.h	7 May 2018 10:03:56 -0000
 @@ -210,6 +210,7 @@ struct bnx_softc
   	uint32_t		tx_prod_bseq;	/* Counts the bytes used.  */

   	struct callout		bnx_timeout;
 +	int			bnx_detaching;

   	/* Frame size and mbuf allocation size for RX frames. */
   	uint32_t		max_frame_size;
 Index: if_bnx.c
 ===================================================================
 RCS file: /cvsroot/src/sys/dev/pci/if_bnx.c,v
 retrieving revision 1.63
 diff -u -p -r1.63 if_bnx.c
 --- if_bnx.c	8 Feb 2018 09:05:19 -0000	1.63
 +++ if_bnx.c	7 May 2018 10:03:59 -0000
 @@ -792,7 +792,8 @@ bnx_attach(device_t parent, device_t sel
   	    IFCAP_CSUM_UDPv4_Tx | IFCAP_CSUM_UDPv4_Rx;

   	/* Hookup IRQ last. */
 -	sc->bnx_intrhand = pci_intr_establish(pc, ih, IPL_NET, bnx_intr, sc);
 +	sc->bnx_intrhand = pci_intr_establish_xname(pc, ih, IPL_NET, bnx_intr,
 +	    sc, device_xname(self));
   	if (sc->bnx_intrhand == NULL) {
   		aprint_error_dev(self, "couldn't establish interrupt");
   		if (intrstr != NULL)
 @@ -890,17 +891,7 @@ bnx_detach(device_t dev, int flags)

   	/* Stop and reset the controller. */
   	s = splnet();
 -	if (ifp->if_flags & IFF_RUNNING)
 -		bnx_stop(ifp, 1);
 -	else {
 -		/* Disable the transmit/receive blocks. */
 -		REG_WR(sc, BNX_MISC_ENABLE_CLR_BITS, 0x5ffffff);
 -		REG_RD(sc, BNX_MISC_ENABLE_CLR_BITS);
 -		DELAY(20);
 -		bnx_disable_intr(sc);
 -		bnx_reset(sc, BNX_DRV_MSG_CODE_RESET);
 -	}
 -
 +	bnx_stop(ifp, 1);
   	splx(s);

   	pmf_device_deregister(dev);
 @@ -3371,10 +3362,11 @@ bnx_stop(struct ifnet *ifp, int disable)

   	DBPRINT(sc, BNX_VERBOSE_RESET, "Entering %s()\n", __func__);

 -	if ((ifp->if_flags & IFF_RUNNING) == 0)
 -		return;
 -
 -	callout_stop(&sc->bnx_timeout);
 +	if (disable) {
 +		sc->bnx_detaching = 1;
 +		callout_halt(&sc->bnx_timeout, NULL);
 +	} else
 +		callout_stop(&sc->bnx_timeout);

   	mii_down(&sc->bnx_mii);

 @@ -5694,9 +5686,6 @@ bnx_tick(void *xsc)
   	/* Update the statistics from the hardware statistics block. */
   	bnx_stats_update(sc);

 -	/* Schedule the next tick. */
 -	callout_reset(&sc->bnx_timeout, hz, bnx_tick, sc);
 -
   	mii = &sc->bnx_mii;
   	mii_tick(mii);

 @@ -5707,6 +5696,11 @@ bnx_tick(void *xsc)
   	bnx_get_buf(sc, &prod, &chain_prod, &prod_bseq);
   	sc->rx_prod = prod;
   	sc->rx_prod_bseq = prod_bseq;
 +
 +	/* Schedule the next tick. */
 +	if (!sc->bnx_detaching)
 +		callout_reset(&sc->bnx_timeout, hz, bnx_tick, sc);
 +
   	splx(s);
   	return;
   }


 The same diff is at:

 	http://www.netbsd.org/~msaitoh/bnx-20180507-0.dif


 -- 
 -----------------------------------------------
                  SAITOH Masanobu (msaitoh@execsw.org
                                   msaitoh@netbsd.org)

Responsible-Changed-From-To: kern-bug-people->msaitoh
Responsible-Changed-By: msaitoh@NetBSD.org
Responsible-Changed-When: Mon, 07 May 2018 10:11:38 +0000
Responsible-Changed-Why:
mine.


From: Andreas Gustafsson <gson@gson.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/53265: panic in bnx_detach() on shutdown
Date: Mon, 7 May 2018 17:42:32 +0300

 Masanobu SAITOH wrote:
 > How often does it panic on shutdown?

 It has only happened once.

 > Could you test the following patch
 > to verify the problem is fixed?

 I tested the patch and was able to shut down the system without a
 panic, and did not notice any other problems, either. Since I have
 also been able to shut it down without a panic many times without the
 patch, this isn't conclusive proof that the problem is fixed, but at
 least the patch appears not to break anything.
 -- 
 Andreas Gustafsson, gson@gson.org

State-Changed-From-To: open->closed
State-Changed-By: msaitoh@NetBSD.org
State-Changed-When: Tue, 08 May 2018 04:15:34 +0000
State-Changed-Why:
Fixed. Thanks.


From: Masanobu SAITOH <msaitoh@execsw.org>
To: gnats-bugs@NetBSD.org, msaitoh@NetBSD.org, gnats-admin@netbsd.org,
 netbsd-bugs@netbsd.org, Andreas Gustafsson <gson@gson.org>
Cc: msaitoh@execsw.org
Subject: Re: kern/53265: panic in bnx_detach() on shutdown
Date: Tue, 8 May 2018 13:13:55 +0900

 On 2018/05/07 23:45, Andreas Gustafsson wrote:
 > The following reply was made to PR kern/53265; it has been noted by GNATS.
 > 
 > From: Andreas Gustafsson <gson@gson.org>
 > To: gnats-bugs@NetBSD.org
 > Cc:
 > Subject: Re: kern/53265: panic in bnx_detach() on shutdown
 > Date: Mon, 7 May 2018 17:42:32 +0300
 > 
 >   Masanobu SAITOH wrote:
 >   > How often does it panic on shutdown?
 >   
 >   It has only happened once.
 >   
 >   > Could you test the following patch
 >   > to verify the problem is fixed?
 >   
 >   I tested the patch and was able to shut down the system without a
 >   panic, and did not notice any other problems, either. Since I have
 >   also been able to shut it down without a panic many times without the
 >   patch, this isn't conclusive proof that the problem is fixed

   Destroying a callout without stopping it first is a bug, and the stack
 trace says so. I have committed the change, so it won't happen anymore.

   Thank you for the report.

 >   , but at
 >   least the patch appears not to break anything.
 >   --
 >   Andreas Gustafsson, gson@gson.org
 >   
 > 


 -- 
 -----------------------------------------------
                  SAITOH Masanobu (msaitoh@execsw.org
                                   msaitoh@netbsd.org)

From: "SAITOH Masanobu" <msaitoh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/53265 CVS commit: src/sys/dev/pci
Date: Tue, 8 May 2018 04:11:10 +0000

 Module Name:	src
 Committed By:	msaitoh
 Date:		Tue May  8 04:11:10 UTC 2018

 Modified Files:
 	src/sys/dev/pci: if_bnx.c if_bnxvar.h

 Log Message:
 - Fix a bug that bnx(4) panics on shutdown. Stop callout before destroy.
   Reported by Andreas Gustafsson in PR#53265.
 - Make sure not to re-arm the callout when we are about to detach. Same as
   if_bge.c rev. 1.292.
 - Use pci_intr_establish_xname().


 To generate a diff of this commit:
 cvs rdiff -u -r1.63 -r1.64 src/sys/dev/pci/if_bnx.c
 cvs rdiff -u -r1.6 -r1.7 src/sys/dev/pci/if_bnxvar.h

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.
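
The log message above names two rules: stop the callout before destroying
it, and do not re-arm it once detach has begun. The following is a minimal
userspace sketch of why the re-arm guard matters. It is not kernel code and
does not reproduce callout(9) semantics; every fake_* name and
simulate_detach() are hypothetical stand-ins invented for illustration, and
the event ordering (cancel first, then an in-flight tick reaching its
re-arm step) is an assumption of the model. Only the detaching flag mirrors
the bnx_detaching field from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct callout; tracks only "is it scheduled". */
struct fake_callout {
	bool pending;
};

/* Hypothetical stand-in for struct bnx_softc. */
struct fake_softc {
	struct fake_callout timeout;
	bool detaching;		/* mirrors bnx_detaching from the patch */
};

/* Models callout_reset(): schedule the next tick. */
static void
fake_callout_reset(struct fake_callout *c)
{
	c->pending = true;
}

/* Models the cancelling effect of callout_halt()/callout_stop(). */
static void
fake_callout_halt(struct fake_callout *c)
{
	c->pending = false;
}

/* Models the check at destroy time: true if the callout is safe to destroy. */
static bool
fake_callout_destroy_ok(const struct fake_callout *c)
{
	return !c->pending;
}

/* Models the tail of the tick handler, where the next tick is scheduled. */
static void
fake_tick_tail(struct fake_softc *sc, bool guarded)
{
	/* Patched behaviour (guarded): skip the re-arm while detaching. */
	if (!guarded || !sc->detaching)
		fake_callout_reset(&sc->timeout);
}

/*
 * Models detach when a tick is already in flight: mark the device as
 * detaching, cancel the callout, let the in-flight tick reach its re-arm
 * step, then report whether the callout could be destroyed safely.
 */
bool
simulate_detach(bool guarded)
{
	struct fake_softc sc = {
		.timeout = { .pending = true },
		.detaching = false,
	};

	sc.detaching = true;
	fake_callout_halt(&sc.timeout);
	fake_tick_tail(&sc, guarded);
	return fake_callout_destroy_ok(&sc.timeout);
}
```

In this model simulate_detach(true) reports that the callout is safe to
destroy, while simulate_detach(false) reports that it is still scheduled at
destroy time, which is the kind of state a destroy-time assertion is meant
to catch.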

From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/53265 CVS commit: [netbsd-8] src/sys/dev/pci
Date: Wed, 9 May 2018 14:52:40 +0000

 Module Name:	src
 Committed By:	martin
 Date:		Wed May  9 14:52:40 UTC 2018

 Modified Files:
 	src/sys/dev/pci [netbsd-8]: if_bnx.c if_bnxvar.h

 Log Message:
 Pull up following revision(s) (requested by msaitoh in ticket #814):

 	sys/dev/pci/if_bnxvar.h: revision 1.7
 	sys/dev/pci/if_bnx.c: revision 1.64

 - Fix a bug that bnx(4) panics on shutdown. Stop callout before destroy.
    Reported by Andreas Gustafsson in PR#53265.
 - Make sure not to re-arm the callout when we are about to detach. Same as
    if_bge.c rev. 1.292.
 - Use pci_intr_establish_xname().


 To generate a diff of this commit:
 cvs rdiff -u -r1.61.8.1 -r1.61.8.2 src/sys/dev/pci/if_bnx.c
 cvs rdiff -u -r1.6 -r1.6.22.1 src/sys/dev/pci/if_bnxvar.h

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

>Unformatted:
