NetBSD Problem Report #51133

From martin@duskware.de  Thu May 12 21:09:58 2016
Return-Path: <martin@duskware.de>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (verified OK))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 4ED457A476
	for <gnats-bugs@gnats.NetBSD.org>; Thu, 12 May 2016 21:09:58 +0000 (UTC)
From: martin@NetBSD.org
Reply-To: martin@NetBSD.org
To: gnats-bugs@NetBSD.org
Subject: KASSERT on shutdown
X-Send-Pr-Version: 3.95

>Number:         51133
>Category:       kern
>Synopsis:       KASSERT on shutdown
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu May 12 21:10:00 +0000 2016
>Last-Modified:  Fri Apr 14 18:35:01 +0000 2023
>Originator:     Martin Husemann
>Release:        NetBSD 7.99.29
>Organization:
The NetBSD Foundation, Inc.
>Environment:
System: NetBSD thirdstage.duskware.de 7.99.29 NetBSD 7.99.29 (MODULAR) #483: Thu May 12 13:25:20 CEST 2016 martin@thirdstage.duskware.de:/usr/src/sys/arch/sparc64/compile/MODULAR sparc64
Architecture: sparc64
Machine: sparc64
>Description:

When shutting down or rebooting a machine I reproducibly get:

sd1: detached
brgphy3: detached
brgphy2: detached
scsibus1: detached
atabus1: detached
atabus0: detached
brgphy1: detached
brgphy0: detached
bge3: detached
bge2: detached
Skipping crash dump on recursive panic
panic: kernel diagnostic assertion "!cpu_intr_p()" failed: file "../../../../kern/subr_xcall.c", line 351
db{0}> ps
PID    LID S CPU     FLAGS       STRUCT LWP *               NAME WAIT
24861    1 3   1         0          10653bba0               halt xchicv
1        1 3   1   8020080          104762080               init wait
[..]
0       19 1   1       200          103b6aca0          softnet/1
0    >  18 7   1       201          103b6b0c0             idle/1
0       17 3   1       200          103b6b4e0             sysmon smtaskq
0       16 3   0       200          103b6b900          cryptoret crypto_w
0       15 3   0       200          103b58020         pmfsuspend pmfsuspend
0       14 3   1       200          103b58440           pmfevent pmfevent
0       13 3   1       200          103b58860         sopendfree sopendfr
0       12 3   0       200          103b58c80           nfssilly nfssilly
0       11 3   0       200          103b590a0            cachegc cachegc
0       10 3   0       200          103b594c0              vrele vrele
0        9 3   0       200          103b598e0             vdrain vdrain
0        8 3   1       200          103b48000          modunload mod_unld
0        7 3   0       200          103b48420            xcall/0 xcall
0    >   6 7   0       200          103b48840          softser/0
0        5 1   0       200          103b48c60          softclk/0
0        4 1   0       200          103b49080          softbio/0
0        3 1   0       200          103b494a0          softnet/0
0    >   2 7   0       201          103b498c0             idle/0
0        1 3   1       200            1c8a200            swapper uvm



>How-To-Repeat:
s/a

>Fix:
n/a

>Audit-Trail:
From: Ryota Ozaki <ozaki-r@netbsd.org>
To: "gnats-bugs@NetBSD.org" <gnats-bugs@netbsd.org>
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/51133: KASSERT on shutdown
Date: Fri, 13 May 2016 09:36:09 +0900

 On Fri, May 13, 2016 at 6:10 AM,  <martin@netbsd.org> wrote:
 >>Number:         51133
 >>Category:       kern
 >>Synopsis:       KASSERT on shutdown
 >>Confidential:   no
 >>Severity:       critical
 >>Priority:       high
 >>Responsible:    kern-bug-people
 >>State:          open
 >>Class:          sw-bug
 >>Submitter-Id:   net
 >>Arrival-Date:   Thu May 12 21:10:00 +0000 2016
 >>Originator:     Martin Husemann
 >>Release:        NetBSD 7.99.29
 >>Organization:
 > The NetBSD Foundation, Inc.
 >>Environment:
 > System: NetBSD thirdstage.duskware.de 7.99.29 NetBSD 7.99.29 (MODULAR) #483: Thu May 12 13:25:20 CEST 2016 martin@thirdstage.duskware.de:/usr/src/sys/arch/sparc64/compile/MODULAR sparc64
 > Architecture: sparc64
 > Machine: sparc64
 >>Description:
 >
 > When shutting down or rebooting a machine I reproducibly get:
 >
 > sd1: detached
 > brgphy3: detached
 > brgphy2: detached
 > scsibus1: detached
 > atabus1: detached
 > atabus0: detached
 > brgphy1: detached
 > brgphy0: detached
 > bge3: detached
 > bge2: detached
 > Skipping crash dump on recursive panic
 > panic: kernel diagnostic assertion "!cpu_intr_p()" failed: file "../../../../kern/subr_xcall.c", line 351
 > db{0}> ps
 > PID    LID S CPU     FLAGS       STRUCT LWP *               NAME WAIT
 > 24861    1 3   1         0          10653bba0               halt xchicv
 > 1        1 3   1   8020080          104762080               init wait
 > [..]
 > 0       19 1   1       200          103b6aca0          softnet/1
 > 0    >  18 7   1       201          103b6b0c0             idle/1
 > 0       17 3   1       200          103b6b4e0             sysmon smtaskq
 > 0       16 3   0       200          103b6b900          cryptoret crypto_w
 > 0       15 3   0       200          103b58020         pmfsuspend pmfsuspend
 > 0       14 3   1       200          103b58440           pmfevent pmfevent
 > 0       13 3   1       200          103b58860         sopendfree sopendfr
 > 0       12 3   0       200          103b58c80           nfssilly nfssilly
 > 0       11 3   0       200          103b590a0            cachegc cachegc
 > 0       10 3   0       200          103b594c0              vrele vrele
 > 0        9 3   0       200          103b598e0             vdrain vdrain
 > 0        8 3   1       200          103b48000          modunload mod_unld
 > 0        7 3   0       200          103b48420            xcall/0 xcall
 > 0    >   6 7   0       200          103b48840          softser/0
 > 0        5 1   0       200          103b48c60          softclk/0
 > 0        4 1   0       200          103b49080          softbio/0
 > 0        3 1   0       200          103b494a0          softnet/0
 > 0    >   2 7   0       201          103b498c0             idle/0
 > 0        1 3   1       200            1c8a200            swapper uvm

 Backtrace?

   ozaki-r

From: Martin Husemann <martin@duskware.de>
To: Ryota Ozaki <ozaki-r@netbsd.org>
Cc: "gnats-bugs@NetBSD.org" <gnats-bugs@netbsd.org>
Subject: Re: kern/51133: KASSERT on shutdown
Date: Fri, 13 May 2016 12:21:49 +0200

 On Fri, May 13, 2016 at 09:36:09AM +0900, Ryota Ozaki wrote:
 > Backtrace?

 Not that easy, but I changed the KASSERT to a KASSERTMSG:

 panic: kernel diagnostic assertion "!cpu_intr_p()" failed: file "../../../../kern/subr_xcall.c", line 353 xc__highpri_intr called from interrupt context, func=0x14fcd40 arg1=0x0 arg2=0x0
 db{0}> x/i 0x14fcd40
 netbsd:nullop:  jmpl            [%o7 + 0x8], %g0

 this time we seem to be processing softclock and softser:

 0    >  18 7   1       201          103b6b0c0             idle/1
 0       17 3   0       200          103b6b4e0             sysmon smtaskq
 0       16 3   0       200          103b6b900          cryptoret crypto_w
 0       15 3   0       200          103b58020         pmfsuspend pmfsuspend
 0       14 3   1       200          103b58440           pmfevent pmfevent
 0       13 3   1       200          103b58860         sopendfree sopendfr
 0       12 3   0       200          103b58c80           nfssilly nfssilly
 0       11 3   1       200          103b590a0            cachegc cachegc
 0       10 3   0       200          103b594c0              vrele vrele
 0        9 3   1       200          103b598e0             vdrain vdrain
 0        8 3   1       200          103b48000          modunload mod_unld
 0        7 3   0       200          103b48420            xcall/0 xcall
 0    >   6 7   0       200          103b48840          softser/0
 0    >   5 7   0       200          103b48c60          softclk/0
 0        4 1   0       200          103b49080          softbio/0
 0        3 1   0       200          103b494a0          softnet/0
 0    >   2 7   0       201          103b498c0             idle/0
 0        1 3   0       200            1c8a200            swapper uvm


 ddb doesn't like to print a backtrace; trying to get a crash dump
 instead, I get:

  netbsd:vpanic+0x16c(18806d8, 1ce6ac0, 18893d0, 1b026fc58, 1ce6bc0, 1c67000) fp = 1b026f261
  netbsd:kern_assert+0x34(18893d0, 1b026fc58, 1ce5800, 1ce6ac0, 1ce6800, 4) fp = 1b026f311
  netbsd:xc__highpri_intr+0xc4(18893d0, 18053c0, 1869640, 18893b0, 161, 17f8e08) fp = 1b026f3d1
  netbsd:softint_dispatch+0xf8(1c9a000, 1c8b3a0, 50, 1c8b3a0, 276, aa670ae38cc28e39) fp = 1b026f4a1
  netbsd:softint_fastintr+0x80(0, 4, 103b48840, 0, 1b018e178, 1b018e388) fp = 1b026f571
  netbsd:softint_schedule+0x4(103b48840, 4, 1cdd800, 103b498c0, 0, 2014000) fp = 1b026f621
  netbsd:100030+0(f005eab8, 112f00, 1, f, fedc9c48, 0) fp = fedc9651


 Martin
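
The KASSERT-to-KASSERTMSG change described in the message above can be illustrated with a small userland sketch. This is not the actual patch, which is not shown in the report. Every `toy_` name is hypothetical, the panic path is replaced by a function that records the message, and `toy_in_hardintr` stands in for `cpu_intr_p()`. The point is only that a message-carrying assertion can report which function and arguments were about to be run.

```c
#include <stdarg.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the kernel panic path: record, do not halt. */
static char toy_panic_msg[1024];
static bool toy_panicked;

static void
toy_assert_fail(const char *expr, const char *file, int line,
    const char *fmt, ...)
{
	va_list ap;
	int n;

	toy_panicked = true;
	n = snprintf(toy_panic_msg, sizeof(toy_panic_msg),
	    "assertion \"%s\" failed: file \"%s\", line %d ", expr, file, line);
	if (n < 0 || (size_t)n >= sizeof(toy_panic_msg))
		return;
	/* Append the caller-supplied context after the standard prefix. */
	va_start(ap, fmt);
	vsnprintf(toy_panic_msg + n, sizeof(toy_panic_msg) - (size_t)n, fmt, ap);
	va_end(ap);
}

/* Assumed shape: like an assertion, plus a printf-style message. */
#define TOY_ASSERTMSG(e, ...) \
	((e) ? (void)0 : toy_assert_fail(#e, __FILE__, __LINE__, __VA_ARGS__))

/* Stand-in for cpu_intr_p(); set by the caller in this model. */
static bool toy_in_hardintr;

typedef int (*toy_xcfunc_t)(void *, void *);

static int
toy_nullop(void *arg1, void *arg2)
{
	(void)arg1;
	(void)arg2;
	return 0;
}

/* Model of a high-priority cross-call handler that must not run in hardintr. */
static void
toy_highpri_intr(toy_xcfunc_t func, void *arg1, void *arg2)
{
	/* The function pointer is printed via uintptr_t (implementation-defined). */
	TOY_ASSERTMSG(!toy_in_hardintr,
	    "%s called from interrupt context, func=%p arg1=%p arg2=%p",
	    __func__, (void *)(uintptr_t)func, arg1, arg2);
	if (toy_panicked)
		return;
	(void)(*func)(arg1, arg2);
}
```

Calling `toy_highpri_intr(toy_nullop, NULL, NULL)` with `toy_in_hardintr` set leaves the failed expression and the extra context in `toy_panic_msg`. That mirrors how the extended panic message in the report made it possible to look up `func` and find `nullop`.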

From: Martin Husemann <martin@duskware.de>
To: Ryota Ozaki <ozaki-r@netbsd.org>
Cc: "gnats-bugs@NetBSD.org" <gnats-bugs@netbsd.org>
Subject: Re: kern/51133: KASSERT on shutdown
Date: Sat, 14 May 2016 23:25:34 +0200

 The initiator is subr_pserialize.c:pserialize_perform, not surprisingly,
 as it is the only caller of nullop via XC_HIGHPRI that I could find.

 And this is called from if_detach:

 brgphy3: detached
 brgphy2: detached
 scsibus1: detached
 atabus1: detached
 atabus0: detached
 brgphy1: detached
 brgphy0: detached
 Skipping crash dump on recursive panic
 panic: kernel diagnostic assertion "(boothowto & RB_HALT) == 0" failed: file "../../../../kern/subr_pserialize.c", line 174 
 db{1}> mach stack
 Window 0 frame64 0x1b1b373b0 locals, ins:
 103b5f598 1b1b37418 fffffffffffffff8 1805088 0 ffffffffffffffff a 2
 1805050 1b1b375a8 1ce5400 1ce6a00 1ce6800 4 1b1b36c61=sp 1611094=pc:netbsd:kern_assert+0x34
 Window 1 frame64 0x1b1b37460 locals, ins:
 103b09f70 103b09e70 0 1805088 0 0 0 1
 1805050 1805088 1887de8 1887db0 ae 1c8b400 1b1b36d21=sp 1527704=pc:netbsd:pserialize_perform+0x184
 Window 2 frame64 0x1b1b37520 locals, ins:
 1805050 1805088 1887de8 1c8ab08 1ce0c00 4 10476d2d4 1
 103b28510 1887db0 1887c00 1cdd828 14fcc80 1c9a140 1b1b36dd1=sp 15ac700=pc:netbsd:if_detach+0xc0
 Window 3 frame64 0x1b1b375d0 locals, ins:
 441d000605 0 ffffffffffffffff 0 e0048000 1 1 0
 1046ba008 0 ffffffffffffffff 0 1046ba390 1c9a800 1b1b370d1=sp 10ddf68=pc:netbsd:bge_detach+0x68
 Window 4 frame64 0x1b1b378d0 locals, ins:
 1ce5400 1805050 182b450 182b478 e0048000 ffffffffffffffff a 0
 0 4 ff070000000001 1046ba498 1046ba008 1046ba000 1b1b37181=sp 150e5a4=pc:netbsd:config_detach+0xe4

 All looks fine, guess I need to bisect what made icnt depths go off.

 Martin

From: Taylor R Campbell <riastradh@NetBSD.org>
To: Martin Husemann <martin@duskware.de>
Cc: Ryota Ozaki <ozaki-r@netbsd.org>,
	gnats-bugs@NetBSD.org
Subject: Re: kern/51133: KASSERT on shutdown
Date: Fri, 14 Apr 2023 07:59:17 +0000

 > panic: kernel diagnostic assertion "(boothowto & RB_HALT) == 0" failed: file "../../../../kern/subr_pserialize.c", line 174

 I think this is a red herring.  bge_detach -> if_detach always leads
 to pserialize_perform, and from your log earlier, bge_detach completed
 successfully before the original panic:

 > bge3: detached
 > bge2: detached
 > Skipping crash dump on recursive panic
 > panic: kernel diagnostic assertion "!cpu_intr_p()" failed: file "../../../../kern/subr_xcall.c", line 351

 The fact that it's on line 351, presumably from subr_xcall.c 1.18,
 shows that somehow the softint handler is running in hard interrupt
 context, according to cpu_intr_p (xc__highpri_intr is only ever used
 as a softint function):

    344  void
    345  xc__highpri_intr(void *dummy)
    346  {
    347          xc_state_t *xc = &xc_high_pri;
    348          void *arg1, *arg2;
    349          xcfunc_t func;
    350
    351          KASSERT(!cpu_intr_p());

 So this could be a buggy softint_dispatch vector.  That's consistent
 with the line from ps saying it's in softser:

 > 0    >   6 7   0       200          103b48840          softser/0

 Here's a wild guess (line numbers from locore.s 1.443, in HEAD):

 https://nxr.netbsd.org/xref/src/sys/arch/sparc64/sparc64/locore.s?r=1.433

    4661 	! Increment the per-cpu interrupt depth in case of hardintrs
    4662 	btst	SOFTINT_INT, %l3
    4663 	bnz,pn	%icc, sparc_intr_retry
    4664 	 sethi	%hi(CPUINFO_VA+CI_IDEPTH), %l1
    4665 	ld	[%l1 + %lo(CPUINFO_VA+CI_IDEPTH)], %l2
    4666 	inc	%l2
    4667 	st	%l2, [%l1 + %lo(CPUINFO_VA+CI_IDEPTH)]
    4668
    4669 sparc_intr_retry:
 ...
    4763 	/*
    4764 	 * Re-read SOFTINT to see if any new  pending interrupts
    4765 	 * at this level.
    4766 	 */
    4767 	mov	1, %l3			! Ack softint
    4768 	rd	SOFTINT, %l7		! %l5 contains #intr handled.
    4769 	sll	%l3, %l6, %l3		! Generate IRQ mask
    4770 	btst	%l3, %l7		! leave mask in %l3 for retry code
    4771 	bnz,pn	%icc, sparc_intr_retry
    4772 	 mov	1, %l5			! initialize intr count for next run
    4773
    4774 	! Decrement this cpu's interrupt depth in case of hardintrs
    4775 	btst	SOFTINT_INT, %l3
    4776 	bnz,pn	%icc, 1f
    4777 	 sethi	%hi(CPUINFO_VA+CI_IDEPTH), %l4
    4778 	ld	[%l4 + %lo(CPUINFO_VA+CI_IDEPTH)], %l5
    4779 	dec	%l5
    4780 	st	%l5, [%l4 + %lo(CPUINFO_VA+CI_IDEPTH)]

 When this re-reads SOFTINT, can it start invoking a new softint
 handler, before decrementing ci_idepth?

 I don't understand this stack trace:

 > netbsd:vpanic+0x16c(18806d8, 1ce6ac0, 18893d0, 1b026fc58, 1ce6bc0, 1c67000) fp = 1b026f261
 > netbsd:kern_assert+0x34(18893d0, 1b026fc58, 1ce5800, 1ce6ac0, 1ce6800, 4) fp = 1b026f311
 > netbsd:xc__highpri_intr+0xc4(18893d0, 18053c0, 1869640, 18893b0, 161, 17f8e08) fp = 1b026f3d1
 > netbsd:softint_dispatch+0xf8(1c9a000, 1c8b3a0, 50, 1c8b3a0, 276, aa670ae38cc28e39) fp = 1b026f4a1
 > netbsd:softint_fastintr+0x80(0, 4, 103b48840, 0, 1b018e178, 1b018e388) fp = 1b026f571
 > netbsd:softint_schedule+0x4(103b48840, 4, 1cdd800, 103b498c0, 0, 2014000) fp = 1b026f621

 softint_schedule+0x4 might be the call to kpreempt_disabled?  But I
 don't see how it could lead to softint_fastintr -- surely there should
 be an interrupt frame, or if nothing else, a frame with a return
 address pointing into sparc_interrupt?
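
The question raised above about the SOFTINT re-read can be restated as a toy userland model. It is a sketch of the hypothesis only, not of the real locore.s control flow. The `toy_` names are hypothetical, a depth greater than zero is assumed to mean "in hard interrupt context" (the real `cpu_intr_p` on sparc64 may use a different convention), and the retry loop is reduced to iterating over a list of pending interrupts.

```c
#include <stdbool.h>

struct toy_cpu {
	int idepth;	/* models ci_idepth; 0 means no hard interrupt active */
};

/* Model assumption: hard interrupt context iff the depth is positive. */
static bool
toy_cpu_intr_p(const struct toy_cpu *ci)
{
	return ci->idepth > 0;
}

/*
 * pending_is_soft[0] is the interrupt that caused the trap; later entries
 * model interrupts found pending when SOFTINT is re-read (the retry path).
 * The depth is raised and lowered once, based only on the first interrupt,
 * as in the hypothesis. Returns how many soft handlers would have run while
 * toy_cpu_intr_p() was true, i.e. how many would trip the assertion.
 */
static int
toy_interrupt(struct toy_cpu *ci, const bool *pending_is_soft, int npending)
{
	bool entered_soft;
	int violations = 0;
	int i;

	if (npending <= 0)
		return 0;

	entered_soft = pending_is_soft[0];
	if (!entered_soft)
		ci->idepth++;		/* increment only for hard interrupts */

	for (i = 0; i < npending; i++) {
		/* A soft handler dispatched here sees the stale depth. */
		if (pending_is_soft[i] && toy_cpu_intr_p(ci))
			violations++;
	}

	if (!entered_soft)
		ci->idepth--;		/* decrement only after the retry loop */
	return violations;
}
```

In this model a hard interrupt followed by a soft one picked up on retry yields one violation, while a soft interrupt on its own yields none. Whether the real trap handler can actually reach that ordering is exactly what remains open in the report.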

From: Valery Ushakov <uwe@stderr.spb.ru>
To: gnats-bugs@netbsd.org
Cc: Taylor R Campbell <riastradh@NetBSD.org>
Subject: Re: kern/51133: KASSERT on shutdown
Date: Fri, 14 Apr 2023 21:07:20 +0300

 On Fri, Apr 14, 2023 at 08:00:03 +0000, Taylor R Campbell wrote:

 >  I don't understand this stack trace:

 Could this be the fallout from:

   https://mail-index.netbsd.org/tech-kern/2022/12/09/msg028572.html

 for which I forgot to file a PR and never got around to fixing
 properly... :(

 -uwe

From: Taylor R Campbell <riastradh@NetBSD.org>
To: Valery Ushakov <uwe@stderr.spb.ru>
Cc: gnats-bugs@netbsd.org
Subject: Re: kern/51133: KASSERT on shutdown
Date: Fri, 14 Apr 2023 18:31:09 +0000

 > Date: Fri, 14 Apr 2023 21:07:20 +0300
 > From: Valery Ushakov <uwe@stderr.spb.ru>
 > 
 > On Fri, Apr 14, 2023 at 08:00:03 +0000, Taylor R Campbell wrote:
 > 
 > >  I don't understand this stack trace:
 > 
 > Could this be the fallout from:
 > 
 >   https://mail-index.netbsd.org/tech-kern/2022/12/09/msg028572.html
 > 
 > for which I forgot to file a PR and never got around to fixing
 > properly... :(

 Unlikely, because the stack trace I quoted is from 2016!  (But don't
 let that get in the way of fixing something else in ddb!)

Copyright © 1994-2023 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.