NetBSD Problem Report #51133
From martin@duskware.de Thu May 12 21:09:58 2016
Return-Path: <martin@duskware.de>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (verified OK))
by mollari.NetBSD.org (Postfix) with ESMTPS id 4ED457A476
for <gnats-bugs@gnats.NetBSD.org>; Thu, 12 May 2016 21:09:58 +0000 (UTC)
From: martin@NetBSD.org
Reply-To: martin@NetBSD.org
To: gnats-bugs@NetBSD.org
Subject: KASSERT on shutdown
X-Send-Pr-Version: 3.95
>Number: 51133
>Category: kern
>Synopsis: KASSERT on shutdown
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Thu May 12 21:10:00 +0000 2016
>Last-Modified: Fri Apr 14 18:35:01 +0000 2023
>Originator: Martin Husemann
>Release: NetBSD 7.99.29
>Organization:
The NetBSD Foundation, Inc.
>Environment:
System: NetBSD thirdstage.duskware.de 7.99.29 NetBSD 7.99.29 (MODULAR) #483: Thu May 12 13:25:20 CEST 2016 martin@thirdstage.duskware.de:/usr/src/sys/arch/sparc64/compile/MODULAR sparc64
Architecture: sparc64
Machine: sparc64
>Description:
When shutting down or rebooting a machine I reproducibly get:
sd1: detached
brgphy3: detached
brgphy2: detached
scsibus1: detached
atabus1: detached
atabus0: detached
brgphy1: detached
brgphy0: detached
bge3: detached
bge2: detached
Skipping crash dump on recursive panic
panic: kernel diagnostic assertion "!cpu_intr_p()" failed: file "../../../../kern/subr_xcall.c", line 351
db{0}> ps
PID LID S CPU FLAGS STRUCT LWP * NAME WAIT
24861 1 3 1 0 10653bba0 halt xchicv
1 1 3 1 8020080 104762080 init wait
[..]
0 19 1 1 200 103b6aca0 softnet/1
0 > 18 7 1 201 103b6b0c0 idle/1
0 17 3 1 200 103b6b4e0 sysmon smtaskq
0 16 3 0 200 103b6b900 cryptoret crypto_w
0 15 3 0 200 103b58020 pmfsuspend pmfsuspend
0 14 3 1 200 103b58440 pmfevent pmfevent
0 13 3 1 200 103b58860 sopendfree sopendfr
0 12 3 0 200 103b58c80 nfssilly nfssilly
0 11 3 0 200 103b590a0 cachegc cachegc
0 10 3 0 200 103b594c0 vrele vrele
0 9 3 0 200 103b598e0 vdrain vdrain
0 8 3 1 200 103b48000 modunload mod_unld
0 7 3 0 200 103b48420 xcall/0 xcall
0 > 6 7 0 200 103b48840 softser/0
0 5 1 0 200 103b48c60 softclk/0
0 4 1 0 200 103b49080 softbio/0
0 3 1 0 200 103b494a0 softnet/0
0 > 2 7 0 201 103b498c0 idle/0
0 1 3 1 200 1c8a200 swapper uvm
>How-To-Repeat:
s/a
>Fix:
n/a
>Audit-Trail:
From: Ryota Ozaki <ozaki-r@netbsd.org>
To: "gnats-bugs@NetBSD.org" <gnats-bugs@netbsd.org>
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/51133: KASSERT on shutdown
Date: Fri, 13 May 2016 09:36:09 +0900
On Fri, May 13, 2016 at 6:10 AM, <martin@netbsd.org> wrote:
> panic: kernel diagnostic assertion "!cpu_intr_p()" failed: file "../../../../kern/subr_xcall.c", line 351
Backtrace?
ozaki-r
From: Martin Husemann <martin@duskware.de>
To: Ryota Ozaki <ozaki-r@netbsd.org>
Cc: "gnats-bugs@NetBSD.org" <gnats-bugs@netbsd.org>
Subject: Re: kern/51133: KASSERT on shutdown
Date: Fri, 13 May 2016 12:21:49 +0200
On Fri, May 13, 2016 at 09:36:09AM +0900, Ryota Ozaki wrote:
> Backtrace?
Not that easy, but I changed the KASSERT to a KASSERTMSG:
panic: kernel diagnostic assertion "!cpu_intr_p()" failed: file "../../../../kern/subr_xcall.c", line 353 xc__highpri_intr called from interrupt context, func=0x14fcd40 arg1=0x0 arg2=0x0
db{0}> x/i 0x14fcd40
netbsd:nullop: jmpl [%o7 + 0x8], %g0
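KASSERTMSG takes the same condition as KASSERT plus a printf-style message, which is how func, arg1 and arg2 end up in the panic string above. The sketch below is a hypothetical userland model of that kind of check, not Martin's actual patch: every model_* name is invented, "interrupt context" is just a counter, and model_panic() records the message and returns, whereas the kernel's panic() never returns.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins; none of these are the kernel's definitions. */
static char model_panicmsg[256];
static int model_idepth;		/* > 0 means "in hard interrupt" here */

static bool model_cpu_intr_p(void) { return model_idepth > 0; }

/* Records the formatted message instead of halting the machine. */
static void model_panic(const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	vsnprintf(model_panicmsg, sizeof(model_panicmsg), fmt, ap);
	va_end(ap);
}

/* Like KASSERT, but appends a caller-supplied printf-style message. */
#define MODEL_KASSERTMSG(e, msg, ...)					\
	do {								\
		if (!(e))						\
			model_panic("kernel diagnostic assertion \"%s\" "	\
			    "failed: file \"%s\", line %d " msg,		\
			    #e, __FILE__, __LINE__, __VA_ARGS__);	\
	} while (0)

typedef void (*model_xcfunc_t)(void *, void *);

static void model_nullop(void *arg1, void *arg2) { (void)arg1; (void)arg2; }

/* Model of the softint side of a high-priority cross-call. */
static void model_highpri_intr(model_xcfunc_t func, void *arg1, void *arg2)
{
	/* func/args are known here, so the message can name the cross-call. */
	MODEL_KASSERTMSG(!model_cpu_intr_p(),
	    "%s called from interrupt context, func=%p arg1=%p arg2=%p",
	    __func__, (void *)func, arg1, arg2);
	func(arg1, arg2);
}
```

With model_idepth set to 1, calling model_highpri_intr(model_nullop, NULL, NULL) records a message of the same shape as the panic above; how %p prints a null pointer depends on the C library.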
This time we seem to be processing softclock and softser:
0 > 18 7 1 201 103b6b0c0 idle/1
0 17 3 0 200 103b6b4e0 sysmon smtaskq
0 16 3 0 200 103b6b900 cryptoret crypto_w
0 15 3 0 200 103b58020 pmfsuspend pmfsuspend
0 14 3 1 200 103b58440 pmfevent pmfevent
0 13 3 1 200 103b58860 sopendfree sopendfr
0 12 3 0 200 103b58c80 nfssilly nfssilly
0 11 3 1 200 103b590a0 cachegc cachegc
0 10 3 0 200 103b594c0 vrele vrele
0 9 3 1 200 103b598e0 vdrain vdrain
0 8 3 1 200 103b48000 modunload mod_unld
0 7 3 0 200 103b48420 xcall/0 xcall
0 > 6 7 0 200 103b48840 softser/0
0 > 5 7 0 200 103b48c60 softclk/0
0 4 1 0 200 103b49080 softbio/0
0 3 1 0 200 103b494a0 softnet/0
0 > 2 7 0 201 103b498c0 idle/0
0 1 3 0 200 1c8a200 swapper uvm
ddb doesn't want to print a backtrace; trying to get a crash dump,
I get:
netbsd:vpanic+0x16c(18806d8, 1ce6ac0, 18893d0, 1b026fc58, 1ce6bc0, 1c67000) fp = 1b026f261
netbsd:kern_assert+0x34(18893d0, 1b026fc58, 1ce5800, 1ce6ac0, 1ce6800, 4) fp = 1b026f311
netbsd:xc__highpri_intr+0xc4(18893d0, 18053c0, 1869640, 18893b0, 161, 17f8e08) fp = 1b026f3d1
netbsd:softint_dispatch+0xf8(1c9a000, 1c8b3a0, 50, 1c8b3a0, 276, aa670ae38cc28e39) fp = 1b026f4a1
netbsd:softint_fastintr+0x80(0, 4, 103b48840, 0, 1b018e178, 1b018e388) fp = 1b026f571
netbsd:softint_schedule+0x4(103b48840, 4, 1cdd800, 103b498c0, 0, 2014000) fp = 1b026f621
netbsd:100030+0(f005eab8, 112f00, 1, f, fedc9c48, 0) fp = fedc9651
Martin
From: Martin Husemann <martin@duskware.de>
To: Ryota Ozaki <ozaki-r@netbsd.org>
Cc: "gnats-bugs@NetBSD.org" <gnats-bugs@netbsd.org>
Subject: Re: kern/51133: KASSERT on shutdown
Date: Sat, 14 May 2016 23:25:34 +0200
The initiator is subr_pserialize.c:pserialize_perform, not surprisingly,
as it is the only caller of nullop via XC_HIGHPRI that I could find.
And this is called from if_detach:
brgphy3: detached
brgphy2: detached
scsibus1: detached
atabus1: detached
atabus0: detached
brgphy1: detached
brgphy0: detached
Skipping crash dump on recursive panic
panic: kernel diagnostic assertion "(boothowto & RB_HALT) == 0" failed: file "../../../../kern/subr_pserialize.c", line 174
db{1}> mach stack
Window 0 frame64 0x1b1b373b0 locals, ins:
103b5f598 1b1b37418 fffffffffffffff8 1805088 0 ffffffffffffffff a 2
1805050 1b1b375a8 1ce5400 1ce6a00 1ce6800 4 1b1b36c61=sp 1611094=pc:netbsd:kern_assert+0x34
Window 1 frame64 0x1b1b37460 locals, ins:
103b09f70 103b09e70 0 1805088 0 0 0 1
1805050 1805088 1887de8 1887db0 ae 1c8b400 1b1b36d21=sp 1527704=pc:netbsd:pserialize_perform+0x184
Window 2 frame64 0x1b1b37520 locals, ins:
1805050 1805088 1887de8 1c8ab08 1ce0c00 4 10476d2d4 1
103b28510 1887db0 1887c00 1cdd828 14fcc80 1c9a140 1b1b36dd1=sp 15ac700=pc:netbsd:if_detach+0xc0
Window 3 frame64 0x1b1b375d0 locals, ins:
441d000605 0 ffffffffffffffff 0 e0048000 1 1 0
1046ba008 0 ffffffffffffffff 0 1046ba390 1c9a800 1b1b370d1=sp 10ddf68=pc:netbsd:bge_detach+0x68
Window 4 frame64 0x1b1b378d0 locals, ins:
1ce5400 1805050 182b450 182b478 e0048000 ffffffffffffffff a 0
0 4 ff070000000001 1046ba498 1046ba008 1046ba000 1b1b37181=sp 150e5a4=pc:netbsd:config_detach+0xe4
All looks fine; I guess I need to bisect what made the icnt depths go off.
Martin
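A rough, single-threaded model of why pserialize_perform appears as a high-priority cross-call of nullop: the function itself does nothing, and what matters is that the cross-call softint has run on every CPU before the caller continues. This is a sketch under stated assumptions, not the kernel code: MODEL_NCPU, struct model_xc and the model_* functions are hypothetical, and the claim that a read section holds off that softint level is an assumption about the pserialize design of the time. The real logic lives in kern/subr_xcall.c and kern/subr_pserialize.c.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_NCPU 4	/* hypothetical CPU count */

typedef void (*model_xcfunc_t)(void *, void *);

struct model_xc {
	model_xcfunc_t	func;
	void		*arg1, *arg2;
	int		donecount;	/* CPUs that have run func */
};

static void model_nullop(void *arg1, void *arg2) { (void)arg1; (void)arg2; }

/* Stand-in for the per-CPU high-priority softint handler. */
static void model_xc_highpri_intr(struct model_xc *xc)
{
	xc->func(xc->arg1, xc->arg2);
	xc->donecount++;
}

/*
 * Broadcast func to every CPU and "wait" for all of them.  With
 * func = model_nullop no work is done; the only effect is that each CPU
 * has passed through the softint.  Assuming read sections block that
 * softint level, no reader that started earlier can still be inside.
 */
static int model_highpri_barrier(model_xcfunc_t func, void *arg1, void *arg2)
{
	struct model_xc xc = { func, arg1, arg2, 0 };

	for (int cpu = 0; cpu < MODEL_NCPU; cpu++)
		model_xc_highpri_intr(&xc);	/* "softint fires on cpu" */
	return xc.donecount;
}
```

In this model the barrier returns only once every CPU has run the handler, which is also why a handler that trips an assertion on any one CPU is enough to stop the detach path.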
From: Taylor R Campbell <riastradh@NetBSD.org>
To: Martin Husemann <martin@duskware.de>
Cc: Ryota Ozaki <ozaki-r@netbsd.org>,
gnats-bugs@NetBSD.org
Subject: Re: kern/51133: KASSERT on shutdown
Date: Fri, 14 Apr 2023 07:59:17 +0000
> panic: kernel diagnostic assertion "(boothowto & RB_HALT) == 0" failed: file "../../../../kern/subr_pserialize.c", line 174
I think this is a red herring. bge_detach -> if_detach always leads
to pserialize_perform, and from your log earlier, bge_detach completed
successfully before the original panic:
> bge3: detached
> bge2: detached
> Skipping crash dump on recursive panic
> panic: kernel diagnostic assertion "!cpu_intr_p()" failed: file "../../../../kern/subr_xcall.c", line 351
The fact that it's on line 351, presumably from subr_xcall.c 1.18,
shows that somehow the softint handler is running in hard interrupt
context, according to cpu_intr_p (xc__highpri_intr is only ever used
as a softint function):
344 void
345 xc__highpri_intr(void *dummy)
346 {
347 xc_state_t *xc = &xc_high_pri;
348 void *arg1, *arg2;
349 xcfunc_t func;
350
351 KASSERT(!cpu_intr_p());
So this could be a buggy softint_dispatch vector. That's consistent
with the line from ps saying it's in softser:
> 0 > 6 7 0 200 103b48840 softser/0
Here's a wild guess (line numbers from locore.s 1.443, in HEAD):
https://nxr.netbsd.org/xref/src/sys/arch/sparc64/sparc64/locore.s?r=1.433
4661 ! Increment the per-cpu interrupt depth in case of hardintrs
4662 btst SOFTINT_INT, %l3
4663 bnz,pn %icc, sparc_intr_retry
4664 sethi %hi(CPUINFO_VA+CI_IDEPTH), %l1
4665 ld [%l1 + %lo(CPUINFO_VA+CI_IDEPTH)], %l2
4666 inc %l2
4667 st %l2, [%l1 + %lo(CPUINFO_VA+CI_IDEPTH)]
4668
4669 sparc_intr_retry:
...
4763 /*
4764 * Re-read SOFTINT to see if any new pending interrupts
4765 * at this level.
4766 */
4767 mov 1, %l3 ! Ack softint
4768 rd SOFTINT, %l7 ! %l5 contains #intr handled.
4769 sll %l3, %l6, %l3 ! Generate IRQ mask
4770 btst %l3, %l7 ! leave mask in %l3 for retry code
4771 bnz,pn %icc, sparc_intr_retry
4772 mov 1, %l5 ! initialize intr count for next run
4773
4774 ! Decrement this cpu's interrupt depth in case of hardintrs
4775 btst SOFTINT_INT, %l3
4776 bnz,pn %icc, 1f
4777 sethi %hi(CPUINFO_VA+CI_IDEPTH), %l4
4778 ld [%l4 + %lo(CPUINFO_VA+CI_IDEPTH)], %l5
4779 dec %l5
4780 st %l5, [%l4 + %lo(CPUINFO_VA+CI_IDEPTH)]
When this re-reads SOFTINT, can it start invoking a new softint
handler, before decrementing ci_idepth?
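The question can be illustrated with a toy model. It assumes, as a simplification, that "interrupt context" means a depth counter above zero and that the retry loop may pick up a pending softint; the real loop re-reads SOFTINT only for the current level, so this shows the shape of the hypothesis, not a confirmed diagnosis. All names and bit values are invented.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_HARD	0x1u	/* a hardware interrupt is pending */
#define MODEL_SOFT	0x2u	/* a softint is pending */

struct model_cpu {
	int		idepth;		/* stands in for ci_idepth; > 0 = hard intr */
	unsigned	pending;	/* stands in for the SOFTINT register */
	int		bad_softints;	/* softints that saw cpu_intr_p() true */
};

static bool model_cpu_intr_p(const struct model_cpu *ci)
{
	return ci->idepth > 0;
}

/* A softint handler that, like xc__highpri_intr, expects !cpu_intr_p(). */
static void model_softint(struct model_cpu *ci)
{
	if (model_cpu_intr_p(ci))
		ci->bad_softints++;
}

/* Next pending source, hard first, or 0 if nothing is pending. */
static unsigned model_next(const struct model_cpu *ci)
{
	if (ci->pending & MODEL_HARD)
		return MODEL_HARD;
	if (ci->pending & MODEL_SOFT)
		return MODEL_SOFT;
	return 0;
}

/* Suspected shape: depth raised on entry, dropped only after the retry loop. */
static void model_dispatch_suspect(struct model_cpu *ci)
{
	bool hard = (model_next(ci) == MODEL_HARD);

	if (hard)
		ci->idepth++;
	for (unsigned bit; (bit = model_next(ci)) != 0; ) {
		ci->pending &= ~bit;
		if (bit == MODEL_SOFT)
			model_softint(ci);	/* runs with idepth still raised */
	}
	if (hard)
		ci->idepth--;
}

/* Alternative: adjust the depth around each hard handler only. */
static void model_dispatch_per_handler(struct model_cpu *ci)
{
	for (unsigned bit; (bit = model_next(ci)) != 0; ) {
		ci->pending &= ~bit;
		if (bit == MODEL_HARD) {
			ci->idepth++;
			/* the hard interrupt handler would run here */
			ci->idepth--;
		} else {
			model_softint(ci);
		}
	}
}
```

Starting from a CPU with both a hard interrupt and a softint pending, the suspect ordering lets the softint see a raised depth once, while per-handler accounting never does; both leave the depth back at zero.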
I don't understand this stack trace:
> netbsd:vpanic+0x16c(18806d8, 1ce6ac0, 18893d0, 1b026fc58, 1ce6bc0, 1c67000) fp = 1b026f261
> netbsd:kern_assert+0x34(18893d0, 1b026fc58, 1ce5800, 1ce6ac0, 1ce6800, 4) fp = 1b026f311
> netbsd:xc__highpri_intr+0xc4(18893d0, 18053c0, 1869640, 18893b0, 161, 17f8e08) fp = 1b026f3d1
> netbsd:softint_dispatch+0xf8(1c9a000, 1c8b3a0, 50, 1c8b3a0, 276, aa670ae38cc28e39) fp = 1b026f4a1
> netbsd:softint_fastintr+0x80(0, 4, 103b48840, 0, 1b018e178, 1b018e388) fp = 1b026f571
> netbsd:softint_schedule+0x4(103b48840, 4, 1cdd800, 103b498c0, 0, 2014000) fp = 1b026f621
softint_schedule+0x4 might be the call to kpreempt_disabled? But I
don't see how it could lead to softint_fastintr -- surely there should
be an interrupt frame, or if nothing else, a frame with a return
address pointing into sparc_interrupt?
From: Valery Ushakov <uwe@stderr.spb.ru>
To: gnats-bugs@netbsd.org
Cc: Taylor R Campbell <riastradh@NetBSD.org>
Subject: Re: kern/51133: KASSERT on shutdown
Date: Fri, 14 Apr 2023 21:07:20 +0300
On Fri, Apr 14, 2023 at 08:00:03 +0000, Taylor R Campbell wrote:
> I don't understand this stack trace:
Could this be the fallout from:
https://mail-index.netbsd.org/tech-kern/2022/12/09/msg028572.html
for which I forgot to file a PR and never got around to fixing
properly... :(
-uwe
From: Taylor R Campbell <riastradh@NetBSD.org>
To: Valery Ushakov <uwe@stderr.spb.ru>
Cc: gnats-bugs@netbsd.org
Subject: Re: kern/51133: KASSERT on shutdown
Date: Fri, 14 Apr 2023 18:31:09 +0000
> Date: Fri, 14 Apr 2023 21:07:20 +0300
> From: Valery Ushakov <uwe@stderr.spb.ru>
>
> On Fri, Apr 14, 2023 at 08:00:03 +0000, Taylor R Campbell wrote:
>
> > I don't understand this stack trace:
>
> Could this be the fallout from:
>
> https://mail-index.netbsd.org/tech-kern/2022/12/09/msg028572.html
>
> for which I forgot to file a PR and never got around to fixing
> properly... :(
Unlikely, because the stack trace I quoted is from 2016! (But don't
let that get in the way of fixing something else in ddb!)
Copyright © 1994-2023
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.