NetBSD Problem Report #55611
From martin@duskware.de Fri Aug 28 08:50:16 2020
Return-Path: <martin@duskware.de>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id 7B3FE1A923A
for <gnats-bugs@gnats.NetBSD.org>; Fri, 28 Aug 2020 08:50:16 +0000 (UTC)
From: martin@NetBSD.org
Reply-To: martin@NetBSD.org
To: gnats-bugs@NetBSD.org
Subject: panic on shutdown
X-Send-Pr-Version: 3.95
>Number: 55611
>Category: kern
>Synopsis: panic on shutdown
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Fri Aug 28 08:55:00 +0000 2020
>Last-Modified: Sat Jul 17 16:35:01 +0000 2021
>Originator: Martin Husemann
>Release: NetBSD 9.99.71
>Organization:
The NetBSD Foundation, Inc.
>Environment:
System: NetBSD gethsemane.duskware.de 9.99.71 NetBSD 9.99.71 (GETHSEMANE) #75: Fri Aug 28 09:32:50 CEST 2020 martin@seven-days-to-the-wolves.aprisoft.de:/work/src/sys/arch/macppc/compile/GETHSEMANE macppc
Architecture: powerpc
Machine: macppc
(actually the running kernel was ~1 day older, I was rebooting after installing
the new kernel)
>Description:
Installed a new kernel, did "shutdown -r now" and it ended like this:
syslogd[993]: Exiting on signal 15
[ 954.5799395] panic: kernel diagnostic assertion "sn->sn_opencnt == 0" failed: file "../../../../miscfs/specfs/spec_vnops.c", line 414
[ 954.5999499] cpu1: Begin traceback...
[ 954.6099548] 0x1065db40: at vpanic+0x12c
[ 954.6199623] 0x1065db70: at kern_assert+0x60
[ 954.6299665] 0x1065dbb0: at spec_node_revoke+0x108
[ 954.6399731] 0x1065dbd0: at vcache_reclaim+0x664
[ 954.6399731] 0x1065dc60: at vgone+0x114
[ 954.6399731] 0x1065dc80: at vrevoke+0x114
[ 954.6399731] 0x1065dcb0: at genfs_revoke+0x1c
[ 954.6800071] 0x1065dcc0: at VOP_REVOKE+0x48
[ 954.6800071] 0x1065dce0: at exit1+0x784
[ 954.6800071] 0x1065dd90: at sigexit+0x20c
[ 954.7000105] 0x1065ddc0: at postsig+0x284
[ 954.7000105] 0x1065de70: at lwp_userret+0x204
[ 954.7000105] 0x1065deb0: at syscall+0x4cc
[ 954.7400380] r1=0xffffe440 cr=0x22082224 xer=0 ctr=0xfdc30d30
[ 954.7400380] cpu1: End traceback...
[ 954.7700474] halting CPU 0
Not reproducable.
>How-To-Repeat:
Only saw it once.
>Fix:
n/a
>Audit-Trail:
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/55611: panic on shutdown
Date: Sun, 4 Jul 2021 16:44:30 +0000
On Fri, Aug 28, 2020 at 08:55:00AM +0000, martin@NetBSD.org wrote:
> Installed a new kernel, did "shutdown -r now" and it ended like this:
>
> syslogd[993]: Exiting on signal 15
> [ 954.5799395] panic: kernel diagnostic assertion "sn->sn_opencnt == 0" failed: file "../../../../miscfs/specfs/spec_vnops.c", line 414
> [ 954.5999499] cpu1: Begin traceback...
> [ 954.6099548] 0x1065db40: at vpanic+0x12c
> [ 954.6199623] 0x1065db70: at kern_assert+0x60
> [ 954.6299665] 0x1065dbb0: at spec_node_revoke+0x108
> [ 954.6399731] 0x1065dbd0: at vcache_reclaim+0x664
> [ 954.6399731] 0x1065dc60: at vgone+0x114
> [ 954.6399731] 0x1065dc80: at vrevoke+0x114
> [ 954.6399731] 0x1065dcb0: at genfs_revoke+0x1c
> [ 954.6800071] 0x1065dcc0: at VOP_REVOKE+0x48
> [ 954.6800071] 0x1065dce0: at exit1+0x784
> [ 954.6800071] 0x1065dd90: at sigexit+0x20c
> [ 954.7000105] 0x1065ddc0: at postsig+0x284
> [ 954.7000105] 0x1065de70: at lwp_userret+0x204
> [ 954.7000105] 0x1065deb0: at syscall+0x4cc
> [ 954.7400380] r1=0xffffe440 cr=0x22082224 xer=0 ctr=0xfdc30d30
> [ 954.7400380] cpu1: End traceback...
> [ 954.7700474] halting CPU 0
revoke is evil :-(
> Not reproducable.
>
> >How-To-Repeat:
> Only saw it once.
At least twice now; see 56290.
(noting this is the real purpose of this email)
If you see it again, might be useful to try to check what device it's
the vnode for. (Also maybe if the open count is something reasonable
like 1 or 2, or garbage or deadbeef.)
--
David A. Holland
dholland@netbsd.org
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/55611: sn_opencount wrong
Date: Sat, 17 Jul 2021 18:27:04 +0200
Missed to show the mount, it is the root:
db{0}> show mount 0x5fb0f000
vnodecovered = 0x0 data = 0x1027f500
fs_bshift 13 dev_bshift = 9
flag = 0x2005000<MNT_LOG,MNT_ROOTFS,MNT_LOCAL>
iflag = 0x7e0<IMNT_ONWORKLIST,IMNT_CAN_RWTORO,IMNT_MPSAFE,IMNT_SHRLOOKUP,IMNT_DT
YPE,IMNT_NCLOOKUP>
refcnt = 2062 updating @ 0x5fbcc9c0
statvfs cache:
bsize = 8192
frsize = 1024
iosize = 8192
blocks = 18856513
bfree = 16691657
bavail = 15748832
bresvd = 942825
files = 4735102
ffree = 4715901
favail = 4715901
fresvd = 0
f_fsidx = { 0xa04, 0x78b }
owner = 0
namemax = 255
flag = 0
syncwrites = 2
asyncwrites = 246
syncreads = 13528
asyncreads = 0
fstypename = ffs
mntonname = /
mntfromname = /dev/wd0e
Martin
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/55611: sn_opencount wrong
Date: Sat, 17 Jul 2021 18:34:49 +0200
I modified the KASSERT to show the vnode and opencount and got this:
[ 274.3503751] panic: kernel diagnostic assertion "sn->sn_opencnt == 0" failed: file "../../../../miscfs/specfs/spec_vnops.c", line 415 vnode 0x10af4c00 sn_opencnt 4294967295
[ 274.3503751] cpu0: Begin traceback...
[ 274.3503751] 0x1072eb40: at vpanic+0x12c
[ 274.3503751] 0x1072eb70: at kern_assert+0x60
[ 274.3503751] 0x1072ebb0: at spec_node_revoke+0x10c
[ 274.3503751] 0x1072ebd0: at vcache_reclaim+0x6a4
[ 274.3503751] 0x1072ec60: at vgone+0x114
[ 274.3503751] 0x1072ec80: at vrevoke+0x114
[ 274.3503751] 0x1072ecb0: at genfs_revoke+0x1c
[ 274.3503751] 0x1072ecc0: at VOP_REVOKE+0x48
[ 274.3503751] 0x1072ece0: at exit1+0x794
[ 274.3503751] 0x1072ed90: at sigexit+0x1f4
[ 274.3503751] 0x1072edc0: at postsig+0x288
[ 274.3503751] 0x1072ee70: at lwp_userret+0x204
[ 274.3503751] 0x1072eeb0: at syscall+0x528
[ 274.3503751] 0x1072ef20: user SC trap #449 by 0xfdc208d4: srr1=0x200d032
[ 274.3503751] r1=0xffffe450 cr=0x42082222 xer=0x20000000 ctr=0xfdc208d0
[ 274.3503751] cpu0: End traceback...
Stopped in pid 895.895 (login) at netbsd:vpanic+0x130: or r3, r26,
r26
db{0}> show vnode 0x10af4c00
OBJECT 0x10af4c00: locked=0, pgops=0xb7d2fc, npages=0, refs=1
vnode 0x10af4c00 flags 0x8014<ISTTY,MPSAFE>
tag VT_NON(0) type VCHR(4) mount 0x5fb0f000 typedata 0x1037b240
usecount 1 writecount 0 holdcount 0
size 0 writesize 0 numoutput 0
data 0x0 lock 0x10af4d40
state RECLAIMING key(0x5fb0f000 0)
lrulisthd 0xc0fa08
I can still reproduce it "often" on this particular machine.
Martin
(Contact us)
$NetBSD: query-full-pr,v 1.46 2020/01/03 16:35:01 leot Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2020
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.