NetBSD Problem Report #55611

From martin@duskware.de  Fri Aug 28 08:50:16 2020
Return-Path: <martin@duskware.de>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 7B3FE1A923A
	for <gnats-bugs@gnats.NetBSD.org>; Fri, 28 Aug 2020 08:50:16 +0000 (UTC)
From: martin@NetBSD.org
Reply-To: martin@NetBSD.org
To: gnats-bugs@NetBSD.org
Subject: panic on shutdown
X-Send-Pr-Version: 3.95

>Number:         55611
>Category:       kern
>Synopsis:       panic on shutdown
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri Aug 28 08:55:00 +0000 2020
>Last-Modified:  Sat Jul 17 16:35:01 +0000 2021
>Originator:     Martin Husemann
>Release:        NetBSD 9.99.71
>Organization:
The NetBSD Foundation, Inc.
>Environment:
System: NetBSD gethsemane.duskware.de 9.99.71 NetBSD 9.99.71 (GETHSEMANE) #75: Fri Aug 28 09:32:50 CEST 2020 martin@seven-days-to-the-wolves.aprisoft.de:/work/src/sys/arch/macppc/compile/GETHSEMANE macppc
Architecture: powerpc
Machine: macppc

(actually the running kernel was ~1 day older, I was rebooting after installing
the new kernel)

>Description:

Installed a new kernel, did "shutdown -r now" and it ended like this:

syslogd[993]: Exiting on signal 15
[ 954.5799395] panic: kernel diagnostic assertion "sn->sn_opencnt == 0" failed: file "../../../../miscfs/specfs/spec_vnops.c", line 414 
[ 954.5999499] cpu1: Begin traceback...
[ 954.6099548] 0x1065db40: at vpanic+0x12c
[ 954.6199623] 0x1065db70: at kern_assert+0x60
[ 954.6299665] 0x1065dbb0: at spec_node_revoke+0x108
[ 954.6399731] 0x1065dbd0: at vcache_reclaim+0x664
[ 954.6399731] 0x1065dc60: at vgone+0x114
[ 954.6399731] 0x1065dc80: at vrevoke+0x114
[ 954.6399731] 0x1065dcb0: at genfs_revoke+0x1c
[ 954.6800071] 0x1065dcc0: at VOP_REVOKE+0x48
[ 954.6800071] 0x1065dce0: at exit1+0x784
[ 954.6800071] 0x1065dd90: at sigexit+0x20c
[ 954.7000105] 0x1065ddc0: at postsig+0x284
[ 954.7000105] 0x1065de70: at lwp_userret+0x204
[ 954.7000105] 0x1065deb0: at syscall+0x4cc
[ 954.7400380]             r1=0xffffe440 cr=0x22082224 xer=0 ctr=0xfdc30d30
[ 954.7400380] cpu1: End traceback...
[ 954.7700474] halting CPU 0


Not reproducable.

>How-To-Repeat:
Only saw it once.

>Fix:
n/a

>Audit-Trail:
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/55611: panic on shutdown
Date: Sun, 4 Jul 2021 16:44:30 +0000

 On Fri, Aug 28, 2020 at 08:55:00AM +0000, martin@NetBSD.org wrote:
  > Installed a new kernel, did "shutdown -r now" and it ended like this:
  > 
  > syslogd[993]: Exiting on signal 15
  > [ 954.5799395] panic: kernel diagnostic assertion "sn->sn_opencnt == 0" failed: file "../../../../miscfs/specfs/spec_vnops.c", line 414 
  > [ 954.5999499] cpu1: Begin traceback...
  > [ 954.6099548] 0x1065db40: at vpanic+0x12c
  > [ 954.6199623] 0x1065db70: at kern_assert+0x60
  > [ 954.6299665] 0x1065dbb0: at spec_node_revoke+0x108
  > [ 954.6399731] 0x1065dbd0: at vcache_reclaim+0x664
  > [ 954.6399731] 0x1065dc60: at vgone+0x114
  > [ 954.6399731] 0x1065dc80: at vrevoke+0x114
  > [ 954.6399731] 0x1065dcb0: at genfs_revoke+0x1c
  > [ 954.6800071] 0x1065dcc0: at VOP_REVOKE+0x48
  > [ 954.6800071] 0x1065dce0: at exit1+0x784
  > [ 954.6800071] 0x1065dd90: at sigexit+0x20c
  > [ 954.7000105] 0x1065ddc0: at postsig+0x284
  > [ 954.7000105] 0x1065de70: at lwp_userret+0x204
  > [ 954.7000105] 0x1065deb0: at syscall+0x4cc
  > [ 954.7400380]             r1=0xffffe440 cr=0x22082224 xer=0 ctr=0xfdc30d30
  > [ 954.7400380] cpu1: End traceback...
  > [ 954.7700474] halting CPU 0

 revoke is evil :-(

  > Not reproducable.
  > 
  > >How-To-Repeat:
  > Only saw it once.

 At least twice now; see 56290.
 (noting this is the real purpose of this email)

 If you see it again, might be useful to try to check what device it's
 the vnode for. (Also maybe if the open count is something reasonable
 like 1 or 2, or garbage or deadbeef.)

 -- 
 David A. Holland
 dholland@netbsd.org

From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/55611: sn_opencount wrong
Date: Sat, 17 Jul 2021 18:27:04 +0200

 Missed to show the mount, it is the root:

 db{0}> show mount 0x5fb0f000
 vnodecovered = 0x0 data = 0x1027f500
 fs_bshift 13 dev_bshift = 9
 flag = 0x2005000<MNT_LOG,MNT_ROOTFS,MNT_LOCAL>
 iflag = 0x7e0<IMNT_ONWORKLIST,IMNT_CAN_RWTORO,IMNT_MPSAFE,IMNT_SHRLOOKUP,IMNT_DT
 YPE,IMNT_NCLOOKUP>
 refcnt = 2062 updating @ 0x5fbcc9c0
 statvfs cache:
         bsize = 8192
         frsize = 1024
         iosize = 8192
         blocks = 18856513
         bfree = 16691657
         bavail = 15748832
         bresvd = 942825
         files = 4735102
         ffree = 4715901
         favail = 4715901
         fresvd = 0
         f_fsidx = { 0xa04, 0x78b }
         owner = 0
         namemax = 255
         flag = 0
         syncwrites = 2
         asyncwrites = 246
         syncreads = 13528
         asyncreads = 0
         fstypename = ffs
         mntonname = /
         mntfromname = /dev/wd0e

 Martin

From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/55611: sn_opencount wrong
Date: Sat, 17 Jul 2021 18:34:49 +0200

 I modified the KASSERT to show the vnode and opencount and got this:

 [ 274.3503751] panic: kernel diagnostic assertion "sn->sn_opencnt == 0" failed: file "../../../../miscfs/specfs/spec_vnops.c", line 415 vnode 0x10af4c00 sn_opencnt 4294967295

 [ 274.3503751] cpu0: Begin traceback...
 [ 274.3503751] 0x1072eb40: at vpanic+0x12c
 [ 274.3503751] 0x1072eb70: at kern_assert+0x60
 [ 274.3503751] 0x1072ebb0: at spec_node_revoke+0x10c
 [ 274.3503751] 0x1072ebd0: at vcache_reclaim+0x6a4
 [ 274.3503751] 0x1072ec60: at vgone+0x114
 [ 274.3503751] 0x1072ec80: at vrevoke+0x114
 [ 274.3503751] 0x1072ecb0: at genfs_revoke+0x1c
 [ 274.3503751] 0x1072ecc0: at VOP_REVOKE+0x48
 [ 274.3503751] 0x1072ece0: at exit1+0x794
 [ 274.3503751] 0x1072ed90: at sigexit+0x1f4
 [ 274.3503751] 0x1072edc0: at postsig+0x288
 [ 274.3503751] 0x1072ee70: at lwp_userret+0x204
 [ 274.3503751] 0x1072eeb0: at syscall+0x528
 [ 274.3503751] 0x1072ef20: user SC trap #449 by 0xfdc208d4: srr1=0x200d032
 [ 274.3503751]             r1=0xffffe450 cr=0x42082222 xer=0x20000000 ctr=0xfdc208d0
 [ 274.3503751] cpu0: End traceback...
 Stopped in pid 895.895 (login) at       netbsd:vpanic+0x130:    or      r3, r26,
  r26
 db{0}> show vnode 0x10af4c00
 OBJECT 0x10af4c00: locked=0, pgops=0xb7d2fc, npages=0, refs=1

 vnode 0x10af4c00 flags 0x8014<ISTTY,MPSAFE>
 tag VT_NON(0) type VCHR(4) mount 0x5fb0f000 typedata 0x1037b240
 usecount 1 writecount 0 holdcount 0
 size 0 writesize 0 numoutput 0
 data 0x0 lock 0x10af4d40
 state RECLAIMING key(0x5fb0f000 0)
 lrulisthd 0xc0fa08


 I can still reproduce it "often" on this particular machine.

 Martin

NetBSD Home
NetBSD PR Database Search

(Contact us) $NetBSD: query-full-pr,v 1.46 2020/01/03 16:35:01 leot Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2020 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.