NetBSD Problem Report #57775
From spz@netbsd.org Fri Dec 15 08:52:43 2023
Return-Path: <spz@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id C00951A9238
for <gnats-bugs@gnats.NetBSD.org>; Fri, 15 Dec 2023 08:52:43 +0000 (UTC)
Message-Id: <20231215085242.1C93542D3A@shadow.netbsd.org>
Date: Fri, 15 Dec 2023 08:52:42 +0000 (UTC)
From: spz@NetBSD.org
Reply-To: spz@NetBSD.org
To: gnats-bugs@NetBSD.org
Subject: "panic: unmount: dangling vnode" while umounting procfs
X-Send-Pr-Version: 3.95
>Number: 57775
>Category: kern
>Synopsis: "panic: unmount: dangling vnode" while umounting procfs
>Confidential: no
>Severity: critical
>Priority: medium
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Fri Dec 15 08:55:00 +0000 2023
>Last-Modified: Thu Apr 18 18:25:02 +0000 2024
>Originator: S.P.Zeidler
>Release: NetBSD 10.0_RC1
>Organization:
The NetBSD Foundation
>Environment:
System: NetBSD shadow.netbsd.org 10.0_RC1 NetBSD 10.0_RC1 (SHADOW) #6: Tue Dec 12 22:32:36 UTC 2023 spz@franklin.NetBSD.org:/home/netbsd/10/amd64/obj/sys/arch/amd64/compile/SHADOW amd64
Architecture: x86_64
Machine: amd64
This kernel has LOCKDEBUG
>Description:
[ 150137.1746769] panic: unmount: dangling vnode
[ 150137.1746769] cpu2: Begin traceback...
[ 150137.1846769] vpanic() at netbsd:vpanic+0x183
[ 150137.1846769] panic() at netbsd:panic+0x3c
[ 150137.1946765] dounmount() at netbsd:dounmount+0x23e
[ 150137.1946765] sys_unmount() at netbsd:sys_unmount+0xf8
[ 150137.2046767] syscall() at netbsd:syscall+0x211
[ 150137.2046767] --- syscall (number 22) ---
[ 150137.2146772] netbsd:syscall+0x211:
[ 150137.2146772] cpu2: End traceback...
[ 150137.2146772] fatal breakpoint trap in supervisor mode
[ 150137.2246772] trap type 1 code 0 rip 0xffffffff80235385 cs 0x8 rflags 0x202 cr2 0x78d906f76a95 ilevel 0 rsp 0xffff9604f90abdf0
[ 150137.2346777] curlwp 0xffff807c6750fb80 pid 15368.15368 lowest kstack 0xffff9604f90a72c0
Stopped in pid 15368.15368 (umount) at netbsd:breakpoint+0x5: leave
breakpoint() at netbsd:breakpoint+0x5
vpanic() at netbsd:vpanic+0x183
panic() at netbsd:panic+0x3c
dounmount() at netbsd:dounmount+0x23e
sys_unmount() at netbsd:sys_unmount+0xf8
syscall() at netbsd:syscall+0x211
--- syscall (number 22) ---
netbsd:syscall+0x211:
ds 8
es 2
fs 180
gs bda0
rdi 0
rsi 2da
rbp ffff9604f90abdf0
rbx 0
rdx 1
rcx ffffffffffffff
rax 800000000000000
r8 0
r9 0
r10 ffff9604f90ab470
r11 fffffffe
r12 ffffffff80c5072a ostype+0xa0f31
r13 ffff9604f90abe38
r14 104
r15 ffff807c7ba39000
rip ffffffff80235385 breakpoint+0x5
cs 8
rflags 202
rsp ffff9604f90abdf0
ss 10
netbsd:breakpoint+0x5: leave
(gdb)
0xffffffff808d55b2 <dounmount+215>:
call 0xffffffff808d545c <mountlist_remove>
(gdb)
0xffffffff808d55b7 <dounmount+220>: cmpq $0x0,0x88(%r15)
(gdb)
0xffffffff808d55bf <dounmount+228>:
jne 0xffffffff808d570b <dounmount+560>
(gdb) x/i 0xffffffff808d570b
0xffffffff808d570b <dounmount+560>: mov $0xffffffff80c5072a,%rdi
(gdb)
0xffffffff808d5712 <dounmount+567>: xor %eax,%eax
(gdb)
0xffffffff808d5714 <dounmount+569>: call 0xffffffff808887e8 <panic>
(gdb) x/s 0xffffffff80c5072a
0xffffffff80c5072a: "unmount: dangling vnode"
(gdb) print *(struct mount *) 0xffff807c7ba39000
$2 = {mnt_vnodelock = 0xffff8093018f7840,
mnt_op = 0xffffffff80e5c020 <procfs_vfsops>,
mnt_vnodecovered = 0xffff807d5bb724c0, mnt_lower = 0x0, mnt_transinfo = 0x0,
mnt_data = 0x0, mnt_renamelock = 0xffff807df9571040, mnt_flag = 4096,
mnt_iflag = 387, mnt_fs_bshift = 0, mnt_dev_bshift = 0, mnt_specdataref = {
specdataref_container = 0x0, specdataref_lock = {u = {mtxa_owner = 0, s = {
mtxs_dummy = 0 '\000', mtxs_ipl = {_ipl = 0 '\000'},
mtxs_lock = 0 '\000', mtxs_unused = 0 '\000'}}}},
mnt_updating = 0xffff808587f27740, mnt_wapbl_op = 0x0, mnt_wapbl = 0x0,
mnt_wapbl_replay = 0x0, mnt_gen = 778, mnt_refcnt = 2,
mnt_synclist_slot = 15, mnt_vnodelist = {tqh_first = 0x0,
tqh_last = 0xffff807c7ba39088}, mnt_stat = {f_flag = 0, f_bsize = 4096,
f_frsize = 4096, f_iosize = 4096, f_blocks = 1, f_bfree = 0, f_bavail = 0,
f_bresvd = 0, f_files = 2068, f_ffree = 1848, f_favail = 1848,
f_fresvd = 0, f_syncreads = 0, f_syncwrites = 0, f_asyncreads = 0,
f_asyncwrites = 0, f_fsidx = {__fsid_val = {3152645, 110107}},
f_fsid = 3152645, f_namemax = 255, f_owner = 0, f_spare = {0, 0, 0, 0},
f_fstypename = "procfs", '\000' <repeats 25 times>,
f_mntonname = "/bulk/work/x86_64-9.0-HEAD/s3/proc", '\000' <repeats 989 times>, f_mntfromname = "procfs", '\000' <repeats 1017 times>,
f_mntfromlabel = '\000' <repeats 1023 times>}}
We have a coredump, its netbsd.gdb, and the matching source.
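A note on reading the dump above: tqh_last (0xffff807c7ba39088) equals the mount address in r15 plus 0x88, which is what an empty TAILQ head looks like, since tqh_last then points at tqh_first. That suggests the cmpq at dounmount+220 tests tqh_first, and that the list was non-empty when the check ran but empty by the time the dump was taken. This is an inference from the output, not something confirmed against the source. The userland sketch below only illustrates that head layout; the toy_* types and both helper functions are hypothetical stand-ins, not the kernel's struct mount or the actual dounmount() code.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical stand-ins; not the kernel's struct vnode / struct mount. */
struct toy_vnode {
	TAILQ_ENTRY(toy_vnode) v_entries;
};

struct toy_mount {
	TAILQ_HEAD(, toy_vnode) mnt_vnodelist;
};

/*
 * Returns 1 if the head is in the state seen in the dump:
 * tqh_first == NULL and tqh_last pointing back at tqh_first.
 */
static int
head_looks_empty(struct toy_mount *mp)
{
	return mp->mnt_vnodelist.tqh_first == NULL &&
	    mp->mnt_vnodelist.tqh_last == &mp->mnt_vnodelist.tqh_first;
}

/*
 * Assumed shape of the check in front of the panic: any vnode still on
 * the mount's list counts as dangling.
 */
static int
would_panic(struct toy_mount *mp)
{
	return !TAILQ_EMPTY(&mp->mnt_vnodelist);
}
```

With a freshly initialised head, head_looks_empty() holds and would_panic() does not; inserting one toy_vnode flips would_panic(), and removing it restores the empty layout.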
>How-To-Repeat:
Let a pkg bulk build on shadow (in a chroot) finish, so that the sandbox
gets unmounted. procfs is unmounted with umount -f.
This is the second "dangling vnode" panic we have seen there
this month, but we did not get a dump the first time.
>Fix:
>Audit-Trail:
From: "Juergen Hannken-Illjes" <hannken@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/57775 CVS commit: src/sys/kern
Date: Wed, 17 Jan 2024 10:17:29 +0000
Module Name: src
Committed By: hannken
Date: Wed Jan 17 10:17:29 UTC 2024
Modified Files:
src/sys/kern: vfs_mount.c
Log Message:
Print dangling vnode before panic() to help debug.
PR kern/57775 ""panic: unmount: dangling vnode" while umounting procfs"
To generate a diff of this commit:
cvs rdiff -u -r1.103 -r1.104 src/sys/kern/vfs_mount.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Juergen Hannken-Illjes" <hannken@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/57775 CVS commit: src/sys/miscfs/procfs
Date: Wed, 17 Jan 2024 10:20:12 +0000
Module Name: src
Committed By: hannken
Date: Wed Jan 17 10:20:12 UTC 2024
Modified Files:
src/sys/miscfs/procfs: procfs.h procfs_subr.c procfs_vfsops.c
Log Message:
Using the exechook to revoke procfs nodes is racy and may deadlock:
one thread runs doexechooks() -> procfs_revoke_vnodes() and wants to suspend
the file system for vgone(), while another thread runs a forced unmount,
has the file system suspended, tries to disestablish the exechook and
waits for doexechooks() to complete.
Establish/disestablish the exechook on module load/unload instead of
mount/unmount and use the hashmap to access all procfs nodes for this pid.
May fix PR kern/57775 ""panic: unmount: dangling vnode" while umounting procfs"
To generate a diff of this commit:
cvs rdiff -u -r1.83 -r1.84 src/sys/miscfs/procfs/procfs.h
cvs rdiff -u -r1.116 -r1.117 src/sys/miscfs/procfs/procfs_subr.c
cvs rdiff -u -r1.112 -r1.113 src/sys/miscfs/procfs/procfs_vfsops.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
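For readers following the log message above, here is a minimal userland sketch of the "hashmap to access all procfs nodes for this pid" idea. It is not the committed code: the toy_* names, the bucket count, and the pfs_revoked flag (standing in for whatever the kernel does to revoke a node) are all assumptions made for illustration, and locking is left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/queue.h>

#define TOY_HASHSIZE 16		/* hypothetical bucket count */

/* Hypothetical node; the real procfs node carries much more state. */
struct toy_pfsnode {
	pid_t pfs_pid;
	int pfs_revoked;	/* stand-in for revoking the vnode */
	LIST_ENTRY(toy_pfsnode) pfs_hash;
};

LIST_HEAD(toy_bucket, toy_pfsnode);

/* Zero-initialised static storage gives valid empty LIST heads. */
static struct toy_bucket toy_table[TOY_HASHSIZE];

static struct toy_bucket *
toy_hash(pid_t pid)
{
	return &toy_table[(unsigned int)pid % TOY_HASHSIZE];
}

/* Enter a node into the bucket for its pid. */
static void
toy_insert(struct toy_pfsnode *n, pid_t pid)
{
	n->pfs_pid = pid;
	n->pfs_revoked = 0;
	LIST_INSERT_HEAD(toy_hash(pid), n, pfs_hash);
}

/*
 * Visit only the bucket for this pid and mark every matching node
 * revoked. Returns how many nodes were newly marked.
 */
static int
toy_revoke_pid(pid_t pid)
{
	struct toy_pfsnode *n;
	int count = 0;

	LIST_FOREACH(n, toy_hash(pid), pfs_hash) {
		if (n->pfs_pid == pid && !n->pfs_revoked) {
			n->pfs_revoked = 1;
			count++;
		}
	}
	return count;
}
```

The point of keying by pid is that the exec path can find the nodes of one process across all procfs mounts without walking any per-mount vnode list, so the hook no longer has to be tied to a particular mount.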
From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/57775 CVS commit: [netbsd-10] src/sys
Date: Thu, 18 Apr 2024 18:22:10 +0000
Module Name: src
Committed By: martin
Date: Thu Apr 18 18:22:10 UTC 2024
Modified Files:
src/sys/kern [netbsd-10]: init_main.c kern_hook.c vfs_mount.c
src/sys/miscfs/procfs [netbsd-10]: procfs.h procfs_subr.c
procfs_vfsops.c procfs_vnops.c
Log Message:
Pull up following revision(s) (requested by hannken in ticket #668):
sys/miscfs/procfs/procfs.h: revision 1.83
sys/miscfs/procfs/procfs.h: revision 1.84
sys/kern/vfs_mount.c: revision 1.104
sys/miscfs/procfs/procfs_vnops.c: revision 1.230
sys/kern/init_main.c: revision 1.547
sys/kern/kern_hook.c: revision 1.15
sys/miscfs/procfs/procfs_vfsops.c: revision 1.112
sys/miscfs/procfs/procfs_vfsops.c: revision 1.113
sys/miscfs/procfs/procfs_vfsops.c: revision 1.114
sys/miscfs/procfs/procfs_subr.c: revision 1.117
Print dangling vnode before panic() to help debug.
PR kern/57775 ""panic: unmount: dangling vnode" while umounting procfs"
Protect kernel hooks exechook, exithook and forkhook with rwlock.
Lock as writer on establish/disestablish and as reader on list traverse.
For exechook ride "exec_lock" as it is already taken as reader when
traversing the list. Add local locks for exithook and forkhook.
Move exec_init before signal_init as signal_init calls exechook_establish()
that needs "exec_lock".
PR kern/39913 "exec, fork, exit hooks need locking"
Add a hashmap to access all procfs nodes by pid.
Using the exechook to revoke procfs nodes is racy and may deadlock:
one thread runs doexechooks() -> procfs_revoke_vnodes() and wants to suspend
the file system for vgone(), while another thread runs a forced unmount,
has the file system suspended, tries to disestablish the exechook and
waits for doexechooks() to complete.
Establish/disestablish the exechook on module load/unload instead of
mount/unmount and use the hashmap to access all procfs nodes for this pid.
May fix PR kern/57775 ""panic: unmount: dangling vnode" while umounting procfs"
Remove all procfs nodes for this process on process exit.
To generate a diff of this commit:
cvs rdiff -u -r1.541 -r1.541.2.1 src/sys/kern/init_main.c
cvs rdiff -u -r1.14 -r1.14.2.1 src/sys/kern/kern_hook.c
cvs rdiff -u -r1.101 -r1.101.2.1 src/sys/kern/vfs_mount.c
cvs rdiff -u -r1.82 -r1.82.4.1 src/sys/miscfs/procfs/procfs.h
cvs rdiff -u -r1.116 -r1.116.20.1 src/sys/miscfs/procfs/procfs_subr.c
cvs rdiff -u -r1.111 -r1.111.4.1 src/sys/miscfs/procfs/procfs_vfsops.c
cvs rdiff -u -r1.229 -r1.229.4.1 src/sys/miscfs/procfs/procfs_vnops.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
>Unformatted:
source of 20231212