NetBSD Problem Report #57775

From spz@netbsd.org  Fri Dec 15 08:52:43 2023
Return-Path: <spz@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id C00951A9238
	for <gnats-bugs@gnats.NetBSD.org>; Fri, 15 Dec 2023 08:52:43 +0000 (UTC)
Message-Id: <20231215085242.1C93542D3A@shadow.netbsd.org>
Date: Fri, 15 Dec 2023 08:52:42 +0000 (UTC)
From: spz@NetBSD.org
Reply-To: spz@NetBSD.org
To: gnats-bugs@NetBSD.org
Subject: "panic: unmount: dangling vnode" while umounting procfs
X-Send-Pr-Version: 3.95

>Number:         57775
>Category:       kern
>Synopsis:       "panic: unmount: dangling vnode" while umounting procfs
>Confidential:   no
>Severity:       critical
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri Dec 15 08:55:00 +0000 2023
>Last-Modified:  Thu Apr 18 18:25:02 +0000 2024
>Originator:     S.P.Zeidler
>Release:        NetBSD 10.0_RC1
>Organization:
The NetBSD Foundation
>Environment:
System: NetBSD shadow.netbsd.org 10.0_RC1 NetBSD 10.0_RC1 (SHADOW) #6: Tue Dec 12 22:32:36 UTC 2023 spz@franklin.NetBSD.org:/home/netbsd/10/amd64/obj/sys/arch/amd64/compile/SHADOW amd64
Architecture: x86_64
Machine: amd64

This kernel has LOCKDEBUG
>Description:

[ 150137.1746769] panic: unmount: dangling vnode
[ 150137.1746769] cpu2: Begin traceback...
[ 150137.1846769] vpanic() at netbsd:vpanic+0x183
[ 150137.1846769] panic() at netbsd:panic+0x3c
[ 150137.1946765] dounmount() at netbsd:dounmount+0x23e
[ 150137.1946765] sys_unmount() at netbsd:sys_unmount+0xf8
[ 150137.2046767] syscall() at netbsd:syscall+0x211
[ 150137.2046767] --- syscall (number 22) ---
[ 150137.2146772] netbsd:syscall+0x211:
[ 150137.2146772] cpu2: End traceback...
[ 150137.2146772] fatal breakpoint trap in supervisor mode
[ 150137.2246772] trap type 1 code 0 rip 0xffffffff80235385 cs 0x8 rflags 0x202 cr2 0x78d906f76a95 ilevel 0 rsp 0xffff9604f90abdf0
[ 150137.2346777] curlwp 0xffff807c6750fb80 pid 15368.15368 lowest kstack 0xffff9604f90a72c0
Stopped in pid 15368.15368 (umount) at  netbsd:breakpoint+0x5:  leave
breakpoint() at netbsd:breakpoint+0x5
vpanic() at netbsd:vpanic+0x183
panic() at netbsd:panic+0x3c
dounmount() at netbsd:dounmount+0x23e
sys_unmount() at netbsd:sys_unmount+0xf8
syscall() at netbsd:syscall+0x211
--- syscall (number 22) ---
netbsd:syscall+0x211:
ds          8
es          2
fs          180
gs          bda0
rdi         0
rsi         2da
rbp         ffff9604f90abdf0
rbx         0
rdx         1
rcx         ffffffffffffff
rax         800000000000000
r8          0
r9          0
r10         ffff9604f90ab470
r11         fffffffe
r12         ffffffff80c5072a    ostype+0xa0f31
r13         ffff9604f90abe38
r14         104
r15         ffff807c7ba39000
rip         ffffffff80235385    breakpoint+0x5
cs          8
rflags      202
rsp         ffff9604f90abdf0
ss          10
netbsd:breakpoint+0x5:  leave

(gdb) 
   0xffffffff808d55b2 <dounmount+215>:  call   0xffffffff808d545c <mountlist_remove>
(gdb) 
   0xffffffff808d55b7 <dounmount+220>:  cmpq   $0x0,0x88(%r15)
(gdb) 
   0xffffffff808d55bf <dounmount+228>:  jne    0xffffffff808d570b <dounmount+560>
(gdb) x/i 0xffffffff808d570b
   0xffffffff808d570b <dounmount+560>:  mov    $0xffffffff80c5072a,%rdi
(gdb) 
   0xffffffff808d5712 <dounmount+567>:  xor    %eax,%eax
(gdb) 
   0xffffffff808d5714 <dounmount+569>:  call   0xffffffff808887e8 <panic>
(gdb) x/s 0xffffffff80c5072a
0xffffffff80c5072a:     "unmount: dangling vnode"
(gdb) print *(struct mount *) 0xffff807c7ba39000
$2 = {mnt_vnodelock = 0xffff8093018f7840, 
  mnt_op = 0xffffffff80e5c020 <procfs_vfsops>, 
  mnt_vnodecovered = 0xffff807d5bb724c0, mnt_lower = 0x0, mnt_transinfo = 0x0, 
  mnt_data = 0x0, mnt_renamelock = 0xffff807df9571040, mnt_flag = 4096, 
  mnt_iflag = 387, mnt_fs_bshift = 0, mnt_dev_bshift = 0, mnt_specdataref = {
    specdataref_container = 0x0, specdataref_lock = {u = {mtxa_owner = 0, s = {
          mtxs_dummy = 0 '\000', mtxs_ipl = {_ipl = 0 '\000'}, 
          mtxs_lock = 0 '\000', mtxs_unused = 0 '\000'}}}}, 
  mnt_updating = 0xffff808587f27740, mnt_wapbl_op = 0x0, mnt_wapbl = 0x0, 
  mnt_wapbl_replay = 0x0, mnt_gen = 778, mnt_refcnt = 2, 
  mnt_synclist_slot = 15, mnt_vnodelist = {tqh_first = 0x0, 
    tqh_last = 0xffff807c7ba39088}, mnt_stat = {f_flag = 0, f_bsize = 4096, 
    f_frsize = 4096, f_iosize = 4096, f_blocks = 1, f_bfree = 0, f_bavail = 0, 
    f_bresvd = 0, f_files = 2068, f_ffree = 1848, f_favail = 1848, 
    f_fresvd = 0, f_syncreads = 0, f_syncwrites = 0, f_asyncreads = 0, 
    f_asyncwrites = 0, f_fsidx = {__fsid_val = {3152645, 110107}}, 
    f_fsid = 3152645, f_namemax = 255, f_owner = 0, f_spare = {0, 0, 0, 0}, 
    f_fstypename = "procfs", '\000' <repeats 25 times>, 
    f_mntonname = "/bulk/work/x86_64-9.0-HEAD/s3/proc", '\000' <repeats 989 times>, 
    f_mntfromname = "procfs", '\000' <repeats 1017 times>, 
    f_mntfromlabel = '\000' <repeats 1023 times>}}
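
The disassembly above compares the quadword at offset 0x88 from %r15 with
zero and jumps to the panic() call when it is non-zero. r15 holds the mount
pointer 0xffff807c7ba39000, and the dump shows tqh_last = 0xffff807c7ba39088
for an empty mnt_vnodelist, so offset 0x88 appears to be
mnt_vnodelist.tqh_first. The following is a minimal userland sketch of such a
check, not the code from vfs_mount.c. The struct names mount_sketch and
vnode_sketch, the entry field v_mntvnodes and the function
mount_has_dangling_vnode() are hypothetical; only mnt_vnodelist, tqh_first
and tqh_last are taken from the dump.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical stand-in for a vnode linked on a per-mount list. */
struct vnode_sketch {
	TAILQ_ENTRY(vnode_sketch) v_mntvnodes;	/* assumed field name */
};

/* Hypothetical stand-in for struct mount, reduced to the vnode list. */
struct mount_sketch {
	TAILQ_HEAD(, vnode_sketch) mnt_vnodelist;
};

/*
 * Return non-zero if the mount still has a vnode on its list.
 * This mirrors the "cmpq $0x0,0x88(%r15); jne" pair: the test is
 * simply whether tqh_first is non-NULL.
 */
int
mount_has_dangling_vnode(struct mount_sketch *mp)
{
	assert(mp != NULL);
	return TAILQ_FIRST(&mp->mnt_vnodelist) != NULL;
}
```

In the dump tqh_first is already 0x0, which would fit a vnode leaving the
list after the check fired, but the dump alone does not prove that.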

We have a core dump, its netbsd.gdb, and the matching source.

>How-To-Repeat:
	A pkg bulk build running in a chroot on shadow finishes and its
	sandbox is unmounted; procfs is unmounted with umount -f.
	This is the second "dangling vnode" panic we have seen there
	this month, but we did not get a dump the first time.
>Fix:


>Audit-Trail:
From: "Juergen Hannken-Illjes" <hannken@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/57775 CVS commit: src/sys/kern
Date: Wed, 17 Jan 2024 10:17:29 +0000

 Module Name:	src
 Committed By:	hannken
 Date:		Wed Jan 17 10:17:29 UTC 2024

 Modified Files:
 	src/sys/kern: vfs_mount.c

 Log Message:
 Print dangling vnode before panic() to help debug.

 PR kern/57775 ""panic: unmount: dangling vnode" while umounting procfs"


 To generate a diff of this commit:
 cvs rdiff -u -r1.103 -r1.104 src/sys/kern/vfs_mount.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Juergen Hannken-Illjes" <hannken@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/57775 CVS commit: src/sys/miscfs/procfs
Date: Wed, 17 Jan 2024 10:20:12 +0000

 Module Name:	src
 Committed By:	hannken
 Date:		Wed Jan 17 10:20:12 UTC 2024

 Modified Files:
 	src/sys/miscfs/procfs: procfs.h procfs_subr.c procfs_vfsops.c

 Log Message:
 Using the exechook to revoke procfs nodes is racy and may deadlock:

 one thread runs doexechooks() -> procfs_revoke_vnodes() and wants to suspend
 the file system for vgone(), while another thread runs a forced unmount,
 has the file system suspended, tries to disestablish the exechook and
 waits for doexechooks() to complete.

 Establish/disestablish the exechook on module load/unload instead of
 mount/unmount and use the hashmap to access all procfs nodes for this pid.

 May fix PR kern/57775 ""panic: unmount: dangling vnode" while umounting procfs"
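
 The idea of reaching all procfs nodes of one pid through a hashmap can be
 sketched in userland as below. This is only an illustration under stated
 assumptions, not the procfs implementation: struct pnode, struct pid_hash,
 PID_HASH_SIZE and the pid_hash_*() functions are hypothetical names, the
 table size is arbitrary, and "revoking" is reduced to setting a flag.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

#define PID_HASH_SIZE 64	/* assumed size, must be a power of two */

/* Hypothetical stand-in for one procfs node belonging to a process. */
struct pnode {
	pid_t pid;
	int revoked;
	struct pnode *next;	/* chain within one hash bucket */
};

struct pid_hash {
	struct pnode *bucket[PID_HASH_SIZE];
};

/* Map a pid to its bucket index. */
size_t
pid_hash_bucket(pid_t pid)
{
	return (size_t)pid & (PID_HASH_SIZE - 1);
}

/* Link a node into the bucket chosen by its pid. */
void
pid_hash_insert(struct pid_hash *h, struct pnode *n)
{
	size_t b;

	assert(h != NULL && n != NULL);
	b = pid_hash_bucket(n->pid);
	n->revoked = 0;
	n->next = h->bucket[b];
	h->bucket[b] = n;
}

/*
 * Mark every not-yet-revoked node of this pid as revoked and return
 * how many were marked.  Only one bucket is walked, so no per-mount
 * list has to be scanned.
 */
int
pid_hash_revoke(struct pid_hash *h, pid_t pid)
{
	struct pnode *n;
	int count = 0;

	assert(h != NULL);
	for (n = h->bucket[pid_hash_bucket(pid)]; n != NULL; n = n->next) {
		if (n->pid == pid && !n->revoked) {
			n->revoked = 1;
			count++;
		}
	}
	return count;
}
```

 Because the lookup is keyed by pid alone, a hook using it would not need to
 be registered per mount, which matches the move to module load/unload
 described in the log message.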


 To generate a diff of this commit:
 cvs rdiff -u -r1.83 -r1.84 src/sys/miscfs/procfs/procfs.h
 cvs rdiff -u -r1.116 -r1.117 src/sys/miscfs/procfs/procfs_subr.c
 cvs rdiff -u -r1.112 -r1.113 src/sys/miscfs/procfs/procfs_vfsops.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/57775 CVS commit: [netbsd-10] src/sys
Date: Thu, 18 Apr 2024 18:22:10 +0000

 Module Name:	src
 Committed By:	martin
 Date:		Thu Apr 18 18:22:10 UTC 2024

 Modified Files:
 	src/sys/kern [netbsd-10]: init_main.c kern_hook.c vfs_mount.c
 	src/sys/miscfs/procfs [netbsd-10]: procfs.h procfs_subr.c
 	    procfs_vfsops.c procfs_vnops.c

 Log Message:
 Pull up following revision(s) (requested by hannken in ticket #668):

 	sys/miscfs/procfs/procfs.h: revision 1.83
 	sys/miscfs/procfs/procfs.h: revision 1.84
 	sys/kern/vfs_mount.c: revision 1.104
 	sys/miscfs/procfs/procfs_vnops.c: revision 1.230
 	sys/kern/init_main.c: revision 1.547
 	sys/kern/kern_hook.c: revision 1.15
 	sys/miscfs/procfs/procfs_vfsops.c: revision 1.112
 	sys/miscfs/procfs/procfs_vfsops.c: revision 1.113
 	sys/miscfs/procfs/procfs_vfsops.c: revision 1.114
 	sys/miscfs/procfs/procfs_subr.c: revision 1.117

 Print dangling vnode before panic() to help debug.

 PR kern/57775 ""panic: unmount: dangling vnode" while umounting procfs"
 Protect kernel hooks exechook, exithook and forkhook with rwlock.

 Lock as writer on establish/disestablish and as reader on list traverse.

 For exechook ride "exec_lock" as it is already taken as reader when
 traversing the list.  Add local locks for exithook and forkhook.

 Move exec_init before signal_init as signal_init calls exechook_establish()
 that needs "exec_lock".

 PR kern/39913 "exec, fork, exit hooks need locking"

 Add a hashmap to access all procfs nodes by pid.

 Using the exechook to revoke procfs nodes is racy and may deadlock:
 one thread runs doexechooks() -> procfs_revoke_vnodes() and wants to suspend
 the file system for vgone(), while another thread runs a forced unmount,
 has the file system suspended, tries to disestablish the exechook and
 waits for doexechooks() to complete.

 Establish/disestablish the exechook on module load/unload instead of
 mount/unmount and use the hashmap to access all procfs nodes for this pid.

 May fix PR kern/57775 ""panic: unmount: dangling vnode" while umounting procfs"

 Remove all procfs nodes for this process on process exit.


 To generate a diff of this commit:
 cvs rdiff -u -r1.541 -r1.541.2.1 src/sys/kern/init_main.c
 cvs rdiff -u -r1.14 -r1.14.2.1 src/sys/kern/kern_hook.c
 cvs rdiff -u -r1.101 -r1.101.2.1 src/sys/kern/vfs_mount.c
 cvs rdiff -u -r1.82 -r1.82.4.1 src/sys/miscfs/procfs/procfs.h
 cvs rdiff -u -r1.116 -r1.116.20.1 src/sys/miscfs/procfs/procfs_subr.c
 cvs rdiff -u -r1.111 -r1.111.4.1 src/sys/miscfs/procfs/procfs_vfsops.c
 cvs rdiff -u -r1.229 -r1.229.4.1 src/sys/miscfs/procfs/procfs_vnops.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

>Unformatted:
 	source of 20231212
