NetBSD Problem Report #55237
From ad@netbsd.org Tue May 5 21:33:20 2020
Return-Path: <ad@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id 6DEE21A9213
for <gnats-bugs@gnats.NetBSD.org>; Tue, 5 May 2020 21:33:20 +0000 (UTC)
Message-Id: <20200505213320.2C98884C85@mail.netbsd.org>
Date: Tue, 5 May 2020 21:33:20 +0000 (UTC)
From: ad@netbsd.org
Reply-To: ad@netbsd.org
To: gnats-bugs@NetBSD.org
Subject: Panic: vrelel: bad ref count (9.99.54)
X-Send-Pr-Version: 3.95
>Number: 55237
>Category: kern
>Synopsis: Panic: vrelel: bad ref count (9.99.54)
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Tue May 05 21:35:01 +0000 2020
>Last-Modified: Sat Nov 28 22:37:04 +0000 2020
>Originator: Andrew Doran
>Release: NetBSD 9.99.54
>Organization:
The NetBSD Project
>Environment:
amd64
>Description:
Date: Mon, 04 May 2020 15:54:57 +0200
From: Leonardo Taccari <leot@NetBSD.org>
To: Yorick Hardy <yorickhardy@gmail.com>, Andrew Doran <ad@netbsd.org>
cc: current-users@NetBSD.org
Subject: Re: Panic: vrelel: bad ref count (9.99.54)
Hello Yorick and Andrew,
Yorick Hardy writes:
> > > > [...]
> > > >
> > > > Crash version 9.99.55, image version 9.99.55.
> > > > crash: _kvm_kvatop(0)
> > > > Kernel compiled without options LOCKDEBUG.
> > > > System panicked: vrelel: bad ref count
> > > > Backtrace from time of crash is available.
> > > > crash> bt
> > > > _KERNEL_OPT_NAGR() at 0
> > > > ?() at 7f7ff7ecf000
> > > > sys_reboot() at sys_reboot
> > > > vpanic() at vpanic+0x181
> > > > vtryrele() at vtryrele
> > > > vcache_dealloc() at vcache_dealloc
> > > > uvm_unmap_detach() at uvm_unmap_detach+0x76
> > > > uvm_unmap1() at uvm_unmap1+0x4e
> > > > uvm_mremap() at uvm_mremap+0x36b
> > > > sys_mremap() at sys_mremap+0x68
> > > > syscall() at syscall+0x227
> > > > --- syscall (number 411) ---
> > > > 797459842e9a:
> > > > crash>
> > >
> > > The same just happened on 9.99.56 while fetching (POP) mail using mail/fdm.
> >
> > Could you file a PR please? If this panics again could you please run the
> > "dmesg" command in crash and find out what it printed about the vnode? That
> > would be very useful.
> >
> > Thanks,
> > Andrew
>
> I will do so (... perhaps only this weekend).
> [...]
I was able to reproduce it too with a yesterday evening NetBSD/amd64
-current when using mail/fdm and I will try to prepare a minimal
reproducer using mail/fdm and file a PR if noone beat me.
In the meantime here the information from dmesg:
[ 6107.6380323] vnode 0xffffa95219747d40 flags 0x418<MAPPED,MPSAFE,WRMAP>
[ 6107.6380323] tag VT_TMPFS(25) type VREG(1) mount 0xffffa951f6d89000 typedata 0xffffa95255e32c90
[ 6107.6380323] usecount 1 writecount 1 holdcount 0
[ 6107.6380323] size 18000 writesize 18000 numoutput 0
[ 6107.6380323] data 0xffffa952583304a0 lock 0xffffa95219747f00
[ 6107.6380323] state LOADED key(0xffffa951f6d89000 8) a0 04 33 58 52 a9 ff ff
[ 6107.6380323] lrulisthd 0xffffffff816b5ed0
[ 6107.6380323] tag VT_TMPFS, tmpfs_node 0xffffa952583304a0, flags 0x0, links 1
[ 6107.6380323] mode 0600, owner 1000, group 0, size 98304
[ 6107.6380323] panic: vrelel: bad ref count
[ 6107.6380323] cpu0: Begin traceback...
[ 6107.6380323] vpanic() at netbsd:vpanic+0x178
[ 6107.6480364] vnpanic() at netbsd:vnpanic+0x49
[ 6107.6480364] vrelel() at netbsd:vrelel+0x5b6
[ 6107.6480364] uvm_unmap_detach() at netbsd:uvm_unmap_detach+0x8e
[ 6107.6480364] sys_munmap() at netbsd:sys_munmap+0x85
[ 6107.6480364] syscall() at netbsd:syscall+0x2a0
[ 6107.6480364] --- syscall (number 73) ---
[ 6107.6480364] 7c1e5d18414a:
[ 6107.6480364] cpu0: End traceback...
[ 6107.6480364] fatal breakpoint trap in supervisor mode
[ 6107.6480364] trap type 1 code 0 rip 0xffffffff802219fd cs 0x8 rflags 0x202 cr2 0x7f7ff7ee5000 ilevel 0 rsp 0xffffc100c227ae20
[ 6107.6480364] curlwp 0xffffa9521e1b1600 pid 20756.20756 lowest kstack 0xffffc100c22772c0
[ 6107.6480364] dumping to dev 0,1 (offset=276847, size=2062664):
[ 6107.6480364] dump
If any possible further information is needed do not hesitate to
contact me!
Thanks!
>How-To-Repeat:
Suggestion above.
>Fix:
Yes please.
>Release-Note:
>Audit-Trail:
From: Yorick Hardy <yorickhardy@gmail.com>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/55237
Date: Fri, 22 May 2020 21:52:51 +0200
Hi,
I tried to narrow down the cause, by showing the v_usecount (u) and v_writecount (w)
when the panic happens:
https://nxr.netbsd.org/xref/src/sys/kern/vfs_vnode.c#788
and it seems v_writecount is not zero when it was expected to be (it was 1). I am sorry,
I have not made any further progress in tracking down the problem.
Crash version 9.99.63, image version 9.99.63.
crash: _kvm_kvatop(0)
Kernel compiled without options LOCKDEBUG.
System panicked: vrelel: bad ref count [u:1 w:1]
Backtrace from time of crash is available.
crash> bt
_KERNEL_OPT_NAGR() at 0
?() at 7f7ff7ecf000
sys_reboot() at sys_reboot
vpanic() at vpanic+0x15b
vtryrele() at vtryrele
vcache_dealloc() at vcache_dealloc
uvm_unmap_detach() at uvm_unmap_detach+0x63
uvm_unmap1() at uvm_unmap1+0x4e
uvm_mremap() at uvm_mremap+0x35a
sys_mremap() at sys_mremap+0x68
syscall() at syscall+0x1fd
--- syscall (number 411) ---
syscall+0x1fd:
--
Kind regards,
Yorick Hardy
From: Yorick Hardy <yorickhardy@gmail.com>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/55237
Date: Sat, 23 May 2020 23:47:11 +0200
A very wild guess, I was wondering if this change affects the writecount
(after a superficial reading of fd_putfile):
cvs rdiff -u -r1.543 -r1.544 src/sys/kern/vfs_syscalls.c
mail/fdm depends on databases/tdb, which uses fdatasync (checked only by a quick grep).
I am building a test kernel with the change reverted, and will try it out for
a few days to see if it is stable.
--
Kind regards,
Yorick Hardy
From: "Andrew Doran" <ad@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/55237 CVS commit: src/sys/kern
Date: Sat, 23 May 2020 22:16:18 +0000
Module Name: src
Committed By: ad
Date: Sat May 23 22:16:17 UTC 2020
Modified Files:
src/sys/kern: tty_ptm.c
Log Message:
PR kern/55237: Panic: vrelel: bad ref count (9.99.54)
Adjust v_writecount with v_interlock held.
To generate a diff of this commit:
cvs rdiff -u -r1.41 -r1.42 src/sys/kern/tty_ptm.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: Andrew Doran <ad@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org
Subject: Re: kern/55237
Date: Sat, 23 May 2020 22:26:32 +0000
On Sat, May 23, 2020 at 09:50:02PM +0000, Yorick Hardy wrote:
> The following reply was made to PR kern/55237; it has been noted by GNATS.
>
> From: Yorick Hardy <yorickhardy@gmail.com>
> To: gnats-bugs@netbsd.org
> Cc:
> Subject: Re: kern/55237
> Date: Sat, 23 May 2020 23:47:11 +0200
>
> A very wild guess, I was wondering if this change affects the writecount
> (after a superficial reading of fd_putfile):
>
> cvs rdiff -u -r1.543 -r1.544 src/sys/kern/vfs_syscalls.c
>
> mail/fdm depends on databases/tdb, which uses fdatasync (checked only by a quick grep).
>
> I am building a test kernel with the change reverted, and will try it out for
> a few days to see if it is stable.
At a guess is not responsible, but worthwhile trying with it reverted.
I made a commit but I don't think that one will cure it either.
I'm suspicious of the assertion actually, there might be a race condition.
Will think about it some more.
Andrew
From: Yorick Hardy <yorickhardy@gmail.com>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/55237
Date: Sun, 24 May 2020 12:18:38 +0200
The commit referenced above is not the cause, apologies!
--
Kind regards,
Yorick Hardy
From: Yorick Hardy <yorickhardy@gmail.com>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/55237
Date: Sun, 1 Nov 2020 22:49:18 +0200
I have put a vprint just before the panic. I am not sure what has
changed - the usecount and writecount are now both 1 - here are
two examples. In both cases the inodes are reported as unallocated
by fsdb (after reboot). I don't seem to find the time or expertise
to examine the code on my own, are there any other diagnostics I
could add to try and diagnose the problem?
[ 53296.926965] vnode bad write count: vnode 0xffff87564dd44b80 flags 0x4c38<MAPPED,MPSAFE,LOCKSWORK,WRMAP,PAGES,ONWORKLST>
[ 53296.926965] tag VT_UFS(1) type VREG(1) mount 0xffff87542b686000 typedata 0x0
[ 53296.926965] usecount 1 writecount 1 holdcount 1
[ 53296.926965] size 30000 writesize 30000 numoutput 0
[ 53296.926965] data 0xffff8756501facd0 lock 0xffff87564dd44d40
[ 53296.926965] state LOADED key(0xffff87542b686000 8) ee 1b 87 00 00 00 00 00
[ 53296.926965] lrulisthd 0xffffffff8105b6d0
[ 53296.926965] tag VT_UFS, ino 8854510, on dev 0, 0 flags 0x0, nlink 1
[ 53296.926965] mode 0100600, owner 1000, group 0, size 196608
[ 53296.926965] panic: vrelel: bad write count 1 (pending 0)
[ 53296.926965] cpu0: Begin traceback...
[ 53296.926965] vpanic() at netbsd:vpanic+0x156
[ 53296.936965] vtryrele() at netbsd:vtryrele
[ 53296.946962] vrelel() at netbsd:vrelel+0x318
[ 53296.946962] uvm_unmap_detach() at netbsd:uvm_unmap_detach+0x68
[ 53296.946962] uvm_unmap1() at netbsd:uvm_unmap1+0x4e
[ 53296.956962] uvm_mremap() at netbsd:uvm_mremap+0x39a
[ 53296.956962] sys_mremap() at netbsd:sys_mremap+0x67
[ 53296.956962] syscall() at netbsd:syscall+0x1cc
[ 53296.966961] --- syscall (number 411) ---
[ 53296.966961] netbsd:syscall+0x1cc:
[ 53296.966961] cpu0: End traceback...
[ 30742.529952] vnode bad write count: vnode 0xffff84c0e53554c0 flags 0x4c38<MAPPED,MPSAFE,LOCKSWORK,WRMAP,PAGES,ONWORKLST>
[ 30742.529952] tag VT_UFS(1) type VREG(1) mount 0xffff84c04b68e000 typedata 0x0
[ 30742.529952] usecount 1 writecount 1 holdcount 1
[ 30742.529952] size c000 writesize c000 numoutput 0
[ 30742.529952] data 0xffff84c0dc50e778 lock 0xffff84c0e5355680
[ 30742.529952] state LOADED key(0xffff84c04b68e000 8) a9 1b 87 00 00 00 00 00
[ 30742.529952] lrulisthd 0xffffffff8105b6d0
[ 30742.529952] tag VT_UFS, ino 8854441, on dev 0, 0 flags 0x18, nlink 1
[ 30742.529952] mode 0100600, owner 1000, group 0, size 49152
[ 30742.529952] panic: vrelel: bad write count 1 (pending 0)
[ 30742.529952] cpu1: Begin traceback...
[ 30742.529952] vpanic() at netbsd:vpanic+0x156
[ 30742.539957] vtryrele() at netbsd:vtryrele
[ 30742.539957] vrelel() at netbsd:vrelel+0x318
[ 30742.549951] uvm_unmap_detach() at netbsd:uvm_unmap_detach+0x68
[ 30742.549951] uvmspace_free() at netbsd:uvmspace_free+0xdc
[ 30742.549951] exit1() at netbsd:exit1+0x188
[ 30742.559948] sys_exit() at netbsd:sys_exit+0x39
[ 30742.559948] syscall() at netbsd:syscall+0x1cc
[ 30742.559948] --- syscall (number 1) ---
[ 30742.559948] netbsd:syscall+0x1cc:
[ 30742.559948] cpu1: End traceback...
--
Kind regards,
Yorick Hardy
From: Yorick Hardy <yorickhardy@gmail.com>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/55237
Date: Sat, 14 Nov 2020 21:42:58 +0200
Another panic, but the KASSERT caught it a bit earlier,
still triggered by uvm_unmap_detach.
Crash version 9.99.42, image version 9.99.75.
WARNING: versions differ, you may not be able to examine this image.
System panicked: kernel diagnostic assertion "(use & VUSECOUNT_MASK) > 1" failed: file "/usr/src/local/sys/kern/vfs_vnode.c", line 729
Backtrace from time of crash is available.
crash> bt
_KERNEL_OPT_NAGR() at 0
_KERNEL_OPT_NAGR() at 0
sys_reboot() at sys_reboot
vpanic() at vpanic+0x160
__x86_indirect_thunk_rax() at __x86_indirect_thunk_rax
vtryrele() at vtryrele+0xb2
vrele() at vrele+0x12
uvm_unmap_detach() at uvm_unmap_detach+0x7b
uvmspace_free() at uvmspace_free+0xeb
exit1() at exit1+0x1b8
sys_exit() at sys_exit+0x39
syscall() at syscall+0x23e
--- syscall (number 1) ---
799c8398606a:
crash>
--
Kind regards,
Yorick Hardy
From: Yorick Hardy <yorickhardy@gmail.com>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/55237
Date: Mon, 16 Nov 2020 14:35:53 +0200
I added the KASSERT below (is it correct?), and triggered a panic
early during boot (it is not easily reproducible so far):
[ 75.787066] panic: kernel diagnostic assertion "!rw_lock_held(uobj->vmobjlock)" failed: file "/usr/src/local/sys/uvm/uvm_vnode.c", line 137
[ 75.787066] cpu0: Begin traceback...
[ 75.787066] vpanic() at netbsd:vpanic+0x156
[ 75.787066] __x86_indirect_thunk_rax() at netbsd:__x86_indirect_thunk_rax
[ 75.787066] uvn_detach() at netbsd:uvn_detach+0x51
[ 75.787066] uvm_unmap_detach() at netbsd:uvm_unmap_detach+0x7b
[ 75.787066] uvmspace_free() at netbsd:uvmspace_free+0xeb
[ 75.797068] exit1() at netbsd:exit1+0x1b8
[ 75.797068] sys_exit() at netbsd:sys_exit+0x39
[ 75.797068] syscall() at netbsd:syscall+0x23e
[ 75.797068] --- syscall (number 1) ---
[ 75.797068] netbsd:syscall+0x23e:
[ 75.797068] cpu0: End traceback...
Index: sys/uvm/uvm_vnode.c
===================================================================
RCS file: /cvsroot/src/sys/uvm/uvm_vnode.c,v
retrieving revision 1.117
diff -u -u -r1.117 uvm_vnode.c
--- sys/uvm/uvm_vnode.c 16 Aug 2020 00:24:41 -0000 1.117
+++ sys/uvm/uvm_vnode.c 16 Nov 2020 12:30:39 -0000
@@ -134,6 +134,7 @@
static void
uvn_detach(struct uvm_object *uobj)
{
+ KASSERT(!rw_lock_held(uobj->vmobjlock));
vrele((struct vnode *)uobj);
}
--
Kind regards,
Yorick Hardy
From: "Yorick Hardy" <yhardy@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/55237 CVS commit: src/sys/uvm
Date: Fri, 27 Nov 2020 22:32:43 +0000
Module Name: src
Committed By: yhardy
Date: Fri Nov 27 22:32:43 UTC 2020
Modified Files:
src/sys/uvm: uvm_mremap.c
Log Message:
uvm_mremap: reference the appropriate backing object.
The previous approach was appropriate for anonymous
memory and device objects, which continue to work in
the same way.
OK: chs@
Fixes: PR 55237
To generate a diff of this commit:
cvs rdiff -u -r1.20 -r1.21 src/sys/uvm/uvm_mremap.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
>Unformatted:
(Contact us)
$NetBSD: query-full-pr,v 1.46 2020/01/03 16:35:01 leot Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2020
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.