NetBSD Problem Report #55237

From ad@netbsd.org  Tue May  5 21:33:20 2020
Return-Path: <ad@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 6DEE21A9213
	for <gnats-bugs@gnats.NetBSD.org>; Tue,  5 May 2020 21:33:20 +0000 (UTC)
Message-Id: <20200505213320.2C98884C85@mail.netbsd.org>
Date: Tue,  5 May 2020 21:33:20 +0000 (UTC)
From: ad@netbsd.org
Reply-To: ad@netbsd.org
To: gnats-bugs@NetBSD.org
Subject: Panic: vrelel: bad ref count (9.99.54)
X-Send-Pr-Version: 3.95

>Number:         55237
>Category:       kern
>Synopsis:       Panic: vrelel: bad ref count (9.99.54)
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue May 05 21:35:01 +0000 2020
>Last-Modified:  Sat Nov 28 22:37:04 +0000 2020
>Originator:     Andrew Doran
>Release:        NetBSD 9.99.54
>Organization:
The NetBSD Project
>Environment:
amd64
>Description:

Date: Mon, 04 May 2020 15:54:57 +0200                                                                                                                                                                                                                   
From: Leonardo Taccari <leot@NetBSD.org>
To: Yorick Hardy <yorickhardy@gmail.com>, Andrew Doran <ad@netbsd.org>
cc: current-users@NetBSD.org
Subject: Re: Panic: vrelel: bad ref count (9.99.54)

Hello Yorick and Andrew,

Yorick Hardy writes:
> > > > [...]
> > > >
> > > >   Crash version 9.99.55, image version 9.99.55.
> > > >   crash: _kvm_kvatop(0)
> > > >   Kernel compiled without options LOCKDEBUG.
> > > >   System panicked: vrelel: bad ref count
> > > >   Backtrace from time of crash is available.
> > > >   crash> bt
> > > >   _KERNEL_OPT_NAGR() at 0
> > > >   ?() at 7f7ff7ecf000
> > > >   sys_reboot() at sys_reboot
> > > >   vpanic() at vpanic+0x181
> > > >   vtryrele() at vtryrele
> > > >   vcache_dealloc() at vcache_dealloc
> > > >   uvm_unmap_detach() at uvm_unmap_detach+0x76
> > > >   uvm_unmap1() at uvm_unmap1+0x4e
> > > >   uvm_mremap() at uvm_mremap+0x36b
> > > >   sys_mremap() at sys_mremap+0x68
> > > >   syscall() at syscall+0x227
> > > >   --- syscall (number 411) ---
> > > >   797459842e9a:
> > > >   crash>
> > >
> > > The same just happened on 9.99.56 while fetching (POP) mail using mail/fdm.
> >
> > Could you file a PR please?  If this panics again could you please run the
> > "dmesg" command in crash and find out what it printed about the vnode?  That
> > would be very useful.
> >
> > Thanks,
> > Andrew
>
> I will do so (... perhaps only this weekend).
> [...]

I was able to reproduce it too with a yesterday evening NetBSD/amd64
-current when using mail/fdm and I will try to prepare a minimal
reproducer using mail/fdm and file a PR if noone beat me.

In the meantime here the information from dmesg:

[ 6107.6380323] vnode 0xffffa95219747d40 flags 0x418<MAPPED,MPSAFE,WRMAP>
[ 6107.6380323]    tag VT_TMPFS(25) type VREG(1) mount 0xffffa951f6d89000 typedata 0xffffa95255e32c90
[ 6107.6380323]    usecount 1 writecount 1 holdcount 0
[ 6107.6380323]    size 18000 writesize 18000 numoutput 0
[ 6107.6380323]    data 0xffffa952583304a0 lock 0xffffa95219747f00
[ 6107.6380323]    state LOADED key(0xffffa951f6d89000 8) a0 04 33 58 52 a9 ff ff
[ 6107.6380323]    lrulisthd 0xffffffff816b5ed0
[ 6107.6380323]    tag VT_TMPFS, tmpfs_node 0xffffa952583304a0, flags 0x0, links 1
[ 6107.6380323]    mode 0600, owner 1000, group 0, size 98304
[ 6107.6380323] panic: vrelel: bad ref count
[ 6107.6380323] cpu0: Begin traceback...
[ 6107.6380323] vpanic() at netbsd:vpanic+0x178
[ 6107.6480364] vnpanic() at netbsd:vnpanic+0x49
[ 6107.6480364] vrelel() at netbsd:vrelel+0x5b6
[ 6107.6480364] uvm_unmap_detach() at netbsd:uvm_unmap_detach+0x8e
[ 6107.6480364] sys_munmap() at netbsd:sys_munmap+0x85
[ 6107.6480364] syscall() at netbsd:syscall+0x2a0
[ 6107.6480364] --- syscall (number 73) ---
[ 6107.6480364] 7c1e5d18414a:
[ 6107.6480364] cpu0: End traceback...
[ 6107.6480364] fatal breakpoint trap in supervisor mode
[ 6107.6480364] trap type 1 code 0 rip 0xffffffff802219fd cs 0x8 rflags 0x202 cr2 0x7f7ff7ee5000 ilevel 0 rsp 0xffffc100c227ae20
[ 6107.6480364] curlwp 0xffffa9521e1b1600 pid 20756.20756 lowest kstack 0xffffc100c22772c0
[ 6107.6480364] dumping to dev 0,1 (offset=276847, size=2062664):
[ 6107.6480364] dump

If any possible further information is needed do not hesitate to
contact me!


Thanks!

>How-To-Repeat:
	Suggestion above.
>Fix:
	Yes please.

>Release-Note:

>Audit-Trail:
From: Yorick Hardy <yorickhardy@gmail.com>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/55237
Date: Fri, 22 May 2020 21:52:51 +0200

 Hi,

 I tried to narrow down the cause, by showing the v_usecount (u) and v_writecount (w)
 when the panic happens:

  https://nxr.netbsd.org/xref/src/sys/kern/vfs_vnode.c#788

 and it seems v_writecount is not zero when it was expected to be (it was 1). I am sorry,
 I have not made any further progress in tracking down the problem.

  Crash version 9.99.63, image version 9.99.63.
  crash: _kvm_kvatop(0)
  Kernel compiled without options LOCKDEBUG.
  System panicked: vrelel: bad ref count [u:1 w:1]
  Backtrace from time of crash is available.
  crash> bt
  _KERNEL_OPT_NAGR() at 0
  ?() at 7f7ff7ecf000
  sys_reboot() at sys_reboot
  vpanic() at vpanic+0x15b
  vtryrele() at vtryrele
  vcache_dealloc() at vcache_dealloc
  uvm_unmap_detach() at uvm_unmap_detach+0x63
  uvm_unmap1() at uvm_unmap1+0x4e
  uvm_mremap() at uvm_mremap+0x35a
  sys_mremap() at sys_mremap+0x68
  syscall() at syscall+0x1fd
  --- syscall (number 411) ---
  syscall+0x1fd:

 --
 Kind regards,

 Yorick Hardy

From: Yorick Hardy <yorickhardy@gmail.com>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/55237
Date: Sat, 23 May 2020 23:47:11 +0200

 A very wild guess, I was wondering if this change affects the writecount
 (after a superficial reading of fd_putfile):

  cvs rdiff -u -r1.543 -r1.544 src/sys/kern/vfs_syscalls.c

 mail/fdm depends on databases/tdb, which uses fdatasync (checked only by a quick grep).

 I am building a test kernel with the change reverted, and will try it out for
 a few days to see if it is stable.

 --
 Kind regards,

 Yorick Hardy

From: "Andrew Doran" <ad@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/55237 CVS commit: src/sys/kern
Date: Sat, 23 May 2020 22:16:18 +0000

 Module Name:	src
 Committed By:	ad
 Date:		Sat May 23 22:16:17 UTC 2020

 Modified Files:
 	src/sys/kern: tty_ptm.c

 Log Message:
 PR kern/55237: Panic: vrelel: bad ref count (9.99.54)

 Adjust v_writecount with v_interlock held.


 To generate a diff of this commit:
 cvs rdiff -u -r1.41 -r1.42 src/sys/kern/tty_ptm.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: Andrew Doran <ad@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
	netbsd-bugs@netbsd.org
Subject: Re: kern/55237
Date: Sat, 23 May 2020 22:26:32 +0000

 On Sat, May 23, 2020 at 09:50:02PM +0000, Yorick Hardy wrote:
 > The following reply was made to PR kern/55237; it has been noted by GNATS.
 >
 > From: Yorick Hardy <yorickhardy@gmail.com>
 > To: gnats-bugs@netbsd.org
 > Cc:
 > Subject: Re: kern/55237
 > Date: Sat, 23 May 2020 23:47:11 +0200
 >
 >  A very wild guess, I was wondering if this change affects the writecount
 >  (after a superficial reading of fd_putfile):
 >
 >   cvs rdiff -u -r1.543 -r1.544 src/sys/kern/vfs_syscalls.c
 >
 >  mail/fdm depends on databases/tdb, which uses fdatasync (checked only by a quick grep).
 >
 >  I am building a test kernel with the change reverted, and will try it out for
 >  a few days to see if it is stable.

 At a guess is not responsible, but worthwhile trying with it reverted.

 I made a commit but I don't think that one will cure it either.

 I'm suspicious of the assertion actually, there might be a race condition.
 Will think about it some more.

 Andrew

From: Yorick Hardy <yorickhardy@gmail.com>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/55237
Date: Sun, 24 May 2020 12:18:38 +0200

 The commit referenced above is not the cause, apologies!

 --
 Kind regards,

 Yorick Hardy

From: Yorick Hardy <yorickhardy@gmail.com>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/55237
Date: Sun, 1 Nov 2020 22:49:18 +0200

 I have put a vprint just before the panic. I am not sure what has
 changed - the usecount and writecount are now both 1 - here are
 two examples. In both cases the inodes are reported as unallocated
 by fsdb (after reboot). I don't seem to find the time or expertise
 to examine the code on my own, are there any other diagnostics I
 could add to try and diagnose the problem?

 [ 53296.926965] vnode bad write count: vnode 0xffff87564dd44b80 flags 0x4c38<MAPPED,MPSAFE,LOCKSWORK,WRMAP,PAGES,ONWORKLST>
 [ 53296.926965]         tag VT_UFS(1) type VREG(1) mount 0xffff87542b686000 typedata 0x0
 [ 53296.926965]         usecount 1 writecount 1 holdcount 1
 [ 53296.926965]         size 30000 writesize 30000 numoutput 0
 [ 53296.926965]         data 0xffff8756501facd0 lock 0xffff87564dd44d40
 [ 53296.926965]         state LOADED key(0xffff87542b686000 8) ee 1b 87 00 00 00 00 00
 [ 53296.926965]         lrulisthd 0xffffffff8105b6d0
 [ 53296.926965]         tag VT_UFS, ino 8854510, on dev 0, 0 flags 0x0, nlink 1
 [ 53296.926965]         mode 0100600, owner 1000, group 0, size 196608
 [ 53296.926965] panic: vrelel: bad write count 1 (pending 0)
 [ 53296.926965] cpu0: Begin traceback...
 [ 53296.926965] vpanic() at netbsd:vpanic+0x156
 [ 53296.936965] vtryrele() at netbsd:vtryrele
 [ 53296.946962] vrelel() at netbsd:vrelel+0x318
 [ 53296.946962] uvm_unmap_detach() at netbsd:uvm_unmap_detach+0x68
 [ 53296.946962] uvm_unmap1() at netbsd:uvm_unmap1+0x4e
 [ 53296.956962] uvm_mremap() at netbsd:uvm_mremap+0x39a
 [ 53296.956962] sys_mremap() at netbsd:sys_mremap+0x67
 [ 53296.956962] syscall() at netbsd:syscall+0x1cc
 [ 53296.966961] --- syscall (number 411) ---
 [ 53296.966961] netbsd:syscall+0x1cc:
 [ 53296.966961] cpu0: End traceback...

 [ 30742.529952] vnode bad write count: vnode 0xffff84c0e53554c0 flags 0x4c38<MAPPED,MPSAFE,LOCKSWORK,WRMAP,PAGES,ONWORKLST>
 [ 30742.529952]         tag VT_UFS(1) type VREG(1) mount 0xffff84c04b68e000 typedata 0x0
 [ 30742.529952]         usecount 1 writecount 1 holdcount 1
 [ 30742.529952]         size c000 writesize c000 numoutput 0
 [ 30742.529952]         data 0xffff84c0dc50e778 lock 0xffff84c0e5355680
 [ 30742.529952]         state LOADED key(0xffff84c04b68e000 8) a9 1b 87 00 00 00 00 00
 [ 30742.529952]         lrulisthd 0xffffffff8105b6d0
 [ 30742.529952]         tag VT_UFS, ino 8854441, on dev 0, 0 flags 0x18, nlink 1
 [ 30742.529952]         mode 0100600, owner 1000, group 0, size 49152
 [ 30742.529952] panic: vrelel: bad write count 1 (pending 0)
 [ 30742.529952] cpu1: Begin traceback...
 [ 30742.529952] vpanic() at netbsd:vpanic+0x156
 [ 30742.539957] vtryrele() at netbsd:vtryrele
 [ 30742.539957] vrelel() at netbsd:vrelel+0x318
 [ 30742.549951] uvm_unmap_detach() at netbsd:uvm_unmap_detach+0x68
 [ 30742.549951] uvmspace_free() at netbsd:uvmspace_free+0xdc
 [ 30742.549951] exit1() at netbsd:exit1+0x188
 [ 30742.559948] sys_exit() at netbsd:sys_exit+0x39
 [ 30742.559948] syscall() at netbsd:syscall+0x1cc
 [ 30742.559948] --- syscall (number 1) ---
 [ 30742.559948] netbsd:syscall+0x1cc:
 [ 30742.559948] cpu1: End traceback...

 --
 Kind regards,

 Yorick Hardy

From: Yorick Hardy <yorickhardy@gmail.com>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/55237
Date: Sat, 14 Nov 2020 21:42:58 +0200

 Another panic, but the KASSERT caught it a bit earlier,
 still triggered by uvm_unmap_detach.

 Crash version 9.99.42, image version 9.99.75.
 WARNING: versions differ, you may not be able to examine this image.
 System panicked: kernel diagnostic assertion "(use & VUSECOUNT_MASK) > 1" failed: file "/usr/src/local/sys/kern/vfs_vnode.c", line 729
 Backtrace from time of crash is available.
 crash> bt
 _KERNEL_OPT_NAGR() at 0
 _KERNEL_OPT_NAGR() at 0
 sys_reboot() at sys_reboot
 vpanic() at vpanic+0x160
 __x86_indirect_thunk_rax() at __x86_indirect_thunk_rax
 vtryrele() at vtryrele+0xb2
 vrele() at vrele+0x12
 uvm_unmap_detach() at uvm_unmap_detach+0x7b
 uvmspace_free() at uvmspace_free+0xeb
 exit1() at exit1+0x1b8
 sys_exit() at sys_exit+0x39
 syscall() at syscall+0x23e
 --- syscall (number 1) ---
 799c8398606a:
 crash>

 --
 Kind regards,

 Yorick Hardy

From: Yorick Hardy <yorickhardy@gmail.com>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/55237
Date: Mon, 16 Nov 2020 14:35:53 +0200

 I added the KASSERT below (is it correct?), and triggered a panic
 early during boot (it is not easily reproducible so far):

 [    75.787066] panic: kernel diagnostic assertion "!rw_lock_held(uobj->vmobjlock)" failed: file "/usr/src/local/sys/uvm/uvm_vnode.c", line 137
 [    75.787066] cpu0: Begin traceback...
 [    75.787066] vpanic() at netbsd:vpanic+0x156
 [    75.787066] __x86_indirect_thunk_rax() at netbsd:__x86_indirect_thunk_rax
 [    75.787066] uvn_detach() at netbsd:uvn_detach+0x51
 [    75.787066] uvm_unmap_detach() at netbsd:uvm_unmap_detach+0x7b
 [    75.787066] uvmspace_free() at netbsd:uvmspace_free+0xeb
 [    75.797068] exit1() at netbsd:exit1+0x1b8
 [    75.797068] sys_exit() at netbsd:sys_exit+0x39
 [    75.797068] syscall() at netbsd:syscall+0x23e
 [    75.797068] --- syscall (number 1) ---
 [    75.797068] netbsd:syscall+0x23e:
 [    75.797068] cpu0: End traceback...


 Index: sys/uvm/uvm_vnode.c
 ===================================================================
 RCS file: /cvsroot/src/sys/uvm/uvm_vnode.c,v
 retrieving revision 1.117
 diff -u -u -r1.117 uvm_vnode.c
 --- sys/uvm/uvm_vnode.c	16 Aug 2020 00:24:41 -0000	1.117
 +++ sys/uvm/uvm_vnode.c	16 Nov 2020 12:30:39 -0000
 @@ -134,6 +134,7 @@
  static void
  uvn_detach(struct uvm_object *uobj)
  {
 +	KASSERT(!rw_lock_held(uobj->vmobjlock));
  	vrele((struct vnode *)uobj);
  }


 --
 Kind regards,

 Yorick Hardy

From: "Yorick Hardy" <yhardy@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/55237 CVS commit: src/sys/uvm
Date: Fri, 27 Nov 2020 22:32:43 +0000

 Module Name:	src
 Committed By:	yhardy
 Date:		Fri Nov 27 22:32:43 UTC 2020

 Modified Files:
 	src/sys/uvm: uvm_mremap.c

 Log Message:
 uvm_mremap: reference the appropriate backing object.

 The previous approach was appropriate for anonymous
 memory and device objects, which continue to work in
 the same way.

 OK: chs@
 Fixes: PR 55237


 To generate a diff of this commit:
 cvs rdiff -u -r1.20 -r1.21 src/sys/uvm/uvm_mremap.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

>Unformatted:

NetBSD Home
NetBSD PR Database Search

(Contact us) $NetBSD: query-full-pr,v 1.46 2020/01/03 16:35:01 leot Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2020 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.