NetBSD Problem Report #43439
From mrg@eterna.com.au Wed Jun 9 05:14:13 2010
Return-Path: <mrg@eterna.com.au>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
by www.NetBSD.org (Postfix) with ESMTP id 1335563B8FF
for <gnats-bugs@gnats.NetBSD.org>; Wed, 9 Jun 2010 05:14:13 +0000 (UTC)
Message-Id: <20100609051411.265E93752C@splode.eterna.com.au>
Date: Wed, 9 Jun 2010 15:14:11 +1000 (EST)
From: mrg@eterna.com.au
Reply-To: mrg@eterna.com.au
To: gnats-bugs@gnats.NetBSD.org
Subject: mount_null panic: lockdebug_wantlock: locking against myself
X-Send-Pr-Version: 3.95
>Number: 43439
>Category: kern
>Synopsis: mount_null panic: lockdebug_wantlock: locking against myself
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: hannken
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Wed Jun 09 05:15:00 +0000 2010
>Closed-Date: Wed Jun 16 07:59:48 +0000 2010
>Last-Modified: Wed Jun 16 19:20:02 +0000 2010
>Originator: matthew green
>Release: NetBSD 5.99.30
>Organization:
people's front against (bozotic) www (softwar foundation)
>Environment:
Architecture: amd64
Machine: amd64
>Description:
mounting a r/w nullfs over a r/o nullfs crashes.
my system has a copy of pkgsrc in /home/current/pkgsrc, where
/home is an FFS (with or without wapbl, in my testing). i mount this
directory as a r/o nullfs on /usr/pkgsrc, and then i mount
/home/current/pkgsrc/distfiles on /usr/pkgsrc/distfiles as a r/w
nullfs.
the final mount crashes with this:
login: Reader / writer lock error: lockdebug_wantlock: locking against myself
lock address : 0xffff80004ba43ab0 type : sleep/adaptive
initialized : 0xffffffff805c72af
shared holds : 0 exclusive: 1
shares wanted: 0 exclusive: 1
current cpu : 1 last held: 1
current lwp : 0xffff80004b8b0800 last held: 0xffff80004b8b0800
last locked : 0xffffffff805c4dc6 unlocked : 0xffffffff805c4e3c
owner/count : 0xffff80004b8b0800 flags : 0x0000000000000004
Turnstile chain at 0xffffffff809e9860.
=> No active turnstile for this lock.
panic: LOCKDEBUG
cpu1: Begin traceback...
printf_nolog() at netbsd:printf_nolog+0xbc
cpu1: End traceback...
fatal breakpoint trap in supervisor mode
trap type 1 code 0 rip ffffffff801b9bed cs 8 rflags 246 cr2 7f7ffd472eb0 cpl 0 rsp ffff80004b8f1400
Stopped in pid 437.1 (mount_null) at netbsd:breakpoint+0x5: leave
db{1}> bt
breakpoint() at netbsd:breakpoint+0x5
panic() at netbsd:panic+0x2ba
lockdebug_locked() at netbsd:lockdebug_locked
rw_enter() at netbsd:rw_enter+0x2f1
vlockmgr() at netbsd:vlockmgr+0xdf
VOP_LOCK() at netbsd:VOP_LOCK+0x64
vn_lock() at netbsd:vn_lock+0xd5
cache_lookup() at netbsd:cache_lookup+0x212
ufs_lookup() at netbsd:ufs_lookup+0xc4
VOP_LOOKUP() at netbsd:VOP_LOOKUP+0x80
do_lookup() at netbsd:do_lookup+0x43a
namei() at netbsd:namei+0x276
nullfs_mount() at netbsd:nullfs_mount+0xb9
VFS_MOUNT() at netbsd:VFS_MOUNT+0x34
do_sys_mount() at netbsd:do_sys_mount+0x796
sys___mount50() at netbsd:sys___mount50+0x33
syscall() at netbsd:syscall+0xaa
>How-To-Repeat:
fstab with these three lines:
/dev/wd0e /home ffs rw,log 1 2
/home/current/pkgsrc /usr/pkgsrc null ro,hidden
/home/current/pkgsrc/distfiles /usr/pkgsrc/distfiles null rw,hidden
>Fix:
>Release-Note:
>Audit-Trail:
From: matthew green <mrg@eterna.com.au>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org
Subject: re: kern/43439: mount_null panic: lockdebug_wantlock: locking against myself
Date: Wed, 09 Jun 2010 15:28:31 +1000
> lock address : 0xffff80004ba43ab0 type : sleep/adaptive
> initialized : 0xffffffff805c72af
> shared holds : 0 exclusive: 1
> shares wanted: 0 exclusive: 1
> current cpu : 1 last held: 1
> current lwp : 0xffff80004b8b0800 last held: 0xffff80004b8b0800
> last locked : 0xffffffff805c4dc6 unlocked : 0xffffffff805c4e3c
> owner/count : 0xffff80004b8b0800 flags : 0x0000000000000004
these addresses work out to be:
0xffffffff805c72af is in vnalloc (/usr/src/sys/kern/vfs_subr.c:723).
718 if (mp != NULL) {
719 vp->v_mount = mp;
720 vp->v_type = VBAD;
721 vp->v_iflag = VI_MARKER;
722 } else {
723 rw_init(&vp->v_lock.vl_lock);
724 }
725
726 return vp;
727 }
0xffffffff805c4dc6 is in vlockmgr (/usr/src/sys/kern/vfs_subr.c:2945).
2940 }
2941 rw_enter(&vl->vl_lock, RW_READER);
2942 return 0;
2943
2944 case LK_EXCLUSIVE:
2945 if (rw_tryenter(&vl->vl_lock, RW_WRITER)) {
2946 return 0;
2947 }
2948 if ((vl->vl_canrecurse || (flags & LK_CANRECURSE) != 0) &&
2949 rw_write_held(&vl->vl_lock)) {
0xffffffff805c4e3c is in vlockmgr (/usr/src/sys/kern/vfs_subr.c:2965).
2960 if (vl->vl_recursecnt != 0) {
2961 KASSERT(rw_write_held(&vl->vl_lock));
2962 vl->vl_recursecnt--;
2963 return 0;
2964 }
2965 rw_exit(&vl->vl_lock);
2966 return 0;
2967
2968 default:
2969 panic("vlockmgr: flags %x", flags);
From: Antti Kantee <pooka@cs.hut.fi>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/43439
Date: Wed, 9 Jun 2010 10:29:15 +0300
Has this twisty setup ever worked, or should it just be filed under
"don't do it"?
I would solve writable pkgdist by setting DISTDIR in /etc/mk.conf.
From: matthew green <mrg@eterna.com.au>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org
Subject: re: kern/43439
Date: Wed, 09 Jun 2010 18:02:39 +1000
> Has this twisty setup ever worked, or should it just be filed under
> "don't do it"?
i've been doing it like this since whenever pkgsrc first gained
r/o support, which is a decade or more ago.
why is it a twisty setup?
> I would solve writable pkgdist by setting DISTDIR in /etc/mk.conf.
that's a pretty crappy solution to something that has worked for a
very long time and panics *every* kernel. (it triggers one of the
very few situations that non-LOCKDEBUG kernels complain about.)
.mrg.
From: Antti Kantee <pooka@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/43439 CVS commit: src/tests/fs/nullfs
Date: Wed, 9 Jun 2010 08:37:16 +0000
Module Name: src
Committed By: pooka
Date: Wed Jun 9 08:37:16 UTC 2010
Modified Files:
src/tests/fs/nullfs: t_basic.c
Log Message:
``twistymount'' regression test for scenario described in PR kern/43439
To generate a diff of this commit:
cvs rdiff -u -r1.2 -r1.3 src/tests/fs/nullfs/t_basic.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: Antti Kantee <pooka@NetBSD.org>
To: matthew green <mrg@eterna.com.au>
Cc: gnats-bugs@netbsd.org
Subject: Re: kern/43439
Date: Wed, 9 Jun 2010 11:38:41 +0300
On Wed, Jun 09, 2010 at 06:02:39PM +1000, matthew green wrote:
>
> > Has this twisty setup ever worked, or should it just be filed under
> > "don't do it"?
>
> i've been doing it like this since whenever pkgsrc first gained
> r/o support, which is a decade or more ago.
Ok, just wanted to verify if this is a new problem (which wasn't
clear from the original report).
> why is it a twisty setup?
It's first mounting a onto b and then a/dir onto b/dir, where b/dir
is actually a/dir when "unnullified". I'd call that twisty and
generally avoid doing it, but ymmv.
> > I would solve writable pkgdist by setting DISTDIR in /etc/mk.conf.
>
> that's a pretty crappy solution to something that has worked for a
> very long time and panics *every* kernel. (it triggers one of the
> very few situations that non-LOCKDEBUG kernels complain about.)
Fair enough.
I added ``twistymount'' to tests/fs/nullfs so you can monitor
progress in fixing this without having to crash your system.
On a tangent, atf-run gives me this:
tc-start: twistymount
tc-so:panic: rumpuser fatal failure 11 (Resource deadlock avoided)
tc-end: twistymount, passed
tp-end: t_basic
Tests which dumped core at least used to fail (I clearly remember
adding the code to atf to display that the reason for failure was a
core dump). If upgrading my atf to 0.9 doesn't fix it, I'll have to
file an atf PR. Let's hope this stuff doesn't get more recursive than
it already is ...
Responsible-Changed-From-To: kern-bug-people->hannken
Responsible-Changed-By: hannken@NetBSD.org
Responsible-Changed-When: Wed, 09 Jun 2010 09:02:06 +0000
Responsible-Changed-Why:
Broke it and have to handle it.
State-Changed-From-To: open->analyzed
State-Changed-By: hannken@NetBSD.org
State-Changed-When: Wed, 09 Jun 2010 09:02:06 +0000
State-Changed-Why:
- do_sys_mount makes the mounted-on vnode lock recursive. This is a
layered vnode.
- nullfs_mount looks up the lower vnode and, in doing so, traverses the
leaf of the mounted-on vnode.
- while the mounted-on vnode has recursive locking, its leaf does not
=> locking against self
Two possible solutions:
1) Remove recursive vnode locks and move mount-on lookup into the file systems
2) Add a bypass function to layered file systems to pass vn_setrecurse down
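The sequence in the analysis above can be replayed with a small user-space
model. This is only a sketch: the toy_* names are hypothetical, struct
toy_lock merely imitates the vl_canrecurse/vl_recursecnt fields seen in the
vlockmgr excerpt, and the assumption that locking a layered vnode also locks
the vnode below it is a simplification of what layerfs really does. The
pass_down argument roughly models solution 2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a vnode lock; not the kernel's struct vnlock. */
struct toy_lock {
	int owner;		/* 0 = unlocked, else id of the holding LWP */
	bool canrecurse;	/* models vl_canrecurse */
	int recursecnt;		/* models vl_recursecnt */
};

/* Toy vnode; a layered (nullfs-like) vnode points at its lower vnode. */
struct toy_vnode {
	struct toy_lock lock;
	struct toy_vnode *lower;	/* NULL for a non-layered vnode */
};

#define TOY_ELOCKSELF	(-1)	/* "locking against myself" */
#define TOY_EBUSY	(-2)	/* held by another LWP */

/* Exclusive acquire of one lock, loosely following the LK_EXCLUSIVE case. */
static int
toy_lock_excl(struct toy_lock *l, int self)
{
	if (l->owner == 0) {
		l->owner = self;
		return 0;
	}
	if (l->owner != self)
		return TOY_EBUSY;	/* a real kernel would sleep here */
	if (l->canrecurse) {
		l->recursecnt++;
		return 0;
	}
	return TOY_ELOCKSELF;		/* LOCKDEBUG would panic here */
}

/* Lock a vnode; in this model a layered vnode also locks its lower vnode. */
static int
toy_vn_lock(struct toy_vnode *vp, int self)
{
	int error = toy_lock_excl(&vp->lock, self);

	if (error == 0 && vp->lower != NULL)
		error = toy_vn_lock(vp->lower, self);
	return error;
}

/*
 * Replay the failing mount and return the result of the second lock
 * attempt on the lower vnode.  With pass_down set, the recursion flag
 * is also applied to the lower vnode.
 */
static int
toy_twisty_mount(bool pass_down)
{
	struct toy_vnode lower = { .lock = { 0, false, 0 }, .lower = NULL };
	struct toy_vnode upper = { .lock = { 0, false, 0 }, .lower = &lower };
	const int self = 1;
	int error;

	/* do_sys_mount: lock the mounted-on (layered) vnode ... */
	error = toy_vn_lock(&upper, self);
	assert(error == 0);
	(void)error;

	/* ... and mark its lock recursive. */
	upper.lock.canrecurse = true;
	if (pass_down)
		lower.lock.canrecurse = true;

	/* nullfs_mount: the lookup of the lower path locks `lower` again. */
	return toy_vn_lock(&lower, self);
}
```

Calling toy_twisty_mount(false) yields TOY_ELOCKSELF, the model's equivalent
of the panic; toy_twisty_mount(true) succeeds because the lower lock is then
allowed to recurse.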
From: Juergen Hannken-Illjes <hannken@eis.cs.tu-bs.de>
To: gnats-bugs@netbsd.org
Cc: mrg@eterna.com.au
Subject: Re: kern/43439 -- workaround
Date: Wed, 9 Jun 2010 11:27:00 +0200
A workaround is to organize the mounts as
/dev/wd0e /home ffs rw,log 1 2
/home/current/pkgsrc /usr/pkgsrc null ro,hidden
- /home/current/pkgsrc/distfiles /usr/pkgsrc/distfiles null rw,hidden
+ /home/current/pkgsrc.distfiles /usr/pkgsrc/distfiles null rw,hidden
to avoid recursive vnode locks during lookup.
--
Juergen Hannken-Illjes - hannken@eis.cs.tu-bs.de - TU Braunschweig (Germany)
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: Antti Kantee <pooka@netbsd.org>
Subject: Re: kern/43439 (mount_null panic: lockdebug_wantlock: locking
against myself)
Date: Thu, 10 Jun 2010 04:48:51 +0000
On Wed, Jun 09, 2010 at 09:02:07AM +0000, hannken@NetBSD.org wrote:
> Two possible solutions:
> 1) Remove recursive vnode locks and move mount-on lookup into the
> file systems
> 2) Add a bypass function to layered file systems to pass vn_setrecurse down
Also (3) rototill how mounting works so that you prepare the fs first
and only then attach it to the namespace, which seems like the correct
order of operations and would avoid this problem. I have no idea how
hard this would be though.
--
David A. Holland
dholland@netbsd.org
From: Juergen Hannken-Illjes <hannken@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/43439 CVS commit: src/sys/kern
Date: Tue, 15 Jun 2010 09:43:37 +0000
Module Name: src
Committed By: hannken
Date: Tue Jun 15 09:43:37 UTC 2010
Modified Files:
src/sys/kern: vfs_syscalls.c
Log Message:
When mounting a file system re-lookup and lock the directory we mount on
after the file system is setup by VFS_MOUNT(). This way recursive vnode
locks are no longer needed here and mounts on null mounts no longer fail
as described in PR #43439 (mount_null panic: lockdebug_wantlock: locking
against myself).
Based on a proposal from and
Reviewed by: David A. Holland <dholland@netbsd.org>
To generate a diff of this commit:
cvs rdiff -u -r1.404 -r1.405 src/sys/kern/vfs_syscalls.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
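The reordering described in this log message can be illustrated with a
user-space sketch. Everything here is hypothetical (the toy_* names, the
single non-recursive lock per vnode) and it is not the code from
vfs_syscalls.c; it also folds the layered vnode and its lower vnode into one
lock, and leaves out the recursive-lock marking of the old code, which did
not help in the layered case. It only shows why setting up the file system
first and locking the mounted-on directory afterwards avoids taking the same
lock twice.

```c
#include <assert.h>

#define TOY_ELOCKSELF	(-1)	/* "locking against myself" */
#define TOY_EBUSY	(-2)	/* held by another LWP */

/* Toy vnode with a single non-recursive exclusive lock. */
struct toy_vnode {
	int owner;	/* 0 = unlocked, else id of the holding LWP */
};

static int
toy_lock(struct toy_vnode *vp, int self)
{
	if (vp->owner == self)
		return TOY_ELOCKSELF;
	if (vp->owner != 0)
		return TOY_EBUSY;
	vp->owner = self;
	return 0;
}

static void
toy_unlock(struct toy_vnode *vp)
{
	assert(vp->owner != 0);
	vp->owner = 0;
}

/* Stand-in for VFS_MOUNT() of a null mount: look up the lower directory. */
static int
toy_vfs_mount(struct toy_vnode *lowerdir, int self)
{
	int error = toy_lock(lowerdir, self);

	if (error == 0)
		toy_unlock(lowerdir);
	return error;
}

/* Old order: the mounted-on directory stays locked across VFS_MOUNT(). */
static int
toy_mount_old(struct toy_vnode *mountpoint, struct toy_vnode *lowerdir,
    int self)
{
	int error = toy_lock(mountpoint, self);

	if (error != 0)
		return error;
	error = toy_vfs_mount(lowerdir, self);
	toy_unlock(mountpoint);
	return error;
}

/* New order: set up the file system, then lock the mounted-on directory. */
static int
toy_mount_new(struct toy_vnode *mountpoint, struct toy_vnode *lowerdir,
    int self)
{
	int error = toy_vfs_mount(lowerdir, self);

	if (error != 0)
		return error;
	error = toy_lock(mountpoint, self);
	if (error != 0)
		return error;
	/* The new mount would be attached to the namespace here. */
	toy_unlock(mountpoint);
	return 0;
}
```

Passing the same toy_vnode as both mountpoint and lowerdir models the case
from this PR, where both paths end up at the same underlying lock:
toy_mount_old() then returns TOY_ELOCKSELF while toy_mount_new() returns 0.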
State-Changed-From-To: analyzed->closed
State-Changed-By: hannken@NetBSD.org
State-Changed-When: Wed, 16 Jun 2010 07:59:48 +0000
State-Changed-Why:
Fixed in tree with rev. 1.405 of src/sys/kern/vfs_syscalls.c.
From: matthew green <mrg@eterna.com.au>
To: gnats-bugs@NetBSD.org
Cc: hannken@NetBSD.org, netbsd-bugs@netbsd.org, gnats-admin@netbsd.org
Subject: re: kern/43439 (mount_null panic: lockdebug_wantlock: locking against myself)
Date: Thu, 17 Jun 2010 05:15:24 +1000
> Synopsis: mount_null panic: lockdebug_wantlock: locking against myself
>
> State-Changed-From-To: analyzed->closed
> State-Changed-By: hannken@NetBSD.org
> State-Changed-When: Wed, 16 Jun 2010 07:59:48 +0000
> State-Changed-Why:
> Fixed in tree with rev. 1.405 of src/sys/kern/vfs_syscalls.c.
thanks! i can also confirm it is fixed.
.mrg.
>Unformatted:
Copyright © 1994-2007
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.