NetBSD Problem Report #43439

From mrg@eterna.com.au  Wed Jun  9 05:14:13 2010
Return-Path: <mrg@eterna.com.au>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
	by www.NetBSD.org (Postfix) with ESMTP id 1335563B8FF
	for <gnats-bugs@gnats.NetBSD.org>; Wed,  9 Jun 2010 05:14:13 +0000 (UTC)
Message-Id: <20100609051411.265E93752C@splode.eterna.com.au>
Date: Wed,  9 Jun 2010 15:14:11 +1000 (EST)
From: mrg@eterna.com.au
Reply-To: mrg@eterna.com.au
To: gnats-bugs@gnats.NetBSD.org
Subject: mount_null panic: lockdebug_wantlock: locking against myself
X-Send-Pr-Version: 3.95

>Number:         43439
>Category:       kern
>Synopsis:       mount_null panic: lockdebug_wantlock: locking against myself
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    hannken
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Jun 09 05:15:00 +0000 2010
>Closed-Date:    Wed Jun 16 07:59:48 +0000 2010
>Last-Modified:  Wed Jun 16 19:20:02 +0000 2010
>Originator:     matthew green
>Release:        NetBSD 5.99.30
>Organization:
people's front against (bozotic) www (softwar foundation)
>Environment:
Architecture: amd64
Machine: amd64
>Description:

	mounting a r/w nullfs over a r/o nullfs crashes.

	my system has a copy of pkgsrc in /home/current/pkgsrc, where
	/home is an FFS (with or without wapbl, in my testing).  i mount
	this directory as a r/o nullfs on /usr/pkgsrc, and then i mount
	/home/current/pkgsrc/distfiles on /usr/pkgsrc/distfiles as a r/w
	nullfs.

	the final mount crashes with this:

login: Reader / writer lock error: lockdebug_wantlock: locking against myself

lock address : 0xffff80004ba43ab0 type     :     sleep/adaptive
initialized  : 0xffffffff805c72af
shared holds :                  0 exclusive:                  1
shares wanted:                  0 exclusive:                  1
current cpu  :                  1 last held:                  1
current lwp  : 0xffff80004b8b0800 last held: 0xffff80004b8b0800
last locked  : 0xffffffff805c4dc6 unlocked : 0xffffffff805c4e3c
owner/count  : 0xffff80004b8b0800 flags    : 0x0000000000000004

Turnstile chain at 0xffffffff809e9860.
=> No active turnstile for this lock.

panic: LOCKDEBUG
cpu1: Begin traceback...
printf_nolog() at netbsd:printf_nolog+0xbc
cpu1: End traceback...
fatal breakpoint trap in supervisor mode
trap type 1 code 0 rip ffffffff801b9bed cs 8 rflags 246 cr2  7f7ffd472eb0 cpl 0 rsp ffff80004b8f1400
Stopped in pid 437.1 (mount_null) at    netbsd:breakpoint+0x5:  leave
db{1}> bt
breakpoint() at netbsd:breakpoint+0x5
panic() at netbsd:panic+0x2ba
lockdebug_locked() at netbsd:lockdebug_locked
rw_enter() at netbsd:rw_enter+0x2f1
vlockmgr() at netbsd:vlockmgr+0xdf
VOP_LOCK() at netbsd:VOP_LOCK+0x64
vn_lock() at netbsd:vn_lock+0xd5
cache_lookup() at netbsd:cache_lookup+0x212
ufs_lookup() at netbsd:ufs_lookup+0xc4
VOP_LOOKUP() at netbsd:VOP_LOOKUP+0x80
do_lookup() at netbsd:do_lookup+0x43a
namei() at netbsd:namei+0x276
nullfs_mount() at netbsd:nullfs_mount+0xb9
VFS_MOUNT() at netbsd:VFS_MOUNT+0x34
do_sys_mount() at netbsd:do_sys_mount+0x796
sys___mount50() at netbsd:sys___mount50+0x33
syscall() at netbsd:syscall+0xaa

>How-To-Repeat:

	fstab with these three lines:

/dev/wd0e                       /home                  ffs     rw,log     1 2
/home/current/pkgsrc            /usr/pkgsrc            null    ro,hidden
/home/current/pkgsrc/distfiles  /usr/pkgsrc/distfiles  null    rw,hidden

>Fix:

>Release-Note:

>Audit-Trail:
From: matthew green <mrg@eterna.com.au>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
    netbsd-bugs@netbsd.org
Subject: re: kern/43439: mount_null panic: lockdebug_wantlock: locking against myself
Date: Wed, 09 Jun 2010 15:28:31 +1000

 > lock address : 0xffff80004ba43ab0 type     :     sleep/adaptive
 > initialized  : 0xffffffff805c72af
 > shared holds :                  0 exclusive:                  1
 > shares wanted:                  0 exclusive:                  1
 > current cpu  :                  1 last held:                  1
 > current lwp  : 0xffff80004b8b0800 last held: 0xffff80004b8b0800
 > last locked  : 0xffffffff805c4dc6 unlocked : 0xffffffff805c4e3c
 > owner/count  : 0xffff80004b8b0800 flags    : 0x0000000000000004

 these addresses work out to be:

 0xffffffff805c72af is in vnalloc (/usr/src/sys/kern/vfs_subr.c:723).
 718             if (mp != NULL) {
 719                     vp->v_mount = mp;
 720                     vp->v_type = VBAD;
 721                     vp->v_iflag = VI_MARKER;
 722             } else {
 723                     rw_init(&vp->v_lock.vl_lock);
 724             }
 725
 726             return vp;
 727     }

 0xffffffff805c4dc6 is in vlockmgr (/usr/src/sys/kern/vfs_subr.c:2945).
 2940                    }
 2941                    rw_enter(&vl->vl_lock, RW_READER);
 2942                    return 0;
 2943
 2944            case LK_EXCLUSIVE:
 2945                    if (rw_tryenter(&vl->vl_lock, RW_WRITER)) {
 2946                            return 0;
 2947                    }
 2948                    if ((vl->vl_canrecurse || (flags & LK_CANRECURSE) != 0) &&
 2949                        rw_write_held(&vl->vl_lock)) {

 0xffffffff805c4e3c is in vlockmgr (/usr/src/sys/kern/vfs_subr.c:2965).
 2960                    if (vl->vl_recursecnt != 0) {
 2961                            KASSERT(rw_write_held(&vl->vl_lock));
 2962                            vl->vl_recursecnt--;
 2963                            return 0;
 2964                    }
 2965                    rw_exit(&vl->vl_lock);
 2966                    return 0;
 2967
 2968            default:
 2969                    panic("vlockmgr: flags %x", flags);

From: Antti Kantee <pooka@cs.hut.fi>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/43439
Date: Wed, 9 Jun 2010 10:29:15 +0300

 Has this twisty setup ever worked, or should it just be filed under
 "don't do it"?

 I would solve writable pkgdist by setting DISTDIR in /etc/mk.conf.

From: matthew green <mrg@eterna.com.au>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
    netbsd-bugs@netbsd.org
Subject: re: kern/43439
Date: Wed, 09 Jun 2010 18:02:39 +1000

 >  Has this twisty setup ever worked, or should it just be filed under
 >  "don't do it"?

 i've been doing it like this ever since pkgsrc first gained
 r/o support, which is a decade or more ago.

 why is it a twisty setup?

 >  I would solve writable pkgdist by setting DISTDIR in /etc/mk.conf.

 that's a pretty crappy solution to something that has worked for a
 very long time and panics *every* kernel.  (it triggers one of the
 very few situations that non-LOCKDEBUG kernels complain about.)


 .mrg.

From: Antti Kantee <pooka@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/43439 CVS commit: src/tests/fs/nullfs
Date: Wed, 9 Jun 2010 08:37:16 +0000

 Module Name:	src
 Committed By:	pooka
 Date:		Wed Jun  9 08:37:16 UTC 2010

 Modified Files:
 	src/tests/fs/nullfs: t_basic.c

 Log Message:
 ``twistymount'' regression test for scenario described in PR kern/43439


 To generate a diff of this commit:
 cvs rdiff -u -r1.2 -r1.3 src/tests/fs/nullfs/t_basic.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: Antti Kantee <pooka@NetBSD.org>
To: matthew green <mrg@eterna.com.au>
Cc: gnats-bugs@netbsd.org
Subject: Re: kern/43439
Date: Wed, 9 Jun 2010 11:38:41 +0300

 On Wed, Jun 09, 2010 at 06:02:39PM +1000, matthew green wrote:
 > 
 > >  Has this twisty setup ever worked, or should it just be filed under
 > >  "don't do it"?
 > 
 > i've been doing it like this ever since pkgsrc first gained
 > r/o support, which is a decade or more ago.

 Ok, just wanted to verify if this is a new problem (which wasn't
 clear from the original report).

 > why is it a twisty setup?

 It's first mounting a onto b and then a/dir onto b/dir, where b/dir
 is actually a/dir when "unnullified".  I'd call that twisty and
 generally avoid doing it, but ymmv.

 > >  I would solve writable pkgdist by setting DISTDIR in /etc/mk.conf.
 > 
 > that's a pretty crappy solution to something that has worked for a
 > very long time and panics *every* kernel.  (it triggers one of the
 > very few situations that non-LOCKDEBUG kernels complain about.)

 Fair enough.

 I added ``twistymount'' to tests/fs/nullfs so you can monitor
 progress in fixing this without having to crash your system.


 On a tangent, atf-run gives me this:
 tc-start: twistymount
 tc-so:panic: rumpuser fatal failure 11 (Resource deadlock avoided)
 tc-end: twistymount, passed
 tp-end: t_basic

 Tests which dumped core at least used to fail (I clearly remember
 adding the code to atf to report a core dump as the reason for
 failure).  If upgrading my atf to 0.9 won't fix it, I'll have to file
 an atf PR.  Let's hope this stuff doesn't get more recursive than
 it already is ...

Responsible-Changed-From-To: kern-bug-people->hannken
Responsible-Changed-By: hannken@NetBSD.org
Responsible-Changed-When: Wed, 09 Jun 2010 09:02:06 +0000
Responsible-Changed-Why:
Broke it and have to handle it.


State-Changed-From-To: open->analyzed
State-Changed-By: hannken@NetBSD.org
State-Changed-When: Wed, 09 Jun 2010 09:02:06 +0000
State-Changed-Why:
- do_sys_mount makes the mounted-on vnode lock recursive.  This is a
  layered vnode.
- nullfs_mount looks up the lower vnode, which traverses the leaf of the
  mounted-on vnode.
- while the mounted-on vnode allows recursive locking, its leaf does not
  => locking against self

Two possible solutions:
1) Remove recursive vnode locks and move mount-on lookup into the file systems
2) Add a bypass function to layered file systems to pass vn_setrecurse down
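
The sequence in the analysis above can be replayed with a small userland
model.  This is only a sketch: every toy_* name is hypothetical, the lock
is a plain owner field rather than a kernel rwlock, and it assumes that
locking a layered vnode also locks the vnode below it.  EDEADLK marks the
point where the kernel reports "locking against myself".

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a vnode lock; not the kernel's structure. */
struct toy_vlock {
	const void *owner;	/* current exclusive holder, or NULL */
	int recursecnt;		/* extra holds taken by the owner */
	bool canrecurse;	/* set by the mount path */
};

/* Hypothetical layered vnode: an upper (null) node over a lower one. */
struct toy_vnode {
	struct toy_vlock lock;
	struct toy_vnode *lower;	/* NULL for a leaf vnode */
};

/* Take the lock exclusively; return EDEADLK on a self-deadlock. */
static int
toy_lock(struct toy_vlock *vl, const void *self)
{
	if (vl->owner == NULL) {
		vl->owner = self;
		return 0;
	}
	if (vl->owner == self) {
		if (vl->canrecurse) {
			vl->recursecnt++;
			return 0;
		}
		return EDEADLK;		/* "locking against myself" */
	}
	return EBUSY;			/* a real lock would sleep here */
}

/* Lock a vnode and every vnode below it (no unwinding on error). */
static int
toy_vn_lock(struct toy_vnode *vp, const void *self)
{
	for (; vp != NULL; vp = vp->lower) {
		int error = toy_lock(&vp->lock, self);
		if (error != 0)
			return error;
	}
	return 0;
}

/*
 * Replay the mount.  With pass_down set, the recursion flag also
 * reaches the lower vnode, as in proposed solution 2.
 */
static int
toy_twisty_mount(bool pass_down)
{
	int self;	/* only its address is used, as the LWP identity */
	struct toy_vnode lower = { .lock = { NULL, 0, false }, .lower = NULL };
	struct toy_vnode upper = { .lock = { NULL, 0, false }, .lower = &lower };
	int error;

	/* Mount path: lock the mounted-on vnode, then allow recursion. */
	error = toy_vn_lock(&upper, &self);
	assert(error == 0);
	upper.lock.canrecurse = true;
	if (pass_down)
		lower.lock.canrecurse = true;

	/* File system mount: the lookup locks the lower vnode again. */
	return toy_vn_lock(&lower, &self);
}
```

In this model toy_twisty_mount(false) returns EDEADLK and
toy_twisty_mount(true) returns 0.  Solution 1 is not modelled here; it
avoids the second lock attempt instead of permitting it.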


From: Juergen Hannken-Illjes <hannken@eis.cs.tu-bs.de>
To: gnats-bugs@netbsd.org
Cc: mrg@eterna.com.au
Subject: Re: kern/43439 -- workaround
Date: Wed, 9 Jun 2010 11:27:00 +0200

 A workaround is to organize the mounts as

   /dev/wd0e                       /home                  ffs     rw,log     1 2
   /home/current/pkgsrc            /usr/pkgsrc            null    ro,hidden
 - /home/current/pkgsrc/distfiles  /usr/pkgsrc/distfiles  null    rw,hidden
 + /home/current/pkgsrc.distfiles  /usr/pkgsrc/distfiles  null    rw,hidden

 to avoid recursive vnode locks during lookup.

 -- 
 Juergen Hannken-Illjes - hannken@eis.cs.tu-bs.de - TU Braunschweig (Germany)

From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: Antti Kantee <pooka@netbsd.org>
Subject: Re: kern/43439 (mount_null panic: lockdebug_wantlock: locking
	against myself)
Date: Thu, 10 Jun 2010 04:48:51 +0000

 On Wed, Jun 09, 2010 at 09:02:07AM +0000, hannken@NetBSD.org wrote:
  > Two possible solutions:
  > 1) Remove recursive vnode locks and move mount-on lookup into the
  > file systems
  > 2) Add a bypass function to layered file systems to pass vn_setrecurse down

 Also (3) rototill how mounting works so that you prepare the fs first
 and only then attach it to the namespace, which seems like the correct
 order of operations and would avoid this problem. I have no idea how
 hard this would be though.

 -- 
 David A. Holland
 dholland@netbsd.org

From: Juergen Hannken-Illjes <hannken@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/43439 CVS commit: src/sys/kern
Date: Tue, 15 Jun 2010 09:43:37 +0000

 Module Name:	src
 Committed By:	hannken
 Date:		Tue Jun 15 09:43:37 UTC 2010

 Modified Files:
 	src/sys/kern: vfs_syscalls.c

 Log Message:
 When mounting a file system re-lookup and lock the directory we mount on
 after the file system is set up by VFS_MOUNT().  This way recursive vnode
 locks are no longer needed here and mounts on null mounts no longer fail
 as described in PR #43439 (mount_null panic: lockdebug_wantlock: locking
 against myself).

 Based on a proposal from  and
 Reviewed by: David A. Holland <dholland@netbsd.org>


 To generate a diff of this commit:
 cvs rdiff -u -r1.404 -r1.405 src/sys/kern/vfs_syscalls.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.
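
The change in ordering described in the log message can be illustrated
with a userland sketch.  It is not the code from vfs_syscalls.c: the
toy_* names are hypothetical, the layered vnodes are collapsed into a
single non-recursive lock, and toy_vfs_mount() merely assumes that the
file system's own lookup has to lock the same directory briefly.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical non-recursive exclusive lock standing in for a vnode lock. */
struct toy_lock {
	const void *owner;	/* current holder, or NULL */
};

static int
toy_enter(struct toy_lock *l, const void *self)
{
	if (l->owner == self)
		return EDEADLK;	/* the point where LOCKDEBUG panics */
	if (l->owner != NULL)
		return EBUSY;	/* a real lock would sleep here */
	l->owner = self;
	return 0;
}

static void
toy_exit(struct toy_lock *l, const void *self)
{
	assert(l->owner == self);
	l->owner = NULL;
}

/* Stand-in for VFS_MOUNT(): its lookup locks the directory for a moment. */
static int
toy_vfs_mount(struct toy_lock *dir, const void *self)
{
	int error = toy_enter(dir, self);
	if (error != 0)
		return error;
	toy_exit(dir, self);
	return 0;
}

/* Old order: hold the mounted-on directory across the file system setup. */
static int
toy_mount_lock_first(struct toy_lock *dir, const void *self)
{
	int error = toy_enter(dir, self);
	if (error != 0)
		return error;
	error = toy_vfs_mount(dir, self);	/* tries to lock dir again */
	toy_exit(dir, self);
	return error;
}

/* New order: set up the file system first, then lock the directory. */
static int
toy_mount_setup_first(struct toy_lock *dir, const void *self)
{
	int error = toy_vfs_mount(dir, self);
	if (error != 0)
		return error;
	error = toy_enter(dir, self);
	if (error != 0)
		return error;
	/* ... the new mount would be attached to the namespace here ... */
	toy_exit(dir, self);
	return 0;
}
```

Called on a fresh lock, toy_mount_lock_first() returns EDEADLK while
toy_mount_setup_first() returns 0, since the second ordering never takes
the lock while it is already held.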

State-Changed-From-To: analyzed->closed
State-Changed-By: hannken@NetBSD.org
State-Changed-When: Wed, 16 Jun 2010 07:59:48 +0000
State-Changed-Why:
Fixed in tree with rev. 1.405 of src/sys/kern/vfs_syscalls.c.


From: matthew green <mrg@eterna.com.au>
To: gnats-bugs@NetBSD.org
Cc: hannken@NetBSD.org, netbsd-bugs@netbsd.org, gnats-admin@netbsd.org
Subject: re: kern/43439 (mount_null panic: lockdebug_wantlock: locking against myself)
Date: Thu, 17 Jun 2010 05:15:24 +1000

 > Synopsis: mount_null panic: lockdebug_wantlock: locking against myself
 > 
 > State-Changed-From-To: analyzed->closed
 > State-Changed-By: hannken@NetBSD.org
 > State-Changed-When: Wed, 16 Jun 2010 07:59:48 +0000
 > State-Changed-Why:
 > Fixed in tree with rev. 1.405 of src/sys/kern/vfs_syscalls.c.

 thanks!  i can also confirm it is fixed.


 .mrg.

>Unformatted:
