NetBSD Problem Report #19110

Received: (qmail 373 invoked by uid 605); 20 Nov 2002 11:20:19 -0000
Message-Id: <200211201120.gAKBKEY00295@nelly.aprisoft.de>
Date: Wed, 20 Nov 2002 12:20:14 +0100 (CET)
From: Martin Husemann <martin@aprisoft.de>
Sender: gnats-bugs-owner@netbsd.org
Reply-To: martin@aprisoft.de
To: gnats-bugs@gnats.netbsd.org
Subject: nullfs mounts over NFS cause lock manager problems
X-Send-Pr-Version: 3.95

>Number:         19110
>Category:       kern
>Synopsis:       nullfs mounts over NFS cause lock manager problems
>Confidential:   no
>Severity:       critical
>Priority:       medium
>Responsible:    dholland
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Nov 20 11:21:00 +0000 2002
>Closed-Date:    
>Last-Modified:  Sat Nov 05 16:46:48 +0000 2011
>Originator:     Martin Husemann
>Release:        NetBSD 1.6K
>Organization:
>Environment:
System: NetBSD nelly.aprisoft.de 1.6K NetBSD 1.6K (NELLY) #1: Wed Nov 20 09:51:28 CET 2002 martin@nelly.aprisoft.de:/usr/src/sys/arch/sparc64/compile/NELLY sparc64
Architecture: sparc64
Machine: sparc64
>Description:

I tried to do a sandbox build in pkgsrc with /usr/pkgsrc on NFS.
The pkgsrc/mk/bulk/mksandbox script creates a few NULL mounts of /usr/pkgsrc 
subdirectories into the sandbox directory. I built a package inside the sandbox
(in my case gmake) and the kernel panicked:

panic: lockmgr: locking against myself
kdb breakpoint at 113cde4
Stopped in pid 8455 (sh) at     cpu_Debugger+0x4:       nop
db> tr
lockmgr(ed78fb8, 2, ed78f00, f037bc0, 0, 0) at lockmgr+0x5f8
genfs_lock(f06f0c0, 10e520c, 100, ee931f0, 0, 0) at genfs_lock+0x10
vn_lock(ed78f00, 20002, 108, 2054600, 0, 0) at vn_lock+0x80
nfs_inactive(22db280, f0, 103ead0, 2054600, 0, 0) at nfs_inactive+0xb4
vrele(ebefe50, 21592c0, 183d2d8, 0, 180, 20e7058) at vrele+0xd8
layer_reclaim(0, f8, 10e85ac, ee931f0, 0, 0) at layer_reclaim+0x10c
vclean(ee02fb0, 8, ee931f0, 0, 0, 2050b00) at vclean+0x1a4
vgonel(ee02fb0, ee931f0, deadb, 209f600, 1816000, b4dd647c) at vgonel+0x44
getnewvnode(0, 217b400, 2054600, f06f6c8, ed979b3, 0) at getnewvnode+0x250
nfs_nget(217b400, 2051580, 1c, f06f7f8, 1f, 205157c) at nfs_nget+0x114
nfs_create(c, 200, 0, f06fa30, f06f828, 0) at nfs_create+0x1d9c
layer_bypass(f06fa30, 10e7d98, 20, 209f600, ec20310, eead010) at layer_bypass+0x11c
vn_open(f06fa60, 602, 1a4, e7a2720, 1b6, 0) at vn_open+0x10c
sys_open(0, f06fdd0, f06fdc0, 10df618, 0, eead010) at sys_open+0xa8
syscall(f06fed0, ffffffffffffffff, 4055ecec, 0, 1831800, 1831800) at syscall+0x26c
syscall_setup(22eaf8, 601, 1b6, 0, 0, 0) at syscall_setup+0xec


>How-To-Repeat:

mount -t nfs server:/usr/pkgsrc /usr/pkgsrc
mount -t null /usr/pkgsrc /tmp/sandbox
build a package (in this case gmake) inside /tmp/sandbox

>Fix:
n/a
>Release-Note:
>Audit-Trail:

From: Chuck Silvers <chuq@chuq.com>
To: Martin Husemann <martin@aprisoft.de>
Cc: gnats-bugs@gnats.netbsd.org, wrstuden@netbsd.org, fvdl@netbsd.org
Subject: Re: kern/19110: NULL mounts over NFS cause lock manager problems
Date: Mon, 2 Dec 2002 01:49:01 -0800

 this looks like an MI problem that came about when NFS got real vnode locks.
 what's happening is that while creating a file in directory A, we end up
 reclaiming a nullfs vnode which is above a previously-deleted file also in
 directory A.  we've got A locked to create the new file, and when we reclaim
 the nullfs vnode, we drop the last reference to the previously-deleted file,
 thus allowing it to go inactive, which tries to process the silly-rename
 and really delete the file, for which it tries to lock A again.  deadlock.
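
 (For illustration only: the kernel path goes through lockmgr(9), but the same
 class of failure can be shown with a small userland C program using an
 error-checking pthread mutex, which reports EDEADLK where lockmgr panics
 with "locking against myself".  The mapping to nfs_create()/nfs_inactive()
 in the comments is just an analogy, not kernel code.)

  #include <errno.h>
  #include <pthread.h>
  #include <stdio.h>

  int
  main(void)
  {
          pthread_mutex_t dir_lock;       /* stands in for directory A's vnode lock */
          pthread_mutexattr_t attr;

          pthread_mutexattr_init(&attr);
          pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
          pthread_mutex_init(&dir_lock, &attr);

          pthread_mutex_lock(&dir_lock);  /* "nfs_create(): lock directory A" */

          /* "nfs_inactive(): lock directory A again" -- same thread, same lock */
          if (pthread_mutex_lock(&dir_lock) == EDEADLK)
                  printf("locking against myself\n");

          pthread_mutex_unlock(&dir_lock);
          pthread_mutex_destroy(&dir_lock);
          pthread_mutexattr_destroy(&attr);
          return 0;
  }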

 there are several ways this could be fixed:

  (1) don't process silly-rename deletes in-line, but rather put such vnodes
      on a queue and let a worker thread process them later.
  (2) don't drop the hold on the lower vnodes in layer_reclaim(), but rather
      put such vnodes on a queue and let a worker thread drop the hold later.
  (3) only use real vnode locks in NFS for regular files and use the noop-locks
      for other vnode types.

 the silly-rename processing in nfs_inactive() is kind of bogus now that NFS
 is using real locks, so I'd go for either (1) or (3).  well, another choice
 would be

  (4) redesign the vnode locking so that NFS isn't forced to even pretend that
      VOP_LOCK() makes sense for directories, but that's a lot more work.
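
 (For concreteness, a minimal sketch of what option (1) might look like.
 Everything here -- struct and function names, the plain TAILQ plus
 tsleep/wakeup, and the reliance on the giant lock to protect the queue --
 is illustrative only, not a proposed patch:)

  #include <sys/param.h>
  #include <sys/systm.h>
  #include <sys/proc.h>
  #include <sys/queue.h>
  #include <sys/malloc.h>
  #include <sys/lock.h>
  #include <sys/vnode.h>

  /* hypothetical deferred silly-rename removal queue (big-lock protected) */
  struct silly_work {
          TAILQ_ENTRY(silly_work) sw_entry;
          void *sw_sillyrename;   /* the NFS sillyrename record to process */
          struct vnode *sw_dvp;   /* directory the ".nfsXXXX" file lives in */
  };
  static TAILQ_HEAD(, silly_work) silly_queue =
          TAILQ_HEAD_INITIALIZER(silly_queue);

  /* called from nfs_inactive() instead of removing the file inline */
  static void
  silly_defer(void *sp, struct vnode *dvp)
  {
          struct silly_work *sw;

          sw = malloc(sizeof(*sw), M_TEMP, M_WAITOK);
          sw->sw_sillyrename = sp;
          sw->sw_dvp = dvp;
          vref(dvp);              /* keep the directory around for the worker */
          TAILQ_INSERT_TAIL(&silly_queue, sw, sw_entry);
          wakeup(&silly_queue);   /* worker sleeps on the queue head */
  }

  /* worker thread body: takes the directory lock fresh, in its own context */
  static void
  silly_worker(void *arg)
  {
          struct silly_work *sw;

          for (;;) {
                  while ((sw = TAILQ_FIRST(&silly_queue)) == NULL)
                          (void)tsleep(&silly_queue, PVFS, "sillyq", 0);
                  TAILQ_REMOVE(&silly_queue, sw, sw_entry);
                  vn_lock(sw->sw_dvp, LK_EXCLUSIVE | LK_RETRY);
                  /* ... issue the NFS remove for sw->sw_sillyrename here ... */
                  VOP_UNLOCK(sw->sw_dvp, 0);
                  vrele(sw->sw_dvp);
                  free(sw, M_TEMP);
          }
  }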

 -Chuck

From: Bill Studenmund <wrstuden@netbsd.org>
To: Chuck Silvers <chuq@chuq.com>
Cc: Martin Husemann <martin@aprisoft.de>, <gnats-bugs@gnats.netbsd.org>,
  <fvdl@netbsd.org>, William Studenmund <wrstuden@netbsd.org>
Subject: Re: kern/19110: NULL mounts over NFS cause lock manager problems
Date: Mon, 2 Dec 2002 15:17:36 -0800 (PST)

 On Mon, 2 Dec 2002, Chuck Silvers wrote:

 > this looks like an MI problem that came about when NFS got real vnode locks.
 > what's happening is that while creating a file in directory A, we end up
 > reclaiming a nullfs vnode which is above a previously-deleted file also in
 > directory A.  we've got A locked to create the new file, and when we reclaim
 > the nullfs vnode, we drop the last reference to the previously-deleted file,
 > thus allowing it to go inactive, which tries to process the silly-rename
 > and really delete the file, for which it tries to lock A again.  deadlock.
 >
 > there are several ways this could be fixed:
 >
 >  (1) don't process silly-rename deletes in-line, but rather put such vnodes
 >      on a queue and let a worker thread process them later.

 I think this (or a variant) is the best idea, though there will be some
 areas of concern.

 >  (2) don't drop the hold on the lower vnodes in layer_reclaim(), but rather
 >      put such vnodes on a queue and let a worker thread drop the hold later.

 Problem is that when we're in layer_reclaim(), we are in the process of no
 longer being a nullfs vnode. So we can't put the vnode on a queue for
 subsequent processing.

 >  (3) only use real vnode locks in NFS for regular files and use the noop-locks
 >      for other vnode types.

 I don't think this is a good idea; we should make locking work.

 > the silly-rename processing in nfs_inactive() is kind of bogus now that NFS
 > is using real locks, so I'd go for either (1) or (3).  well, another choice
 > would be

 ?? The silly-rename processing is due to the fact that NFS is stateless
 and that we don't delete files that reach zero on-disk references if they
 are still open. While I'll agree our processing may be bogusly-designed,
 vnode locking itself is an orthogonal issue.

 I think it's fine to try the rename in-line, we just should be able to
 deal with it failing.

 If we do the put-it-on-a-queue idea, we could have some strange
 interactions. nfs_inactivate() got called because we were moving onto the
 freelist. To put the vnode in a work queue means adding an external
 reference to it, which really means it shouldn't be on the free list.

 So if we don't do the rename in-line, we have a situation where part-way
 through putting a vnode on the free list, we have to stop. Or we have to
 put the vnode on the work queue w/o holding a reference to it.

 Probably the easiest thing to do is have the rename code try to get the
 dir lock, and if it fails due to deadlock (i.e. we already have the lock),
 proceed with the removal anyway and just remember we didn't get the lock
 so we won't unlock on the way out.
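
 (In rough code form -- a sketch only.  It assumes lockmgr's LK_RECURSEFAIL
 flag turns the recursive-acquisition panic into an EDEADLK return; if that
 assumption is wrong, substitute whatever "do I already hold this lock?"
 test is available.  nfs_removeit() is the existing removal helper; its
 declaration and struct sillyrename come from the nfs headers.)

  #include <sys/param.h>
  #include <sys/errno.h>
  #include <sys/lock.h>
  #include <sys/vnode.h>
  /* plus nfs/nfsnode.h and friends for struct sillyrename / nfs_removeit() */

  static int
  silly_remove_maybe_locked(struct vnode *dvp, struct sillyrename *sp)
  {
          int locked_here = 0;
          int error;

          error = vn_lock(dvp, LK_EXCLUSIVE | LK_RECURSEFAIL);
          if (error == 0)
                  locked_here = 1;        /* normal case: we took the lock */
          else if (error != EDEADLK)
                  return error;           /* unrelated failure */
          /* EDEADLK: this process already holds dvp's lock; proceed anyway */

          error = nfs_removeit(sp);

          if (locked_here)
                  VOP_UNLOCK(dvp, 0);
          return error;
  }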

 Take care,

 Bill



From: Chuck Silvers <chuq@chuq.com>
To: Bill Studenmund <wrstuden@netbsd.org>
Cc: Martin Husemann <martin@aprisoft.de>, gnats-bugs@gnats.netbsd.org,
  fvdl@netbsd.org
Subject: Re: kern/19110: NULL mounts over NFS cause lock manager problems
Date: Mon, 2 Dec 2002 22:10:22 -0800

 On Mon, Dec 02, 2002 at 03:17:36PM -0800, Bill Studenmund wrote:
 > >  (1) don't process silly-rename deletes in-line, but rather put such vnodes
 > >      on a queue and let a worker thread process them later.
 > 
 > I think this (or a variant) is the best idea, though there will be some
 > areas of concern.
 > 
 > >  (2) don't drop the hold on the lower vnodes in layer_reclaim(), but rather
 > >      put such vnodes on a queue and let a worker thread drop the hold later.
 > 
 > Problem is that when we're in layer_reclaim(), we are in the process of no
 > longer being a nullfs vnode. So we can't put the vnode on a queue for
 > subsequent processing.

 I meant put the lower vnode on a queue.  this would probably amount to putting
 the layer_node on a queue.  the layer vnode could be made available as normal.
 anyway, let's agree that this is not the way to go.


 > >  (3) only use real vnode locks in NFS for regular files and use the noop-locks
 > >      for other vnode types.
 > 
 > I don't think this is a good idea; we should make locking work.
 > 
 > > the silly-rename processing in nfs_inactive() is kind of bogus now that NFS
 > > is using real locks, so I'd go for either (1) or (3).  well, another choice
 > > would be
 > 
 > ?? The silly-rename processing is due to the fact that NFS is stateless
 > and that we don't delete files that reach zero on-disk references if they
 > are still open. While I'll agree our processing may be bogusly-designed,
 > vnode locking itself is an orthogonal issue.

 yes, I understand why the silly-rename stuff is there.
 I meant that the implementation is kind of bogus, in particular due to
 it being prone to deadlock now that NFS is using real locks.


 > I think it's fine to try the rename in-line, we just should be able to
 > deal with it failing.
 > 
 > If we do the put-it-on-a-queue idea, we could have some strange
 > interactions. nfs_inactivate() got called because we were moving onto the
 > freelist. To put the vnode in a work queue means adding an external
 > reference to it, which really means it shouldn't be on the free list.

 so take the vnode back off the free list...
 why is it "free" when VOP_INACTIVE() is still processing it anyway?
 that also constitutes a reference to the vnode when it's on the free list.


 > So if we don't do the rename in-line, we have a situation where part-way
 > through putting a vnode on the free list, we have to stop. Or we have to
 > put the vnode on the work queue w/o holding a reference to it.
 > 
 > Probably the easiest thing to do is have the rename code try to get the
 > dir lock, and if it fails due to deadlock (i.e. we already have the lock),
 > proceed with the removal anyway and just remember we didn't get the lock
 > so we won't unlock on the way out.

 how is this proposal compatible with your previous assertion that
 we should make the locking work?  this is just skipping the locking
 sometimes instead of all the time.  if you're going to violate the
 protocol, why be half-hearted about it?

 however, at some point we'll want to use the page cache to store directory
 data as well, so at that point we'll need real locks for NFS directories
 as well.  so I think that rules out (3).  this leaves (1) for further
 consideration.


 hmm, it looks like we don't do correct locking in the silly-rename part of
 nfs_inactive() currently, in VOP_REMOVE() both the directory and the target
 vnode are supposed to be locked, but for a silly-rename removal we only
 have the directory locked.
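
 (Restating the convention in code form, for reference -- a sketch of the
 lock state VOP_REMOVE() expects, not a proposed change:)

  #include <sys/param.h>
  #include <sys/lock.h>
  #include <sys/namei.h>
  #include <sys/vnode.h>

  /*
   * Both vnodes are supposed to be locked (and referenced) when
   * VOP_REMOVE(dvp, vp, cnp) is called; a normal unlink gets that state
   * from lookup().  The silly-rename removal in nfs_inactive() currently
   * arranges only the directory lock, leaving the file vnode unlocked.
   */
  static int
  remove_locked(struct vnode *dvp, struct vnode *vp, struct componentname *cnp)
  {

          vn_lock(dvp, LK_EXCLUSIVE | LK_RETRY);
          vn_lock(vp, LK_EXCLUSIVE | LK_RETRY);   /* the step the silly-rename path skips */
          return VOP_REMOVE(dvp, vp, cnp);
  }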

 also, what if someone renames the silly-renamed file into a different
 directory while there's still a hold on that file?  the inactive code
 would take the lock on the directory that the file was in when it was
 silly-renamed, not where it currently lives.  I'm not sure what nasty
 effect this could have, but again it's not obeying the locking rules.
 I don't see that it's even possible to follow the rules in this case.

 -Chuck

From: Bill Studenmund <wrstuden@netbsd.org>
To: Chuck Silvers <chuq@chuq.com>
Cc: Martin Husemann <martin@aprisoft.de>, <gnats-bugs@gnats.netbsd.org>,
  <fvdl@netbsd.org>
Subject: Re: kern/19110: NULL mounts over NFS cause lock manager problems
Date: Tue, 3 Dec 2002 19:43:39 -0800 (PST)

 On Mon, 2 Dec 2002, Chuck Silvers wrote:

 > On Mon, Dec 02, 2002 at 03:17:36PM -0800, Bill Studenmund wrote:
 > > Problem is that when we're in layer_reclaim(), we are in the process of no
 > > longer being a nullfs vnode. So we can't put the vnode on a queue for
 > > subsequent processing.
 >
 > I meant put the lower vnode on a queue.  this would probably amount to putting
 > the layer_node on a queue.  the layer vnode could be made available as normal.
 > anyway, let's agree that this is not the way to go.

 Ahh.. That would work. But yes, let's not do it. :-)

 > > ?? The silly-rename processing is due to the fact that NFS is stateless
 > > and that we don't delete files that reach zero on-disk references if they
 > > are still open. While I'll agree our processing may be bogusly-designed,
 > > vnode locking itself is an orthogonal issue.
 >
 > yes, I understand why the silly-rename stuff is there.
 > I meant that the implementation is kind of bogus, in particular due to
 > it being prone to deadlock now that NFS is using real locks.

 My apologies; I misunderstood you. Yes, our code may well be (and seems to
 be) bogus. :-)

 > > I think it's fine to try the rename in-line, we just should be able to
 > > deal with it failing.
 > >
 > > If we do the put-it-on-a-queue idea, we could have some strange
 > > interactions. nfs_inactivate() got called because we were moving onto the
 > > freelist. To put the vnode in a work queue means adding an external
 > > reference to it, which really means it shouldn't be on the free list.
 >
 > so take the vnode back off the free list...
 > why is it "free" when VOP_INACTIVE() is still processing it anyway?
 > that also constitutes a reference to the vnode when it's on the free list.

 Yes & no.

 While the vp parameter constitutes a reference to the vnode, we're most
 interested in long-term-not-easily-found references. Being on the free
 list doesn't count, as we can easily find references (i.e. TAILQ_REMOVE is
 happy :-)

 > > Probably the easiest thing to do is have the rename code try to get the
 > > dir lock, and if it fails due to deadlock (i.e. we already have the lock),
 > > proceed with the removal anyway and just remember we didn't get the lock
 > > so we won't unlock on the way out.
 >
 > how is this proposal compatible with your previous assertion that
 > we should make the locking work?  this is just skipping the locking
 > sometimes instead of all the time.  if you're going to violate the
 > protocol, why be half-hearted about it?

 Because as I understand it, if we fail due to deadlock, that means one of
 our callers, i.e. this process (or this lwp in _sa) has the lock. This
 means that we skip getting the lock if we already have it, not that we
 violate locking. All is well as the other lock-holding code can't run
 until we return. :-)

 > however, at some point we'll want to use the page cache to store directory
 > data as well, so at that point we'll need real locks for NFS directories
 > as well.  so I think that rules out (3).  this leaves (1) for further
 > consideration.

 Ok.

 > hmm, it looks like we don't do correct locking in the silly-rename part of
 > nfs_inactive() currently, in VOP_REMOVE() both the directory and the target
 > vnode are supposed to be locked, but for a silly-rename removal we only
 > have the directory locked.

 F*ck. You're right, we don't re-lock the file node. We should.

 To avoid a race condition, we should VOP_LOCK() (not vn_lock()) the
 file node after we vn_lock the parent, then unlock it after the remove.

 Hmmm... It's worse than that. If we're in vclean, we should NOT re-lock
 the node. vclean() asserts LK_DRAIN, which as I understand makes future
 lock attempts fail. We can notice this by detecting VXLOCK being set. If
 VXLOCK is set, no one else can lock the node, so we don't need to bother.
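
 (In rough code form -- a sketch assuming the v_flag/VXLOCK convention
 described above, not a tested diff:)

  /*
   * In nfs_inactive(), before re-locking the file vnode for the removal:
   * if vclean() is tearing the vnode down it has set VXLOCK and asserted
   * LK_DRAIN, so nobody else can take the lock anyway -- skip it.
   */
  int relocked = 0;

  if ((vp->v_flag & VXLOCK) == 0) {
          /* VOP_LOCK(), not vn_lock(): vn_lock() would fail on VXLOCK */
          VOP_LOCK(vp, LK_EXCLUSIVE | LK_RETRY);
          relocked = 1;
  }
  /* ... perform the silly-rename removal ... */
  if (relocked)
          VOP_UNLOCK(vp, 0);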

 vclean() is also the case that makes adding the vnode to a
 rename-delete-worker queue problematic. We can't pull the vnode off the
 free list since 1) it might not be there (on the free list), and 2) it
 won't be an nfs vnode for long. :-(

 So I think we need to be careful about the re-lock of the parent, and the
 re-lock of the node. We have to unlock the node before we can lock the
 parent, so we can't just leave it locked...

 > also, what if someone renames the silly-renamed file into a different
 > directory while there's still a hold on that file?  the inactive code
 > would take the lock on the directory that the file was in when it was
 > silly-renamed, not where it currently lives.  I'm not sure what nasty
 > effect this could have, but again it's not obeying the locking rules.
 > I don't see that it's even possible to follow the rules in this case.

 In that case, the nfs_removeit() just fails I think.

 The only other thing I can see is if we rename a node, we should clear out
 its silly-rename info. As the renaming indicates that we want to keep it,
 and it now has an on-disk reference.
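
 (Sketched, with the usual caveats -- the field and malloc-type names are
 from memory and would need checking against nfsnode.h:)

  /* in nfs_rename(), after the rename of fvp succeeds */
  struct nfsnode *np = VTONFS(fvp);
  struct sillyrename *sp;

  if ((sp = np->n_sillyrename) != NULL) {
          np->n_sillyrename = NULL;
          vrele(sp->s_dvp);       /* drop the directory reference the record held */
          free(sp, M_NFSREQ);     /* believed to be the type nfs_sillyrename() used */
  }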

 Take care,

 Bill


From: Chuck Silvers <chuq@chuq.com>
To: Bill Studenmund <wrstuden@netbsd.org>
Cc: Martin Husemann <martin@aprisoft.de>, gnats-bugs@gnats.netbsd.org,
  fvdl@netbsd.org
Subject: Re: kern/19110: NULL mounts over NFS cause lock manager problems
Date: Sun, 15 Dec 2002 22:23:24 -0800

 On Tue, Dec 03, 2002 at 07:43:39PM -0800, Bill Studenmund wrote:
 > > > I think it's fine to try the rename in-line, we just should be able to
 > > > deal with it failing.
 > > >
 > > > If we do the put-it-on-a-queue idea, we could have some strange
 > > > interactions. nfs_inactivate() got called because we were moving onto the
 > > > freelist. To put the vnode in a work queue means adding an external
 > > > reference to it, which really means it shouldn't be on the free list.
 > >
 > > so take the vnode back off the free list...
 > > why is it "free" when VOP_INACTIVE() is still processing it anyway?
 > > that also constitutes a reference to the vnode when it's on the free list.
 > 
 > Yes & no.
 > 
 > While the vp parameter constitutes a reference to the vnode, we're most
 > interested in long-term-not-easily-found references. Being on the free
 > list doesn't count, as we can easily find references (i.e. TAILQ_REMOVE is
 > happy :-)

 this is probably good enough for UP and giant-lock MP kernels,
 but once we get away from the giant lock, we'll have to fix this.
 let's leave that for another day.


 > > > Probably the easiest thing to do is have the rename code try to get the
 > > > dir lock, and if it fails due to deadlock (i.e. we already have the lock),
 > > > proceed with the removal anyway and just remember we didn't get the lock
 > > > so we won't unlock on the way out.
 > >
 > > how is this proposal compatible with your previous assertion that
 > > we should make the locking work?  this is just skipping the locking
 > > sometimes instead of all the time.  if you're going to violate the
 > > protocol, why be half-hearted about it?
 > 
 > Because as I understand it, if we fail due to deadlock, that means one of
 > our callers, i.e. this process (or this lwp in _sa) has the lock. This
 > means that we skip getting the lock if we already have it, not that we
 > violate locking. All is well as the other lock-holding code can't run
 > until we return. :-)

 wellll... that sounds like it might work, but this type of design isn't
 so good in general.  it's best if code is written such that you're
 never already holding a lock that you acquire, since doing otherwise
 introduces a huge amount of additional complexity.  this is really
 a choice of design philosophy, but I hope netbsd in general can
 agree to avoid recursive locking, including this variation of it.

 that being said, this may be the best choice for this particular problem
 in the short term.  in the longer term, I think we really need to redesign
 the vnode locking entirely to be MP-safe without the giant lock, and
 hopefully we can solve all these other problems at the same time.


 > > hmm, it looks like we don't do correct locking in the silly-rename part of
 > > nfs_inactive() currently, in VOP_REMOVE() both the directory and the target
 > > vnode are supposed to be locked, but for a silly-rename removal we only
 > > have the directory locked.
 > 
 > F*ck. You're right, we don't re-lock the file node. We should.
 > 
 > To avoid a race condition, we should VOP_LOCK() (not vn_lock()) the
 > file node after we vn_lock the parent, then unlock it after the remove.
 > 
 > Hmmm... It's worse than that. If we're in vclean, we should NOT re-lock
 > the node. vclean() asserts LK_DRAIN, which as I understand makes future
 > lock attempts fail. We can notice this by detecting VXLOCK being set. If
 > VXLOCK is set, no one else can lock the node, so we don't need to bother.
 > 
 > vclean() is also the case that makes adding the vnode to a
 > rename-delete-worker queue problematic. We can't pull the vnode off the
 > free list since 1) it might not be there (on the free list), and 2) it
 > won't be an nfs vnode for long. :-(

 I don't see how this can happen currently.  when nfs_inactive() is called,
 the to-be-deleted vnode has only just been put on the freelist right before
 that.  so in a UP or giant-lock kernel, it will still be on the freelist
 and we can just take it off again.  


 > So I think we need to be careful about the re-lock of the parent, and the
 > re-lock of the node. We have to unlock the node before we can lock the
 > parent, so we can't just leave it locked...

 ok, would you like to take care of this part?  or even all of the
 immediate problems that we've been discussing?


 > > also, what if someone renames the silly-renamed file into a different
 > > directory while there's still a hold on that file?  the inactive code
 > > would take the lock on the directory that the file was in when it was
 > > silly-renamed, not where it currently lives.  I'm not sure what nasty
 > > effect this could have, but again it's not obeying the locking rules.
 > > I don't see that it's even possible to follow the rules in this case.
 > 
 > In that case, the nfs_removeit() just fails I think.

 yea, that's what would make sense, but I didn't examine the code enough
 to be sure.


 > The only other thing I can see is if we rename a node, we should clear out
 > its silly-rename info. As the renaming indicates that we want to keep it,
 > and it now has an on-disk reference.

 hmm, that's a good way to handle this case, yea.

 -Chuck
From: "Juergen Hannken-Illjes" <hannken@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/19110 CVS commit: src/sys/kern
Date: Sun, 2 Oct 2011 13:00:07 +0000

 Module Name:	src
 Committed By:	hannken
 Date:		Sun Oct  2 13:00:07 UTC 2011

 Modified Files:
 	src/sys/kern: vfs_vnode.c

 Log Message:
 The path getnewvnode()->getcleanvnode()->vclean()->VOP_LOCK() will panic
 if the vnode we want to clean is a layered vnode and the caller already
 locked its lower vnode.

 Change getnewvnode() to always allocate a fresh vnode and add a helper
 thread (vdrain) to keep the number of allocated vnodes within desiredvnodes.

 Rename getcleanvnode() to cleanvnode() and let it take a vnode from the
 lists, clean and free it.

 Reviewed by: David Holland <dholland@netbsd.org>

 Should fix:
 PR #19110 (nullfs mounts over NFS cause lock manager problems)
 PR #34102 (ffs panic in NetBSD 3.0_STABLE)
 PR #45115 (lock error panic when build.sh*3 and daily script is running)
 PR #45355 (Reader/writer lock error:  rw_vector_enter: locking against myself)


 To generate a diff of this commit:
 cvs rdiff -u -r1.11 -r1.12 src/sys/kern/vfs_vnode.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.
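
 (A very rough sketch of the vdrain idea described in the log message --
 not the committed vfs_vnode.c code; the sleep/wakeup details and the exact
 cleanvnode() interface are simplified here:)

  #include <sys/param.h>
  #include <sys/kernel.h>
  #include <sys/proc.h>
  #include <sys/systm.h>
  #include <sys/vnode.h>

  /*
   * Helper thread (created once at VFS startup with kthread_create(9)):
   * getnewvnode() now always allocates a fresh vnode, and this thread trims
   * the total back toward desiredvnodes in its own context, so that
   * vclean()/VOP_LOCK() is never run by a caller that already holds a
   * related (e.g. lower-layer) vnode lock.
   */
  static void
  vdrain_thread(void *cookie)
  {

          for (;;) {
                  while (numvnodes <= desiredvnodes)
                          (void)tsleep(&numvnodes, PVFS, "vdrain", hz);
                  /* cleanvnode(): take a vnode from the lists, clean and free it */
                  if (cleanvnode() != 0)
                          (void)tsleep(&numvnodes, PVFS, "vdrainretry", hz);
          }
  }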

Responsible-Changed-From-To: kern-bug-people->dholland
Responsible-Changed-By: dholland@NetBSD.org
Responsible-Changed-When: Sat, 05 Nov 2011 16:46:48 +0000
Responsible-Changed-Why:
I am not yet completely convinced from the description here that there
isn't also something else wrong in the nfs code, so I want to poke at
this some more. If I become unreachable in the long term for some reason,
just close it.


>Unformatted:
