NetBSD Problem Report #41374
From yamt@mwd.biglobe.ne.jp Thu May 7 04:41:38 2009
Return-Path: <yamt@mwd.biglobe.ne.jp>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
by www.NetBSD.org (Postfix) with ESMTP id C78E063B8DF
for <gnats-bugs@gnats.NetBSD.org>; Thu, 7 May 2009 04:41:38 +0000 (UTC)
Message-Id: <20090507032327.CEBF997EFC@kaeru.lan>
Date: Thu, 7 May 2009 12:23:27 +0900 (JST)
From: yamt@mwd.biglobe.ne.jp
Reply-To: yamt@mwd.biglobe.ne.jp
To: gnats-bugs@gnats.NetBSD.org
Subject: getnewvnode deadlock
X-Send-Pr-Version: 3.95
>Number: 41374
>Category: kern
>Synopsis: getnewvnode deadlock
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: kern-bug-people
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Thu May 07 04:45:00 +0000 2009
>Closed-Date: Tue Jul 21 10:20:09 +0000 2009
>Last-Modified: Tue Jul 21 10:20:09 +0000 2009
>Originator: YAMAMOTO Takashi <yamt@mwd.biglobe.ne.jp>
>Release: NetBSD 5.99.11
>Organization:
>Environment:
System: NetBSD kaeru 5.99.11
Architecture: i386
Machine: i386
>Description:
a thread doing getcleanvnode:
pick a vnode
acqure v_interlock
v_usecount++
call vclean
now, another thread doing cache_lookup:
picks the vnode
vtryget succeed
vn_lock succeed
now in vclean:
set VI_XLOCK (too late to be noticed by the competing thread)
wait on the vnode lock (this might violate locking order)
>How-To-Repeat:
code inspection.
>Fix:
in the case of getcleanvnode,
set VI_XLOCK before incrementing v_usecount?
>Release-Note:
>Audit-Trail:
From: Andrew Doran <ad@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org
Subject: Re: kern/41374: getnewvnode deadlock
Date: Sun, 10 May 2009 08:31:19 +0000
On Thu, May 07, 2009 at 04:45:01AM +0000, yamt@mwd.biglobe.ne.jp wrote:
> in the case of getcleanvnode,
> set VI_XLOCK before incrementing v_usecount?
I think insufficient because vtryget() is lockless. How about making XLOCK
a flag bit in v_useconut?
From: yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi)
To: ad@netbsd.org
Cc: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org
Subject: Re: kern/41374: getnewvnode deadlock
Date: Sun, 10 May 2009 09:44:41 +0000 (UTC)
hi,
> On Thu, May 07, 2009 at 04:45:01AM +0000, yamt@mwd.biglobe.ne.jp wrote:
>
>> in the case of getcleanvnode,
>> set VI_XLOCK before incrementing v_usecount?
>
> I think insufficient because vtryget() is lockless.
i was thinking something like the following. (untested)
> How about making XLOCK
> a flag bit in v_useconut?
it sounds like a good idea.
YAMAMOTO Takashi
Index: sys/vnode.h
===================================================================
RCS file: /cvsroot/src/sys/sys/vnode.h,v
retrieving revision 1.191.4.2
diff -u -p -r1.191.4.2 vnode.h
--- sys/vnode.h 4 May 2009 08:14:36 -0000 1.191.4.2
+++ sys/vnode.h 10 May 2009 09:42:33 -0000
@@ -227,6 +227,7 @@ typedef struct vnode vnode_t;
#define VI_WRMAP 0x00000400 /* might have PROT_WRITE u. mappings */
#define VI_WRMAPDIRTY 0x00000800 /* might have dirty pages */
#define VI_XLOCK 0x00001000 /* vnode is locked to change type */
+#define VI_CLEANING 0x00002000 /* being cleaned by vclean */
#define VI_ONWORKLST 0x00004000 /* On syncer work-list */
#define VI_MARKER 0x00008000 /* Dummy marker vnode */
#define VI_LAYER 0x00020000 /* vnode is on a layer filesystem */
@@ -243,7 +244,7 @@ typedef struct vnode vnode_t;
#define VNODE_FLAGBITS \
"\20\1ROOT\2SYSTEM\3ISTTY\4MAPPED\5MPSAFE\6LOCKSWORK\11TEXT\12EXECMAP" \
- "\13WRMAP\14WRMAPDIRTY\15XLOCK\17ONWORKLST\20MARKER" \
+ "\13WRMAP\14WRMAPDIRTY\15XLOCK\16CLEANING\17ONWORKLST\20MARKER" \
"\22LAYER\24CLEAN\25INACTPEND\26INACTREDO\27FREEING" \
"\30INACTNOW\31DIROP"
Index: kern/vfs_subr.c
===================================================================
RCS file: /cvsroot/src/sys/kern/vfs_subr.c,v
retrieving revision 1.336.4.2
diff -u -p -r1.336.4.2 vfs_subr.c
--- kern/vfs_subr.c 4 May 2009 08:13:49 -0000 1.336.4.2
+++ kern/vfs_subr.c 10 May 2009 09:42:34 -0000
@@ -361,6 +361,17 @@ try_nextlist:
* before doing this. If the vnode gains another reference while
* being cleaned out then we lose - retry.
*/
+ KASSERT(mutex_owned(&vp->v_interlock));
+ KASSERT((vp->v_iflag & VI_XLOCK) == 0);
+ KASSERT((vp->v_iflag & VI_CLEAN) == 0);
+ KASSERT(vp->v_usecount == 0);
+ /*
+ * after we increment v_usecount below, another thread can
+ * do vtryget successfully. set VI_XLOCK before that, so that
+ * the competing thread can notice our activities.
+ */
+ vp->v_iflag |= VI_XLOCK;
+ membar_producer();
atomic_inc_uint(&vp->v_usecount);
vclean(vp, DOCLOSE);
if (vp->v_usecount == 1) {
@@ -1783,8 +1794,8 @@ vclean(vnode_t *vp, int flags)
KASSERT(vp->v_usecount != 0);
/* If cleaning is already in progress wait until done and return. */
- if (vp->v_iflag & VI_XLOCK) {
- vwait(vp, VI_XLOCK);
+ if (vp->v_iflag & VI_CLEANING) {
+ vwait(vp, VI_CLEANING);
return;
}
@@ -1797,7 +1808,7 @@ vclean(vnode_t *vp, int flags)
* Prevent the vnode from being recycled or brought into use
* while we clean it out.
*/
- vp->v_iflag |= VI_XLOCK;
+ vp->v_iflag |= VI_XLOCK | VI_CLEANING;
if (vp->v_iflag & VI_EXECMAP) {
atomic_add_int(&uvmexp.execpages, -vp->v_uobj.uo_npages);
atomic_add_int(&uvmexp.filepages, vp->v_uobj.uo_npages);
@@ -1857,7 +1868,7 @@ vclean(vnode_t *vp, int flags)
vp->v_tag = VT_NON;
vp->v_vnlock = &vp->v_lock;
KNOTE(&vp->v_klist, NOTE_REVOKE);
- vp->v_iflag &= ~(VI_XLOCK | VI_FREEING);
+ vp->v_iflag &= ~(VI_XLOCK | VI_FREEING | VI_CLEANING);
vp->v_vflag &= ~VV_LOCKSWORK;
if ((flags & DOCLOSE) != 0) {
vp->v_iflag |= VI_CLEAN;
Index: kern/vfs_cache.c
===================================================================
RCS file: /cvsroot/src/sys/kern/vfs_cache.c,v
retrieving revision 1.74.4.2
diff -u -p -r1.74.4.2 vfs_cache.c
--- kern/vfs_cache.c 4 May 2009 08:13:49 -0000 1.74.4.2
+++ kern/vfs_cache.c 10 May 2009 09:42:34 -0000
@@ -464,6 +464,15 @@ cache_lookup_raw(struct vnode *dvp, stru
return ENOENT;
}
if (vtryget(vp)) {
+ membar_consumer();
+ if ((vp->v_iflag & VI_XLOCK) != 0) {
+ mutex_exit(&ncp->nc_lock);
+ mutex_exit(&cpup->cpu_lock);
+ vrele(vp);
+ COUNT(cpup->cpu_stats, ncs_falsehits);
+ *vpp = NULL;
+ return -1;
+ }
mutex_exit(&ncp->nc_lock);
mutex_exit(&cpup->cpu_lock);
} else {
From: Andrew Doran <ad@netbsd.org>
To: YAMAMOTO Takashi <yamt@mwd.biglobe.ne.jp>
Cc: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org,
gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/41374: getnewvnode deadlock
Date: Mon, 11 May 2009 21:06:03 +0000
On Sun, May 10, 2009 at 09:44:41AM +0000, YAMAMOTO Takashi wrote:
> > How about making XLOCK
> > a flag bit in v_useconut?
>
> it sounds like a good idea.
On second thought what you propose looks fine to me. Putting a flag bit into
v_usecount would have the advantage that no memory barrier would be required
in vtryget().
> @@ -1783,8 +1794,8 @@ vclean(vnode_t *vp, int flags)
> KASSERT(vp->v_usecount != 0);
>
> /* If cleaning is already in progress wait until done and return. */
> - if (vp->v_iflag & VI_XLOCK) {
> - vwait(vp, VI_XLOCK);
> + if (vp->v_iflag & VI_CLEANING) {
> + vwait(vp, VI_CLEANING);
> return;
> }
Can't this allow a concurrent vclean(), e.g. via revoke or forced unmount?
> @@ -464,6 +464,15 @@ cache_lookup_raw(struct vnode *dvp, stru
> return ENOENT;
> }
> if (vtryget(vp)) {
> + membar_consumer();
> + if ((vp->v_iflag & VI_XLOCK) != 0) {
> + mutex_exit(&ncp->nc_lock);
> + mutex_exit(&cpup->cpu_lock);
> + vrele(vp);
> + COUNT(cpup->cpu_stats, ncs_falsehits);
> + *vpp = NULL;
> + return -1;
> + }
> mutex_exit(&ncp->nc_lock);
> mutex_exit(&cpup->cpu_lock);
I spent a while today trying to remember why vget() does not do vtryget().
Of course it's due to VI_FREEING. I wonder if we could change the order of
events in ufs_inactive() to this, allowing us to remove VI_FREEING (I don't
know what ZFS will require):
UFS_UPDATE(vp, ...);
ufs_ihashrem(ip);
UFS_VFREE(vp, ...);
I'm not yet sure if fsck can handle it the other way around.
From: yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi)
To: ad@netbsd.org
Cc: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org
Subject: Re: kern/41374: getnewvnode deadlock
Date: Tue, 12 May 2009 11:04:21 +0000 (UTC)
hi,
> On Sun, May 10, 2009 at 09:44:41AM +0000, YAMAMOTO Takashi wrote:
>
>> > How about making XLOCK
>> > a flag bit in v_useconut?
>>
>> it sounds like a good idea.
>
> On second thought what you propose looks fine to me. Putting a flag bit into
> v_usecount would have the advantage that no memory barrier would be required
> in vtryget().
>
>> @@ -1783,8 +1794,8 @@ vclean(vnode_t *vp, int flags)
>> KASSERT(vp->v_usecount != 0);
>>
>> /* If cleaning is already in progress wait until done and return. */
>> - if (vp->v_iflag & VI_XLOCK) {
>> - vwait(vp, VI_XLOCK);
>> + if (vp->v_iflag & VI_CLEANING) {
>> + vwait(vp, VI_CLEANING);
>> return;
>> }
>
> Can't this allow a concurrent vclean(), e.g. via revoke or forced unmount?
VI_CLEANING is to check for concurrent vclean.
anyway i prefer your suggestion of putting a bit into v_usecount and
implemented it. see below.
>> @@ -464,6 +464,15 @@ cache_lookup_raw(struct vnode *dvp, stru
>> return ENOENT;
>> }
>> if (vtryget(vp)) {
>> + membar_consumer();
>> + if ((vp->v_iflag & VI_XLOCK) != 0) {
>> + mutex_exit(&ncp->nc_lock);
>> + mutex_exit(&cpup->cpu_lock);
>> + vrele(vp);
>> + COUNT(cpup->cpu_stats, ncs_falsehits);
>> + *vpp = NULL;
>> + return -1;
>> + }
>> mutex_exit(&ncp->nc_lock);
>> mutex_exit(&cpup->cpu_lock);
>
> I spent a while today trying to remember why vget() does not do vtryget().
> Of course it's due to VI_FREEING. I wonder if we could change the order of
> events in ufs_inactive() to this, allowing us to remove VI_FREEING (I don't
> know what ZFS will require):
>
> UFS_UPDATE(vp, ...);
> ufs_ihashrem(ip);
> UFS_VFREE(vp, ...);
>
> I'm not yet sure if fsck can handle it the other way around.
wapbl might need some tweaks, too, i guess.
YAMAMOTO Takashi
Index: sys/vnode.h
===================================================================
RCS file: /cvsroot/src/sys/sys/vnode.h,v
retrieving revision 1.191.4.2
diff -u -p -r1.191.4.2 vnode.h
--- sys/vnode.h 4 May 2009 08:14:36 -0000 1.191.4.2
+++ sys/vnode.h 12 May 2009 11:02:37 -0000
@@ -250,6 +250,12 @@ typedef struct vnode vnode_t;
#define VSIZENOTSET ((voff_t)-1)
/*
+ * v_usecount
+ */
+#define VC_XLOCK 0x80000000
+#define VC_MASK 0x7fffffff
+
+/*
* Vnode attributes. A field value of VNOVAL represents a field whose value
* is unavailable (getattr) or which is not to be changed (setattr).
*/
Index: kern/vfs_subr.c
===================================================================
RCS file: /cvsroot/src/sys/kern/vfs_subr.c,v
retrieving revision 1.336.4.2
diff -u -p -r1.336.4.2 vfs_subr.c
--- kern/vfs_subr.c 4 May 2009 08:13:49 -0000 1.336.4.2
+++ kern/vfs_subr.c 12 May 2009 11:02:37 -0000
@@ -361,8 +361,10 @@ try_nextlist:
* before doing this. If the vnode gains another reference while
* being cleaned out then we lose - retry.
*/
- atomic_inc_uint(&vp->v_usecount);
+ atomic_add_int(&vp->v_usecount, 1 + VC_XLOCK);
vclean(vp, DOCLOSE);
+ KASSERT(vp->v_usecount >= 1 + VC_XLOCK);
+ atomic_add_int(&vp->v_usecount, -VC_XLOCK);
if (vp->v_usecount == 1) {
/* We're about to dirty it. */
vp->v_iflag &= ~VI_CLEAN;
@@ -1229,7 +1231,7 @@ vtryget(vnode_t *vp)
return false;
}
for (use = vp->v_usecount;; use = next) {
- if (use == 0) {
+ if (use == 0 || __predict_false((use & VC_XLOCK) != 0)) {
/* Need interlock held if first reference. */
return false;
}
@@ -1318,9 +1320,10 @@ vtryrele(vnode_t *vp)
u_int use, next;
for (use = vp->v_usecount;; use = next) {
- if (use == 1) {
+ if (use == 1) {
return false;
}
+ KASSERT((use & VC_MASK) > 1);
next = atomic_cas_uint(&vp->v_usecount, use, use - 1);
if (__predict_true(next == use)) {
return true;
From: YAMAMOTO Takashi <yamt@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/41374 CVS commit: src/sys
Date: Sat, 16 May 2009 08:29:54 +0000
Module Name: src
Committed By: yamt
Date: Sat May 16 08:29:54 UTC 2009
Modified Files:
src/sys/kern: vfs_subr.c
src/sys/sys: vnode.h
Log Message:
put a flag bit into v_usecount to prevent vtryget during getcleanvnode.
this fixes the following deadlock.
a thread doing getcleanvnode:
pick a vnode
acqure v_interlock
v_usecount++
call vclean
now, another thread doing cache_lookup:
picks the vnode
vtryget succeed
vn_lock succeed
now in vclean:
set VI_XLOCK (too late to be noticed by the competing thread)
wait on the vnode lock (this might violate locking order)
the use of a flag bit was suggested by Andrew Doran. PR/41374.
To generate a diff of this commit:
cvs rdiff -u -r1.378 -r1.379 src/sys/kern/vfs_subr.c
cvs rdiff -u -r1.206 -r1.207 src/sys/sys/vnode.h
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: Soren Jacobsen <snj@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/41374 CVS commit: [netbsd-5-0] src/sys
Date: Tue, 21 Jul 2009 00:29:33 +0000
Module Name: src
Committed By: snj
Date: Tue Jul 21 00:29:33 UTC 2009
Modified Files:
src/sys/kern [netbsd-5-0]: vfs_subr.c
src/sys/sys [netbsd-5-0]: vnode.h
Log Message:
Pull up following revision(s) (requested by rmind in ticket #863):
sys/sys/vnode.h: revision 1.207
sys/kern/vfs_subr.c: revision 1.379
put a flag bit into v_usecount to prevent vtryget during getcleanvnode.
this fixes the following deadlock.
a thread doing getcleanvnode:
pick a vnode
acqure v_interlock
v_usecount++
call vclean
now, another thread doing cache_lookup:
picks the vnode
vtryget succeed
vn_lock succeed
now in vclean:
set VI_XLOCK (too late to be noticed by the competing thread)
wait on the vnode lock (this might violate locking order)
the use of a flag bit was suggested by Andrew Doran. PR/41374.
To generate a diff of this commit:
cvs rdiff -u -r1.357.4.4 -r1.357.4.4.2.1 src/sys/kern/vfs_subr.c
cvs rdiff -u -r1.197 -r1.197.10.1 src/sys/sys/vnode.h
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: Soren Jacobsen <snj@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/41374 CVS commit: [netbsd-5] src/sys
Date: Tue, 21 Jul 2009 00:31:58 +0000
Module Name: src
Committed By: snj
Date: Tue Jul 21 00:31:58 UTC 2009
Modified Files:
src/sys/kern [netbsd-5]: vfs_subr.c
src/sys/sys [netbsd-5]: vnode.h
Log Message:
Pull up following revision(s) (requested by rmind in ticket #863):
sys/sys/vnode.h: revision 1.207
sys/kern/vfs_subr.c: revision 1.379
put a flag bit into v_usecount to prevent vtryget during getcleanvnode.
this fixes the following deadlock.
a thread doing getcleanvnode:
pick a vnode
acqure v_interlock
v_usecount++
call vclean
now, another thread doing cache_lookup:
picks the vnode
vtryget succeed
vn_lock succeed
now in vclean:
set VI_XLOCK (too late to be noticed by the competing thread)
wait on the vnode lock (this might violate locking order)
the use of a flag bit was suggested by Andrew Doran. PR/41374.
To generate a diff of this commit:
cvs rdiff -u -r1.357.4.4 -r1.357.4.5 src/sys/kern/vfs_subr.c
cvs rdiff -u -r1.197 -r1.197.4.1 src/sys/sys/vnode.h
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: open->closed
State-Changed-By: rmind@NetBSD.org
State-Changed-When: Tue, 21 Jul 2009 10:20:09 +0000
State-Changed-Why:
It was fixed, <ad> confirms.
>Unformatted:
(Contact us)
$NetBSD: query-full-pr,v 1.39 2013/11/01 18:47:49 spz Exp $
$NetBSD: gnats_config.sh,v 1.8 2006/05/07 09:23:38 tsutsui Exp $
Copyright © 1994-2007
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.