NetBSD Problem Report #41374

From yamt@mwd.biglobe.ne.jp  Thu May  7 04:41:38 2009
Return-Path: <yamt@mwd.biglobe.ne.jp>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
	by www.NetBSD.org (Postfix) with ESMTP id C78E063B8DF
	for <gnats-bugs@gnats.NetBSD.org>; Thu,  7 May 2009 04:41:38 +0000 (UTC)
Message-Id: <20090507032327.CEBF997EFC@kaeru.lan>
Date: Thu,  7 May 2009 12:23:27 +0900 (JST)
From: yamt@mwd.biglobe.ne.jp
Reply-To: yamt@mwd.biglobe.ne.jp
To: gnats-bugs@gnats.NetBSD.org
Subject: getnewvnode deadlock 
X-Send-Pr-Version: 3.95

>Number:         41374
>Category:       kern
>Synopsis:       getnewvnode deadlock
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu May 07 04:45:00 +0000 2009
>Closed-Date:    Tue Jul 21 10:20:09 +0000 2009
>Last-Modified:  Tue Jul 21 10:20:09 +0000 2009
>Originator:     YAMAMOTO Takashi <yamt@mwd.biglobe.ne.jp>
>Release:        NetBSD 5.99.11
>Organization:

>Environment:


System: NetBSD kaeru 5.99.11
Architecture: i386
Machine: i386
>Description:

	a thread doing getcleanvnode:
	pick a vnode
	acqure v_interlock
	v_usecount++
	call vclean

		now, another thread doing cache_lookup:
		picks the vnode
		vtryget succeed
		vn_lock succeed

	now in vclean:
	set VI_XLOCK (too late to be noticed by the competing thread)
	wait on the vnode lock (this might violate locking order)

>How-To-Repeat:
	code inspection.

>Fix:
	in the case of getcleanvnode,
	set VI_XLOCK before incrementing v_usecount?

>Release-Note:

>Audit-Trail:
From: Andrew Doran <ad@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
	netbsd-bugs@netbsd.org
Subject: Re: kern/41374: getnewvnode deadlock
Date: Sun, 10 May 2009 08:31:19 +0000

 On Thu, May 07, 2009 at 04:45:01AM +0000, yamt@mwd.biglobe.ne.jp wrote:

 > 	in the case of getcleanvnode,
 > 	set VI_XLOCK before incrementing v_usecount?

 I think insufficient because vtryget() is lockless. How about making XLOCK
 a flag bit in v_useconut?

From: yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi)
To: ad@netbsd.org
Cc: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
	netbsd-bugs@netbsd.org
Subject: Re: kern/41374: getnewvnode deadlock
Date: Sun, 10 May 2009 09:44:41 +0000 (UTC)

 hi,

 > On Thu, May 07, 2009 at 04:45:01AM +0000, yamt@mwd.biglobe.ne.jp wrote:
 > 
 >> 	in the case of getcleanvnode,
 >> 	set VI_XLOCK before incrementing v_usecount?
 > 
 > I think insufficient because vtryget() is lockless.

 i was thinking something like the following.  (untested)

 > How about making XLOCK
 > a flag bit in v_useconut?

 it sounds like a good idea.

 YAMAMOTO Takashi

 Index: sys/vnode.h
 ===================================================================
 RCS file: /cvsroot/src/sys/sys/vnode.h,v
 retrieving revision 1.191.4.2
 diff -u -p -r1.191.4.2 vnode.h
 --- sys/vnode.h	4 May 2009 08:14:36 -0000	1.191.4.2
 +++ sys/vnode.h	10 May 2009 09:42:33 -0000
 @@ -227,6 +227,7 @@ typedef struct vnode vnode_t;
  #define	VI_WRMAP	0x00000400	/* might have PROT_WRITE u. mappings */
  #define	VI_WRMAPDIRTY	0x00000800	/* might have dirty pages */
  #define	VI_XLOCK	0x00001000	/* vnode is locked to change type */
 +#define	VI_CLEANING	0x00002000	/* being cleaned by vclean */
  #define	VI_ONWORKLST	0x00004000	/* On syncer work-list */
  #define	VI_MARKER	0x00008000	/* Dummy marker vnode */
  #define	VI_LAYER	0x00020000	/* vnode is on a layer filesystem */
 @@ -243,7 +244,7 @@ typedef struct vnode vnode_t;

  #define	VNODE_FLAGBITS \
      "\20\1ROOT\2SYSTEM\3ISTTY\4MAPPED\5MPSAFE\6LOCKSWORK\11TEXT\12EXECMAP" \
 -    "\13WRMAP\14WRMAPDIRTY\15XLOCK\17ONWORKLST\20MARKER" \
 +    "\13WRMAP\14WRMAPDIRTY\15XLOCK\16CLEANING\17ONWORKLST\20MARKER" \
      "\22LAYER\24CLEAN\25INACTPEND\26INACTREDO\27FREEING" \
      "\30INACTNOW\31DIROP" 

 Index: kern/vfs_subr.c
 ===================================================================
 RCS file: /cvsroot/src/sys/kern/vfs_subr.c,v
 retrieving revision 1.336.4.2
 diff -u -p -r1.336.4.2 vfs_subr.c
 --- kern/vfs_subr.c	4 May 2009 08:13:49 -0000	1.336.4.2
 +++ kern/vfs_subr.c	10 May 2009 09:42:34 -0000
 @@ -361,6 +361,17 @@ try_nextlist:
  	 * before doing this.  If the vnode gains another reference while
  	 * being cleaned out then we lose - retry.
  	 */
 +	KASSERT(mutex_owned(&vp->v_interlock));
 +	KASSERT((vp->v_iflag & VI_XLOCK) == 0);
 +	KASSERT((vp->v_iflag & VI_CLEAN) == 0);
 +	KASSERT(vp->v_usecount == 0);
 +	/*
 +	 * after we increment v_usecount below, another thread can
 +	 * do vtryget successfully.  set VI_XLOCK before that, so that
 +	 * the competing thread can notice our activities.
 +	 */
 +	vp->v_iflag |= VI_XLOCK;
 +	membar_producer();
  	atomic_inc_uint(&vp->v_usecount);
  	vclean(vp, DOCLOSE);
  	if (vp->v_usecount == 1) {
 @@ -1783,8 +1794,8 @@ vclean(vnode_t *vp, int flags)
  	KASSERT(vp->v_usecount != 0);

  	/* If cleaning is already in progress wait until done and return. */
 -	if (vp->v_iflag & VI_XLOCK) {
 -		vwait(vp, VI_XLOCK);
 +	if (vp->v_iflag & VI_CLEANING) {
 +		vwait(vp, VI_CLEANING);
  		return;
  	}

 @@ -1797,7 +1808,7 @@ vclean(vnode_t *vp, int flags)
  	 * Prevent the vnode from being recycled or brought into use
  	 * while we clean it out.
  	 */
 -	vp->v_iflag |= VI_XLOCK;
 +	vp->v_iflag |= VI_XLOCK | VI_CLEANING;
  	if (vp->v_iflag & VI_EXECMAP) {
  		atomic_add_int(&uvmexp.execpages, -vp->v_uobj.uo_npages);
  		atomic_add_int(&uvmexp.filepages, vp->v_uobj.uo_npages);
 @@ -1857,7 +1868,7 @@ vclean(vnode_t *vp, int flags)
  	vp->v_tag = VT_NON;
  	vp->v_vnlock = &vp->v_lock;
  	KNOTE(&vp->v_klist, NOTE_REVOKE);
 -	vp->v_iflag &= ~(VI_XLOCK | VI_FREEING);
 +	vp->v_iflag &= ~(VI_XLOCK | VI_FREEING | VI_CLEANING);
  	vp->v_vflag &= ~VV_LOCKSWORK;
  	if ((flags & DOCLOSE) != 0) {
  		vp->v_iflag |= VI_CLEAN;
 Index: kern/vfs_cache.c
 ===================================================================
 RCS file: /cvsroot/src/sys/kern/vfs_cache.c,v
 retrieving revision 1.74.4.2
 diff -u -p -r1.74.4.2 vfs_cache.c
 --- kern/vfs_cache.c	4 May 2009 08:13:49 -0000	1.74.4.2
 +++ kern/vfs_cache.c	10 May 2009 09:42:34 -0000
 @@ -464,6 +464,15 @@ cache_lookup_raw(struct vnode *dvp, stru
  		return ENOENT;
  	}
  	if (vtryget(vp)) {
 +		membar_consumer();
 +		if ((vp->v_iflag & VI_XLOCK) != 0) {
 +			mutex_exit(&ncp->nc_lock);
 +			mutex_exit(&cpup->cpu_lock);
 +			vrele(vp);
 +			COUNT(cpup->cpu_stats, ncs_falsehits);
 +			*vpp = NULL;
 +			return -1;
 +		}
  		mutex_exit(&ncp->nc_lock);
  		mutex_exit(&cpup->cpu_lock);
  	} else {

From: Andrew Doran <ad@netbsd.org>
To: YAMAMOTO Takashi <yamt@mwd.biglobe.ne.jp>
Cc: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org,
	gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/41374: getnewvnode deadlock
Date: Mon, 11 May 2009 21:06:03 +0000

 On Sun, May 10, 2009 at 09:44:41AM +0000, YAMAMOTO Takashi wrote:

 > > How about making XLOCK
 > > a flag bit in v_useconut?
 > 
 > it sounds like a good idea.

 On second thought what you propose looks fine to me. Putting a flag bit into
 v_usecount would have the advantage that no memory barrier would be required
 in vtryget().

 > @@ -1783,8 +1794,8 @@ vclean(vnode_t *vp, int flags)
 >  	KASSERT(vp->v_usecount != 0);
 >  
 >  	/* If cleaning is already in progress wait until done and return. */
 > -	if (vp->v_iflag & VI_XLOCK) {
 > -		vwait(vp, VI_XLOCK);
 > +	if (vp->v_iflag & VI_CLEANING) {
 > +		vwait(vp, VI_CLEANING);
 >  		return;
 >  	}

 Can't this allow a concurrent vclean(), e.g. via revoke or forced unmount?

 > @@ -464,6 +464,15 @@ cache_lookup_raw(struct vnode *dvp, stru
 >  		return ENOENT;
 >  	}
 >  	if (vtryget(vp)) {
 > +		membar_consumer();
 > +		if ((vp->v_iflag & VI_XLOCK) != 0) {
 > +			mutex_exit(&ncp->nc_lock);
 > +			mutex_exit(&cpup->cpu_lock);
 > +			vrele(vp);
 > +			COUNT(cpup->cpu_stats, ncs_falsehits);
 > +			*vpp = NULL;
 > +			return -1;
 > +		}
 >  		mutex_exit(&ncp->nc_lock);
 >  		mutex_exit(&cpup->cpu_lock);

 I spent a while today trying to remember why vget() does not do vtryget().
 Of course it's due to VI_FREEING. I wonder if we could change the order of
 events in ufs_inactive() to this, allowing us to remove VI_FREEING (I don't
 know what ZFS will require):

 	UFS_UPDATE(vp, ...);
 	ufs_ihashrem(ip);
 	UFS_VFREE(vp, ...);

 I'm not yet sure if fsck can handle it the other way around.

From: yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi)
To: ad@netbsd.org
Cc: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
	netbsd-bugs@netbsd.org
Subject: Re: kern/41374: getnewvnode deadlock
Date: Tue, 12 May 2009 11:04:21 +0000 (UTC)

 hi,

 > On Sun, May 10, 2009 at 09:44:41AM +0000, YAMAMOTO Takashi wrote:
 > 
 >> > How about making XLOCK
 >> > a flag bit in v_useconut?
 >> 
 >> it sounds like a good idea.
 > 
 > On second thought what you propose looks fine to me. Putting a flag bit into
 > v_usecount would have the advantage that no memory barrier would be required
 > in vtryget().
 > 
 >> @@ -1783,8 +1794,8 @@ vclean(vnode_t *vp, int flags)
 >>  	KASSERT(vp->v_usecount != 0);
 >>  
 >>  	/* If cleaning is already in progress wait until done and return. */
 >> -	if (vp->v_iflag & VI_XLOCK) {
 >> -		vwait(vp, VI_XLOCK);
 >> +	if (vp->v_iflag & VI_CLEANING) {
 >> +		vwait(vp, VI_CLEANING);
 >>  		return;
 >>  	}
 > 
 > Can't this allow a concurrent vclean(), e.g. via revoke or forced unmount?

 VI_CLEANING is to check for concurrent vclean.

 anyway i prefer your suggestion of putting a bit into v_usecount and
 implemented it.  see below.

 >> @@ -464,6 +464,15 @@ cache_lookup_raw(struct vnode *dvp, stru
 >>  		return ENOENT;
 >>  	}
 >>  	if (vtryget(vp)) {
 >> +		membar_consumer();
 >> +		if ((vp->v_iflag & VI_XLOCK) != 0) {
 >> +			mutex_exit(&ncp->nc_lock);
 >> +			mutex_exit(&cpup->cpu_lock);
 >> +			vrele(vp);
 >> +			COUNT(cpup->cpu_stats, ncs_falsehits);
 >> +			*vpp = NULL;
 >> +			return -1;
 >> +		}
 >>  		mutex_exit(&ncp->nc_lock);
 >>  		mutex_exit(&cpup->cpu_lock);
 > 
 > I spent a while today trying to remember why vget() does not do vtryget().
 > Of course it's due to VI_FREEING. I wonder if we could change the order of
 > events in ufs_inactive() to this, allowing us to remove VI_FREEING (I don't
 > know what ZFS will require):
 > 
 > 	UFS_UPDATE(vp, ...);
 > 	ufs_ihashrem(ip);
 > 	UFS_VFREE(vp, ...);
 > 
 > I'm not yet sure if fsck can handle it the other way around.

 wapbl might need some tweaks, too, i guess.

 YAMAMOTO Takashi

 Index: sys/vnode.h
 ===================================================================
 RCS file: /cvsroot/src/sys/sys/vnode.h,v
 retrieving revision 1.191.4.2
 diff -u -p -r1.191.4.2 vnode.h
 --- sys/vnode.h	4 May 2009 08:14:36 -0000	1.191.4.2
 +++ sys/vnode.h	12 May 2009 11:02:37 -0000
 @@ -250,6 +250,12 @@ typedef struct vnode vnode_t;
  #define	VSIZENOTSET	((voff_t)-1)

  /*
 + * v_usecount
 + */
 +#define	VC_XLOCK	0x80000000
 +#define	VC_MASK		0x7fffffff
 +
 +/*
   * Vnode attributes.  A field value of VNOVAL represents a field whose value
   * is unavailable (getattr) or which is not to be changed (setattr).
   */
 Index: kern/vfs_subr.c
 ===================================================================
 RCS file: /cvsroot/src/sys/kern/vfs_subr.c,v
 retrieving revision 1.336.4.2
 diff -u -p -r1.336.4.2 vfs_subr.c
 --- kern/vfs_subr.c	4 May 2009 08:13:49 -0000	1.336.4.2
 +++ kern/vfs_subr.c	12 May 2009 11:02:37 -0000
 @@ -361,8 +361,10 @@ try_nextlist:
  	 * before doing this.  If the vnode gains another reference while
  	 * being cleaned out then we lose - retry.
  	 */
 -	atomic_inc_uint(&vp->v_usecount);
 +	atomic_add_int(&vp->v_usecount, 1 + VC_XLOCK);
  	vclean(vp, DOCLOSE);
 +	KASSERT(vp->v_usecount >= 1 + VC_XLOCK);
 +	atomic_add_int(&vp->v_usecount, -VC_XLOCK);
  	if (vp->v_usecount == 1) {
  		/* We're about to dirty it. */
  		vp->v_iflag &= ~VI_CLEAN;
 @@ -1229,7 +1231,7 @@ vtryget(vnode_t *vp)
  		return false;
  	}
  	for (use = vp->v_usecount;; use = next) {
 -		if (use == 0) { 
 +		if (use == 0 || __predict_false((use & VC_XLOCK) != 0)) {
  			/* Need interlock held if first reference. */
  			return false;
  		}
 @@ -1318,9 +1320,10 @@ vtryrele(vnode_t *vp)
  	u_int use, next;

  	for (use = vp->v_usecount;; use = next) {
 -		if (use == 1) { 
 +		if (use == 1) {
  			return false;
  		}
 +		KASSERT((use & VC_MASK) > 1);
  		next = atomic_cas_uint(&vp->v_usecount, use, use - 1);
  		if (__predict_true(next == use)) {
  			return true;

From: YAMAMOTO Takashi <yamt@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/41374 CVS commit: src/sys
Date: Sat, 16 May 2009 08:29:54 +0000

 Module Name:	src
 Committed By:	yamt
 Date:		Sat May 16 08:29:54 UTC 2009

 Modified Files:
 	src/sys/kern: vfs_subr.c
 	src/sys/sys: vnode.h

 Log Message:
 put a flag bit into v_usecount to prevent vtryget during getcleanvnode.
 this fixes the following deadlock.

 	a thread doing getcleanvnode:
 	pick a vnode
 	acqure v_interlock
 	v_usecount++
 	call vclean

 		now, another thread doing cache_lookup:
 		picks the vnode
 		vtryget succeed
 		vn_lock succeed

 	now in vclean:
 	set VI_XLOCK (too late to be noticed by the competing thread)
 	wait on the vnode lock (this might violate locking order)

 the use of a flag bit was suggested by Andrew Doran.  PR/41374.


 To generate a diff of this commit:
 cvs rdiff -u -r1.378 -r1.379 src/sys/kern/vfs_subr.c
 cvs rdiff -u -r1.206 -r1.207 src/sys/sys/vnode.h

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: Soren Jacobsen <snj@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/41374 CVS commit: [netbsd-5-0] src/sys
Date: Tue, 21 Jul 2009 00:29:33 +0000

 Module Name:	src
 Committed By:	snj
 Date:		Tue Jul 21 00:29:33 UTC 2009

 Modified Files:
 	src/sys/kern [netbsd-5-0]: vfs_subr.c
 	src/sys/sys [netbsd-5-0]: vnode.h

 Log Message:
 Pull up following revision(s) (requested by rmind in ticket #863):
 	sys/sys/vnode.h: revision 1.207
 	sys/kern/vfs_subr.c: revision 1.379
 put a flag bit into v_usecount to prevent vtryget during getcleanvnode.
 this fixes the following deadlock.
         a thread doing getcleanvnode:
         pick a vnode
         acqure v_interlock
         v_usecount++
         call vclean
                 now, another thread doing cache_lookup:
                 picks the vnode
                 vtryget succeed
                 vn_lock succeed
         now in vclean:
         set VI_XLOCK (too late to be noticed by the competing thread)
         wait on the vnode lock (this might violate locking order)
 the use of a flag bit was suggested by Andrew Doran.  PR/41374.


 To generate a diff of this commit:
 cvs rdiff -u -r1.357.4.4 -r1.357.4.4.2.1 src/sys/kern/vfs_subr.c
 cvs rdiff -u -r1.197 -r1.197.10.1 src/sys/sys/vnode.h

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: Soren Jacobsen <snj@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/41374 CVS commit: [netbsd-5] src/sys
Date: Tue, 21 Jul 2009 00:31:58 +0000

 Module Name:	src
 Committed By:	snj
 Date:		Tue Jul 21 00:31:58 UTC 2009

 Modified Files:
 	src/sys/kern [netbsd-5]: vfs_subr.c
 	src/sys/sys [netbsd-5]: vnode.h

 Log Message:
 Pull up following revision(s) (requested by rmind in ticket #863):
 	sys/sys/vnode.h: revision 1.207
 	sys/kern/vfs_subr.c: revision 1.379
 put a flag bit into v_usecount to prevent vtryget during getcleanvnode.
 this fixes the following deadlock.
         a thread doing getcleanvnode:
         pick a vnode
         acqure v_interlock
         v_usecount++
         call vclean
                 now, another thread doing cache_lookup:
                 picks the vnode
                 vtryget succeed
                 vn_lock succeed
         now in vclean:
         set VI_XLOCK (too late to be noticed by the competing thread)
         wait on the vnode lock (this might violate locking order)
 the use of a flag bit was suggested by Andrew Doran.  PR/41374.


 To generate a diff of this commit:
 cvs rdiff -u -r1.357.4.4 -r1.357.4.5 src/sys/kern/vfs_subr.c
 cvs rdiff -u -r1.197 -r1.197.4.1 src/sys/sys/vnode.h

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: open->closed
State-Changed-By: rmind@NetBSD.org
State-Changed-When: Tue, 21 Jul 2009 10:20:09 +0000
State-Changed-Why:
It was fixed, <ad> confirms.


>Unformatted:

NetBSD Home
NetBSD PR Database Search

(Contact us) $NetBSD: query-full-pr,v 1.39 2013/11/01 18:47:49 spz Exp $
$NetBSD: gnats_config.sh,v 1.8 2006/05/07 09:23:38 tsutsui Exp $
Copyright © 1994-2007 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.