NetBSD Problem Report #10731

Received: (qmail 1529 invoked from network); 1 Aug 2000 10:52:06 -0000
Message-Id: <200008011046.e71Akud00464@duplo.sm.sony.co.jp>
Date: Tue, 1 Aug 2000 06:46:56 -0400 (EDT)
From: Atsushi Onoe <onoe@sm.sony.co.jp>
Reply-To: onoe@sm.sony.co.jp
To: gnats-bugs@gnats.netbsd.org
Subject: writing to ffs on vnd can hang all processes
X-Send-Pr-Version: 3.95

>Number:         10731
>Category:       kern
>Synopsis:       writing to ffs on vnd can hang all processes
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    kern-bug-people
>State:          analyzed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue Aug 01 10:53:01 +0000 2000
>Closed-Date:    
>Last-Modified:  Mon May 30 05:30:01 +0000 2016
>Originator:     Atsushi Onoe
>Release:        NetBSD-current 20000728
>Organization:
	Sony Corporation
>Environment:
System: NetBSD duplo.sm.sony.co.jp 1.5C NetBSD 1.5C (DUPLO) #21: Mon Jul 31 03:38:37 EDT 2000 onoe@duplo.sm.sony.co.jp:/work/netbsd/obj/DUPLO i386

>Description:
	Writing many files to an ffs on a vnd eventually requires writing
	blocks to the file that backs the vnd.  When that happens, getblk()
	in kern/vfs_bio.c is called.  It sets B_BUSY on some blocks of the
	backing file and then calls allocbuf().

	allocbuf() may then find a bp with the DELWRI bit set that is also
	a block of the ffs on the vnd.  Flushing it requires writing to the
	same backing file.  Since the needed block is already marked BUSY,
	the write deadlocks in getblk(), waiting for the BUSY bit to be
	cleared.

	In this state pagedaemon() is blocked, and eventually all
	processes can hang.

	Below is a backtrace, transcribed by hand.
		getblk()
		ufs_bmaparray()
		ufs_bmap()
		vndstrategy()
		spec_strategy()
		ufs_strategy()
		bwrite()
		vn_bwrite()
		bawrite()
		getnewbuf()
		allocbuf()
		getblk()
		ufs_bmaparray()
		ufs_bmap()
		vndstrategy()
		spec_strategy()
		ufs_strategy()
		bwrite()
		vn_bwrite()
		bawrite()
		getnewbuf()
		getblk()
		ffs_balloc()
		ffs_write()
		vn_write()
		dofilewrite()
		sys_write()
		syscall()
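
The recursion in the trace above can be reproduced with a toy model. The Python sketch below is not kernel code. The class ToyCache, its methods, the demo() helper, and the block numbers are hypothetical; they only loosely mirror the kernel function names. It assumes a single thread, an LRU free list, and that every write to the ffs on vnd needs one fixed block of the backing file. Where the kernel would sleep forever on a busy buffer, the model raises an exception instead.

```python
class Deadlock(Exception):
    """Raised where the real kernel would sleep forever on a busy buffer."""


class Buf:
    def __init__(self, fs, blkno, dirty=False):
        self.fs = fs          # "host" (backing file system) or "vnd"
        self.blkno = blkno
        self.dirty = dirty    # stands in for the DELWRI bit
        self.busy = False     # stands in for B_BUSY


class ToyCache:
    def __init__(self, backing_block):
        # Hypothetical: the one host block needed to map any vnd write.
        self.backing_block = backing_block
        self.bufs = {}        # (fs, blkno) -> Buf
        self.free = []        # LRU list of buffers that may be recycled

    def add_dirty(self, fs, blkno):
        bp = Buf(fs, blkno, dirty=True)
        self.bufs[(fs, blkno)] = bp
        self.free.append(bp)

    def getblk(self, fs, blkno):
        bp = self.bufs.get((fs, blkno))
        if bp is not None:
            if bp.busy:
                raise Deadlock(f"waiting for busy block {fs}:{blkno}")
            bp.busy = True
            if bp in self.free:
                self.free.remove(bp)
            return bp
        bp = Buf(fs, blkno)
        bp.busy = True                    # marked busy before memory is found
        self.bufs[(fs, blkno)] = bp
        self.allocbuf()
        return bp

    def allocbuf(self):
        if self.free:                     # no spare memory: recycle a buffer
            self.getnewbuf()

    def getnewbuf(self):
        victim = self.free.pop(0)
        if victim.dirty:
            self.bwrite(victim)           # must flush before reuse
        del self.bufs[(victim.fs, victim.blkno)]

    def bwrite(self, bp):
        if bp.fs == "vnd":
            # vndstrategy -> ufs_bmap -> getblk on the backing file
            host = self.getblk("host", self.backing_block)
            self.brelse(host)
        bp.dirty = False

    def brelse(self, bp):
        bp.busy = False
        self.free.append(bp)


def demo(n_dirty):
    """Write one new vnd block with n_dirty dirty vnd buffers queued."""
    cache = ToyCache(backing_block=7)
    for i in range(n_dirty):
        cache.add_dirty("vnd", i)
    try:
        cache.getblk("vnd", 100)
    except Deadlock:
        return "deadlock"
    return "ok"
```

In this model demo(1) completes, because the single dirty buffer is flushed while nothing else needs recycling. demo(2) hits the deadlock: flushing the first dirty buffer marks the backing block busy, and recycling the second dirty buffer then needs the same busy block.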

>How-To-Repeat:
	Not all of these steps may be required, but this is what I did:

	/work is an ffs mounted with soft updates.

	# dd if=/dev/zero of=/work/fs bs=1024k count=1536	(yes, 1.5GB)
	# vnconfig vnd0 /work/fs
	# newfs /dev/rvnd0a
	# mount -o async /dev/vnd0a /mnt
				-o async is not required to reproduce the
				problem, but it shortens the time needed.
	# rsync -apv /cvsroot /mnt

>Fix:
	Not provided.

	# mount -o sync /dev/vnd0a /mnt
	may avoid the problem.
>Release-Note:
>Audit-Trail:

From: itojun@iijlab.net
To: onoe@sm.sony.co.jp
Cc: millert@openbsd.org
Subject: Re: kern/10731: writing to ffs on vnd can hang all processes
Date: Wed, 02 Aug 2000 00:48:00 +0900

 >>Synopsis:       writing to ffs on vnd can hang all processes
 >>Description:
 >	writing many files to ffs on vnd eventually need to write block
 >	to the files where vnd resides on.  In that circumstance, getblk()
 >	in kern/vfs_bio.c is called.  The function set B_BUSY of some
 >	blocks within the file where vnd resides on, and then it calls
 >	allocbuf().

 	millert@openbsd.org and a couple of other folks analyzed this
 	problem earlier this year.  What I heard from him was that if we
 	need to write to the same block both via vnd and via the native
 	file system, that is just like writing to the raw device and the
 	block device at the same time.  I'm not sure whether the claim is
 	entirely valid, but it made sense to me.  We should fix it, but
 	I'm not sure how (or should we require no writes to the lower
 	layer?)

 itojun

From: Atsushi Onoe <onoe@sm.sony.co.jp>
To: itojun@iijlab.net
Cc: millert@openbsd.org, gnats-bugs@gnats.netbsd.org, tech-kern@netbsd.org
Subject: Re: kern/10731: writing to ffs on vnd can hang all processes
Date: Tue, 1 Aug 2000 13:52:34 -0400 (EDT)

 > >>Synopsis:       writing to ffs on vnd can hang all processes
 > >>Description:
 > >	writing many files to ffs on vnd eventually need to write block
 > >	to the files where vnd resides on.  In that circumstance, getblk()
 > >	in kern/vfs_bio.c is called.  The function set B_BUSY of some
 > >	blocks within the file where vnd resides on, and then it calls
 > >	allocbuf().
 > 
 > 	millert@openbsd.org and couple of folks did some analysis on this
 > 	problem earlier this year.  what I have heard from him was that, if we
 > 	need a write to the same block via vnd, and via native filesystem, that
 > 	is just like writing to raw device and block device at the same time.
 > 	not sure if the claim is totally valid or not, but it made sense to
 > 	me.  we should fix it, but i'm not sure how (or should we require
 > 	no writes to lower layer?)

 Hmm, that might not be the case here.  I don't need to write to the native
 file system at that point; the kernel needs to write only to free up a buffer.

 Though none of these looks right, possible solutions are:

 	- make sure no dirty blocks of the ffs on vnd remain before any
 	  attempt to write blocks of the file backing the vnd.
 	  (e.g. with the sync mount option)

 	- gather all dirty blocks of the ffs on vnd at flush time.

 	- try to keep a couple of free blocks available at all times
 	  instead of writing and freeing on demand.
 	  (deadlock is still possible)
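
The third option, and the reason it only postpones the problem, can be illustrated with a small counting model. This Python sketch is purely hypothetical: ReservePool, write_block(), and flush() are invented names, and the model assumes a fixed number of buffers and a background flusher that runs only between bursts of writes. It counts how often a writer finds no free buffer and has to fall back to the write-and-free-on-demand path, which is the path that can deadlock.

```python
class ReservePool:
    def __init__(self, total, reserve):
        self.reserve = reserve   # free buffers the flusher tries to keep
        self.clean = total       # buffers free for immediate reuse
        self.dirty = 0           # buffers holding delayed writes
        self.on_demand = 0       # times a writer had to flush by itself

    def write_block(self):
        """A writer takes one buffer and leaves it dirty."""
        if self.clean == 0:
            # Reserve exhausted: fall back to write-and-free on demand.
            self.on_demand += 1
            self.dirty -= 1
            self.clean += 1
        self.clean -= 1
        self.dirty += 1

    def flush(self):
        """Background flusher: clean dirty buffers up to the reserve."""
        while self.clean < self.reserve and self.dirty > 0:
            self.dirty -= 1
            self.clean += 1
```

With ReservePool(total=4, reserve=2), a burst of two writes after a flush never touches the on-demand path, but a burst of three does. The reserve helps only while bursts stay smaller than the reserve.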

 Atsushi Onoe


From: Chuck Silvers <chuq@chuq.com>
To: Atsushi Onoe <onoe@sm.sony.co.jp>
Cc: itojun@iijlab.net, gnats-bugs@gnats.netbsd.org
Subject: Re: kern/10731: writing to ffs on vnd can hang all processes
Date: Sun, 18 Feb 2001 21:40:17 -0800

 see also PR 12189.

 -Chuck
State-Changed-From-To: open->closed
State-Changed-By: christos@netbsd.org
State-Changed-When: Sun, 27 Aug 2006 14:46:15 -0400
State-Changed-Why:
fixed, thanks


From: Christos Zoulas <christos@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: PR/10731 CVS commit: src/sys/dev
Date: Sun, 27 Aug 2006 18:45:20 +0000 (UTC)

 Module Name:	src
 Committed By:	christos
 Date:		Sun Aug 27 18:45:20 UTC 2006

 Modified Files:
 	src/sys/dev: vnd.c

 Log Message:
 PR/34293: Michael van Elst: vnd deadlocks on I/O buffers
 Also fixes: PR/10731, PR/12189, PR/20296
 Sleep while there is a buffer shortage.


 To generate a diff of this commit:
 cvs rdiff -r1.148 -r1.149 src/sys/dev/vnd.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: closed->open
State-Changed-By: yamt@netbsd.org
State-Changed-When: Mon, 28 Aug 2006 05:09:58 +0000
State-Changed-Why:
not really fixed.  ok'ed by christos.


From: Manuel Bouyer <bouyer@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: PR/10731 CVS commit: src/sys/dev
Date: Sun,  3 Sep 2006 19:49:34 +0000 (UTC)

 Module Name:	src
 Committed By:	bouyer
 Date:		Sun Sep  3 19:49:34 UTC 2006

 Modified Files:
 	src/sys/dev: vnd.c

 Log Message:
 Back out rev 1.149.
 From various discussions about vndstrategy (see
 http://mail-index.netbsd.org/tech-kern/2005/03/29/0034.html
 http://mail-index.netbsd.org/tech-kern/2005/03/23/0015.html)
 it's not correct to tsleep() in a strategy routine, which may be called from
 interrupt context.
 Unfortunately this reopens PR/10731, PR/12189, PR/20296, PR/34293

 As for what the correct fix is, this needs to be analysed more deeply. I suspect
 throttling the caller in vnd only hides the problem; the same caller writing
 to some other device could exhaust all buffers as well. If this driver doesn't
 need to allocate buffers this won't cause a deadlock, but it's bad for
 performance on systems with e.g. multiple drives. Also, other stacked
 block device drivers may have this issue too.


 To generate a diff of this commit:
 cvs rdiff -r1.149 -r1.150 src/sys/dev/vnd.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: open->analyzed
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Mon, 30 May 2016 05:24:21 +0000
State-Changed-Why:
the problem is understood. and also not trivial...


From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/10731 (writing to ffs on vnd can hang all processes)
Date: Mon, 30 May 2016 05:27:37 +0000

 On Mon, May 30, 2016 at 05:24:21AM +0000, dholland@NetBSD.org wrote:
  > Synopsis: writing to ffs on vnd can hang all processes
  >
  > the problem is understood. and also not trivial...

 Although it occurs to me that the problem is probably different
 (though not likely any better) in the wonderful world of
 genfs_putpages.

 -- 
 David A. Holland
 dholland@netbsd.org

>Unformatted:

Copyright © 1994-2014 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.