NetBSD Problem Report #40389
From www@NetBSD.org Tue Jan 13 17:24:58 2009
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
by narn.NetBSD.org (Postfix) with ESMTP id 7FF4363BF57
for <gnats-bugs@gnats.netbsd.org>; Tue, 13 Jan 2009 17:24:58 +0000 (UTC)
Message-Id: <20090113172458.200B263B884@narn.NetBSD.org>
Date: Tue, 13 Jan 2009 17:24:58 +0000 (UTC)
From: pooka@iki.fi
Reply-To: pooka@iki.fi
To: gnats-bugs@NetBSD.org
Subject: page busy vs. glock deadlock
X-Send-Pr-Version: www-1.0
>Number: 40389
>Category: kern
>Synopsis: page busy vs. glock deadlock
>Confidential: no
>Severity: serious
>Priority: low
>Responsible: kern-bug-people
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Tue Jan 13 17:25:00 +0000 2009
>Closed-Date: Wed Sep 01 17:10:12 +0000 2010
>Last-Modified: Tue Sep 07 19:35:04 +0000 2010
>Originator: Antti Kantee
>Release:
>Organization:
>Environment:
>Description:
Simon reported a problem where he's writing to a file (log of build -j4)
and repeatedly doing tail -5000. He says that this causes hangs sooner
or later.
Upon examination, it seems that writing to the log goes through
ufs_balloc_range(), which busies the new pages it wants to enlarge the
file to and then takes the genfs node lock to do actual block allocation.
Meanwhile, tail coming in through mmap + genfs_getpages and holding the
genfs lock tries to uvn_findpages.
==> deadlock between PG_BUSY and glock
This can happen because the file is already large enough to contain
partially valid data in the "new" page that ufs_balloc_range() is trying
to enlarge to i.e. old file size is not a multiple of page size.
from simon:
I can reproduce a case where nbmake has it's output redirected to a
file, and gets blocked in "tstile" with this backtrace (10 finger
cut'n'paste):
sleepq_block
turnstile_block
rw_vector_enter
genfs_node_wrlock
ufs_balloc_range
ffs_write
VOP_WRITE
vn_write
dofilewrite
sys_write
syscall
and tail is blocked in "uvn_fp2" with this backtrace:
sleepq_block
mtsleep
uvn_findpage
uvn_findpages
genfs_getpages
VOP_GETPAGES
uvn_get
uvm_fault_internal
trap
>How-To-Repeat:
build -j4 and constantly do tail -5000 in a loop
(and maybe be Simon)
>Fix:
>Release-Note:
>Audit-Trail:
From: Antti Kantee <pooka@cs.hut.fi>
To: gnats-bugs@netbsd.org
Cc: simonb@netbsd.org
Subject: Re: PR kern/40389
Date: Mon, 19 Jan 2009 15:56:43 +0200
Since this seems to be a larger problem, does the following quick
hack help?
notes: not tested with a real kernel, just rump (does not duplicate
all uvm features AND problem had to be simulated). plus, it's been a
while since I've been mucking around this code, so I don't remember all
the details. But maybe it helps ;)
Index: ufs_inode.c
===================================================================
RCS file: /cvsroot/src/sys/ufs/ufs/ufs_inode.c,v
retrieving revision 1.76
diff -p -u -r1.76 ufs_inode.c
--- ufs_inode.c 31 Jul 2008 05:38:06 -0000 1.76
+++ ufs_inode.c 19 Jan 2009 13:52:11 -0000
@@ -267,6 +267,15 @@ ufs_balloc_range(struct vnode *vp, off_t
pgssize = npages * sizeof(struct vm_page *);
pgs = kmem_zalloc(pgssize, KM_SLEEP);
+ /*
+ * adjust off to be block-aligned.
+ */
+
+ delta = off & (bsize - 1);
+ off -= delta;
+ len += delta;
+
+ retry:
mutex_enter(&uobj->vmobjlock);
error = VOP_GETPAGES(vp, pagestart, pgs, &npages, 0,
VM_PROT_WRITE, 0,
@@ -286,18 +295,28 @@ ufs_balloc_range(struct vnode *vp, off_t
mutex_exit(&uobj->vmobjlock);
/*
- * adjust off to be block-aligned.
+ * now allocate the range.
*/
- delta = off & (bsize - 1);
- off -= delta;
- len += delta;
-
/*
- * now allocate the range.
+ * XXX: hack around deadlock with pagebusy and genfs node lock
*/
+ {
+ struct genfs_node *gp = VTOG(vp); /* XXX */
+
+ if (!rw_tryenter(&gp->g_glock, RW_WRITER)) {
+ mutex_enter(&uobj->vmobjlock);
+ for (i = 0; i < npages; i++)
+ pgs[i]->flags |= PG_RELEASED | PG_CLEAN;
+ mutex_enter(&uvm_pageqlock);
+ uvm_page_unbusy(pgs, npages);
+ mutex_exit(&uvm_pageqlock);
+ mutex_exit(&uobj->vmobjlock);
+ kpause("uballo", false, 1, NULL);
+ printf("ufs_balloc_range: potential deadlock, retrying\n");
+ goto retry;
+ }}
- genfs_node_wrlock(vp);
error = GOP_ALLOC(vp, off, len, flags, cred);
genfs_node_unlock(vp);
From: Antti Kantee <pooka@cs.hut.fi>
To: gnats-bugs@netbsd.org
Cc: simonb@netbsd.org
Subject: Re: PR kern/40389
Date: Mon, 19 Jan 2009 16:02:10 +0200
On Mon Jan 19 2009 at 15:56:43 +0200, Antti Kantee wrote:
> Since this seems to be a larger problem, does the following quick
> hack help?
Oh, and does the printf show up often?
State-Changed-From-To: open->analyzed
State-Changed-By: pooka@NetBSD.org
State-Changed-When: Mon, 19 Jan 2009 20:32:47 +0200
State-Changed-Why:
analyzed already when posted
From: Antti Kantee <pooka@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/40389 CVS commit: src/sys/ufs/ufs
Date: Wed, 4 Feb 2009 21:07:19 +0000 (UTC)
Module Name: src
Committed By: pooka
Date: Wed Feb 4 21:07:19 UTC 2009
Modified Files:
src/sys/ufs/ufs: ufs_inode.c
Log Message:
Break hold-and-wait which happens in ufs_balloc_range() when we
have pages busied and are trying to get the genfs node lock.
This causes a lock order reversal described in PR kern/40389.
This is not a proper fix and only a workaround for NetBSD 5.0.
problem first reported by simonb, patch tested by rmind
To generate a diff of this commit:
cvs rdiff -r1.76 -r1.77 src/sys/ufs/ufs/ufs_inode.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: Soren Jacobsen <snj@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/40389 CVS commit: [netbsd-5] src/sys/ufs/ufs
Date: Sun, 8 Feb 2009 19:08:23 +0000 (UTC)
Module Name: src
Committed By: snj
Date: Sun Feb 8 19:08:23 UTC 2009
Modified Files:
src/sys/ufs/ufs [netbsd-5]: ufs_inode.c
Log Message:
Pull up following revision(s) (requested by pooka in ticket #413):
sys/ufs/ufs/ufs_inode.c: revision 1.77
Break hold-and-wait which happens in ufs_balloc_range() when we
have pages busied and are trying to get the genfs node lock.
This causes a lock order reversal described in PR kern/40389.
This is not a proper fix and only a workaround for NetBSD 5.0.
problem first reported by simonb, patch tested by rmind
To generate a diff of this commit:
cvs rdiff -r1.76 -r1.76.4.1 src/sys/ufs/ufs/ufs_inode.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: Chuck Silvers <chs@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/40389 CVS commit: src/sys
Date: Wed, 1 Sep 2010 16:56:20 +0000
Module Name: src
Committed By: chs
Date: Wed Sep 1 16:56:20 UTC 2010
Modified Files:
src/sys/miscfs/genfs: genfs_io.c genfs_node.h genfs_vnops.c
src/sys/ufs/ufs: ufs_inode.c
src/sys/uvm: uvm_pager.h
Log Message:
replace the earlier workaround for PR 40389 with a better fix.
the earlier change caused data corruption by freeing pages
without invaliding their mappings. instead of the trylock/retry,
just take the genfs-node lock before calling VOP_GETPAGES()
and pass a new flag to tell it that we're already holding this lock.
To generate a diff of this commit:
cvs rdiff -u -r1.39 -r1.40 src/sys/miscfs/genfs/genfs_io.c
cvs rdiff -u -r1.19 -r1.20 src/sys/miscfs/genfs/genfs_node.h
cvs rdiff -u -r1.182 -r1.183 src/sys/miscfs/genfs/genfs_vnops.c
cvs rdiff -u -r1.82 -r1.83 src/sys/ufs/ufs/ufs_inode.c
cvs rdiff -u -r1.38 -r1.39 src/sys/uvm/uvm_pager.h
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: analyzed->closed
State-Changed-By: chs@NetBSD.org
State-Changed-When: Wed, 01 Sep 2010 17:10:12 +0000
State-Changed-Why:
fixed now.
From: Manuel Bouyer <bouyer@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/40389 CVS commit: [netbsd-5] src/sys
Date: Tue, 7 Sep 2010 19:33:35 +0000
Module Name: src
Committed By: bouyer
Date: Tue Sep 7 19:33:35 UTC 2010
Modified Files:
src/sys/miscfs/genfs [netbsd-5]: genfs_io.c genfs_node.h genfs_vnops.c
src/sys/ufs/ufs [netbsd-5]: ufs_inode.c
src/sys/uvm [netbsd-5]: uvm_pager.h
Log Message:
Pull up following revision(s) (requested by chs in ticket #1448):
sys/uvm/uvm_pager.h: revision 1.39 via patch
sys/miscfs/genfs/genfs_vnops.c: revision 1.183 via patch
sys/ufs/ufs/ufs_inode.c: revision 1.83 via patch
sys/miscfs/genfs/genfs_io.c: revision 1.40 via patch
sys/miscfs/genfs/genfs_node.h: revision 1.20 via patch
replace the earlier workaround for PR 40389 with a better fix.
the earlier change caused data corruption by freeing pages
without invaliding their mappings. instead of the trylock/retry,
just take the genfs-node lock before calling VOP_GETPAGES()
and pass a new flag to tell it that we're already holding this lock.
To generate a diff of this commit:
cvs rdiff -u -r1.13.4.2 -r1.13.4.3 src/sys/miscfs/genfs/genfs_io.c
cvs rdiff -u -r1.17 -r1.17.8.1 src/sys/miscfs/genfs/genfs_node.h
cvs rdiff -u -r1.167 -r1.167.10.1 src/sys/miscfs/genfs/genfs_vnops.c
cvs rdiff -u -r1.76.4.3 -r1.76.4.4 src/sys/ufs/ufs/ufs_inode.c
cvs rdiff -u -r1.38 -r1.38.4.1 src/sys/uvm/uvm_pager.h
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: Manuel Bouyer <bouyer@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/40389 CVS commit: [netbsd-5-0] src/sys
Date: Tue, 7 Sep 2010 19:33:44 +0000
Module Name: src
Committed By: bouyer
Date: Tue Sep 7 19:33:44 UTC 2010
Modified Files:
src/sys/miscfs/genfs [netbsd-5-0]: genfs_io.c genfs_node.h
genfs_vnops.c
src/sys/ufs/ufs [netbsd-5-0]: ufs_inode.c
src/sys/uvm [netbsd-5-0]: uvm_pager.h
Log Message:
Pull up following revision(s) (requested by chs in ticket #1448):
sys/uvm/uvm_pager.h: revision 1.39 via patch
sys/miscfs/genfs/genfs_vnops.c: revision 1.183 via patch
sys/ufs/ufs/ufs_inode.c: revision 1.83 via patch
sys/miscfs/genfs/genfs_io.c: revision 1.40 via patch
sys/miscfs/genfs/genfs_node.h: revision 1.20 via patch
replace the earlier workaround for PR 40389 with a better fix.
the earlier change caused data corruption by freeing pages
without invaliding their mappings. instead of the trylock/retry,
just take the genfs-node lock before calling VOP_GETPAGES()
and pass a new flag to tell it that we're already holding this lock.
To generate a diff of this commit:
cvs rdiff -u -r1.13.4.2 -r1.13.4.2.2.1 src/sys/miscfs/genfs/genfs_io.c
cvs rdiff -u -r1.17 -r1.17.14.1 src/sys/miscfs/genfs/genfs_node.h
cvs rdiff -u -r1.167 -r1.167.16.1 src/sys/miscfs/genfs/genfs_vnops.c
cvs rdiff -u -r1.76.4.1 -r1.76.4.1.2.1 src/sys/ufs/ufs/ufs_inode.c
cvs rdiff -u -r1.38 -r1.38.10.1 src/sys/uvm/uvm_pager.h
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
>Unformatted:
(Contact us)
$NetBSD: query-full-pr,v 1.39 2013/11/01 18:47:49 spz Exp $
$NetBSD: gnats_config.sh,v 1.8 2006/05/07 09:23:38 tsutsui Exp $
Copyright © 1994-2007
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.