NetBSD Problem Report #40525
From dholland@netbsd.org Sat Jan 31 08:29:35 2009
Return-Path: <dholland@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
by narn.NetBSD.org (Postfix) with ESMTP id CF36163B879
for <gnats-bugs@gnats.NetBSD.org>; Sat, 31 Jan 2009 08:29:35 +0000 (UTC)
Message-Id: <20090131082935.A8A1463B120@mail.netbsd.org>
Date: Sat, 31 Jan 2009 08:29:35 +0000 (UTC)
From: dholland@netbsd.org
Reply-To: dholland@netbsd.org
To: gnats-bugs@gnats.NetBSD.org
Subject: panic: ffs_valloc: dup alloc
X-Send-Pr-Version: 3.95
>Number: 40525
>Category: kern
>Synopsis: panic: ffs_valloc: dup alloc
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: dholland
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sat Jan 31 08:30:00 +0000 2009
>Closed-Date: Wed Mar 25 02:59:59 +0000 2009
>Last-Modified: Wed Mar 25 02:59:59 +0000 2009
>Originator: David A. Holland
>Release: NetBSD 5.99.7 (20090130) and probably 5.0
>Organization:
>Environment:
various
>Description:
ffs (non-wapbl, non-softdep) panic allocating an already allocated inode.
See thread on current-users starting here:
http://mail-index.netbsd.org/current-users/2009/01/10/msg007167.html
>How-To-Repeat:
>Fix:
Candidate analysis and fix:
http://mail-index.netbsd.org/current-users/2009/01/26/msg007595.html
I'd like a second opinion before I commit it, in case I've overlooked
something.
>Release-Note:
>Audit-Trail:
From: Juergen Hannken-Illjes <hannken@eis.cs.tu-bs.de>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/40525: panic: ffs_valloc: dup alloc
Date: Sat, 31 Jan 2009 12:38:18 +0100
On Sat, Jan 31, 2009 at 08:30:00AM +0000, dholland@netbsd.org wrote:
> ffs (non-wapbl, non-softdep) panic allocating an already allocated inode.
> See thread on current-users starting here:
>
> http://mail-index.netbsd.org/current-users/2009/01/10/msg007167.html
>
> I'd like a second opinion before I commit it, in case I've overlooked
> something.
Looks good to me -- as long as it only happens on UFS2 file systems.
--
Juergen Hannken-Illjes - hannken@eis.cs.tu-bs.de - TU Braunschweig (Germany)
From: David Holland <dholland-bugs@netbsd.org>
To: Juergen Hannken-Illjes <hannken@eis.cs.tu-bs.de>
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
gnats-bugs@netbsd.org, netbsd-bugs@netbsd.org, dholland@netbsd.org
Subject: Re: kern/40525: panic: ffs_valloc: dup alloc
Date: Sat, 7 Feb 2009 21:21:15 +0000
On Sat, Jan 31, 2009 at 11:40:04AM +0000, Juergen Hannken-Illjes wrote:
> From: Juergen Hannken-Illjes <hannken@eis.cs.tu-bs.de>
> To: gnats-bugs@netbsd.org
> Cc:
> Subject: Re: kern/40525: panic: ffs_valloc: dup alloc
> Date: Sat, 31 Jan 2009 12:38:18 +0100
>
> On Sat, Jan 31, 2009 at 08:30:00AM +0000, dholland@netbsd.org wrote:
> > ffs (non-wapbl, non-softdep) panic allocating an already allocated inode.
> > See thread on current-users starting here:
> >
> > http://mail-index.netbsd.org/current-users/2009/01/10/msg007167.html
> >
> > I'd like a second opinion before I commit it, in case I've overlooked
> > something.
>
> Looks good to me -- as long as it only happens on UFS2 file systems.
I don't see what's UFS2-specific about it?
But it sounds like you agree that my patch fixes *a* bug, even if it's
not the same bug reported on current-users. Right?
--
David A. Holland
dholland@netbsd.org
From: Juergen Hannken-Illjes <hannken@eis.cs.tu-bs.de>
To: David Holland <dholland-bugs@netbsd.org>
Cc: gnats-bugs@netbsd.org
Subject: Re: kern/40525: panic: ffs_valloc: dup alloc
Date: Sun, 8 Feb 2009 10:45:29 +0100
On Sat, Feb 07, 2009 at 09:21:15PM +0000, David Holland wrote:
>
> I don't see what's UFS2-specific about it?
UFS1 has preallocated inodes, on-demand allocation is a property of UFS2.
> But it sounds like you agree that my patch fixes *a* bug, even if it's
> not the same bug reported on current-users. Right?
Please get a second OK then.
--
Juergen Hannken-Illjes - hannken@eis.cs.tu-bs.de - TU Braunschweig (Germany)
Responsible-Changed-From-To: kern-bug-people->dholland
Responsible-Changed-By: dholland@NetBSD.org
Responsible-Changed-When: Sun, 22 Feb 2009 05:53:50 +0000
Responsible-Changed-Why:
I am looking after this.
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/40525: panic: ffs_valloc: dup alloc
Date: Sun, 22 Feb 2009 05:50:43 +0000
On Sun, Feb 08, 2009 at 10:45:29AM +0100, Juergen Hannken-Illjes wrote:
> > But it sounds like you agree that my patch fixes *a* bug, even if it's
> > not the same bug reported on current-users. Right?
>
> Please get a second OK then.
Discussion at some length with pooka and ad has produced the following
conclusions:
1. It's a real problem, and the proposed patch does fix it.
2. BC_INVAL has to be removed from the corresponding brelse call
below "fail:" too.
3. We think the buffer cache will DTRT if the call to getblk returns
a fresh buffer with no real data and we call brelse(bp, 0) on it
(that is, don't explicitly invalidate it). I was worried about
this, and still am a bit, and will look into it further later.
4. We're Concerned(TM) about other uses of BC_INVAL in FS code.
There are quite a few that look suspect.
ad said this afternoon he was going to commit the patch, but hasn't
yet; I will if he doesn't in the next day or two.
This is the full patch.
Index: ffs_alloc.c
===================================================================
RCS file: /cvsroot/src/sys/ufs/ffs/ffs_alloc.c,v
retrieving revision 1.120
diff -u -p -r1.120 ffs_alloc.c
--- ffs_alloc.c 11 Jan 2009 02:45:56 -0000 1.120
+++ ffs_alloc.c 22 Feb 2009 05:49:33 -0000
@@ -1284,7 +1284,7 @@ retry:
if (ibp != NULL &&
initediblk != ufs_rw32(cgp->cg_initediblk, needswap)) {
/* Another thread allocated more inodes so we retry the test. */
- brelse(ibp, BC_INVAL);
+ brelse(ibp, 0);
ibp = NULL;
}
/*
@@ -1396,7 +1396,7 @@ gotit:
if (bp != NULL)
brelse(bp, 0);
if (ibp != NULL)
- brelse(ibp, BC_INVAL);
+ brelse(ibp, 0);
mutex_enter(&ump->um_lock);
return (0);
}
--
David A. Holland
dholland@netbsd.org
State-Changed-From-To: open->feedback
State-Changed-By: ad@NetBSD.org
State-Changed-When: Sun, 22 Feb 2009 20:14:20 +0000
State-Changed-Why:
pr assigned to dh
let him close it if satisfied
From: Andrew Doran <ad@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/40525 CVS commit: src/sys
Date: Sun, 22 Feb 2009 20:10:25 +0000 (UTC)
Module Name: src
Committed By: ad
Date: Sun Feb 22 20:10:25 UTC 2009
Modified Files:
src/sys/kern: vfs_wapbl.c
src/sys/miscfs/syncfs: sync_subr.c sync_vnops.c
src/sys/ufs/ffs: ffs_alloc.c ffs_vfsops.c ffs_vnops.c
Log Message:
PR kern/39564 wapbl performance issues with disk cache flushing
PR kern/40361 WAPBL locking panic in -current
PR kern/40361 WAPBL locking panic in -current
PR kern/40470 WAPBL corrupts ext2fs
PR kern/40562 busy loop in ffs_sync when unmounting a file system
PR kern/40525 panic: ffs_valloc: dup alloc
- A fix for an issue that can lead to "ffs_valloc: dup" due to dirty cg
buffers being invalidated. Problem discovered and patch by dholland@.
- If the syncer fails to lazily sync a vnode due to lock contention,
retry 1 second later instead of 30 seconds later.
- Flush inode atime updates every ~10 seconds (this makes most sense with
logging). Presently they didn't hit the disk for read-only files or
devices until the file system was unmounted. It would be better to trickle
the updates out but that would require more extensive changes.
- Fix issues with file system corruption, busy looping and other nasty
problems when logging and non-logging file systems are intermixed,
with one being the root file system.
- For logging, do not flush metadata on an inode-at-a-time basis if the sync
has been requested by ioflush. Previously, we could try hundreds of log
sync operations a second due to inode update activity, causing the syncer
to fall behind and metadata updates to be serialized across the entire
file system. Instead, burst out metadata and log flushes at a minimum
interval of every 10 seconds on an active file system (happens more often
if the log becomes full). Note this does not change the operation of
fsync() etc.
- With the flush issue fixed, re-enable concurrent metadata updates in
vfs_wapbl.c.
To generate a diff of this commit:
cvs rdiff -r1.22 -r1.23 src/sys/kern/vfs_wapbl.c
cvs rdiff -r1.35 -r1.36 src/sys/miscfs/syncfs/sync_subr.c
cvs rdiff -r1.25 -r1.26 src/sys/miscfs/syncfs/sync_vnops.c
cvs rdiff -r1.120 -r1.121 src/sys/ufs/ffs/ffs_alloc.c
cvs rdiff -r1.241 -r1.242 src/sys/ufs/ffs/ffs_vfsops.c
cvs rdiff -r1.109 -r1.110 src/sys/ufs/ffs/ffs_vnops.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: Soren Jacobsen <snj@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/40525 CVS commit: [netbsd-5] src/sys
Date: Tue, 24 Feb 2009 04:13:35 +0000 (UTC)
Module Name: src
Committed By: snj
Date: Tue Feb 24 04:13:35 UTC 2009
Modified Files:
src/sys/kern [netbsd-5]: vfs_wapbl.c
src/sys/miscfs/syncfs [netbsd-5]: sync_subr.c sync_vnops.c
src/sys/ufs/ffs [netbsd-5]: ffs_alloc.c ffs_vfsops.c ffs_vnops.c
Log Message:
Pull up following revision(s) (requested by ad in ticket #490):
sys/kern/vfs_wapbl.c: revision 1.23
sys/miscfs/syncfs/sync_subr.c: revision 1.36
sys/miscfs/syncfs/sync_vnops.c: revision 1.26
sys/ufs/ffs/ffs_alloc.c: revision 1.121
sys/ufs/ffs/ffs_vfsops.c: revision 1.242
sys/ufs/ffs/ffs_vnops.c: revision 1.110
PR kern/39564 wapbl performance issues with disk cache flushing
PR kern/40361 WAPBL locking panic in -current
PR kern/40361 WAPBL locking panic in -current
PR kern/40470 WAPBL corrupts ext2fs
PR kern/40562 busy loop in ffs_sync when unmounting a file system
PR kern/40525 panic: ffs_valloc: dup alloc
- A fix for an issue that can lead to "ffs_valloc: dup" due to dirty cg
buffers being invalidated. Problem discovered and patch by dholland@.
- If the syncer fails to lazily sync a vnode due to lock contention,
retry 1 second later instead of 30 seconds later.
- Flush inode atime updates every ~10 seconds (this makes most sense with
logging). Presently they didn't hit the disk for read-only files or
devices until the file system was unmounted. It would be better to trickle
the updates out but that would require more extensive changes.
- Fix issues with file system corruption, busy looping and other nasty
problems when logging and non-logging file systems are intermixed,
with one being the root file system.
- For logging, do not flush metadata on an inode-at-a-time basis if the sync
has been requested by ioflush. Previously, we could try hundreds of log
sync operations a second due to inode update activity, causing the syncer
to fall behind and metadata updates to be serialized across the entire
file system. Instead, burst out metadata and log flushes at a minimum
interval of every 10 seconds on an active file system (happens more often
if the log becomes full). Note this does not change the operation of
fsync() etc.
- With the flush issue fixed, re-enable concurrent metadata updates in
vfs_wapbl.c.
To generate a diff of this commit:
cvs rdiff -r1.3 -r1.3.8.1 src/sys/kern/vfs_wapbl.c
cvs rdiff -r1.34 -r1.34.20.1 src/sys/miscfs/syncfs/sync_subr.c
cvs rdiff -r1.25 -r1.25.10.1 src/sys/miscfs/syncfs/sync_vnops.c
cvs rdiff -r1.113 -r1.113.4.1 src/sys/ufs/ffs/ffs_alloc.c
cvs rdiff -r1.239 -r1.239.2.1 src/sys/ufs/ffs/ffs_vfsops.c
cvs rdiff -r1.104.4.5 -r1.104.4.6 src/sys/ufs/ffs/ffs_vnops.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: feedback->closed
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Wed, 25 Mar 2009 02:59:59 +0000
State-Changed-Why:
The problem I found by reading the source is fixed. Other stuff may still
be lurking of course...
>Unformatted:
(Contact us)
$NetBSD: query-full-pr,v 1.39 2013/11/01 18:47:49 spz Exp $
$NetBSD: gnats_config.sh,v 1.8 2006/05/07 09:23:38 tsutsui Exp $
Copyright © 1994-2007
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.