NetBSD Problem Report #40525

From dholland@netbsd.org  Sat Jan 31 08:29:35 2009
Return-Path: <dholland@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
	by narn.NetBSD.org (Postfix) with ESMTP id CF36163B879
	for <gnats-bugs@gnats.NetBSD.org>; Sat, 31 Jan 2009 08:29:35 +0000 (UTC)
Message-Id: <20090131082935.A8A1463B120@mail.netbsd.org>
Date: Sat, 31 Jan 2009 08:29:35 +0000 (UTC)
From: dholland@netbsd.org
Reply-To: dholland@netbsd.org
To: gnats-bugs@gnats.NetBSD.org
Subject: panic: ffs_valloc: dup alloc
X-Send-Pr-Version: 3.95

>Number:         40525
>Category:       kern
>Synopsis:       panic: ffs_valloc: dup alloc
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    dholland
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Jan 31 08:30:00 +0000 2009
>Closed-Date:    Wed Mar 25 02:59:59 +0000 2009
>Last-Modified:  Wed Mar 25 02:59:59 +0000 2009
>Originator:     David A. Holland
>Release:        NetBSD 5.99.7 (20090130) and probably 5.0
>Organization:
>Environment:
various
>Description:

ffs (non-wapbl, non-softdep) panic allocating an already allocated inode.
See thread on current-users starting here:

   http://mail-index.netbsd.org/current-users/2009/01/10/msg007167.html

>How-To-Repeat:

>Fix:

Candidate analysis and fix:

   http://mail-index.netbsd.org/current-users/2009/01/26/msg007595.html

I'd like a second opinion before I commit it, in case I've overlooked
something.

>Release-Note:

>Audit-Trail:
From: Juergen Hannken-Illjes <hannken@eis.cs.tu-bs.de>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/40525: panic: ffs_valloc: dup alloc
Date: Sat, 31 Jan 2009 12:38:18 +0100

 On Sat, Jan 31, 2009 at 08:30:00AM +0000, dholland@netbsd.org wrote:
 > ffs (non-wapbl, non-softdep) panic allocating an already allocated inode.
 > See thread on current-users starting here:
 > 
 >    http://mail-index.netbsd.org/current-users/2009/01/10/msg007167.html
 > 
 > I'd like a second opinion before I commit it, in case I've overlooked
 > something.

 Looks good to me -- as long as it only happens on UFS2 file systems.

 -- 
 Juergen Hannken-Illjes - hannken@eis.cs.tu-bs.de - TU Braunschweig (Germany)

From: David Holland <dholland-bugs@netbsd.org>
To: Juergen Hannken-Illjes <hannken@eis.cs.tu-bs.de>
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
	gnats-bugs@netbsd.org, netbsd-bugs@netbsd.org, dholland@netbsd.org
Subject: Re: kern/40525: panic: ffs_valloc: dup alloc
Date: Sat, 7 Feb 2009 21:21:15 +0000

 On Sat, Jan 31, 2009 at 11:40:04AM +0000, Juergen Hannken-Illjes wrote:
  > From: Juergen Hannken-Illjes <hannken@eis.cs.tu-bs.de>
  > To: gnats-bugs@netbsd.org
  > Cc: 
  > Subject: Re: kern/40525: panic: ffs_valloc: dup alloc
  > Date: Sat, 31 Jan 2009 12:38:18 +0100
  > 
  >  On Sat, Jan 31, 2009 at 08:30:00AM +0000, dholland@netbsd.org wrote:
  >  > ffs (non-wapbl, non-softdep) panic allocating an already allocated inode.
  >  > See thread on current-users starting here:
  >  > 
  >  >    http://mail-index.netbsd.org/current-users/2009/01/10/msg007167.html
  >  > 
  >  > I'd like a second opinion before I commit it, in case I've overlooked
  >  > something.
  >  
  >  Looks good to me -- as long as it only happens on UFS2 file systems.

 I don't see what's UFS2-specific about it?

 But it sounds like you agree that my patch fixes *a* bug, even if it's
 not the same bug reported on current-users. Right?

 -- 
 David A. Holland
 dholland@netbsd.org

From: Juergen Hannken-Illjes <hannken@eis.cs.tu-bs.de>
To: David Holland <dholland-bugs@netbsd.org>
Cc: gnats-bugs@netbsd.org
Subject: Re: kern/40525: panic: ffs_valloc: dup alloc
Date: Sun, 8 Feb 2009 10:45:29 +0100

 On Sat, Feb 07, 2009 at 09:21:15PM +0000, David Holland wrote:
 > 
 > I don't see what's UFS2-specific about it?

 UFS1 has preallocated inodes, on-demand allocation is a property of UFS2.

 > But it sounds like you agree that my patch fixes *a* bug, even if it's
 > not the same bug reported on current-users. Right?

 Please get a second OK then.

 -- 
 Juergen Hannken-Illjes - hannken@eis.cs.tu-bs.de - TU Braunschweig (Germany)

Responsible-Changed-From-To: kern-bug-people->dholland
Responsible-Changed-By: dholland@NetBSD.org
Responsible-Changed-When: Sun, 22 Feb 2009 05:53:50 +0000
Responsible-Changed-Why:
I am looking after this.


From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/40525: panic: ffs_valloc: dup alloc
Date: Sun, 22 Feb 2009 05:50:43 +0000

 On Sun, Feb 08, 2009 at 10:45:29AM +0100, Juergen Hannken-Illjes wrote:
  > > But it sounds like you agree that my patch fixes *a* bug, even if it's
  > > not the same bug reported on current-users. Right?
  > 
  > Please get a second OK then.

 Discussion at some length with pooka and ad has produced the following
 conclusions:

   1. It's a real problem, and the proposed patch does fix it.
   2. BC_INVAL has to be removed from the corresponding brelse call
      below "fail:" too.
   3. We think the buffer cache will DTRT if the call to getblk returns
      a fresh buffer with no real data and we call brelse(bp, 0) on it
      (that is, don't explicitly invalidate it). I was worried about
      this, and still am a bit, and will look into it further later.
   4. We're Concerned(TM) about other uses of BC_INVAL in FS code.
      There are quite a few that look suspect.

 ad said this afternoon he was going to commit the patch, but hasn't
 yet; I will if he doesn't in the next day or two.

 This is the full patch.

 Index: ffs_alloc.c
 ===================================================================
 RCS file: /cvsroot/src/sys/ufs/ffs/ffs_alloc.c,v
 retrieving revision 1.120
 diff -u -p -r1.120 ffs_alloc.c
 --- ffs_alloc.c	11 Jan 2009 02:45:56 -0000	1.120
 +++ ffs_alloc.c	22 Feb 2009 05:49:33 -0000
 @@ -1284,7 +1284,7 @@ retry:
  	if (ibp != NULL &&
  	    initediblk != ufs_rw32(cgp->cg_initediblk, needswap)) {
  		/* Another thread allocated more inodes so we retry the test. */
 -		brelse(ibp, BC_INVAL);
 +		brelse(ibp, 0);
  		ibp = NULL;
  	}
  	/*
 @@ -1396,7 +1396,7 @@ gotit:
  	if (bp != NULL)
  		brelse(bp, 0);
  	if (ibp != NULL)
 -		brelse(ibp, BC_INVAL);
 +		brelse(ibp, 0);
  	mutex_enter(&ump->um_lock);
  	return (0);
  }


 -- 
 David A. Holland
 dholland@netbsd.org

State-Changed-From-To: open->feedback
State-Changed-By: ad@NetBSD.org
State-Changed-When: Sun, 22 Feb 2009 20:14:20 +0000
State-Changed-Why:
pr assigned to dh
let him close it if satisfied


From: Andrew Doran <ad@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/40525 CVS commit: src/sys
Date: Sun, 22 Feb 2009 20:10:25 +0000 (UTC)

 Module Name:	src
 Committed By:	ad
 Date:		Sun Feb 22 20:10:25 UTC 2009

 Modified Files:
 	src/sys/kern: vfs_wapbl.c
 	src/sys/miscfs/syncfs: sync_subr.c sync_vnops.c
 	src/sys/ufs/ffs: ffs_alloc.c ffs_vfsops.c ffs_vnops.c

 Log Message:
 PR kern/39564 wapbl performance issues with disk cache flushing
 PR kern/40361 WAPBL locking panic in -current
 PR kern/40361 WAPBL locking panic in -current
 PR kern/40470 WAPBL corrupts ext2fs
 PR kern/40562 busy loop in ffs_sync when unmounting a file system
 PR kern/40525 panic: ffs_valloc: dup alloc

 - A fix for an issue that can lead to "ffs_valloc: dup" due to dirty cg
   buffers being invalidated. Problem discovered and patch by dholland@.

 - If the syncer fails to lazily sync a vnode due to lock contention,
   retry 1 second later instead of 30 seconds later.

 - Flush inode atime updates every ~10 seconds (this makes most sense with
   logging). Presently they didn't hit the disk for read-only files or
   devices until the file system was unmounted. It would be better to trickle
   the updates out but that would require more extensive changes.

 - Fix issues with file system corruption, busy looping and other nasty
   problems when logging and non-logging file systems are intermixed,
   with one being the root file system.

 - For logging, do not flush metadata on an inode-at-a-time basis if the sync
   has been requested by ioflush. Previously, we could try hundreds of log
   sync operations a second due to inode update activity, causing the syncer
   to fall behind and metadata updates to be serialized across the entire
   file system. Instead, burst out metadata and log flushes at a minimum
   interval of every 10 seconds on an active file system (happens more often
   if the log becomes full). Note this does not change the operation of
   fsync() etc.

 - With the flush issue fixed, re-enable concurrent metadata updates in
   vfs_wapbl.c.


 To generate a diff of this commit:
 cvs rdiff -r1.22 -r1.23 src/sys/kern/vfs_wapbl.c
 cvs rdiff -r1.35 -r1.36 src/sys/miscfs/syncfs/sync_subr.c
 cvs rdiff -r1.25 -r1.26 src/sys/miscfs/syncfs/sync_vnops.c
 cvs rdiff -r1.120 -r1.121 src/sys/ufs/ffs/ffs_alloc.c
 cvs rdiff -r1.241 -r1.242 src/sys/ufs/ffs/ffs_vfsops.c
 cvs rdiff -r1.109 -r1.110 src/sys/ufs/ffs/ffs_vnops.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: Soren Jacobsen <snj@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/40525 CVS commit: [netbsd-5] src/sys
Date: Tue, 24 Feb 2009 04:13:35 +0000 (UTC)

 Module Name:	src
 Committed By:	snj
 Date:		Tue Feb 24 04:13:35 UTC 2009

 Modified Files:
 	src/sys/kern [netbsd-5]: vfs_wapbl.c
 	src/sys/miscfs/syncfs [netbsd-5]: sync_subr.c sync_vnops.c
 	src/sys/ufs/ffs [netbsd-5]: ffs_alloc.c ffs_vfsops.c ffs_vnops.c

 Log Message:
 Pull up following revision(s) (requested by ad in ticket #490):
 	sys/kern/vfs_wapbl.c: revision 1.23
 	sys/miscfs/syncfs/sync_subr.c: revision 1.36
 	sys/miscfs/syncfs/sync_vnops.c: revision 1.26
 	sys/ufs/ffs/ffs_alloc.c: revision 1.121
 	sys/ufs/ffs/ffs_vfsops.c: revision 1.242
 	sys/ufs/ffs/ffs_vnops.c: revision 1.110
 PR kern/39564 wapbl performance issues with disk cache flushing
 PR kern/40361 WAPBL locking panic in -current
 PR kern/40361 WAPBL locking panic in -current
 PR kern/40470 WAPBL corrupts ext2fs
 PR kern/40562 busy loop in ffs_sync when unmounting a file system
 PR kern/40525 panic: ffs_valloc: dup alloc
 - A fix for an issue that can lead to "ffs_valloc: dup" due to dirty cg
   buffers being invalidated. Problem discovered and patch by dholland@.
 - If the syncer fails to lazily sync a vnode due to lock contention,
   retry 1 second later instead of 30 seconds later.
 - Flush inode atime updates every ~10 seconds (this makes most sense with
   logging). Presently they didn't hit the disk for read-only files or
   devices until the file system was unmounted. It would be better to trickle
   the updates out but that would require more extensive changes.
 - Fix issues with file system corruption, busy looping and other nasty
   problems when logging and non-logging file systems are intermixed,
   with one being the root file system.
 - For logging, do not flush metadata on an inode-at-a-time basis if the sync
   has been requested by ioflush. Previously, we could try hundreds of log
   sync operations a second due to inode update activity, causing the syncer
   to fall behind and metadata updates to be serialized across the entire
   file system. Instead, burst out metadata and log flushes at a minimum
   interval of every 10 seconds on an active file system (happens more often
   if the log becomes full). Note this does not change the operation of
   fsync() etc.
 - With the flush issue fixed, re-enable concurrent metadata updates in
   vfs_wapbl.c.


 To generate a diff of this commit:
 cvs rdiff -r1.3 -r1.3.8.1 src/sys/kern/vfs_wapbl.c
 cvs rdiff -r1.34 -r1.34.20.1 src/sys/miscfs/syncfs/sync_subr.c
 cvs rdiff -r1.25 -r1.25.10.1 src/sys/miscfs/syncfs/sync_vnops.c
 cvs rdiff -r1.113 -r1.113.4.1 src/sys/ufs/ffs/ffs_alloc.c
 cvs rdiff -r1.239 -r1.239.2.1 src/sys/ufs/ffs/ffs_vfsops.c
 cvs rdiff -r1.104.4.5 -r1.104.4.6 src/sys/ufs/ffs/ffs_vnops.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: feedback->closed
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Wed, 25 Mar 2009 02:59:59 +0000
State-Changed-Why:
The problem I found by reading the source is fixed. Other stuff may still
be lurking of course...


>Unformatted:

NetBSD Home
NetBSD PR Database Search

(Contact us) $NetBSD: query-full-pr,v 1.39 2013/11/01 18:47:49 spz Exp $
$NetBSD: gnats_config.sh,v 1.8 2006/05/07 09:23:38 tsutsui Exp $
Copyright © 1994-2007 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.