NetBSD Problem Report #43336

From hf@spg.tu-darmstadt.de  Fri May 21 16:00:49 2010
Return-Path: <hf@spg.tu-darmstadt.de>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
	by www.NetBSD.org (Postfix) with ESMTP id 7C25463B873
	for <gnats-bugs@gnats.NetBSD.org>; Fri, 21 May 2010 16:00:49 +0000 (UTC)
Message-Id: <201005211551.o4LFpvpm001308@Gstoder.nt.e-technik.tu-darmstadt.de>
Date: Fri, 21 May 2010 17:51:57 +0200 (CEST)
From: Hauke Fath <hf@spg.tu-darmstadt.de>
Reply-To: Hauke Fath <hf@spg.tu-darmstadt.de>
To: gnats-bugs@gnats.NetBSD.org
Cc: Hauke Fath <hf@spg.tu-darmstadt.de>
Subject: fsck wapbl assertion failure after crash
X-Send-Pr-Version: 3.95

>Number:         43336
>Category:       bin
>Synopsis:       fsck wapbl assertion failure after crash
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    bin-bug-people
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri May 21 16:05:00 +0000 2010
>Closed-Date:    Mon Sep 27 16:26:55 +0000 2010
>Last-Modified:  Mon Sep 27 16:26:55 +0000 2010
>Originator:     Hauke Fath
>Release:        NetBSD 5.0_STABLE
>Organization:
-- 
     The ASCII Ribbon Campaign                    Hauke Fath
()     No HTML/RTF in email	        Institut für Nachrichtentechnik
/\     No Word docs in email                     TU Darmstadt
     Respect for open standards              Ruf +49-6151-16-3281
>Environment:


System: NetBSD Gstoder 5.0_STABLE NetBSD 5.0_STABLE (P4W) #0: Mon Mar 8 18:05:33 CET 2010 hf@Hochstuhl:/var/obj/netbsd-builds/5/i386/sys/arch/i386/compile/P4W i386
Architecture: i386
Machine: i386
>Description:

	During reboot after a kernel panic, my netbsd-5 system dropped
	to single-user mode because fsck(8) died with the following
	wapbl assertion failure:

<http://www.spg.tu-darmstadt.de/~hf/netbsd/netbsd-5-wapbl.jpg>

	I've seen this problem three times so far. The only way out
	appears to be a 'tunefs -l 0 /dev/rwd0[aefg...]' followed by
	'mount -a' and 'shutdown -r now'.

>How-To-Repeat:

	On a netbsd-5 machine with file systems mount(8)ed '-o log',
	run into a kernel panic that dumps core and reboots. On the
	way back up, fsck(8) fails to complete and aborts with the
	assertion failure shown above.

>Fix:
	For a workaround, see above.

	Manuel Bouyer suggested changing the assertion into a warning,
	so that fsck can clean up the log.

>Release-Note:

>Audit-Trail:
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: bin/43336: fsck wapbl assertion failure after crash
Date: Thu, 27 May 2010 07:24:01 +0000

 On Fri, May 21, 2010 at 04:05:00PM +0000, Hauke Fath wrote:
  > <http://www.spg.tu-darmstadt.de/~hf/netbsd/netbsd-5-wapbl.jpg>

 For the record, the transcript is:

    # fsck -f
    ** /dev/rwd0a
    assertion "wr->wr_blkhashcnt > 0" failed: file "/public/netbsd-5/sys/kern/vfs_wapbl.c", line 2156, function "wapbl_blkhash_clear"
    fsck: /dev/rwd0a: Abort trap
    # 

 Examining the code I don't see any way this can happen (besides memory
 corruption) except by having more than 2^31 items in the hash
 table. This seems improbable but I guess one never knows. How big is
 your wd0a?

 Can you patch your copy to print the value of wr_blkhashcnt before
 crashing? It would be interesting to know if it's 0 or if it's -2^31
 or thereabouts.

  > 	Manuel Bouyer suggested changing the assertion into a warning,
  > 	so that fsck can clean up the log.

 Given that it's an in-memory construct and the assertion checks a
 fairly simple invariant of a data structure, failure suggests that
 something pretty bad is wrong; continuing might well just cause fsck
 to munge the volume. So I'm not sure this is a good idea.

 -- 
 David A. Holland
 dholland@netbsd.org
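
The diagnostic requested above could look roughly like the sketch below.
It is only an illustration: blkhashcnt_check() is a hypothetical helper
that is not part of vfs_wapbl.c, and it assumes the counter is a plain
int. It tests the same condition as the failing assertion and prints the
value first, so that 0 can be told apart from a wrapped negative count.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper, not part of vfs_wapbl.c: checks the same
 * condition as the failing assertion and, on failure, prints the
 * counter before the caller aborts.
 * Returns 1 if the invariant holds, 0 otherwise.
 */
static int
blkhashcnt_check(int blkhashcnt, FILE *out)
{
	assert(out != NULL);
	if (blkhashcnt > 0)
		return 1;
	fprintf(out, "wapbl_blkhash_clear: wr_blkhashcnt = %d\n", blkhashcnt);
	return 0;
}
```

A caller would invoke it with the counter and stderr immediately before
the existing assertion and leave the assertion itself in place.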

From: Hauke Fath <hf@spg.tu-darmstadt.de>
To: gnats-bugs@NetBSD.org
Cc: gnats-admin@NetBSD.org, Hauke Fath <hf@spg.tu-darmstadt.de>
Subject: Re: bin/43336: fsck wapbl assertion failure after crash
Date: Mon, 31 May 2010 11:29:00 +0200

 At 7:25 +0000 on 27.05.2010, David Holland wrote:
 > Can you patch your copy to print the value of wr_blkhashcnt before
 > crashing? It would be interesting to know if it's 0 or if it's -2^31
 > or thereabouts.

 A major power outage gave me the opportunity to try that - the machine was
 updated to a netbsd-5 snapshot a few days ago.

 wr_blkhashcnt is 0.

 I have a 3 GB bzip2'ed archive of a 10 GB /var/obj filesystem here that I
 could put up for download if that helps? Unfortunately all of the
 filesystems on the machine are (at least) that big.

 	hauke

 -- 
      The ASCII Ribbon Campaign                    Hauke Fath
 ()     No HTML/RTF in email            Institut für Nachrichtentechnik
 /\     No Word docs in email                     TU Darmstadt
      Respect for open standards              Ruf +49-6151-16-3281

From: Ryo SHIMIZU <ryo@nerv.org>
To: gnats-bugs@NetBSD.org
Cc: gnats-admin@netbsd.org, Hauke Fath <hf@spg.tu-darmstadt.de>,
    David Holland <dholland-bugs@netbsd.org>
Subject: Re: bin/43336: fsck wapbl assertion failure after crash
Date: Thu, 02 Sep 2010 14:33:37 +0900

 I found a bug in wapbl_blkhash_init().
 The hash table is initialized incorrectly when the function is called
 from fsck_ffs (!_KERNEL): the LIST_INIT() loop uses wr->wr_blkhashmask
 as its bound before that field has been assigned.


 Index: vfs_wapbl.c
 ===================================================================
 RCS file: /cvsroot/src/sys/kern/vfs_wapbl.c,v
 retrieving revision 1.36
 diff -a -u -r1.36 vfs_wapbl.c
 --- vfs_wapbl.c	21 Apr 2010 19:50:57 -0000	1.36
 +++ vfs_wapbl.c	2 Sep 2010 04:56:09 -0000
 @@ -2095,9 +2095,9 @@
  		for (hashsize = 1; hashsize < size; hashsize <<= 1)
  			continue;
  		wr->wr_blkhash = wapbl_malloc(hashsize * sizeof(*wr->wr_blkhash));
 -		for (i = 0; i < wr->wr_blkhashmask; i++)
 -			LIST_INIT(&wr->wr_blkhash[i]);
  		wr->wr_blkhashmask = hashsize - 1;
 +		for (i = 0; i <= wr->wr_blkhashmask; i++)
 +			LIST_INIT(&wr->wr_blkhash[i]);
  	}
  #endif /* ! _KERNEL */
  }



 --
 ryo shimizu
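
For illustration, the ordering problem addressed by the patch above can
be reproduced in a standalone sketch. This is not the vfs_wapbl.c code:
struct bucket, struct replay, junk_malloc() and the other names are
hypothetical stand-ins for the LIST_HEAD()/LIST_INIT() buckets, and the
sketch assumes that the replay structure starts out zeroed while the
bucket array comes back from the allocator filled with junk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a LIST_HEAD() bucket. */
struct bucket { void *first; };

struct replay {
	struct bucket *blkhash;	/* bucket array */
	size_t blkhashmask;	/* hashsize - 1 once initialized */
};

/* Assumption: the allocator hands back junk-filled memory. */
static void *
junk_malloc(size_t n)
{
	void *p = malloc(n);

	assert(p != NULL);
	memset(p, 0xa5, n);
	return p;
}

/* Round size up to the next power of two, as in the patched hunk. */
static size_t
round_up_pow2(size_t size)
{
	size_t hashsize;

	for (hashsize = 1; hashsize < size; hashsize <<= 1)
		continue;
	return hashsize;
}

/* Pre-patch order: the loop bound is read before the mask is set. */
static void
blkhash_init_buggy(struct replay *wr, size_t size)
{
	size_t i, hashsize = round_up_pow2(size);

	wr->blkhash = junk_malloc(hashsize * sizeof(*wr->blkhash));
	for (i = 0; i < wr->blkhashmask; i++)
		wr->blkhash[i].first = NULL;
	wr->blkhashmask = hashsize - 1;
}

/* Patched order: set the mask first, then clear every bucket. */
static void
blkhash_init_fixed(struct replay *wr, size_t size)
{
	size_t i, hashsize = round_up_pow2(size);

	wr->blkhash = junk_malloc(hashsize * sizeof(*wr->blkhash));
	wr->blkhashmask = hashsize - 1;
	for (i = 0; i <= wr->blkhashmask; i++)
		wr->blkhash[i].first = NULL;
}

/* Count buckets whose head still holds junk instead of NULL. */
static size_t
count_uninitialized(const struct replay *wr)
{
	size_t i, n = 0;

	for (i = 0; i <= wr->blkhashmask; i++)
		if (wr->blkhash[i].first != NULL)
			n++;
	return n;
}
```

With a zeroed struct replay and a requested size of 5, the pre-patch
order leaves all 8 buckets holding junk, while the patched order leaves
none. A later walk over the buckets would then treat the junk heads as
entries even though the entry counter is still 0.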

From: Matthias Drochner <drochner@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/43336 CVS commit: src/sys/kern
Date: Fri, 10 Sep 2010 10:14:56 +0000

 Module Name:	src
 Committed By:	drochner
 Date:		Fri Sep 10 10:14:56 UTC 2010

 Modified Files:
 	src/sys/kern: vfs_wapbl.c

 Log Message:
 fix two bugs reported by Ryo Shimizu:
 -wrong initialization reported in a followup to PR bin/43336
  (looks harmless because it applies to zero-initialized memory, so
  LIST_INIT() is a no-op)
 -wrong loop count in replay misses a hash bucket (PR kern/43827)
  (this was introduced by a post-netbsd-5 change, so it isn't related
  to the PR above)


 To generate a diff of this commit:
 cvs rdiff -u -r1.36 -r1.37 src/sys/kern/vfs_wapbl.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: Ryo SHIMIZU <ryo@nerv.org>
To: gnats-bugs@NetBSD.org
Cc: gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, drochner@NetBSD.org
Subject: Re: bin/43336: fsck wapbl assertion failure after crash
Date: Sat, 11 Sep 2010 22:03:32 +0900

 Although it's too late, I found a way to reproduce the problem.

 >How-To-Repeat:
 # cd /var/tmp
 # ln -s J /etc/malloc.conf
 # dd if=/dev/zero of=xxx.fs bs=1m count=128
 128+0 records in
 128+0 records out
 134217728 bytes transferred in 2.263 secs (59309645 bytes/sec)
 # newfs -F -O 2 xxx.fs
 xxx.fs: 128.0MB (262144 sectors) block size 8192, fragment size 1024
 using 4 cylinder groups of 32.00MB, 4096 blks, 7712 inodes.
 super-block backups (for fsck_ffs -b #) at:
 144, 65680, 131216, 196752,
 # vnconfig vnd0 xxx.fs
 # mount -o log /dev/vnd0a /mnt
 # cp /netbsd /mnt
 # cp xxx.fs xxx.live.fs
 # fsck_ffs -ny xxx.live.fs
 xxx.live.fs is not a character device
 CONTINUE? yes

 ** xxx.live.fs
 fsck_ffs: ioctl (DIOCGWEDGEINFO): Inappropriate ioctl for device
 assertion "wr->wr_blkhashcnt > 0" failed: file "/home/builds/ab/netbsd-5-1-RC3/src/sys/kern/vfs_wapbl.c", line 2156, function "wapbl_blkhash_clear"
 Abort

From: Soren Jacobsen <snj@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/43336 CVS commit: [netbsd-5] src/sys/kern
Date: Mon, 13 Sep 2010 19:52:49 +0000

 Module Name:	src
 Committed By:	snj
 Date:		Mon Sep 13 19:52:49 UTC 2010

 Modified Files:
 	src/sys/kern [netbsd-5]: vfs_wapbl.c

 Log Message:
 Apply patch (requested by drochner in ticket #1454):
 Fix inconsistencies in the wapbl replay process which can lead to a
 premature abort of the fsck run and possibly leave a corrupted
 filesystem.  Addresses PR bin/43336.


 To generate a diff of this commit:
 cvs rdiff -u -r1.3.8.1 -r1.3.8.2 src/sys/kern/vfs_wapbl.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: open->feedback
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Sun, 26 Sep 2010 23:18:20 +0000
State-Changed-Why:
Is this fully fixed?
(good catch, folks :-) )


From: Hauke Fath <hf@spg.tu-darmstadt.de>
To: gnats-bugs@NetBSD.org
Cc: dholland@NetBSD.org
Subject: Re: bin/43336 (fsck wapbl assertion failure after crash)
Date: Mon, 27 Sep 2010 10:59:27 +0200

 At 23:18 +0000 on 26.09.2010, dholland@NetBSD.org wrote:
 >Is this fully fixed?

 Yes, verified fixed with (one of) the original file-system(s).

 	hauke

 -- 
      The ASCII Ribbon Campaign                    Hauke Fath
 ()     No HTML/RTF in email            Institut für Nachrichtentechnik
 /\     No Word docs in email                     TU Darmstadt
      Respect for open standards              Ruf +49-6151-16-3281

State-Changed-From-To: feedback->closed
State-Changed-By: snj@NetBSD.org
State-Changed-When: Mon, 27 Sep 2010 16:26:55 +0000
State-Changed-Why:
Fixed and pulled up to netbsd-5.


>Unformatted:
