NetBSD Problem Report #43336
From hf@spg.tu-darmstadt.de Fri May 21 16:00:49 2010
Return-Path: <hf@spg.tu-darmstadt.de>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
by www.NetBSD.org (Postfix) with ESMTP id 7C25463B873
for <gnats-bugs@gnats.NetBSD.org>; Fri, 21 May 2010 16:00:49 +0000 (UTC)
Message-Id: <201005211551.o4LFpvpm001308@Gstoder.nt.e-technik.tu-darmstadt.de>
Date: Fri, 21 May 2010 17:51:57 +0200 (CEST)
From: Hauke Fath <hf@spg.tu-darmstadt.de>
Reply-To: Hauke Fath <hf@spg.tu-darmstadt.de>
To: gnats-bugs@gnats.NetBSD.org
Cc: Hauke Fath <hf@spg.tu-darmstadt.de>
Subject: fsck wapbl assertion failure after crash
X-Send-Pr-Version: 3.95
>Number: 43336
>Category: bin
>Synopsis: fsck wapbl assertion failure after crash
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: bin-bug-people
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Fri May 21 16:05:00 +0000 2010
>Closed-Date: Mon Sep 27 16:26:55 +0000 2010
>Last-Modified: Mon Sep 27 16:26:55 +0000 2010
>Originator: Hauke Fath
>Release: NetBSD 5.0_STABLE
>Organization:
--
The ASCII Ribbon Campaign Hauke Fath
() No HTML/RTF in email Institut für Nachrichtentechnik
/\ No Word docs in email TU Darmstadt
Respect for open standards Ruf +49-6151-16-3281
>Environment:
System: NetBSD Gstoder 5.0_STABLE NetBSD 5.0_STABLE (P4W) #0: Mon Mar 8 18:05:33 CET 2010 hf@Hochstuhl:/var/obj/netbsd-builds/5/i386/sys/arch/i386/compile/P4W i386
Architecture: i386
Machine: i386
>Description:
During reboot after a kernel panic, my netbsd-5 system dropped
to single-user mode, and fsck(8) died with the following wapbl
assertion failure:
<http://www.spg.tu-darmstadt.de/~hf/netbsd/netbsd-5-wapbl.jpg>
I've seen this problem three times so far. The only way out
appears to be a 'tunefs -l 0 /dev/rwd0[aefg...]' followed by
'mount -a' and 'shutdown -r now'.
>How-To-Repeat:
On a netbsd-5 machine with file-systems mount(8)ed '-o log'
run into a kernel panic that dumps core and reboots. Find that
fsck(8) then aborts instead of completing.
>Fix:
For a workaround, see above.
Manuel Bouyer suggested changing the assertion into a warning,
so that fsck can clean up the log.
>Release-Note:
>Audit-Trail:
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: bin/43336: fsck wapbl assertion failure after crash
Date: Thu, 27 May 2010 07:24:01 +0000
On Fri, May 21, 2010 at 04:05:00PM +0000, Hauke Fath wrote:
> <http://www.spg.tu-darmstadt.de/~hf/netbsd/netbsd-5-wapbl.jpg>
For the record, the transcript is:
# fsck -f
** /dev/rwd0a
assertion "wr->wr_blkhashcnt > 0" failed: file "/public/netbsd-5/sys/kern/vfs_wapbl.c", line 2156, function "wapbl_blkhash_clear"
fsck: /dev/rwd0a: Abort trap
#
Examining the code I don't see any way this can happen (besides memory
corruption) except by having more than 2^31 items in the hash
table. This seems improbable but I guess one never knows. How big is
your wd0a?
Can you patch your copy to print the value of wr_blkhashcnt before
crashing? It would be interesting to know if it's 0 or if it's -2^31
or thereabouts.
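
The two cases distinguished above (a counter of exactly 0 versus a large negative value) can be told apart with a small diagnostic helper. The following is only a sketch: `classify_count`, `report_count` and the `DEMO_MAIN` guard are hypothetical names, not part of vfs_wapbl.c, and the counter is assumed to be a plain `int`.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Hypothetical classification of the counter value seen at the assertion. */
enum cnt_state {
	CNT_POSITIVE,	/* invariant "count > 0" holds */
	CNT_ZERO,	/* removal attempted while the count is already 0 */
	CNT_NEGATIVE	/* what a wrapped signed 32-bit counter would look like */
};

static enum cnt_state
classify_count(int cnt)
{
	if (cnt > 0)
		return CNT_POSITIVE;
	return cnt == 0 ? CNT_ZERO : CNT_NEGATIVE;
}

/*
 * Print the counter to `out` before any assertion fires, and return
 * nonzero if the invariant cnt > 0 holds.
 */
static int
report_count(FILE *out, int cnt)
{
	static const char *const names[] = { "positive", "zero", "negative" };

	fprintf(out, "wr_blkhashcnt = %d (%s)\n", cnt,
	    names[classify_count(cnt)]);
	return cnt > 0;
}

#ifdef DEMO_MAIN
int
main(void)
{
	report_count(stderr, 0);	/* the "it's 0" case */
	report_count(stderr, INT_MIN);	/* the "-2^31 or thereabouts" case */
	return 0;
}
#endif
```

Calling such a helper immediately before the failing assertion would leave the value on stderr even though the process still aborts.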
> Manuel Bouyer suggested changing the assertion into a warning,
> so that fsck can clean up the log.
Given that it's an in-memory construct and the assertion checks a
fairly simple invariant of a data structure, failure suggests that
something pretty bad is wrong; continuing might well just cause fsck
to munge the volume. So I'm not sure this is a good idea.
--
David A. Holland
dholland@netbsd.org
From: Hauke Fath <hf@spg.tu-darmstadt.de>
To: gnats-bugs@NetBSD.org
Cc: gnats-admin@NetBSD.org, Hauke Fath <hf@spg.tu-darmstadt.de>
Subject: Re: bin/43336: fsck wapbl assertion failure after crash
Date: Mon, 31 May 2010 11:29:00 +0200
At 7:25 +0000 on 27.05.2010, David Holland wrote:
> Can you patch your copy to print the value of wr_blkhashcnt before
> crashing? It would be interesting to know if it's 0 or if it's -2^31
> or thereabouts.
A major power outage gave me the opportunity to try that - the machine was
updated to a netbsd-5 snapshot a few days ago.
wr_blkhashcnt is 0.
I have a 3 GB bzip2'ed archive of a 10 GB /var/obj filesystem here that I
could put up for download if that helps? Unfortunately all of the
filesystems on the machine are (at least) that big.
hauke
From: Ryo SHIMIZU <ryo@nerv.org>
To: gnats-bugs@NetBSD.org
Cc: gnats-admin@netbsd.org, Hauke Fath <hf@spg.tu-darmstadt.de>,
David Holland <dholland-bugs@netbsd.org>
Subject: Re: bin/43336: fsck wapbl assertion failure after crash
Date: Thu, 02 Sep 2010 14:33:37 +0900
I found a bug in wapbl_blkhash_init().
The hash table is initialized incorrectly because wr->wr_blkhashmask has
not been set yet when the initialization loop runs. This affects the code
path used when called from fsck_ffs (!_KERNEL).
Index: vfs_wapbl.c
===================================================================
RCS file: /cvsroot/src/sys/kern/vfs_wapbl.c,v
retrieving revision 1.36
diff -a -u -r1.36 vfs_wapbl.c
--- vfs_wapbl.c 21 Apr 2010 19:50:57 -0000 1.36
+++ vfs_wapbl.c 2 Sep 2010 04:56:09 -0000
@@ -2095,9 +2095,9 @@
 		for (hashsize = 1; hashsize < size; hashsize <<= 1)
 			continue;
 		wr->wr_blkhash = wapbl_malloc(hashsize * sizeof(*wr->wr_blkhash));
-		for (i = 0; i < wr->wr_blkhashmask; i++)
-			LIST_INIT(&wr->wr_blkhash[i]);
 		wr->wr_blkhashmask = hashsize - 1;
+		for (i = 0; i <= wr->wr_blkhashmask; i++)
+			LIST_INIT(&wr->wr_blkhash[i]);
 	}
 #endif /* ! _KERNEL */
 }
--
ryo shimizu
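
The effect of the ordering problem described in this message can be reproduced outside the kernel sources. The sketch below is not the vfs_wapbl.c code: `struct replay`, `alloc_junk`, `init_buggy`, `init_fixed`, `empty_buckets` and the `DEMO_MAIN` guard are hypothetical names, the mask is assumed to start out as 0, and the buckets are deliberately filled with 0xa5 to stand in for an allocator that returns non-zeroed memory.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/queue.h>

struct blk {
	LIST_ENTRY(blk) link;
};
LIST_HEAD(blk_head, blk);

/* Hypothetical stand-in for the replay state; not the real struct. */
struct replay {
	struct blk_head *hash;
	unsigned long mask;
};

/* Allocate n buckets filled with junk bytes instead of zeros. */
static struct blk_head *
alloc_junk(unsigned long n)
{
	struct blk_head *p = malloc(n * sizeof(*p));

	if (p != NULL)
		memset(p, 0xa5, n * sizeof(*p));
	return p;
}

/* Round size up to a power of two, as the loop in the patch context does. */
static unsigned long
round_pow2(unsigned long size)
{
	unsigned long h;

	for (h = 1; h < size; h <<= 1)
		continue;
	return h;
}

/* Old order: the loop bound is read before the mask is assigned. */
static int
init_buggy(struct replay *r, unsigned long size)
{
	unsigned long i, hashsize = round_pow2(size);

	if ((r->hash = alloc_junk(hashsize)) == NULL)
		return -1;
	for (i = 0; i < r->mask; i++)	/* r->mask is still 0: no iterations */
		LIST_INIT(&r->hash[i]);
	r->mask = hashsize - 1;
	return 0;
}

/* Patched order: set the mask first, then initialize every bucket. */
static int
init_fixed(struct replay *r, unsigned long size)
{
	unsigned long i, hashsize = round_pow2(size);

	if ((r->hash = alloc_junk(hashsize)) == NULL)
		return -1;
	r->mask = hashsize - 1;
	for (i = 0; i <= r->mask; i++)
		LIST_INIT(&r->hash[i]);
	return 0;
}

/* Count the buckets that are properly empty (head pointer is NULL). */
static unsigned long
empty_buckets(const struct replay *r)
{
	unsigned long i, n = 0;

	for (i = 0; i <= r->mask; i++)
		if (LIST_EMPTY(&r->hash[i]))
			n++;
	return n;
}

#ifdef DEMO_MAIN
int
main(void)
{
	struct replay a = { NULL, 0 }, b = { NULL, 0 };

	if (init_buggy(&a, 8) != 0 || init_fixed(&b, 8) != 0)
		return 1;
	printf("buggy: %lu empty buckets, fixed: %lu empty buckets\n",
	    empty_buckets(&a), empty_buckets(&b));
	free(a.hash);
	free(b.hash);
	return 0;
}
#endif
```

With junk-filled memory, the old ordering leaves every bucket head pointing at garbage, so any later walk of a bucket follows an invalid pointer. With zero-filled memory the same ordering happens to produce empty buckets, which is why the problem only shows up with some allocator settings.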
From: Matthias Drochner <drochner@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/43336 CVS commit: src/sys/kern
Date: Fri, 10 Sep 2010 10:14:56 +0000
Module Name: src
Committed By: drochner
Date: Fri Sep 10 10:14:56 UTC 2010
Modified Files:
src/sys/kern: vfs_wapbl.c
Log Message:
fix two bugs reported by Ryo Shimizu:
-wrong initialization reported in a followup to PR bin/43336
(looks harmless because it applies to zero-initialized memory, so
LIST_INIT() is a no-op)
-wrong loop count in replay misses a hash bucket (PR kern/43827)
(this was introduced by a post-netbsd-5 change, so it isn't related
to the PR above)
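
Both remarks in this log message can be checked with a few lines of user-space C. This is a sketch under stated assumptions: the helper names and the `DEMO_MAIN` guard are hypothetical, the "no-op" check assumes a platform where a null pointer is represented by all-zero bytes, and the loop helpers only illustrate the general `<` versus `<=` difference rather than the specific loop changed for PR kern/43827.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/queue.h>

struct entry {
	LIST_ENTRY(entry) link;
};
LIST_HEAD(bucket, entry);

/*
 * Return 1 if LIST_INIT() leaves a zero-filled list head unchanged
 * byte for byte, 0 otherwise.
 */
static int
list_init_is_noop_on_zeroed(void)
{
	struct bucket zeroed, inited;

	memset(&zeroed, 0, sizeof(zeroed));
	memset(&inited, 0, sizeof(inited));
	LIST_INIT(&inited);
	return memcmp(&zeroed, &inited, sizeof(zeroed)) == 0;
}

/* Buckets visited by "for (i = 0; i < mask; i++)". */
static unsigned long
visited_exclusive(unsigned long mask)
{
	unsigned long i, n = 0;

	for (i = 0; i < mask; i++)
		n++;
	return n;
}

/* Buckets visited by "for (i = 0; i <= mask; i++)". */
static unsigned long
visited_inclusive(unsigned long mask)
{
	unsigned long i, n = 0;

	for (i = 0; i <= mask; i++)
		n++;
	return n;
}

#ifdef DEMO_MAIN
int
main(void)
{
	unsigned long mask = 8 - 1;	/* table of 8 buckets */

	printf("LIST_INIT no-op on zeroed memory: %d\n",
	    list_init_is_noop_on_zeroed());
	printf("i <  mask visits %lu buckets\n", visited_exclusive(mask));
	printf("i <= mask visits %lu buckets\n", visited_inclusive(mask));
	return 0;
}
#endif
```

For a table of hashsize buckets with mask = hashsize - 1, a loop bounded by `i < mask` stops one bucket short, while `i <= mask` covers the whole table.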
To generate a diff of this commit:
cvs rdiff -u -r1.36 -r1.37 src/sys/kern/vfs_wapbl.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: Ryo SHIMIZU <ryo@nerv.org>
To: gnats-bugs@NetBSD.org
Cc: gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, drochner@NetBSD.org
Subject: Re: bin/43336: fsck wapbl assertion failure after crash
Date: Sat, 11 Sep 2010 22:03:32 +0900
Although it's a bit late, I found a way to reproduce this reliably.
>How-To-Repeat:
# cd /var/tmp
# ln -s J /etc/malloc.conf
# dd if=/dev/zero of=xxx.fs bs=1m count=128
128+0 records in
128+0 records out
134217728 bytes transferred in 2.263 secs (59309645 bytes/sec)
# newfs -F -O 2 xxx.fs
xxx.fs: 128.0MB (262144 sectors) block size 8192, fragment size 1024
using 4 cylinder groups of 32.00MB, 4096 blks, 7712 inodes.
super-block backups (for fsck_ffs -b #) at:
144, 65680, 131216, 196752,
# vnconfig vnd0 xxx.fs
# mount -o log /dev/vnd0a /mnt
# cp /netbsd /mnt
# cp xxx.fs xxx.live.fs
# fsck_ffs -ny xxx.live.fs
xxx.live.fs is not a character device
CONTINUE? yes
** xxx.live.fs
fsck_ffs: ioctl (DIOCGWEDGEINFO): Inappropriate ioctl for device
assertion "wr->wr_blkhashcnt > 0" failed: file "/home/builds/ab/netbsd-5-1-RC3/src/sys/kern/vfs_wapbl.c", line 2156, function "wapbl_blkhash_clear"
Abort
From: Soren Jacobsen <snj@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/43336 CVS commit: [netbsd-5] src/sys/kern
Date: Mon, 13 Sep 2010 19:52:49 +0000
Module Name: src
Committed By: snj
Date: Mon Sep 13 19:52:49 UTC 2010
Modified Files:
src/sys/kern [netbsd-5]: vfs_wapbl.c
Log Message:
Apply patch (requested by drochner in ticket #1454):
Fix inconsistencies in the wapbl replay process which can lead to a
premature abort of the fsck run and possibly leave a corrupted
filesystem. Addresses PR bin/43336.
To generate a diff of this commit:
cvs rdiff -u -r1.3.8.1 -r1.3.8.2 src/sys/kern/vfs_wapbl.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: open->feedback
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Sun, 26 Sep 2010 23:18:20 +0000
State-Changed-Why:
Is this fully fixed?
(good catch, folks :-) )
From: Hauke Fath <hf@spg.tu-darmstadt.de>
To: gnats-bugs@NetBSD.org
Cc: dholland@NetBSD.org
Subject: Re: bin/43336 (fsck wapbl assertion failure after crash)
Date: Mon, 27 Sep 2010 10:59:27 +0200
At 23:18 +0000 on 26.09.2010, dholland@NetBSD.org wrote:
>Is this fully fixed?
Yes, verified fixed with (one of) the original file-system(s).
hauke
State-Changed-From-To: feedback->closed
State-Changed-By: snj@NetBSD.org
State-Changed-When: Mon, 27 Sep 2010 16:26:55 +0000
State-Changed-Why:
Fixed and pulled up to netbsd-5.
>Unformatted:
Copyright © 1994-2007
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.