NetBSD Problem Report #45933
From krinsky@bantha.org Mon Feb 6 03:28:01 2012
Return-Path: <krinsky@bantha.org>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
by www.NetBSD.org (Postfix) with ESMTP id 6DBC163BCF4
for <gnats-bugs@gnats.NetBSD.org>; Mon, 6 Feb 2012 03:28:01 +0000 (UTC)
Message-Id: <20120206032759.14F4D139851@bantha.org>
Date: Sun, 5 Feb 2012 22:27:59 -0500 (EST)
From: krinsky@bantha.org
Reply-To: krinsky@bantha.org
To: gnats-bugs@gnats.NetBSD.org
Subject: fsck_ffs can't find wapbl journal on 4k-block disks
X-Send-Pr-Version: 3.95
>Number: 45933
>Category: bin
>Synopsis: fsck_ffs can't find wapbl journal on 4k-block disks
>Confidential: no
>Severity: critical
>Priority: medium
>Responsible: mlelstv
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Mon Feb 06 03:30:00 +0000 2012
>Closed-Date: Fri Sep 30 08:00:16 +0000 2016
>Last-Modified: Fri Sep 30 08:00:16 +0000 2016
>Originator: David Krinsky
>Release: NetBSD 5.99.59
>Organization:
>Environment:
System: NetBSD foundation.krinsky.net 5.99.59 NetBSD 5.99.59 (NEWBANTHA_XEN3_DOM0) #0: Thu Jan 12 20:56:21 EST 2012 root@foundation.krinsky.net:/usr/obj/sys/arch/amd64/compile/NEWBANTHA_XEN3_DOM0 amd64
Architecture: x86_64
Machine: amd64
fsck_ffs ident:
$NetBSD: crt0-common.c,v 1.7 2011/06/30 20:07:35 matt Exp $
$NetBSD: crt0.S,v 1.3 2011/07/01 02:59:05 joerg Exp $
$NetBSD: crti.S,v 1.1 2010/08/07 18:01:35 joerg Exp $
$NetBSD: crtbegin.S,v 1.2 2010/11/30 18:37:59 joerg Exp $
$NetBSD: dir.c,v 1.53 2011/03/06 17:08:16 bouyer Exp $
$NetBSD: inode.c,v 1.64 2011/03/06 17:08:16 bouyer Exp $
$NetBSD: main.c,v 1.80 2011/08/29 14:34:59 joerg Exp $
$NetBSD: pass1.c,v 1.49 2011/08/14 12:32:01 christos Exp $
$NetBSD: pass1b.c,v 1.22 2011/03/06 17:08:16 bouyer Exp $
$NetBSD: pass2.c,v 1.48 2011/08/14 12:32:01 christos Exp $
$NetBSD: pass3.c,v 1.20 2011/03/06 17:08:16 bouyer Exp $
$NetBSD: pass4.c,v 1.26 2011/03/06 17:08:16 bouyer Exp $
$NetBSD: pass5.c,v 1.49 2011/08/14 12:32:01 christos Exp $
$NetBSD: fsutil.c,v 1.20 2011/06/09 19:57:50 christos Exp $
$NetBSD: setup.c,v 1.94 2011/08/14 12:32:01 christos Exp $
$NetBSD: utilities.c,v 1.60 2011/06/09 19:57:52 christos Exp $
$NetBSD: ffs_bswap.c,v 1.35 2011/03/06 17:08:38 bouyer Exp $
$NetBSD: ffs_subr.c,v 1.47 2011/08/14 12:37:09 christos Exp $
$NetBSD: ffs_tables.c,v 1.9 2005/12/11 12:25:25 christos Exp $
$NetBSD: ffs_appleufs.c,v 1.12 2011/11/19 22:51:31 tls Exp $
$NetBSD: partutil.c,v 1.11 2011/11/13 22:04:51 christos Exp $
$NetBSD: quota2_subr.c,v 1.4 2011/06/07 14:56:13 bouyer Exp $
$NetBSD: progress.c,v 1.5 2009/04/11 06:48:36 lukem Exp $
$NetBSD: vfs_wapbl.c,v 1.48 2011/12/02 12:38:59 yamt Exp $
$NetBSD: wapbl.c,v 1.5 2010/03/06 11:31:40 mlelstv Exp $
$NetBSD: crtend.S,v 1.1 2010/08/07 18:01:34 joerg Exp $
$NetBSD: crtn.S,v 1.1 2010/08/07 18:01:35 joerg Exp $
>Description:
When attempting to fsck an FFSv2 filesystem that had been mounted with "-o log"
but then not cleanly unmounted, fsck_ffs fails to find and read the journal if
the filesystem is on a device with a sector size of 4K:
foundation# fsck /dev/mapper/rvgr00-wapbltest
** /dev/mapper/rvgr00-wapbltest
CANNOT READ: BLK 8388608
CONTINUE? [yn] y
THE FOLLOWING DISK SECTORS COULD NOT BE READ: 67108864 (8388608), 67108865 (8388608),
Unrecognized wapbl magic: 0x00000000
UNABLE TO READ JOURNAL FOR REPLAY
CONTINUE? [yn]
(This example is from a 4096 MiB FFSv2 filesystem located on a 4160
MiB LVM logical volume, which is ultimately resident on a RAID1
RAIDframe array of two 2TB Seagate Barracuda Green drives--which have
4K disk sectors. But I've observed the same thing on filesystems
created on the same drive directly with no RAID or LVM. Note also
that the only crashes I've experienced to trigger this have been by
killing a Xen DomU with xm destroy, but the problem doesn't appear
Xen-related; the DomUs, like the Dom0 above, are running recent
versions of -current for amd64.)
Block 8388608 is the correct location of the WAPBL journal in 512-byte blocks;
manually doing
dd if=/dev/mapper/rvgr00-wapbltest of=/tmp/whatever bs=512 count=131072 skip=8388608
succeeds, and /tmp/whatever is not just zeroes.
>How-To-Repeat:
- Create an FFSv2 filesystem on a device with 4K blocks (I have been leaving space
for a WAPBL journal after the filesystem but it seems unlikely that this is
necessary.)
- mount -o log /dev/{filesystem} /mnt
- Crash
- fsck /dev/{filesystem}
>Fix:
The problem appears to be that check_wapbl() in sbin/fsck_ffs/wapbl.c calls
wapbl_replay_start with addr=8388608 and blksize=512, as apparently specified in the
superblock:
(gdb) bt
#0 bread (fd=5, buf=0x7f7ff7b1e000 "", blk=8388608, size=1024)
at /usr/src/sbin/fsck_ffs/utilities.c:342
#1 0x000000000041cf82 in wapbl_read (data=0x7f7ff7b1e000, len=1024, devvp=0x0, pbn=8388608)
at /usr/src/sbin/fsck_ffs/wapbl.c:84
#2 0x000000000041c426 in wapbl_replay_start (wrp=0x624850, vp=0x0, off=8388608,
count=131072, blksize=512) at /usr/src/sys/kern/vfs_wapbl.c:2421
#3 0x000000000041d4c3 in check_wapbl () at /usr/src/sbin/fsck_ffs/wapbl.c:210
#4 0x0000000000411d68 in setup (dev=0x7f7ff7b01060 "/dev/mapper/rvgr00-wapbltest",
origdev=0x7f7ff7b01060 "/dev/mapper/rvgr00-wapbltest")
at /usr/src/sbin/fsck_ffs/setup.c:185
#5 0x0000000000408f35 in checkfilesys (
filesys=0x7f7ff7b01060 "/dev/mapper/rvgr00-wapbltest",
origfs=0x7f7ff7b01060 "/dev/mapper/rvgr00-wapbltest", child=0)
at /usr/src/sbin/fsck_ffs/main.c:306
#6 0x0000000000408de7 in main (argc=0, argv=0x7f7fffffdc60)
at /usr/src/sbin/fsck_ffs/main.c:245
(gdb) fr 3
#3 0x000000000041d4c3 in check_wapbl () at /usr/src/sbin/fsck_ffs/wapbl.c:210
210 error = wapbl_replay_start(
(gdb) l
205 sbdirty();
206 ret = FSCK_EXIT_CHECK_FAILED;
207 break;
208 }
209 if (sblock->fs_flags & FS_DOWAPBL) {
210 error = wapbl_replay_start(
211 &wapbl_replay, 0, addr, count, blksize);
212 if (error) {
213 pfatal(
214 "UNABLE TO READ JOURNAL FOR REPLAY");
(gdb) p addr
$23 = 8388608
(gdb) p blksize
$24 = 512
But then wapbl_replay_start doesn't pass this information on to wapbl_read--it just
calls wapbl_read with the address in 512-byte blocks (as "pbn") without specifying
the blocksize:
(gdb) fr 2
#2 0x000000000041c426 in wapbl_replay_start (wrp=0x624850, vp=0x0, off=8388608,
count=131072, blksize=512) at /usr/src/sys/kern/vfs_wapbl.c:2421
2421 error = wapbl_read(scratch, 2<<log_dev_bshift, devvp, pbn);
(gdb) l
2416
2417 pbn = logpbn;
2418 #ifdef _KERNEL
2419 pbn = btodb(pbn << log_dev_bshift);
2420 #endif
2421 error = wapbl_read(scratch, 2<<log_dev_bshift, devvp, pbn);
2422 if (error)
2423 goto errout;
2424
2425 wch = (struct wapbl_wc_header *)scratch;
(gdb) p pbn
$25 = 8388608
and the ultimate result is that bread tries to pread from a block way beyond the end of the
partition (i.e., 4096*pbn bytes into the partition instead of 512*pbn bytes), because
dev_bsize == 4096:
(gdb) fr 0
#0 bread (fd=5, buf=0x7f7ff7b1e000 "", blk=8388608, size=1024)
at /usr/src/sbin/fsck_ffs/utilities.c:344
344 if ((pread(fd, buf, (int)size, offset) == size) &&
(gdb) l
339 int i, errs;
340 off_t offset;
341
342 offset = blk;
343 offset *= dev_bsize;
344 if ((pread(fd, buf, (int)size, offset) == size) &&
345 read_wapbl(buf, size, blk) == 0)
346 return (0);
347 rwerror("READ", blk);
348 errs = 0;
(gdb) p offset
$37 = 34359738368
(gdb) p blk
$38 = 8388608
(gdb) p dev_bsize
$39 = 4096
I am not sufficiently familiar with this code to tell whether the cleanest solution
is to pass the blocksize further down, to recalculate pbn in terms of the larger
correct blocksize, to make sure the location of the log is stored in units that match
the device, or something else.
>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: bin-bug-people->mlelstv
Responsible-Changed-By: mlelstv@NetBSD.org
Responsible-Changed-When: Sun, 13 Jul 2014 08:35:51 +0000
Responsible-Changed-Why:
my code
From: Michael van Elst <mlelstv@serpens.de>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: bin/45933: fsck_ffs can't find wapbl journal on 4k-block disks
Date: Sun, 13 Jul 2014 11:41:25 +0200
The code really does the right thing because the superblock stores the
journal position in physical block numbers(*).
The WAPBL code in the kernel then translates to 512byte block numbers
because that's what all disk drivers in NetBSD expect. The disk driver
(e.g. scsipi) then translates back to physical block numbers.
In Userland the physical block number is translated by
wapbl_read/wapbl_write to a byte offset by multiplying it with
the physical sector size and the byte offset is used to seek
the device or image.
(*) So why does your superblock seem to have a logical block
number for the journal position?
The reason is Xen, the xen driver presents the virtual disk as a
512byte/sector device independently of the underlying hardware.
Therefore the DomU thinks it talks to such a device, and will write
a journal position measured in 512byte blocks. fsck on the DomU
also works because it uses the same geometry.
This is fine, unless you try to access the filesystem from the Dom0
which obviously knows that the disk sectors are larger. The
superblock no longer matches reality and fsck does the wrong thing.
Something similar happens when you copy a FFS image between disks with
different sector sizes.
Greetings,
--
Michael van Elst
Internet: mlelstv@serpens.de
"A potential Snark may lurk in every tree."
From: David Krinsky <krinsky@bantha.org>
To: gnats-bugs@NetBSD.org
Cc: mlelstv@NetBSD.org
Subject: Re: bin/45933: fsck_ffs can't find wapbl journal on 4k-block disks
Date: Sun, 13 Jul 2014 08:53:02 -0400
On Sun, Jul 13, 2014 at 09:45:00AM +0000, Michael van Elst (mlelstv@serpens.de) wrote:
> The code really does the right thing because the superblock stores the
> journal position in physical block numbers(*).
>
> The WAPBL code in the kernel then translates to 512byte block numbers
> because that's what all disk drivers in NetBSD expect. The disk driver
> (e.g. scsipi) then translates back to physical block numbers.
>
> In Userland the physical block number is translated by
> wapbl_read/wapbl_write to a byte offset by multiplying it with
> the physical sector size and the byte offset is used to seek
> the device or image.
Michael,
Thank you for the response. But if this is the full explanation, why
does fsck in the domU fail? That is what I am seeing-- this bug
doesn't depend on creating a filesystem in a domU and then fscking it
from dom0. My problem--which I run into in the field any time one of
my domUs crashes--is that fsck in the *domU* fails.
> (*) So why does your superblock seem to have a logical block
> number for the journal position?
>
> The reason is Xen, the xen driver presents the virtual disk as a
> 512byte/sector device independently of the underlying hardware.
> Therefore the DomU thinks it talks to such a device, and will write
> a journal position measured in 512byte blocks. fsck on the DomU
> also works because it uses the same geometry.
It isn't working in my case--the wapbl code called by fsck in the domU
is incorrectly treating the journal position as though it were in 4K
blocks, doing the conversion, and fsck in the domU fails to find the
journal.
David
From: Michael van Elst <mlelstv@serpens.de>
To: gnats-bugs@NetBSD.org
Cc: mlelstv@NetBSD.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
krinsky@bantha.org
Subject: Re: bin/45933: fsck_ffs can't find wapbl journal on 4k-block disks
Date: Sun, 13 Jul 2014 18:39:29 +0200
On Sun, Jul 13, 2014 at 02:10:15PM +0000, David Krinsky wrote:
> Michael,
>
> Thank you for the response. But if this is the full explanation, why
> does fsck in the domU fail? That is what I am seeing-- this bug
> doesn't depend on creating a filesystem in a domU and then fscking it
> from dom0.
Can you please dump the superblock with 'dumpfs -s' and the device
properties with 'drvctl -p' from inside the DomU ?
Greetings,
--
Michael van Elst
Internet: mlelstv@serpens.de
"A potential Snark may lurk in every tree."
From: David Krinsky <krinsky@bantha.org>
To: Michael van Elst <mlelstv@serpens.de>
Cc: gnats-bugs@NetBSD.org, mlelstv@NetBSD.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org
Subject: Re: bin/45933: fsck_ffs can't find wapbl journal on 4k-block disks
Date: Sun, 13 Jul 2014 18:38:51 -0400
On Sun, Jul 13, 2014 at 06:39:29PM +0200, Michael van Elst (mlelstv@serpens.de) wrote:
> On Sun, Jul 13, 2014 at 02:10:15PM +0000, David Krinsky wrote:
>
> > Michael,
> >
> > Thank you for the response. But if this is the full explanation, why
> > does fsck in the domU fail? That is what I am seeing-- this bug
> > doesn't depend on creating a filesystem in a domU and then fscking it
> > from dom0.
>
>
> Can you please dump the superblock with 'dumpfs -s' and the device
> properties with 'drvctl -p' from inside the DomU ?
# dumpfs -s /dev/xbd4
file system: /dev/xbd4d
format FFSv2
endian little-endian
location 65536 (-b 16)
magic 19540119 time Sun Jul 13 08:27:32 2014
superblock location 65536 id [ 4f2dc109 39cab8dc ]
cylgrp dynamic inodes FFSv2 sblock FFSv2 fslevel 5
nbfree 4129308 ndir 1 nifree 8269435 nffree 13
ncg 177 size 33554432 blocks 33034735
bsize 32768 shift 15 mask 0xffff8000
fsize 4096 shift 12 mask 0xfffff000
frag 8 shift 3 fsbtodb 0
bpg 23697 fpg 189576 ipg 46720
minfree 5% optim time maxcontig 2 maxbpg 4096
symlinklen 120 contigsumsize 2
maxfilesize 0x000800800805ffff
nindir 4096 inopb 128
avgfilesize 16384 avgfpdir 64
sblkno 24 cblkno 32 iblkno 40 dblkno 2960
sbsize 4096 cgsize 32768
csaddr 2960 cssize 4096
cgrotor 0 fmod 0 ronly 0 clean 0x01
wapbl version 0x1 location 1 flags 0x0
wapbl loc0 268435456 loc1 131072 loc2 512 loc3 0
flags none
fsmnt /var/mail
volname swuid 0
# drvctl -p xbd4
Properties for device `xbd4':
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>device-driver</key>
<string>xbd</string>
<key>device-unit</key>
<integer>0x4</integer>
<key>disk-info</key>
<dict>
<key>geometry</key>
<dict>
<key>cylinders-per-unit</key>
<integer>0x20040</integer>
<key>sector-size</key>
<integer>0x200</integer>
<key>sectors-per-track</key>
<integer>0x800</integer>
<key>sectors-per-unit</key>
<integer>0x10020000</integer>
<key>tracks-per-cylinder</key>
<integer>0x1</integer>
</dict>
</dict>
</dict>
</plist>
From: Michael van Elst <mlelstv@serpens.de>
To: David Krinsky <krinsky@bantha.org>
Cc: gnats-bugs@NetBSD.org, mlelstv@NetBSD.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org
Subject: Re: bin/45933: fsck_ffs can't find wapbl journal on 4k-block disks
Date: Tue, 15 Jul 2014 06:33:58 +0200
On Sun, Jul 13, 2014 at 06:38:51PM -0400, David Krinsky wrote:
> # dumpfs -s /dev/xbd4
> wapbl version 0x1 location 1 flags 0x0
> wapbl loc0 268435456 loc1 131072 loc2 512 loc3 0
location = 1 UFS_WAPBL_JOURNALLOC_END_PARTITION
loc0 = 268435456 Journal placed at this physical offset
loc1 = 131072 Journal size is 128k sectors.
loc2 = 512 Physical sector size 512 bytes.
So WAPBL says it uses 512 byte sectors.
> # drvctl -p xbd4
> <key>sector-size</key>
> <integer>0x200</integer>
The disk driver also reports 512 byte sectors.
However, fsck uses this value only to find the superblock and does
a recalculation of the sector size based on the superblock contents,
so that it can support filesystem images.
dev_bsize = sblock->fs_fsize / FFS_FSBTODB(sblock, 1);
with
> fsize 4096 shift 12 mask 0xfffff000
> frag 8 shift 3 fsbtodb 0
this evaluates to:
4096 / (1 << 0)
The problem is that the fsbtodb value in the superblock is wrong.
The value holds the shift value between filesystem block size
(which is really the fragsize not the blocksize!) and
the physical sector size. So the superblock says the physical
sector size is 4096.
Did you create the filesystem with newfs -S 4096 or on the Dom0 ?
That would explain why the superblock has the wrong value.
Kernel code works because it ignores the fsbtodb value. It also
created the journal at the first mount and stored its location
based on the actual driver values.
Greetings,
--
Michael van Elst
Internet: mlelstv@serpens.de
"A potential Snark may lurk in every tree."
From: "Michael van Elst" <mlelstv@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/45933 CVS commit: src/sys/kern
Date: Sun, 9 Aug 2015 07:40:59 +0000
Module Name: src
Committed By: mlelstv
Date: Sun Aug 9 07:40:59 UTC 2015
Modified Files:
src/sys/kern: vfs_wapbl.c
Log Message:
Refactor disk address calculation from physical block numbers in
the journal into a function. Make that function work correctly with
sector sizes != DEV_BSIZE when compiled outside the kernel (i.e.
fsck_ffs).
Fixes PR bin/45933
To generate a diff of this commit:
cvs rdiff -u -r1.61 -r1.62 src/sys/kern/vfs_wapbl.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/45933 CVS commit: [netbsd-7] src/sys/kern
Date: Sun, 9 Aug 2015 10:15:15 +0000
Module Name: src
Committed By: martin
Date: Sun Aug 9 10:15:15 UTC 2015
Modified Files:
src/sys/kern [netbsd-7]: vfs_wapbl.c
Log Message:
Pull up following revision(s) (requested by mlelstv in ticket #943):
sys/kern/vfs_wapbl.c: revision 1.62
Refactor disk address calculation from physical block numbers in
the journal into a function. Make that function work correctly with
sector sizes != DEV_BSIZE when compiled outside the kernel (i.e.
fsck_ffs).
Fixes PR bin/45933
To generate a diff of this commit:
cvs rdiff -u -r1.59 -r1.59.4.1 src/sys/kern/vfs_wapbl.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: open->feedback
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Sat, 05 Sep 2015 22:53:16 +0000
State-Changed-Why:
should be fixed in -7; please check
(I have never seen this myself, not sure why)
State-Changed-From-To: feedback->closed
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Fri, 30 Sep 2016 08:00:16 +0000
State-Changed-Why:
1-year feedback timeout
>Unformatted:
(Contact us)
$NetBSD: query-full-pr,v 1.39 2013/11/01 18:47:49 spz Exp $
$NetBSD: gnats_config.sh,v 1.8 2006/05/07 09:23:38 tsutsui Exp $
Copyright © 1994-2014
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.