NetBSD Problem Report #50108
From www@NetBSD.org Fri Jul 31 21:52:30 2015
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (verified OK))
by mollari.NetBSD.org (Postfix) with ESMTPS id 302A5A6551
for <gnats-bugs@gnats.NetBSD.org>; Fri, 31 Jul 2015 21:52:30 +0000 (UTC)
Message-Id: <20150731215228.C525CA6551@mollari.NetBSD.org>
Date: Fri, 31 Jul 2015 21:52:28 +0000 (UTC)
From: dudinea@gmail.com
Reply-To: dudinea@gmail.com
To: gnats-bugs@NetBSD.org
Subject: fsck_ffs fails replaying wapbl journal on filesystem with 4k sectors
X-Send-Pr-Version: www-1.0
>Number: 50108
>Category: bin
>Synopsis: fsck_ffs fails replaying wapbl journal on filesystem with 4k sectors
>Confidential: no
>Severity: critical
>Priority: medium
>Responsible: mlelstv
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Fri Jul 31 21:55:00 +0000 2015
>Closed-Date: Sun Nov 20 07:33:54 +0000 2022
>Last-Modified: Sun Nov 20 07:33:54 +0000 2022
>Originator: Eugene Doudine
>Release: NetBSD 7.99.20
>Organization:
>Environment:
NetBSD nbsd1.trekflow.com 7.99.20 NetBSD 7.99.20 (GENERIC) #1: Wed Jul 29 09:54:29 IDT 2015 root@nbsd1.trekflow.com:/usr/obj/sys/arch/amd64/compile/GENERIC amd64
fsck_ffs ident:
$NetBSD: crt0.S,v 1.3 2011/07/01 02:59:05 joerg Exp $
$NetBSD: crt0-common.c,v 1.13 2013/01/31 22:24:25 matt Exp $
$NetBSD: crti.S,v 1.1 2010/08/07 18:01:35 joerg Exp $
$NetBSD: crtbegin.S,v 1.2 2010/11/30 18:37:59 joerg Exp $
$NetBSD: dir.c,v 1.57 2013/06/23 07:28:36 dholland Exp $
$NetBSD: inode.c,v 1.71 2014/04/05 12:32:27 justin Exp $
$NetBSD: main.c,v 1.83 2015/06/16 23:58:30 christos Exp $
$NetBSD: pass1.c,v 1.56 2013/10/19 01:09:58 christos Exp $
$NetBSD: pass1b.c,v 1.23 2013/01/22 09:39:12 dholland Exp $
$NetBSD: pass2.c,v 1.50 2013/06/09 17:57:09 dholland Exp $
$NetBSD: pass3.c,v 1.21 2013/01/22 09:39:12 dholland Exp $
$NetBSD: pass4.c,v 1.28 2013/06/23 22:03:34 dholland Exp $
$NetBSD: pass5.c,v 1.54 2013/06/23 22:03:34 dholland Exp $
$NetBSD: fsutil.c,v 1.26 2015/06/21 04:01:40 dholland Exp $
$NetBSD: setup.c,v 1.100 2013/06/23 07:28:36 dholland Exp $
$NetBSD: utilities.c,v 1.64 2013/10/19 01:09:58 christos Exp $
$NetBSD: ffs_bswap.c,v 1.39 2015/05/20 18:21:17 riastradh Exp $
$NetBSD: ffs_subr.c,v 1.48 2013/10/20 00:29:10 htodd Exp $
$NetBSD: ffs_tables.c,v 1.9 2005/12/11 12:25:25 christos Exp $
$NetBSD: ffs_appleufs.c,v 1.15 2015/02/15 11:04:43 maxv Exp $
$NetBSD: partutil.c,v 1.15 2015/06/03 17:53:23 martin Exp $
$NetBSD: quota2_subr.c,v 1.5 2012/02/05 14:19:04 dholland Exp $
$NetBSD: progress.c,v 1.5 2009/04/11 06:48:36 lukem Exp $
$NetBSD: vfs_wapbl.c,v 1.61 2014/10/18 08:33:29 snj Exp $
$NetBSD: wapbl.c,v 1.5 2010/03/06 11:31:40 mlelstv Exp $
$NetBSD: crtend.S,v 1.1 2010/08/07 18:01:34 joerg Exp $
$NetBSD: crtn.S,v 1.1 2010/08/07 18:01:35 joerg Exp $
>Description:
When trying to run fsck on a FFS filesystem that has been created with 4k sectors, mounted with -o log and then not cleanly unmounted,
fsck_ffs fails when it's replaying the journal and tries to write wrong blocks off the device end:
fsck_ffs -d -d /dev/rvnd0a
** /dev/rvnd0a
wapbl_replay_start: vp=0x0 off=91808 count=8192 blksize=4096
wapbl_read: 8192 bytes at block 91808 on fd 0x3
wapbl_replay: head=5332992 tail=5095424 off=8192 len=33546240 used=237568
wapbl_read: 4096 bytes at block 93052 on fd 0x3
wapbl_read: 4096 bytes at block 93108 on fd 0x3
wapbl_read: 4096 bytes at block 93109 on fd 0x3
** File system is journaled; replaying journal
wapbl_read: 4096 bytes at block 93092 on fd 0x3
wapbl_write: 4096 bytes at block 367616 on fd 0x4
CANNOT WRITE: BLK 367616
CONTINUE? [yn] n
Looks like it's trying to replay journal into blocks
that lie off the device ends. I suppose that it can
corrupt file system as well when wrong blocks to be written
lie within device boundaries.
Details of the device and filesystem above:
device created with:
vnconfig -c vnd0 ./diskimg.bin 4096/50/20/100
dumpfs -s /dev/rvnd0a:
file system: /dev/rvnd0a
format FFSv1
endian little-endian
magic 11954 time Sun Jul 26 21:16:49 2015
superblock location 8192 id [ 55b51630 27fdad44 ]
cylgrp dynamic inodes 4.4BSD sblock FFSv2 fslevel 4
nbfree 11377 ndir 1 nifree 22522 nffree 11
ncg 4 size 91808 blocks 91031
bsize 32768 shift 15 mask 0xffff8000
fsize 4096 shift 12 mask 0xfffff000
frag 8 shift 3 fsbtodb 0
bpg 2869 fpg 22952 ipg 5632
minfree 5% optim time maxcontig 2 maxbpg 8192
symlinklen 60 contigsumsize 2
maxfilesize 0x004002001005ffff
nindir 8192 inopb 256
avgfilesize 16384 avgfpdir 64
sblkno 8 cblkno 16 iblkno 24 dblkno 200
sbsize 4096 cgsize 8192
csaddr 200 cssize 4096
cgrotor 0 fmod 0 ronly 0 clean 0x02
wapbl version 0x1 location 1 flags 0x0
wapbl loc0 91808 loc1 8192 loc2 4096 loc3 0
flags wapbl
fsmnt /mnt/test_fsck
volname swuid 0
>How-To-Repeat:
Create an FFS filesystem on 4k system disk or with -S 4096 option
Mount with -o log
Crash
fsck /dev/{DEVNAME}
I've been able to reproduce problem on both real and vnode
devices. On the other hand kernel can replay journal
on such filesystems without errors.
>Fix:
>Release-Note:
>Audit-Trail:
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: bin/50108: fsck_ffs fails replaying wapbl journal on filesystem
with 4k sectors
Date: Sun, 2 Aug 2015 07:48:59 +0000
On Fri, Jul 31, 2015 at 09:55:00PM +0000, dudinea@gmail.com wrote:
> When trying to run fsck on a FFS filesystem that has been created
> with 4k sectors, mounted with -o log and then not cleanly
> unmounted, fsck_ffs fails when it's replaying the journal and tries
> to write wrong blocks off the device end:
>
> fsck_ffs -d -d /dev/rvnd0a
> ** /dev/rvnd0a
> wapbl_replay_start: vp=0x0 off=91808 count=8192 blksize=4096
> wapbl_read: 8192 bytes at block 91808 on fd 0x3
> wapbl_replay: head=5332992 tail=5095424 off=8192 len=33546240 used=237568
> wapbl_read: 4096 bytes at block 93052 on fd 0x3
> wapbl_read: 4096 bytes at block 93108 on fd 0x3
> wapbl_read: 4096 bytes at block 93109 on fd 0x3
> ** File system is journaled; replaying journal
> wapbl_read: 4096 bytes at block 93092 on fd 0x3
> wapbl_write: 4096 bytes at block 367616 on fd 0x4
>
> CANNOT WRITE: BLK 367616
> CONTINUE? [yn] n
>
> Looks like it's trying to replay journal into blocks
> that lie off the device ends. I suppose that it can
> corrupt file system as well when wrong blocks to be written
> lie within device boundaries.
The cause of this is almost certainly wrong unit conversion
somewhere.
However, I notice that the dumpfs output says that there are 91808
blocks in the fs. So reading from block 91808 (not to mention past
that) and succeeding seems odd - does this volume have the journal
past the end of the volume data?
Anyway, 367616 is 91904 * 4, and 91904 is reasonably close to the
other blocks it's accessing so it's a plausible correct block value.
Which would imply that there's a stray factor of 4x somewhere. I'd
have expected 8x (4096 / 512) but whatever...
If you (or someone) can catch this in the debugger and then trace the
provenance of that 367616 my guess is that it's been multiplied by the
wrong thing.
Although this is not trivial; for the moment I can't tell for sure if
wapbl_read and wapbl_write are supposed to be getting block numbers in
frags or device-level blocks or DEV_BSIZE blocks - it looks like in
fsck it's supposed to be device-level blocks, but I'm not sure it's
the same in the kernel... and I'm not sure that's what it's getting
either (from reading the code) (which is what one would expect given
the bug report I suppose).
unfortunately the units are obscure and most of the conversion macros
are named unhelpful things like 'sntod' or 'fsbtodb' where the letters
aren't necessarily related much to the units they convert.
--
David A. Holland
dholland@netbsd.org
From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: bin/50108: fsck_ffs fails replaying wapbl journal on filesystem with 4k sectors
Date: Sun, 2 Aug 2015 10:45:07 +0000 (UTC)
dholland-bugs@netbsd.org (David Holland) writes:
> If you (or someone) can catch this in the debugger and then trace the
> provenance of that 367616 my guess is that it's been multiplied by the
> wrong thing.
> Although this is not trivial; for the moment I can't tell for sure if
> wapbl_read and wapbl_write are supposed to be getting block numbers in
> frags or device-level blocks or DEV_BSIZE blocks
It depends on wether it is kernel (DEV_BSIZE) or userland (sector size).
The fsck_ffs output shows that the journal blocks can be read
successfully. The write however failed, and assuming this is
a unit problem and the real block is 91904 or even 45952, it's
a block inside the filesystem.
Writes to blocks inside the filesystem using wapl_write are
caused by replaying the journal... and the block numbers that
appear inside the journal are written by the kernel using
its DEV_BSIZE units. When replayed by fsck_ffs in userland
however, these are interpreted as sector size units.
You can dump the journal with:
dumpfs -sjv vnd0a
This should show a block with 'daddr 367616' in the journal.
There are 4 places in the kernel code in vfs_wapbl.c (also used by
fsck_ffs) that retrieve block numbers from the journal.
wapbl_blkhash_ins(wr, wc->wc_blocks[i].wc_daddr + btodb(j * fsblklen), *offp);
wapbl_blkhash_rem(wr, wc->wc_blocks[i].wc_daddr + btodb(j * fsblklen));
and twice in the disabled verify function:
wapbl_blkhash_get(wr, wc->wc_blocks[i].wc_daddr + btodb(j * fsblklen));
Unfortunately there is no magic to transparently get the right
unit conversion here.
You could replace
wc->wc_blocks[i].wc_daddr + btodb(j * fsblklen)
with
wapbl_block_daddr(wc, i, j, fsblklen)
for something like:
daddr_t
wapbl_block_daddr(struct wapbl_wc_blocklist *wc, int i, int j, int blen)
{
#ifdef _KERNEL
return wc->wc_blocks[i].wc_daddr + btodb(j * blen);
#else
return dbtob(wc->wc_blocks[i].wc_daddr) / blen + j;
#endif
}
--
--
Michael van Elst
Internet: mlelstv@serpens.de
"A potential Snark may lurk in every tree."
Responsible-Changed-From-To: bin-bug-people->mlelstv
Responsible-Changed-By: mlelstv@NetBSD.org
Responsible-Changed-When: Sun, 09 Aug 2015 07:36:15 +0000
Responsible-Changed-Why:
I have a fix
State-Changed-From-To: open->analyzed
State-Changed-By: mlelstv@NetBSD.org
State-Changed-When: Sun, 09 Aug 2015 07:36:15 +0000
State-Changed-Why:
problem is understood
State-Changed-From-To: analyzed->open
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Mon, 17 Aug 2015 18:23:03 +0000
State-Changed-Why:
problem is not understood unless the explanation is posted to gnats :-)
From: Eugene <dudinea@gmail.com>
To: gnats-bugs@NetBSD.org
Cc: dholland@NetBSD.org, mlelstv@NetBSD.org, netbsd-bugs@netbsd.org,
gnats-admin@netbsd.org
Subject: Re: bin/50108 (fsck_ffs fails replaying wapbl journal on filesystem
with 4k sectors)
Date: Mon, 17 Aug 2015 23:02:40 +0300
On Mon, 17 Aug 2015 18:23:03 +0000 (UTC)
dholland@NetBSD.org wrote:
> Synopsis: fsck_ffs fails replaying wapbl journal on filesystem with
> 4k sectors
[...]
> problem is not understood unless the explanation is posted to
> gnats :-)
Why? Michael's explanation and fix look right: I've tested his proposed
solution, and it works right for both 4k and regular 512 byte sectors.
BTW, on the other hand, the idea of the fix - passing same function
argument with different units depending on user/kernel mode - does not
look elegant to me.
Wouldn't it be better to modify fsck_ffs code (that is the
wapbl_read()/wapbl_write() functions in sbin/fsck_ffs/wapbl.c -
kern/vfs_wapbl.c calls them back for IO when working in user mode)
to use the same DEV_BSIZE unit as kernel when replaying the journal?
--
Regards,
Eugene
mailto:dudinea@gmail.com
From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: bin/50108 (fsck_ffs fails replaying wapbl journal on filesystem with 4k sectors)
Date: Mon, 17 Aug 2015 23:11:51 +0000 (UTC)
dudinea@gmail.com (Eugene) writes:
>Why? Michael's explanation and fix look right: I've tested his proposed
>solution, and it works right for both 4k and regular 512 byte sectors.
It has already been committed to -current and netbsd-7.
>BTW, on the other hand, the idea of the fix - passing same function
>argument with different units depending on user/kernel mode - does not
>look elegant to me.
All disk I/O code originally addressed a block by physical block
number and DEV_BSIZE was the (one and only) physical block size.
At some point, about when the SCSI subsystem was integrated into
the kernel, the model was changed, the kernel now uses the fixed
DEV_BSIZE=512 coordinates and the driver translates that into
physical blocks.
Userland however still maintains the old view and there was lots
of code in the kernel that treated DEV_BSIZE as the physical block
size.
This wasn't a problem as virtually all disks used 512byte blocks
at that time. Both views were effectively the same.
When disks with other block sizes became popular, it was necessary
to decide how to handle this in kernel and userland.
Userland using physical blocks for addressing had no problems
except for determining the block size. This is done by querying
the device driver for low-level tools and by checking filesystem
parameters (== superblock) for filesystem tools.
The kernel mainly needed to fix all places where filesystem
to disk block translations still didn't use DEV_BSIZE.
One problem is shared code between kernel and userland. But
since all this code only dealt with the translation between filesytem
coordinates and disk coordinates, this is handled in exactly one
place, the fsbtodb/dbtofsb macros which are now defined differently
for kernel and userland.
WAPBL is an exception. It's the only filesystem-related code that
does not use filesystem coordinates but disk coordinates (in kernel
that's DEV_BSIZE units) but which are then passed in the journal to
userland code.
And since WAPBL is again shared code between kernel and userland the
different views need to be reflected by that code.
>Wouldn't it be better to modify fsck_ffs code (that is the
>wapbl_read()/wapbl_write() functions in sbin/fsck_ffs/wapbl.c -
>kern/vfs_wapbl.c calls them back for IO when working in user mode)
>to use the same DEV_BSIZE unit as kernel when replaying the journal?
The functions are used for both types of coordinates. When reading/writing
the journal it's filesystem coordinates. When transfering data between
filesystem and journal its disk coordinates (as written by the kernel).
The only way to get out of this split is to have the same view on
both sides again. Either revert the kernel to use physical block
numbers again (i.e. change all drivers and most filesystems)
or change userland to use DEV_BSIZE coordinates (become incompatibel
with other BSDs and third party software).
On the other hand, it doesn't really add complexity. Unit translations
are done all over the place and you just have to keep in mind that
kernel and userland have different views.
--
--
Michael van Elst
Internet: mlelstv@serpens.de
"A potential Snark may lurk in every tree."
From: Eugene <dudinea@gmail.com>
To: gnats-bugs@NetBSD.org
Cc: mlelstv@serpens.de, mlelstv@NetBSD.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org
Subject: Re: bin/50108 (fsck_ffs fails replaying wapbl journal on filesystem
with 4k sectors)
Date: Tue, 18 Aug 2015 06:59:39 +0300
On Mon, 17 Aug 2015 23:20:01 +0000 (UTC)
mlelstv@serpens.de (Michael van Elst) wrote:
> It has already been committed to -current and netbsd-7.
Thanks! But your commit states that it fixes bin/45933, not this one.
Is this by error or bin/45933 is really a manifestation of the same
bug?
>
> >BTW, on the other hand, the idea of the fix - passing same function
[ skipped ]
> On the other hand, it doesn't really add complexity. Unit
> translations are done all over the place and you just have to keep in
> mind that kernel and userland have different views.
Thank you for your great explanation.
--
Regards,
Eugene
mailto:dudinea@gmail.com
From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: bin/50108 (fsck_ffs fails replaying wapbl journal on filesystem with 4k sectors)
Date: Tue, 18 Aug 2015 16:16:21 +0000 (UTC)
dudinea@gmail.com (Eugene) writes:
>On Mon, 17 Aug 2015 23:20:01 +0000 (UTC)
>mlelstv@serpens.de (Michael van Elst) wrote:
>> It has already been committed to -current and netbsd-7.
>Thanks! But your commit states that it fixes bin/45933, not this one.
>Is this by error or bin/45933 is really a manifestation of the same
>bug?
45933 is probably a different error, but even with correct superblock
it would have failed (and then fsck might have trashed the filesystem).
--
--
Michael van Elst
Internet: mlelstv@serpens.de
"A potential Snark may lurk in every tree."
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: bin/50108 (fsck_ffs fails replaying wapbl journal on filesystem
with 4k sectors)
Date: Sat, 5 Sep 2015 23:43:52 +0000
On Mon, Aug 17, 2015 at 08:05:01PM +0000, Eugene wrote:
> [...]
> > problem is not understood unless the explanation is posted to
> > gnats :-)
> Why? Michael's explanation and fix look right: I've tested his proposed
> solution, and it works right for both 4k and regular 512 byte sectors.
It's not included in the PR. It *is* included in 45933, but I'd
forgotten about that one and it wasn't referenced in here here until
later.
If the issue is "understood" but doesn't actually get fixed, what
happens is that someone coming by months (or years) later has no idea
what was understood and has to redo the work. (Or if they're lucky
they can ask the original person, but things get forgotten.)
so it was just a procedural complaint.
> Wouldn't it be better to modify fsck_ffs code (that is the
> wapbl_read()/wapbl_write() functions in sbin/fsck_ffs/wapbl.c -
> kern/vfs_wapbl.c calls them back for IO when working in user mode)
> to use the same DEV_BSIZE unit as kernel when replaying the journal?
This is a thorny and ugly issue and there isn't any one good solution
to it that doesn't involve digging up and chewing over all of ffs. :(
--
David A. Holland
dholland@netbsd.org
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: bin/50108 (fsck_ffs fails replaying wapbl journal on filesystem
with 4k sectors)
Date: Sat, 5 Sep 2015 23:46:05 +0000
On Tue, Aug 18, 2015 at 04:20:00PM +0000, Michael van Elst wrote:
> >Thanks! But your commit states that it fixes bin/45933, not this one.
> >Is this by error or bin/45933 is really a manifestation of the same
> >bug?
>
> 45933 is probably a different error, but even with correct superblock
> it would have failed (and then fsck might have trashed the filesystem).
...so, what else do we understand about how this PR is different from
45933?
--
David A. Holland
dholland@netbsd.org
From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: bin/50108 (fsck_ffs fails replaying wapbl journal on filesystem with 4k sectors)
Date: Sun, 6 Sep 2015 08:19:40 +0000 (UTC)
dholland-bugs@netbsd.org (David Holland) writes:
> ...so, what else do we understand about how this PR is different from
> 45933?
45933 is caused by a bad superblock. The superblock encodes the
physical sector size as seen by newfs. Userland tools like fsck
use that information when reading a filesystem image. The kernel
ignores that information, the disk driver has its own data to
rely on.
So when you make a filesystem image visible to multiple kernels
(like in: Xen host vs. Xen guest) it doesn't matter. But the
userland running under either kernel still cares.
--
--
Michael van Elst
Internet: mlelstv@serpens.de
"A potential Snark may lurk in every tree."
State-Changed-From-To: open->closed
State-Changed-By: mlelstv@NetBSD.org
State-Changed-When: Sun, 20 Nov 2022 07:33:54 +0000
State-Changed-Why:
The problem has been fixed in netbsd-7Ã.
>Unformatted:
Code similar to the suggested has been committed in vfs_wapl.c 1.62
and pulled up to netbsd-7.
This probably needs to be applied to netbsd-6 as well.
(Contact us)
$NetBSD: query-full-pr,v 1.47 2022/09/11 19:34:41 kim Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2022
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.