NetBSD Problem Report #45933

From krinsky@bantha.org  Mon Feb  6 03:28:01 2012
Return-Path: <krinsky@bantha.org>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
	by www.NetBSD.org (Postfix) with ESMTP id 6DBC163BCF4
	for <gnats-bugs@gnats.NetBSD.org>; Mon,  6 Feb 2012 03:28:01 +0000 (UTC)
Message-Id: <20120206032759.14F4D139851@bantha.org>
Date: Sun,  5 Feb 2012 22:27:59 -0500 (EST)
From: krinsky@bantha.org
Reply-To: krinsky@bantha.org
To: gnats-bugs@gnats.NetBSD.org
Subject: fsck_ffs can't find wapbl journal on 4k-block disks
X-Send-Pr-Version: 3.95

>Number:         45933
>Category:       bin
>Synopsis:       fsck_ffs can't find wapbl journal on 4k-block disks
>Confidential:   no
>Severity:       critical
>Priority:       medium
>Responsible:    mlelstv
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Feb 06 03:30:00 +0000 2012
>Last-Modified:  Tue Jul 15 04:35:01 +0000 2014
>Originator:     David Krinsky
>Release:        NetBSD 5.99.59
>Organization:
>Environment:


System: NetBSD foundation.krinsky.net 5.99.59 NetBSD 5.99.59 (NEWBANTHA_XEN3_DOM0) #0: Thu Jan 12 20:56:21 EST 2012 root@foundation.krinsky.net:/usr/obj/sys/arch/amd64/compile/NEWBANTHA_XEN3_DOM0 amd64
Architecture: x86_64
Machine: amd64

fsck_ffs ident:
     $NetBSD: crt0-common.c,v 1.7 2011/06/30 20:07:35 matt Exp $
     $NetBSD: crt0.S,v 1.3 2011/07/01 02:59:05 joerg Exp $
     $NetBSD: crti.S,v 1.1 2010/08/07 18:01:35 joerg Exp $
     $NetBSD: crtbegin.S,v 1.2 2010/11/30 18:37:59 joerg Exp $
     $NetBSD: dir.c,v 1.53 2011/03/06 17:08:16 bouyer Exp $
     $NetBSD: inode.c,v 1.64 2011/03/06 17:08:16 bouyer Exp $
     $NetBSD: main.c,v 1.80 2011/08/29 14:34:59 joerg Exp $
     $NetBSD: pass1.c,v 1.49 2011/08/14 12:32:01 christos Exp $
     $NetBSD: pass1b.c,v 1.22 2011/03/06 17:08:16 bouyer Exp $
     $NetBSD: pass2.c,v 1.48 2011/08/14 12:32:01 christos Exp $
     $NetBSD: pass3.c,v 1.20 2011/03/06 17:08:16 bouyer Exp $
     $NetBSD: pass4.c,v 1.26 2011/03/06 17:08:16 bouyer Exp $
     $NetBSD: pass5.c,v 1.49 2011/08/14 12:32:01 christos Exp $
     $NetBSD: fsutil.c,v 1.20 2011/06/09 19:57:50 christos Exp $
     $NetBSD: setup.c,v 1.94 2011/08/14 12:32:01 christos Exp $
     $NetBSD: utilities.c,v 1.60 2011/06/09 19:57:52 christos Exp $
     $NetBSD: ffs_bswap.c,v 1.35 2011/03/06 17:08:38 bouyer Exp $
     $NetBSD: ffs_subr.c,v 1.47 2011/08/14 12:37:09 christos Exp $
     $NetBSD: ffs_tables.c,v 1.9 2005/12/11 12:25:25 christos Exp $
     $NetBSD: ffs_appleufs.c,v 1.12 2011/11/19 22:51:31 tls Exp $
     $NetBSD: partutil.c,v 1.11 2011/11/13 22:04:51 christos Exp $
     $NetBSD: quota2_subr.c,v 1.4 2011/06/07 14:56:13 bouyer Exp $
     $NetBSD: progress.c,v 1.5 2009/04/11 06:48:36 lukem Exp $
     $NetBSD: vfs_wapbl.c,v 1.48 2011/12/02 12:38:59 yamt Exp $
     $NetBSD: wapbl.c,v 1.5 2010/03/06 11:31:40 mlelstv Exp $
     $NetBSD: crtend.S,v 1.1 2010/08/07 18:01:34 joerg Exp $
     $NetBSD: crtn.S,v 1.1 2010/08/07 18:01:35 joerg Exp $

>Description:

When attempting to fsck an FFSv2 filesystem that had been mounted with "-o log"
but then not cleanly unmounted, fsck_ffs fails to find and read the journal if
the filesystem is on a device with a sector size of 4K:

    foundation# fsck /dev/mapper/rvgr00-wapbltest
    ** /dev/mapper/rvgr00-wapbltest

    CANNOT READ: BLK 8388608
    CONTINUE? [yn] y

    THE FOLLOWING DISK SECTORS COULD NOT BE READ: 67108864 (8388608), 67108865 (8388608),
    Unrecognized wapbl magic: 0x00000000
    UNABLE TO READ JOURNAL FOR REPLAY
    CONTINUE? [yn] 

(This example is from a 4096 MiB FFSv2 filesystem located on a 4160
MiB LVM logical volume, which is ultimately resident on a RAID1
RAIDframe array of two 2TB Seagate Barracuda Green drives--which have
4K disk sectors.  But I've observed the same thing on filesystems
created on the same drive directly with no RAID or LVM.  Note also
that the only crashes I've experienced to trigger this have been by
killing a Xen DomU with xm destroy, but the problem doesn't appear
Xen-related; the DomUs, like the Dom0 above, are running recent
versions of -current for amd64.)

Block 8388608 is the correct location of the WAPBL journal in 512-byte blocks;
manually doing 

    dd if=/dev/mapper/rvgr00-wapbltest of=/tmp/whatever bs=512 count=131072 skip=8388608

succeeds, and /tmp/whatever is not just zeroes.

>How-To-Repeat:
    - Create an FFSv2 filesystem on a device with 4K blocks (I have been leaving space 
	for a WAPBL journal after the filesystem but it seems unlikely that this is
	necessary.)
    - mount -o log /dev/{filesystem} /mnt
    - Crash
    - fsck /dev/{filesystem}

>Fix:

The problem appears to be that check_wapbl() in sbin/fsck_ffs/wapbl.c calls
wapbl_replay_start with addr=8388608 and blksize=512, as apparently specified in the
superblock:

(gdb) bt
#0  bread (fd=5, buf=0x7f7ff7b1e000 "", blk=8388608, size=1024)
    at /usr/src/sbin/fsck_ffs/utilities.c:342
#1  0x000000000041cf82 in wapbl_read (data=0x7f7ff7b1e000, len=1024, devvp=0x0, pbn=8388608)
    at /usr/src/sbin/fsck_ffs/wapbl.c:84
#2  0x000000000041c426 in wapbl_replay_start (wrp=0x624850, vp=0x0, off=8388608, 
    count=131072, blksize=512) at /usr/src/sys/kern/vfs_wapbl.c:2421
#3  0x000000000041d4c3 in check_wapbl () at /usr/src/sbin/fsck_ffs/wapbl.c:210
#4  0x0000000000411d68 in setup (dev=0x7f7ff7b01060 "/dev/mapper/rvgr00-wapbltest", 
    origdev=0x7f7ff7b01060 "/dev/mapper/rvgr00-wapbltest")
    at /usr/src/sbin/fsck_ffs/setup.c:185
#5  0x0000000000408f35 in checkfilesys (
    filesys=0x7f7ff7b01060 "/dev/mapper/rvgr00-wapbltest", 
    origfs=0x7f7ff7b01060 "/dev/mapper/rvgr00-wapbltest", child=0)
    at /usr/src/sbin/fsck_ffs/main.c:306
#6  0x0000000000408de7 in main (argc=0, argv=0x7f7fffffdc60)
    at /usr/src/sbin/fsck_ffs/main.c:245
(gdb) fr 3
#3  0x000000000041d4c3 in check_wapbl () at /usr/src/sbin/fsck_ffs/wapbl.c:210
210                                     error = wapbl_replay_start(
(gdb) l
205                                     sbdirty();
206                                     ret = FSCK_EXIT_CHECK_FAILED;
207                                     break;
208                             }
209                             if (sblock->fs_flags & FS_DOWAPBL) {
210                                     error = wapbl_replay_start(
211                                         &wapbl_replay, 0, addr, count, blksize);
212                                     if (error) {
213                                             pfatal(
214                                                "UNABLE TO READ JOURNAL FOR REPLAY");
(gdb) p addr
$23 = 8388608
(gdb) p blksize
$24 = 512

But then wapbl_replay_start doesn't pass this information on to wapbl_read--it just
calls wapbl_read with the address in 512-byte blocks (as "pbn") without specifying
the blocksize:

(gdb) fr 2
#2  0x000000000041c426 in wapbl_replay_start (wrp=0x624850, vp=0x0, off=8388608, 
    count=131072, blksize=512) at /usr/src/sys/kern/vfs_wapbl.c:2421
2421            error = wapbl_read(scratch, 2<<log_dev_bshift, devvp, pbn);
(gdb) l
2416    
2417            pbn = logpbn;
2418    #ifdef _KERNEL
2419            pbn = btodb(pbn << log_dev_bshift);
2420    #endif
2421            error = wapbl_read(scratch, 2<<log_dev_bshift, devvp, pbn);
2422            if (error)
2423                    goto errout;
2424    
2425            wch = (struct wapbl_wc_header *)scratch;
(gdb) p pbn
$25 = 8388608

and the ultimate result is that bread tries to pread from a block way beyond the end of the
partition (i.e., 4096*pbn bytes into the partition instead of 512*pbn bytes), because
dev_bsize == 4096:

(gdb) fr 0
#0  bread (fd=5, buf=0x7f7ff7b1e000 "", blk=8388608, size=1024)
    at /usr/src/sbin/fsck_ffs/utilities.c:344
344             if ((pread(fd, buf, (int)size, offset) == size) &&
(gdb) l
339             int i, errs;
340             off_t offset;
341     
342             offset = blk;
343             offset *= dev_bsize;
344             if ((pread(fd, buf, (int)size, offset) == size) &&
345                 read_wapbl(buf, size, blk) == 0)
346                     return (0);
347             rwerror("READ", blk);
348             errs = 0;
(gdb) p offset
$37 = 34359738368
(gdb) p blk
$38 = 8388608
(gdb) p dev_bsize
$39 = 4096

I am not sufficiently familiar with this code to tell whether the cleanest solution
is to pass the blocksize further down, to recalculate pbn in terms of the larger
correct blocksize, to make sure the location of the log is stored in units that match
the device, or something else.

>Release-Note:

>Audit-Trail:

Responsible-Changed-From-To: bin-bug-people->mlelstv
Responsible-Changed-By: mlelstv@NetBSD.org
Responsible-Changed-When: Sun, 13 Jul 2014 08:35:51 +0000
Responsible-Changed-Why:
my code


From: Michael van Elst <mlelstv@serpens.de>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: bin/45933: fsck_ffs can't find wapbl journal on 4k-block disks
Date: Sun, 13 Jul 2014 11:41:25 +0200

 The code really does the right thing because the superblock stores the
 journal position in physical block numbers(*).

 The WAPBL code in the kernel then translates to 512byte block numbers
 because that's what all disk drivers in NetBSD expect. The disk driver
 (e.g. scsipi) then translates back to physical block numbers.

 In Userland the physical block number is translated by
 wapbl_read/wapbl_write to a byte offset by multiplying it with
 the physical sector size and the byte offset is used to seek
 the device or image.

 (*) So why does your superblock seem to have a logical block
 number for the journal position?

 The reason is Xen, the xen driver presents the virtual disk as a
 512byte/sector device independently of the underlying hardware.
 Therefore the DomU thinks it talks to such a device, and will write
 a journal position measured in 512byte blocks. fsck on the DomU
 also works because it uses the same geometry.

 This is fine, unless you try to access the filesystem from the Dom0
 which obviously knows that the disk sectors are larger. The
 superblock no longer matches reality and fsck does the wrong thing.

 Something similar happens when you copy a FFS image between disks with
 different sector sizes.




 Greetings,
 -- 
                                 Michael van Elst
 Internet: mlelstv@serpens.de
                                 "A potential Snark may lurk in every tree."

From: David Krinsky <krinsky@bantha.org>
To: gnats-bugs@NetBSD.org
Cc: mlelstv@NetBSD.org
Subject: Re: bin/45933: fsck_ffs can't find wapbl journal on 4k-block disks
Date: Sun, 13 Jul 2014 08:53:02 -0400

 On Sun, Jul 13, 2014 at 09:45:00AM +0000, Michael van Elst (mlelstv@serpens.de) wrote:
 >  The code really does the right thing because the superblock stores the
 >  journal position in physical block numbers(*).
 >  
 >  The WAPBL code in the kernel then translates to 512byte block numbers
 >  because that's what all disk drivers in NetBSD expect. The disk driver
 >  (e.g. scsipi) then translates back to physical block numbers.
 >  
 >  In Userland the physical block number is translated by
 >  wapbl_read/wapbl_write to a byte offset by multiplying it with
 >  the physical sector size and the byte offset is used to seek
 >  the device or image.

 Michael,

 Thank you for the response.  But if this is the full explanation, why
 does fsck in the domU fail?  That is what I am seeing-- this bug
 doesn't depend on creating a filesystem in a domU and then fscking it
 from dom0.  My problem--which I run into in the field any time one of
 my domUs crashes--is that fsck in the *domU* fails.  

 >  (*) So why does your superblock seem to have a logical block
 >  number for the journal position?
 >  
 >  The reason is Xen, the xen driver presents the virtual disk as a
 >  512byte/sector device independently of the underlying hardware.
 >  Therefore the DomU thinks it talks to such a device, and will write
 >  a journal position measured in 512byte blocks. fsck on the DomU
 >  also works because it uses the same geometry.

 It isn't working in my case--the wapbl code called by fsck in the domU
 is incorrectly treating the journal position as though it were in 4K
 blocks, doing the conversion, and fsck in the domU fails to find the
 journal.  

 David

From: Michael van Elst <mlelstv@serpens.de>
To: gnats-bugs@NetBSD.org
Cc: mlelstv@NetBSD.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
        krinsky@bantha.org
Subject: Re: bin/45933: fsck_ffs can't find wapbl journal on 4k-block disks
Date: Sun, 13 Jul 2014 18:39:29 +0200

 On Sun, Jul 13, 2014 at 02:10:15PM +0000, David Krinsky wrote:

 >  Michael,
 >  
 >  Thank you for the response.  But if this is the full explanation, why
 >  does fsck in the domU fail?  That is what I am seeing-- this bug
 >  doesn't depend on creating a filesystem in a domU and then fscking it
 >  from dom0.


 Can you please dump the superblock with 'dumpfs -s' and the device
 properties with 'drvctl -p' from inside the DomU ?


 Greetings,
 -- 
                                 Michael van Elst
 Internet: mlelstv@serpens.de
                                 "A potential Snark may lurk in every tree."

From: David Krinsky <krinsky@bantha.org>
To: Michael van Elst <mlelstv@serpens.de>
Cc: gnats-bugs@NetBSD.org, mlelstv@NetBSD.org, gnats-admin@netbsd.org,
	netbsd-bugs@netbsd.org
Subject: Re: bin/45933: fsck_ffs can't find wapbl journal on 4k-block disks
Date: Sun, 13 Jul 2014 18:38:51 -0400

 On Sun, Jul 13, 2014 at 06:39:29PM +0200, Michael van Elst (mlelstv@serpens.de) wrote:
 > On Sun, Jul 13, 2014 at 02:10:15PM +0000, David Krinsky wrote:
 > 
 > >  Michael,
 > >  
 > >  Thank you for the response.  But if this is the full explanation, why
 > >  does fsck in the domU fail?  That is what I am seeing-- this bug
 > >  doesn't depend on creating a filesystem in a domU and then fscking it
 > >  from dom0.
 > 
 > 
 > Can you please dump the superblock with 'dumpfs -s' and the device
 > properties with 'drvctl -p' from inside the DomU ?

 # dumpfs -s /dev/xbd4 
 file system: /dev/xbd4d
 format  FFSv2
 endian  little-endian
 location 65536  (-b 16)
 magic   19540119        time    Sun Jul 13 08:27:32 2014
 superblock location     65536   id      [ 4f2dc109 39cab8dc ]
 cylgrp  dynamic inodes  FFSv2   sblock  FFSv2   fslevel 5
 nbfree  4129308 ndir    1       nifree  8269435 nffree  13
 ncg     177     size    33554432        blocks  33034735
 bsize   32768   shift   15      mask    0xffff8000
 fsize   4096    shift   12      mask    0xfffff000
 frag    8       shift   3       fsbtodb 0
 bpg     23697   fpg     189576  ipg     46720
 minfree 5%      optim   time    maxcontig 2     maxbpg  4096
 symlinklen 120  contigsumsize 2
 maxfilesize 0x000800800805ffff
 nindir  4096    inopb   128
 avgfilesize 16384       avgfpdir 64
 sblkno  24      cblkno  32      iblkno  40      dblkno  2960
 sbsize  4096    cgsize  32768
 csaddr  2960    cssize  4096
 cgrotor 0       fmod    0       ronly   0       clean   0x01
 wapbl version 0x1       location 1      flags 0x0
 wapbl loc0 268435456    loc1 131072     loc2 512        loc3 0
 flags   none
 fsmnt   /var/mail
 volname         swuid   0

 # drvctl -p xbd4 
 Properties for device `xbd4':
 <?xml version="1.0" encoding="UTF-8"?>
 <!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
 <plist version="1.0">
 <dict>
         <key>device-driver</key>
         <string>xbd</string>
         <key>device-unit</key>
         <integer>0x4</integer>
         <key>disk-info</key>
         <dict>
                 <key>geometry</key>
                 <dict>
                         <key>cylinders-per-unit</key>
                         <integer>0x20040</integer>
                         <key>sector-size</key>
                         <integer>0x200</integer>
                         <key>sectors-per-track</key>
                         <integer>0x800</integer>
                         <key>sectors-per-unit</key>
                         <integer>0x10020000</integer>
                         <key>tracks-per-cylinder</key>
                         <integer>0x1</integer>
                 </dict>
         </dict>
 </dict>
 </plist>

From: Michael van Elst <mlelstv@serpens.de>
To: David Krinsky <krinsky@bantha.org>
Cc: gnats-bugs@NetBSD.org, mlelstv@NetBSD.org, gnats-admin@netbsd.org,
        netbsd-bugs@netbsd.org
Subject: Re: bin/45933: fsck_ffs can't find wapbl journal on 4k-block disks
Date: Tue, 15 Jul 2014 06:33:58 +0200

 On Sun, Jul 13, 2014 at 06:38:51PM -0400, David Krinsky wrote:

 > # dumpfs -s /dev/xbd4
 > wapbl version 0x1       location 1      flags 0x0
 > wapbl loc0 268435456    loc1 131072     loc2 512        loc3 0

 location = 1     UFS_WAPBL_JOURNALLOC_END_PARTITION
 loc0 = 268435456 Journal placed at this physical offset 
 loc1 = 131072    Journal size is 128k sectors.
 loc2 = 512       Physical sector size 512 bytes.

 So WAPBL says it uses 512 byte sectors.


 > # drvctl -p xbd4        
 >                         <key>sector-size</key>  
 >                         <integer>0x200</integer>

 The disk driver also reports 512 byte sectors.

 However, fsck uses this value only to find the superblock and does
 a recalculation of the sector size based on the superblock contents,
 so that it can support filesystem images.

 dev_bsize = sblock->fs_fsize / FFS_FSBTODB(sblock, 1);

 with

 > fsize   4096    shift   12      mask    0xfffff000
 > frag    8       shift   3       fsbtodb 0

 this evaluates to:

             4096             / (1 << 0)

 The problem is that the fsbtodb value in the superblock is wrong.
 The value holds the shift value between filesystem block size
 (which is really the fragsize not the blocksize!) and
 the physical sector size. So the superblock says the physical
 sector size is 4096.

 Did you create the filesystem with newfs -S 4096 or on the Dom0 ?
 That would explain why the superblock has the wrong value.

 Kernel code works because it ignores the fsbtodb value. It also
 created the journal at the first mount and stored its location
 based on the actual driver values.


 Greetings,
 -- 
                                 Michael van Elst
 Internet: mlelstv@serpens.de
                                 "A potential Snark may lurk in every tree."

>Unformatted:

NetBSD Home
NetBSD PR Database Search

(Contact us) $NetBSD: query-full-pr,v 1.39 2013/11/01 18:47:49 spz Exp $
$NetBSD: gnats_config.sh,v 1.8 2006/05/07 09:23:38 tsutsui Exp $
Copyright © 1994-2007 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.