NetBSD Problem Report #45933

From krinsky@bantha.org  Mon Feb  6 03:28:01 2012
Return-Path: <krinsky@bantha.org>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
	by www.NetBSD.org (Postfix) with ESMTP id 6DBC163BCF4
	for <gnats-bugs@gnats.NetBSD.org>; Mon,  6 Feb 2012 03:28:01 +0000 (UTC)
Message-Id: <20120206032759.14F4D139851@bantha.org>
Date: Sun,  5 Feb 2012 22:27:59 -0500 (EST)
From: krinsky@bantha.org
Reply-To: krinsky@bantha.org
To: gnats-bugs@gnats.NetBSD.org
Subject: fsck_ffs can't find wapbl journal on 4k-block disks
X-Send-Pr-Version: 3.95

>Number:         45933
>Category:       bin
>Synopsis:       fsck_ffs can't find wapbl journal on 4k-block disks
>Confidential:   no
>Severity:       critical
>Priority:       medium
>Responsible:    bin-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Feb 06 03:30:00 +0000 2012
>Originator:     David Krinsky
>Release:        NetBSD 5.99.59
>Organization:
>Environment:


System: NetBSD foundation.krinsky.net 5.99.59 NetBSD 5.99.59 (NEWBANTHA_XEN3_DOM0) #0: Thu Jan 12 20:56:21 EST 2012 root@foundation.krinsky.net:/usr/obj/sys/arch/amd64/compile/NEWBANTHA_XEN3_DOM0 amd64
Architecture: x86_64
Machine: amd64

fsck_ffs ident:
     $NetBSD: crt0-common.c,v 1.7 2011/06/30 20:07:35 matt Exp $
     $NetBSD: crt0.S,v 1.3 2011/07/01 02:59:05 joerg Exp $
     $NetBSD: crti.S,v 1.1 2010/08/07 18:01:35 joerg Exp $
     $NetBSD: crtbegin.S,v 1.2 2010/11/30 18:37:59 joerg Exp $
     $NetBSD: dir.c,v 1.53 2011/03/06 17:08:16 bouyer Exp $
     $NetBSD: inode.c,v 1.64 2011/03/06 17:08:16 bouyer Exp $
     $NetBSD: main.c,v 1.80 2011/08/29 14:34:59 joerg Exp $
     $NetBSD: pass1.c,v 1.49 2011/08/14 12:32:01 christos Exp $
     $NetBSD: pass1b.c,v 1.22 2011/03/06 17:08:16 bouyer Exp $
     $NetBSD: pass2.c,v 1.48 2011/08/14 12:32:01 christos Exp $
     $NetBSD: pass3.c,v 1.20 2011/03/06 17:08:16 bouyer Exp $
     $NetBSD: pass4.c,v 1.26 2011/03/06 17:08:16 bouyer Exp $
     $NetBSD: pass5.c,v 1.49 2011/08/14 12:32:01 christos Exp $
     $NetBSD: fsutil.c,v 1.20 2011/06/09 19:57:50 christos Exp $
     $NetBSD: setup.c,v 1.94 2011/08/14 12:32:01 christos Exp $
     $NetBSD: utilities.c,v 1.60 2011/06/09 19:57:52 christos Exp $
     $NetBSD: ffs_bswap.c,v 1.35 2011/03/06 17:08:38 bouyer Exp $
     $NetBSD: ffs_subr.c,v 1.47 2011/08/14 12:37:09 christos Exp $
     $NetBSD: ffs_tables.c,v 1.9 2005/12/11 12:25:25 christos Exp $
     $NetBSD: ffs_appleufs.c,v 1.12 2011/11/19 22:51:31 tls Exp $
     $NetBSD: partutil.c,v 1.11 2011/11/13 22:04:51 christos Exp $
     $NetBSD: quota2_subr.c,v 1.4 2011/06/07 14:56:13 bouyer Exp $
     $NetBSD: progress.c,v 1.5 2009/04/11 06:48:36 lukem Exp $
     $NetBSD: vfs_wapbl.c,v 1.48 2011/12/02 12:38:59 yamt Exp $
     $NetBSD: wapbl.c,v 1.5 2010/03/06 11:31:40 mlelstv Exp $
     $NetBSD: crtend.S,v 1.1 2010/08/07 18:01:34 joerg Exp $
     $NetBSD: crtn.S,v 1.1 2010/08/07 18:01:35 joerg Exp $

>Description:

When attempting to fsck an FFSv2 filesystem that had been mounted with "-o log"
but then not cleanly unmounted, fsck_ffs fails to find and read the journal if
the filesystem is on a device with a sector size of 4K:

    foundation# fsck /dev/mapper/rvgr00-wapbltest
    ** /dev/mapper/rvgr00-wapbltest

    CANNOT READ: BLK 8388608
    CONTINUE? [yn] y

    THE FOLLOWING DISK SECTORS COULD NOT BE READ: 67108864 (8388608), 67108865 (8388608),
    Unrecognized wapbl magic: 0x00000000
    UNABLE TO READ JOURNAL FOR REPLAY
    CONTINUE? [yn] 

(This example is from a 4096 MiB FFSv2 filesystem located on a 4160
MiB LVM logical volume, which is ultimately resident on a RAID1
RAIDframe array of two 2TB Seagate Barracuda Green drives--which have
4K disk sectors.  But I've observed the same thing on filesystems
created on the same drive directly with no RAID or LVM.  Note also
that the only crashes I've experienced to trigger this have been by
killing a Xen DomU with xm destroy, but the problem doesn't appear
Xen-related; the DomUs, like the Dom0 above, are running recent
versions of -current for amd64.)

Block 8388608 is the correct location of the WAPBL journal in 512-byte blocks;
manually doing 

    dd if=/dev/mapper/rvgr00-wapbltest of=/tmp/whatever bs=512 count=131072 skip=8388608

succeeds, and /tmp/whatever is not just zeroes.

>How-To-Repeat:
    - Create an FFSv2 filesystem on a device with 4K blocks (I have been leaving space 
	for a WAPBL journal after the filesystem but it seems unlikely that this is
	necessary.)
    - mount -o log /dev/{filesystem} /mnt
    - Crash
    - fsck /dev/{filesystem}

>Fix:

The problem appears to be that check_wapbl() in sbin/fsck_ffs/wapbl.c calls
wapbl_replay_start with addr=8388608 and blksize=512, as apparently specified in the
superblock:

(gdb) bt
#0  bread (fd=5, buf=0x7f7ff7b1e000 "", blk=8388608, size=1024)
    at /usr/src/sbin/fsck_ffs/utilities.c:342
#1  0x000000000041cf82 in wapbl_read (data=0x7f7ff7b1e000, len=1024, devvp=0x0, pbn=8388608)
    at /usr/src/sbin/fsck_ffs/wapbl.c:84
#2  0x000000000041c426 in wapbl_replay_start (wrp=0x624850, vp=0x0, off=8388608, 
    count=131072, blksize=512) at /usr/src/sys/kern/vfs_wapbl.c:2421
#3  0x000000000041d4c3 in check_wapbl () at /usr/src/sbin/fsck_ffs/wapbl.c:210
#4  0x0000000000411d68 in setup (dev=0x7f7ff7b01060 "/dev/mapper/rvgr00-wapbltest", 
    origdev=0x7f7ff7b01060 "/dev/mapper/rvgr00-wapbltest")
    at /usr/src/sbin/fsck_ffs/setup.c:185
#5  0x0000000000408f35 in checkfilesys (
    filesys=0x7f7ff7b01060 "/dev/mapper/rvgr00-wapbltest", 
    origfs=0x7f7ff7b01060 "/dev/mapper/rvgr00-wapbltest", child=0)
    at /usr/src/sbin/fsck_ffs/main.c:306
#6  0x0000000000408de7 in main (argc=0, argv=0x7f7fffffdc60)
    at /usr/src/sbin/fsck_ffs/main.c:245
(gdb) fr 3
#3  0x000000000041d4c3 in check_wapbl () at /usr/src/sbin/fsck_ffs/wapbl.c:210
210                                     error = wapbl_replay_start(
(gdb) l
205                                     sbdirty();
206                                     ret = FSCK_EXIT_CHECK_FAILED;
207                                     break;
208                             }
209                             if (sblock->fs_flags & FS_DOWAPBL) {
210                                     error = wapbl_replay_start(
211                                         &wapbl_replay, 0, addr, count, blksize);
212                                     if (error) {
213                                             pfatal(
214                                                "UNABLE TO READ JOURNAL FOR REPLAY");
(gdb) p addr
$23 = 8388608
(gdb) p blksize
$24 = 512

But then wapbl_replay_start doesn't pass this information on to wapbl_read--it just
calls wapbl_read with the address in 512-byte blocks (as "pbn") without specifying
the blocksize:

(gdb) fr 2
#2  0x000000000041c426 in wapbl_replay_start (wrp=0x624850, vp=0x0, off=8388608, 
    count=131072, blksize=512) at /usr/src/sys/kern/vfs_wapbl.c:2421
2421            error = wapbl_read(scratch, 2<<log_dev_bshift, devvp, pbn);
(gdb) l
2416    
2417            pbn = logpbn;
2418    #ifdef _KERNEL
2419            pbn = btodb(pbn << log_dev_bshift);
2420    #endif
2421            error = wapbl_read(scratch, 2<<log_dev_bshift, devvp, pbn);
2422            if (error)
2423                    goto errout;
2424    
2425            wch = (struct wapbl_wc_header *)scratch;
(gdb) p pbn
$25 = 8388608

and the ultimate result is that bread tries to pread from a block way beyond the end of the
partition (i.e., 4096*pbn bytes into the partition instead of 512*pbn bytes), because
dev_bsize == 4096:

(gdb) fr 0
#0  bread (fd=5, buf=0x7f7ff7b1e000 "", blk=8388608, size=1024)
    at /usr/src/sbin/fsck_ffs/utilities.c:344
344             if ((pread(fd, buf, (int)size, offset) == size) &&
(gdb) l
339             int i, errs;
340             off_t offset;
341     
342             offset = blk;
343             offset *= dev_bsize;
344             if ((pread(fd, buf, (int)size, offset) == size) &&
345                 read_wapbl(buf, size, blk) == 0)
346                     return (0);
347             rwerror("READ", blk);
348             errs = 0;
(gdb) p offset
$37 = 34359738368
(gdb) p blk
$38 = 8388608
(gdb) p dev_bsize
$39 = 4096

I am not sufficiently familiar with this code to tell whether the cleanest solution
is to pass the blocksize further down, to recalculate pbn in terms of the larger
correct blocksize, to make sure the location of the log is stored in units that match
the device, or something else.

>Unformatted:

NetBSD Home
NetBSD PR Database Search

(Contact us) $NetBSD: query-full-pr,v 1.39 2013/11/01 18:47:49 spz Exp $
$NetBSD: gnats_config.sh,v 1.8 2006/05/07 09:23:38 tsutsui Exp $
Copyright © 1994-2007 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.