NetBSD Problem Report #43905
From bad@bsd.de Sat Sep 25 11:48:10 2010
Return-Path: <bad@bsd.de>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
by www.NetBSD.org (Postfix) with ESMTP id 5323263B874
for <gnats-bugs@gnats.NetBSD.org>; Sat, 25 Sep 2010 11:48:10 +0000 (UTC)
Message-Id: <20100925114842.AD17E89@limiting-factor.k.bsd.de>
Date: Sat, 25 Sep 2010 13:48:42 +0200 (MEST)
From: bad@bsd.de
Reply-To: bad@bsd.de
To: gnats-bugs@gnats.NetBSD.org
Subject: fsck_root fails on write protected partition
X-Send-Pr-Version: 3.95
>Number: 43905
>Category: misc
>Synopsis: fsck_root fails spectacularly trying to check a clean FS on a read-only block device
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: misc-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sat Sep 25 11:50:00 +0000 2010
>Last-Modified: Mon Sep 27 06:30:02 +0000 2010
>Originator: Christoph Badura
>Release: NetBSD 5.99.39
>Organization:
netbsd bozotic software testing labs
>Environment:
System: NetBSD 5.99.39 NetBSD 5.99.39 (XEN3_DOMU) #0: Fri Sep 24 05:34:52 UTC 2010 builds@b6.netbsd.org:/home/builds/ab/HEAD/i386/201009240000Z-obj/home/builds/ab/HEAD/src/sys/arch/i386/compile/XEN3_DOMU i386
Architecture: i386
Machine: i386
>Description:
Boot NetBSD with a root file system on a read-only block device.
I've been using shared root file system images for Xen domUs in that setup
for a few years.
With the new fsck_root the bootstrap aborts as follows:
Starting root file system check:
NO WRITE ACCESS
/dev/rxbd0a: UNEXPECTED INCONSISTENCY; RUN fsck_ffs MANUALLY.
Automatic file system check failed; help!
ERROR: ABORTING BOOT (sending SIGTERM to parent)!
Note that the root fs is marked as read-only in fstab and this has worked
for years:
/dev/xbd0a / ffs ro
After exporting the root image as writable to the domU the comedy continues
with:
Starting root file system check:
/dev/rxbd0a: file system is clean; not checking
There are at least two bugs here:
1) fsck aborts because it can't get write access to the block device
even though the file system is clean and it doesn't actually want to write
to the device anyway.
2) valid fstabs that have been working since at least 3.0 suddenly fail.
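To make the failing case concrete: the fstab entry quoted above has only four fields and lists "ro" among its mount options. The following is a minimal sketch, assuming a POSIX shell, of how a script could recognise such an entry. The function name root_is_ro is hypothetical and the code is not taken from fsck_root.

```shell
#!/bin/sh
# Hypothetical helper, not taken from src/etc/rc.d/fsck_root.
# Succeeds (exit status 0) if the "/" entry in an fstab-style file
# lists "ro" among its comma-separated mount options.
root_is_ro() {
	fstab=${1:-/etc/fstab}
	while read -r fs_spec fs_file fs_vfstype fs_mntops _rest; do
		# Skip blank lines and comments.
		case "$fs_spec" in
		''|\#*) continue ;;
		esac
		# Only the root file system entry is of interest.
		[ "$fs_file" = "/" ] || continue
		# Wrap in commas so "ro" only matches as a whole option.
		case ",$fs_mntops," in
		*,ro,*) return 0 ;;
		esac
		return 1
	done < "$fstab"
	# No entry for "/" was found.
	return 1
}
```

The function only inspects fstab; it says nothing about whether the underlying block device is actually write protected.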
>How-To-Repeat:
Create a NetBSD minimal installation in a vnd-backed image.
Export that image as read-only to a domU.
Boot the domU.
>Fix:
>Audit-Trail:
From: Christoph Badura <bad@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/43905 CVS commit: src/etc/rc.d
Date: Sat, 25 Sep 2010 15:10:15 +0000
Module Name: src
Committed By: bad
Date: Sat Sep 25 15:10:14 UTC 2010
Modified Files:
src/etc/rc.d: fsck_root
Log Message:
Treat empty or missing fs_passno field like it has a value of 0 as fstab(5)
specifies.
Related to PR misc/43905 but does not fix the underlying issues.
To generate a diff of this commit:
cvs rdiff -u -r1.4 -r1.5 src/etc/rc.d/fsck_root
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
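The rule stated in the log message above can be sketched in POSIX shell. This is an illustration under stated assumptions, not the committed change: the function name root_passno is hypothetical, and the real diff is the one produced by the cvs rdiff command quoted above.

```shell
#!/bin/sh
# Hypothetical helper, not the code committed to src/etc/rc.d/fsck_root.
# Prints the fs_passno of the "/" entry in an fstab-style file,
# treating an empty or missing sixth field as 0.
root_passno() {
	fstab=${1:-/etc/fstab}
	while read -r fs_spec fs_file fs_vfstype fs_mntops fs_freq fs_passno _rest; do
		# Skip blank lines and comments.
		case "$fs_spec" in
		''|\#*) continue ;;
		esac
		if [ "$fs_file" = "/" ]; then
			# With fewer than six fields, read leaves fs_passno empty;
			# fall back to 0 in that case.
			echo "${fs_passno:-0}"
			return 0
		fi
	done < "$fstab"
	# No entry for "/" was found.
	return 1
}
```

For the four-field entry from this report (/dev/xbd0a / ffs ro) the function prints 0, which per fstab(5) means the file system does not need to be checked.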
From: Alan Barrett <apb@cequrux.com>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: misc/43905: fsck_root fails on write protected partition
Date: Sun, 26 Sep 2010 21:50:30 +0200
On Sat, 25 Sep 2010, bad@bsd.de wrote:
> There are at least two bugs here:
>
> 1) fsck aborts because it can't get write access to the block device
> even though the file system is clean and it doesn't actually want to write
> to the device anyway.
/etc/rc.d/fsck_root invoked "fsck -p /" (assuming that fsck_flags had
the default value of "-p"). Given that, it's not clear to me that fsck
should have behaved any differently. (It is clear that the fsck_root
script should not have invoked fsck, but that falls under your point 2
below.)
> 2) valid fstabs that have been working since at least 3.0 suddenly fail
Yes, a missing fs_passno in /etc/fstab should behave like fs_passno=0,
and should prevent the fsck_root script from trying to fsck the root
file system. I see you fixed that already.
--apb (Alan Barrett)
From: Christoph Badura <bad@bsd.de>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: misc/43905: fsck_root fails on write protected partition
Date: Sun, 26 Sep 2010 23:42:26 +0200
On Sun, Sep 26, 2010 at 07:55:03PM +0000, Alan Barrett wrote:
> On Sat, 25 Sep 2010, bad@bsd.de wrote:
> > 1) fsck aborts because it can't get write access to the block device
> > even though the file system is clean and it doesn't actually want to write
> > to the device anyway.
>
> /etc/rc.d/fsck_root invoked "fsck -p /" (assuming that fsck_flags had
> the default value of "-p"). Given that, it's not clear to me that fsck
> should have behaved any differently.
How about "it should behave like TFM(tm) says"? Or, alternatively, "it should
behave like it has the last 15+ years"? That has worked satisfactorily
for me.
youll-thank-me-later!bad 109 % uname -r; grep xbd0a /etc/fstab; sudo fsck -p /
5.1_RC4
/dev/xbd0a / ffs ro
Password:
/dev/rxbd0a: file system is clean; not checking
fsck_flags is at the default value on both domUs.
I don't know why that blows up on -current. And I don't expect you to answer
that.
I don't know what fsck_root is for either. Lacking any documentation at
all it looks to me like it is trying to duplicate standard functionality.
--chris
From: Alan Barrett <apb@cequrux.com>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: misc/43905: fsck_root fails on write protected partition
Date: Mon, 27 Sep 2010 08:27:56 +0200
On Sun, 26 Sep 2010, Christoph Badura wrote:
> > /etc/rc.d/fsck_root invoked "fsck -p /" (assuming that fsck_flags
> > had the default value of "-p"). Given that, it's not clear to me
> > that fsck should have behaved any differently.
>
> How about "it should behave like TFM(tm) says"? Or, alternatively,
> "it should behave like it has the last 15+ years"? That has worked
> satisfactorily for me.
>
> youll-thank-me-later!bad 109 % uname -r; grep xbd0a /etc/fstab; sudo fsck -p /
> 5.1_RC4
> /dev/xbd0a / ffs ro
> Password:
> /dev/rxbd0a: file system is clean; not checking
I don't see anything in the fsck(8) man page that promises not to open
devices for writing if the file system is clean. Since you show that
this used to work, there are at least two bugs here:
1) there was a regression in behaviour;
2) the behaviour is not documented.
> I don't know what fsck_root is for either. Lacking any documentation at
> all it looks to me like it is trying to duplicate standard functionality.
I don't know where (or even whether) it's documented, but the reason
for having two separate rc.d scripts to fsck the root file system and
to fsck everything else, is to allow other rc.d scripts to be run in
between. The following explanation appears in the log message for
revision 1.1 of fsck_root:
" revision 1.1
" date: 2009-04-21 18:08:57 +0200; author: joerg; state: Exp;
" Split fsck during boot into two phases. Check the root file system
" first, mount root and run the various disk providers. Add swap and
" check the remaining file systems after that.
" This breaks the dependency cycle for lvm, which needs writeable /dev.
" Depend on rndctl in cgd.
--apb (Alan Barrett)
>Unformatted:
Copyright © 1994-2007
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.