NetBSD Problem Report #43905

From bad@bsd.de  Sat Sep 25 11:48:10 2010
Return-Path: <bad@bsd.de>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
	by www.NetBSD.org (Postfix) with ESMTP id 5323263B874
	for <gnats-bugs@gnats.NetBSD.org>; Sat, 25 Sep 2010 11:48:10 +0000 (UTC)
Message-Id: <20100925114842.AD17E89@limiting-factor.k.bsd.de>
Date: Sat, 25 Sep 2010 13:48:42 +0200 (MEST)
From: bad@bsd.de
Reply-To: bad@bsd.de
To: gnats-bugs@gnats.NetBSD.org
Subject: fsck_root fails on write protected partition
X-Send-Pr-Version: 3.95

>Number:         43905
>Category:       misc
>Synopsis:       fsck_root fails spectacularly trying to check a clean FS on a read-only block device
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    misc-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Sep 25 11:50:00 +0000 2010
>Last-Modified:  Mon Sep 27 06:30:02 +0000 2010
>Originator:     Christoph Badura
>Release:        NetBSD 5.99.39
>Organization:
netbsd bozotic software testing labs
>Environment:


System: NetBSD  5.99.39 NetBSD 5.99.39 (XEN3_DOMU) #0: Fri Sep 24 05:34:52 UTC 2010  builds@b6.netbsd.org:/home/builds/ab/HEAD/i386/201009240000Z-obj/home/builds/ab/HEAD/src/sys/arch/i386/compile/XEN3_DOMU i386
Architecture: i386
Machine: i386
>Description:
Boot NetBSD with a root file system on a read-only block device.
I've been using shared root file system images for Xen domU for a few years
in that setup.

With the new fsck_root the bootstrap aborts as follows:

Starting root file system check:
NO WRITE ACCESS
/dev/rxbd0a: UNEXPECTED INCONSISTENCY; RUN fsck_ffs MANUALLY.
Automatic file system check failed; help!
ERROR: ABORTING BOOT (sending SIGTERM to parent)!

Note that the root fs is marked as read-only in fstab and this has worked
for years:
/dev/xbd0a              /               ffs     ro

After exporting the root image as writable to the domU the comedy continues
with:

Starting root file system check:
/dev/rxbd0a: file system is clean; not checking

There are at least two bugs here:

1) fsck aborts because it can't get write access to the block device
even though the file system is clean and it doesn't actually want to write
to the device anyway.

2) valid fstabs that have been working since at least 3.0 suddenly fail

>How-To-Repeat:

Create a minimal NetBSD installation in a vnd-backed image.
Export that image as read-only to a domU.
Boot the domU.

>Fix:


>Audit-Trail:
From: Christoph Badura <bad@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/43905 CVS commit: src/etc/rc.d
Date: Sat, 25 Sep 2010 15:10:15 +0000

 Module Name:	src
 Committed By:	bad
 Date:		Sat Sep 25 15:10:14 UTC 2010

 Modified Files:
 	src/etc/rc.d: fsck_root

 Log Message:
 Treat empty or missing fs_passno field like it has a value of 0 as fstab(5)
 specifies.
 Related to PR misc/43905 but does not fix the underlying issues.


 To generate a diff of this commit:
 cvs rdiff -u -r1.4 -r1.5 src/etc/rc.d/fsck_root

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.
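
The fs_passno handling named in the log message can be illustrated with a
minimal shell sketch. This is not the committed change to fsck_root: the
function name root_passno and its arguments are hypothetical, and the sketch
assumes only the six-field layout described in fstab(5). A missing or empty
sixth field is reported as 0, which a caller would treat as "do not check".

```sh
#!/bin/sh
# Hypothetical helper (not the actual fsck_root code): print the fs_passno
# of a mount point listed in an fstab file. An empty or missing sixth field
# is treated as 0, as fstab(5) specifies.
root_passno() {
    fstab=${1:-/etc/fstab}
    mountpoint=${2:-/}
    # The "|| [ -n ... ]" part also handles a last line without a newline.
    while read -r fs_spec fs_file fs_vfstype fs_mntops fs_freq fs_passno rest \
        || [ -n "$fs_spec" ]; do
        case $fs_spec in
        ''|'#'*) continue ;;    # skip blank lines and comments
        esac
        if [ "$fs_file" = "$mountpoint" ]; then
            echo "${fs_passno:-0}"
            return 0
        fi
    done < "$fstab"
    return 1                    # mount point not listed
}
```

For an fstab containing the line quoted in this PR ("/dev/xbd0a / ffs ro"),
"root_passno /path/to/fstab /" prints 0.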

From: Alan Barrett <apb@cequrux.com>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: misc/43905: fsck_root fails on write protected partition
Date: Sun, 26 Sep 2010 21:50:30 +0200

 On Sat, 25 Sep 2010, bad@bsd.de wrote:
 > There are at least two bugs here:
 > 
 > 1) fsck aborts because it can't get write access to the block device
 > even though the file system is clean and it doesn't actually want to write
 > to the device anyway.

 /etc/rc.d/fsck_root invoked "fsck -p /" (assuming that fsck_flags had
 the default value of "-p").  Given that, it's not clear to me that fsck
 should have behaved any differently.  (It is clear that the fsck_root
 script should not have invoked fsck, but that falls under your point 2
 below.)

 > 2) valid fstabs that have been working since at least 3.0 suddenly fail

 Yes, a missing fs_passno in /etc/fstab should behave like fs_passno=0,
 and should prevent the fsck_root script from trying to fsck the root
 file system.  I see you fixed that already.

 --apb (Alan Barrett)

From: Christoph Badura <bad@bsd.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: misc/43905: fsck_root fails on write protected partition
Date: Sun, 26 Sep 2010 23:42:26 +0200

 On Sun, Sep 26, 2010 at 07:55:03PM +0000, Alan Barrett wrote:
 >  On Sat, 25 Sep 2010, bad@bsd.de wrote:
 >  > 1) fsck aborts because it can't get write access to the block device
 >  > even though the file system is clean and it doesn't actually want to write
 >  > to the device anyway.
 >  
 >  /etc/rc.d/fsck_root invoked "fsck -p /" (assuming that fsck_flags had
 >  the default value of "-p").  Given that, it's not clear to me that fsck
 >  should have behaved any differently.

 How about "it should behave like TFM(tm) says"?  Or, alternatively "it should
 behave like it has the last 15+ years"?  That has worked satisfactorily
 for me.

 youll-thank-me-later!bad 109 % uname -r; grep xbd0a /etc/fstab; sudo fsck -p /
 5.1_RC4
 /dev/xbd0a      /       ffs     ro
 Password:
 /dev/rxbd0a: file system is clean; not checking

 fsck_flags is at the default value on both domUs.

 I don't know why that blows up on -current.  And I don't expect you to answer
 that.

 I don't know what fsck_root is for either.  Lacking any documentation at
 all it looks to me like it is trying to duplicate standard functionality.

 --chris

From: Alan Barrett <apb@cequrux.com>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: misc/43905: fsck_root fails on write protected partition
Date: Mon, 27 Sep 2010 08:27:56 +0200

 On Sun, 26 Sep 2010, Christoph Badura wrote:
 >  >  /etc/rc.d/fsck_root invoked "fsck -p /" (assuming that fsck_flags
 >  >  had the default value of "-p").  Given that, it's not clear to me
 >  >  that fsck should have behaved any differently.
 >
 >  How about "it should behave like TFM(tm) says"?  Or, alternatively
 >  "it should behave like it has the last 15+ years"?  That has worked
 >  satisfactorily for me.
 >  
 >  youll-thank-me-later!bad 109 % uname -r; grep xbd0a /etc/fstab; sudo fsck -p /
 >  5.1_RC4
 >  /dev/xbd0a      /       ffs     ro
 >  Password:
 >  /dev/rxbd0a: file system is clean; not checking

 I don't see anything in the fsck(8) man page that promises not to open
 devices for writing if the file system is clean.  Since you show that
 this used to work, there are at least two bugs here:

 1) there was a regression in behaviour;
 2) the behaviour is not documented.

 >  I don't know what fsck_root is for either.  Lacking any documentation at
 >  all it looks to me like it is trying to duplicate standard functionality.

 I don't know where (or even whether) it's documented, but the reason
 for having two separate rc.d scripts to fsck the root file system and
 to fsck everything else, is to allow other rc.d scripts to be run in
 between.  The following explanation appears in the log message for
 revision 1.1 of fsck_root:

 " revision 1.1
 " date: 2009-04-21 18:08:57 +0200;  author: joerg;  state: Exp;
 " Split fsck during boot into two phases. Check the root file system
 " first, mount root and run the various disk providers. Add swap and
 " check the remaining file systems after that.
 " This breaks the dependency cycle for lvm, which needs writeable /dev.
 " Depend on rndctl in cgd.

 --apb (Alan Barrett)
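
The ordering described in that log message can be pictured as a small
dependency graph. The sketch below is an illustration only: the phase names
are hypothetical labels rather than the real rc.d script names, and it uses
tsort(1) instead of rcorder(8), which is what rc uses to order scripts. The
log message does not say whether swap is added before or after the remaining
file systems are checked, so the sketch leaves those two unordered.

```sh
#!/bin/sh
# Illustration only: phase names are made-up labels, not actual rc.d scripts.
# Each input line "A B" tells tsort that A must come before B.
boot_phases() {
    tsort <<'EOF'
fsck_root mount_root
mount_root disk_providers
disk_providers add_swap
disk_providers fsck_rest
EOF
}
```

Calling boot_phases prints one phase per line. The first three lines are
always fsck_root, mount_root and disk_providers, which is the point of the
split: the disk providers run only after root has been checked and mounted.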

>Unformatted:
