NetBSD Problem Report #39371
From www@NetBSD.org Sun Aug 17 19:59:38 2008
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
by narn.NetBSD.org (Postfix) with ESMTP id C6F4063B877
for <gnats-bugs@gnats.netbsd.org>; Sun, 17 Aug 2008 19:59:38 +0000 (UTC)
Message-Id: <20080817195938.8D00763B853@narn.NetBSD.org>
Date: Sun, 17 Aug 2008 19:59:38 +0000 (UTC)
From: tnn@NetBSD.org
Reply-To: tnn@NetBSD.org
To: gnats-bugs@NetBSD.org
Subject: wapbl should allow mounting "/" even if journal is hosed
X-Send-Pr-Version: www-1.0
>Number: 39371
>Category: kern
>Synopsis: wapbl should allow mounting "/" even if journal is hosed
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: joerg
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sun Aug 17 20:00:01 +0000 2008
>Closed-Date: Sun Aug 21 20:04:10 +0000 2016
>Last-Modified: Sun Aug 21 20:04:10 +0000 2016
>Originator: Tobias Nygren
>Release: 4.99.72
>Organization:
>Environment:
NetBSD x40.int.nygren.pp.se 4.99.72 NetBSD 4.99.72 (GENERIC.x40) #1: Tue Aug 12 12:40:16 CEST 2008 tnn@x40.int.nygren.pp.se:/usb/obj/sys/arch/i386/compile/GENERIC.x40 i386
>Description:
The journal of my root filesystem became corrupt for some reason, leaving me with no sane recovery method:
boot device: wd0
root on wd0a dumps on wd0b
Unrecognized wapbl type: 0x08012180
no file system for wd0 (dev 0x0)
cannot mount root, error = 79
root device (default wd0a): ddb
I had to use ddb to trick the kernel into mounting the root fs without the FS_DOWAPBL flag, after which I could run fsck(8) to recover.
I think we should allow mounting "/" read-only, even if log replay fails.
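For reference, the value 79 in the console output above matches EFTYPE ("Inappropriate file type or format") in NetBSD's errno numbering. The sketch below only illustrates decoding that value; describe_mount_error is a hypothetical helper, not a kernel or libc function, and the fallback #define assumes the NetBSD value for systems whose errno.h lacks EFTYPE.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

#ifndef EFTYPE
#define EFTYPE 79	/* assumed NetBSD value; absent on some other systems */
#endif

/*
 * Hypothetical helper: format a mount error in the style of the console
 * message above, with a symbolic name appended when one is known.
 */
static int
describe_mount_error(int error, char *buf, size_t len)
{
	const char *name;

	assert(buf != NULL && len > 0);
	switch (error) {
	case EFTYPE:
		name = "EFTYPE";
		break;
	case EIO:
		name = "EIO";
		break;
	case EINVAL:
		name = "EINVAL";
		break;
	default:
		name = "unknown";
		break;
	}
	return snprintf(buf, len, "cannot mount root, error = %d (%s)",
	    error, name);
}
```

Calling describe_mount_error(EFTYPE, buf, sizeof(buf)) fills buf with the console-style message followed by the symbolic name.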
>How-To-Repeat:
>Fix:
>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: kern-bug-people->joerg
Responsible-Changed-By: joerg@NetBSD.org
Responsible-Changed-When: Wed, 12 Nov 2008 16:04:07 +0000
Responsible-Changed-Why:
I'll deal with it.
From: Manuel Bouyer <bouyer@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/39371 CVS commit: [netbsd-5] src/sys/dev/raidframe
Date: Fri, 16 Jan 2009 22:43:34 +0000 (UTC)
Module Name: src
Committed By: bouyer
Date: Fri Jan 16 22:43:34 UTC 2009
Modified Files:
src/sys/dev/raidframe [netbsd-5]: rf_netbsdkintf.c
Log Message:
Pull up following revision(s) (requested by oster in ticket #268):
sys/dev/raidframe/rf_netbsdkintf.c: revision 1.252
Implement DIOCCACHESYNC for RAIDframe too,
should help prevent journal corruption that causes PR#39371
To generate a diff of this commit:
cvs rdiff -r1.250 -r1.250.4.1 src/sys/dev/raidframe/rf_netbsdkintf.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Sergio L. Pascual" <slp@sinrega.org>
To: "gnats-bugs@netbsd.org" <gnats-bugs@netbsd.org>
Cc:
Subject: Re: kern/39371
Date: Tue, 13 Jan 2015 16:30:52 +0100
The situation has improved somewhat (tested with -current). A root filesystem
whose WAPBL log cannot be replayed is now mounted read-only, thanks to the
MNT_FORCE flag.
The bad news is that the kernel panics when the init scripts try to remount
the filesystem read-write with WAPBL enabled:
<--- cut here --->
boot device: wd0
root on wd0a dumps on wd0b
Unrecognized wapbl magic: 0x5741424c
root file system type: ffs
kern.module.path=/stand/amd64/7.99.4/modules
clock: unknown CMOS layout
Tue Jan 13 14:57:12 UTC 2015
Starting root file system check:
/dev/rwd0a: file system is journaled; not checking
/: replaying log to disk
uvm_fault(0xfffffe807ed61e68, 0x0, 1) -> e
fatal page fault in supervisor mode
trap type 6 code 0 rip ffffffff809fd84c cs 8 rflags 10286 cr2 1c ilevel 0 rsp fffffe8043f7cd18
curlwp 0xfffffe807f121180 pid 62.1 lowest kstack 0xfffffe8043f792c0
kernel: page fault trap, code=0
Stopped in pid 62.1 (mount_ffs) at netbsd:wapbl_replay_write+0x17: movl 1c(%rdi),%ecx
db{0}>
<--- cut here --->
This happens because ffs_wapbl_replay_start fails, leaving mp->mnt_wapbl_replay
NULL (there is a KDASSERT for this, but I was running a non-DEBUG kernel), so
wapbl_replay_write faults when it dereferences the pointer.
This patch replaces the KDASSERT with a check that returns EFTYPE when
mp->mnt_wapbl_replay is uninitialized, allowing the user to use the emergency
shell to run fsck on the root filesystem.
Attachment: netbsd_wapbl_rootmount.diff
diff --git a/sys/ufs/ffs/ffs_vfsops.c b/sys/ufs/ffs/ffs_vfsops.c
index 1d1e32a..adb4970 100644
--- a/sys/ufs/ffs/ffs_vfsops.c
+++ b/sys/ufs/ffs/ffs_vfsops.c
@@ -585,9 +585,13 @@ ffs_mount(struct mount *mp, const char *path, void *data, size_t *data_len)
 			fs->fs_fmod = 1;
 #ifdef WAPBL
 			if (fs->fs_flags & FS_DOWAPBL) {
+				if (!mp->mnt_wapbl_replay) {
+					printf("%s: WAPBL replay corrupted\n",
+					    mp->mnt_stat.f_mntonname);
+					return EFTYPE;
+				}
 				printf("%s: replaying log to disk\n",
 				    mp->mnt_stat.f_mntonname);
-				KDASSERT(mp->mnt_wapbl_replay);
 				error = wapbl_replay_write(mp->mnt_wapbl_replay,
 				    devvp);
 				if (error) {
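The effect of the check can be illustrated outside the kernel with a small userland model. This is only a sketch: struct mount_model, struct wapbl_replay as defined here, model_replay_write, and model_remount_rw are hypothetical stand-ins, not the real kernel definitions, and the fallback EFTYPE value assumes NetBSD's numbering.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#ifndef EFTYPE
#define EFTYPE 79	/* assumed NetBSD value; absent on some other systems */
#endif

/* Hypothetical stand-ins for the kernel structures; not the real ones. */
struct wapbl_replay {
	int wr_valid;
};

struct mount_model {
	const char *mntonname;
	struct wapbl_replay *wapbl_replay;	/* NULL when replay start failed */
	int dowapbl;				/* models FS_DOWAPBL being set */
	int rdonly;
};

/* Models wapbl_replay_write(): it dereferences its argument unconditionally. */
static int
model_replay_write(struct wapbl_replay *wr)
{
	assert(wr != NULL);	/* the kernel function page-faulted here instead */
	return wr->wr_valid ? 0 : EFTYPE;
}

/* Models the read-only to read-write upgrade with the patched NULL check. */
static int
model_remount_rw(struct mount_model *mp)
{
	int error;

	if (mp->dowapbl) {
		if (mp->wapbl_replay == NULL) {
			printf("%s: WAPBL replay corrupted\n", mp->mntonname);
			return EFTYPE;	/* stay read-only so fsck can be run */
		}
		printf("%s: replaying log to disk\n", mp->mntonname);
		error = model_replay_write(mp->wapbl_replay);
		if (error)
			return error;
	}
	mp->rdonly = 0;
	return 0;
}
```

With wapbl_replay set to NULL, model_remount_rw returns EFTYPE and leaves the model mounted read-only. With a valid replay structure it proceeds to the write step and clears rdonly.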
From: "Christos Zoulas" <christos@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/39371 CVS commit: src/sys/ufs/ffs
Date: Thu, 15 Jan 2015 22:57:52 -0500
Module Name: src
Committed By: christos
Date: Fri Jan 16 03:57:52 UTC 2015
Modified Files:
src/sys/ufs/ffs: ffs_vfsops.c
Log Message:
PR/39371: Tobias Nygren: Don't fail mounting root if WAPBL log is corrupt.
Patch from Sergio L. Pascual.
XXX: pullup-7
To generate a diff of this commit:
cvs rdiff -u -r1.304 -r1.305 src/sys/ufs/ffs/ffs_vfsops.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/39371 CVS commit: [netbsd-7] src/sys/ufs
Date: Wed, 28 Jan 2015 18:34:11 +0000
Module Name: src
Committed By: martin
Date: Wed Jan 28 18:34:11 UTC 2015
Modified Files:
src/sys/ufs/ffs [netbsd-7]: ffs_vfsops.c
src/sys/ufs/ufs [netbsd-7]: ufs_extern.h ufs_inode.c ufs_vnops.c
Log Message:
Pull up following revision(s) (requested by christos in ticket #425):
sys/ufs/ufs/ufs_inode.c: revision 1.91-1.92
sys/ufs/ufs/ufs_vnops.c: revision 1.223-1.224
sys/ufs/ufs/ufs_extern.h: revision 1.76-1.77
sys/ufs/ffs/ffs_vfsops.c: revision 1.303-1.305
Add debugging for mount...
Merge some error returns
Check more errors
Restore apple ufs error handling.
Move and unify indirect block truncate algorithm into a separate function.
PR/39371: Tobias Nygren: Don't fail mounting root if WAPBL log is corrupt.
Patch from Sergio L. Pascual.
To generate a diff of this commit:
cvs rdiff -u -r1.299.2.2 -r1.299.2.3 src/sys/ufs/ffs/ffs_vfsops.c
cvs rdiff -u -r1.75 -r1.75.2.1 src/sys/ufs/ufs/ufs_extern.h
cvs rdiff -u -r1.90 -r1.90.2.1 src/sys/ufs/ufs/ufs_inode.c
cvs rdiff -u -r1.221 -r1.221.2.1 src/sys/ufs/ufs/ufs_vnops.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: PR/39371 CVS commit: src/sys/ufs/ffs
Date: Sat, 14 Feb 2015 10:25:54 +0000
On Fri, Jan 16, 2015 at 04:00:01AM +0000, Christos Zoulas wrote:
> Module Name: src
> Committed By: christos
> Date: Fri Jan 16 03:57:52 UTC 2015
>
> Modified Files:
> src/sys/ufs/ffs: ffs_vfsops.c
>
> Log Message:
> PR/39371: Tobias Nygren: Don't fail mounting root if WAPBL log is corrupt.
> Patch from Sergio L. Pascual.
> XXX: pullup-7
>
>
> To generate a diff of this commit:
> cvs rdiff -u -r1.304 -r1.305 src/sys/ufs/ffs/ffs_vfsops.c
Does this need to be in -6 or can the PR be closed?
--
David A. Holland
dholland@netbsd.org
State-Changed-From-To: open->feedback
State-Changed-By: jdolecek@NetBSD.org
State-Changed-When: Sun, 21 Aug 2016 19:47:30 +0000
State-Changed-Why:
Patch was applied 15 Jan 2015. Can you confirm this fixes the problem?
State-Changed-From-To: feedback->closed
State-Changed-By: tnn@NetBSD.org
State-Changed-When: Sun, 21 Aug 2016 20:04:10 +0000
State-Changed-Why:
The committed patches look sane to me and the problem never
occurred again after, so I'd say it is fixed.
>Unformatted:
Copyright © 1994-2014
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.