NetBSD Problem Report #40474
From root@confusion.i.kivinen.iki.fi Sun Jan 25 11:01:24 2009
Return-Path: <root@confusion.i.kivinen.iki.fi>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
by narn.NetBSD.org (Postfix) with ESMTP id 029E563BAB8
for <gnats-bugs@gnats.NetBSD.org>; Sun, 25 Jan 2009 11:01:23 +0000 (UTC)
Message-Id: <20090125090154.F185B32CA023@confusion.i.kivinen.iki.fi>
Date: Sun, 25 Jan 2009 11:01:54 +0200 (EET)
From: kivinen@iki.fi
Reply-To: kivinen@iki.fi
To: gnats-bugs@gnats.NetBSD.org
Subject: Kernel panic after remounting raid root with softdep
X-Send-Pr-Version: 3.95
>Number: 40474
>Category: kern
>Synopsis: Kernel panic after remounting raid root with softdep
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: kern-bug-people
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sun Jan 25 11:05:00 +0000 2009
>Closed-Date: Wed Apr 01 04:09:12 +0000 2009
>Last-Modified: Wed Apr 01 04:09:12 +0000 2009
>Originator: Tero Kivinen
>Release: NetBSD 5.0_BETA
>Organization:
>Environment:
System: NetBSD confusion.i.kivinen.iki.fi 5.0_BETA NetBSD 5.0_BETA (GENERIC) #0: Sat Jan 24 19:01:52 EET 2009 root@:/usr/obj/sys/arch/i386/compile/GENERIC i386
Architecture: i386
Machine: i386
>Description:
The system has a mirrored RAID as root. When I boot it up
and then run mount -o softdep -u /, the command works, but
the next time I start "vi" the machine panics. The mount
command does not show that softdep took effect (i.e. it
still lists the options as local), and I can still run
some other programs from the root filesystem. The file
edited with vi does not matter, and the crash happens even
if no file is given. If this is tried on the second
(non-root) disk, it does not happen, even if the vi binary
is copied to that disk (i.e. mounting the other disk
without softdep, remounting it with -u -o softdep, and then
running vi from there works).
I assume this might somehow be related to the way
vi creates temporary files or something similar.
I have several kernel dumps, so if more information is
needed I can extract it with gdb. Here is the stack trace
from gdb:
GNU gdb 6.5
This GDB was configured as "i386--netbsdelf"...
(gdb) target kvm netbsd.4.core
#0 0xc053acf2 in cpu_reboot ()
(gdb) bt
#0 0xc053acf2 in cpu_reboot ()
#1 0xc01b2d19 in db_sync_cmd ()
#2 0xc01b3468 in db_command ()
#3 0xc01b36e3 in db_command_loop ()
#4 0xc01b6600 in db_trap ()
#5 0xc0535d3b in kdb_trap ()
#6 0xc053d993 in trap ()
#7 0xc010cb1f in calltrap ()
#8 0xc053441c in breakpoint ()
#9 0xc047f808 in panic ()
#10 0xc03c6f28 in softdep_setup_inomapdep ()
#11 0xc03b2626 in ffs_nodealloccg ()
#12 0xc03b0311 in ffs_hashalloc ()
#13 0xc03b487a in ffs_valloc ()
#14 0xc03f1964 in ufs_makeinode ()
#15 0xc03f1dea in ufs_create ()
#16 0xc04c72bc in VOP_CREATE ()
#17 0xc04c1912 in vn_open ()
#18 0xc04bdd00 in sys_open ()
#19 0xc053d45d in syscall ()
#20 0xc0100514 in syscall1 ()
(gdb) quit
>How-To-Repeat:
Boot the machine with root on a mirrored RAID, without
listing the softdep option in its fstab entry. Remount root
with softdep using
# mount -u -o softdep /
Start vi:
# vi
The kernel panics immediately. I have not verified
whether the RAID is required, but the problem seems
to be related to the root disk.
>Fix:
Do not remount root with softdep.
>Release-Note:
>Audit-Trail:
From: David Holland <dholland-bugs@netbsd.org>
To: kivinen@iki.fi
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org, gnats-bugs@netbsd.org
Subject: Re: kern/40474: Kernel panic after remounting raid root with
softdep
Date: Sun, 25 Jan 2009 21:07:55 +0000
On Sun, Jan 25, 2009 at 11:05:00AM +0000, kivinen@iki.fi wrote:
> The system has mirrored raid as root, and when I boot it up
> and then run mount -o softdep -u / the command works, but after
> that next time I start "vi" the machine panics. The mount
> command does not show that the softdep took effect (i.e
> it still list options as local), and if I can still
> run some other programs from the root filesystem. The
> file edited with vi does not matter, and crash happens
> even if no file is given. If this is tried on the second
> (non root disk) then this does not happen, even if the
> vi binary is copied to the other disk (i.e. mount the
> another disk without softdep, remount it with -u -o softdep
> and then try to run vi from there do work).
>
> I assume this might be related to somehow to the way
> vi creates temp files or something.
I'm pretty sure turning softdep on after mount is a known problem, and
that there's an existing PR about it, but I can't find one. Anyway, I
doubt it's got much to do with either vi or raid.
I think I know what's going on, maybe, but I don't have time right now
to look into it in any detail, and unfortunately I don't think anyone
else is interested in fixing softdep issues... :(
The immediate problem appears to be that it's allocating the same
inode twice. I think this is probably happening either because of a
general known problem with mount updates not flushing things properly
(e.g. PR 30525)... or because softdep uses different rules for
handling free object bitmaps from baseline ffs, and after the mode
transition not all filesystem state is arranged the way softdep
expects.
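The first hypothesis (the same inode being allocated twice) can be made concrete with a toy model. This C sketch is hypothetical and is not NetBSD's ffs code: `struct toyfs`, `alloc_inode()` and the `tracked[]` table are invented for illustration. It models an allocator that takes the first inode its bitmap reports free, plus a softdep-style table of per-inode dependency records. If a mode transition left the bitmap claiming an inode is free while a record for it still existed, the next allocation would land on that inode again; the sketch reports the case where a real kernel would panic (compare the panic in softdep_setup_inomapdep() in the backtrace above).

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define NINODES 16

/* Hypothetical toy filesystem state, not NetBSD code. */
struct toyfs {
    bool bitmap[NINODES];  /* allocator's view of the inode bitmap: true = in use */
    bool tracked[NINODES]; /* softdep-style dependency record exists for inode */
};

/* Return the first inode the bitmap reports as free, or -1 if none. */
static int find_free(const struct toyfs *fs)
{
    for (int i = 0; i < NINODES; i++)
        if (!fs->bitmap[i])
            return i;
    return -1;
}

/*
 * Allocate an inode and register a dependency record for it.
 * Returns the inode number, or -1 if the bitmap is full or if a
 * record already exists for the "new" inode (the inconsistency a
 * real kernel would treat as fatal).
 */
static int alloc_inode(struct toyfs *fs)
{
    int ino = find_free(fs);

    if (ino < 0)
        return -1;
    if (fs->tracked[ino]) {
        /* Bitmap says free, but the inode is still tracked: double allocation. */
        fprintf(stderr, "double allocation of inode %d\n", ino);
        return -1;
    }
    fs->bitmap[ino] = true;
    fs->tracked[ino] = true;
    return ino;
}
```

For example, starting from a zeroed `struct toyfs`, two calls to `alloc_inode()` return 0 and 1; if `bitmap[1]` is then cleared (simulating bitmap state that did not survive the hypothetical mode switch), the third call finds inode 1 again and returns -1.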
Did the fsck after the crash show any problems? And, if you reboot
again without having used softdep, boot to single user, and force
running a fsck, does *that* show any problems? (In fact, it's probably
a good idea to do such a fsck on general principles; fsck behaves
differently if it thinks softdep was in use, so crashing in this
fashion might have confused it.)
If you have a test machine to crash on, the following information
might be useful, maybe:
- If you sync after the remount, does the crash still occur?
(I expect it will, but it would be interesting if it didn't.)
- Can you reproduce the crash without vi but instead by manually
creating a scratch file in (probably) /var/tmp or
/var/tmp/vi.recover?
- If you boot single-user, remount / read-write, sync, remount /
again to turn on softdep, sync, and then run vi, does it crash?
If not, does omitting any of the syncs change the behavior?
- If not, and you do the same, but create a scratch file in /tmp or
/var/tmp before turning on softdep, does it then crash? If not,
does creating a lot of scratch files instead of just one trigger
the crash?
- It might also be interesting to deliberately hit the power switch
at various points, because this allows comparing what's been
written to disk with what's being kept in memory, and any
discrepancy is interesting. However, this requires mucking about
with fsdb and I'm not sure exactly what to look for either.
(None of this is worth crashing anything other than a test machine to
find out, though.)
--
David A. Holland
dholland@netbsd.org
From: Andrew Doran <ad@netbsd.org>
To: David Holland <dholland-bugs@netbsd.org>
Cc: kivinen@iki.fi, kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org, gnats-bugs@netbsd.org
Subject: Re: kern/40474: Kernel panic after remounting raid root with softdep
Date: Sun, 25 Jan 2009 21:40:58 +0000
On Sun, Jan 25, 2009 at 09:07:55PM +0000, David Holland wrote:
> I'm pretty sure turning softdep on after mount is a known problem,
Yes, and turning it off.
Andrew
From: Tero Kivinen <kivinen@iki.fi>
To: David Holland <dholland-bugs@netbsd.org>
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
gnats-bugs@netbsd.org
Subject: Re: kern/40474: Kernel panic after remounting raid root with
softdep
Date: Tue, 27 Jan 2009 15:25:00 +0200
David Holland writes:
> I'm pretty sure turning softdep on after mount is a known problem, and
> that there's an existing PR about it, but I can't find one. Anyway, I
> doubt it's got much to do with either vi or raid.
I remember doing this before, usually quite soon after an install:
I noticed that fstab was missing softdep because unpacking
pkgsrc.tar.gz was slow, added softdep to fstab, and ran mount
-u -o softdep / to make it take effect immediately for the already
running tar...
I noticed the problem this time because I tried to do the same, and the
machine panicked when I tried to edit fstab after the remount...
> The immediate problem appears to be that it's allocating the same
> inode twice. I think this is probably happening either because of a
> general known problem with mount updates not flushing things properly
> (e.g. PR 30525)... or because softdep uses different rules for
> handling free object bitmaps from baseline ffs, and after the mode
> transition not all filesystem state is arranged the way softdep
> expects.
It could be that it is actually crashing on file creation, so
allocating an inode twice could be the reason (vi creates the
recovery file).
> Did the fsck after the crash show any problems?
The first fsck removed lots of files, so many that I needed to rm -rf
/usr/src and check it out from CVS again to get it working. I think I
had run cvs update on /usr/src just before (or during, I don't remember
exactly), so most likely it removed several CVS/* files.
On later attempts, when I tried it immediately after boot, fsck
reported some unreferenced files etc., but nothing special.
> And, if you reboot again without having used softdep, boot to single
> user, and force running a fsck, does *that* show any problems? (In
> fact, it's probably a good idea to do such a fsck on general
> principles; fsck behaves differently if it thinks softdep was in
> use, so crashing in this fashion might have confused it.)
As /etc/fstab was still configured without softdep, all the fsck
runs were done without softdep.
> If you have a test machine to crash on, the following information
> might be useful, maybe:
My test machine was my main machine, which I was updating to
NetBSD-5.0beta, and as that is now done, I don't want to crash it
anymore (reconstructing 1.5 TB of RAID takes a long time...).
>
> - If you sync after the remount, does the crash still occur?
> (I expect it will, but it would be interesting if it didn't.)
I think I already tried that, i.e. I ran:
# sync
# mount -u -o softdep /
# sync
# sync
# sync
# vi /etc/fstab
<crash>
My normal reflex is to type sync a few times in situations like this,
so I guess I tried that too (but I am not sure, as I didn't record my
commands exactly).
> (None of this is worth crashing anything other than a test machine to
> find out, though.)
As I said, my test machine is already back in use, so I cannot do
those things anymore...
--
kivinen@iki.fi
From: David Holland <dholland-bugs@netbsd.org>
To: Tero Kivinen <kivinen@iki.fi>
Cc: David Holland <dholland-bugs@netbsd.org>, kern-bug-people@netbsd.org,
gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
gnats-bugs@netbsd.org
Subject: Re: kern/40474: Kernel panic after remounting raid root with
softdep
Date: Sat, 21 Feb 2009 08:06:27 +0000
On Tue, Jan 27, 2009 at 03:25:00PM +0200, Tero Kivinen wrote:
> > The immediate problem appears to be that it's allocating the same
> > inode twice. I think this is probably happening either because of a
> > general known problem with mount updates not flushing things properly
> > (e.g. PR 30525)... or because softdep uses different rules for
> > handling free object bitmaps from baseline ffs, and after the mode
> > transition not all filesystem state is arranged the way softdep
> > expects.
>
> It could be that it is actually crashing on file creation, so
> allocating inode twice could be the rason (vi creates the recovery
> file).
If it's ffsv2, it could also conceivably be PR 40525.
> As I said my test machine is already back in use, so cannot do those
> things anymore...
Right. Well, I'll put it on the list for softdep... :-|
--
David A. Holland
dholland@netbsd.org
From: Andrew Doran <ad@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/40474 CVS commit: src
Date: Sun, 22 Feb 2009 20:28:07 +0000 (UTC)
Module Name: src
Committed By: ad
Date: Sun Feb 22 20:28:07 UTC 2009
Modified Files:
src/doc: CHANGES
src/lib/libp2k: p2k.c
src/sbin/fsck_lfs: lfs.c
src/sbin/mount: mount.8
src/sbin/newfs_lfs: make_lfs.c
src/sbin/tunefs: tunefs.8 tunefs.c
src/sys/arch/vax/conf: VAX780
src/sys/conf: files
src/sys/kern: sys_aio.c vfs_bio.c vfs_subr.c vfs_syscalls.c
src/sys/miscfs/specfs: spec_vnops.c
src/sys/miscfs/syncfs: sync_subr.c
src/sys/modules/ffs: Makefile
src/sys/rump/fs/lib/libffs: Makefile
src/sys/rump/include/rump: rump.h
src/sys/rump/librump/rumpvfs: rump_vfs.c vm_vfs.c
src/sys/sys: buf.h vnode.h
src/sys/ufs: files.ufs
src/sys/ufs/ffs: ffs_alloc.c ffs_balloc.c ffs_extern.h ffs_inode.c
ffs_snapshot.c ffs_vfsops.c ffs_vnops.c ffs_wapbl.c
src/sys/ufs/lfs: lfs_rfw.c lfs_vfsops.c lfs_vnops.c
src/sys/ufs/ufs: inode.h ufs_dirhash.c ufs_extern.h ufs_inode.c
ufs_lookup.c ufs_readwrite.c ufs_vnops.c ufs_wapbl.c
src/sys/uvm: uvm_pager.c
Removed Files:
src/sys/rump/librump/rumpkern/opt: opt_softdep.h
src/sys/ufs/ffs: ffs_softdep.c ffs_softdep.stub.c softdep.h
Log Message:
PR kern/26878 FFSv2 + softdep = livelock (no free ram)
PR kern/16942 panic with softdep and quotas
PR kern/19565 panic: softdep_write_inodeblock: indirect pointer #1 mismatch
PR kern/26274 softdep panic: allocdirect_merge: ...
PR kern/26374 Long delay before non-root users can write to softdep partitions
PR kern/28621 1.6.x "vp != NULL" panic in ffs_softdep.c:4653 while unmounting a softdep (+quota) filesystem
PR kern/29513 FFS+Softdep panic with unfsck-able file-corruption
PR kern/31544 The ffs softdep code appears to fail to write dirty bits to disk
PR kern/31981 stopping scsi disk can cause panic (softdep)
PR kern/32116 kernel panic in softdep (assertion failure)
PR kern/32532 softdep_trackbufs deadlock
PR kern/37191 softdep: locking against myself
PR kern/40474 Kernel panic after remounting raid root with softdep
Retire softdep, pass 2. As discussed and later formally announced on the
mailing lists.
To generate a diff of this commit:
cvs rdiff -r1.1191 -r1.1192 src/doc/CHANGES
cvs rdiff -r1.8 -r1.9 src/lib/libp2k/p2k.c
cvs rdiff -r1.29 -r1.30 src/sbin/fsck_lfs/lfs.c
cvs rdiff -r1.65 -r1.66 src/sbin/mount/mount.8
cvs rdiff -r1.13 -r1.14 src/sbin/newfs_lfs/make_lfs.c
cvs rdiff -r1.37 -r1.38 src/sbin/tunefs/tunefs.8 src/sbin/tunefs/tunefs.c
cvs rdiff -r1.1 -r1.2 src/sys/arch/vax/conf/VAX780
cvs rdiff -r1.942 -r1.943 src/sys/conf/files
cvs rdiff -r1.22 -r1.23 src/sys/kern/sys_aio.c
cvs rdiff -r1.215 -r1.216 src/sys/kern/vfs_bio.c
cvs rdiff -r1.368 -r1.369 src/sys/kern/vfs_subr.c
cvs rdiff -r1.388 -r1.389 src/sys/kern/vfs_syscalls.c
cvs rdiff -r1.122 -r1.123 src/sys/miscfs/specfs/spec_vnops.c
cvs rdiff -r1.36 -r1.37 src/sys/miscfs/syncfs/sync_subr.c
cvs rdiff -r1.2 -r1.3 src/sys/modules/ffs/Makefile
cvs rdiff -r1.6 -r1.7 src/sys/rump/fs/lib/libffs/Makefile
cvs rdiff -r1.9 -r1.10 src/sys/rump/include/rump/rump.h
cvs rdiff -r1.1 -r0 src/sys/rump/librump/rumpkern/opt/opt_softdep.h
cvs rdiff -r1.12 -r1.13 src/sys/rump/librump/rumpvfs/rump_vfs.c
cvs rdiff -r1.3 -r1.4 src/sys/rump/librump/rumpvfs/vm_vfs.c
cvs rdiff -r1.110 -r1.111 src/sys/sys/buf.h
cvs rdiff -r1.200 -r1.201 src/sys/sys/vnode.h
cvs rdiff -r1.18 -r1.19 src/sys/ufs/files.ufs
cvs rdiff -r1.121 -r1.122 src/sys/ufs/ffs/ffs_alloc.c
cvs rdiff -r1.51 -r1.52 src/sys/ufs/ffs/ffs_balloc.c
cvs rdiff -r1.74 -r1.75 src/sys/ufs/ffs/ffs_extern.h
cvs rdiff -r1.102 -r1.103 src/sys/ufs/ffs/ffs_inode.c
cvs rdiff -r1.91 -r1.92 src/sys/ufs/ffs/ffs_snapshot.c
cvs rdiff -r1.116 -r0 src/sys/ufs/ffs/ffs_softdep.c
cvs rdiff -r1.23 -r0 src/sys/ufs/ffs/ffs_softdep.stub.c
cvs rdiff -r1.242 -r1.243 src/sys/ufs/ffs/ffs_vfsops.c
cvs rdiff -r1.110 -r1.111 src/sys/ufs/ffs/ffs_vnops.c
cvs rdiff -r1.11 -r1.12 src/sys/ufs/ffs/ffs_wapbl.c
cvs rdiff -r1.11 -r0 src/sys/ufs/ffs/softdep.h
cvs rdiff -r1.11 -r1.12 src/sys/ufs/lfs/lfs_rfw.c
cvs rdiff -r1.269 -r1.270 src/sys/ufs/lfs/lfs_vfsops.c
cvs rdiff -r1.219 -r1.220 src/sys/ufs/lfs/lfs_vnops.c
cvs rdiff -r1.55 -r1.56 src/sys/ufs/ufs/inode.h
cvs rdiff -r1.27 -r1.28 src/sys/ufs/ufs/ufs_dirhash.c
cvs rdiff -r1.60 -r1.61 src/sys/ufs/ufs/ufs_extern.h
cvs rdiff -r1.77 -r1.78 src/sys/ufs/ufs/ufs_inode.c
cvs rdiff -r1.100 -r1.101 src/sys/ufs/ufs/ufs_lookup.c
cvs rdiff -r1.93 -r1.94 src/sys/ufs/ufs/ufs_readwrite.c
cvs rdiff -r1.172 -r1.173 src/sys/ufs/ufs/ufs_vnops.c
cvs rdiff -r1.4 -r1.5 src/sys/ufs/ufs/ufs_wapbl.c
cvs rdiff -r1.93 -r1.94 src/sys/uvm/uvm_pager.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: open->closed
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Wed, 01 Apr 2009 04:09:12 +0000
State-Changed-Why:
softdep (softupdates) has been removed.
>Unformatted:
Copyright © 1994-2007
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.