NetBSD Problem Report #40474

From root@confusion.i.kivinen.iki.fi  Sun Jan 25 11:01:24 2009
Return-Path: <root@confusion.i.kivinen.iki.fi>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
	by narn.NetBSD.org (Postfix) with ESMTP id 029E563BAB8
	for <gnats-bugs@gnats.NetBSD.org>; Sun, 25 Jan 2009 11:01:23 +0000 (UTC)
Message-Id: <20090125090154.F185B32CA023@confusion.i.kivinen.iki.fi>
Date: Sun, 25 Jan 2009 11:01:54 +0200 (EET)
From: kivinen@iki.fi
Reply-To: kivinen@iki.fi
To: gnats-bugs@gnats.NetBSD.org
Subject: Kernel panic after remounting raid root with softdep
X-Send-Pr-Version: 3.95

>Number:         40474
>Category:       kern
>Synopsis:       Kernel panic after remounting raid root with softdep
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Jan 25 11:05:00 +0000 2009
>Closed-Date:    Wed Apr 01 04:09:12 +0000 2009
>Last-Modified:  Wed Apr 01 04:09:12 +0000 2009
>Originator:     Tero Kivinen
>Release:        NetBSD 5.0_BETA
>Organization:
>Environment:
System: NetBSD confusion.i.kivinen.iki.fi 5.0_BETA NetBSD 5.0_BETA (GENERIC) #0: Sat Jan 24 19:01:52 EET 2009 root@:/usr/obj/sys/arch/i386/compile/GENERIC i386
Architecture: i386
Machine: i386
>Description:

	The system has a mirrored raid as root. After boot, running
	mount -o softdep -u / succeeds, but the next time I start
	"vi" the machine panics. The mount command does not show
	that softdep took effect (i.e. it still lists the options
	as local), and I can still run some other programs from
	the root filesystem. The file edited with vi does not
	matter, and the crash happens even if no file is given.
	If this is tried on the second (non-root) disk, it does
	not happen, even if the vi binary is copied to the other
	disk (i.e. mounting the other disk without softdep,
	remounting it with -u -o softdep, and then running vi
	from there works).
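
	The check of the mount output described above could be
	scripted roughly as follows. This is only a sketch: it
	assumes lines of the form "<device> on <dir> type <fs>
	(<options>)" and assumes that an active softdep mount is
	listed with the option text "soft dependencies". The
	sample text and device names are made up for illustration
	and were not captured from this machine.

```python
import re

# Assumed line shape: "<device> on <mountpoint> type <fstype> (<opt>, <opt>, ...)"
MOUNT_LINE = re.compile(r"^(\S+) on (\S+) type (\S+) \((.*)\)$")

# Hypothetical sample text, not captured from the affected machine.
SAMPLE = """\
/dev/raid0a on / type ffs (local)
/dev/wd1e on /mnt type ffs (soft dependencies, local)
"""


def mount_options(mount_output, mountpoint):
    """Return the option list printed for mountpoint, or None if it is absent."""
    for line in mount_output.splitlines():
        m = MOUNT_LINE.match(line.strip())
        if m and m.group(2) == mountpoint:
            return [opt.strip() for opt in m.group(4).split(",")]
    return None


def softdep_active(mount_output, mountpoint):
    """True if the (assumed) softdep option text is listed for mountpoint."""
    opts = mount_options(mount_output, mountpoint)
    return opts is not None and "soft dependencies" in opts
```

	Feeding it the real output of mount(8) before and after
	the remount would show whether the option list for /
	changed at all.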

	I assume this might be somehow related to the way
	vi creates temp files or something.

	I have several kernel dumps, so if more information is
	needed I can get it with gdb. Here is the stack trace
	from gdb:

GNU gdb 6.5
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386--netbsdelf"...
(gdb) target kvm netbsd.4.core
#0  0xc053acf2 in cpu_reboot ()
(gdb) bt
#0  0xc053acf2 in cpu_reboot ()
#1  0xc01b2d19 in db_sync_cmd ()
#2  0xc01b3468 in db_command ()
#3  0xc01b36e3 in db_command_loop ()
#4  0xc01b6600 in db_trap ()
#5  0xc0535d3b in kdb_trap ()
#6  0xc053d993 in trap ()
#7  0xc010cb1f in calltrap ()
#8  0xc053441c in breakpoint ()
#9  0xc047f808 in panic ()
#10 0xc03c6f28 in softdep_setup_inomapdep ()
#11 0xc03b2626 in ffs_nodealloccg ()
#12 0xc03b0311 in ffs_hashalloc ()
#13 0xc03b487a in ffs_valloc ()
#14 0xc03f1964 in ufs_makeinode ()
#15 0xc03f1dea in ufs_create ()
#16 0xc04c72bc in VOP_CREATE ()
#17 0xc04c1912 in vn_open ()
#18 0xc04bdd00 in sys_open ()
#19 0xc053d45d in syscall ()
#20 0xc0100514 in syscall1 ()
(gdb) quit

>How-To-Repeat:

	Boot a machine with root on a mirrored raid, without
	listing the softdep option in its fstab entry. Remount
	root with softdep using

		# mount -u -o softdep /

	Start vi:

		# vi

	The kernel panics immediately. I have not verified
	whether the raid is required, but the problem seems
	to be related to the root disk.

>Fix:

	Do not remount root with softdep.

>Release-Note:

>Audit-Trail:
From: David Holland <dholland-bugs@netbsd.org>
To: kivinen@iki.fi
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
	netbsd-bugs@netbsd.org, gnats-bugs@netbsd.org
Subject: Re: kern/40474: Kernel panic after remounting raid root with
	softdep
Date: Sun, 25 Jan 2009 21:07:55 +0000

 On Sun, Jan 25, 2009 at 11:05:00AM +0000, kivinen@iki.fi wrote:
  > 	The system has mirrored raid as root, and when I boot it up
  > 	and then run mount -o softdep -u / the command works, but after
  > 	that next time I start "vi" the machine panics. The mount
  > 	command does not show that the softdep took effect (i.e.
  > 	it still lists options as local), and I can still
  > 	run some other programs from the root filesystem. The 
  > 	file edited with vi does not matter, and crash happens
  > 	even if no file is given. If this is tried on the second 
  > 	(non root disk) then this does not happen, even if the 
  > 	vi binary is copied to the other disk (i.e. mounting the
  > 	other disk without softdep, remounting it with -u -o softdep
  > 	and then running vi from there works).
  > 
  > 	I assume this might be somehow related to the way
  > 	vi creates temp files or something.

 I'm pretty sure turning softdep on after mount is a known problem, and
 that there's an existing PR about it, but I can't find one. Anyway, I
 doubt it's got much to do with either vi or raid.

 I think I know what's going on, maybe, but I don't have time right now
 to look into it in any detail, and unfortunately I don't think anyone
 else is interested in fixing softdep issues... :(

 The immediate problem appears to be that it's allocating the same
 inode twice. I think this is probably happening either because of a
 general known problem with mount updates not flushing things properly
 (e.g. PR 30525)... or because softdep uses different rules for
 handling free object bitmaps from baseline ffs, and after the mode
 transition not all filesystem state is arranged the way softdep
 expects.
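
 To make the second possibility concrete, here is a toy model. It is
 not the ffs or softdep code; every name in it is hypothetical. It
 only shows how an allocator that trusts the free bitmap can hand out
 an inode that a separate dependency table still considers busy, which
 is the general shape of the failure guessed at above.

```python
class ToyPanic(Exception):
    """Stands in for a kernel panic in this toy model."""


class ToyFs:
    """Toy filesystem state: a free bitmap plus a dependency table."""

    def __init__(self, ninodes):
        self.in_use = [False] * ninodes  # allocation bitmap, one flag per inode
        self.tracked = set()             # inodes with pending dependency records

    def alloc_inode(self, softdep):
        # Lowest inode the bitmap calls free (ValueError if none is left).
        ino = self.in_use.index(False)
        if softdep:
            # The dependency layer expects never to see this inode twice.
            if ino in self.tracked:
                raise ToyPanic("inode %d is already tracked" % ino)
            self.tracked.add(ino)
        self.in_use[ino] = True
        return ino

    def free_inode(self, ino, clear_tracking):
        # clear_tracking=False models the two views getting out of step.
        self.in_use[ino] = False
        if clear_tracking:
            self.tracked.discard(ino)
```

 In this model, allocating inode 0 with softdep=True, freeing it with
 clear_tracking=False, and allocating again raises ToyPanic, because
 the bitmap says the inode is free while the dependency table does
 not.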

 Did the fsck after the crash show any problems? And, if you reboot
 again without having used softdep, boot to single user, and force
 running a fsck, does *that* show any problems? (In fact, it's probably
 a good idea to do such a fsck on general principles; fsck behaves
 differently if it thinks softdep was in use, so crashing in this
 fashion might have confused it.)


 If you have a test machine to crash on, the following information
 might be useful, maybe:

    - If you sync after the remount, does the crash still occur?
      (I expect it will, but it would be interesting if it didn't.)

    - Can you reproduce the crash without vi but instead by manually
      creating a scratch file in (probably) /var/tmp or
      /var/tmp/vi.recover?

    - If you boot single-user, remount / read-write, sync, remount /
      again to turn on softdep, sync, and then run vi, does it crash?
      If not, does omitting any of the syncs change the behavior?

    - If not, and you do the same, but create a scratch file in /tmp or
      /var/tmp before turning on softdep, does it then crash? If not,
      does creating a lot of scratch files instead of just one trigger
      the crash?

    - It might also be interesting to deliberately hit the power switch
      at various points, because this allows comparing what's been
      written to disk with what's being kept in memory, and any
      discrepancy is interesting. However, this requires mucking about
      with fsdb and I'm not sure exactly what to look for either.

 (None of this is worth crashing anything other than a test machine to
 find out, though.)
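
 The scratch-file experiment above could be scripted roughly as below.
 This is a sketch under assumptions: it supposes Python is installed on
 the test machine, the function and file names are hypothetical, and
 the directory is whichever of the guesses above is being tried. It has
 not been run on an affected system.

```python
import os


def create_scratch(directory, name="pr40474.scratch"):
    """Create a new empty file in directory and return its path.

    O_CREAT | O_EXCL forces a real file creation, so the call raises
    FileExistsError instead of silently reusing an existing file.
    """
    path = os.path.join(directory, name)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    os.close(fd)
    return path
```

 Calling create_scratch("/var/tmp") after the remount goes through
 open(2) with file creation, which is where the posted backtrace
 starts (sys_open, vn_open, VOP_CREATE), so a crash here would suggest
 that vi is incidental.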

 -- 
 David A. Holland
 dholland@netbsd.org

From: Andrew Doran <ad@netbsd.org>
To: David Holland <dholland-bugs@netbsd.org>
Cc: kivinen@iki.fi, kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
	netbsd-bugs@netbsd.org, gnats-bugs@netbsd.org
Subject: Re: kern/40474: Kernel panic after remounting raid root with softdep
Date: Sun, 25 Jan 2009 21:40:58 +0000

 On Sun, Jan 25, 2009 at 09:07:55PM +0000, David Holland wrote:
 > On Sun, Jan 25, 2009 at 11:05:00AM +0000, kivinen@iki.fi wrote:
 >  > 	The system has mirrored raid as root, and when I boot it up
 >  > 	and then run mount -o softdep -u / the command works, but after
 >  > 	that next time I start "vi" the machine panics. The mount
 >  > 	command does not show that the softdep took effect (i.e.
 >  > 	it still lists options as local), and I can still
 >  > 	run some other programs from the root filesystem. The 
 >  > 	file edited with vi does not matter, and crash happens
 >  > 	even if no file is given. If this is tried on the second 
 >  > 	(non root disk) then this does not happen, even if the 
 >  > 	vi binary is copied to the other disk (i.e. mounting the
 >  > 	other disk without softdep, remounting it with -u -o softdep
 >  > 	and then running vi from there works).
 >  > 
 >  > 	I assume this might be somehow related to the way
 >  > 	vi creates temp files or something.
 > 
 > I'm pretty sure turning softdep on after mount is a known problem,

 Yes, and turning it off.

 Andrew

From: Tero Kivinen <kivinen@iki.fi>
To: David Holland <dholland-bugs@netbsd.org>
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
        gnats-bugs@netbsd.org
Subject: Re: kern/40474: Kernel panic after remounting raid root with
	softdep
Date: Tue, 27 Jan 2009 15:25:00 +0200

 David Holland writes:
 > I'm pretty sure turning softdep on after mount is a known problem, and
 > that there's an existing PR about it, but I can't find one. Anyway, I
 > doubt it's got much to do with either vi or raid.

 I remember doing this before, usually quite soon after install: I
 noticed that fstab was missing softdep and that unpacking
 pkgsrc.tar.gz was slow because of it, then added softdep to fstab and
 ran mount -u -o softdep / so that it took effect immediately for the
 already running tar...

 I noticed the problem now because I tried to do the same and the
 machine panicked when I tried to edit fstab after the remount...

 > The immediate problem appears to be that it's allocating the same
 > inode twice. I think this is probably happening either because of a
 > general known problem with mount updates not flushing things properly
 > (e.g. PR 30525)... or because softdep uses different rules for
 > handling free object bitmaps from baseline ffs, and after the mode
 > transition not all filesystem state is arranged the way softdep
 > expects.

 It could be that it is actually crashing on file creation, so
 allocating an inode twice could be the reason (vi creates the
 recovery file).

 > Did the fsck after the crash show any problems?

 The first fsck removed lots of files, so many that I needed to rm -rf
 /usr/src and cvs checkout it again to get it working. I think I had
 run cvs update on /usr/src just before (or during, I don't remember
 exactly), so most likely it was several CVS/* files.

 On later attempts, when I tried it immediately after boot, fsck
 reported some unreferenced files etc., but nothing special.

 > And, if you reboot again without having used softdep, boot to single
 > user, and force running a fsck, does *that* show any problems? (In
 > fact, it's probably a good idea to do such a fsck on general
 > principles; fsck behaves differently if it thinks softdep was in
 > use, so crashing in this fashion might have confused it.)

 As the /etc/fstab was still configured without softdep all the fsck
 runs were done without softdeps.

 > If you have a test machine to crash on, the following information
 > might be useful, maybe:

 My test machine was my main machine, which I was updating to
 NetBSD-5.0beta. As that is already done, I don't want to crash it
 anymore (reconstructing 1.5 TB of raid takes a long time...).

 > 
 >    - If you sync after the remount, does the crash still occur?
 >      (I expect it will, but it would be interesting if it didn't.)

 I think I tried that already, i.e. said:

   # sync
   # mount -u -o softdep /
   # sync
   # sync
   # sync
   # vi /etc/fstab
   <crash>

 My normal reflex is to type sync a few times at moments like that,
 so my guess is that I tried that too (but I am not sure, as I
 didn't record my commands exactly).

 > (None of this is worth crashing anything other than a test machine to
 > find out, though.)

 As I said, my test machine is already back in use, so I cannot do
 those things anymore...
 -- 
 kivinen@iki.fi

From: David Holland <dholland-bugs@netbsd.org>
To: Tero Kivinen <kivinen@iki.fi>
Cc: David Holland <dholland-bugs@netbsd.org>, kern-bug-people@netbsd.org,
	gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
	gnats-bugs@netbsd.org
Subject: Re: kern/40474: Kernel panic after remounting raid root with
	softdep
Date: Sat, 21 Feb 2009 08:06:27 +0000

 On Tue, Jan 27, 2009 at 03:25:00PM +0200, Tero Kivinen wrote:
  > > The immediate problem appears to be that it's allocating the same
  > > inode twice. I think this is probably happening either because of a
  > > general known problem with mount updates not flushing things properly
  > > (e.g. PR 30525)... or because softdep uses different rules for
  > > handling free object bitmaps from baseline ffs, and after the mode
  > > transition not all filesystem state is arranged the way softdep
  > > expects.
  > 
  > It could be that it is actually crashing on file creation, so
  > allocating inode twice could be the reason (vi creates the recovery
  > file).

 If it's ffsv2, it could also conceivably be PR 40525.

  > As I said my test machine is already back in use, so cannot do those
  > things anymore... 

 Right. Well, I'll put it on the list for softdep... :-|

 -- 
 David A. Holland
 dholland@netbsd.org

From: Andrew Doran <ad@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/40474 CVS commit: src
Date: Sun, 22 Feb 2009 20:28:07 +0000 (UTC)

 Module Name:	src
 Committed By:	ad
 Date:		Sun Feb 22 20:28:07 UTC 2009

 Modified Files:
 	src/doc: CHANGES
 	src/lib/libp2k: p2k.c
 	src/sbin/fsck_lfs: lfs.c
 	src/sbin/mount: mount.8
 	src/sbin/newfs_lfs: make_lfs.c
 	src/sbin/tunefs: tunefs.8 tunefs.c
 	src/sys/arch/vax/conf: VAX780
 	src/sys/conf: files
 	src/sys/kern: sys_aio.c vfs_bio.c vfs_subr.c vfs_syscalls.c
 	src/sys/miscfs/specfs: spec_vnops.c
 	src/sys/miscfs/syncfs: sync_subr.c
 	src/sys/modules/ffs: Makefile
 	src/sys/rump/fs/lib/libffs: Makefile
 	src/sys/rump/include/rump: rump.h
 	src/sys/rump/librump/rumpvfs: rump_vfs.c vm_vfs.c
 	src/sys/sys: buf.h vnode.h
 	src/sys/ufs: files.ufs
 	src/sys/ufs/ffs: ffs_alloc.c ffs_balloc.c ffs_extern.h ffs_inode.c
 	    ffs_snapshot.c ffs_vfsops.c ffs_vnops.c ffs_wapbl.c
 	src/sys/ufs/lfs: lfs_rfw.c lfs_vfsops.c lfs_vnops.c
 	src/sys/ufs/ufs: inode.h ufs_dirhash.c ufs_extern.h ufs_inode.c
 	    ufs_lookup.c ufs_readwrite.c ufs_vnops.c ufs_wapbl.c
 	src/sys/uvm: uvm_pager.c
 Removed Files:
 	src/sys/rump/librump/rumpkern/opt: opt_softdep.h
 	src/sys/ufs/ffs: ffs_softdep.c ffs_softdep.stub.c softdep.h

 Log Message:
 PR kern/26878 FFSv2 + softdep = livelock (no free ram)
 PR kern/16942 panic with softdep and quotas
 PR kern/19565 panic: softdep_write_inodeblock: indirect pointer #1 mismatch
 PR kern/26274 softdep panic: allocdirect_merge: ...
 PR kern/26374 Long delay before non-root users can write to softdep partitions
 PR kern/28621 1.6.x "vp != NULL" panic in ffs_softdep.c:4653 while unmounting a softdep (+quota) filesystem
 PR kern/29513 FFS+Softdep panic with unfsck-able file-corruption
 PR kern/31544 The ffs softdep code appears to fail to write dirty bits to disk
 PR kern/31981 stopping scsi disk can cause panic (softdep)
 PR kern/32116 kernel panic in softdep (assertion failure)
 PR kern/32532 softdep_trackbufs deadlock
 PR kern/37191 softdep: locking against myself
 PR kern/40474 Kernel panic after remounting raid root with softdep

 Retire softdep, pass 2. As discussed and later formally announced on the
 mailing lists.


 To generate a diff of this commit:
 cvs rdiff -r1.1191 -r1.1192 src/doc/CHANGES
 cvs rdiff -r1.8 -r1.9 src/lib/libp2k/p2k.c
 cvs rdiff -r1.29 -r1.30 src/sbin/fsck_lfs/lfs.c
 cvs rdiff -r1.65 -r1.66 src/sbin/mount/mount.8
 cvs rdiff -r1.13 -r1.14 src/sbin/newfs_lfs/make_lfs.c
 cvs rdiff -r1.37 -r1.38 src/sbin/tunefs/tunefs.8 src/sbin/tunefs/tunefs.c
 cvs rdiff -r1.1 -r1.2 src/sys/arch/vax/conf/VAX780
 cvs rdiff -r1.942 -r1.943 src/sys/conf/files
 cvs rdiff -r1.22 -r1.23 src/sys/kern/sys_aio.c
 cvs rdiff -r1.215 -r1.216 src/sys/kern/vfs_bio.c
 cvs rdiff -r1.368 -r1.369 src/sys/kern/vfs_subr.c
 cvs rdiff -r1.388 -r1.389 src/sys/kern/vfs_syscalls.c
 cvs rdiff -r1.122 -r1.123 src/sys/miscfs/specfs/spec_vnops.c
 cvs rdiff -r1.36 -r1.37 src/sys/miscfs/syncfs/sync_subr.c
 cvs rdiff -r1.2 -r1.3 src/sys/modules/ffs/Makefile
 cvs rdiff -r1.6 -r1.7 src/sys/rump/fs/lib/libffs/Makefile
 cvs rdiff -r1.9 -r1.10 src/sys/rump/include/rump/rump.h
 cvs rdiff -r1.1 -r0 src/sys/rump/librump/rumpkern/opt/opt_softdep.h
 cvs rdiff -r1.12 -r1.13 src/sys/rump/librump/rumpvfs/rump_vfs.c
 cvs rdiff -r1.3 -r1.4 src/sys/rump/librump/rumpvfs/vm_vfs.c
 cvs rdiff -r1.110 -r1.111 src/sys/sys/buf.h
 cvs rdiff -r1.200 -r1.201 src/sys/sys/vnode.h
 cvs rdiff -r1.18 -r1.19 src/sys/ufs/files.ufs
 cvs rdiff -r1.121 -r1.122 src/sys/ufs/ffs/ffs_alloc.c
 cvs rdiff -r1.51 -r1.52 src/sys/ufs/ffs/ffs_balloc.c
 cvs rdiff -r1.74 -r1.75 src/sys/ufs/ffs/ffs_extern.h
 cvs rdiff -r1.102 -r1.103 src/sys/ufs/ffs/ffs_inode.c
 cvs rdiff -r1.91 -r1.92 src/sys/ufs/ffs/ffs_snapshot.c
 cvs rdiff -r1.116 -r0 src/sys/ufs/ffs/ffs_softdep.c
 cvs rdiff -r1.23 -r0 src/sys/ufs/ffs/ffs_softdep.stub.c
 cvs rdiff -r1.242 -r1.243 src/sys/ufs/ffs/ffs_vfsops.c
 cvs rdiff -r1.110 -r1.111 src/sys/ufs/ffs/ffs_vnops.c
 cvs rdiff -r1.11 -r1.12 src/sys/ufs/ffs/ffs_wapbl.c
 cvs rdiff -r1.11 -r0 src/sys/ufs/ffs/softdep.h
 cvs rdiff -r1.11 -r1.12 src/sys/ufs/lfs/lfs_rfw.c
 cvs rdiff -r1.269 -r1.270 src/sys/ufs/lfs/lfs_vfsops.c
 cvs rdiff -r1.219 -r1.220 src/sys/ufs/lfs/lfs_vnops.c
 cvs rdiff -r1.55 -r1.56 src/sys/ufs/ufs/inode.h
 cvs rdiff -r1.27 -r1.28 src/sys/ufs/ufs/ufs_dirhash.c
 cvs rdiff -r1.60 -r1.61 src/sys/ufs/ufs/ufs_extern.h
 cvs rdiff -r1.77 -r1.78 src/sys/ufs/ufs/ufs_inode.c
 cvs rdiff -r1.100 -r1.101 src/sys/ufs/ufs/ufs_lookup.c
 cvs rdiff -r1.93 -r1.94 src/sys/ufs/ufs/ufs_readwrite.c
 cvs rdiff -r1.172 -r1.173 src/sys/ufs/ufs/ufs_vnops.c
 cvs rdiff -r1.4 -r1.5 src/sys/ufs/ufs/ufs_wapbl.c
 cvs rdiff -r1.93 -r1.94 src/sys/uvm/uvm_pager.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: open->closed
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Wed, 01 Apr 2009 04:09:12 +0000
State-Changed-Why:
softdep (softupdates) has been removed.


>Unformatted:
