NetBSD Problem Report #8964

Received: (qmail 736 invoked from network); 7 Dec 1999 06:46:03 -0000
Message-Id: <199912070645.PAA16552@icnmp9.icg.tnr.sharp.co.jp>
Date: Tue, 7 Dec 1999 15:45:15 +0900 (JST)
From: itohy@netbsd.org
Reply-To: itohy@netbsd.org
To: gnats-bugs@gnats.netbsd.org
Subject: panic on reboot(2) if an LFS is mounted read-only
X-Send-Pr-Version: 3.95

>Number:         8964
>Category:       kern
>Synopsis:       panic on reboot(2) if an LFS is mounted read-only
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Dec 06 22:48:00 +0000 1999
>Closed-Date:    Sat Jul 14 09:39:20 +0000 2018
>Last-Modified:  Sat Jul 14 09:39:20 +0000 2018
>Originator:     ITOH Yasufumi
>Release:        1.4P (Dec. 3, 1999)
>Organization:

>Environment:
System: NetBSD 1.4P NetBSD 1.4P (ACHA.elf) #20: Sun Dec 5 14:55:44 JST 1999 itohy@pino.my.domain:/usr/src/sys/arch/x68k/compile/ACHA.elf x68k

no SOFTDEP in the kernel config.


>Description:
	The system panic()s on reboot if
	 1. an LFS is mounted, and
	 2. the filesystem is read-only.

	This problem seems to be introduced by the softdep merge.

>How-To-Repeat:
	Mount an LFS in read-only mode and reboot the system.
	Here's an example when the root filesystem is an LFS.

	# mount
	root_device on / type lfs (local, read-only)
	# reboot
	Dec  6 08:29:42 init: kernel security level changed from 0 to 1
	syncing disks... panic: bawrite LFS buffer
	Stopped in reboot at	cpu_Debugger+0x6:	unlk	a6
	db> trace
	cpu_Debugger(2004,1,2710,519c00,54fc54) + 6
	panic(116d2c,54fe68,45bd8,54fe60,12abd4) + 56
	lfs_bwrite(54fe60,12abd4,519c00,54fe9c,4bd16) + 1c
	bawrite(519c00) + 36
	vfs_shutdown(54ff3c,300e0,0,0,0) + be
	cpu_reboot(0,0,0,530640,1) + 3c
	sys_reboot(530640,54ff88,54ff80) + 52
	syscall(d0) + 196
	trap0() + e
	db>

>Fix:
	Unknown.
	LFS buffers should not be marked asynchronous.

>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: kern-bug-people->perseant 
Responsible-Changed-By: perseant 
Responsible-Changed-When: Tue Dec 7 11:54:43 PST 1999 
Responsible-Changed-Why:  
I have a solution. 
State-Changed-From-To: open->feedback 
State-Changed-By: perseant 
State-Changed-When: Tue Dec 7 11:55:36 PST 1999 
State-Changed-Why:  
Suggested patch. 

From: Konrad Schroder <perseant@hhhh.org>
To: itohy@netbsd.org, fvdl@netbsd.org
Cc: gnats-bugs@gnats.netbsd.org
Subject: Re: kern/8964: panic on reboot(2) if an LFS is mounted read-only
Date: Tue, 7 Dec 1999 11:54:18 -0800 (PST)

 On Tue, 7 Dec 1999 itohy@netbsd.org wrote:

 > 	The system panic()s on reboot if
 > 	 1. an LFS is mounted, and
 > 	 2. the filesystem is read-only.
 > 
 > 	This problem seems to be introduced by the softdep merge.

 Itoh, thanks, I see it.  There are actually two problems here...the first
 (bawrite an lfs buffer) masks the fact that lfs buffers can be marked
 dirty (and written to disk under some circumstances) even if the
 filesystem is mounted ro.  This patch should fix both...please let me know
 whether it works for you.

 Frank, is my reading of vfs_shutdown correct in that we only have to
 rewrite buffers from softdep filesystems?
 						Konrad Schroder
 						perseant@hhhh.org

 Index: kern/vfs_subr.c
 ===================================================================
 RCS file: /cvsroot/syssrc/sys/kern/vfs_subr.c,v
 retrieving revision 1.115
 diff -u -r1.115 vfs_subr.c
 --- vfs_subr.c	1999/11/23 23:52:40	1.115
 +++ vfs_subr.c	1999/12/07 19:25:02
 @@ -2294,7 +2294,9 @@
  			 * written will be remarked as dirty until other
  			 * buffers are written.
  			 */
 -			if (bp->b_flags & B_DELWRI) {
 +			if (bp->b_vp && bp->b_vp->v_mount
 +			    && (bp->b_vp->v_mount->mnt_flag & MNT_SOFTDEP)
 +			    && (bp->b_flags & B_DELWRI)) {
  				s = splbio();
  				bremfree(bp);
  				bp->b_flags |= B_BUSY;
 Index: ufs/lfs/lfs_bio.c
 ===================================================================
 RCS file: /cvsroot/syssrc/sys/ufs/lfs/lfs_bio.c,v
 retrieving revision 1.14
 diff -u -r1.14 lfs_bio.c
 --- lfs_bio.c	1999/11/23 23:52:42	1.14
 +++ lfs_bio.c	1999/12/07 19:25:11
 @@ -140,8 +140,10 @@
  	register struct buf *bp = ap->a_bp;

  #ifdef DIAGNOSTIC
 -	if(bp->b_flags & B_ASYNC)
 +        if(VTOI(bp->b_vp)->i_lfs->lfs_ronly == 0
 +	   && (bp->b_flags & B_ASYNC)) {
  		panic("bawrite LFS buffer");
 +	}
  #endif /* DIAGNOSTIC */
  	return lfs_bwrite_ext(bp,0);
  }
 @@ -184,6 +186,23 @@
  	struct inode *ip;
  	int db, error, s;

 +#ifdef LFS_HONOR_RDONLY
 +	/*
 +	 * Don't write *any* blocks if we're mounted read-only.
 +	 * In particular the cleaner can't write blocks either.
 +	 */
 +        if(VTOI(bp->b_vp)->i_lfs->lfs_ronly) {
 +		bp->b_flags &= ~(B_DELWRI|B_LOCKED|B_READ|B_ERROR);
 +		s = splbio();
 +		reassignbuf(bp, bp->b_vp);
 +		splx(s);
 +		if(bp->b_flags & B_CALL)
 +			bp->b_flags &= ~B_BUSY;
 +		else
 +			brelse(bp);
 +		return EROFS;
 +	}
 +#endif
  	/*
  	 * Set the delayed write flag and use reassignbuf to move the buffer
  	 * from the clean list to the dirty one.
 @@ -242,17 +261,7 @@
  		++locked_queue_count;
  		locked_queue_bytes += bp->b_bufsize;
  		s = splbio();
 -#ifdef LFS_HONOR_RDONLY
 -		/*
 -		 * XXX KS - Don't write blocks if we're mounted ro.
 -		 * Placement here means that the cleaner can't write
 -		 * blocks either.
 -		 */
 -	        if(VTOI(bp->b_vp)->i_lfs->lfs_ronly)
 -			bp->b_flags &= ~(B_DELWRI|B_LOCKED);
 -		else
 -#endif
 -			bp->b_flags |= B_DELWRI | B_LOCKED;
 +		bp->b_flags |= B_DELWRI | B_LOCKED;
  		bp->b_flags &= ~(B_READ | B_ERROR);
  		reassignbuf(bp, bp->b_vp);
  		splx(s);
 @@ -316,8 +325,12 @@

  	if(lfs_dostats) 
  		++lfs_stats.write_exceeded;
 -	if (lfs_writing && flags==0) /* XXX flags */
 +	if (lfs_writing && flags==0) {/* XXX flags */
 +#ifdef DEBUG_LFS
 +		printf("lfs_flush: not flushing because another flush is active\n");
 +#endif
  		return;
 +	}
  	lfs_writing = 1;

  	simple_lock(&mountlist_slock);
 @@ -378,6 +391,10 @@
  	{
  		if(lfs_dostats)
  			++lfs_stats.wait_exceeded;
 +#ifdef DEBUG_LFS
 +		printf("lfs_check: waiting: count=%d, bytes=%ld\n",
 +			locked_queue_count, locked_queue_bytes);
 +#endif
  		error = tsleep(&locked_queue_count, PCATCH | PUSER,
  			       "buffers", hz * LFS_BUFWAIT);
  	}
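
The vfs_subr.c hunk above narrows which buffers vfs_shutdown() rewrites. A minimal user-space sketch of just that predicate follows. The struct layouts, the SK_ flag values, and the function name shutdown_should_rewrite() are hypothetical stand-ins for illustration, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for B_DELWRI and MNT_SOFTDEP; values are illustrative. */
#define SK_B_DELWRI    0x01u	/* buffer holds delayed-write (dirty) data */
#define SK_MNT_SOFTDEP 0x02u	/* mount uses soft dependencies */

/* Simplified models of the kernel structures named in the patch. */
struct mount { unsigned mnt_flag; };
struct vnode { struct mount *v_mount; };
struct buf   { unsigned b_flags; struct vnode *b_vp; };

/*
 * Models the condition the patch adds to vfs_shutdown(): a buffer is
 * rewritten only if it is dirty AND belongs to a soft-dependency mount.
 * Buffers with no vnode or no mount are skipped.
 */
static int
shutdown_should_rewrite(const struct buf *bp)
{
	assert(bp != NULL);
	return bp->b_vp != NULL
	    && bp->b_vp->v_mount != NULL
	    && (bp->b_vp->v_mount->mnt_flag & SK_MNT_SOFTDEP) != 0
	    && (bp->b_flags & SK_B_DELWRI) != 0;
}
```

Under this model a dirty LFS buffer (mount without the softdep flag) is never handed to bawrite() during shutdown, so the DIAGNOSTIC panic in lfs_bwrite is not reached on that path.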



From: Frank van der Linden <frank@wins.uva.nl>
To: Konrad Schroder <perseant@hhhh.org>
Cc: itohy@netbsd.org, fvdl@netbsd.org, gnats-bugs@gnats.netbsd.org
Subject: Re: kern/8964: panic on reboot(2) if an LFS is mounted read-only
Date: Wed, 8 Dec 1999 20:15:49 +0100

 On Tue, Dec 07, 1999 at 11:54:18AM -0800, Konrad Schroder wrote:
 > Frank, is my reading of vfs_shutdown correct in that we only have to
 > rewrite buffers from softdep filesystems?

 That is correct. However, I would feel much better about this if
 read-only LFS actually meant read only. Isn't it simpler to
 make the read-only option imply the "noclean" (-n) flag?

 - Frank
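
A minimal sketch of the suggestion above, assuming the mount helper keeps a boolean for the "noclean" (-n) option. This is not the actual mount_lfs code: SK_RDONLY, struct lfs_mount_opts, and resolve_lfs_opts() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define SK_RDONLY 0x01	/* hypothetical stand-in for the read-only mount flag */

/* Hypothetical container for the options the mount helper ends up with. */
struct lfs_mount_opts {
	int  mntflags;	/* mount flags as requested by the user */
	bool noclean;	/* true: do not start the cleaner */
};

/*
 * Read-only implies noclean: the cleaner is never started on a
 * read-only mount, whether or not -n was given explicitly.
 */
static struct lfs_mount_opts
resolve_lfs_opts(int mntflags, bool noclean_requested)
{
	struct lfs_mount_opts o;

	o.mntflags = mntflags;
	o.noclean = noclean_requested || (mntflags & SK_RDONLY) != 0;
	return o;
}
```

With this rule, a read/write mount still honors an explicit -n, while a read-only mount never gets a cleaner that could try to write segments.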

From: Konrad Schroder <perseant@hhhh.org>
To: Frank van der Linden <frank@wins.uva.nl>
Cc: itohy@netbsd.org, fvdl@netbsd.org, gnats-bugs@gnats.netbsd.org
Subject: Re: kern/8964: panic on reboot(2) if an LFS is mounted read-only
Date: Wed, 8 Dec 1999 14:52:19 -0800 (PST)

 On Wed, 8 Dec 1999, Frank van der Linden wrote:

 > That is correct. However, I would feel much better about this if
 > read-only LFS actually meant read only. Isn't it simpler to
 > make the read-only option imply the "noclean" (-n) flag?

 Good point, I've just made mount_lfs do this.

 Two things though: I don't think that the softdep part of this problem
 affected only read-only fss; in particular if the LFS is deadlocked[*]
 within a dirop, and I reboot it from the debugger, I also get this
 behavior.  Also, the LFS portion of the patch does ensure that lfs_bwrite
 never marks blocks dirty if the fs is mounted read-only.

 [* - another open PR; the LFS does dirop accounting wrong and is unable to
 write dirty blocks.  But bawrite still is the wrong thing to do.]

 						Konrad Schroder
 						perseant@hhhh.org
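
The read-only guarantee described above can be sketched in user space as follows. This models only the flag handling and return value of the LFS_HONOR_RDONLY branch in the patch. The SK_ flag values, struct sketch_buf, and sketch_lfs_bwrite() are hypothetical, and the buffer-queue work (reassignbuf, brelse) is omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel buffer flags; values are illustrative. */
#define SK_B_DELWRI 0x01u
#define SK_B_LOCKED 0x02u
#define SK_B_READ   0x04u
#define SK_B_ERROR  0x08u

struct sketch_buf { unsigned b_flags; };

/*
 * On a read-only filesystem the buffer is never marked dirty: the
 * delayed-write and locked flags are cleared and EROFS is returned.
 * Otherwise the buffer is marked dirty and locked, as in the
 * unconditional path that the patch leaves in place.
 */
static int
sketch_lfs_bwrite(struct sketch_buf *bp, int ronly)
{
	assert(bp != NULL);
	if (ronly) {
		bp->b_flags &= ~(SK_B_DELWRI | SK_B_LOCKED | SK_B_READ | SK_B_ERROR);
		return EROFS;
	}
	bp->b_flags |= SK_B_DELWRI | SK_B_LOCKED;
	bp->b_flags &= ~(SK_B_READ | SK_B_ERROR);
	return 0;
}
```

Doing the check before any flag is set, rather than deep inside the write path, is what keeps a read-only mount from ever accumulating dirty LFS buffers in this model.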



From: itohy@netbsd.org (ITOH Yasufumi)
To: perseant@hhhh.org
Cc: fvdl@netbsd.org, gnats-bugs@gnats.netbsd.org
Subject: Re: kern/8964: panic on reboot(2) if an LFS is mounted read-only
Date: Thu, 9 Dec 1999 08:33:35 +0900 (JST)

 In article <Pine.NEB.4.10.9912071142580.4110-100000@hhhh.hitl.washington.edu>
 perseant@hhhh.org writes:

 > On Tue, 7 Dec 1999 itohy@netbsd.org wrote:
 > 
 > > 	The system panic()s on reboot if
 > > 	 1. an LFS is mounted, and
 > > 	 2. the filesystem is read-only.
 > > 
 > > 	This problem seems to be introduced by the softdep merge.
 > 
 > Itoh, thanks, I see it.  There are actually two problems here...the first
 > (bawrite an lfs buffer) masks the fact that lfs buffers can be marked
 > dirty (and written to disk under some circumstances) even if the
 > filesystem is mounted ro.  This patch should fix both...please let me know
 > whether it works for you.

 I tried the patch and confirmed that it fixes the panic on reboot
 with a read-only LFS.  Thanks.

 However, I ran into another problem.
 I'm not sure whether it is related to this change, to another LFS
 problem, or to an MD problem.
 Unfortunately, I didn't record the details.
 I thought it could be reproduced, but after I tried the previous version
 of the kernel (where the problem didn't appear), the problem disappeared....

 The problem was:

 (boot single-user on LFS)
 # mount -u /dev/sd2a /		# remount read/write
 # mount -r /dev/sd0a /mnt	# this is usual root (FFS)
 # cp /mnt/netbsd-* /		# netbsd-1.4C, netbsd-1.4M, ...
 [hang (tty echoback is alive)]

 I pressed the INTERRUPT button and saw the trace, but didn't write down
 the output....  The trace output was something like

 tsleep()
 lfs_check()
 lfs_balloc()	?? probably
 ...

 I recall that the wait channel was "buffers" (shown by hitting the
 status character on the console).
 fsck_lfs found an unref file in the filesystem.


 Oops, I hit another problem after writing the above.
 I'm not sure whether this is a problem in LFS, the MD code, or the hardware.

 (boot single user)
 # mount -u /dev/sd2a /		# remount r/w
 # rm netbsd-*
 # date 199912092352		# yeah, I set the system date
 # sync
 # sync
 # reboot
 Dec  9 23:52:40 init: kernel security level changed from 0 to 1
 syncing disks... done
 unmounting / (/dev/sd2a)...
 uvm_vnp_terminate(0x5521a8): terminating active vnode (refs=2)
 uvm_vnp_terminate(0x52b72c): terminating active vnode (refs=2)

 [hang with continuous disk access (tty echo is alive)]
 [press INTERRUPT button]

 Got a keyboard NMI
 Stopped in reboot at	cpu_Debugger+0x6:	unlk	a6
 db> trace
 cpu_Debugger(55bb44,4f8,ffffff08,51,8) + 6
 nmihand(ffffff08,51,8,a96021,80a96021) + 42
 lev7intr(31ec00,2004,320a00,328d80,55bb9c) + 12
 intio_intr(55bb34) + 48
 intiotrap(328d80,52b198,55bc20,b9bde,55bc18) + 8	# SCSI interrupt?
 spec_strategy(55bc18) + 46
 lfs_writeseg(325c00,30e280) + 610
 lfs_segwrite(324400,4,52c0d8,55bcac,b7012) + 2e8
 lfs_flush_fs(324400,4) + 38
 lfs_update(55bcb8,12ac94,52b330,0,0) + 170
 lfs_fsync(55bd04) + 40
 vinvalbuf(52b330,1,ffffffff,530780,0,0) + b6
 vclean(52b330,8,530780) + a4
 vgonel(52b330,530780) + 40
 vflush(324400,52b264,2) + 72
 lfs_unmount(324400,80000,530780) + 2a
 dounmount(324400,80000,530780) + d4
 vfs_unmountall(0,130,d0,2,0) + 6a
 vfs_shutdown(55bf3c,300f0,0,0,0) + 184
 cpu_reboot(0,0,0,530780,1) + 3c
 sys_reboot(530780,55bf88,55bf80) + 52
 syscall(d0) + 196
 trap0() + e
 db> c

 (continue and stop at another timing)

 Got a keyboard NMI
 Stopped in reboot at	cpu_Debugger+0x6:	unlk	a6
 db> trace
 cpu_Debugger(55bb88,4f8,2304,300,2304) + 6
 nmihand(2304,300,2304,1,52b198) + 42
 lev7intr(?)
 bgetvp(52b198,328780) + 12
 lfs_newbuf(52b198,54eb4,200,4,1) + a6
 lfs_initseg(325c00,1e8,2b,0,3c,2b,0) + 158
 lfs_seglock(325c00,4,2004,2004,530780) + 9e
 lfs_segwrite(324400,4,52c0d8,55bcac,b7012) + ae
 lfs_flush_fs(324400,4) + 38
 lfs_update(55bcb8,12ac94,52b330,0,0) + 170
 lfs_fsync(55bd04) + 40
 vinvalbuf(52b330,1,ffffffff,530780,0,0) + b6
 vclean(52b330,8,530780) + a4
 vgonel(52b330,530780) + 40
 vflush(324400,52b264,2) + 72
 lfs_unmount(324400,80000,530780) + 2a
 dounmount(324400,80000,530780) + d4
 vfs_unmountall(0,130,d0,2,0) + 6a
 vfs_shutdown(55bf3c,300f0,0,0,0) + 184
 cpu_reboot(0,0,0,530780,1) + 3c
 sys_reboot(530780,55bf88,55bf80) + 52
 syscall(d0) + 196
 trap0() + e
 db>

 rm and reboot worked in the previous session.

 --
 ITOH, Yasufumi <itohy@netbsd.org>
State-Changed-From-To: feedback->analyzed 
State-Changed-By: fair 
State-Changed-When: Sun Apr 23 02:46:30 PDT 2000 
State-Changed-Why:  
Feedback was provided, and the original problem was apparently solved. 
However, the submitter reported a new problem with LFS. Ideally, this new 
problem should be reported in a new PR, but I leave it up to the responsible 
developer to decide whether to keep working on it in this PR, or require a 
new one. 
Responsible-Changed-From-To: perseant->kern-bug-people 
Responsible-Changed-By: perseant 
Responsible-Changed-When: Thu Nov 20 19:56:50 UTC 2003 
Responsible-Changed-Why:  
Trying to be realistic 
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/8964: panic on reboot(2) if an LFS is mounted read-only
Date: Sat, 3 May 2008 03:37:16 +0000

 Is this (partly?) a symptom of the silly softdep rootfs/syncer thing
 that ad just fixed? ad?

 -- 
 David A. Holland
 dholland@netbsd.org

State-Changed-From-To: analyzed->closed
State-Changed-By: zafer@NetBSD.org
State-Changed-When: Sat, 14 Jul 2018 09:39:20 +0000
State-Changed-Why:
I tested thoroughly and I cannot reproduce the issue anymore.
Thank you for the problem report, ITOH Yasufumi.


>Unformatted: