NetBSD Problem Report #48714

From gson@gson.org  Sat Apr  5 19:17:19 2014
Return-Path: <gson@gson.org>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
	(using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits))
	(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (verified OK))
	by mollari.NetBSD.org (Postfix) with ESMTPS id F2519A5806
	for <gnats-bugs@gnats.NetBSD.org>; Sat,  5 Apr 2014 19:17:18 +0000 (UTC)
Message-Id: <20140405191352.4C23C75E33@guava.gson.org>
Date: Sat,  5 Apr 2014 22:13:52 +0300 (EEST)
From: gson@gson.org (Andreas Gustafsson)
Reply-To: gson@gson.org (Andreas Gustafsson)
To: gnats-bugs@gnats.NetBSD.org
Subject: fsck prompts only appear after being answered
X-Send-Pr-Version: 3.95

>Number:         48714
>Category:       bin
>Synopsis:       fsck prompts only appear after being answered
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    bin-bug-people
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Apr 05 19:20:00 +0000 2014
>Closed-Date:    Sat Jul 19 14:40:15 +0000 2014
>Last-Modified:  Sat Jul 19 14:40:15 +0000 2014
>Originator:     Andreas Gustafsson
>Release:        NetBSD 6.1.3
>Organization:
>Environment:
System: NetBSD 6.1.3
Architecture: x86_64
Machine: amd64
>Description:

When booting a NetBSD 6.1.3 system after a crash where fsck needs to
perform an interactive repair, fsck appears to hang.  For example,
it may print a message such as

  INCORRECT BLOCK COUNT I=252276 (1600 should be 416)

but then nothing more happens, no matter how long I wait.

However, if I press control-C on the console at this point, I get the
following output:

  ^CCORRECT? [yn] Unknown error 130; help!
  ERROR: ABORTING BOOT (sending SIGTERM to parent)!
  [1]   Terminated              (stty status "^T...

Note the "CORRECT? [yn]" prompt.  It looks like fsck was actually
waiting for operator input, but the prompt never appeared on the
console until fsck was interrupted by the SIGINT.

If instead of pressing control-C, I blindly type "y" and press enter,
without having been prompted to do so, fsck accepts the input and
continues, printing the prompt *after* the answer to it:

  ** Phase 4 - Check Reference Counts
  LINK COUNT FILE I=174911  OWNER=0 MODE=100444
  SIZE=1539 MTIME=Apr  5 14:02 2014   COUNT 2 SHOULD BE 1
  y
  ADJUST? [yn] 
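
As a rough illustration of this symptom (a simplified stand-in, not the
actual /etc/rc code), the sketch below pipes a fake fsck through a
filter that only forwards complete lines.  The names line_filter and
fake_fsck are hypothetical.  The prompt has no trailing newline, so the
filter holds it back until the line is finally completed:

```shell
#!/bin/sh
# Hypothetical line-oriented output filter: it forwards text only once
# a whole line (or end of input) has been read.
line_filter() {
    while IFS= read -r line || [ -n "$line" ]; do
        printf '%s\n' "$line"
    done
}

# Hypothetical stand-in for fsck: prints a message, then a prompt with
# no trailing newline, then "waits" for the operator.
fake_fsck() {
    printf 'INCORRECT BLOCK COUNT I=252276 (1600 should be 416)\n'
    printf 'CORRECT? [yn] '     # prompt: no newline yet
    sleep "${1:-2}"             # stands in for waiting on input
    printf '\n'                 # the line is only completed here
}

fake_fsck | line_filter
```

When run, the first line should show up immediately, while the
"CORRECT? [yn]" prompt is held back until the sleep has finished.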

I first noticed this problem on an i386 system that has been upgraded
to 6.1.3 from an earlier NetBSD version, and as a result of that
upgrade path, does not mount its root file system with "-o log",
and has fsck_flags="" by default.

Reproducing the bug on a fresh 6.1.3 install is harder because WAPBL
is now enabled by default, causing fsck to be bypassed in most cases,
and fsck_flags is set to "-p" by default, causing fsck to not prompt
the operator for minor problems.  Nonetheless, I have reproduced it on
a fresh 6.1.3/amd64 install by disabling WAPBL and setting
fsck_flags="".

The problem only occurs in the fsck run automatically from the rc
scripts; rerunning fsck manually from the single-user shell prompt,
the fsck prompts appear as expected, before being answered.

The bug also affects -current as of source date 2014.04.03.15.24.20.

>How-To-Repeat:

# mount -u /
# echo 'fsck_flags=""' >>/etc/rc.conf

Do an unclean shutdown and reboot, for example:

  Break into ddb in the middle of unpacking a tarball
  Type "reboot 0x4"

>Fix:

>Release-Note:

>Audit-Trail:
From: Valery Ushakov <uwe@stderr.spb.ru>
To: gnats-bugs@NetBSD.org
Cc: Alan Barrett <apb@netbsd.org>
Subject: Re: bin/48714: fsck prompts only appear after being answered
Date: Sun, 6 Apr 2014 21:54:54 +0400

 On Sat, Apr 05, 2014 at 19:20:00 +0000, Andreas Gustafsson wrote:

 > The problem only occurs in the fsck run automatically from the rc
 > scripts; rerunning fsck manually from the single-user shell prompt,
 > the fsck prompts appear as expected, before being answered.

 This looks like a consequence of the output filtering done by rc.
 Cc'ing apb@

 -uwe

From: Alan Barrett <apb@netbsd.org>
To: Valery Ushakov <uwe@stderr.spb.ru>
Cc: gnats-bugs@NetBSD.org
Subject: Re: bin/48714: fsck prompts only appear after being answered
Date: Sun, 6 Apr 2014 20:08:28 +0200

 On Sun, 06 Apr 2014, Valery Ushakov wrote:
 >On Sat, Apr 05, 2014 at 19:20:00 +0000, Andreas Gustafsson wrote:
 >
 >> The problem only occurs in the fsck run automatically from the rc
 >> scripts; rerunning fsck manually from the single-user shell prompt,
 >> the fsck prompts appear as expected, before being answered.
 >
 >This looks like a consequence of the output filtering done by rc.
 >Cc'ing apb@

 Interactive rc.d scripts need to be marked with "KEYWORD: interactive"
 to change the way the filtering works.  This appears to be undocumented.

 --apb (Alan Barrett)

From: Alan Barrett <apb@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: bin/48714: fsck prompts only appear after being answered
Date: Sun, 6 Apr 2014 20:36:40 +0200

 On Sun, 06 Apr 2014, Alan Barrett wrote:
 > Interactive rc.d scripts need to be marked with 
 > "KEYWORD: interactive" to change the way the filtering works. 
 > This appears to be undocumented.

 However, /etc/rc.d/fsck and /etc/rc.d/fsck_root are not intended 
 to be interactive.  They are intended to throw an error that 
 causes the system to drop to single user mode if there's a 
 problem.  Non-default settings of fsck_flags, as appears to be the 
 case here, could cause fsck to become interactive.

 The meaning of "KEYWORD: interactive" is documented in rc.subr, 
 by the way, but it is not as clear or easy to find as it should 
 be.

 Another problem is that there's no way to get the "don't wait for 
 a newline before displaying output" behaviour that's wanted here, 
 without also getting the unwanted behaviour of "don't log the 
 output to /var/run/rc.log".

 --apb (Alan Barrett)
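
For reference, the "KEYWORD: interactive" marking discussed in this
message is a comment line in the header of an rc.d script.  The sketch
below writes such a header to a temporary file and scans it.  The
script name "askoperator" and the helper has_keyword are hypothetical,
and the helper is only a simplified model of how a driver could detect
the keyword; it is not the code in rc.subr:

```shell
#!/bin/sh
# has_keyword WORD FILE
# Succeed if FILE contains a "# KEYWORD:" header line listing WORD.
has_keyword() {
    _word=$1
    _file=$2
    while IFS= read -r _line; do
        case "$_line" in
        "# KEYWORD:"*)
            # Split the rest of the header line into words.
            for _w in ${_line#"# KEYWORD:"}; do
                [ "$_w" = "$_word" ] && return 0
            done
            ;;
        esac
    done < "$_file"
    return 1
}

# Header of a hypothetical rc.d script that prompts the operator.
demo=$(mktemp)
cat > "$demo" <<'EOF'
#!/bin/sh
#
# PROVIDE: askoperator
# REQUIRE: root
# KEYWORD: interactive
EOF

if has_keyword interactive "$demo"; then
    echo "marked interactive"
else
    echo "not marked interactive"
fi
rm -f "$demo"
```

As noted in the message, /etc/rc.d/fsck and /etc/rc.d/fsck_root are not
meant to carry this keyword, so this only shows what the marking looks
like.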

From: Alan Barrett <apb@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: bin/48714: fsck prompts only appear after being answered
Date: Mon, 7 Apr 2014 19:27:14 +0200

 On Sun, 06 Apr 2014, Alan Barrett wrote:
 > However, /etc/rc.d/fsck and /etc/rc.d/fsck_root are not intended
 > to be interactive.  They are intended to throw an error that
 > causes the system to drop to single user mode if there's a
 > problem.  Non-default settings of fsck_flags, as appears to be the
 > case here, could cause fsck to become interactive.

 Please try the appended patch, which should cause partial lines to
 appear after a few seconds, even for scripts that are not marked with
 "KEYWORD: interactive".

 --apb (Alan Barrett)

 --- a/etc/rc
 +++ b/etc/rc
 @@ -119,6 +119,20 @@ rc_real_work()
  	kill -0 $RC_PID >/dev/null 2>&1 || RC_PID=$$

  	#
 +	# As long as process $RC_PID is still running, send a "nop"
 +	# metadata message to the postprocessor every few seconds.
 +	# This should help flush partial lines that may appear when
 +	# rc.d scripts that are NOT marked with "KEYWORD: interactive"
 +	# nevertheless attempt to print prompts and wait for input.
 +	#
 +	(
 +	    while kill -0 $RC_PID ; do
 +		print_rc_metadata "nop"
 +		sleep 3
 +	    done
 +	) &
 +
 +	#
  	# Get a list of all rc.d scripts, and use rcorder to choose
  	# what order to execute them.
  	#
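
The effect of the periodic "nop" message can be modelled in isolation.
The sketch below assumes (the thread does not spell this out) that a
metadata message ends with a newline, so that a line-based reader wakes
up and can print whatever partial text preceded the message.  The
marker string and the function flush_filter are hypothetical and are
not the names used in /etc/rc:

```shell
#!/bin/sh
# Hypothetical marker standing in for a "nop" metadata message.
MARK='@@rc-meta:nop@@'

# Line-based filter that treats a trailing marker as "flush now":
# the text before the marker is printed without a newline, and the
# marker itself is dropped.
flush_filter() {
    while IFS= read -r line || [ -n "$line" ]; do
        case "$line" in
        *"$MARK")
            printf '%s' "${line%"$MARK"}"
            ;;
        *)
            printf '%s\n' "$line"
            ;;
        esac
    done
}

# A prompt without a newline, followed by a heartbeat marker.
printf 'ADJUST? [yn] %s\n' "$MARK" | flush_filter
echo    # end the demo line
```

In this model the prompt becomes visible as soon as the next marker
arrives, which would match the delay of up to a few seconds implied by
the "sleep 3" in the patch.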

From: Andreas Gustafsson <gson@gson.org>
To: gnats-bugs@NetBSD.org
Cc: gnats-admin@netbsd.org,
    netbsd-bugs@netbsd.org
Subject: Re: bin/48714: fsck prompts only appear after being answered
Date: Tue, 8 Apr 2014 20:59:45 +0300

 Alan Barrett wrote:
 >  Please try the appended patch, which should cause partial lines to
 >  appear after a few seconds, even for scripts that are not marked with
 >  "KEYWORD: interactive".

 Thanks for looking into this.  With the patch, the prompts do appear
 as expected.  I suspect the delay will become annoying if the file
 system is so badly damaged that fsck ends up asking many dozens of
 questions, but at least fsck no longer gives the appearance of
 hanging.
 -- 
 Andreas Gustafsson, gson@gson.org

From: Alan Barrett <apb@cequrux.com>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: bin/48714: fsck prompts only appear after being answered
Date: Wed, 9 Apr 2014 12:55:27 +0200

 On Tue, 08 Apr 2014, Andreas Gustafsson wrote:
 >Thanks for looking into this.  With the patch, the prompts do appear
 >as expected.  I suspect the delay will become annoying if the file
 >system is so badly damaged that fsck ends up asking many dozens of
 >questions, but at least fsck no longer gives the appearance of
 >hanging.

 Yes, the delay will become annoying, but I think I'll commit that 
 patch as being better than nothing.  I have some half formed ideas 
 for fixing this properly, but the fact that tee(1) is not present 
 on the root file system makes things difficult.  Essentially, 
 we want prompts to go to the console, and to go to the log file 
 /var/run/rc.log, and to appear without delay.

 --apb (Alan Barrett)

From: "Alan Barrett" <apb@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/48714 CVS commit: src/etc
Date: Wed, 9 Apr 2014 12:45:05 +0000

 Module Name:	src
 Committed By:	apb
 Date:		Wed Apr  9 12:45:05 UTC 2014

 Modified Files:
 	src/etc: rc

 Log Message:
 Send a "nop" metadata message to the postprocessor every few seconds,
 to flush partial output lines.  This should help with PR 48714.


 To generate a diff of this commit:
 cvs rdiff -u -r1.167 -r1.168 src/etc/rc

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: bin/48714: fsck prompts only appear after being answered
Date: Sat, 26 Apr 2014 03:52:35 +0000

 On Sat, Apr 05, 2014 at 07:20:00PM +0000, Andreas Gustafsson wrote:
  > I first noticed this problem on an i386 system that has been upgraded
  > to 6.1.3 from an earlier NetBSD version, and as a result of that
  > upgrade path, does not mount its root file system with "-o log",
  > and has fsck_flags="" by default.
  > 
  > Reproducing the bug on a fresh 6.1.3 install is harder because WAPBL
  > is now enabled by default, causing fsck to be bypassed in most cases,
  > and fsck_flags is set to "-p" by default, causing fsck to not prompt
  > the operator for minor problems.  Nonetheless, I have reproduced it on
  > a fresh 6.1.3/amd64 install by disabling WAPBL and setting
  > fsck_flags="".

 Boot-time fsck should always use -p... it is only -p mode that is
 expected to lead to correct crash recovery. For ffs this may not
 actually matter (I'm not sure offhand), but for other filesystem types
 you may end up in trouble.

 -- 
 David A. Holland
 dholland@netbsd.org

State-Changed-From-To: open->closed
State-Changed-By: gson@NetBSD.org
State-Changed-When: Sat, 19 Jul 2014 14:40:15 +0000
State-Changed-Why:
Given that the problem only occurs on misconfigured systems, the current
fix is IMO good enough.


>Unformatted:

Copyright © 1994-2007 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.