NetBSD Problem Report #50046
From www@NetBSD.org Sun Jul 12 21:55:42 2015
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (verified OK))
by mollari.NetBSD.org (Postfix) with ESMTPS id 50CEAA5B2E
for <gnats-bugs@gnats.NetBSD.org>; Sun, 12 Jul 2015 21:55:42 +0000 (UTC)
Message-Id: <20150712215540.F3988A6555@mollari.NetBSD.org>
Date: Sun, 12 Jul 2015 21:55:40 +0000 (UTC)
From: frank@phoenix.owl.de
Reply-To: frank@phoenix.owl.de
To: gnats-bugs@NetBSD.org
Subject: Slow rc multiuser boot in 7.0 and current
X-Send-Pr-Version: www-1.0
>Number: 50046
>Category: misc
>Synopsis: Slow rc multiuser boot in 7.0 and current
>Confidential: no
>Severity: serious
>Priority: high
>Responsible: phx
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sun Jul 12 22:00:00 +0000 2015
>Closed-Date: Wed Nov 11 11:03:10 +0000 2015
>Last-Modified: Wed Nov 11 11:03:10 +0000 2015
>Originator: Frank Wille
>Release: netbsd-7
>Organization:
NetBSD
>Environment:
NetBSD/amiga 7.0_RC1
>Description:
Since the workaround for PR 48714 the multiuser boot in 7.0 and current has become very slow, which especially affects smaller single-CPU platforms.
It has been reported on port-amiga (A3000 and A4000/68060) by myself:
http://mail-index.netbsd.org/port-amiga/2015/01/25/msg007942.html
On x68k (XM6i/68030) by Tetsuya Isaki:
http://mail-index.netbsd.org/port-amiga/2015/02/01/msg007951.html
And on Sparc (via qemu) by Arto Huusko:
http://mail-index.netbsd.org/netbsd-users/2015/07/12/msg016476.html
The above mentioned workaround introduces a background process in rc, which sends a "nop" every three seconds. Removing this passage from rc:
--- rc.orig 2015-01-16 22:17:31.000000000 +0100
+++ rc 2015-05-31 14:47:06.000000000 +0200
@@ -120,24 +120,6 @@
kill -0 $RC_PID >/dev/null 2>&1 || RC_PID=$$
#
- # As long as process $RC_PID is still running, send a "nop"
- # metadata message to the postprocessor every few seconds.
- # This should help flush partial lines that may appear when
- # rc.d scripts that are NOT marked with "KEYWORD: interactive"
- # nevertheless attempt to print prompts and wait for input.
- #
- (
- # First detach from tty, to avoid intercepting SIGINFO.
- eval "exec ${_rc_original_stdout_fd}<&-"
- eval "exec ${_rc_original_stderr_fd}<&-"
- exec </dev/null >/dev/null 2>&1
- while kill -0 $RC_PID ; do
- print_rc_metadata "nop"
- sleep 3
- done
- ) &
... makes booting noticable faster. In this case from 5 minutes to 4 minutes.
From /var/run/rc.log:
Original:
[/etc/rc starting at Sun May 31 13:27:54 CEST 2015]
...
[/etc/rc finished at Sun May 31 13:32:55 CEST 2015]
Without "nop":
[/etc/rc starting at Sun May 31 14:48:48 CEST 2015]
...
[/etc/rc finished at Sun May 31 14:52:59 CEST 2015]
There are probably other factors involved to make booting much slower
than under netbsd-6, but this one is unnecessary.
>How-To-Repeat:
Boot a slow single-CPU system into multiuser mode and compare the time it needs with netbsd-6.
>Fix:
Reverting the workaround for PR 48714 partly fixes it.
>Release-Note:
>Audit-Trail:
From: Arto Huusko <armihu@gmail.com>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: misc/50046
Date: Mon, 13 Jul 2015 13:43:07 +0300
Reverting revision 1.94 of etc/rc.subr fixes the slow boot.
The change causes a considerable amount of forking.
On a test system the number of forks right after boot
with rc.subr 1.94: 1080
with rc.subr 1.94 reverted: 280
Arto Huusko
From: christos@zoulas.com (Christos Zoulas)
To: gnats-bugs@NetBSD.org, misc-bug-people@netbsd.org,
gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, frank@phoenix.owl.de
Cc:
Subject: Re: misc/50046
Date: Wed, 15 Jul 2015 10:34:31 -0400
On Jul 13, 10:45am, armihu@gmail.com (Arto Huusko) wrote:
-- Subject: Re: misc/50046
| From: Arto Huusko <armihu@gmail.com>
| To: gnats-bugs@netbsd.org
| Cc:
| Subject: Re: misc/50046
| Date: Mon, 13 Jul 2015 13:43:07 +0300
|
| Reverting revision 1.94 of etc/rc.subr fixes the slow boot.
| The change causes a considerable amount of forking.
| On a test system the number of forks right after boot
| with rc.subr 1.94: 1080
| with rc.subr 1.94 reverted: 280
Can you please test the stdbuf change I posted on tech-userlevel (compile
and re-install libc), together with adding:
export STDBUF0=L
export STDBUF1=L
export STDBUF2=L
in the beginning of rc and reverting rc and rc.subr to pre-pinger changes?
Thanks,
christos
From: Frank Wille <frank@phoenix.owl.de>
To: Christos Zoulas <christos@zoulas.com>,
gnats-bugs@NetBSD.org,
misc-bug-people@netbsd.org,
gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org
Cc:
Subject: Re: misc/50046
Date: Tue, 21 Jul 2015 17:01:47 +0200
Christos Zoulas wrote:
> Can you please test the stdbuf change I posted on tech-userlevel
> (compile and re-install libc), together with adding:
>
> export STDBUF0=L
> export STDBUF1=L
> export STDBUF2=L
>
> in the beginning of rc and reverting rc and rc.subr to pre-pinger
> changes?
I did some test runs today on an Amiga 1200, 68030/40MHz with 64MB. The
multiuser boot process takes here 8 minutes and 44 seconds, which was less
than half the time with NetBSD 6.
In the first tests I reverted /etc/rc to 1.167, but left the _rc_pid
definition in line 87 in. Otherwise booting will hang (it is needed by
rc.subr). Results ("new clib" includes your stdbuf changes):
old clib: 7:00 min.
new clib: 6:57 min.
new clib with STDBUFn=L: 6:58 min.
new clib with STDBUFn=U: 7:01 min.
The stdbuf change doesn't make a big difference here. But removing the "nop"
pinger code is already noticable.
In the next tests I reverted /etc/rc to 1.167 and /etc/rc.subr to 1.93, as
suggested by Arto Huusko:
old clib: 3:44 min.
new clib: 3:38 min.
So I can confirm that reverting /etc/rc.subr is the key. Would be great to
find a solution here.
--
Frank Wille
From: Alan Barrett <apb@cequrux.com>
To: gnats-bugs@NetBSD.org
Cc: misc-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org
Subject: Re: misc/50046: Slow rc multiuser boot in 7.0 and current
Date: Wed, 12 Aug 2015 10:24:27 +0400
On Sun, 12 Jul 2015, frank@phoenix.owl.de wrote:
>>Description:
>Since the workaround for PR 48714 the multiuser boot in 7.0 and current has become very slow, which especially affects smaller single-CPU platforms.
I suspect the tests in the _have_rc_postprocessor function in rc.subr.
There are even comments saying that they are expensive.
Please try the appended patch.
--apb (Alan Barrett)
Index: etc/rc.subr
===================================================================
--- rc.subr 7 Oct 2014 19:09:45 -0000 1.96
+++ rc.subr 12 Aug 2015 06:14:04 -0000
@@ -814,19 +814,9 @@ $command $rc_flags $command_args"
_have_rc_postprocessor()
{
# Cheap tests that fd and pid are set, fd is writable.
- [ -n "${_rc_postprocessor_fd}" ] || return 1
- [ -n "${_rc_pid}" ] || return 1
- eval ": >&${_rc_postprocessor_fd}" 2>/dev/null || return 1
-
- # More expensive test that pid is running.
- # Unset _rc_pid if this fails.
- kill -0 "${_rc_pid}" 2>/dev/null \
- || { unset _rc_pid; return 1; }
-
- # More expensive test that pid appears to be
- # a shell running an rc script.
- # Unset _rc_pid if this fails.
- expr "$(ps -p "${_rc_pid}" -o command=)" : ".*sh .*/rc.*" >/dev/null \
+ [ -n "${_rc_pid}" ] || { unset _rc_pid; return 1; }
+ [ -n "${_rc_postprocessor_fd}" ] || { unset _rc_pid; return 1; }
+ eval ": >&${_rc_postprocessor_fd}" 2>/dev/null \
|| { unset _rc_pid; return 1; }
return 0
From: Frank Wille <frank@phoenix.owl.de>
To: gnats-bugs@NetBSD.org
Cc: Alan Barrett <apb@cequrux.com>, misc-bug-people@netbsd.org,
gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: misc/50046: Slow rc multiuser boot in 7.0 and current
Date: Sat, 15 Aug 2015 13:00:08 +0200
On Wed, 12 Aug 2015 08:00:01 +0000 (UTC)
Alan Barrett <apb@cequrux.com> wrote:
> I suspect the tests in the _have_rc_postprocessor function in rc.subr.
> There are even comments saying that they are expensive.
>
> Please try the appended patch.
I tested your patch with the same hardware as in my previous reply
(Amiga 1200, 68030/40MHz, 64MB).
Now (with the patch applied) the multiuser boot takes exactly 4:00 minutes.
It was 8:44 minutes before.
I think this is acceptable. The remaining 15 seconds are probably lost
through the nop-pinger code in /etc/rc.
--
Frank Wille
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: misc/50046: Slow rc multiuser boot in 7.0 and current
Date: Sat, 5 Sep 2015 22:57:57 +0000
On Wed, Aug 12, 2015 at 08:00:01AM +0000, Alan Barrett wrote:
> I suspect the tests in the _have_rc_postprocessor function in rc.subr.
> There are even comments saying that they are expensive.
>
> Please try the appended patch.
Did this get committed?
--
David A. Holland
dholland@netbsd.org
From: Frank Wille <frank@phoenix.owl.de>
To: gnats-bugs@NetBSD.org,
misc-bug-people@netbsd.org,
gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org
Cc:
Subject: Re: misc/50046: Slow rc multiuser boot in 7.0 and current
Date: Tue, 08 Sep 2015 15:08:58 +0200
David Holland wrote:
> Did this get committed?
No, unfortunately not.
I think we should just commit the patch, when Alan is too busy...
--
Frank Wille
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: misc/50046: Slow rc multiuser boot in 7.0 and current
Date: Wed, 9 Sep 2015 06:30:29 +0000
On Tue, Sep 08, 2015 at 01:10:07PM +0000, Frank Wille wrote:
>> Did this get committed?
>
> No, unfortunately not.
> I think we should just commit the patch, when Alan is too busy...
Yes, probably. I sort of wonder why it needs to check so extensively
for a postprocessor when it ought to be able to know based on whether
one got forked (and it shouldn't just die arbitrarily); but I'm
probably missing something and I'm not curious enough to actually wade
into the logic right now. :-/
--
David A. Holland
dholland@netbsd.org
From: Frank Wille <frank@phoenix.owl.de>
To: gnats-bugs@NetBSD.org,
misc-bug-people@netbsd.org,
gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org
Cc: apb@cequrux.com
Subject: Re: misc/50046: Slow rc multiuser boot in 7.0 and current
Date: Wed, 30 Sep 2015 17:22:17 +0200
Alan Barrett wrote:
> I suspect the tests in the _have_rc_postprocessor function in rc.subr.
> There are even comments saying that they are expensive.
>
> Please try the appended patch.
> [...]
Alan, can you commit that please? It has been tested and was a real
improvement. Even when it is not the final solution we need it in 7.0 now.
Thanks!
--
Frank Wille
Responsible-Changed-From-To: misc-bug-people->phx
Responsible-Changed-By: phx@NetBSD.org
Responsible-Changed-When: Sat, 31 Oct 2015 12:21:01 +0000
Responsible-Changed-Why:
Applying the patch myself.
From: "Frank Wille" <phx@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/50046 CVS commit: src/etc
Date: Sat, 31 Oct 2015 12:31:37 +0000
Module Name: src
Committed By: phx
Date: Sat Oct 31 12:31:37 UTC 2015
Modified Files:
src/etc: rc.subr
Log Message:
PR misc/50046
Remove the expensive tests in _have_rc_postprocessor(), as proposed by apb@.
It more than halves the multiuser boot time on slow machines and brings
it back near to the previous level.
To generate a diff of this commit:
cvs rdiff -u -r1.96 -r1.97 src/etc/rc.subr
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: open->pending-pullups
State-Changed-By: phx@NetBSD.org
State-Changed-When: Sat, 31 Oct 2015 16:19:20 +0000
State-Changed-Why:
Pullup for NetBSD-7 requested.
From: "Jeff Rizzo" <riz@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/50046 CVS commit: [netbsd-7] src/etc
Date: Sun, 8 Nov 2015 00:45:23 +0000
Module Name: src
Committed By: riz
Date: Sun Nov 8 00:45:22 UTC 2015
Modified Files:
src/etc [netbsd-7]: rc.subr
Log Message:
Pull up following revision(s) (requested by phx in ticket #1015):
etc/rc.subr: revision 1.97
PR misc/50046
Remove the expensive tests in _have_rc_postprocessor(), as proposed by apb@.
It more than halves the multiuser boot time on slow machines and brings
it back near to the previous level.
To generate a diff of this commit:
cvs rdiff -u -r1.94 -r1.94.2.1 src/etc/rc.subr
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: pending-pullups->closed
State-Changed-By: phx@NetBSD.org
State-Changed-When: Wed, 11 Nov 2015 11:03:10 +0000
State-Changed-Why:
Fixed, tested and pulled up to netbsd-7.
>Unformatted:
(Contact us)
$NetBSD: query-full-pr,v 1.39 2013/11/01 18:47:49 spz Exp $
$NetBSD: gnats_config.sh,v 1.8 2006/05/07 09:23:38 tsutsui Exp $
Copyright © 1994-2014
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.