NetBSD Problem Report #54514
From www@netbsd.org Fri Aug 30 19:39:40 2019
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id E5BDF7A153
for <gnats-bugs@gnats.NetBSD.org>; Fri, 30 Aug 2019 19:39:39 +0000 (UTC)
Message-Id: <20190830193938.CFF857A188@mollari.NetBSD.org>
Date: Fri, 30 Aug 2019 19:39:38 +0000 (UTC)
From: logix@foobar.franken.de
Reply-To: logix@foobar.franken.de
To: gnats-bugs@NetBSD.org
Subject: script(1) sometimes swallows last line(s) of output
X-Send-Pr-Version: www-1.0
>Number: 54514
>Category: bin
>Synopsis: script(1) sometimes swallows last line(s) of output
>Confidential: no
>Severity: non-critical
>Priority: low
>Responsible: bin-bug-people
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Fri Aug 30 19:40:00 +0000 2019
>Closed-Date: Tue May 09 20:58:28 +0000 2023
>Last-Modified: Tue May 09 20:58:28 +0000 2023
>Originator: Harold Gutch
>Release: 8.0
>Organization:
>Environment:
NetBSD localhost 8.0 NetBSD 8.0 (GENERIC) #0: Tue Jul 17 14:59:51 UTC 2018 mkrepro@mkrepro.NetBSD.org:/usr/src/sys/arch/amd64/compile/GENERIC amd64
>Description:
script(1) can terminate before the last few lines of output are printed and dumped to the logfile. The reason is that the script process handling output and writing to file might receive a SIGCHLD and terminate before finishing its job.
>How-To-Repeat:
This is non-deterministic, but in my tests (see "Fix") I got wrong output depending on the exact test script ~30%-70% of the time.
% cat > test.sh << EOF
? #!/bin/sh
? echo 1
? echo 2
? echo 3
? EOF
% chmod u+x test.sh
% ./test.sh
1
2
3
% script -c ./test.sh typescript
Script started, output file is typescript
1
Script done, output file is typescript
% cat typescript
Script started on Fri Aug 30 21:16:26 2019
1
Script done on Fri Aug 30 21:16:26 2019
%
>Fix:
Inspired by OpenBSD ( http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/usr.bin/script/script.c.diff?r1=1.26&r2=1.27 ):
-Index: script.c
===================================================================
RCS file: /usr/cvs/netbsd/src/usr.bin/script/script.c,v
retrieving revision 1.21
diff -u -p -r1.21 script.c
--- src/usr.bin/script/script.c 6 Sep 2011 18:29:56 -0000 1.21
+++ src/usr.bin/script/script.c 30 Aug 2019 21:22:48 -0000
@@ -166,7 +166,6 @@ main(int argc, char *argv[])
rtt.c_lflag &= ~ECHO;
(void)tcsetattr(STDIN_FILENO, TCSAFLUSH, &rtt);
- (void)signal(SIGCHLD, finish);
child = fork();
if (child < 0) {
warn("fork");
@@ -183,6 +182,7 @@ main(int argc, char *argv[])
else
doshell(command);
}
+ (void)signal(SIGCHLD, finish);
if (!rawout)
(void)fclose(fscript);
@@ -226,6 +226,7 @@ dooutput(void)
(void)fprintf(fscript, "Script started on %s", ctime(&tvec));
(void)signal(SIGALRM, scriptflush);
+ (void)signal(SIGCHLD, SIG_IGN);
value.it_interval.tv_sec = SECSPERMIN / 2;
value.it_interval.tv_usec = 0;
value.it_value = value.it_interval;
Result:
1) without patch:
$ for i in `jot 100`; do script -c ./test.sh typescript.${i}; done
[...]
$ wc -l typescript.* | awk '/typescript/{print $1}' | sort | uniq -c
71 4
29 6
===> only in 29 out of 100 runs the output contained 6 lines (1 line header, 3 lines regular output, 1 new line, 1 line footer)
2) with patch:
$ for i in `jot 100`; do ./script -c ./test.sh typescript.${i}; done
[...]
$ wc -l typescript.* | awk '/typescript/{print $1}' | sort | uniq -c
100 6
===> every run contained the expected 6 lines
>Release-Note:
>Audit-Trail:
From: Robert Elz <kre@munnari.OZ.AU>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: bin/54514: script(1) sometimes swallows last line(s) of output
Date: Sat, 31 Aug 2019 08:58:18 +0700
I don't believe that is the correct fix. Your first change
(moving the signal(SIGCHLD, finish) later) opens a possible
(if unlikely) race if the child exits very quickly (before the
signal is established) - which would result in the parent
hanging until it reads EOF from stdin (which for your tests would
be essentially forever - until manually killed).
The correct way to handle this is to leave the initial signal() call
catching SIGCHLD where it is currently, and insert
(void)signal(SIGCHLD, SIG_DFL);
in the
if (child == 0) {
case, before the subsequent (second) fork(). (An alternative would be
to save the result of the initial signal call, and use that instead of
SIG_DFL in the one to be added - but that should make no real difference.)
Your second change is wrong (setting SIGCHLD to SIG_IGN is almost
never the right thing to do), and in any case unnecessary, both
because after your first change (or after the version of it I
suggest above) when dooutput() is called the state of SIGCHLD will
be SIG_DFL (which results in what I suspect you wanted to achieve
by setting it to SIG_IGN) and also because dooutput() is called in the
grandchild process, which never forks, so never has any children
and so will never receive a SIGCHLD in any case. This change can
simply be deleted.
Please try again with the changes as suggested, and let us know
if my suggested variation for your patch fixes the problem.
kre
ps: this has most likely never been a problem, as script is almost
always used interactively, and only exits when the user enters ^D
to the terminal, which they typically don't do until all the output
that is expected has been seen. For the kind of usage in your test.
the input is already known (it is in the test script) so isn't needed,
to capture output simple shell redirection is enough. I am not sure
why anyone would ever normally run script as it is being run here.
From: Harold Gutch <logix@foobar.franken.de>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: bin/54514: script(1) sometimes swallows last line(s) of output
Date: Sun, 1 Sep 2019 21:42:52 +0200
Hi,
On Sat, Aug 31, 2019 at 02:00:02AM +0000, Robert Elz wrote:
> The following reply was made to PR bin/54514; it has been noted by GNATS.
>
> From: Robert Elz <kre@munnari.OZ.AU>
> To: gnats-bugs@netbsd.org
> Cc:
> Subject: Re: bin/54514: script(1) sometimes swallows last line(s) of output
> Date: Sat, 31 Aug 2019 08:58:18 +0700
>
> I don't believe that is the correct fix. Your first change
> (moving the signal(SIGCHLD, finish) later) opens a possible
> (if unlikely) race if the child exits very quickly (before the
> signal is established) - which would result in the parent
> hanging until it reads EOF from stdin (which for your tests would
> be essentially forever - until manually killed).
Good catch, you're right - thanks!
> The correct way to handle this is to leave the initial signal() call
> catching SIGCHLD where it is currently, and insert
> (void)signal(SIGCHLD, SIG_DFL);
> in the
> if (child == 0) {
> case, before the subsequent (second) fork(). (An alternative would be
> to save the result of the initial signal call, and use that instead of
> SIG_DFL in the one to be added - but that should make no real difference.)
>
> Your second change is wrong (setting SIGCHLD to SIG_IGN is almost
> never the right thing to do), and in any case unnecessary, both
> because after your first change (or after the version of it I
> suggest above) when dooutput() is called the state of SIGCHLD will
> be SIG_DFL (which results in what I suspect you wanted to achieve
> by setting it to SIG_IGN) and also because dooutput() is called in the
> grandchild process, which never forks, so never has any children
> and so will never receive a SIGCHLD in any case. This change can
> simply be deleted.
Thanks for pointing this out. I wasn't aware of the difference
between SIG_IGN and SIG_DFL for the SIGCHILD case.
> Please try again with the changes as suggested, and let us know
> if my suggested variation for your patch fixes the problem.
Yes, your suggestion fixes the problem for me.
--- src/usr.bin/script/script.c 2011-09-06 18:29:56.000000000 +0000
+++ src/usr.bin/script/script.c 2019-08-31 12:41:26.368033644 +0000
@@ -173,6 +173,7 @@
fail();
}
if (child == 0) {
+ (void)signal(SIGCHLD, SIG_DFL);
subchild = child = fork();
if (child < 0) {
warn("fork");
> kre
>
> ps: this has most likely never been a problem, as script is almost
> always used interactively, and only exits when the user enters ^D
> to the terminal, which they typically don't do until all the output
> that is expected has been seen. For the kind of usage in your test.
> the input is already known (it is in the test script) so isn't needed,
> to capture output simple shell redirection is enough. I am not sure
> why anyone would ever normally run script as it is being run here.
I actually use script(1) that way somewhat regularly if I want to log
the output of a longer job. The alternative would be to redirect stdout
and stderr, open a second terminal and "tail -f" the log file (but
very occasionally I also want timestamping via "script -t"). I find
it simpler to run "script ./blah logfile". And sure, I could do it in
two steps, call script and then call whatever I want interactively.
So yes, I really did stumble over this issue a while ago in a real
situation. And of course the test case is contrived and just
illustrates the issue.
Thanks for your input!
Harold
From: Harold Gutch <logix@foobar.franken.de>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: bin/54514: script(1) sometimes swallows last line(s) of output
Date: Tue, 9 May 2023 17:37:25 +0200
On Sun, Sep 01, 2019 at 09:42:52PM +0200, Harold Gutch wrote:
> On Sat, Aug 31, 2019 at 02:00:02AM +0000, Robert Elz wrote:
> > Please try again with the changes as suggested, and let us know
> > if my suggested variation for your patch fixes the problem.
>
> Yes, your suggestion fixes the problem for me.
>
> --- src/usr.bin/script/script.c 2011-09-06 18:29:56.000000000 +0000
> +++ src/usr.bin/script/script.c 2019-08-31 12:41:26.368033644 +0000
> @@ -173,6 +173,7 @@
> fail();
> }
> if (child == 0) {
> + (void)signal(SIGCHLD, SIG_DFL);
> subchild = child = fork();
> if (child < 0) {
> warn("fork");
script.c has received a number of changes since, and in the meanwhile
the symptomps of this have changed a bit and no longer swallow lines
but instead duplicate the "done" line:
$ script -c ./test.sh typescript
Script started, output file is typescript
1
2
3
Script done, output file is typescript
$ cat typescript
Script started on Tue May 9 17:34:31 2023
1
2
3
Script done on Tue May 9 17:34:31 2023
Script done on Tue May 9 17:34:31 2023
resetting the SIGCHLD signal handler to SIG_DFL for the child process
still fixes this.
Harold
State-Changed-From-To: open->closed
State-Changed-By: hgutch@NetBSD.org
State-Changed-When: Tue, 09 May 2023 20:58:28 +0000
State-Changed-Why:
Fixed with rev. 1.34 of src/usr.bin/script/script.c
>Unformatted:
(Contact us)
$NetBSD: query-full-pr,v 1.47 2022/09/11 19:34:41 kim Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2023
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.