NetBSD Problem Report #48875
From ef@math.uni-bonn.de Thu Jun 5 16:15:47 2014
Return-Path: <ef@math.uni-bonn.de>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
(using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits))
(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (verified OK))
by mollari.NetBSD.org (Postfix) with ESMTPS id 7C613A6512
for <gnats-bugs@gnats.NetBSD.org>; Thu, 5 Jun 2014 16:15:47 +0000 (UTC)
Message-Id: <20140605161544.5160B1BD49@trave.math.uni-bonn.de>
Date: Thu, 5 Jun 2014 18:15:44 +0200 (CEST)
From: ef@math.uni-bonn.de
Reply-To: ef@math.uni-bonn.de
To: gnats-bugs@gnats.NetBSD.org
Subject: possible ash backgrounding bug
X-Send-Pr-Version: 3.95
>Number: 48875
>Category: bin
>Synopsis: possible ash backgrounding bug
>Confidential: no
>Severity: non-critical
>Priority: medium
>Responsible: kre
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Thu Jun 05 16:20:00 +0000 2014
>Closed-Date: Sat Oct 26 21:10:14 +0000 2019
>Last-Modified: Wed Apr 28 10:00:02 +0000 2021
>Originator: Edgar Fuß <ef@math.uni-bonn.de>
>Release: NetBSD 6.1_STABLE
>Organization:
Mathematisches Institut der Universität Bonn
>Environment:
Architecture: x86_64
Machine: amd64
>Description:
When doing $() command substitution calling a background function,
evaluation pauses until the backgrounded process finishes.
This doesn't happen if the function body is expanded inside the substitution.
>How-To-Repeat:
The first echo will finish imediately, the second only after three seconds:
#!/bin/sh
f() {
sleep 3
}
echo $(sleep 3 >&- & echo $!)
echo $(f >&- & echo $!)
>Fix:
>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: bin-bug-people->krre
Responsible-Changed-By: kre@NetBSD.org
Responsible-Changed-When: Fri, 06 May 2016 19:52:54 +0000
Responsible-Changed-Why:
I am investigating this PR & the unnderlying issues.
State-Changed-From-To: open->analyzed
State-Changed-By: kre@NetBSD.org
State-Changed-When: Fri, 06 May 2016 19:52:54 +0000
State-Changed-Why:
This turns out not to be a backgrounding problem, but a redirection issue.
Since functions execute in the current shell, when a function is called
with fd redirections, the old values of the fds need to be saved (via dup)
so they can be restored after the function returns. The careful closing of
stdout when running "f" in the supplied script, with the intent of this
not having it open while sleep is running (which works just fine when sleep
is called directly) is thus thwarted, the shell running the function has its
extra copy which remains open.
This, in the parent shell, reading the results of the command substitution
stalls until that fd is closed, which only happens after the function has
completed, after the sleep returns.
Most ways of trying to avoid this in the script do not work, however
it can be written as:
echo $( { exec >&- ; f ;} & echo $! )
which closes the toublesome fd in the shell that will run the function,
before the function starts, leaving nothing to cause the stall.
I am still investigating just how the shell can figure this out on its
own, as one of the posix goals/requirements is that there is not supposed
to be any (trivial) way to determine if a utility is implemented as a command
or a function, and this would be a trivial way...
Responsible-Changed-From-To: krre->kre
Responsible-Changed-By: kre@NetBSD.org
Responsible-Changed-When: Fri, 06 May 2016 20:10:35 +0000
Responsible-Changed-Why:
Fix typo. You'd think I'd know my own user name (it's only been 40 years)
From: "Robert Elz" <kre@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/48875 CVS commit: src/bin/sh
Date: Mon, 9 May 2016 21:03:10 +0000
Module Name: src
Committed By: kre
Date: Mon May 9 21:03:10 UTC 2016
Modified Files:
src/bin/sh: eval.c eval.h main.c
Log Message:
PR bin/48875 - avoid holding (replaced) file descriptors open when running a
command in the current shell (so they can be restored for the next command)
in cases where it is obvious that there is not going to be a following
command to use them. This fixes the problem reported in the PR (though
there are still plenty of situations where a FD could be closed but isn't,
we do not do full fd flow eveluation to determine whether a fd will be
used or not).
This is the change that was just committed and then backed out again...
OK christos@
To generate a diff of this commit:
cvs rdiff -u -r1.124 -r1.125 src/bin/sh/eval.c
cvs rdiff -u -r1.18 -r1.19 src/bin/sh/eval.h
cvs rdiff -u -r1.66 -r1.67 src/bin/sh/main.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: analyzed->feedback
State-Changed-By: kre@NetBSD.org
State-Changed-When: Mon, 09 May 2016 22:08:16 +0000
State-Changed-Why:
I think this is fixed as best it reasonably can be,
Do you agree?
From: "Robert Elz" <kre@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/48875 CVS commit: src/tests/bin/sh
Date: Mon, 9 May 2016 22:34:37 +0000
Module Name: src
Committed By: kre
Date: Mon May 9 22:34:37 UTC 2016
Modified Files:
src/tests/bin/sh: t_redir.sh
Log Message:
PR bin/48875 PR bin/51123 This adds tests more that verify fide descriptor
redirection works correctly (including that the bugs reported in those PRs
are fixed.) Note that the tests for 48875 are slow, so one of the new
test cases ends up running > 25 seconds (just doing sleeps) - each individual
test is just a few seconds, but there are several of them.
OK christos@
To generate a diff of this commit:
cvs rdiff -u -r1.6 -r1.7 src/tests/bin/sh/t_redir.sh
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Robert Elz" <kre@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/48875 CVS commit: src/bin/sh
Date: Tue, 10 May 2016 15:14:30 +0000
Module Name: src
Committed By: kre
Date: Tue May 10 15:14:30 UTC 2016
Modified Files:
src/bin/sh: eval.c
Log Message:
PR bin/48875 - minor correction (well, not so minor) - commands in loops
must be assumed to have something following, even if the loop itself doesn't,
so redirected fd's around func calls need to be saved. Should fix etcupdate
To generate a diff of this commit:
cvs rdiff -u -r1.125 -r1.126 src/bin/sh/eval.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Robert Elz" <kre@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/48875 CVS commit: src/bin/sh
Date: Fri, 13 May 2016 10:32:52 +0000
Module Name: src
Committed By: kre
Date: Fri May 13 10:32:52 UTC 2016
Modified Files:
src/bin/sh: eval.c
Log Message:
More fallout from the fix for PR bin/48875 - this one found just by
code reading, rather than any actual real use case failing.
With this script
f()
{
echo hello $1
}
exec 3>&1
echo $(
for i in a b c
do
echo @$i
f >&3
done >/tmp/foo
)
echo foo= $(cat /tmp/foo)
what should be output is
hello
hello
hello
foo= @a @b @c
but since the (my) 48875 fix the other day, we've been getting
hello
@b
hello
@c
hello
foo= @a
This fixes that. I think (hope) this is the last of these fixes...
To generate a diff of this commit:
cvs rdiff -u -r1.126 -r1.127 src/bin/sh/eval.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: Edgar =?iso-8859-1?B?RnXf?= <ef@math.uni-bonn.de>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: bin/48875 (possible ash backgrounding bug)
Date: Wed, 25 May 2016 15:40:11 +0200
> I think this is fixed as best it reasonably can be,
> Do you agree?
I guess yes, but I'm somewhat overwhelmed by how complicated the issue really is.
With your fix, the shell will mostly do what I mean (which is good), but will it still do what some shell hackers who know about the circumstances will expect it to do?
Does POSIX (either the standard or the experts behind the standard) say anything about the issue?
From: Robert Elz <kre@munnari.OZ.AU>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: bin/48875 (possible ash backgrounding bug)
Date: Wed, 25 May 2016 22:37:56 +0700
Date: Wed, 25 May 2016 13:45:01 +0000 (UTC)
From: Edgar =?iso-8859-1?B?RnXf?= <ef@math.uni-bonn.de>
Message-ID: <20160525134501.615E27AAB0@mollari.NetBSD.org>
| > I think this is fixed as best it reasonably can be,
| > Do you agree?
| I guess yes, but I'm somewhat overwhelmed by how complicated the
| issue really is.
Turns out that it is even more complicated... I have not yet considered
the issue or traps (which Jilles Tjoelker, principal maintainer of the
FreeBSD shell, pointed out to me recently.) I am still pondering how
best to deal with this. It might mean that the method used for the
current fix cannot work at all, and the whole thing needs to be revisited.
| With your fix, the shell will mostly do what I mean (which is good),
| but will it still do what some shell hackers who know about the
| circumstances will expect it to do?
Mostly I think, as it is, if only we could ignore traps. But we cannot.
It is all a bit messy as functions have to operate in the current shell,
which means anything that is done which is supposed to be local to the
function (such as redirecting its input or output) have to be able to be
undone again. The only way to do that is to keep fd's open, so they can
be returned when needed. That then has the side effect you have seen
when something else is waiting on the fd being closed.
The current fix attempts to determine when the fd cannot be needed again,
and so can simply be closed, but as with all attempts to predict the future,
it is never as easy as it seems it should be.
| Does POSIX (either the standard or the experts behind the standard)
| say anything about the issue?
The standard no - it does not say about implementation, nor about sequencing
or timing. The correct eventual outcome was achieved in the case you
highlighted, everything else beyond that is just a matter of hoq the
implementation works. The experts, I don't know, I am not sure it has
ever really been considered.
kre
From: Edgar =?iso-8859-1?B?RnXf?= <ef@math.uni-bonn.de>
To: gnats-bugs@gnats.netbsd.org
Cc:
Subject: Re: bin/48875
Date: Mon, 6 Jun 2016 18:45:01 +0200
The PR seems to still be in feedback state.
However, I don't feel I'm in a position to give any educated feedback given the problem seems to be far more involved than I anticipated.
I do a fair amount of shell programming in the sense of writing #!/bin/sh scripts, but not in the sense of digging around in the shell's C source.
State-Changed-From-To: feedback->analyzed
State-Changed-By: kre@NetBSD.org
State-Changed-When: Mon, 06 Jun 2016 18:50:15 +0000
State-Changed-Why:
Feedback (of a kind) received. Issue is back under re-investigation.
Problem understood, whether it can be properly corrected is not yet
clear - the current implemented fix may need to be undone.
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: PR/48875 CVS commit: src/tests/bin/sh
Date: Tue, 7 Jun 2016 16:30:38 +0000
On Mon, May 09, 2016 at 10:35:01PM +0000, Robert Elz wrote:
> Modified Files:
> src/tests/bin/sh: t_redir.sh
>
> Log Message:
> PR bin/48875 PR bin/51123 This adds tests more that verify fide descriptor
> redirection works correctly (including that the bugs reported in those PRs
> are fixed.) Note that the tests for 48875 are slow, so one of the new
> test cases ends up running > 25 seconds (just doing sleeps) - each individual
> test is just a few seconds, but there are several of them.
Minor note on this in passing: our sleep(1) honors "sleep 0.1" and the
tests don't really have to be totally portable, so you might be able
to improve the overall sleep time.
--
David A. Holland
dholland@netbsd.org
From: Robert Elz <kre@munnari.OZ.AU>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: PR/48875 CVS commit: src/tests/bin/sh
Date: Wed, 08 Jun 2016 02:29:15 +0700
Date: Tue, 7 Jun 2016 16:35:00 +0000 (UTC)
From: David Holland <dholland-bugs@netbsd.org>
Message-ID: <20160607163500.C74B37AAAE@mollari.NetBSD.org>
| Minor note on this in passing: our sleep(1) honors "sleep 0.1" and the
| tests don't really have to be totally portable,
Yes, I know, and while I'd prefer the tests to be as portable as possible,
not all of them are, but ...
| so you might be able to improve the overall sleep time.
It might be possible to make some minor improvement, but the real issue
is that the sequencing of the sleeps finishing matters to the tests
working correctly, on an unloaded system we might expect that
sleep 0.2 & sleep 0.1 &
would have the 2nd sleep finish first, but if the system is just a bit
loaded and it takes 0.1 of a second for the second one to even start,
we have a race, and the test will sporadically report failures.
Rather than having that happen, I think it better to have longer sleeps,
with bigger differences in the intervals, even if that does mean an additional
20 seconds (of idleness) during the test run, so that even on truly busy
systems it is very unlikely that there will end up being false failures
(or real failures that aren't a result of a failure in what is really
being tested, however you want to characterise it.)
kre
From: "Robert Elz" <kre@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/48875 CVS commit: src/bin/sh
Date: Sun, 19 Aug 2018 11:16:13 +0000
Module Name: src
Committed By: kre
Date: Sun Aug 19 11:16:13 UTC 2018
Modified Files:
src/bin/sh: eval.c eval.h main.c
Log Message:
PR bin/48875
Revert the changes that were made 19 May 2016 (principally eval.c 1.125)
and the bug fixes in subsequent days (eval.c 1.126 and 1.127) and also
update some newer code that was added more recently which acted in
accordance with those changes (make that code be as it would have been
if the changes now being reverted had never been made).
While the changes made did solve the problem, in a sense, they were
never correct (see the PR for some discussion) and it had always been
intended that they be reverted. However, in practical sh code, no
issues were reported - until just recently - so nothing was done,
until now...
After this commit, the validate_fn_redirects test case of the sh ATF
test t_redir will fail. In particular, the subtest of that test
case which is described in the source (of the test) as:
This one is the real test for PR bin/48875
will fail.
Alternative changes, not to "fix" the problem in the PR, but to
often avoid it will be coming very soon - after which that ATF
test will succeed again.
XXX pullup-8
To generate a diff of this commit:
cvs rdiff -u -r1.157 -r1.158 src/bin/sh/eval.c
cvs rdiff -u -r1.20 -r1.21 src/bin/sh/eval.h
cvs rdiff -u -r1.73 -r1.74 src/bin/sh/main.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Robert Elz" <kre@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/48875 CVS commit: src/bin/sh
Date: Sun, 19 Aug 2018 23:50:27 +0000
Module Name: src
Committed By: kre
Date: Sun Aug 19 23:50:27 UTC 2018
Modified Files:
src/bin/sh: error.h eval.c input.c input.h main.c parser.c trap.c
trap.h
Log Message:
PR bin/48875 (is related, and ameliorated, but not exactly "fixed")
Import a whole set of tree evaluation enhancements from FreeBSD.
With these, before forking, the shell predicts (often) when all it will
have to do after forking (in the parent) is wait for the child and then
exit with the status from the child, and in such a case simply does not
fork, but rather allows the child to take over the parent's role.
This turns out to handle the particular test case from PR bin/48875 in
such a way that it works as hoped, rather than as it did (the delay there
was caused by an extra copy of the shell hanging around waiting for the
background child to complete ... and keeping the command substitution
stdout open, so the "real" parent had to wait in case more output appeared).
As part of doing this, redirection processing for compound commands gets
moved out of evalsubshell() and into a new evalredir(), which allows us
to properly handle errors occurring while performing those redirects,
and not mishandle (as in simply forget) fd's which had been moved out
of the way temporarily.
evaltree() has its degree of recursion reduced by making it loop to
handle the subsequent operation: that is instead of (for any binop
like ';' '&&' (etc)) where it used to
evaltree(node->left);
evaltree(node->right);
return;
it now does (kind of)
next = node;
while ((node = next) != NULL) {
next = NULL;
if (node is a binary op) {
evaltree(node->left);
if appropriate /* if && test for success, etc */
next = node->right;
continue;
}
/* similar for loops, etc */
}
which can be a good saving, as while the left side (now) tends to be
(usually) a simple (or simpleish) command, the right side can be many
commands (in a command sequence like a; b; c; d; ... the node at the
top of the tree will now have "a" as its left node, and the tree for
b; c; d; ... as its right node - until now everything was evaluated
recursively so it made no difference, and the tree was constructed
the other way).
if/while/... statements are done similarly, recurse to evaluate the
condition, then if the (or one of the) body parts is to be evaluated,
set next to that, and loop (previously it recursed).
There is more to do in this area (particularly in the way that case
statements are processed - we can avoid recursion there as well) but
that can wait for another day.
While doing all of this we keep much better track of when the shell is
just going to exit once the current tree is evaluated (with a new
predicate at_eof() to tell us that we have, for sure, reached the end
of the input stream, that is, this shell will, for certain, not be reading
more command input) and use that info to avoid unneeded forks. For that
we also need another new predicate (have_traps()) to determine of there
are any caught traps which might occur - if there are, we need to remain
to (potentially) handle them, so these optimisations will not occur (to
make the issue in PR 48875 appear again, run the same code, but with a
trap set to execute some code when a signal (or EXIT) occurs - note that
the trap must be set in the appropriate level of sub-shell to have this
effect, any caught traps are cleared in a subshell whenever one is created).
There is still work to be done to handle traps properly, whatever
weirdness they do (some of which is related to some of this.)
These changes do not need man page updates, but 48875 does - an update
to sh.1 will be forthcoming once it is decided what it should say...
Once again, all the heavy lifting for this set of changes comes directly
(with thanks) from the FreeBSD shell.
XXX pullup-8 (but not very soon)
To generate a diff of this commit:
cvs rdiff -u -r1.20 -r1.21 src/bin/sh/error.h src/bin/sh/input.h
cvs rdiff -u -r1.158 -r1.159 src/bin/sh/eval.c
cvs rdiff -u -r1.62 -r1.63 src/bin/sh/input.c
cvs rdiff -u -r1.74 -r1.75 src/bin/sh/main.c
cvs rdiff -u -r1.149 -r1.150 src/bin/sh/parser.c
cvs rdiff -u -r1.44 -r1.45 src/bin/sh/trap.c
cvs rdiff -u -r1.22 -r1.23 src/bin/sh/trap.h
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/48875 CVS commit: [netbsd-8] src/bin/sh
Date: Sat, 25 Aug 2018 14:22:50 +0000
Module Name: src
Committed By: martin
Date: Sat Aug 25 14:22:50 UTC 2018
Modified Files:
src/bin/sh [netbsd-8]: eval.c eval.h main.c
Log Message:
Pull up following revision(s) (requested by kre in ticket #983):
bin/sh/eval.c: revision 1.158
bin/sh/eval.h: revision 1.21
bin/sh/main.c: revision 1.74
PR bin/48875
Revert the changes that were made 19 May 2016 (principally eval.c 1.125)
and the bug fixes in subsequent days (eval.c 1.126 and 1.127) and also
update some newer code that was added more recently which acted in
accordance with those changes (make that code be as it would have been
if the changes now being reverted had never been made).
While the changes made did solve the problem, in a sense, they were
never correct (see the PR for some discussion) and it had always been
intended that they be reverted. However, in practical sh code, no
issues were reported - until just recently - so nothing was done,
until now...
After this commit, the validate_fn_redirects test case of the sh ATF
test t_redir will fail. In particular, the subtest of that test
case which is described in the source (of the test) as:
This one is the real test for PR bin/48875
will fail.
Alternative changes, not to "fix" the problem in the PR, but to
often avoid it will be coming very soon - after which that ATF
test will succeed again.
XXX pullup-8
To generate a diff of this commit:
cvs rdiff -u -r1.140.2.4 -r1.140.2.5 src/bin/sh/eval.c
cvs rdiff -u -r1.19 -r1.19.8.1 src/bin/sh/eval.h
cvs rdiff -u -r1.70.2.1 -r1.70.2.2 src/bin/sh/main.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Robert Elz" <kre@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/48875 CVS commit: src/bin/sh
Date: Sat, 25 Aug 2018 17:35:31 +0000
Module Name: src
Committed By: kre
Date: Sat Aug 25 17:35:31 UTC 2018
Modified Files:
src/bin/sh: sh.1
Log Message:
PR bin/48875
Add a paragraph (briefer than previously posted to mailing lists)
to explain that there is no guarantee that the results of a command
substitution will be available before all commands started by the
cmdsub have completed.
Include the original proposed text (much longer) as *roff comments, so
it will at least be available to those who browse the man page sources.
While here, clean up the existing text about command substitutions to
make it a little more accurate (and to advise against using the `` form).
To generate a diff of this commit:
cvs rdiff -u -r1.206 -r1.207 src/bin/sh/sh.1
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: analyzed->feedback
State-Changed-By: kre@NetBSD.org
State-Changed-When: Sat, 25 Aug 2018 17:51:12 +0000
State-Changed-Why:
I believe that we have probably done as much as is possible now
for this PR. The sh in current runs the test case from the PR
without delays, and the more complex similar one from the ATF tests.
There is also now text in sh.1 which explains that there is no
guarantee that the results of a command substitution will become
available to the parent shell before all commands started by the
commend substitution have finished.
Is this sufficient to close this PR (given that this time at least
the basics of the fixes committed are correct) ?
From: Edgar =?iso-8859-1?B?RnXf?= <ef@math.uni-bonn.de>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: bin/48875 (possible ash backgrounding bug)
Date: Sun, 2 Sep 2018 11:00:05 +0200
> Is this sufficient to close this PR (given that this time at least
> the basics of the fixes committed are correct)?
From my PoV, yes.
Please note, however, that I cannot check whether the real original problem has been solved. The issue originally occured on Debian, thus dash. I boiled it down to what I wrote in my original mail to tech-userlevel@ (the thread which later led to this PR), and checked that boiled-down example against NetBSD's ash (/bin/sh).
As kre obviously fixed NetBSD's /bin/sh and not Debian's, I can't check the original, real thing. It's also unfeasable for me to port the real thing (which is dotcache, which is for desktop systems, and I don't have NetBSD desktop systems) to NetBSD.
State-Changed-From-To: feedback->needs-pullups
State-Changed-By: kre@NetBSD.org
State-Changed-When: Sun, 02 Sep 2018 17:21:47 +0000
State-Changed-Why:
After some more settling in time, the chanegs (including the
fix that was applied) should be pulled up to -8
So far, as things are now, it all seems stable.
State-Changed-From-To: needs-pullups->closed
State-Changed-By: kre@NetBSD.org
State-Changed-When: Sat, 26 Oct 2019 21:10:14 +0000
State-Changed-Why:
While performance improvements are still to come for the shell,
which might, perhaps, improve the issues in this PR even more,
I believe there is no more directly to be done.
No pullup to -8 will be made, the fix (ameriolation) will be in -9
only, there are simply too many unrelated changes in -9 to sort
out all of the bits needed to get this into -8, and as it isn't
actually a bug fix, just easier to understand behaviour, risking
the stability of the -8 shell for this doesn't seem a good choice.
From: "Robert Elz" <kre@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/48875 CVS commit: src/bin/sh
Date: Sun, 4 Apr 2021 13:24:07 +0000
Module Name: src
Committed By: kre
Date: Sun Apr 4 13:24:07 UTC 2021
Modified Files:
src/bin/sh: eval.c jobs.c jobs.h
Log Message:
Related to PR bin/48875
Correct an issue found by Oguz <oguzismailuysal@gmail.com> and reported
in e-mail (on the bug-bash list initially!) with the code changed to deal
with PR bin/48875
With:
sh -c 'echo start at $SECONDS;
(sleep 3 & (sleep 1& wait) );
echo end at $SECONDS'
The shell should say "start at 0\nend at 1\n", but instead (before
this fix, in -9 and HEAD, but not -8) does "start at 0\nend at 3\n"
(Not in -8 as the 48875 changes were never pulled up)>
There was an old problem, fixed years ago, which cause the same symptom,
related to the way the jobs table was cleared (or not) in subshells, and
it seemed like that might have resurfaced.
But not so, the issue here is the sub-shell elimination, which was part
of the 48875 "fix" (not really, it wasn't really a bug, just sub-optimal
and unexpected behaviour).
What the shell actually has been running in this case is:
sh -c 'echo start at $SECONDS;
(sleep 3 & sleep 1& wait );
echo end at $SECONDS'
as the inner subshell was deemed unnecessary - all its parent would
do is wait for its exit status, and then exit with that status - we
may as well simply replace the current sub-shell with the new one,
let it do its thing, and we're done...
But not here, the running "sleep 3" will remain a child of that merged
sub-shell, and the "wait" will thus wait for it, along with the sleep 1
which is all it should be seeing.
For now, fix this by not eliminating a sub-shell if there are existing
unwaited upon children in the current one. It might be possible to
simply disregard the old child for the purposes of wait (and "jobs", etc,
all cmds which look at the jobs table) but the bookkeeping required to
make that work reliably is likely to take some time to get correct...
Along with this fix comes a fix to DEBUG mode shells, which, in situations
like this, could dump core in the debug code if the relevant tracing was
enabled, and add a new trace for when the jobs table is cleared (which was
added predating the discovery of the actual cause of this issue, but seems
worth keeping.) Neither of these changes have any effect on shells
compiled normally.
XXX pullup -9
To generate a diff of this commit:
cvs rdiff -u -r1.181 -r1.182 src/bin/sh/eval.c
cvs rdiff -u -r1.109 -r1.110 src/bin/sh/jobs.c
cvs rdiff -u -r1.23 -r1.24 src/bin/sh/jobs.h
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/48875 CVS commit: [netbsd-9] src/bin/sh
Date: Wed, 28 Apr 2021 09:58:43 +0000
Module Name: src
Committed By: martin
Date: Wed Apr 28 09:58:43 UTC 2021
Modified Files:
src/bin/sh [netbsd-9]: eval.c jobs.c jobs.h
Log Message:
Pull up following revision(s) (requested by kre in ticket #1259):
bin/sh/jobs.h: revision 1.24
bin/sh/eval.c: revision 1.182
bin/sh/jobs.c: revision 1.110
Related to PR bin/48875
Correct an issue found by Oguz <oguzismailuysal@gmail.com> and reported
in e-mail (on the bug-bash list initially!) with the code changed to deal
with PR bin/48875
With:
sh -c 'echo start at $SECONDS;
(sleep 3 & (sleep 1& wait) );
echo end at $SECONDS'
The shell should say "start at 0\nend at 1\n", but instead (before
this fix, in -9 and HEAD, but not -8) does "start at 0\nend at 3\n"
(Not in -8 as the 48875 changes were never pulled up)>
There was an old problem, fixed years ago, which cause the same symptom,
related to the way the jobs table was cleared (or not) in subshells, and
it seemed like that might have resurfaced.
But not so, the issue here is the sub-shell elimination, which was part
of the 48875 "fix" (not really, it wasn't really a bug, just sub-optimal
and unexpected behaviour).
What the shell actually has been running in this case is:
sh -c 'echo start at $SECONDS;
(sleep 3 & sleep 1& wait );
echo end at $SECONDS'
as the inner subshell was deemed unnecessary - all its parent would
do is wait for its exit status, and then exit with that status - we
may as well simply replace the current sub-shell with the new one,
let it do its thing, and we're done...
But not here, the running "sleep 3" will remain a child of that merged
sub-shell, and the "wait" will thus wait for it, along with the sleep 1
which is all it should be seeing.
For now, fix this by not eliminating a sub-shell if there are existing
unwaited upon children in the current one. It might be possible to
simply disregard the old child for the purposes of wait (and "jobs", etc,
all cmds which look at the jobs table) but the bookkeeping required to
make that work reliably is likely to take some time to get correct...
Along with this fix comes a fix to DEBUG mode shells, which, in situations
like this, could dump core in the debug code if the relevant tracing was
enabled, and add a new trace for when the jobs table is cleared (which was
added predating the discovery of the actual cause of this issue, but seems
worth keeping.) Neither of these changes have any effect on shells
compiled normally.
XXX pullup -9
To generate a diff of this commit:
cvs rdiff -u -r1.175.2.2 -r1.175.2.3 src/bin/sh/eval.c
cvs rdiff -u -r1.106.2.1 -r1.106.2.2 src/bin/sh/jobs.c
cvs rdiff -u -r1.23 -r1.23.2.1 src/bin/sh/jobs.h
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
>Unformatted:
(Contact us)
$NetBSD: query-full-pr,v 1.46 2020/01/03 16:35:01 leot Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2020
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.