NetBSD Problem Report #48875

From ef@math.uni-bonn.de  Thu Jun  5 16:15:47 2014
Return-Path: <ef@math.uni-bonn.de>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
	(using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits))
	(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (verified OK))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 7C613A6512
	for <gnats-bugs@gnats.NetBSD.org>; Thu,  5 Jun 2014 16:15:47 +0000 (UTC)
Message-Id: <20140605161544.5160B1BD49@trave.math.uni-bonn.de>
Date: Thu,  5 Jun 2014 18:15:44 +0200 (CEST)
From: ef@math.uni-bonn.de
Reply-To: ef@math.uni-bonn.de
To: gnats-bugs@gnats.NetBSD.org
Subject: possible ash backgrounding bug
X-Send-Pr-Version: 3.95

>Number:         48875
>Category:       bin
>Synopsis:       possible ash backgrounding bug
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    kre
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Jun 05 16:20:00 +0000 2014
>Closed-Date:    Sat Oct 26 21:10:14 +0000 2019
>Last-Modified:  Wed Apr 28 10:00:02 +0000 2021
>Originator:     Edgar Fuß <ef@math.uni-bonn.de>
>Release:        NetBSD 6.1_STABLE
>Organization:
	Mathematisches Institut der Universität Bonn
>Environment:
Architecture: x86_64
Machine: amd64
>Description:
	When doing $() command substitution calling a background function,
	evaluation pauses until the backgrounded process finishes.
	This doesn't happen if the function body is expanded inside the substitution.
>How-To-Repeat:
	The first echo will finish imediately, the second only after three seconds:
	#!/bin/sh

	f() {
		sleep 3
	}

	echo $(sleep 3 >&- & echo $!)
	echo $(f >&- & echo $!)
>Fix:


>Release-Note:

>Audit-Trail:

Responsible-Changed-From-To: bin-bug-people->krre
Responsible-Changed-By: kre@NetBSD.org
Responsible-Changed-When: Fri, 06 May 2016 19:52:54 +0000
Responsible-Changed-Why:
I am investigating this PR & the unnderlying issues.


State-Changed-From-To: open->analyzed
State-Changed-By: kre@NetBSD.org
State-Changed-When: Fri, 06 May 2016 19:52:54 +0000
State-Changed-Why:
This turns out not to be a backgrounding problem, but a redirection issue.
Since functions execute in the current shell, when a function is called
with fd redirections, the old values of the fds need to be saved (via dup)
so they can be restored after the function returns.  The careful closing of
stdout when running "f" in the supplied script, with the intent of this
not having it open while sleep is running (which works just fine when sleep
is called directly) is thus thwarted, the shell running the function has its
extra copy which remains open.

This, in the parent shell, reading the results of the command substitution
stalls until that fd is closed, which only happens after the function has
completed, after the sleep returns.

Most ways of trying to avoid this in the script do not work, however
it can be written as:
        echo $( { exec >&- ; f ;} & echo $! )
which closes the toublesome fd in the shell that will run the function,
before the function starts, leaving nothing to cause the stall.

I am still investigating just how the shell can figure this out on its
own, as one of the posix goals/requirements is that there is not supposed
to be any (trivial) way to determine if a utility is implemented as a command
or a function, and this would be a trivial way...


Responsible-Changed-From-To: krre->kre
Responsible-Changed-By: kre@NetBSD.org
Responsible-Changed-When: Fri, 06 May 2016 20:10:35 +0000
Responsible-Changed-Why:
Fix typo.   You'd think I'd know my own user name (it's only been 40 years)


From: "Robert Elz" <kre@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/48875 CVS commit: src/bin/sh
Date: Mon, 9 May 2016 21:03:10 +0000

 Module Name:	src
 Committed By:	kre
 Date:		Mon May  9 21:03:10 UTC 2016

 Modified Files:
 	src/bin/sh: eval.c eval.h main.c

 Log Message:
 PR bin/48875 - avoid holding (replaced) file descriptors open when running a
 command in the current shell (so they can be restored for the next command)
 in cases where it is obvious that there is not going to be a following
 command to use them.   This fixes the problem reported in the PR (though
 there are still plenty of situations where a FD could be closed but isn't,
 we do not do full fd flow eveluation to determine whether a fd will be
 used or not).

 This is the change that was just committed and then backed out again...

 OK christos@


 To generate a diff of this commit:
 cvs rdiff -u -r1.124 -r1.125 src/bin/sh/eval.c
 cvs rdiff -u -r1.18 -r1.19 src/bin/sh/eval.h
 cvs rdiff -u -r1.66 -r1.67 src/bin/sh/main.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: analyzed->feedback
State-Changed-By: kre@NetBSD.org
State-Changed-When: Mon, 09 May 2016 22:08:16 +0000
State-Changed-Why:
I think this is fixed as best it reasonably can be,
Do you agree?


From: "Robert Elz" <kre@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/48875 CVS commit: src/tests/bin/sh
Date: Mon, 9 May 2016 22:34:37 +0000

 Module Name:	src
 Committed By:	kre
 Date:		Mon May  9 22:34:37 UTC 2016

 Modified Files:
 	src/tests/bin/sh: t_redir.sh

 Log Message:
 PR bin/48875   PR bin/51123    This adds tests more that verify fide descriptor
 redirection works correctly (including that the bugs reported in those PRs
 are fixed.)  Note that the tests for 48875 are slow, so one of the new
 test cases ends up running > 25 seconds (just doing sleeps) - each individual
 test is just a few seconds, but there are several of them.

 OK christos@


 To generate a diff of this commit:
 cvs rdiff -u -r1.6 -r1.7 src/tests/bin/sh/t_redir.sh

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Robert Elz" <kre@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/48875 CVS commit: src/bin/sh
Date: Tue, 10 May 2016 15:14:30 +0000

 Module Name:	src
 Committed By:	kre
 Date:		Tue May 10 15:14:30 UTC 2016

 Modified Files:
 	src/bin/sh: eval.c

 Log Message:
 PR bin/48875 - minor correction (well, not so minor) - commands in loops
 must be assumed to have something following, even if the loop itself doesn't,
 so redirected fd's around func calls need to be saved.   Should fix etcupdate


 To generate a diff of this commit:
 cvs rdiff -u -r1.125 -r1.126 src/bin/sh/eval.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Robert Elz" <kre@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/48875 CVS commit: src/bin/sh
Date: Fri, 13 May 2016 10:32:52 +0000

 Module Name:	src
 Committed By:	kre
 Date:		Fri May 13 10:32:52 UTC 2016

 Modified Files:
 	src/bin/sh: eval.c

 Log Message:
 More fallout from the fix for PR bin/48875 - this one found just by
 code reading, rather than any actual real use case failing.

 With this script
 	f()
 	{
 		echo hello $1
 	}

 	exec 3>&1
 	echo $(
 		for i in a b c
 		do
 			echo @$i
 			f >&3
 		done >/tmp/foo
 	)
 	echo foo= $(cat /tmp/foo)

 what should be output is

 	hello
 	hello
 	hello

 	foo= @a @b @c

 but since the (my) 48875 fix the other day, we've been getting

 	hello
 	@b
 	hello
 	@c
 	hello

 	foo= @a

 This fixes that.   I think (hope) this is the last of these fixes...


 To generate a diff of this commit:
 cvs rdiff -u -r1.126 -r1.127 src/bin/sh/eval.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: Edgar =?iso-8859-1?B?RnXf?= <ef@math.uni-bonn.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: bin/48875 (possible ash backgrounding bug)
Date: Wed, 25 May 2016 15:40:11 +0200

 > I think this is fixed as best it reasonably can be,
 > Do you agree?
 I guess yes, but I'm somewhat overwhelmed by how complicated the issue really is.
 With your fix, the shell will mostly do what I mean (which is good), but will it still do what some shell hackers who know about the circumstances will expect it to do?
 Does POSIX (either the standard or the experts behind the standard) say anything about the issue?

From: Robert Elz <kre@munnari.OZ.AU>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: bin/48875 (possible ash backgrounding bug)
Date: Wed, 25 May 2016 22:37:56 +0700

     Date:        Wed, 25 May 2016 13:45:01 +0000 (UTC)
     From:        Edgar =?iso-8859-1?B?RnXf?= <ef@math.uni-bonn.de>
     Message-ID:  <20160525134501.615E27AAB0@mollari.NetBSD.org>

   |  > I think this is fixed as best it reasonably can be,
   |  > Do you agree?
   | I guess yes, but I'm somewhat overwhelmed by how complicated the
   | issue really is.

 Turns out that it is even more complicated... I have not yet considered
 the issue or traps (which Jilles Tjoelker, principal maintainer of the
 FreeBSD shell, pointed out to me recently.)   I am still pondering how
 best to deal with this.   It might mean that the method used for the
 current fix cannot work at all, and the whole thing needs to be revisited.

   | With your fix, the shell will mostly do what I mean (which is good),
   | but will it still do what some shell hackers who know about the
   | circumstances will expect it to do?

 Mostly I think, as it is, if only we could ignore traps.   But we cannot.

 It is all a bit messy as functions have to operate in the current shell,
 which means anything that is done which is supposed to be local to the
 function (such as redirecting its input or output) have to be able to be
 undone again.   The only way to do that is to keep fd's open, so they can
 be returned when needed.   That then has the side effect you have seen
 when something else is waiting on the fd being closed.

 The current fix attempts to determine when the fd cannot be needed again,
 and so can simply be closed, but as with all attempts to predict the future,
 it is never as easy as it seems it should be.

   |  Does POSIX (either the standard or the experts behind the standard)
   | say anything about the issue?

 The standard no - it does not say about implementation, nor about sequencing
 or timing.  The correct eventual outcome was achieved in the case you
 highlighted, everything else beyond that is just a matter of hoq the
 implementation works.   The experts, I don't know, I am not sure it has
 ever really been considered.

 kre

From: Edgar =?iso-8859-1?B?RnXf?= <ef@math.uni-bonn.de>
To: gnats-bugs@gnats.netbsd.org
Cc: 
Subject: Re: bin/48875
Date: Mon, 6 Jun 2016 18:45:01 +0200

 The PR seems to still be in feedback state.
 However, I don't feel I'm in a position to give any educated feedback given the problem seems to be far more involved than I anticipated.
 I do a fair amount of shell programming in the sense of writing #!/bin/sh scripts, but not in the sense of digging around in the shell's C source.

State-Changed-From-To: feedback->analyzed
State-Changed-By: kre@NetBSD.org
State-Changed-When: Mon, 06 Jun 2016 18:50:15 +0000
State-Changed-Why:
Feedback (of a kind) received.   Issue is back under re-investigation.
Problem understood, whether it can be properly corrected is not yet
clear - the current implemented fix may need to be undone.


From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: PR/48875 CVS commit: src/tests/bin/sh
Date: Tue, 7 Jun 2016 16:30:38 +0000

 On Mon, May 09, 2016 at 10:35:01PM +0000, Robert Elz wrote:
  >  Modified Files:
  >  	src/tests/bin/sh: t_redir.sh
  >  
  >  Log Message:
  >  PR bin/48875   PR bin/51123    This adds tests more that verify fide descriptor
  >  redirection works correctly (including that the bugs reported in those PRs
  >  are fixed.)  Note that the tests for 48875 are slow, so one of the new
  >  test cases ends up running > 25 seconds (just doing sleeps) - each individual
  >  test is just a few seconds, but there are several of them.

 Minor note on this in passing: our sleep(1) honors "sleep 0.1" and the
 tests don't really have to be totally portable, so you might be able
 to improve the overall sleep time.

 -- 
 David A. Holland
 dholland@netbsd.org

From: Robert Elz <kre@munnari.OZ.AU>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: PR/48875 CVS commit: src/tests/bin/sh
Date: Wed, 08 Jun 2016 02:29:15 +0700

     Date:        Tue,  7 Jun 2016 16:35:00 +0000 (UTC)
     From:        David Holland <dholland-bugs@netbsd.org>
     Message-ID:  <20160607163500.C74B37AAAE@mollari.NetBSD.org>

   |  Minor note on this in passing: our sleep(1) honors "sleep 0.1" and the
   |  tests don't really have to be totally portable,

 Yes, I know, and while I'd prefer the tests to be as portable as possible,
 not all of them are, but ...

   | so you might be able to improve the overall sleep time.

 It might be possible to make some minor improvement, but the real issue
 is that the sequencing of the sleeps finishing matters to the tests
 working correctly, on an unloaded system we might expect that
 	sleep 0.2 & sleep 0.1 &
 would have the 2nd sleep finish first, but if the system is just a bit
 loaded and it takes 0.1 of a second for the second one to even start,
 we have a race, and the test will sporadically report failures.

 Rather than having that happen, I think it better to have longer sleeps,
 with bigger differences in the intervals, even if that does mean an additional
 20 seconds (of idleness) during the test run, so that even on truly busy
 systems it is very unlikely that there will end up being false failures
 (or real failures that aren't a result of a failure in what is really
 being tested, however you want to characterise it.)

 kre

From: "Robert Elz" <kre@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/48875 CVS commit: src/bin/sh
Date: Sun, 19 Aug 2018 11:16:13 +0000

 Module Name:	src
 Committed By:	kre
 Date:		Sun Aug 19 11:16:13 UTC 2018

 Modified Files:
 	src/bin/sh: eval.c eval.h main.c

 Log Message:
 PR bin/48875

 Revert the changes that were made 19 May 2016 (principally eval.c 1.125)
 and the bug fixes in subsequent days (eval.c 1.126 and 1.127) and also
 update some newer code that was added more recently which acted in
 accordance with those changes (make that code be as it would have been
 if the changes now being reverted had never been made).

 While the changes made did solve the problem, in a sense, they were
 never correct (see the PR for some discussion) and it had always been
 intended that they be reverted.   However, in practical sh code, no
 issues were reported - until just recently - so nothing was done,
 until now...

 After this commit, the validate_fn_redirects test case of the sh ATF
 test t_redir will fail.   In particular, the subtest of that test
 case which is described in the source (of the test) as:
 	This one is the real test for PR bin/48875
 will fail.

 Alternative changes, not to "fix" the problem in the PR, but to
 often avoid it will be coming very soon - after which that ATF
 test will succeed again.

 XXX pullup-8


 To generate a diff of this commit:
 cvs rdiff -u -r1.157 -r1.158 src/bin/sh/eval.c
 cvs rdiff -u -r1.20 -r1.21 src/bin/sh/eval.h
 cvs rdiff -u -r1.73 -r1.74 src/bin/sh/main.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Robert Elz" <kre@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/48875 CVS commit: src/bin/sh
Date: Sun, 19 Aug 2018 23:50:27 +0000

 Module Name:	src
 Committed By:	kre
 Date:		Sun Aug 19 23:50:27 UTC 2018

 Modified Files:
 	src/bin/sh: error.h eval.c input.c input.h main.c parser.c trap.c
 	    trap.h

 Log Message:
 PR bin/48875 (is related, and ameliorated, but not exactly "fixed")

 Import a whole set of tree evaluation enhancements from FreeBSD.

 With these, before forking, the shell predicts (often) when all it will
 have to do after forking (in the parent) is wait for the child and then
 exit with the status from the child, and in such a case simply does not
 fork, but rather allows the child to take over the parent's role.

 This turns out to handle the particular test case from PR bin/48875 in
 such a way that it works as hoped, rather than as it did (the delay there
 was caused by an extra copy of the shell hanging around waiting for the
 background child to complete ... and keeping the command substitution
 stdout open, so the "real" parent had to wait in case more output appeared).

 As part of doing this, redirection processing for compound commands gets
 moved out of evalsubshell() and into a new evalredir(), which allows us
 to properly handle errors occurring while performing those redirects,
 and not mishandle (as in simply forget) fd's which had been moved out
 of the way temporarily.

 evaltree() has its degree of recursion reduced by making it loop to
 handle the subsequent operation: that is instead of (for any binop
 like ';' '&&' (etc)) where it used to
 	evaltree(node->left);
 	evaltree(node->right);
 	return;
 it now does (kind of)
 	next = node;
 	while ((node = next) != NULL) {
 		next = NULL;

 		if (node is a binary op) {
 			evaltree(node->left);
 			if appropriate /* if && test for success, etc */
 				next = node->right;
 			continue;
 		}
 		/* similar for loops, etc */
 	}
 which can be a good saving, as while the left side (now) tends to be
 (usually) a simple (or simpleish) command, the right side can be many
 commands (in a command sequence like a; b; c; d; ...  the node at the
 top of the tree will now have "a" as its left node, and the tree for
 b; c; d; ... as its right node - until now everything was evaluated
 recursively so it made no difference, and the tree was constructed
 the other way).

 if/while/... statements are done similarly, recurse to evaluate the
 condition, then if the (or one of the) body parts is to be evaluated,
 set next to that, and loop (previously it recursed).

 There is more to do in this area (particularly in the way that case
 statements are processed - we can avoid recursion there as well) but
 that can wait for another day.

 While doing all of this we keep much better track of when the shell is
 just going to exit once the current tree is evaluated (with a new
 predicate at_eof() to tell us that we have, for sure, reached the end
 of the input stream, that is, this shell will, for certain, not be reading
 more command input) and use that info to avoid unneeded forks.   For that
 we also need another new predicate (have_traps()) to determine of there
 are any caught traps which might occur - if there are, we need to remain
 to (potentially) handle them, so these optimisations will not occur (to
 make the issue in PR 48875 appear again, run the same code, but with a
 trap set to execute some code when a signal (or EXIT) occurs - note that
 the trap must be set in the appropriate level of sub-shell to have this
 effect, any caught traps are cleared in a subshell whenever one is created).

 There is still work to be done to handle traps properly, whatever
 weirdness they do (some of which is related to some of this.)

 These changes do not need man page updates, but 48875 does - an update
 to sh.1 will be forthcoming once it is decided what it should say...

 Once again, all the heavy lifting for this set of changes comes directly
 (with thanks) from the FreeBSD shell.

 XXX pullup-8 (but not very soon)


 To generate a diff of this commit:
 cvs rdiff -u -r1.20 -r1.21 src/bin/sh/error.h src/bin/sh/input.h
 cvs rdiff -u -r1.158 -r1.159 src/bin/sh/eval.c
 cvs rdiff -u -r1.62 -r1.63 src/bin/sh/input.c
 cvs rdiff -u -r1.74 -r1.75 src/bin/sh/main.c
 cvs rdiff -u -r1.149 -r1.150 src/bin/sh/parser.c
 cvs rdiff -u -r1.44 -r1.45 src/bin/sh/trap.c
 cvs rdiff -u -r1.22 -r1.23 src/bin/sh/trap.h

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/48875 CVS commit: [netbsd-8] src/bin/sh
Date: Sat, 25 Aug 2018 14:22:50 +0000

 Module Name:	src
 Committed By:	martin
 Date:		Sat Aug 25 14:22:50 UTC 2018

 Modified Files:
 	src/bin/sh [netbsd-8]: eval.c eval.h main.c

 Log Message:
 Pull up following revision(s) (requested by kre in ticket #983):

 	bin/sh/eval.c: revision 1.158
 	bin/sh/eval.h: revision 1.21
 	bin/sh/main.c: revision 1.74

 PR bin/48875

 Revert the changes that were made 19 May 2016 (principally eval.c 1.125)
 and the bug fixes in subsequent days (eval.c 1.126 and 1.127) and also
 update some newer code that was added more recently which acted in
 accordance with those changes (make that code be as it would have been
 if the changes now being reverted had never been made).

 While the changes made did solve the problem, in a sense, they were
 never correct (see the PR for some discussion) and it had always been
 intended that they be reverted.   However, in practical sh code, no
 issues were reported - until just recently - so nothing was done,
 until now...

 After this commit, the validate_fn_redirects test case of the sh ATF
 test t_redir will fail.   In particular, the subtest of that test
 case which is described in the source (of the test) as:

 	This one is the real test for PR bin/48875

 will fail.

 Alternative changes, not to "fix" the problem in the PR, but to
 often avoid it will be coming very soon - after which that ATF
 test will succeed again.

 XXX pullup-8


 To generate a diff of this commit:
 cvs rdiff -u -r1.140.2.4 -r1.140.2.5 src/bin/sh/eval.c
 cvs rdiff -u -r1.19 -r1.19.8.1 src/bin/sh/eval.h
 cvs rdiff -u -r1.70.2.1 -r1.70.2.2 src/bin/sh/main.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Robert Elz" <kre@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/48875 CVS commit: src/bin/sh
Date: Sat, 25 Aug 2018 17:35:31 +0000

 Module Name:	src
 Committed By:	kre
 Date:		Sat Aug 25 17:35:31 UTC 2018

 Modified Files:
 	src/bin/sh: sh.1

 Log Message:
 PR bin/48875

 Add a paragraph (briefer than previously posted to mailing lists)
 to explain that there is no guarantee that the results of a command
 substitution will be available before all commands started by the
 cmdsub have completed.

 Include the original proposed text (much longer) as *roff comments, so
 it will at least be available to those who browse the man page sources.

 While here, clean up the existing text about command substitutions to
 make it a little more accurate (and to advise against using the `` form).


 To generate a diff of this commit:
 cvs rdiff -u -r1.206 -r1.207 src/bin/sh/sh.1

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: analyzed->feedback
State-Changed-By: kre@NetBSD.org
State-Changed-When: Sat, 25 Aug 2018 17:51:12 +0000
State-Changed-Why:
I believe that we have probably done as much as is possible now
for this PR.   The sh in current runs the test case from the PR
without delays, and the more complex similar one from the ATF tests.

There is also now text in sh.1 which explains that there is no
guarantee that the results of a command substitution will become
available to the parent shell before all commands started by the
commend substitution have finished.

Is this sufficient to close this PR (given that this time at least
the basics of the fixes committed are correct) ?


From: Edgar =?iso-8859-1?B?RnXf?= <ef@math.uni-bonn.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: bin/48875 (possible ash backgrounding bug)
Date: Sun, 2 Sep 2018 11:00:05 +0200

 > Is this sufficient to close this PR (given that this time at least
 > the basics of the fixes committed are correct)?
 From my PoV, yes.

 Please note, however, that I cannot check whether the real original problem has been solved. The issue originally occured on Debian, thus dash. I boiled it down to what I wrote in my original mail to tech-userlevel@ (the thread which later led to this PR), and checked that boiled-down example against NetBSD's ash (/bin/sh).
 As kre obviously fixed NetBSD's /bin/sh and not Debian's, I can't check the original, real thing. It's also unfeasable for me to port the real thing (which is dotcache, which is for desktop systems, and I don't have NetBSD desktop systems) to NetBSD.

State-Changed-From-To: feedback->needs-pullups
State-Changed-By: kre@NetBSD.org
State-Changed-When: Sun, 02 Sep 2018 17:21:47 +0000
State-Changed-Why:
After some more settling in time, the chanegs (including the
fix that was applied) should be pulled up to -8

So far, as things are now, it all seems stable.


State-Changed-From-To: needs-pullups->closed
State-Changed-By: kre@NetBSD.org
State-Changed-When: Sat, 26 Oct 2019 21:10:14 +0000
State-Changed-Why:
While performance improvements are still to come for the shell,
which might, perhaps, improve the issues in this PR even more,
I believe there is no more directly to be done.

No pullup to -8 will be made, the fix (ameriolation) will be in -9
only, there are simply too many unrelated changes in -9 to sort
out all of the bits needed to get this into -8, and as it isn't
actually a bug fix, just easier to understand behaviour, risking
the stability of the -8 shell for this doesn't seem a good choice.


From: "Robert Elz" <kre@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/48875 CVS commit: src/bin/sh
Date: Sun, 4 Apr 2021 13:24:07 +0000

 Module Name:	src
 Committed By:	kre
 Date:		Sun Apr  4 13:24:07 UTC 2021

 Modified Files:
 	src/bin/sh: eval.c jobs.c jobs.h

 Log Message:
 Related to PR bin/48875

 Correct an issue found by Oguz <oguzismailuysal@gmail.com> and reported
 in e-mail (on the bug-bash list initially!) with the code changed to deal
 with PR bin/48875

 With:

 	 sh -c 'echo start at $SECONDS;
 			(sleep 3 & (sleep 1& wait) );
 		echo end at $SECONDS'

 The shell should say "start at 0\nend at 1\n", but instead (before
 this fix, in -9 and HEAD, but not -8) does "start at 0\nend at 3\n"
 (Not in -8 as the 48875 changes were never pulled up)>

 There was an old problem, fixed years ago, which cause the same symptom,
 related to the way the jobs table was cleared (or not) in subshells, and
 it seemed like that might have resurfaced.

 But not so, the issue here is the sub-shell elimination, which was part
 of the 48875 "fix" (not really, it wasn't really a bug, just sub-optimal
 and unexpected behaviour).

 What the shell actually has been running in this case is:

 	 sh -c 'echo start at $SECONDS;
 			(sleep 3 & sleep 1& wait );
 		echo end at $SECONDS'

 as the inner subshell was deemed unnecessary - all its parent would
 do is wait for its exit status, and then exit with that status - we
 may as well simply replace the current sub-shell with the new one,
 let it do its thing, and we're done...

 But not here, the running "sleep 3" will remain a child of that merged
 sub-shell, and the "wait" will thus wait for it, along with the sleep 1
 which is all it should be seeing.

 For now, fix this by not eliminating a sub-shell if there are existing
 unwaited upon children in the current one.  It might be possible to
 simply disregard the old child for the purposes of wait (and "jobs", etc,
 all cmds which look at the jobs table) but the bookkeeping required to
 make that work reliably is likely to take some time to get correct...

 Along with this fix comes a fix to DEBUG mode shells, which, in situations
 like this, could dump core in the debug code if the relevant tracing was
 enabled, and add a new trace for when the jobs table is cleared (which was
 added predating the discovery of the actual cause of this issue, but seems
 worth keeping.)   Neither of these changes have any effect on shells
 compiled normally.

 XXX pullup -9


 To generate a diff of this commit:
 cvs rdiff -u -r1.181 -r1.182 src/bin/sh/eval.c
 cvs rdiff -u -r1.109 -r1.110 src/bin/sh/jobs.c
 cvs rdiff -u -r1.23 -r1.24 src/bin/sh/jobs.h

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/48875 CVS commit: [netbsd-9] src/bin/sh
Date: Wed, 28 Apr 2021 09:58:43 +0000

 Module Name:	src
 Committed By:	martin
 Date:		Wed Apr 28 09:58:43 UTC 2021

 Modified Files:
 	src/bin/sh [netbsd-9]: eval.c jobs.c jobs.h

 Log Message:
 Pull up following revision(s) (requested by kre in ticket #1259):

 	bin/sh/jobs.h: revision 1.24
 	bin/sh/eval.c: revision 1.182
 	bin/sh/jobs.c: revision 1.110

 Related to PR bin/48875

 Correct an issue found by Oguz <oguzismailuysal@gmail.com> and reported
 in e-mail (on the bug-bash list initially!) with the code changed to deal
 with PR bin/48875

 With:
 	 sh -c 'echo start at $SECONDS;
 			(sleep 3 & (sleep 1& wait) );
 		echo end at $SECONDS'

 The shell should say "start at 0\nend at 1\n", but instead (before
 this fix, in -9 and HEAD, but not -8) does "start at 0\nend at 3\n"
 (Not in -8 as the 48875 changes were never pulled up)>

 There was an old problem, fixed years ago, which cause the same symptom,
 related to the way the jobs table was cleared (or not) in subshells, and
 it seemed like that might have resurfaced.

 But not so, the issue here is the sub-shell elimination, which was part
 of the 48875 "fix" (not really, it wasn't really a bug, just sub-optimal
 and unexpected behaviour).

 What the shell actually has been running in this case is:

 	 sh -c 'echo start at $SECONDS;
 			(sleep 3 & sleep 1& wait );
 		echo end at $SECONDS'

 as the inner subshell was deemed unnecessary - all its parent would
 do is wait for its exit status, and then exit with that status - we
 may as well simply replace the current sub-shell with the new one,
 let it do its thing, and we're done...

 But not here, the running "sleep 3" will remain a child of that merged
 sub-shell, and the "wait" will thus wait for it, along with the sleep 1
 which is all it should be seeing.

 For now, fix this by not eliminating a sub-shell if there are existing
 unwaited upon children in the current one.  It might be possible to
 simply disregard the old child for the purposes of wait (and "jobs", etc,
 all cmds which look at the jobs table) but the bookkeeping required to
 make that work reliably is likely to take some time to get correct...

 Along with this fix comes a fix to DEBUG mode shells, which, in situations
 like this, could dump core in the debug code if the relevant tracing was
 enabled, and add a new trace for when the jobs table is cleared (which was
 added predating the discovery of the actual cause of this issue, but seems
 worth keeping.)   Neither of these changes have any effect on shells
 compiled normally.

 XXX pullup -9


 To generate a diff of this commit:
 cvs rdiff -u -r1.175.2.2 -r1.175.2.3 src/bin/sh/eval.c
 cvs rdiff -u -r1.106.2.1 -r1.106.2.2 src/bin/sh/jobs.c
 cvs rdiff -u -r1.23 -r1.23.2.1 src/bin/sh/jobs.h

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

>Unformatted:

NetBSD Home
NetBSD PR Database Search

(Contact us) $NetBSD: query-full-pr,v 1.46 2020/01/03 16:35:01 leot Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2020 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.