NetBSD Problem Report #52640
From kre@munnari.OZ.AU Mon Oct 23 05:39:57 2017
Return-Path: <kre@munnari.OZ.AU>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id AED3F7A1AF
for <gnats-bugs@www.NetBSD.org>; Mon, 23 Oct 2017 05:39:57 +0000 (UTC)
Message-Id: <201710230539.v9N5dVCh029045@andromeda.noi.kre.to>
Date: Mon, 23 Oct 2017 12:39:31 +0700 (ICT)
From: kre@munnari.OZ.AU
To: gnats-bugs@www.NetBSD.org
Subject: /bin/sh can "lose" background children when waiting on foreground ones
X-Send-Pr-Version: 3.95
>Number: 52640
>Category: bin
>Synopsis: /bin/sh can "lose" background children when waiting on foreground ones
>Confidential: no
>Severity: serious
>Priority: high
>Responsible: kre
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Mon Oct 23 05:40:00 +0000 2017
>Closed-Date: Fri Nov 17 18:33:00 +0000 2017
>Last-Modified: Fri Nov 17 18:33:00 +0000 2017
>Originator: Robert Elz
>Release: NetBSD 8.99.1 (lots and lots of releases...)
>Organization:
>Environment:
System: NetBSD andromeda.noi.kre.to 8.99.1 NetBSD 8.99.1 (VBOX64-1.3-20170812) #39: Sat Aug 12 15:25:04 ICT 2017 kre@magnolia.noi.kre.to:/usr/obj/current/kernels/amd64/VBOX64 amd64
Architecture: x86_64
Machine: amd64
>Description:
The following script

	#! /bin/sh
	(sleep 3; exit 3) & PID=$!
	sleep 10
	(wait $PID; echo "In child: status" $?)
	wait $PID; echo "In parent: status" $?

should print:

	In child: status 127
	In parent: status 3
as all other shells I could find to test do (except bosh,
which is just broken and appears to return status 0 from
the wait command in all cases, and zsh, which is just weird,
in this and so many other ways).

Instead, on all currently available NetBSD sh's we see

	In child: status 127
	In parent: status 127
That's because the background job completes while sh is waiting
for the later foreground job, and when that happens (at least
in many cases) the background job is simply discarded (if it
exited with a signal, that will be immediately reported, but
it will still be discarded.)
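
The signal case can be checked in the same way. The following is only a
sketch: it assumes that a child sh killing itself with SIGTERM is a fair
stand-in for a job that exits with a signal. The STATUS variable and the
one second timing are choices made for this example, not part of the
original report.

```sh
#! /bin/sh
# Sketch: a background job is killed by a signal while the shell is
# busy with a foreground job; its status should still be retrievable.

# Background job: a child sh that terminates itself with SIGTERM.
sh -c 'kill -TERM $$' & PID=$!

# Foreground job that outlasts the background one.
sleep 1

# POSIX reports termination by a signal as a status greater than 128.
# A shell that has already discarded the job cannot know that status.
STATUS=0
wait $PID || STATUS=$?
echo "status of signalled job:" $STATUS
```

A shell that keeps the job should print a status greater than 128 (many
shells use 128 plus the signal number). A shell with the problem
described here would be expected to print 127 instead.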
Fix that problem and we instead get

	In child: status 3
	In parent: status 3

!!! The sub-shell has no children; it should not be
able to get status from one of its siblings.
This only happens when the child has already exited before
the sub-shell is forked, and only when the status of that
child has not already been discarded (including incorrectly
discarded as above.)
This is because when a sub-shell is forked, the job table
(which holds the results of completed tasks, and the status
of active ones) is just marked invalid, not actually cleared
(until a new job needs to be created). But the shell's "wait"
command only bothers to look at the "invalid" flag in the
case of a simple "wait" (i.e. not "wait pid"), which is actually
backwards: the simple "wait" case does not really need it, though it
avoids wasting CPU time, whereas the "wait pid" case does.
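
The two expectations (the parent can still collect the status, the
sub-shell cannot) can be captured in variables rather than read off the
output. This is a sketch only: the names SUB_STATUS and PARENT_STATUS
and the shortened timings are assumptions made for this example.

```sh
#! /bin/sh
# Sketch: a sub-shell must not be able to collect the status of a job
# started by its parent, even when that job has already exited.

(exit 3) & PID=$!

# Let the background job exit before the sub-shell is forked.
sleep 1

# In the sub-shell $PID is not a child, so wait should report 127.
# Any complaint about the unknown pid is discarded.
SUB_STATUS=$( (S=0; wait $PID 2>/dev/null || S=$?; echo $S) )

# In the parent the job is a child, and its status must still be there.
PARENT_STATUS=0
wait $PID || PARENT_STATUS=$?

echo "In child: status" $SUB_STATUS
echo "In parent: status" $PARENT_STATUS
```

With both problems fixed the expected values are 127 and 3. A shell that
retains the job but ignores the "invalid" flag for "wait pid" would be
expected to give 3 for both.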
>How-To-Repeat:
Write any script that runs a short background job, then a
longer foreground one (which is probably why this hasn't
been noticed - most commonly the timings are inverted),
and observe what happens when the script eventually waits
for the (already completed) background job.
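
The recipe can be wrapped in a small function so that the timings and
the exit status are easy to vary. This is a sketch, not a test from the
NetBSD tree: late_wait, its three arguments and the RESULT variable are
names invented for this example.

```sh
#! /bin/sh
# Sketch of the recipe above.
#   $1  run time (seconds) of the short background job
#   $2  run time (seconds) of the longer foreground job
#   $3  exit status the background job should return
late_wait()
{
	(sleep "$1"; exit "$3") & bgpid=$!
	sleep "$2"			# background job finishes during this
	rc=0
	wait "$bgpid" || rc=$?		# the job's status, or 127 if it was lost
	return $rc
}

RESULT=0
late_wait 1 2 7 || RESULT=$?

if [ "$RESULT" -eq 7 ]; then
	echo "background status retained:" $RESULT
else
	echo "background status lost: wait returned" $RESULT
fi
```

Swapping the first two arguments gives the more common timing, where the
background job outlives the foreground one and the problem does not
show up.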
>Fix:
Coming soon... Will request pullup to -8; the shells on
the older systems are so out of date that they can just
continue to suffer from this (and many other) problems, which
are usually never noticed in the wild.
>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: bin-bug-people->kre
Responsible-Changed-By: kre@NetBSD.org
Responsible-Changed-When: Mon, 23 Oct 2017 06:03:33 +0000
Responsible-Changed-Why:
I am looking into this PR
From: "Robert Elz" <kre@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/52640 CVS commit: src/bin/sh
Date: Mon, 23 Oct 2017 10:52:07 +0000
Module Name: src
Committed By: kre
Date: Mon Oct 23 10:52:07 UTC 2017
Modified Files:
src/bin/sh: jobs.c
Log Message:
PR bin/52640 PR bin/52641

Don't delete jobs from the jobs table merely because they finished,
if they are not the job we are waiting upon. (bin/52640 part 1)

In a sub-shell environment, don't allow wait to find jobs from the
parent shell that had already exited (before the sub-shell was
created) and return status for them as if they are our children.
(bin/52640 part 2)

Don't have the "jobs" command also be an implicit "wait" command
in non-interactive shells. (bin/52641)

Use WCONTINUED (when it exists) so we can report on stopped jobs that
"mysteriously" move back to running state without the user issuing
a "bg" command (eg: kill -CONT <pid>). Previously they would keep
being reported as stopped until they exited.

When a job is detected as having changed status just as we're
issuing a "jobs" command (i.e.: the change occurred between the last
prompt and the jobs command being entered) don't report it twice,
once from the status change, and then again in the jobs command
output. Once is enough (keep the jobs output, suppress the other).

Apply some sanity to the way jobs_invalid is processed - ignore it
in getjob() instead of just ignoring it most of the time there, and
instead always check it before calling getjob() in situations where
we can handle only children of the current shell. This allows the
(totally broken) save/clear/restore of jobs_invalid in jobscmd() to
be done away with (previously an error while in the clear state would
have left jobs_invalid incorrectly cleared - shouldn't have mattered
since jobs_invalid => subshell => error causes exit, but better to be safe).

Add/improve the DEBUG mode tracing.
XXX pullup -8
To generate a diff of this commit:
cvs rdiff -u -r1.90 -r1.91 src/bin/sh/jobs.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: open->needs-pullups
State-Changed-By: kre@NetBSD.org
State-Changed-When: Mon, 23 Oct 2017 10:57:00 +0000
State-Changed-Why:
waiting on some testing before requesting pullup to -8
State-Changed-From-To: needs-pullups->pending-pullups
State-Changed-By: kre@NetBSD.org
State-Changed-When: Sun, 29 Oct 2017 16:02:35 +0000
State-Changed-Why:
Ticket 337 for netbsd-8
From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/52640 CVS commit: [netbsd-8] src/bin/sh
Date: Fri, 17 Nov 2017 14:56:52 +0000
Module Name: src
Committed By: martin
Date: Fri Nov 17 14:56:52 UTC 2017
Modified Files:
src/bin/sh [netbsd-8]: jobs.c
Log Message:
Pull up following revision(s) (requested by kre in ticket #337):
bin/sh/jobs.c: revision 1.91 (patch)
PR bin/52640 PR bin/52641

Don't delete jobs from the jobs table merely because they finished,
if they are not the job we are waiting upon. (bin/52640 part 1)

In a sub-shell environment, don't allow wait to find jobs from the
parent shell that had already exited (before the sub-shell was
created) and return status for them as if they are our children.
(bin/52640 part 2)

Don't have the "jobs" command also be an implicit "wait" command
in non-interactive shells. (bin/52641)

Use WCONTINUED (when it exists) so we can report on stopped jobs that
"mysteriously" move back to running state without the user issuing
a "bg" command (eg: kill -CONT <pid>). Previously they would keep
being reported as stopped until they exited.

When a job is detected as having changed status just as we're
issuing a "jobs" command (i.e.: the change occurred between the last
prompt and the jobs command being entered) don't report it twice,
once from the status change, and then again in the jobs command
output. Once is enough (keep the jobs output, suppress the other).

Apply some sanity to the way jobs_invalid is processed - ignore it
in getjob() instead of just ignoring it most of the time there, and
instead always check it before calling getjob() in situations where
we can handle only children of the current shell. This allows the
(totally broken) save/clear/restore of jobs_invalid in jobscmd() to
be done away with (previously an error while in the clear state would
have left jobs_invalid incorrectly cleared - shouldn't have mattered
since jobs_invalid => subshell => error causes exit, but better to be safe).

Add/improve the DEBUG mode tracing.
To generate a diff of this commit:
cvs rdiff -u -r1.85.2.1 -r1.85.2.2 src/bin/sh/jobs.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: pending-pullups->closed
State-Changed-By: kre@NetBSD.org
State-Changed-When: Fri, 17 Nov 2017 18:33:00 +0000
State-Changed-Why:
Pullups completed
>Unformatted:
Copyright © 1994-2014
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.