NetBSD Problem Report #48138

From www@NetBSD.org  Tue Aug 20 17:31:10 2013
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
	(using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "Postmaster NetBSD.org" (verified OK))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 1D9D07104F
	for <gnats-bugs@gnats.NetBSD.org>; Tue, 20 Aug 2013 17:31:10 +0000 (UTC)
Message-Id: <20130820173108.B6EC17180D@mollari.NetBSD.org>
Date: Tue, 20 Aug 2013 17:31:08 +0000 (UTC)
From: sdaoden@gmail.com
Reply-To: sdaoden@gmail.com
To: gnats-bugs@NetBSD.org
Subject: sh(1) wait(1) builtin fails after bg job was SIG(STOP|TSTP|CONT) controlled
X-Send-Pr-Version: www-1.0

>Number:         48138
>Category:       bin
>Synopsis:       sh(1) wait(1) builtin fails after bg job was SIG(STOP|TSTP|CONT) controlled
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    bin-bug-people
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue Aug 20 17:35:00 +0000 2013
>Closed-Date:    Sat May 05 17:43:19 +0000 2018
>Last-Modified:  Sun May 06 00:55:01 +0000 2018
>Originator:     Steffen
>Release:        6.99.23
>Organization:
>Environment:
NetBSD nhead 6.99.23 NetBSD 6.99.23 (GENERIC) #0: Thu Aug  8 19:07:01 UTC 2013  builds@b6.netbsd.org:/home/builds/ab/HEAD/amd64/201308081710Z-obj/home/builds/ab/HEAD/src/sys/arch/amd64/compile/GENERIC amd64
>Description:
* [steffen@nhead]$ /bin/sleep 30 &
* [steffen@nhead]$ jobs
[1] + Running                 /bin/sleep 30
* [steffen@nhead]$ kill -STOP %1
* [steffen@nhead]$ jobs
[1] + Running                 /bin/sleep 30
* [steffen@nhead]$ wait %1
[1] + Suspended (signal)      /bin/sleep 30
* [steffen@nhead]$ kill -CONT %1
* [steffen@nhead]$ jobs
[1] + Suspended (signal)      /bin/sleep 30
* [steffen@nhead]$ wait %1
* [steffen@nhead]$ wait
* [steffen@nhead]$ jobs
[1] + Suspended (signal)      /bin/sleep 30
* [steffen@nhead]$ 
[1]   Done                    /bin/sleep 30
* [steffen@nhead]$ 


Of the tested mksh(1), NetBSD sh(1), dash(1), FreeBSD sh(1), Mac OS X /bin/ksh and bash(1), only the latter two "do the right thing" (satisfy me, that is):


Mac OS X /bin/ksh(1):

?0[steffen@sherwood]$ /bin/sleep 30 &
[1]	43762
?0[steffen@sherwood]$ jobs
[1] +  Running                 /bin/sleep 30 &
?0[steffen@sherwood]$ wait  
^C?258[steffen@sherwood]$ jobs
[1] +  Running                 /bin/sleep 30 &
?0[steffen@sherwood]$ kill -STOP %1
?0[steffen@sherwood]$ jobs         
[1] +  Running                 /bin/sleep 30 &
?0[steffen@sherwood]$ wait %1
[1] + Stopped (SIGSTOP)        /bin/sleep 30 &
?0[steffen@sherwood]$ kill -CONT %1
?0[steffen@sherwood]$ jobs
[1] +  Running                 /bin/sleep 30 &
?0[steffen@sherwood]$ wait %1

^C?258[steffen@sherwood]$ jobs
?0[steffen@sherwood]$ 


bash(1):


?0[steffen@sherwood]$ /bin/sleep 30 &
[1] 43752
?0[steffen@sherwood]$ wait %1
^C
?1[steffen@sherwood]$ kill -STOP %1
?0[steffen@sherwood]$ jobs
[1]+  Running                 /bin/sleep 30 &
?0[steffen@sherwood]$ wait %1

[1]+  Stopped                 /bin/sleep 30
?145[steffen@sherwood]$ wait %1
bash: warning: wait_for_job: job 1 is stopped
?145[steffen@sherwood]$ kill -CONT %1
?0[steffen@sherwood]$ jobs
[1]+  Running                 /bin/sleep 30 &
?0[steffen@sherwood]$ wait %1
^C
?1[steffen@sherwood]$ jobs
[1]+  Running                 /bin/sleep 30 &
?0[steffen@sherwood]$ 
[1]+  Done                    /bin/sleep 30


>How-To-Repeat:

>Fix:
no.

>Release-Note:

>Audit-Trail:

State-Changed-From-To: open->feedback
State-Changed-By: kre@NetBSD.org
State-Changed-When: Fri, 13 Apr 2018 07:18:23 +0000
State-Changed-Why:
I believe this is fixed in NetBSD-current (any version after
late last October):

[jinx]$ /bin/sleep 30 &
[jinx]$ jobs
[1] + Running                 /bin/sleep 30 &
[jinx]$ kill -STOP %1
[1] + Suspended (signal)      /bin/sleep 30 &
[jinx]$ wait %1
[jinx]$ echo $?
127
[jinx]$ kill -CONT %1
[1] + Running                 /bin/sleep 30 &
[jinx]$ jobs
[1] + Running                 /bin/sleep 30 &
[jinx]$ wait %1
^C
[jinx]$ jobs
[1] + Running                 /bin/sleep 30 &
[jinx]$ 
[jinx]$ 
[1]   Done                    /bin/sleep 30 &
[jinx]$ 
[jinx]$ echo $NETBSD_SHELL
20160401

That last echo is really just to show that it is the
NetBSD /bin/sh that is being used there, the version
number hasn't changed since it was added, not all
versions that show that have the fix (but any that
do not have NETBSD_SHELL at all certainly do not).
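
A minimal sketch of how a script might use that variable, following
the reasoning above: an unset NETBSD_SHELL rules the fix out, while
a set one only shows that this is NetBSD sh. The helper name
is_netbsd_sh is hypothetical and not part of sh(1).

```sh
# Hypothetical helper: succeeds when the running shell sets NETBSD_SHELL.
is_netbsd_sh() {
    # An unset or empty NETBSD_SHELL means this is not a recent NetBSD sh.
    [ -n "${NETBSD_SHELL-}" ]
}

if is_netbsd_sh; then
    # A set value identifies NetBSD sh, but does not prove the fix is present.
    echo "NetBSD sh, NETBSD_SHELL=${NETBSD_SHELL}"
else
    echo "NETBSD_SHELL is unset: this shell certainly lacks the fix"
fi
```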

The fix for this has also been pulled up to NetBSD-8
(in general none of the recent sh changes are being
pulled up to -7 or -6 so this is not there.)

Sorry that I did not have this PR in mind when this was
being worked on, or it would have been mentioned in the
commit messages.   The fix was a "side effect" (though
deliberate) of the fixes for bin/52640 and bin/52641.

If you are able to test, please advise if it seems correct now.


From: Steffen Nurpmeso <steffen@sdaoden.eu>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: bin/48138 (sh(1) wait(1) builtin fails after bg job was
 SIG(STOP|TSTP|CONT) controlled)
Date: Sun, 15 Apr 2018 00:01:36 +0200

 Hello Robert Elz!

 kre@NetBSD.org wrote:
  |Synopsis: sh(1) wait(1) builtin fails after bg job was SIG(STOP|TSTP|CONT) \
  |controlled

 Oh-ha, yes, i recall something.  (I had to look -- from 2013!)

  |State-Changed-From-To: open->feedback
  |State-Changed-By: kre@NetBSD.org
  |State-Changed-When: Fri, 13 Apr 2018 07:18:23 +0000
  |State-Changed-Why:
  |I believe this is fixed in NetBSD-current (any version after
  |late last October):

 This is good to know.

  |[jinx]$ /bin/sleep 30 &
  |[jinx]$ jobs
  |[1] + Running                 /bin/sleep 30 &
  |[jinx]$ kill -STOP %1
  |[1] + Suspended (signal)      /bin/sleep 30 &
  |[jinx]$ wait %1
  |[jinx]$ echo $?
  |127
  |[jinx]$ kill -CONT %1
  |[1] + Running                 /bin/sleep 30 &
  |[jinx]$ jobs
  |[1] + Running                 /bin/sleep 30 &
  |[jinx]$ wait %1
  |^C
  |[jinx]$ jobs
  |[1] + Running                 /bin/sleep 30 &
  |[jinx]$ 
  |[jinx]$ 
  |[1]   Done                    /bin/sleep 30 &
  |[jinx]$ 
  |[jinx]$ echo $NETBSD_SHELL
  |20160401
  |
  |That last echo is really just to show that it is the
  |NetBSD /bin/sh that is being used there, the version
  |number hasn't changed since it was added, not all
  |versions that show that have the fix (but any that
  |do not have NETBSD_SHELL at all certainly do not).
  |
  |The fix for this has also been pulled up to NetBSD-8
  |(in general none of the recent sh changes are being
  |pulled up to -7 or -6 so this is not there.)
  |
  |Sorry that I did not have this PR in mind when this was
  |being worked on, or it would have been mentioned in the
  |commit messages.   The fix was a "side effect" (though
  |deliberate) of the fixes for bin/52640 and bin/52641.

 I beg you, please!  I was following all that, but a bit
 unreflected it seems.

  |If you are able to test, please advise if it seems correct now.

 Well.. unfortunately not; not before the 22nd (at 64kBit/s until
 then).  I do not have a usable NetBSD around at the moment, in
 fact, only NetBSD Mail and NetPGP in CVS repositories.  Yes, the
 server and the client both run the very same configuration of
 a very small Linux (with one file, /etc/apk/world, showing
 differences, and the server having an additional file
 /etc/.server).  Ok, there are VMs, but they are very slow and
 occupy the machine, i use them only when i really have to; e.g.,
 compiling libidn2 took two hours in the OpenBSD VM ...

 Other than that, it looks perfect :-)

 And thanks for the shell glob thread, i knew i have to rewrite the
 shell expression parser so that it creates a tree of inspectable
 objects in order to support ``, $(), ${aXb}, etc., but that i will
 need it to be able to re-apply quoting to strings that will be
 passed to fnmatch(3) i did not realize until yesterday.  Be warned
 you will be credited for that.  (The actual implementation will
 take some time, however, i am with my future g/s-roff fork for the
 foreseeable future.)

 Greetings from Germany, which enters the five months of autumn
 that disturb the seven months of winter -- next week!  So i wish
 the weekend of anticipation that we have.

 --steffen
 |
 |Der Kragenbaer,                The moon bear,
 |der holt sich munter           he cheerfully and one by one
 |einen nach dem anderen runter  wa.ks himself off
 |(By Robert Gernhardt)

From: Robert Elz <kre@munnari.OZ.AU>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: bin/48138 (sh(1) wait(1) builtin fails after bg job was SIG(STOP|TSTP|CONT) controlled)
Date: Mon, 16 Apr 2018 09:56:52 +0700

     Date:        Sat, 14 Apr 2018 22:05:01 +0000 (UTC)
     From:        Steffen Nurpmeso <steffen@sdaoden.eu>
     Message-ID:  <20180414220501.B27C17A221@mollari.NetBSD.org>

   |  Oh-ha, yes, i recall something.  (I had to look -- from 2013!)

 Yes, apologies for the delay, it was quite an old PR (but not nearly
 as old as some others.)

   |  Well.. unfortunately not; not before the 22nd

 There is no hurry.   At this point, unless there is still a problem,
 all that needs to happen is for this PR to be closed.   It can easily
 wait a few more months if needed until you get a chance to verify
 that all is now OK.

   |  And thanks for the shell glob thread,

 For the benefit of gnats, and other readers, I believe this refers
 to a thread on a non-NetBSD list.

   |  Greetings from Germany, which enters the five months of autumn
   |  that disturb the seven months of winter -- next week! 

 It has been several years since I last visited Deutschland, but my
 memory is that the weather was mostly entirely pleasant (most times
 I would have been there would have been in the May->August
 period though.)

 kre

From: Steffen Nurpmeso <steffen@sdaoden.eu>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: bin/48138
Date: Sat, 05 May 2018 15:09:33 +0200

 --- >8 ---  Forwarded message  --- 8< ---
 Date: Wed, 25 Apr 2018 19:58:50 +0200
 From: Steffen Nurpmeso <sdaoden@gmail.com>
 To: (NetBSD Problem Report DB Administrator) <gnats@NetBSD.org>
 Subject: Re: Reminder of 1 NetBSD Problem Report awaiting feedback
 Message-ID: <20180425175850.DInVR%sdaoden@gmail.com>
 OpenPGP: id=EE19E1C1F2F7054F8D3954D8308964B51883A0DD; url=https://ftp.sdaoden.eu/steffen.asc

 gnats@NetBSD.org (NetBSD Problem Report DB Administrator) wrote:
   ...
  |bin/48138 - non-critical low priority sw-bug
  | sh(1) wait(1) builtin fails after bg job was SIG(STOP
  | http://gnats.NetBSD.org/cgi-bin/query-pr-single.pl?number=48138

 I can confirm this works on 8.0 RC1, though your

   [jinx]$ kill -STOP %1
   [1] + Suspended (signal)      /bin/sleep 30 &
   [jinx]$ wait %1
   [jinx]$ echo $?
   127

 is

   #[steffen@nbsd]$ kill -STOP %1
   [1] + Suspended (signal)      /bin/sleep 60 &
   #[steffen@nbsd]$ jobs
   [1] + Suspended (signal)      /bin/sleep 60 &
   #[steffen@nbsd]$ wait;echo $?
   0
   #[steffen@nbsd]$ wait %1;echo $?
   145

 for me.  By the way, it is nice that WEXITSTATUS() ensures that the
 given argument is something that the address can be taken of, as
 required by POSIX.  musl does not, for example.

 --steffen
 |
 |Der Kragenbaer,                The moon bear,
 |der holt sich munter           he cheerfully and one by one
 |einen nach dem anderen runter  wa.ks himself off
 |(By Robert Gernhardt)

From: Robert Elz <kre@munnari.OZ.AU>
To: gnats-bugs@NetBSD.org
Cc: Steffen Nurpmeso <steffen@sdaoden.eu>
Subject: Re: bin/48138
Date: Sun, 06 May 2018 00:03:16 +0700

 Thanks for the confirmation - I will close this PR, as I think the
 actual bug in it is fixed.

 I suspect (though haven't checked - yet) that the difference
 between the 145 status you saw, and the 127 I see is because
 the sh "wait" command has been rewritten (again) in -current
 (when the -n and -p options were added).

 I need to think on what should happen with wait and stopped
 processes -- posix simply says it should wait for all known
 jobs to complete (or all given as args to wait, if any are).
 Nothing about finding stopped processes, or not waiting for
 them to finish.

 Assuming wait does not simply wait for the stopped process to
 finish, 127 would be the correct status (I think) when a job/procid
 is given on the command line, as that indicates "not found" and
 here the process was not found exited - the 145 would indicate
 to a script that the process exited with a SIGSTOP signal (which
 would be a remarkable feat for it to achieve).
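
The distinction drawn here matters to any script that inspects the
status of wait. A minimal sketch, assuming the usual convention
(127 for an unknown job or pid, 128+N for a child reported with
signal N); describe_wait_status is a hypothetical helper name.

```sh
# Hypothetical helper: classify a status returned by the wait builtin.
describe_wait_status() {
    status=$1
    if [ "$status" -eq 127 ]; then
        # Conventionally "no such job or pid known to this shell".
        echo "not found"
    elif [ "$status" -gt 128 ]; then
        # Conventionally 128 plus the signal number.
        echo "signal $((status - 128))"
    else
        echo "exit $status"
    fi
}

describe_wait_status 127   # the status reported from -current
describe_wait_status 145   # the status reported from 8.0 RC1
```

A script cannot tell from 145 alone whether the child was stopped
or killed, which is the ambiguity discussed above.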

 Certainly I don't believe that the wait command should (ever)
 produce jobs style output, and I am not sure bash's warning is
 a good idea either.

 But perhaps some other status would be better here, to indicate
 that the process exists, but isn't running, and hasn't exited.
 Or perhaps wait should just ... wait (but as you showed in the
 PR, other shells do not).

 I think handling this can be deferred for a while though.

 Incidentally, can you tell me which ksh is /bin/ksh on MacOS

 	/bin/ksh -c 'echo ${KSH_VERSION}'

 kre

State-Changed-From-To: feedback->closed
State-Changed-By: kre@NetBSD.org
State-Changed-When: Sat, 05 May 2018 17:43:19 +0000
State-Changed-Why:
Problem fixed


From: Steffen Nurpmeso <steffen@sdaoden.eu>
To: Robert Elz <kre@munnari.OZ.AU>
Cc: gnats-bugs@NetBSD.org
Subject: Re: bin/48138
Date: Sat, 05 May 2018 23:53:38 +0200

 Robert Elz <kre@munnari.OZ.AU> wrote:
  |Thanks for the confirmation - I will close this PR, as I think the
  |actual bug in it is fixed.

 I hope this reply will not reopen it.
 I am sorry for the de-facto late reply, it seems i came from
 a somewhat frustrating i386 NetBSD installation session and
 blindly hit reply to the reminder.. (and got surprised to see that
 coming in today anew).

  |I suspect (though haven't checked - yet) that the difference
  |between the 145 status you saw, and the 127 I see is because
  |the sh "wait" command has been rewritten (again) in -current
  |(when the -n and -p options were added).

 I wondered because i did not really understand the signal number
 i think -- SIGCHLD?

  |I need to think on what should happen with wait and stopped
  |processes -- posix simply says it should wait for all known
  |jobs to complete (or all given as args to wait, if any are).
  |Nothing about finding stopped processes, or not waiting for
  |them to finish.
  |
  |Assuming wait does not simply wait for the stopped process to
  |finish, 127 would be the correct status (I think) when a job/procid
  |is given on the command line, as that indicates "not found" and
  |here the process was not found exited - the 145 would indicate
  |to a script that the process exited with a SIGSTOP signal (which
  |would be a remarkable feat for it to achieve).

 SIGSTOP, are you sure, wasn't that SIGCHLD? ... But with some
 trying i see that No. 1 (bash) acts the very same:

   [1]+  Stopped                 sleep 45
   #?0[steffen@essex tmp]$ wait
   #?0[steffen@essex tmp]$ wait %1
   bash: warning: wait_for_job: job 1 is stopped

 So desirable to follow, given that nothing better exists.

  |Certainly I don't believe that the wait command should (ever)
  |produce jobs style output, and I am not sure bash's warning is
  |a good idea either.

 You have been there before.  I do not know, i would say "in
 interactive context, yes".

  |But perhaps some other status would be better here, to indicate
  |that the process exists, but isn't running, and hasn't exited.
  |Or perhaps wait should just ... wait (but as you showed in the
  |PR, other shells do not).
  |
  |I think handling this can be deferred for a while though.
  |
  |Incidentally, can you tell me which ksh is /bin/ksh on MacOS
  |
  | /bin/ksh -c 'echo ${KSH_VERSION}'

 Oh, what do you think!  Well indeed the old Lion (10.7.5) says:

   ?0[sdaoden@devon shared]$ /bin/ksh 'echo ${.sh.version}'
   Version M 1993-12-28 s+

 whereas here i have

   #?0[steffen@essex tmp]$ echo $KSH_VERSION
   @(#)MIRBSD KSH R56 2018/01/14

 A nice weekend i wish.

 --steffen
 |
 |Der Kragenbaer,                The moon bear,
 |der holt sich munter           he cheerfully and one by one
 |einen nach dem anderen runter  wa.ks himself off
 |(By Robert Gernhardt)

From: Robert Elz <kre@munnari.OZ.AU>
To: Steffen Nurpmeso <steffen@sdaoden.eu>
Cc: gnats-bugs@NetBSD.org
Subject: Re: bin/48138
Date: Sun, 06 May 2018 07:50:11 +0700

     Date:        Sat, 05 May 2018 23:53:38 +0200
     From:        Steffen Nurpmeso <steffen@sdaoden.eu>
     Message-ID:  <20180505215338.NXsqn%steffen@sdaoden.eu>

   | I wondered because i did not really understand the signal number
   | i think -- SIGCHLD?

 SIGSTOP (145-128 == 17 == SIGSTOP)

   | SIGSTOP, are you sure, wasn't that SIGCHLD?

 No, when a process stops, the parent is told in the
 wait which signal caused it (the child) to stop - in this
 case the kill -STOP sends SIGSTOP (a ^Z would send
 SIGTSTP if the process had been in foreground,
 and status would be 146).
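
The arithmetic above can be checked from the shell itself. This is
only a sketch: signal_from_status is a hypothetical helper, and the
name printed by kill -l depends on the platform's signal numbering
(17 is SIGSTOP on NetBSD, but on Linux/x86, for instance, 17 is
SIGCHLD).

```sh
# Hypothetical helper: recover the signal number from a status above 128.
signal_from_status() {
    echo $(($1 - 128))
}

signo=$(signal_from_status 145)   # 145 - 128 = 17
# kill -l maps a signal number to its name on the current platform.
name=$(kill -l "$signo")
echo "status 145 -> signal $signo (SIG$name)"
```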

 The parent (the shell here) might be delivered a
 SIGCHLD (if it requested one) to inform it of the
 child's status change (so it knows to wait) but
 sh doesn't use that (and even if it did, the SIGCHLD
 comes to the shell, it would, or should, never be
 an exit or stop status from a child - it is not a signal
 that causes process termination, ever.)

   | ... But with some
   | trying i see that No. 1 (bash) acts the very same:
   |
   |   [1]+  Stopped                 sleep 45
   |   #?0[steffen@essex tmp]$ wait
   |   #?0[steffen@essex tmp]$ wait %1
   |   bash: warning: wait_for_job: job 1 is stopped

 Yes, I saw that, but

   | So desirable to follow, given that nothing better exists.

 ksh93 does not, it does wait more like POSIX suggests
 it should (but looks to be full of bugs).   zsh doesn't either,
 but appears to have a different set of bugs.

 I am really not sure what is best here, but I really think that
 a simple wait should wait for the process to finish.   zsh appears
 to restart the process if it is waited on and was stopped
 (but then waits uninterruptedly, which is not good.)

   | You have been there before.  I do not know, i would say "in
   | interactive context, yes".

 wait is not all that common interactively, it isn't needed (unlike
 in scripts) as the shell waits automatically when needed,  the
 jobs command (and just getting the next prompt) are more
 common ways to determine when a process has finished
 (or changed state.)
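
For contrast, a minimal sketch of the scripted use mentioned above,
where wait is needed to collect the exit status of each background
job. The job bodies and variable names are purely illustrative.

```sh
# Start two background jobs; $! holds the pid of the most recent one.
(sleep 1; exit 0) &
pid_ok=$!
(sleep 1; exit 3) &
pid_fail=$!

# wait <pid> blocks until that child finishes and returns its exit status.
# The "|| var=$?" form keeps the script alive even under "set -e".
status_ok=0;   wait "$pid_ok"   || status_ok=$?
status_fail=0; wait "$pid_fail" || status_fail=$?

echo "first job: $status_ok, second job: $status_fail"
```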

   | Oh, what do you think!  Well indeed the old Lion (10.7.5) says:
   |
   |   ?0[sdaoden@devon shared]$ /bin/ksh 'echo ${.sh.version}'
   |   Version M 1993-12-28 s+

 Looks to be a slightly(?) old ksh93 - did that not have KSH_VERSION?
 I was going to suggest ${.sh.version} but a lot of the derived from
 pdksh versions of ksh do not implement that.
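
A minimal sketch of a probe that tries both variables. It assumes
that shells without ${.sh.version} reject the expansion as an error,
so that attempt runs in a subshell; shell_version_string is a
hypothetical helper name.

```sh
# Hypothetical helper: print whatever version string the shell exposes.
shell_version_string() {
    if [ -n "${KSH_VERSION-}" ]; then
        # Set by mksh (see the MIRBSD KSH output quoted above).
        printf '%s\n' "$KSH_VERSION"
    elif ( eval ': "${.sh.version}"' ) 2>/dev/null; then
        # ksh93 syntax; hidden in eval so other shells can parse this file.
        eval 'printf "%s\n" "${.sh.version}"'
    else
        echo "unknown"
    fi
}

shell_version_string
```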

   | whereas here i have
   |
   |   #?0[steffen@essex tmp]$ echo $KSH_VERSION
   |   @(#)MIRBSD KSH R56 2018/01/14

 mksh - an entirely different beast indeed.

 Is there anyone else bothering to read this PR, who has an
 opinion on what wait (the sh builtin command) should do
 when there are stopped children - both in the case where
 the command is a bare "wait" and in the case where the
 wait is given some kind of pid/job parameter, and that one
 selects a child that is stopped?

 kre

>Unformatted:
