NetBSD Problem Report #17171
Received: (qmail 23826 invoked by uid 605); 5 Jun 2002 00:14:02 -0000
Message-Id: <20020605001359.F2A8711137@www.netbsd.org>
Date: Tue, 4 Jun 2002 17:13:59 -0700 (PDT)
From: noah@noah.org
Sender: gnats-bugs-owner@netbsd.org
Reply-To: noah@noah.org
To: gnats-bugs@gnats.netbsd.org
Subject: Dead Child does not raise SIGCHLD until after parent reads all output on a pty.
X-Send-Pr-Version: www-1.0
>Number: 17171
>Notify-List: gson@gson.org
>Category: kern
>Synopsis: Dead Child does not raise SIGCHLD until after parent reads all output on a pty.
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: kern-bug-people
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Wed Jun 05 00:15:00 +0000 2002
>Closed-Date: Mon Apr 09 18:34:09 +0000 2018
>Last-Modified: Mon Apr 09 18:34:09 +0000 2018
>Originator: Noah Spurrier
>Release: 1.5.3_ALPHA
>Organization:
None
>Environment:
NetBSD thor 1.5.3_ALPHA NetBSD 1.5.3_ALPHA (ART)
>Description:
When a child process dies it normally sends a SIGCHLD to the parent.
The child remains in the process list as a zombie until the
parent calls waitpid() or wait(). However, this does NOT seem to be the
case for a child created with forkpty(). A child may be dead (zombie) and
the parent will never receive a SIGCHLD and will block on wait
unless the file descriptor of the pty is emptied first. I know that
the child is in the "Waiting to Exit" state (zombie) because I can
see this in the ps listing and because it has been sent a SIGKILL,
which it presumably cannot ignore. The critical thing seems
to be for the pty to have unread data. If the child never prints output
to the pty or if the parent consumes it all then the SIGCHLD will be raised.
As far as I can tell, the pty device and fork should have nothing to
do with each other. I realize that pty devices and especially
forkpty are non-standard (at least not POSIX), but forkpty is built
on top of fork. This synchronous behavior strikes me as a surprising
side effect. Shouldn't the SIGCHLD signal be asynchronous and
unrelated to the state of the pty device? If I had created my own
forkpty using openpty and fork, then I cannot imagine why the
pty would prevent my child's SIGCHLD from being sent.
I may be wrong in my assumption of expected behavior. Maybe there is
a layer in between that is proxying signals for some reason.
I'm sorry that I'm not enough of a hacker to track this down for you.
I confirmed this behavior on NetBSD 1.5.3.
>How-To-Repeat:
I have attached a test program, test.c, that should demonstrate
the problem. This program will also compile on Linux and OS X, so you
have other platforms to compare it against. Email me at noah@noah.org
if you would prefer I send it in a separate email. I can also send sample
output from a NetBSD 1.5.3 machine, a Linux machine, and an OS X machine.
I have tested this program on OS X and Linux. Neither of those systems
shows this problem. The SIGCHLD signal always arrives shortly
after the child gets a SIGKILL, and it is never synchronous with some
state in the pty.
This test.c will allow you to test three different scenarios.
If you run it with 'test 0' then the Child will print some output
before it is killed. The Parent will NOT read output after child is
killed. You will see that the parent never receives a SIGCHLD even
though the child is clearly good and dead.
If you run it with 'test 1' then the Child will NOT print any output
nor will the parent attempt to read any. In this case the Parent
will receive the SIGCHLD signal and you can see that it occurs at
the time the signal is sent. In other words, the signal does not
appear to be delayed and appears asynchronously as expected.
If you run it with 'test 2' then the Child will print some output
before it is killed. The Parent will read output AFTER child is
killed. In this case the signal does not arrive until AFTER the
parent reads the output. The parent is reading data from a dead
child (which is not necessarily bad), but it never gets the SIGCHLD
signal until after the data from the dead child is consumed.
This shows surprising synchronous behavior.
I hope that this is clear enough. I tried to be thorough and avoid
any obvious newbie mistakes before I submitted this as a bug. I also
took some small effort to compare the NetBSD behavior with other
UNIX platforms.
/*
I built this with "gcc -lutil test.c -otest"
So far I have tested this on OpenBSD 3.0, OpenBSD 2.9,
Linux 2.4.9, and OS X (close to NetBSD I believe).
Since this is only a test, I ignore most exceptional errors such as failed fork or waitpid.
*/
#include <sys/types.h> /* include this before any other sys headers */
#include <sys/wait.h> /* header for waitpid() and various macros */
#include <signal.h> /* header for signal functions */
#include <stdio.h> /* header for fprintf() */
#include <unistd.h> /* header for fork() */
#ifdef LINUX
#include <pty.h>
#else
#include <util.h> /* header for forkpty, compile with -lutil */
#endif
void sig_chld(int); /* prototype for our SIGCHLD handler */
int main(int argc, char * argv[])
{
struct sigaction act;
int pid;
int fd;
char slave_name [20];
int CHILD_OUTPUT_FLAG;
int PARENT_READ_FLAG;
char buffer [1000];
int count;
/*
Command line arguments:
0 - or nothing for default. Child will print some output before it is killed.
Parent will end without ever trying to read this output.
1 - To run test where child will not print any output.
2 - To run test where child will print output and
parent will try to read output after child is killed.
*/
if (argc > 1 && *(argv[1]) == '1')
{
printf ("PARENT: Child will not print any output.\n");
printf ("PARENT: Parent will NOT read output after child is killed.\n");
CHILD_OUTPUT_FLAG = 0;
PARENT_READ_FLAG = 0;
}
else if (argc > 1 && *(argv[1]) == '2')
{
printf ("PARENT: Child will print some output before it is killed.\n");
printf ("PARENT: Parent will read output after child is killed.\n");
CHILD_OUTPUT_FLAG = 1;
PARENT_READ_FLAG = 1;
}
else
{
printf ("PARENT: Child will print some output before it is killed.\n");
printf ("PARENT: Parent will NOT read output after child is killed.\n");
CHILD_OUTPUT_FLAG = 1;
PARENT_READ_FLAG = 0;
}
/* Assign sig_chld as our SIGCHLD handler.
We don't want to block any other signals in this example
We're only interested in children that have terminated, not ones
which have been stopped (eg user pressing control-Z at terminal).
Finally, make these values effective. If we were writing a real
application, we would save the old value instead of passing NULL.
*/
act.sa_handler = sig_chld;
sigemptyset(&act.sa_mask);
act.sa_flags = SA_NOCLDSTOP;
sigaction(SIGCHLD, &act, NULL);
/* Do the Fork thing.
*/
pid = forkpty (&fd, slave_name, NULL, NULL);
/* pid = fork(); */
switch (pid)
{
case 0: /* Child process. */
if (CHILD_OUTPUT_FLAG)
printf ("CHILD: This output may cause trouble.\n");
sleep(1000);
break;
default: /* Parent process. */
printf ("PARENT: After fork, sleeping...\n");
sleep(5); /* Crappy way to avoid a race with child. */
printf ("PARENT: Child pid: %d\n", pid);
printf ("PARENT: sending SIGKILL to child...\n");
kill (pid, SIGKILL);
printf ("PARENT: After kill, sleeping...\n");
sleep(5);
break;
}
if (PARENT_READ_FLAG)
{
printf ("PARENT: Consuming any output from child pty fd.\n");
count = read (fd, buffer, 999);
printf ("PARENT: Read %d characters.\n", count);
}
else
{
printf ("PARENT: Not attempting to read from child.\n");
}
printf ("PARENT: leaving.\n");
return 0;
}
void sig_chld(int signo)
{
int status, wpid, child_val;
printf ("SIGCHLD: In sig_chld signal handler.\n");
/* Wait for any child without blocking */
wpid = waitpid (-1, & status, WNOHANG);
printf ("SIGCHLD: Waitpid found status for pid: %d\n", wpid);
printf("SIGCHLD: Waitpid status: %d\n", status);
if (WIFEXITED(status)) /* did child exit normally? */
{
child_val = WEXITSTATUS(status);
printf("SIGCHLD: child exited normally with status %d\n", child_val);
}
printf ("SIGCHLD: End of sig_chld.\n");
}
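The handler above only tests WIFEXITED, but a child terminated by SIGKILL
did not exit normally, so WIFSIGNALED and WTERMSIG are the macros that
describe it. What follows is a minimal sketch of decoding both cases. The
names describe_status and kill_and_reap are hypothetical and are not part
of test.c. The demo deliberately uses plain fork() rather than forkpty(),
so it does not reproduce the pty behavior reported here; it only shows
what the wait status of a killed child looks like.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: turn a waitpid() status into a short description.
 * Writes into out and returns out for convenience. */
static const char *
describe_status(int status, char *out, size_t outlen)
{
	assert(out != NULL && outlen > 0);
	if (WIFEXITED(status))
		snprintf(out, outlen, "exited with status %d",
		    WEXITSTATUS(status));
	else if (WIFSIGNALED(status))
		snprintf(out, outlen, "killed by signal %d",
		    WTERMSIG(status));
	else
		snprintf(out, outlen, "unrecognized status 0x%x",
		    (unsigned)status);
	return out;
}

/* Hypothetical demo: fork a child with plain fork() (no pty), send it
 * SIGKILL, and reap it with a blocking waitpid().
 * Returns 0 and fills *status on success, -1 on failure. */
static int
kill_and_reap(int *status)
{
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		pause();	/* child: sleep until killed */
		_exit(101);
	}
	if (kill(pid, SIGKILL) < 0)
		return -1;
	while (waitpid(pid, status, 0) < 0) {
		if (errno != EINTR)	/* retry if a signal interrupted us */
			return -1;
	}
	return 0;
}
```

Calling kill_and_reap(&status) and then describe_status(status, buf,
sizeof(buf)) should report the killed-by-signal case rather than a normal
exit.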
>Fix:
Unknown
>Release-Note:
>Audit-Trail:
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/17171: Dead Child does not raise SIGCHLD until after
parent reads all output on a pty.
Date: Sat, 29 Nov 2008 21:26:45 +0000
This behavior still exists in -current. Enclosed is an additional
similar test program that clarifies what exactly is going on. It opens
a pty with forkpty and optionally writes to the slave end [-w] and/or
reads from the master end [-r].
If neither option is given, SIGCHLD is received right away when the
child is killed.
With -w so the child writes to the tty, the kill kicks the child out
of nanosleep and into some D state with no wchan listed in ps. I have
not gone to the trouble of instrumenting the kernel to find out
exactly where this happens, but it seems to be while closing file
handles. It appears that the slave end of a pty waits on close for
buffered writes to be read out of the master end.
The child exits as soon as the buffered writes in the pty are cleared,
either by reading from the master end with -r, or when the master end
is closed. In either case, SIGCHLD is delivered to the parent
immediately.
The original test program only closes the master end of the pty upon
its own exit; this leads to considerable confusion about what actually
happens.
Conclusion: there is no problem with signal posting or delivery. The
problem is that the child process does not actually exit when the
submitter expects, but instead blocks closing the pty.
I think this behavior may be somewhat undesirable but I don't think it
is really a bug. Processes that open ptys are supposed to be able to
read client I/O out of them, and if they do so in a timely manner
there's no visible effect.
The only definite bug here is that the D-state wait the child process
falls into should have a wchan.
Arguably it shouldn't be a D-state wait either, on the grounds that
those are supposed to be short-term only; although interrupting close
isn't exactly a good thing either so it isn't so clear.
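
As a rough illustration of the point about reading client I/O in a timely
manner, here is a minimal sketch of how a parent might discard whatever is
still buffered on the pty master before expecting the child to finish
exiting. The helper name drain_fd and its timeout parameter are
hypothetical and do not come from either test program. Treating EIO as
end-of-data is an assumption: some systems (Linux, for example) report a
closed slave end that way instead of returning 0.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: read and discard data buffered on fd.
 * Stops at EOF, at EIO, or when no data arrives within timeout_ms.
 * Returns the number of bytes discarded, or -1 on an unexpected error. */
static long
drain_fd(int fd, int timeout_ms)
{
	char buf[256];
	long total = 0;

	assert(fd >= 0);
	for (;;) {
		struct pollfd pfd = { .fd = fd, .events = POLLIN };
		int r = poll(&pfd, 1, timeout_ms);
		ssize_t n;

		if (r < 0) {
			if (errno == EINTR)
				continue;	/* e.g. interrupted by SIGCHLD */
			return -1;
		}
		if (r == 0)
			return total;	/* nothing more buffered */

		n = read(fd, buf, sizeof(buf));
		if (n > 0) {
			total += n;
			continue;
		}
		if (n == 0)
			return total;	/* EOF */
		if (errno == EINTR)
			continue;
		if (errno == EIO)
			return total;	/* assumed: slave end closed */
		return -1;
	}
}
```

In the test programs this would be called on the master descriptor
returned by forkpty() after the kill, for example drain_fd(masterfd, 100),
before the parent sleeps and inspects the child.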
------
/*
* Alternate test program for PR 17171.
*
* The -w option causes the child process to write to the opened tty.
* The -r option causes the parent process to read from the pty master end.
*
* Note that each invocation of ps causes an extra SIGCHLD to be
* received; don't be fooled by them.
*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <stdarg.h>
#include <limits.h>
#include <signal.h>
#include <err.h>
#include <util.h>
static
void
say(const char *fmt, ...)
{
char buf[4096];
va_list ap;
va_start(ap, fmt);
vsnprintf(buf, sizeof(buf), fmt, ap);
va_end(ap);
/* make it as atomic as possible */
write(STDOUT_FILENO, buf, strlen(buf));
}
static
void
onsigchld(int sig)
{
(void)sig;
write(STDOUT_FILENO, "*** SIGCHLD ***\n", 16);
}
static
void
dops(pid_t pid, const char *what)
{
char buf[128];
say("[PARENT] ps -l of %s\n", what);
snprintf(buf, sizeof(buf), "ps -l%d", pid);
system(buf);
}
static
void
usage(const char *av0)
{
errx(1, "usage: %s [-r] [-w]", av0);
}
int
main(int argc, char *argv[])
{
int ch;
int parentreads = 0;
int childwrites = 0;
struct sigaction sa;
pid_t pid;
int masterfd;
char slavename[PATH_MAX];
ssize_t result;
char buf[256];
while ((ch = getopt(argc, argv, "rw")) != -1) {
switch (ch) {
case 'r': parentreads = 1; break;
case 'w': childwrites = 1; break;
default: usage(argv[0]); break;
}
}
if (optind != argc) {
usage(argv[0]);
}
if (sigaction(SIGCHLD, NULL, &sa)) {
err(1, "sigaction: get");
}
sa.sa_handler = onsigchld;
sa.sa_flags |= SA_NOCLDSTOP;
sigemptyset(&sa.sa_mask);
if (sigaction(SIGCHLD, &sa, NULL)) {
err(1, "sigaction: set");
}
pid = forkpty(&masterfd, slavename, NULL, NULL);
if (pid < 0) {
err(1, "forkpty");
}
if (pid == 0) {
/* child */
if (childwrites) {
say("[CHILD] la de da\n");
}
/* sleep until killed */
sleep(1000);
_exit(101);
}
/* parent */
/* 1. report who we are; wait to make sure child prints */
say("[PARENT] my pid %d, child pid %d, tty %s\n",
getpid(), pid, slavename);
sleep(1);
/* 2. inspect child; wait strictly for paranoia */
dops(pid, "running child");
sleep(1);
/* 3. post SIGKILL; sleep to make sure it's processed */
say("[PARENT] sending kill\n");
kill(pid, SIGKILL);
sleep(1);
/* 4. inspect child again; wait strictly for paranoia */
dops(pid, "killed child");
sleep(1);
if (parentreads) {
/* 5. read from the pty master */
say("[PARENT] reading from pty master\n");
result = read(masterfd, buf, sizeof(buf));
if (result < 0) {
warn("read: masterfd");
say("[PARENT] read failed\n");
}
else {
say("[PARENT] read %zd bytes from pty\n", result);
}
sleep(1);
/* 6. inspect child again; wait strictly for paranoia */
dops(pid, "killed child after ptm read");
sleep(1);
}
/* 7. now close the pty master */
say("[PARENT] closing pty master\n");
close(masterfd);
sleep(1);
/* 8. inspect child again */
dops(pid, "killed child after ptm close");
sleep(1);
say("[PARENT] exiting.\n");
return 0;
}
--
David A. Holland
dholland@netbsd.org
State-Changed-From-To: open->analyzed
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Sat, 29 Nov 2008 21:30:50 +0000
State-Changed-Why:
Figured out what's going on.
State-Changed-From-To: analyzed->pending-pullups
State-Changed-By: gson@NetBSD.org
State-Changed-When: Wed, 14 Oct 2015 19:39:41 +0000
State-Changed-Why:
Child will exit after a five-second timeout as of tty.c 1.267.
State-Changed-From-To: pending-pullups->feedback
State-Changed-By: jdolecek@NetBSD.org
State-Changed-When: Thu, 17 Aug 2017 19:05:24 +0000
State-Changed-Why:
This issue should be fixed in rev 1.267 of kern/tty.c, which is only on
netbsd-8. Do you think this still needs to be pulled up to older releases?
State-Changed-From-To: feedback->closed
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Mon, 09 Apr 2018 18:34:09 +0000
State-Changed-Why:
There is no need to fix this in -7.
It's no wonder the "pending" pullups weren't ever processed; they weren't
filed.
*ahem*
>Unformatted: