NetBSD Problem Report #17171

Received: (qmail 23826 invoked by uid 605); 5 Jun 2002 00:14:02 -0000
Message-Id: <20020605001359.F2A8711137@www.netbsd.org>
Date: Tue,  4 Jun 2002 17:13:59 -0700 (PDT)
From: noah@noah.org
Sender: gnats-bugs-owner@netbsd.org
Reply-To: noah@noah.org
To: gnats-bugs@gnats.netbsd.org
Subject: Dead Child does not raise SIGCHLD until after parent reads all output on a pty.
X-Send-Pr-Version: www-1.0

>Number:         17171
>Notify-List:    gson@gson.org
>Category:       kern
>Synopsis:       Dead Child does not raise SIGCHLD until after parent reads all output on a pty.
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Jun 05 00:15:00 +0000 2002
>Closed-Date:    Mon Apr 09 18:34:09 +0000 2018
>Last-Modified:  Mon Apr 09 18:34:09 +0000 2018
>Originator:     Noah Spurrier
>Release:        1.5.3_ALPHA
>Organization:
None
>Environment:
NetBSD thor 1.5.3_ALPHA NetBSD 1.5.3_ALPHA (ART)
>Description:
When a child process dies it normally sends a SIGCHLD to the parent.
The child remains in the process list as a zombie until the
parent calls waitpid() or wait(). However, this does NOT seem to be the
case for a child created with forkpty(). A child may be dead (zombie) and
the parent will never receive a SIGCHLD and will block in wait()
unless the file descriptor of the pty is emptied first. I know that
the child is in the "Waiting to Exit" state (zombie) because I can
see this in the ps listing and because it has been sent a SIGKILL,
which it presumably cannot ignore. The critical thing seems
to be that the pty has unread data. If the child never prints output
to the pty, or if the parent consumes it all, then the SIGCHLD will be raised.

As far as I can tell, the pty device and fork should have nothing to
do with each other. I realize that pty devices and especially
forkpty are non-standard (at least not POSIX), but forkpty is built
on top of fork. This synchronous behavior strikes me as a surprising
side effect. Shouldn't the SIGCHLD signal be asynchronous and
unrelated to the state of the pty device? If I had created my own
forkpty using openpty and fork, then I cannot imagine why the
pty would prevent my child's SIGCHLD from being sent.
I may be wrong in my assumption of expected behavior. Maybe there is
a layer in between that is proxying signals for some reason.
I'm sorry that I'm not a real hacker to go track this down for you.

I confirmed this behavior on NetBSD 1.5.3


>How-To-Repeat:
I have attached a test program, test.c, that should demonstrate
the problem. This program will also compile on Linux and OS X, so you
have other platforms to compare it against. Email me at noah@noah.org
if you would prefer I send it in a separate email. I can also send
sample output from a NetBSD 1.5.3 machine, a Linux machine, and an
OS X machine.

I have tested this program on OS X and Linux. Neither of those systems
shows this problem. The SIGCHLD signal always arrives shortly
after the child gets a SIGKILL, and it is never synchronous with any
state in the pty.

This test.c will allow you to test three different scenarios. 

If you run it with 'test 0' then the Child will print some output
before it is killed. The Parent will NOT read output after child is
killed. You will see that the parent never receives a SIGCHLD even
though the child is clearly good and dead.

If you run it with 'test 1' then the Child will NOT print any output
nor will the parent attempt to read any. In this case the Parent
will receive the SIGCHLD signal and you can see that it occurs at
the time the signal is sent. In other words, the signal does not
appear to be delayed and appears asynchronously as expected.

If you run it with 'test 2' then the Child will print some output
before it is killed. The Parent will read output AFTER child is
killed. In this case the signal does not arrive until AFTER the
parent reads the output. The parent is reading data from a dead
child (which is not necessarily bad), but it never gets the SIGCHLD
signal until after the data from the dead child is consumed. 
This shows surprising synchronous behavior.

I hope that this is clear enough. I tried to be thorough and avoid
any obvious newbie mistakes before I submitted this as a bug. I also
took some small effort to compare the NetBSD behavior with other
UNIX platforms.

/* 
  I built this with "gcc -lutil test.c -otest"
  So far I have tested this on OpenBSD 3.0 and OpenBSD 2.9
  Linux 2.4.9 and OS X (close to NetBSD I believe).
  As a test, I ignore most exceptional errors such as failed fork or waitpid.
*/

#include <sys/types.h>  /* include this before any other sys headers */
#include <sys/wait.h>   /* header for waitpid() and various macros */
#include <signal.h>     /* header for signal functions */
#include <stdio.h>      /* header for fprintf() */
#include <unistd.h>     /* header for fork() */
#if defined(LINUX) || defined(__linux__)
#include <pty.h>
#else
#include <util.h>        /* header for forkpty, compile with -lutil */
#endif

void sig_chld(int);  /* prototype for our SIGCHLD handler */

int main(int argc, char * argv[]) 
{
    struct sigaction act;
    int pid;
    int fd;
    char slave_name [20];
    int CHILD_OUTPUT_FLAG;
    int PARENT_READ_FLAG;
    char buffer [1000];
    int count;

    /*
        Command line arguments:
                0 - or nothing for default. Child will print some output before it is killed.
                        Parent will end without ever trying to read this output.
                1 - To run test where child will not print any output.
                2 - To run test where child will print output and 
                        parent will try to read output after child is killed.
    */
    if (argc > 1 && *(argv[1]) == '1')
    {
        printf ("PARENT: Child will not print any output.\n");
        printf ("PARENT: Parent will NOT read output after child is killed.\n");
        CHILD_OUTPUT_FLAG = 0;
        PARENT_READ_FLAG = 0;
    }
    else if (argc > 1 && *(argv[1]) == '2')
    {
        printf ("PARENT: Child will print some output before it is killed.\n");
        printf ("PARENT: Parent will read output after child is killed.\n");
        CHILD_OUTPUT_FLAG = 1;
        PARENT_READ_FLAG = 1;
    }
    else
    {
        printf ("PARENT: Child will print some output before it is killed.\n");
        printf ("PARENT: Parent will NOT read output after child is killed.\n");
        CHILD_OUTPUT_FLAG = 1;
        PARENT_READ_FLAG = 0;
    }

    /* Assign sig_chld as our SIGCHLD handler.
       We don't want to block any other signals in this example 
       We're only interested in children that have terminated, not ones
       which have been stopped (eg user pressing control-Z at terminal).
       Finally, make these values effective. If we were writing a real 
       application, we would save the old value instead of passing NULL.
     */
    act.sa_handler = sig_chld;
    sigemptyset(&act.sa_mask);
    act.sa_flags = SA_NOCLDSTOP;
    sigaction(SIGCHLD, &act, NULL);


    /* Do the Fork thing. 
    */
    pid = forkpty (&fd, slave_name, NULL, NULL);
    /* pid = fork(); */

    switch (pid)
    {
            case 0: /* Child process. */
                if (CHILD_OUTPUT_FLAG)
                    printf ("CHILD: This output may cause trouble.\n");
                sleep(1000);
            break;

            default: /* Parent process. */
                printf ("PARENT: After fork, sleeping...\n");
                sleep(5); /* Crappy way to avoid a race with child. */
                printf ("PARENT: Child pid: %d\n", pid);
                printf ("PARENT: sending SIGKILL to child...\n");
                kill (pid, SIGKILL);
                printf ("PARENT: After kill, sleeping...\n");
                sleep(5);
            break;
    }

    if (PARENT_READ_FLAG)
    {
        printf ("PARENT: Consuming any output from child pty fd.\n");
        count = read (fd, buffer, 999);
        printf ("PARENT: Read %d characters.\n", count);
    }
    else
    {
        printf ("PARENT: Not attempting to read from child.\n");
    }

    printf ("PARENT: leaving.\n\n");
    return 0;
}

void sig_chld(int signo)
{
    int status, wpid, child_val;

    printf ("SIGCHLD: In sig_chld signal handler.\n");

    /* Wait for any child without blocking */
    wpid = waitpid (-1, & status, WNOHANG);
    printf ("SIGCHLD:\tWaitpid found status for pid: %d\n", wpid);
    printf("SIGCHLD:\tWaitpid status: %d\n", status);

    if (WIFEXITED(status)) /* did child exit normally? */
    {
        child_val = WEXITSTATUS(status);
        printf("SIGCHLD:\tchild exited normally with status %d\n", child_val);
    }
    printf ("SIGCHLD: End of sig_chld.\n");
}
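
The handler above only checks WIFEXITED(), so for a child that was
terminated by SIGKILL it prints the raw status but no interpretation.
The following is a minimal sketch of a fuller status decoder, using
plain fork() so it can be tried without a pty. The names
describe_wait_status and kill_and_describe are hypothetical and are
not part of test.c.

```c
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: turn a waitpid() result into a readable line. */
int describe_wait_status(pid_t wpid, int status, char *buf, size_t len)
{
    if (wpid <= 0)      /* 0: nothing to reap yet (WNOHANG); -1: error */
        return snprintf(buf, len,
            "no child reaped (waitpid returned %d)", (int)wpid);
    if (WIFEXITED(status))
        return snprintf(buf, len, "pid %d exited with status %d",
            (int)wpid, WEXITSTATUS(status));
    if (WIFSIGNALED(status))
        return snprintf(buf, len, "pid %d killed by signal %d",
            (int)wpid, WTERMSIG(status));
    return snprintf(buf, len, "pid %d: unrecognized status %d",
        (int)wpid, status);
}

/*
 * Hypothetical demo: fork a plain child, SIGKILL it, reap it, and
 * describe the result in buf. Returns the terminating signal number,
 * or -1 on failure.
 */
int kill_and_describe(char *buf, size_t len)
{
    int status = 0;
    pid_t pid, r;

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        for (;;)
            pause();            /* child: sleep until killed */
    }

    if (kill(pid, SIGKILL) < 0)
        return -1;
    do {
        r = waitpid(pid, &status, 0);
    } while (r < 0 && errno == EINTR);  /* retry if a signal interrupts */
    if (r != pid)
        return -1;

    describe_wait_status(pid, status, buf, len);
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

A child killed with SIGKILL is reported through WIFSIGNALED() and
WTERMSIG(), not through WIFEXITED(), which is why the extra branch
matters for this test case.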

>Fix:
Unknown
>Release-Note:
>Audit-Trail:
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/17171: Dead Child does not raise SIGCHLD until after
	parent reads all output on a pty.
Date: Sat, 29 Nov 2008 21:26:45 +0000

 This behavior still exists in -current. Enclosed is an additional
 similar test program that clarifies what exactly is going on. It opens
 a pty with forkpty and optionally writes to the slave end [-w] and/or
 reads from the master end [-r].

 If neither option is given, SIGCHLD is received right away when the
 child is killed.

 With -w so the child writes to the tty, the kill kicks the child out
 of nanosleep and into some D state with no wchan listed in ps. I have
 not gone to the trouble of instrumenting the kernel to find out
 exactly where this happens, but it seems to be while closing file
 handles. It appears that the slave end of a pty waits on close for
 buffered writes to be read out of the master end.

 The child exits as soon as the buffered writes in the pty are cleared,
 either by reading from the master end with -r, or when the master end
 is closed. In either case, SIGCHLD is delivered to the parent
 immediately.

 The original test program only closes the master end of the pty upon
 its own exit; this leads to considerable confusion about what actually
 happens.

 Conclusion: there is no problem with signal posting or delivery. The
 problem is that the child process does not actually exit when the
 submitter expects, but instead blocks closing the pty.

 I think this behavior may be somewhat undesirable but I don't think it
 is really a bug. Processes that open ptys are supposed to be able to
 read client I/O out of them, and if they do so in a timely manner
 there's no visible effect.

 The only definite bug here is that the D-state wait the child process
 falls into should have a wchan.

 Arguably it shouldn't be a D-state wait either, on the grounds that
 those are supposed to be short-term only; although interrupting close
 isn't exactly a good thing either so it isn't so clear.
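
A minimal sketch of what "reading client I/O in a timely manner" could
look like in the parent, assuming POSIX poll() and read(). The name
drain_fd is hypothetical and does not appear in either test program.
It discards data until EOF, a read error, or a quiet period of
timeout_ms milliseconds.

```c
#include <errno.h>
#include <poll.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read and discard everything available on fd
 * until EOF, a read error, or timeout_ms passing with no new data.
 * Returns the number of bytes discarded, or -1 if poll() fails.
 */
long drain_fd(int fd, int timeout_ms)
{
    char buf[512];
    long total = 0;

    for (;;) {
        struct pollfd pfd;
        int ready;
        ssize_t n;

        pfd.fd = fd;
        pfd.events = POLLIN;
        pfd.revents = 0;

        ready = poll(&pfd, 1, timeout_ms);
        if (ready < 0) {
            if (errno == EINTR)
                continue;       /* e.g. interrupted by SIGCHLD */
            return -1;
        }
        if (ready == 0)
            break;              /* nothing arrived within timeout_ms */

        n = read(fd, buf, sizeof(buf));
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            break;              /* EOF, or an error such as EIO that some
                                   systems report once the slave is gone */
        total += n;
    }
    return total;
}
```

In the test program's parent this would be called on the pty master
after kill(), for example drain_fd(masterfd, 1000), before waiting for
the child. It works on any readable descriptor, so it can also be
tried on a pipe.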

    ------

 /*
  * Alternate test program for PR 17171.
  *
  * The -w option causes the child process to write to the opened tty.
  * The -r option causes the parent process to read from the pty master end.
  *
  * Note that each invocation of ps causes an extra SIGCHLD to be
  * received; don't be fooled by them.
  */

 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
 #include <unistd.h>
 #include <stdarg.h>
 #include <limits.h>
 #include <signal.h>
 #include <err.h>
 #include <util.h>

 static
 void
 say(const char *fmt, ...)
 {
 	char buf[4096];
 	va_list ap;

 	va_start(ap, fmt);
 	vsnprintf(buf, sizeof(buf), fmt, ap);
 	va_end(ap);

 	/* make it as atomic as possible */
 	write(STDOUT_FILENO, buf, strlen(buf));
 }

 static
 void
 onsigchld(int sig)
 {
 	(void)sig;
 	write(STDOUT_FILENO, "*** SIGCHLD ***\n", 16);
 }

 static
 void
 dops(pid_t pid, const char *what)
 {
 	char buf[128];

 	say("[PARENT] ps -l of %s\n", what);
 	snprintf(buf, sizeof(buf), "ps -l%d", pid);
 	system(buf);
 }

 static
 void
 usage(const char *av0)
 {
 	errx(1, "usage: %s [-r] [-w]", av0);
 }

 int
 main(int argc, char *argv[])
 {
 	int ch;
 	int parentreads = 0;
 	int childwrites = 0;

 	struct sigaction sa;
 	pid_t pid;
 	int masterfd;
 	char slavename[PATH_MAX];
 	ssize_t result;
 	char buf[256];

 	while ((ch = getopt(argc, argv, "rw")) != -1) {
 		switch (ch) {
 		    case 'r': parentreads = 1; break;
 		    case 'w': childwrites = 1; break;
 		    default: usage(argv[0]); break;
 		}
 	}
 	if (optind != argc) {
 		usage(argv[0]);
 	}

 	if (sigaction(SIGCHLD, NULL, &sa)) {
 		err(1, "sigaction: get");
 	}
 	sa.sa_handler = onsigchld;
 	sa.sa_flags |= SA_NOCLDSTOP;
 	sigemptyset(&sa.sa_mask);
 	if (sigaction(SIGCHLD, &sa, NULL)) {
 		err(1, "sigaction: set");
 	}

 	pid = forkpty(&masterfd, slavename, NULL, NULL);
 	if (pid < 0) {
 		err(1, "forkpty");
 	}

 	if (pid == 0) {
 		/* child */
 		if (childwrites) {
 			say("[CHILD] la de da\n");
 		}
 		/* sleep until killed */
 		sleep(1000);
 		_exit(101);
 	}

 	/* parent */

 	/* 1. report who we are; wait to make sure child prints */
 	say("[PARENT] my pid %d, child pid %d, tty %s\n",
 	    getpid(), pid, slavename);
 	sleep(1);

 	/* 2. inspect child; wait strictly for paranoia */
 	dops(pid, "running child");
 	sleep(1);

 	/* 3. post SIGKILL; sleep to make sure it's processed */
 	say("[PARENT] sending kill\n");
 	kill(pid, SIGKILL);
 	sleep(1);

 	/* 4. inspect child again; wait strictly for paranoia */
 	dops(pid, "killed child");
 	sleep(1);

 	if (parentreads) {
 		/* 5. read from the pty master */
 		say("[PARENT] reading from pty master\n");
 		result = read(masterfd, buf, sizeof(buf));
 		if (result < 0) {
 			warn("read: masterfd");
 			say("[PARENT] read failed\n");
 		}
 		else {
 			say("[PARENT] read %zd bytes from pty\n", result);
 		}
 		sleep(1);

 		/* 6. inspect child again; wait strictly for paranoia */
 		dops(pid, "killed child after ptm read");
 		sleep(1);
 	}

 	/* 7. now close the pty master */
 	say("[PARENT] closing pty master\n");
 	close(masterfd);
 	sleep(1);

 	/* 8. inspect child again */
 	dops(pid, "killed child after ptm close");
 	sleep(1);

 	say("[PARENT] exiting.\n");
 	return 0;
 }


 -- 
 David A. Holland
 dholland@netbsd.org

State-Changed-From-To: open->analyzed
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Sat, 29 Nov 2008 21:30:50 +0000
State-Changed-Why:
Figured out what's going on.


State-Changed-From-To: analyzed->pending-pullups
State-Changed-By: gson@NetBSD.org
State-Changed-When: Wed, 14 Oct 2015 19:39:41 +0000
State-Changed-Why:
Child will exit after a five-second timeout as of tty.c 1.267.


State-Changed-From-To: pending-pullups->feedback
State-Changed-By: jdolecek@NetBSD.org
State-Changed-When: Thu, 17 Aug 2017 19:05:24 +0000
State-Changed-Why:
This issue should be fixed in rev 1.267 of kern/tty.c, which is only on 
netbsd-8. Do you think this still needs to be pulled up to older releases?


State-Changed-From-To: feedback->closed
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Mon, 09 Apr 2018 18:34:09 +0000
State-Changed-Why:
There is no need to fix this in -7.

It's no wonder the "pending" pullups weren't ever processed; they weren't
filed.
*ahem*


>Unformatted:
