NetBSD Problem Report #41566

From buhrow@lothlorien.nfbcal.org  Wed Jun 10 07:49:02 2009
Return-Path: <buhrow@lothlorien.nfbcal.org>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
	by www.NetBSD.org (Postfix) with ESMTP id 375EB63B9E6
	for <gnats-bugs@gnats.NetBSD.org>; Wed, 10 Jun 2009 07:49:02 +0000 (UTC)
Message-Id: <200906100749.n5A7n0VY007830@lothlorien.nfbcal.org>
Date: Wed, 10 Jun 2009 00:49:00 -0700 (PDT)
From: buhrow@lothlorien.nfbcal.org
Reply-To: buhrow@lothlorien.nfbcal.org
To: gnats-bugs@gnats.NetBSD.org
Subject: pty(4) handling under NetBSD-5 is broken
X-Send-Pr-Version: 3.95

>Number:         41566
>Category:       kern
>Synopsis:       It is possible for pty(4) master and slave processes to deadlock, causing the processes to get stuck forever.
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    plunky
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Jun 10 07:50:00 +0000 2009
>Closed-Date:    Thu Jun 18 07:39:10 +0000 2009
>Last-Modified:  Thu Jun 18 07:39:10 +0000 2009
>Originator:     Brian Buhrow
>Release:        NetBSD 5.0_STABLE
>Organization:
	NFB of California
>Environment:


System: NetBSD arathorn.via.net 5.0 NetBSD 5.0 (GENERIC) #0: Sun Apr 26 18:50:08 UTC 2009 builds@b6.netbsd.org:/home/builds/ab/netbsd-5-0-RELEASE/i386/200904260229Z-obj/home/builds/ab/netbsd-5-0-RELEASE/src/sys/arch/i386/compile/GENERIC i386

Architecture: i386
Machine: i386
>Description:

	The problem seems to be that if the master process is writing a lot
of data to a pty, the slave process on the corresponding tty can fall
behind, causing something in the kernel to stop processing either master or
slave.
The following script shows the problem and how to repeat it using the test
program provided below.  Note that if this program is run under NetBSD-4.x
or earlier, it runs forever, printing input and output lines until it is
manually terminated, which is what it should do.
	Under NetBSD-5, however, it gets stuck in ttyraw according to ps -l
output, as shown.
	I do not know exactly how to see further into the kernel to see what
is going wrong, but I'm hoping that this PR, along with the test
program, which fails reliably under NetBSD-5 and works reliably under all
other versions of NetBSD, will inspire some assistance in this effort.
	This appears to me to be a serious bug which should be addressed and
then pulled up to the NetBSD-5 branch as soon as possible.

-thanks
-Brian

Script started on Wed Jun 10 00:25:44 2009
%./ptytest&
[1] 1756
%./ptytest: Master process(1756) is writing to slave process (3339)
./ptytest: Using pty /dev/ttyp2
3339: Read 26 bytes from pty
3339: Read 2 bytes from pty
1756: Wrote 28 bytes to master pty
3339: Read 26 bytes from pty
3339: Read 2 bytes from pty
1756: Wrote 52 bytes to master pty
3339: Read 50 bytes from pty
3339: Read 2 bytes from pty
1756: Wrote 76 bytes to master pty
3339: Read 74 bytes from pty
3339: Read 2 bytes from pty
1756: Wrote 100 bytes to master pty
3339: Read 98 bytes from pty
3339: Read 2 bytes from pty
1756: Wrote 124 bytes to master pty
3339: Read 122 bytes from pty
3339: Read 2 bytes from pty
1756: Wrote 148 bytes to master pty
3339: Read 146 bytes from pty
3339: Read 2 bytes from pty
1756: Wrote 172 bytes to master pty
3339: Read 170 bytes from pty
3339: Read 2 bytes from pty
1756: Wrote 196 bytes to master pty
3339: Read 194 bytes from pty
3339: Read 2 bytes from pty
1756: Wrote 220 bytes to master pty
3339: Read 218 bytes from pty
3339: Read 2 bytes from pty
1756: Wrote 244 bytes to master pty
3339: Read 242 bytes from pty
3339: Read 2 bytes from pty
1756: Wrote 268 bytes to master pty
3339: Read 266 bytes from pty
1756: Wrote 28 bytes to master pty
1756: Wrote 52 bytes to master pty
1756: Wrote 76 bytes to master pty
1756: Wrote 100 bytes to master pty
1756: Wrote 124 bytes to master pty
1756: Wrote 148 bytes to master pty
1756: Wrote 172 bytes to master pty
1756: Wrote 196 bytes to master pty
3339: Read 2 bytes from pty
3339: Read 26 bytes from pty
3339: Read 2 bytes from pty
3339: Read 50 bytes from pty
3339: Read 2 bytes from pty
3339: Read 74 bytes from pty
3339: Read 2 bytes from pty
3339: Read 98 bytes from pty
3339: Read 2 bytes from pty
3339: Read 122 bytes from pty
3339: Read 2 bytes from pty
3339: Read 146 bytes from pty
3339: Read 2 bytes from pty
3339: Read 170 bytes from pty
3339: Read 2 bytes from pty
3339: Read 194 bytes from pty
3339: Read 2 bytes from pty

[processes hang at this point]

ps -l1756
UID  PID PPID CPU PRI NI  VSZ RSS WCHAN  STAT TTY      TIME COMMAND
100 1756 3460   0  85  0 2896 776 ttyraw S    ttyp1 0:00.01 ./ptytest 
%ps -l3339
UID  PID PPID CPU PRI NI  VSZ RSS WCHAN  STAT TTY      TIME COMMAND
100 3339 1756   0  85  0 2896 644 ttyraw S    ttyp1 0:00.00 ./ptytest 
%
%pstat -t |grep 'ttyp2'
ttyp2   124  0 1024 1248 256     82 OC            0     0 termios
%fg
./ptytest
%exit
%exit

Script done on Wed Jun 10 00:27:23 2009

>How-To-Repeat:


	I don't know how to fix the problem, but the following test program,
whose output is shown above, reliably reproduces the problem for me on
every NetBSD-5 system I've tried.

To compile:
cc -O -o ptytest ptytest.c -lutil

/**************************************************************************
NAME: Brian Buhrow
DATE: June 9, 2009
PURPOSE: The purpose of this test program is to see if we can figure out
why ptys don't seem to work right under NetBSD-5.x.  There seems to be some
sort of deadlock issue between the pty master and the slave under certain
conditions, where the pty gets data between the master and the slave, and
each is waiting for something to happen.
The master is waiting for a write(2) to complete, and the slave is waiting
for read(2) to complete.
This works fine under NetBSD-4.x and earlier, but NetBSD-5 seems to have a
problem.
This is a test program which should easily reproduce the problem.
**************************************************************************/

#ifndef LINT
static char rcsid[] = "$Id$";
#endif /*LINT*/

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <util.h>
#include <sys/types.h>

/*Slave reading process*/

int slave(int slavefd)
{
	char buf[512], *ptr;
	int bytesread;
	pid_t slpid;

	slpid = getpid();
	while(1) {
		bytesread = read(slavefd, buf, sizeof(buf));
		printf("%d: Read %d bytes from pty\n",slpid,bytesread);
		if (bytesread < 0) {
			perror("Error reading from pty");
			exit(1);
		}
	}

	exit(0); /*not reached*/
}

/*Master writing process*/
int master(int masterfd)
{
	char buf[512], *ptr;
	int outbytes,i;
	pid_t curpid;

	curpid = getpid();

	while(1) {
		sprintf(buf, "q {Subject: June Monitor}\rd\r");
		for (i = 0;i < 11;i ++) {
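			outbytes = write(masterfd, buf, strlen(buf));
			/* Append the first 24 bytes of buf to itself.  strncat(3)
			   must not be used on overlapping strings, so copy by hand. */
			ptr = buf + strlen(buf);
			memcpy(ptr, buf, 24);
			ptr[24] = '\0';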
			printf("%d: Wrote %d bytes to master pty\n",curpid,outbytes);
			if (outbytes < 0) {
				perror("write");
			}
		}
		bzero(buf, sizeof(buf));
		sleep(5);
	}

	exit(0); /*not reached*/
}


int main(int argc, char **argv)
{
	pid_t child;
	int masterfd, slavefd, status;
	char ptyname[256];

	status = openpty(&masterfd, &slavefd, ptyname, NULL, NULL);
	if (status < 0) {
		perror("Openpty");
		exit(1);
	}
	child = fork();
	if (child < 0) {
		perror("fork");
		exit(1);
	}
	if (child) {
		printf("%s: Master process(%d) is writing to slave process (%d)\n",
		argv[0],getpid(),child);
		printf("%s: Using pty %s\n",argv[0],ptyname);
		master(masterfd);
	} else {
		slave(slavefd);
	}

	/*not reached*/
	exit(0);
}
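
The test program above blocks in write(2) once the hang occurs, so the only
evidence is the ps output.  As a sketch only, the helper below waits a bounded
time for the master descriptor to become writable, so that a stall could be
reported instead of blocking forever.  wait_writable() is a hypothetical name
and is not part of the original test program.  The report does not establish
whether poll(2) reflects the stalled state on an affected kernel, so treat
that as an assumption.

```c
#include <errno.h>
#include <poll.h>

/*
 * Hypothetical helper: wait up to timeout_ms for fd to accept a write.
 * Returns 1 if fd is writable, 0 if the timeout expired, and -1 on error
 * (poll(2) failed, or fd reported an error/hangup without POLLOUT).
 */
static int wait_writable(int fd, int timeout_ms)
{
	struct pollfd pfd;
	int rc;

	pfd.fd = fd;
	pfd.events = POLLOUT;
	pfd.revents = 0;

	/* Retry if a signal interrupts the wait; this restarts the timeout. */
	do {
		rc = poll(&pfd, 1, timeout_ms);
	} while (rc < 0 && errno == EINTR);

	if (rc <= 0)
		return rc;	/* 0: timed out, -1: poll(2) failed */
	return (pfd.revents & POLLOUT) ? 1 : -1;
}
```

In master(), a call such as wait_writable(masterfd, 5000) before each write(2)
would let the program print a diagnostic when the result is 0 rather than
sleeping indefinitely.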

>Fix:

	Don't know at this time.  Suggestions welcome.

>Release-Note:

>Audit-Trail:

Responsible-Changed-From-To: kern-bug-people->plunky
Responsible-Changed-By: plunky@NetBSD.org
Responsible-Changed-When: Fri, 12 Jun 2009 09:29:38 +0000
Responsible-Changed-Why:
I have applied a fix to -current, will close this when it is pulled up to netbsd-5 branch


State-Changed-From-To: open->analyzed
State-Changed-By: plunky@NetBSD.org
State-Changed-When: Fri, 12 Jun 2009 09:29:38 +0000
State-Changed-Why:
I worked out what the problem is (wrong condvar)


From: Iain Hibbert <plunky@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/41566 CVS commit: src/sys/kern
Date: Fri, 12 Jun 2009 09:26:50 +0000

 Module Name:	src
 Committed By:	plunky
 Date:		Fri Jun 12 09:26:50 UTC 2009

 Modified Files:
 	src/sys/kern: tty_pty.c

 Log Message:
 Writes on the controlling tty were not being awoken from blocks,
 use the correct condvar to make this happen.

 this fixes PR/41566


 To generate a diff of this commit:
 cvs rdiff -u -r1.116 -r1.117 src/sys/kern/tty_pty.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: Manuel Bouyer <bouyer@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/41566 CVS commit: [netbsd-5] src/sys/kern
Date: Wed, 17 Jun 2009 20:17:37 +0000

 Module Name:	src
 Committed By:	bouyer
 Date:		Wed Jun 17 20:17:37 UTC 2009

 Modified Files:
 	src/sys/kern [netbsd-5]: tty_pty.c

 Log Message:
 Pull up following revision(s) (requested by plunky in ticket #807):
 	sys/kern/tty_pty.c: revision 1.117
 Writes on the controlling tty were not being awoken from blocks,
 use the correct condvar to make this happen.
 this fixes PR/41566


 To generate a diff of this commit:
 cvs rdiff -u -r1.112 -r1.112.4.1 src/sys/kern/tty_pty.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: Manuel Bouyer <bouyer@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/41566 CVS commit: [netbsd-5-0] src/sys/kern
Date: Wed, 17 Jun 2009 21:34:04 +0000

 Module Name:	src
 Committed By:	bouyer
 Date:		Wed Jun 17 21:34:04 UTC 2009

 Modified Files:
 	src/sys/kern [netbsd-5-0]: tty_pty.c

 Log Message:
 Pull up following revision(s) (requested by plunky in ticket #807):
 	sys/kern/tty_pty.c: revision 1.117
 Writes on the controlling tty were not being awoken from blocks,
 use the correct condvar to make this happen.
 this fixes PR/41566


 To generate a diff of this commit:
 cvs rdiff -u -r1.112 -r1.112.6.1 src/sys/kern/tty_pty.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: analyzed->closed
State-Changed-By: plunky@NetBSD.org
State-Changed-When: Thu, 18 Jun 2009 07:39:10 +0000
State-Changed-Why:
This has been pulled up to the netbsd-5 and netbsd-5.0 branches now


>Unformatted:
