NetBSD Problem Report #54033

From gson@gson.org  Sat Mar  2 13:50:52 2019
Return-Path: <gson@gson.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id A4B397A18B
	for <gnats-bugs@gnats.NetBSD.org>; Sat,  2 Mar 2019 13:50:52 +0000 (UTC)
Message-Id: <20190302135045.9EE9398B726@guava.gson.org>
Date: Sat,  2 Mar 2019 15:50:45 +0200 (EET)
From: gson@gson.org (Andreas Gustafsson)
Reply-To: gson@gson.org (Andreas Gustafsson)
To: gnats-bugs@NetBSD.org
Subject: Interrupted writes can cause pty to stall
X-Send-Pr-Version: 3.95

>Number:         54033
>Category:       kern
>Synopsis:       Interrupted writes can cause pty to stall
>Confidential:   no
>Severity:       serious
>Priority:       low
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Mar 02 13:55:00 +0000 2019
>Last-Modified:  Sat Mar 02 14:15:00 +0000 2019
>Originator:     Andreas Gustafsson
>Release:        NetBSD 7.2, 8.0
>Organization:

>Environment:
System: NetBSD
Architecture: x86_64
Machine: amd64
>Description:

I'm seeing strange behavior from NetBSD's pseudo-TTYs when writes
to the slave side are interrupted by signals.

My test consists of two programs, "throttle" and "throttlee".  The
"throttle" program runs another program under a PTY and throttles its
output by limiting the rate at which it reads from the PTY master,
recreating the experience of the era of 300 baud modems.

The "throttlee" program writes a repeating test pattern to standard
output, with the twist that it also configures a timer to issue
periodic SIGALRM signals, causing some of the writes to be interrupted
and retried.

Running them together as

  ./throttle ./throttlee

under NetBSD/amd64 7.2 or 8.0, about a dozen lines of the expected
test pattern are printed at the throttled speed:

  0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz
  0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz
  0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz

but then the output unexpectedly stops.  Occasionally, the output may
spontaneously restart and then stop again.  Running "ktruss -R -p" on
the throttlee PID shows an endless stream of alternating SIGALRMs and
interrupted writes once every 20 ms:

 15913      1 throttlee 0.019996518 write(0, 0x7f7fff92a53c, 0x1) Err#4 EINTR
 15913      1 throttlee 0.000001351 SIGALRM caught handler=0x4009a0 mask=0x0 code=0x0
 15913      1 throttlee 0.000003630 setcontext(0x7f7fff92a190) JUSTRETURN
 15913      1 throttlee 0.020003915 write(0, 0x7f7fff92a53c, 0x1) Err#4 EINTR
 15913      1 throttlee 0.000001191 SIGALRM caught handler=0x4009a0 mask=0x0 code=0x0
 15913      1 throttlee 0.000004607 setcontext(0x7f7fff92a190) JUSTRETURN
 15913      1 throttlee 0.020001545 write(0, 0x7f7fff92a53c, 0x1) Err#4 EINTR
 15913      1 throttlee 0.000001172 SIGALRM caught handler=0x4009a0 mask=0x0 code=0x0
 15913      1 throttlee 0.000003959 setcontext(0x7f7fff92a190) JUSTRETURN
 15913      1 throttlee 0.020001820 write(0, 0x7f7fff92a53c, 0x1) Err#4 EINTR
 15913      1 throttlee 0.000000856 SIGALRM caught handler=0x4009a0 mask=0x0 code=0x0

Running "ktruss -R -p" on the throttle PID, the output is even
stranger, showing a carriage return being read from the PTY master
device every 20 ms, even though they should only occur at the end of
each 75-character line, once every 1500 ms or so, and never twice in a
row:

 15132      1 throttle 0.000007160 poll(0x7f7fff4731a0, 0x2, 0xffffffff) = 1
 15132      1 throttle 0.000004604 read(0x4, 0x7f7fff47317b, 0x1) = 1
       "\r"
 15132      1 throttle 0.000004586 write(0x1, 0x7f7fff47317b, 0x1) = 1
       "\r"
 15132      1 throttle 0.020010898 poll(0, 0, 0x5) = 0
 15132      1 throttle 0.000011133 poll(0x7f7fff4731a0, 0x2, 0xffffffff) = 1
 15132      1 throttle 0.000004079 read(0x4, 0x7f7fff47317b, 0x1) = 1
       "\r"
 15132      1 throttle 0.000004049 write(0x1, 0x7f7fff47317b, 0x1) = 1
       "\r"
 15132      1 throttle 0.019971826 poll(0, 0, 0x5) = 0

I'm setting the priority to "low" as I'm not aware of this issue
currently impacting any real-world applications, only this somewhat
contrived test case.

>How-To-Repeat:

The test programs can be downloaded and run as follows:

  ftp http://www.gson.org/netbsd/bugs/pty-interrupted/throttle.tar
  tar xfv throttle.tar
  cd throttle
  make
  ./throttle ./throttlee

>Fix:

>Audit-Trail:
From: matthew green <mrg@eterna.com.au>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
    netbsd-bugs@netbsd.org
Subject: re: kern/54033: Interrupted writes can cause pty to stall
Date: Sun, 03 Mar 2019 01:09:31 +1100

 this may or may not be related.

 some relatively small portion of the time when i run
 "cvs <something> | less" and the cvs has a lot of
 output (eg, annotate or log), and i do not let it
 complete running to completion and less having all the
 output buffered internally, by ^Z'ing back to the shell
 and doing other stuff.

 sometimes that ^Z hangs and i can't use that pty again
 until i kill -9 the pipe-writing process (eg, cvs log)
 at which point, i get my shell back and less is now
 suspended with some partial input.

 this was worse a few years ago and now happens only
 sometimes, vs often.  at the time, i thought the fix
 that christos had made solved my problem, but now i
 suspect it only makes it less likely to hit.


 .mrg.

From: Christos Zoulas <christos@zoulas.com>
To: "gnats-bugs@netbsd.org" <gnats-bugs@NetBSD.org>
Cc: kern-bug-people@netbsd.org,
 gnats-admin@netbsd.org,
 netbsd-bugs@netbsd.org,
 Andreas Gustafsson <gson@gson.org>
Subject: Re: kern/54033: Interrupted writes can cause pty to stall
Date: Sat, 2 Mar 2019 09:11:40 -0500

 > On Mar 2, 2019, at 9:10 AM, matthew green <mrg@eterna.com.au> wrote:
 > 
 > The following reply was made to PR kern/54033; it has been noted by GNATS.
 > 
 > From: matthew green <mrg@eterna.com.au>
 > To: gnats-bugs@NetBSD.org
 > Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
 >    netbsd-bugs@netbsd.org
 > Subject: re: kern/54033: Interrupted writes can cause pty to stall
 > Date: Sun, 03 Mar 2019 01:09:31 +1100
 > 
 > this may or may not be related.
 > 
 > some relatively small portion of the time when i run
 > "cvs <something> | less" and the cvs has a lot of
 > output (eg, annotate or log), and i do not let it
 > complete running to completion and less having all the
 > output buffered internally, by ^Z'ing back to the shell
 > and doing other stuff.
 > 
 > sometimes that ^Z hangs and i can't use that pty again
 > until i kill -9 the pipe-writing process (eg, cvs log)
 > at which point, i get my shell back and less is now
 > suspended with some partial input.
 > 
 > this was worse a few years ago and now happens only
 > sometimes, vs often.  at the time, i thought the fix
 > that christos had made solved my problem, but now i
 > suspect it only makes it less likely to hit.
 > 
 > 
 Can you please remind me what was the fix that I made?
 Perhaps it is incomplete, or incorrect?

 christos

NetBSD Home
NetBSD PR Database Search

(Contact us) $NetBSD: query-full-pr,v 1.43 2018/01/16 07:36:43 maya Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2017 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.