NetBSD Problem Report #54033
From gson@gson.org Sat Mar 2 13:50:52 2019
Return-Path: <gson@gson.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id A4B397A18B
for <gnats-bugs@gnats.NetBSD.org>; Sat, 2 Mar 2019 13:50:52 +0000 (UTC)
Message-Id: <20190302135045.9EE9398B726@guava.gson.org>
Date: Sat, 2 Mar 2019 15:50:45 +0200 (EET)
From: gson@gson.org (Andreas Gustafsson)
Reply-To: gson@gson.org (Andreas Gustafsson)
To: gnats-bugs@NetBSD.org
Subject: Interrupted writes can cause pty to stall
X-Send-Pr-Version: 3.95
>Number: 54033
>Category: kern
>Synopsis: Interrupted writes can cause pty to stall
>Confidential: no
>Severity: serious
>Priority: low
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sat Mar 02 13:55:00 +0000 2019
>Last-Modified: Sat Mar 02 14:15:00 +0000 2019
>Originator: Andreas Gustafsson
>Release: NetBSD 7.2, 8.0
>Organization:
>Environment:
System: NetBSD
Architecture: x86_64
Machine: amd64
>Description:
I'm seeing strange behavior from NetBSD's pseudo-TTYs when writes
to the slave side are interrupted by signals.
My test consists of two programs, "throttle" and "throttlee". The
"throttle" program runs another program under a PTY and throttles its
output by limiting the rate at which it reads from the PTY master,
recreating the experience of the era of 300 baud modems.
The "throttlee" program writes a repeating test pattern to standard
output, with the twist that it also configures a timer to issue
periodic SIGALRM signals, causing some of the writes to be interrupted
and retried.
Running them together as
./throttle ./throttlee
under NetBSD/amd64 7.2 or 8.0, about a dozen lines of the expected
test pattern are printed at the throttled speed:
0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz
0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz
0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz
but then the output unexpectedly stops. Occasionally, the output may
spontaneously restart and then stop again. Running "ktruss -R -p" on
the throttlee PID shows an endless stream of alternating SIGALRMs and
interrupted writes once every 20 ms:
15913 1 throttlee 0.019996518 write(0, 0x7f7fff92a53c, 0x1) Err#4 EINTR
15913 1 throttlee 0.000001351 SIGALRM caught handler=0x4009a0 mask=0x0 code=0x0
15913 1 throttlee 0.000003630 setcontext(0x7f7fff92a190) JUSTRETURN
15913 1 throttlee 0.020003915 write(0, 0x7f7fff92a53c, 0x1) Err#4 EINTR
15913 1 throttlee 0.000001191 SIGALRM caught handler=0x4009a0 mask=0x0 code=0x0
15913 1 throttlee 0.000004607 setcontext(0x7f7fff92a190) JUSTRETURN
15913 1 throttlee 0.020001545 write(0, 0x7f7fff92a53c, 0x1) Err#4 EINTR
15913 1 throttlee 0.000001172 SIGALRM caught handler=0x4009a0 mask=0x0 code=0x0
15913 1 throttlee 0.000003959 setcontext(0x7f7fff92a190) JUSTRETURN
15913 1 throttlee 0.020001820 write(0, 0x7f7fff92a53c, 0x1) Err#4 EINTR
15913 1 throttlee 0.000000856 SIGALRM caught handler=0x4009a0 mask=0x0 code=0x0
Running "ktruss -R -p" on the throttle PID, the output is even
stranger, showing a carriage return being read from the PTY master
device every 20 ms, even though they should only occur at the end of
each 75-character line, once every 1500 ms or so, and never twice in a
row:
15132 1 throttle 0.000007160 poll(0x7f7fff4731a0, 0x2, 0xffffffff) = 1
15132 1 throttle 0.000004604 read(0x4, 0x7f7fff47317b, 0x1) = 1
"\r"
15132 1 throttle 0.000004586 write(0x1, 0x7f7fff47317b, 0x1) = 1
"\r"
15132 1 throttle 0.020010898 poll(0, 0, 0x5) = 0
15132 1 throttle 0.000011133 poll(0x7f7fff4731a0, 0x2, 0xffffffff) = 1
15132 1 throttle 0.000004079 read(0x4, 0x7f7fff47317b, 0x1) = 1
"\r"
15132 1 throttle 0.000004049 write(0x1, 0x7f7fff47317b, 0x1) = 1
"\r"
15132 1 throttle 0.019971826 poll(0, 0, 0x5) = 0
I'm setting the priority to "low" as I'm not aware of this issue
currently impacting any real-world applications, only this somewhat
contrived test case.
>How-To-Repeat:
The test programs can be downloaded and run as follows:
ftp http://www.gson.org/netbsd/bugs/pty-interrupted/throttle.tar
tar xfv throttle.tar
cd throttle
make
./throttle ./throttlee
>Fix:
>Audit-Trail:
From: matthew green <mrg@eterna.com.au>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org
Subject: re: kern/54033: Interrupted writes can cause pty to stall
Date: Sun, 03 Mar 2019 01:09:31 +1100
this may or may not be related.
some relatively small portion of the time when i run
"cvs <something> | less" and the cvs has a lot of
output (eg, annotate or log), and i do not let it
complete running to completion and less having all the
output buffered internally, by ^Z'ing back to the shell
and doing other stuff.
sometimes that ^Z hangs and i can't use that pty again
until i kill -9 the pipe-writing process (eg, cvs log)
at which point, i get my shell back and less is now
suspended with some partial input.
this was worse a few years ago and now happens only
sometimes, vs often. at the time, i thought the fix
that christos had made solved my problem, but now i
suspect it only makes it less likely to hit.
.mrg.
From: Christos Zoulas <christos@zoulas.com>
To: "gnats-bugs@netbsd.org" <gnats-bugs@NetBSD.org>
Cc: kern-bug-people@netbsd.org,
gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org,
Andreas Gustafsson <gson@gson.org>
Subject: Re: kern/54033: Interrupted writes can cause pty to stall
Date: Sat, 2 Mar 2019 09:11:40 -0500
> On Mar 2, 2019, at 9:10 AM, matthew green <mrg@eterna.com.au> wrote:
>
> The following reply was made to PR kern/54033; it has been noted by GNATS.
>
> From: matthew green <mrg@eterna.com.au>
> To: gnats-bugs@NetBSD.org
> Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
> netbsd-bugs@netbsd.org
> Subject: re: kern/54033: Interrupted writes can cause pty to stall
> Date: Sun, 03 Mar 2019 01:09:31 +1100
>
> this may or may not be related.
>
> some relatively small portion of the time when i run
> "cvs <something> | less" and the cvs has a lot of
> output (eg, annotate or log), and i do not let it
> complete running to completion and less having all the
> output buffered internally, by ^Z'ing back to the shell
> and doing other stuff.
>
> sometimes that ^Z hangs and i can't use that pty again
> until i kill -9 the pipe-writing process (eg, cvs log)
> at which point, i get my shell back and less is now
> suspended with some partial input.
>
> this was worse a few years ago and now happens only
> sometimes, vs often. at the time, i thought the fix
> that christos had made solved my problem, but now i
> suspect it only makes it less likely to hit.
>
>
Can you please remind me what was the fix that I made?
Perhaps it is incomplete, or incorrect?
christos
(Contact us)
$NetBSD: query-full-pr,v 1.43 2018/01/16 07:36:43 maya Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2017
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.