NetBSD Problem Report #56143
From mouse@Stone.Rodents-Montreal.ORG Tue May 4 15:41:18 2021
Return-Path: <mouse@Stone.Rodents-Montreal.ORG>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id D57581A923F
for <gnats-bugs@gnats.NetBSD.org>; Tue, 4 May 2021 15:41:18 +0000 (UTC)
Message-Id: <202105041541.LAA22934@Stone.Rodents-Montreal.ORG>
Date: Tue, 4 May 2021 11:41:15 -0400 (EDT)
From: Mouse <mouse@Rodents-Montreal.ORG>
Reply-To: mouse@Rodents-Montreal.ORG
To: gnats-bugs@NetBSD.org
Subject: Serial-line speed switch can corrupt "drained" output
X-Send-Pr-Version: 3.95
>Number: 56143
>Category: kern
>Synopsis: Serial-line speed switch can corrupt "drained" output
>Confidential: no
>Severity: serious
>Priority: low
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Tue May 04 15:45:00 +0000 2021
>Last-Modified: Tue May 04 20:55:01 +0000 2021
>Originator: Mouse
>Release: NetBSD 9.1
>Organization:
Dis-
>Environment:
System: NetBSD Aaeon-9.Rodents-Montreal.ORG 9.1 NetBSD 9.1 (GEN91) #15: Fri Apr 16 12:48:33 EDT 2021 mouse@Aaeon-9.Rodents-Montreal.ORG:/home/mouse/kbuild/GEN91 amd64
Architecture: x86_64
Machine: amd64
The machine is an Aaeon "industrial" computer with real serial ports.
The serial ports in question attach as
[ 1.000003] acpi0 at mainbus0: Intel ACPICA 20190405
...
[ 1.047160] com0 at acpi0 (UAR1, PNP0501-1): io 0x3f8-0x3ff irq 4
[ 1.047160] com0: ns16550a, working fifo
[ 1.047160] com1 at acpi0 (UAR2, PNP0501-2): io 0x2f8-0x2ff irq 3
[ 1.047160] com1: ns16550a, working fifo
>Description:
When using TCSADRAIN to change serial-port speeds, the drain
operation appears to not drain far enough; it can corrupt the
last octet (I speculate it can corrupt more than that if the
hardware in question has more queueing, but that's a guess).
>How-To-Repeat:
Set the speed to one speed (4800 in our test case). Write data
to the serial port (30 octets, in our case). Change speeds (to
9600 in our case) with TCSADRAIN before the written data has
been fully sent. Note that most of the output is sent
correctly, but the last octet is corrupted (in our case, 0x0a
becomes 0xf3 - not that surprising based on the line state
waveform).
I don't know why this is; ttywait_timo checks TS_BUSY as well
as t_outq.c_cc. Perhaps the com driver doesn't set TS_BUSY
entirely correctly?
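
The How-To-Repeat steps above can be sketched in C roughly as follows. This is a minimal sketch, not the submitter's actual test program: the function names, the raw 8N1 setup, and the filler payload (29 'U' octets followed by 0x0a) are assumptions made for illustration; only the speeds, the 30-octet length, the trailing 0x0a, and the use of TCSADRAIN come from the report.

```c
#include <string.h>
#include <termios.h>
#include <unistd.h>

/* Apply a raw 8N1 configuration at the given speed (hypothetical helper). */
static int set_speed(int fd, speed_t speed, int action)
{
    struct termios t;

    if (tcgetattr(fd, &t) == -1)
        return -1;
    t.c_oflag &= ~OPOST;                      /* no output translation */
    t.c_cflag &= ~(CSIZE | PARENB | CSTOPB);  /* 8 data bits, no parity, 1 stop */
    t.c_cflag |= CS8 | CLOCAL;
    if (cfsetispeed(&t, speed) == -1 || cfsetospeed(&t, speed) == -1)
        return -1;
    return tcsetattr(fd, action, &t);
}

/* Write 30 octets at 4800, then switch to 9600 with TCSADRAIN.
 * Returns 0 on success, -1 on any failure. */
int repro_speed_switch(int fd)
{
    unsigned char buf[30];

    memset(buf, 'U', sizeof(buf) - 1);   /* assumed filler payload */
    buf[sizeof(buf) - 1] = 0x0a;         /* the octet seen corrupted */

    if (set_speed(fd, B4800, TCSANOW) == -1)
        return -1;
    if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf))
        return -1;
    /* The speed change is requested while output is still in flight. */
    return set_speed(fd, B9600, TCSADRAIN);
}
```

The caller would pass a descriptor for the serial device opened read/write; watching the line with a receiver left at 4800 should then show whether the final octet arrives intact.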
>Fix:
Unknown.
For my purposes at the moment, I'm working around it in
userland by setting with TCSADRAIN|TCSASOFT, then sleeping 5ms,
then setting with just TCSADRAIN. (5ms because at 4800 each
character is a smidgen over 3ms wide, and 5ms allows a little
extra room.)
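
The userland workaround described above might look something like this in C. It is a sketch under assumptions: the function name and the settle_ms parameter are invented here, and the tcdrain() fallback for systems without the BSD TCSASOFT flag is an addition of this sketch rather than something the report uses.

```c
#include <termios.h>
#include <time.h>

/* Change speed only after giving the UART time to finish shifting out.
 * settle_ms is the extra wait after the tty layer reports "drained"
 * (the report uses 5). Returns 0 on success, -1 on failure. */
int set_speed_drained(int fd, speed_t speed, long settle_ms)
{
    struct termios t;
    struct timespec settle = { settle_ms / 1000,
                               (settle_ms % 1000) * 1000000L };

    if (tcgetattr(fd, &t) == -1)
        return -1;
    if (cfsetispeed(&t, speed) == -1 || cfsetospeed(&t, speed) == -1)
        return -1;

#ifdef TCSASOFT
    /* Pass 1: wait for the tty-level drain without touching hardware. */
    if (tcsetattr(fd, TCSADRAIN | TCSASOFT, &t) == -1)
        return -1;
#else
    /* Assumed substitute where TCSASOFT does not exist. */
    if (tcdrain(fd) == -1)
        return -1;
#endif

    /* Let the last character leave the shift register. A signal may cut
     * this sleep short; a sturdier version would resume it. */
    nanosleep(&settle, NULL);

    /* Pass 2: now actually change the hardware speed. */
    return tcsetattr(fd, TCSADRAIN, &t);
}
```

A call such as set_speed_drained(fd, B9600, 5) would correspond to the 5ms figure given above.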
Working on 9.1 is unpleasant enough I am not motivated to put
my own time into this, and work is unlikely to want me to put
work time into fixing it when we have a workaround.
/~\ The ASCII Mouse
\ / Ribbon Campaign
X Against HTML mouse@rodents-montreal.org
/ \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B
>Audit-Trail:
From: Robert Elz <kre@munnari.OZ.AU>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/56143: Serial-line speed switch can corrupt "drained" output
Date: Wed, 05 May 2021 03:13:30 +0700
Date: Tue, 4 May 2021 15:45:01 +0000 (UTC)
From: Mouse <mouse@Rodents-Montreal.ORG>
Message-ID: <20210504154501.1790A1A9244@mollari.NetBSD.org>
| I don't know why this is; ttywait_timo checks TS_BUSY as well
| as t_outq.c_cc. Perhaps the com driver doesn't set TS_BUSY
| entirely correctly?
More likely the hardware is saying that it has transmitted the
character when all it has really done is move it to its shift
register (and so the output character buffer is free for the next).
The driver would need to test deeper hardware state to tell if the
hardware has actually finished transmitting the character, and is
fully idle. I'm not even sure if all hardware has a way to return
that kind of state (but I semi-recall - it has been a very long time -
that some hardware does provide that info). It would need special case
handling in each driver for this ioctl to get this right though.
kre
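
As a rough illustration of the shift-register theory above, the following toy model shows how a speed change that lands while the last character is still being shifted out could turn 0x0a into 0xf3. Everything in it is an assumption, not something the report confirms: 8N1 framing, a new divisor that takes effect immediately in mid-frame, a receiver that keeps sampling at the old rate in the middle of each bit, and the function and parameter names themselves.

```c
#include <stdint.h>

/* Line level at time t. Times are in old-rate bit periods, measured from
 * the falling edge of the start bit. After t_switch the transmitter runs
 * 'speedup' times faster (2.0 for 4800 -> 9600). */
static int line_level(uint8_t byte, double t_switch, double speedup, double t)
{
    /* Position within the frame, in frame bits. */
    double pos = (t < t_switch) ? t : t_switch + (t - t_switch) * speedup;
    int bit = (int)pos;

    if (bit == 0)
        return 0;                        /* start bit */
    if (bit <= 8)
        return (byte >> (bit - 1)) & 1;  /* data bits, LSB first */
    return 1;                            /* stop bit, then idle line */
}

/* What a receiver still running at the old rate would assemble. */
uint8_t receive_at_old_rate(uint8_t byte, double t_switch, double speedup)
{
    uint8_t out = 0;

    for (int k = 0; k < 8; k++) {
        double t = 1.5 + k;              /* middle of data bit k */
        if (line_level(byte, t_switch, speedup, t))
            out |= (uint8_t)(1u << k);
    }
    return out;
}
```

Under these assumptions, receive_at_old_rate(0x0a, 0.5, 2.0), a switch halfway through the start bit, gives 0xf3, while a switch time past the end of the frame (say 100.0) leaves 0x0a intact.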
From: Mouse <mouse@Rodents-Montreal.ORG>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/56143: Serial-line speed switch can corrupt "drained" output
Date: Tue, 4 May 2021 16:53:13 -0400 (EDT)
>> I don't know why this is; ttywait_timo checks TS_BUSY as well as
>> t_outq.c_cc. Perhaps the com driver doesn't set TS_BUSY entirely
>> correctly?
> More likely the hardware is saying that it has transmitted the
> character when all it has really done is move it to its shift
> register (and so the output character buffer is free for the next).
Possibly. Then it's either "the hardware lies" or "the driver turns
off TS_BUSY before it should", I guess.
I suspect it's the latter, in that com.c turns off TS_BUSY
unconditionally in com_txsoft, which is probably "FIFO is below
low-water mark (or moral equivalent, if tiny/no FIFO)" rather than a
true "transmitter is idle". Looking at comreg.h, I see LSR_TSRE versus
LSR_TXRDY, but I don't see anything obvious permitting interrupts on
LSR_TSRE going active. It would be ugly to have to poll LSR_TSRE, but
if it's the only way to get TS_BUSY right....
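
Polling LSR_TSRE, as contemplated above, could be sketched like this. It is not code from com.c: the callback type, the function names, and the scripted fake UART are hypothetical, and the bit values are the usual 16550 line status register layout as I understand it, with names borrowed from comreg.h.

```c
#include <stdint.h>

#define LSR_TXRDY 0x20  /* holding register / FIFO can take more data */
#define LSR_TSRE  0x40  /* shift register empty too: transmitter idle */

/* Hypothetical register accessor, so the loop can be tried off-hardware. */
typedef uint8_t (*lsr_reader)(void *cookie);

/* Poll until the transmitter is truly idle. Returns the number of reads
 * it took, or -1 if it was still busy after max_polls reads. */
int wait_tx_idle(lsr_reader read_lsr, void *cookie, int max_polls)
{
    for (int i = 1; i <= max_polls; i++) {
        if (read_lsr(cookie) & LSR_TSRE)
            return i;
    }
    return -1;
}

/* Scripted stand-in for a UART: replays LSR values, repeating the last. */
struct fake_uart {
    const uint8_t *script;
    int len;
    int pos;
};

uint8_t fake_read_lsr(void *cookie)
{
    struct fake_uart *u = cookie;
    uint8_t v = u->script[u->pos];

    if (u->pos < u->len - 1)
        u->pos++;
    return v;
}
```

The point of the fake is that a value of 0x20 (LSR_TXRDY alone) does not end the wait; only a read with LSR_TSRE set does. A real driver would presumably need a bounded delay between reads rather than a bare loop.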
> The driver would need to test deeper hardware state to tell if the
> hardware has actually finished transmitting the character, and is
> fully idle.
The (few) serial-line chips whose register bits I've looked at have
had "transmitter is idle" state bits, like LSR_TSRE above, and usually
interrupt generation logic, documented as being what we want here.
The 16550A is not among the chips I know, though, and, well, see
above. There's also the question of whether LSR_TSRE tells the real
truth.
> I'm not even sure if all hardware has a way to return that kind of
> state (but I semi-recall - it has been a very long time - that some
> hardware does provide that info). It would need special case
> handling in each driver for this ioctl to get this right though.
Well, it seems to me that it's no more special-case handling than the
driver must already be doing to pilot the hardware.
As for doing something "for this ioctl", by the time they hit the
driver, TIOCSETA, TIOCSETAF, and TIOCSETAW aren't ioctls any longer,
and actually aren't even calls into the driver except for a call
through tp->t_param, unless I've missed something. Look at kern/tty.c,
starting with line 1123 in the 9.1 version (tty.c,v 1.281).
Mouse
Copyright © 1994-2020
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.