NetBSD Problem Report #56143

From mouse@Stone.Rodents-Montreal.ORG  Tue May  4 15:41:18 2021
Return-Path: <mouse@Stone.Rodents-Montreal.ORG>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id D57581A923F
	for <gnats-bugs@gnats.NetBSD.org>; Tue,  4 May 2021 15:41:18 +0000 (UTC)
Message-Id: <202105041541.LAA22934@Stone.Rodents-Montreal.ORG>
Date: Tue, 4 May 2021 11:41:15 -0400 (EDT)
From: Mouse <mouse@Rodents-Montreal.ORG>
Reply-To: mouse@Rodents-Montreal.ORG
To: gnats-bugs@NetBSD.org
Subject: Serial-line speed switch can corrupt "drained" output
X-Send-Pr-Version: 3.95

>Number:         56143
>Category:       kern
>Synopsis:       Serial-line speed switch can corrupt "drained" output
>Confidential:   no
>Severity:       serious
>Priority:       low
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue May 04 15:45:00 +0000 2021
>Last-Modified:  Tue May 04 20:55:01 +0000 2021
>Originator:     Mouse
>Release:        NetBSD 9.1
>Organization:
	Dis-
>Environment:
System: NetBSD Aaeon-9.Rodents-Montreal.ORG 9.1 NetBSD 9.1 (GEN91) #15: Fri Apr 16 12:48:33 EDT 2021 mouse@Aaeon-9.Rodents-Montreal.ORG:/home/mouse/kbuild/GEN91 amd64
Architecture: x86_64
Machine: amd64

The machine is an Aaeon "industrial" computer with real serial ports.
The serial ports in question attach as

[     1.000003] acpi0 at mainbus0: Intel ACPICA 20190405
...
[     1.047160] com0 at acpi0 (UAR1, PNP0501-1): io 0x3f8-0x3ff irq 4
[     1.047160] com0: ns16550a, working fifo
[     1.047160] com1 at acpi0 (UAR2, PNP0501-2): io 0x2f8-0x2ff irq 3
[     1.047160] com1: ns16550a, working fifo
>Description:
	When using TCSADRAIN to change serial-port speeds, the drain
	operation appears to not drain far enough; it can corrupt the
	last octet (I speculate it can corrupt more than that if the
	hardware in question has more queueing, but that's a guess).
>How-To-Repeat:
	Set the speed to one speed (4800 in our test case).  Write data
	to the serial port (30 octets, in our case).  Change speeds (to
	9600 in our case) with TCSADRAIN before the written data has
	been fully sent.  Note that most of the output is sent
	correctly, but the last octet is corrupted (in our case, 0x0a
	becomes 0xf3 - not that surprising based on the line state
	waveform).
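
A minimal userland sketch of the sequence above, under stated
assumptions: the function names set_speed() and repro() are
illustrative, the device path is supplied by the caller and is assumed
to open without waiting for carrier, and the 29 filler octets ('U') are
invented, since the report only says that 30 octets were written and
that the last one was 0x0a.

```c
#include <fcntl.h>
#include <string.h>
#include <termios.h>
#include <unistd.h>

/* Set both line speeds on fd and apply them with the given tcsetattr action. */
int
set_speed(int fd, speed_t speed, int action)
{
	struct termios t;

	if (tcgetattr(fd, &t) == -1)
		return -1;
	t.c_oflag &= ~OPOST;	/* send octets unmodified (no NL -> CR-NL) */
	if (cfsetispeed(&t, speed) == -1 || cfsetospeed(&t, speed) == -1)
		return -1;
	return tcsetattr(fd, action, &t);
}

/* Run the sequence once; returns 0 on success, -1 on any error. */
int
repro(const char *dev)
{
	char buf[30];
	int fd, rv = -1;

	if ((fd = open(dev, O_RDWR | O_NOCTTY)) == -1)
		return -1;

	memset(buf, 'U', sizeof(buf) - 1);	/* filler: an assumption */
	buf[sizeof(buf) - 1] = 0x0a;		/* the octet reported corrupted */

	/* 4800, write 30 octets, then switch to 9600 with TCSADRAIN. */
	if (set_speed(fd, B4800, TCSANOW) == 0 &&
	    write(fd, buf, sizeof(buf)) == (ssize_t)sizeof(buf) &&
	    set_speed(fd, B9600, TCSADRAIN) == 0)
		rv = 0;

	close(fd);
	return rv;
}
```

Whether the last octet is actually corrupted has to be observed on the
wire (scope, logic analyzer, or a receiver fixed at 4800); the code
itself cannot detect it.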

	I don't know why this is; ttywait_timo checks TS_BUSY as well
	as t_outq.c_cc.  Perhaps the com driver doesn't set TS_BUSY
	entirely correctly?
>Fix:
	Unknown.

	For my purposes at the moment, I'm working around it in
	userland by setting with TCSADRAIN|TCSASOFT, sleeping 5ms, and
	then setting with just TCSADRAIN.  The 5ms is because at 4800
	each character is a smidgen over 3ms wide, and 5ms allows a
	little extra room.
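
The workaround can be sketched roughly as follows.  This is not the
submitter's code: set_speed_settled() and char_time_us() are
illustrative names, and the tcdrain() fallback for systems without
TCSASOFT is a substitution made here so the sketch builds elsewhere.
The helper assumes the caller knows the framing; for 10 bits per
character (8N1, an assumption, since the report does not state the
framing) it gives about 2.1ms at 4800.  The report's own figure is a
little over 3ms, and the 5ms sleep covers either.

```c
#include <termios.h>
#include <time.h>

/* Microseconds needed to send one character at baud bits/s, rounded up. */
long
char_time_us(long baud, int bits_per_char)
{
	return (bits_per_char * 1000000L + baud - 1) / baud;
}

/* Change speed only after the last character has had time to leave. */
int
set_speed_settled(int fd, speed_t speed)
{
	struct termios t;
	struct timespec settle = { 0, 5 * 1000 * 1000 };  /* 5ms, per the report */

	if (tcgetattr(fd, &t) == -1)
		return -1;
	if (cfsetispeed(&t, speed) == -1 || cfsetospeed(&t, speed) == -1)
		return -1;

#ifdef TCSASOFT
	/* Pass 1: wait for the drain; TCSASOFT leaves c_cflag and speeds alone. */
	if (tcsetattr(fd, TCSADRAIN | TCSASOFT, &t) == -1)
		return -1;
#else
	/* No TCSASOFT here: plain drain instead (substitution, see above). */
	if (tcdrain(fd) == -1)
		return -1;
#endif

	/* Let the character still in the transmitter finish. */
	(void)nanosleep(&settle, NULL);

	/* Pass 2: now actually change the line speed. */
	return tcsetattr(fd, TCSADRAIN, &t);
}
```

A fixed 5ms is only adequate for the speeds in the report; at lower
speeds the sleep would need to be at least one character time.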

	Working on 9.1 is unpleasant enough I am not motivated to put
	my own time into this, and work is unlikely to want me to put
	work time into fixing it when we have a workaround.

/~\ The ASCII				  Mouse
\ / Ribbon Campaign
 X  Against HTML		mouse@rodents-montreal.org
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

>Audit-Trail:
From: Robert Elz <kre@munnari.OZ.AU>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/56143: Serial-line speed switch can corrupt "drained" output
Date: Wed, 05 May 2021 03:13:30 +0700

     Date:        Tue,  4 May 2021 15:45:01 +0000 (UTC)
     From:        Mouse <mouse@Rodents-Montreal.ORG>
     Message-ID:  <20210504154501.1790A1A9244@mollari.NetBSD.org>

   | 	I don't know why this is; ttywait_timo checks TS_BUSY as well
   | 	as t_outq.c_cc.  Perhaps the com driver doesn't set TS_BUSY
   | 	entirely correctly?

 More likely the hardware is saying that it has transmitted the
 character when all it has really done is move it to its shift
 register (and so the output character buffer is free for the next).

 The driver would need to test deeper hardware state to tell if the
 hardware has actually finished transmitting the character, and is
 fully idle.   I'm not even sure if all hardware has a way to return
 that kind of state (but I semi-recall - it has been a very long time -
 that some hardware does provide that info).   It would need special case
 handling in each driver for this ioctl to get this right though.
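
 The distinction drawn above can be illustrated with a toy model.  It is
 purely hypothetical: struct toy_uart and the toy_* functions model a
 generic two-stage transmitter (holding register feeding a shift
 register) and do not correspond to any real driver or chip interface.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical two-stage transmitter. */
struct toy_uart {
	bool	hold_full;	/* a byte is waiting in the holding register */
	bool	shift_busy;	/* a byte is being clocked out on the wire */
	uint8_t	hold;
	uint8_t	shift;
};

/* Move a waiting byte into the shift register if the shifter is free. */
static void
toy_load(struct toy_uart *u)
{
	if (!u->shift_busy && u->hold_full) {
		u->shift = u->hold;
		u->shift_busy = true;
		u->hold_full = false;
	}
}

/* "Ready for the next byte": only the holding register is free. */
bool
toy_tx_ready(const struct toy_uart *u)
{
	return !u->hold_full;
}

/* "Really idle": nothing left in either stage. */
bool
toy_tx_idle(const struct toy_uart *u)
{
	return !u->hold_full && !u->shift_busy;
}

/* Software writes one byte; fails if the holding register is occupied. */
bool
toy_write(struct toy_uart *u, uint8_t c)
{
	if (u->hold_full)
		return false;
	u->hold = c;
	u->hold_full = true;
	toy_load(u);
	return true;
}

/* One character time passes: the byte on the wire completes. */
void
toy_char_time(struct toy_uart *u)
{
	u->shift_busy = false;
	toy_load(u);
}
```

 Right after toy_write() on an idle transmitter, toy_tx_ready() is true
 while toy_tx_idle() is still false; a drain that trusts the former
 returns one character time too early.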

 kre


From: Mouse <mouse@Rodents-Montreal.ORG>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/56143: Serial-line speed switch can corrupt "drained" output
Date: Tue, 4 May 2021 16:53:13 -0400 (EDT)

 >> I don't know why this is; ttywait_timo checks TS_BUSY as well as
 >> t_outq.c_cc.  Perhaps the com driver doesn't set TS_BUSY entirely
 >> correctly?
 > More likely the hardware is saying that it has transmitted the
 > character when all it has really done is move it to its shift
 > register (and so the output character buffer is free for the next).

 Possibly.  Then it's either "the hardware lies" or "the driver turns
 off TS_BUSY before it should", I guess.

 I suspect it's the latter, in that com.c turns off TS_BUSY
 unconditionally in com_txsoft, which is probably "FIFO is below
 low-water mark (or moral equivalent, if tiny/no FIFO)" rather than a
 true "transmitter is idle".  Looking at comreg.h, I see LSR_TSRE versus
 LSR_TXRDY, but I don't see anything obvious permitting interrupts on
 LSR_TSRE going active.  It would be ugly to have to poll LSR_TSRE, but
 if it's the only way to get TS_BUSY right....
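
 A rough sketch of what polling LSR_TSRE could look like.  This is not
 code from com.c: wait_tx_idle(), lsr_read_fn, and the scripted stub are
 hypothetical, and the two bit values are the usual 16550 line status
 bits 5 and 6, which are believed to match comreg.h but should be
 checked against it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match comreg.h; verify before relying on them. */
#define LSR_TXRDY	0x20	/* holding register/FIFO can take more data */
#define LSR_TSRE	0x40	/* transmitter, shift register included, empty */

/* Hypothetical accessor returning the current line status register. */
typedef uint8_t (*lsr_read_fn)(void *cookie);

/*
 * Poll until LSR_TSRE is set, giving up after max_polls reads.
 * Returns true if the transmitter was seen idle.  A real driver would
 * delay or sleep between reads rather than spin.
 */
bool
wait_tx_idle(lsr_read_fn read_lsr, void *cookie, unsigned max_polls)
{
	while (max_polls-- > 0) {
		if (read_lsr(cookie) & LSR_TSRE)
			return true;
	}
	return false;
}

/* Stand-in for hardware: replays a scripted sequence of LSR values. */
struct lsr_script {
	const uint8_t	*vals;
	unsigned	 n;
	unsigned	 pos;
};

uint8_t
script_read_lsr(void *cookie)
{
	struct lsr_script *s = cookie;
	uint8_t v = s->vals[s->pos];

	if (s->pos + 1 < s->n)
		s->pos++;	/* stay on the last value once reached */
	return v;
}
```

 The point of the stub is the middle of the script: LSR_TXRDY alone is
 what "ready for more data" looks like, and it is not sufficient to
 conclude the last character has left the wire.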

 > The driver would need to test deeper hardware state to tell if the
 > hardware has actually finished transmitting the character, and is
 > fully idle.

 The (few) serial-line chips whose register bits I've looked at have
 had "transmitter is idle" state bits, like LSR_TSRE above, and usually
 interrupt generation logic, which is documented as being what we want
 here.  The 16550A is not among the chips I know, though, and, well,
 see above.  There's also the question of whether LSR_TSRE tells the
 real truth.

 > I'm not even sure if all hardware has a way to return that kind of
 > state (but I semi-recall - it has been a very long time - that some
 > hardware does provide that info).   It would need special case
 > handling in each driver for this ioctl to get this right though.

 Well, it seems to me that it's no more special-case handling than the
 driver must already be doing to pilot the hardware.

 As for doing something "for this ioctl", by the time they hit the
 driver, TIOCSETA, TIOCSETAF, and TIOCSETAW aren't ioctls any longer,
 and actually aren't even calls into the driver except for a call
 through tp->t_param, unless I've missed something.  Look at kern/tty.c,
 starting with line 1123 in the 9.1 version (tty.c,v 1.281).

 					Mouse
