NetBSD Problem Report #57662
From www@netbsd.org Sun Oct 15 20:43:16 2023
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id 7529D1A9238
for <gnats-bugs@gnats.NetBSD.org>; Sun, 15 Oct 2023 20:43:16 +0000 (UTC)
Message-Id: <20231015204314.2B9A41A9239@mollari.NetBSD.org>
Date: Sun, 15 Oct 2023 20:43:14 +0000 (UTC)
From: als@thangorodrim.ch
Reply-To: als@thangorodrim.ch
To: gnats-bugs@NetBSD.org
Subject: StarTech ICUSB23208FD 8-Port USB-to-Serial Adapter Hub fails on Alix with NetBSD/i386 9.3
X-Send-Pr-Version: www-1.0
>Number: 57662
>Category: port-i386
>Synopsis: StarTech ICUSB23208FD 8-Port USB-to-Serial Adapter Hub fails on Alix with NetBSD/i386 9.3
>Confidential: no
>Severity: serious
>Priority: low
>Responsible: port-i386-maintainer
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sun Oct 15 20:45:01 +0000 2023
>Last-Modified: Mon Nov 13 08:20:01 +0000 2023
>Originator: Alexander Schreiber
>Release:
>Organization:
>Environment:
NetBSD khazad-dum 9.3 NetBSD 9.3 (KHAZAD_DUM) #3: Wed Aug 17 18:46:46 UTC 2022 root@khazad-dum:/net/build/khazad-dum/obj/sys/arch/i386/compile/KHAZAD_DUM i386
>Description:
I have acquired a StarTech 8-Port USB-to-Serial Adapter Hub, model number ICUSB23208FD with the intent of setting up a serial console server. Choosen machine is an alix2f13 (https://www.pcengines.ch/alix2f13.htm, LX800 AMD Geode CPU, Pentium Class).
Upon plugging in, the USB-to-Serial Adapter Hub is recognized:
Oct 15 14:48:59 khazad-dum /netbsd: [ 14167442.4086486] fatal breakpoint trap in supervisor mode
Oct 15 14:48:59 khazad-dum /netbsd: [ 14167442.4086486] trap type 1 code 0 eip 0xc0114bf4 cs 0x8 eflags 0x200202 cr2 0xb83d7860 ilevel 0x8 esp 0xcc8a9f6c
Oct 15 14:48:59 khazad-dum /netbsd: [ 14167442.4086486] curlwp 0xc0e80020 pid 0 lid 2 lowest kstack 0xcca062c0
Oct 15 14:48:59 khazad-dum /netbsd: uhub2 at uhub1 port 1: Terminus Technology (0x1a40) USB 2.0 Hub [MTT] (0x101), class 9/0, rev 2.00/1.00, addr 2
Oct 15 14:48:59 khazad-dum /netbsd: [ 14404697.6109689] uhub2: multiple transaction translators
Oct 15 14:48:59 khazad-dum /netbsd: [ 14404697.6109689] uhub2: 4 ports with 4 removable, self powered
Oct 15 14:49:01 khazad-dum /netbsd: [ 14404698.9188070] uftdi0 at uhub2 port 2 configuration 1 interface 0
Oct 15 14:49:01 khazad-dum /netbsd: [ 14404698.9295400] uftdi0: FTDI (0x403) USB <-> Serial Cable (0x6011), rev 2.00/8.00, addr 3
Oct 15 14:49:01 khazad-dum /netbsd: [ 14404698.9409797] ucom0 at uftdi0 portno 1
Oct 15 14:49:01 khazad-dum /netbsd: [ 14404698.9409797] uftdi1 at uhub2 port 2 configuration 1 interface 1
Oct 15 14:49:01 khazad-dum /netbsd: [ 14404698.9488820] uftdi1: FTDI (0x403) USB <-> Serial Cable (0x6011), rev 2.00/8.00, addr 3
Oct 15 14:49:01 khazad-dum /netbsd: [ 14404698.9597156] ucom1 at uftdi1 portno 2
Oct 15 14:49:01 khazad-dum /netbsd: [ 14404698.9597156] uftdi2 at uhub2 port 2 configuration 1 interface 2
Oct 15 14:49:01 khazad-dum /netbsd: [ 14404698.9690499] uftdi2: FTDI (0x403) USB <-> Serial Cable (0x6011), rev 2.00/8.00, addr 3
Oct 15 14:49:01 khazad-dum /netbsd: [ 14404698.9690499] ucom2 at uftdi2 portno 3
Oct 15 14:49:01 khazad-dum /netbsd: [ 14404698.9815384] uftdi3 at uhub2 port 2 configuration 1 interface 3
Oct 15 14:49:01 khazad-dum /netbsd: [ 14404698.9888359] uftdi3: FTDI (0x403) USB <-> Serial Cable (0x6011), rev 2.00/8.00, addr 3
Oct 15 14:49:01 khazad-dum /netbsd: [ 14404698.9888359] ucom3 at uftdi3 portno 4
Oct 15 14:49:02 khazad-dum /netbsd: [ 14404699.9689657] uftdi4 at uhub2 port 3 configuration 1 interface 0
Oct 15 14:49:02 khazad-dum /netbsd: [ 14404699.9804221] uftdi4: FTDI (0x403) USB <-> Serial Cable (0x6011), rev 2.00/8.00, addr 4
Oct 15 14:49:02 khazad-dum /netbsd: [ 14404699.9918598] ucom4 at uftdi4 portno 1
Oct 15 14:49:02 khazad-dum /netbsd: [ 14404699.9918598] uftdi5 at uhub2 port 3 configuration 1 interface 1
Oct 15 14:49:02 khazad-dum /netbsd: [ 14404699.9992033] uftdi5: FTDI (0x403) USB <-> Serial Cable (0x6011), rev 2.00/8.00, addr 4
Oct 15 14:49:02 khazad-dum /netbsd: [ 14404700.0106410] ucom5 at uftdi5 portno 2
Oct 15 14:49:02 khazad-dum /netbsd: [ 14404700.0106410] uftdi6 at uhub2 port 3 configuration 1 interface 2
Oct 15 14:49:02 khazad-dum /netbsd: [ 14404700.0199490] uftdi6: FTDI (0x403) USB <-> Serial Cable (0x6011), rev 2.00/8.00, addr 4
Oct 15 14:49:02 khazad-dum /netbsd: [ 14404700.0293960] ucom6 at uftdi6 portno 3
Oct 15 14:49:02 khazad-dum /netbsd: [ 14404700.0293960] uftdi7 at uhub2 port 3 configuration 1 interface 3
Oct 15 14:49:02 khazad-dum /netbsd: [ 14404700.0399267] uftdi7: FTDI (0x403) USB <-> Serial Cable (0x6011), rev 2.00/8.00, addr 4
Oct 15 14:49:02 khazad-dum /netbsd: [ 14404700.0399267] ucom7 at uftdi7 portno 4
(and yes, the "fatal breakpoint trap in supervisor mode" smells)
I can then connect via "cu -l /dev/ttyU0 -s 115200" to an attached device,
and disconnect via the "~." sequence. This works 1-2 times and then any further attempts to do so only yield "cu: link down" and then cu exiting.
I've trace cu and the relevant section from the trace dumped with
"kdump -R" is:
12400 1 cu 0.000009779 CALL open(0xbfb4b77a,6,0)
12400 1 cu 0.000009779 NAMI "/dev/ttyU0"
12400 1 cu 12.994282345 RET open -1 errno 5 Input/output error
Unplugging the device results in the USB devices disconnecting with some
delay:
Oct 15 18:38:57 khazad-dum /netbsd: [ 14418495.3727440] ucom0: detached
Oct 15 18:38:57 khazad-dum /netbsd: [ 14418495.3727440] uftdi0: detached
Oct 15 18:38:57 khazad-dum /netbsd: [ 14418495.3727440] uftdi0: at uhub2 port 2 (addr 3) disconnected
Oct 15 18:38:59 khazad-dum /netbsd: [ 14418497.3830518] ucom1: detached
Oct 15 18:38:59 khazad-dum /netbsd: [ 14418497.3830518] uftdi1: detached
Oct 15 18:38:59 khazad-dum /netbsd: [ 14418497.3830518] uftdi1: at uhub2 port 2 (addr 3) disconnected
Oct 15 18:39:01 khazad-dum /netbsd: [ 14418499.3933598] ucom2: detached
Oct 15 18:39:01 khazad-dum /netbsd: [ 14418499.3933598] uftdi2: detached
Oct 15 18:39:01 khazad-dum /netbsd: [ 14418499.3933598] uftdi2: at uhub2 port 2 (addr 3) disconnected
Oct 15 18:39:03 khazad-dum /netbsd: [ 14418501.4036678] ucom3: detached
Oct 15 18:39:03 khazad-dum /netbsd: [ 14418501.4036678] uftdi3: detached
Oct 15 18:39:03 khazad-dum /netbsd: [ 14418501.4036678] uftdi3: at uhub2 port 2 (addr 3) disconnected
Oct 15 18:39:06 khazad-dum /netbsd: [ 14418504.4141291] ucom4: detached
Oct 15 18:39:06 khazad-dum /netbsd: [ 14418504.4141291] uftdi4: detached
Oct 15 18:39:06 khazad-dum /netbsd: [ 14418504.4141291] uftdi4: at uhub2 port 3 (addr 4) disconnected
Oct 15 18:39:08 khazad-dum /netbsd: [ 14418506.4244371] ucom5: detached
Oct 15 18:39:08 khazad-dum /netbsd: [ 14418506.4244371] uftdi5: detached
Oct 15 18:39:08 khazad-dum /netbsd: [ 14418506.4244371] uftdi5: at uhub2 port 3 (addr 4) disconnected
Oct 15 18:39:10 khazad-dum /netbsd: [ 14418508.4347451] ucom6: detached
Oct 15 18:39:10 khazad-dum /netbsd: [ 14418508.4347451] uftdi6: detached
Oct 15 18:39:10 khazad-dum /netbsd: [ 14418508.4347451] uftdi6: at uhub2 port 3 (addr 4) disconnected
Oct 15 18:39:12 khazad-dum /netbsd: [ 14418510.4450534] ucom7: detached
Oct 15 18:39:12 khazad-dum /netbsd: [ 14418510.4450534] uftdi7: detached
Oct 15 18:39:12 khazad-dum /netbsd: [ 14418510.4450534] uftdi7: at uhub2 port 3 (addr 4) disconnected
Oct 15 18:39:15 khazad-dum /netbsd: [ 14418513.4555144] uhub2: detached
Oct 15 18:39:15 khazad-dum /netbsd: [ 14418513.4555144] uhub2: at uhub1 port 1 (addr 2) disconnected
Plugging it back in only yields an error after some time:
Oct 15 18:40:43 khazad-dum /netbsd: [ 14418601.0689389] uhub1: autoconfiguration error: device problem, disabling port 1
I've reinstalled the same machine with Linux (Debian bookworm) and it
works with no apparent issues.
relevant dmesg snippet from that setup (mostly for h/w id):
[ 46.587001] usbserial: USB Serial support registered for FTDI USB Serial Device
[ 46.630146] ftdi_sio 2-2:1.0: FTDI USB Serial Device converter detected
[ 46.650945] usb 2-2: Detected FT232R
[ 46.668978] usb 2-2: FTDI USB Serial Device converter now attached to ttyUSB0
[ 46.704109] ftdi_sio 1-1.2:1.0: FTDI USB Serial Device converter detected
[ 46.927654] usb 1-1.2: Detected FT4232H
[ 46.944386] usb 1-1.2: FTDI USB Serial Device converter now attached to ttyUSB1
[ 46.964563] ftdi_sio 1-1.2:1.1: FTDI USB Serial Device converter detected
[ 46.987101] usb 1-1.2: Detected FT4232H
[ 46.998834] usb 1-1.2: FTDI USB Serial Device converter now attached to ttyUSB2
[ 47.023395] ftdi_sio 1-1.2:1.2: FTDI USB Serial Device converter detected
[ 47.043118] usb 1-1.2: Detected FT4232H
[ 47.054859] usb 1-1.2: FTDI USB Serial Device converter now attached to ttyUSB3
[ 47.079375] ftdi_sio 1-1.2:1.3: FTDI USB Serial Device converter detected
[ 47.099113] usb 1-1.2: Detected FT4232H
[ 47.111007] usb 1-1.2: FTDI USB Serial Device converter now attached to ttyUSB4
[ 47.134324] ftdi_sio 1-1.3:1.0: FTDI USB Serial Device converter detected
[ 47.154893] usb 1-1.3: Detected FT4232H
[ 47.170720] usb 1-1.3: FTDI USB Serial Device converter now attached to ttyUSB5
[ 47.191453] ftdi_sio 1-1.3:1.1: FTDI USB Serial Device converter detected
[ 47.211397] usb 1-1.3: Detected FT4232H
[ 47.226716] usb 1-1.3: FTDI USB Serial Device converter now attached to ttyUSB6
[ 47.246908] ftdi_sio 1-1.3:1.2: FTDI USB Serial Device converter detected
[ 47.266903] usb 1-1.3: Detected FT4232H
[ 47.282737] usb 1-1.3: FTDI USB Serial Device converter now attached to ttyUSB7
[ 47.303569] ftdi_sio 1-1.3:1.3: FTDI USB Serial Device converter detected
[ 47.322895] usb 1-1.3: Detected FT4232H
[ 47.338747] usb 1-1.3: FTDI USB Serial Device converter now attached to ttyUSB8
I've tried the same USB-to-Serial Adapter Hub on an APU (https://www.pcengines.ch/apu2.htm, AMD GX-412TC, amd64) running NetBSD/amd64 9.3 and it works without any apparent problems.
I currently have no other machine types running NetBSD/i386 to crosstest.
So it looks like this is confined to the combination of
- PCEngines Alix
- NetBSD/i386
I do not realistically expect a fix and this PR is just intended as documentation if someone else wants to use that device combo.
Note: while the NetBSD logs come from a machine running a (slightly, mostly
enabling diagnostics and dropping unneeded stuff) custom kernel, the initial installation having the same issues ran GENERIC from a fresh install.
>How-To-Repeat:
- acquire PEngines Alix
- install NetBSD/i386 9.3
- acquire StarTech ICUSB23208FD 8-Port USB-to-Serial Adapter Hub
- plug above device into Alix
- try to use the above device as a serial console interface
(e.g.: "cu -l /dev/ttyU8 -s 115200")
>Fix:
none
>Audit-Trail:
From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@netbsd.org
Cc: port-i386-maintainer@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org
Subject: Re: port-i386/57662: StarTech ICUSB23208FD 8-Port USB-to-Serial
Adapter Hub fails on Alix with NetBSD/i386 9.3
Date: Sun, 15 Oct 2023 23:31:07 +0200
On Sun, Oct 15, 2023 at 08:45:01PM +0000, als@thangorodrim.ch wrote:
> [...]
>
> I can then connect via "cu -l /dev/ttyU0 -s 115200" to an attached device,
> and disconnect via the "~." sequence. This works 1-2 times and then any further attempts to do so only yield "cu: link down" and then cu exiting.
>
I've seen this with other USB a non-USB serial ports.
Usually this is because the serial port is not completely closed (cu usually
has 2 processes running, one may be left over when exiting).
killing the leftover processes allows a new cu session to start
--
Manuel Bouyer <bouyer@antioche.eu.org>
NetBSD: 26 ans d'experience feront toujours la difference
--
From: Alexander Schreiber <als@thangorodrim.ch>
To: gnats-bugs@netbsd.org
Cc: port-i386-maintainer@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org
Subject: Re: port-i386/57662: StarTech ICUSB23208FD 8-Port USB-to-Serial
Adapter Hub fails on Alix with NetBSD/i386 9.3
Date: Mon, 16 Oct 2023 01:51:48 +0200
On Sun, Oct 15, 2023 at 09:35:02PM +0000, Manuel Bouyer wrote:
> The following reply was made to PR port-i386/57662; it has been noted by GNATS.
>
> From: Manuel Bouyer <bouyer@antioche.eu.org>
> To: gnats-bugs@netbsd.org
> Cc: port-i386-maintainer@netbsd.org, gnats-admin@netbsd.org,
> netbsd-bugs@netbsd.org
> Subject: Re: port-i386/57662: StarTech ICUSB23208FD 8-Port USB-to-Serial
> Adapter Hub fails on Alix with NetBSD/i386 9.3
> Date: Sun, 15 Oct 2023 23:31:07 +0200
>
> On Sun, Oct 15, 2023 at 08:45:01PM +0000, als@thangorodrim.ch wrote:
> > [...]
> >
> > I can then connect via "cu -l /dev/ttyU0 -s 115200" to an attached device,
> > and disconnect via the "~." sequence. This works 1-2 times and then any further attempts to do so only yield "cu: link down" and then cu exiting.
> >
>
> I've seen this with other USB a non-USB serial ports.
> Usually this is because the serial port is not completely closed (cu usually
> has 2 processes running, one may be left over when exiting).
> killing the leftover processes allows a new cu session to start
The fact that:
- upon device plug-in, I get a "fatal breakpoint trap in supervisor mode"
- upon removing, waiting a few min and plugging the device back
in I get "uhub1: autoconfiguration error: device problem, disabling port 1"
tells me that this might not be the problem.
I suspect either some state corruption in the kernel driver or (actually
more likely) incorrect handling of device quirks (quirk being a polite
way of saying the device does something broken). This might be a combination
of USB host issues (it appears to work fine on a different h/w with
amd64) and NetBSD not having as much developer time to deal with devices
behaving oddly (it works with Linux on the same h/w).
--
"Opportunity is missed by most people because it is dressed in overalls and
looks like work." -- Thomas A. Edison
From: "David H. Gutteridge" <david@gutteridge.ca>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: port-i386/57662: StarTech ICUSB23208FD 8-Port USB-to-Serial
Adapter Hub fails on Alix with NetBSD/i386 9.3
Date: Sun, 15 Oct 2023 21:28:05 -0400
On Mon, 16 Oct 2023 at 01:51:48 +0200, Alexander Schreiber wrote:
>The fact that:
> - upon device plug-in, I get a "fatal breakpoint trap in supervisor mode"
> - upon removing, waiting a few min and plugging the device back
> in I get "uhub1: autoconfiguration error: device problem, disabling por=
t 1"
>tells me that this might not be the problem.
>
>I suspect either some state corruption in the kernel driver or (actually
>more likely) incorrect handling of device quirks (quirk being a polite
>way of saying the device does something broken). This might be a combinati=
on
>of USB host issues (it appears to work fine on a different h/w with
>amd64) and NetBSD not having as much developer time to deal with devices
>behaving oddly (it works with Linux on the same h/w).
You don't mention what USB controller details are specific to this
hardware in question (e.g., dmesg excerpt of that part). Separately,
have you tried a -current kernel?
While I don't know if there's an overlap here, there have been issues
(with memory fragmentation) that can cause USB devices to not work in
some contexts (on some hardware). E.g., if I plug a USB 3 external
hard drive into a Raspberry Pi 3B+ on 9.3_STABLE, it fails to attach,
whereas there are no such issues on a typical amd64 laptop also
running 9.3_STABLE (different arch, different USB controller, 8x
difference in RAM). See for instance
http://gnats.netbsd.org/cgi-bin/query-pr-single.pl?number=3D54977
Regards,
Dave
From: Manuel Bouyer <bouyer@antioche.eu.org>
To: Alexander Schreiber <als@thangorodrim.ch>
Cc: gnats-bugs@netbsd.org, port-i386-maintainer@netbsd.org,
gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: port-i386/57662: StarTech ICUSB23208FD 8-Port USB-to-Serial
Adapter Hub fails on Alix with NetBSD/i386 9.3
Date: Mon, 16 Oct 2023 09:01:17 +0200
On Mon, Oct 16, 2023 at 01:51:48AM +0200, Alexander Schreiber wrote:
> On Sun, Oct 15, 2023 at 09:35:02PM +0000, Manuel Bouyer wrote:
> > The following reply was made to PR port-i386/57662; it has been noted by GNATS.
> >
> > From: Manuel Bouyer <bouyer@antioche.eu.org>
> > To: gnats-bugs@netbsd.org
> > Cc: port-i386-maintainer@netbsd.org, gnats-admin@netbsd.org,
> > netbsd-bugs@netbsd.org
> > Subject: Re: port-i386/57662: StarTech ICUSB23208FD 8-Port USB-to-Serial
> > Adapter Hub fails on Alix with NetBSD/i386 9.3
> > Date: Sun, 15 Oct 2023 23:31:07 +0200
> >
> > On Sun, Oct 15, 2023 at 08:45:01PM +0000, als@thangorodrim.ch wrote:
> > > [...]
> > >
> > > I can then connect via "cu -l /dev/ttyU0 -s 115200" to an attached device,
> > > and disconnect via the "~." sequence. This works 1-2 times and then any further attempts to do so only yield "cu: link down" and then cu exiting.
> > >
> >
> > I've seen this with other USB a non-USB serial ports.
> > Usually this is because the serial port is not completely closed (cu usually
> > has 2 processes running, one may be left over when exiting).
> > killing the leftover processes allows a new cu session to start
>
> The fact that:
> - upon device plug-in, I get a "fatal breakpoint trap in supervisor mode"
this I don't get. This should cause a panic or a drop to ddb,
but in any case the kernel shouldn't keep running
--
Manuel Bouyer <bouyer@antioche.eu.org>
NetBSD: 26 ans d'experience feront toujours la difference
--
From: Alexander Schreiber <als@thangorodrim.ch>
To: gnats-bugs@netbsd.org
Cc: port-i386-maintainer@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org
Subject: Re: port-i386/57662: StarTech ICUSB23208FD 8-Port USB-to-Serial
Adapter Hub fails on Alix with NetBSD/i386 9.3
Date: Mon, 16 Oct 2023 09:14:45 +0200
On Mon, Oct 16, 2023 at 01:30:02AM +0000, David H. Gutteridge wrote:
> The following reply was made to PR port-i386/57662; it has been noted by GNATS.
>
> From: "David H. Gutteridge" <david@gutteridge.ca>
> To: gnats-bugs@netbsd.org
> Cc:
> Subject: Re: port-i386/57662: StarTech ICUSB23208FD 8-Port USB-to-Serial
> Adapter Hub fails on Alix with NetBSD/i386 9.3
> Date: Sun, 15 Oct 2023 21:28:05 -0400
>
> On Mon, 16 Oct 2023 at 01:51:48 +0200, Alexander Schreiber wrote:
> >The fact that:
> > - upon device plug-in, I get a "fatal breakpoint trap in supervisor mode"
> > - upon removing, waiting a few min and plugging the device back
> > in I get "uhub1: autoconfiguration error: device problem, disabling por=
> t 1"
> >tells me that this might not be the problem.
> >
> >I suspect either some state corruption in the kernel driver or (actually
> >more likely) incorrect handling of device quirks (quirk being a polite
> >way of saying the device does something broken). This might be a combinati=
> on
> >of USB host issues (it appears to work fine on a different h/w with
> >amd64) and NetBSD not having as much developer time to deal with devices
> >behaving oddly (it works with Linux on the same h/w).
>
> You don't mention what USB controller details are specific to this
> hardware in question (e.g., dmesg excerpt of that part). Separately,
> have you tried a -current kernel?
I haven't tried -current, no.
As for machine info, here is the full dmesg:
[ 1.000000] Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
[ 1.000000] 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017,
[ 1.000000] 2018, 2019, 2020, 2021, 2022
[ 1.000000] The NetBSD Foundation, Inc. All rights reserved.
[ 1.000000] Copyright (c) 1982, 1986, 1989, 1991, 1993
[ 1.000000] The Regents of the University of California. All rights reserved.
[ 1.000000] NetBSD 9.3 (KHAZAD_DUM) #3: Wed Aug 17 18:46:46 UTC 2022
[ 1.000000] root@khazad-dum.angband.thangorodrim.de:/net/build/khazad-dum/obj/sys/arch/i386/compile/KHAZAD_DUM
[ 1.000000] total memory = 255 MB
[ 1.000000] avail memory = 242 MB
[ 1.000000] rnd: seeded with 256 bits
[ 1.000000] timecounter: Timecounters tick every 10.000 msec
[ 1.000000] Kernelized RAIDframe activated
[ 1.000000] running cgd selftest aes-xts-256 aes-xts-512 done
[ 1.000000] timecounter: Timecounter "i8254" frequency 1193182 Hz quality 100
[ 1.000003] Generic PC
[ 1.000003] mainbus0 (root)
[ 1.000003] Firmware Error (ACPI): A valid RSDP was not found (20190405/tbxfroot-261)
[ 1.000003] autoconfiguration error: acpi_probe: failed to initialize tables
[ 1.000003] ACPI Error: Could not remove SCI handler (20190405/evmisc-312)
[ 1.000003] cpu0 at mainbus0
[ 1.000003] cpu0: Geode(TM) Integrated Processor by AMD PCS, id 0x5a2
[ 1.000003] cpu0: package 0, core 0, smt 0
[ 1.000003] pci0 at mainbus0 bus 0: configuration mode 1
[ 1.000003] pci0: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
[ 1.000003] pchb0 at pci0 dev 1 function 0: vendor 1022 product 2080 (rev. 0x33)
[ 1.000003] glxsb0 at pci0 dev 1 function 2: RNG AES
[ 1.000003] vr0 at pci0 dev 9 function 0: vendor 1106 product 3053 (rev. 0x96)
[ 1.000003] vr0: interrupting at irq 10
[ 1.000003] vr0: Ethernet address 00:0d:b9:19:a3:44
[ 1.000003] ukphy0 at vr0 phy 1: OUI 0x0002c6, model 0x0034, rev. 3
[ 1.000003] ukphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
[ 1.000003] vr1 at pci0 dev 11 function 0: vendor 1106 product 3053 (rev. 0x96)
[ 1.000003] vr1: interrupting at irq 15
[ 1.000003] vr1: Ethernet address 00:0d:b9:19:a3:45
[ 1.000003] ukphy1 at vr1 phy 1: OUI 0x0002c6, model 0x0034, rev. 3
[ 1.000003] ukphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
[ 1.000003] gcscpcib0 at pci0 dev 15 function 0: vendor 1022 product 2090 (rev. 0x03)
[ 1.000003] timecounter: Timecounter "gcscpcib0" frequency 3579545 Hz quality 1000
[ 1.006160] gcscpcib0: Watchdog Timer via MFGPT0, GPIO
[ 1.006160] gpio0 at gcscpcib0: 32 pins
[ 1.006160] viaide0 at pci0 dev 15 function 2: AMD CS5536 IDE Controller (rev. 0x01)
[ 1.006160] viaide0: bus-master DMA support present
[ 1.006160] viaide0: primary channel wired to compatibility mode
[ 1.006160] viaide0: primary channel interrupting at irq 14
[ 1.006160] atabus0 at viaide0 channel 0
[ 1.006160] viaide0: secondary channel wired to compatibility mode
[ 1.006160] viaide0: secondary channel ignored (disabled)
[ 1.006160] ohci0 at pci0 dev 15 function 4: vendor 1022 product 2094 (rev. 0x02)
[ 1.006160] ohci0: interrupting at irq 12
[ 1.006160] ohci0: OHCI version 1.0, legacy support
[ 1.006160] usb0 at ohci0: USB revision 1.0
[ 1.006160] gcscehci0 at pci0 dev 15 function 5: vendor 1022 product 2095 (rev. 0x02)
[ 1.006160] gcscehci0: interrupting at irq 12
[ 1.006160] gcscehci0: EHCI version 1.0
[ 1.006160] gcscehci0: 1 companion controller, 4 ports: ohci0
[ 1.006160] usb1 at gcscehci0: USB revision 2.0
[ 1.006160] isa0 at gcscpcib0
[ 1.006160] com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, working fifo
[ 1.006160] com0: console
[ 1.006160] com1 at isa0 port 0x2f8-0x2ff irq 3: ns16550a, working fifo
[ 1.006160] attimer0 at isa0 port 0x40-0x43
[ 1.006160] pcppi0 at isa0 port 0x61
[ 1.006160] midi0 at pcppi0: PC speaker
[ 1.006160] sysbeep0 at pcppi0
[ 1.006160] attimer0: attached to pcppi0
[ 1.006160] timecounter: Timecounter "clockinterrupt" frequency 100 Hz quality 0
[ 1.599963] uhub0 at usb0: NetBSD (0000) OHCI root hub (0000), class 9/0, rev 1.00/1.00, addr 1
[ 1.610805] uhub0: 4 ports with 4 removable, self powered
[ 1.610805] uhub1 at usb1: NetBSD (0000) EHCI root hub (0000), class 9/0, rev 2.00/1.00, addr 1
[ 1.610805] uhub1: 4 ports with 4 removable, self powered
[ 1.619966] IPsec: Initialized Security Association Processing.
[ 4.590009] wd0 at atabus0 drive 0
[ 4.590009] wd0: <SanDisk SDCFX3-008G>
[ 4.602862] wd0: drive supports 4-sector PIO transfers, LBA addressing
[ 4.602862] wd0: 7815 MB, 15880 cyl, 16 head, 63 sec, 512 bytes/sect x 16007040 sectors
[ 4.610660] wd0: 32-bit data port
[ 4.610660] wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 4 (Ultra/66)
[ 4.610660] wd0(viaide0:0:0): using PIO mode 4, Ultra-DMA mode 4 (Ultra/66) (using DMA)
[ 4.620014] WARNING: 1 error while detecting hardware; check system log.
[ 4.630593] boot device: wd0
[ 4.630593] root on wd0a dumps on wd0b
[ 4.640012] root file system type: ffs
[ 4.640012] kern.module.path=/stand/i386/9.3/modules
>
> While I don't know if there's an overlap here, there have been issues
> (with memory fragmentation) that can cause USB devices to not work in
> some contexts (on some hardware). E.g., if I plug a USB 3 external
> hard drive into a Raspberry Pi 3B+ on 9.3_STABLE, it fails to attach,
> whereas there are no such issues on a typical amd64 laptop also
> running 9.3_STABLE (different arch, different USB controller, 8x
> difference in RAM). See for instance
> http://gnats.netbsd.org/cgi-bin/query-pr-single.pl?number=3D54977
Invalid PR number, I assume you meant 54977? And while I can see memory
fragmentation being an issue with block devices, I wouldn't expect a
serial device to need that much memory to run into it, especially on
a freshly booted machine that runs little else (sshd, ntpd). Although
yes, this machine only does have 256M of RAM, so .. maybe?
Kind regards,
Alex.
--
"Opportunity is missed by most people because it is dressed in overalls and
looks like work." -- Thomas A. Edison
From: "David H. Gutteridge" <david@gutteridge.ca>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: port-i386/57662: StarTech ICUSB23208FD 8-Port USB-to-Serial
Adapter Hub fails on Alix with NetBSD/i386 9.3
Date: Mon, 23 Oct 2023 00:02:02 -0400
On Mon, 16 Oct 2023 at 09:14:45 +0200, Alexander Schreiber wrote:
[...]
>Invalid PR number, I assume you meant 54977? And while I can see memory
>fragmentation being an issue with block devices, I wouldn't expect a
>serial device to need that much memory to run into it, especially on
>a freshly booted machine that runs little else (sshd, ntpd). Although
>yes, this machine only does have 256M of RAM, so .. maybe?
Yes, that's Gnats not handling escaped/encoded content correctly,
should indeed be 54977.
There's perhaps more than one issue occurring here. While the final
"autoconfiguration" error you mentioned is quite generic, part of why I
mentioned the potential overlap is because that's the same symptom I
see when I'm able to initially mount a umass device successfully upon
initial boot, but some time later, the same device will be rejected
with that error. Could be completely different causes, of course.
I'd suggest testing with a -current kernel. (Sorry, I don't have any
particularly useful insights here; I do use USB-serial adapters, but
nothing quite like that.)
Regards,
Dave
From: Alexander Schreiber <als@thangorodrim.ch>
To: gnats-bugs@netbsd.org
Cc: port-i386-maintainer@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org
Subject: Re: port-i386/57662: StarTech ICUSB23208FD 8-Port USB-to-Serial
Adapter Hub fails on Alix with NetBSD/i386 9.3
Date: Mon, 23 Oct 2023 21:34:56 +0200
On Mon, Oct 23, 2023 at 04:05:01AM +0000, David H. Gutteridge wrote:
> The following reply was made to PR port-i386/57662; it has been noted by GNATS.
>
> From: "David H. Gutteridge" <david@gutteridge.ca>
> To: gnats-bugs@netbsd.org
> Cc:
> Subject: Re: port-i386/57662: StarTech ICUSB23208FD 8-Port USB-to-Serial
> Adapter Hub fails on Alix with NetBSD/i386 9.3
> Date: Mon, 23 Oct 2023 00:02:02 -0400
>
> On Mon, 16 Oct 2023 at 09:14:45 +0200, Alexander Schreiber wrote:
> [...]
> >Invalid PR number, I assume you meant 54977? And while I can see memory
> >fragmentation being an issue with block devices, I wouldn't expect a
> >serial device to need that much memory to run into it, especially on
> >a freshly booted machine that runs little else (sshd, ntpd). Although
> >yes, this machine only does have 256M of RAM, so .. maybe?
>
> Yes, that's Gnats not handling escaped/encoded content correctly,
> should indeed be 54977.
>
> There's perhaps more than one issue occurring here. While the final
> "autoconfiguration" error you mentioned is quite generic, part of why I
> mentioned the potential overlap is because that's the same symptom I
> see when I'm able to initially mount a umass device successfully upon
> initial boot, but some time later, the same device will be rejected
> with that error. Could be completely different causes, of course.
>
> I'd suggest testing with a -current kernel. (Sorry, I don't have any
> particularly useful insights here; I do use USB-serial adapters, but
> nothing quite like that.)
And I just did, I grabbed NetBSD 10.0_BETA i386 boot iso from:
http://nycdn.netbsd.org/pub/NetBSD-daily/netbsd-10/202310212330Z/i386/installation/cdrom/boot.iso
and installed it on another one of those Alix machines I've got kicking
around. Somewhat similar behaviour. I can do a few sessions with cu
(3-5 seems to be typical) and then, with the next session, cu just hangs
(sitting in status D, no output and so far I've been willing to wait a few
min but to no avail).
top says:
CPU states: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
Memory: 57M Act, 11M Wired, 13M Exec, 31M File, 138M Free
Swap: 512M Total, 512M Free / Pools: 22M Used
and again, nothing in dmesg.
uname -a says:
NetBSD machine-name 10.0_BETA NetBSD 10.0_BETA (GENERIC) #0: Sat Oct 21 13:01:00 UTC 2023 mkrepro@mkrepro.NetBSD.org:/usr/src/sys/arch/i386/compile/GENERIC i386
Kind regards,
Alex.
--
"Opportunity is missed by most people because it is dressed in overalls and
looks like work." -- Thomas A. Edison
From: matthew green <mrg@eterna.com.au>
To: Alexander Schreiber <als@thangorodrim.ch>
Cc: port-i386-maintainer@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org, gnats-bugs@netbsd.org
Subject: re: port-i386/57662: StarTech ICUSB23208FD 8-Port USB-to-Serial Adapter Hub fails on Alix with NetBSD/i386 9.3
Date: Tue, 24 Oct 2023 07:40:43 +1100
Alexander Schreiber writes:
> And I just did, I grabbed NetBSD 10.0_BETA i386 boot iso from:
>
> http://nycdn.netbsd.org/pub/NetBSD-daily/netbsd-10/202310212330Z/i386/in=
stallation/cdrom/boot.iso
>
> and installed it on another one of those Alix machines I've got kicking
> around. Somewhat similar behaviour. I can do a few sessions with cu
> (3-5 seems to be typical) and then, with the next session, cu just hangs
> (sitting in status D, no output and so far I've been willing to wait a f=
ew
> min but to no avail).
>
> top says:
>
> CPU states: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100%=
idle
>
> Memory: 57M Act, 11M Wired, 13M Exec, 31M File, 138M Free
> Swap: 512M Total, 512M Free / Pools: 22M Used
>
> and again, nothing in dmesg. =
so cu(1) is hung, let's find out why. you can use crash(8)
to do this fairly easily.
# crash
crash> ps|grep cu
26171 26171 3 1 180 ffffa11149cde640 cu poll
crash> bt/a ffffa11149cde640
trace: pid 26171 lid 26171 at 0xffffac019fc24d00
sleepq_block() at sleepq_block+0x15e
sel_do_scan() at sel_do_scan+0x62a
pollcommon() at pollcommon+0xd2
sys_poll() at sys_poll+0x57
syscall() at syscall+0x1ae
--- syscall (number 209) ---
syscall+0x1ae:
where the argument to "bt/a" is the address ps provided.
in my example, cu(1) is sitting in poll() like i'd expect,
can you show what your cu(1) backtraces are?
from the back trace we'll have an idea on the next step.
thanks.
.mrg.
From: Alexander Schreiber <als@thangorodrim.ch>
To: matthew green <mrg@eterna.com.au>
Cc: port-i386-maintainer@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org, gnats-bugs@netbsd.org
Subject: Re: port-i386/57662: StarTech ICUSB23208FD 8-Port USB-to-Serial
Adapter Hub fails on Alix with NetBSD/i386 9.3
Date: Wed, 25 Oct 2023 01:06:51 +0200
On Tue, Oct 24, 2023 at 07:40:43AM +1100, matthew green wrote:
> Alexander Schreiber writes:
> > And I just did, I grabbed NetBSD 10.0_BETA i386 boot iso from:
> >
> > http://nycdn.netbsd.org/pub/NetBSD-daily/netbsd-10/202310212330Z/i386/installation/cdrom/boot.iso
> >
> > and installed it on another one of those Alix machines I've got kicking
> > around. Somewhat similar behaviour. I can do a few sessions with cu
> > (3-5 seems to be typical) and then, with the next session, cu just hangs
> > (sitting in status D, no output and so far I've been willing to wait a few
> > min but to no avail).
> >
> > top says:
> >
> > CPU states: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
> >
> > Memory: 57M Act, 11M Wired, 13M Exec, 31M File, 138M Free
> > Swap: 512M Total, 512M Free / Pools: 22M Used
> >
> > and again, nothing in dmesg.
>
> so cu(1) is hung, let's find out why. you can use crash(8)
> to do this fairly easily.
>
> # crash
> crash> ps|grep cu
> 26171 26171 3 1 180 ffffa11149cde640 cu poll
> crash> bt/a ffffa11149cde640
> trace: pid 26171 lid 26171 at 0xffffac019fc24d00
> sleepq_block() at sleepq_block+0x15e
> sel_do_scan() at sel_do_scan+0x62a
> pollcommon() at pollcommon+0xd2
> sys_poll() at sys_poll+0x57
> syscall() at syscall+0x1ae
> --- syscall (number 209) ---
> syscall+0x1ae:
>
> where the argument to "bt/a" is the address ps provided.
> in my example, cu(1) is sitting in poll() like i'd expect,
> can you show what your cu(1) backtraces are?
>
> from the back trace we'll have an idea on the next step.
Did that. Interestingly, what I did this time:
- did one cu session, terminated it normally
- got distracted for at least an hour
- did another cu session, that promptly hung
invoked crash:
console# crash
Crash version 10.0_BETA, image version 10.0_BETA.
Kernel compiled without options LOCKDEBUG.
Output from a running system is unreliable.
crash> ps|grep cu
1524 1524 3 0 1000000 c260c640 cu usbxfer
0 17 3 0 200 c1eaa100 lnxrcugc lnxrcugc
crash> bt/a c260c640
trace: pid 1524 lid 1524 at 0xce80ab18
sleepq_block(0,0,c14afda4,c14afda4,0,c1d43100,c1ec1808,c1d43100,c1f0ce80,c1d4310
0) at sleepq_block+0x138
cv_wait(c1edbe84,c1ec1808,c1edbe78,1ee4c25,0,2,23d6c,0,ce80abf8,ce80abf8) at cv_
wait+0xc6
usbd_transfer(c1edbe78,0,0,0,ce80abb0,0,8,8,8,c1edbe78) at usbd_transfer+0x15c
usbd_do_request_len(c1f0ce00,ce80abf8,0,0,0,0,1388,ce80ac3c,c03ed0df,c1f0ce00) a
t usbd_do_request_len+0xad
usbd_do_request(c1f0ce00,ce80abf8,0,40,1,0,4af,c1eaf00c,dd313180,ce80ac70) at us
bd_do_request+0x3f
uftdi_open(c1f14340,1,0,c1ef5c00,0,0,4200,1841,0,c1ef5c00) at uftdi_open+0x4e
ucomopen(4200,0,7,2000,c260c640,c0d28fe2,c22d17c0,0,c1f153c0,100) at ucomopen+0x
307
cdev_open(4200,0,7,2000,c260c640,d,c22db040,42,c22b8dc0,0) at cdev_open+0x11c
spec_open(ce80ad90,0,c116aed8,c22db040,7,c2cf8640,7,c1fc7000,ce80ae9c,c0d18b64)
at spec_open+0x1ef
VOP_OPEN(c22db040,7,c2cf8640,7,c22db040,c2cf8640,c22db040,0,c1f1a590,c1fdd000) a
t VOP_OPEN+0x3e
vn_open(0,c1f1a590,10,7,0,ce80aed0,ce80aecb,ce80aed4,c1f1a590,c260c640) at vn_op
en+0x2ff
do_open(c260c640,0,c1f1a590,6,0,ce80af34,c260b5c0,0,c1f1a590,c260c640) at do_ope
n+0xa1
do_sys_openat(6,0,ce80af34,c14af694,ce80af9c,c049b4c6,c260c640,ce80af68,ce80af60
,c1f09b00) at do_sys_openat+0x72
sys_open(c260c640,ce80af68,ce80af60,c1f09b00,0,5,ce80af60,ce80af68,0,0) at sys_o
pen+0x2c
syscall() at syscall+0x1d6
--- syscall (number 5) ---
b8f54857:
I hope that is helpful. Let me know what else I can do to help.
Kind regards,
Alex.
--
"Opportunity is missed by most people because it is dressed in overalls and
looks like work." -- Thomas A. Edison
From: matthew green <mrg@eterna.com.au>
To: Alexander Schreiber <als@thangorodrim.ch>
Cc: port-i386-maintainer@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org, gnats-bugs@netbsd.org
Subject: re: port-i386/57662: StarTech ICUSB23208FD 8-Port USB-to-Serial Adapter Hub fails on Alix with NetBSD/i386 9.3
Date: Thu, 26 Oct 2023 08:45:34 +1100
> Did that. Interestingly, what I did this time:
> - did one cu session, terminated it normally
> - got distracted for at least an hour
> - did another cu session, that promptly hung
>
> invoked crash:
>
> console# crash
> Crash version 10.0_BETA, image version 10.0_BETA.
> Kernel compiled without options LOCKDEBUG.
> Output from a running system is unreliable.
> crash> ps|grep cu
> 1524 1524 3 0 1000000 c260c640 cu usbxfer
> 0 17 3 0 200 c1eaa100 lnxrcugc lnxrcug=
c
> crash> bt/a c260c640
> trace: pid 1524 lid 1524 at 0xce80ab18
> sleepq_block(0,0,c14afda4,c14afda4,0,c1d43100,c1ec1808,c1d43100,c1f0ce80=
,c1d43100) at sleepq_block+0x138
> cv_wait(c1edbe84,c1ec1808,c1edbe78,1ee4c25,0,2,23d6c,0,ce80abf8,ce80abf8=
) at cv_wait+0xc6
> usbd_transfer(c1edbe78,0,0,0,ce80abb0,0,8,8,8,c1edbe78) at usbd_transfer=
+0x15c
> usbd_do_request_len(c1f0ce00,ce80abf8,0,0,0,0,1388,ce80ac3c,c03ed0df,c1f=
0ce00) at usbd_do_request_len+0xad
> usbd_do_request(c1f0ce00,ce80abf8,0,40,1,0,4af,c1eaf00c,dd313180,ce80ac7=
0) at usbd_do_request+0x3f
> uftdi_open(c1f14340,1,0,c1ef5c00,0,0,4200,1841,0,c1ef5c00) at uftdi_open=
+0x4e
> ucomopen(4200,0,7,2000,c260c640,c0d28fe2,c22d17c0,0,c1f153c0,100) at uco=
mopen+0x307
[ ... ]
>
> I hope that is helpful. Let me know what else I can do to help.
OK, so seems like a transfer isn't completing like it should.
can you build a kernel with USB_DEBUG, UFTDI_DEBUG, which ever of
XHCI_DEBUG / UHCI_DEBUG / OHCI_DEBUG / EHCI_DEBUG the ufdti is
attached to, reboot without them plugged in, set the various
sysctl that will be eg hw.usb.debug / hw.xhci.debug / etc, to 1,
plug the devies in and try to trigger a hang. when it hangs, use
"vmstat -u usbhist" to obtain the usb history which should provide
a bunch of useful data to what has happened.
usbd_xfer_schedule_timeout() should have setup a timeout for this,
the default would be 5 i think with this via usbd_do_request().
.mrg.
From: Alexander Schreiber <als@thangorodrim.ch>
To: matthew green <mrg@eterna.com.au>
Cc: port-i386-maintainer@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org, gnats-bugs@netbsd.org
Subject: Re: port-i386/57662: StarTech ICUSB23208FD 8-Port USB-to-Serial
Adapter Hub fails on Alix with NetBSD/i386 9.3
Date: Fri, 27 Oct 2023 22:41:27 +0200
On Thu, Oct 26, 2023 at 08:45:34AM +1100, matthew green wrote:
> > Did that. Interestingly, what I did this time:
> > - did one cu session, terminated it normally
> > - got distracted for at least an hour
> > - did another cu session, that promptly hung
> >
> > invoked crash:
> >
> > console# crash
> > Crash version 10.0_BETA, image version 10.0_BETA.
> > Kernel compiled without options LOCKDEBUG.
> > Output from a running system is unreliable.
> > crash> ps|grep cu
> > 1524 1524 3 0 1000000 c260c640 cu usbxfer
> > 0 17 3 0 200 c1eaa100 lnxrcugc lnxrcugc
> > crash> bt/a c260c640
> > trace: pid 1524 lid 1524 at 0xce80ab18
> > sleepq_block(0,0,c14afda4,c14afda4,0,c1d43100,c1ec1808,c1d43100,c1f0ce80,c1d43100) at sleepq_block+0x138
> > cv_wait(c1edbe84,c1ec1808,c1edbe78,1ee4c25,0,2,23d6c,0,ce80abf8,ce80abf8) at cv_wait+0xc6
> > usbd_transfer(c1edbe78,0,0,0,ce80abb0,0,8,8,8,c1edbe78) at usbd_transfer+0x15c
> > usbd_do_request_len(c1f0ce00,ce80abf8,0,0,0,0,1388,ce80ac3c,c03ed0df,c1f0ce00) at usbd_do_request_len+0xad
> > usbd_do_request(c1f0ce00,ce80abf8,0,40,1,0,4af,c1eaf00c,dd313180,ce80ac70) at usbd_do_request+0x3f
> > uftdi_open(c1f14340,1,0,c1ef5c00,0,0,4200,1841,0,c1ef5c00) at uftdi_open+0x4e
> > ucomopen(4200,0,7,2000,c260c640,c0d28fe2,c22d17c0,0,c1f153c0,100) at ucomopen+0x307
> [ ... ]
> >
> > I hope that is helpful. Let me know what else I can do to help.
>
> OK, so seems like a transfer isn't completing like it should.
>
> can you build a kernel with USB_DEBUG, UFTDI_DEBUG, which ever of
> XHCI_DEBUG / UHCI_DEBUG / OHCI_DEBUG / EHCI_DEBUG the ufdti is
> attached to, reboot without them plugged in, set the various
> sysctl that will be eg hw.usb.debug / hw.xhci.debug / etc, to 1,
> plug the devies in and try to trigger a hang. when it hangs, use
> "vmstat -u usbhist" to obtain the usb history which should provide
> a bunch of useful data to what has happened.
>
> usbd_xfer_schedule_timeout() should have setup a timeout for this,
> the default would be 5 i think with this via usbd_do_request().
Ok, done.
This time it managed to get stuck right at the second connection attempt:
- establish first connection: cu -l /dev/ttyU0 -s 115200
--> I get the expected "Connected" and can interact with the
attached serial console, then I disconnect
- establish second connection: cu -l /dev/ttyU1 -s 115200
--> no output from cu, no response to the ~. disconnect sequence
--> wait a minute or so, then vmstat -u usbhist output dumped to file
diff -urN ./sys/arch/i386/conf/GENERIC ./sys/arch/i386/conf/CONSOLE
--- ./sys/arch/i386/conf/GENERIC 2023-10-09 23:33:46.000000000 +0000
+++ ./sys/arch/i386/conf/CONSOLE 2023-10-26 12:50:55.582874835 +0000
@@ -1519,3 +1519,12 @@
options PAX_MPROTECT=1 # PaX mprotect(2) restrictions
options PAX_MPROTECT_DEBUG=1 # PaX mprotect debug
options PAX_ASLR=1 # PaX Address Space Layout Randomization
+
+options USB_DEBUG
+options UFTDI_DEBUG
+options UHCI_DEBUG
+options XHCI_DEBUG
+options OHCI_DEBUG
+options EHCI_DEBUG
+
sysctl -a|grep debug|grep hw|grep -E '(usb|hci)'
hw.usb.debug = 1
hw.uhci.debug = 1
hw.ohci.debug = 1
hw.ehci.debug = 1
hw.xhci.debug = 1
dmesg when the device was plugged in:
Oct 27 19:31:36 console /netbsd: [ 339.2244729] uhub2 at uhub1 port 1: Terminus Technology (0x1a40) USB 2.0 Hub [MTT] (0x0101), class 9/0, rev 2.00/1.00, addr 2
Oct 27 19:31:36 console /netbsd: [ 339.2397117] uhub2: multiple transaction translators
Oct 27 19:31:36 console /netbsd: [ 339.2445789] uhub2: 4 ports with 4 removable, self powered
Oct 27 19:31:37 console /netbsd: [ 340.0845023] uftdi0 at uhub2 port 2 configuration 1 interface 0
Oct 27 19:31:37 console /netbsd: [ 340.0845023] uftdi0: FTDI (0x0403) USB <-> Serial Cable (0x6011), rev 2.00/8.00, addr 3
Oct 27 19:31:37 console /netbsd: [ 340.1002791] ucom0 at uftdi0 portno 1
Oct 27 19:31:37 console /netbsd: [ 340.1044987] uftdi1 at uhub2 port 2 configuration 1 interface 1
Oct 27 19:31:37 console /netbsd: [ 340.1044987] uftdi1: FTDI (0x0403) USB <-> Serial Cable (0x6011), rev 2.00/8.00, addr 3
Oct 27 19:31:37 console /netbsd: [ 340.1183381] ucom1 at uftdi1 portno 2
Oct 27 19:31:37 console /netbsd: [ 340.1183381] uftdi2 at uhub2 port 2 configuration 1 interface 2
Oct 27 19:31:37 console /netbsd: [ 340.1253221] uftdi2: FTDI (0x0403) USB <-> Serial Cable (0x6011), rev 2.00/8.00, addr 3
Oct 27 19:31:37 console /netbsd: [ 340.1364119] ucom2 at uftdi2 portno 3
Oct 27 19:31:37 console /netbsd: [ 340.1364119] uftdi3 at uhub2 port 2 configuration 1 interface 3
Oct 27 19:31:37 console /netbsd: [ 340.1453230] uftdi3: FTDI (0x0403) USB <-> Serial Cable (0x6011), rev 2.00/8.00, addr 3
Oct 27 19:31:37 console /netbsd: [ 340.1544997] ucom3 at uftdi3 portno 4
Oct 27 19:31:37 console /netbsd: [ 340.6545175] uftdi4 at uhub2 port 3 configuration 1 interface 0
Oct 27 19:31:37 console /netbsd: [ 340.6545175] uftdi4: FTDI (0x0403) USB <-> Serial Cable (0x6011), rev 2.00/8.00, addr 4
Oct 27 19:31:37 console /netbsd: [ 340.6710667] ucom4 at uftdi4 portno 1
Oct 27 19:31:37 console /netbsd: [ 340.6746100] uftdi5 at uhub2 port 3 configuration 1 interface 1
Oct 27 19:31:37 console /netbsd: [ 340.6746100] uftdi5: FTDI (0x0403) USB <-> Serial Cable (0x6011), rev 2.00/8.00, addr 4
Oct 27 19:31:37 console /netbsd: [ 340.6891425] ucom5 at uftdi5 portno 2
Oct 27 19:31:37 console /netbsd: [ 340.6891425] uftdi6 at uhub2 port 3 configuration 1 interface 2
Oct 27 19:31:37 console /netbsd: [ 340.6961357] uftdi6: FTDI (0x0403) USB <-> Serial Cable (0x6011), rev 2.00/8.00, addr 4
Oct 27 19:31:37 console /netbsd: [ 340.7072252] ucom6 at uftdi6 portno 3
Oct 27 19:31:37 console /netbsd: [ 340.7072252] uftdi7 at uhub2 port 3 configuration 1 interface 3
Oct 27 19:31:37 console /netbsd: [ 340.7147651] uftdi7: FTDI (0x0403) USB <-> Serial Cable (0x6011), rev 2.00/8.00, addr 4
Oct 27 19:31:37 console /netbsd: [ 340.7252493] ucom7 at uftdi7 portno 4
The output of "vmstat -u usbhist" is about 2.2M, and I'm not sure how gnats
deals with either attachments or hilariously long emails, so I've parked
that here: https://www.thangorodrim.ch/tmp/NetBSD10BETA_serial_vmstat-usbhist
Kind regards,
Alex.
--
"Opportunity is missed by most people because it is dressed in overalls and
looks like work." -- Thomas A. Edison
From: matthew green <mrg@eterna.com.au>
To: Alexander Schreiber <als@thangorodrim.ch>
Cc: port-i386-maintainer@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org, gnats-bugs@netbsd.org, riastradh@netbsd.org
Subject: re: port-i386/57662: StarTech ICUSB23208FD 8-Port USB-to-Serial Adapter Hub fails on Alix with NetBSD/i386 9.3
Date: Sat, 28 Oct 2023 09:25:56 +1100
OK, this at least agrees with what i thought was happening. from
> https://www.thangorodrim.ch/tmp/NetBSD10BETA_serial_vmstat-usbhist
all the useful stuff is near the very bottom, you can see the parts
of the 2nd to last successful transfer:
1698435122.516436 usbd_transfer#608@0: called: xfer =3D 0xc217fe78, flags =
=3D 0x2, pipe =3D 0xc3000100, running =3D 0
1698435122.516442 ehci_device_ctrl_start#82@0: called!
...
1698435122.516509 usbd_xfer_schedule_timeout#604@0: called: xfer 0xc217fe7=
8 timeout 5000ms set 0 reset 0
1698435122.516515 usbd_transfer#608@0: <- sleeping on xfer 0xc217fe78
1698435122.516796 ehci_intr#609@0: called!
...
1698435122.516958 usb_transfer_complete#613@0: <- done xfer 0xc217fe78, wa=
keup
but the next one doesn't get this far:
1698435126.993608 usbd_transfer#609@0: called: xfer =3D 0xc217fe78, flags =
=3D 0x2, pipe =3D 0xc3000100, running =3D 0
1698435126.993612 ehci_device_ctrl_start#83@0: called!
...
1698435126.993667 usbd_xfer_schedule_timeout#605@0: called: xfer 0xc217fe7=
8 timeout 5000ms set 0 reset 0
1698435126.993672 usbd_transfer#609@0: <- sleeping on xfer 0xc217fe78
1698435131.987617 usbd_xfer_timeout#1@0: called: xfer 0xc217fe78 status 1
but here it should complete with failure but the next logs are:
1698435131.987621 usbd_xfer_probe_timeout#1@0: called: xfer 0xc217fe78 tim=
eout 5000ms set 1 reset 0
1698435131.987625 usbd_xfer_probe_timeout#1@0: xfer 0xc217fe78 timing out
1698435131.987628 usbd_xfer_timeout#1@0: xfer 0xc217fe78 schedule timeout =
task
1698435131.987631 usb_add_task#1@0: called!
1698435131.987652 usbd_xfer_timeout_task#1@0: called: xfer 0xc217fe78 stat=
us 1
1698435131.987655 usbd_xfer_probe_timeout#2@0: called: xfer 0xc217fe78 tim=
eout 5000ms set 1 reset 0
1698435131.987658 usbd_xfer_probe_timeout#2@0: xfer 0xc217fe78 timing out
1698435131.987661 usbd_xfer_timeout_task#1@0: xfer 0xc217fe78 timed out
1698435131.987664 ehci_abortx#2@0: called!
1698435131.987667 ehci_abortx#2@0: xfer=3D0xc217fe78 pipe=3D0xc3000100
1698435131.987675 ehci_sync_hc#8@0: called!
1698435131.987679 ehci_sync_hc#8@0: cmd =3D 0x00020071 sts =3D 0x0000c008
there's no completion event here, even though it timed out. Taylor,
any ideas off hand?
.mrg.
From: Taylor R Campbell <riastradh@NetBSD.org>
To: Alexander Schreiber <als@thangorodrim.ch>
Cc: matthew green <mrg@eterna.com.au>,
port-i386-maintainer@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org, gnats-bugs@netbsd.org
Subject: Re: port-i386/57662: StarTech ICUSB23208FD 8-Port USB-to-Serial Adapter Hub fails on Alix with NetBSD/i386 9.3
Date: Sat, 28 Oct 2023 01:51:45 +0000
This is a multi-part message in MIME format.
--=_3GH/d8a4GiEuk6sY511/abKYWvHzDwXu
Can you please try the attached patch and see if it makes a
difference?
Note that it might not fix whatever the underlying problem is, but it
should at least cause the timeout on ehci sync to work instead of
hanging forever.
--=_3GH/d8a4GiEuk6sY511/abKYWvHzDwXu
Content-Type: text/plain; charset="ISO-8859-1"; name="pr57662-ehcisyncloop"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment; filename="pr57662-ehcisyncloop.patch"
From 2ccb4d42055737abfb68af0222d8871cbbc8cb6e Mon Sep 17 00:00:00 2001
From: Taylor R Campbell <riastradh@NetBSD.org>
Date: Sat, 28 Oct 2023 01:42:37 +0000
Subject: [PATCH] ehci(4): Fix cv_timedwait loop in ehci_sync_hc.
Stop when
now - starttime >=3D delta,
i.e., when at least delta ticks have elapsed since the start, not
when
endtime - now > delta,
i.e., more than delta ticks _remain_ to sleep, which is never going
to happen (except on arithmetic overflow).
As is, what will happen in the case that should time out is that we
wake up after delta ticks, and find now =3D getticks() is exactly
endtime, so we retry cv_timedwait with timo=3D(endtime - now)=3D0 which
means sleep indefinitely with no timeout as if with cv_wait.
PR kern/57662
XXX pullup-10
---
sys/dev/usb/ehci.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/sys/dev/usb/ehci.c b/sys/dev/usb/ehci.c
index 111957b34bd8..b59ad33b6395 100644
--- a/sys/dev/usb/ehci.c
+++ b/sys/dev/usb/ehci.c
@@ -2277,7 +2277,7 @@ ehci_sync_hc(ehci_softc_t *sc)
*/
while (sc->sc_doorbelllwp =3D=3D curlwp) {
now =3D getticks();
- if (endtime - now > delta) {
+ if (now - starttime >=3D delta) {
sc->sc_doorbelllwp =3D NULL;
cv_signal(&sc->sc_doorbell);
DPRINTF("doorbell timeout", 0, 0, 0, 0);
--=_3GH/d8a4GiEuk6sY511/abKYWvHzDwXu--
From: Taylor R Campbell <riastradh@NetBSD.org>
To: Alexander Schreiber <als@thangorodrim.ch>
Cc: matthew green <mrg@eterna.com.au>,
port-i386-maintainer@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org, gnats-bugs@netbsd.org
Subject: Re: port-i386/57662: StarTech ICUSB23208FD 8-Port USB-to-Serial Adapter Hub fails on Alix with NetBSD/i386 9.3
Date: Sat, 28 Oct 2023 02:05:02 +0000
This is a multi-part message in MIME format.
--=_iH22Mq8cEGadrlwrn55fhppMX7+AHtfW
Also: Can you try this patch on top as well?
--=_iH22Mq8cEGadrlwrn55fhppMX7+AHtfW
Content-Type: text/plain; charset="ISO-8859-1"; name="pr57662-ehcisyncloop.2"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment; filename="pr57662-ehcisyncloop.2.patch"
From fe71f2d88628e0a801d04fc4ad4b4667272c065f Mon Sep 17 00:00:00 2001
From: Taylor R Campbell <riastradh@NetBSD.org>
Date: Sat, 28 Oct 2023 02:01:10 +0000
Subject: [PATCH] ehci(4): Fix bug causing missed wakeups since ehci.c 1.308.
For reasons beyond me now, I used cv_signal on the same cv that is
used to wait for the doorbell to be available _and_ to wait for the
host controller to acknowledge the doorbell. Which means when the
host controller acknowledges the doorbell, we might wake some thread
waiting for the doorbell to be available -- and leave the thread
waiting for the doorbell acknowledgment hanging indefinitely.
PR port-i386/57662
XXX pullup-10
---
sys/dev/usb/ehci.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/sys/dev/usb/ehci.c b/sys/dev/usb/ehci.c
index b59ad33b6395..3ffdec9c6657 100644
--- a/sys/dev/usb/ehci.c
+++ b/sys/dev/usb/ehci.c
@@ -826,7 +826,7 @@ ehci_doorbell(void *addr)
if (sc->sc_doorbelllwp =3D=3D NULL)
DPRINTF("spurious doorbell interrupt", 0, 0, 0, 0);
sc->sc_doorbelllwp =3D NULL;
- cv_signal(&sc->sc_doorbell);
+ cv_broadcast(&sc->sc_doorbell);
mutex_exit(&sc->sc_lock);
}
=20
@@ -2279,7 +2279,7 @@ ehci_sync_hc(ehci_softc_t *sc)
now =3D getticks();
if (now - starttime >=3D delta) {
sc->sc_doorbelllwp =3D NULL;
- cv_signal(&sc->sc_doorbell);
+ cv_broadcast(&sc->sc_doorbell);
DPRINTF("doorbell timeout", 0, 0, 0, 0);
#ifdef DIAGNOSTIC /* XXX DIAGNOSTIC abuse, do this differently */
printf("ehci_sync_hc: timed out\n");
--=_iH22Mq8cEGadrlwrn55fhppMX7+AHtfW--
From: Alexander Schreiber <als@thangorodrim.ch>
To: Taylor R Campbell <riastradh@netbsd.org>
Cc: matthew green <mrg@eterna.com.au>, port-i386-maintainer@netbsd.org,
gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
gnats-bugs@netbsd.org
Subject: Re: port-i386/57662: StarTech ICUSB23208FD 8-Port USB-to-Serial
Adapter Hub fails on Alix with NetBSD/i386 9.3
Date: Sat, 28 Oct 2023 22:26:06 +0200
On Sat, Oct 28, 2023 at 02:05:02AM +0000, Taylor R Campbell wrote:
> Also: Can you try this patch on top as well?
> >From fe71f2d88628e0a801d04fc4ad4b4667272c065f Mon Sep 17 00:00:00 2001
> From: Taylor R Campbell <riastradh@NetBSD.org>
> Date: Sat, 28 Oct 2023 02:01:10 +0000
> Subject: [PATCH] ehci(4): Fix bug causing missed wakeups since ehci.c 1.308.
>
> For reasons beyond me now, I used cv_signal on the same cv that is
> used to wait for the doorbell to be available _and_ to wait for the
> host controller to acknowledge the doorbell. Which means when the
> host controller acknowledges the doorbell, we might wake some thread
> waiting for the doorbell to be available -- and leave the thread
> waiting for the doorbell acknowledgment hanging indefinitely.
>
> PR port-i386/57662
>
> XXX pullup-10
> ---
> sys/dev/usb/ehci.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
Both patches applied and things do look better. Now, instead of hanging
forever, after the second (so, at the third) connection attempt, cu
errors out with: "cu: link down" after a few seconds. This behavior then
sticks.
Relevant snippet from the trace:
1699 1699 cu 0.000010616 CALL open(0xbfb9a56e,6,0)
1699 1699 cu 0.000011733 NAMI "/dev/ttyU1"
1699 1699 cu 11.991814027 RET open -1 errno 5 Input/output error
vmstat usbhist dump at:
https://www.thangorodrim.ch/tmp/NetBSD10BETA_serial_vmstat-usbhist_patch-1
Kind regards,
Alex.
--
"Opportunity is missed by most people because it is dressed in overalls and
looks like work." -- Thomas A. Edison
From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/57662 CVS commit: src/sys/dev/usb
Date: Sat, 28 Oct 2023 21:18:15 +0000
Module Name: src
Committed By: riastradh
Date: Sat Oct 28 21:18:15 UTC 2023
Modified Files:
src/sys/dev/usb: ehci.c
Log Message:
ehci(4): Fix cv_timedwait loop in ehci_sync_hc.
Stop when
now - starttime >= delta,
i.e., when at least delta ticks have elapsed since the start, not
when
endtime - now > delta,
i.e., more than delta ticks _remain_ to sleep, which is never going
to happen (except on arithmetic overflow).
As is, what will happen in the case that should time out is that we
wake up after delta ticks, and find now = getticks() is exactly
endtime, so we retry cv_timedwait with timo=(endtime - now)=0 which
means sleep indefinitely with no timeout as if with cv_wait.
PR port-i386/57662
XXX pullup-10
To generate a diff of this commit:
cvs rdiff -u -r1.317 -r1.318 src/sys/dev/usb/ehci.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/57662 CVS commit: src/sys/dev/usb
Date: Sat, 28 Oct 2023 21:18:32 +0000
Module Name: src
Committed By: riastradh
Date: Sat Oct 28 21:18:32 UTC 2023
Modified Files:
src/sys/dev/usb: ehci.c
Log Message:
ehci(4): Fix bug causing missed wakeups since ehci.c 1.308.
For reasons beyond me now, I used cv_signal on the same cv that is
used to wait for the doorbell to be available _and_ to wait for the
host controller to acknowledge the doorbell. Which means when the
host controller acknowledges the doorbell, we might wake some thread
waiting for the doorbell to be available -- and leave the thread
waiting for the doorbell acknowledgment hanging indefinitely.
PR port-i386/57662
XXX pullup-10
To generate a diff of this commit:
cvs rdiff -u -r1.318 -r1.319 src/sys/dev/usb/ehci.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: Taylor R Campbell <riastradh@NetBSD.org>
To: Alexander Schreiber <als@thangorodrim.ch>
Cc: matthew green <mrg@eterna.com.au>,
port-i386-maintainer@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org, gnats-bugs@netbsd.org
Subject: Re: port-i386/57662: StarTech ICUSB23208FD 8-Port USB-to-Serial
Adapter Hub fails on Alix with NetBSD/i386 9.3
Date: Sat, 28 Oct 2023 21:29:09 +0000
> Date: Sat, 28 Oct 2023 22:26:06 +0200
> From: Alexander Schreiber <als@thangorodrim.ch>
>=20
> Both patches applied and things do look better. Now, instead of hanging
> forever, after the second (so, at the third) connection attempt, cu
> errors out with: "cu: link down" after a few seconds. This behavior then
> sticks.
>=20
> Relevant snippet from the trace:
>=20
> 1699 1699 cu 0.000010616 CALL open(0xbfb9a56e,6,0)
> 1699 1699 cu 0.000011733 NAMI "/dev/ttyU1"
> 1699 1699 cu 11.991814027 RET open -1 errno 5 Input/output er=
ror
>=20
> vmstat usbhist dump at:
>=20
> https://www.thangorodrim.ch/tmp/NetBSD10BETA_serial_vmstat-usbhist_patch-1
Great, thanks! So we've fixed the logic to abort on timeout that I
broke last year (which is good!), but now we have to figure out why
the transfer is timing out at all.
From: Alexander Schreiber <als@thangorodrim.ch>
To: Taylor R Campbell <riastradh@netbsd.org>
Cc: matthew green <mrg@eterna.com.au>, port-i386-maintainer@netbsd.org,
gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
gnats-bugs@netbsd.org
Subject: Re: port-i386/57662: StarTech ICUSB23208FD 8-Port USB-to-Serial
Adapter Hub fails on Alix with NetBSD/i386 9.3
Date: Sun, 29 Oct 2023 22:14:32 +0100
On Sat, Oct 28, 2023 at 09:29:09PM +0000, Taylor R Campbell wrote:
> > Date: Sat, 28 Oct 2023 22:26:06 +0200
> > From: Alexander Schreiber <als@thangorodrim.ch>
> >
> > Both patches applied and things do look better. Now, instead of hanging
> > forever, after the second (so, at the third) connection attempt, cu
> > errors out with: "cu: link down" after a few seconds. This behavior then
> > sticks.
> >
> > Relevant snippet from the trace:
> >
> > 1699 1699 cu 0.000010616 CALL open(0xbfb9a56e,6,0)
> > 1699 1699 cu 0.000011733 NAMI "/dev/ttyU1"
> > 1699 1699 cu 11.991814027 RET open -1 errno 5 Input/output error
> >
> > vmstat usbhist dump at:
> >
> > https://www.thangorodrim.ch/tmp/NetBSD10BETA_serial_vmstat-usbhist_patch-1
>
> Great, thanks! So we've fixed the logic to abort on timeout that I
> broke last year (which is good!), but now we have to figure out why
> the transfer is timing out at all.
Small update: I rebuilt the kernel (stripped out a lot of unneeded stuff
from the config to get build time down and enable DIAGNOSTICS) and redid
the test. Now, while cu hangs, I see matching complaints in dmesg:
[ 394.1174957] ehci_sync_hc: timed out
[ 400.1176959] ehci_sync_hc: timed out
[ 406.1178962] ehci_sync_hc: timed out
and then cu comes back with "cu: link down". I didn't see those kernel
messages before, so I guess that is courtesy of DIAGNOSTICS.
Kind regards,
Alex.
--
"Opportunity is missed by most people because it is dressed in overalls and
looks like work." -- Thomas A. Edison
From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/57662 CVS commit: [netbsd-10] src/sys/dev/usb
Date: Mon, 30 Oct 2023 17:45:11 +0000
Module Name: src
Committed By: martin
Date: Mon Oct 30 17:45:10 UTC 2023
Modified Files:
src/sys/dev/usb [netbsd-10]: ehci.c
Log Message:
Pull up following revision(s) (requested by riastradh in ticket #440):
sys/dev/usb/ehci.c: revision 1.318
sys/dev/usb/ehci.c: revision 1.319
ehci(4): Fix cv_timedwait loop in ehci_sync_hc.
Stop when
now - starttime >= delta,
i.e., when at least delta ticks have elapsed since the start, not
when
endtime - now > delta,
i.e., more than delta ticks _remain_ to sleep, which is never going
to happen (except on arithmetic overflow).
As is, what will happen in the case that should time out is that we
wake up after delta ticks, and find now = getticks() is exactly
endtime, so we retry cv_timedwait with timo=(endtime - now)=0 which
means sleep indefinitely with no timeout as if with cv_wait.
PR port-i386/57662
ehci(4): Fix bug causing missed wakeups since ehci.c 1.308.
For reasons beyond me now, I used cv_signal on the same cv that is
used to wait for the doorbell to be available _and_ to wait for the
host controller to acknowledge the doorbell. Which means when the
host controller acknowledges the doorbell, we might wake some thread
waiting for the doorbell to be available -- and leave the thread
waiting for the doorbell acknowledgment hanging indefinitely.
PR port-i386/57662
To generate a diff of this commit:
cvs rdiff -u -r1.315.2.1 -r1.315.2.2 src/sys/dev/usb/ehci.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: Alexander Schreiber <als@thangorodrim.ch>
To: Taylor R Campbell <riastradh@netbsd.org>
Cc: matthew green <mrg@eterna.com.au>, port-i386-maintainer@netbsd.org,
gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
gnats-bugs@netbsd.org
Subject: Re: port-i386/57662: StarTech ICUSB23208FD 8-Port USB-to-Serial
Adapter Hub fails on Alix with NetBSD/i386 9.3
Date: Mon, 13 Nov 2023 09:12:00 +0100
On Sat, Oct 28, 2023 at 09:29:09PM +0000, Taylor R Campbell wrote:
> > Date: Sat, 28 Oct 2023 22:26:06 +0200
> > From: Alexander Schreiber <als@thangorodrim.ch>
> >
> > Both patches applied and things do look better. Now, instead of hanging
> > forever, after the second (so, at the third) connection attempt, cu
> > errors out with: "cu: link down" after a few seconds. This behavior then
> > sticks.
> >
> > Relevant snippet from the trace:
> >
> > 1699 1699 cu 0.000010616 CALL open(0xbfb9a56e,6,0)
> > 1699 1699 cu 0.000011733 NAMI "/dev/ttyU1"
> > 1699 1699 cu 11.991814027 RET open -1 errno 5 Input/output error
> >
> > vmstat usbhist dump at:
> >
> > https://www.thangorodrim.ch/tmp/NetBSD10BETA_serial_vmstat-usbhist_patch-1
>
> Great, thanks! So we've fixed the logic to abort on timeout that I
> broke last year (which is good!), but now we have to figure out why
> the transfer is timing out at all.
I've done another test: Boot up machine, plug in USB2serial device,
try some connections, wait for them to time out and things to settle.
Unplug USB2serial device, wait a bit for things to settle, plug it
back in. Result:
- unplugging the device is noticed and the expected messages show
up in the kernel log, but with a scattering of time out messages
- plugging the device yields USB problems, ports disabled, device
no longer recognized
So, maybe the hardware side of the USB stack for this little Geode CPU
manages to get wedged?
Syslog snippet attached:
--------------------- /var/log/messages begins ----------------------
Oct 31 14:47:42 console /netbsd: [ 394.1174957] ehci_sync_hc: timed out
Oct 31 14:47:42 console /netbsd: [ 400.1176959] ehci_sync_hc: timed out
Oct 31 14:47:42 console /netbsd: [ 406.1178962] ehci_sync_hc: timed out
Oct 31 14:47:42 console /netbsd: [ 154334.7451176] ehci_sync_hc: timed out
Oct 31 14:47:42 console /netbsd: [ 154335.7451510] ehci_sync_hc: timed out
Oct 31 14:47:42 console /netbsd: [ 154335.7451510] ucom0: detached
Oct 31 14:47:42 console /netbsd: [ 154335.7451510] uftdi0: detached
Oct 31 14:47:42 console /netbsd: [ 154335.7551535] uftdi0: at uhub2 port 2 (addr 3) disconnected
Oct 31 14:47:44 console /netbsd: [ 154336.7551844] ehci_sync_hc: timed out
Oct 31 14:47:44 console /netbsd: [ 154337.7552181] ehci_sync_hc: timed out
Oct 31 14:47:44 console /netbsd: [ 154337.7552181] ucom1: detached
Oct 31 14:47:44 console /netbsd: [ 154337.7552181] uftdi1: detached
Oct 31 14:47:44 console /netbsd: [ 154337.7652254] uftdi1: at uhub2 port 2 (addr 3) disconnected
Oct 31 14:47:46 console /netbsd: [ 154338.7652515] ehci_sync_hc: timed out
Oct 31 14:47:46 console /netbsd: [ 154339.7652849] ehci_sync_hc: timed out
Oct 31 14:47:46 console /netbsd: [ 154339.7652849] ucom2: detached
Oct 31 14:47:46 console /netbsd: [ 154339.7652849] uftdi2: detached
Oct 31 14:47:46 console /netbsd: [ 154339.7752871] uftdi2: at uhub2 port 2 (addr 3) disconnected
Oct 31 14:47:48 console /netbsd: [ 154340.7753188] ehci_sync_hc: timed out
Oct 31 14:47:48 console /netbsd: [ 154341.7753519] ehci_sync_hc: timed out
Oct 31 14:47:48 console /netbsd: [ 154341.7753519] ucom3: detached
Oct 31 14:47:48 console /netbsd: [ 154341.7753519] uftdi3: detached
Oct 31 14:47:48 console /netbsd: [ 154341.7853609] uftdi3: at uhub2 port 2 (addr 3) disconnected
Oct 31 14:47:51 console /netbsd: [ 154342.7853857] ehci_sync_hc: timed out
Oct 31 14:47:51 console /netbsd: [ 154343.7854193] ehci_sync_hc: timed out
Oct 31 14:47:51 console /netbsd: [ 154344.7854541] ehci_sync_hc: timed out
Oct 31 14:47:51 console /netbsd: [ 154344.7854541] ucom4: detached
Oct 31 14:47:51 console /netbsd: [ 154344.7854541] uftdi4: detached
Oct 31 14:47:51 console /netbsd: [ 154344.7854541] uftdi4: at uhub2 port 3 (addr 4) disconnected
Oct 31 14:47:53 console /netbsd: [ 154345.7954864] ehci_sync_hc: timed out
Oct 31 14:47:53 console /netbsd: [ 154346.7955198] ehci_sync_hc: timed out
Oct 31 14:47:53 console /netbsd: [ 154346.7955198] ucom5: detached
Oct 31 14:47:53 console /netbsd: [ 154346.7955198] uftdi5: detached
Oct 31 14:47:53 console /netbsd: [ 154346.8055240] uftdi5: at uhub2 port 3 (addr 4) disconnected
Oct 31 14:47:55 console /netbsd: [ 154347.8055532] ehci_sync_hc: timed out
Oct 31 14:47:55 console /netbsd: [ 154348.8055869] ehci_sync_hc: timed out
Oct 31 14:47:55 console /netbsd: [ 154348.8055869] ucom6: detached
Oct 31 14:47:55 console /netbsd: [ 154348.8055869] uftdi6: detached
Oct 31 14:47:55 console /netbsd: [ 154348.8155903] uftdi6: at uhub2 port 3 (addr 4) disconnected
Oct 31 14:47:57 console /netbsd: [ 154349.8156214] ehci_sync_hc: timed out
Oct 31 14:47:57 console /netbsd: [ 154350.8156548] ehci_sync_hc: timed out
Oct 31 14:47:57 console /netbsd: [ 154350.8156548] ucom7: detached
Oct 31 14:47:57 console /netbsd: [ 154350.8156548] uftdi7: detached
Oct 31 14:47:57 console /netbsd: [ 154350.8256604] uftdi7: at uhub2 port 3 (addr 4) disconnected
Oct 31 14:48:00 console /netbsd: [ 154351.8256876] ehci_sync_hc: timed out
Oct 31 14:48:00 console /netbsd: [ 154352.8257210] ehci_sync_hc: timed out
Oct 31 14:48:00 console /netbsd: [ 154353.8257544] ehci_sync_hc: timed out
Oct 31 14:48:00 console /netbsd: [ 154353.8257544] uhub2: detached
Oct 31 14:48:00 console /netbsd: [ 154353.8257544] uhub2: at uhub1 port 1 (addr 2) disconnected
Oct 31 14:48:30 console ntpd[732]: Soliciting pool server 195.186.1.101
Oct 31 14:49:13 console /netbsd: [ 154354.8357878] ehci_sync_hc: timed out
Oct 31 14:49:13 console /netbsd: [ 154369.6462824] ehci_sync_hc: timed out
Oct 31 14:49:13 console /netbsd: [ 154375.8564896] ehci_sync_hc: timed out
Oct 31 14:49:13 console /netbsd: [ 154382.0666969] ehci_sync_hc: timed out
Oct 31 14:49:13 console /netbsd: [ 154388.2769042] ehci_sync_hc: timed out
Oct 31 14:49:13 console /netbsd: [ 154394.8471234] ehci_sync_hc: timed out
Oct 31 14:49:13 console /netbsd: [ 154401.0573307] ehci_sync_hc: timed out
Oct 31 14:49:13 console /netbsd: [ 154407.2675380] ehci_sync_hc: timed out
Oct 31 14:49:13 console /netbsd: [ 154413.4777449] ehci_sync_hc: timed out
Oct 31 14:49:13 console /netbsd: [ 154413.9677616] gcscehci0: handing over full speed device on port 1 to ohci0
Oct 31 14:49:13 console /netbsd: [ 154420.0279638] ehci_sync_hc: timed out
Oct 31 14:49:13 console /netbsd: [ 154426.2381711] ehci_sync_hc: timed out
Oct 31 14:49:13 console /netbsd: [ 154427.4482115] ehci_sync_hc: timed out
Oct 31 14:49:13 console /netbsd: [ 154427.4482115] uhub1: autoconfiguration error: device problem, disabling port 1
Oct 31 14:49:34 console ntpd[732]: Soliciting pool server 81.6.2.165
Oct 31 14:50:00 console cron[5467]: (root) CMD START (/usr/libexec/atrun)
Oct 31 14:50:00 console cron[4190]: (root) CMD FINISH (/usr/libexec/atrun)
Oct 31 14:50:39 console ntpd[732]: Soliciting pool server 81.94.123.16
Oct 31 14:50:58 console /netbsd: [ 154474.8897948] ehci_sync_hc: timed out
Oct 31 14:50:58 console /netbsd: [ 154481.1000032] ehci_sync_hc: timed out
Oct 31 14:50:58 console /netbsd: [ 154487.3102093] ehci_sync_hc: timed out
Oct 31 14:50:58 console /netbsd: [ 154493.5204177] ehci_sync_hc: timed out
Oct 31 14:50:58 console /netbsd: [ 154494.0104329] gcscehci0: handing over full speed device on port 1 to ohci0
Oct 31 14:50:58 console /netbsd: [ 154500.0706352] ehci_sync_hc: timed out
Oct 31 14:50:58 console /netbsd: [ 154506.2808422] ehci_sync_hc: timed out
Oct 31 14:50:58 console /netbsd: [ 154512.4910497] ehci_sync_hc: timed out
Oct 31 14:50:58 console /netbsd: [ 154518.7012567] ehci_sync_hc: timed out
Oct 31 14:50:58 console /netbsd: [ 154519.1912733] gcscehci0: handing over full speed device on port 1 to ohci0
Oct 31 14:50:58 console /netbsd: [ 154525.2514756] ehci_sync_hc: timed out
Oct 31 14:50:58 console /netbsd: [ 154531.4616828] ehci_sync_hc: timed out
Oct 31 14:50:58 console /netbsd: [ 154532.6717232] ehci_sync_hc: timed out
Oct 31 14:50:58 console /netbsd: [ 154532.6717232] uhub1: autoconfiguration error: device problem, disabling port 1
--------------------- /var/log/messages ends ----------------------
Kind regards,
Alex.
--
"Opportunity is missed by most people because it is dressed in overalls and
looks like work." -- Thomas A. Edison
(Contact us)
$NetBSD: query-full-pr,v 1.47 2022/09/11 19:34:41 kim Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2023
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.