NetBSD Problem Report #59497
From stix@stix.id.au Tue Jul 1 09:19:42 2025
Return-Path: <stix@stix.id.au>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256
client-signature RSA-PSS (2048 bits) client-digest SHA256)
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id 22F561A923A
for <gnats-bugs@gnats.NetBSD.org>; Tue, 1 Jul 2025 09:19:42 +0000 (UTC)
Message-Id: <20250701091227.833741A010@stix.id.au>
Date: Tue, 1 Jul 2025 19:12:27 +1000 (AEST)
From: stix@stix.id.au
Reply-To: stix@stix.id.au
To: gnats-bugs@NetBSD.org
Subject: Panic in ucompoll
X-Send-Pr-Version: 3.95
>Number: 59497
>Notify-List: bad@bsd.de
>Category: kern
>Synopsis: Panic in ucompoll
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Tue Jul 01 09:20:00 +0000 2025
>Last-Modified: Sun Jul 20 13:25:01 +0000 2025
>Originator: Paul Ripke
>Release: NetBSD 10.1_STABLE
>Organization:
Paul Ripke
"Great minds discuss ideas, average minds discuss events, small minds
discuss people."
-- Disputed: Often attributed to Eleanor Roosevelt. 1948.
>Environment:
System: NetBSD slave 10.1_STABLE NetBSD 10.1_STABLE (SLAVE) #17: Fri Apr 18 13:51:35 AEST 2025 stix@slave:/home/netbsd/netbsd-10/obj.amd64/home/netbsd/netbsd-10/src/sys/arch/amd64/compile/SLAVE amd64
Architecture: x86_64
Machine: amd64
>Description:
Crash appears due to intermittent disconnect/reconnect of a uplcom device while open.
Device is:
addr 54: full speed, power 100 mA, config 1, USB-Serial Controller D(0x2303), Prolific Technology Inc.(0x067b), rev 4.00(0x0400)
uplcom0 at uhub9 port 1
uplcom0: Prolific Technology Inc. (0x067b) USB-Serial Controller D (0x2303), rev 1.10/4.00, addr 5
ucom0 at uplcom0
Periodically:
Jun 28 14:27:01 slave /netbsd: [ 2332157.9354694] ucom2: detached
Jun 28 14:27:01 slave /netbsd: [ 2332157.9354694] uplcom1: detached
Jun 28 14:27:01 slave /netbsd: [ 2332157.9354694] uplcom1: at uhub1 port 8 (addr 52) disconnected
Jun 28 14:27:10 slave /netbsd: [ 2332166.7886134] uplcom1 at uhub1 port 8
Jun 28 14:27:10 slave /netbsd: [ 2332166.7886134] uplcom1: Prolific Technology Inc. (0x067b) USB-Serial Controller D (0x2303), rev 1.10/4.00, addr 53
Jun 28 14:27:10 slave /netbsd: [ 2332166.8096137] ucom2 at uplcom1
Jun 28 14:27:10 slave /netbsd: [ 2332166.8246139] ucom2: detached
Jun 28 14:27:10 slave /netbsd: [ 2332166.8246139] uplcom1: detached
Jun 28 14:27:10 slave /netbsd: [ 2332166.8246139] uplcom1: at uhub1 port 8 (addr 53) disconnected
Jun 28 14:27:11 slave /netbsd: [ 2332167.3396223] uplcom1 at uhub1 port 8
Jun 28 14:27:11 slave /netbsd: [ 2332167.3396223] uplcom1: Prolific Technology Inc. (0x067b) USB-Serial Controller D (0x2303), rev 1.10/4.00, addr 54
Jun 28 14:27:11 slave /netbsd: [ 2332167.3606226] ucom2 at uplcom1
crash> bt
__kernel_end() at 0
kern_reboot() at sys_reboot
vpanic() at vpanic+0x18d
panic() at vprintf
trap() at startlwp
--- trap (number 6) ---
ucompoll() at ucompoll+0x2a
cdev_poll() at cdev_poll+0x87
spec_poll() at spec_poll+0x6a
VOP_POLL() at VOP_POLL+0x5d
sel_do_scan() at sel_do_scan+0x3ba
selcommon() at selcommon+0x309
sys___select50() at sys___select50+0x75
syscall() at syscall+0x1fc
--- syscall (number 417) ---
syscall+0x1fc:
Have core and kernel with symbols.
>How-To-Repeat:
>Fix:
>Release-Note:
>Audit-Trail:
From: Christoph Badura <bad@bsd.de>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/59497: Panic in ucompoll
Date: Wed, 2 Jul 2025 00:47:37 +0200
On Tue, Jul 01, 2025 at 09:20:00AM +0000, stix@stix.id.au wrote:
> Crash appears due to intermittent disconnect/reconnect of a uplcom device while open.
Are you sure this is a genuine Prolific device? I've tried to get some
Prolific USB serial fobs at the start of the year and found that the market
is swamped with buggy fake Prolific chips. Even supposedly reputable
manufacturers had fake chips on the fobs that claimed to be PL2303HX /
PL2303HXD. In the end I managed to get some fobs with genuine Prolific
chips for some USD 20 per fob. The fake ones all sold for about USD 3-4 and
were easily identifiable by the missing part number and Prolific logo on the
SSOP chip.
The real ones also don't periodically disconnect/reconnect. :-)
Of course, using the fake chips shouldn't crash the system.
Obviously you were running a process that had the corresponding ttyUX open
when the crash happened. Otherwise it wouldn't have been triggered from
the select(2) code. Can you please describe what command exactly you were
running and what its command line options and other configuration settings
were. I'd like to try to reproduce this locally.
> crash> bt
> __kernel_end() at 0
> kern_reboot() at sys_reboot
> vpanic() at vpanic+0x18d
> panic() at vprintf
> trap() at startlwp
> --- trap (number 6) ---
> ucompoll() at ucompoll+0x2a
> cdev_poll() at cdev_poll+0x87
> spec_poll() at spec_poll+0x6a
> VOP_POLL() at VOP_POLL+0x5d
> sel_do_scan() at sel_do_scan+0x3ba
> selcommon() at selcommon+0x309
> sys___select50() at sys___select50+0x75
> syscall() at syscall+0x1fc
> --- syscall (number 417) ---
> syscall+0x1fc:
>
> Have core and kernel with symbols.
Could you try to disassemble the ucompoll() until the offending
instruction?
Could you try to find out if TS_CANCEL is set in tp->t_state?
> >How-To-Repeat:
>
> >Fix:
This might be relatively easy to work around.
ucycom(4) has (https://nxr.netbsd.org/xref/src/sys/dev/usb/ucycom.c#897):
if (sc->sc_dying)
return EIO;
of course, it should return POLLHUP.
uhso has (https://nxr.netbsd.org/xref/src/sys/dev/usb/uhso.c#1791):
if (!device_is_active(sc->sc_dev))
return POLLHUP;
So apparently there is no agreement how this should be handled.
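The difference between the two snippets matters because the return value of a poll handler is read as an event mask, not as an errno. The following user-space sketch (the helper names are hypothetical and none of this is driver code) shows how the numeric value of EIO would be read as a mask. On systems where EIO is 5 and POLLIN and POLLOUT are 0x1 and 0x4, which is assumed here to match NetBSD, it reads as "readable and writable" rather than as a hangup:

```c
#include <errno.h>
#include <poll.h>

/*
 * Hypothetical helpers: classify a value returned by a poll handler
 * the way a caller inspecting the event mask would.
 */

/* Nonzero if the mask claims the descriptor can be read or written. */
static int
looks_ready(int mask)
{
	return (mask & (POLLIN | POLLOUT)) != 0;
}

/* Nonzero if the mask reports a hangup. */
static int
looks_hung_up(int mask)
{
	return (mask & POLLHUP) != 0;
}
```

Calling looks_ready(EIO) and looks_hung_up(EIO) on a given platform shows which of the two readings applies there.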
Could you try adding
if (sc->sc_dying)
return POLLHUP;
before line 853 in ucom.c and see if that makes the symptoms go away?
But maybe the right fix would be to make ttycancel() deal with any pending
select()s too? Or something similar that ties in with the d_cancel
framework?
--chris
From: Paul Ripke <stix@stix.id.au>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org, stix@stix.id.au
Subject: Re: kern/59497: Panic in ucompoll
Date: Thu, 3 Jul 2025 21:25:18 +1000
On Tue, Jul 01, 2025 at 11:00:02PM +0000, Christoph Badura via gnats wrote:
> On Tue, Jul 01, 2025 at 09:20:00AM +0000, stix@stix.id.au wrote:
> > Crash appears due to intermittent disconnect/reconnect of a uplcom device while open.
>
> Are you sure this is a genuine Prolific device? I've tried to get some
> Prolific USB serial fobs at the start of the year and found that the market
> is swamped with buggy fake Prolific chips. Even supposedly reputable
> manufacturers had fake chips on the fobs that claimed to be PL2303HX /
> PL2303HXD. In the end I managed to get some fobs with genuine Prolific
> chips for some USD 20 per fob. The fake ones all sold for about USD 3-4 and
> were easily identifiable by the missing part number and Prolific logo on the
> SSOP chip.
I'm really not sure - it's old, and it was cheap. I have used it for the
serial console on an old Sun SPARCserver 5, but that system now has dodgy RAM
that needs replacing.
> The real ones also don't periodically disconnect/reconnect. :-)
I should hope not :)
I was considering shopping around for a USB FTDI-based serial adapter -
but I wonder if there are also fakes of those on the market...
> Of course, using the fake chips shouldn't crash the system.
Indeed.
> Obviously you were running a process that had the corresponding ttyUX open
> when the crash happened. Otherwise it wouldn't have been triggered from
> the select(2) code. Can you please describe what command exactly you were
> running and what its command line options and other configuration settings
> were. I'd like to try to reproduce this locally.
That could be challenging. I had it hooked up to a Tandy Color Computer (coco1)
at 38400 baud, via alligator clips, and the software was drivewire.py:
https://github.com/n6il/pyDriveWire
Basically doing remote floppy disk access over the serial port.
> > crash> bt
> > __kernel_end() at 0
> > kern_reboot() at sys_reboot
> > vpanic() at vpanic+0x18d
> > panic() at vprintf
> > trap() at startlwp
> > --- trap (number 6) ---
> > ucompoll() at ucompoll+0x2a
> > cdev_poll() at cdev_poll+0x87
> > spec_poll() at spec_poll+0x6a
> > VOP_POLL() at VOP_POLL+0x5d
> > sel_do_scan() at sel_do_scan+0x3ba
> > selcommon() at selcommon+0x309
> > sys___select50() at sys___select50+0x75
> > syscall() at syscall+0x1fc
> > --- syscall (number 417) ---
> > syscall+0x1fc:
> >
> > Have core and kernel with symbols.
>
> Could you try to disassemble the ucompoll() until the offending
> instruction?
That's easy, it's a tiny function:
(gdb) x/20i ucompoll
0xffffffff804960a5 <ucompoll>: push %rbp
0xffffffff804960a6 <ucompoll+1>: mov %rsp,%rbp
0xffffffff804960a9 <ucompoll+4>: push %r13
0xffffffff804960ab <ucompoll+6>: push %r12
0xffffffff804960ad <ucompoll+8>: mov %esi,%r12d
0xffffffff804960b0 <ucompoll+11>: mov %rdx,%r13
0xffffffff804960b3 <ucompoll+14>: mov %edi,%eax
0xffffffff804960b5 <ucompoll+16>: shr $0xc,%eax
0xffffffff804960b8 <ucompoll+19>: movzbl %dil,%esi
0xffffffff804960bc <ucompoll+23>: and $0x3ff00,%eax
0xffffffff804960c1 <ucompoll+28>: or %eax,%esi
0xffffffff804960c3 <ucompoll+30>: mov $0xffffffff81896660,%rdi
0xffffffff804960ca <ucompoll+37>: call 0xffffffff80e42be0 <device_lookup_private>
0xffffffff804960cf <ucompoll+42>: mov 0xe8(%rax),%rdi <------
0xffffffff804960d6 <ucompoll+49>: mov 0x168(%rdi),%rax
0xffffffff804960dd <ucompoll+56>: mov 0x60(%rax),%rax
0xffffffff804960e1 <ucompoll+60>: mov %r13,%rdx
0xffffffff804960e4 <ucompoll+63>: mov %r12d,%esi
0xffffffff804960e7 <ucompoll+66>: pop %r12
0xffffffff804960e9 <ucompoll+68>: pop %r13
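The instructions from ucompoll+14 to ucompoll+28 build the second argument of device_lookup_private() from the dev_t passed in %edi. A C sketch of that arithmetic, read directly off the listing (the function name is made up, and it is only an assumption that this corresponds to the driver's unit macro):

```c
#include <stdint.h>

/*
 * Mirrors ucompoll+14 .. ucompoll+28 in the listing above.
 * Hypothetical name; the shift and masks are taken from the listing.
 */
static uint32_t
unit_from_dev(uint32_t dev)
{
	uint32_t hi = (dev >> 12) & 0x3ff00;	/* shr $0xc,%eax; and $0x3ff00,%eax */
	uint32_t lo = dev & 0xff;		/* movzbl %dil,%esi */

	return hi | lo;				/* or %eax,%esi */
}
```

Bits 8 to 19 of the dev_t do not contribute to the result, so only the low byte and the bits above bit 19 select the unit that is looked up.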
> Could you try to find out if TS_CANCEL is set in tp->t_state?
Yeah, I was actually wondering how to do that. I can't figure out for the
life of me how to switch between cpu stacks in gdb. I realize most of the
kernel debugging I've done has been on single cpu machines...
However, doesn't this imply sc is null?
(gdb) p ucom_cd
$9 = {
cd_list = {
le_next = 0xffffffff818966a0 <umidi_cd>,
le_prev = 0xffffffff81896620 <ugen_cd>
},
cd_attach = {
lh_first = 0xffffffff81815260 <ucom_ca>
},
cd_devs = 0x0,
cd_name = 0xffffffff813e59e8 "ucom",
cd_class = DV_DULL,
cd_ndevs = 0,
cd_attrs = 0x0
}
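With cd_devs = 0x0 and cd_ndevs = 0 in the dump, any unit is out of range. A simplified user-space model of that situation, assuming the lookup returns NULL for an out-of-range unit (all names below are hypothetical stand-ins, and the real kernel structures have more fields), shows which address the unchecked load at ucompoll+42 would then touch:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the softc; only the offset matters here. */
struct fake_softc {
	char	pad[0xe8];	/* whatever precedes the loaded field */
	void	*field_at_e8;	/* the field read by mov 0xe8(%rax),%rdi */
};

/* Hypothetical stand-in for the two cfdriver fields shown in the dump. */
struct fake_cfdriver {
	struct fake_softc **cd_devs;	/* 0x0 in the dump */
	int		    cd_ndevs;	/* 0 in the dump */
};

/* Simplified lookup model: out-of-range units yield NULL. */
static struct fake_softc *
fake_lookup(const struct fake_cfdriver *cd, int unit)
{
	if (unit < 0 || unit >= cd->cd_ndevs)
		return NULL;
	return cd->cd_devs[unit];
}

/* Address that an unchecked load of field_at_e8 would touch. */
static uintptr_t
load_address(const struct fake_softc *sc)
{
	return (uintptr_t)sc + offsetof(struct fake_softc, field_at_e8);
}
```

With an empty table, fake_lookup() returns NULL and load_address() of that result is just the field offset, which is why a NULL softc shows up as a fault at a small address rather than at 0.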
> This might be relatively easy to work around.
>
> ucycom(4) has (https://nxr.netbsd.org/xref/src/sys/dev/usb/ucycom.c#897):
>
> if (sc->sc_dying)
> return EIO;
>
> of course, it should return POLLHUP.
>
> uhso has (https://nxr.netbsd.org/xref/src/sys/dev/usb/uhso.c#1791):
>
> if (!device_is_active(sc->sc_dev))
> return POLLHUP;
>
> So apparently there is no agreement how this should be handled.
>
> Could you try adding
>
> if (sc->sc_dying)
> return POLLHUP;
>
> before line 853 in ucom.c and see if that makes the symptoms go away?
or perhaps:
if (sc == NULL)
return POLLHUP;
?
> But maybe the right fix would be to make ttycancel() deal with any pending
> select()s too? Or something similar that ties in with the d_cancel
> framework?
Yeah, I haven't studied the code that much as yet.
--
Paul Ripke
"Great minds discuss ideas, average minds discuss events, small minds
discuss people."
-- Disputed: Often attributed to Eleanor Roosevelt. 1948.
From: Christoph Badura <bad@bsd.de>
To:
Cc: gnats-bugs@netbsd.org, kern-bug-people@netbsd.org,
gnats-admin@netbsd.org
Subject: Re: kern/59497: Panic in ucompoll
Date: Fri, 4 Jul 2025 00:13:27 +0200
On Thu, Jul 03, 2025 at 09:25:18PM +1000, Paul Ripke wrote:
> On Tue, Jul 01, 2025 at 11:00:02PM +0000, Christoph Badura via gnats wrote:
> > On Tue, Jul 01, 2025 at 09:20:00AM +0000, stix@stix.id.au wrote:
> I'm really not sure - it's old, and it was cheap. I have used it for the
> serial console on an old Sun SPARCserver 5, but that system now has dodgy RAM
> that needs replacing.
The photo of the chip that you sent me privately makes it clear that it is a
genuine PL-2303HX. Good for you, I guess. Bad for us as it suggests we
have a bug in our driver that causes the disconnects.
> I was considering shopping around for a USB FTDI-based serial adapter -
> but I wonder if there are also fakes of those on the market...
I think there are also fakes on the market. Genuine FTDI fobs seem to be
available mostly via Mouser, Farnell, etc. I ended up buying a couple at
~USD25 from Farnell earlier this year; before I could hunt down a source
for genuine Prolific fobs -- which cost basically the same.
> > [...] I'd like to try to reproduce this locally.
>
> That could be challenging. I had it hooked up to a Tandy Color Computer (coco1)
> at 38400 baud, via alligator clips, and the software was drivewire.py:
>
> https://github.com/n6il/pyDriveWire
>
> Basically doing remote floppy disk access over the serial port.
Well, I could just try out pyDriveWire without a CoCo (or anything else)
connected and see if that provokes the crash, too.
> > Could you try to disassemble the ucompoll() until the offending
> > instruction?
>
> That's easy, it's a tiny function:
>
> (gdb) x/20i ucompoll
> 0xffffffff804960a5 <ucompoll>: push %rbp
> 0xffffffff804960a6 <ucompoll+1>: mov %rsp,%rbp
> 0xffffffff804960a9 <ucompoll+4>: push %r13
> 0xffffffff804960ab <ucompoll+6>: push %r12
> 0xffffffff804960ad <ucompoll+8>: mov %esi,%r12d
> 0xffffffff804960b0 <ucompoll+11>: mov %rdx,%r13
> 0xffffffff804960b3 <ucompoll+14>: mov %edi,%eax
> 0xffffffff804960b5 <ucompoll+16>: shr $0xc,%eax
> 0xffffffff804960b8 <ucompoll+19>: movzbl %dil,%esi
> 0xffffffff804960bc <ucompoll+23>: and $0x3ff00,%eax
> 0xffffffff804960c1 <ucompoll+28>: or %eax,%esi
> 0xffffffff804960c3 <ucompoll+30>: mov $0xffffffff81896660,%rdi
> 0xffffffff804960ca <ucompoll+37>: call 0xffffffff80e42be0 <device_lookup_private>
> 0xffffffff804960cf <ucompoll+42>: mov 0xe8(%rax),%rdi <------
> 0xffffffff804960d6 <ucompoll+49>: mov 0x168(%rdi),%rax
> 0xffffffff804960dd <ucompoll+56>: mov 0x60(%rax),%rax
> 0xffffffff804960e1 <ucompoll+60>: mov %r13,%rdx
> 0xffffffff804960e4 <ucompoll+63>: mov %r12d,%esi
> 0xffffffff804960e7 <ucompoll+66>: pop %r12
> 0xffffffff804960e9 <ucompoll+68>: pop %r13
>
> > Could you try to find out if TS_CANCEL is set in tp->t_state?
>
> Yeah, I was actually wondering how to do that. I can't figure out for the
> life of me how to switch between cpu stacks in gdb. I realize most of the
> kernel debugging I've done has been on single cpu machines...
>
> However, doesn't this imply sc is null?
Yes, that has to be the ``tp = sc->sc_tty'' assignment.
Do you have the kernel messages from right before the panic? I.e., can you
print the contents of msgbuf? Your original mail only showed what was
syslogged, didn't it?
What I'm wondering is if the panic happened between a "ucom2:
detached\nuplcom1: detached" and a subsequent "uplcom1 at uhub1 port 8".
sc being null implies the device being detached, if I remember things
correctly. Which makes the situation somewhat worse, because detaching
the device should revoke the open vnode for the device.
Maybe spec_poll() needs to check if sn->sn_gone is set after calling
spec_io_enter()?
https://nxr.netbsd.org/xref/src/sys/miscfs/specfs/spec_vnops.c#1378
https://nxr.netbsd.org/xref/src/sys/miscfs/specfs/spec_vnops.c#618?
But maybe that is papering over the symptoms. I haven't stared long
enough at the code.
> > This might be relatively easy to work around.
> >
> > ucycom(4) has (https://nxr.netbsd.org/xref/src/sys/dev/usb/ucycom.c#897):
> >
> > if (sc->sc_dying)
> > return EIO;
> >
> > of course, it should return POLLHUP.
> >
> > uhso has (https://nxr.netbsd.org/xref/src/sys/dev/usb/uhso.c#1791):
> >
> > if (!device_is_active(sc->sc_dev))
> > return POLLHUP;
> >
> > So apparently there is no agreement how this should be handled.
> >
> > Could you try adding
> >
> > if (sc->sc_dying)
> > return POLLHUP;
> >
> > before line 853 in ucom.c and see if that makes the symptoms go away?
>
> or perhaps:
>
> if (sc == NULL)
> return POLLHUP;
>
> ?
That certainly would avoid the crash. But I think it is just papering
over the symptoms.
Or maybe it and the other two places should return POLLERR like spec_poll()
does?
> > But maybe the right fix would be to make ttycancel() deal with any pending
> > select()s too? Or something similar that ties in with the d_cancel
> > framework?
>
> Yeah, I haven't studied the code that much as yet.
What a rabbit hole!
I'm sorry, I don't have time right now or in the next 2 weeks to dive down
into it. But you do have a local workaround, I think. And if you can
debug this further, we would greatly appreciate it.
--chris
From: Christoph Badura <bad@bsd.de>
To: gnats-bugs@netbsd.org, kern-bug-people@netbsd.org,
gnats-admin@netbsd.org
Cc:
Subject: Re: kern/59497: Panic in ucompoll
Date: Fri, 4 Jul 2025 00:43:52 +0200
Actually, could you test a -current kernel?
I missed that you are reporting this against 10.1_STABLE.
--chris
From: Paul Ripke <stix@stix.id.au>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/59497: Panic in ucompoll
Date: Sat, 5 Jul 2025 17:44:42 +1000
Re msgbuf, after dumping it out, I realise this crash was actually due to
a failing USB hub flaking out intermittently. I have seen this device
intermittently disconnect/reconnect without that hub, so there's still
something going on.
[ 3804001.1947412] ukbd0: was console keyboard
[ 3804001.1947412] wskbd0: detached
[ 3804001.1987412] ukbd0: detached
[ 3804001.1987412] uhidev0: detached
[ 3804001.1987412] uhidev0: at uhub8 port 1 (addr 2) disconnected
[ 3804001.1987412] wskbd2: disconnecting from wsdisplay0
[ 3804001.1987412] wskbd2: detached
[ 3804001.1987412] ukbd3: detached
[ 3804001.2027413] wskbd1: disconnecting from wsdisplay0
[ 3804001.2027413] wskbd1: detached
[ 3804001.2027413] ukbd2: detached
[ 3804001.2027413] ukbd1: detached
[ 3804001.2027413] uhid2: detached
[ 3804001.2027413] uhid1: detached
[ 3804001.2027413] uhid0: detached
[ 3804001.2027413] uhidev1: detached
[ 3804001.2027413] uhidev1: at uhub8 port 1 (addr 2) disconnected
[ 3804001.2117415] wsmouse0: detached
[ 3804001.2117415] ums0: detached
[ 3804001.2117415] uhidev2: detached
[ 3804001.2117415] uhidev2: at uhub8 port 2 (addr 12) disconnected
[ 3804001.2127414] uhid7: detached
[ 3804001.2127414] uhid6: detached
[ 3804001.2127414] uhid5: detached
[ 3804001.2157415] wskbd3: disconnecting from wsdisplay0
[ 3804001.2157415] wskbd3: detached
[ 3804001.2157415] ukbd4: detached
[ 3804001.2157415] uhidev4: detached
[ 3804001.2157415] uhidev4: at uhub8 port 2 (addr 12) disconnected
[ 3804001.2157415] uhidev5: detached
[ 3804001.2157415] uhidev5: at uhub8 port 2 (addr 12) disconnected
[ 3804001.2307417] ucom1: detached
[ 3804001.2307417] uplcom0: detached
[ 3804001.2307417] uplcom0: at uhub9 port 1 (addr 13) disconnected
[ 3804001.2407419] ucom0: detached
[ 3804001.2407419] umodem0: detached
[ 3804001.2407419] umodem0: at uhub9 port 2 (addr 11) disconnected
[ 3804001.2477419] uhub9: detached
[ 3804001.2477419] uhub9: at uhub8 port 3 (addr 4) disconnected
[ 3804001.2537420] uhub8: detached
[ 3804001.2537420] uhub8: at uhub1 port 5 (addr 1) disconnected
[ 3804001.7737508] uhub8 at uhub1 port 5: GenesysLogic (0x05e3) USB2.0 Hub (0x0610), class 9/0, rev 2.10/92.26, addr 14
[ 3804001.7737508] uhub8: multiple transaction translators
[ 3804001.7887511] uhub8: 4 ports with 1 removable, self powered
[ 3804002.1207568] uvm_fault(0xffffb1c7ac104780, 0x0, 1) -> e
[ 3804002.1207568] fatal page fault in supervisor mode
[ 3804002.1207568] trap type 6 code 0 rip 0xffffffff804960cf cs 0x8 rflags 0x10246 cr2 0xe8 ilevel 0 rsp 0xffffb41236bc5bf0
[ 3804002.1207568] curlwp 0xffffb1ca652e1340 pid 23833.26753 lowest kstack 0xffffb41236bc12c0
[ 3804002.1207568] panic: trap
[ 3804002.1207568] cpu1: Begin traceback...
[ 3804002.1207568] vpanic() at netbsd:vpanic+0x183
[ 3804002.1217568] panic() at netbsd:panic+0x3c
[ 3804002.1227568] trap() at netbsd:trap+0xbaf
[ 3804002.1227568] --- trap (number 6) ---
[ 3804002.1227568] ucompoll() at netbsd:ucompoll+0x2a
[ 3804002.1227568] cdev_poll() at netbsd:cdev_poll+0x87
[ 3804002.1237565] spec_poll() at netbsd:spec_poll+0x6a
[ 3804002.1237565] VOP_POLL() at netbsd:VOP_POLL+0x5d
[ 3804002.1247569] sel_do_scan() at netbsd:sel_do_scan+0x3ba
[ 3804002.1247569] selcommon() at netbsd:selcommon+0x309
[ 3804002.1247569] sys___select50() at netbsd:sys___select50+0x75
[ 3804002.1257569] syscall() at netbsd:syscall+0x1fc
[ 3804002.1257569] --- syscall (number 417) ---
[ 3804002.1257569] netbsd:syscall+0x1fc:
[ 3804002.1257569] cpu1: End traceback...
--
Paul Ripke
"Great minds discuss ideas, average minds discuss events, small minds
discuss people."
-- Disputed: Often attributed to Eleanor Roosevelt. 1948.
From: Christoph Badura <bad@bsd.de>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/59497: Panic in ucompoll
Date: Sat, 5 Jul 2025 11:17:29 +0200
On Sat, Jul 05, 2025 at 07:45:02AM +0000, Paul Ripke via gnats wrote:
> Re msgbuf, after dumping it out, I realise this crash was actually due to
> a failing USB hub flaking out intermittently. I have seen this device
> intermittently disconnect/reconnect without that hub, so there's still
> something going on.
I'm confused. Does the device also intermittently disconnect/reconnect
without the hub?
Anyway, even a flaky USB hub shouldn't cause a panic.
> [ 3804001.2307417] ucom1: detached
> [ 3804001.2307417] uplcom0: detached
> [ 3804001.2307417] uplcom0: at uhub9 port 1 (addr 13) disconnected
> [ 3804001.2407419] ucom0: detached
> [ 3804001.2537420] uhub8: detached
> [ 3804001.2537420] uhub8: at uhub1 port 5 (addr 1) disconnected
> [ 3804001.7737508] uhub8 at uhub1 port 5: GenesysLogic (0x05e3) USB2.0 Hub (0x0610), class 9/0, rev 2.10/92.26, addr 14
> [ 3804001.7737508] uhub8: multiple transaction translators
> [ 3804001.7887511] uhub8: 4 ports with 1 removable, self powered
> [ 3804002.1207568] uvm_fault(0xffffb1c7ac104780, 0x0, 1) -> e
> [ 3804002.1207568] fatal page fault in supervisor mode
This looks to me like the trap happens while uplcom0 (did it move from
uplcom1?) was disconnected/detached.
If you are testing the suggested changes (check for sc != NULL and/or the
change for spec_poll()) could you add a printf when it triggers so that we
can verify that this happens while the uplcom/ucom is disconnected?
--chris
From: Paul Ripke <stix@stix.id.au>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org, stix@stix.id.au, bad@bsd.de
Subject: Re: kern/59497: Panic in ucompoll
Date: Sun, 20 Jul 2025 23:24:39 +1000
On Sat, Jul 05, 2025 at 09:20:02AM +0000, Christoph Badura via gnats wrote:
>
> On Sat, Jul 05, 2025 at 07:45:02AM +0000, Paul Ripke via gnats wrote:
> > Re msgbuf, after dumping it out, I realise this crash was actually due to
> > a failing USB hub flaking out intermittently. I have seen this device
> > intermittently disconnect/reconnect without that hub, so there's still
> > something going on.
>
> I'm confused. Does the device also intermittently disconnect/reconnect
> without the hub?
Yes, it does - but it seems the only crash dump I had was one due to the
failing hub.
> Anyway, even a flaky USB hub shouldn't cause a panic.
Indeed.
> > [ 3804001.2307417] ucom1: detached
> > [ 3804001.2307417] uplcom0: detached
> > [ 3804001.2307417] uplcom0: at uhub9 port 1 (addr 13) disconnected
> > [ 3804001.2407419] ucom0: detached
> > [ 3804001.2537420] uhub8: detached
> > [ 3804001.2537420] uhub8: at uhub1 port 5 (addr 1) disconnected
> > [ 3804001.7737508] uhub8 at uhub1 port 5: GenesysLogic (0x05e3) USB2.0 Hub (0x0610), class 9/0, rev 2.10/92.26, addr 14
> > [ 3804001.7737508] uhub8: multiple transaction translators
> > [ 3804001.7887511] uhub8: 4 ports with 1 removable, self powered
> > [ 3804002.1207568] uvm_fault(0xffffb1c7ac104780, 0x0, 1) -> e
> > [ 3804002.1207568] fatal page fault in supervisor mode
>
> This looks to me like the trap happens while uplcom0 (did it move from
> uplcom1?) was disconnected/detached.
It may have moved - I do have two of them, and have used both on occasion.
> If you are testing the suggested changes (check for sc != NULL and/or the
> change for spec_poll()) could you add a printf when it triggers so that we
> can verify that this happens while the uplcom/ucom is disconnected?
I ran a test - with the 'if (sc == NULL) return POLLHUP' patch, with drivewire
running on /dev/dtyU0, and pulling the USB:
[ 2724.7698958] xhci0: xhci_reset_endpoint: endpoint 0x0: timed out
[ 2724.7738960] WARNING: pipe closed with active xfers on addr 4
[ 2724.7808961] ucom0: detached
[ 2724.7808961] uplcom0: detached
[ 2724.7808961] uplcom0: at uhub8 port 3 (addr 4) disconnected
[ 2725.6829076] ucompoll: sc == NULL
[ 2725.6829076] uvm_fault(0xffff80e5b04f1848, 0x0, 1) -> e
[ 2725.6829076] fatal page fault in supervisor mode
[ 2725.6829076] trap type 6 code 0 rip 0xffffffff80497ab7 cs 0x8 rflags 0x10246 cr2 0xe8 ilevel 0 rsp 0xffff881237c78cd0
[ 2725.6829076] curlwp 0xffff80e6239de680 pid 6220.6226 lowest kstack 0xffff881237c742c0
[ 2725.6829076] panic: trap
[ 2725.6829076] cpu1: Begin traceback...
[ 2725.6829076] vpanic() at netbsd:vpanic+0x183
[ 2725.6839076] panic() at netbsd:panic+0x3c
[ 2725.6849078] trap() at netbsd:trap+0xbaf
[ 2725.6849078] --- trap (number 6) ---
[ 2725.6849078] ucomread() at netbsd:ucomread+0x2a
[ 2725.6849078] cdev_read() at netbsd:cdev_read+0x87
[ 2725.6859078] spec_read() at netbsd:spec_read+0x2d3
[ 2725.6859078] VOP_READ() at netbsd:VOP_READ+0x42
[ 2725.6869079] vn_read() at netbsd:vn_read+0x18e
[ 2725.6869079] dofileread() at netbsd:dofileread+0x79
[ 2725.6869079] sys_read() at netbsd:sys_read+0x49
[ 2725.6879078] syscall() at netbsd:syscall+0x1fc
[ 2725.6879078] --- syscall (number 3) ---
[ 2725.6879078] netbsd:syscall+0x1fc:
[ 2725.6879078] cpu1: End traceback...
So, this exited the poll, and died in read, which I guess is an
improvement? If I get the chance, I'll try to figure out how this
is supposed to work.
btw: this is still on the 10.1 branch.
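The trace above suggests that guarding only the poll path moves the NULL dereference to the next entry point that does the same lookup. A user-space sketch of the workaround applied to both paths follows; every name is a hypothetical stand-in rather than the real ucom code, and the choice of EIO for the read path is an assumption. As noted earlier in the thread, this may only hide the underlying detach problem:

```c
#include <errno.h>
#include <poll.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver state; not the real softc. */
struct fake_softc {
	int	ready_events;	/* events currently pending */
};

/* Hypothetical unit table; a slot is NULL while the device is detached. */
static struct fake_softc *fake_units[4];

static struct fake_softc *
fake_lookup(int unit)
{
	if (unit < 0 ||
	    unit >= (int)(sizeof(fake_units) / sizeof(fake_units[0])))
		return NULL;
	return fake_units[unit];
}

/* poll-style entry point: returns an event mask, so report a hangup. */
static int
fake_poll(int unit, int events)
{
	struct fake_softc *sc = fake_lookup(unit);

	if (sc == NULL)
		return POLLHUP;
	return sc->ready_events & events;
}

/* read-style entry point: returns an errno, so fail with an error. */
static int
fake_read(int unit)
{
	struct fake_softc *sc = fake_lookup(unit);

	if (sc == NULL)
		return EIO;	/* assumed error code for a vanished device */
	return 0;
}
```

In this model a caller that sees POLLHUP and then reads anyway gets an error instead of a crash, which matches the sequence of calls in the trace (poll returning, then read faulting).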
Cheers,
--
Paul Ripke
"Great minds discuss ideas, average minds discuss events, small minds
discuss people."
-- Disputed: Often attributed to Eleanor Roosevelt. 1948.
>Unformatted: