NetBSD Problem Report #58466
From stix@stix.id.au Thu Jul 25 04:22:48 2024
Return-Path: <stix@stix.id.au>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256
client-signature RSA-PSS (2048 bits) client-digest SHA256)
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id 4BAF61A9239
for <gnats-bugs@gnats.NetBSD.org>; Thu, 25 Jul 2024 04:22:48 +0000 (UTC)
Message-Id: <20240725042235.DCBC619E8B@stix.id.au>
Date: Thu, 25 Jul 2024 14:22:35 +1000 (AEST)
From: stix@stix.id.au
Reply-To: stix@stix.id.au
To: gnats-bugs@NetBSD.org
Subject: Kernel panic in ucompoll
X-Send-Pr-Version: 3.95
>Number: 58466
>Category: kern
>Synopsis: Kernel panic in ucompoll
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Thu Jul 25 04:25:00 +0000 2024
>Last-Modified: Thu Jul 25 05:05:01 +0000 2024
>Originator: Paul Ripke
>Release: NetBSD 10.0_STABLE ~2024-06-26
>Organization:
Paul Ripke
"Great minds discuss ideas, average minds discuss events, small minds
discuss people."
-- Disputed: Often attributed to Eleanor Roosevelt. 1948.
>Environment:
System: NetBSD slave 10.0_STABLE NetBSD 10.0_STABLE (SLAVE) #10: Wed Jun 26 09:22:33 AEST 2024 stix@slave:/home/netbsd/netbsd-10/obj.amd64/home/netbsd/netbsd-10/src/sys/arch/amd64/compile/SLAVE amd64
Architecture: x86_64
Machine: amd64
>Description:
Kernel panic in ucompoll, when unlocking xscreensaver (?!? coincidence??)
I've seen this a few times now, unfortunately never got a dump.
Jul 24 21:48:55 slave /netbsd: [ 2260291.0658539] uvm_fault(0xffff92260e431b38, 0x0, 1) -> e
Jul 24 21:48:55 slave /netbsd: [ 2260291.0658539] fatal page fault in supervisor mode
Jul 24 21:48:55 slave /netbsd: [ 2260291.0658539] trap type 6 code 0 rip 0xffffffff804957ff cs 0x8 rflags 0x10246 cr2 0xe8 ilevel 0 rsp 0xffffcb8450369bf0
Jul 24 21:48:55 slave /netbsd: [ 2260291.0658539] curlwp 0xffff9226569840c0 pid 2195.2810 lowest kstack 0xffffcb84503652c0
Jul 24 21:48:55 slave /netbsd: [ 2260291.0658539] panic: trap
Jul 24 21:48:55 slave /netbsd: [ 2260291.0658539] cpu0: Begin traceback...
Jul 24 21:48:55 slave /netbsd: [ 2260291.0658539] vpanic() at netbsd:vpanic+0x183
Jul 24 21:48:55 slave /netbsd: [ 2260291.0668541] panic() at netbsd:panic+0x3c
Jul 24 21:48:55 slave /netbsd: [ 2260291.0678538] trap() at netbsd:trap+0xbaf
Jul 24 21:48:55 slave /netbsd: [ 2260291.0678538] --- trap (number 6) ---
Jul 24 21:48:55 slave /netbsd: [ 2260291.0678538] ucompoll() at netbsd:ucompoll+0x2a
Jul 24 21:48:55 slave /netbsd: [ 2260291.0688538] cdev_poll() at netbsd:cdev_poll+0x87
Jul 24 21:48:55 slave /netbsd: [ 2260291.0698538] spec_poll() at netbsd:spec_poll+0x6a
Jul 24 21:48:55 slave /netbsd: [ 2260291.0698538] VOP_POLL() at netbsd:VOP_POLL+0x5d
Jul 24 21:48:55 slave /netbsd: [ 2260291.0708538] sel_do_scan() at netbsd:sel_do_scan+0x3ba
Jul 24 21:48:55 slave /netbsd: [ 2260291.0718537] selcommon() at netbsd:selcommon+0x18c
Jul 24 21:48:55 slave /netbsd: [ 2260291.0718537] sys___select50() at netbsd:sys___select50+0x75
Jul 24 21:48:55 slave /netbsd: [ 2260291.0728538] syscall() at netbsd:syscall+0x1fc
Jul 24 21:48:55 slave /netbsd: [ 2260291.0728538] --- syscall (number 417) ---
Jul 24 21:48:55 slave /netbsd: [ 2260291.0738538] netbsd:syscall+0x1fc:
Jul 24 21:48:55 slave /netbsd: [ 2260291.0738538] cpu0: End traceback...
Do we dump the x86 error code from page faults? I'm not seeing it above.
Given the code in ucompoll, I wondered whether the fault was on an instruction
fetch. On reflection, with cr2 being 0xe8, it is more likely the offset of the
function pointer in the struct, reached through a null base pointer.
>How-To-Repeat:
Unknown.
>Fix:
Unknown.
>Audit-Trail:
From: Taylor R Campbell <riastradh@NetBSD.org>
To: Paul Ripke <stix@stix.id.au>
Cc: gnats-bugs@NetBSD.org, netbsd-bugs@NetBSD.org,
hannken@NetBSD.org
Subject: Re: kern/58466: Kernel panic in ucompoll
Date: Thu, 25 Jul 2024 05:04:13 +0000
> Jul 24 21:48:55 slave /netbsd: [ 2260291.0658539] uvm_fault(0xffff92260e431b38, 0x0, 1) -> e
The 0x0 here means something is trying to use the null page.
> Jul 24 21:48:55 slave /netbsd: [ 2260291.0658539] trap type 6 code 0 rip 0xffffffff804957ff cs 0x8 rflags 0x10246 cr2 0xe8 ilevel 0 rsp 0xffffcb8450369bf0
The cr2 here is the actual address, 0xe8.
> Jul 24 21:48:55 slave /netbsd: [ 2260291.0678538] ucompoll() at netbsd:ucompoll+0x2a
This is the faulting instruction, and:
(gdb) x/i ucompoll+0x2a
0xffffffff804be468 <ucompoll+42>: mov 0xe8(%rax),%edi
(gdb) print &((struct ucom_softc *)0)->sc_tty
$2 = (struct tty **) 0xe8
(gdb) list *(ucompoll+0x2a)
0xffffffff804be468 is in ucompoll (/home/riastradh/netbsd/current/src/sys/dev/usb/ucom.c:849).
844 int
845 ucompoll(dev_t dev, int events, struct lwp *l)
846 {
847 const int unit = UCOMUNIT(dev);
848 struct ucom_softc * const sc = device_lookup_private(&ucom_cd, unit);
849 struct tty *tp = sc->sc_tty;
850
851 UCOMHIST_FUNC(); UCOMHIST_CALLED();
852
853 return (*tp->t_linesw->l_poll)(tp, events, l);
853 return (*tp->t_linesw->l_poll)(tp, events, l);
So sc is null, and it crashes trying to compute sc->sc_tty.
But how is sc null? It shouldn't be possible to enter ucompoll
without a device private for the unit number -- either:
(a) there has never been such a unit, in which case there should be no
paths to ucompoll with this number; or
(b) that unit is being detached concurrently, in which case spec_poll
should either
i. acquire a reference that blocks detach from finishing until
ucompoll is done (by holding up spec_io_drain, which holds up
spec_close, which holds up vdevgone), or
ii. (possibly block and then) fail with POLLERR, via failure in
spec_io_enter -> vdead_check, without entering ucompoll; or
(c) that unit has been detached, in which case the vnode has been
revoked with vdevgone in ucomdetach and should no longer be
accessible as such and ucompoll should again not be entered.
Obviously I'm missing a path where control can sneak into ucompoll
with a detached unit, though!
>Unformatted: