NetBSD Problem Report #58466

From stix@stix.id.au  Thu Jul 25 04:22:48 2024
Return-Path: <stix@stix.id.au>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	 key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256
	 client-signature RSA-PSS (2048 bits) client-digest SHA256)
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 4BAF61A9239
	for <gnats-bugs@gnats.NetBSD.org>; Thu, 25 Jul 2024 04:22:48 +0000 (UTC)
Message-Id: <20240725042235.DCBC619E8B@stix.id.au>
Date: Thu, 25 Jul 2024 14:22:35 +1000 (AEST)
From: stix@stix.id.au
Reply-To: stix@stix.id.au
To: gnats-bugs@NetBSD.org
Subject: Kernel panic in ucompoll
X-Send-Pr-Version: 3.95

>Number:         58466
>Category:       kern
>Synopsis:       Kernel panic in ucompoll
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Jul 25 04:25:00 +0000 2024
>Last-Modified:  Thu Jul 25 05:05:01 +0000 2024
>Originator:     Paul Ripke
>Release:        NetBSD 10.0_STABLE ~2024-06-26
>Organization:
Paul Ripke
"Great minds discuss ideas, average minds discuss events, small minds
 discuss people."
-- Disputed: Often attributed to Eleanor Roosevelt. 1948.
>Environment:


System: NetBSD slave 10.0_STABLE NetBSD 10.0_STABLE (SLAVE) #10: Wed Jun 26 09:22:33 AEST 2024 stix@slave:/home/netbsd/netbsd-10/obj.amd64/home/netbsd/netbsd-10/src/sys/arch/amd64/compile/SLAVE amd64
Architecture: x86_64
Machine: amd64
>Description:
Kernel panic in ucompoll while unlocking xscreensaver (possibly a coincidence).
I've seen this a few times now, but unfortunately never got a crash dump.

Jul 24 21:48:55 slave /netbsd: [ 2260291.0658539] uvm_fault(0xffff92260e431b38, 0x0, 1) -> e
Jul 24 21:48:55 slave /netbsd: [ 2260291.0658539] fatal page fault in supervisor mode
Jul 24 21:48:55 slave /netbsd: [ 2260291.0658539] trap type 6 code 0 rip 0xffffffff804957ff cs 0x8 rflags 0x10246 cr2 0xe8 ilevel 0 rsp 0xffffcb8450369bf0
Jul 24 21:48:55 slave /netbsd: [ 2260291.0658539] curlwp 0xffff9226569840c0 pid 2195.2810 lowest kstack 0xffffcb84503652c0
Jul 24 21:48:55 slave /netbsd: [ 2260291.0658539] panic: trap
Jul 24 21:48:55 slave /netbsd: [ 2260291.0658539] cpu0: Begin traceback...
Jul 24 21:48:55 slave /netbsd: [ 2260291.0658539] vpanic() at netbsd:vpanic+0x183
Jul 24 21:48:55 slave /netbsd: [ 2260291.0668541] panic() at netbsd:panic+0x3c
Jul 24 21:48:55 slave /netbsd: [ 2260291.0678538] trap() at netbsd:trap+0xbaf
Jul 24 21:48:55 slave /netbsd: [ 2260291.0678538] --- trap (number 6) ---
Jul 24 21:48:55 slave /netbsd: [ 2260291.0678538] ucompoll() at netbsd:ucompoll+0x2a
Jul 24 21:48:55 slave /netbsd: [ 2260291.0688538] cdev_poll() at netbsd:cdev_poll+0x87
Jul 24 21:48:55 slave /netbsd: [ 2260291.0698538] spec_poll() at netbsd:spec_poll+0x6a
Jul 24 21:48:55 slave /netbsd: [ 2260291.0698538] VOP_POLL() at netbsd:VOP_POLL+0x5d
Jul 24 21:48:55 slave /netbsd: [ 2260291.0708538] sel_do_scan() at netbsd:sel_do_scan+0x3ba
Jul 24 21:48:55 slave /netbsd: [ 2260291.0718537] selcommon() at netbsd:selcommon+0x18c
Jul 24 21:48:55 slave /netbsd: [ 2260291.0718537] sys___select50() at netbsd:sys___select50+0x75
Jul 24 21:48:55 slave /netbsd: [ 2260291.0728538] syscall() at netbsd:syscall+0x1fc
Jul 24 21:48:55 slave /netbsd: [ 2260291.0728538] --- syscall (number 417) ---
Jul 24 21:48:55 slave /netbsd: [ 2260291.0738538] netbsd:syscall+0x1fc:
Jul 24 21:48:55 slave /netbsd: [ 2260291.0738538] cpu0: End traceback...


Do we dump the x86 error code from page faults? I'm not seeing it above.
Given the code in ucompoll, I first wondered whether the fault was on an
instruction fetch. On reflection, with cr2 being 0xe8, it is more likely the
offset of the function pointer in the struct, accessed through a null base
pointer.
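
For reference, the x86 page-fault error code can be decoded bit by bit. The
sketch below uses the architecturally defined bit positions; the helper names
are hypothetical and are not NetBSD functions. Whether the "code 0" field in
the trap line above is that hardware error code is an assumption here, not
something this report confirms.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Page-fault error code bits defined by the x86 architecture. */
#define PF_PRESENT (1u << 0)	/* set: protection violation; clear: page not present */
#define PF_WRITE   (1u << 1)	/* set: write access; clear: read access */
#define PF_USER    (1u << 2)	/* set: user mode; clear: supervisor mode */
#define PF_RSVD    (1u << 3)	/* set: reserved bit found in a paging entry */
#define PF_INSTR   (1u << 4)	/* set: fault on instruction fetch */

/* Hypothetical helper: true if the fault was raised by an instruction fetch. */
static bool
pf_is_instruction_fetch(uint32_t code)
{
	return (code & PF_INSTR) != 0;
}

/* Hypothetical helper: write a human-readable summary of code into buf. */
static void
pf_describe(uint32_t code, char *buf, size_t len)
{
	assert(buf != NULL && len > 0);
	snprintf(buf, len, "%s, %s, %s mode%s%s",
	    (code & PF_PRESENT) ? "protection violation" : "page not present",
	    (code & PF_WRITE) ? "write" : "read",
	    (code & PF_USER) ? "user" : "supervisor",
	    (code & PF_RSVD) ? ", reserved bit set" : "",
	    (code & PF_INSTR) ? ", instruction fetch" : "");
}
```

Under that assumption, an error code of 0 describes a supervisor-mode read of
a page that is not present, which would rule out an instruction fetch.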

>How-To-Repeat:
Unknown.

>Fix:
Unknown.

>Audit-Trail:
From: Taylor R Campbell <riastradh@NetBSD.org>
To: Paul Ripke <stix@stix.id.au>
Cc: gnats-bugs@NetBSD.org, netbsd-bugs@NetBSD.org,
	hannken@NetBSD.org
Subject: Re: kern/58466: Kernel panic in ucompoll
Date: Thu, 25 Jul 2024 05:04:13 +0000

 > Jul 24 21:48:55 slave /netbsd: [ 2260291.0658539] uvm_fault(0xffff92260e431b38, 0x0, 1) -> e

 The 0x0 here means something is trying to use the null page.

 > Jul 24 21:48:55 slave /netbsd: [ 2260291.0658539] trap type 6 code 0 rip 0xffffffff804957ff cs 0x8 rflags 0x10246 cr2 0xe8 ilevel 0 rsp 0xffffcb8450369bf0

 The cr2 here is the actual address, 0xe8.

 > Jul 24 21:48:55 slave /netbsd: [ 2260291.0678538] ucompoll() at netbsd:ucompoll+0x2a

 This is the faulting instruction, and:

 (gdb) x/i ucompoll+0x2a
    0xffffffff804be468 <ucompoll+42>:    mov    0xe8(%rax),%edi
 (gdb) print &((struct ucom_softc *)0)->sc_tty
 $2 = (struct tty **) 0xe8
 (gdb) list *(ucompoll+0x2a)
 0xffffffff804be468 is in ucompoll (/home/riastradh/netbsd/current/src/sys/dev/usb/ucom.c:849).
 844     int
 845     ucompoll(dev_t dev, int events, struct lwp *l)
 846     {
 847             const int unit = UCOMUNIT(dev);
 848             struct ucom_softc * const sc = device_lookup_private(&ucom_cd, unit);
 849             struct tty *tp = sc->sc_tty;
 850
 851             UCOMHIST_FUNC(); UCOMHIST_CALLED();
 852
 853             return (*tp->t_linesw->l_poll)(tp, events, l);
 853             return (*tp->t_linesw->l_poll)(tp, events, l);

 So sc is null, and it crashes trying to compute sc->sc_tty.
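
 The arithmetic behind that conclusion can be sketched in user space. The
 struct below is a hypothetical stand-in, not the real struct ucom_softc: it
 only places sc_tty at offset 0xe8, matching the gdb output above, to show
 that a load of sc->sc_tty through a null sc touches address 0xe8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct tty;	/* opaque; only a pointer to it is needed here */

/* Hypothetical stand-in for struct ucom_softc: only sc_tty's offset matters. */
struct fake_softc {
	char		pad[0xe8];	/* placeholder for the fields preceding sc_tty */
	struct tty	*sc_tty;
};

/* The layout assumption this sketch relies on, checked at compile time. */
static_assert(offsetof(struct fake_softc, sc_tty) == 0xe8,
    "sc_tty is expected at offset 0xe8 in this stand-in");

/* Address the CPU reads when loading sc->sc_tty from base pointer `base`. */
static uintptr_t
sc_tty_load_address(uintptr_t base)
{
	return base + offsetof(struct fake_softc, sc_tty);
}
```

 With a base of 0, sc_tty_load_address returns 0xe8, the same value as cr2 in
 the trap line.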

 But how is sc null?  It shouldn't be possible to enter ucompoll
 without a device private for the unit number -- either:

 (a) there has never been such a unit, in which case there should be no
     paths to ucompoll with this number; or

 (b) that unit is being detached concurrently, in which case spec_poll
     should either

      i. acquire a reference that blocks detach from finishing until
         ucompoll is done (by holding up spec_io_drain, which holds up
         spec_close, which holds up vdevgone), or

     ii. (possibly block and then) fail with POLLERR, via failure in
         spec_io_enter -> vdead_check, without entering ucompoll; or

 (c) that unit has been detached, in which case the vnode has been
     revoked with vdevgone in ucomdetach and should no longer be
     accessible as such and ucompoll should again not be entered.

 Obviously I'm missing a path where control can sneak into ucompoll
 with a detached unit, though!

>Unformatted:
