NetBSD Problem Report #59235
From mlelstv@tazz.1st.de Sun Mar 30 08:51:41 2025
Return-Path: <mlelstv@tazz.1st.de>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256
client-signature RSA-PSS (2048 bits) client-digest SHA256)
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id 9C58D1A9239
for <gnats-bugs@gnats.NetBSD.org>; Sun, 30 Mar 2025 08:51:41 +0000 (UTC)
Message-Id: <20250330085110.73545CCAE5@tazz.1st.de>
Date: Sun, 30 Mar 2025 10:51:10 +0200 (CEST)
From: mlelstv@netbsd.org
Reply-To: mlelstv@netbsd.org
To: gnats-bugs@NetBSD.org
Subject: efi(8) panics
X-Send-Pr-Version: 3.95
>Number: 59235
>Notify-List: riastradh@NetBSD.org
>Category: kern
>Synopsis: efi(8) panics
>Confidential: no
>Severity: serious
>Priority: low
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sun Mar 30 08:55:00 +0000 2025
>Last-Modified: Sun Mar 30 20:38:37 +0000 2025
>Originator: Michael van Elst
>Release: NetBSD 10.99.12
>Organization:
>Environment:
System: NetBSD tazz 10.99.12 NetBSD 10.99.12 (TAZZ) #7: Sat Mar 29 20:45:44 UTC 2025 mlelstv@slowpoke:/scratch2/obj.amd64/scratch/netbsd-current/src/sys/arch/amd64/compile/TAZZ amd64
Architecture: x86_64
Machine: amd64
>Description:
Running efi without arguments causes a kernel panic.
With a LOCKDEBUG kernel you get this:
[ 39236.0798177] Mutex error: rw_vector_enter,304: spin lock held
[ 39236.0798177] lock address : netbsd:efi_runtime_lock
[ 39236.0798177] type : spin
[ 39236.0798177] initialized : netbsd:efi_init+0x2e6
[ 39236.0798177] shared holds : 0 exclusive: 1
[ 39236.0798177] shares wanted: 0 exclusive: 0
[ 39236.0798177] relevant cpu : 0 last held: 0
[ 39236.0798177] relevant lwp : 0xffff8723827b9800 last held: 0xffff8723827b9800
[ 39236.0798177] last locked* : netbsd:efi_runtime_enter+0x22
[ 39236.0798177] unlocked : 0
[ 39236.0798177] owner field : 0x0000000000010600 wait/spin: 0/1
[ 39236.0820692] panic: LOCKDEBUG: Mutex error: rw_vector_enter,304: spin lock held
[ 39236.0820692] cpu0: Begin traceback...
[ 39236.0820692] vpanic() at netbsd:vpanic+0x171
[ 39236.0820692] panic() at netbsd:panic+0x3c
[ 39236.0820692] lockdebug_abort1() at netbsd:lockdebug_abort1+0xe4
[ 39236.0820692] rw_enter() at netbsd:rw_enter+0x80
[ 39236.0820692] uvm_fault_internal() at netbsd:uvm_fault_internal+0x12b
[ 39236.0820692] trap() at netbsd:trap+0x3a7
[ 39236.0820692] --- trap (number 6) ---
[ 39236.0820692] ?() at dab1bd08
[ 39236.0820692] cpu0: End traceback...
[ 39236.0820692] fatal breakpoint trap in supervisor mode
[ 39236.0820692] trap type 1 code 0 rip 0xffffffff8023541d cs 0x8 rflags 0x202 cr2 0xd6202288 ilevel 0x8 rsp 0xffff8f82b1f675e0
[ 39236.0820692] curlwp 0xffff8723827b9800 pid 5708.5708 lowest kstack 0xffff8f82b1f632c0
Stopped in pid 5708.5708 (efi) at netbsd:breakpoint+0x5: leave
db{0}>
However, this seems to be a consequence of generating a trap while holding
a spinlock.
gdb shows more stack frames:
...
#11 0xffffffff809c32b8 in panic (
fmt=fmt@entry=0xffffffff80eb3828 "LOCKDEBUG: %s error: %s,%zu: %s")
at /scratch/netbsd-current/src/sys/kern/subr_prf.c:209
#12 0xffffffff809b6c8b in lockdebug_abort1 (dopanic=true,
msg=0xffffffff80e2bee8 "spin lock held", s=6, ld=0xffff8f800fc07240,
line=304, func=0xffffffff80db9680 <__func__.6> "rw_vector_enter")
at /scratch/netbsd-current/src/sys/kern/subr_lockdebug.c:818
#13 lockdebug_abort1 (func=0xffffffff80db9680 <__func__.6> "rw_vector_enter",
line=304, ld=0xffff8f800fc07240, s=6,
msg=0xffffffff80e2bee8 "spin lock held", dopanic=<optimized out>)
at /scratch/netbsd-current/src/sys/kern/subr_lockdebug.c:796
#14 0xffffffff80980386 in rw_vector_enter (rw=0xffff872377e3a3c8, op=RW_READER)
at /scratch/netbsd-current/src/sys/kern/kern_rwlock.c:304
#15 0xffffffff8091708e in vm_map_lock_read (map=<optimized out>)
at /scratch/netbsd-current/src/sys/uvm/uvm_map.c:726
#16 0xffffffff8090f92b in uvmfault_lookup (write_lock=false,
ufi=0xffff8f82b1f67800)
at /scratch/netbsd-current/src/sys/uvm/uvm_fault_i.h:122
#17 uvm_fault_check (maxprot=false, ranons=<synthetic pointer>,
flt=0xffff8f82b1f67838, ufi=0xffff8f82b1f67800)
at /scratch/netbsd-current/src/sys/uvm/uvm_fault.c:992
#18 uvm_fault_internal (orig_map=orig_map@entry=0xffff872377e3a3c0,
vaddr=vaddr@entry=3592429568, access_type=access_type@entry=1,
fault_flag=fault_flag@entry=0)
at /scratch/netbsd-current/src/sys/uvm/uvm_fault.c:902
#19 0xffffffff8023c180 in trap (frame=0xffff8f82b1f67aa0)
at /scratch/netbsd-current/src/sys/arch/amd64/amd64/trap.c:519
#20 0xffffffff80234ad4 in alltraps ()
#21 0x00000000dab1bd08 in ?? ()
#22 0xffff8f82b1f67c08 in ?? ()
#23 0xffff8f82b1f67c50 in ?? ()
#24 0xffff87236d797000 in ?? ()
#25 0xffff8f82b1f67ef0 in ?? ()
#26 0x0000000000000100 in ?? ()
#27 0xffff8f82b1f67ef0 in ?? ()
#28 0xffff87236d797000 in ?? ()
#29 0xffff8f82b1f67c50 in ?? ()
#30 0xffff87247411c000 in ?? ()
#31 0xffffffff8058fb25 in efi_runtime_nextvar (namesize=0x0,
name=0xffff87236d797000, vendor=0xffff8f82b1f67ef0)
at /scratch/netbsd-current/src/sys/arch/x86/x86/efi_machdep.c:948
#32 0xffff87230d781080 in ?? ()
#33 0x0000000000000000 in ?? ()
Calling efi_runtime_nextvar with namesize==NULL might be the reason. Also,
name points to just NUL bytes. But the arguments shown are questionable:
there is only one place where efi_nextvar() is called (in sys/dev/efi.c),
and the namesize argument there is a pointer to a local variable
(&namesize), which cannot possibly be NULL.
>How-To-Repeat:
Run efi(8) without arguments as root or member of wheel.
>Fix:
>Release-Note:
>Audit-Trail:
From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/59235 CVS commit: src/sbin/efi
Date: Sun, 30 Mar 2025 14:30:40 +0000
Module Name: src
Committed By: riastradh
Date: Sun Mar 30 14:30:40 UTC 2025
Modified Files:
src/sbin/efi: efiio.c
Log Message:
efi(8): EFI_VARNAME_MAXLENGTH is in bytes, not CHAR16.
Same with struct efi_var_ioc::namesize.
This shouldn't change the semantics of the program -- it was just
allocating twice the maximum buffer space that the kernel would ever
actually use; now it only allocates exactly the maximum buffer space
that the kernel will ever actually use.
Prompted by (but will not fix):
PR kern/59235: efi(8) panics
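The unit mix-up described in this log message can be illustrated with a small sketch. Only the idea that the limit counts bytes rather than CHAR16 elements comes from the commit; the value 1024, the type name example_char16 and all function names below are assumptions made up for illustration, not taken from the NetBSD sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed value for illustration only; the point is that it counts BYTES. */
#define EXAMPLE_VARNAME_MAXBYTES 1024

/* Hypothetical stand-in for a 16-bit EFI character (2 bytes per unit). */
typedef uint16_t example_char16;

/* Old pattern: treats the limit as an element count, so it asks for 2x. */
static size_t
oversized_alloc_bytes(void)
{
	return EXAMPLE_VARNAME_MAXBYTES * sizeof(example_char16);
}

/* Fixed pattern: the limit is already a byte count. */
static size_t
exact_alloc_bytes(void)
{
	return EXAMPLE_VARNAME_MAXBYTES;
}

/* Number of 16-bit code units, terminating NUL included, that fit. */
static size_t
max_name_units(void)
{
	return EXAMPLE_VARNAME_MAXBYTES / sizeof(example_char16);
}

/* Allocate a zeroed name buffer and report its size in bytes. */
static example_char16 *
alloc_name_buffer(size_t *namesize)
{
	assert(namesize != NULL);
	*namesize = exact_alloc_bytes();
	return calloc(1, *namesize);
}
```

With these assumed numbers the old pattern requests twice as many bytes as the kernel side would ever use, which matches the "no semantic change" remark above: the extra half of the buffer was simply never touched.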
To generate a diff of this commit:
cvs rdiff -u -r1.2 -r1.3 src/sbin/efi/efiio.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/59235 CVS commit: src
Date: Sun, 30 Mar 2025 14:36:49 +0000
Module Name: src
Committed By: riastradh
Date: Sun Mar 30 14:36:49 UTC 2025
Modified Files:
src/sbin/efi: defs.h efiio.c
src/sys/dev: efi.c
Log Message:
efi(8)/efi(9): Rename EFI_VARNAME_MAXLENGTH -> EFI_VARNAME_MAXBYTES.
This should help avoid potential confusion over the units.
No functional change intended.
Prompted by (but will not fix):
PR kern/59235: efi(8) panics
To generate a diff of this commit:
cvs rdiff -u -r1.1 -r1.2 src/sbin/efi/defs.h
cvs rdiff -u -r1.3 -r1.4 src/sbin/efi/efiio.c
cvs rdiff -u -r1.9 -r1.10 src/sys/dev/efi.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: Taylor R Campbell <riastradh@NetBSD.org>
To: mlelstv@NetBSD.org
Cc: gnats-bugs@NetBSD.org, netbsd-bugs@NetBSD.org
Subject: Re: kern/59235: efi(8) panics
Date: Sun, 30 Mar 2025 19:10:17 +0000
I don't know what's going wrong here -- I suspect the firmware on this
machine might be buggy. I reviewed arguments and units and EFI spec,
and I haven't seen anything wrong in what NetBSD is doing, either in
userland or in the kernel, except for allocating a buffer that's twice
as large as can ever be used in userland (which is now fixed).
The stack frame gdb shows is obviously bogus; there are no calls to
efi_runtime_nextvar with null `namesize'. `name' pointing to all NUL
bytes is to be expected: to start iterating over all variables, you
first call GetNextVariableName (a.k.a. struct efi_rt::rt_scanvar) with
a variable name that starts with NUL.
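The iteration protocol described above can be sketched against a mock variable store. This is a user-space illustration under stated assumptions, not the NetBSD or firmware code: mock_nextvar, the status enum and the three variable names are hypothetical, the vendor GUID argument is left out, and an unknown previous name is simply reported as "not found".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical status codes standing in for EFI_SUCCESS and friends. */
enum mock_status { MOCK_SUCCESS, MOCK_NOT_FOUND, MOCK_BUFFER_TOO_SMALL };

/* Hypothetical variable store: NUL-terminated names in 16-bit units. */
static const uint16_t mock_vars[3][8] = {
	{ 'B', 'o', 'o', 't', 0 },
	{ 'L', 'a', 'n', 'g', 0 },
	{ 'T', 'i', 'm', 'e', 0 },
};
#define MOCK_NVARS (sizeof(mock_vars) / sizeof(mock_vars[0]))

/* Length of a name in 16-bit units, terminating NUL included. */
static size_t
name_units(const uint16_t *s)
{
	size_t n = 0;

	while (s[n] != 0)
		n++;
	return n + 1;
}

static int
name_equal(const uint16_t *a, const uint16_t *b)
{
	size_t i;

	for (i = 0; a[i] == b[i]; i++) {
		if (a[i] == 0)
			return 1;
	}
	return 0;
}

/*
 * On entry, name holds the previous name (empty to start) and
 * *namesize is the buffer capacity in bytes.  On success, name is
 * overwritten with the next name.
 */
static enum mock_status
mock_nextvar(size_t *namesize, uint16_t *name)
{
	size_t next = 0, need;

	assert(namesize != NULL && name != NULL);

	if (name[0] != 0) {
		/* Find the previous name; the entry after it is next. */
		while (next < MOCK_NVARS && !name_equal(name, mock_vars[next]))
			next++;
		if (next == MOCK_NVARS)
			return MOCK_NOT_FOUND;	/* mock simplification */
		next++;
	}
	if (next >= MOCK_NVARS)
		return MOCK_NOT_FOUND;

	need = name_units(mock_vars[next]) * sizeof(uint16_t);
	if (*namesize < need) {
		*namesize = need;	/* tell the caller how much is needed */
		return MOCK_BUFFER_TOO_SMALL;
	}
	memcpy(name, mock_vars[next], need);
	return MOCK_SUCCESS;
}

/* Walk the whole store, starting from a name that begins with NUL. */
static size_t
count_vars(void)
{
	uint16_t name[8] = { 0 };
	size_t count = 0;

	for (;;) {
		size_t namesize = sizeof(name);

		if (mock_nextvar(&namesize, name) != MOCK_SUCCESS)
			break;
		count++;
	}
	return count;
}
```

Note that the caller passes the address of a local size variable on every call, so a null size pointer cannot arise from this calling pattern.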
The attached patch sets up an onfault handler while executing the EFI
runtime services call in an attempt to recover gracefully from certain
classes of buggy firmware rather than crash the system. Of course,
this might suppress diagnostic information -- but the diagnostic
information we have doesn't seem to be very useful anyway.
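The recovery idea can be sketched in user space with the standard C setjmp/longjmp. This is only an analogue under stated assumptions: the kernel patch uses label_t and a pcb_onfault hook, which are not available here, so a misbehaving "service" calls the hypothetical fault_abort() directly instead of taking a real trap. All names in the sketch are made up for illustration.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* State for the jump back; a real implementation must serialize access. */
static jmp_buf *fault_label;
static int fault_error;

/* Stand-in for the fault path: record the error and unwind to setjmp. */
static void
fault_abort(int error)
{
	assert(fault_label != NULL);
	fault_error = error;
	longjmp(*fault_label, 1);
}

typedef int (*service_fn)(void);

/* A well-behaved service returns normally. */
static int
good_service(void)
{
	return 0;
}

/* A broken service "faults" partway through (14 is an arbitrary code). */
static int
bad_service(void)
{
	fault_abort(14);
	return 0;	/* not reached */
}

/*
 * Call fn with a recovery point armed.  If the call faults, control
 * comes back out of setjmp with a nonzero value and the recorded error
 * is turned into a negative status instead of crashing.
 */
static int
call_guarded(service_fn fn)
{
	jmp_buf label;
	int status;

	if (setjmp(label)) {
		fault_label = NULL;	/* disarm after recovery */
		return -fault_error;
	}

	fault_label = &label;		/* arm */
	status = fn();
	fault_label = NULL;		/* disarm on the normal path */
	return status;
}
```

As in the patch, the recovery point is established before the risky call, and the cleanup normally done on exit also has to run on the fault path.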
Content-Disposition: attachment; filename="pr59235-efirtonfaultabort.patch"
# HG changeset patch
# User Taylor R Campbell <riastradh@NetBSD.org>
# Date 1743344228 0
# Sun Mar 30 14:17:08 2025 +0000
# Branch trunk
# Node ID ccfb5337d46228a1ecee5217824307ac1f2ed5ff
# Parent dc0a5778375a753d3e8b04e840cacb7f95896328
# EXP-Topic riastradh-pr59235-efifault
WIP: Set up a pcb_onfault handler during efirt access to abort.
PR kern/59235: efi(8) panics
diff -r dc0a5778375a -r ccfb5337d462 sys/arch/amd64/amd64/cpufunc.S
--- a/sys/arch/amd64/amd64/cpufunc.S Sun Mar 30 00:07:51 2025 +0000
+++ b/sys/arch/amd64/amd64/cpufunc.S Sun Mar 30 14:17:08 2025 +0000
@@ -503,3 +503,12 @@ ENTRY(svs_quad_copy)
movsq
ret
END(svs_quad_copy)
+
+/*
+ * pcb_onfault routine for EFI runtime services. Error code is in %rax.
+ * All other registers are undefined. See efi_machdep.c for usage.
+ */
+ENTRY(efi_runtime_pcb_onfault)
+ movq %rax,%rdi
+ jmp _C_LABEL(efi_runtime_fault_abort)
+END(efi_runtime_pcb_onfault)
diff -r dc0a5778375a -r ccfb5337d462 sys/arch/x86/x86/efi_machdep.c
--- a/sys/arch/x86/x86/efi_machdep.c Sun Mar 30 00:07:51 2025 +0000
+++ b/sys/arch/x86/x86/efi_machdep.c Sun Mar 30 14:17:08 2025 +0000
@@ -805,9 +805,44 @@ fail: /*

struct efi_runtime_cookie {
void *erc_pmap_cookie;
+ label_t *erc_label;
};

/*
+ * efi_runtime_fault_cookie, efi_runtime_fault_error
+ *
+ * State for longjmp on fault. Access serialized by
+ * efi_runtime_lock.
+ */
+static struct efi_runtime_cookie *efi_runtime_fault_cookie;
+static int efi_runtime_fault_error;
+
+/*
+ * efi_runtime_pcb_onfault
+ *
+ * Return address for pcb_onfault during EFI runtime services
+ * call. This takes an error code in %rax as if it were a return
+ * value -- it is not a normal function to call.
+ */
+void efi_runtime_pcb_onfault(void);
+
+/*
+ * efi_runtime_fault_abort(error)
+ *
+ * Standard procedure call triggered by pcb_onfault. Invoked by
+ * efi_runtime_pcb_onfault. Makes 1 come flying out of the setjmp
+ * that began the EFI runtime services call.
+ */
+void efi_runtime_fault_abort(int);
+void
+efi_runtime_fault_abort(int error)
+{
+
+ efi_runtime_fault_error = error;
+ longjmp(efi_runtime_fault_cookie->erc_label);
+}
+
+/*
* efi_runtime_enter(cookie)
*
* Prepare to call an EFI runtime service, storing state for the
@@ -815,8 +850,9 @@ struct efi_runtime_cookie {
* done.
*/
static void
-efi_runtime_enter(struct efi_runtime_cookie *cookie)
+efi_runtime_enter(struct efi_runtime_cookie *cookie, label_t *label)
{
+ struct pcb * const pcb = lwp_getpcb(curlwp);

KASSERT(efi_runtime_pmap != NULL);

@@ -847,6 +883,15 @@ efi_runtime_enter(struct efi_runtime_coo
* run privileged, which they need in order to do I/O anyway.
*/
cookie->erc_pmap_cookie = pmap_activate_sync(efi_runtime_pmap);
+
+ /*
+ * If the EFI runtime services code is broken, try to recover
+ * gracefully.
+ */
+ cookie->erc_label = label;
+ efi_runtime_fault_cookie = cookie;
+ KASSERT(pcb->pcb_onfault == NULL);
+ pcb->pcb_onfault = &efi_runtime_pcb_onfault;
}

/*
@@ -858,13 +903,38 @@ efi_runtime_enter(struct efi_runtime_coo
static void
efi_runtime_exit(struct efi_runtime_cookie *cookie)
{
+ struct pcb * const pcb = lwp_getpcb(curlwp);

+ KASSERT(pcb->pcb_onfault == &efi_runtime_pcb_onfault);
+ pcb->pcb_onfault = NULL;
pmap_deactivate_sync(efi_runtime_pmap, cookie->erc_pmap_cookie);
fpu_kern_leave();
mutex_exit(&efi_runtime_lock);
}

/*
+ * efi_runtime_faulted()
+ *
+ * To be called by an EFI runtime services wrapper function when 1
+ * comes flying out of setjmp, meaning the actual call had
+ * faulted.
+ */
+static efi_status
+efi_runtime_faulted(void)
+{
+ struct efi_runtime_cookie *cookie = efi_runtime_fault_cookie;
+ int error = efi_runtime_fault_error;
+
+ KASSERT(mutex_owned(&efi_runtime_lock));
+ KASSERT(cookie != NULL);
+
+ efi_runtime_exit(cookie);
+
+ /* XXX */
+ return (error == EFAULT ? EFI_DEVICE_ERROR : EFI_INVALID_PARAMETER);
+}
+
+/*
* efi_runtime_gettime(tm, tmcap)
*
* Call RT->GetTime, or return EFI_UNSUPPORTED if unsupported.
@@ -874,11 +944,15 @@ efi_runtime_gettime(struct efi_tm *tm, s
{
efi_status status;
struct efi_runtime_cookie cookie;
+ label_t label;

if (efi_rt.rt_gettime == NULL)
return EFI_UNSUPPORTED;

- efi_runtime_enter(&cookie);
+ if (setjmp(&label))
+ return efi_runtime_faulted();
+
+ efi_runtime_enter(&cookie, &label);
status = efi_rt.rt_gettime(tm, tmcap);
efi_runtime_exit(&cookie);

@@ -896,11 +970,15 @@ efi_runtime_settime(struct efi_tm *tm)
{
efi_status status;
struct efi_runtime_cookie cookie;
+ label_t label;

if (efi_rt.rt_settime == NULL)
return EFI_UNSUPPORTED;

- efi_runtime_enter(&cookie);
+ if (setjmp(&label))
+ return efi_runtime_faulted();
+
+ efi_runtime_enter(&cookie, &label);
status = efi_rt.rt_settime(tm);
efi_runtime_exit(&cookie);

@@ -918,11 +996,15 @@ efi_runtime_getvar(efi_char *name, struc
{
efi_status status;
struct efi_runtime_cookie cookie;
+ label_t label;

if (efi_rt.rt_getvar == NULL)
return EFI_UNSUPPORTED;

- efi_runtime_enter(&cookie);
+ if (setjmp(&label))
+ return efi_runtime_faulted();
+
+ efi_runtime_enter(&cookie, &label);
status = efi_rt.rt_getvar(name, vendor, attrib, datasize, data);
efi_runtime_exit(&cookie);

@@ -940,11 +1022,15 @@ efi_runtime_nextvar(unsigned long *names
{
efi_status status;
struct efi_runtime_cookie cookie;
+ label_t label;

if (efi_rt.rt_scanvar == NULL)
return EFI_UNSUPPORTED;

- efi_runtime_enter(&cookie);
+ if (setjmp(&label))
+ return efi_runtime_faulted();
+
+ efi_runtime_enter(&cookie, &label);
status = efi_rt.rt_scanvar(namesize, name, vendor);
efi_runtime_exit(&cookie);

@@ -962,11 +1048,15 @@ efi_runtime_setvar(efi_char *name, struc
{
efi_status status;
struct efi_runtime_cookie cookie;
+ label_t label;

if (efi_rt.rt_setvar == NULL)
return EFI_UNSUPPORTED;

- efi_runtime_enter(&cookie);
+ if (setjmp(&label))
+ return efi_runtime_faulted();
+
+ efi_runtime_enter(&cookie, &label);
status = efi_rt.rt_setvar(name, vendor, attrib, datasize, data);
efi_runtime_exit(&cookie);

>Unformatted:
Copyright © 1994-2025
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.