NetBSD Problem Report #57661
From www@netbsd.org Sun Oct 15 12:12:43 2023
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id 866001A9238
for <gnats-bugs@gnats.NetBSD.org>; Sun, 15 Oct 2023 12:12:43 +0000 (UTC)
Message-Id: <20231015121242.36BCF1A923A@mollari.NetBSD.org>
Date: Sun, 15 Oct 2023 12:12:42 +0000 (UTC)
From: logix@foobar.franken.de
Reply-To: logix@foobar.franken.de
To: gnats-bugs@NetBSD.org
Subject: Crash when booting on Xeon Silver 4416+ in KVM/Qemu
X-Send-Pr-Version: www-1.0
>Number: 57661
>Notify-List: riastradh@NetBSD.org, manu@NetBSD.org, andrew.cagney@gmail.com
>Category: port-amd64
>Synopsis: Crash when booting on Xeon Silver 4416+ in KVM/Qemu
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: port-amd64-maintainer
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sun Oct 15 12:15:00 +0000 2023
>Closed-Date: Mon Feb 02 03:44:26 +0000 2026
>Last-Modified: Mon Feb 02 22:55:01 +0000 2026
>Originator: Harold Gutch
>Release: NetBSD current
>Organization:
>Environment:
NetBSD 10.99.10 NetBSD 10.99.10 (GENERIC) #0: Thu Oct 12 23:51:05 UTC 2023 mkrepro@mkrepro.NetBSD.org:/usr/src/sys/arch/amd64/compile/GENERIC amd64
>Description:
Booting a current kernel on a Xeon Silver 4416+ CPU inside KVM/Qemu yields:
[ 1.5179371] uvm_fault(0xffffffff81911220, 0xffffac805182a000, 2) -> e
[ 1.5179371] fatal page fault in supervisor mode
[ 1.5179371] trap type 6 code 0x2 rip 0xffffffff80fdcfdc cs 0x8 rflags 0x10206 cr2 0xffffac805182a000 ilevel 0 rsp 0xffffffff81d56d38
[ 1.5280253] curlwp 0xffffffff8188b9c0 pid 0.0 lowest kstack 0xffffffff81d512c0
kernel: page fault trap, code=0
Stopped in pid 0.0 (system) at netbsd:memset+0x2c: repe stosq %es:(%rdi)
memset() at netbsd:memset+0x2c
lwp_create() at netbsd:lwp_create+0x325
fork1() at netbsd:fork1+0x43b
main()at netbsd:main+0x47a
The stack trace is a bit of a red herring, I traced down the memset to line 343 of src/sys/arch/x86/x86/fpu.c, so we actually have:
lwp_create() -> uvm_lwp_fork() -> cpu_lwp_fork() -> fpu_lwp_fork() -> memset()
Adding
printf("DEBUG: sizeof(pcb2->pcb_savefpu)==%ld\n", sizeof(pcb2->pcb_savefpu));
printf("DEBUG: x86_fpu_save_size==%d\n", x86_fpu_save_size);
before the memset() call prints
[ 1.8432366] DEBUG: sizeof(pcb2->pcb_savefpu)==576
[ 1.8432366] DEBUG: x86_fpu_save_size==11008
Changing the VM's CPU to Sandy Bridge prints
[ 1.8897648] DEBUG: sizeof(pcb2->pcb_savefpu)==576
[ 1.8897648] DEBUG: x86_fpu_save_size==832
... which also *seems* odd, but the machine works then. But the comment in line 80 of src/sys/arch/amd64/include/pcb.h appears to suggest that pcb_savefpu goes until the end of the page, so I guess the 832 vs 576 bytes discrepancy falls under "yes... but that's OK". But with x86_fpu_save_size==11008 we are writing far beyond the end of the page.
>How-To-Repeat:
Boot NetBSD on Linux in Qemu with "-cpu host" on a host with a Xeon Silver 4416+ CPU.
Possibly alternatively: Boot NetBSD natively on a machine with such a CPU (untested as of now, I don't have such a machine in testing state available right now)
>Fix:
In Qemu, select a different CPU type without AVX-512, e.g., Sandy Bridge.
>Release-Note:
>Audit-Trail:
From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/57661 CVS commit: src/sys/arch/x86/x86
Date: Sun, 15 Oct 2023 13:13:22 +0000
Module Name: src
Committed By: riastradh
Date: Sun Oct 15 13:13:22 UTC 2023
Modified Files:
src/sys/arch/x86/x86: identcpu.c
Log Message:
x86: Panic if cpuid's fpu save size is larger than we support.
Ideally this wouldn't panic, but the alternative right now is to
crash in a memset later -- or silently corrupt kernel memory -- so
this doesn't make the situation worse than it was before.
PR kern/57661
XXX pullup-10
XXX pullup-9
XXX pullup-8
To generate a diff of this commit:
cvs rdiff -u -r1.123 -r1.124 src/sys/arch/x86/x86/identcpu.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/57661 CVS commit: src/sys/arch/x86/x86
Date: Tue, 17 Oct 2023 11:12:33 +0000
Module Name: src
Committed By: riastradh
Date: Tue Oct 17 11:12:33 UTC 2023
Modified Files:
src/sys/arch/x86/x86: identcpu.c
Log Message:
x86: Panic early if fpu save size is too large, take 2.
This shouldn't break any existing systems (for real this time), but
it should make the failure mode more obvious on systems that are
already broken.
PR kern/57661
XXX pullup-10
XXX pullup-9
XXX pullup-8
To generate a diff of this commit:
cvs rdiff -u -r1.126 -r1.127 src/sys/arch/x86/x86/identcpu.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/57661 CVS commit: src/sys/arch/x86/x86
Date: Tue, 17 Oct 2023 14:17:42 +0000
Module Name: src
Committed By: riastradh
Date: Tue Oct 17 14:17:42 UTC 2023
Modified Files:
src/sys/arch/x86/x86: identcpu.c
Log Message:
Revert "x86: Panic early if fpu save size is too large, take 2."
Apparently this is too early to print anything useful, so it just
causes a reboot loop.
PR kern/57661
To generate a diff of this commit:
cvs rdiff -u -r1.127 -r1.128 src/sys/arch/x86/x86/identcpu.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: Taylor R Campbell <riastradh@NetBSD.org>
To: netbsd-bugs@NetBSD.org, gnats-bugs@NetBSD.org
Cc:
Subject: Re: port-amd64/57661: Crash when booting on Xeon Silver 4416+ in KVM/Qemu
Date: Fri, 11 Apr 2025 14:25:46 +0000
This is a multi-part message in MIME format.
--=_tVq9BOFzAnpXS9N5Qy8XBZQoi+hFOgDk
Followup from thread on tech-kern:
https://mail-index.netbsd.org/tech-kern/2025/04/11/msg030349.html
--=_tVq9BOFzAnpXS9N5Qy8XBZQoi+hFOgDk
Content-Type: message/rfc822
Content-Disposition: inline
Date: Fri, 11 Apr 2025 14:24:47 +0000
From: Taylor R Campbell <riastradh@NetBSD.org>
To: Emmanuel Dreyfus <manu@netbsd.org>
CC: tech-kern@netbsd.org
Subject: Re: Page fault early in boot
Message-Id: <20250411142453.75A4284D3F@mail.netbsd.org>
In-reply-to: <Z_jRY4rZhMk4WALO@homeworld.netbsd.org> (manu@netbsd.org)
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: quoted-printable
> Date: Fri, 11 Apr 2025 08:22:59 +0000
> From: Emmanuel Dreyfus <manu@netbsd.org>
>=20
> I tried to boot NetBSD-current (i386 and amd6), BIOS and UEFI modes) on a=
=20
> Dell server, it always panic on a "fatal page fault in supervisor mode"
> early during the boot. The backtrace says we went there through
> main/fork1/lwp_create/memset
>=20
> A screenshot is available there:
> https://dl.espci.fr/ticket/2a34048a0a9749c68cfb0dc380cc2259
>=20
> Any idea how to debug that? I expected NetBSD to lack some hardware
> support on this recent machine, not to be unable to reach single user=20
> mode.
This looks like the same issue as:
PR kern/57661: Crash when booting on Xeon Silver 4416+ in KVM/Qemu
https://gnats.netbsd.org/57661
Some additional diagnostics from the PR:
> The stack trace is a bit of a red herring, I traced down the memset to li=
ne 343 of src/sys/arch/x86/x86/fpu.c, so we actually have:
> lwp_create() -> uvm_lwp_fork() -> cpu_lwp_fork() -> fpu_lwp_fork() -> mem=
set()
>=20
> Adding
> printf("DEBUG: sizeof(pcb2->pcb_savefpu)=3D=3D%ld\n", sizeof(pcb2->pcb_=
savefpu));
> printf("DEBUG: x86_fpu_save_size=3D=3D%d\n", x86_fpu_save_size);
>=20
> before the memset() call prints
>=20
> [ 1.8432366] DEBUG: sizeof(pcb2->pcb_savefpu)=3D=3D576
> [ 1.8432366] DEBUG: x86_fpu_save_size=3D=3D11008
>=20
> Changing the VM's CPU to Sandy Bridge prints
>=20
> [ 1.8897648] DEBUG: sizeof(pcb2->pcb_savefpu)=3D=3D576
> [ 1.8897648] DEBUG: x86_fpu_save_size=3D=3D832
So, the x86_fpu_save_size is larger than we can handle and we don't
report the problem gracefully either -- my initial attempt to put a
more obvious panic in x86/identcpu.c when we initially determine
x86_fpu_save_size hit the problem that it happens too early and sends
the machine into a reboot loop.
Unfortunately, simply expanding the space for pcb_savefpu is not
trivial:
80 /*
81 * IMPORTANT NOTE: this structure, including the variable-sized FPU=
state at
82 * the end, must fit within one page.
83 */
84 struct pcb {
...
103 union savefpu pcb_savefpu __aligned(64); /* floating point state */
104 /* **** DO NOT ADD ANYTHING HERE **** */
105 };
https://nxr.netbsd.org/xref/src/sys/arch/amd64/include/pcb.h?r=3D1.32#80
The top comment came from this commit:
> Module Name: src
> Committed By: maxv
> Date: Tue Mar 17 17:18:49 UTC 2020
>=20
> Modified Files:
> src/sys/arch/amd64/include: cpu.h param.h pcb.h types.h
> src/sys/arch/x86/x86: vm_machdep.c
>=20
> Log Message:
> Add a redzone between the pcb and the stack. Sent to port-amd64@.
https://mail-index.netbsd.org/source-changes/2020/03/17/msg115178.html
So, the immediate issue is that the memset in fpu_lwp_fork is hitting
the guard page. Perhaps resolving that is simply a matter of teaching
cpu_uarea_alloc to advance by round_page(offsetof(struct pcb,
pcb_savefpu) + x86_fpu_save_size) rather than by PAGE_SIZE when
deciding where to set the guard page:
364 void *
365 cpu_uarea_alloc(bool system)
366 {
367 vaddr_t base, va;
368 paddr_t pa;
369=20
370 base =3D uvm_km_alloc(kernel_map, USPACE + PAGE_SIZE, 0,
371 UVM_KMF_WIRED|UVM_KMF_WAITVA);
372=20
373 /* Page[1] =3D RedZone */
374 va =3D base + PAGE_SIZE;
375 if (!pmap_extract(pmap_kernel(), va, &pa)) {
376 panic("%s: impossible, Page[1] unmapped", __func__);
377 }
378 pmap_kremove(va, PAGE_SIZE);
379 uvm_pagefree(PHYS_TO_VM_PAGE(pa));
https://nxr.netbsd.org/xref/src/sys/arch/x86/x86/vm_machdep.c?r=3D1.46#352
But the bottom comment, /* **** DO NOT ADD ANYTHING HERE **** */, is
much older:
> Module Name: src
> Committed By: dsl
> Date: Thu Feb 20 18:19:10 UTC 2014
>=20
> Modified Files:
> src/sys/arch/amd64/amd64: genassym.cf machdep.c
> src/sys/arch/amd64/include: pcb.h proc.h
> src/sys/arch/i386/i386: locore.S machdep.c
> src/sys/arch/i386/include: pcb.h proc.h
> src/sys/arch/x86/x86: vm_machdep.c
> src/sys/sys: param.h
>=20
> Log Message:
> Move the amd64 and i386 pcb to the bottom of the uarea, and move the
> kernel stack to the top.
> Change the pcb layouts so that fpu save area is at the end and is
> 64byte aligned ready for xsave (saving the ymm registers).
> Welcome to 6.99.32
https://mail-index.netbsd.org/source-changes/2014/02/20/msg051841.html
It's not obvious to me whether the comment affects anything at hand.
Perhaps it's just admonishing future developers not to add things
_other than fpu state_ because the fpu state is actually variable size
and may extend past the size of a union savefpu object.
--=_tVq9BOFzAnpXS9N5Qy8XBZQoi+hFOgDk--
From: Taylor R Campbell <riastradh@NetBSD.org>
To: hgutch@NetBSD.org, manu@NetBSD.org, andrew.cagney@gmail.com
Cc: netbsd-bugs@NetBSD.org, gnats-bugs@NetBSD.org
Subject: Re: port-amd64/57661: Crash when booting on Xeon Silver 4416+ in KVM/Qemu
Date: Fri, 11 Apr 2025 18:12:49 +0000
This is a multi-part message in MIME format.
--=_jGdl80rZzbJ3AV6nW+2j7DYaEAx46dnv
The attached patch bumps UPAGES and USPACE to make room for the extra
FPU save size. Can you please try it out on the affected machines?
And, if it boots, can you run the attached program xfeat2.c and share
the output?
The patch isn't great as is -- it unconditionally increases the memory
usage of every thread by two pages, even on machines where they'll
never be used. But if this works, we can look into sizing it (mostly)
dynamically based on the cpuid results.
--=_jGdl80rZzbJ3AV6nW+2j7DYaEAx46dnv
Content-Type: text/plain; charset="ISO-8859-1"; name="pr57661-x86fpusaveoverflow-growuspacepoc"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment; filename="pr57661-x86fpusaveoverflow-growuspacepoc.patch"
diff -r 1cb0546d18b6 sys/arch/amd64/amd64/machdep.c
--- a/sys/arch/amd64/amd64/machdep.c Thu Apr 10 18:53:29 2025 +0000
+++ b/sys/arch/amd64/amd64/machdep.c Fri Apr 11 17:01:24 2025 +0000
@@ -1784,6 +1784,26 @@ init_x86_64(paddr_t first_avail)
consinit(); /* XXX SHOULD NOT BE DONE HERE */
=20
/*
+ * Validate x86_fpu_save_size (determined by cpu_probe above)
+ * and set uspace based on it. We do this now, rather than
+ * earlier, so that if we have to panic, the console has been
+ * initialized. We don't allow arbitrary x86_fpu_save_size
+ * because we statically preallocate the largest possible size
+ * early at start in locore.S, before we know the save size.
+ *
+ * cpu_uarea_alloc in x86/vm_machdep.c relies on this
+ * validation.
+ */
+ __CTASSERT(offsetof(struct pcb, pcb_savefpu) < PAGE_SIZE);
+ if (x86_fpu_save_size > PAGE_SIZE - offsetof(struct pcb, pcb_savefpu) +
+ UPAGES_FPU*PAGE_SIZE) {
+ panic("x86_fpu_save_size too large: %u > %zu",
+ x86_fpu_save_size,
+ (PAGE_SIZE - offsetof(struct pcb, pcb_savefpu) +
+ UPAGES_FPU*PAGE_SIZE));
+ }
+
+ /*
* Initialize PAGE_SIZE-dependent variables.
*/
uvm_md_init();
diff -r 1cb0546d18b6 sys/arch/amd64/include/param.h
--- a/sys/arch/amd64/include/param.h Thu Apr 10 18:53:29 2025 +0000
+++ b/sys/arch/amd64/include/param.h Fri Apr 11 17:01:24 2025 +0000
@@ -69,11 +69,29 @@
#define SINCR 1 /* increment of stack/NBPG */
=20
#if defined(KASAN) || defined(KMSAN)
-#define UPAGES 8
+#define UPAGES_KxSAN 2
+#else
+#define UPAGES_KxSAN 0
+#endif
+#if defined(SVS)
+#define UPAGES_SVS 1
+#else
+#define UPAGES_SVS 0
+#endif
+#define UPAGES_PCB 1 /* one page for the PCB */
+#define UPAGES_FPU 2 /* two extra pages for fpusave */
+#define UPAGES_RED 1 /* one page for red zone between pcb/stack */
+#define UPAGES_STACK 3 /* three pages (12 KiB) of stack space */
+#define UPAGES \
+ (UPAGES_PCB + UPAGES_FPU + UPAGES_RED + UPAGES_STACK + UPAGES_SVS + \
+ UPAGES_KxSAN)
+
+#if defined(KASAN) || defined(KMSAN)
+__CTASSERT(UPAGES =3D=3D 10);
#elif defined(SVS)
-#define UPAGES 6 /* 1 page used internally by SVS */
+__CTASSERT(UPAGES =3D=3D 8);
#else
-#define UPAGES 5 /* pages of u-area (1 for redzone) */
+__CTASSERT(UPAGES =3D=3D 7);
#endif
#define USPACE (UPAGES * NBPG) /* total size of u-area */
=20
diff -r 1cb0546d18b6 sys/arch/x86/x86/vm_machdep.c
--- a/sys/arch/x86/x86/vm_machdep.c Thu Apr 10 18:53:29 2025 +0000
+++ b/sys/arch/x86/x86/vm_machdep.c Fri Apr 11 17:01:24 2025 +0000
@@ -351,9 +351,11 @@ vunmapbuf(struct buf *bp, vsize_t len)
#ifdef __HAVE_CPU_UAREA_ROUTINES
/*
* Layout of the uarea:
- * Page[0] =3D PCB
- * Page[1] =3D RedZone
- * Page[2] =3D Stack
+ * Page[0] =3D PCB and start of FPU save area
+ * Page[1] =3D FPU save extension page 1
+ * Page[2] =3D FPU save extension page 2
+ * Page[3] =3D RedZone
+ * Page[4] =3D Stack
* Page[...] =3D Stack
* Page[UPAGES-1] =3D Stack
* Page[UPAGES] =3D RedZone
@@ -370,8 +372,8 @@ cpu_uarea_alloc(bool system)
base =3D uvm_km_alloc(kernel_map, USPACE + PAGE_SIZE, 0,
UVM_KMF_WIRED|UVM_KMF_WAITVA);
=20
- /* Page[1] =3D RedZone */
- va =3D base + PAGE_SIZE;
+ /* Page[UPAGES_PCB + UPAGES_FPU] =3D RedZone */
+ va =3D base + (UPAGES_PCB + UPAGES_FPU)*PAGE_SIZE;
if (!pmap_extract(pmap_kernel(), va, &pa)) {
panic("%s: impossible, Page[1] unmapped", __func__);
}
--=_jGdl80rZzbJ3AV6nW+2j7DYaEAx46dnv
Content-Type: text/plain; charset="ISO-8859-1"; name="xfeat2"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment; filename="xfeat2.c"
#include <stdint.h>
#include <stdio.h>
#include <util.h>
#define XCR0_FMT "\177\020" \
"b\000" "x87 FPU/MMX\0" \
"b\001" "SSE\0" \
"b\002" "AVX\0" \
"b\003" "BNDREG\0" \
"b\004" "BNDCSR\0" \
"b\005" "Opmask\0" \
"b\006" "ZMM_Hi256\0" \
"b\007" "Hi16_ZMM\0" \
"b\011" "PKRU\0" \
/* end of XCR0_FMT */
int
main(void)
{
uint32_t eax, ebx, ecx, edx, xcr0lo, xcr0hi;
uint64_t xcr0;
char buf[128];
asm("cpuid"
: "=3Da"(eax), "=3Db"(ebx), "=3Dc"(ecx), "=3Dd"(edx)
: "a"(0x0d), "c"(0x00));
snprintb(buf, sizeof(buf), XCR0_FMT, eax);
printf("XFeatureSupportedMask[0:31] =3D %s\n", buf);
printf("XFeatureEnabledSizeMask =3D 0x%x\n", ebx);
printf("XFeatureSupportedSizeMask =3D 0x%x\n", ecx);
snprintb(buf, sizeof(buf), XCR0_FMT, (uint64_t)edx << 32);
printf("XFeatureSupportedMask[32:63] =3D %s\n", buf);
fflush(stdout);
asm("xgetbv" : "=3Da"(xcr0lo), "=3Dd"(xcr0hi) : "c"(0));
xcr0 =3D (uint64_t)xcr0hi << 32 | xcr0lo;
snprintb(buf, sizeof(buf), XCR0_FMT, xcr0);
printf("xcr0 =3D %s\n", buf);
fflush(stdout);
fflush(stdout);
return ferror(stdout);
}
--=_jGdl80rZzbJ3AV6nW+2j7DYaEAx46dnv--
From: Harold Gutch <logix@foobar.franken.de>
To: gnats-bugs@netbsd.org
Cc: port-amd64-maintainer@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org, riastradh@NetBSD.org, manu@NetBSD.org,
andrew.cagney@gmail.com
Subject: Re: port-amd64/57661: Crash when booting on Xeon Silver 4416+ in KVM/Qemu
Date: Fri, 11 Apr 2025 20:32:46 +0200
Success for me (in a VM) with your patch.
[ 1.000004] cpu0 at mainbus0 apid 0
[ 1.000004] cpu0: Use lfence to serialize rdtsc
[ 1.000004] cpu0: Intel(R) Xeon(R) Silver 4416+, id 0x806f8
[ 1.000004] cpu0: node 0, package 0, core 0, smt 0
[...]
and the system reaches multiuser.
: {7} ./xfeat2
XFeatureSupportedMask[0:31] = 0x602e7<x87 FPU/MMX,SSE,AVX,Opmask,ZMM_Hi256,Hi16_ZMM,PKRU>
XFeatureEnabledSizeMask = 0xa80
XFeatureSupportedSizeMask = 0x2b00
XFeatureSupportedMask[32:63] = 0
xcr0 = 0xe7<x87 FPU/MMX,SSE,AVX,Opmask,ZMM_Hi256,Hi16_ZMM>
thanks!
Harold
From: Taylor R Campbell <riastradh@NetBSD.org>
To: Harold Gutch <logix@foobar.franken.de>
Cc: gnats-bugs@netbsd.org, port-amd64-maintainer@netbsd.org,
gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
manu@NetBSD.org, andrew.cagney@gmail.com
Subject: Re: port-amd64/57661: Crash when booting on Xeon Silver 4416+ in KVM/Qemu
Date: Fri, 11 Apr 2025 21:09:44 +0000
This is a multi-part message in MIME format.
--=_9iRVuQui7SP0XI0my0i42KB+RbbJU+ZS
Thanks, can you try with this patch?
It incurs less cost on machines with smaller FPU save sizes -- a few
branches and an extra pointer indirection in the save/restore path,
but no pages of memory wasted on every thread. But it's a lot more
involved, so it's more likely I've screwed something up.
(Another alternative would be to dynamically change USPACE, but that
is also pretty involved and likely to get things wrong -- plus it
still entails requires allocating the pages for kernel threads which
will never use them.)
--=_9iRVuQui7SP0XI0my0i42KB+RbbJU+ZS
Content-Type: text/plain; charset="ISO-8859-1"; name="pr57661-x86fpusaveoverflow-separatefpusave"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment; filename="pr57661-x86fpusaveoverflow-separatefpusave.patch"
diff -r 1cb0546d18b6 sys/arch/amd64/amd64/genassym.cf
--- a/sys/arch/amd64/amd64/genassym.cf Thu Apr 10 18:53:29 2025 +0000
+++ b/sys/arch/amd64/amd64/genassym.cf Fri Apr 11 21:07:26 2025 +0000
@@ -186,7 +186,6 @@ define PCB_FLAGS offsetof(struct pcb, p
define PCB_COMPAT32 PCB_COMPAT32
define PCB_FS offsetof(struct pcb, pcb_fs)
define PCB_GS offsetof(struct pcb, pcb_gs)
-define PCB_SAVEFPU offsetof(struct pcb, pcb_savefpu)
=20
define TF_RDI offsetof(struct trapframe, tf_rdi)
define TF_RSI offsetof(struct trapframe, tf_rsi)
diff -r 1cb0546d18b6 sys/arch/amd64/include/param.h
--- a/sys/arch/amd64/include/param.h Thu Apr 10 18:53:29 2025 +0000
+++ b/sys/arch/amd64/include/param.h Fri Apr 11 21:07:26 2025 +0000
@@ -69,11 +69,27 @@
#define SINCR 1 /* increment of stack/NBPG */
=20
#if defined(KASAN) || defined(KMSAN)
-#define UPAGES 8
+#define UPAGES_KxSAN 2
+#else
+#define UPAGES_KxSAN 0
+#endif
+#if defined(SVS)
+#define UPAGES_SVS 1
+#else
+#define UPAGES_SVS 0
+#endif
+#define UPAGES_PCB 1 /* one page for the PCB */
+#define UPAGES_RED 1 /* one page for red zone between pcb/stack */
+#define UPAGES_STACK 3 /* three pages (12 KiB) of stack space */
+#define UPAGES \
+ (UPAGES_PCB + UPAGES_RED + UPAGES_STACK + UPAGES_SVS + UPAGES_KxSAN)
+
+#if defined(KASAN) || defined(KMSAN)
+__CTASSERT(UPAGES =3D=3D 8);
#elif defined(SVS)
-#define UPAGES 6 /* 1 page used internally by SVS */
+__CTASSERT(UPAGES =3D=3D 6);
#else
-#define UPAGES 5 /* pages of u-area (1 for redzone) */
+__CTASSERT(UPAGES =3D=3D 5);
#endif
#define USPACE (UPAGES * NBPG) /* total size of u-area */
=20
diff -r 1cb0546d18b6 sys/arch/amd64/include/pcb.h
--- a/sys/arch/amd64/include/pcb.h Thu Apr 10 18:53:29 2025 +0000
+++ b/sys/arch/amd64/include/pcb.h Fri Apr 11 21:07:26 2025 +0000
@@ -97,13 +97,15 @@ struct pcb {
struct dbreg *pcb_dbregs;
uint16_t pcb_fpu_dflt_cw;
int pcb_iopl;
+ union savefpu *pcb_savefpu;
=20
- uint32_t pcb_unused[8]; /* unused */
+ uint32_t pcb_unused[6]; /* unused */
=20
- union savefpu pcb_savefpu __aligned(64); /* floating point state */
+ union savefpu pcb_savefpusmall __aligned(64); /* small fpu state */
/* **** DO NOT ADD ANYTHING HERE **** */
};
#ifndef __lint__
+__CTASSERT(offsetof(struct pcb, pcb_savefpusmall) =3D=3D 128);
__CTASSERT(sizeof(struct pcb) - sizeof (union savefpu) =3D=3D 128);
#endif
=20
diff -r 1cb0546d18b6 sys/arch/i386/i386/genassym.cf
--- a/sys/arch/i386/i386/genassym.cf Thu Apr 10 18:53:29 2025 +0000
+++ b/sys/arch/i386/i386/genassym.cf Fri Apr 11 21:07:26 2025 +0000
@@ -192,7 +192,6 @@ define PCB_ESP0 offsetof(struct pcb, pc
define PCB_FSD offsetof(struct pcb, pcb_fsd)
define PCB_GSD offsetof(struct pcb, pcb_gsd)
define PCB_IOMAP offsetof(struct pcb, pcb_iomap)
-define PCB_SAVEFPU offsetof(struct pcb, pcb_savefpu)
=20
define TF_CS offsetof(struct trapframe, tf_cs)
define TF_EIP offsetof(struct trapframe, tf_eip)
diff -r 1cb0546d18b6 sys/arch/i386/include/pcb.h
--- a/sys/arch/i386/include/pcb.h Thu Apr 10 18:53:29 2025 +0000
+++ b/sys/arch/i386/include/pcb.h Fri Apr 11 21:07:26 2025 +0000
@@ -95,16 +95,18 @@ struct pcb {
=20
#define PCB_DBREGS 0x01
int pcb_flags;
-
- int not_used[15];
+ union savefpu *pcb_savefpu;
=20
- /* floating point state */
- union savefpu pcb_savefpu __aligned(64);
+ int not_used[14];
+
+ /* small fpu state */
+ union savefpu pcb_savefpusmall __aligned(64);
/* **** DO NOT ADD ANYTHING HERE **** */
=20
};
#ifndef __lint__
/* This doesn't really matter, but there is a lot of implied padding */
+__CTASSERT(offsetof(struct pcb, pcb_savefpusmall) =3D=3D 128);
__CTASSERT(sizeof(struct pcb) - sizeof (union savefpu) =3D=3D 128);
#endif
=20
diff -r 1cb0546d18b6 sys/arch/x86/include/cpu.h
--- a/sys/arch/x86/include/cpu.h Thu Apr 10 18:53:29 2025 +0000
+++ b/sys/arch/x86/include/cpu.h Fri Apr 11 21:07:26 2025 +0000
@@ -481,6 +481,7 @@ extern uint64_t x86_xsave_features;
extern size_t x86_xsave_offsets[];
extern size_t x86_xsave_sizes[];
extern uint32_t x86_fpu_mxcsr_mask;
+bool x86_fpu_save_separate_p(void);
=20
extern void (*x86_cpu_idle)(void);
#define cpu_idle() (*x86_cpu_idle)()
diff -r 1cb0546d18b6 sys/arch/x86/include/cpu_extended_state.h
--- a/sys/arch/x86/include/cpu_extended_state.h Thu Apr 10 18:53:29 2025 +0=
000
+++ b/sys/arch/x86/include/cpu_extended_state.h Fri Apr 11 21:07:26 2025 +0=
000
@@ -226,6 +226,8 @@ struct xstate {
* It is defined this way to separate the definitions and to
* minimise the number of union/struct selectors.
* NB: Some userspace stuff (eg firefox) uses it to parse ucontext.
+ * NB: This is not actually the largest possible save space;
+ * x86_fpu_save_size may be larger.
*/
union savefpu {
struct save87 sv_87;
diff -r 1cb0546d18b6 sys/arch/x86/x86/fpu.c
--- a/sys/arch/x86/x86/fpu.c Thu Apr 10 18:53:29 2025 +0000
+++ b/sys/arch/x86/x86/fpu.c Fri Apr 11 21:07:26 2025 +0000
@@ -136,11 +136,35 @@ void fpu_switch(struct lwp *, struct lwp
=20
uint32_t x86_fpu_mxcsr_mask __read_mostly =3D 0;
=20
+static const union savefpu safe_fpu_storage __aligned(64) =3D {
+ .sv_xmm =3D {
+ .fx_mxcsr =3D __SAFE_MXCSR__,
+ },
+};
+static const union savefpu zero_fpu_storage __aligned(64);
+
+static const void *safe_fpu __read_mostly =3D &safe_fpu_storage;
+static const void *zero_fpu __read_mostly =3D &zero_fpu_storage;
+
+/*
+ * x86_fpu_save_separate_p()
+ *
+ * True if we allocate the FPU save space separately, outside the
+ * struct pcb itself, because it doesn't fit in a single page.
+ */
+bool
+x86_fpu_save_separate_p(void)
+{
+
+ return x86_fpu_save_size >
+ PAGE_SIZE - offsetof(struct pcb, pcb_savefpusmall);
+}
+
static inline union savefpu *
fpu_lwp_area(struct lwp *l)
{
struct pcb *pcb =3D lwp_getpcb(l);
- union savefpu *area =3D &pcb->pcb_savefpu;
+ union savefpu *area =3D pcb->pcb_savefpu;
=20
KASSERT((l->l_flag & LW_SYSTEM) =3D=3D 0);
if (l =3D=3D curlwp) {
@@ -155,7 +179,7 @@ static inline void
fpu_save_lwp(struct lwp *l)
{
struct pcb *pcb =3D lwp_getpcb(l);
- union savefpu *area =3D &pcb->pcb_savefpu;
+ union savefpu *area =3D pcb->pcb_savefpu;
int s;
=20
s =3D splvm();
@@ -192,11 +216,55 @@ fpuinit(struct cpu_info *ci)
void
fpuinit_mxcsr_mask(void)
{
+ /*
+ * If the CPU's x86 fpu save size is larger than union savefpu,
+ * we have to allocate larger buffers for the safe and zero FPU
+ * states used here and by fpu_kern_enter/leave.
+ *
+ * Note: This is NOT the same as x86_fpu_save_separate_p(),
+ * which may have a little more space than union savefpu.
+ */
+ const bool allocfpusave =3D x86_fpu_save_size > sizeof(union savefpu);
+ vaddr_t va;
+
+#ifdef __i386__
+ if (x86_fpu_save_separate_p()) {
+ /*
+ * XXX Need to teach cpu_uarea_alloc/free to allocate a
+ * separate fpu save space -- currently only done on
+ * amd64, not on i386. Also need to make fpu_lwp_fork
+ * not clobber pcb_savefpu.
+ */
+ panic("NetBSD/i386 does not support fpu save size %u",
+ x86_fpu_save_size);
+ }
+#endif
+
#ifndef XENPV
- union savefpu fpusave __aligned(64);
+ union savefpu fpusave_stack __aligned(64);
+ union savefpu *fpusave;
u_long psl;
=20
- memset(&fpusave, 0, sizeof(fpusave));
+ /*
+ * Allocate a temporary save space from the stack if it fits,
+ * or from the heap otherwise, so we can query its mxcsr mask.
+ */
+ if (allocfpusave) {
+ /*
+ * Need 64-byte alignment for XSAVE instructions.
+ * kmem_* doesn't guarantee that and we don't have a
+ * handy posix_memalign in the kernel unless we hack it
+ * ourselves with vmem(9), so just ask for page
+ * alignment with uvm_km(9).
+ */
+ __CTASSERT(PAGE_SIZE >=3D 64);
+ va =3D uvm_km_alloc(kernel_map, x86_fpu_save_size, PAGE_SIZE,
+ UVM_KMF_WIRED|UVM_KMF_ZERO|UVM_KMF_WAITVA);
+ fpusave =3D (void *)va;
+ } else {
+ fpusave =3D &fpusave_stack;
+ memset(fpusave, 0, sizeof(*fpusave));
+ }
=20
/* Disable interrupts, and enable FPU */
psl =3D x86_read_psl();
@@ -204,16 +272,25 @@ fpuinit_mxcsr_mask(void)
clts();
=20
/* Fill in the FPU area */
- fxsave(&fpusave);
+ fxsave(fpusave);
=20
/* Restore previous state */
stts();
x86_write_psl(psl);
=20
- if (fpusave.sv_xmm.fx_mxcsr_mask =3D=3D 0) {
+ if (fpusave->sv_xmm.fx_mxcsr_mask =3D=3D 0) {
x86_fpu_mxcsr_mask =3D __INITIAL_MXCSR_MASK__;
} else {
- x86_fpu_mxcsr_mask =3D fpusave.sv_xmm.fx_mxcsr_mask;
+ x86_fpu_mxcsr_mask =3D fpusave->sv_xmm.fx_mxcsr_mask;
+ }
+
+ /*
+ * Free the temporary save space.
+ */
+ if (allocfpusave) {
+ uvm_km_free(kernel_map, va, x86_fpu_save_size, UVM_KMF_WIRED);
+ fpusave =3D NULL;
+ va =3D 0;
}
#else
/*
@@ -223,6 +300,26 @@ fpuinit_mxcsr_mask(void)
*/
x86_fpu_mxcsr_mask =3D __INITIAL_MXCSR_MASK__;
#endif
+
+ /*
+ * If necessary, allocate FPU save spaces for safe or zero FPU
+ * state, for fpu_kern_enter/leave.
+ */
+ if (allocfpusave) {
+ __CTASSERT(PAGE_SIZE >=3D 64);
+
+ va =3D uvm_km_alloc(kernel_map, x86_fpu_save_size, PAGE_SIZE,
+ UVM_KMF_WIRED|UVM_KMF_ZERO|UVM_KMF_WAITVA);
+ memcpy((void *)va, &safe_fpu_storage,
+ sizeof(safe_fpu_storage));
+ safe_fpu =3D (void *)va;
+
+ va =3D uvm_km_alloc(kernel_map, x86_fpu_save_size, PAGE_SIZE,
+ UVM_KMF_WIRED|UVM_KMF_ZERO|UVM_KMF_WAITVA);
+ memcpy((void *)va, &zero_fpu_storage,
+ sizeof(zero_fpu_storage));
+ zero_fpu =3D (void *)va;
+ }
}
=20
static inline void
@@ -305,7 +402,7 @@ void
fpu_handle_deferred(void)
{
struct pcb *pcb =3D lwp_getpcb(curlwp);
- fpu_area_restore(&pcb->pcb_savefpu, x86_xsave_features,
+ fpu_area_restore(pcb->pcb_savefpu, x86_xsave_features,
!(curlwp->l_proc->p_flag & PK_32));
}
=20
@@ -321,7 +418,7 @@ fpu_switch(struct lwp *oldlwp, struct lw
if (oldlwp->l_md.md_flags & MDL_FPU_IN_CPU) {
KASSERT(!(oldlwp->l_flag & LW_SYSTEM));
pcb =3D lwp_getpcb(oldlwp);
- fpu_area_save(&pcb->pcb_savefpu, x86_xsave_features,
+ fpu_area_save(pcb->pcb_savefpu, x86_xsave_features,
!(oldlwp->l_proc->p_flag & PK_32));
oldlwp->l_md.md_flags &=3D ~MDL_FPU_IN_CPU;
}
@@ -338,14 +435,28 @@ fpu_lwp_fork(struct lwp *l1, struct lwp=20
if (__predict_false(l2->l_flag & LW_SYSTEM)) {
return;
}
+
+#ifdef __i386__
+ /*
+ * XXX i386 currently never allocates separate FPU save space
+ * (so it's limited to 4096 - 128 bytes of space available in
+ * the pcb page). Changing this requires removing the panic at
+ * boot in fpuinit_mxcsr_mask and adopting something like the
+ * uarea allocation in cpu_uarea_alloc/free like amd64.
+ */
+ KASSERT(x86_fpu_save_size <=3D
+ PAGE_SIZE - offsetof(struct pcb, pcb_savefpusmall));
+ pcb2->pcb_savefpu =3D &pcb2->pcb_savefpusmall;
+#endif
+
/* For init(8). */
if (__predict_false(l1->l_flag & LW_SYSTEM)) {
- memset(&pcb2->pcb_savefpu, 0, x86_fpu_save_size);
+ memset(pcb2->pcb_savefpu, 0, x86_fpu_save_size);
return;
}
=20
fpu_save =3D fpu_lwp_area(l1);
- memcpy(&pcb2->pcb_savefpu, fpu_save, x86_fpu_save_size);
+ memcpy(pcb2->pcb_savefpu, fpu_save, x86_fpu_save_size);
l2->l_md.md_flags &=3D ~MDL_FPU_IN_CPU;
}
=20
@@ -378,11 +489,6 @@ fpu_lwp_abandon(struct lwp *l)
void
fpu_kern_enter(void)
{
- static const union savefpu safe_fpu __aligned(64) =3D {
- .sv_xmm =3D {
- .fx_mxcsr =3D __SAFE_MXCSR__,
- },
- };
struct lwp *l =3D curlwp;
struct cpu_info *ci;
int s;
@@ -421,7 +527,7 @@ fpu_kern_enter(void)
/*
* Zero the FPU registers and install safe control words.
*/
- fpu_area_restore(&safe_fpu, x86_xsave_features, /*is_64bit*/false);
+ fpu_area_restore(safe_fpu, x86_xsave_features, /*is_64bit*/false);
}
=20
/*
@@ -432,7 +538,6 @@ fpu_kern_enter(void)
void
fpu_kern_leave(void)
{
- static const union savefpu zero_fpu __aligned(64);
struct cpu_info *ci =3D curcpu();
int s;
=20
@@ -451,7 +556,7 @@ fpu_kern_leave(void)
* through Spectre-class attacks to userland, even if there are
* no bugs in fpu state management.
*/
- fpu_area_restore(&zero_fpu, x86_xsave_features, /*is_64bit*/false);
+ fpu_area_restore(zero_fpu, x86_xsave_features, /*is_64bit*/false);
=20
/*
* Set CR0_TS again so that the kernel can't accidentally use
diff -r 1cb0546d18b6 sys/arch/x86/x86/vm_machdep.c
--- a/sys/arch/x86/x86/vm_machdep.c Thu Apr 10 18:53:29 2025 +0000
+++ b/sys/arch/x86/x86/vm_machdep.c Fri Apr 11 21:07:26 2025 +0000
@@ -366,10 +366,42 @@ cpu_uarea_alloc(bool system)
{
vaddr_t base, va;
paddr_t pa;
+ struct pcb *pcb;
=20
base =3D uvm_km_alloc(kernel_map, USPACE + PAGE_SIZE, 0,
UVM_KMF_WIRED|UVM_KMF_WAITVA);
=20
+ /*
+ * Prepare the FPU save area:
+ *
+ * 1. If this is a system thread, no save area.
+ * XXX Allocate/free one in kthread_fpu_enter/exit_md.
+ *
+ * 2. If this is a user thread, and the fpu save size is large
+ * enough, allocate an extra block of memory for it.
+ *
+ * 3. Otherwise, this is a user thread and the fpu save size
+ * fits inside the pcb page, so use that.
+ *
+ * XXX Note that this is currently amd64-only -- if you extend
+ * this FPU save space allocation to i386, make sure to update
+ * fpu_lwp_fork so it doesn't clobber the pcb_savefpu pointer.
+ * We panic on i386 boot if the CPU needs more space in
+ * fpuinit_mxcsr_mask.
+ */
+ pcb =3D (void *)base;
+ if (system) { /* (1) */
+ pcb->pcb_savefpu =3D NULL;
+ } else if (x86_fpu_save_separate_p()) { /* (2) */
+ __CTASSERT(PAGE_SIZE >=3D 64);
+ /* No need to zero -- caller will initialize. */
+ va =3D uvm_km_alloc(kernel_map, x86_fpu_save_size, PAGE_SIZE,
+ UVM_KMF_WIRED|UVM_KMF_WAITVA);
+ pcb->pcb_savefpu =3D (void *)va;
+ } else { /* (3) */
+ pcb->pcb_savefpu =3D &pcb->pcb_savefpusmall;
+ }
+
/* Page[1] =3D RedZone */
va =3D base + PAGE_SIZE;
if (!pmap_extract(pmap_kernel(), va, &pa)) {
@@ -394,8 +426,20 @@ cpu_uarea_alloc(bool system)
bool
cpu_uarea_free(void *addr)
{
+ const struct pcb *const pcb =3D addr;
vaddr_t base =3D (vaddr_t)addr;
=20
+ /*
+ * If we allocated a separate FPU save area, free it.
+ */
+ if (pcb->pcb_savefpu !=3D NULL &&
+ pcb->pcb_savefpu !=3D &pcb->pcb_savefpusmall) {
+ KASSERTMSG(x86_fpu_save_separate_p(), "pcb=3D%p pcb_savefpu=3D%p",
+ pcb, pcb->pcb_savefpu);
+ uvm_km_free(kernel_map, (vaddr_t)pcb->pcb_savefpu,
+ x86_fpu_save_size, UVM_KMF_WIRED);
+ }
+
KASSERT(!pmap_extract(pmap_kernel(), base + PAGE_SIZE, NULL));
KASSERT(!pmap_extract(pmap_kernel(), base + USPACE, NULL));
uvm_km_free(kernel_map, base, USPACE + PAGE_SIZE, UVM_KMF_WIRED);
--=_9iRVuQui7SP0XI0my0i42KB+RbbJU+ZS--
From: Harold Gutch <logix@foobar.franken.de>
To: Taylor R Campbell <riastradh@NetBSD.org>
Cc: gnats-bugs@netbsd.org, port-amd64-maintainer@netbsd.org,
gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, manu@NetBSD.org,
andrew.cagney@gmail.com
Subject: Re: port-amd64/57661: Crash when booting on Xeon Silver 4416+ in KVM/Qemu
Date: Sat, 12 Apr 2025 10:48:53 +0200
On Fri, Apr 11, 2025 at 09:09:44PM +0000, Taylor R Campbell wrote:
> Thanks, can you try with this patch?
Looks good with this one too. I haven't done extensive testing, but I
reach multiuser and nothing seems out of the ordinary.
Harold
From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/57661 CVS commit: src/sys/arch/amd64/include
Date: Thu, 24 Apr 2025 01:49:30 +0000
Module Name: src
Committed By: riastradh
Date: Thu Apr 24 01:49:30 UTC 2025
Modified Files:
src/sys/arch/amd64/include: param.h
Log Message:
amd64/param.h: Make UPAGES definition clearer.
Break it down into subaccounts that are summed at the end to make it
clear how much each part is using, and how many pages are actually
reserved for the stack.
No functional change intended.
PR port-amd64/57661: Crash when booting on Xeon Silver 4416+ in
KVM/Qemu
To generate a diff of this commit:
cvs rdiff -u -r1.38 -r1.39 src/sys/arch/amd64/include/param.h
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/57661 CVS commit: src/sys/arch
Date: Thu, 24 Apr 2025 01:50:40 +0000
Module Name: src
Committed By: riastradh
Date: Thu Apr 24 01:50:39 UTC 2025
Modified Files:
src/sys/arch/amd64/amd64: genassym.cf
src/sys/arch/amd64/include: pcb.h
src/sys/arch/i386/i386: genassym.cf
src/sys/arch/i386/include: pcb.h
src/sys/arch/x86/include: cpu.h cpu_extended_state.h
src/sys/arch/x86/x86: fpu.c vm_machdep.c
Log Message:
amd64: Allocate FPU save state outside pcb if it's too large.
We have seen x86_fpu_save_size values (CPUID[EAX=0x0d, ECX=0].ECX) as
large as 11008 bytes, notably with Intel AMX TILEDATA's 8192-byte
state.
We only do this for user threads, and only on machines where it's
necessary, to avoid incurring much overhead. There is still a tiny
bit of overhead when saving and restoring the FPU state by using a
pointer indirection instead of arithmetic indirection for access to
struct pcb::pcb_savefpu, but this is probably a drop in the bucket
compared to the memory traffic incurred by the FPU state save/restore
anyway.
For now, these paths are mostly disabled on i386. We could enable
them but it will require either rewriting cpu_uarea_alloc/free for
i386, or adopting a guard page like amd64 does, which might be costly
and so should be undertaken only with some thought and care. And
since Intel AMX instructions only work in 64-bit mode, it's not
likely to be useful on i386.
PR port-amd64/57661: Crash when booting on Xeon Silver 4416+ in
KVM/Qemu
These changes, as a side effect, may fix:
PR kern/57258: kthread_fpu_enter/exit problem
by making sure to allocate an FPU save space that is large enough to
guarantee fpu_kern_enter/leave work safely, instead of just using a
union savefpu object on the stack (which, at 576 bytes, may be too
small on some machines, particularly with AVX512 requiring ~2.5K).
(But we'll have to do some extra work with kthread_fpu_enter/exit_md
-- if we try doing them again on x86 -- to actually allocate the
separate pcb on these machines!)
To generate a diff of this commit:
cvs rdiff -u -r1.98 -r1.99 src/sys/arch/amd64/amd64/genassym.cf
cvs rdiff -u -r1.32 -r1.33 src/sys/arch/amd64/include/pcb.h
cvs rdiff -u -r1.136 -r1.137 src/sys/arch/i386/i386/genassym.cf
cvs rdiff -u -r1.59 -r1.60 src/sys/arch/i386/include/pcb.h
cvs rdiff -u -r1.139 -r1.140 src/sys/arch/x86/include/cpu.h
cvs rdiff -u -r1.18 -r1.19 src/sys/arch/x86/include/cpu_extended_state.h
cvs rdiff -u -r1.89 -r1.90 src/sys/arch/x86/x86/fpu.c
cvs rdiff -u -r1.46 -r1.47 src/sys/arch/x86/x86/vm_machdep.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/57661 CVS commit: src/sys/arch/x86/include
Date: Thu, 24 Apr 2025 01:51:08 +0000
Module Name: src
Committed By: riastradh
Date: Thu Apr 24 01:51:08 UTC 2025
Modified Files:
src/sys/arch/x86/include: specialreg.h
Log Message:
x86: Add some more XCR0 bits and references.
PR port-amd64/57661: Crash when booting on Xeon Silver 4416+ in
KVM/Qemu
To generate a diff of this commit:
cvs rdiff -u -r1.216 -r1.217 src/sys/arch/x86/include/specialreg.h
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/57661 CVS commit: src/sys/arch/x86/x86
Date: Thu, 24 Apr 2025 01:51:21 +0000
Module Name: src
Committed By: riastradh
Date: Thu Apr 24 01:51:21 UTC 2025
Modified Files:
src/sys/arch/x86/x86: identcpu.c
Log Message:
x86: Reserve space for only the extended CPU state we will use.
CPUID[EAX=0x0d, ECX=0].ECX, i.e., the value of descs[2] after
x86_cpuid2(0x0d, 0, descs), gives the size in bytes of the extended CPU
state for all features supported by the hardware in CPUID[EAX=0x0d,
ECX=0].EAX which can be enabled in XCR0. However, on i386, it is
senseless to leave TILECFG and TILEDATA enabled, because they are state
for Intel AMX instructions which work only in 64-bit mode.
So, instead of querying the hardware's supported features and maximum
_supported_ extended CPU state size:
1. Query the hardware's supported features.
2. Enable only those supported by software as well (XCR0_FPU).
3. Query the hardware's maximum _enabled_ extended CPU state size.
We will also disable TILECFG and TILEDATA on amd64 for now too
because:
(a) This is not a regression, at least for TILEDATA (and I'm not sure
any machines ship with TILECFG but not TILEDATA), because the
size overflowed the PCB page and therefore never worked on amd64
(PR port-amd64/57661: Crash when booting on Xeon Silver 4416+ in
KVM/Qemu).
(b) We need a little extra work to properly support reading and
writing a process's TILECFG and TILEDATA in ptrace(2), and that
work hasn't been done yet.
While here, write out x86_cpuid2(0x0d, <ecx>, ...) explicitly, rather
than x86_cpuid(0x0d, ...), to make it clear that ECX must be set --
otherwise we may get garbage. (It is, perhaps, an accident that
x86_cpuid(<eax>, ...) always sets ECX=0, but other CPUID access
paths, like gcc's <cpuid.h> __cpuid(<eax>, ...), do not, so let's
make it clear for the reader.)
XXX When we enable TILECFG and TILEDATA in amd64, we should arrange
to disable them in compat32 processes -- no sense in allocating extra
space for state they can't use anyway, since the Intel AMX
instructions work only in 64-bit mode. However, selectively
disabling them in some contexts might require hardware support for
XFD, Extended Feature Disable, which is another kettle of fish to
deal with.
PR port-amd64/57661: Crash when booting on Xeon Silver 4416+ in
KVM/Qemu
To generate a diff of this commit:
cvs rdiff -u -r1.134 -r1.135 src/sys/arch/x86/x86/identcpu.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/57661 CVS commit: src/sys/arch/x86/include
Date: Thu, 24 Apr 2025 01:51:43 +0000
Module Name: src
Committed By: riastradh
Date: Thu Apr 24 01:51:43 UTC 2025
Modified Files:
src/sys/arch/x86/include: specialreg.h
Log Message:
amd64: Enable TILECFG and TILEDATA registers.
This allows processes to use the registers, and NetBSD will save and
restore them in context switches. But it does not expose them to
ptrace(2) or debuggers like all the other extended CPU state
(xmm/ymm/zmm) -- that will require more work.
PR kern/57661: Crash when booting on Xeon Silver 4416+ in KVM/Qemu
PR port-amd64/59299: Support Intel AMX CPU state (TILECFG/TILEDATA)
To generate a diff of this commit:
cvs rdiff -u -r1.217 -r1.218 src/sys/arch/x86/include/specialreg.h
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/57661 CVS commit: src/sys/arch/amd64/include
Date: Sun, 27 Apr 2025 01:32:09 +0000
Module Name: src
Committed By: riastradh
Date: Sun Apr 27 01:32:09 UTC 2025
Modified Files:
src/sys/arch/amd64/include: param.h
Log Message:
amd64/param.h: Fix KASAN/KMSAN build (and ALL build by extension).
Make UPAGES match what it was before my recent changes, as verified
by the __CTASSERT below, justifying the existence of the __CTASSERT.
As I recall, SVS is incompatible with KASAN/KMSAN, so it doesn't
contribute to the sum; presumably KASAN/KMSAN requires three pages,
though I'm not sure where this is documented. If it turns out this
accounting is wrong, we should fix it and cross-reference any relevant
constraints affecting the accounting. (But for now I'm just making
sure to restore the status quo of these definitions, which was my
intent all along with adding the __CTASSERT.)
Issue noted by hannken@.
PR port-amd64/57661: Crash when booting on Xeon Silver 4416+ in
KVM/Qemu
To generate a diff of this commit:
cvs rdiff -u -r1.41 -r1.42 src/sys/arch/amd64/include/param.h
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: Emmanuel Dreyfus <manu@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: port-amd64-maintainer@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org, logix@foobar.franken.de,
riastradh@NetBSD.org, manu@NetBSD.org, andrew.cagney@gmail.com
Subject: Re: PR/57661 CVS commit: src/sys/arch/amd64/include
Date: Mon, 28 Apr 2025 11:56:42 +0000
On Sun, Apr 27, 2025 at 01:35:01AM +0000, Taylor R Campbell wrote:
> Th following reply was made to PR port-amd64/57661; it has been noted by GNATS.
I tested -current after this batch of changes, the bug is fixed.
cpu0: highest basic info 00000020
cpu0: highest extended info 80000008
cpu0: "Intel(R) Xeon(R) Silver 4410Y"
cpu0: Intel 4th gen Xeon Scalable (Sapphire Rapids) (686-class), 2000.00 MHz
cpu0: CPU base freq 2000000000 Hz
cpu0: CPU max freq 3900000000 Hz
cpu0: TSC freq CPUID 2000000000 Hz
cpu0: family 0x6 model 0x8f stepping 0x8 (id 0x806f8)
cpu0: WARNING: Currently this info can't print correctly (level = 1, pgsize = 6)
cpu0: WARNING: Currently this info can't print correctly (level = 1, pgsize = 15)
cpu0: WARNING: Currently this info can't print correctly (level = 1, pgsize = 6)
cpu0: WARNING: Currently this info can't print correctly (level = 2, pgsize = 7)
cpu0: WARNING: Currently this info can't print correctly (level = 2, pgsize = 9)
cpu0: features 0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE>
cpu0: features 0xbfebfbff<MCA,CMOV,PAT,PSE36,CLFSH,DS,ACPI,MMX,FXSR,SSE,SSE2>
cpu0: features 0xbfebfbff<SS,HTT,TM,PBE>
cpu0: features1 0x7ffefbff<SSE3,PCLMULQDQ,DTES64,MONITOR,DS-CPL,VMX,SMX,EST>
cpu0: features1 0x7ffefbff<TM2,SSSE3,SDBG,FMA,CX16,xTPR,PDCM,PCID,DCA,SSE41>
cpu0: features1 0x7ffefbff<SSE42,X2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE>
cpu0: features1 0x7ffefbff<OSXSAVE,AVX,F16C,RDRAND>
cpu0: features2 0x2c100800<SYSCALL/SYSRET,XD,P1GB,RDTSCP,EM64T>
cpu0: features3 0x121<LAHF,LZCNT,PREFETCHW>
cpu0: features5 0xf3bfbfff<FSGSBASE,TSCADJUST,SGX,BMI1,HLE,AVX2,FDPEXONLY,SMEP>
cpu0: features5 0xf3bfbfff<BMI2,ERMS,INVPCID,RTM,QM,FPUCSDS,PQE,AVX512F>
cpu0: features5 0xf3bfbfff<AVX512DQ,RDSEED,ADX,SMAP,AVX512_IFMA,CLFLUSHOPT>
cpu0: features5 0xf3bfbfff<CLWB,PT,AVX512CD,SHA,AVX512BW,AVX512VL>
cpu0: features6 0xfb417fee<AVX512_VBMI,UMIP,PKU,WAITPKG,AVX512_VBMI2,CET_SS>
cpu0: features6 0xfb417fee<GFNI,VAES,VPCLMULQDQ,AVX512_VNNI,AVX512_BITALG>
cpu0: features6 0xfb417fee<TME_EN,AVX512_VPOPCNTDQ,LA57,MAWAU=0,RDPID>
cpu0: features6 0xfb417fee<BUS_LOCK_DETECT,CLDEMOTE,MOVDIRI,MOVDIR64B,ENQCMD>
cpu0: features6 0xfb417fee<SGXLC,PKS>
cpu0: features7 0xffdd4432<SGX_KEYS,FSRM,UINTR,MD_CLEAR,SERIALIZE,TSXLDTRK>
cpu0: features7 0xffdd4432<PCONFIG,ARCH_LBR,CET_IBT,AMX_BF16,AVX512_FP16>
cpu0: features7 0xffdd4432<AMX_TILE,AMX_INT8,IBRS,STIBP,L1D_FLUSH,ARCH_CAP>
cpu0: features7 0xffdd4432<CORE_CAP,SSBD>
cpu0: xsave features 0x602e7<x87,SSE,AVX,Opmask,ZMM_Hi256,Hi16_ZMM,PKRU>
cpu0: xsave instructions 0x1f<XSAVEOPT,XSAVEC,XGETBV,XSAVES,XFD>
cpu0: xsave area size: current 11008, maximum 11008, xgetbv enabled
cpu0: enabled xsave 0x600e7<x87,SSE,AVX,Opmask,ZMM_Hi256,Hi16_ZMM>
cpu0: I-cache: 32KB 64B/line 8-way, D-cache: 48KB 64B/line 12-way
cpu0: L2 cache: 2MB 64B/line 16-way
cpu0: L3 cache: 30MB 64B/line 15-way
cpu0: 64B prefetching
cpu0: ITLB: 256 4KB entries 8-way, 32 4MB entries 8-way
cpu0: Load only TLB: 8 1GB entries fully associative
cpu0: Store only TLB: 16 1GB entries fully associative
cpu0: L2 STLB: 1024 1GB entries 8-way
cpu0: Initial APIC ID 0
cpu0: Cluster/Package ID 0
cpu0: Core ID 0
cpu0: SMT ID 0
cpu0: MONITOR/MWAIT extensions 0x3<EMX,IBE>
cpu0: monitor-line size 64
cpu0: C1 substates 2
cpu0: C3 substates 1
cpu0: DSPM-eax 0x80077<DTS,IDA,ARAT,PLN,ECMD,PTM,HFI>
cpu0: DSPM-ecx 0x9<HWF,EPB,NTDC=0>
cpu0: SEF highest subleaf 00000002
cpu0: SEF-subleaf1-eax 0x1c30<AVXVNNI,AVX512_BF16,FZLRMS,FSRSB,FSRCS>
cpu0: SEF-subleaf1-edx 0x40000<CET_SSS>
cpu0: SEF-subleaf2-edx 0x17<PSFD,IPRED_CTRL,RRSBA_CTRL,BHI_CTRL>
cpu0: Power Management features: 0x100<ITSC>
cpu0: AMD Extended features 0x200<WBNOINVD>
cpu0: Perfmon: Ver. 5
cpu0: Perfmon: General: bitwidth 48, 8 counters
cpu0: Perfmon: General: avail 0xff<CORECYCL,INST,REFCYCL,LLCREF,LLCMISS,BRINST>
cpu0: Perfmon: General: avail 0xff<BRMISPR,TOPDOWNSLOT>
cpu0: Perfmon: Fixed: bitwidth 48, 4 counters
cpu0: Perfmon: Fixed: avail 0xf<INST,CLK_CORETHREAD,CLK_REF_TSC,TOPDOWNSLOT>
cpu0: microcode version 0x2b0005d2, platform ID 0
--
Emmanuel Dreyfus
manu@netbsd.org
From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/57661 CVS commit: src/sys/arch
Date: Mon, 28 Apr 2025 13:01:28 +0000
Module Name: src
Committed By: riastradh
Date: Mon Apr 28 13:01:28 UTC 2025
Modified Files:
src/sys/arch/amd64/include: pcb.h
src/sys/arch/x86/include: specialreg.h
src/sys/arch/x86/x86: fpu.c
Log Message:
xen: Stop-gap FPU PCB fix; disable Intel AMX for now.
Since the custom cpu_uarea_alloc/free are disabled under XENPV,
nothing would initialize struct pcb::pcb_savefpu to point either to
struct pcb::pcb_savefpusmall, or to a separately allocated large area
on machines with Intel AMX TILECFG/TILEDATA requiring it. So the
memset in fpu_lwp_fork would crash on null pointer dereference:
[ 1.0000030] uvm_fault(0xffffffff8094a300, 0x0, 2) -> e
[ 1.0000030] fatal page fault in supervisor mode
[ 1.0000030] trap type 6 code 0x2 rip 0xffffffff8062795c cs 0xe030 rflags 0x10202 cr2 0 ilevel 0 rsp 0xffffffff80adad38
[ 1.0000030] curlwp 0xffffffff8078f880 pid 0.0 lowest kstack 0xffffffff80ad62c0
kernel: page fault trap, code=0
Stopped in pid 0.0 (system) at netbsd:memset+0x2c: repe stosq %es:(%rdi)
memset() at netbsd:memset+0x2c
lwp_create() at netbsd:lwp_create+0x2f1
fork1() at netbsd:fork1+0x42c
main() at netbsd:main+0x44f
In order to support Intel AMX TILECFG/TILEDATA, or any other CPU
extensions that increase the XSAVE area beyond what fits in a single
page after struct pcb, we would need to enable the the custom
cpu_uarea_alloc/free. Currently that would imply allocating stack
guard pages (`redzone') under XENPV; if there's some reason the stack
guard pages don't work, we could also push #ifdef XENPV conditionals
into cpu_uarea_alloc/free to cover the guard pages -- to be
considered.
PR kern/59371: Xen domU uvm_fault since FPU state allocation patch
PR port-amd64/57661: Crash when booting on Xeon Silver 4416+ in
KVM/Qemu
To generate a diff of this commit:
cvs rdiff -u -r1.34 -r1.35 src/sys/arch/amd64/include/pcb.h
cvs rdiff -u -r1.218 -r1.219 src/sys/arch/x86/include/specialreg.h
cvs rdiff -u -r1.90 -r1.91 src/sys/arch/x86/x86/fpu.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: open->needs-pullups
State-Changed-By: riastradh@NetBSD.org
State-Changed-When: Wed, 14 May 2025 01:13:07 +0000
State-Changed-Why:
fixed in HEAD, needs pullup-9 and pullup-10
There are several changes here, with different levels of risk.
x86/identcpu rev. 1.135 (x86: Reserve space for only the extended CPU
state we will use.) is probably the lowest risk to pull up, and will
likely fix this symptom on its own. But it's not enough to fix the
problem I hypothesize underlies PR kern/57258: kthread_fpu_enter/exit
problem <https://gnats.NetBSD.org/57258>, which is that on machines
with AVX-512 -- or any >576-byte xsave area size -- the logic in
fpu_kern_enter/leave uses a static 576-byte union savefpu buffer to
initialize a >576-byte xsave area.
That problem is fixed by x86/fpu.c rev. 1.90, among a collection of
other higher-risk changes (amd64: Allocate FPU save state outside pcb
if it's too large.). So, we might have to split that change up (and
test it) to make a lower-risk fix for the fpu_kern_enter/leave issue,
say for pullup-9 -- just the part to allocate safe_fpu and zero_fpu in
fpu.c separately if too large, not the parts to support
TILECFG/TILEDATA by separately allocating an fpu save area for each pcb
on affected machines.
From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/57661 CVS commit: src/sys/arch/x86/x86
Date: Wed, 14 May 2025 23:39:01 +0000
Module Name: src
Committed By: riastradh
Date: Wed May 14 23:39:01 UTC 2025
Modified Files:
src/sys/arch/x86/x86: fpu.c
Log Message:
x86/fpu.c: Revert unnecessary part of rev. 1.90.
Most of the changes including x86/fpu.c rev. 1.90 served to support a
separately allocated XSAVE area in each thread's pcb. On machines
with large XSAVE areas, the fpu.c changes also:
1. allocated a separate area for fpuinit_mxcsr_mask, and
2. allocated a separate area for the safe/zero states of
fpu_kern_enter/leave.
But (1) is unnecessary because we uncondiitonally use FXSAVE for
that, for which union savefpu is enough -- actually, struct fxsave
(exactly 512 bytes) is enough. To be done in a separate change.
PR port-amd64/57661: Crash when booting on Xeon Silver 4416+ in
KVM/Qemu
To generate a diff of this commit:
cvs rdiff -u -r1.91 -r1.92 src/sys/arch/x86/x86/fpu.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/57661 CVS commit: src/sys/arch/x86/x86
Date: Wed, 14 May 2025 23:39:54 +0000
Module Name: src
Committed By: riastradh
Date: Wed May 14 23:39:54 UTC 2025
Modified Files:
src/sys/arch/x86/x86: fpu.c
Log Message:
x86/fpu.c: Reduce needless stack usage of fpuinit_mxcsr_mask.
This is using the FXSAVE instruction explicitly, which is defined to
operate on a 512-byte area, unconditionally, so let's just use the
struct fxsave type we've defined for that instead of the larger union
savefpu.
XXX The fxsave/fxrstor functions should really be typed, but that
requires moving includes or definitions around between cpufunc.h and
cpu_extende_state.h which might have userland API implications that
I'd rather not think about right now.
PR port-amd64/57661: Crash when booting on Xeon Silver 4416+ in
KVM/Qemu
To generate a diff of this commit:
cvs rdiff -u -r1.92 -r1.93 src/sys/arch/x86/x86/fpu.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: Emmanuel Dreyfus <manu@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: port-amd64-maintainer@netbsd.org, netbsd-bugs@netbsd.org,
gnats-admin@netbsd.org, riastradh@NetBSD.org,
logix@foobar.franken.de, manu@NetBSD.org, andrew.cagney@gmail.com
Subject: Re: port-amd64/57661 (Crash when booting on Xeon Silver 4416+ in
KVM/Qemu)
Date: Thu, 15 May 2025 06:07:30 +0000
On Wed, May 14, 2025 at 01:13:08AM +0000, riastradh@NetBSD.org wrote:
> There are several changes here, with different levels of risk.
FWIW, I have been able to boot with this change set:
https://mail-index.netbsd.org/source-changes/2025/04/24/msg156552.html
https://mail-index.netbsd.org/source-changes/2025/04/24/msg156553.html
https://mail-index.netbsd.org/source-changes/2025/04/24/msg156554.html
https://mail-index.netbsd.org/source-changes/2025/04/24/msg156555.html
https://mail-index.netbsd.org/source-changes/2025/04/24/msg156556.html
https://mail-index.netbsd.org/source-changes/2025/04/28/msg156672.html
--
Emmanuel Dreyfus
manu@netbsd.org
State-Changed-From-To: needs-pullups->pending-pullups
State-Changed-By: riastradh@NetBSD.org
State-Changed-When: Thu, 15 May 2025 12:45:55 +0000
State-Changed-Why:
Fixed in HEAD, and I have requested the following pullups for minimal
parts of the fix:
pullup-9 #1954 https://releng.netbsd.org/cgi-bin/req-9.cgi?show=1954
=> Just the change to x86/identcpu.c that limits XCR0 to the NetBSD-
supported features before we query CPUID for the state size.
pullup-10 #1118 https://releng.netbsd.org/cgi-bin/req-10.cgi?show=1118
pullup-10 #1119 https://releng.netbsd.org/cgi-bin/req-10.cgi?show=1119
=> x86/identcpu.c change as for netbsd-9, and x86/fpu.c change to
allocate adequate space for safe/zero FPU states if the save size
exceeds 576 bytes (union savefpu)
I think this should be enough to fix the crashes on both branches,
without changing anything about how pcbs are allocated or adding support
for Intel AMX TILECFG/TILEDATA, but it would be good to test too on an
affected machine.
From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/57661 CVS commit: [netbsd-10] src/sys/arch/x86/x86
Date: Thu, 15 May 2025 18:06:57 +0000
Module Name: src
Committed By: martin
Date: Thu May 15 18:06:57 UTC 2025
Modified Files:
src/sys/arch/x86/x86 [netbsd-10]: identcpu.c
Log Message:
Pull up following revision(s) (requested by riastradh in ticket #1118):
sys/arch/x86/x86/identcpu.c: revision 1.135
x86: Reserve space for only the extended CPU state we will use.
CPUID[EAX=0x0d, ECX=0].ECX, i.e., the value of descs[2] after
x86_cpuid2(0x0d, 0, descs), gives the size in bytes of the extended CPU
state for all features supported by the hardware in CPUID[EAX=0x0d,
ECX=0].EAX which can be enabled in XCR0. However, on i386, it is
senseless to leave TILECFG and TILEDATA enabled, because they are state
for Intel AMX instructions which work only in 64-bit mode.
So, instead of querying the hardware's supported features and maximum
_supported_ extended CPU state size:
1. Query the hardware's supported features.
2. Enable only those supported by software as well (XCR0_FPU).
3. Query the hardware's maximum _enabled_ extended CPU state size.
We will also disable TILECFG and TILEDATA on amd64 for now too
because:
(a) This is not a regression, at least for TILEDATA (and I'm not sure
any machines ship with TILECFG but not TILEDATA), because the
size overflowed the PCB page and therefore never worked on amd64
(PR port-amd64/57661: Crash when booting on Xeon Silver 4416+ in
KVM/Qemu).
(b) We need a little extra work to properly support reading and
writing a process's TILECFG and TILEDATA in ptrace(2), and that
work hasn't been done yet.
While here, write out x86_cpuid2(0x0d, <ecx>, ...) explicitly, rather
than x86_cpuid(0x0d, ...), to make it clear that ECX must be set --
otherwise we may get garbage. (It is, perhaps, an accident that
x86_cpuid(<eax>, ...) always sets ECX=0, but other CPUID access
paths, like gcc's <cpuid.h> __cpuid(<eax>, ...), do not, so let's
make it clear for the reader.)
XXX When we enable TILECFG and TILEDATA in amd64, we should arrange
to disable them in compat32 processes -- no sense in allocating extra
space for state they can't use anyway, since the Intel AMX
instructions work only in 64-bit mode. However, selectively
disabling them in some contexts might require hardware support for
XFD, Extended Feature Disable, which is another kettle of fish to
deal with.
PR port-amd64/57661: Crash when booting on Xeon Silver 4416+ in
KVM/Qemu
To generate a diff of this commit:
cvs rdiff -u -r1.123.4.3 -r1.123.4.4 src/sys/arch/x86/x86/identcpu.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/57661 CVS commit: [netbsd-9] src/sys/arch/x86/x86
Date: Thu, 15 May 2025 18:09:45 +0000
Module Name: src
Committed By: martin
Date: Thu May 15 18:09:45 UTC 2025
Modified Files:
src/sys/arch/x86/x86 [netbsd-9]: identcpu.c
Log Message:
Pull up following revision(s) (requested by riastradh in ticket #1954):
sys/arch/x86/x86/identcpu.c: revision 1.135 (via patch)
x86: Reserve space for only the extended CPU state we will use.
CPUID[EAX=0x0d, ECX=0].ECX, i.e., the value of descs[2] after
x86_cpuid2(0x0d, 0, descs), gives the size in bytes of the extended CPU
state for all features supported by the hardware in CPUID[EAX=0x0d,
ECX=0].EAX which can be enabled in XCR0. However, on i386, it is
senseless to leave TILECFG and TILEDATA enabled, because they are state
for Intel AMX instructions which work only in 64-bit mode.
So, instead of querying the hardware's supported features and maximum
_supported_ extended CPU state size:
1. Query the hardware's supported features.
2. Enable only those supported by software as well (XCR0_FPU).
3. Query the hardware's maximum _enabled_ extended CPU state size.
We will also disable TILECFG and TILEDATA on amd64 for now too
because:
(a) This is not a regression, at least for TILEDATA (and I'm not sure
any machines ship with TILECFG but not TILEDATA), because the
size overflowed the PCB page and therefore never worked on amd64
(PR port-amd64/57661: Crash when booting on Xeon Silver 4416+ in
KVM/Qemu).
(b) We need a little extra work to properly support reading and
writing a process's TILECFG and TILEDATA in ptrace(2), and that
work hasn't been done yet.
While here, write out x86_cpuid2(0x0d, <ecx>, ...) explicitly, rather
than x86_cpuid(0x0d, ...), to make it clear that ECX must be set --
otherwise we may get garbage. (It is, perhaps, an accident that
x86_cpuid(<eax>, ...) always sets ECX=0, but other CPUID access
paths, like gcc's <cpuid.h> __cpuid(<eax>, ...), do not, so let's
make it clear for the reader.)
XXX When we enable TILECFG and TILEDATA in amd64, we should arrange
to disable them in compat32 processes -- no sense in allocating extra
space for state they can't use anyway, since the Intel AMX
instructions work only in 64-bit mode. However, selectively
disabling them in some contexts might require hardware support for
XFD, Extended Feature Disable, which is another kettle of fish to
deal with.
PR port-amd64/57661: Crash when booting on Xeon Silver 4416+ in
KVM/Qemu
To generate a diff of this commit:
cvs rdiff -u -r1.93.2.7 -r1.93.2.8 src/sys/arch/x86/x86/identcpu.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: pending-pullups->closed
State-Changed-By: riastradh@NetBSD.org
State-Changed-When: Mon, 02 Feb 2026 03:44:26 +0000
State-Changed-Why:
pullups applied to all affected branches
From: "Thomas Klausner" <wiz@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/57661 CVS commit: pkgsrc/lang/ghc910
Date: Mon, 2 Feb 2026 21:28:58 +0000
Module Name: pkgsrc
Committed By: wiz
Date: Mon Feb 2 21:28:58 UTC 2026
Modified Files:
pkgsrc/lang/ghc910: Makefile distinfo
pkgsrc/lang/ghc910/patches: patch-libraries_text_cbits_measure__off.c
Log Message:
ghc910: remove AVX workaround for NetBSD 11+
Problem has been fixed as part of PR 57661.
Bump PKGREVISION.
To generate a diff of this commit:
cvs rdiff -u -r1.7 -r1.8 pkgsrc/lang/ghc910/Makefile
cvs rdiff -u -r1.13 -r1.14 pkgsrc/lang/ghc910/distinfo
cvs rdiff -u -r1.1 -r1.2 \
pkgsrc/lang/ghc910/patches/patch-libraries_text_cbits_measure__off.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Thomas Klausner" <wiz@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/57661 CVS commit: pkgsrc/lang/ghc98
Date: Mon, 2 Feb 2026 22:52:49 +0000
Module Name: pkgsrc
Committed By: wiz
Date: Mon Feb 2 22:52:49 UTC 2026
Modified Files:
pkgsrc/lang/ghc98: Makefile distinfo
pkgsrc/lang/ghc98/patches: patch-libraries_text_cbits_measure__off.c
Log Message:
ghc98: remove AVX workaround for NetBSD 11+
Problem has been fixed as part of PR 57661.
Bump PKGREVISION.
To generate a diff of this commit:
cvs rdiff -u -r1.12 -r1.13 pkgsrc/lang/ghc98/Makefile
cvs rdiff -u -r1.18 -r1.19 pkgsrc/lang/ghc98/distinfo
cvs rdiff -u -r1.1 -r1.2 \
pkgsrc/lang/ghc98/patches/patch-libraries_text_cbits_measure__off.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
>Unformatted:
(Contact us)
$NetBSD: query-full-pr,v 1.47 2022/09/11 19:34:41 kim Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2026
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.