NetBSD Problem Report #58410
From www@netbsd.org Wed Jul 10 05:18:46 2024
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256
client-signature RSA-PSS (2048 bits) client-digest SHA256)
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id 0ADDC1A9238
for <gnats-bugs@gnats.NetBSD.org>; Wed, 10 Jul 2024 05:18:46 +0000 (UTC)
Message-Id: <20240710051844.A94CC1A9239@mollari.NetBSD.org>
Date: Wed, 10 Jul 2024 05:18:44 +0000 (UTC)
From: prlw1@cam.ac.uk
Reply-To: prlw1@cam.ac.uk
To: gnats-bugs@NetBSD.org
Subject: x86_patch() panic with core ultra processor
X-Send-Pr-Version: www-1.0
>Number: 58410
>Category: port-amd64
>Synopsis: x86_patch() panic with core ultra processor
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: port-amd64-maintainer
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Wed Jul 10 05:20:00 +0000 2024
>Last-Modified: Tue Aug 06 08:05:01 +0000 2024
>Originator: Patrick Welche
>Release: 10.99.11/amd64 2024 Jul 8
>Organization:
>Environment:
>Description:
Booting a netbsd-INSTALL kernel on a laptop with an intel 155H panics with:
panic: kernel diagnostic assertion "rcr4() & CR4_SMAP" failed: file "/usr/src/sys/arch/x86/x86/patch.c", line 356
trap type 1 code 0 rip 0xffffffff8023415 cs 0x8 rflags 0x202 cr2 0 ilevel 0 rsp 0xffffffff81cd8dc0
https://nxr.netbsd.org/xref/src/sys/arch/x86/x86/patch.c#356
(How do you make use of kern-INSTALL.symbols?)
That's all for now (db_read_int: cannot find `msgbufenabled')
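For reference, the assertion tests the SMAP enable bit in CR4 on the CPU doing the patching. Below is a minimal user-space sketch of that condition. The helper names are hypothetical and this is not the code in patch.c; the bit positions are the architectural ones (CR4 bit 21, CPUID leaf 7 EBX bit 20), redefined locally so the example stands alone.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural bit positions, redefined here for the sketch. */
#define CR4_SMAP	(UINT64_C(1) << 21)	/* CR4.SMAP */
#define CPUID_SEF_SMAP	(UINT32_C(1) << 20)	/* CPUID.(EAX=7,ECX=0):EBX.SMAP */

/* Hypothetical helper: true if this CR4 value has SMAP enabled. */
bool
cr4_has_smap(uint64_t cr4)
{
	return (cr4 & CR4_SMAP) != 0;
}

/*
 * Hypothetical helper: if the CPU advertises SMAP, the patching step
 * expects CR4.SMAP to be set already on the CPU running it.  If SMAP
 * is not advertised there is nothing to check.
 */
bool
smap_patch_is_safe(uint32_t sef_ebx, uint64_t cr4)
{
	if ((sef_ebx & CPUID_SEF_SMAP) == 0)
		return true;
	return cr4_has_smap(cr4);
}
```

Calling smap_patch_is_safe() with a feature word that advertises SMAP and a CR4 value of 0 returns false, which corresponds to the failed assertion.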
>How-To-Repeat:
>Fix:
>Audit-Trail:
From: matthew green <mrg@eterna23.net>
To: gnats-bugs@netbsd.org
Cc: port-amd64-maintainer@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org
Subject: re: port-amd64/58410: x86_patch() panic with core ultra processor
Date: Wed, 10 Jul 2024 15:30:56 +1000
how did you get a core ultra! i've been trying to buy one since
i heard about them at the start of the year ;)
> panic: kernel diagnostic assertion "rcr4() & CR4_SMAP" failed: file "/usr/src/sys/arch/x86/x86/patch.c", line 356
> trap type 1 code 0 rip 0xffffffff8023415 cs 0x8 rflags 0x202 cr2 0 ilevel 0 rsp 0xffffffff81cd8dc0
can you poke around the bios menus and look for something about
this? it might be called something protection -- it's basically
a feature where user mappings aren't visible to kernel mappings
by default, kinda like real CPUs that have entirely separate
address spaces for kernel and user spaces, and require special
setup to read/write from user addresses.
.mrg.
From: Patrick Welche <prlw1@welche.eu>
To: gnats-bugs@netbsd.org
Cc: matthew green <mrg@eterna23.net>
Subject: Re: port-amd64/58410: x86_patch() panic with core ultra processor
Date: Wed, 10 Jul 2024 09:38:44 +0100
Firing at will I disabled:
- SMM Security Mitigation ("However, this feature may cause compatibility
issues or loss of functionality with some legacy tools and applications")
- OS Kernel DMA protection
and still see the panic. (Don't see anything else to turn off)
Now with a GENERICDBG kernel I can see dmesg in ddb:
ioapic0 at mainbus0 apid 2: pa 0xfec00000, version 0x20, 120 pins
x2APIC disabled by user and enabled by BIOS; ignoring user setting.
bogus MADT X2APIC entry (id = 0x10 11 18 19 20 21 28...
On the laptop I am writing this on, I see
ioapic0 at mainbus0 apid 2: pa 0xfec00000, version 0x20, 120 pins
cpu0 at mainbus0 apid 0
which means the cpu0 etc are missing, with bogus MADT instead
I also get the panic with "boot -1":
x86_patch() at netbsd:x86_patch+0xfb (arch/x86/x86/patch.c:356)
cpu_boot_secondary_processors() + 0x16
main() + 0x3a2
From: Patrick Welche <prlw1@welche.eu>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: port-amd64/58410: x86_patch() panic with core ultra processor
Date: Wed, 10 Jul 2024 10:08:50 +0100
Switching off all virtualization support in the BIOS, I see
ioapic0 at mainbus0 apid 2
cpu0 at mainbus0 apid 16
and it wedges.
ctrl-alt-esc does nothing - I can't get into ddb
From: =?UTF-8?B?SmFyb23DrXIgRG9sZcSNZWs=?= <jaromir.dolecek@gmail.com>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: port-amd64/58410: x86_patch() panic with core ultra processor
Date: Wed, 10 Jul 2024 22:46:43 +0200
I remember several PRs where an active x2APIC was causing this weird
issue, which later manifested as an issue with SMAP.
Is there any specific setting to disable x2APIC in BIOS?
Jaromir
From: Patrick Welche <prlw1@welche.eu>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: port-amd64/58410: x86_patch() panic with core ultra processor
Date: Wed, 10 Jul 2024 22:27:32 +0100
On Wed, Jul 10, 2024 at 08:50:02PM +0000, Jaromír Doleček wrote:
> I remember several PRs where an active x2APIC was causing this weird
> issue, which later manifested as an issue with SMAP.
> Is there any specific setting to disable x2APIC in BIOS?
No - I already had a good rummage... (odd that disabling virtualization
caused a change to a hang rather than a panic)
From: Patrick Welche <prlw1@welche.eu>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: port-amd64/58410: x86_patch() panic with core ultra processor
Date: Thu, 11 Jul 2024 21:06:41 +0100
I see you were probably referring to port-amd64/54489, but unfortunately
I don't have that option.
Now I no longer see the panic, and cannot enter ddb, so no dmesg. I just
get a hang, interestingly after:
ioapic0 at mainbus0 apid 2: pa 0xfec00000, virtual wire mode, version 0x20, 120 pins
cpu0 at mainbus0 apid 16
cpu0: Use cpuid to serialize rdtsc
*** hang ***
Looking at the code, I would expect a "lfence" rather than a "cpuid"?
From: Patrick Welche <prlw1@welche.eu>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: port-amd64/58410: x86_patch() panic with core ultra processor
Date: Thu, 11 Jul 2024 21:55:48 +0100
On Thu, Jul 11, 2024 at 08:10:02PM +0000, Patrick Welche wrote:
> Looking at the code, I would expect a "lfence" rather than a "cpuid"?
and that is because in tsc_setfunc(struct cpu_info *ci)
ci->ci_feat_val[i] = 0x0 for i in {0,..,7} (X86 CPUID feature bits)
as are the other entries such as ci_vendor
so ci isn't null, but it points to an empty cpu_info
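To illustrate why a zeroed cpu_info would end up with "cpuid": LFENCE is an SSE2 instruction, so a selector that consults the feature words sees no SSE2 and falls back. The sketch below is a cut-down model, not tsc_setfunc() itself. The struct and function names are hypothetical, and it assumes ci_feat_val[0] holds CPUID leaf 1 EDX.

```c
#include <stdint.h>

#define CPUID_SSE2	(UINT32_C(1) << 26)	/* CPUID.1:EDX.SSE2 */

/* Hypothetical, cut-down stand-in for the kernel's struct cpu_info. */
struct cpu_info_sketch {
	uint32_t ci_feat_val[8];
};

/*
 * Pick the instruction used to serialize rdtsc.  Only the SSE2
 * dependency is modelled; any vendor-specific logic is left out.
 */
const char *
rdtsc_serializer(const struct cpu_info_sketch *ci)
{
	if ((ci->ci_feat_val[0] & CPUID_SSE2) != 0)
		return "lfence";
	return "cpuid";
}
```

With all feature words zero this returns "cpuid"; with a first word such as 0xbfebfbff it returns "lfence".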
From: Patrick Welche <prlw1@welche.eu>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: port-amd64/58410: x86_patch() panic with core ultra processor
Date: Fri, 12 Jul 2024 15:42:51 +0100
On Thu, Jul 11, 2024 at 09:00:03PM +0000, Patrick Welche wrote:
> so ci isn't null, but it points to an empty cpu_info
on the other hand
cpu_id: 0x8
cpu_number: 0x10
cpu_role: 0x2
cpu_feature[0] = 0xbfebfbff
cpu_feature[1] = 0x77fafbff
cpu_feature[2] = 0x2c100800
cpu_feature[3] = 0x121
cpu_feature[4] = 0x0
cpu_feature[5] = 0x239c27eb
cpu_feature[6] = 0x994007ac
still not quite sure where the hang is. (All I can do is add printfs)
boot -1 also hangs.
From: Patrick Welche <prlw1@welche.eu>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: port-amd64/58410: x86_patch() panic with core ultra processor
Date: Fri, 12 Jul 2024 16:38:27 +0100
Hangs in tsc_sync_api - not too surprising given the empty cpu_info
From: Patrick Welche <prlw1@welche.eu>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: port-amd64/58410: x86_patch() panic with core ultra processor
Date: Sun, 14 Jul 2024 18:07:43 +0100
FreeBSD successfully boots. It decodes the above
cpu_feature[5] = 0x239c27eb
FSGSBASE,TSCADJ,BMI1,AVX2,FDPEXC,SMEP,BMI2,ERMS,INVPCID,NFPUSG,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,PROCTRACE,SHA
=> SMAP is there, cf original panic
cpu_feature[6] = 0x994007bc (I typoed 7ac above)
UMIP,PKU,OSPKE,WAITPKG,GFNI,VAES,VPCLMULQDQ,RDPID,MOVDIRI,MOVDIR64B
Event timer "LAPIC" quality 600
WARNING: L3 data cache covers more APIC IDs than a package (6 > 3)
FreeBSD/SMP: Multiprocessor System Detected: 22 CPUs
FreeBSD/SMP: Non-uniform topology
...
ioapic0 <Version 2.0> irqs 0-119
Launching APs: 8 9 1 2 21 6 5 16 11 10 4 17 12 3 19 20 14 18 13 15 7
...
hwpstate_intel0: <Intel Speed Shift> on cpu0 ... cpu21
Timecounter "TSC-low" frequency 1497600475 Hz quality 1000
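The decoding of cpu_feature[5] can be reproduced by hand. The sketch below maps CPUID leaf 7 (subleaf 0) EBX bits to the short names FreeBSD prints. The table covers only the bits set in 0x239c27eb, and the function name is made up for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct featbit {
	unsigned bit;
	const char *name;
};

/* Subset of CPUID.(EAX=7,ECX=0):EBX, in ascending bit order. */
static const struct featbit sef_ebx_bits[] = {
	{ 0, "FSGSBASE" }, { 1, "TSCADJ" }, { 3, "BMI1" }, { 5, "AVX2" },
	{ 6, "FDPEXC" }, { 7, "SMEP" }, { 8, "BMI2" }, { 9, "ERMS" },
	{ 10, "INVPCID" }, { 13, "NFPUSG" }, { 18, "RDSEED" }, { 19, "ADX" },
	{ 20, "SMAP" }, { 23, "CLFLUSHOPT" }, { 24, "CLWB" },
	{ 25, "PROCTRACE" }, { 29, "SHA" },
};

/*
 * Write a comma-separated list of the known bits set in ebx into buf.
 * Returns the number of characters written, excluding the NUL.
 */
size_t
sef_ebx_decode(uint32_t ebx, char *buf, size_t len)
{
	size_t used = 0;

	if (len == 0)
		return 0;
	buf[0] = '\0';
	for (size_t i = 0; i < sizeof(sef_ebx_bits) / sizeof(sef_ebx_bits[0]); i++) {
		if ((ebx & (UINT32_C(1) << sef_ebx_bits[i].bit)) == 0)
			continue;
		int n = snprintf(buf + used, len - used, "%s%s",
		    used != 0 ? "," : "", sef_ebx_bits[i].name);
		if (n < 0 || (size_t)n >= len - used) {
			buf[used] = '\0';	/* drop the partial name */
			break;
		}
		used += (size_t)n;
	}
	return used;
}
```

Feeding it 0x239c27eb with a 256-byte buffer yields the same list as above, with SMAP at bit 20.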
From: RVP <rvp@SDF.ORG>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: port-amd64/58410: x86_patch() panic with core ultra processor
Date: Sun, 14 Jul 2024 23:25:43 +0000 (UTC)
On Sun, 14 Jul 2024, Patrick Welche wrote:
> FreeBSD successfully boots. It decodes the above
>
Does disabling x2apic in the kernel (like this, for instance) help:
```
diff -urN a/src/sys/arch/x86/x86/lapic.c b/src/sys/arch/x86/x86/lapic.c
--- a/src/sys/arch/x86/x86/lapic.c 2024-02-26 01:38:14.577331969 +0000
+++ b/src/sys/arch/x86/x86/lapic.c 2024-07-14 23:22:34.869939414 +0000
@@ -315,6 +315,7 @@
"already enabled by BIOS; enabling.\n", reason);
reason = NULL;
}
+ reason = "-- XXX Core Ultra XXX.";
if (reason == NULL)
x2apic_mode = true;
else
```
-RVP
From: Patrick Welche <prlw1@welche.eu>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: port-amd64/58410: x86_patch() panic with core ultra processor
Date: Tue, 16 Jul 2024 09:43:17 +0100
On Sun, Jul 14, 2024 at 11:30:03PM +0000, RVP wrote:
> Does disabling x2apic in the kernel (like this, for instance) help:
Unfortunately not:
ioapic0 at mainbus0 apid 2: pa 0xfec00000, version 0x20, 120 pins
x2APIC available but disabled -- XXX Core Ultra XXX.
cpu0 at mainbus0 apid 16
cpu0: Use cpuid to serialize rdtsc
*hang*
(and my guess is that should be "Use lfence to serialize rdtsc" but
ci contains zeros)
From: Patrick Welche <prlw1@welche.eu>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: port-amd64/58410: x86_patch() panic with core ultra processor
Date: Fri, 26 Jul 2024 15:26:53 +0100
Features look OK, but cpu_info_primary ci_data contains:
cpu_index = 0
cpu_core_id = 16
cpu_smt_id = 0
cpu_numa_id = 0
cpu_cc_freq = 0
cpu_cc_skew = 0
Guessing _freq shouldn't be zero...
From: Patrick Welche <prlw1@welche.eu>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: port-amd64/58410: x86_patch() panic with core ultra processor
Date: Fri, 26 Jul 2024 17:22:11 +0100
This time "boot -1vx" successfully booted!
ci is now no longer pointing to a cpu_info containing zeros, but to one
that contains sane values like those in cpu_info_primary, and lfence is
correctly selected as opposed to cpuid to serialize rdtsc.
I note that this time, caa->cpu_role = BP as one would expect, and not
AP as before.
Should be able to install / have access to dmesg, and stop the add-printf,
compile, copy-to-USB, reboot routine.
From: Patrick Welche <prlw1@welche.eu>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: port-amd64/58410: x86_patch() panic with core ultra processor
Date: Tue, 30 Jul 2024 05:52:46 +0100
In mpacpi_config_cpu(), as x2apic_mode=0, the ACPI_MADT_TYPE_LOCAL_APIC
case is followed, and
lapic->Id=16
cpunum=32 !!
=> role=2=application processor rather than boot processor.
cpunum = i82489_cpu_number()
If only there were 32 cpus in this laptop - non-consecutive numbering?
Should be using x2apic?
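The role decision observed here can be sketched as a plain comparison: an MADT entry is treated as the boot processor only if its APIC ID equals the ID reported for the CPU that is currently running. The function name is hypothetical, and only the AP value (2) is taken from the output above; the SP and BP values are assumptions.

```c
#include <stdint.h>

/* AP = 2 matches the observed role; SP and BP values are assumed. */
enum cpu_role {
	CPU_ROLE_SP = 0,
	CPU_ROLE_BP = 1,
	CPU_ROLE_AP = 2
};

/*
 * Classify one MADT Local APIC entry against the APIC ID of the CPU
 * executing the configuration code.
 */
enum cpu_role
madt_entry_role(uint32_t madt_apic_id, uint32_t running_apic_id)
{
	if (madt_apic_id == running_apic_id)
		return CPU_ROLE_BP;
	return CPU_ROLE_AP;
}
```

With lapic->Id = 16 and cpunum = 32 this gives CPU_ROLE_AP for the first entry, as seen; only an entry with Id 32 would be classified as the boot processor.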
From: Patrick Welche <prlw1@welche.eu>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: port-amd64/58410: x86_patch() panic with core ultra processor
Date: Sat, 3 Aug 2024 10:37:11 +0100
According to acpidump, the cpus for which FLAGS={ENABLED} are:
APIC: Length=604, Revision=5, Checksum=233,
OEMID=DELL, OEM Table ID=Dell Inc, OEM Revision=0x2,
Creator ID=, Creator Revision=0x1000013
Local APIC ADDR=0xfee00000
Flags={PC-AT}
Type=Local APIC
ACPI CPU=8
Flags={ENABLED}
APIC ID=16
---
ACPI CPU= 8 APIC ID=16
ACPI CPU= 9 APIC ID=17
ACPI CPU=10 APIC ID=24
ACPI CPU=11 APIC ID=25
ACPI CPU=12 APIC ID=32
ACPI CPU=13 APIC ID=33
ACPI CPU=14 APIC ID=40
ACPI CPU=15 APIC ID=41
ACPI CPU=16 APIC ID=48
ACPI CPU=17 APIC ID=49
ACPI CPU=18 APIC ID=56
ACPI CPU=19 APIC ID=57
ACPI CPU= 0 APIC ID= 0
ACPI CPU= 1 APIC ID= 2
ACPI CPU= 2 APIC ID= 4
ACPI CPU= 3 APIC ID= 6
ACPI CPU= 4 APIC ID= 8
ACPI CPU= 5 APIC ID=10
ACPI CPU= 6 APIC ID=12
ACPI CPU= 7 APIC ID=14
ACPI CPU=20 APIC ID=64
ACPI CPU=21 APIC ID=66
From: Patrick Welche <prlw1@welche.eu>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: port-amd64/58410: x86_patch() panic with core ultra processor
Date: Mon, 5 Aug 2024 20:09:05 +0100
Mystery over: it seems things go wrong when the first cpu to be configured
is not the boot cpu.
cpunum = 32 returned from lapic is presumably the boot cpu.
When booting in single processor mode (-1), cpu4 is the one which is
configured, which makes sense:
> ACPI CPU= 8 APIC ID=16
> ACPI CPU= 9 APIC ID=17
> ACPI CPU=10 APIC ID=24
> ACPI CPU=11 APIC ID=25
> ACPI CPU=12 APIC ID=32 <- cpu4
> ACPI CPU=13 APIC ID=33
> ACPI CPU=14 APIC ID=40
> ACPI CPU=15 APIC ID=41
> ACPI CPU=16 APIC ID=48
> ACPI CPU=17 APIC ID=49
> ACPI CPU=18 APIC ID=56
> ACPI CPU=19 APIC ID=57
> ACPI CPU= 0 APIC ID= 0
> ACPI CPU= 1 APIC ID= 2
> ACPI CPU= 2 APIC ID= 4
> ACPI CPU= 3 APIC ID= 6
> ACPI CPU= 4 APIC ID= 8
> ACPI CPU= 5 APIC ID=10
> ACPI CPU= 6 APIC ID=12
> ACPI CPU= 7 APIC ID=14
> ACPI CPU=20 APIC ID=64
> ACPI CPU=21 APIC ID=66
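The "cpu4" position can be checked mechanically: look up APIC ID 32 in the order the table lists the entries. A small sketch follows; the array is copied from the acpidump output above and the function name is made up.

```c
#include <stddef.h>
#include <stdint.h>

/* APIC IDs in the order the enabled MADT entries are listed. */
static const uint8_t madt_apic_ids[] = {
	16, 17, 24, 25, 32, 33, 40, 41, 48, 49, 56, 57,
	0, 2, 4, 6, 8, 10, 12, 14,
	64, 66
};

/* Return the zero-based position of apic_id in the table, or -1. */
int
madt_index_of(uint8_t apic_id)
{
	for (size_t i = 0; i < sizeof(madt_apic_ids); i++) {
		if (madt_apic_ids[i] == apic_id)
			return (int)i;
	}
	return -1;
}
```

madt_index_of(32) returns 4, so four entries are enumerated ahead of the boot cpu.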
This trivial patch bears out the theory:
Index: cpu.c
===================================================================
RCS file: /cvsroot/src/sys/arch/x86/x86/cpu.c,v
retrieving revision 1.210
diff -u -r1.210 cpu.c
--- cpu.c 22 Apr 2024 23:07:47 -0000 1.210
+++ cpu.c 5 Aug 2024 19:08:18 -0000
@@ -337,6 +337,7 @@
#endif
}
+static int xxxpw = 0;
static void
cpu_attach(device_t parent, device_t self, void *aux)
{
@@ -365,6 +366,8 @@
* structure, otherwise use the primary's.
*/
if (caa->cpu_role == CPU_ROLE_AP) {
+ if (xxxpw == 0)
+ return;
if ((boothowto & RB_MD1) != 0) {
aprint_error(": multiprocessor boot disabled\n");
if (!pmf_device_register(self, NULL, NULL))
@@ -379,6 +382,7 @@
ci = (struct cpu_info *)roundup2(ptr, CACHE_LINE_SIZE);
ci->ci_curldt = -1;
} else {
+ xxxpw = 1;
aprint_naive(": %s Processor\n",
caa->cpu_role == CPU_ROLE_SP ? "Single" : "Boot");
ci = &cpu_info_primary;
and cpu4 and all following come up, so I have 18 out of 22 cpus configured.
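The 18-out-of-22 figure follows from the enumeration order. The sketch below simulates the effect of the patch in user space: entries seen before the boot cpu are skipped, and everything from the boot cpu onwards attaches. It is a model only, with hypothetical names, using the APIC IDs from the acpidump output.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* APIC IDs in the order the enabled MADT entries are listed. */
static const uint8_t madt_apic_ids[] = {
	16, 17, 24, 25, 32, 33, 40, 41, 48, 49, 56, 57,
	0, 2, 4, 6, 8, 10, 12, 14,
	64, 66
};

/*
 * Count the entries that would attach if every AP enumerated before
 * the boot cpu is skipped.  bp_seen plays the role of xxxpw.
 */
size_t
count_attached(const uint8_t *apic_ids, size_t n, uint8_t boot_apic_id)
{
	bool bp_seen = false;
	size_t attached = 0;

	for (size_t i = 0; i < n; i++) {
		if (apic_ids[i] == boot_apic_id)
			bp_seen = true;
		else if (!bp_seen)
			continue;	/* AP listed before the BP: skipped */
		attached++;
	}
	return attached;
}
```

count_attached(madt_apic_ids, sizeof(madt_apic_ids), 32) returns 18: the four entries with APIC IDs 16, 17, 24 and 25 are the ones left out.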
From: Patrick Welche <prlw1@welche.eu>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: port-amd64/58410: x86_patch() panic with core ultra processor
Date: Mon, 5 Aug 2024 20:15:43 +0100
On Mon, Aug 05, 2024 at 07:15:01PM +0000, Patrick Welche wrote:
> and cpu4 and all following come up, so I have 18 out of 22 cpus configured.
I should add - and everything is working just fine... multiuser multiprocessor
From: matthew green <mrg@eterna23.net>
To: gnats-bugs@netbsd.org
Cc: port-amd64-maintainer@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org, prlw1@cam.ac.uk
Subject: re: port-amd64/58410: x86_patch() panic with core ultra processor
Date: Tue, 06 Aug 2024 11:49:17 +1000
nicely found.
so, the problem is that the boot cpu is not the first cpu, and
we don't handle that right. good job, us :)
hopefully this isn't too hard to figure out. i remember we had
to do something for it on sparc long long ago, when you could
ask the prom to switch cpu, and booting from that cpu would mean
the boot cpu was not the first cpu.
.mrg.
From: Martin Husemann <martin@duskware.de>
To: matthew green <mrg@eterna23.net>
Cc: gnats-bugs@netbsd.org
Subject: Re: port-amd64/58410: x86_patch() panic with core ultra processor
Date: Tue, 6 Aug 2024 08:20:03 +0200
On Tue, Aug 06, 2024 at 11:49:17AM +1000, matthew green wrote:
> nicely found.
>
> so, the problem is that the boot cpu is not the first cpu, and
> we don't handle that right. good job, us :)
>
> hopefully this isn't too hard to figure out. i remember we had
> to do something for it on sparc long long ago, when you could
> ask the prom to switch cpu, and booting from that cpu would mean
> the boot cpu was not the first cpu.
We don't really expect the first cpu to be the bootstrap processor,
and the difference between it and the other cpus is mostly
- start with running main() instead of the trampoline code to wait for
MP startup
- allocate the cpu info earlier (and probably differently) than for other CPUs
- (maybe, MD) make sure the boot CPU is the last one running when halting
the machine
It happens on sparc64 a lot, I have several machines that boot on != cpu0
by default.
But even on x86 we seem to have code to deal with it (but maybe it has not
been tested that much). See for example
sys/arch/x86/x86/cpu.c:cpu_attach
/*
* Boot processor may not be attached first, but the below
* must be done to allow booting other processors.
*/
if (!again) {
/* Make sure DELAY() (likely i8254_delay()) is initialized. */
DELAY(1);
/*
* Basic init. Compute an approximate frequency for the TSC
* using the i8254. If there's a HPET we'll redo it later.
*/
atomic_or_32(&ci->ci_flags, CPUF_PRESENT | CPUF_PRIMARY);
Martin
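The guard in that excerpt is a once-only pattern: the first call does the basic init, whichever cpu entry it is attaching. Here is a stand-alone sketch of the pattern with hypothetical names. It assumes the flag is set at the end of the block, which the excerpt does not show.

```c
#include <stdbool.h>

static int basic_init_runs;	/* how often the one-time part ran */

/*
 * Model of an attach routine with a first-call-only section.  The
 * guard ignores the role, so the section runs for whichever entry
 * is attached first.
 */
void
cpu_attach_sketch(bool is_boot_cpu)
{
	static bool again = false;

	(void)is_boot_cpu;
	if (!again) {
		basic_init_runs++;	/* stands in for the DELAY()/TSC setup */
		again = true;		/* assumed: set after the first pass */
	}
}

/* Accessor so callers can observe the counter. */
int
basic_init_count(void)
{
	return basic_init_runs;
}
```

Calling cpu_attach_sketch() for an AP first and the BP second still runs the one-time section exactly once, during the AP call.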
From: Patrick Welche <prlw1@welche.eu>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: port-amd64/58410: x86_patch() panic with core ultra processor
Date: Tue, 6 Aug 2024 08:57:27 +0100
On Tue, Aug 06, 2024 at 06:25:01AM +0000, Martin Husemann wrote:
> * Basic init. Compute an approximate frequency for the TSC
> * using the i8254. If there's a HPET we'll redo it later.
It may be that setting TSC fails for AP processors, and once it is set
by the BP all is well... (see the earlier parts of this PR)
Copyright © 1994-2024
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.