NetBSD Problem Report #57737
From h.fath@spg.tu-darmstadt.de Fri Dec 1 11:15:17 2023
Return-Path: <h.fath@spg.tu-darmstadt.de>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id C7F641A923A
for <gnats-bugs@gnats.NetBSD.org>; Fri, 1 Dec 2023 11:15:17 +0000 (UTC)
Message-Id: <202312011115.3B1BF61l025054@Gstoder.nt.e-technik.tu-darmstadt.de>
Date: Fri, 1 Dec 2023 12:15:06 +0100 (CET)
From: Hauke Fath <hf@spg.tu-darmstadt.de>
Reply-To: Hauke Fath <hf@spg.tu-darmstadt.de>
To: gnats-bugs@NetBSD.org
Cc: Hauke Fath <hf@spg.tu-darmstadt.de>
Subject: netbsd-10 panics on current Epyc CPU
X-Send-Pr-Version: 3.95
>Number: 57737
>Category: kern
>Synopsis: netbsd-10 panics on current Epyc CPU
>Confidential: no
>Severity: critical
>Priority: medium
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Fri Dec 01 11:20:00 +0000 2023
>Last-Modified: Wed Dec 13 08:25:01 +0000 2023
>Originator: Hauke Fath
>Release: NetBSD 10.0_RC1
>Organization:
Technische Universitaet Darmstadt
>Environment:
System: NetBSD 10.0_RC1 (GENERIC) #2: Thu Nov 30 13:28:34 CET 2023
Architecture: x86_64
Machine: amd64
>Description:
netbsd-10 panics early on current multi-core Ryzen cpus.
See the boot log for an Epyc 9554P cpu on a Gigabyte R263-Z70
board at
<ftp://oak.causeuse.org/pub/NetBSD/netbsd-10-GA_R263-Z70_epyc9554p.bootlog.gz>
and the related discussion on current-users, where Martin
suggested
"That sounds like an fpu xsave size issue Taylor looked at
recently (but it is not fixed)."
>How-To-Repeat:
Boot netbsd-10 on a recent AMD Epyc system.
>Fix:
Yes, please. :)
>Audit-Trail:
From: matthew green <mrg@eterna23.net>
To: gnats-bugs@netbsd.org, hf@spg.tu-darmstadt.de
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org
Subject: re: kern/57737: netbsd-10 panics on current Epyc CPU
Date: Wed, 13 Dec 2023 19:20:51 +1100
> netbsd-10 panics early on current multi-core Ryzen cpus.
>
> See the boot log for an Epyc 9554P cpu on a Gigabyte R263-Z70
> board at
>
> <ftp://oak.causeuse.org/pub/NetBSD/netbsd-10-GA_R263-Z70_epyc9554p.bootlog.gz>
>
> and the related discussion on current-users, where Martin
> suggested
>
> "That sounds like an fpu xsave size issue Taylor looked at
> recently (but it is not fixed)."
there are multiple issues with this system, ouch.
no CPUs attach in this dmesg. cpu0 remains half-attached. this
is some problem with the MADT parser i guess (i don't know this
very well.)
[ 1.0000040] bogus MADT X2APIC entry (id = 0x0)
[ 1.0000040] bogus MADT X2APIC entry (id = 0x2)
...
[ 1.0000040] bogus MADT X2APIC entry (id = 0x7e)
...
[ 1.0000040] bogus MADT X2APIC entry (id = 0x5e)
[ 1.0000040] bogus MADT X2APIC entry (id = 0x1)
...
[ 1.0000040] bogus MADT X2APIC entry (id = 0x7f)
...
[ 1.0000040] bogus MADT X2APIC entry (id = 0x5f)
ie, 128 cpu threads fail to attach (which matches the specs for
epyc 9554p - 64c/128t.) some devices still attach things to
cpu0 for affinity, even though it's in UP mode:
[ 1.0525126] nvme0: for io queue 1 interrupting at msix0 vec 1 affinity to cpu0
... plus nvme1/2/3.
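
To check the full boot log against the 64c/128t figure, a minimal Python sketch along the following lines could count the distinct IDs in the "bogus MADT X2APIC entry" lines. The function name `bogus_x2apic_ids` and the regular expression are illustrative assumptions derived from the message format quoted above; they are not taken from the NetBSD tree.

```python
import re

# Matches kernel messages of the form quoted in this PR, e.g.
#   [ 1.0000040] bogus MADT X2APIC entry (id = 0x7e)
# The optional "3D" tolerates quoted-printable residue ("=3D") in
# logs that were copied out of a mail message.
BOGUS_X2APIC = re.compile(
    r"bogus MADT X2APIC entry \(id\s*=(?:3D)?\s*(0x[0-9a-fA-F]+)\)"
)


def bogus_x2apic_ids(dmesg_text):
    """Return the sorted list of distinct APIC IDs reported as bogus."""
    ids = {int(m.group(1), 16) for m in BOGUS_X2APIC.finditer(dmesg_text)}
    return sorted(ids)
```

Feeding it the text of the decompressed boot log (for example `bogus_x2apic_ids(open(path).read())`, with `path` pointing at a local copy) and taking `len()` of the result gives the number of rejected entries, which can then be compared with the expected thread count.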
some of the dmesg items seem to have 'nul' chars in them:
[ 1.0000040] ACPI: XSDT 0x00000000A4E13728 000^@0DC (v01 GBT BTUACPI 03042021 AMI 01000013)
[ 1.0525126] AMD 19h/1xh RCEC (Root Complex^@ Event Collectosystem) at pci0 dev 0 function 3 not configured
and then the final crash as reported in this PR:
[ 1.0525126] fatal privileged instruction fault in supervisor mode
[ 1.0525126] trap type 0 code 0 rip 0xffffffff8023c24e cs 0x8 rflags 0x10256 cr2 0 ilevel 0x6 rsp 0xffffffff81d3bab8
[ 1.0525126] curlwp 0xffffffff8188ac00 pid 0.0 lowest kstack 0xffffffff81d362c0
kernel: privileged instruction fault trap, code=0
Stopped in pid 0.0 (system) at netbsd:xrstor+0xa: fxsavel
xrstor() at netbsd:xrstor+0xa
aes_selftest() at netbsd:aes_selftest+0x26
aes_modcmd() at netbsd:aes_modcmd+0xe9
module_do_builtin() at netbsd:module_do_builtin+0x142
module_do_builtin() at netbsd:module_do_builtin+0xfa
module_init_class() at netbsd:module_init_class+0x142
.mrg.
>Unformatted: