NetBSD Problem Report #59371

From imil@netbsd.org  Mon Apr 28 05:37:06 2025
Return-Path: <imil@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	 key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256
	 client-signature RSA-PSS (2048 bits) client-digest SHA256)
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 0925C1A9239
	for <gnats-bugs@gnats.NetBSD.org>; Mon, 28 Apr 2025 05:37:06 +0000 (UTC)
Message-Id: <20250428053704.7AD151A923C@mollari.NetBSD.org>
Date: Mon, 28 Apr 2025 05:37:04 +0000 (UTC)
From: imil@home.imil.net
Reply-To: imil@home.imil.net
To: gnats-bugs@NetBSD.org
Subject: Xen domU uvm_fault since FPU state allocation patch
X-Send-Pr-Version: 3.95

>Number:         59371
>Category:       kern
>Synopsis:       Xen domU uvm_fault since FPU state allocation patch
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    riastradh
>State:          feedback
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Apr 28 05:40:01 +0000 2025
>Closed-Date:    
>Last-Modified:  Mon Apr 28 13:05:01 +0000 2025
>Originator:     Emile `iMil' Heitor
>Release:        NetBSD 10.99.14
>Organization:
	NetBSD
>Environment:
System: NetBSD outcast 10.99.14 NetBSD 10.99.14 (XEN3_DOM0) #1: Wed Apr 23 12:03:35 CEST 2025  imil@tatooine:/home/imil/src/github.com/NetBSD-src/sys/arch/amd64/compile/obj/XEN3_DOM0 amd64
Architecture: x86_64
Machine: amd64
>Description:
	Since April 25, the NetBSD domU kernel crashes with a uvm_fault:

[   1.0000000] cpu_rng: rdrand/rdseed
[   1.0000000] entropy: ready
[   1.0000000] Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003,
[   1.0000000]     2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013,
[   1.0000000]     2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023,
[   1.0000000]     2024, 2025
[   1.0000000]     The NetBSD Foundation, Inc.  All rights reserved.
[   1.0000000] Copyright (c) 1982, 1986, 1989, 1991, 1993
[   1.0000000]     The Regents of the University of California.  All rights reserved.

[   1.0000000] NetBSD 10.99.14 (XEN3_DOMU) #7: Mon Apr 28 06:06:42 CEST 2025
[   1.0000000] 	imil@tatooine:/home/imil/src/NetBSD/src/sys/arch/amd64/compile/obj/XEN3_DOMU
[   1.0000000] total memory = 256 MB
[   1.0000000] avail memory = 235 MB
[   1.0000000] mainbus0 (root)
[   1.0000000] hypervisor0 at mainbus0: Xen version 4.18.4_20241221nb0
[   1.0000000] vcpu0 at hypervisor0
[   1.0000000] vcpu0: Intel(R) Core(TM) i5-7300U CPU @ 2.60GHz, id 0x806e9
[   1.0000000] vcpu0: node 0, package 0, core 0, smt 0
[   1.0000000] xenbus0 at hypervisor0: Xen Virtual Bus Interface
[   1.0000000] xencons0 at hypervisor0: Xen Virtual Console Driver
[   1.0000030] xenbus0: can't get state for device/suspend/event-channel (2)
[   1.0000030] uvm_fault(0xffffffff8094a300, 0x0, 2) -> e
[   1.0000030] fatal page fault in supervisor mode
[   1.0000030] trap type 6 code 0x2 rip 0xffffffff8062795c cs 0xe030 rflags 0x10202 cr2 0 ilevel 0 rsp 0xffffffff80adad38
[   1.0000030] curlwp 0xffffffff8078f880 pid 0.0 lowest kstack 0xffffffff80ad62c0
kernel: page fault trap, code=0
Stopped in pid 0.0 (system) at  netbsd:memset+0x2c:     repe stosq      %es:(%rdi)
memset() at netbsd:memset+0x2c
lwp_create() at netbsd:lwp_create+0x2f1
fork1() at netbsd:fork1+0x42c
main() at netbsd:main+0x44f
ds          40
es          100
fs          1
gs          107
rdi         0
rsi         200
rbp         ffffffff80adad90
rbx         ffff930042b2e000
rdx         200
rcx         40
rax         0
r8          ffffffff8047a38c    start_init
r9          0
r10         fffffe00
r11         ffffffff80adabcc
r12         0
r13         ffff93000092e800
r14         ffffffff8078f880    lwp0
r15         0
rip         ffffffff8062795c    memset+0x2c
cs          e030
rflags      10202
rsp         ffffffff80adad38
ss          e02b
netbsd:memset+0x2c:     repe stosq      %es:(%rdi)
db{0}>

This behavior seems linked to this commit:
https://mail-index.netbsd.org/source-changes/2025/04/24/msg156552.html

Riastradh@ suggested that I try this workaround:

--- sys/arch/x86/x86/fpu.c      2025-04-24 16:57:38.905367169 +0200
+++ sys/arch/x86/x86/fpu.c.patch        2025-04-24 16:58:39.368608934 +0200
@@ -475,6 +475,9 @@
                return;
        }

+#ifdef XENPV
+       pcb2->pcb_savefpu = &pcb2->pcb_savefpusmall;
+#endif
        /* For init(8). */
        if (__predict_false(l1->l_flag & LW_SYSTEM)) {
                memset(pcb2->pcb_savefpu, 0, x86_fpu_save_size);

which indeed permitted the domU to proceed with boot.
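
A minimal user-space sketch of what the workaround changes, not the kernel
code itself.  Only the member names pcb_savefpu and pcb_savefpusmall come
from the patch above; the structure layout, the 512-byte size, and every
name ending in _sketch are assumptions made for illustration.  Without the
assignment, the pointer is still null when the fork path zeroes the save
area.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FPU_SAVE_SIZE_SKETCH 512	/* assumed size, illustration only */

/* Hypothetical stand-in for the FPU-related part of struct pcb. */
struct pcb_sketch {
	unsigned char *pcb_savefpu;	/* points at the active save area */
	unsigned char pcb_savefpusmall[FPU_SAVE_SIZE_SKETCH]; /* embedded area */
};

/*
 * Mirrors the shape of the patched code path: optionally point
 * pcb_savefpu at the embedded area, then zero the save area.
 * Returns 0 on success, or -1 if the pointer is still null, which is
 * the case where the unpatched kernel faulted inside memset.
 */
int
fpu_fork_sketch(struct pcb_sketch *pcb2, int apply_workaround)
{
	assert(pcb2 != NULL);
	if (apply_workaround)
		pcb2->pcb_savefpu = pcb2->pcb_savefpusmall;
	if (pcb2->pcb_savefpu == NULL)
		return -1;
	memset(pcb2->pcb_savefpu, 0, FPU_SAVE_SIZE_SKETCH);
	return 0;
}
```

Calling fpu_fork_sketch() with apply_workaround set to 0 on a structure
whose pointer was never set reports the null pointer instead of faulting;
with 1 it zeroes the embedded area.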

>How-To-Repeat:
	Boot a NetBSD Xen/domU kernel from April 25 2025
>Fix:
	Please.

>Release-Note:

>Audit-Trail:

Responsible-Changed-From-To: kern-bug-people->riastradh
Responsible-Changed-By: riastradh@NetBSD.org
Responsible-Changed-When: Mon, 28 Apr 2025 12:39:57 +0000
Responsible-Changed-Why:
my bug, will provide better workaround shortly and then propose a
proper fix to be considered


State-Changed-From-To: open->analyzed
State-Changed-By: riastradh@NetBSD.org
State-Changed-When: Mon, 28 Apr 2025 12:46:46 +0000
State-Changed-Why:
This happens because amd64 defines __HAVE_CPU_UAREA_ROUTINES only under
#if !defined(XENPV):

    114 #if defined(__x86_64__) && !defined(XENPV)
    115 #if !defined(KASAN) && !defined(KMSAN)
    116 #define	__HAVE_PCPU_AREA 1
    117 #define	__HAVE_DIRECT_MAP 1
    118 #endif
    119 #define	__HAVE_CPU_UAREA_ROUTINES 1
    120 #endif

https://nxr.netbsd.org/xref/src/sys/arch/amd64/include/types.h?r=1.71#114
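
The effect of that conditional can be sketched in isolation.  The
SKETCH_* macros and the structure below are hypothetical stand-ins, not
the kernel's definitions; they simulate a XENPV amd64 build so that the
machine-dependent setup is compiled out and the pointer is never set.

```c
#include <assert.h>
#include <stddef.h>

/* Simulate a XENPV amd64 build; both macros are stand-ins for this sketch. */
#define SKETCH_X86_64 1
#define SKETCH_XENPV 1

/* Same shape as the types.h conditional quoted above. */
#if defined(SKETCH_X86_64) && !defined(SKETCH_XENPV)
#define SKETCH_HAVE_CPU_UAREA_ROUTINES 1
#endif

struct uarea_sketch {
	void *savefpu;		/* stand-in for pcb_savefpu */
	char small[64];		/* stand-in for pcb_savefpusmall */
};

/*
 * Stand-in for uarea setup.  The pointer is initialized only when the
 * machine-dependent routines are compiled in; otherwise it stays null.
 */
void
uarea_setup_sketch(struct uarea_sketch *ua)
{
	assert(ua != NULL);
	ua->savefpu = NULL;
#ifdef SKETCH_HAVE_CPU_UAREA_ROUTINES
	ua->savefpu = ua->small;
#endif
}
```

With SKETCH_XENPV defined, uarea_setup_sketch() leaves savefpu null;
removing that one #define makes the same function set the pointer.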

So the logic in x86/vm_machdep.c's cpu_uarea_alloc/free to initialize
the FPU save area of a newly allocated uarea doesn't kick in.  It's not
clear to me why the custom uarea routines are disabled under XENPV --
they were introduced by maxv@ for a stack guard page (`redzone') back
in March 2020:

https://mail-index.netbsd.org/port-amd64/2020/03/14/msg003179.html
https://mail-index.netbsd.org/source-changes/2020/03/17/msg115178.html

--- a/sys/arch/amd64/include/types.h
+++ b/sys/arch/amd64/include/types.h
...
@@ -114,10 +114,11 @@ typedef   unsigned char           __cpu_simple_lock_nv_t;
 #if defined(__x86_64__) && !defined(XENPV)
 #if !defined(KASAN) && !defined(KMSAN)
 #define        __HAVE_PCPU_AREA 1
 #define        __HAVE_DIRECT_MAP 1
 #endif
+#define        __HAVE_CPU_UAREA_ROUTINES 1
 #if !defined(NO_PCI_MSI_MSIX)
 #define        __HAVE_PCI_MSI_MSIX
 #endif
 #endif

It's possible maxv@ just didn't want to think about XENPV implications
or test it.  It's possible that the guard page works badly in XENPV for
some reason, or somehow hurts performance, I dunno.

So, as a stop-gap measure, let's first just use the same approach as
i386 (no pointer to savefpu area, disable Intel AMX), and then think
about enabling the guard page -- and if the guard page doesn't work for
some reason, we can either leave it at that or make the uarea routine
skip the guard page under XENPV.
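
One way to read `no pointer to savefpu area' is a layout in which the
save area is embedded directly in the PCB under XENPV, so there is
nothing left to initialize.  The following is a sketch under that
assumption, not the committed change: the 512-byte size, the SKETCH_XENPV
macro, and all names ending in _sketch are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_XENPV 1			/* simulate a XENPV build */
#define FPU_SAVE_SIZE_SKETCH 512	/* assumed size, illustration only */

struct pcb_embed_sketch {
#ifdef SKETCH_XENPV
	/* No pointer: the save area is always the embedded one. */
	unsigned char pcb_savefpu[FPU_SAVE_SIZE_SKETCH];
#else
	/* Pointer that some allocation routine must set up first. */
	unsigned char *pcb_savefpu;
	unsigned char pcb_savefpusmall[FPU_SAVE_SIZE_SKETCH];
#endif
};

/* Returns the save area; under SKETCH_XENPV the array decays to a pointer. */
unsigned char *
pcb_fpu_area_sketch(struct pcb_embed_sketch *pcb)
{
	assert(pcb != NULL);
	return pcb->pcb_savefpu;
}

/* Zeroes the save area, as the fork path does for init(8). */
void
fpu_zero_sketch(struct pcb_embed_sketch *pcb)
{
	memset(pcb_fpu_area_sketch(pcb), 0, FPU_SAVE_SIZE_SKETCH);
}
```

The trade-off is that an embedded area has a fixed size, which is why
this approach goes together with disabling extensions that need a larger
save area.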


State-Changed-From-To: analyzed->feedback
State-Changed-By: riastradh@NetBSD.org
State-Changed-When: Mon, 28 Apr 2025 13:02:32 +0000
State-Changed-Why:
workaround committed to HEAD, please test


From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/59371 CVS commit: src/sys/arch
Date: Mon, 28 Apr 2025 13:01:28 +0000

 Module Name:	src
 Committed By:	riastradh
 Date:		Mon Apr 28 13:01:28 UTC 2025

 Modified Files:
 	src/sys/arch/amd64/include: pcb.h
 	src/sys/arch/x86/include: specialreg.h
 	src/sys/arch/x86/x86: fpu.c

 Log Message:
 xen: Stop-gap FPU PCB fix; disable Intel AMX for now.

 Since the custom cpu_uarea_alloc/free are disabled under XENPV,
 nothing would initialize struct pcb::pcb_savefpu to point either to
 struct pcb::pcb_savefpusmall, or to a separately allocated large area
 on machines with Intel AMX TILECFG/TILEDATA requiring it.  So the
 memset in fpu_lwp_fork would crash on null pointer dereference:

 [   1.0000030] uvm_fault(0xffffffff8094a300, 0x0, 2) -> e
 [   1.0000030] fatal page fault in supervisor mode
 [   1.0000030] trap type 6 code 0x2 rip 0xffffffff8062795c cs 0xe030 rflags 0x10202 cr2 0 ilevel 0 rsp 0xffffffff80adad38
 [   1.0000030] curlwp 0xffffffff8078f880 pid 0.0 lowest kstack 0xffffffff80ad62c0
 kernel: page fault trap, code=0
 Stopped in pid 0.0 (system) at  netbsd:memset+0x2c:     repe stosq      %es:(%rdi)
 memset() at netbsd:memset+0x2c
 lwp_create() at netbsd:lwp_create+0x2f1
 fork1() at netbsd:fork1+0x42c
 main() at netbsd:main+0x44f

 In order to support Intel AMX TILECFG/TILEDATA, or any other CPU
 extensions that increase the XSAVE area beyond what fits in a single
 page after struct pcb, we would need to enable the custom
 cpu_uarea_alloc/free.  Currently that would imply allocating stack
 guard pages (`redzone') under XENPV; if there's some reason the stack
 guard pages don't work, we could also push #ifdef XENPV conditionals
 into cpu_uarea_alloc/free to cover the guard pages -- to be
 considered.
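
The size constraint described above can be sketched as a simple bounds
check.  The 4096-byte page size, the offset parameter, and the function
name are assumptions for illustration; the kernel's actual test may be
expressed differently.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE_SKETCH 4096	/* assumed page size */

/*
 * Returns 1 if a save area of save_size bytes, starting area_offset
 * bytes into the page that holds the PCB, ends at or before the page
 * boundary; returns 0 if it would need a separate allocation.
 */
int
savefpu_fits_inline_sketch(size_t area_offset, size_t save_size)
{
	assert(area_offset <= PAGE_SIZE_SKETCH);
	return save_size <= PAGE_SIZE_SKETCH - area_offset;
}
```

For example, with a hypothetical offset of 256 bytes, a 512-byte legacy
save area fits, while anything containing 8 KiB of tile data cannot.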

 PR kern/59371: Xen domU uvm_fault since FPU state allocation patch

 PR port-amd64/57661: Crash when booting on Xeon Silver 4416+ in
 KVM/Qemu


 To generate a diff of this commit:
 cvs rdiff -u -r1.34 -r1.35 src/sys/arch/amd64/include/pcb.h
 cvs rdiff -u -r1.218 -r1.219 src/sys/arch/x86/include/specialreg.h
 cvs rdiff -u -r1.90 -r1.91 src/sys/arch/x86/x86/fpu.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

>Unformatted:

Copyright © 1994-2025 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.