NetBSD Problem Report #58191

From www@netbsd.org  Wed Apr 24 05:54:56 2024
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 0C1FD1A9238
	for <gnats-bugs@gnats.NetBSD.org>; Wed, 24 Apr 2024 05:54:56 +0000 (UTC)
Message-Id: <20240424055424.5E0D81A923B@mollari.NetBSD.org>
Date: Wed, 24 Apr 2024 05:54:24 +0000 (UTC)
From: schaecsn@gmx.net
Reply-To: schaecsn@gmx.net
To: gnats-bugs@NetBSD.org
Subject: nvmm crashes in NetBSD 10
X-Send-Pr-Version: www-1.0

>Number:         58191
>Category:       kern
>Synopsis:       nvmm crashes in NetBSD 10
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Apr 24 05:55:00 +0000 2024
>Originator:     Stefan Schaeckeler
>Release:        NetBSD 10
>Organization:
>Environment:
NetBSD netbsd10 10.0 NetBSD 10.0 (GENERIC) #0: Thu Mar 28 09:08:09 PDT 2024  root@netbsd10:/usr/obj/sys/arch/amd64/compile/GENERIC amd64

>Description:
I work quite a lot with qemu and nvmm. There were zero problems under NetBSD 9. There are many problems under NetBSD 10.

#1 Here I was running a NetBSD 9 guest and rebuilding NetBSD inside it when qemu crashed in nvmm on the NetBSD 10 host:

- - - snip - - -
...
mclinker -I/usr/src/external/bsd/llvm/librt/libLLVMAMDGPUCodeGen/../../lib/../dist/llvm/lib/Target/AMDGPU  -c    -fPIC   -g /usr/src/external/bsd/llvm/librt/libLLVMAMDGPUCodeGen/../../lib/../dist/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp -o SIInstrInfo.pico
qemu-system-x86_64: NVMM: I/O Assist Failed [port=496]
qemu-system-x86_64: NVMM: Failed to execute a VCPU.
Connection to 127.0.0.1 closed by remote host.
start-netbsd09.sh: line 13: 25896 Abort trap              (core dumped) qemu-system-x86_64 $ACCEL -hda netbsd09.img -usb -device usb-tablet -m 512M -netdev user,id=mynet0,host=10.0.2.10,hostfwd=tcp:127.0.0.1:7011-:22 -monitor telnet:127.0.0.1:8011,server,nowait -device e1000,netdev=mynet0 $*
[2]+  Exit 134                ./start-netbsd09.sh -display none -hdb externaldisk.img
netbsd10:/opt/qemu$
- - - snip - - -

- - - snip - - -
netbsd10:/opt/qemu$ gdb -c qemu-system-x86_.core /usr/pkg/bin/qemu-system-x86_64 -d /var/tmp/pkgsrc/emulators/qemu/work/qemu-8.2.2
...
Reading symbols from /usr/pkg/bin/qemu-system-x86_64...
[New process 26490]
[New process 18920]
[New process 16295]
[New process 16647]
[New process 25896]
Core was generated by `qemu-system-x86_'.
Program terminated with signal SIGABRT, Aborted.
#0  0x0000752e65f7e74a in _lwp_kill () from /usr/lib/libc.so.12
[Current thread is 1 (process 26490)]
(gdb) bt
#0  0x0000752e65f7e74a in _lwp_kill () from /usr/lib/libc.so.12
#1  0x0000752e65f83f00 in abort () at /usr/src/lib/libc/stdlib/abort.c:74
#2  0x00000001d318f519 in nvmm_vcpu_exec (cpu=cpu@entry=0x752e6d320980) at ../target/i386/nvmm/nvmm-all.c:1008
#3  0x00000001d318fdd8 in qemu_nvmm_cpu_thread_fn (arg=arg@entry=0x752e6d320980) at ../target/i386/nvmm/nvmm-accel-ops.c:45
#4  0x00000001d342ee67 in qemu_thread_start (args=0x752e6d7fb760) at ../util/qemu-thread-posix.c:541
#5  0x0000752e6640c89f in pthread__create_tramp (cookie=0x752e6deec400) at /usr/src/lib/libpthread/pthread.c:595
#6  0x0000752e65e92f80 in ?? () from /usr/lib/libc.so.12
#7  0xf000ff53f000ff53 in ?? ()
#8  0xf000ff53f000e2c3 in ?? ()
...
#55 0x0000000000000000 in ?? ()

(gdb) frame 2
#2  0x00000001d318f519 in nvmm_vcpu_exec (cpu=cpu@entry=0x752e6d320980) at ../target/i386/nvmm/nvmm-all.c:1008
warning: Source file is more recent than executable.
1008                abort();
(gdb) list
1003
1004            fatal = nvmm_vcpu_loop(cpu);
1005
1006            if (fatal) {
1007                error_report("NVMM: Failed to execute a VCPU.");
1008                abort();
1009            }
1010        }
1011
1012        return ret;
- - - snip - - -

The error originates earlier, in the call chain nvmm_vcpu_loop() -> nvmm_handle_io() -> nvmm_assist_io(); the last of these is provided by libnvmm. I will keep the files below available for 2 months in case anyone is interested in digging deeper:

http://corona.crabdance.com/shared/qemu-system-x86_.core.gz  (112M)
http://corona.crabdance.com/shared/qemu-system-x86_64.gz      (23M)
http://corona.crabdance.com/shared/vartmppkgsrc.zip          (232M) (source files; basically: make configure in /usr/pkgsrc/emulators/qemu)


#2 Over the previous weeks the NetBSD 10 kernel crashed 3 times: the first time when I was just starting qemu, the second time when I was shutting down qemu, and the third time while qemu was running. A kernel crash dump was saved only once, and it is not very helpful:

- - - snip - - -
netbsd10:~/scratch$ crash -M netbsd.5.core 
Crash version 10.0, image version 10.0.
crash: _kvm_kvatop(0)
Kernel compiled without options LOCKDEBUG.
System panicked: trap
Backtrace from time of crash is available.
crash> bt
vmx_insn_failinvalid() at 0
crash: _kvm_kvatop(ffffd601a610abe8)
crash: kvm_read(0xffffd601a610abe8, 8): invalid translation (invalid level 2 PDE)
crash> 
- - - snip - - -

 Stefan
>How-To-Repeat:
Run qemu with nvmm and stress the guest for a few days, for example by building the NetBSD distribution inside the VM.
>Fix:
