NetBSD Problem Report #58191
From www@netbsd.org Wed Apr 24 05:54:56 2024
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id 0C1FD1A9238
for <gnats-bugs@gnats.NetBSD.org>; Wed, 24 Apr 2024 05:54:56 +0000 (UTC)
Message-Id: <20240424055424.5E0D81A923B@mollari.NetBSD.org>
Date: Wed, 24 Apr 2024 05:54:24 +0000 (UTC)
From: schaecsn@gmx.net
Reply-To: schaecsn@gmx.net
To: gnats-bugs@NetBSD.org
Subject: nvmm crashes in NetBSD 10
X-Send-Pr-Version: www-1.0
>Number: 58191
>Category: kern
>Synopsis: nvmm crashes in NetBSD 10
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Wed Apr 24 05:55:00 +0000 2024
>Originator: Stefan Schaeckeler
>Release: NetBSD 10
>Organization:
>Environment:
NetBSD netbsd10 10.0 NetBSD 10.0 (GENERIC) #0: Thu Mar 28 09:08:09 PDT 2024 root@netbsd10:/usr/obj/sys/arch/amd64/compile/GENERIC amd64
>Description:
I work quite a lot with qemu and nvmm. There were zero problems under NetBSD 9. There are many problems under NetBSD 10.
#1 I was running a NetBSD 9 guest and rebuilding it when qemu crashed in nvmm on the NetBSD 10 host:
- - - snip - - -
...
mclinker -I/usr/src/external/bsd/llvm/librt/libLLVMAMDGPUCodeGen/../../lib/../dist/llvm/lib/Target/AMDGPU -c -fPIC -g /usr/src/external/bsd/llvm/librt/libLLVMAMDGPUCodeGen/../../lib/../dist/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp -o SIInstrInfo.pico
qemu-system-x86_64: NVMM: I/O Assist Failed [port=496]
qemu-system-x86_64: NVMM: Failed to execute a VCPU.
Connection to 127.0.0.1 closed by remote host.
start-netbsd09.sh: line 13: 25896 Abort trap (core dumped) qemu-system-x86_64 $ACCEL -hda netbsd09.img -usb -device usb-tablet -m 512M -netdev user,id=mynet0,host=10.0.2.10,hostfwd=tcp:127.0.0.1:7011-:22 -monitor telnet:127.0.0.1:8011,server,nowait -device e1000,netdev=mynet0 $*
[2]+ Exit 134 ./start-netbsd09.sh -display none -hdb externaldisk.img
netbsd10:/opt/qemu$
- - - snip - - -
- - - snip - - -
netbsd10:/opt/qemu$ gdb -c qemu-system-x86_.core /usr/pkg/bin/qemu-system-x86_64 -d /var/tmp/pkgsrc/emulators/qemu/work/qemu-8.2.2
...
Reading symbols from /usr/pkg/bin/qemu-system-x86_64...
[New process 26490]
[New process 18920]
[New process 16295]
[New process 16647]
[New process 25896]
Core was generated by `qemu-system-x86_'.
Program terminated with signal SIGABRT, Aborted.
#0 0x0000752e65f7e74a in _lwp_kill () from /usr/lib/libc.so.12
[Current thread is 1 (process 26490)]
(gdb) bt
#0 0x0000752e65f7e74a in _lwp_kill () from /usr/lib/libc.so.12
#1 0x0000752e65f83f00 in abort () at /usr/src/lib/libc/stdlib/abort.c:74
#2 0x00000001d318f519 in nvmm_vcpu_exec (cpu=cpu@entry=0x752e6d320980) at ../target/i386/nvmm/nvmm-all.c:1008
#3 0x00000001d318fdd8 in qemu_nvmm_cpu_thread_fn (arg=arg@entry=0x752e6d320980) at ../target/i386/nvmm/nvmm-accel-ops.c:45
#4 0x00000001d342ee67 in qemu_thread_start (args=0x752e6d7fb760) at ../util/qemu-thread-posix.c:541
#5 0x0000752e6640c89f in pthread__create_tramp (cookie=0x752e6deec400) at /usr/src/lib/libpthread/pthread.c:595
#6 0x0000752e65e92f80 in ?? () from /usr/lib/libc.so.12
#7 0xf000ff53f000ff53 in ?? ()
#8 0xf000ff53f000e2c3 in ?? ()
...
#55 0x0000000000000000 in ?? ()
(gdb) frame 2
#2 0x00000001d318f519 in nvmm_vcpu_exec (cpu=cpu@entry=0x752e6d320980) at ../target/i386/nvmm/nvmm-all.c:1008
warning: Source file is more recent than executable.
1008 abort();
(gdb) list
1003
1004 fatal = nvmm_vcpu_loop(cpu);
1005
1006 if (fatal) {
1007 error_report("NVMM: Failed to execute a VCPU.");
1008 abort();
1009 }
1010 }
1011
1012 return ret;
- - - snip - - -
The error originates in the previously called chain nvmm_vcpu_loop() -> nvmm_handle_io() -> nvmm_assist_io(), the last of which comes from libnvmm. I will keep the files below available for 2 months in case anyone is interested in digging deeper:
http://corona.crabdance.com/shared/qemu-system-x86_.core.gz (112M)
http://corona.crabdance.com/shared/qemu-system-x86_64.gz (23M)
http://corona.crabdance.com/shared/vartmppkgsrc.zip (232M) (source files; basically: make configure in /usr/pkgsrc/emulators/qemu)
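As a reading aid for the log in #1, the two numbers it contains can be decoded with plain shell arithmetic. This is only a sketch: the helper names port_hex and exit_signal are made up for illustration, and it assumes qemu prints the port in decimal. Under that assumption, port 496 is 0x1f0, which on a PC is conventionally the data register of the primary IDE/ATA channel; that would fit the -hda/-hdb disks in use, but it is an interpretation, not something the log states. The exit status 134 is 128 + 6, i.e. the shell's way of reporting death by signal 6 (SIGABRT), which matches the abort() in the backtrace.

```shell
#!/bin/sh
# Hypothetical helpers, not part of qemu or NetBSD.

# Print a decimal I/O port number in hex.
port_hex() {
    printf '0x%x\n' "$1"
}

# A shell exit status above 128 means "killed by signal (status - 128)".
exit_signal() {
    echo $(( $1 - 128 ))
}

port_hex 496       # port from "NVMM: I/O Assist Failed [port=496]"
exit_signal 134    # status from "[2]+ Exit 134"
```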
#2 Over the previous weeks the NetBSD 10 kernel crashed 3 times: the first time when I was just starting qemu, the second time when I was shutting down qemu, and the third time while qemu was running. A kernel crash dump was created only once, and it is not very helpful:
- - - snip - - -
netbsd10:~/scratch$ crash -M netbsd.5.core
Crash version 10.0, image version 10.0.
crash: _kvm_kvatop(0)
Kernel compiled without options LOCKDEBUG.
System panicked: trap
Backtrace from time of crash is available.
crash> bt
vmx_insn_failinvalid() at 0
crash: _kvm_kvatop(ffffd601a610abe8)
crash: kvm_read(0xffffd601a610abe8, 8): invalid translation (invalid level 2 PDE)
crash>
- - - snip - - -
Stefan
>How-To-Repeat:
Run qemu and stress the guest for a few days, e.g. by building a NetBSD distribution inside the VM.
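One possible way to automate this is a host-side loop that keeps running a build in the guest until a run fails. This is a minimal sketch, not the exact procedure used for this report: the function name stress_loop is made up, and the guest user, the source path and the build.sh invocation in GUEST_CMD are assumptions. Only the forwarded ssh port (127.0.0.1:7011 to guest port 22) is taken from the qemu command line above.

```shell
#!/bin/sh
# Hypothetical stress loop, for illustration only.
# GUEST_CMD is an assumption: root login, sources in /usr/src, and a plain
# "build.sh distribution" run. Only port 7011 comes from the report.
: "${GUEST_CMD:=ssh -p 7011 root@127.0.0.1 'cd /usr/src && ./build.sh distribution'}"

# stress_loop MAX CMD...: run CMD up to MAX times and stop at the first failure.
# Prints the number of successful runs; returns 1 if a run failed, 0 otherwise.
stress_loop() {
    max=$1
    shift
    n=0
    while [ "$n" -lt "$max" ]; do
        "$@" || { echo "$n"; return 1; }
        n=$((n + 1))
    done
    echo "$n"
}

# Example (not executed here): stress_loop 100 sh -c "$GUEST_CMD"
```

A failed run only shows that the guest command stopped working, for example because the ssh connection was closed when qemu aborted; the qemu output and any core file still need to be checked by hand.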
>Fix: