NetBSD Problem Report #55406
From www@netbsd.org Mon Jun 22 12:38:59 2020
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id F03831A9217
for <gnats-bugs@gnats.NetBSD.org>; Mon, 22 Jun 2020 12:38:58 +0000 (UTC)
Message-Id: <20200622123857.EA6D21A9246@mollari.NetBSD.org>
Date: Mon, 22 Jun 2020 12:38:57 +0000 (UTC)
From: nia@pkgsrc.org
Reply-To: nia@pkgsrc.org
To: gnats-bugs@NetBSD.org
Subject: NVMM panic on qemu start on intel
X-Send-Pr-Version: www-1.0
>Number: 55406
>Category: kern
>Synopsis: NVMM panic on qemu start on intel
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: kern-bug-people
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Mon Jun 22 12:40:00 +0000 2020
>Closed-Date: Sun Aug 02 09:19:18 +0000 2020
>Last-Modified: Sun Aug 02 09:19:18 +0000 2020
>Originator: nia
>Release: -current (and 9.0)
>Organization:
>Environment:
NetBSD r 9.99.64 NetBSD 9.99.64 (R) #12: Sun May 31 09:31:31 IST 2020 nia@r:/home/nia/src/sys/arch/amd64/compile/obj/R amd64
>Description:
After starting a QEMU VM with -accel nvmm, I get this (apparently) nondeterministic panic. This is an Intel CPU.
fatal privileged instruction fault in supervisor mode
trap type 0 code 0 rip 0xffffffff813ed454 cs 0x8 rflags 0x10246 cr2 0x77e2d8c1d000 ilevel 0 rsp 0xffffcb80c21e2d90
curlwp 0xffffec096c4584c0 pid 24690.13443 lowest kstack 0xffffcb80c21df2c0
db> crash> callout
ticks wheel arg func
crash: _kvm_kvatop(0)
crash: kvm_read(0x0, 16592): invalid translation (invalid level 4 PDE)
db> crash> bt
_KERNEL_OPT_NARCNET() at 0
_KERNEL_OPT_NARCNET() at 0
sys_reboot() at sys_reboot
db_reboot_cmd() at db_reboot_cmd
db_command() at db_command+0x127
db_command_loop() at db_command_loop+0xa6
db_trap() at db_trap+0xe6
kdb_trap() at kdb_trap+0xe1
trap() at trap+0x2b7
--- trap (number 0) ---
vmx_vmcs_enter() at vmx_vmcs_enter+0xe3
vmx_vcpu_create() at vmx_vcpu_create+0x144
nvmm_ioctl() at nvmm_ioctl+0x292
sys_ioctl() at sys_ioctl+0x550
syscall() at syscall+0x26e
--- syscall (number 54) ---
77e2d27681ba:
>How-To-Repeat:
>Fix:
>Release-Note:
>Audit-Trail:
State-Changed-From-To: open->feedback
State-Changed-By: maxv@NetBSD.org
State-Changed-When: Tue, 23 Jun 2020 16:10:59 +0000
State-Changed-Why:
Did you suspend your host before launching the VM?
From: Jukka Ruohonen <jruohonen@iki.fi>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/55406: NVMM panic on qemu start on intel
Date: Tue, 23 Jun 2020 19:15:43 +0300
On Mon, Jun 22, 2020 at 12:40:00PM +0000, nia@pkgsrc.org wrote:
> After starting a QEMU VM with -accel nvmm, I get this (apparently)
> nondeterministic panic. This is an intel CPU.
I don't believe this bug is non-deterministic; I can also reproduce the
panic and its trace.
From: nia <nia@NetBSD.org>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, netbsd-bugs@netbsd.org,
gnats-admin@netbsd.org, maxv@NetBSD.org, nia@pkgsrc.org
Subject: Re: kern/55406 (NVMM panic on qemu start on intel)
Date: Tue, 23 Jun 2020 17:07:19 +0000
On Tue, Jun 23, 2020 at 04:11:00PM +0000, maxv@NetBSD.org wrote:
> Did you suspend your host before launching the VM?
Probably, the host rarely gets powered off but does get suspended
every night.
From: Maxime Villard <max@m00nbsd.net>
To: nia <nia@NetBSD.org>, <gnats-bugs@netbsd.org>
Cc: <kern-bug-people@netbsd.org>, <netbsd-bugs@netbsd.org>,
<gnats-admin@netbsd.org>
Subject: Re: kern/55406 (NVMM panic on qemu start on intel)
Date: Tue, 23 Jun 2020 19:10:25 +0200
Le 23/06/2020 à 19:07, nia a écrit :
> On Tue, Jun 23, 2020 at 04:11:00PM +0000, maxv@NetBSD.org wrote:
>> Did you suspend your host before launching the VM?
>
> Probably, the host rarely gets powered off but does get suspended
> every night.
Well, that's where the problem comes from. I didn't put a sleep handler for
NVMM, so when the CPU wakes up it can go crazy.
In fact, it can go crazy for many other reasons, because many x86 states
are not restored properly at wake-up time.
For now I wouldn't recommend using sleep states at all.
From: nia <nia@NetBSD.org>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org, nia@pkgsrc.org
Subject: Re: kern/55406 (NVMM panic on qemu start on intel)
Date: Wed, 24 Jun 2020 19:08:02 +0000
On Tue, Jun 23, 2020 at 05:15:02PM +0000, Maxime Villard wrote:
> Le 23/06/2020 à 19:07, nia a écrit :
> > On Tue, Jun 23, 2020 at 04:11:00PM +0000, maxv@NetBSD.org wrote:
> >> Did you suspend your host before launching the VM?
> >
> > Probably, the host rarely gets powered off but does get suspended
> > every night.
>
> Well, that's where the problem comes from. I didn't put a sleep handler for
> NVMM, so when the CPU wakes up it can go crazy.
>
> In fact, it can go crazy for many other reasons, because many x86 states
> are not restored properly at wake-up time.
>
> For now I wouldn't recommend using sleep states at all.
Can NVMM refuse to run on a system that has been resumed until this is fixed?
This would at least help avoid accidental unclean shutdowns.
From: "Maxime Villard" <maxv@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/55406 CVS commit: src/sys
Date: Thu, 25 Jun 2020 17:01:20 +0000
Module Name: src
Committed By: maxv
Date: Thu Jun 25 17:01:20 UTC 2020
Modified Files:
src/sys/dev/nvmm: files.nvmm nvmm.c
src/sys/modules/nvmm: nvmm.ioconf
Log Message:
Register NVMM as an actual pseudo-device, without a PMF handler, to
explicitly disallow ACPI suspend if NVMM is running.
Should fix PR/55406.
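The idea in the log message above can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the code from the commit: `struct model_device`, `has_pm_handler`, and `model_suspend_allowed()` are hypothetical names invented here. It assumes that a system suspend is refused as long as any attached device has no power-management handler, which is why attaching NVMM as a pseudo-device without a PMF handler blocks suspend.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical userspace model of the suspend gate.
 * None of these names are part of the NetBSD kernel API.
 */
struct model_device {
	const char *name;
	bool has_pm_handler;	/* driver registered suspend/resume hooks */
	bool attached;		/* device is currently attached */
};

/*
 * Allow a suspend only if every attached device has a power-management
 * handler. A single attached device without one vetoes the suspend.
 */
static bool
model_suspend_allowed(const struct model_device *devs, size_t ndevs)
{
	for (size_t i = 0; i < ndevs; i++) {
		if (devs[i].attached && !devs[i].has_pm_handler)
			return false;
	}
	return true;
}
```

In this model, an attached `nvmm0` entry with `has_pm_handler` set to false makes `model_suspend_allowed()` return false, and detaching it (the analogue of unloading the module) lets the suspend proceed.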
To generate a diff of this commit:
cvs rdiff -u -r1.2 -r1.3 src/sys/dev/nvmm/files.nvmm
cvs rdiff -u -r1.30 -r1.31 src/sys/dev/nvmm/nvmm.c
cvs rdiff -u -r1.1 -r1.2 src/sys/modules/nvmm/nvmm.ioconf
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: Maxime Villard <max@m00nbsd.net>
To: nia <nia@NetBSD.org>, <gnats-bugs@netbsd.org>
Cc: <kern-bug-people@netbsd.org>, <netbsd-bugs@netbsd.org>,
<gnats-admin@netbsd.org>
Subject: Re: kern/55406 (NVMM panic on qemu start on intel)
Date: Thu, 25 Jun 2020 19:06:29 +0200
Should be fixed, please test:
(1) modload nvmm
(2) try to suspend, should fail because nvmm blocks
(3) modunload nvmm
(4) try to suspend, this time should work
(5) resume
(6) modload nvmm
(7) launch vm, should work
From: Maxime Villard <max@m00nbsd.net>
To: nia <nia@NetBSD.org>, <gnats-bugs@netbsd.org>
Cc: <kern-bug-people@netbsd.org>, <netbsd-bugs@netbsd.org>,
<gnats-admin@netbsd.org>
Subject: Re: kern/55406 (NVMM panic on qemu start on intel)
Date: Sun, 12 Jul 2020 11:56:43 +0200
Le 25/06/2020 à 19:06, Maxime Villard a écrit :
> Should be fixed, please test:
>
> (1) modload nvmm
> (2) try to suspend, should fail because nvmm blocks
> (3) modunload nvmm
> (4) try to suspend, this time should work
> (5) resume
> (6) modload nvmm
> (7) launch vm, should work
ping? waiting for feedback before pullup-9
also I have other fixes in the pipeline
From: nia <nia@NetBSD.org>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org, nia@pkgsrc.org
Subject: Re: kern/55406 (NVMM panic on qemu start on intel)
Date: Tue, 21 Jul 2020 12:34:56 +0000
On Sun, Jul 12, 2020 at 10:00:02AM +0000, Maxime Villard wrote:
> Le 25/06/2020 à 19:06, Maxime Villard a écrit :
> > Should be fixed, please test:
> >
> > (1) modload nvmm
> > (2) try to suspend, should fail because nvmm blocks
> > (3) modunload nvmm
> > (4) try to suspend, this time should work
> > (5) resume
> > (6) modload nvmm
> > (7) launch vm, should work
>
> ping? waiting for feedback before pullup-9
>
> also I have other fixes in the pipeline
Works as described, please pullup.
Sorry for the delay, was waiting for -current to be stable again
State-Changed-From-To: feedback->needs-pullups
State-Changed-By: nia@NetBSD.org
State-Changed-When: Tue, 21 Jul 2020 12:37:12 +0000
State-Changed-Why:
From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/55406 CVS commit: [netbsd-9] src/sys
Date: Sun, 2 Aug 2020 08:49:08 +0000
Module Name: src
Committed By: martin
Date: Sun Aug 2 08:49:08 UTC 2020
Modified Files:
src/sys/dev/nvmm [netbsd-9]: files.nvmm nvmm.c nvmm_internal.h
src/sys/dev/nvmm/x86 [netbsd-9]: nvmm_x86_svm.c nvmm_x86_vmx.c
src/sys/modules/nvmm [netbsd-9]: nvmm.ioconf
Log Message:
Pull up following revision(s) (requested by maxv in ticket #1032):
sys/dev/nvmm/x86/nvmm_x86_vmx.c: revision 1.60 (patch)
sys/dev/nvmm/x86/nvmm_x86_vmx.c: revision 1.61 (patch)
sys/dev/nvmm/nvmm.c: revision 1.30
sys/dev/nvmm/nvmm.c: revision 1.31
sys/dev/nvmm/nvmm.c: revision 1.32
sys/dev/nvmm/nvmm_internal.h: revision 1.15
sys/dev/nvmm/nvmm_internal.h: revision 1.16
sys/dev/nvmm/files.nvmm: revision 1.3
sys/dev/nvmm/x86/nvmm_x86_svm.c: revision 1.62 (patch)
sys/dev/nvmm/x86/nvmm_x86_svm.c: revision 1.63 (patch)
sys/dev/nvmm/x86/nvmm_x86_vmx.c: revision 1.59 (patch)
sys/modules/nvmm/nvmm.ioconf: revision 1.2
Gather the conditions to return from the VCPU loops in nvmm_return_needed(),
and use it in nvmm_do_vcpu_run() as well. This fixes two undesired behaviors:
- When a VM initializes, the many nested page faults that need processing
could cause the calling thread to occupy the CPU too much if we're unlucky
and are only getting repeated nested page faults thousands of times in a
row.
- When the emulator calls nvmm_vcpu_run() and immediately sends a signal to
stop the VCPU, it's better to check signals earlier and leave right away,
rather than doing a round of VCPU run that could increase the time spent
by the emulator waiting for the return.
style
Register NVMM as an actual pseudo-device, without a PMF handler, to
explicitly disallow ACPI suspend if NVMM is running.
Should fix PR/55406.
Print the backend name when attaching.
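The change around nvmm_return_needed() described above can be pictured with the following userspace sketch. It is not the pulled-up code: `struct model_thread`, its fields, `model_return_needed()`, and `model_vcpu_run()` are hypothetical names, and the two conditions shown (a pending yield request and a pending signal) are assumptions made for illustration. The point is only that a single predicate is checked before each round of work, so the loop leaves right away when a signal is already pending and does not keep handling nested page faults indefinitely.

```c
#include <stdbool.h>

/*
 * Hypothetical model of the calling thread's state.
 * These fields are illustrative, not kernel structures.
 */
struct model_thread {
	bool should_yield;	/* scheduler wants the CPU back */
	bool signal_pending;	/* e.g. the emulator asked the VCPU to stop */
};

/* Single place that gathers the conditions for leaving the run loop. */
static bool
model_return_needed(const struct model_thread *t)
{
	return t->should_yield || t->signal_pending;
}

/*
 * Handle up to pending_faults nested page faults, checking the return
 * conditions before each one. Returns the number of faults handled.
 */
static int
model_vcpu_run(const struct model_thread *t, int pending_faults)
{
	int handled = 0;

	while (pending_faults > 0) {
		if (model_return_needed(t))
			break;	/* leave before doing another round */
		pending_faults--;
		handled++;
	}
	return handled;
}
```

With `signal_pending` set before the call, `model_vcpu_run()` in this model handles no faults at all, which mirrors the second bullet in the log message.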
To generate a diff of this commit:
cvs rdiff -u -r1.2 -r1.2.6.1 src/sys/dev/nvmm/files.nvmm
cvs rdiff -u -r1.22.2.4 -r1.22.2.5 src/sys/dev/nvmm/nvmm.c
cvs rdiff -u -r1.12.2.2 -r1.12.2.3 src/sys/dev/nvmm/nvmm_internal.h
cvs rdiff -u -r1.46.4.5 -r1.46.4.6 src/sys/dev/nvmm/x86/nvmm_x86_svm.c
cvs rdiff -u -r1.36.2.7 -r1.36.2.8 src/sys/dev/nvmm/x86/nvmm_x86_vmx.c
cvs rdiff -u -r1.1 -r1.1.8.1 src/sys/modules/nvmm/nvmm.ioconf
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: needs-pullups->closed
State-Changed-By: maxv@NetBSD.org
State-Changed-When: Sun, 02 Aug 2020 09:19:18 +0000
State-Changed-Why:
Fixed, thanks.
>Unformatted: