NetBSD Problem Report #55406

From www@netbsd.org  Mon Jun 22 12:38:59 2020
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id F03831A9217
	for <gnats-bugs@gnats.NetBSD.org>; Mon, 22 Jun 2020 12:38:58 +0000 (UTC)
Message-Id: <20200622123857.EA6D21A9246@mollari.NetBSD.org>
Date: Mon, 22 Jun 2020 12:38:57 +0000 (UTC)
From: nia@pkgsrc.org
Reply-To: nia@pkgsrc.org
To: gnats-bugs@NetBSD.org
Subject: NVMM panic on qemu start on intel
X-Send-Pr-Version: www-1.0

>Number:         55406
>Category:       kern
>Synopsis:       NVMM panic on qemu start on intel
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Jun 22 12:40:00 +0000 2020
>Closed-Date:    Sun Aug 02 09:19:18 +0000 2020
>Last-Modified:  Sun Aug 02 09:19:18 +0000 2020
>Originator:     nia
>Release:        -current (and 9.0)
>Organization:
>Environment:
NetBSD r 9.99.64 NetBSD 9.99.64 (R) #12: Sun May 31 09:31:31 IST 2020  nia@r:/home/nia/src/sys/arch/amd64/compile/obj/R amd64
>Description:
After starting a QEMU VM with -accel nvmm, I get this (apparently) nondeterministic panic. The host has an Intel CPU.

fatal privileged instruction fault in supervisor mode
trap type 0 code 0 rip 0xffffffff813ed454 cs 0x8 rflags 0x10246 cr2 0x77e2d8c1d000 ilevel 0 rsp 0xffffcb80c21e2d90
curlwp 0xffffec096c4584c0 pid 24690.13443 lowest kstack 0xffffcb80c21df2c0

db> crash> callout
    ticks  wheel               arg  func
crash: _kvm_kvatop(0)
crash: kvm_read(0x0, 16592): invalid translation (invalid level 4 PDE)
db> crash> bt
_KERNEL_OPT_NARCNET() at 0
_KERNEL_OPT_NARCNET() at 0
sys_reboot() at sys_reboot
db_reboot_cmd() at db_reboot_cmd
db_command() at db_command+0x127
db_command_loop() at db_command_loop+0xa6
db_trap() at db_trap+0xe6
kdb_trap() at kdb_trap+0xe1
trap() at trap+0x2b7
--- trap (number 0) ---
vmx_vmcs_enter() at vmx_vmcs_enter+0xe3
vmx_vcpu_create() at vmx_vcpu_create+0x144
nvmm_ioctl() at nvmm_ioctl+0x292
sys_ioctl() at sys_ioctl+0x550
syscall() at syscall+0x26e
--- syscall (number 54) ---
77e2d27681ba:
>How-To-Repeat:

>Fix:

>Release-Note:

>Audit-Trail:

State-Changed-From-To: open->feedback
State-Changed-By: maxv@NetBSD.org
State-Changed-When: Tue, 23 Jun 2020 16:10:59 +0000
State-Changed-Why:
Did you suspend your host before launching the VM?


From: Jukka Ruohonen <jruohonen@iki.fi>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/55406: NVMM panic on qemu start on intel
Date: Tue, 23 Jun 2020 19:15:43 +0300

 On Mon, Jun 22, 2020 at 12:40:00PM +0000, nia@pkgsrc.org wrote:
 > After starting a QEMU VM with -accel nvmm, I get this (apparently)
 > nondeterministic panic.  This is an intel CPU.

 I don't believe this bug is non-deterministic; I can also reproduce the
 panic and its trace.

From: nia <nia@NetBSD.org>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, netbsd-bugs@netbsd.org,
	gnats-admin@netbsd.org, maxv@NetBSD.org, nia@pkgsrc.org
Subject: Re: kern/55406 (NVMM panic on qemu start on intel)
Date: Tue, 23 Jun 2020 17:07:19 +0000

 On Tue, Jun 23, 2020 at 04:11:00PM +0000, maxv@NetBSD.org wrote:
 > Did you suspend your host before launching the VM?

 Probably, the host rarely gets powered off but does get suspended
 every night.

From: Maxime Villard <max@m00nbsd.net>
To: nia <nia@NetBSD.org>, <gnats-bugs@netbsd.org>
Cc: <kern-bug-people@netbsd.org>, <netbsd-bugs@netbsd.org>,
	<gnats-admin@netbsd.org>
Subject: Re: kern/55406 (NVMM panic on qemu start on intel)
Date: Tue, 23 Jun 2020 19:10:25 +0200

 On 23/06/2020 at 19:07, nia wrote:
 > On Tue, Jun 23, 2020 at 04:11:00PM +0000, maxv@NetBSD.org wrote:
 >> Did you suspend your host before launching the VM?
 > 
 > Probably, the host rarely gets powered off but does get suspended
 > every night.

 Well, that's where the problem comes from. I didn't put a sleep handler for
 NVMM, so when the CPU wakes up it can go crazy.

 In fact, it can go crazy for many other reasons, because many x86 states
 are not restored properly at wake-up time.

 For now I wouldn't recommend using sleep states at all.

From: nia <nia@NetBSD.org>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
	netbsd-bugs@netbsd.org, nia@pkgsrc.org
Subject: Re: kern/55406 (NVMM panic on qemu start on intel)
Date: Wed, 24 Jun 2020 19:08:02 +0000

 On Tue, Jun 23, 2020 at 05:15:02PM +0000, Maxime Villard wrote:
 >  On 23/06/2020 at 19:07, nia wrote:
 >  > On Tue, Jun 23, 2020 at 04:11:00PM +0000, maxv@NetBSD.org wrote:
 >  >> Did you suspend your host before launching the VM?
 >  > 
 >  > Probably, the host rarely gets powered off but does get suspended
 >  > every night.
 >  
 >  Well, that's where the problem comes from. I didn't put a sleep handler for
 >  NVMM, so when the CPU wakes up it can go crazy.
 >  
 >  In fact, it can go crazy for many other reasons, because many x86 states
 >  are not restored properly at wake-up time.
 >  
 >  For now I wouldn't recommend using sleep states at all.

 Can NVMM refuse to run on a system that has been resumed until this is fixed?
 This would at least help avoid accidental unclean shutdowns.

From: "Maxime Villard" <maxv@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/55406 CVS commit: src/sys
Date: Thu, 25 Jun 2020 17:01:20 +0000

 Module Name:	src
 Committed By:	maxv
 Date:		Thu Jun 25 17:01:20 UTC 2020

 Modified Files:
 	src/sys/dev/nvmm: files.nvmm nvmm.c
 	src/sys/modules/nvmm: nvmm.ioconf

 Log Message:
 Register NVMM as an actual pseudo-device, without a PMF handler, to
 explicitly disallow ACPI suspend if NVMM is running.

 Should fix PR/55406.


 To generate a diff of this commit:
 cvs rdiff -u -r1.2 -r1.3 src/sys/dev/nvmm/files.nvmm
 cvs rdiff -u -r1.30 -r1.31 src/sys/dev/nvmm/nvmm.c
 cvs rdiff -u -r1.1 -r1.2 src/sys/modules/nvmm/nvmm.ioconf

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: Maxime Villard <max@m00nbsd.net>
To: nia <nia@NetBSD.org>, <gnats-bugs@netbsd.org>
Cc: <kern-bug-people@netbsd.org>, <netbsd-bugs@netbsd.org>,
	<gnats-admin@netbsd.org>
Subject: Re: kern/55406 (NVMM panic on qemu start on intel)
Date: Thu, 25 Jun 2020 19:06:29 +0200

 Should be fixed, please test:

 (1) modload nvmm
 (2) try to suspend, should fail because nvmm blocks
 (3) modunload nvmm
 (4) try to suspend, this time should work
 (5) resume
 (6) modload nvmm
 (7) launch vm, should work
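
The seven steps above can be modeled in a few lines of userland C. This is only a sketch of the behavior the commit log describes (a loaded device without a power-management handler vetoes suspend). It is not NetBSD kernel code: `struct model_dev`, `set_loaded()`, `suspend_blocker()`, `run_test_plan()` and the `wd0` entry are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a device; not the kernel's real device type. */
struct model_dev {
	const char *name;
	bool loaded;		/* models modload / modunload */
	bool has_pm_handler;	/* false: device cannot be suspended */
};

static struct model_dev devs[] = {
	{ "wd0",  true,  true  },	/* assumed ordinary device with a handler */
	{ "nvmm", false, false },	/* pseudo-device, deliberately without one */
};
#define NDEVS (sizeof(devs) / sizeof(devs[0]))

static struct model_dev *
find_dev(const char *name)
{
	for (size_t i = 0; i < NDEVS; i++)
		if (strcmp(devs[i].name, name) == 0)
			return &devs[i];
	return NULL;
}

/* Models modload (loaded = true) and modunload (loaded = false). */
static void
set_loaded(const char *name, bool loaded)
{
	struct model_dev *d = find_dev(name);

	assert(d != NULL);
	d->loaded = loaded;
}

/*
 * Returns NULL if suspend may proceed, otherwise the name of the first
 * loaded device that has no power-management handler.
 */
static const char *
suspend_blocker(void)
{
	for (size_t i = 0; i < NDEVS; i++)
		if (devs[i].loaded && !devs[i].has_pm_handler)
			return devs[i].name;
	return NULL;
}

/*
 * Walks through the test plan; returns 0 if every step behaves as
 * expected, or the number of the first step that did not.
 */
int
run_test_plan(void)
{
	const char *b;

	set_loaded("nvmm", true);		/* (1) modload nvmm */
	b = suspend_blocker();			/* (2) suspend must fail */
	if (b == NULL || strcmp(b, "nvmm") != 0)
		return 2;
	set_loaded("nvmm", false);		/* (3) modunload nvmm */
	if (suspend_blocker() != NULL)		/* (4) suspend must work */
		return 4;
	/* (5) resume: nothing to model here */
	set_loaded("nvmm", true);		/* (6) modload nvmm */
	if (!find_dev("nvmm")->loaded)		/* (7) VM launch needs the module */
		return 7;
	return 0;
}
```

Calling `run_test_plan()` from a `main()` and checking for a zero return mirrors the expected outcome of the manual test. The model says nothing about whether hardware state is restored correctly after resume, which is the underlying problem the veto works around.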

From: Maxime Villard <max@m00nbsd.net>
To: nia <nia@NetBSD.org>, <gnats-bugs@netbsd.org>
Cc: <kern-bug-people@netbsd.org>, <netbsd-bugs@netbsd.org>,
	<gnats-admin@netbsd.org>
Subject: Re: kern/55406 (NVMM panic on qemu start on intel)
Date: Sun, 12 Jul 2020 11:56:43 +0200

 On 25/06/2020 at 19:06, Maxime Villard wrote:
 > Should be fixed, please test:
 > 
 > (1) modload nvmm
 > (2) try to suspend, should fail because nvmm blocks
 > (3) modunload nvmm
 > (4) try to suspend, this time should work
 > (5) resume
 > (6) modload nvmm
 > (7) launch vm, should work

 ping? waiting for feedback before pullup-9

 also I have other fixes in the pipeline

From: nia <nia@NetBSD.org>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
	netbsd-bugs@netbsd.org, nia@pkgsrc.org
Subject: Re: kern/55406 (NVMM panic on qemu start on intel)
Date: Tue, 21 Jul 2020 12:34:56 +0000

 On Sun, Jul 12, 2020 at 10:00:02AM +0000, Maxime Villard wrote:
 >  On 25/06/2020 at 19:06, Maxime Villard wrote:
 >  > Should be fixed, please test:
 >  > 
 >  > (1) modload nvmm
 >  > (2) try to suspend, should fail because nvmm blocks
 >  > (3) modunload nvmm
 >  > (4) try to suspend, this time should work
 >  > (5) resume
 >  > (6) modload nvmm
 >  > (7) launch vm, should work
 >  
 >  ping? waiting for feedback before pullup-9
 >  
 >  also I have other fixes in the pipeline

 Works as described, please pullup.

 Sorry for the delay; I was waiting for -current to be stable again.

State-Changed-From-To: feedback->needs-pullups
State-Changed-By: nia@NetBSD.org
State-Changed-When: Tue, 21 Jul 2020 12:37:12 +0000
State-Changed-Why:


From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/55406 CVS commit: [netbsd-9] src/sys
Date: Sun, 2 Aug 2020 08:49:08 +0000

 Module Name:	src
 Committed By:	martin
 Date:		Sun Aug  2 08:49:08 UTC 2020

 Modified Files:
 	src/sys/dev/nvmm [netbsd-9]: files.nvmm nvmm.c nvmm_internal.h
 	src/sys/dev/nvmm/x86 [netbsd-9]: nvmm_x86_svm.c nvmm_x86_vmx.c
 	src/sys/modules/nvmm [netbsd-9]: nvmm.ioconf

 Log Message:
 Pull up following revision(s) (requested by maxv in ticket #1032):

 	sys/dev/nvmm/x86/nvmm_x86_vmx.c: revision 1.60 (patch)
 	sys/dev/nvmm/x86/nvmm_x86_vmx.c: revision 1.61 (patch)
 	sys/dev/nvmm/nvmm.c: revision 1.30
 	sys/dev/nvmm/nvmm.c: revision 1.31
 	sys/dev/nvmm/nvmm.c: revision 1.32
 	sys/dev/nvmm/nvmm_internal.h: revision 1.15
 	sys/dev/nvmm/nvmm_internal.h: revision 1.16
 	sys/dev/nvmm/files.nvmm: revision 1.3
 	sys/dev/nvmm/x86/nvmm_x86_svm.c: revision 1.62 (patch)
 	sys/dev/nvmm/x86/nvmm_x86_svm.c: revision 1.63 (patch)
 	sys/dev/nvmm/x86/nvmm_x86_vmx.c: revision 1.59 (patch)
 	sys/modules/nvmm/nvmm.ioconf: revision 1.2

 Gather the conditions to return from the VCPU loops in nvmm_return_needed(),
 and use it in nvmm_do_vcpu_run() as well. This fixes two undesired behaviors:

  - When a VM initializes, the many nested page faults that need processing
    could cause the calling thread to occupy the CPU too much if we're unlucky
    and are only getting repeated nested page faults thousands of times in a
    row.

  - When the emulator calls nvmm_vcpu_run() and immediately sends a signal to
    stop the VCPU, it's better to check signals earlier and leave right away,
    rather than doing a round of VCPU run that could increase the time spent
    by the emulator waiting for the return.
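
The two behaviors in the list above can be illustrated with a small userland model. It is a sketch under stated assumptions, not the NVMM source: `struct model_vcpu`, `return_needed()`, `vcpu_run()` and the `slice` parameter are hypothetical stand-ins, and the real conditions gathered in nvmm_return_needed() may differ.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for per-VCPU and thread state. */
struct model_vcpu {
	bool resched_wanted;	/* the scheduler wants the CPU back */
	bool signal_pending;	/* a signal is pending for the caller */
	int pending_faults;	/* nested page faults the guest will take */
};

/* One place that gathers every reason to leave the run loop. */
static bool
return_needed(const struct model_vcpu *v)
{
	return v->resched_wanted || v->signal_pending;
}

/*
 * Runs the modeled VCPU and returns the number of guest rounds executed.
 * "slice" is an assumed budget: after that many rounds the model pretends
 * the scheduler asks for the CPU.
 */
int
vcpu_run(struct model_vcpu *v, int slice)
{
	int rounds = 0;

	assert(slice > 0);

	/* Checked before the first round, so a pending signal returns at once. */
	while (!return_needed(v)) {
		rounds++;			/* enter the guest; it exits */
		if (v->pending_faults == 0)
			break;			/* exit must go to the emulator */
		v->pending_faults--;		/* nested fault handled in place */
		if (rounds >= slice)
			v->resched_wanted = true;
	}
	return rounds;
}
```

With a signal already pending, `vcpu_run()` performs no guest round at all. With thousands of queued nested page faults, it stops once the assumed slice is used up instead of monopolizing the CPU.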

 style

 Register NVMM as an actual pseudo-device, without a PMF handler, to
 explicitly disallow ACPI suspend if NVMM is running.

 Should fix PR/55406.

 Print the backend name when attaching.


 To generate a diff of this commit:
 cvs rdiff -u -r1.2 -r1.2.6.1 src/sys/dev/nvmm/files.nvmm
 cvs rdiff -u -r1.22.2.4 -r1.22.2.5 src/sys/dev/nvmm/nvmm.c
 cvs rdiff -u -r1.12.2.2 -r1.12.2.3 src/sys/dev/nvmm/nvmm_internal.h
 cvs rdiff -u -r1.46.4.5 -r1.46.4.6 src/sys/dev/nvmm/x86/nvmm_x86_svm.c
 cvs rdiff -u -r1.36.2.7 -r1.36.2.8 src/sys/dev/nvmm/x86/nvmm_x86_vmx.c
 cvs rdiff -u -r1.1 -r1.1.8.1 src/sys/modules/nvmm/nvmm.ioconf

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: needs-pullups->closed
State-Changed-By: maxv@NetBSD.org
State-Changed-When: Sun, 02 Aug 2020 09:19:18 +0000
State-Changed-Why:
Fixed, thanks.


>Unformatted:

Copyright © 1994-2020 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.