NetBSD Problem Report #54724

From buhrow@lothlorien.nfbcal.org  Wed Nov 27 23:48:56 2019
Return-Path: <buhrow@lothlorien.nfbcal.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 89BC57A186
	for <gnats-bugs@www.NetBSD.org>; Wed, 27 Nov 2019 23:48:56 +0000 (UTC)
Message-Id: <201911272348.xARNmsFb001241@lothlorien.nfbcal.org>
Date: Wed, 27 Nov 2019 15:48:54 -0800 (PST)
From: buhrow@nfbcal.org
Reply-To: buhrow@nfbcal.org
To: gnats-bugs@www.NetBSD.org
Subject: ZFS/Zvol corrupts kernel memory when running with xen on dom0
X-Send-Pr-Version: 3.95

>Number:         54724
>Category:       kern
>Synopsis:       ZFS/Zvol corrupts kernel memory when running with xen on dom0
>Confidential:   no
>Severity:       critical
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Nov 27 23:50:00 +0000 2019
>Last-Modified:  Thu Feb 27 18:15:01 +0000 2020
>Originator:     Brian Buhrow
>Release:        NetBSD 9.0_BETA
>Organization:
	NFB of California
>Environment:


System: NetBSD via.net 9.0_BETA NetBSD 9.0_BETA (VIANET_DOM0) #7: Fri Nov 22 13:36:22 PST 2019  buhrow@viadev64.via.net:/usr/local/netbsd/obj-64/sys/arch/amd64/compile/VIANET_DOM0 amd64
Architecture: amd64
Machine: amd64
>Description:

	When zfs zvols are used as domU disk storage, zfs corrupts kernel
memory on the dom0, which in turn corrupts communication through the xbd
drivers between the dom0 and the domUs.  The domUs panic with the error:
panic: biodone2 already
Since this is kernel memory corruption, the dom0 can also crash, though
that is not guaranteed.  I wrote some patches to make error recovery in
the xbdback_xenbus driver more graceful, which helps with the domU panic
but doesn't address the underlying problem.
	I've verified that this problem does not occur if a vnd(4)-backed file
is used as the backing store for the domU disk on the same system.
	It would be nice if zvols could be used as the backing store for domU
disks, since that would ease the management of multiple domUs on a system,
not to mention reduce the time it takes to create a domU disk image on
the dom0 system.
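
	For comparison, the file-backed setup that does not show the problem
would presumably differ only in the disk line of the domU config.  This is
a sketch, not the exact line used: the image path is hypothetical (the
real path is not given in this report), and it assumes the Xen block
script attaches a plain file through vnd(4).

```python
# Hypothetical image path; only the target differs from the zvol case.
# Fields are target,format,vdev,access as in the zvol disk line.
disk = [ '/var/xen/images/viadev64_h.img,raw,0x1,w' ]
```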
>How-To-Repeat:

1.  Build a stock dom0 kernel from NetBSD-9.0_BETA.
Install it and boot it under the xen-debug.gz hypervisor.  (The problem
shows up with either Xen 4.8 or 4.11.)

2.  Create a domU running some version of NetBSD that uses an xbd disk
for its data.  Create a zfs zvol sized to match the domU disk you want to
use, and use it as the backing store for that disk.
A sample domU config file is included below.

3.  Install the NetBSD-9 source tree on the domu.

4.  Run build.sh release on the domU, redirecting the output to a log
file.

5.  Crash!

Here's the output I see from the kernel, as well as from the xen hypervisor
(debug version):

(XEN) grant_table.c:591:d0v0 Bad flags (0) or dom (0). (expected dom 0)
[ 3133.1700793] xen_shm_map: op[0].status = -1 (2)

<The previous line indicates that gref[0] is corrupt and that there were 2
gref entries in the array at the time of the failure; see
sys/arch/xen/x86/xen_shm_machdep.c:180 in
$NetBSD: xen_shm_machdep.c,v 1.13 2019/01/27 02:08:39 pgoyette Exp $>

[ 3133.1700793] xbdback_map_shm: xen_shm error -1 xbd IO domain 1: error -1
[ 3133.1700793] xbdback_io domain 1: end request 30 error=-1
[ 3133.1700793] xbdback_io domain 1: end request 1 error=-1

	At this point, the domU is nonfunctional and, very likely, so is the
dom0.
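
	As a rough aid for spotting this failure in a saved dom0 console log,
here is a minimal Python sketch.  It assumes the message format shown
above, "xen_shm_map: op[N].status = S (M)", and reads N as the index of
the failing grant reference, S as the status, and M as the number of gref
entries, as described in the note above.  The function name and the
sample text are made up for illustration.

```python
import re

# Matches lines like: "[ 3133.1700793] xen_shm_map: op[0].status = -1 (2)"
# Assumed reading: op[INDEX].status = STATUS (COUNT)
SHM_ERR = re.compile(r"xen_shm_map: op\[(\d+)\]\.status = (-?\d+) \((\d+)\)")


def find_shm_errors(log_text):
    """Return (index, status, count) for each xen_shm_map failure line."""
    return [
        (int(m.group(1)), int(m.group(2)), int(m.group(3)))
        for m in SHM_ERR.finditer(log_text)
    ]


if __name__ == "__main__":
    sample = (
        "[ 3133.1700793] xen_shm_map: op[0].status = -1 (2)\n"
        "[ 3133.1700793] xbdback_io domain 1: end request 30 error=-1\n"
    )
    for index, status, count in find_shm_errors(sample):
        print(f"gref[{index}] failed with status {status} ({count} entries)")
```

	Lines that do not match the pattern, such as the xbdback_io messages,
are ignored.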


<Here is the sample domU config file>

#  -*- mode: python; -*-
#============================================================================
# Python configuration setup for 'xl create'.
# This script sets the parameters used when a domain is created using 'xl create'.
# You use a separate script for each domain you want to create, or
# you can set the parameters for the domain on the xl command line.
#============================================================================

#----------------------------------------------------------------------------
# Kernel image file.
kernel = "/var/xen/vianet/viadev64_h/netbsd"

# Initial memory allocation (in megabytes) for the new domain.
memory = 8192

# Number of Virtual CPUS to use, default is 1
vcpus = 1


# A name for your domain. All domains must have different names.
name = "viadev64_h_via_net"

#----------------------------------------------------------------------------
# network configuration.
# The mac address is optional, it will use a random one if not specified.
# By default we create a bridged configuration; when a vif is created
# the script /usr/pkg/etc/xen/scripts/vif-bridge is called to connect
# the vif to the designated bridge (the bridge should already be up)
vif = [  'bridge=bridge0' ]

# It's possible to use a different script when the vif is created;
# for example to use a routed setup instead of bridged:
# vif = [ 'mac=00:16:3e:00:00:11, ip=10.0.0.1 netmask 255.255.255.0, script=vif-ip' ]

#----------------------------------------------------------------------------
# Define the disk devices you want the domain to have access to, and
# what you want them accessible as.
# Each disk entry is of the form phy:UNAME,DEV,MODE
# where UNAME is the device, DEV is the device name the domain will see,
# and MODE is r for read-only, w for read-write.
# For NetBSD guests DEV doesn't matter, so we can just use increasing numbers
# here. For linux guests you have to use a linux device name (e.g. hda1)
# or the corresponding device number (e.g. 0x301 for hda1)

disk = [ '/dev/zvol/dsk/xendisks/viadev64_h,raw,0x1,w' ]

#----------------------------------------------------------------------------
# Boot parameters (e.g. -s, -a, ...)
extra = ""

#============================================================================

#Reboot after shutdowns
on_poweroff = "restart"

>Fix:

	I don't know how to correct the problem at this time.  I tried
compiling the dom0 kernel with options KASAN, hoping to catch zfs in the
act of corrupting memory, but KASAN doesn't seem to be supported on xen
kernels, even on amd64 hardware.

>Audit-Trail:
From: Jaromír Doleček <jaromir.dolecek@gmail.com>
To: "gnats-bugs@NetBSD.org" <gnats-bugs@netbsd.org>
Cc: 
Subject: Re: kern/54724: ZFS/Zvol corrupts kernel memory when running with xen
 on dom0
Date: Wed, 26 Feb 2020 22:44:38 +0100

 Can you confirm that you build your solaris and zfs modules with
 options DEBUG, same as the default XEN DOM0 kernel?

 Jaromir

From: Jaromír Doleček <jaromir.dolecek@gmail.com>
To: "gnats-bugs@NetBSD.org" <gnats-bugs@netbsd.org>
Cc: 
Subject: Re: kern/54724: ZFS/Zvol corrupts kernel memory when running with xen
 on dom0
Date: Wed, 26 Feb 2020 22:47:22 +0100

 Err sorry, misread - DEBUG is not enabled for XEN3_DOM0

 Jaromir

 On Wed, 26 Feb 2020 at 22:44, Jaromír Doleček
 <jaromir.dolecek@gmail.com> wrote:
 >
 > Can you confirm that you build your solaris and zfs modules with
 > options DEBUG, same as the default XEN DOM0 kernel?
 >
 > Jaromir

From: Brian Buhrow <buhrow@nfbcal.org>
To: gnats-bugs@netbsd.org, kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
        netbsd-bugs@netbsd.org, jaromir.dolecek@gmail.com
Cc: buhrow@nfbcal.org
Subject: Re: kern/54724: ZFS/Zvol corrupts kernel memory when running with xen on dom0
Date: Thu, 27 Feb 2020 10:11:22 -0800

 	Hello.  The zfs and solaris modules I'm using were built as part of
 a build.sh release of the -9 branch of the NetBSD tree in October of
 2019.  So I assume the same options were used to build both the kernel and
 the modules, since they were part of the same build run.
 Is there a command I should use to verify this?

 -thanks
 -Brian

 On Feb 26,  9:45pm, Jaromír Doleček wrote:
 } Subject: Re: kern/54724: ZFS/Zvol corrupts kernel memory when running with
 } The following reply was made to PR kern/54724; it has been noted by GNATS.
 } 
 } From: Jaromír Doleček <jaromir.dolecek@gmail.com>
 } To: "gnats-bugs@NetBSD.org" <gnats-bugs@netbsd.org>
 } Cc: 
 } Subject: Re: kern/54724: ZFS/Zvol corrupts kernel memory when running with xen
 }  on dom0
 } Date: Wed, 26 Feb 2020 22:44:38 +0100
 } 
 }  Can you confirm that you build your solaris and zfs modules with
 }  options DEBUG, same as the default XEN DOM0 kernel?
 }  
 }  Jaromir
 }  
 >-- End of excerpt from Jaromír Doleček


>Unformatted:
