NetBSD Problem Report #55397
From kardel@Kardel.name Fri Jun 19 08:47:47 2020
Return-Path: <kardel@Kardel.name>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id 3DFBB1A9217
for <gnats-bugs@gnats.NetBSD.org>; Fri, 19 Jun 2020 08:47:47 +0000 (UTC)
Message-Id: <20200619084743.E55E744B33@Andromeda.Kardel.name>
Date: Fri, 19 Jun 2020 10:47:43 +0200 (CEST)
From: kardel@netbsd.org
Reply-To: kardel@netbsd.org
To: gnats-bugs@NetBSD.org
Subject: Xen pvh instance: zpool create on xbd* devices panics
X-Send-Pr-Version: 3.95
>Number: 55397
>Category: kern
>Synopsis: Xen pvh instance: zpool create on xbd* devices panics
>Confidential: no
>Severity: serious
>Priority: high
>Responsible: jdolecek
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Fri Jun 19 08:50:00 +0000 2020
>Closed-Date: Sat Jun 20 12:16:36 +0000 2020
>Last-Modified: Sat Jun 20 12:16:36 +0000 2020
>Originator: Frank Kardel
>Release: NetBSD 9.99.65
>Organization:
>Environment:
System: NetBSD abstest2 9.99.65 NetBSD 9.99.65 (GENERIC) #2: Tue Jun 9 01:58:38 CEST 2020 kardel@Andromeda:/src/NetBSD/cur/src/obj.amd64/sys/arch/amd64/compile/GENERIC amd64
Architecture: x86_64
Machine: amd64
>Description:
Attempting to run
	zpool create data mirror xbd1 xbd2
on a Xen PVH instance panics like this:
[ 451.7783806] panic: kernel diagnostic assertion "(req->req_bp->b_flags & B_PHYS) != 0" failed: file "/src/NetBSD/cur/src/sys/arch/xen/xen/xbd_xenbus.c", line 1373
[ 451.7783806] cpu0: Begin traceback...
[ 451.7783806] vpanic() at netbsd:vpanic+0x152
[ 451.7783806] __x86_indirect_thunk_rax() at netbsd:__x86_indirect_thunk_rax
[ 451.7783806] xbd_diskstart() at netbsd:xbd_diskstart+0x7c2
[ 451.7783806] dk_start() at netbsd:dk_start+0xef
[ 451.7783806] spec_strategy() at netbsd:spec_strategy+0x9f
[ 451.7783806] VOP_STRATEGY() at netbsd:VOP_STRATEGY+0x3c
[ 451.7783806] zio_vdev_io_start() at zfs:zio_vdev_io_start+0x192
[ 451.7783806] zio_execute() at zfs:zio_execute+0xe3
[ 451.7783806] zio_nowait() at zfs:zio_nowait+0x5c
[ 451.7783806] vdev_mirror_io_start() at zfs:vdev_mirror_io_start+0x157
[ 451.7783806] zio_vdev_io_start() at zfs:zio_vdev_io_start+0x192
[ 451.7783806] zio_execute() at zfs:zio_execute+0xe3
[ 451.7783806] zio_nowait() at zfs:zio_nowait+0x5c
[ 451.7783806] vdev_mirror_io_start() at zfs:vdev_mirror_io_start+0x157
[ 451.7783806] zio_vdev_io_start() at zfs:zio_vdev_io_start+0x33f
[ 451.7783806] zio_execute() at zfs:zio_execute+0xe3
[ 451.7783806] task_executor() at solaris:task_executor+0x67
[ 451.7783806] threadpool_thread() at netbsd:threadpool_thread+0x19e
[ 451.7783806] cpu0: End traceback...
[ 451.7783806] rebooting..
>How-To-Repeat:
Perform "zpool create pool mirror xbd? xbd?" on a Xen PVH instance with
appropriate disks.
>Fix:
?
>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: kern-bug-people->jdolecek
Responsible-Changed-By: jdolecek@NetBSD.org
Responsible-Changed-When: Fri, 19 Jun 2020 09:10:34 +0000
Responsible-Changed-Why:
Mine. The assertion is there to catch cases that trigger non-optimal
xbd I/O, to weed those out of the kernel code. I'll check how it is
triggered by zfs.
From: Frank Kardel <kardel@netbsd.org>
To: gnats-bugs@netbsd.org, jdolecek@netbsd.org, kern-bug-people@netbsd.org,
netbsd-bugs@netbsd.org, gnats-admin@netbsd.org
Cc:
Subject: Re: kern/55397 (Xen pvh instance: zpool create on xbd* devices
panics)
Date: Fri, 19 Jun 2020 13:40:26 +0200
Good.
I found another subsystem triggering the assertion.
Try "fdisk raidX" to panic like this:
[ 300.6631030] panic: kernel diagnostic assertion "(req->req_bp->b_flags
& B_PHYS) != 0" failed: file
"/src/NetBSD/cur/src/sys/arch/xen/xen/xbd_xenbus.c", line 1373
[ 300.6631030] cpu2: Begin traceback...
[ 300.6631030] vpanic() at netbsd:vpanic+0x152
[ 300.6631030] __x86_indirect_thunk_rax() at netbsd:__x86_indirect_thunk_rax
[ 300.6631030] xbd_diskstart() at netbsd:xbd_diskstart+0x7c2
[ 300.6631030] dk_start() at netbsd:dk_start+0xef
[ 300.6730966] rf_DispatchKernelIO() at netbsd:rf_DispatchKernelIO+0x1a8
[ 300.6730966] rf_DiskIOEnqueue() at netbsd:rf_DiskIOEnqueue+0x103
[ 300.6730966] FireNodeList() at netbsd:FireNodeList+0x67
[ 300.6730966] rf_DispatchDAG() at netbsd:rf_DispatchDAG+0x12e
[ 300.6730966] rf_State_ExecuteDAG() at netbsd:rf_State_ExecuteDAG+0xcb
[ 300.6730966] rf_ContinueRaidAccess() at netbsd:rf_ContinueRaidAccess+0xb4
[ 300.6730966] rf_DoAccess() at netbsd:rf_DoAccess+0x10c
[ 300.6830965] raid_diskstart() at netbsd:raid_diskstart+0x100
[ 300.6830965] dk_start() at netbsd:dk_start+0xef
[ 300.6830965] rf_RaidIOThread() at netbsd:rf_RaidIOThread+0x142
[ 300.6830965] cpu2: End traceback...
[ 300.6830965] rebooting...
Frank
From: "Jaromir Dolecek" <jdolecek@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/55397 CVS commit: src/sys/kern
Date: Fri, 19 Jun 2020 13:49:38 +0000
Module Name: src
Committed By: jdolecek
Date: Fri Jun 19 13:49:38 UTC 2020
Modified Files:
src/sys/kern: subr_pool.c
Log Message:
bump the limit on max item size for pool_init()/pool_cache_init() up
to 1 << 24, so that the pools can be used for ZFS block allocations, which
are up to SPA_MAXBLOCKSHIFT (1 << 24)
part of PR kern/55397 by Frank Kardel
To generate a diff of this commit:
cvs rdiff -u -r1.272 -r1.273 src/sys/kern/subr_pool.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Jaromir Dolecek" <jdolecek@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/55397 CVS commit: src/sys/sys
Date: Fri, 19 Jun 2020 13:52:40 +0000
Module Name: src
Committed By: jdolecek
Date: Fri Jun 19 13:52:40 UTC 2020
Modified Files:
src/sys/sys: param.h
Log Message:
bump version - maximum item size for pool_init()/pool_cache_init() changed
PR kern/55397
To generate a diff of this commit:
cvs rdiff -u -r1.670 -r1.671 src/sys/sys/param.h
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Jaromir Dolecek" <jdolecek@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/55397 CVS commit: src/external/cddl/osnet/dist/uts/common/fs/zfs
Date: Fri, 19 Jun 2020 14:13:23 +0000
Module Name: src
Committed By: jdolecek
Date: Fri Jun 19 14:13:23 UTC 2020
Modified Files:
src/external/cddl/osnet/dist/uts/common/fs/zfs: zio.c
Log Message:
use pool_cache for (meta)data buffers also on NetBSD
this should generally slightly improve performance on MP systems, and
specifically for xbd(4) storage avoids slow unaligned I/O buffer handling
this change requires updated kernel, to allow up to SPA_MAXBLOCKSHIFT item
size for pools
fixes PR kern/55397 by Frank Kardel
To generate a diff of this commit:
cvs rdiff -u -r1.6 -r1.7 src/external/cddl/osnet/dist/uts/common/fs/zfs/zio.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: =?UTF-8?B?SmFyb23DrXIgRG9sZcSNZWs=?= <jaromir.dolecek@gmail.com>
To: Frank Kardel <kardel@netbsd.org>
Cc: "gnats-bugs@NetBSD.org" <gnats-bugs@netbsd.org>, Jaromir Dolecek <jdolecek@netbsd.org>,
kern-bug-people@netbsd.org, netbsd-bugs@netbsd.org, gnats-admin@netbsd.org
Subject: Re: kern/55397 (Xen pvh instance: zpool create on xbd* devices panics)
Date: Fri, 19 Jun 2020 16:26:18 +0200
Hi,
I've fixed the problem for zfs, you'll need an updated kernel
(src/sys/kern/subr_pool.c) and the zfs module sources
(src/external/....). Can you confirm it works for you?
I can repeat the raidframe path too, I'm working on a fix there.
Jaromir
From: Frank Kardel <kardel@netbsd.org>
To: =?UTF-8?B?SmFyb23DrXIgRG9sZcSNZWs=?= <jaromir.dolecek@gmail.com>
Cc: "gnats-bugs@NetBSD.org" <gnats-bugs@netbsd.org>,
Jaromir Dolecek <jdolecek@netbsd.org>, kern-bug-people@netbsd.org,
netbsd-bugs@netbsd.org, gnats-admin@netbsd.org
Subject: Re: kern/55397 (Xen pvh instance: zpool create on xbd* devices
panics)
Date: Fri, 19 Jun 2020 17:24:39 +0200
Looks promising:
zpool create: success
zfs create (several): success
initial writes didn't panic.
Thanks for the quick fix!
Frank
From: "Jaromir Dolecek" <jdolecek@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/55397 CVS commit: src/sys/dev/raidframe
Date: Fri, 19 Jun 2020 19:29:40 +0000
Module Name: src
Committed By: jdolecek
Date: Fri Jun 19 19:29:39 UTC 2020
Modified Files:
src/sys/dev/raidframe: rf_dag.h rf_dagfuncs.c rf_diskqueue.c
rf_diskqueue.h rf_netbsd.h rf_netbsdkintf.c
Log Message:
pass down b_flags B_PHYS|B_RAW|B_MEDIA_FLAGS from bio subsystem
to component I/O
fixes the xbd(4) KASSERT() triggered by raidframe, noted in PR kern/55397
by Frank Kardel
To generate a diff of this commit:
cvs rdiff -u -r1.20 -r1.21 src/sys/dev/raidframe/rf_dag.h
cvs rdiff -u -r1.31 -r1.32 src/sys/dev/raidframe/rf_dagfuncs.c
cvs rdiff -u -r1.56 -r1.57 src/sys/dev/raidframe/rf_diskqueue.c
cvs rdiff -u -r1.25 -r1.26 src/sys/dev/raidframe/rf_diskqueue.h
cvs rdiff -u -r1.34 -r1.35 src/sys/dev/raidframe/rf_netbsd.h
cvs rdiff -u -r1.383 -r1.384 src/sys/dev/raidframe/rf_netbsdkintf.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: open->feedback
State-Changed-By: jdolecek@NetBSD.org
State-Changed-When: Fri, 19 Jun 2020 19:34:22 +0000
State-Changed-Why:
Both zfs and raidframe should be fixed with up-to-date -current, can you
please confirm?
From: Frank Kardel <kardel@netbsd.org>
To: gnats-bugs@netbsd.org, jdolecek@netbsd.org, netbsd-bugs@netbsd.org,
gnats-admin@netbsd.org
Cc:
Subject: Re: kern/55397 (Xen pvh instance: zpool create on xbd* devices
panics)
Date: Sat, 20 Jun 2020 13:06:00 +0200
raidframe fdisk manipulations now work - good.
zfs/zpool initialization works - good.
But zfs scrub runs into a "double fault" - see PR/55402.
The assertion-caused issues are fixed - the PR can be closed.
State-Changed-From-To: feedback->closed
State-Changed-By: jdolecek@NetBSD.org
State-Changed-When: Sat, 20 Jun 2020 12:16:36 +0000
State-Changed-Why:
Problem fixed. Thanks for report and testing.
>Unformatted: