NetBSD Problem Report #54273
From jarle@festningen.uninett.no Wed Jun 5 17:18:02 2019
Return-Path: <jarle@festningen.uninett.no>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id E32887A187
for <gnats-bugs@gnats.NetBSD.org>; Wed, 5 Jun 2019 17:18:02 +0000 (UTC)
Message-Id: <20190605171753.89232170493E@festningen.uninett.no>
Date: Wed, 5 Jun 2019 19:17:53 +0200 (CEST)
From: jarle@norid.no
Reply-To: jarle@norid.no
To: gnats-bugs@NetBSD.org
Subject: "zpool create pool xbd2" panics DOMU kernel
X-Send-Pr-Version: 3.95
>Number: 54273
>Category: port-xen
>Synopsis: "zpool create pool xbd2" panics DOMU kernel
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: hannken
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Wed Jun 05 17:20:00 +0000 2019
>Closed-Date: Mon Jun 17 08:12:31 +0000 2019
>Last-Modified: Mon Jun 17 08:12:31 +0000 2019
>Originator: Jarle Fredrik Greipsland
>Release: NetBSD 8.99.42 from 2019-06-03
>Organization:
>Environment:
System: NetBSD vm-2129 8.99.42 NetBSD 8.99.42 (XEN3_DOMU) #2: Tue Jun 4 09:31:29 CEST 2019 jarle@nbuilder:/build/current/amd64/obj/sys/arch/amd64/compile/XEN3_DOMU amd64
Architecture: x86_64
Machine: amd64
>Description:
The VM is configured with xbd0 and xbd1 as root and swap, and with
xbd2, xbd3 and xbd4 as separate block devices for zfs experimentation.
All the xbd* devices are freshly created logical volumes in an lvm
volume group on the hypervisor.
zfs has not previously been configured on this VM. I ran the command:
vm-2129# zpool create pool xbd2
in an xterm window, and then on the console, the following panic
message was printed:
[ 274.4901995] panic: kernel diagnostic assertion "seg <= BLKIF_MAX_SEGMENTS_PER_REQUEST" failed: file "/build/current/src/sys/arch/xen/xen/xbd_xenbus.c", line 1032
[ 274.4901995] cpu0: Begin traceback...
[ 274.4901995] vpanic() at netbsd:vpanic+0x143
[ 274.4901995] kern_assert() at netbsd:kern_assert+0x48
[ 274.4901995] xbd_diskstart() at netbsd:xbd_diskstart+0x3c7
[ 274.4901995] dk_start() at netbsd:dk_start+0xd9
[ 274.4901995] bdev_strategy() at netbsd:bdev_strategy+0x72
[ 274.4901995] spec_strategy() at netbsd:spec_strategy+0x96
[ 274.4901995] VOP_STRATEGY() at netbsd:VOP_STRATEGY+0x54
[ 274.4901995] vdev_disk_io_start() at zfs:vdev_disk_io_start+0x124
[ 274.4901995] zio_vdev_io_start() at zfs:zio_vdev_io_start+0x12e
[ 274.4901995] zio_execute() at zfs:zio_execute+0xb0
[ 274.4901995] zio_nowait() at zfs:zio_nowait+0x5c
[ 274.4901995] vdev_label_read_config() at zfs:vdev_label_read_config+0xc8
[ 274.4901995] vdev_label_init() at zfs:vdev_label_init+0x4cc
[ 274.4901995] vdev_label_init() at zfs:vdev_label_init+0x47
[ 274.4901995] vdev_create() at zfs:vdev_create+0x5a
[ 274.5001272] spa_create() at zfs:spa_create+0x28c
[ 274.5001272] zfs_ioc_pool_create() at zfs:zfs_ioc_pool_create+0x19b
[ 274.5001272] zfsdev_ioctl() at zfs:zfsdev_ioctl+0x265
[ 274.5001272] nb_zfsdev_ioctl() at zfs:nb_zfsdev_ioctl+0x38
[ 274.5001272] VOP_IOCTL() at netbsd:VOP_IOCTL+0x3b
[ 274.5001272] vn_ioctl() at netbsd:vn_ioctl+0xa5
[ 274.5001272] sys_ioctl() at netbsd:sys_ioctl+0x547
[ 274.5001272] syscall() at netbsd:syscall+0x9c
[ 274.5001272] --- syscall (number 54) ---
[ 274.5001272] 7b61eb79534a:
[ 274.5001272] cpu0: End traceback...
[ 274.5001272] dumping to dev 142,17 (offset=4194303, size=0): not possible
[ 274.5001272] rebooting...
and the VM rebooted.
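The assertion in the trace above fires when a single block request needs more segments than the Xen block interface accepts. A minimal sketch of that arithmetic follows. It assumes one segment per touched page, 4 KiB pages, a limit of 11 segments per request, and a 64 KiB generic transfer cap. These values and all demo_* names are assumptions for illustration, not quoted from this report or from xbd_xenbus.c.

```c
#include <stddef.h>

/* Illustrative values, assumed rather than taken from the report. */
#define DEMO_PAGE_SIZE    4096u
#define DEMO_MAX_SEGMENTS 11u     /* assumed per-request segment limit */
#define DEMO_MAXPHYS      65536u  /* assumed generic 64 KiB transfer cap */

/*
 * Number of pages (one segment each) touched by a buffer that starts
 * page_off bytes into a page and is len bytes long.
 */
static unsigned
demo_segments_needed(size_t page_off, size_t len)
{
	if (len == 0)
		return 0;
	return (unsigned)((page_off % DEMO_PAGE_SIZE + len +
	    DEMO_PAGE_SIZE - 1) / DEMO_PAGE_SIZE);
}

/* Nonzero if the buffer fits into a single request under the assumed limit. */
static int
demo_fits_in_one_request(size_t page_off, size_t len)
{
	return demo_segments_needed(page_off, len) <= DEMO_MAX_SEGMENTS;
}
```

Under these assumptions a page-aligned 64 KiB transfer needs 16 segments and does not fit in one request, while a page-aligned 44 KiB transfer needs exactly 11 and does.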
>How-To-Repeat:
>Fix:
>Release-Note:
>Audit-Trail:
From: "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: port-xen/54273: "zpool create pool xbd2" panics DOMU kernel
Date: Fri, 7 Jun 2019 17:25:55 +0200
--Apple-Mail=_9C6087D7-48D4-4178-8004-6A2C5146CA3E
Content-Type: multipart/mixed;
boundary="Apple-Mail=_76D7F7A7-D423-4C90-88BD-B93A3CC03BA7"
--Apple-Mail=_76D7F7A7-D423-4C90-88BD-B93A3CC03BA7
Content-Transfer-Encoding: 7bit
Content-Type: text/plain;
charset=us-ascii
Looks like a MAXPHYS issue, please try the attached diff.
--
J. Hannken-Illjes - hannken@eis.cs.tu-bs.de - TU Braunschweig
--Apple-Mail=_76D7F7A7-D423-4C90-88BD-B93A3CC03BA7
Content-Disposition: attachment;
filename=vd_maxphys.diff
Content-Type: application/octet-stream;
x-unix-mode=0644;
name="vd_maxphys.diff"
Content-Transfer-Encoding: 7bit
vd_maxphys
Try to retrieve the per-disk maximum transfer size and use it instead
of MAXPHYS. Eagerly waiting for the merge of tls-maxphys.
diff -r 008151456b52 -r c1c8cf0fb84d external/cddl/osnet/dist/uts/common/fs/zfs/sys/vdev_disk.h
--- external/cddl/osnet/dist/uts/common/fs/zfs/sys/vdev_disk.h
+++ external/cddl/osnet/dist/uts/common/fs/zfs/sys/vdev_disk.h
@@ -52,6 +52,7 @@ typedef struct vdev_disk {
char *vd_minor;
vnode_t *vd_vp;
struct workqueue *vd_wq;
+ int vd_maxphys;
#endif
} vdev_disk_t;
#endif
diff -r 008151456b52 -r c1c8cf0fb84d external/cddl/osnet/dist/uts/common/fs/zfs/vdev_disk.c
--- external/cddl/osnet/dist/uts/common/fs/zfs/vdev_disk.c
+++ external/cddl/osnet/dist/uts/common/fs/zfs/vdev_disk.c
@@ -219,6 +219,27 @@ vdev_disk_open(vdev_t *vd, uint64_t *psi
return (SET_ERROR(EINVAL));
}
+ /* XXXNETBSD Once tls-maxphys gets merged this block becomes:
+ pdk = disk_find_blk(vp->v_rdev);
+ dvd->vd_maxphys = (pdk ? disk_maxphys(pdk) : MACHINE_MAXPHYS);
+ */
+ {
+ struct buf buf = { b_bcount: MAXPHYS };
+ const char *dev_name;
+
+ dev_name = devsw_blk2name(major(vp->v_rdev));
+ if (dev_name) {
+ char disk_name[16];
+
+ snprintf(disk_name, sizeof(disk_name), "%s%d",
+ dev_name, DISKUNIT(vp->v_rdev));
+ pdk = disk_find(disk_name);
+ if (pdk && pdk->dk_driver && pdk->dk_driver->d_minphys)
+ (*pdk->dk_driver->d_minphys)(&buf);
+ }
+ dvd->vd_maxphys = buf.b_bcount;
+ }
+
/*
* XXXNETBSD Compare the devid to the stored value.
*/
@@ -421,6 +442,7 @@ vdev_disk_io_start(zio_t *zio)
zio_interrupt(zio);
return;
}
+ ASSERT3U(dvd->vd_maxphys, >, 0);
vp = dvd->vd_vp;
#endif
@@ -473,7 +495,7 @@ vdev_disk_io_start(zio_t *zio)
mutex_exit(vp->v_interlock);
}
- if (bp->b_bcount <= MAXPHYS) {
+ if (bp->b_bcount <= dvd->vd_maxphys) {
/* We can do this I/O in one pass. */
(void)VOP_STRATEGY(vp, bp);
} else {
@@ -484,7 +506,7 @@ vdev_disk_io_start(zio_t *zio)
resid = zio->io_size;
off = 0;
while (resid != 0) {
- size = uimin(resid, MAXPHYS);
+ size = uimin(resid, dvd->vd_maxphys);
nbp = getiobuf(vp, true);
nbp->b_blkno = btodb(zio->io_offset + off);
/* Below call increments v_numoutput. */
--Apple-Mail=_76D7F7A7-D423-4C90-88BD-B93A3CC03BA7--
--Apple-Mail=_9C6087D7-48D4-4178-8004-6A2C5146CA3E--
From: Jarle Greipsland <jarle.greipsland@norid.no>
To: gnats-bugs@netbsd.org, hannken@eis.cs.tu-bs.de
Cc:
Subject: Re: port-xen/54273: "zpool create pool xbd2" panics DOMU kernel
Date: Sat, 08 Jun 2019 16:34:55 +0200 (CEST)
"J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de> writes:
> Looks like a MAXPHYS issue, please try the attached diff.
[ ... ]
The diff seems to have fixed the zpool problem. I can now create
a zfs pool.
After having created the standard /pool zfs pool with xbd2 as the
only device, I ran some zpool list, zpool status and other
investigative commands. I also ran a few zfs list/get commands,
but without having done a 'zfs create'. Then I left the system
idle for a while, and it panicked.
Console log:
login: Jun 8 11:41:08 vm-2129 su: jarle to root on /dev/pts/0
[ 15395.0600864] SLOW IO: zio timestamp 14390060087922ns, delta 1004999998514ns, last io 93740092365nspanic: I/O to pool 'pool' appears to be hung on vdev guid 4517737902492308706 at '/dev/xbd2'.
[ 15395.0600864] cpu0: Begin traceback...
[ 15395.0600864] vpanic() at netbsd:vpanic+0x143
[ 15395.0600864] snprintf() at netbsd:snprintf
[ 15395.0600864] vdev_cache_offset_compare() at zfs:vdev_cache_offset_compare
[ 15395.0600864] vdev_deadman() at zfs:vdev_deadman+0x31
[ 15395.0600864] spa_deadman_wq() at zfs:spa_deadman_wq+0xe2
[ 15395.0600864] workqueue_worker() at netbsd:workqueue_worker+0xce
[ 15395.0600864] cpu0: End traceback...
[ 15395.0600864] dumping to dev 142,17 (offset=4194303, size=0): not possible
[ 15395.0600864] rebooting...
-jarle
From: "Juergen Hannken-Illjes" <hannken@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/54273 CVS commit: src/external/cddl/osnet/dist/uts/common/fs/zfs
Date: Tue, 11 Jun 2019 09:04:38 +0000
Module Name: src
Committed By: hannken
Date: Tue Jun 11 09:04:37 UTC 2019
Modified Files:
src/external/cddl/osnet/dist/uts/common/fs/zfs: vdev_disk.c
src/external/cddl/osnet/dist/uts/common/fs/zfs/sys: vdev_disk.h
Log Message:
Try to retrieve the per-disk maximum transfer size and use it instead
of MAXPHYS. Eagerly waiting for the merge of tls-maxphys.
Addresses PR port-xen/54273: "zpool create pool xbd2" panics DOMU kernel
To generate a diff of this commit:
cvs rdiff -u -r1.9 -r1.10 \
src/external/cddl/osnet/dist/uts/common/fs/zfs/vdev_disk.c
cvs rdiff -u -r1.3 -r1.4 \
src/external/cddl/osnet/dist/uts/common/fs/zfs/sys/vdev_disk.h
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
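The commit above replaces the fixed MAXPHYS bound with a per-disk maximum transfer size when deciding whether to split an I/O. A minimal userland sketch of such a splitting loop follows. demo_split_io and its parameters are hypothetical names, and the sketch only computes chunk sizes; it issues no I/O and is not the code from vdev_disk.c.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Split a transfer of total bytes into pieces of at most max_xfer bytes.
 * Stores up to max_chunks piece sizes in chunks[] and returns the number
 * of pieces the transfer needs (which may exceed max_chunks).
 */
static size_t
demo_split_io(size_t total, size_t max_xfer, size_t *chunks,
    size_t max_chunks)
{
	size_t resid = total;
	size_t n = 0;

	/* A zero limit would loop forever, so reject it up front. */
	assert(max_xfer > 0);

	while (resid != 0) {
		size_t size = resid < max_xfer ? resid : max_xfer;

		if (n < max_chunks)
			chunks[n] = size;
		n++;
		resid -= size;
	}
	return n;
}
```

With an assumed per-disk limit of 45056 bytes, a 131072-byte transfer is split into pieces of 45056, 45056 and 40960 bytes, while a 4096-byte transfer stays in one piece.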
From: "Juergen Hannken-Illjes" <hannken@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/54273 CVS commit: src/external/cddl/osnet/sys/kern
Date: Tue, 11 Jun 2019 09:05:34 +0000
Module Name: src
Committed By: hannken
Date: Tue Jun 11 09:05:33 UTC 2019
Modified Files:
src/external/cddl/osnet/sys/kern: taskq.c
Log Message:
There is no 1:1 relation between cv_signal() and cv_timedwait() as
the latter implicitly calls cv_signal() on error.
This leads to "tq_waiting > 0" with "tq_running == 0" and the
taskq stalls.
Change task_executor() to increment and decrement "tq_waiting"
and always check and run the queue after cv_timedwait().
Use mstohz(), fix timeout and sort includes.
Addresses PR port-xen/54273: "zpool create pool xbd2" panics DOMU kernel
To generate a diff of this commit:
cvs rdiff -u -r1.9 -r1.10 src/external/cddl/osnet/sys/kern/taskq.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
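The pattern described in the log message above can be sketched with POSIX threads instead of the kernel condvar API. All names here are hypothetical and this is not the NetBSD taskq code. The point is only that the worker itself increments and decrements the waiting counter around the timed wait, and re-checks the queue whatever the wait returned.

```c
#include <pthread.h>
#include <time.h>

/* Hypothetical, simplified task queue; not the NetBSD taskq structure. */
struct demo_taskq {
	pthread_mutex_t lock;
	pthread_cond_t cv;
	int pending;	/* tasks queued but not yet run */
	int waiting;	/* workers currently blocked in the timed wait */
	int executed;	/* tasks run so far */
};

static void
demo_taskq_init(struct demo_taskq *tq)
{
	pthread_mutex_init(&tq->lock, NULL);
	pthread_cond_init(&tq->cv, NULL);
	tq->pending = tq->waiting = tq->executed = 0;
}

/* Producer side: queue one task and wake a worker if one is waiting. */
static void
demo_taskq_dispatch(struct demo_taskq *tq)
{
	pthread_mutex_lock(&tq->lock);
	tq->pending++;
	if (tq->waiting > 0)
		pthread_cond_signal(&tq->cv);
	pthread_mutex_unlock(&tq->lock);
}

/*
 * One pass of a worker: wait up to timeout_sec if the queue is empty,
 * then drain the queue. Returns the number of tasks run in this pass.
 */
static int
demo_taskq_worker_pass(struct demo_taskq *tq, int timeout_sec)
{
	struct timespec deadline;
	int ran = 0;

	deadline.tv_sec = time(NULL) + timeout_sec;
	deadline.tv_nsec = 0;

	pthread_mutex_lock(&tq->lock);
	if (tq->pending == 0) {
		/* The waiter owns both sides of the counter update. */
		tq->waiting++;
		/* Result ignored on purpose: the queue is checked either way. */
		(void)pthread_cond_timedwait(&tq->cv, &tq->lock, &deadline);
		tq->waiting--;
	}
	while (tq->pending > 0) {
		tq->pending--;
		tq->executed++;
		ran++;
	}
	pthread_mutex_unlock(&tq->lock);
	return ran;
}
```

Because the counter never depends on how many signals were sent, a wait that ends by timeout cannot leave the waiting count above zero with no worker actually waiting.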
Responsible-Changed-From-To: port-xen-maintainer->hannken
Responsible-Changed-By: hannken@NetBSD.org
Responsible-Changed-When: Tue, 11 Jun 2019 09:17:27 +0000
Responsible-Changed-Why:
Take.
State-Changed-From-To: open->feedback
State-Changed-By: hannken@NetBSD.org
State-Changed-When: Tue, 11 Jun 2019 09:17:27 +0000
State-Changed-Why:
Committed fixes.
Please try again with:
external/cddl/osnet/dist/uts/common/fs/zfs/vdev_disk.c 1.10
external/cddl/osnet/dist/uts/common/fs/zfs/sys/vdev_disk.h 1.4
external/cddl/osnet/sys/kern/taskq.c 1.10
From: Jarle Greipsland <jarle.greipsland@norid.no>
To: gnats-bugs@netbsd.org, hannken@NetBSD.org
Cc:
Subject: Re: port-xen/54273 ("zpool create pool xbd2" panics DOMU kernel)
Date: Wed, 12 Jun 2019 08:21:37 +0200 (CEST)
hannken@NetBSD.org writes:
> Synopsis: "zpool create pool xbd2" panics DOMU kernel
[ ... ]
> Please try again with:
>
> external/cddl/osnet/dist/uts/common/fs/zfs/vdev_disk.c 1.10
> external/cddl/osnet/dist/uts/common/fs/zfs/sys/vdev_disk.h 1.4
> external/cddl/osnet/sys/kern/taskq.c 1.10
The zpool I/O problems seem to be gone. Thank you!
However, zfs doesn't seem to be fully stable. I tried the following:
vm-2129# zpool list
no pools available
vm-2129# ls /etc/zfs/
vm-2129# zpool create pool xbd2
vm-2129# zpool list
NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
pool 31.8G 520K 31.7G - 0% 0% 1.00x ONLINE -
vm-2129# zfs create pool/test
vm-2129# dd if=/dev/zero of=/pool/test/f1 bs=1m count=32
32+0 records in
32+0 records out
33554432 bytes transferred in 0.052 secs (645277538 bytes/sec)
vm-2129# ls -l /pool/test/
total 32776
-rw-r--r-- 1 root wheel 33554432 Jun 12 08:13 f1
vm-2129# zfs snapshot pool/test@secondsago
vm-2129# touch /pool/test/f2
vm-2129# ls /pool/test/
f1 f2
vm-2129# zfs rollback pool/test@secondsago
vm-2129# ls /pool/test
f1
vm-2129# touch /pool/test/f2
and here the system panicked.
Console log:
[ 33217.0900893] panic: kernel diagnostic assertion "error == ENOENT" failed: file "/build/current/src/sys/kern/vfs_vnode.c", line 1422
[ 33217.0900893] cpu0: Begin traceback...
[ 33217.0900893] vpanic() at netbsd:vpanic+0x143
[ 33217.0900893] kern_assert() at netbsd:kern_assert+0x48
[ 33217.1000454] vcache_new() at netbsd:vcache_new+0x35a
[ 33217.1000454] zfs_mknode() at zfs:zfs_mknode+0x40
[ 33217.1000454] zfs_create.isra.17() at zfs:zfs_create.isra.17+0x37d
[ 33217.1000454] zfs_netbsd_create() at zfs:zfs_netbsd_create+0xed
[ 33217.1000454] VOP_CREATE() at netbsd:VOP_CREATE+0x3d
[ 33217.1000454] vn_open() at netbsd:vn_open+0x29b
[ 33217.1000454] do_open() at netbsd:do_open+0x103
[ 33217.1000454] do_sys_openat() at netbsd:do_sys_openat+0x8b
[ 33217.1000454] sys_open() at netbsd:sys_open+0x24
[ 33217.1000454] syscall() at netbsd:syscall+0x9c
[ 33217.1000454] --- syscall (number 5) ---
[ 33217.1000454] 71bc530429fa:
[ 33217.1000454] cpu0: End traceback...
[ 33217.1000454] dumping to dev 142,17 (offset=4194303, size=0): not possible
[ 33217.1000454] rebooting...
This does not look related to the zpool I/O problems.
-jarle
From: "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: port-xen/54273 ("zpool create pool xbd2" panics DOMU kernel)
Date: Thu, 13 Jun 2019 12:08:00 +0200
> The zpool I/O problems seem to be gone. Thank you!
>
> However, zfs doesn't seem to be fully stable. I tried the following:
<snip>
> This does not look related to the zpool I/O problems.
So this PR may be closed.
Live recv and rollback are known not to work yet.
Please file a new PR in category kern.
--
J. Hannken-Illjes - hannken@eis.cs.tu-bs.de - TU Braunschweig (Germany)
From: "Juergen Hannken-Illjes" <hannken@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/54273 CVS commit: src/external/cddl/osnet/dist/uts/common/fs/zfs
Date: Mon, 17 Jun 2019 08:09:58 +0000
Module Name: src
Committed By: hannken
Date: Mon Jun 17 08:09:57 UTC 2019
Modified Files:
src/external/cddl/osnet/dist/uts/common/fs/zfs: zfs_vfsops.c
Log Message:
Add native vfs_suspend()/vfs_resume() before and after
zfs_suspend_fs()/zfs_resume_fs() and get rid of dead "z_sa_hdl == NULL"
znodes before vfs_resume() to keep the vnode cache consistent.
Live rollback should work now.
PR port-xen/54273 ("zpool create pool xbd2" panics DOMU kernel)
To generate a diff of this commit:
cvs rdiff -u -r1.23 -r1.24 \
src/external/cddl/osnet/dist/uts/common/fs/zfs/zfs_vfsops.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: feedback->closed
State-Changed-By: hannken@NetBSD.org
State-Changed-When: Mon, 17 Jun 2019 08:12:31 +0000
State-Changed-Why:
All issues resolved -- thanks for the report.
>Unformatted: