NetBSD Problem Report #54273

From jarle@festningen.uninett.no  Wed Jun  5 17:18:02 2019
Return-Path: <jarle@festningen.uninett.no>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id E32887A187
	for <gnats-bugs@gnats.NetBSD.org>; Wed,  5 Jun 2019 17:18:02 +0000 (UTC)
Message-Id: <20190605171753.89232170493E@festningen.uninett.no>
Date: Wed,  5 Jun 2019 19:17:53 +0200 (CEST)
From: jarle@norid.no
Reply-To: jarle@norid.no
To: gnats-bugs@NetBSD.org
Subject: "zpool create pool xbd2" panics DOMU kernel
X-Send-Pr-Version: 3.95

>Number:         54273
>Category:       port-xen
>Synopsis:       "zpool create pool xbd2" panics DOMU kernel
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    hannken
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Jun 05 17:20:00 +0000 2019
>Closed-Date:    Mon Jun 17 08:12:31 +0000 2019
>Last-Modified:  Mon Jun 17 08:12:31 +0000 2019
>Originator:     Jarle Fredrik Greipsland
>Release:        NetBSD 8.99.42 from 2019-06-03
>Organization:

>Environment:


System: NetBSD vm-2129 8.99.42 NetBSD 8.99.42 (XEN3_DOMU) #2: Tue Jun  4 09:31:29 CEST 2019  jarle@nbuilder:/build/current/amd64/obj/sys/arch/amd64/compile/XEN3_DOMU amd64
Architecture: x86_64
Machine: amd64
>Description:

The VM is configured with xbd0 and xbd1 as root and swap, and with
xbd2, xbd3 and xbd4 as separate block devices for ZFS experimentation.
All the xbd* devices are freshly created logical volumes in an LVM
volume group on the hypervisor.

ZFS had not previously been configured on this VM.  I ran the command:

vm-2129# zpool create pool xbd2

in an xterm window, and then on the console, the following panic
message was printed:

[ 274.4901995] panic: kernel diagnostic assertion "seg <= BLKIF_MAX_SEGMENTS_PER_REQUEST" failed: file "/build/current/src/sys/arch/xen/xen/xbd_xenbus.c", line 1032 
[ 274.4901995] cpu0: Begin traceback...
[ 274.4901995] vpanic() at netbsd:vpanic+0x143
[ 274.4901995] kern_assert() at netbsd:kern_assert+0x48
[ 274.4901995] xbd_diskstart() at netbsd:xbd_diskstart+0x3c7
[ 274.4901995] dk_start() at netbsd:dk_start+0xd9
[ 274.4901995] bdev_strategy() at netbsd:bdev_strategy+0x72
[ 274.4901995] spec_strategy() at netbsd:spec_strategy+0x96
[ 274.4901995] VOP_STRATEGY() at netbsd:VOP_STRATEGY+0x54
[ 274.4901995] vdev_disk_io_start() at zfs:vdev_disk_io_start+0x124
[ 274.4901995] zio_vdev_io_start() at zfs:zio_vdev_io_start+0x12e
[ 274.4901995] zio_execute() at zfs:zio_execute+0xb0
[ 274.4901995] zio_nowait() at zfs:zio_nowait+0x5c
[ 274.4901995] vdev_label_read_config() at zfs:vdev_label_read_config+0xc8
[ 274.4901995] vdev_label_init() at zfs:vdev_label_init+0x4cc
[ 274.4901995] vdev_label_init() at zfs:vdev_label_init+0x47
[ 274.4901995] vdev_create() at zfs:vdev_create+0x5a
[ 274.5001272] spa_create() at zfs:spa_create+0x28c
[ 274.5001272] zfs_ioc_pool_create() at zfs:zfs_ioc_pool_create+0x19b
[ 274.5001272] zfsdev_ioctl() at zfs:zfsdev_ioctl+0x265
[ 274.5001272] nb_zfsdev_ioctl() at zfs:nb_zfsdev_ioctl+0x38
[ 274.5001272] VOP_IOCTL() at netbsd:VOP_IOCTL+0x3b
[ 274.5001272] vn_ioctl() at netbsd:vn_ioctl+0xa5
[ 274.5001272] sys_ioctl() at netbsd:sys_ioctl+0x547
[ 274.5001272] syscall() at netbsd:syscall+0x9c
[ 274.5001272] --- syscall (number 54) ---
[ 274.5001272] 7b61eb79534a:
[ 274.5001272] cpu0: End traceback...

[ 274.5001272] dumping to dev 142,17 (offset=4194303, size=0): not possible
[ 274.5001272] rebooting...

and the VM rebooted.
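
For context, a minimal back-of-the-envelope sketch of the limit the
assertion enforces; the page size, MAXPHYS and blkif segment values
below are the stock amd64/Xen defaults, assumed here rather than taken
from this report:

/* seg_limit.c -- illustrative only */
#include <stdio.h>

#define PAGE_SIZE	4096
#define MAXPHYS		(64 * 1024)
#define BLKIF_MAX_SEGMENTS_PER_REQUEST	11	/* Xen blkif per-request limit */

int
main(void)
{
	/*
	 * xbd(4) maps a transfer page by page; a MAXPHYS-sized buffer that
	 * does not start on a page boundary touches one extra page.
	 */
	int seg = MAXPHYS / PAGE_SIZE + 1;

	printf("segments needed: %d, per-request limit: %d\n",
	    seg, BLKIF_MAX_SEGMENTS_PER_REQUEST);
	/*
	 * 17 > 11 is exactly the "seg <= BLKIF_MAX_SEGMENTS_PER_REQUEST"
	 * KASSERT firing: the caller handed xbd a transfer larger than
	 * fits into a single blkif request.
	 */
	return 0;
}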


>How-To-Repeat:

>Fix:


>Release-Note:

>Audit-Trail:
From: "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: port-xen/54273: "zpool create pool xbd2" panics DOMU kernel
Date: Fri, 7 Jun 2019 17:25:55 +0200

 Looks like a MAXPHYS issue, please try the attached diff.

 --
 J. Hannken-Illjes - hannken@eis.cs.tu-bs.de - TU Braunschweig


 [Attachment: vd_maxphys.diff]

 vd_maxphys

 Try to retrieve the per-disk maximum transfer size and use it instead
 of MAXPHYS.  Eagerly waiting for the merge of tls-maxphys.

 diff -r 008151456b52 -r c1c8cf0fb84d external/cddl/osnet/dist/uts/common/fs/zfs/sys/vdev_disk.h
 --- external/cddl/osnet/dist/uts/common/fs/zfs/sys/vdev_disk.h
 +++ external/cddl/osnet/dist/uts/common/fs/zfs/sys/vdev_disk.h
 @@ -52,6 +52,7 @@ typedef struct vdev_disk {
  	char            *vd_minor;
  	vnode_t         *vd_vp;
  	struct workqueue *vd_wq;
 +	int		vd_maxphys;
  #endif
  } vdev_disk_t;
  #endif
 diff -r 008151456b52 -r c1c8cf0fb84d external/cddl/osnet/dist/uts/common/fs/zfs/vdev_disk.c
 --- external/cddl/osnet/dist/uts/common/fs/zfs/vdev_disk.c
 +++ external/cddl/osnet/dist/uts/common/fs/zfs/vdev_disk.c
 @@ -219,6 +219,27 @@ vdev_disk_open(vdev_t *vd, uint64_t *psi
  		return (SET_ERROR(EINVAL));
  	}

 +	/* XXXNETBSD Once tls-maxphys gets merged this block becomes:
 +		pdk = disk_find_blk(vp->v_rdev);
 +		dvd->vd_maxphys = (pdk ? disk_maxphys(pdk) : MACHINE_MAXPHYS);
 +	*/
 +	{
 +		struct buf buf = { b_bcount: MAXPHYS };
 +		const char *dev_name;
 +
 +		dev_name = devsw_blk2name(major(vp->v_rdev));
 +		if (dev_name) {
 +			char disk_name[16];
 +
 +			snprintf(disk_name, sizeof(disk_name), "%s%d",
 +			    dev_name, DISKUNIT(vp->v_rdev));
 +			pdk = disk_find(disk_name);
 +			if (pdk && pdk->dk_driver && pdk->dk_driver->d_minphys)
 +				(*pdk->dk_driver->d_minphys)(&buf);
 +		}
 +		dvd->vd_maxphys = buf.b_bcount;
 +	}
 +
  	/*
  	 * XXXNETBSD Compare the devid to the stored value.
  	 */
 @@ -421,6 +442,7 @@ vdev_disk_io_start(zio_t *zio)
  		zio_interrupt(zio);
  		return;
  	}
 +	ASSERT3U(dvd->vd_maxphys, >, 0);
  	vp = dvd->vd_vp;
  #endif

 @@ -473,7 +495,7 @@ vdev_disk_io_start(zio_t *zio)
  		mutex_exit(vp->v_interlock);
  	}

 -	if (bp->b_bcount <= MAXPHYS) {
 +	if (bp->b_bcount <= dvd->vd_maxphys) {
  		/* We can do this I/O in one pass. */
  		(void)VOP_STRATEGY(vp, bp);
  	} else {
 @@ -484,7 +506,7 @@ vdev_disk_io_start(zio_t *zio)
  		resid = zio->io_size;
  		off = 0;
  		while (resid != 0) {
 -			size = uimin(resid, MAXPHYS);
 +			size = uimin(resid, dvd->vd_maxphys);
  			nbp = getiobuf(vp, true);
  			nbp->b_blkno = btodb(zio->io_offset + off);
  			/* Below call increments v_numoutput. */
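
 (A standalone sketch, not part of the attached diff: with the patch
 applied, the split loop in vdev_disk_io_start() chunks at the per-disk
 limit instead of the global MAXPHYS.  The 44 KiB xbd limit and the
 128 KiB I/O size below are assumptions for illustration only.)

 /* split_sketch.c -- illustrative only */
 #include <stdio.h>

 int
 main(void)
 {
 	unsigned vd_maxphys = 11 * 4096;	/* assumed xbd per-request limit */
 	unsigned resid = 128 * 1024;		/* assumed zio->io_size */
 	unsigned off = 0;

 	/* Same shape as the patched while loop above. */
 	while (resid != 0) {
 		unsigned size = resid < vd_maxphys ? resid : vd_maxphys;

 		printf("child buf: offset %u, size %u\n", off, size);
 		off += size;
 		resid -= size;
 	}
 	return 0;
 }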


From: Jarle Greipsland <jarle.greipsland@norid.no>
To: gnats-bugs@netbsd.org, hannken@eis.cs.tu-bs.de
Cc: 
Subject: Re: port-xen/54273: "zpool create pool xbd2" panics DOMU kernel
Date: Sat, 08 Jun 2019 16:34:55 +0200 (CEST)

 "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de> writes:
 >  Looks like a MAXPHYS issue, please try the attached diff.
 [ ... ]
 The diff seems to have fixed the zpool problem.  I can now create
 a zfs pool.

 After creating the standard zfs pool (mounted at /pool) with xbd2 as
 the only device, I ran some zpool list, zpool status and other
 investigative commands.  I also ran a few zfs list/get commands,
 but without doing a 'zfs create'.  Then I left the system idle
 for a while, and it panicked.

 Console log:

 login: Jun  8 11:41:08 vm-2129 su: jarle to root on /dev/pts/0
 [ 15395.0600864] SLOW IO: zio timestamp 14390060087922ns, delta 1004999998514ns, last io 93740092365nspanic: I/O to pool 'pool' appears to be hung on vdev guid 4517737902492308706 at '/dev/xbd2'.
 [ 15395.0600864] cpu0: Begin traceback...
 [ 15395.0600864] vpanic() at netbsd:vpanic+0x143
 [ 15395.0600864] snprintf() at netbsd:snprintf
 [ 15395.0600864] vdev_cache_offset_compare() at zfs:vdev_cache_offset_compare
 [ 15395.0600864] vdev_deadman() at zfs:vdev_deadman+0x31
 [ 15395.0600864] spa_deadman_wq() at zfs:spa_deadman_wq+0xe2
 [ 15395.0600864] workqueue_worker() at netbsd:workqueue_worker+0xce
 [ 15395.0600864] cpu0: End traceback...

 [ 15395.0600864] dumping to dev 142,17 (offset=4194303, size=0): not possible
 [ 15395.0600864] rebooting...

 					-jarle

From: "Juergen Hannken-Illjes" <hannken@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/54273 CVS commit: src/external/cddl/osnet/dist/uts/common/fs/zfs
Date: Tue, 11 Jun 2019 09:04:38 +0000

 Module Name:	src
 Committed By:	hannken
 Date:		Tue Jun 11 09:04:37 UTC 2019

 Modified Files:
 	src/external/cddl/osnet/dist/uts/common/fs/zfs: vdev_disk.c
 	src/external/cddl/osnet/dist/uts/common/fs/zfs/sys: vdev_disk.h

 Log Message:
 Try to retrieve the per-disk maximum transfer size and use it instead
 of MAXPHYS.  Eagerly waiting for the merge of tls-maxphys.

 Addresses PR port-xen/54273: "zpool create pool xbd2" panics DOMU kernel


 To generate a diff of this commit:
 cvs rdiff -u -r1.9 -r1.10 \
     src/external/cddl/osnet/dist/uts/common/fs/zfs/vdev_disk.c
 cvs rdiff -u -r1.3 -r1.4 \
     src/external/cddl/osnet/dist/uts/common/fs/zfs/sys/vdev_disk.h

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Juergen Hannken-Illjes" <hannken@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/54273 CVS commit: src/external/cddl/osnet/sys/kern
Date: Tue, 11 Jun 2019 09:05:34 +0000

 Module Name:	src
 Committed By:	hannken
 Date:		Tue Jun 11 09:05:33 UTC 2019

 Modified Files:
 	src/external/cddl/osnet/sys/kern: taskq.c

 Log Message:
 There is no 1:1 relation between cv_signal() and cv_timedwait() as
 the latter implicitly calls cv_signal() on error.

 This leads to "tq_waiting > 0" with "tq_running == 0" and the
 taskq stalls.

 Change task_executor() to increment and decrement "tq_waiting"
 and always check and run the queue after cv_timedwait().

 Use mstohz(), fix timeout and sort includes.

 Addresses PR port-xen/54273: "zpool create pool xbd2" panics DOMU kernel


 To generate a diff of this commit:
 cvs rdiff -u -r1.9 -r1.10 src/external/cddl/osnet/sys/kern/taskq.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.
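
 (A userland sketch of the waiter-accounting pattern the commit message
 above describes, written with POSIX threads: the names tq_waiting and
 tq_pending are borrowed from the log message, everything else is an
 assumption and not the committed taskq.c.)

 /* taskq_wait_sketch.c -- illustrative only; compile with -lpthread */
 #include <pthread.h>
 #include <stdio.h>
 #include <time.h>
 #include <unistd.h>

 static pthread_mutex_t tq_lock = PTHREAD_MUTEX_INITIALIZER;
 static pthread_cond_t tq_cv = PTHREAD_COND_INITIALIZER;
 static int tq_waiting;
 static int tq_pending;		/* stand-in for the real task queue */

 static void *
 executor(void *arg)
 {
 	struct timespec ts;

 	(void)arg;
 	pthread_mutex_lock(&tq_lock);
 	for (;;) {
 		/* Account for the waiter before sleeping ... */
 		tq_waiting++;
 		clock_gettime(CLOCK_REALTIME, &ts);
 		ts.tv_sec += 1;
 		(void)pthread_cond_timedwait(&tq_cv, &tq_lock, &ts);
 		/*
 		 * ... and undo it afterwards whether we were signalled or
 		 * timed out: there is no 1:1 relation between a signal and
 		 * a wakeup.
 		 */
 		tq_waiting--;

 		/* Always re-check and drain the queue after the wait. */
 		while (tq_pending > 0) {
 			tq_pending--;
 			pthread_mutex_unlock(&tq_lock);
 			printf("running a task\n");
 			pthread_mutex_lock(&tq_lock);
 		}
 	}
 	return NULL;
 }

 int
 main(void)
 {
 	pthread_t t;

 	pthread_create(&t, NULL, executor, NULL);
 	pthread_mutex_lock(&tq_lock);
 	tq_pending++;			/* enqueue one task */
 	if (tq_waiting > 0)
 		pthread_cond_signal(&tq_cv);
 	pthread_mutex_unlock(&tq_lock);
 	sleep(2);
 	return 0;
 }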

Responsible-Changed-From-To: port-xen-maintainer->hannken
Responsible-Changed-By: hannken@NetBSD.org
Responsible-Changed-When: Tue, 11 Jun 2019 09:17:27 +0000
Responsible-Changed-Why:
Take.


State-Changed-From-To: open->feedback
State-Changed-By: hannken@NetBSD.org
State-Changed-When: Tue, 11 Jun 2019 09:17:27 +0000
State-Changed-Why:
Committed fixes.

Please try again with:

external/cddl/osnet/dist/uts/common/fs/zfs/vdev_disk.c 1.10
external/cddl/osnet/dist/uts/common/fs/zfs/sys/vdev_disk.h 1.4
external/cddl/osnet/sys/kern/taskq.c 1.10


From: Jarle Greipsland <jarle.greipsland@norid.no>
To: gnats-bugs@netbsd.org, hannken@NetBSD.org
Cc: 
Subject: Re: port-xen/54273 ("zpool create pool xbd2" panics DOMU kernel)
Date: Wed, 12 Jun 2019 08:21:37 +0200 (CEST)

 hannken@NetBSD.org writes:
 > Synopsis: "zpool create pool xbd2" panics DOMU kernel
 [ ... ]
 > Please try again with:
 > 
 > external/cddl/osnet/dist/uts/common/fs/zfs/vdev_disk.c 1.10
 > external/cddl/osnet/dist/uts/common/fs/zfs/sys/vdev_disk.h 1.4
 > external/cddl/osnet/sys/kern/taskq.c 1.10

 The zpool I/O problems seem to be gone.  Thank you!

 However, zfs doesn't seem to be fully stable.  I tried the following:

 vm-2129# zpool list
 no pools available
 vm-2129# ls /etc/zfs/
 vm-2129# zpool create pool xbd2
 vm-2129# zpool list
 NAME   SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
 pool  31.8G   520K  31.7G         -     0%     0%  1.00x  ONLINE  -
 vm-2129# zfs create pool/test
 vm-2129# dd if=/dev/zero of=/pool/test/f1 bs=1m count=32
 32+0 records in
 32+0 records out
 33554432 bytes transferred in 0.052 secs (645277538 bytes/sec)
 vm-2129# ls -l /pool/test/
 total 32776
 -rw-r--r--  1 root  wheel  33554432 Jun 12 08:13 f1
 vm-2129# zfs snapshot pool/test@secondsago
 vm-2129# touch /pool/test/f2
 vm-2129# ls /pool/test/
 f1 f2
 vm-2129# zfs rollback pool/test@secondsago
 vm-2129# ls /pool/test
 f1
 vm-2129# touch /pool/test/f2

 and here the system panicked.

 Console log:
 [ 33217.0900893] panic: kernel diagnostic assertion "error == ENOENT" failed: file "/build/current/src/sys/kern/vfs_vnode.c", line 1422 
 [ 33217.0900893] cpu0: Begin traceback...
 [ 33217.0900893] vpanic() at netbsd:vpanic+0x143
 [ 33217.0900893] kern_assert() at netbsd:kern_assert+0x48
 [ 33217.1000454] vcache_new() at netbsd:vcache_new+0x35a
 [ 33217.1000454] zfs_mknode() at zfs:zfs_mknode+0x40
 [ 33217.1000454] zfs_create.isra.17() at zfs:zfs_create.isra.17+0x37d
 [ 33217.1000454] zfs_netbsd_create() at zfs:zfs_netbsd_create+0xed
 [ 33217.1000454] VOP_CREATE() at netbsd:VOP_CREATE+0x3d
 [ 33217.1000454] vn_open() at netbsd:vn_open+0x29b
 [ 33217.1000454] do_open() at netbsd:do_open+0x103
 [ 33217.1000454] do_sys_openat() at netbsd:do_sys_openat+0x8b
 [ 33217.1000454] sys_open() at netbsd:sys_open+0x24
 [ 33217.1000454] syscall() at netbsd:syscall+0x9c
 [ 33217.1000454] --- syscall (number 5) ---
 [ 33217.1000454] 71bc530429fa:
 [ 33217.1000454] cpu0: End traceback...

 [ 33217.1000454] dumping to dev 142,17 (offset=4194303, size=0): not possible
 [ 33217.1000454] rebooting...

 This does not look related to the zpool I/O problems.

 					-jarle

From: "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: port-xen/54273 ("zpool create pool xbd2" panics DOMU kernel)
Date: Thu, 13 Jun 2019 12:08:00 +0200

 > The zpool I/O problems seem to be gone.  Thank you!
 > 
 > However, zfs doesn't seem to be fully stable.  I tried the following:

 <snip>

 > This does not look related to the zpool I/O problems.

 So this PR may be closed.

 Live recv and rollback are known not to work yet.
 Please file a new PR in category kern.

 --
 J. Hannken-Illjes - hannken@eis.cs.tu-bs.de - TU Braunschweig (Germany)

From: "Juergen Hannken-Illjes" <hannken@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/54273 CVS commit: src/external/cddl/osnet/dist/uts/common/fs/zfs
Date: Mon, 17 Jun 2019 08:09:58 +0000

 Module Name:	src
 Committed By:	hannken
 Date:		Mon Jun 17 08:09:57 UTC 2019

 Modified Files:
 	src/external/cddl/osnet/dist/uts/common/fs/zfs: zfs_vfsops.c

 Log Message:
 Add native vfs_suspend()/vfs_resume() before and after
 zfs_suspend_fs()/zfs_resume_fs() and get rid of dead "z_sa_hdl == NULL"
 znodes before vfs_resume() to keep the vnode cache consistent.

 Live rollback should work now.

 PR port-xen/54273 ("zpool create pool xbd2" panics DOMU kernel)


 To generate a diff of this commit:
 cvs rdiff -u -r1.23 -r1.24 \
     src/external/cddl/osnet/dist/uts/common/fs/zfs/zfs_vfsops.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.
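
 (A minimal ordering sketch of what the log message above describes.
 All the calls are reduced to hypothetical printf stand-ins so the
 example compiles standalone; the real functions live in the kernel
 and in zfs_vfsops.c and their signatures are not reproduced here.)

 /* rollback_order_sketch.c -- illustrative only */
 #include <stdio.h>

 /* Hypothetical stand-ins, not the real kernel/ZFS functions. */
 static void vfs_suspend_stub(void)    { printf("vfs_suspend()\n"); }
 static void zfs_suspend_fs_stub(void) { printf("zfs_suspend_fs()\n"); }
 static void do_rollback_stub(void)    { printf("rollback dataset\n"); }
 static void zfs_resume_fs_stub(void)  { printf("zfs_resume_fs()\n"); }
 static void purge_dead_znodes_stub(void)
 {
 	printf("reclaim znodes with z_sa_hdl == NULL\n");
 }
 static void vfs_resume_stub(void)     { printf("vfs_resume()\n"); }

 int
 main(void)
 {
 	/*
 	 * Ordering from the commit message: bracket the ZFS-level
 	 * suspend/resume with the native one, and drop dead znodes
 	 * before vfs_resume() so the vnode cache stays consistent.
 	 */
 	vfs_suspend_stub();
 	zfs_suspend_fs_stub();
 	do_rollback_stub();
 	zfs_resume_fs_stub();
 	purge_dead_znodes_stub();
 	vfs_resume_stub();
 	return 0;
 }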

State-Changed-From-To: feedback->closed
State-Changed-By: hannken@NetBSD.org
State-Changed-When: Mon, 17 Jun 2019 08:12:31 +0000
State-Changed-Why:
All issues resolved -- thanks for the report.


>Unformatted:
