NetBSD Problem Report #48849

From www@NetBSD.org  Thu May 29 17:53:42 2014
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
	(using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits))
	(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (verified OK))
	by mollari.NetBSD.org (Postfix) with ESMTPS id DA41CA6515
	for <gnats-bugs@gnats.NetBSD.org>; Thu, 29 May 2014 17:53:41 +0000 (UTC)
Message-Id: <20140529175340.5CF55A651E@mollari.NetBSD.org>
Date: Thu, 29 May 2014 17:53:40 +0000 (UTC)
From: prlw1@cam.ac.uk
Reply-To: prlw1@cam.ac.uk
To: gnats-bugs@NetBSD.org
Subject: root mirror raid fails on shutdown
X-Send-Pr-Version: www-1.0

>Number:         48849
>Category:       kern
>Synopsis:       root mirror raid fails on shutdown
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    hannken
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu May 29 17:55:00 +0000 2014
>Closed-Date:    Mon Jun 16 08:07:44 +0000 2014
>Last-Modified:  Mon Jun 16 08:07:44 +0000 2014
>Originator:     Patrick Welche
>Release:        -current/amd64 6.99.43
>Organization:
>Environment:
-current/amd64 6.99.43
>Description:
The bootable root partition is a raidframe mirror made from dk wedge components, i.e., the disks were set up with gpt rather than disklabel.

The RAID set appears to work perfectly, but on shutdown a component is marked as failed with an "IO Error". On the next boot, rebuilding the failed component in place always succeeds, and the set appears to work correctly again until shutdown, when the component is once again marked as failed.

wd0,1
          64    14680192      1  GPT part - NetBSD RAIDFrame component
raid
        :dt=RAID:se#512:ns#128:nt#8:sc#1024:nc#14336:\
        :pa#2097152:oa#0:ta=4.2BSD:ba#0:fa#0:\
        :pc#14680064:oc#0:\
        :pd#14680064:od#0:\
        :pe#12582912:oe#2097152:te=4.2BSD:be#0:fe#0:

Only the root partition exhibits the problem:
   sectPerSU: 128, SUsPerPU: 1, SUsPerRU: 1
   Queue size: 100, blocksize: 512, numBlocks: 14680064
   RAID Level: 1
   Autoconfig: Yes
   Root partition: Force

The other raid partitions are fine.

>How-To-Repeat:
Boot a -current/amd64 box with root on raid 1 built from wedges.

Note: I have also seen this on the other box with wedge components (e.g. /dev/dk0), but not on -current/amd64 boxes with ordinary disklabel components (e.g. /dev/wd0e).
>Fix:

>Release-Note:

>Audit-Trail:
From: christos@zoulas.com (Christos Zoulas)
To: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org, 
	gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Cc: 
Subject: Re: kern/48849: root mirror raid fails on shutdown
Date: Thu, 29 May 2014 13:58:24 -0400

 On May 29,  5:55pm, prlw1@cam.ac.uk (prlw1@cam.ac.uk) wrote:
 -- Subject: kern/48849: root mirror raid fails on shutdown

 | The bootable root partition is a raidframe mirror made from dk wedge components, i.e., the disks were set up with gpt rather than disklabel.
 | 
 | The RAID set appears to work perfectly, but on shutdown a component is marked as failed with an "IO Error". On the next boot, rebuilding the failed component in place always succeeds, and the set appears to work correctly again until shutdown, when the component is once again marked as failed.
 | 

 Do you have the messages of a DEBUG/DIAGNOSTIC kernel on shutdown?

 christos

From: Patrick Welche <prlw1@cam.ac.uk>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
	netbsd-bugs@netbsd.org
Subject: Re: kern/48849: root mirror raid fails on shutdown
Date: Thu, 29 May 2014 19:18:56 +0100

 Root is on raid7a, made from /dev/dk0 and /dev/dk7
 ...
 unmounting 0xfffffe811e883008 / (/dev/raid7a)...
 ...
 raid5: detached
 dk13 at wd1 (df25cd9b-6326-11e3-8f70-10bf48bd3389) deleted
 dk13: detached
 dk12 at wd1 (df25cd92-6326-11e3-8f70-10bf48bd3389) deleted
 dk12: detached
 dk10 at wd1 (df25cd7f-6326-11e3-8f70-10bf48bd3389) deleted
 dk10: detached
 dk9 at wd1 (df25cd76-6326-11e3-8f70-10bf48bd3389) deleted
 dk9: detached
 dk8 at wd1 (df25cd6b-6326-11e3-8f70-10bf48bd3389) deleted
 dk8: detached
 dk6 at wd0 (80706d9e-e1f8-11e3-9080-10bf48bd3389) deleted
 dk6: detached
 dk5 at wd0 (80706d9b-e1f8-11e3-9080-10bf48bd3389) deleted
 dk5: detached
 dk3 at wd0 (80706d94-e1f8-11e3-9080-10bf48bd3389) deleted
 dk3: detached
 dk2 at wd0 (80706d90-e1f8-11e3-9080-10bf48bd3389) deleted
 dk2: detached
 dk1 at wd0 (80706d8c-e1f8-11e3-9080-10bf48bd3389) deleted
 dk1: detached
 unmounting 0xfffffe811b89b008 /home (/dev/cgd0a)...
 unmounting 0xfffffe811e883008 / (/dev/raid7a)...
 forcefully unmounting /home (/dev/cgd0a)...
 unmounting 0xfffffe811e883008 / (/dev/raid7a)...
 cgd0: detached
 raid4: detached
 raid4: detached
 dk11 at wd1 (df25cd88-6326-11e3-8f70-10bf48bd3389) deleted
 dk11: detached
 dk4 at wd0 (80706d97-e1f8-11e3-9080-10bf48bd3389) deleted
 dk4: detached
 unmounting 0xfffffe811e883008 / (/dev/raid7a)...
 forcefully unmounting / (/dev/raid7a)...
 raid7: IO Error.  Marking /dev/dk0 as failed.
 dk0 at wd0 (80706d87-e1f8-11e3-9080-10bf48bd3389) deleted
 wd0: detached
 atabus4: detached
 raid7: detached
 raid7: detached
 dk7 at wd1 (df25cd61-6326-11e3-8f70-10bf48bd3389) deleted
 dk7: detached
 wd1: detached

From: christos@zoulas.com (Christos Zoulas)
To: Patrick Welche <prlw1@cam.ac.uk>, gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, 
	netbsd-bugs@netbsd.org
Subject: Re: kern/48849: root mirror raid fails on shutdown
Date: Thu, 29 May 2014 15:46:17 -0400

 On May 29,  7:18pm, prlw1@cam.ac.uk (Patrick Welche) wrote:
 -- Subject: Re: kern/48849: root mirror raid fails on shutdown

 | Root is on raid7a, made from /dev/dk0 and /dev/dk7
 | ...
 | unmounting 0xfffffe811e883008 / (/dev/raid7a)...

 These:

 unmounting 0xfffffe811b89b008 /home (/dev/cgd0a)...
 unmounting 0xfffffe811e883008 / (/dev/raid7a)...
 forcefully unmounting /home (/dev/cgd0a)...
 unmounting 0xfffffe811e883008 / (/dev/raid7a)...
 cgd0: detached
 raid4: detached
 raid4: detached
 dk11 at wd1 (df25cd88-6326-11e3-8f70-10bf48bd3389) deleted
 dk11: detached
 dk4 at wd0 (80706d97-e1f8-11e3-9080-10bf48bd3389) deleted
 dk4: detached
 unmounting 0xfffffe811e883008 / (/dev/raid7a)...
 forcefully unmounting / (/dev/raid7a)...
 raid7: IO Error.  Marking /dev/dk0 as failed.

 I don't like these forceful unmounts. We should figure out why they happen.

 christos

From: "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/48849: root mirror raid fails on shutdown
Date: Fri, 30 May 2014 12:26:09 +0200

 On 29 May 2014, at 21:50, Christos Zoulas <christos@zoulas.com> wrote:
 <snip>
 >
 > unmounting 0xfffffe811e883008 / (/dev/raid7a)...
 > cgd0: detached
 > raid4: detached
 > raid4: detached
 > dk11 at wd1 (df25cd88-6326-11e3-8f70-10bf48bd3389) deleted
 > dk11: detached
 > dk4 at wd0 (80706d97-e1f8-11e3-9080-10bf48bd3389) deleted
 > dk4: detached
 > unmounting 0xfffffe811e883008 / (/dev/raid7a)...
 > forcefully unmounting / (/dev/raid7a)...
 > raid7: IO Error.  Marking /dev/dk0 as failed.
 >
 > I don't like these forceful unmounts. We should figure out why
 > they happen.

 Some files and directories are still open during shutdown. While
 running shutdown, for example:

 /lib/libc.so.12.190
 /lib/libcrypt.so.1.0
 /lib/libgcc_s.so.1.0
 /lib/libutil.so.7.21
 /libexec/ld.elf_so
 /root
 /sbin/halt
 /sbin/init

 Looks like the device node of a component of raid7 gets closed.
 Please try:

 RCS file: /cvsroot/src/sys/kern/vfs_vnode.c,v
 diff -p -u -2 -r1.36 vfs_vnode.c
 --- vfs_vnode.c 8 May 2014 08:21:53 -0000       1.36
 +++ vfs_vnode.c 30 May 2014 10:23:26 -0000
 @@ -979,6 +979,5 @@ vclean(vnode_t *vp)
  
         active = (vp->v_usecount > 1);
 -       doclose = ! (active && vp->v_type == VBLK &&
 -           spec_node_getmountedfs(vp) != NULL);
 +       doclose = ! (active && vp->v_type == VBLK);
         mutex_exit(vp->v_interlock);

 --
 J. Hannken-Illjes - hannken@eis.cs.tu-bs.de - TU Braunschweig (Germany)
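
The effect of the experimental vfs_vnode.c diff above can be illustrated with a small userland model. This is only a sketch: the struct, the enum and the `model_` names are hypothetical stand-ins rather than kernel types, and `has_mounted_fs` merely stands in for `spec_node_getmountedfs(vp) != NULL`. It assumes a RAID component such as /dev/dk0 is an in-use block device with no file system mounted directly on it. Note that this diff was a test patch and is not the fix that was eventually committed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userland stand-ins; none of these are the kernel's types. */
enum model_vtype { MODEL_VREG, MODEL_VBLK };

struct model_vnode {
	enum model_vtype v_type;	/* mirrors vp->v_type */
	int v_usecount;			/* mirrors vp->v_usecount */
	bool has_mounted_fs;		/* stands in for spec_node_getmountedfs(vp) != NULL */
};

/* Predicate as it reads in the lines removed by the diff. */
bool
model_doclose_before(const struct model_vnode *vp)
{
	assert(vp != NULL);
	bool active = (vp->v_usecount > 1);
	return !(active && vp->v_type == MODEL_VBLK && vp->has_mounted_fs);
}

/* Predicate as it reads in the line added by the diff. */
bool
model_doclose_after(const struct model_vnode *vp)
{
	assert(vp != NULL);
	bool active = (vp->v_usecount > 1);
	return !(active && vp->v_type == MODEL_VBLK);
}
```

Under these assumptions, an active block device without a mounted file system is closed by the old predicate and left open by the new one, while an active block device with a mounted file system is left open by both.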

From: Patrick Welche <prlw1@cam.ac.uk>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
	netbsd-bugs@netbsd.org
Subject: Re: kern/48849: root mirror raid fails on shutdown
Date: Fri, 30 May 2014 19:33:11 +0100

 With your patch, I no longer see any forceful unmounts; however, the
 computer doesn't shut down either. I sprinkled a few printfs in dounmount(),
 and see:

 Entering dounmount /var (/dev/raid7e)...
 VFS_SYNC(/var) = 0
 VFS_UNMOUNT(/var) = 0 
 called vfs_hooks_unmount(/var)
 Successfully exiting dounmount /var (/dev/raid7e)...
 unmounting 0xfffffe811e883008 / (/dev/raid7a)...
 Entering dounmount / (/dev/raid7a)...
 VFS_SYNC(/) = 0
 VFS_UNMOUNT(/) = 16 
 unmounting 0xfffffe811e883008 / (/dev/raid7a)...
 Entering dounmount / (/dev/raid7a)...
 VFS_SYNC(/) = 0
 VFS_UNMOUNT(/) = 16 
 cd0: detached
 atapibus0: detached
 [other detached snipped]
 raid6: detached
 raid5: detached
 raid5: detached
 -- and this is where we hang

 Note there is no "exiting dounmount" message for raid7a.

 Breaking into ddb at this point:

 Stopped in pid 0.5 (system) at  netbsd:breakpoint+0x5:  leave
 db{0}> bt
 breakpoint() at netbsd:breakpoint+0x5
 comintr() at netbsd:comintr+0x529
 Xintr_ioapic_edge1() at netbsd:Xintr_ioapic_edge1+0xea
 --- interrupt ---
 _kernel_lock() at netbsd:_kernel_lock+0x165
 intr_biglock_wrapper() at netbsd:intr_biglock_wrapper+0x12
 Xintr_ioapic_level3() at netbsd:Xintr_ioapic_level3+0xf2
 --- interrupt ---
 _kernel_lock() at netbsd:_kernel_lock+0x165
 frag6_fasttimo() at netbsd:frag6_fasttimo+0x1a
 pffasttimo() at netbsd:pffasttimo+0x31
 callout_softclock() at netbsd:callout_softclock+0x1d0
 softint_dispatch() at netbsd:softint_dispatch+0xd3
 DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xfffffe80cd845ff0
 Xsoftintr() at netbsd:Xsoftintr+0x4f
 --- interrupt ---
 0:
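
The value 16 printed for VFS_UNMOUNT(/) in the trace above is presumably an errno value. On NetBSD, errno 16 is EBUSY, which would be consistent with the root file system still having open files. The helper below is a hypothetical userland sketch (the name `model_errno_name` is not from the report) that names the errno values appearing in this thread and falls back to strerror(3) for anything else.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical helper: name the errno values that appear in this report. */
const char *
model_errno_name(int error)
{
	assert(error >= 0);
	switch (error) {
	case 0:
		return "success";
	case EBUSY:	/* returned by the non-forced unmount of / */
		return "EBUSY";
	case ENOTBLK:	/* returned by dk_lookup() for non-block devices */
		return "ENOTBLK";
	case ESRCH:	/* returned by dk_lookup() when no lwp is given */
		return "ESRCH";
	default:
		return strerror(error);
	}
}
```

Calling `model_errno_name(16)` on a system where EBUSY is 16 yields the string "EBUSY".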

From: "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/48849: root mirror raid fails on shutdown
Date: Wed, 4 Jun 2014 11:18:40 +0200

 --Apple-Mail=_75D36A34-10A6-4B2A-A8E7-2501FAD95608
 Content-Transfer-Encoding: 7bit
 Content-Type: text/plain;
 	charset=us-ascii

 Please try the attached diff.  It will:

  Change dk_lookup() to return an anonymous vnode not associated with
  any file system.  Change all consumers of dk_lookup() to get the
  device from "v_rdev" instead of VOP_GETATTR() as specfs does not
  support VOP_GETATTR().  Devices obtained with dk_lookup() will no
  longer disappear on forced unmounts.

 Please make sure you have a full backup as this diff changes
 ccd, cgd, dm/lvm and raid, not all covered by anita tests.
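
As a rough userland analogue of the check described above, the sketch below rejects anything that is not a block device with ENOTBLK and otherwise hands back the device number. It is an illustration only: stat(2) plays the role of the attribute lookup, `model_block_rdev` is a hypothetical name, and nothing here models the anonymous vnode that the diff obtains inside the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/stat.h>

/*
 * Hypothetical userland analogue: return 0 and store the device number
 * if path names a block device, ENOTBLK if it names something else, or
 * the errno from stat(2) if the lookup itself fails.
 */
int
model_block_rdev(const char *path, dev_t *rdev)
{
	struct stat st;

	assert(path != NULL && rdev != NULL);
	if (stat(path, &st) == -1)
		return errno;
	if (!S_ISBLK(st.st_mode))
		return ENOTBLK;
	*rdev = st.st_rdev;
	return 0;
}
```

For a directory such as "/" the function returns ENOTBLK, and for a path that does not exist it returns ENOENT; a wedge like /dev/dk0 would be expected to return 0 on a system where that node exists.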

 --
 J. Hannken-Illjes - hannken@eis.cs.tu-bs.de - TU Braunschweig (Germany)


 --Apple-Mail=_75D36A34-10A6-4B2A-A8E7-2501FAD95608
 Content-Disposition: attachment;
 	filename=patch.diff
 Content-Type: application/octet-stream;
 	name="patch.diff"
 Content-Transfer-Encoding: 7bit

 Index: sys/dev/ccd.c
 ===================================================================
 RCS file: /cvsroot/src/sys/dev/ccd.c,v
 retrieving revision 1.148
 diff -p -u -4 -r1.148 ccd.c
 --- sys/dev/ccd.c	6 Apr 2014 00:56:39 -0000	1.148
 +++ sys/dev/ccd.c	4 Jun 2014 08:39:37 -0000
 @@ -120,8 +120,10 @@ __KERNEL_RCSID(0, "$NetBSD: ccd.c,v 1.14

  #include <dev/ccdvar.h>
  #include <dev/dkvar.h>

 +#include <miscfs/specfs/specdev.h> /* for v_rdev */
 +
  #if defined(CCDDEBUG) && !defined(DEBUG)
  #define DEBUG
  #endif

 @@ -291,9 +293,8 @@ ccdinit(struct ccd_softc *cs, char **cpa
      struct lwp *l)
  {
  	struct ccdcinfo *ci = NULL;
  	int ix;
 -	struct vattr va;
  	struct ccdgeom *ccg = &cs->sc_geom;
  	char *tmppath;
  	int error, path_alloced;
  	uint64_t psize, minsize;
 @@ -343,21 +344,9 @@ ccdinit(struct ccd_softc *cs, char **cpa

  		/*
  		 * XXX: Cache the component's dev_t.
  		 */
 -		vn_lock(vpp[ix], LK_SHARED | LK_RETRY);
 -		error = VOP_GETATTR(vpp[ix], &va, l->l_cred);
 -		VOP_UNLOCK(vpp[ix]);
 -		if (error != 0) {
 -#ifdef DEBUG
 -			if (ccddebug & (CCDB_FOLLOW|CCDB_INIT))
 -				printf("%s: %s: getattr failed %s = %d\n",
 -				    cs->sc_xname, ci->ci_path,
 -				    "error", error);
 -#endif
 -			goto out;
 -		}
 -		ci->ci_dev = va.va_rdev;
 +		ci->ci_dev = vpp[ix]->v_rdev;

  		/*
  		 * Get partition information for the component.
  		 */
 Index: sys/dev/cgd.c
 ===================================================================
 RCS file: /cvsroot/src/sys/dev/cgd.c,v
 retrieving revision 1.87
 diff -p -u -4 -r1.87 cgd.c
 --- sys/dev/cgd.c	25 May 2014 19:23:49 -0000	1.87
 +++ sys/dev/cgd.c	4 Jun 2014 08:39:37 -0000
 @@ -54,8 +54,10 @@ __KERNEL_RCSID(0, "$NetBSD: cgd.c,v 1.87

  #include <dev/dkvar.h>
  #include <dev/cgdvar.h>

 +#include <miscfs/specfs/specdev.h> /* for v_rdev */
 +
  /* Entry Point Functions */

  void	cgdattach(int);

 @@ -808,9 +810,8 @@ static int
  cgdinit(struct cgd_softc *cs, const char *cpath, struct vnode *vp,
  	struct lwp *l)
  {
  	struct	disk_geom *dg;
 -	struct	vattr va;
  	int	ret;
  	char	*tmppath;
  	uint64_t psize;
  	unsigned secsize;
 @@ -825,15 +826,9 @@ cgdinit(struct cgd_softc *cs, const char
  		goto bail;
  	cs->sc_tpath = malloc(cs->sc_tpathlen, M_DEVBUF, M_WAITOK);
  	memcpy(cs->sc_tpath, tmppath, cs->sc_tpathlen);

 -	vn_lock(vp, LK_SHARED | LK_RETRY);
 -	ret = VOP_GETATTR(vp, &va, l->l_cred);
 -	VOP_UNLOCK(vp);
 -	if (ret != 0)
 -		goto bail;
 -
 -	cs->sc_tdev = va.va_rdev;
 +	cs->sc_tdev = vp->v_rdev;

  	if ((ret = getdisksize(vp, &psize, &secsize)) != 0)
  		goto bail;

 Index: sys/dev/dksubr.c
 ===================================================================
 RCS file: /cvsroot/src/sys/dev/dksubr.c,v
 retrieving revision 1.50
 diff -p -u -4 -r1.50 dksubr.c
 --- sys/dev/dksubr.c	25 May 2014 19:23:49 -0000	1.50
 +++ sys/dev/dksubr.c	4 Jun 2014 08:39:37 -0000
 @@ -47,8 +47,9 @@ __KERNEL_RCSID(0, "$NetBSD: dksubr.c,v 1
  #include <sys/namei.h>
  #include <sys/module.h>

  #include <dev/dkvar.h>
 +#include <miscfs/specfs/specdev.h> /* for v_rdev */

  int	dkdebug = 0;

  #ifdef DEBUG
 @@ -620,9 +621,8 @@ int
  dk_lookup(struct pathbuf *pb, struct lwp *l, struct vnode **vpp)
  {
  	struct nameidata nd;
  	struct vnode *vp;
 -	struct vattr va;
  	int     error;

  	if (l == NULL)
  		return ESRCH;	/* Is ESRCH the best choice? */
 @@ -634,24 +634,31 @@ dk_lookup(struct pathbuf *pb, struct lwp
  		return error;
  	}

  	vp = nd.ni_vp;
 -	if ((error = VOP_GETATTR(vp, &va, l->l_cred)) != 0) {
 -		DPRINTF((DKDB_FOLLOW|DKDB_INIT),
 -		    ("dk_lookup: getattr error = %d\n", error));
 +	if (vp->v_type != VBLK) {
 +		error = ENOTBLK;
  		goto out;
  	}

 -	/* XXX: eventually we should handle VREG, too. */
 -	if (va.va_type != VBLK) {
 -		error = ENOTBLK;
 +	/* Reopen as anonymous vnode to protect against forced unmount. */
 +	if ((error = bdevvp(vp->v_rdev, vpp)) != 0)
  		goto out;
 +	VOP_UNLOCK(vp);
 +	if ((error = vn_close(vp, FREAD | FWRITE, l->l_cred)) != 0) {
 +		vrele(*vpp);
 +		return error;
 +	}
 +	if ((error = VOP_OPEN(*vpp, FREAD | FWRITE, l->l_cred)) != 0) {
 +		vrele(*vpp);
 +		return error;
  	}
 +	mutex_enter((*vpp)->v_interlock);
 +	(*vpp)->v_writecount++;
 +	mutex_exit((*vpp)->v_interlock);

 -	IFDEBUG(DKDB_VNODE, vprint("dk_lookup: vnode info", vp));
 +	IFDEBUG(DKDB_VNODE, vprint("dk_lookup: vnode info", *vpp));

 -	VOP_UNLOCK(vp);
 -	*vpp = vp;
  	return 0;
  out:
  	VOP_UNLOCK(vp);
  	(void) vn_close(vp, FREAD | FWRITE, l->l_cred);
 Index: sys/dev/dm/dm.h
 ===================================================================
 RCS file: /cvsroot/src/sys/dev/dm/dm.h,v
 retrieving revision 1.25
 diff -p -u -4 -r1.25 dm.h
 --- sys/dev/dm/dm.h	9 Dec 2013 09:35:16 -0000	1.25
 +++ sys/dev/dm/dm.h	4 Jun 2014 08:39:38 -0000
 @@ -48,8 +48,10 @@
  #include <sys/device.h>
  #include <sys/disk.h>
  #include <sys/disklabel.h>

 +#include <miscfs/specfs/specdev.h> /* for v_rdev */
 +
  #include <prop/proplib.h>

  #define DM_MAX_TYPE_NAME 16
  #define DM_NAME_LEN 128
 Index: sys/dev/dm/dm_target_linear.c
 ===================================================================
 RCS file: /cvsroot/src/sys/dev/dm/dm_target_linear.c,v
 retrieving revision 1.13
 diff -p -u -4 -r1.13 dm_target_linear.c
 --- sys/dev/dm/dm_target_linear.c	14 Oct 2011 09:23:30 -0000	1.13
 +++ sys/dev/dm/dm_target_linear.c	4 Jun 2014 08:39:38 -0000
 @@ -191,24 +191,16 @@ dm_target_linear_destroy(dm_table_entry_
  int
  dm_target_linear_deps(dm_table_entry_t * table_en, prop_array_t prop_array)
  {
  	dm_target_linear_config_t *tlc;
 -	struct vattr va;
 -
 -	int error;

  	if (table_en->target_config == NULL)
  		return ENOENT;

  	tlc = table_en->target_config;

 -	vn_lock(tlc->pdev->pdev_vnode, LK_SHARED | LK_RETRY);
 -	error = VOP_GETATTR(tlc->pdev->pdev_vnode, &va, curlwp->l_cred);
 -	VOP_UNLOCK(tlc->pdev->pdev_vnode);
 -	if (error != 0)
 -		return error;
 -
 -	prop_array_add_uint64(prop_array, (uint64_t) va.va_rdev);
 +	prop_array_add_uint64(prop_array,
 +	    (uint64_t) tlc->pdev->pdev_vnode->v_rdev);

  	return 0;
  }
  /*
 Index: sys/dev/dm/dm_target_snapshot.c
 ===================================================================
 RCS file: /cvsroot/src/sys/dev/dm/dm_target_snapshot.c,v
 retrieving revision 1.15
 diff -p -u -4 -r1.15 dm_target_snapshot.c
 --- sys/dev/dm/dm_target_snapshot.c	14 Oct 2011 09:23:30 -0000	1.15
 +++ sys/dev/dm/dm_target_snapshot.c	4 Jun 2014 08:39:38 -0000
 @@ -347,35 +347,20 @@ int
  dm_target_snapshot_deps(dm_table_entry_t * table_en,
      prop_array_t prop_array)
  {
  	dm_target_snapshot_config_t *tsc;
 -	struct vattr va;
 -
 -	int error;

  	if (table_en->target_config == NULL)
  		return 0;

  	tsc = table_en->target_config;

 -	vn_lock(tsc->tsc_snap_dev->pdev_vnode, LK_SHARED | LK_RETRY);
 -	error = VOP_GETATTR(tsc->tsc_snap_dev->pdev_vnode, &va, curlwp->l_cred);
 -	VOP_UNLOCK(tsc->tsc_snap_dev->pdev_vnode);
 -	if (error != 0)
 -		return error;
 -
 -	prop_array_add_uint64(prop_array, (uint64_t) va.va_rdev);
 +	prop_array_add_uint64(prop_array,
 +	    (uint64_t) tsc->tsc_snap_dev->pdev_vnode->v_rdev);

  	if (tsc->tsc_persistent_dev) {
 -
 -		vn_lock(tsc->tsc_cow_dev->pdev_vnode, LK_SHARED | LK_RETRY);
 -		error = VOP_GETATTR(tsc->tsc_cow_dev->pdev_vnode, &va,
 -		    curlwp->l_cred);
 -		VOP_UNLOCK(tsc->tsc_cow_dev->pdev_vnode);
 -		if (error != 0)
 -			return error;
 -
 -		prop_array_add_uint64(prop_array, (uint64_t) va.va_rdev);
 +		prop_array_add_uint64(prop_array,
 +		    (uint64_t) tsc->tsc_cow_dev->pdev_vnode->v_rdev);

  	}
  	return 0;
  }
 Index: sys/dev/dm/dm_target_stripe.c
 ===================================================================
 RCS file: /cvsroot/src/sys/dev/dm/dm_target_stripe.c,v
 retrieving revision 1.18
 diff -p -u -4 -r1.18 dm_target_stripe.c
 --- sys/dev/dm/dm_target_stripe.c	7 Aug 2012 16:11:11 -0000	1.18
 +++ sys/dev/dm/dm_target_stripe.c	4 Jun 2014 08:39:38 -0000
 @@ -318,25 +318,17 @@ int
  dm_target_stripe_deps(dm_table_entry_t * table_en, prop_array_t prop_array)
  {
  	dm_target_stripe_config_t *tsc;
  	dm_target_linear_config_t *tlc;
 -	struct vattr va;
 -
 -	int error;

  	if (table_en->target_config == NULL)
  		return ENOENT;

  	tsc = table_en->target_config;

  	TAILQ_FOREACH(tlc, &tsc->stripe_devs, entries) {
 -		vn_lock(tlc->pdev->pdev_vnode, LK_SHARED | LK_RETRY);
 -		error = VOP_GETATTR(tlc->pdev->pdev_vnode, &va, curlwp->l_cred);
 -		VOP_UNLOCK(tlc->pdev->pdev_vnode);
 -		if (error != 0)
 -			return error;
 -
 -		prop_array_add_uint64(prop_array, (uint64_t) va.va_rdev);
 +		prop_array_add_uint64(prop_array,
 +		    (uint64_t) tlc->pdev->pdev_vnode->v_rdev);
  	}

  	return 0;
  }
 Index: sys/dev/raidframe/rf_copyback.c
 ===================================================================
 RCS file: /cvsroot/src/sys/dev/raidframe/rf_copyback.c,v
 retrieving revision 1.49
 diff -p -u -4 -r1.49 rf_copyback.c
 --- sys/dev/raidframe/rf_copyback.c	14 Oct 2011 09:23:30 -0000	1.49
 +++ sys/dev/raidframe/rf_copyback.c	4 Jun 2014 08:39:39 -0000
 @@ -82,8 +82,10 @@ rf_ConfigureCopyback(RF_ShutdownList_t *
  #include <sys/fcntl.h>
  #include <sys/vnode.h>
  #include <sys/namei.h> /* for pathbuf */

 +#include <miscfs/specfs/specdev.h> /* for v_rdev */
 +
  /* do a complete copyback */
  void
  rf_CopybackReconstructedData(RF_Raid_t *raidPtr)
  {
 @@ -95,9 +97,8 @@ rf_CopybackReconstructedData(RF_Raid_t *
  	char   *databuf;

  	struct pathbuf *dev_pb;
  	struct vnode *vp;
 -	struct vattr va;

  	int ac;

  	fcol = 0;
 @@ -159,22 +160,17 @@ rf_CopybackReconstructedData(RF_Raid_t *

  		/* Ok, so we can at least do a lookup... How about actually
  		 * getting a vp for it? */

 -		vn_lock(vp, LK_SHARED | LK_RETRY);
 -		retcode = VOP_GETATTR(vp, &va, curlwp->l_cred);
 -		VOP_UNLOCK(vp);
 -		if (retcode != 0)
 -			return;
  		retcode = rf_getdisksize(vp, &raidPtr->Disks[fcol]);
  		if (retcode) {
  			return;
  		}

  		raidPtr->raid_cinfo[fcol].ci_vp = vp;
 -		raidPtr->raid_cinfo[fcol].ci_dev = va.va_rdev;
 +		raidPtr->raid_cinfo[fcol].ci_dev = vp->v_rdev;

 -		raidPtr->Disks[fcol].dev = va.va_rdev;	/* XXX or the above? */
 +		raidPtr->Disks[fcol].dev = vp->v_rdev;	/* XXX or the above? */

  		/* we allow the user to specify that only a fraction of the
  		 * disks should be used this is just for debug:  it speeds up
  		 * the parity scan */
 Index: sys/dev/raidframe/rf_disks.c
 ===================================================================
 RCS file: /cvsroot/src/sys/dev/raidframe/rf_disks.c,v
 retrieving revision 1.85
 diff -p -u -4 -r1.85 rf_disks.c
 --- sys/dev/raidframe/rf_disks.c	25 Mar 2014 16:19:14 -0000	1.85
 +++ sys/dev/raidframe/rf_disks.c	4 Jun 2014 08:39:39 -0000
 @@ -79,8 +79,9 @@ __KERNEL_RCSID(0, "$NetBSD: rf_disks.c,v
  #include <sys/fcntl.h>
  #include <sys/vnode.h>
  #include <sys/namei.h> /* for pathbuf */
  #include <sys/kauth.h>
 +#include <miscfs/specfs/specdev.h> /* for v_rdev */

  static int rf_AllocDiskStructures(RF_Raid_t *, RF_Config_t *);
  static void rf_print_label_status( RF_Raid_t *, int, char *,
  				  RF_ComponentLabel_t *);
 @@ -575,9 +576,8 @@ rf_ConfigureDisk(RF_Raid_t *raidPtr, cha
  {
  	char   *p;
  	struct pathbuf *pb;
  	struct vnode *vp;
 -	struct vattr va;
  	int     error;

  	p = rf_find_non_white(bf);
  	if (p[strlen(p) - 1] == '\n') {
 @@ -630,20 +630,14 @@ rf_ConfigureDisk(RF_Raid_t *raidPtr, cha
  	if (raidPtr->bytesPerSector == 0)
  		raidPtr->bytesPerSector = diskPtr->blockSize;

  	if (diskPtr->status == rf_ds_optimal) {
 -		vn_lock(vp, LK_SHARED | LK_RETRY);
 -		error = VOP_GETATTR(vp, &va, curlwp->l_cred);
 -		VOP_UNLOCK(vp);
 -		if (error != 0)
 -			return (error);
 -
  		raidPtr->raid_cinfo[col].ci_vp = vp;
 -		raidPtr->raid_cinfo[col].ci_dev = va.va_rdev;
 +		raidPtr->raid_cinfo[col].ci_dev = vp->v_rdev;

  		/* This component was not automatically configured */
  		diskPtr->auto_configured = 0;
 -		diskPtr->dev = va.va_rdev;
 +		diskPtr->dev = vp->v_rdev;

  		/* we allow the user to specify that only a fraction of the
  		 * disks should be used this is just for debug:  it speeds up
  		 * the parity scan */
 Index: sys/dev/raidframe/rf_reconstruct.c
 ===================================================================
 RCS file: /cvsroot/src/sys/dev/raidframe/rf_reconstruct.c,v
 retrieving revision 1.119
 diff -p -u -4 -r1.119 rf_reconstruct.c
 --- sys/dev/raidframe/rf_reconstruct.c	6 Mar 2013 11:38:15 -0000	1.119
 +++ sys/dev/raidframe/rf_reconstruct.c	4 Jun 2014 08:39:39 -0000
 @@ -46,8 +46,10 @@ __KERNEL_RCSID(0, "$NetBSD: rf_reconstru
  #include <sys/vnode.h>
  #include <sys/namei.h> /* for pathbuf */
  #include <dev/raidframe/raidframevar.h>

 +#include <miscfs/specfs/specdev.h> /* for v_rdev */
 +
  #include "rf_raid.h"
  #include "rf_reconutil.h"
  #include "rf_revent.h"
  #include "rf_reconbuffer.h"
 @@ -351,9 +353,8 @@ rf_ReconstructInPlace(RF_Raid_t *raidPtr
  	uint64_t numsec;
  	unsigned int secsize;
  	struct pathbuf *pb;
  	struct vnode *vp;
 -	struct vattr va;
  	int retcode;
  	int ac;

  	rf_lock_mutex2(raidPtr->mutex);
 @@ -455,20 +456,8 @@ rf_ReconstructInPlace(RF_Raid_t *raidPtr

  	/* Ok, so we can at least do a lookup...
  	   How about actually getting a vp for it? */

 -	vn_lock(vp, LK_SHARED | LK_RETRY);
 -	retcode = VOP_GETATTR(vp, &va, curlwp->l_cred);
 -	VOP_UNLOCK(vp);
 -	if (retcode != 0) {
 -		vn_close(vp, FREAD | FWRITE, kauth_cred_get());
 -		rf_lock_mutex2(raidPtr->mutex);
 -		raidPtr->reconInProgress--;
 -		rf_signal_cond2(raidPtr->waitForReconCond);
 -		rf_unlock_mutex2(raidPtr->mutex);
 -		return(retcode);
 -	}
 -
  	retcode = getdisksize(vp, &numsec, &secsize);
  	if (retcode) {
  		vn_close(vp, FREAD | FWRITE, kauth_cred_get());
  		rf_lock_mutex2(raidPtr->mutex);
 @@ -481,11 +470,11 @@ rf_ReconstructInPlace(RF_Raid_t *raidPtr
  	raidPtr->Disks[col].blockSize =	secsize;
  	raidPtr->Disks[col].numBlocks = numsec - rf_protectedSectors;

  	raidPtr->raid_cinfo[col].ci_vp = vp;
 -	raidPtr->raid_cinfo[col].ci_dev = va.va_rdev;
 +	raidPtr->raid_cinfo[col].ci_dev = vp->v_rdev;

 -	raidPtr->Disks[col].dev = va.va_rdev;
 +	raidPtr->Disks[col].dev = vp->v_rdev;

  	/* we allow the user to specify that only a fraction
  	   of the disks should be used this is just for debug:
  	   it speeds up * the parity scan */

 --Apple-Mail=_75D36A34-10A6-4B2A-A8E7-2501FAD95608--

Responsible-Changed-From-To: kern-bug-people->hannken
Responsible-Changed-By: hannken@NetBSD.org
Responsible-Changed-When: Mon, 09 Jun 2014 14:21:57 +0000
Responsible-Changed-Why:
Take.


State-Changed-From-To: open->analyzed
State-Changed-By: hannken@NetBSD.org
State-Changed-When: Mon, 09 Jun 2014 14:21:57 +0000
State-Changed-Why:
Added a diff that should solve this problem.

Please test and report back.


From: Patrick Welche <prlw1@cam.ac.uk>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
	netbsd-bugs@netbsd.org
Subject: Re: kern/48849: root mirror raid fails on shutdown
Date: Thu, 12 Jun 2014 19:03:03 +0100

 Your patch cures the problem: raid7 is no longer marked as failed on shutdown.

 With your patch, all raidframe and cgd devices are successfully found and configured.
 raidctl -R /dev/dk0 raid7   is successful.
 On shutdown:

 unmounting 0xfffffe810e750008 / (/dev/raid7a)...
 Entering dounmount / (/dev/raid7a)...
 VFS_SYNC(/) = 0
 VFS_UNMOUNT(/) = 16 
 unmounting 0xfffffe821cbc8008 /home (/dev/cgd0a)...
 Entering dounmount /home (/dev/cgd0a)...
 VFS_SYNC(/home) = 0
 VFS_UNMOUNT(/home) = 16 
 unmounting 0xfffffe810e750008 / (/dev/raid7a)...
 Entering dounmount / (/dev/raid7a)...
 VFS_SYNC(/) = 0
 VFS_UNMOUNT(/) = 16 
 cd0: detached
 ...
 dk1: detached
 unmounting 0xfffffe821cbc8008 /home (/dev/cgd0a)...
 Entering dounmount /home (/dev/cgd0a)...
 VFS_SYNC(/home) = 0
 VFS_UNMOUNT(/home) = 16 
 unmounting 0xfffffe810e750008 / (/dev/raid7a)...
 Entering dounmount / (/dev/raid7a)...
 VFS_SYNC(/) = 0
 VFS_UNMOUNT(/) = 16 
 forcefully unmounting /home (/dev/cgd0a)...
 Entering dounmount /home (/dev/cgd0a)...
 VFS_SYNC(/home) = 0
 force: tag VT_UFS, ino 367616, on dev 20, 0 flags 0x0, nlink 62
         mode 040755, owner 2171, group 0, size 9216
 VFS_UNMOUNT(/home) = 0 forced
 called vfs_hooks_unmount(/home)
 Successfully exiting dounmount /home (/dev/cgd0a)...
 unmounting 0xfffffe810e750008 / (/dev/raid7a)...
 Entering dounmount / (/dev/raid7a)...
 VFS_SYNC(/) = 0
 VFS_UNMOUNT(/) = 16 
 cgd0: detached
 ...
 dk4: detached
 unmounting 0xfffffe810e750008 / (/dev/raid7a)...
 Entering dounmount / (/dev/raid7a)...
 VFS_SYNC(/) = 0
 VFS_UNMOUNT(/) = 16 
 forcefully unmounting / (/dev/raid7a)...
 Entering dounmount / (/dev/raid7a)...
 VFS_SYNC(/) = 0
 force: tag VT_UFS, ino 2, on dev 18, 112 flags 0x0, nlink 31
         mode 040755, owner 0, group 0, size 1024
 force: tag VT_UFS, ino 107681, on dev 18, 112 flags 0x0, nlink 1
         mode 0100555, owner 0, group 0, size 30818
 force: tag VT_UFS, ino 107611, on dev 18, 112 flags 0x0, nlink 1
         mode 0100555, owner 0, group 0, size 92163
 force: tag VT_UFS, ino 43209, on dev 18, 112 flags 0x0, nlink 1
         mode 0100444, owner 0, group 0, size 105508
 force: tag VT_UFS, ino 43185, on dev 18, 112 flags 0x0, nlink 1
         mode 0100444, owner 0, group 0, size 31017
 force: tag VT_UFS, ino 43053, on dev 18, 112 flags 0x0, nlink 1
         mode 0100444, owner 0, group 0, size 55263
 force: tag VT_UFS, ino 43180, on dev 18, 112 flags 0x0, nlink 1
         mode 0100444, owner 0, group 0, size 1611473
 force: tag VT_UFS, ino 107676, on dev 18, 112 flags 0x0, nlink 3
         mode 0100555, owner 0, group 0, size 13022
 VFS_UNMOUNT(/) = 0 forced
 called vfs_hooks_unmount(/)
 Successfully exiting dounmount / (/dev/raid7a)...
 raid7: detached
 raid7: detached
 dk7 at wd1 (df25cd61-6326-11e3-8f70-10bf48bd3389) deleted
 dk7: detached
 dk0 at wd0 (80706d87-e1f8-11e3-9080-10bf48bd3389) deleted
 dk0: detached
 wd1: detached
 wd0: detached
 atabus5: detached
 atabus4: detached
 acpi0: entering state S5

From: "Juergen Hannken-Illjes" <hannken@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/48849 CVS commit: src/sys
Date: Sat, 14 Jun 2014 07:39:01 +0000

 Module Name:	src
 Committed By:	hannken
 Date:		Sat Jun 14 07:39:01 UTC 2014

 Modified Files:
 	src/sys/dev: ccd.c cgd.c dksubr.c
 	src/sys/dev/dm: dm.h dm_target_linear.c dm_target_snapshot.c
 	    dm_target_stripe.c
 	src/sys/dev/raidframe: rf_copyback.c rf_disks.c rf_reconstruct.c
 	src/sys/sys: param.h

 Log Message:
 Change dk_lookup() to return an anonymous vnode not associated with
 any file system.  Change all consumers of dk_lookup() to get the
 device from "v_rdev" instead of VOP_GETATTR() as specfs does not
 support VOP_GETATTR().  Devices obtained with dk_lookup() will no
 longer disappear on forced unmounts.

 Fix for PR kern/48849 (root mirror raid fails on shutdown)

 Welcome to 6.99.44


 To generate a diff of this commit:
 cvs rdiff -u -r1.148 -r1.149 src/sys/dev/ccd.c
 cvs rdiff -u -r1.87 -r1.88 src/sys/dev/cgd.c
 cvs rdiff -u -r1.50 -r1.51 src/sys/dev/dksubr.c
 cvs rdiff -u -r1.25 -r1.26 src/sys/dev/dm/dm.h
 cvs rdiff -u -r1.13 -r1.14 src/sys/dev/dm/dm_target_linear.c
 cvs rdiff -u -r1.15 -r1.16 src/sys/dev/dm/dm_target_snapshot.c
 cvs rdiff -u -r1.18 -r1.19 src/sys/dev/dm/dm_target_stripe.c
 cvs rdiff -u -r1.49 -r1.50 src/sys/dev/raidframe/rf_copyback.c
 cvs rdiff -u -r1.85 -r1.86 src/sys/dev/raidframe/rf_disks.c
 cvs rdiff -u -r1.119 -r1.120 src/sys/dev/raidframe/rf_reconstruct.c
 cvs rdiff -u -r1.453 -r1.454 src/sys/sys/param.h

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: analyzed->closed
State-Changed-By: hannken@NetBSD.org
State-Changed-When: Mon, 16 Jun 2014 08:07:44 +0000
State-Changed-Why:
Fix committed.


>Unformatted:
