NetBSD Problem Report #44972

From yamt@NetBSD.org  Mon May 16 21:58:30 2011
Return-Path: <yamt@NetBSD.org>
Received: by www.NetBSD.org (Postfix, from userid 1270)
	id A668463B8AC; Mon, 16 May 2011 21:58:30 +0000 (UTC)
Message-Id: <20110516215830.A668463B8AC@www.NetBSD.org>
Date: Mon, 16 May 2011 21:58:30 +0000 (UTC)
From: yamt@NetBSD.org
Reply-To: yamt@NetBSD.org
To: gnats-bugs@NetBSD.org
Subject: raidctl -R doesn't seem to work
X-Send-Pr-Version: 3.95

>Number:         44972
>Category:       kern
>Synopsis:       raidctl -R doesn't seem to work
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon May 16 22:00:00 +0000 2011
>Last-Modified:  Wed Aug 03 15:05:04 +0000 2011
>Originator:     YAMAMOTO Takashi
>Release:        NetBSD current
>Organization:

>Environment:


System: NetBSD current
Architecture: x86_64
Machine: amd64
>Description:
	raidctl -R doesn't start reconstruction.

>How-To-Repeat:
	after running raidctl -C, raidctl -I, and raidctl -i on the set, do the following:

ushi% sudo raidctl -s raid1
Components:
            /dev/dk0: optimal
            /dev/dk1: optimal
No spares.
Component label for /dev/dk0:
   Row: 0, Column: 0, Num Rows: 1, Num Columns: 2
   Version: 2, Serial Number: 2011051702, Mod Counter: 135
   Clean: No, Status: 0
   sectPerSU: 128, SUsPerPU: 1, SUsPerRU: 1
   Queue size: 100, blocksize: 512, numBlocks: 1565564672
   RAID Level: 1
   Autoconfig: No
   Root partition: No
   Last configured as: raid1
Component label for /dev/dk1:
   Row: 0, Column: 1, Num Rows: 1, Num Columns: 2
   Version: 2, Serial Number: 2011051702, Mod Counter: 135
   Clean: No, Status: 0
   sectPerSU: 128, SUsPerPU: 1, SUsPerRU: 1
   Queue size: 100, blocksize: 512, numBlocks: 1565564672
   RAID Level: 1
   Autoconfig: No
   Root partition: No
   Last configured as: raid1
Parity status: clean
Reconstruction is 100% complete.
Parity Re-write is 0% complete.
Copyback is 100% complete.
ushi% sudo raidctl -f /dev/dk1 raid1	# i did this during the raid init 
ushi% sudo raidctl -s raid1
Components:
            /dev/dk0: optimal
            /dev/dk1: failed
No spares.
Component label for /dev/dk0:
   Row: 0, Column: 0, Num Rows: 1, Num Columns: 2
   Version: 2, Serial Number: 2011051702, Mod Counter: 140
   Clean: No, Status: 0
   sectPerSU: 128, SUsPerPU: 1, SUsPerRU: 1
   Queue size: 100, blocksize: 512, numBlocks: 1565564672
   RAID Level: 1
   Autoconfig: No
   Root partition: No
   Last configured as: raid1
/dev/dk1 status is: failed.  Skipping label.
Parity status: clean
Reconstruction is 100% complete.
Parity Re-write is 100% complete.
Copyback is 100% complete.
ushi% sudo raidctl -R /dev/dk1 raid1
ushi% sudo raidctl -s raid1
Components:
            /dev/dk0: optimal
            /dev/dk1: failed
No spares.
Component label for /dev/dk0:
   Row: 0, Column: 0, Num Rows: 1, Num Columns: 2
   Version: 2, Serial Number: 2011051702, Mod Counter: 144
   Clean: No, Status: 0
   sectPerSU: 128, SUsPerPU: 1, SUsPerRU: 1
   Queue size: 100, blocksize: 512, numBlocks: 1565564672
   RAID Level: 1
   Autoconfig: No
   Root partition: No
   Last configured as: raid1
/dev/dk1 status is: failed.  Skipping label.
Parity status: clean
Reconstruction is 100% complete.
Parity Re-write is 100% complete.
Copyback is 100% complete.
ushi% sudo raidctl -R /dev/dk1 raid1
ushi% sudo raidctl -s raid1
Components:
            /dev/dk0: optimal
            /dev/dk1: failed
No spares.
Component label for /dev/dk0:
   Row: 0, Column: 0, Num Rows: 1, Num Columns: 2
   Version: 2, Serial Number: 2011051702, Mod Counter: 148
   Clean: No, Status: 0
   sectPerSU: 128, SUsPerPU: 1, SUsPerRU: 1
   Queue size: 100, blocksize: 512, numBlocks: 1565564672
   RAID Level: 1
   Autoconfig: No
   Root partition: No
   Last configured as: raid1
/dev/dk1 status is: failed.  Skipping label.
Parity status: clean
Reconstruction is 100% complete.
Parity Re-write is 100% complete.
Copyback is 100% complete.
ushi% dmesg|tail
raid1: RAID Level 1
raid1: Components: /dev/dk0 /dev/dk1
raid1: Total Sectors: 5860531968 (2861587 MB)
raid1: GPT GUID: 497d5c1c-7fff-11e0-b07b-0015170bebef
dk2 at raid1: 497d5c30-7fff-11e0-b07b-0015170bebef
dk2: 5860530911 blocks at 1024, type: ffs
Could not verify parity
raid1: Error re-writing parity (1)!
wd3: mbr partition exceeds disk size
raid1: rebuilding: dk_lookup on device: /dev/dk1 failed: 16!
ushi% 

>Fix:


>Audit-Trail:
From: Greg Oster <oster@cs.usask.ca>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/44972: raidctl -R doesn't seem to work
Date: Mon, 16 May 2011 16:04:13 -0600

 On Mon, 16 May 2011 22:00:01 +0000 (UTC)
 yamt@NetBSD.org wrote:

 > Could not verify parity
 > raid1: Error re-writing parity (1)!
 > wd3: mbr partition exceeds disk size
 > raid1: rebuilding: dk_lookup on device: /dev/dk1 failed: 16!

 Something(tm) didn't close /dev/dk1 properly, or /dev/dk1 was told to
 close, and didn't...   '-R' tells the device to close so the rebuild
 can (re-)open it again.  Something is amiss in there... 

 Later...

 Greg Oster

From: Greg Oster <oster@cs.usask.ca>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/44972: raidctl -R doesn't seem to work
Date: Mon, 16 May 2011 16:29:08 -0600

 On Mon, 16 May 2011 22:00:01 +0000 (UTC)
 yamt@NetBSD.org wrote:

 > ushi% dmesg|tail
 > raid1: RAID Level 1
 > raid1: Components: /dev/dk0 /dev/dk1
 > raid1: Total Sectors: 5860531968 (2861587 MB)
 > raid1: GPT GUID: 497d5c1c-7fff-11e0-b07b-0015170bebef
 > dk2 at raid1: 497d5c30-7fff-11e0-b07b-0015170bebef
 > dk2: 5860530911 blocks at 1024, type: ffs
 > Could not verify parity
 > raid1: Error re-writing parity (1)!
 > wd3: mbr partition exceeds disk size
 > raid1: rebuilding: dk_lookup on device: /dev/dk1 failed: 16!
 > ushi% 

 in dksubr.c in dk_open() we have:

         if (dk->dk_nwedges != 0 && part != RAW_PART) {
                 ret = EBUSY;
                 goto done;
         }

 What part of those conditions are true, triggering the EBUSY
 for /dev/dk1 ?

 Later...

 Greg Oster
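
The check quoted above yields EBUSY only when both halves of the condition hold at once. A minimal user-space sketch of that logic follows. The names `struct fake_dk`, `FAKE_RAW_PART`, and `fake_dk_open_check()` are hypothetical stand-ins and not the kernel's definitions, and the value used for the raw partition is an assumption.

```c
#include <errno.h>

/* Hypothetical stand-in for the raw partition index; the real value
 * of RAW_PART is port-specific and is only assumed here. */
#define FAKE_RAW_PART 3

/* Hypothetical, reduced model of the disk state the check reads. */
struct fake_dk {
	int dk_nwedges;		/* number of wedges configured on the disk */
};

/* Returns EBUSY when the disk has wedges and a non-raw partition is
 * requested, mirroring the quoted condition; returns 0 otherwise. */
static int
fake_dk_open_check(const struct fake_dk *dk, int part)
{
	if (dk->dk_nwedges != 0 && part != FAKE_RAW_PART)
		return EBUSY;
	return 0;
}
```

In this model a disk without wedges, or an open of the raw partition, never takes the EBUSY branch.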

From: Greg Oster <oster@cs.usask.ca>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/44972: raidctl -R doesn't seem to work
Date: Mon, 16 May 2011 16:10:33 -0600

 On Mon, 16 May 2011 22:05:04 +0000 (UTC)
 Greg Oster <oster@cs.usask.ca> wrote:

 > The following reply was made to PR kern/44972; it has been noted by
 > GNATS.
 > 
 > From: Greg Oster <oster@cs.usask.ca>
 > To: gnats-bugs@NetBSD.org
 > Cc: 
 > Subject: Re: kern/44972: raidctl -R doesn't seem to work
 > Date: Mon, 16 May 2011 16:04:13 -0600
 > 
 >  On Mon, 16 May 2011 22:00:01 +0000 (UTC)
 >  yamt@NetBSD.org wrote:
 >  
 >  > Could not verify parity
 >  > raid1: Error re-writing parity (1)!
 >  > wd3: mbr partition exceeds disk size
 >  > raid1: rebuilding: dk_lookup on device: /dev/dk1 failed: 16!
 >  
 >  Something(tm) didn't close /dev/dk1 properly, or /dev/dk1 was told to
 >  close, and didn't...   '-R' tells the device to close so the rebuild
 >  can (re-)open it again.  Something is amiss in there... 

 Hmm.. this is a non-autoconfigured set... so the call to
 rf_close_component() should be doing a vn_close() on the vp associated
 with /dev/dk1.

 Later...

 Greg Oster

From: yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi)
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
	netbsd-bugs@netbsd.org, yamt@NetBSD.org
Subject: Re: kern/44972: raidctl -R doesn't seem to work
Date: Fri, 20 May 2011 09:45:25 +0000 (UTC)

 hi,

 >  > Could not verify parity
 >  > raid1: Error re-writing parity (1)!
 >  > wd3: mbr partition exceeds disk size
 >  > raid1: rebuilding: dk_lookup on device: /dev/dk1 failed: 16!
 >  
 >  Something(tm) didn't close /dev/dk1 properly, or /dev/dk1 was told to
 >  close, and didn't...   '-R' tells the device to close so the rebuild
 >  can (re-)open it again.  Something is amiss in there... 

 the dk_lookup message is from the second raidctl -R.
 the first raidctl -R didn't print anything.

 YAMAMOTO Takashi

 >  
 >  Later...
 >  
 >  Greg Oster

From: yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi)
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
	netbsd-bugs@netbsd.org, yamt@NetBSD.org
Subject: Re: kern/44972: raidctl -R doesn't seem to work
Date: Mon, 23 May 2011 04:54:46 +0000 (UTC)

 hi,

 >  On Mon, 16 May 2011 22:00:01 +0000 (UTC)
 >  yamt@NetBSD.org wrote:
 >  
 >  > ushi% dmesg|tail
 >  > raid1: RAID Level 1
 >  > raid1: Components: /dev/dk0 /dev/dk1
 >  > raid1: Total Sectors: 5860531968 (2861587 MB)
 >  > raid1: GPT GUID: 497d5c1c-7fff-11e0-b07b-0015170bebef
 >  > dk2 at raid1: 497d5c30-7fff-11e0-b07b-0015170bebef
 >  > dk2: 5860530911 blocks at 1024, type: ffs
 >  > Could not verify parity
 >  > raid1: Error re-writing parity (1)!
 >  > wd3: mbr partition exceeds disk size
 >  > raid1: rebuilding: dk_lookup on device: /dev/dk1 failed: 16!
 >  > ushi% 
 >  
 >  in dksubr.c in dk_open() we have:
 >  
 >          if (dk->dk_nwedges != 0 && part != RAW_PART) {
 >                  ret = EBUSY;
 >                  goto done;
 >          }
 >  
 >  What part of those conditions are true, triggering the EBUSY
 >  for /dev/dk1 ?

 the EBUSY i got was from spec_open.

 there seem to be at least two problems.
 - the DIOCGPART ioctl in rf_ReconstructInPlace failed with ENOTTY
   as dk doesn't support it.
 - rf_ReconstructInPlace leaves the vnode open on errors.

 YAMAMOTO Takashi

 >  
 >  Later...
 >  
 >  Greg Oster
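
One plausible reading of the two problems listed in this message is that the first rebuild attempt opens the component, fails at the size query, and returns without closing it, so the second attempt finds the device already open. The following user-space sketch models that sequence. Every name in it (`fake_dev`, `fake_open()`, `fake_get_size()`, `fake_rebuild()`) is hypothetical, and the single-opener behaviour is an assumption made for illustration rather than a description of spec_open.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of a device that allows one opener at a time. */
struct fake_dev {
	bool is_open;
};

static int
fake_open(struct fake_dev *dev)
{
	if (dev->is_open)
		return EBUSY;	/* second open fails while the first is held */
	dev->is_open = true;
	return 0;
}

static void
fake_close(struct fake_dev *dev)
{
	dev->is_open = false;
}

/* Stand-in for a size query that the device does not support. */
static int
fake_get_size(const struct fake_dev *dev)
{
	(void)dev;
	return ENOTTY;
}

/* Models one rebuild attempt.  With close_on_error false the open is
 * leaked when the size query fails; with it true the device is
 * released before returning. */
static int
fake_rebuild(struct fake_dev *dev, bool close_on_error)
{
	int error;

	error = fake_open(dev);
	if (error)
		return error;
	error = fake_get_size(dev);
	if (error) {
		if (close_on_error)
			fake_close(dev);
		return error;
	}
	fake_close(dev);
	return 0;
}
```

With close_on_error false, a second call on the same device returns EBUSY instead of the underlying ENOTTY. With it true, every call reports ENOTTY and the device is left closed.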

From: "YAMAMOTO Takashi" <yamt@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/44972 CVS commit: src/sys/dev/raidframe
Date: Sat, 28 May 2011 00:53:04 +0000

 Module Name:	src
 Committed By:	yamt
 Date:		Sat May 28 00:53:04 UTC 2011

 Modified Files:
 	src/sys/dev/raidframe: rf_reconstruct.c

 Log Message:
 rf_ReconstructInPlace: don't leave a vnode open on errors.
 fixes a part of PR/44972.


 To generate a diff of this commit:
 cvs rdiff -u -r1.114 -r1.115 src/sys/dev/raidframe/rf_reconstruct.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi)
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
	netbsd-bugs@netbsd.org, yamt@NetBSD.org
Subject: Re: kern/44972: raidctl -R doesn't seem to work
Date: Wed,  3 Aug 2011 03:46:57 +0000 (UTC)

 --Boundary-20110803124142-2676200
 Content-Type: Text/Plain; charset=us-ascii

 > - the DIOCGPART ioctl in rf_ReconstructInPlace failed with ENOTTY
 >   as dk doesn't support it.

 the attached patch is to fix this part of the problem.
 can anyone please review and commit?  i guess it's better to use
 rf_getdisksize.

 YAMAMOTO Takashi

 --Boundary-20110803124142-2676200
 Content-Type: Text/Plain; charset=us-ascii
 Content-Disposition: attachment; filename="a.diff"

 Index: rf_reconstruct.c
 ===================================================================
 RCS file: /cvsroot/src/sys/dev/raidframe/rf_reconstruct.c,v
 retrieving revision 1.115
 diff -u -p -r1.115 rf_reconstruct.c
 --- rf_reconstruct.c	28 May 2011 00:53:04 -0000	1.115
 +++ rf_reconstruct.c	3 Aug 2011 03:44:10 -0000
 @@ -348,7 +348,8 @@ rf_ReconstructInPlace(RF_Raid_t *raidPtr
  	const RF_LayoutSW_t *lp;
  	RF_ComponentLabel_t *c_label;
  	int     numDisksDone = 0, rc;
 -	struct partinfo dpart;
 +	uint64_t numsec;
 +	unsigned int secsize;
  	struct pathbuf *pb;
  	struct vnode *vp;
  	struct vattr va;
 @@ -464,7 +465,7 @@ rf_ReconstructInPlace(RF_Raid_t *raidPtr
  		return(retcode);
  	}

 -	retcode = VOP_IOCTL(vp, DIOCGPART, &dpart, FREAD, curlwp->l_cred);
 +	retcode = getdisksize(vp, &numsec, &secsize);
  	if (retcode) {
  		vn_close(vp, FREAD | FWRITE, kauth_cred_get());
  		rf_lock_mutex2(raidPtr->mutex);
 @@ -474,10 +475,8 @@ rf_ReconstructInPlace(RF_Raid_t *raidPtr
  		return(retcode);
  	}
  	rf_lock_mutex2(raidPtr->mutex);
 -	raidPtr->Disks[col].blockSize =	dpart.disklab->d_secsize;
 -
 -	raidPtr->Disks[col].numBlocks = dpart.part->p_size -
 -		rf_protectedSectors;
 +	raidPtr->Disks[col].blockSize =	secsize;
 +	raidPtr->Disks[col].numBlocks = numsec - rf_protectedSectors;

  	raidPtr->raid_cinfo[col].ci_vp = vp;
  	raidPtr->raid_cinfo[col].ci_dev = va.va_rdev;

 --Boundary-20110803124142-2676200--

From: "Greg Oster" <oster@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/44972 CVS commit: src/sys/dev/raidframe
Date: Wed, 3 Aug 2011 15:00:29 +0000

 Module Name:	src
 Committed By:	oster
 Date:		Wed Aug  3 15:00:29 UTC 2011

 Modified Files:
 	src/sys/dev/raidframe: rf_reconstruct.c

 Log Message:
 Address part of PR kern/44972.  From YAMAMOTO Takashi.  Thanks!


 To generate a diff of this commit:
 cvs rdiff -u -r1.115 -r1.116 src/sys/dev/raidframe/rf_reconstruct.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: Greg Oster <oster@cs.usask.ca>
To: gnats-bugs@NetBSD.org
Cc: yamt@mwd.biglobe.ne.jp, kern-bug-people@netbsd.org,
 gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, yamt@NetBSD.org
Subject: Re: kern/44972: raidctl -R doesn't seem to work
Date: Wed, 3 Aug 2011 09:00:43 -0600

 On Wed,  3 Aug 2011 03:50:04 +0000 (UTC)
 yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi) wrote:

 > The following reply was made to PR kern/44972; it has been noted by
 > GNATS.
 > 
 > From: yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi)
 > To: gnats-bugs@NetBSD.org
 > Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
 > 	netbsd-bugs@netbsd.org, yamt@NetBSD.org
 > Subject: Re: kern/44972: raidctl -R doesn't seem to work
 > Date: Wed,  3 Aug 2011 03:46:57 +0000 (UTC)
 > 
 >  --Boundary-20110803124142-2676200
 >  Content-Type: Text/Plain; charset=us-ascii
 >  
 >  > - the DIOCGPART ioctl in rf_ReconstructInPlace failed with ENOTTY
 >  >   as dk doesn't support it.
 >  
 >  the attached patch is to fix this part of the problem.
 >  can anyone please review and commit? 

 Reviewed, and committed.  Thanks!

 > i guess it's better to use rf_getdisksize.

 I thought so too, except that rf_getdisksize() would be setting values
 in the raidPtr->Disks[] array without holding the appropriate mutex.
 So your fix is the better one at this point... 

 Later...

 Greg Oster
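
The locking concern above can be illustrated with a small user-space sketch: query the size into local variables first, then store the results into the shared structure only while the mutex is held. This is not the RAIDframe code. It uses a POSIX mutex in place of the kernel primitive, and `fake_raid`, `fake_getdisksize()`, `fake_update_size()`, the sector counts, and `FAKE_PROTECTED_SECTORS` are all hypothetical.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical value for illustration; not taken from the PR. */
#define FAKE_PROTECTED_SECTORS 64

struct fake_disk {
	uint64_t numBlocks;
	unsigned int blockSize;
};

struct fake_raid {
	pthread_mutex_t mutex;	/* protects disk */
	struct fake_disk disk;
};

/* Stand-in for a size query; it fills caller-owned locals only. */
static int
fake_getdisksize(uint64_t *numsec, unsigned int *secsize)
{
	*numsec = 1000064;
	*secsize = 512;
	return 0;
}

static int
fake_update_size(struct fake_raid *r)
{
	uint64_t numsec;
	unsigned int secsize;
	int error;

	/* Query into locals first; no shared state is touched here. */
	error = fake_getdisksize(&numsec, &secsize);
	if (error)
		return error;

	/* Publish the results only while holding the mutex. */
	pthread_mutex_lock(&r->mutex);
	r->disk.blockSize = secsize;
	r->disk.numBlocks = numsec - FAKE_PROTECTED_SECTORS;
	pthread_mutex_unlock(&r->mutex);
	return 0;
}
```

A helper that wrote into the shared structure directly from inside the size query would skip the locked section, which is the drawback noted above.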

>Unformatted:
