NetBSD Problem Report #44972
From yamt@NetBSD.org Mon May 16 21:58:30 2011
Return-Path: <yamt@NetBSD.org>
Received: by www.NetBSD.org (Postfix, from userid 1270)
id A668463B8AC; Mon, 16 May 2011 21:58:30 +0000 (UTC)
Message-Id: <20110516215830.A668463B8AC@www.NetBSD.org>
Date: Mon, 16 May 2011 21:58:30 +0000 (UTC)
From: yamt@NetBSD.org
Reply-To: yamt@NetBSD.org
To: gnats-bugs@NetBSD.org
Subject: raidctl -R doesn't seem to work
X-Send-Pr-Version: 3.95
>Number: 44972
>Category: kern
>Synopsis: raidctl -R doesn't seem to work
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Mon May 16 22:00:00 +0000 2011
>Last-Modified: Wed Aug 03 15:05:04 +0000 2011
>Originator: YAMAMOTO Takashi
>Release: NetBSD current
>Organization:
>Environment:
System: NetBSD current
Architecture: x86_64
Machine: amd64
>Description:
raidctl -R doesn't start reconstruction.
>How-To-Repeat:
after raidctl -C, raidctl -I, and raidctl -i, do the following.
ushi% sudo raidctl -s raid1
Components:
/dev/dk0: optimal
/dev/dk1: optimal
No spares.
Component label for /dev/dk0:
Row: 0, Column: 0, Num Rows: 1, Num Columns: 2
Version: 2, Serial Number: 2011051702, Mod Counter: 135
Clean: No, Status: 0
sectPerSU: 128, SUsPerPU: 1, SUsPerRU: 1
Queue size: 100, blocksize: 512, numBlocks: 1565564672
RAID Level: 1
Autoconfig: No
Root partition: No
Last configured as: raid1
Component label for /dev/dk1:
Row: 0, Column: 1, Num Rows: 1, Num Columns: 2
Version: 2, Serial Number: 2011051702, Mod Counter: 135
Clean: No, Status: 0
sectPerSU: 128, SUsPerPU: 1, SUsPerRU: 1
Queue size: 100, blocksize: 512, numBlocks: 1565564672
RAID Level: 1
Autoconfig: No
Root partition: No
Last configured as: raid1
Parity status: clean
Reconstruction is 100% complete.
Parity Re-write is 0% complete.
Copyback is 100% complete.
ushi% sudo raidctl -f /dev/dk1 raid1 # i did this during the raid init
ushi% sudo raidctl -s raid1
Components:
/dev/dk0: optimal
/dev/dk1: failed
No spares.
Component label for /dev/dk0:
Row: 0, Column: 0, Num Rows: 1, Num Columns: 2
Version: 2, Serial Number: 2011051702, Mod Counter: 140
Clean: No, Status: 0
sectPerSU: 128, SUsPerPU: 1, SUsPerRU: 1
Queue size: 100, blocksize: 512, numBlocks: 1565564672
RAID Level: 1
Autoconfig: No
Root partition: No
Last configured as: raid1
/dev/dk1 status is: failed. Skipping label.
Parity status: clean
Reconstruction is 100% complete.
Parity Re-write is 100% complete.
Copyback is 100% complete.
ushi% sudo raidctl -R /dev/dk1 raid1
ushi% sudo raidctl -s raid1
Components:
/dev/dk0: optimal
/dev/dk1: failed
No spares.
Component label for /dev/dk0:
Row: 0, Column: 0, Num Rows: 1, Num Columns: 2
Version: 2, Serial Number: 2011051702, Mod Counter: 144
Clean: No, Status: 0
sectPerSU: 128, SUsPerPU: 1, SUsPerRU: 1
Queue size: 100, blocksize: 512, numBlocks: 1565564672
RAID Level: 1
Autoconfig: No
Root partition: No
Last configured as: raid1
/dev/dk1 status is: failed. Skipping label.
Parity status: clean
Reconstruction is 100% complete.
Parity Re-write is 100% complete.
Copyback is 100% complete.
ushi% sudo raidctl -R /dev/dk1 raid1
ushi% sudo raidctl -s raid1
Components:
/dev/dk0: optimal
/dev/dk1: failed
No spares.
Component label for /dev/dk0:
Row: 0, Column: 0, Num Rows: 1, Num Columns: 2
Version: 2, Serial Number: 2011051702, Mod Counter: 148
Clean: No, Status: 0
sectPerSU: 128, SUsPerPU: 1, SUsPerRU: 1
Queue size: 100, blocksize: 512, numBlocks: 1565564672
RAID Level: 1
Autoconfig: No
Root partition: No
Last configured as: raid1
/dev/dk1 status is: failed. Skipping label.
Parity status: clean
Reconstruction is 100% complete.
Parity Re-write is 100% complete.
Copyback is 100% complete.
ushi% dmesg|tail
raid1: RAID Level 1
raid1: Components: /dev/dk0 /dev/dk1
raid1: Total Sectors: 5860531968 (2861587 MB)
raid1: GPT GUID: 497d5c1c-7fff-11e0-b07b-0015170bebef
dk2 at raid1: 497d5c30-7fff-11e0-b07b-0015170bebef
dk2: 5860530911 blocks at 1024, type: ffs
Could not verify parity
raid1: Error re-writing parity (1)!
wd3: mbr partition exceeds disk size
raid1: rebuilding: dk_lookup on device: /dev/dk1 failed: 16!
ushi%
>Fix:
>Audit-Trail:
From: Greg Oster <oster@cs.usask.ca>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/44972: raidctl -R doesn't seem to work
Date: Mon, 16 May 2011 16:04:13 -0600
On Mon, 16 May 2011 22:00:01 +0000 (UTC)
yamt@NetBSD.org wrote:
> Could not verify parity
> raid1: Error re-writing parity (1)!
> wd3: mbr partition exceeds disk size
> raid1: rebuilding: dk_lookup on device: /dev/dk1 failed: 16!
Something(tm) didn't close /dev/dk1 properly, or /dev/dk1 was told to
close, and didn't... '-R' tells the device to close so the rebuild
can (re-)open it again. Something is amiss in there...
Later...
Greg Oster
From: Greg Oster <oster@cs.usask.ca>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/44972: raidctl -R doesn't seem to work
Date: Mon, 16 May 2011 16:29:08 -0600
On Mon, 16 May 2011 22:00:01 +0000 (UTC)
yamt@NetBSD.org wrote:
> ushi% dmesg|tail
> raid1: RAID Level 1
> raid1: Components: /dev/dk0 /dev/dk1
> raid1: Total Sectors: 5860531968 (2861587 MB)
> raid1: GPT GUID: 497d5c1c-7fff-11e0-b07b-0015170bebef
> dk2 at raid1: 497d5c30-7fff-11e0-b07b-0015170bebef
> dk2: 5860530911 blocks at 1024, type: ffs
> Could not verify parity
> raid1: Error re-writing parity (1)!
> wd3: mbr partition exceeds disk size
> raid1: rebuilding: dk_lookup on device: /dev/dk1 failed: 16!
> ushi%
in dksubr.c in dk_open() we have:
    if (dk->dk_nwedges != 0 && part != RAW_PART) {
        ret = EBUSY;
        goto done;
    }
What part of those conditions are true, triggering the EBUSY
for /dev/dk1 ?
Later...
Greg Oster
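Note on the "failed: 16!" in the dmesg output: the number is a raw errno
value, and 16 is EBUSY on NetBSD, which is why the replies go looking for
EBUSY sources. A minimal user-space sketch of decoding such a value follows.
The helper names are hypothetical and not part of the kernel; the message
text comes from whatever C library the sketch is built against.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical helper: nonzero if a raw error number (such as the 16
 * printed in the dmesg line) is EBUSY on the system this is built on. */
int is_ebusy(int err)
{
    return err == EBUSY;
}

/* Hypothetical helper: human-readable text for an error number, as
 * worded by the local C library. */
const char *describe_error(int err)
{
    return strerror(err);
}
```

Calling is_ebusy(16) on a NetBSD system is expected to return nonzero.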
From: Greg Oster <oster@cs.usask.ca>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/44972: raidctl -R doesn't seem to work
Date: Mon, 16 May 2011 16:10:33 -0600
On Mon, 16 May 2011 22:05:04 +0000 (UTC)
Greg Oster <oster@cs.usask.ca> wrote:
> The following reply was made to PR kern/44972; it has been noted by
> GNATS.
>
> From: Greg Oster <oster@cs.usask.ca>
> To: gnats-bugs@NetBSD.org
> Cc:
> Subject: Re: kern/44972: raidctl -R doesn't seem to work
> Date: Mon, 16 May 2011 16:04:13 -0600
>
> On Mon, 16 May 2011 22:00:01 +0000 (UTC)
> yamt@NetBSD.org wrote:
>
> > Could not verify parity
> > raid1: Error re-writing parity (1)!
> > wd3: mbr partition exceeds disk size
> > raid1: rebuilding: dk_lookup on device: /dev/dk1 failed: 16!
>
> Something(tm) didn't close /dev/dk1 properly, or /dev/dk1 was told to
> close, and didn't... '-R' tells the device to close so the rebuild
> can (re-)open it again. Something is amiss in there...
Hmm.. this is a non-autoconfigured set... so the call to
rf_close_component() should be doing a vn_close() on the vp associated
with /dev/dk1.
Later...
Greg Oster
From: yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi)
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org, yamt@NetBSD.org
Subject: Re: kern/44972: raidctl -R doesn't seem to work
Date: Fri, 20 May 2011 09:45:25 +0000 (UTC)
hi,
> > Could not verify parity
> > raid1: Error re-writing parity (1)!
> > wd3: mbr partition exceeds disk size
> > raid1: rebuilding: dk_lookup on device: /dev/dk1 failed: 16!
>
> Something(tm) didn't close /dev/dk1 properly, or /dev/dk1 was told to
> close, and didn't... '-R' tells the device to close so the rebuild
> can (re-)open it again. Something is amiss in there...
the message is from the second raidctl -R.
the first raidctl -R didn't produce anything.
YAMAMOTO Takashi
>
> Later...
>
> Greg Oster
From: yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi)
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org, yamt@NetBSD.org
Subject: Re: kern/44972: raidctl -R doesn't seem to work
Date: Mon, 23 May 2011 04:54:46 +0000 (UTC)
hi,
> On Mon, 16 May 2011 22:00:01 +0000 (UTC)
> yamt@NetBSD.org wrote:
>
> > ushi% dmesg|tail
> > raid1: RAID Level 1
> > raid1: Components: /dev/dk0 /dev/dk1
> > raid1: Total Sectors: 5860531968 (2861587 MB)
> > raid1: GPT GUID: 497d5c1c-7fff-11e0-b07b-0015170bebef
> > dk2 at raid1: 497d5c30-7fff-11e0-b07b-0015170bebef
> > dk2: 5860530911 blocks at 1024, type: ffs
> > Could not verify parity
> > raid1: Error re-writing parity (1)!
> > wd3: mbr partition exceeds disk size
> > raid1: rebuilding: dk_lookup on device: /dev/dk1 failed: 16!
> > ushi%
>
> in dksubr.c in dk_open() we have:
>
>     if (dk->dk_nwedges != 0 && part != RAW_PART) {
>         ret = EBUSY;
>         goto done;
>     }
>
> What part of those conditions are true, triggering the EBUSY
> for /dev/dk1 ?
the EBUSY i got was from spec_open.
there seem to be at least two problems.
- the DIOCGPART ioctl in rf_ReconstructInPlace failed with ENOTTY
as dk doesn't support it.
- rf_ReconstructInPlace leaves the vnode open on errors.
YAMAMOTO Takashi
>
> Later...
>
> Greg Oster
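The two problems listed in the message above would be consistent with the
observed behaviour: the first raidctl -R fails quietly on the size query
and leaves the component open, and the second then finds it busy. The
following is a user-space model only, not RAIDframe code. Every name in it
(mock_dev, rebuild_leaky, rebuild_fixed and so on) is hypothetical, and the
single-opener rule is an assumption made for illustration.

```c
#include <errno.h>

/* Hypothetical stand-in for a component that allows one opener at a time. */
struct mock_dev {
    int open_count;     /* outstanding opens */
    int size_query_ok;  /* nonzero if the size query is supported */
};

int mock_open(struct mock_dev *d)
{
    if (d->open_count > 0)
        return EBUSY;   /* already held by an earlier open */
    d->open_count++;
    return 0;
}

void mock_close(struct mock_dev *d)
{
    if (d->open_count > 0)
        d->open_count--;
}

/* Models a size query that the device may not support. */
int mock_query_size(const struct mock_dev *d)
{
    return d->size_query_ok ? 0 : ENOTTY;
}

/* Leaky variant: returns on a failed query without closing the device. */
int rebuild_leaky(struct mock_dev *d)
{
    int rc = mock_open(d);
    if (rc != 0)
        return rc;
    rc = mock_query_size(d);
    if (rc != 0)
        return rc;      /* bug: the device stays open */
    return 0;           /* success: the device stays open for use */
}

/* Fixed variant: closes the device on the error path before returning. */
int rebuild_fixed(struct mock_dev *d)
{
    int rc = mock_open(d);
    if (rc != 0)
        return rc;
    rc = mock_query_size(d);
    if (rc != 0) {
        mock_close(d);  /* do not leave it open on errors */
        return rc;
    }
    return 0;
}
```

In this model, two calls to rebuild_leaky() on a device whose size query
fails return ENOTTY and then EBUSY, while rebuild_fixed() returns ENOTTY
both times and leaves open_count at 0.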
From: "YAMAMOTO Takashi" <yamt@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/44972 CVS commit: src/sys/dev/raidframe
Date: Sat, 28 May 2011 00:53:04 +0000
Module Name: src
Committed By: yamt
Date: Sat May 28 00:53:04 UTC 2011
Modified Files:
src/sys/dev/raidframe: rf_reconstruct.c
Log Message:
rf_ReconstructInPlace: don't leave a vnode open on errors.
fixes a part of PR/44972.
To generate a diff of this commit:
cvs rdiff -u -r1.114 -r1.115 src/sys/dev/raidframe/rf_reconstruct.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi)
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org, yamt@NetBSD.org
Subject: Re: kern/44972: raidctl -R doesn't seem to work
Date: Wed, 3 Aug 2011 03:46:57 +0000 (UTC)
--Boundary-20110803124142-2676200
Content-Type: Text/Plain; charset=us-ascii
> - the DIOCGPART ioctl in rf_ReconstructInPlace failed with ENOTTY
> as dk doesn't support it.
the attached patch is to fix this part of the problem.
can anyone please review and commit? i guess it's better to use
rf_getdisksize.
YAMAMOTO Takashi
--Boundary-20110803124142-2676200
Content-Type: Text/Plain; charset=us-ascii
Content-Disposition: attachment; filename="a.diff"
Index: rf_reconstruct.c
===================================================================
RCS file: /cvsroot/src/sys/dev/raidframe/rf_reconstruct.c,v
retrieving revision 1.115
diff -u -p -r1.115 rf_reconstruct.c
--- rf_reconstruct.c 28 May 2011 00:53:04 -0000 1.115
+++ rf_reconstruct.c 3 Aug 2011 03:44:10 -0000
@@ -348,7 +348,8 @@ rf_ReconstructInPlace(RF_Raid_t *raidPtr
const RF_LayoutSW_t *lp;
RF_ComponentLabel_t *c_label;
int numDisksDone = 0, rc;
- struct partinfo dpart;
+ uint64_t numsec;
+ unsigned int secsize;
struct pathbuf *pb;
struct vnode *vp;
struct vattr va;
@@ -464,7 +465,7 @@ rf_ReconstructInPlace(RF_Raid_t *raidPtr
return(retcode);
}
- retcode = VOP_IOCTL(vp, DIOCGPART, &dpart, FREAD, curlwp->l_cred);
+ retcode = getdisksize(vp, &numsec, &secsize);
if (retcode) {
vn_close(vp, FREAD | FWRITE, kauth_cred_get());
rf_lock_mutex2(raidPtr->mutex);
@@ -474,10 +475,8 @@ rf_ReconstructInPlace(RF_Raid_t *raidPtr
return(retcode);
}
rf_lock_mutex2(raidPtr->mutex);
- raidPtr->Disks[col].blockSize = dpart.disklab->d_secsize;
-
- raidPtr->Disks[col].numBlocks = dpart.part->p_size -
- rf_protectedSectors;
+ raidPtr->Disks[col].blockSize = secsize;
+ raidPtr->Disks[col].numBlocks = numsec - rf_protectedSectors;
raidPtr->raid_cinfo[col].ci_vp = vp;
raidPtr->raid_cinfo[col].ci_dev = va.va_rdev;
--Boundary-20110803124142-2676200--
From: "Greg Oster" <oster@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/44972 CVS commit: src/sys/dev/raidframe
Date: Wed, 3 Aug 2011 15:00:29 +0000
Module Name: src
Committed By: oster
Date: Wed Aug 3 15:00:29 UTC 2011
Modified Files:
src/sys/dev/raidframe: rf_reconstruct.c
Log Message:
Address part of PR kern/44972. From YAMAMOTO Takashi. Thanks!
To generate a diff of this commit:
cvs rdiff -u -r1.115 -r1.116 src/sys/dev/raidframe/rf_reconstruct.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: Greg Oster <oster@cs.usask.ca>
To: gnats-bugs@NetBSD.org
Cc: yamt@mwd.biglobe.ne.jp, kern-bug-people@netbsd.org,
gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, yamt@NetBSD.org
Subject: Re: kern/44972: raidctl -R doesn't seem to work
Date: Wed, 3 Aug 2011 09:00:43 -0600
On Wed, 3 Aug 2011 03:50:04 +0000 (UTC)
yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi) wrote:
> The following reply was made to PR kern/44972; it has been noted by
> GNATS.
>
> From: yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi)
> To: gnats-bugs@NetBSD.org
> Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
> netbsd-bugs@netbsd.org, yamt@NetBSD.org
> Subject: Re: kern/44972: raidctl -R doesn't seem to work
> Date: Wed, 3 Aug 2011 03:46:57 +0000 (UTC)
>
> --Boundary-20110803124142-2676200
> Content-Type: Text/Plain; charset=us-ascii
>
> > - the DIOCGPART ioctl in rf_ReconstructInPlace failed with ENOTTY
> > as dk doesn't support it.
>
> the attached patch is to fix this part of the problem.
> can anyone please review and commit?
Reviewed, and committed. Thanks!
> i guess it's better to use rf_getdisksize.
I thought so too, except that rf_getdisksize() would be setting values
in the raidPtr->Disks[] array without holding the appropriate mutex.
So your fix is the better one at this point...
Later...
Greg Oster
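The locking point in the reply above can be illustrated with a small
user-space model: the size is obtained into local variables first, and the
shared per-disk fields are written only while the lock is held, as the
committed patch does with numsec and secsize. This is a sketch under
assumptions, not kernel code. The mock_raid and mock_disk types, the
integer "locked" flag standing in for raidPtr->mutex, and the function
names are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the per-disk fields and their lock. */
struct mock_disk {
    uint64_t num_blocks;
    unsigned int block_size;
};

struct mock_raid {
    int locked;             /* models the mutex: 1 while held */
    struct mock_disk disk;
};

void mock_lock(struct mock_raid *r)
{
    assert(!r->locked);
    r->locked = 1;
}

void mock_unlock(struct mock_raid *r)
{
    assert(r->locked);
    r->locked = 0;
}

/* Stores sizes that the caller already obtained into local variables.
 * The shared fields are written only between lock and unlock. */
int update_component_size(struct mock_raid *r, uint64_t numsec,
    unsigned int secsize, uint64_t protected_sectors)
{
    if (numsec <= protected_sectors)
        return -1;          /* nothing usable would remain */

    mock_lock(r);
    r->disk.block_size = secsize;
    r->disk.num_blocks = numsec - protected_sectors;
    mock_unlock(r);
    return 0;
}
```

A helper that both queried the size and wrote the shared fields in one
step would, in this model, have to be called with the lock already held,
which is the concern raised about rf_getdisksize().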
>Unformatted: