NetBSD Problem Report #42904

From louis@thoth.zabrico.com  Sun Feb 28 20:35:26 2010
Return-Path: <louis@thoth.zabrico.com>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
	by www.NetBSD.org (Postfix) with ESMTP id E379963B873
	for <gnats-bugs@gnats.NetBSD.org>; Sun, 28 Feb 2010 20:35:25 +0000 (UTC)
Message-Id: <201002282035.o1SKZKsa012188@thoth.zabrico.com>
Date: Sun, 28 Feb 2010 15:35:20 -0500 (EST)
From: louis@thoth.zabrico.com
Reply-To: louis@thoth.zabrico.com
To: gnats-bugs@gnats.NetBSD.org
Subject: RaidFrame panic after removal of RAID-1 member
X-Send-Pr-Version: 3.95

>Number:         42904
>Category:       kern
>Synopsis:       after removal of a failing RaidFrame RAID-1 member, netbsd panics
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    kern-bug-people
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Feb 28 20:40:00 +0000 2010
>Closed-Date:    Wed Mar 03 14:27:11 +0000 2010
>Last-Modified:  Sat Mar 06 21:10:03 +0000 2010
>Originator:     Louis Guillaume
>Release:        NetBSD 5.0_STABLE
>Organization:
>Environment:
System: NetBSD xxx.xxx.xxx 5.0_STABLE NetBSD 5.0_STABLE (GENERIC) #13: Wed Dec 30 14:39:00 EST 2009 louis@xx.xx.xxx:/usr/obj/sys/arch/i386/compile/GENERIC i386
Architecture: i386
Machine: i386
>Description:
First, some background on our setup:

# raidctl -s raid0
Components:
           /dev/sd0a: failed
           /dev/sd1a: optimal
No spares.
/dev/sd0a status is: failed.  Skipping label.
Component label for /dev/sd1a:
   Row: 0, Column: 1, Num Rows: 1, Num Columns: 2
   Version: 2, Serial Number: 20071216, Mod Counter: 280
   Clean: No, Status: 0
   sectPerSU: 128, SUsPerPU: 1, SUsPerRU: 1
   Queue size: 100, blocksize: 512, numBlocks: 143638784
   RAID Level: 1
   Autoconfig: Yes
   Root partition: Yes
   Last configured as: raid0
Parity status: DIRTY
Reconstruction is 100% complete.
Parity Re-write is 100% complete.
Copyback is 100% complete.

# dmesg | grep sd0
sd0 at scsibus0 target 0 lun 0: <ModusLnk, , > disk fixed
sd0: 70136 MB, 78753 cyl, 2 head, 911 sec, 512 bytes/sect x 143638992 sectors
sd0: sync (12.50ns offset 62), 16-bit (160.000MB/s) transfers, tagged queueing
raid0: Components: /dev/sd0a[**FAILED**] /dev/sd1a

# grep smartd.*sd0d /var/log/messages |tail -3
Feb 26 00:43:04 thoth smartd[296]: Device: /dev/sd0d, opened
Feb 26 00:43:04 thoth smartd[296]: Device: /dev/sd0d, is SMART capable. Adding to "monitor" list.
Feb 26 00:43:04 thoth smartd[296]: Device: /dev/sd0d, SMART Failure: HARDWARE IMPENDING FAILURE TOO MANY BLOCK REASSIGNS

So we have a bad disk that needs to be replaced. Here is what I did:

  o failed the component with "raidctl -f /dev/sd0a raid0"
  o shut down
  o replaced the disk
  o rebooted
  o the system now panics right after RaidFrame initializes.
    Screenshots are at:

    ftp://zabrico.com/pub/RaidFrame-Panic-0.jpeg
    ftp://zabrico.com/pub/RaidFrame-Panic-1.jpeg

    In this case I had removed the failing drive, so the remaining disk
    attached as sd0 on scsibus1. It normally shows up as sd1 on
    scsibus1, but that shouldn't matter to RaidFrame. In any case, the
    same thing happens with a new, blank, identical disk on scsibus0.

  o power off
  o put the "bad" sd0 back in
  o machine boots as normal

>How-To-Repeat:
I'm not sure whether this is repeatable on other RaidFrame machines, but
here is what triggers it:

  o Set up a RAID-1 device
  o Fail one component with "raidctl -f /dev/xx0a raid0"
  o shut down
  o remove the failed component
  o start up
  o the system panics right after "Kernelized RaidFrame activated" is printed.

>Fix:
  See Greg Oster's analysis in this thread:
  http://mail-index.netbsd.org/netbsd-users/2010/02/26/msg005746.html

  I'm not sure whether it contains the actual fix.

>Release-Note:

>Audit-Trail:
From: Greg Oster <oster@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/42904 CVS commit: src/sys/dev/raidframe
Date: Wed, 3 Mar 2010 14:23:27 +0000

 Module Name:	src
 Committed By:	oster
 Date:		Wed Mar  3 14:23:27 UTC 2010

 Modified Files:
 	src/sys/dev/raidframe: rf_paritymap.c

 Log Message:
 Don't attempt to read or write component label stuff from/to 'dead disks'.
 Update used spares with the correct parity map bits too.

 Addresses PR#42904 by Louis Guillaume.  Fix confirmed by submitter.
 Thanks!


 To generate a diff of this commit:
 cvs rdiff -u -r1.3 -r1.4 src/sys/dev/raidframe/rf_paritymap.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: open->closed
State-Changed-By: oster@NetBSD.org
State-Changed-When: Wed, 03 Mar 2010 14:27:11 +0000
State-Changed-Why:
Fixed with this change:
cvs rdiff -u -r1.3 -r1.4 src/sys/dev/raidframe/rf_paritymap.c


From: Bernd Ernesti <netbsd@lists.veego.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: PR/42904 CVS commit: src/sys/dev/raidframe
Date: Wed, 3 Mar 2010 21:09:13 +0100

 On Wed, Mar 03, 2010 at 02:25:03PM +0000, Greg Oster wrote:

 Is there a pullup needed for netbsd-5?

 Bernd

From: Greg Oster <oster@cs.usask.ca>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: PR/42904 CVS commit: src/sys/dev/raidframe
Date: Wed, 3 Mar 2010 14:15:10 -0600

 On Wed,  3 Mar 2010 20:10:05 +0000 (UTC)
 Bernd Ernesti <netbsd@lists.veego.de> wrote:

 >  Is there a pullup needed for netbsd-5?

 Absolutely.  It's already in the queue as ticket 1325.

 Later...

 Greg Oster

From: Stephen Borrill <sborrill@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/42904 CVS commit: [netbsd-5] src/sys/dev/raidframe
Date: Sat, 6 Mar 2010 21:05:29 +0000

 Module Name:	src
 Committed By:	sborrill
 Date:		Sat Mar  6 21:05:29 UTC 2010

 Modified Files:
 	src/sys/dev/raidframe [netbsd-5]: rf_paritymap.c

 Log Message:
 Pull up the following revision(s) (requested by oster in ticket #1325):
 	sys/dev/raidframe/rf_paritymap.c:	revision 1.4

 Don't attempt to read or write component label stuff from/to 'dead
 disks'. Update used spares with the correct parity map bits too.
 Addresses PR#42904.


 To generate a diff of this commit:
 cvs rdiff -u -r1.3.2.2 -r1.3.2.3 src/sys/dev/raidframe/rf_paritymap.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

>Unformatted:
