NetBSD Problem Report #22505

Received: (qmail 8940 invoked by uid 605); 16 Aug 2003 20:23:10 -0000
Message-Id: <20030816202309.8931.qmail@mail.netbsd.org>
Date: 16 Aug 2003 20:23:09 -0000
From: tls@netbsd.org
Sender: gnats-bugs-owner@NetBSD.org
Reply-To: tls@netbsd.org
To: gnats-bugs@gnats.netbsd.org
Subject: twe driver doesn't probe right with set in degraded mode.
X-Send-Pr-Version: 3.95

>Number:         22505
>Category:       kern
>Synopsis:       With a RAID set in degraded mode, the twe driver splodes at boot.
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Aug 16 20:24:00 +0000 2003
>Closed-Date:    
>Last-Modified:  Fri May 07 11:58:51 +0000 2004
>Originator:     Thor Lancelot Simon
>Release:        NetBSD/i386 1.6W as of 2003-08-10
>Organization:
	The NetBSD Project
>Environment:
NetBSD enola-gay 1.6W NetBSD 1.6W (ENOLA-GAY) #8: Sun Aug 10 17:07:21 EDT 2003  tls@rekusant:/usr/src/sys/arch/i386/compile/ENOLA-GAY i386
Architecture: i386
Machine: i386
>Description:
	My 3ware Escalade 6410 controller often decides after a sudden power
failure that one of the components of my RAID 10 set has failed.  It does
not automatically initiate a rebuild -- it's necessary to go into the BIOS
and reassign the disk to the set to trigger one.  Until then, the set
is in DEGRADED mode (as shown by the BIOS) but is functional (the BIOS can
use it to boot the NetBSD kernel).

This is obviously a controller firmware bug.  However, it interacts in an
extremely bad way with a bug in the NetBSD driver:

twe0 at pci2 dev 5 function 0: 3ware Escalade
twe0: interrupting at apic 0 int 11 (irq 11)
twe0: no attention interrupt
twe0: reset failed

No logical disk devices probe, and the kernel fails to mount its root
filesystem:

boot device: <unknown>
device ld0 (0x1300) not configured
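
The two twe0 messages above are consistent with a reset path that polls a
controller status register for an "attention" bit and gives up on timeout.
The following is only a user-space sketch of that pattern, not the actual
twe(4) code: the fake_ctlr structure, the STS_ATTN_INTR value, the poll
limit and all function names are hypothetical, and the assumption that a
degraded set simply never raises the bit is a guess, not a diagnosis.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical status bit; NOT the real twe(4) register layout. */
#define STS_ATTN_INTR	0x1u
/* Hypothetical poll budget standing in for the driver's timeout. */
#define RESET_MAX_POLLS	100

/*
 * Simulated controller.  The attention bit becomes visible once more
 * than `ready_after` status reads have happened; a negative value
 * means it is never raised (the assumed degraded-mode behaviour).
 */
struct fake_ctlr {
	int reads;
	int ready_after;
};

uint32_t
read_status(struct fake_ctlr *c)
{
	c->reads++;
	if (c->ready_after >= 0 && c->reads > c->ready_after)
		return STS_ATTN_INTR;
	return 0;
}

/* Poll until `bit` is set; return 0 on success, -1 if the budget runs out. */
int
status_wait(struct fake_ctlr *c, uint32_t bit, int max_polls)
{
	assert(max_polls > 0);
	for (int i = 0; i < max_polls; i++) {
		if (read_status(c) & bit)
			return 0;
	}
	return -1;
}

/* Reset step: fail hard if the attention bit never shows up. */
int
reset_ctlr(struct fake_ctlr *c)
{
	if (status_wait(c, STS_ATTN_INTR, RESET_MAX_POLLS) != 0) {
		printf("twe0: no attention interrupt\n");
		return -1;
	}
	return 0;
}
```

With ready_after set to a small non-negative number, reset_ctlr() returns 0;
with ready_after set to -1 it prints the first message and returns -1, at
which point a caller would report the reset failure and attach no units.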

Here's what the controller/array look like after I manually start an array
rebuild in the BIOS:

twe0 at pci2 dev 5 function 0: 3ware Escalade
twe0: interrupting at apic 0 int 11 (irq 11)
twe0: 4 ports, Firmware FE6X 1.02.00.029, BIOS BEXX 1.07.00.009
ld0 at twe0 unit 0
ld0: 114 GB, 14947 cyl, 255 head, 63 sec, 512 bytes/sect x 240135680 sectors

About 30 seconds *after* the first I/O on the device (e.g. fsck) it reports:

twe0: unit 0: AEN 0x000b (rebuild started) received

That is what I'd expect from a 3ware card in degraded mode.
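
As a sanity check that the unit attached with a plausible size once the
rebuild was started, the figures on the ld0 line above can be recomputed
from the sector count.  This is a small sketch; the helper names are made
up for illustration, and it assumes "GB" on that line means 2^30 bytes,
rounded down.

```c
#include <assert.h>
#include <stdint.h>

/* Geometry values copied from the ld0 line in this report. */
#define HEADS			255u
#define SECTORS_PER_TRACK	63u
#define BYTES_PER_SECTOR	512u

/* Total capacity in bytes for a given sector count. */
uint64_t
capacity_bytes(uint64_t sectors)
{
	return sectors * BYTES_PER_SECTOR;
}

/* Capacity in units of 2^30 bytes, rounded down. */
uint64_t
capacity_gib(uint64_t sectors)
{
	return capacity_bytes(sectors) >> 30;
}

/* Number of whole cylinders; a partial trailing cylinder is dropped. */
uint64_t
cylinders(uint64_t sectors)
{
	return sectors / ((uint64_t)HEADS * SECTORS_PER_TRACK);
}

/* Recompute the reported values from the reported sector count. */
void
check_ld0_line(void)
{
	const uint64_t sectors = 240135680ULL;

	assert(capacity_bytes(sectors) == 122949468160ULL);
	assert(capacity_gib(sectors) == 114);
	assert(cylinders(sectors) == 14947);
}
```

Under those assumptions the numbers agree: 240135680 sectors of 512 bytes
is 122949468160 bytes, which rounds down to 114 GB and 14947 whole
cylinders.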

>How-To-Repeat:
Repeatedly yank the power cord out of a machine with a 6410 card and four
disks.  Eventually, it'll decide one of the disks has failed; then the
condition described here will occur.
>Fix:
Unknown.
>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: kern-bug-people->ad@netbsd.org 
Responsible-Changed-By: tls 
Responsible-Changed-When: Sat Aug 16 20:29:59 UTC 2003 
Responsible-Changed-Why:  
Because Andrew can probably fix this with one hand tied behind his back. :-) 
Responsible-Changed-From-To: ad@netbsd.org->ad 
Responsible-Changed-By: lukem 
Responsible-Changed-When: Thu Oct 30 00:37:35 UTC 2003 
Responsible-Changed-Why:  
to be consistent with how we use the responsible field 
Responsible-Changed-From-To: ad->kern-bug-people 
Responsible-Changed-By: wiz 
Responsible-Changed-When: Fri May 7 11:58:32 UTC 2004 
Responsible-Changed-Why:  
ad is not working on this any longer. 
>Unformatted:
