NetBSD Problem Report #47097
From www@NetBSD.org Sat Oct 20 15:45:04 2012
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
by www.NetBSD.org (Postfix) with ESMTP id 6C3A563E21A
for <gnats-bugs@gnats.NetBSD.org>; Sat, 20 Oct 2012 15:45:04 +0000 (UTC)
Message-Id: <20121020154503.A9C7763DFAE@www.NetBSD.org>
Date: Sat, 20 Oct 2012 15:45:03 +0000 (UTC)
From: tshiozak@bsdclub.org
Reply-To: tshiozak@bsdclub.org
To: gnats-bugs@NetBSD.org
Subject: ahcisata(4) triggers panic when disk I/O error is caused
X-Send-Pr-Version: www-1.0
>Number: 47097
>Category: kern
>Synopsis: ahcisata(4) triggers panic when disk I/O error is caused
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sat Oct 20 15:50:00 +0000 2012
>Last-Modified: Fri Oct 26 10:05:05 +0000 2012
>Originator: Takuya SHIOZAKI
>Release: trunk
>Organization:
>Environment:
NetBSD ichigo 6.99.11 NetBSD 6.99.11 (GENERIC) #0: Mon Sep 24 11:49:55 JST 2012 root@aoi:/usr/home/current-build/obj/sys/arch/i386/compile/GENERIC i386
>Description:
when disk I/O error is caused, ahcisata(4) seems to call tsleep() under interrupt context and it triggers panic.
call trace is here:
vpanic
kern_assert
callout_halt
sleepq_block
ahci_do_reset_drive
ahci_reset_drive
wddone
ahci_bio_complete
ahci_intr_port
ahci_intr
>How-To-Repeat:
use ahcisata driver with a nearly broken disk.
>Fix:
probably, don't use tsleep() in ahci_do_reset_drive() or don't call drv_done under interrupt context. we need ata(9) manpage.
>Audit-Trail:
From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@NetBSD.org, gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org
Subject: Re: kern/47097: ahcisata(4) triggers panic when disk I/O error is
caused
Date: Sun, 21 Oct 2012 13:04:20 +0200
--u3/rZRmxL6MmkK24
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
On Sat, Oct 20, 2012 at 03:50:00PM +0000, tshiozak@bsdclub.org wrote:
> when disk I/O error is caused, ahcisata(4) seems to call tsleep() under interrupt context and it triggers panic.
>
> call trace is here:
>
> vpanic
> kern_assert
> callout_halt
> sleepq_block
> ahci_do_reset_drive
> ahci_reset_drive
> wddone
> ahci_bio_complete
> ahci_intr_port
> ahci_intr
The attached patch should work around it. But this is going to
delay() for half a second, which is bad. Something better is needed,
so we can reset the drive from the atabus thread.
--
Manuel Bouyer <bouyer@antioche.eu.org>
NetBSD: 26 ans d'experience feront toujours la difference
--
--u3/rZRmxL6MmkK24
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename=diff
Index: ahcisata_core.c
===================================================================
RCS file: /cvsroot/src/sys/dev/ic/ahcisata_core.c,v
retrieving revision 1.44
diff -u -p -u -r1.44 ahcisata_core.c
--- ahcisata_core.c 27 Sep 2012 00:39:47 -0000 1.44
+++ ahcisata_core.c 21 Oct 2012 11:02:38 -0000
@@ -721,7 +721,10 @@ again:
if ((((sig & AHCI_P_TFD_ST) >> AHCI_P_TFD_ST_SHIFT)
& WDCS_BSY) == 0)
break;
- tsleep(&sc, PRIBIO, "ahcid2h", mstohz(10));
+ if (flags & AT_WAIT)
+ tsleep(&sc, PRIBIO, "ahcid2h", mstohz(10));
+ else
+ delay(10000);
}
if (i == AHCI_RST_WAIT) {
aprint_error("%s: BSY never cleared, TD 0x%x\n",
@@ -740,7 +743,10 @@ again:
AHCI_READ(sc, AHCI_P_CMD(chp->ch_channel))), DEBUG_PROBE);
end:
ahci_channel_stop(sc, chp, flags);
- tsleep(&sc, PRIBIO, "ahcirst", mstohz(500));
+ if (flags & AT_WAIT)
+ tsleep(&sc, PRIBIO, "ahcirst", mstohz(500));
+ else
+ delay(500000);
/* clear port interrupt register */
AHCI_WRITE(sc, AHCI_P_IS(chp->ch_channel), 0xffffffff);
ahci_channel_start(sc, chp, AT_WAIT,
--u3/rZRmxL6MmkK24--
From: "T.SHIOZAKI" <tshiozak@bsdclub.org>
To: bouyer@antioche.eu.org
Cc: gnats-bugs@NetBSD.org, kern-bug-people@NetBSD.org, gnats-admin@NetBSD.org,
netbsd-bugs@NetBSD.org
Subject: Re: kern/47097: ahcisata(4) triggers panic when disk I/O error is
caused
Date: Thu, 25 Oct 2012 06:41:06 +0900 (JST)
> On Sat, Oct 20, 2012 at 03:50:00PM +0000, tshiozak@bsdclub.org wrote:
> > when disk I/O error is caused, ahcisata(4) seems to call tsleep() under interrupt context and it triggers panic.
> >
> > call trace is here:
> >
> > vpanic
> > kern_assert
> > callout_halt
> > sleepq_block
> > ahci_do_reset_drive
> > ahci_reset_drive
> > wddone
> > ahci_bio_complete
> > ahci_intr_port
> > ahci_intr
>
> The attached patch should work around it. But this is going to
> delay() for half a second, which is bad. Something better is needed,
> so we can reset the drive from the atabus thread.
I confirmed that the patch fixes the problem. Thanks.
---
Takuya SHIOZAKI
From: "Manuel Bouyer" <bouyer@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/47097 CVS commit: src/sys/dev/ic
Date: Fri, 26 Oct 2012 09:59:11 +0000
Module Name: src
Committed By: bouyer
Date: Fri Oct 26 09:59:11 UTC 2012
Modified Files:
src/sys/dev/ic: ahcisata_core.c
Log Message:
Workaround PR kern/47097: use delay() instead of tsleep() ahci_do_reset_drive()
if not called with AT_WAIT.
The right fix here is to change the ata layer to reset the drive from
thread context, to avoid a 0.5 delay() in interrupt context when a drive
fails.
To generate a diff of this commit:
cvs rdiff -u -r1.44 -r1.45 src/sys/dev/ic/ahcisata_core.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: Manuel Bouyer <bouyer@antioche.eu.org>
To: "T.SHIOZAKI" <tshiozak@bsdclub.org>
Cc: gnats-bugs@NetBSD.org, kern-bug-people@NetBSD.org, gnats-admin@NetBSD.org,
netbsd-bugs@NetBSD.org
Subject: Re: kern/47097: ahcisata(4) triggers panic when disk I/O error is
caused
Date: Fri, 26 Oct 2012 11:59:48 +0200
On Thu, Oct 25, 2012 at 06:41:06AM +0900, T.SHIOZAKI wrote:
> > delay() for half a second, which is bad. Something better is needed,
> > so we can reset the drive from the atabus thread.
>
> I confirmed that the patch fixes the problem. Thanks.
OK, I commited this patch as a stopgap measure, but a real fix will
require more work.
--
Manuel Bouyer <bouyer@antioche.eu.org>
NetBSD: 26 ans d'experience feront toujours la difference
--
(Contact us)
$NetBSD: query-full-pr,v 1.39 2013/11/01 18:47:49 spz Exp $
$NetBSD: gnats_config.sh,v 1.8 2006/05/07 09:23:38 tsutsui Exp $
Copyright © 1994-2007
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.