NetBSD Problem Report #30233
From bouyer@antioche.eu.org Sat May 14 22:02:07 2005
Return-Path: <bouyer@antioche.eu.org>
Received: from chassiron.antioche.eu.org (bouyer.net1.nerim.net [62.212.96.44])
by narn.netbsd.org (Postfix) with ESMTP id CACA063B116
for <gnats-bugs@gnats.NetBSD.org>; Sat, 14 May 2005 22:02:05 +0000 (UTC)
Message-Id: <200505142202.j4EM20G3004412@rochebonne.antioche.eu.org>
Date: Sun, 15 May 2005 00:02:00 +0200 (CEST)
From: Manuel Bouyer <bouyer@antioche.eu.org>
Reply-To: bouyer@antioche.eu.org
To: gnats-bugs@netbsd.org
Subject: raidstrategy() isn't interrupt-safe
X-Send-Pr-Version: 3.95
>Number: 30233
>Category: kern
>Synopsis: raidstrategy() isn't interrupt-safe
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: oster
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sat May 14 22:03:00 +0000 2005
>Closed-Date: Sat Oct 01 16:40:54 +0000 2005
>Last-Modified: Tue Oct 04 14:15:01 +0000 2005
>Originator: Manuel Bouyer
>Release: NetBSD 3.99.3
>Organization:
>Environment:
System: NetBSD rochebonne.antioche.eu.org 3.99.3 NetBSD 3.99.3 (ROCHEBONNE) #0: Sat May 14 15:33:21 CEST 2005 bouyer@pop.lip6.fr:/local/pop1/bouyer/tmp/i386/obj/local/pop1/bouyer/current/src/sys/arch/i386/compile/ROCHEBONNE i386
Architecture: i386
Machine: i386
>Description:
Quoting a post from Jason Thorpe on tech-kern:
> There are lots of other things that might cause a disk's strategy
> routine to be called from interrupt context (ccd / raidframe are good
> examples). Really, we need to audit ALL of the disk strategy
> routines and ensure that they are IPL_BIO interrupt-context safe.
raidstrategy() isn't safe to call from interrupt context: it ends up
calling pool_get(PR_WAITOK):
raidstrategy()
raidstart()
rf_DoAccess()
rf_AllocRaidAccDesc()
pool_get()
There may be other problems too other problems when called from
interrupt context too (are the locks interrupt-safe ?).
>How-To-Repeat:
An easy way to trigger a pool_get() panic is to export a partition
from a raidframe device to a guest Xen domain, as reported by
Yoshito Komatsu on port-xen.
>Fix:
None provided.
>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: kern-bug-people->oster
Responsible-Changed-By: oster@netbsd.org
Responsible-Changed-When: Thu, 26 May 2005 01:30:11 +0000
Responsible-Changed-Why:
I deal with RAIDframe suckage.
From: Greg Oster <oster@cs.usask.ca>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/30233: raidstrategy() isn't interrupt-safe
Date: Sat, 24 Sep 2005 17:23:17 -0600
Manuel Bouyer writes:
[snip]
> >How-To-Repeat:
> An easy way to trigger a pool_get() panic is to export a partition
> from a raidframe device to a guest Xen domain, as reported by
> Yoshito Komatsu on port-xen.
> >Fix:
The following will hopefully fix this problem. I'm currently testing
the impact of these changes on RAIDframe in general, but would be
interested to hear reports on whether this actually fixes the problem
mentioned in this PR.
Later...
Greg Oster
Index: rf_engine.c
===================================================================
RCS file: /cvsroot/src/sys/dev/raidframe/rf_engine.c,v
retrieving revision 1.35
diff -p -r1.35 rf_engine.c
*** rf_engine.c 27 Feb 2005 00:27:44 -0000 1.35
--- rf_engine.c 24 Sep 2005 23:19:07 -0000
*************** __KERNEL_RCSID(0, "$NetBSD: rf_engine.c,
*** 67,72 ****
--- 67,73 ----
#include "rf_dagutils.h"
#include "rf_shutdown.h"
#include "rf_raid.h"
+ #include "rf_kintf.h"
static void rf_ShutdownEngine(void *);
static void DAGExecutionThread(RF_ThreadArg_t arg);
*************** DAGExecutionThread(RF_ThreadArg_t arg)
*** 825,835 ****
}
/*
! * rf_RaidIOThread() -- When I/O to a component completes,
! * KernelWakeupFunc() puts the completed request onto raidPtr->iodone
! * TAILQ. This function looks after requests on that queue by calling
! * rf_DiskIOComplete() for the request, and by calling any required
! * CompleteFunc for the request.
*/
static void
--- 826,838 ----
}
/*
! * rf_RaidIOThread() -- When I/O to a component begins, raidstrategy()
! * puts the I/O on a buf_queue, and then signals raidPtr->iodone. If
! * necessary, this function calls raidstart() to initiate the I/O.
! * When I/O to a component completes, KernelWakeupFunc() puts the
! * completed request onto raidPtr->iodone TAILQ. This function looks
! * after requests on that queue by calling rf_DiskIOComplete() for the
! * request, and by calling any required CompleteFunc for the request.
*/
static void
*************** rf_RaidIOThread(RF_ThreadArg_t arg)
*** 846,852 ****
while (!raidPtr->shutdown_raidio) {
/* if there is nothing to do, then snooze. */
! if (TAILQ_EMPTY(&(raidPtr->iodone))) {
ltsleep(&(raidPtr->iodone), PRIBIO, "raidiow", 0,
&(raidPtr->iodone_lock));
}
--- 849,856 ----
while (!raidPtr->shutdown_raidio) {
/* if there is nothing to do, then snooze. */
! if (TAILQ_EMPTY(&(raidPtr->iodone)) &&
! rf_buf_queue_check(raidPtr->raidid)) {
ltsleep(&(raidPtr->iodone), PRIBIO, "raidiow", 0,
&(raidPtr->iodone_lock));
}
*************** rf_RaidIOThread(RF_ThreadArg_t arg)
*** 859,864 ****
--- 863,874 ----
(req->CompleteFunc) (req->argument, req->error);
simple_lock(&(raidPtr->iodone_lock));
}
+
+ /* process any pending outgoing IO */
+ simple_unlock(&(raidPtr->iodone_lock));
+ raidstart(raidPtr);
+ simple_lock(&(raidPtr->iodone_lock));
+
}
/* Let rf_ShutdownEngine know that we're done... */
Index: rf_netbsd.h
===================================================================
RCS file: /cvsroot/src/sys/dev/raidframe/rf_netbsd.h,v
retrieving revision 1.23
diff -p -r1.23 rf_netbsd.h
*** rf_netbsd.h 29 May 2005 22:03:09 -0000 1.23
--- rf_netbsd.h 24 Sep 2005 23:19:08 -0000
*************** struct RF_Pools_s {
*** 90,95 ****
--- 90,96 ----
extern struct RF_Pools_s rf_pools;
void rf_pool_init(struct pool *, size_t, const char *, size_t, size_t);
+ int rf_buf_queue_check(int);
/* XXX probably belongs in a different .h file. */
typedef struct RF_AutoConfig_s {
Index: rf_netbsdkintf.c
===================================================================
RCS file: /cvsroot/src/sys/dev/raidframe/rf_netbsdkintf.c,v
retrieving revision 1.189
diff -p -r1.189 rf_netbsdkintf.c
*** rf_netbsdkintf.c 24 Sep 2005 22:51:55 -0000 1.189
--- rf_netbsdkintf.c 24 Sep 2005 23:19:09 -0000
*************** raidstrategy(struct buf *bp)
*** 735,741 ****
/* stuff it onto our queue */
BUFQ_PUT(&rs->buf_queue, bp);
! raidstart(raidPtrs[raidID]);
splx(s);
}
--- 735,742 ----
/* stuff it onto our queue */
BUFQ_PUT(&rs->buf_queue, bp);
! /* scheduled the IO to happen at the next convenient time */
! wakeup(&(raidPtrs[raidID]->iodone));
splx(s);
}
*************** rf_pool_init(struct pool *p, size_t size
*** 3330,3332 ****
--- 3331,3353 ----
pool_prime(p, xmin);
pool_setlowat(p, xmin);
}
+
+ /*
+
+ rf_buf_queue_check(int raidid) -- looks into the buf_queue to see if
+ there is IO pending and if that IO could possibly be done for a given
+ RAID set. Returns 0 if IO is waiting and can be done, 1 otherwise.
+
+ */
+
+ int
+ rf_buf_queue_check(int raidid)
+ {
+ if ((BUFQ_PEEK(&(raid_softc[raidid].buf_queue)) != NULL) &&
+ raidPtrs[raidid]->openings > 0) {
+ /* there is work to do */
+ return 0;
+ }
+ /* default is nothing to do */
+ return 1;
+ }
Index: rf_states.c
===================================================================
RCS file: /cvsroot/src/sys/dev/raidframe/rf_states.c,v
retrieving revision 1.38
diff -p -r1.38 rf_states.c
*** rf_states.c 27 Feb 2005 00:27:45 -0000 1.38
--- rf_states.c 24 Sep 2005 23:19:10 -0000
*************** rf_State_LastState(RF_RaidAccessDesc_t *
*** 235,242 ****
((RF_Raid_t *) desc->raidPtr)->openings++;
RF_UNLOCK_MUTEX(((RF_Raid_t *) desc->raidPtr)->mutex);
! /* wake up any pending IO */
! raidstart(((RF_Raid_t *) desc->raidPtr));
/* printf("Calling biodone on 0x%x\n",desc->bp); */
biodone(desc->bp); /* access came through ioctl */
--- 235,241 ----
((RF_Raid_t *) desc->raidPtr)->openings++;
RF_UNLOCK_MUTEX(((RF_Raid_t *) desc->raidPtr)->mutex);
! wakeup(&(desc->raidPtr->iodone));
/* printf("Calling biodone on 0x%x\n",desc->bp); */
biodone(desc->bp); /* access came through ioctl */
From: Greg Oster <oster@netbsd.org>
To: gnats-bugs@netbsd.org
Cc:
Subject: PR/30233 CVS commit: src/sys/dev/raidframe
Date: Sun, 25 Sep 2005 19:47:17 +0000 (UTC)
Module Name: src
Committed By: oster
Date: Sun Sep 25 19:47:17 UTC 2005
Modified Files:
src/sys/dev/raidframe: rf_engine.c rf_netbsd.h rf_netbsdkintf.c
rf_states.c
Log Message:
Re-work the handling of incoming I/O in RAIDframe:
- introduce rf_buf_queue_check() which checks to see if there
is work to do in the incoming buffer queue
- rf_RaidIOThread() is now responsible for calling raidstart(), and is
also now the only place that calls raidstart()
- raidstrategy() now just queues requests in buf_queue
and signals rf_RaidIOThread() that work has arrived
Hopefully addresses PR#30233
To generate a diff of this commit:
cvs rdiff -r1.35 -r1.36 src/sys/dev/raidframe/rf_engine.c
cvs rdiff -r1.23 -r1.24 src/sys/dev/raidframe/rf_netbsd.h
cvs rdiff -r1.189 -r1.190 src/sys/dev/raidframe/rf_netbsdkintf.c
cvs rdiff -r1.38 -r1.39 src/sys/dev/raidframe/rf_states.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: open->feedback
State-Changed-By: oster@netbsd.org
State-Changed-When: Sun, 25 Sep 2005 20:14:13 +0000
State-Changed-Why:
Potential fixed checked in. Do these changes fix things for xen?
From: Yoshito Komatsu <ykomatsu@akaumigame.org>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/30233
Date: Fri, 30 Sep 2005 20:44:45 +0900
I applied this patch to NetBSD 3.0_BETA and guest Xen domain
on a RAIDframe device works fine without kernel panics.
State-Changed-From-To: feedback->closed
State-Changed-By: oster@netbsd.org
State-Changed-When: Sat, 01 Oct 2005 16:40:54 +0000
State-Changed-Why:
Feedback indicates problem is fixed.
From: Matthias Scheler <tron@netbsd.org>
To: gnats-bugs@netbsd.org
Cc:
Subject: PR/30233 CVS commit: [netbsd-3] src/sys/dev/raidframe
Date: Tue, 4 Oct 2005 14:14:40 +0000 (UTC)
Module Name: src
Committed By: tron
Date: Tue Oct 4 14:14:40 UTC 2005
Modified Files:
src/sys/dev/raidframe [netbsd-3]: rf_engine.c rf_netbsd.h
rf_netbsdkintf.c rf_states.c
Log Message:
Pull up following revision(s) (requested by oster in ticket #853):
sys/dev/raidframe/rf_netbsdkintf.c: revision 1.190
sys/dev/raidframe/rf_netbsd.h: revision 1.24
sys/dev/raidframe/rf_states.c: revision 1.39
sys/dev/raidframe/rf_engine.c: revision 1.36
Re-work the handling of incoming I/O in RAIDframe:
- introduce rf_buf_queue_check() which checks to see if there
is work to do in the incoming buffer queue
- rf_RaidIOThread() is now responsible for calling raidstart(), and is
also now the only place that calls raidstart()
- raidstrategy() now just queues requests in buf_queue
and signals rf_RaidIOThread() that work has arrived
Hopefully addresses PR#30233
To generate a diff of this commit:
cvs rdiff -r1.35 -r1.35.2.1 src/sys/dev/raidframe/rf_engine.c
cvs rdiff -r1.22.2.1 -r1.22.2.2 src/sys/dev/raidframe/rf_netbsd.h
cvs rdiff -r1.186.2.1 -r1.186.2.2 src/sys/dev/raidframe/rf_netbsdkintf.c
cvs rdiff -r1.38 -r1.38.2.1 src/sys/dev/raidframe/rf_states.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
>Unformatted:
(Contact us)
$NetBSD: query-full-pr,v 1.39 2013/11/01 18:47:49 spz Exp $
$NetBSD: gnats_config.sh,v 1.8 2006/05/07 09:23:38 tsutsui Exp $
Copyright © 1994-2007
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.