NetBSD Problem Report #30233

From bouyer@antioche.eu.org  Sat May 14 22:02:07 2005
Return-Path: <bouyer@antioche.eu.org>
Received: from chassiron.antioche.eu.org (bouyer.net1.nerim.net [62.212.96.44])
	by narn.netbsd.org (Postfix) with ESMTP id CACA063B116
	for <gnats-bugs@gnats.NetBSD.org>; Sat, 14 May 2005 22:02:05 +0000 (UTC)
Message-Id: <200505142202.j4EM20G3004412@rochebonne.antioche.eu.org>
Date: Sun, 15 May 2005 00:02:00 +0200 (CEST)
From: Manuel Bouyer <bouyer@antioche.eu.org>
Reply-To: bouyer@antioche.eu.org
To: gnats-bugs@netbsd.org
Subject: raidstrategy() isn't interrupt-safe
X-Send-Pr-Version: 3.95

>Number:         30233
>Category:       kern
>Synopsis:       raidstrategy() isn't interrupt-safe
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    oster
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat May 14 22:03:00 +0000 2005
>Closed-Date:    Sat Oct 01 16:40:54 +0000 2005
>Last-Modified:  Tue Oct 04 14:15:01 +0000 2005
>Originator:     Manuel Bouyer
>Release:        NetBSD 3.99.3
>Organization:
>Environment:
System: NetBSD rochebonne.antioche.eu.org 3.99.3 NetBSD 3.99.3 (ROCHEBONNE) #0: Sat May 14 15:33:21 CEST 2005 bouyer@pop.lip6.fr:/local/pop1/bouyer/tmp/i386/obj/local/pop1/bouyer/current/src/sys/arch/i386/compile/ROCHEBONNE i386
Architecture: i386
Machine: i386
>Description:
	Quoting a post from Jason Thorpe on tech-kern:
	> There are lots of other things that might cause a disk's strategy
	> routine to be called from interrupt context (ccd / raidframe are good
	> examples).  Really, we need to audit ALL of the disk strategy
	> routines and ensure that they are IPL_BIO interrupt-context safe.

	raidstrategy() isn't safe to call from interrupt context: it ends up
	calling pool_get(PR_WAITOK):
	raidstrategy()
	raidstart()
	rf_DoAccess()
	rf_AllocRaidAccDesc()
	pool_get()

	There may be other problems too other problems when called from
	interrupt context too (are the locks interrupt-safe ?).

>How-To-Repeat:
	An easy way to trigger a pool_get() panic is to export a partition
	from a raidframe device to a guest Xen domain, as reported by
	Yoshito Komatsu on port-xen.
>Fix:
	None provided.

>Release-Note:

>Audit-Trail:

Responsible-Changed-From-To: kern-bug-people->oster
Responsible-Changed-By: oster@netbsd.org
Responsible-Changed-When: Thu, 26 May 2005 01:30:11 +0000
Responsible-Changed-Why:
I deal with RAIDframe suckage.


From: Greg Oster <oster@cs.usask.ca>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/30233: raidstrategy() isn't interrupt-safe 
Date: Sat, 24 Sep 2005 17:23:17 -0600

 Manuel Bouyer writes:
 [snip]
 > >How-To-Repeat:
 > 	An easy way to trigger a pool_get() panic is to export a partition
 > 	from a raidframe device to a guest Xen domain, as reported by
 > 	Yoshito Komatsu on port-xen.
 > >Fix:

 The following will hopefully fix this problem.  I'm currently testing 
 the impact of these changes on RAIDframe in general, but would be 
 interested to hear reports on whether this actually fixes the problem
 mentioned in this PR.

 Later...

 Greg Oster

 Index: rf_engine.c
 ===================================================================
 RCS file: /cvsroot/src/sys/dev/raidframe/rf_engine.c,v
 retrieving revision 1.35
 diff -p -r1.35 rf_engine.c
 *** rf_engine.c	27 Feb 2005 00:27:44 -0000	1.35
 --- rf_engine.c	24 Sep 2005 23:19:07 -0000
 *************** __KERNEL_RCSID(0, "$NetBSD: rf_engine.c,
 *** 67,72 ****
 --- 67,73 ----
   #include "rf_dagutils.h"
   #include "rf_shutdown.h"
   #include "rf_raid.h"
 + #include "rf_kintf.h"

   static void rf_ShutdownEngine(void *);
   static void DAGExecutionThread(RF_ThreadArg_t arg);
 *************** DAGExecutionThread(RF_ThreadArg_t arg)
 *** 825,835 ****
   }

   /*
 !  * rf_RaidIOThread() -- When I/O to a component completes,
 !  * KernelWakeupFunc() puts the completed request onto raidPtr->iodone
 !  * TAILQ.  This function looks after requests on that queue by calling
 !  * rf_DiskIOComplete() for the request, and by calling any required
 !  * CompleteFunc for the request.
    */

   static void
 --- 826,838 ----
   }

   /*
 !  * rf_RaidIOThread() -- When I/O to a component begins, raidstrategy()
 !  * puts the I/O on a buf_queue, and then signals raidPtr->iodone.  If
 !  * necessary, this function calls raidstart() to initiate the I/O.
 !  * When I/O to a component completes, KernelWakeupFunc() puts the
 !  * completed request onto raidPtr->iodone TAILQ.  This function looks
 !  * after requests on that queue by calling rf_DiskIOComplete() for the
 !  * request, and by calling any required CompleteFunc for the request.  
    */

   static void
 *************** rf_RaidIOThread(RF_ThreadArg_t arg)
 *** 846,852 ****

   	while (!raidPtr->shutdown_raidio) {
   		/* if there is nothing to do, then snooze. */
 ! 		if (TAILQ_EMPTY(&(raidPtr->iodone))) {
   			ltsleep(&(raidPtr->iodone), PRIBIO, "raidiow", 0,
   				&(raidPtr->iodone_lock));
   		}
 --- 849,856 ----

   	while (!raidPtr->shutdown_raidio) {
   		/* if there is nothing to do, then snooze. */
 ! 		if (TAILQ_EMPTY(&(raidPtr->iodone)) &&
 ! 		    rf_buf_queue_check(raidPtr->raidid)) {
   			ltsleep(&(raidPtr->iodone), PRIBIO, "raidiow", 0,
   				&(raidPtr->iodone_lock));
   		}
 *************** rf_RaidIOThread(RF_ThreadArg_t arg)
 *** 859,864 ****
 --- 863,874 ----
   			(req->CompleteFunc) (req->argument, req->error);
   			simple_lock(&(raidPtr->iodone_lock));
   		}
 + 
 + 		/* process any pending outgoing IO */
 + 		simple_unlock(&(raidPtr->iodone_lock));
 + 		raidstart(raidPtr);
 + 		simple_lock(&(raidPtr->iodone_lock));
 + 
   	}

   	/* Let rf_ShutdownEngine know that we're done... */
 Index: rf_netbsd.h
 ===================================================================
 RCS file: /cvsroot/src/sys/dev/raidframe/rf_netbsd.h,v
 retrieving revision 1.23
 diff -p -r1.23 rf_netbsd.h
 *** rf_netbsd.h	29 May 2005 22:03:09 -0000	1.23
 --- rf_netbsd.h	24 Sep 2005 23:19:08 -0000
 *************** struct RF_Pools_s {
 *** 90,95 ****
 --- 90,96 ----

   extern struct RF_Pools_s rf_pools;
   void rf_pool_init(struct pool *, size_t, const char *, size_t, size_t);
 + int rf_buf_queue_check(int);

   /* XXX probably belongs in a different .h file. */
   typedef struct RF_AutoConfig_s {
 Index: rf_netbsdkintf.c
 ===================================================================
 RCS file: /cvsroot/src/sys/dev/raidframe/rf_netbsdkintf.c,v
 retrieving revision 1.189
 diff -p -r1.189 rf_netbsdkintf.c
 *** rf_netbsdkintf.c	24 Sep 2005 22:51:55 -0000	1.189
 --- rf_netbsdkintf.c	24 Sep 2005 23:19:09 -0000
 *************** raidstrategy(struct buf *bp)
 *** 735,741 ****
   	/* stuff it onto our queue */
   	BUFQ_PUT(&rs->buf_queue, bp);

 ! 	raidstart(raidPtrs[raidID]);

   	splx(s);
   }
 --- 735,742 ----
   	/* stuff it onto our queue */
   	BUFQ_PUT(&rs->buf_queue, bp);

 ! 	/* scheduled the IO to happen at the next convenient time */
 ! 	wakeup(&(raidPtrs[raidID]->iodone));

   	splx(s);
   }
 *************** rf_pool_init(struct pool *p, size_t size
 *** 3330,3332 ****
 --- 3331,3353 ----
   	pool_prime(p, xmin);
   	pool_setlowat(p, xmin);
   }
 + 
 + /*
 + 
 + rf_buf_queue_check(int raidid) -- looks into the buf_queue to see if
 + there is IO pending and if that IO could possibly be done for a given
 + RAID set.  Returns 0 if IO is waiting and can be done, 1 otherwise. 
 + 
 +  */
 + 
 + int
 + rf_buf_queue_check(int raidid)
 + {
 + 	if ((BUFQ_PEEK(&(raid_softc[raidid].buf_queue)) != NULL) &&
 + 	    raidPtrs[raidid]->openings > 0) {
 + 		/* there is work to do */
 + 		return 0;
 + 	} 
 + 	/* default is nothing to do */
 + 	return 1;
 + }
 Index: rf_states.c
 ===================================================================
 RCS file: /cvsroot/src/sys/dev/raidframe/rf_states.c,v
 retrieving revision 1.38
 diff -p -r1.38 rf_states.c
 *** rf_states.c	27 Feb 2005 00:27:45 -0000	1.38
 --- rf_states.c	24 Sep 2005 23:19:10 -0000
 *************** rf_State_LastState(RF_RaidAccessDesc_t *
 *** 235,242 ****
   	((RF_Raid_t *) desc->raidPtr)->openings++;
   	RF_UNLOCK_MUTEX(((RF_Raid_t *) desc->raidPtr)->mutex);

 ! 	/* wake up any pending IO */
 ! 	raidstart(((RF_Raid_t *) desc->raidPtr));

   	/* printf("Calling biodone on 0x%x\n",desc->bp); */
   	biodone(desc->bp);	/* access came through ioctl */
 --- 235,241 ----
   	((RF_Raid_t *) desc->raidPtr)->openings++;
   	RF_UNLOCK_MUTEX(((RF_Raid_t *) desc->raidPtr)->mutex);

 ! 	wakeup(&(desc->raidPtr->iodone));

   	/* printf("Calling biodone on 0x%x\n",desc->bp); */
   	biodone(desc->bp);	/* access came through ioctl */




From: Greg Oster <oster@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: 
Subject: PR/30233 CVS commit: src/sys/dev/raidframe
Date: Sun, 25 Sep 2005 19:47:17 +0000 (UTC)

 Module Name:	src
 Committed By:	oster
 Date:		Sun Sep 25 19:47:17 UTC 2005

 Modified Files:
 	src/sys/dev/raidframe: rf_engine.c rf_netbsd.h rf_netbsdkintf.c
 	    rf_states.c

 Log Message:
 Re-work the handling of incoming I/O in RAIDframe:
 - introduce rf_buf_queue_check() which checks to see if there
 is work to do in the incoming buffer queue
 - rf_RaidIOThread() is now responsible for calling raidstart(), and is
 also now the only place that calls raidstart()
 - raidstrategy() now just queues requests in buf_queue
 and signals rf_RaidIOThread() that work has arrived

 Hopefully addresses PR#30233


 To generate a diff of this commit:
 cvs rdiff -r1.35 -r1.36 src/sys/dev/raidframe/rf_engine.c
 cvs rdiff -r1.23 -r1.24 src/sys/dev/raidframe/rf_netbsd.h
 cvs rdiff -r1.189 -r1.190 src/sys/dev/raidframe/rf_netbsdkintf.c
 cvs rdiff -r1.38 -r1.39 src/sys/dev/raidframe/rf_states.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: open->feedback
State-Changed-By: oster@netbsd.org
State-Changed-When: Sun, 25 Sep 2005 20:14:13 +0000
State-Changed-Why:
Potential fixed checked in.  Do these changes fix things for xen?


From: Yoshito Komatsu <ykomatsu@akaumigame.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/30233
Date: Fri, 30 Sep 2005 20:44:45 +0900

 I applied this patch to NetBSD 3.0_BETA and guest Xen domain
 on a RAIDframe device works fine without kernel panics.

State-Changed-From-To: feedback->closed
State-Changed-By: oster@netbsd.org
State-Changed-When: Sat, 01 Oct 2005 16:40:54 +0000
State-Changed-Why:
Feedback indicates problem is fixed.


From: Matthias Scheler <tron@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: 
Subject: PR/30233 CVS commit: [netbsd-3] src/sys/dev/raidframe
Date: Tue,  4 Oct 2005 14:14:40 +0000 (UTC)

 Module Name:	src
 Committed By:	tron
 Date:		Tue Oct  4 14:14:40 UTC 2005

 Modified Files:
 	src/sys/dev/raidframe [netbsd-3]: rf_engine.c rf_netbsd.h
 	    rf_netbsdkintf.c rf_states.c

 Log Message:
 Pull up following revision(s) (requested by oster in ticket #853):
 	sys/dev/raidframe/rf_netbsdkintf.c: revision 1.190
 	sys/dev/raidframe/rf_netbsd.h: revision 1.24
 	sys/dev/raidframe/rf_states.c: revision 1.39
 	sys/dev/raidframe/rf_engine.c: revision 1.36
 Re-work the handling of incoming I/O in RAIDframe:
 - introduce rf_buf_queue_check() which checks to see if there
 is work to do in the incoming buffer queue
 - rf_RaidIOThread() is now responsible for calling raidstart(), and is
 also now the only place that calls raidstart()
 - raidstrategy() now just queues requests in buf_queue
 and signals rf_RaidIOThread() that work has arrived
 Hopefully addresses PR#30233


 To generate a diff of this commit:
 cvs rdiff -r1.35 -r1.35.2.1 src/sys/dev/raidframe/rf_engine.c
 cvs rdiff -r1.22.2.1 -r1.22.2.2 src/sys/dev/raidframe/rf_netbsd.h
 cvs rdiff -r1.186.2.1 -r1.186.2.2 src/sys/dev/raidframe/rf_netbsdkintf.c
 cvs rdiff -r1.38 -r1.38.2.1 src/sys/dev/raidframe/rf_states.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

>Unformatted:

NetBSD Home
NetBSD PR Database Search

(Contact us) $NetBSD: query-full-pr,v 1.39 2013/11/01 18:47:49 spz Exp $
$NetBSD: gnats_config.sh,v 1.8 2006/05/07 09:23:38 tsutsui Exp $
Copyright © 1994-2007 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.