NetBSD Problem Report #35071
From gendalia@menelos.com Sat Nov 18 10:42:43 2006
Return-Path: <gendalia@menelos.com>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
by narn.NetBSD.org (Postfix) with ESMTP id 8928E63BAFE
for <gnats-bugs@gnats.NetBSD.org>; Sat, 18 Nov 2006 10:42:43 +0000 (UTC)
Message-Id: <E1GlNej-0003o4-Jr@mail.menelos.com>
Date: Sat, 18 Nov 2006 04:42:41 -0600
From: tjd-nb-pr@menelos.com
Reply-To: tjd-nb-pr@menelos.com
To: gnats-bugs@NetBSD.org
Subject: panic: mpt_get_request: corrupted request free list (xfer)
X-Send-Pr-Version: 3.95
>Number: 35071
>Category: kern
>Synopsis: panic: mpt_get_request: corrupted request free list (xfer)
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sat Nov 18 10:45:00 +0000 2006
>Last-Modified: Sun Dec 03 11:10:01 +0000 2006
>Originator: Tracy Di Marco White
>Release: NetBSD 4.0_BETA
>Organization:
Iowa State University
>Environment:
System: NetBSD blackhole.ait.iastate.edu 4.0_BETA NetBSD 4.0_BETA (GENERIC) #0: Mon Sep 11 09:48:17 CDT 2006 root@blackhole.ait.iastate.edu:/usr/obj/usr/src/sys/arch/i386/compile/GENERIC i386
Architecture: i386
Machine: i386
>Description:
panic: mpt_get_request: corrupted request free list (xfer)
Stopped in pid 21393.1 (tibsmnt) at netbsd:cpu_Debugger+0x4: leave
db> t
cpu_Debugger(dcb8fa48,10002,0,cf2c2828,c4bc9400) at netbsd:cpu_Debugger+0x4
panic(c08317f8,6,da8b96a4,0,1) at netbsd:panic+0x141
mpt_get_request(c4bc9400,10002,cf2c286c,c042a7f0,dcb8f9d8) at netbsd:mpt_get_request+0x5b
mpt_scsipi_request(c4bc96dc,0,c4298f44,0,cf2c2900) at netbsd:mpt_scsipi_request+0x4d
scsipi_run_queue(da8b96a4,0,c4d65100,c4bc96dc,0) at netbsd:scsipi_run_queue+0x184
scsipi_execute_xs(c4298f44,cf2c2982,6,0,0) at netbsd:scsipi_execute_xs+0x17e
scsipi_test_unit_ready(c4d65100,a0,0,dcb8f9d8,dcb8f9d8) at netbsd:scsipi_test_unit_ready+0x4d
stopen(e11,801,2000,dd9ff4ec,dcb8f9d8) at netbsd:stopen+0xd6
spec_open(cf2c2a78,cf31cb58,4d2,0,4d2) at netbsd:spec_open+0x1df
VOP_OPEN(dcb8f9d8,801,cf31cb58,dd9ff4ec,dd9ff4ec) at netbsd:VOP_OPEN+0x2f
vn_open(cf2c2b68,801,d60,dd9ff4ec,c0886444) at netbsd:vn_open+0x266
sys_open(dd9ff4ec,cf2c2c00,cf2c2c68,bba50000,23) at netbsd:sys_open+0xa0
linux_sys_open(dd9ff4ec,cf2c2c48,cf2c2c68,8563f70,8050000) at netbsd:linux_sys_open+0x70
linux_syscall_plain(cf2c2c88,bba7002b,bba7002b,bfbf002b,bbbf002b) at netbsd:linux_syscall_plain+0xa8
Same machine and version as in PR #34892
http://gendalia.public.iastate.edu/blackhole.dmesg.txt
>How-To-Repeat:
I don't think I've reproduced this panic on this version of NetBSD.
>Fix:
>Audit-Trail:
From: Tracy Di Marco White <tjd-nb-pr@menelos.com>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/35071: panic: mpt_get_request: corrupted request free list (xfer)
Date: Sat, 18 Nov 2006 04:51:40 -0600
I meant to include the console message prior to the panic, and show
uvm after.
mpt3: mpt_done: no scsipi_xfer, index = 0xfd, seq = 0x00000000
mpt3: request state: Free
mpt3: mpt_request:
SCSI IO Request @ 0xcf5cd64c
Chain Offset 0x10
MsgFlags 0x00
MsgContext 0x000000fd
Bus: 0
TargetID 1
SenseBufferLength 32
LUN: 0x0
Control 0x01000500 WRITE UNTAGGED
DataLength 0x00010000
SenseBufAddr 0x04798be0
CDB[0:6] 0a 00 01 00 00 00
SE32 0xce558a30: Addr=0xb3917008 FlagsLength=0x14000ff8
HOST_TO_IOC
SE32 0xce558a38: Addr=0x27318000 FlagsLength=0x94001000
HOST_TO_IOC LAST_ELEMENT
CE32 0xce558a40: Addr=0x4798a48 NxtChnO=0x16 Flgs=0x30 Len=0x60
SE32 0xce558a48: Addr=0x5f279000 FlagsLength=0x14001000
HOST_TO_IOC
SE32 0xce558a50: Addr=0x95a3a000 FlagsLength=0x14001000
HOST_TO_IOC
SE32 0xce558a58: Addr=0x6653b000 FlagsLength=0x14001000
HOST_TO_IOC
SE32 0xce558a60: Addr=0x7ef9c000 FlagsLength=0x14001000
HOST_TO_IOC
SE32 0xce558a68: Addr=0x9705d000 FlagsLength=0x14001000
HOST_TO_IOC
SE32 0xce558a70: Addr=0x4263e000 FlagsLength=0x14001000
HOST_TO_IOC
SE32 0xce558a78: Addr=0x3617f000 FlagsLength=0x14001000
HOST_TO_IOC
SE32 0xce558a80: Addr=0x6e7c0000 FlagsLength=0x14001000
HOST_TO_IOC
SE32 0xce558a88: Addr=0x80241000 FlagsLength=0x14001000
HOST_TO_IOC
SE32 0xce558a90: Addr=0x67be2000 FlagsLength=0x14001000
HOST_TO_IOC
SE32 0xce558a98: Addr=0x82ea3000 FlagsLength=0x94001000
HOST_TO_IOC LAST_ELEMENT
CE32 0xce558aa0: Addr=0x4798aa8 NxtChnO=0x0 Flgs=0x30 Len=0x20
SE32 0xce558aa8: Addr=0x28ac4000 FlagsLength=0x14001000
HOST_TO_IOC
SE32 0xce558ab0: Addr=0x42325000 FlagsLength=0x14001000
HOST_TO_IOC
SE32 0xce558ab8: Addr=0xca566000 FlagsLength=0x14001000
HOST_TO_IOC
SE32 0xce558ac0: Addr=0xac0a7000 FlagsLength=0xd5000008
HOST_TO_IOC LAST_ELEMENT END_OF_BUFFER END_OF_LIST
mpt3: mpt_reply:
SCSI IO Request Reply @ 0xce38f380
IOC Status Success
IOCLogInfo 0x00000000
MsgLength 0x08
MsgFlags 0x00
MsgContext 0x000000fd
Bus: 0
TargetID 2
CDBLength 6
SCSI Status: Check Condition
SCSI State: (0x00000001)AutoSense_Valid
TransferCnt 0x0000
SenseCnt 0x001c
ResponseInfo 0x00000000
panic: mpt_get_request: corrupted request free list (xfer)
Stopped in pid 21393.1 (tibsmnt) at netbsd:cpu_Debugger+0x4: leave
db> t
cpu_Debugger(dcb8fa48,10002,0,cf2c2828,c4bc9400) at netbsd:cpu_Debugger+0x4
panic(c08317f8,6,da8b96a4,0,1) at netbsd:panic+0x141
mpt_get_request(c4bc9400,10002,cf2c286c,c042a7f0,dcb8f9d8) at netbsd:mpt_get_request+0x5b
mpt_scsipi_request(c4bc96dc,0,c4298f44,0,cf2c2900) at netbsd:mpt_scsipi_request+0x4d
scsipi_run_queue(da8b96a4,0,c4d65100,c4bc96dc,0) at netbsd:scsipi_run_queue+0x184
scsipi_execute_xs(c4298f44,cf2c2982,6,0,0) at netbsd:scsipi_execute_xs+0x17e
scsipi_test_unit_ready(c4d65100,a0,0,dcb8f9d8,dcb8f9d8) at netbsd:scsipi_test_unit_ready+0x4d
stopen(e11,801,2000,dd9ff4ec,dcb8f9d8) at netbsd:stopen+0xd6
spec_open(cf2c2a78,cf31cb58,4d2,0,4d2) at netbsd:spec_open+0x1df
VOP_OPEN(dcb8f9d8,801,cf31cb58,dd9ff4ec,dd9ff4ec) at netbsd:VOP_OPEN+0x2f
vn_open(cf2c2b68,801,d60,dd9ff4ec,c0886444) at netbsd:vn_open+0x266
sys_open(dd9ff4ec,cf2c2c00,cf2c2c68,bba50000,23) at netbsd:sys_open+0xa0
linux_sys_open(dd9ff4ec,cf2c2c48,cf2c2c68,8563f70,8050000) at netbsd:linux_sys_open+0x70
linux_syscall_plain(cf2c2c88,bba7002b,bba7002b,bfbf002b,bbbf002b) at netbsd:linux_syscall_plain+0xa8
db> show uvm
Current UVM status:
pagesize=4096 (0x1000), pagemask=0xfff, pageshift=12
900981 VM pages: 555299 active, 272631 inactive, 1250 wired, 2823 free
min 10% (25) anon, 30% (76) file, 5% (12) exec
max 15% (38) anon, 80% (204) file, 30% (76) exec
pages 117901 anon, 707980 file, 3627 exec
freemin=64, free-target=85, inactive-target=272631, wired-max=300327
faults=-2134302039, traps=1541685497, intrs=186871550, ctxswitch=305568730
softint=115212963, syscalls=1890254534, swapins=25698, swapouts=25721
fault counts:
noram=2470, noanon=0, pgwait=21, pgrele=0
ok relocks(total)=386581(386609), anget(retrys)=879286551(359538), amapcopy=39404854
neighbor anon/obj pg=35382378/392159854, gets(lock/unlock)=114677177/27055
cases: anon=859097219, anoncow=13780464, obj=96111232, prcopy=18565921, przero=461640788
daemon and swap counts:
woke=1563139, revs=1559493, scans=512965763, obscans=484154054, anscans=600471
busy=74778, freed=484754525, reactivate=5476456, deactivate=520377365
pageouts=49971, pending=522177, nswget=359620
nswapdev=1, swpgavail=264554
swpages=264554, swpginuse=155842, swpgonly=129083, paging=0
db>
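As a cross-check on the mpt_request dump above, the SE32 lines can be decoded by hand. The sketch below assumes that each FlagsLength word carries the flags in its top 8 bits and the byte count in its low 24 bits, and that the flag bits are the ones implied by the dump's own annotations (0x14 prints HOST_TO_IOC, 0x94 adds LAST_ELEMENT, 0xd5 adds END_OF_BUFFER and END_OF_LIST). The macro and function names are made up for illustration and are not taken from the driver headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout of a 32-bit FlagsLength word (inferred from the dump). */
#define SGE_LENGTH_MASK   0x00ffffffu
#define SGE_FLAGS_SHIFT   24
#define SGE_LAST_ELEMENT  0x80u
#define SGE_END_OF_BUFFER 0x40u
#define SGE_TYPE_SIMPLE   0x10u
#define SGE_HOST_TO_IOC   0x04u
#define SGE_END_OF_LIST   0x01u

/* Byte count carried by one scatter/gather element. */
static uint32_t
sge_length(uint32_t flags_length)
{
	return flags_length & SGE_LENGTH_MASK;
}

/* Flag byte carried by one scatter/gather element. */
static uint8_t
sge_flags(uint32_t flags_length)
{
	return (uint8_t)(flags_length >> SGE_FLAGS_SHIFT);
}

/* FlagsLength values of the 17 SE32 lines in the first dump, in order. */
static const uint32_t first_dump_sges[] = {
	0x14000ff8u, 0x94001000u,                           /* in the request */
	0x14001000u, 0x14001000u, 0x14001000u, 0x14001000u, /* first chain    */
	0x14001000u, 0x14001000u, 0x14001000u, 0x14001000u,
	0x14001000u, 0x14001000u, 0x94001000u,
	0x14001000u, 0x14001000u, 0x14001000u, 0xd5000008u  /* second chain   */
};

/* Sum the byte counts of n elements. */
static uint32_t
sge_total(const uint32_t *fl, size_t n)
{
	uint32_t total = 0;

	for (size_t i = 0; i < n; i++)
		total += sge_length(fl[i]);
	return total;
}
```

Under these assumptions the 17 elements add up to 0x00010000 bytes, which agrees with the DataLength field of the request. The scatter/gather list itself therefore looks internally consistent. What does not match is that the request names TargetID 1 while the reply carrying the same MsgContext names TargetID 2.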
From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@NetBSD.org, gnats-admin@NetBSD.org,
netbsd-bugs@NetBSD.org, tjd-nb-pr@menelos.com
Subject: Re: kern/35071: panic: mpt_get_request: corrupted request free list (xfer)
Date: Sat, 18 Nov 2006 15:39:10 +0100
On Sat, Nov 18, 2006 at 10:55:02AM +0000, Tracy Di Marco White wrote:
> The following reply was made to PR kern/35071; it has been noted by GNATS.
>
> From: Tracy Di Marco White <tjd-nb-pr@menelos.com>
> To: gnats-bugs@NetBSD.org
> Cc:
> Subject: Re: kern/35071: panic: mpt_get_request: corrupted request free list (xfer)
> Date: Sat, 18 Nov 2006 04:51:40 -0600
>
> I meant to include the console message prior to the panic, and show
> uvm after.
> mpt3: mpt_done: no scsipi_xfer, index = 0xfd, seq = 0x00000000
> mpt3: request state: Free
I've seen something similar on a netbsd-3 host. I think the problem started
with:
sd1(mpt0:0:1:0): command timeout
mpt0: timeout on request index = 0xfb, seq = 0x0361bdae
mpt0: Status 0x00000000, Mask 0x00000001, Doorbell 0x24000000
mpt0: request state: On Chip
So maybe it's the timeout handling code which corrupts the list.
But I didn't look at the code at all.
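To make the panic message concrete, here is a minimal userland sketch of the kind of sanity check that a "corrupted request free list (xfer)" message suggests: a request taken from the free list must not still point at a transfer. It is modelled on the panic text only, not copied from mpt(4). The pool size, the structure layout and every function name are assumptions made for illustration, and the check returns a status code where the kernel would panic.

```c
#include <assert.h>
#include <stddef.h>

#define NREQ 4	/* hypothetical pool size */

struct request {
	unsigned index;		/* slot number inside the pool */
	void *xfer;		/* owning transfer; must be NULL while free */
	struct request *next;	/* free-list link */
};

struct pool {
	struct request reqs[NREQ];
	struct request *free_head;
};

enum get_status { GET_OK, GET_EMPTY, GET_CORRUPT_INDEX, GET_CORRUPT_XFER };

/* Put every request on the free list; slot 0 ends up at the head. */
static void
pool_init(struct pool *p)
{
	p->free_head = NULL;
	for (unsigned i = NREQ; i-- > 0; ) {
		p->reqs[i].index = i;
		p->reqs[i].xfer = NULL;
		p->reqs[i].next = p->free_head;
		p->free_head = &p->reqs[i];
	}
}

/* Return a request to the free list without clearing or checking xfer. */
static void
pool_put_unchecked(struct pool *p, struct request *r)
{
	r->next = p->free_head;
	p->free_head = r;
}

/* Take the head of the free list, refusing entries that look corrupted. */
static enum get_status
pool_get(struct pool *p, struct request **out)
{
	struct request *r = p->free_head;

	*out = NULL;
	if (r == NULL)
		return GET_EMPTY;
	if (r->index >= NREQ || r != &p->reqs[r->index])
		return GET_CORRUPT_INDEX;
	if (r->xfer != NULL)
		return GET_CORRUPT_XFER;	/* the "(xfer)" case */
	p->free_head = r->next;
	r->next = NULL;
	*out = r;
	return GET_OK;
}
```

In this model, any code path that puts a request back on the list while its xfer pointer is still set (or sets it afterwards) makes the next pool_get() report GET_CORRUPT_XFER. Whether the timeout path is that code path in the real driver is exactly the open question here.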
--
Manuel Bouyer <bouyer@antioche.eu.org>
NetBSD: 26 ans d'experience feront toujours la difference
--
From: Tracy Di Marco White <tjd-nb-pr@menelos.com>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/35071: panic: mpt_get_request: corrupted request free list (xfer)
Date: Fri, 01 Dec 2006 09:15:43 -0600
I seem to be getting this every day, or every other day. So, more messages.
st2: already open
st0(mpt3:0:1:0): command timeout
mpt3: timeout on request index = 0xfe, seq = 0x0133d791
mpt3: Status 0x00000000, Mask 0x00000001, Doorbell 0x24000000
mpt3: request state: On Chip
mpt3: mpt_done: no scsipi_xfer, index = 0xfe, seq = 0x00000000
mpt3: request state: Free
mpt3: mpt_request:
SCSI IO Request @ 0xc09d2aa8
Chain Offset 0x00
MsgFlags 0x00
MsgContext 0x000000fe
Bus: 0
TargetID 2
SenseBufferLength 32
LUN: 0x0
Control 0x00000500 NODATATRANSFER UNTAGGED
DataLength 0x00000000
SenseBufAddr 0x04798de0
CDB[0:6] 1e 00 00 00 00 00
SE32 0xce558c30: Addr=0x0 FlagsLength=0xd1000000
LAST_ELEMENT END_OF_BUFFER END_OF_LIST
mpt3: mpt_reply:
SCSI IO Request Reply @ 0xce38f480
IOC Status Success
IOCLogInfo 0x00000000
MsgLength 0x08
MsgFlags 0x00
MsgContext 0x000000fe
Bus: 0
TargetID 1
CDBLength 6
SCSI Status: Busy
SCSI State: (0x00000000)
TransferCnt 0x0000
SenseCnt 0x0000
ResponseInfo 0x00000000
panic: mpt_get_request: corrupted request free list (xfer)
Stopped in pid 8333.1 (tibsmnt) at netbsd:cpu_Debugger+0x4: leave
db> t
cpu_Debugger(cf59cd90,10002,0,ce3d9828,c4bc9400) at netbsd:cpu_Debugger+0x4
panic(c08317f8,6,ce42fa80,0,1) at netbsd:panic+0x141
mpt_get_request(c4bc9400,10002,ce3d986c,c042a7f0,cf59cd20) at netbsd:mpt_get_request+0x5b
mpt_scsipi_request(c4bc96dc,0,c429ff0c,0,ce3d9900) at netbsd:mpt_scsipi_request+0x4d
scsipi_run_queue(ce42fa80,0,c4d65100,c4bc96dc,0) at netbsd:scsipi_run_queue+0x184
scsipi_execute_xs(c429ff0c,ce3d9982,6,0,0) at netbsd:scsipi_execute_xs+0x17e
scsipi_test_unit_ready(c4d65100,0,0,cf59cd20,cf59cd20) at netbsd:scsipi_test_unit_ready+0x4d
stopen(e11,801,2000,d4a4eec4,cf59cd20) at netbsd:stopen+0xd6
spec_open(ce3d9a78,2,7432d,202,c087934c) at netbsd:spec_open+0x1df
VOP_OPEN(cf59cd20,801,cf806ce8,d4a4eec4,d4a4eec4) at netbsd:VOP_OPEN+0x2f
vn_open(ce3d9b68,801,42c,d4a4eec4,90e6887e) at netbsd:vn_open+0x266
sys_open(d4a4eec4,ce3d9c00,ce3d9c68,d4a4eec4,106) at netbsd:sys_open+0xa0
linux_sys_open(d4a4eec4,ce3d9c48,ce3d9c68,8563f70,8198000) at netbsd:linux_sys_open+0x70
linux_syscall_plain(ce3d9c88,2b,bfbf002b,bbbf002b,bfbf002b) at netbsd:linux_syscall_plain+0xa8
db>
From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@NetBSD.org, gnats-admin@NetBSD.org,
netbsd-bugs@NetBSD.org, tjd-nb-pr@menelos.com
Subject: Re: kern/35071: panic: mpt_get_request: corrupted request free list (xfer)
Date: Sat, 2 Dec 2006 19:55:01 +0100
On Fri, Dec 01, 2006 at 03:20:02PM +0000, Tracy Di Marco White wrote:
> The following reply was made to PR kern/35071; it has been noted by GNATS.
>
> From: Tracy Di Marco White <tjd-nb-pr@menelos.com>
> To: gnats-bugs@NetBSD.org
> Cc:
> Subject: Re: kern/35071: panic: mpt_get_request: corrupted request free list (xfer)
> Date: Fri, 01 Dec 2006 09:15:43 -0600
>
> I seem to be getting this every day, or every other day. So, more messages.
>
> st2: already open
> st0(mpt3:0:1:0): command timeout
> mpt3: timeout on request index = 0xfe, seq = 0x0133d791
> mpt3: Status 0x00000000, Mask 0x00000001, Doorbell 0x24000000
> mpt3: request state: On Chip
> mpt3: mpt_done: no scsipi_xfer, index = 0xfe, seq = 0x00000000
> mpt3: request state: Free
OK, the command resets, and later the chip says it's complete while
we've already freed it. I think we should just issue a bus reset
(or bus_device_reset but it's harder to do) in case of timeout, and
let the controller complete the commands.
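The sequence described above can be sketched as a tiny state model. It is a simplified illustration, not driver code: a single request slot, two alternative timeout policies, and a completion handler that counts stale completions. All names are hypothetical, and the second policy assumes the controller really does complete or abort the command after the reset, which the sketch does not verify.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum req_state { REQ_FREE, REQ_ON_CHIP };

struct ctrl {
	enum req_state state;	/* who owns the single request slot */
	void *xfer;		/* transfer attached to the request */
	bool reset_pending;	/* a bus reset has been asked for */
	int stale_completions;	/* completions that found no transfer */
};

/* Hand the request to the (simulated) controller. */
static void
submit(struct ctrl *c, void *xfer)
{
	c->state = REQ_ON_CHIP;
	c->xfer = xfer;
}

/* Policy A: on timeout, free the request at once (the behaviour being replaced). */
static void
timeout_free_now(struct ctrl *c)
{
	c->xfer = NULL;
	c->state = REQ_FREE;
}

/* Policy B: on timeout, leave the request with the controller and ask for a reset. */
static void
timeout_request_reset(struct ctrl *c)
{
	c->reset_pending = true;
}

/* Completion from the controller; returns false if it was stale. */
static bool
done(struct ctrl *c)
{
	if (c->state != REQ_ON_CHIP || c->xfer == NULL) {
		c->stale_completions++;	/* "no scsipi_xfer ... state: Free" */
		return false;
	}
	c->xfer = NULL;
	c->state = REQ_FREE;
	return true;
}
```

With policy A a late completion arrives for a request that is already Free and has no transfer, which is the situation the console messages show. With policy B the request is freed exactly once, by the completion handler, provided the controller eventually delivers that completion.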
Attached is a patch that attempts to implement a bus_reset function for
mpt(4). You can easily test by starting some I/O (e.g. dd if=/dev/rsdxd
of=/dev/null bs=1m) and, while it's running, issuing scsictl scsibusx reset
several times.
I expect to see "IOC Bus Reset Port %d" or "External Bus Reset" on the console.
--
Manuel Bouyer <bouyer@antioche.eu.org>
NetBSD: 26 ans d'experience feront toujours la difference
--
--7AUc2qLy4jB3hD7Z
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename=diff
Index: mpt_netbsd.c
===================================================================
RCS file: /cvsroot/src/sys/dev/ic/mpt_netbsd.c,v
retrieving revision 1.10
diff -u -r1.10 mpt_netbsd.c
--- mpt_netbsd.c 11 Dec 2005 12:21:28 -0000 1.10
+++ mpt_netbsd.c 2 Dec 2006 18:50:50 -0000
@@ -78,6 +78,7 @@
__KERNEL_RCSID(0, "$NetBSD: mpt_netbsd.c,v 1.10 2005/12/11 12:21:28 christos Exp $");
#include <dev/ic/mpt.h> /* pulls in all headers */
+#include <sys/scsiio.h>
#include <machine/stdarg.h> /* for mpt_prt() */
@@ -89,10 +90,13 @@
static void mpt_get_xfer_mode(mpt_softc_t *, struct scsipi_periph *);
static void mpt_ctlop(mpt_softc_t *, void *vmsg, uint32_t);
static void mpt_event_notify_reply(mpt_softc_t *, MSG_EVENT_NOTIFY_REPLY *);
+static void mpt_bus_reset(mpt_softc_t *);
static void mpt_scsipi_request(struct scsipi_channel *,
scsipi_adapter_req_t, void *);
static void mpt_minphys(struct buf *);
+static int mpt_ioctl(struct scsipi_channel *, u_long, caddr_t, int,
+ struct proc *);
void
mpt_scsipi_attach(mpt_softc_t *mpt)
@@ -110,10 +114,11 @@
memset(adapt, 0, sizeof(*adapt));
adapt->adapt_dev = &mpt->sc_dev;
adapt->adapt_nchannels = 1;
- adapt->adapt_openings = maxq;
- adapt->adapt_max_periph = maxq;
+ adapt->adapt_openings = maxq - 1; /* keep one for mngt reqs */
+ adapt->adapt_max_periph = maxq - 1;
adapt->adapt_request = mpt_scsipi_request;
adapt->adapt_minphys = mpt_minphys;
+ adapt->adapt_ioctl = mpt_ioctl;
/* Fill in the scsipi_channel. */
memset(chan, 0, sizeof(*chan));
@@ -382,14 +387,15 @@
mpt_prt(mpt, "request state: %s", mpt_req_state(req->debug));
if (mpt->verbose > 1)
mpt_print_scsi_io_request((MSG_SCSI_IO_REQUEST *)req->req_vbuf);
-
+#if 0
/* XXX WHAT IF THE IOC IS STILL USING IT?? */
req->xfer = NULL;
mpt_free_request(mpt, req);
xs->error = XS_TIMEOUT;
scsipi_done(xs);
-
+#endif
+ mpt_bus_reset(mpt);
splx(s);
}
@@ -461,6 +467,8 @@
if (__predict_false(mpt_req->Function == MPI_FUNCTION_SCSI_TASK_MGMT)) {
if (mpt->verbose > 1)
mpt_prt(mpt, "mpt_done: TASK MGMT");
+ KASSERT(req == mpt->mngt_req);
+ mpt->mngt_req = NULL;
goto done;
}
@@ -1280,7 +1288,43 @@
}
}
-/* XXXJRT mpt_bus_reset() */
+static void
+mpt_bus_reset(mpt_softc_t *mpt)
+{
+ request_t *req;
+ MSG_SCSI_TASK_MGMT *mngt_req;
+ int s;
+
+ s = splbio();
+ if (mpt->mngt_req) {
+ /* request already queued; can't do more */
+ splx(s);
+ return;
+ }
+ req = mpt_get_request(mpt);
+ if (__predict_false(req == NULL)) {
+ printf("%s: no mngt request\n", mpt->sc_dev.dv_xname);
+ splx(s);
+ return;
+ }
+ mpt->mngt_req = req;
+ splx(s);
+ mngt_req = req->req_vbuf;
+ memset(mngt_req, 0, sizeof(*mngt_req));
+ mngt_req->Function = MPI_FUNCTION_SCSI_TASK_MGMT;
+ mngt_req->Bus = mpt->bus;
+ mngt_req->TargetID = 0;
+ mngt_req->ChainOffset = 0;
+ mngt_req->TaskType = MPI_SCSITASKMGMT_TASKTYPE_RESET_BUS;
+ mngt_req->Reserved1 = 0;
+ mngt_req->MsgFlags =
+ mpt->is_fc ? MPI_SCSITASKMGMT_MSGFLAGS_LIP_RESET_OPTION : 0;
+ mngt_req->MsgContext = req->index;
+ mngt_req->TaskMsgContext = 0;
+ s = splbio();
+ mpt_send_cmd(mpt, req);
+ splx(s);
+}
/*****************************************************************************
* SCSI interface routines
@@ -1322,3 +1366,19 @@
bp->b_bcount = MPT_MAX_XFER;
minphys(bp);
}
+
+static int
+mpt_ioctl(struct scsipi_channel *chan, u_long cmd, caddr_t arg,
+ int flag, struct proc *p)
+{
+ struct scsipi_adapter *adapt = chan->chan_adapter;
+ mpt_softc_t *mpt = (void *) adapt->adapt_dev;
+
+ switch (cmd) {
+ case SCBUSIORESET:
+ mpt_bus_reset(mpt);
+ return(0);
+ default:
+ return (ENOTTY);
+ }
+}
Index: mpt_netbsd.h
===================================================================
RCS file: /cvsroot/src/sys/dev/ic/mpt_netbsd.h,v
retrieving revision 1.4
diff -u -r1.4 mpt_netbsd.h
--- mpt_netbsd.h 11 Dec 2005 12:21:28 -0000 1.4
+++ mpt_netbsd.h 2 Dec 2006 18:50:50 -0000
@@ -227,6 +227,7 @@
/* SCSIPI and software management */
request_t *request_pool;
SLIST_HEAD(req_queue, req_entry) request_free_list;
+ request_t *mngt_req;
struct scsipi_adapter sc_adapter;
struct scsipi_channel sc_channel;
--7AUc2qLy4jB3hD7Z--
From: Tracy Di Marco White <tjd-nb-pr@menelos.com>
To: Manuel Bouyer <bouyer@antioche.eu.org>
Cc: gnats-bugs@NetBSD.org, kern-bug-people@NetBSD.org,
gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org
Subject: Re: kern/35071: panic: mpt_get_request: corrupted request free list (xfer)
Date: Sun, 03 Dec 2006 04:34:18 -0600
In message <20061202185501.GA16429@antioche.eu.org>, Manuel Bouyer writes:
>OK, the command resets, and later the chip says it's complete while
>we've already freed it. I think we should just issue a bus reset
>(or bus_device_reset but it's harder to do) in case of timeout, and
>let the controller complete the commands.
>
>Attached is a patch that attempts to implement a bus_reset function for
>mpt(4). You can easily test by starting some I/O (e.g. dd if=/dev/rsdxd
>of=/dev/null bs=1m) and, while it's running, issuing scsictl scsibusx reset
>several times.
>
>I expect to see "IOC Bus Reset Port %d" or "External Bus Reset" on the console.
I occasionally get this:
probe(mpt2:0:0:0): command timeout
mpt2: timeout on request index = 0xfe, seq = 0x00000068
mpt2: Status 0x80000000, Mask 0x00000001, Doorbell 0x24000000
mpt2: request state: On Chip
over and over at boot, on different controllers.
With the patch, it seems to hang at this point instead of repeating.
When I get this I have to keep rebooting until it goes away anyway,
because whatever is on the complaining SCSI chain usually will not
be found.
-Tracy
From: Manuel Bouyer <bouyer@antioche.eu.org>
To: Tracy Di Marco White <tjd-nb-pr@menelos.com>
Cc: gnats-bugs@NetBSD.org, kern-bug-people@NetBSD.org,
gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org
Subject: Re: kern/35071: panic: mpt_get_request: corrupted request free list (xfer)
Date: Sun, 3 Dec 2006 12:07:34 +0100
On Sun, Dec 03, 2006 at 04:34:18AM -0600, Tracy Di Marco White wrote:
>
> In message <20061202185501.GA16429@antioche.eu.org>, Manuel Bouyer writes:
> >OK, the command resets, and later the chip says it's complete while
> >we've already freed it. I think we should just issue a bus reset
> >(or bus_device_reset but it's harder to do) in case of timeout, and
> >let the controller complete the commands.
> >
> >Attached is a patch that attempts to implement a bus_reset function for
> >mpt(4). You can easily test by starting some I/O (e.g. dd if=/dev/rsdxd
> >of=/dev/null bs=1m) and, while it's running, issuing scsictl scsibusx reset
> >several times.
> >
> >I expect to see "IOC Bus Reset Port %d" or "External Bus Reset" on the console.
>
> I occasionally get this:
> probe(mpt2:0:0:0): command timeout
> mpt2: timeout on request index = 0xfe, seq = 0x00000068
> mpt2: Status 0x80000000, Mask 0x00000001, Doorbell 0x24000000
> mpt2: request state: On Chip
>
> over and over at boot, on different controllers.
> With the patch, it seems to hang at this point instead of repeating.
> When I get this I have to keep rebooting until it goes away anyway,
> because whatever is on the complaining SCSI chain usually will not
> be found.
So when we issue a bus reset the IOC doesn't abort pending commands that
it has in its queue. It's hard to understand how such a rarely-used feature
works by reverse-engineering other drivers; I'm not even sure it works
properly in other drivers ...
--
Manuel Bouyer <bouyer@antioche.eu.org>
NetBSD: 26 ans d'experience feront toujours la difference
--