NetBSD Problem Report #35071

From gendalia@menelos.com  Sat Nov 18 10:42:43 2006
Return-Path: <gendalia@menelos.com>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
	by narn.NetBSD.org (Postfix) with ESMTP id 8928E63BAFE
	for <gnats-bugs@gnats.NetBSD.org>; Sat, 18 Nov 2006 10:42:43 +0000 (UTC)
Message-Id: <E1GlNej-0003o4-Jr@mail.menelos.com>
Date: Sat, 18 Nov 2006 04:42:41 -0600
From: tjd-nb-pr@menelos.com
Reply-To: tjd-nb-pr@menelos.com
To: gnats-bugs@NetBSD.org
Subject: panic: mpt_get_request: corrupted request free list (xfer)
X-Send-Pr-Version: 3.95

>Number:         35071
>Category:       kern
>Synopsis:       panic: mpt_get_request: corrupted request free list (xfer)
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Nov 18 10:45:00 +0000 2006
>Last-Modified:  Sun Dec 03 11:10:01 +0000 2006
>Originator:     Tracy Di Marco White
>Release:        NetBSD 4.0_BETA
>Organization:
Iowa State University
>Environment:
System: NetBSD blackhole.ait.iastate.edu 4.0_BETA NetBSD 4.0_BETA (GENERIC) #0: Mon Sep 11 09:48:17 CDT 2006 root@blackhole.ait.iastate.edu:/usr/obj/usr/src/sys/arch/i386/compile/GENERIC i386
Architecture: i386
Machine: i386
>Description:
panic: mpt_get_request: corrupted request free list (xfer)

Stopped in pid 21393.1 (tibsmnt) at     netbsd:cpu_Debugger+0x4:        leave
db>  t
cpu_Debugger(dcb8fa48,10002,0,cf2c2828,c4bc9400) at netbsd:cpu_Debugger+0x4
panic(c08317f8,6,da8b96a4,0,1) at netbsd:panic+0x141
mpt_get_request(c4bc9400,10002,cf2c286c,c042a7f0,dcb8f9d8) at netbsd:mpt_get_request+0x5b
mpt_scsipi_request(c4bc96dc,0,c4298f44,0,cf2c2900) at netbsd:mpt_scsipi_request+0x4d
scsipi_run_queue(da8b96a4,0,c4d65100,c4bc96dc,0) at netbsd:scsipi_run_queue+0x184
scsipi_execute_xs(c4298f44,cf2c2982,6,0,0) at netbsd:scsipi_execute_xs+0x17e
scsipi_test_unit_ready(c4d65100,a0,0,dcb8f9d8,dcb8f9d8) at netbsd:scsipi_test_unit_ready+0x4d
stopen(e11,801,2000,dd9ff4ec,dcb8f9d8) at netbsd:stopen+0xd6
spec_open(cf2c2a78,cf31cb58,4d2,0,4d2) at netbsd:spec_open+0x1df
VOP_OPEN(dcb8f9d8,801,cf31cb58,dd9ff4ec,dd9ff4ec) at netbsd:VOP_OPEN+0x2f
vn_open(cf2c2b68,801,d60,dd9ff4ec,c0886444) at netbsd:vn_open+0x266
sys_open(dd9ff4ec,cf2c2c00,cf2c2c68,bba50000,23) at netbsd:sys_open+0xa0
linux_sys_open(dd9ff4ec,cf2c2c48,cf2c2c68,8563f70,8050000) at netbsd:linux_sys_open+0x70
linux_syscall_plain(cf2c2c88,bba7002b,bba7002b,bfbf002b,bbbf002b) at netbsd:linux_syscall_plain+0xa8

Same machine and version as in PR #34892
http://gendalia.public.iastate.edu/blackhole.dmesg.txt

>How-To-Repeat:
Don't think I've repeated this panic on this version of NetBSD.
>Fix:

>Audit-Trail:
From: Tracy Di Marco White <tjd-nb-pr@menelos.com>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/35071: panic: mpt_get_request: corrupted request free list (xfer) 
Date: Sat, 18 Nov 2006 04:51:40 -0600

 I meant to include the console message prior to the panic, and show
 uvm after.
 mpt3: mpt_done: no scsipi_xfer, index = 0xfd, seq = 0x00000000
 mpt3: request state: Free
 mpt3: mpt_request:
 SCSI IO Request @ 0xcf5cd64c
         Chain Offset  0x10
         MsgFlags      0x00
         MsgContext    0x000000fd
         Bus:                0
         TargetID            1
         SenseBufferLength   32
         LUN:              0x0
         Control           0x01000500  WRITE  UNTAGGED 
         DataLength      0x00010000
         SenseBufAddr    0x04798be0
         CDB[0:6]        0a 00 01 00 00 00 
         SE32 0xce558a30: Addr=0xb3917008 FlagsLength=0x14000ff8
          HOST_TO_IOC
         SE32 0xce558a38: Addr=0x27318000 FlagsLength=0x94001000
          HOST_TO_IOC LAST_ELEMENT
         CE32 0xce558a40: Addr=0x4798a48 NxtChnO=0x16 Flgs=0x30 Len=0x60
         SE32 0xce558a48: Addr=0x5f279000 FlagsLength=0x14001000
          HOST_TO_IOC
         SE32 0xce558a50: Addr=0x95a3a000 FlagsLength=0x14001000
          HOST_TO_IOC
         SE32 0xce558a58: Addr=0x6653b000 FlagsLength=0x14001000
          HOST_TO_IOC
         SE32 0xce558a60: Addr=0x7ef9c000 FlagsLength=0x14001000
          HOST_TO_IOC
         SE32 0xce558a68: Addr=0x9705d000 FlagsLength=0x14001000
          HOST_TO_IOC
         SE32 0xce558a70: Addr=0x4263e000 FlagsLength=0x14001000
          HOST_TO_IOC
         SE32 0xce558a78: Addr=0x3617f000 FlagsLength=0x14001000
          HOST_TO_IOC
         SE32 0xce558a80: Addr=0x6e7c0000 FlagsLength=0x14001000
          HOST_TO_IOC
         SE32 0xce558a88: Addr=0x80241000 FlagsLength=0x14001000
          HOST_TO_IOC
         SE32 0xce558a90: Addr=0x67be2000 FlagsLength=0x14001000
          HOST_TO_IOC
         SE32 0xce558a98: Addr=0x82ea3000 FlagsLength=0x94001000
          HOST_TO_IOC LAST_ELEMENT
         CE32 0xce558aa0: Addr=0x4798aa8 NxtChnO=0x0 Flgs=0x30 Len=0x20
         SE32 0xce558aa8: Addr=0x28ac4000 FlagsLength=0x14001000
          HOST_TO_IOC
         SE32 0xce558ab0: Addr=0x42325000 FlagsLength=0x14001000
          HOST_TO_IOC
         SE32 0xce558ab8: Addr=0xca566000 FlagsLength=0x14001000
          HOST_TO_IOC
         SE32 0xce558ac0: Addr=0xac0a7000 FlagsLength=0xd5000008
          HOST_TO_IOC LAST_ELEMENT END_OF_BUFFER END_OF_LIST
 mpt3: mpt_reply:
 SCSI IO Request Reply @ 0xce38f380
         IOC Status    Success
         IOCLogInfo    0x00000000
         MsgLength     0x08
         MsgFlags      0x00
         MsgContext    0x000000fd
         Bus:          0
         TargetID      2
         CDBLength     6
         SCSI Status:  Check Condition
         SCSI State:   (0x00000001)AutoSense_Valid 
         TransferCnt   0x0000
         SenseCnt      0x001c
         ResponseInfo  0x00000000
 panic: mpt_get_request: corrupted request free list (xfer)

 Stopped in pid 21393.1 (tibsmnt) at     netbsd:cpu_Debugger+0x4:        leave
 db>  t
 cpu_Debugger(dcb8fa48,10002,0,cf2c2828,c4bc9400) at netbsd:cpu_Debugger+0x4
 panic(c08317f8,6,da8b96a4,0,1) at netbsd:panic+0x141
 mpt_get_request(c4bc9400,10002,cf2c286c,c042a7f0,dcb8f9d8) at netbsd:mpt_get_request+0x5b
 mpt_scsipi_request(c4bc96dc,0,c4298f44,0,cf2c2900) at netbsd:mpt_scsipi_request+0x4d
 scsipi_run_queue(da8b96a4,0,c4d65100,c4bc96dc,0) at netbsd:scsipi_run_queue+0x184
 scsipi_execute_xs(c4298f44,cf2c2982,6,0,0) at netbsd:scsipi_execute_xs+0x17e
 scsipi_test_unit_ready(c4d65100,a0,0,dcb8f9d8,dcb8f9d8) at netbsd:scsipi_test_unit_ready+0x4d
 stopen(e11,801,2000,dd9ff4ec,dcb8f9d8) at netbsd:stopen+0xd6
 spec_open(cf2c2a78,cf31cb58,4d2,0,4d2) at netbsd:spec_open+0x1df
 VOP_OPEN(dcb8f9d8,801,cf31cb58,dd9ff4ec,dd9ff4ec) at netbsd:VOP_OPEN+0x2f
 vn_open(cf2c2b68,801,d60,dd9ff4ec,c0886444) at netbsd:vn_open+0x266
 sys_open(dd9ff4ec,cf2c2c00,cf2c2c68,bba50000,23) at netbsd:sys_open+0xa0
 linux_sys_open(dd9ff4ec,cf2c2c48,cf2c2c68,8563f70,8050000) at netbsd:linux_sys_open+0x70
 linux_syscall_plain(cf2c2c88,bba7002b,bba7002b,bfbf002b,bbbf002b) at netbsd:linux_syscall_plain+0xa8
 db> show uvm
 Current UVM status:
   pagesize=4096 (0x1000), pagemask=0xfff, pageshift=12
   900981 VM pages: 555299 active, 272631 inactive, 1250 wired, 2823 free
   min  10% (25) anon, 30% (76) file, 5% (12) exec
   max  15% (38) anon, 80% (204) file, 30% (76) exec
   pages  117901 anon, 707980 file, 3627 exec
   freemin=64, free-target=85, inactive-target=272631, wired-max=300327
   faults=-2134302039, traps=1541685497, intrs=186871550, ctxswitch=305568730
   softint=115212963, syscalls=1890254534, swapins=25698, swapouts=25721
   fault counts:
     noram=2470, noanon=0, pgwait=21, pgrele=0
     ok relocks(total)=386581(386609), anget(retrys)=879286551(359538), amapcopy=39404854
     neighbor anon/obj pg=35382378/392159854, gets(lock/unlock)=114677177/27055
     cases: anon=859097219, anoncow=13780464, obj=96111232, prcopy=18565921, przero=461640788
   daemon and swap counts:
     woke=1563139, revs=1559493, scans=512965763, obscans=484154054, anscans=600471
     busy=74778, freed=484754525, reactivate=5476456, deactivate=520377365
     pageouts=49971, pending=522177, nswget=359620
     nswapdev=1, swpgavail=264554
     swpages=264554, swpginuse=155842, swpgonly=129083, paging=0
 db> 
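
The SE32 lines in the request dump above pack flags and a byte count into one 32-bit FlagsLength word. A minimal sketch of how they can be decoded, assuming the top 8 bits are flags and the low 24 bits are the length; the macro names and flag bit values are assumptions chosen to match the dump's own annotations, not names taken from the driver. Summing the lengths of the 17 simple elements of the WRITE request gives the DataLength the dump reports.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: flags in bits 31..24, length in bits 23..0. */
#define SGE_LENGTH_MASK   0x00FFFFFFu
#define SGE_FLAGS_SHIFT   24
#define SGE_LAST_ELEMENT  0x80u
#define SGE_END_OF_BUFFER 0x40u
#define SGE_SIMPLE        0x10u
#define SGE_HOST_TO_IOC   0x04u
#define SGE_END_OF_LIST   0x01u

/* Byte count carried by one FlagsLength word. */
uint32_t sge_length(uint32_t fl) { return fl & SGE_LENGTH_MASK; }

/* Flag byte carried by one FlagsLength word. */
uint32_t sge_flags(uint32_t fl) { return fl >> SGE_FLAGS_SHIFT; }

/* FlagsLength values of the SE32 entries in the WRITE request dump. */
const uint32_t write_dump[] = {
    0x14000ff8u, 0x94001000u,                           /* in the request frame */
    0x14001000u, 0x14001000u, 0x14001000u, 0x14001000u, /* first chain segment */
    0x14001000u, 0x14001000u, 0x14001000u, 0x14001000u,
    0x14001000u, 0x14001000u, 0x94001000u,
    0x14001000u, 0x14001000u, 0x14001000u, 0xd5000008u  /* second chain segment */
};
#define WRITE_DUMP_LEN (sizeof(write_dump) / sizeof(write_dump[0]))

/* Sum the byte counts of n FlagsLength words. */
uint32_t sge_total_length(const uint32_t *fl, size_t n)
{
    uint32_t total = 0;

    assert(fl != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        total += sge_length(fl[i]);
    return total;
}
```

Calling sge_total_length(write_dump, WRITE_DUMP_LEN) yields 0x10000, matching DataLength 0x00010000 in the dump, so the scatter/gather list itself looks self-consistent.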

From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@NetBSD.org, gnats-admin@NetBSD.org,
	netbsd-bugs@NetBSD.org, tjd-nb-pr@menelos.com
Subject: Re: kern/35071: panic: mpt_get_request: corrupted request free list (xfer)
Date: Sat, 18 Nov 2006 15:39:10 +0100

 On Sat, Nov 18, 2006 at 10:55:02AM +0000, Tracy Di Marco White wrote:
 > The following reply was made to PR kern/35071; it has been noted by GNATS.
 > 
 > From: Tracy Di Marco White <tjd-nb-pr@menelos.com>
 > To: gnats-bugs@NetBSD.org
 > Cc: 
 > Subject: Re: kern/35071: panic: mpt_get_request: corrupted request free list (xfer) 
 > Date: Sat, 18 Nov 2006 04:51:40 -0600
 > 
 >  I meant to include the console message prior to the panic, and show
 >  uvm after.
 >  mpt3: mpt_done: no scsipi_xfer, index = 0xfd, seq = 0x00000000
 >  mpt3: request state: Free

 I've seen something similar on a netbsd-3 host. I think the problem started
 with:
 sd1(mpt0:0:1:0): command timeout
 mpt0: timeout on request index = 0xfb, seq = 0x0361bdae
 mpt0: Status 0x00000000, Mask 0x00000001, Doorbell 0x24000000
 mpt0: request state: On Chip

 So maybe it's the timeout handling code which corrupts the list.
 But I didn't look at the code at all.

 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 ans d'experience feront toujours la difference
 --

From: Tracy Di Marco White <tjd-nb-pr@menelos.com>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/35071: panic: mpt_get_request: corrupted request free list (xfer) 
Date: Fri, 01 Dec 2006 09:15:43 -0600

 I seem to be getting this every day, or every other day.  So, more messages.

 st2: already open
 st0(mpt3:0:1:0): command timeout
 mpt3: timeout on request index = 0xfe, seq = 0x0133d791
 mpt3: Status 0x00000000, Mask 0x00000001, Doorbell 0x24000000
 mpt3: request state: On Chip
 mpt3: mpt_done: no scsipi_xfer, index = 0xfe, seq = 0x00000000
 mpt3: request state: Free
 mpt3: mpt_request:
 SCSI IO Request @ 0xc09d2aa8
         Chain Offset  0x00
         MsgFlags      0x00
         MsgContext    0x000000fe
         Bus:                0
         TargetID            2
         SenseBufferLength   32
         LUN:              0x0
         Control           0x00000500  NODATATRANSFER  UNTAGGED 
         DataLength      0x00000000
         SenseBufAddr    0x04798de0
         CDB[0:6]        1e 00 00 00 00 00 
         SE32 0xce558c30: Addr=0x0 FlagsLength=0xd1000000
          LAST_ELEMENT END_OF_BUFFER END_OF_LIST
 mpt3: mpt_reply:
 SCSI IO Request Reply @ 0xce38f480
         IOC Status    Success
         IOCLogInfo    0x00000000
         MsgLength     0x08
         MsgFlags      0x00
         MsgContext    0x000000fe
         Bus:          0
         TargetID      1
         CDBLength     6
         SCSI Status:  Busy
         SCSI State:   (0x00000000)
         TransferCnt   0x0000
         SenseCnt      0x0000
         ResponseInfo  0x00000000



 panic: mpt_get_request: corrupted request free list (xfer)

 Stopped in pid 8333.1 (tibsmnt) at      netbsd:cpu_Debugger+0x4:        leave
 db> t
 cpu_Debugger(cf59cd90,10002,0,ce3d9828,c4bc9400) at netbsd:cpu_Debugger+0x4
 panic(c08317f8,6,ce42fa80,0,1) at netbsd:panic+0x141
 mpt_get_request(c4bc9400,10002,ce3d986c,c042a7f0,cf59cd20) at netbsd:mpt_get_request+0x5b
 mpt_scsipi_request(c4bc96dc,0,c429ff0c,0,ce3d9900) at netbsd:mpt_scsipi_request+0x4d
 scsipi_run_queue(ce42fa80,0,c4d65100,c4bc96dc,0) at netbsd:scsipi_run_queue+0x184
 scsipi_execute_xs(c429ff0c,ce3d9982,6,0,0) at netbsd:scsipi_execute_xs+0x17e
 scsipi_test_unit_ready(c4d65100,0,0,cf59cd20,cf59cd20) at netbsd:scsipi_test_unit_ready+0x4d
 stopen(e11,801,2000,d4a4eec4,cf59cd20) at netbsd:stopen+0xd6
 spec_open(ce3d9a78,2,7432d,202,c087934c) at netbsd:spec_open+0x1df
 VOP_OPEN(cf59cd20,801,cf806ce8,d4a4eec4,d4a4eec4) at netbsd:VOP_OPEN+0x2f
 vn_open(ce3d9b68,801,42c,d4a4eec4,90e6887e) at netbsd:vn_open+0x266
 sys_open(d4a4eec4,ce3d9c00,ce3d9c68,d4a4eec4,106) at netbsd:sys_open+0xa0
 linux_sys_open(d4a4eec4,ce3d9c48,ce3d9c68,8563f70,8198000) at netbsd:linux_sys_open+0x70
 linux_syscall_plain(ce3d9c88,2b,bfbf002b,bbbf002b,bfbf002b) at netbsd:linux_syscall_plain+0xa8
 db> 

From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@NetBSD.org, gnats-admin@NetBSD.org,
	netbsd-bugs@NetBSD.org, tjd-nb-pr@menelos.com
Subject: Re: kern/35071: panic: mpt_get_request: corrupted request free list (xfer)
Date: Sat, 2 Dec 2006 19:55:01 +0100

 --7AUc2qLy4jB3hD7Z
 Content-Type: text/plain; charset=us-ascii
 Content-Disposition: inline

 On Fri, Dec 01, 2006 at 03:20:02PM +0000, Tracy Di Marco White wrote:
 > The following reply was made to PR kern/35071; it has been noted by GNATS.
 > 
 > From: Tracy Di Marco White <tjd-nb-pr@menelos.com>
 > To: gnats-bugs@NetBSD.org
 > Cc: 
 > Subject: Re: kern/35071: panic: mpt_get_request: corrupted request free list (xfer) 
 > Date: Fri, 01 Dec 2006 09:15:43 -0600
 > 
 >  I seem to be getting this every day, or every other day.  So, more messages.
 >  
 >  st2: already open
 >  st0(mpt3:0:1:0): command timeout
 >  mpt3: timeout on request index = 0xfe, seq = 0x0133d791
 >  mpt3: Status 0x00000000, Mask 0x00000001, Doorbell 0x24000000
 >  mpt3: request state: On Chip
 >  mpt3: mpt_done: no scsipi_xfer, index = 0xfe, seq = 0x00000000
 >  mpt3: request state: Free

 OK, the command times out, and later the chip says it's complete while
 we've already freed it. I think we should just issue a bus reset
 (or bus_device_reset but it's harder to do) in case of timeout, and
 let the controller complete the commands.
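
A minimal user-space sketch of one way the failure described above could corrupt a free list, assuming a singly linked free list and a free routine that does not check whether the request is already free. All names here (struct pool, pool_get, pool_free, demo_double_free) are hypothetical and not taken from mpt(4); whether the real driver follows exactly this path is not established in this report.

```c
#include <assert.h>
#include <stddef.h>

#define NREQ 4

struct request {
    int index;              /* position in the pool */
    void *xfer;             /* non-NULL while a transfer owns the request */
    struct request *next;   /* free-list link */
};

struct pool {
    struct request reqs[NREQ];
    struct request *free_head;
};

void pool_init(struct pool *p)
{
    p->free_head = NULL;
    for (int i = NREQ - 1; i >= 0; i--) {
        p->reqs[i].index = i;
        p->reqs[i].xfer = NULL;
        p->reqs[i].next = p->free_head;
        p->free_head = &p->reqs[i];
    }
}

/* Push a request on the free list; does not notice a double free. */
void pool_free(struct pool *p, struct request *r)
{
    r->xfer = NULL;
    r->next = p->free_head;
    p->free_head = r;
}

/* Pop a request. Sets *corrupt and returns NULL if the head is still owned. */
struct request *pool_get(struct pool *p, int *corrupt)
{
    struct request *r = p->free_head;

    *corrupt = 0;
    if (r == NULL)
        return NULL;
    if (r->xfer != NULL) {
        *corrupt = 1;       /* the condition the "(xfer)" panic reports */
        return NULL;
    }
    p->free_head = r->next;
    return r;
}

/* Returns 1 if the double free is detected on a later allocation. */
int demo_double_free(void)
{
    struct pool p;
    int corrupt, dummy_xfer = 0;
    struct request *r, *again;

    pool_init(&p);
    r = pool_get(&p, &corrupt);
    assert(r != NULL && !corrupt);
    r->xfer = &dummy_xfer;

    pool_free(&p, r);       /* timeout handler gives up on the request */
    pool_free(&p, r);       /* late completion frees it again: r->next == r */

    again = pool_get(&p, &corrupt); /* returns r, but the head stays r */
    assert(again == r);
    again->xfer = &dummy_xfer;      /* a new command takes ownership */

    (void)pool_get(&p, &corrupt);   /* head is r, which is already owned */
    return corrupt;
}
```

In this sketch the second free makes the request point at itself, so the list hands out the same request twice and the ownership check trips on the next allocation. Leaving the request with the controller until it completes, as proposed above, avoids the second free.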

 Attached is a patch that attempts to implement a bus_reset function for
 mpt(4). You can easily test by starting some I/O (e.g. dd if=/dev/rsdxd
 of=/dev/null bs=1m) and while it's running issue several scsictl scsibusx reset

 I expect to see "IOC Bus Reset Port %d" or "External Bus Reset" on console

 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 ans d'experience feront toujours la difference
 --

 --7AUc2qLy4jB3hD7Z
 Content-Type: text/plain; charset=us-ascii
 Content-Disposition: attachment; filename=diff

 Index: mpt_netbsd.c
 ===================================================================
 RCS file: /cvsroot/src/sys/dev/ic/mpt_netbsd.c,v
 retrieving revision 1.10
 diff -u -r1.10 mpt_netbsd.c
 --- mpt_netbsd.c	11 Dec 2005 12:21:28 -0000	1.10
 +++ mpt_netbsd.c	2 Dec 2006 18:50:50 -0000
 @@ -78,6 +78,7 @@
  __KERNEL_RCSID(0, "$NetBSD: mpt_netbsd.c,v 1.10 2005/12/11 12:21:28 christos Exp $");

  #include <dev/ic/mpt.h>			/* pulls in all headers */
 +#include <sys/scsiio.h>

  #include <machine/stdarg.h>		/* for mpt_prt() */

 @@ -89,10 +90,13 @@
  static void	mpt_get_xfer_mode(mpt_softc_t *, struct scsipi_periph *);
  static void	mpt_ctlop(mpt_softc_t *, void *vmsg, uint32_t);
  static void	mpt_event_notify_reply(mpt_softc_t *, MSG_EVENT_NOTIFY_REPLY *);
 +static void	mpt_bus_reset(mpt_softc_t *);

  static void	mpt_scsipi_request(struct scsipi_channel *,
  		    scsipi_adapter_req_t, void *);
  static void	mpt_minphys(struct buf *);
 +static int 	mpt_ioctl(struct scsipi_channel *, u_long, caddr_t, int,
 +			struct proc *);

  void
  mpt_scsipi_attach(mpt_softc_t *mpt)
 @@ -110,10 +114,11 @@
  	memset(adapt, 0, sizeof(*adapt));
  	adapt->adapt_dev = &mpt->sc_dev;
  	adapt->adapt_nchannels = 1;
 -	adapt->adapt_openings = maxq;
 -	adapt->adapt_max_periph = maxq;
 +	adapt->adapt_openings = maxq - 1; /* keep one for mngt reqs */
 +	adapt->adapt_max_periph = maxq - 1;
  	adapt->adapt_request = mpt_scsipi_request;
  	adapt->adapt_minphys = mpt_minphys;
 +	adapt->adapt_ioctl = mpt_ioctl;

  	/* Fill in the scsipi_channel. */
  	memset(chan, 0, sizeof(*chan));
 @@ -382,14 +387,15 @@
  	mpt_prt(mpt, "request state: %s", mpt_req_state(req->debug));
  	if (mpt->verbose > 1)
  		mpt_print_scsi_io_request((MSG_SCSI_IO_REQUEST *)req->req_vbuf);
 -
 +#if 0
  	/* XXX WHAT IF THE IOC IS STILL USING IT?? */
  	req->xfer = NULL;
  	mpt_free_request(mpt, req);

  	xs->error = XS_TIMEOUT;
  	scsipi_done(xs);
 -
 +#endif
 +	mpt_bus_reset(mpt);
  	splx(s);
  }

 @@ -461,6 +467,8 @@
  	if (__predict_false(mpt_req->Function == MPI_FUNCTION_SCSI_TASK_MGMT)) {
  		if (mpt->verbose > 1)
  			mpt_prt(mpt, "mpt_done: TASK MGMT");
 +		KASSERT(req == mpt->mngt_req);
 +		mpt->mngt_req = NULL;
  		goto done;
  	}

 @@ -1280,7 +1288,43 @@
  	}
  }

 -/* XXXJRT mpt_bus_reset() */
 +static void
 +mpt_bus_reset(mpt_softc_t *mpt)
 +{
 +	request_t *req;
 +	MSG_SCSI_TASK_MGMT *mngt_req;
 +	int s;
 +
 +	s = splbio();
 +	if (mpt->mngt_req) {
 +		/* request already queued; can't do more */
 +		splx(s);
 +		return;
 +	}
 +	req = mpt_get_request(mpt);
 +	if (__predict_false(req == NULL)) {
 +		printf("%s: no mngt request\n", mpt->sc_dev.dv_xname);
 +		splx(s);
 +		return;
 +	}
 +	mpt->mngt_req = req;
 +	splx(s);
 +	mngt_req = req->req_vbuf;
 +	memset(mngt_req, 0, sizeof(*mngt_req));
 +	mngt_req->Function = MPI_FUNCTION_SCSI_TASK_MGMT;
 +	mngt_req->Bus = mpt->bus;
 +	mngt_req->TargetID = 0;
 +	mngt_req->ChainOffset = 0;
 +	mngt_req->TaskType = MPI_SCSITASKMGMT_TASKTYPE_RESET_BUS;
 +	mngt_req->Reserved1 = 0;
 +	mngt_req->MsgFlags =
 +	    mpt->is_fc ? MPI_SCSITASKMGMT_MSGFLAGS_LIP_RESET_OPTION : 0;
 +	mngt_req->MsgContext = req->index;
 +	mngt_req->TaskMsgContext = 0;
 +	s = splbio();
 +	mpt_send_cmd(mpt, req);
 +	splx(s);
 +}

  /*****************************************************************************
   * SCSI interface routines
 @@ -1322,3 +1366,19 @@
  		bp->b_bcount = MPT_MAX_XFER;
  	minphys(bp);
  }
 +
 +static int
 +mpt_ioctl(struct scsipi_channel *chan, u_long cmd, caddr_t arg,
 +    int flag, struct proc *p)
 +{
 +	struct scsipi_adapter *adapt = chan->chan_adapter;
 +	mpt_softc_t *mpt = (void *) adapt->adapt_dev;
 +
 +	switch (cmd) {
 +	case SCBUSIORESET:
 +		mpt_bus_reset(mpt);
 +		return(0);
 +	default:
 +		return (ENOTTY);
 +	}
 +}
 Index: mpt_netbsd.h
 ===================================================================
 RCS file: /cvsroot/src/sys/dev/ic/mpt_netbsd.h,v
 retrieving revision 1.4
 diff -u -r1.4 mpt_netbsd.h
 --- mpt_netbsd.h	11 Dec 2005 12:21:28 -0000	1.4
 +++ mpt_netbsd.h	2 Dec 2006 18:50:50 -0000
 @@ -227,6 +227,7 @@
  	/* SCSIPI and software management */
  	request_t		*request_pool;
  	SLIST_HEAD(req_queue, req_entry) request_free_list;
 +	request_t		*mngt_req;

  	struct scsipi_adapter	sc_adapter;
  	struct scsipi_channel	sc_channel;

 --7AUc2qLy4jB3hD7Z--
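
The patch above allows only one task-management request in flight at a time, tracked through mngt_req, and lowers the adapter openings by one so a request stays available for it. A minimal user-space sketch of that guard, with hypothetical names (struct ctrl, ctrl_bus_reset, ctrl_reset_done) and the assumption that the request goes back to the pool when the reset completes:

```c
#include <assert.h>

struct ctrl {
    int free_requests;      /* requests left in the pool */
    int mgmt_in_flight;     /* 1 while a reset request is outstanding */
};

enum reset_result {
    RESET_SENT,
    RESET_ALREADY_PENDING,
    RESET_NO_REQUEST
};

/* Try to start a bus reset; refuse if one is already outstanding. */
enum reset_result ctrl_bus_reset(struct ctrl *c)
{
    if (c->mgmt_in_flight)
        return RESET_ALREADY_PENDING;
    if (c->free_requests == 0)
        return RESET_NO_REQUEST;
    c->free_requests--;
    c->mgmt_in_flight = 1;
    return RESET_SENT;
}

/* Completion path: release the slot so a later reset can be issued. */
void ctrl_reset_done(struct ctrl *c)
{
    assert(c->mgmt_in_flight);
    c->mgmt_in_flight = 0;
    c->free_requests++;     /* assumed: the request returns to the pool */
}
```

With one free request, a first call to ctrl_bus_reset() sends the reset, a second call before completion is refused, and a call after ctrl_reset_done() succeeds again.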

From: Tracy Di Marco White <tjd-nb-pr@menelos.com>
To: Manuel Bouyer <bouyer@antioche.eu.org>
Cc: gnats-bugs@NetBSD.org, kern-bug-people@NetBSD.org,
	gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org
Subject: Re: kern/35071: panic: mpt_get_request: corrupted request free list (xfer) 
Date: Sun, 03 Dec 2006 04:34:18 -0600

 In message <20061202185501.GA16429@antioche.eu.org>, Manuel Bouyer writes:
 >OK, the command times out, and later the chip says it's complete while
 >we've already freed it. I think we should just issue a bus reset
 >(or bus_device_reset but it's harder to do) in case of timeout, and
 >let the controller complete the commands.
 >
 >Attached is a patch that attempts to implement a bus_reset function for
 >mpt(4). You can easily test by starting some I/O (e.g. dd if=/dev/rsdxd
 >of=/dev/null bs=1m) and while it's running issue several scsictl scsibusx reset
 >
 >I expect to see "IOC Bus Reset Port %d" or "External Bus Reset" on console

 I occasionally get this:
 probe(mpt2:0:0:0): command timeout
 mpt2: timeout on request index = 0xfe, seq = 0x00000068
 mpt2: Status 0x80000000, Mask 0x00000001, Doorbell 0x24000000
 mpt2: request state: On Chip

 over and over at boot, on different controllers.
 Now it seems to hang here instead of repeating.
 When I get this I need to reboot anyway until I don't get it,
 as usually whatever is on the scsi chain complaining will not
 be found.

 -Tracy

From: Manuel Bouyer <bouyer@antioche.eu.org>
To: Tracy Di Marco White <tjd-nb-pr@menelos.com>
Cc: gnats-bugs@NetBSD.org, kern-bug-people@NetBSD.org,
	gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org
Subject: Re: kern/35071: panic: mpt_get_request: corrupted request free list (xfer)
Date: Sun, 3 Dec 2006 12:07:34 +0100

 On Sun, Dec 03, 2006 at 04:34:18AM -0600, Tracy Di Marco White wrote:
 > 
 > In message <20061202185501.GA16429@antioche.eu.org>, Manuel Bouyer writes:
 > >OK, the command times out, and later the chip says it's complete while
 > >we've already freed it. I think we should just issue a bus reset
 > >(or bus_device_reset but it's harder to do) in case of timeout, and
 > >let the controller complete the commands.
 > >
 > >Attached is a patch that attempts to implement a bus_reset function for
 > >mpt(4). You can easily test by starting some I/O (e.g. dd if=/dev/rsdxd
 > >of=/dev/null bs=1m) and while it's running issue several scsictl scsibusx reset
 > >
 > >I expect to see "IOC Bus Reset Port %d" or "External Bus Reset" on console
 > 
 > I occasionally get this:
 > probe(mpt2:0:0:0): command timeout
 > mpt2: timeout on request index = 0xfe, seq = 0x00000068
 > mpt2: Status 0x80000000, Mask 0x00000001, Doorbell 0x24000000
 > mpt2: request state: On Chip
 > 
 > over and over at boot, on different controllers.
 > Now it seems to hang here instead of repeating.
 > When I get this I need to reboot anyway until I don't get it,
 > as usually whatever is on the scsi chain complaining will not
 > be found.

 So when we issue a bus reset the IOC doesn't abort pending commands that
 it has in its queue. It's hard to understand how such a rarely-used feature
 works by reverse-engineering other drivers; I'm not even sure it works
 properly in other drivers ...

 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 ans d'experience feront toujours la difference
 --
