NetBSD Problem Report #57133

From www@netbsd.org  Fri Dec 23 16:19:04 2022
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id B6A591A921F
	for <gnats-bugs@gnats.NetBSD.org>; Fri, 23 Dec 2022 16:19:04 +0000 (UTC)
Message-Id: <20221223161903.5B6E41A9239@mollari.NetBSD.org>
Date: Fri, 23 Dec 2022 16:19:03 +0000 (UTC)
From: abs@absd.org
Reply-To: abs@absd.org
To: gnats-bugs@NetBSD.org
Subject: xs->resid == xs->datalen panic in mpii
X-Send-Pr-Version: www-1.0

>Number:         57133
>Category:       kern
>Synopsis:       xs->resid == xs->datalen panic in mpii
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    kern-bug-people
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri Dec 23 16:20:02 +0000 2022
>Closed-Date:    Thu Oct 26 19:08:57 +0000 2023
>Last-Modified:  Thu Oct 26 19:08:57 +0000 2023
>Originator:     David Brownlee
>Release:        10.0_BETA
>Organization:
>Environment:
NetBSD iris.absd.org 10.0_BETA NetBSD 10.0_BETA (GENERIC) #0: Fri Dec 16 19:12:49 UTC 2022  mkrepro@mkrepro.NetBSD.org:/usr/src/sys/arch/amd64/compile/GENERIC amd64

>Description:
System running 8 disks on Symbios Logic SAS2308, primarily ZFS but root & home on UFS.

Previously running netbsd-9 without issue for several years.

Upgraded to netbsd-10_BETA and the system panicked overnight with:

mpii0: mpii_scsi_cmd_tmo
sd7(mpii0:0:7:0): passthrough: adapter inconsistency
panic: kernel diagnostic assertion "xs->resid == xs->datalen" failed: file "/usr/src/sys/dev/pci/mpii.c", line 3207 
cpu0: Begin traceback...
vpanic() at netbsd:vpanic+0x183
kern_assert() at netbsd:kern_assert+0x4b
mpii_scsi_cmd_done() at netbsd:mpii_scsi_cmd_done+0x33a
mpii_intr() at netbsd:mpii_intr+0x221
intr_kdtrace_wrapper() at netbsd:intr_kdtrace_wrapper+0x26
Xhandle_ioapic_edge16() at netbsd:Xhandle_ioapic_edge16+0x75

>How-To-Repeat:

>Fix:

>Release-Note:

>Audit-Trail:
From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/57133: xs->resid == xs->datalen panic in mpii
Date: Tue, 7 Feb 2023 12:17:43 -0000 (UTC)

 abs@absd.org (David Brownlee) writes:

 >Still seeing this panic - looks to be about every other night, though I
 >rarely get a crashdump.

 >Suggest this could be a pretty nasty regression from netbsd-9 for anyone
 >running a server with an mpii :/

 That's usually an erroneous error path, should be easy to find.

From: Brian Buhrow <buhrow@nfbcal.org>
To: gnats-bugs@netbsd.org
Cc: buhrow@nfbcal.org
Subject: Re: kern/57133
Date: Fri, 29 Sep 2023 11:47:07 -0700

 	Hello.  While playing with a new system, I ran into this bug and am able to reliably
 reproduce the problem.  So, I did a little digging and found some additional details:

 1.  mpii.c V1.25 with NetBSD 9.99.77 from January 2021 demonstrates the problem.
 NetBSD-9.1_STABLE with mpii.c V1.22.4.1 works just fine.


 2.  The only differences between the two versions are a bunch of changes that make most of the
 functions in mpii.c static functions and calling malloc with M_WAITOK set, so error checking
 for memory shortages can be dropped from mpii.c.  

 3.  That makes this problem a symptom, rather than a cause of the trouble.  To that end, I
 modified mpii.c so that when xs->resid != xs->datalen, it prints an error message with the two
 values, rather than panicking.  Now, I can boot the system and do things with it.  In the
 excerpted dmesg output below, the disks which demonstrate the problem during the probe are
 Western Digital 4TB SATA3 disks, while the 10TB Seagate SAS3 disks don't appear to demonstrate
 the problem.  It seems that the problem here is that something changed in the scsipi subsystem
 and the mpii.c driver makes an assumption about what should be in the xs structure that no
 longer holds.  I did a cursory search down the source tree to see if I could find any other drivers
 that check to see if xs->resid == xs->datalen, but I didn't find any obvious examples.
 	With that said, I'm not familiar enough with the scsipi system to say this check should be
 removed from the mpii.c driver, but does it make sense to have the system panic when the values
 don't match?  What is this check guarding against?

 	Here is the excerpted dmesg output from a successful boot with my diagnostic messages
 instead of the panic.  If anyone has ideas on other things to try, or ideas on what changed in
 the scsipi subsystem to break this check, I'd be interested to know.
 -thanks
 -Brian


 [   1.0000000] Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
 [   1.0000000]     2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017,
 [   1.0000000]     2018, 2019, 2020, 2021 The NetBSD Foundation, Inc.  All rights reserved.
 [   1.0000000] Copyright (c) 1982, 1986, 1989, 1991, 1993
 [   1.0000000]     The Regents of the University of California.  All rights reserved.

 [   1.0000000] NetBSD 9.99.77 (INSTALL) #0: Fri Sep 29 03:37:19 PDT 2023
 [   1.0000000] 	buhrow@loth-9.nfbcal.org:/usr/local/netbsd/obj-9977-64/sys/arch/amd64/compile/INSTALL
 [   1.0000000] total memory = 511 GB
 [   1.0000000] avail memory = 496 GB
 [   1.0311173] mpii0 at pci9 dev 0 function 0: Symbios Logic SAS3008 (rev. 0x02)
 [   1.0311173] mpii0: interrupting at msix9 vec 0
 [   1.0311173] mpii0: LSI3008-IT, firmware 16.0.1.0, MPI 2.5
 [   1.0311173] mpii0: physical device inserted in slot 0
 [   1.0311173] mpii0: physical device inserted in slot 1
 [   1.0311173] mpii0: physical device inserted in slot 2
 [   1.0311173] mpii0: physical device inserted in slot 5
 [   1.0311173] mpii0: physical device inserted in slot 6
 [   1.0311173] mpii0: physical device inserted in slot 7
 [   1.0311173] mpii0: physical device inserted in slot 8
 [   1.0311173] mpii0: physical device inserted in slot 9
 [   1.0311173] mpii0: physical device inserted in slot 10
 [   1.0311173] mpii0: physical device inserted in slot 11
 [   1.0311173] mpii0: physical device inserted in slot 28
 [   1.0311173] scsibus0 at mpii0: 256 targets, 8 luns per target
 [   8.5704323] scsibus0: waiting 2 seconds for devices to settle...
 [  10.5804346] sd0 at scsibus0 target 0 lun 0: <ATA, WDC WD4003FFBX-6, 0A83> disk fixed
 [  10.5804346] sd0: 3726 GB, 3815448 cyl, 16 head, 127 sec, 512 bytes/sect x 7814037168 sectors
 [  10.5911436] dk0 at sd0: "1adc589e-4f32-11ee-b97c-00259036fd2e", 7814033005 blocks at 34, type: raidframe
 [  10.6004341] sd0: tagged queueing
 [  10.7404347] mpii0: resid = 0, datalen = 16384
 [  10.7404347] sd1 at scsibus0 target 1 lun 0: <ATA, WDC WD4003FFBX-6, 0A83> disk fixed
 [  10.7504342] sd1: 3726 GB, 3815448 cyl, 16 head, 127 sec, 512 bytes/sect x 7814037168 sectors
 [  10.7904348] sd1: 3907019088 trailing sectors not covered by disklabel
 [  10.7904348] sd1: tagged queueing
 [  10.9904351] mpii0: resid = 0, datalen = 16384
 [  10.9904351] sd2 at scsibus0 target 2 lun 0: <ATA, WDC WD4003FFBX-6, 0A83> disk fixed
 [  11.0004344] sd2: 3726 GB, 3815448 cyl, 16 head, 127 sec, 512 bytes/sect x 7814037168 sectors
 [  11.0504351] sd2: 3907019088 trailing sectors not covered by disklabel
 [  11.0604345] sd2: tagged queueing
 [  11.2404351] mpii0: resid = 0, datalen = 16384
 [  11.2404351] sd3 at scsibus0 target 5 lun 0: <ATA, WDC WD4003FFBX-6, 0A83> disk fixed
 [  11.2504347] sd3: 3726 GB, 3815448 cyl, 16 head, 127 sec, 512 bytes/sect x 7814037168 sectors
 [  11.2704351] dk1 at sd3: "549e5298-5113-11ee-910e-00259036fd2e", 7814033005 blocks at 34, type: raidframe
 [  11.2804346] sd3: tagged queueing
 [  11.2904346] probe(mpii0:0:6:0): Sense Error Code 0x72
 [  11.2904346] probe(mpii0:0:6:0): Sense Error Code 0x72
 [  11.3004345] sd4 at scsibus0 target 6 lun 0: <SEAGATE, ST12000NM0158, RSL2> disk fixed
 [  11.3204350] sd4: 10949 GB, 501605 cyl, 15 head, 3051 sec, 512 bytes/sect x 22961717248 sectors
 [  11.3304345] sd4: tagged queueing
 [  11.3404346] probe(mpii0:0:7:0): Sense Error Code 0x72
 [  11.3504346] probe(mpii0:0:7:0): Sense Error Code 0x72
 [  11.3604346] sd5 at scsibus0 target 7 lun 0: <SEAGATE, ST12000NM0158, RSL2> disk fixed
 [  11.3804351] sd5: 10949 GB, 477414 cyl, 15 head, 3206 sec, 512 bytes/sect x 22961717248 sectors
 [  11.3904347] sd5: tagged queueing
 [  11.4004351] probe(mpii0:0:8:0): Sense Error Code 0x72
 [  11.4104345] probe(mpii0:0:8:0): Sense Error Code 0x72
 [  11.4104345] sd6 at scsibus0 target 8 lun 0: <SEAGATE, ST12000NM0158, RSL2> disk fixed
 [  11.4304351] sd6: 10949 GB, 510249 cyl, 15 head, 3000 sec, 512 bytes/sect x 22961717248 sectors
 [  11.4404351] sd6: tagged queueing
 [  11.4404351] sd7 at scsibus0 target 9 lun 0: <SEAGATE, ST12000NM0158, RSL2> disk fixed
 [  11.4704351] sd7: 10949 GB, 467635 cyl, 16 head, 3068 sec, 512 bytes/sect x 22961717248 sectors
 [  11.5004352] sd7: tagged queueing
 [  11.5004352] probe(mpii0:0:10:0): Sense Error Code 0x72
 [  11.5104346] probe(mpii0:0:10:0): Sense Error Code 0x72
 [  11.5104346] sd8 at scsibus0 target 10 lun 0: <SEAGATE, ST12000NM0158, RSL2> disk fixed
 [  11.5304352] sd8: 10949 GB, 501985 cyl, 15 head, 3049 sec, 512 bytes/sect x 22961717248 sectors
 [  11.5504352] sd8: tagged queueing
 [  11.5604376] probe(mpii0:0:11:0): Sense Error Code 0x72
 [  11.5604376] probe(mpii0:0:11:0): Sense Error Code 0x72
 [  11.5704346] sd9 at scsibus0 target 11 lun 0: <SEAGATE, ST12000NM0158, RSL2> disk fixed
 [  11.5904354] sd9: 10949 GB, 487183 cyl, 16 head, 2945 sec, 512 bytes/sect x 22961717248 sectors
 [  11.6004348] dk2 at sd9: "EFI System Partition", 1048576 blocks at 2048, type: msdos
 [  11.6104348] dk3 at sd9: "2c159d0f-f486-4170-9784-8ec1391fbc00", 22960664576 blocks at 1050624, type: ext2fs
 [  11.6204347] sd9: tagged queueing
 [  11.6304347] ses0 at scsibus0 target 28 lun 0: <LSI, SAS3x28, 0501> enclosure services fixed
 [  11.6404348] ses0: SCSI-3 SES Device
 [  11.6404348] ses0: tagged queueing
 [  15.1704381] sd1: 3907019088 trailing sectors not covered by disklabel
 [  15.1804383] sd1: 3907019088 trailing sectors not covered by disklabel
 [  15.1904427] sd2: 3907019088 trailing sectors not covered by disklabel
 [  15.1904427] sd2: 3907019088 trailing sectors not covered by disklabel
 [  15.2104381] raid1: RAID Level 1
 [  15.2104381] raid1: Components: /dev/dk0 /dev/dk1
 [  15.2104381] raid1: Total Sectors: 7814032896 (3815445 MB)
 [  15.2204383] dk4 at raid1: "2569cdee-4f34-11ee-84a1-00259036fd2e", 7814032829 blocks at 34, type: ffs
 [  15.2304381] raid0: RAID Level 1
 [  15.2404378] raid0: Components: /dev/sd2a /dev/sd1a
 [  15.2404378] raid0: Total Sectors: 3907017920 (1907723 MB)
 [  15.2604380] WARNING: 2 errors while detecting hardware; check system log.
 [  15.2804376] boot device: raid0
 [  15.2804376] root on md0a dumps on md0b
 [  15.2904383] root file system type: ffs
 [  15.2904383] kern.module.path=/stand/amd64/9.99.77/modules
 [  15.3004381] WARNING: clock lost 455 days
 [  15.3004381] WARNING: using filesystem time
 [  15.3116721] WARNING: CHECK AND RESET THE DATE!
 Created tmpfs /dev (1818624 byte, 3520 inodes)
 erase ^?, werase ^W, kill ^U, intr ^C


 --- End of forwarded message from "Brian Buhrow" <buhrow@nfbcal.org>






From: Brian Buhrow <buhrow@nfbcal.org>
To: gnats-bugs@netbsd.org
Cc: buhrow@nfbcal.org
Subject: Re: kern/57133
Date: Sun, 1 Oct 2023 12:11:05 -0700

 	Hello.  Following up on this issue again, I decided to print the SCSI command that
 triggers the problem, in hopes that might help folks figure out what's going wrong.  Below is
 some excerpted output that shows the problem.

 Also, I include a patch to mpii.c which avoids the panic, but provides more information about
 the issue, so, as folks encounter the error in the field, we can get more data.  This patch
 allows me to run with the mpii card and the apparently troublesome Western Digital disks
 without a problem.

 First, the debug output, then the patch.


 [  10.5574554] sd0 at scsibus0 target 0 lun 0: <ATA, WDC WD4003FFBX-6, 0A83> disk fixed
 [  10.5574554] sd0: 3726 GB, 3815448 cyl, 16 head, 127 sec, 512 bytes/sect x 7814037168 sectors
 [  10.5681363] dk0 at sd0: "1adc589e-4f32-11ee-b97c-00259036fd2e", 7814033005 blocks at 34, type: raidframe
 [  10.5853388] sd0: tagged queueing
 [  10.7574556] mpii0: resid = 0, datalen = 16384
 [  10.7574556] mpii0: SCSI command info is: 0xa3 0c 80 00 00 00 00 00 40 00 00 00
 sd1 at scsibus0 target 1 lun 0: <ATA, WDC WD4003FFBX-6, 0A83> disk fixed
 [  10.7774550] sd1: 3726 GB, 3815448 cyl, 16 head, 127 sec, 512 bytes/sect x 7814037168 sectors
 [  10.8374548] mssd1: tagged queueing
 [  11.0074552] mpii0: resid = 0, datalen = 16384
 [  11.0074552] mpii0: SCSI command info is: 0xa3 0c 80 00 00 00 00 00 40 00 00 00
 sd2 at scsibus0 target 2 lun 0: <ATA, WDC WD4003FFBX-6, 0A83> disk fixed
 [  11.0274552] sd2: 3726 GB, 3815448 cyl, 16 head, 127 sec, 512 bytes/sect x 7814037168 sectors
 [  11.0574559] sd2: 3907019088 trailing sectors not covered by disklabel
 [  11.0674553] sd2: tagged queueing
 [  11.2574559] mpii0: resid = 0, datalen = 16384
 [  11.2574559] mpii0: SCSI command info is: 0xa3 0c 80 00 00 00 00 00 40 00 00 00
 sd3 at scsibus0 target 5 lun 0: <ATA, WDC WD4003FFBX-6, 0A83> disk fixed
 [  11.2774557] sd3: 3726 GB, 3815448 cyl, 16 head, 127 sec, 512 bytes/sect x 7814037168 sectors
 [  11.2874554] dk1 at sd3: "549e5298-5113-11ee-910e-00259036fd2e", 7814033005 blocks at 34, type: raidframe
 [  11.2974554] sd3: tagged queueing


 Now, the patch.

 Index: mpii.c
 ==================================================================
 RCS file: /cvsroot/src/sys/dev/pci/mpii.c,v
 retrieving revision 1.29
 diff -u -r1.29 mpii.c
 --- mpii.c	7 Aug 2021 16:19:14 -0000	1.29
 +++ mpii.c	1 Oct 2023 18:58:28 -0000
 @@ -1,4 +1,4 @@
 -/* $NetBSD$ */
 +/* $NetBSD: mpii.c,v 1.29 2021/08/07 16:19:14 thorpej Exp $ */
  /*	$OpenBSD: mpii.c,v 1.115 2018/08/14 05:22:21 jmatthew Exp $	*/
  /*
   * Copyright (c) 2010, 2012 Mike Belopuhov
 @@ -20,7 +20,7 @@
   */

  #include <sys/cdefs.h>
 -__KERNEL_RCSID(0, "$NetBSD$");
 +__KERNEL_RCSID(0, "$NetBSD: mpii.c,v 1.29 2021/08/07 16:19:14 thorpej Exp $");

  #include "bio.h"

 @@ -3203,8 +3203,18 @@
  		bus_dmamap_unload(sc->sc_dmat, dmap);
  	}

 +	if (xs->resid != xs->datalen) {
 +		printf("%s: resid = %u, datalen = %u\n",
 +		DEVNAME(sc), xs->resid, xs->datalen);
 +		printf("%s: SCSI command info is: ",DEVNAME(sc));
 +		scsipi_print_cdb(xs->cmd);
 +		printf("\n");
 +	}
 +
  	KASSERT(xs->error == XS_NOERROR);
 +#if 0
  	KASSERT(xs->resid == xs->datalen);
 +#endif
  	KASSERT(xs->status == SCSI_OK);

  	if (ccb->ccb_rcb == NULL) {

From: David Brownlee <abs@absd.org>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/57133
Date: Tue, 3 Oct 2023 19:32:50 +0100

 Had a quick check for related comparisons:
  '(xs->resid.*\S=.*xs->datalen|xs->datalen.*\S=.*xs->resid)' in the
 source tree. Looks like it's been pretty consistent through at least 8,
 9, 10 & current:

 sys/arch/bebox/stand/boot/siop.c:957:                   if (xs->resid == xs->datalen && xs->datalen) {
 sys/arch/prep/stand/boot/siop.c:919:                    if (xs->resid == xs->datalen && xs->datalen) {
 sys/dev/pci/mpii.c:3207:        KASSERT(xs->resid == xs->datalen);
 sys/dev/scsipi/atapi_base.c:89:                 if (xs->resid == xs->datalen)
 sys/dev/scsipi/st.c:2262:               if (xs->datalen && xs->resid >= xs->datalen) {
 sys/dev/scsipi/scsipi_base.c:983:                       if (xs->resid == xs->datalen && xs->datalen) {

 mpii.c definitely seems to be the only assert in there, so presumably
 as long as it's doing the Right Thing with the returned data, the
 assert should be removed...

 David

State-Changed-From-To: open->feedback
State-Changed-By: buhrow@NetBSD.org
State-Changed-When: Wed, 18 Oct 2023 05:21:28 +0000
State-Changed-Why:
There is a patch and we want to make sure it works for the original submitter.
.


From: Brian Buhrow <buhrow@nfbcal.org>
To: gnats-bugs@netbsd.org
Cc: buhrow@nfbcal.org
Subject: Re: kern/57133
Date: Tue, 17 Oct 2023 22:16:29 -0700

 	Hello.  Further analysis reveals the trouble, as explained below.  Also, I provide a diff
 to mpii.c that fixes the problem.  Can the original submitter of this bug test the below patch
 to see if it addresses their issue?  If it does, I'll commit the fix and request pullups to
 NetBSD-8 and NetBSD-9, since the bug was introduced to both of those releases.

 -thanks
 -Brian


 	Hello.  Okay.  I can now explain why this assert is firing and have a fix for it.  It is a
 regression introduced in R1.22 of mpii.c.

 	If a request comes in and the IOC returns a MPII_SCSIIO_STATUS_CHECK_COND condition, after
 a successful transfer, or one that is a recovered error, 
 mpii(4) correctly sets the xs->error to XS_SENSE, but incorrectly sets xs->resid to 0 before
 returning the xfer to the upper scsi layers.  Once the upper layers get it, they notice the
 XS_SENSE check condition and because it's a retryable error, they increment xs_requeuecnt, set
 ERESTART and send the xfer request down to the mpii(4) layer again for a retry.  What they do
 not do is reset xs->resid equal to xs->datalen.  When the xfer comes down to mpii(4) again, the
 assert happens.  The fix is for the mpii(4) driver to leave xs->resid alone when it encounters
 a MPII_SCSIIO_STATUS_CHECK_COND condition.


 The below patch implements this change.


 Index: mpii.c
 ===================================================================
 RCS file: /cvsroot/src/sys/dev/pci/mpii.c,v
 retrieving revision 1.29
 diff -u -r1.29 mpii.c
 --- mpii.c	7 Aug 2021 16:19:14 -0000	1.29
 +++ mpii.c	18 Oct 2023 05:09:53 -0000
 @@ -3264,7 +3264,6 @@
  			break;

  		case MPII_SCSIIO_STATUS_CHECK_COND:
 -			xs->resid = 0;
  			xs->error = XS_SENSE;
  			break;


From: "Brian Buhrow" <buhrow@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/57133 CVS commit: src/sys/dev/pci
Date: Wed, 25 Oct 2023 00:21:49 +0000

 Module Name:	src
 Committed By:	buhrow
 Date:		Wed Oct 25 00:21:49 UTC 2023

 Modified Files:
 	src/sys/dev/pci: mpii.c

 Log Message:
 Fixes for PR kern/57133:

 I can now explain why this assert is firing and have a fix for it.  It is a
  regression introduced in R1.22 of mpii.c.

         If a request comes in and the IOC returns a MPII_SCSIIO_STATUS_CHECK_COND condition, after
  a successful transfer, or one that is a recovered error,
  mpii(4) correctly sets the xs->error to XS_SENSE, but incorrectly sets xs->resid to 0 before
  returning the xfer to the upper scsi layers.  Once the upper layers get it, they notice the
  XS_SENSE check condition and because it's a retryable error, they increment xs_requeuecnt, set
  ERESTART and send the xfer request down to the mpii(4) layer again for a retry.  What they do
  not do is reset xs->resid equal to xs->datalen.  When the xfer comes down to mpii(4) again, the
  assert happens.  The fix is for the mpii(4) driver to leave xs->resid alone when it encounters
  a MPII_SCSIIO_STATUS_CHECK_COND condition.

 This bug affects NetBSD-10, netbsd-9 and netbsd-8.


 To generate a diff of this commit:
 cvs rdiff -u -r1.29 -r1.30 src/sys/dev/pci/mpii.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: feedback->pending-pullups
State-Changed-By: buhrow@NetBSD.org
State-Changed-When: Wed, 25 Oct 2023 00:37:24 +0000
State-Changed-Why:
Requested pullups to NetBSD-8, NetBSD-9 and NetBSD-10


From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/57133 CVS commit: [netbsd-10] src/sys/dev/pci
Date: Thu, 26 Oct 2023 15:11:02 +0000

 Module Name:	src
 Committed By:	martin
 Date:		Thu Oct 26 15:11:02 UTC 2023

 Modified Files:
 	src/sys/dev/pci [netbsd-10]: mpii.c

 Log Message:
 Pull up following revision(s) (requested by buhrow in ticket #435):

 	sys/dev/pci/mpii.c: revision 1.30

 Fixes for PR kern/57133:

 I can now explain why this assert is firing and have a fix for it.  It is a  regression introduced in R1.22 of mpii.c.

         If a request comes in and the IOC returns a MPII_SCSIIO_STATUS_CHECK_COND condition, after
  a successful transfer, or one that is a recovered error,
  mpii(4) correctly sets the xs->error to XS_SENSE, but incorrectly sets xs->resid to 0 before
  returning the xfer to the upper scsi layers.  Once the upper layers get it, they notice the
  XS_SENSE check condition and because it's a retryable error, they increment xs_requeuecnt, set
  ERESTART and send the xfer request down to the mpii(4) layer again for a retry. What they do
  not do is reset xs->resid equal to xs->datalen.  When the xfer comes down to mpii(4) again, the
  assert happens.  The fix is for the mpii(4) driver to leave xs->resid alone when it encounters
  a MPII_SCSIIO_STATUS_CHECK_COND condition.

 This bug affects NetBSD-10, netbsd-9 and netbsd-8.


 To generate a diff of this commit:
 cvs rdiff -u -r1.29 -r1.29.6.1 src/sys/dev/pci/mpii.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/57133 CVS commit: [netbsd-9] src/sys/dev/pci
Date: Thu, 26 Oct 2023 15:12:10 +0000

 Module Name:	src
 Committed By:	martin
 Date:		Thu Oct 26 15:12:10 UTC 2023

 Modified Files:
 	src/sys/dev/pci [netbsd-9]: mpii.c

 Log Message:
 Pull up following revision(s) (requested by buhrow in ticket #1756):

 	sys/dev/pci/mpii.c: revision 1.30

 Fixes for PR kern/57133:

 I can now explain why this assert is firing and have a fix for it.  It is a  regression introduced in R1.22 of mpii.c.

         If a request comes in and the IOC returns a MPII_SCSIIO_STATUS_CHECK_COND condition, after
  a successful transfer, or one that is a recovered error,
  mpii(4) correctly sets the xs->error to XS_SENSE, but incorrectly sets xs->resid to 0 before
  returning the xfer to the upper scsi layers.  Once the upper layers get it, they notice the
  XS_SENSE check condition and because it's a retryable error, they increment xs_requeuecnt, set
  ERESTART and send the xfer request down to the mpii(4) layer again for a retry. What they do
  not do is reset xs->resid equal to xs->datalen.  When the xfer comes down to mpii(4) again, the
  assert happens.  The fix is for the mpii(4) driver to leave xs->resid alone when it encounters
  a MPII_SCSIIO_STATUS_CHECK_COND condition.

 This bug affects NetBSD-10, netbsd-9 and netbsd-8.


 To generate a diff of this commit:
 cvs rdiff -u -r1.22.4.1 -r1.22.4.2 src/sys/dev/pci/mpii.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/57133 CVS commit: [netbsd-8] src/sys/dev/pci
Date: Thu, 26 Oct 2023 15:13:38 +0000

 Module Name:	src
 Committed By:	martin
 Date:		Thu Oct 26 15:13:38 UTC 2023

 Modified Files:
 	src/sys/dev/pci [netbsd-8]: mpii.c

 Log Message:
 Pull up following revision(s) (requested by buhrow in ticket #1916):

 	sys/dev/pci/mpii.c: revision 1.30

 Fixes for PR kern/57133:

 I can now explain why this assert is firing and have a fix for it.  It is a  regression introduced in R1.22 of mpii.c.

         If a request comes in and the IOC returns a MPII_SCSIIO_STATUS_CHECK_COND condition, after
  a successful transfer, or one that is a recovered error,
  mpii(4) correctly sets the xs->error to XS_SENSE, but incorrectly sets xs->resid to 0 before
  returning the xfer to the upper scsi layers.  Once the upper layers get it, they notice the
  XS_SENSE check condition and because it's a retryable error, they increment xs_requeuecnt, set
  ERESTART and send the xfer request down to the mpii(4) layer again for a retry. What they do
  not do is reset xs->resid equal to xs->datalen.  When the xfer comes down to mpii(4) again, the
  assert happens.  The fix is for the mpii(4) driver to leave xs->resid alone when it encounters
  a MPII_SCSIIO_STATUS_CHECK_COND condition.

 This bug affects NetBSD-10, netbsd-9 and netbsd-8.


 To generate a diff of this commit:
 cvs rdiff -u -r1.8.10.7 -r1.8.10.8 src/sys/dev/pci/mpii.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: pending-pullups->closed
State-Changed-By: buhrow@NetBSD.org
State-Changed-When: Thu, 26 Oct 2023 19:08:57 +0000
State-Changed-Why:
Pullups have been processed.
.


>Unformatted:
