NetBSD Problem Report #54045

From kardel@Kardel.name  Wed Mar  6 10:47:35 2019
Return-Path: <kardel@Kardel.name>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 90A5C7A1CB
	for <gnats-bugs@gnats.NetBSD.org>; Wed,  6 Mar 2019 10:47:35 +0000 (UTC)
Message-Id: <20190306093217.E595744B33@Andromeda.Kardel.name>
Date: Wed,  6 Mar 2019 10:32:17 +0100 (CET)
From: kardel@netbsd.org
Reply-To: kardel@netbsd.org
To: gnats-bugs@NetBSD.org
Subject: mpii driver triggers panic on timeout in physio
X-Send-Pr-Version: 3.95

>Number:         54045
>Category:       kern
>Synopsis:       kernel diagnostic assertion "bp->b_error == 0" failed: file "/src/NetBSD/cur/src/sys/kern/kern_physio.c", line 179
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    kardel@netbsd.org
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Mar 06 10:50:00 +0000 2019
>Closed-Date:    Fri Mar 15 14:58:45 +0000 2019
>Last-Modified:  Fri Mar 15 14:58:45 +0000 2019
>Originator:     Frank Kardel
>Release:        NetBSD 8.99.34
>Organization:
>Environment:
System: NetBSD admin1 8.99.34 NetBSD 8.99.34 (GENERIC_SCSIDEBUG) #5: Wed Mar  6 08:51:34 CET 2019  kardel@Andromeda:/usr/src/obj.amd64/sys/arch/amd64/compile/GENERIC_SCSIDEBUG amd64
Architecture: x86_64
Machine: amd64
>Description:
	Timeouts on SCSI WRITE/READ commands in the mpii driver cause physio to panic with:
	kernel diagnostic assertion "bp->b_error == 0" failed: file "/src/NetBSD/cur/src/sys/kern/kern_physio.c", line 179
	Root cause is the improper setting of bp->resid in the error case triggering the physio assertion that 
	fully completed transfers must not indicate error conditions.
>How-To-Repeat:
	Force a timeout on READ/WRITE in the mpii driver (tape drives with bad tapes are a
	good test candidate) and observe the panic on a DIAGNOSTIC kernel.
>Fix:
	Align values of xs->error, xs->resid and xs->status with the expectations of the scsipi subsystem.

>Release-Note:

>Audit-Trail:
From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/54045: mpii driver triggers panic on timeout in physio
Date: Wed, 6 Mar 2019 12:31:56 -0000 (UTC)

 kardel@netbsd.org writes:

 >>Description:
 >	Timeouts on SCSI WRITE/READ commands in the mpii driver cause physio to panic with:
 >	kernel diagnostic assertion "bp->b_error == 0" failed: file "/src/NetBSD/cur/src/sys/kern/kern_physio.c", line 179


 The adapter driver doesn't hold that assertion. You can get a partial
 transfer and an error condition at the same time from SCSI. Some error
 paths in some drivers reset xs->resid, but that's not used very consistently.

 The sd driver assures that the physio assertion is satisfied in sddone(),
 but the tape driver doesn't.

 The assertion itself is also something "new". The original code would
 accept a partial read (usually an EOM condition) that could also return
 an error.

 The combination of drivers that reset the residual after an error and
 an assertion that checks the fake residual is a bit strange too.


 -- 
 -- 
                                 Michael van Elst
 Internet: mlelstv@serpens.de
                                 "A potential Snark may lurk in every tree."

From: Frank Kardel <kardel@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/54045: mpii driver triggers panic on timeout in physio
Date: Wed, 6 Mar 2019 13:46:26 +0100

 I agree with you. I have cleaned up the mpii driver a bit with respect 
 to error/status/resid handling - I will commit that soon afte some testing.

 This cleanup will need to be pulled up to -8 also.

 Frank


 On 03/06/19 13:35, Michael van Elst wrote:
 > The following reply was made to PR kern/54045; it has been noted by GNATS.
 >
 > From: mlelstv@serpens.de (Michael van Elst)
 > To: gnats-bugs@netbsd.org
 > Cc:
 > Subject: Re: kern/54045: mpii driver triggers panic on timeout in physio
 > Date: Wed, 6 Mar 2019 12:31:56 -0000 (UTC)
 >
 >   kardel@netbsd.org writes:
 >   
 >   >>Description:
 >   >	Timeouts on SCSI WRITE/READ commands in the mpii driver cause physio to panic with:
 >   >	kernel diagnostic assertion "bp->b_error == 0" failed: file "/src/NetBSD/cur/src/sys/kern/kern_physio.c", line 179
 >   
 >   
 >   The adapter driver doesn't hold that assertion. You can get a partial
 >   transfer and an error condition at the same time from SCSI. Some error
 >   paths in some drivers reset xs->resid, but that's not used very consistently.
 >   
 >   The sd driver assures that the physio assertion is satisfied in sddone(),
 >   but the tape driver doesn't.
 >   
 >   The assertion itself is also something "new". The original code would
 >   accept a partial read (usually an EOM condition) that could also return
 >   an error.
 >   
 >   The combination of drivers that reset the residual after an error and
 >   an assertion that checks the fake residual is a bit strange too.
 >   
 >   
 >   --
 >   --
 >                                   Michael van Elst
 >   Internet: mlelstv@serpens.de
 >                                   "A potential Snark may lurk in every tree."
 >   

Responsible-Changed-From-To: kern-bug-people->kardel@netbsd.org
Responsible-Changed-By: kardel@NetBSD.org
Responsible-Changed-When: Wed, 06 Mar 2019 12:51:45 +0000
Responsible-Changed-Why:
mine


State-Changed-From-To: open->analyzed
State-Changed-By: kardel@NetBSD.org
State-Changed-When: Wed, 06 Mar 2019 12:51:45 +0000
State-Changed-Why:
root cause is analyzed


From: "Frank Kardel" <kardel@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/54045 CVS commit: src/sys/dev/pci
Date: Mon, 11 Mar 2019 14:35:22 +0000

 Module Name:	src
 Committed By:	kardel
 Date:		Mon Mar 11 14:35:22 UTC 2019

 Modified Files:
 	src/sys/dev/pci: mpii.c

 Log Message:
 PR/54045

 fix mpii to adhere to physio diagnostic invariant that
 fully processed data must not post an error:
 1) verify expected scspi state via KASSERT() instead of just
    setting the variables.
 2) set xs->resid only in known good conditions
 3) insure setting errors in all error paths and refrain
    from clearing xs->resid in error paths.

 While there do some cosmectic clean up:
 1) extend and relocate some debug output
 2) mpii HBAs can also manage non-disk devices like tapes etc,
    so log that physical "devices" instead of physical "disks" are
    attached or detached.

 Tested with NEOSeries FlexStor II and luckily a broken tape drive 8-(

 mpii0 at pci1 dev 0 function 0: vendor 1000 product 00ab (rev. 0x01)
 mpii0: interrupting at irq 11
 mpii0: HBA 9400-8i8e, firmware 3.0.4.0, MPI 2.6
 mpii0: physical device inserted in slot 9
 mpii0: physical device inserted in slot 13
 mpii0: physical device inserted in slot 16
 st0 at scsibus0 target 9 lun 0: <IBM, ULTRIUM-HH7, J4D1> tape removable
 st0: density code 92, variable blocks, write-enabled
 ch0 at scsibus0 target 9 lun 1: <BDT, FlexStor II, 5.50> changer removable
 ch0: 23 slots, 2 drives, 1 picker, 1 portal
 st0: tagged queueing
 ch0: tagged queueing
 st1 at scsibus0 target 13 lun 0: <IBM, ULTRIUM-HH7, J4D1> tape removable
 st1: density code 92, variable blocks, write-enabled
 st1: tagged queueing
 ses0 at scsibus0 target 16 lun 0: <LSI, VirtualSES, 01> enclosure services fixed

 Note: pullup-8


 To generate a diff of this commit:
 cvs rdiff -u -r1.21 -r1.22 src/sys/dev/pci/mpii.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: Frank Kardel <kardel@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: PR/54045 CVS commit: src/sys/dev/pci
Date: Fri, 15 Mar 2019 11:02:26 +0100

 pullup-8 requested - ticket 1217

From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/54045 CVS commit: [netbsd-8] src/sys/dev/pci
Date: Fri, 15 Mar 2019 14:50:36 +0000

 Module Name:	src
 Committed By:	martin
 Date:		Fri Mar 15 14:50:36 UTC 2019

 Modified Files:
 	src/sys/dev/pci [netbsd-8]: mpii.c

 Log Message:
 Pull up following revision(s) (requested by kardel in ticket #1217):

 	sys/dev/pci/mpii.c: revision 1.22

 PR/54045

 fix mpii to adhere to physio diagnostic invariant that
 fully processed data must not post an error:

 1) verify expected scspi state via KASSERT() instead of just
     setting the variables.
 2) set xs->resid only in known good conditions
 3) insure setting errors in all error paths and refrain
     from clearing xs->resid in error paths.

 While there do some cosmectic clean up:
 1) extend and relocate some debug output
 2) mpii HBAs can also manage non-disk devices like tapes etc,
     so log that physical "devices" instead of physical "disks" are
     attached or detached.

 Tested with NEOSeries FlexStor II and luckily a broken tape drive 8-(

 mpii0 at pci1 dev 0 function 0: vendor 1000 product 00ab (rev. 0x01)
 mpii0: interrupting at irq 11
 mpii0: HBA 9400-8i8e, firmware 3.0.4.0, MPI 2.6
 mpii0: physical device inserted in slot 9
 mpii0: physical device inserted in slot 13
 mpii0: physical device inserted in slot 16
 st0 at scsibus0 target 9 lun 0: <IBM, ULTRIUM-HH7, J4D1> tape removable
 st0: density code 92, variable blocks, write-enabled
 ch0 at scsibus0 target 9 lun 1: <BDT, FlexStor II, 5.50> changer removable
 ch0: 23 slots, 2 drives, 1 picker, 1 portal
 st0: tagged queueing
 ch0: tagged queueing
 st1 at scsibus0 target 13 lun 0: <IBM, ULTRIUM-HH7, J4D1> tape removable
 st1: density code 92, variable blocks, write-enabled
 st1: tagged queueing
 ses0 at scsibus0 target 16 lun 0: <LSI, VirtualSES, 01> enclosure
 services fixed

 Note: pullup-8


 To generate a diff of this commit:
 cvs rdiff -u -r1.8.10.4 -r1.8.10.5 src/sys/dev/pci/mpii.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: analyzed->closed
State-Changed-By: kardel@NetBSD.org
State-Changed-When: Fri, 15 Mar 2019 14:58:45 +0000
State-Changed-Why:
pullup-8 completed.


>Unformatted:

NetBSD Home
NetBSD PR Database Search

(Contact us) $NetBSD: query-full-pr,v 1.43 2018/01/16 07:36:43 maya Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2017 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.