NetBSD Problem Report #54045
From kardel@Kardel.name Wed Mar 6 10:47:35 2019
Return-Path: <kardel@Kardel.name>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id 90A5C7A1CB
for <gnats-bugs@gnats.NetBSD.org>; Wed, 6 Mar 2019 10:47:35 +0000 (UTC)
Message-Id: <20190306093217.E595744B33@Andromeda.Kardel.name>
Date: Wed, 6 Mar 2019 10:32:17 +0100 (CET)
From: kardel@netbsd.org
Reply-To: kardel@netbsd.org
To: gnats-bugs@NetBSD.org
Subject: mpii driver triggers panic on timeout in physio
X-Send-Pr-Version: 3.95
>Number: 54045
>Category: kern
>Synopsis: kernel diagnostic assertion "bp->b_error == 0" failed: file "/src/NetBSD/cur/src/sys/kern/kern_physio.c", line 179
>Confidential: no
>Severity: serious
>Priority: high
>Responsible: kardel@netbsd.org
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Wed Mar 06 10:50:00 +0000 2019
>Closed-Date: Fri Mar 15 14:58:45 +0000 2019
>Last-Modified: Fri Mar 15 14:58:45 +0000 2019
>Originator: Frank Kardel
>Release: NetBSD 8.99.34
>Organization:
>Environment:
System: NetBSD admin1 8.99.34 NetBSD 8.99.34 (GENERIC_SCSIDEBUG) #5: Wed Mar 6 08:51:34 CET 2019 kardel@Andromeda:/usr/src/obj.amd64/sys/arch/amd64/compile/GENERIC_SCSIDEBUG amd64
Architecture: x86_64
Machine: amd64
>Description:
Timeouts on SCSI WRITE/READ commands in the mpii driver cause physio to panic with:
kernel diagnostic assertion "bp->b_error == 0" failed: file "/src/NetBSD/cur/src/sys/kern/kern_physio.c", line 179
Root cause is the improper setting of bp->resid in the error case triggering the physio assertion that
fully completed transfers must not indicate error conditions.
>How-To-Repeat:
Force a timeout on READ/WRITE in the mpii driver (tape drives with bad tapes are a
good test candidate) and observe the panic on a DIAGNOSTIC kernel.
>Fix:
Align values of xs->error, xs->resid and xs->status with the expectations of the scsipi subsystem.
>Release-Note:
>Audit-Trail:
From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/54045: mpii driver triggers panic on timeout in physio
Date: Wed, 6 Mar 2019 12:31:56 -0000 (UTC)
kardel@netbsd.org writes:
>>Description:
> Timeouts on SCSI WRITE/READ commands in the mpii driver cause physio to panic with:
> kernel diagnostic assertion "bp->b_error == 0" failed: file "/src/NetBSD/cur/src/sys/kern/kern_physio.c", line 179
The adapter driver doesn't hold that assertion. You can get a partial
transfer and an error condition at the same time from SCSI. Some error
paths in some drivers reset xs->resid, but that's not used very consistently.
The sd driver assures that the physio assertion is satisfied in sddone(),
but the tape driver doesn't.
The assertion itself is also something "new". The original code would
accept a partial read (usually an EOM condition) that could also return
an error.
The combination of drivers that reset the residual after an error and
an assertion that checks the fake residual is a bit strange too.
--
--
Michael van Elst
Internet: mlelstv@serpens.de
"A potential Snark may lurk in every tree."
From: Frank Kardel <kardel@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/54045: mpii driver triggers panic on timeout in physio
Date: Wed, 6 Mar 2019 13:46:26 +0100
I agree with you. I have cleaned up the mpii driver a bit with respect
to error/status/resid handling - I will commit that soon afte some testing.
This cleanup will need to be pulled up to -8 also.
Frank
On 03/06/19 13:35, Michael van Elst wrote:
> The following reply was made to PR kern/54045; it has been noted by GNATS.
>
> From: mlelstv@serpens.de (Michael van Elst)
> To: gnats-bugs@netbsd.org
> Cc:
> Subject: Re: kern/54045: mpii driver triggers panic on timeout in physio
> Date: Wed, 6 Mar 2019 12:31:56 -0000 (UTC)
>
> kardel@netbsd.org writes:
>
> >>Description:
> > Timeouts on SCSI WRITE/READ commands in the mpii driver cause physio to panic with:
> > kernel diagnostic assertion "bp->b_error == 0" failed: file "/src/NetBSD/cur/src/sys/kern/kern_physio.c", line 179
>
>
> The adapter driver doesn't hold that assertion. You can get a partial
> transfer and an error condition at the same time from SCSI. Some error
> paths in some drivers reset xs->resid, but that's not used very consistently.
>
> The sd driver assures that the physio assertion is satisfied in sddone(),
> but the tape driver doesn't.
>
> The assertion itself is also something "new". The original code would
> accept a partial read (usually an EOM condition) that could also return
> an error.
>
> The combination of drivers that reset the residual after an error and
> an assertion that checks the fake residual is a bit strange too.
>
>
> --
> --
> Michael van Elst
> Internet: mlelstv@serpens.de
> "A potential Snark may lurk in every tree."
>
Responsible-Changed-From-To: kern-bug-people->kardel@netbsd.org
Responsible-Changed-By: kardel@NetBSD.org
Responsible-Changed-When: Wed, 06 Mar 2019 12:51:45 +0000
Responsible-Changed-Why:
mine
State-Changed-From-To: open->analyzed
State-Changed-By: kardel@NetBSD.org
State-Changed-When: Wed, 06 Mar 2019 12:51:45 +0000
State-Changed-Why:
root cause is analyzed
From: "Frank Kardel" <kardel@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/54045 CVS commit: src/sys/dev/pci
Date: Mon, 11 Mar 2019 14:35:22 +0000
Module Name: src
Committed By: kardel
Date: Mon Mar 11 14:35:22 UTC 2019
Modified Files:
src/sys/dev/pci: mpii.c
Log Message:
PR/54045
fix mpii to adhere to physio diagnostic invariant that
fully processed data must not post an error:
1) verify expected scspi state via KASSERT() instead of just
setting the variables.
2) set xs->resid only in known good conditions
3) insure setting errors in all error paths and refrain
from clearing xs->resid in error paths.
While there do some cosmectic clean up:
1) extend and relocate some debug output
2) mpii HBAs can also manage non-disk devices like tapes etc,
so log that physical "devices" instead of physical "disks" are
attached or detached.
Tested with NEOSeries FlexStor II and luckily a broken tape drive 8-(
mpii0 at pci1 dev 0 function 0: vendor 1000 product 00ab (rev. 0x01)
mpii0: interrupting at irq 11
mpii0: HBA 9400-8i8e, firmware 3.0.4.0, MPI 2.6
mpii0: physical device inserted in slot 9
mpii0: physical device inserted in slot 13
mpii0: physical device inserted in slot 16
st0 at scsibus0 target 9 lun 0: <IBM, ULTRIUM-HH7, J4D1> tape removable
st0: density code 92, variable blocks, write-enabled
ch0 at scsibus0 target 9 lun 1: <BDT, FlexStor II, 5.50> changer removable
ch0: 23 slots, 2 drives, 1 picker, 1 portal
st0: tagged queueing
ch0: tagged queueing
st1 at scsibus0 target 13 lun 0: <IBM, ULTRIUM-HH7, J4D1> tape removable
st1: density code 92, variable blocks, write-enabled
st1: tagged queueing
ses0 at scsibus0 target 16 lun 0: <LSI, VirtualSES, 01> enclosure services fixed
Note: pullup-8
To generate a diff of this commit:
cvs rdiff -u -r1.21 -r1.22 src/sys/dev/pci/mpii.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: Frank Kardel <kardel@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: PR/54045 CVS commit: src/sys/dev/pci
Date: Fri, 15 Mar 2019 11:02:26 +0100
pullup-8 requested - ticket 1217
From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/54045 CVS commit: [netbsd-8] src/sys/dev/pci
Date: Fri, 15 Mar 2019 14:50:36 +0000
Module Name: src
Committed By: martin
Date: Fri Mar 15 14:50:36 UTC 2019
Modified Files:
src/sys/dev/pci [netbsd-8]: mpii.c
Log Message:
Pull up following revision(s) (requested by kardel in ticket #1217):
sys/dev/pci/mpii.c: revision 1.22
PR/54045
fix mpii to adhere to physio diagnostic invariant that
fully processed data must not post an error:
1) verify expected scspi state via KASSERT() instead of just
setting the variables.
2) set xs->resid only in known good conditions
3) insure setting errors in all error paths and refrain
from clearing xs->resid in error paths.
While there do some cosmectic clean up:
1) extend and relocate some debug output
2) mpii HBAs can also manage non-disk devices like tapes etc,
so log that physical "devices" instead of physical "disks" are
attached or detached.
Tested with NEOSeries FlexStor II and luckily a broken tape drive 8-(
mpii0 at pci1 dev 0 function 0: vendor 1000 product 00ab (rev. 0x01)
mpii0: interrupting at irq 11
mpii0: HBA 9400-8i8e, firmware 3.0.4.0, MPI 2.6
mpii0: physical device inserted in slot 9
mpii0: physical device inserted in slot 13
mpii0: physical device inserted in slot 16
st0 at scsibus0 target 9 lun 0: <IBM, ULTRIUM-HH7, J4D1> tape removable
st0: density code 92, variable blocks, write-enabled
ch0 at scsibus0 target 9 lun 1: <BDT, FlexStor II, 5.50> changer removable
ch0: 23 slots, 2 drives, 1 picker, 1 portal
st0: tagged queueing
ch0: tagged queueing
st1 at scsibus0 target 13 lun 0: <IBM, ULTRIUM-HH7, J4D1> tape removable
st1: density code 92, variable blocks, write-enabled
st1: tagged queueing
ses0 at scsibus0 target 16 lun 0: <LSI, VirtualSES, 01> enclosure
services fixed
Note: pullup-8
To generate a diff of this commit:
cvs rdiff -u -r1.8.10.4 -r1.8.10.5 src/sys/dev/pci/mpii.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: analyzed->closed
State-Changed-By: kardel@NetBSD.org
State-Changed-When: Fri, 15 Mar 2019 14:58:45 +0000
State-Changed-Why:
pullup-8 completed.
>Unformatted:
(Contact us)
$NetBSD: query-full-pr,v 1.43 2018/01/16 07:36:43 maya Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2017
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.