NetBSD Problem Report #41797
From Wolfgang.Stukenbrock@nagler-company.com Wed Jul 29 17:43:21 2009
Return-Path: <Wolfgang.Stukenbrock@nagler-company.com>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
by www.NetBSD.org (Postfix) with ESMTP id AD32363B882
for <gnats-bugs@gnats.NetBSD.org>; Wed, 29 Jul 2009 17:43:21 +0000 (UTC)
Message-Id: <20090729174316.E49CF4EA9FE@s012.nagler-company.com>
Date: Wed, 29 Jul 2009 19:43:16 +0200 (CEST)
From: Wolfgang.Stukenbrock@nagler-company.com
Reply-To: Wolfgang.Stukenbrock@nagler-company.com
To: gnats-bugs@gnats.NetBSD.org
Subject: kernel panic in kern_physio when tape reaches EOM during write if DIAGNOSTIC is enabled, without DIAGNOSTIC error status is lost
X-Send-Pr-Version: 3.95
>Number: 41797
>Category: kern
>Synopsis: kernel panic in kern_physio when tape reaches EOM during write if DIAGNOSTIC is enabled, without DIAGNOSTIC error status is lost
>Confidential: no
>Severity: serious
>Priority: high
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Wed Jul 29 17:45:00 +0000 2009
>Last-Modified: Wed Jul 29 19:15:02 +0000 2009
>Originator: Wolfgang Stukenbrock
>Release: NetBSD 4.0
>Organization:
Dr. Nagler & Company GmbH
>Environment:
System: NetBSD s012 4.0 NetBSD 4.0 (NSW-S012) #9: Fri Mar 13 12:31:52 CET 2009 wgstuken@s012:/usr/src/sys/arch/amd64/compile/NSW-S012 amd64
Architecture: x86_64
Machine: amd64
>Description:
We have a VXA320 tape drive connected to an Adaptec 29160 controller on this system.
For debugging purposes we run a kernel with DIAGNOSTIC enabled.
"Sometimes" the system panics with an assertion in kern_physio.c at line 201: "KASSERT((bp->b_flags & B_ERROR) == 0);".
This check is only compiled in when the kernel is built with DIAGNOSTIC, so most users will never see the panic.
(But the first EOM error status is lost; see the analysis below.)
It took some time to find out that the cause is the st driver.
We are using the nrst devices, so no fixed block mode, and the default behaviour of these is EEW disabled.
Now the following happens when EOM is reached on the tape:
The XS command is returned with XS_SENSE from the ahc driver. The transfer count is equal to the number of bytes requested.
The st driver detects that EOM is the cause of the problem. Because EEW is not enabled, it returns EIO.
The st driver is called again to finish the request with EIO indicated. It sets B_ERROR in the buffer.
The physio done routine now checks whether all bytes have been transferred, and they have (!), so it reaches the assertion above -> panic.
If EEW is enabled on the tape, the st driver returns 0 (no error) after detecting EOM and no problem occurs.
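To make the sequence above concrete, here is a minimal user-space sketch of the decision that the unpatched physio_done() appears to make. It is not the kernel code: struct model_buf, enum outcome, old_physio_done() and the value chosen for B_ERROR are stand-ins invented for illustration.

```c
#include <stddef.h>

/* Illustrative value only; the real flag lives in the kernel headers. */
#define B_ERROR 0x1

/* Hypothetical, cut-down stand-in for struct buf. */
struct model_buf {
	int b_flags;	/* B_ERROR is set by the driver on failure */
	int b_error;	/* errno-style code, e.g. EIO */
};

enum outcome {
	OUTCOME_OK,		/* full transfer, no error flag */
	OUTCOME_ERROR_RECORDED,	/* short transfer, error path taken */
	OUTCOME_ASSERT_FAILS	/* full transfer but B_ERROR set */
};

/*
 * Models the unpatched check: only a short transfer enters the error
 * path; a complete transfer is asserted to be error free.
 */
static enum outcome
old_physio_done(const struct model_buf *bp, size_t done, size_t todo)
{
	if (done != todo)
		return OUTCOME_ERROR_RECORDED;
	if ((bp->b_flags & B_ERROR) != 0)
		return OUTCOME_ASSERT_FAILS;	/* KASSERT fires with DIAGNOSTIC */
	return OUTCOME_OK;
}
```

With done == todo and B_ERROR set, which is the EOM case described here, the model lands in OUTCOME_ASSERT_FAILS. In a kernel without DIAGNOSTIC the same branch does nothing, which is how the error gets dropped.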
I'm not really familiar with the return semantics of the HW controllers. The code seems to expect that the failed command is
returned and the sense info then has to be requested.
In the case above, the ahc driver already returns the sense information. The driver seems to be able to handle this too.
I don't know if it is a legal situation that all bytes have been written by the tape but EOM is signaled anyway.
Perhaps this is a special case of the VXA tape drive.
Nevertheless: the code in kern_physio.c physio_done() looks wrong to me, because it does not update the error status in mbp if all bytes
have been transferred but B_ERROR has been set too. This loses the error information and no error is reported to user level
as it should be.
In fact, without DIAGNOSTIC in the kernel config, the first EOM hit by a write is not returned to user level! I've tested it.
I think the way to fix this is to check the B_ERROR flag in physio_done() too and enter error processing if either not all
requested bytes have been transferred or an error status is set.
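The proposed check can be sketched in user space as follows. This is a simplified model under stated assumptions, not the kernel implementation: struct model_buf, fixed_physio_done() and the B_ERROR value are invented for illustration, and the b_endoffset bookkeeping of the real routine is left out.

```c
#include <errno.h>
#include <stddef.h>

/* Illustrative value only; the real flag lives in the kernel headers. */
#define B_ERROR 0x1

/* Hypothetical, cut-down stand-in for struct buf. */
struct model_buf {
	int b_flags;
	int b_error;
};

/*
 * Models the proposed check: enter error processing if the transfer
 * was short OR the driver flagged an error, and record it in the
 * master buffer (mbp) so it can reach user level.
 */
static void
fixed_physio_done(struct model_buf *mbp, const struct model_buf *bp,
    size_t done, size_t todo)
{
	if (done != todo || (bp->b_flags & B_ERROR) != 0) {
		int error = (bp->b_flags & B_ERROR) != 0 ? bp->b_error : 0;

		/* Simplification: keep the first error that was recorded. */
		if ((mbp->b_flags & B_ERROR) == 0)
			mbp->b_error = error;
		mbp->b_flags |= B_ERROR;
	}
}
```

A complete transfer with B_ERROR set now marks the master buffer with EIO instead of falling through silently.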
Remark: this exposes another bug in physio() some lines below: the check on delta must allow 0 too. We must allow an error
even if all requested data has been transferred.
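Why delta can legitimately be 0 is easiest to see with numbers. The sketch below models the residual adjustment in user space; struct model_uio and rewind_resid() are hypothetical names, int64_t stands in for off_t, and assert() stands in for KASSERT.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, cut-down stand-in for struct uio. */
struct model_uio {
	int64_t uio_offset;	/* offset after all issued bytes */
	int64_t uio_resid;	/* bytes still reported as untransferred */
};

/*
 * On error, give back the bytes between the failure point
 * (b_endoffset) and the offset the request had advanced to.
 */
static void
rewind_resid(struct model_uio *uio, int64_t b_endoffset)
{
	int64_t delta = uio->uio_offset - b_endoffset;

	assert(delta >= 0);	/* 0 means: error, but every byte was moved */
	uio->uio_resid += delta;
}
```

If a 65536-byte write fails after 16384 bytes, delta is 49152 and the residual grows by that amount. In the EOM case described here the failure point equals uio_offset, so delta is 0 and the residual is left unchanged, while the error itself is still reported.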
>How-To-Repeat:
Set up a kernel with DIAGNOSTIC, connect a SCSI tape drive to it and fill up the tape until it hits EOM.
The system will panic there ...
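Filling the tape can be done with any tool that writes until it gets an error. As a rough sketch, the helper below writes fixed-size blocks to a descriptor and reports the first errno it sees. The name write_until_error() and its parameters are made up for this example, and the block size and device node to use depend on the local setup.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Write up to max_blocks blocks of blksz bytes to fd.
 * Returns 0 if all blocks were written or a short write ended the
 * loop, otherwise the errno of the first failed write().
 * *nblocks receives the number of complete blocks written.
 */
static int
write_until_error(int fd, size_t blksz, unsigned long max_blocks,
    unsigned long *nblocks)
{
	char *buf = malloc(blksz);
	int err = 0;

	*nblocks = 0;
	if (buf == NULL)
		return ENOMEM;
	memset(buf, 0xA5, blksz);	/* arbitrary fill pattern */

	while (*nblocks < max_blocks) {
		ssize_t n = write(fd, buf, blksz);

		if (n < 0) {
			err = errno;	/* expected: EIO at EOM */
			break;
		}
		if ((size_t)n != blksz)
			break;		/* short write, stop here */
		(*nblocks)++;
	}
	free(buf);
	return err;
}
```

On the affected system one would open the non-rewinding tape device (assumed here to be something like /dev/nrst0) for writing and call this with a max_blocks value larger than the tape capacity. According to the analysis above, an unpatched kernel with DIAGNOSTIC is expected to panic at EOM, and one without it is expected not to report the first EOM error.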
>Fix:
The following fix needs to be applied to sys/kern/kern_physio.c.
With this fix the system no longer panics and the error is returned to user level on first EOM detection.
--- kern_physio.c 2009/07/29 13:56:10 1.1
+++ kern_physio.c 2009/07/29 17:39:11
@@ -158,7 +158,7 @@
uvm_vsunlock(bp->b_proc->p_vmspace, bp->b_data, todo);
simple_lock(&mbp->b_interlock);
- if (__predict_false(done != todo)) {
+ if (__predict_false(done != todo || (bp->b_flags & B_ERROR) != 0)) {
off_t endoffset = dbtob(bp->b_blkno) + done;
/*
@@ -197,8 +197,6 @@
mbp->b_error = error;
}
mbp->b_flags |= B_ERROR;
- } else {
- KASSERT((bp->b_flags & B_ERROR) == 0);
}
mbp->b_running--;
@@ -438,7 +436,7 @@
off_t delta;
delta = uio->uio_offset - mbp->b_endoffset;
- KASSERT(delta > 0);
+ KASSERT(delta >= 0);
uio->uio_resid += delta;
/* uio->uio_offset = mbp->b_endoffset; */
} else {
>Audit-Trail:
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/41797: kernel panic in kern_physio when tape reaches EOM
during write if DIAGNOSTIC is enabled, without DIAGNOSTIC error
status is lost
Date: Wed, 29 Jul 2009 19:13:33 +0000
On Wed, Jul 29, 2009 at 05:45:00PM +0000, Wolfgang.Stukenbrock@nagler-company.com wrote:
> We have a VXA320 tape drive connected to an Adaptec 29160 controller
> on this system. For debugging purposes we run a kernel with
> DIAGNOSTIC enabled. "Sometimes" the system panics with an
> assertion in kern_physio.c at line 201: "KASSERT((bp->b_flags &
> B_ERROR) == 0);".
I swear I've seen this problem reported before, but there isn't an
open PR that I can find for it, and it doesn't look to have been fixed
either. Does anyone else remember this or am I confusing it with
something else?
Anyhow, also see 38643, which is connected. The I/O completion
reporting for st seems to be thoroughly unsatisfactory. :-|
--
David A. Holland
dholland@netbsd.org
>Unformatted: