NetBSD Problem Report #51037
From agrier@poofygoof.com Sat Apr 2 06:51:28 2016
Return-Path: <agrier@poofygoof.com>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (verified OK))
by mollari.NetBSD.org (Postfix) with ESMTPS id 977237A490
for <gnats-bugs@gnats.NetBSD.org>; Sat, 2 Apr 2016 06:51:28 +0000 (UTC)
Message-Id: <20160402053603.A7FE916A48@pythagoras.poofy.goof.com>
Date: Fri, 1 Apr 2016 22:36:03 -0700 (PDT)
From: agrier@pythagoras.poofy.goof.com
Reply-To: agrier@poofy.goof.com
To: gnats-bugs@NetBSD.org
Subject: iscsi initiator fails sporadically with "illegal request" error with FreeBSD-10.2 ctl target
X-Send-Pr-Version: 3.95
>Number: 51037
>Category: kern
>Synopsis: iscsi initiator fails sporadically with "illegal request" error with FreeBSD-10.2 ctl target
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: mlelstv
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sat Apr 02 06:55:00 +0000 2016
>Closed-Date: Sat Mar 14 10:41:58 +0000 2020
>Last-Modified: Sat Mar 14 10:41:58 +0000 2020
>Originator: agrier@pythagoras.poofy.goof.com
>Release: NetBSD 7.0_STABLE
>Organization:
Aaron J. Grier | "Not your ordinary poofy goof." | agrier@poofygoof.com
>Environment:
System: NetBSD pythagoras.poofy.goof.com 7.0_STABLE NetBSD 7.0_STABLE (X2200M2_DOM0) #0: Tue Nov 17 22:49:19 PST 2015 agrier@arignote.poofy.goof.com:/var/obj/7.0_STABLE/X2200M2_DOM0 amd64
X2200M2_DOM0 is XEN3_DOM0 with pseudo-device iscsi enabled, and limited
matching of bge device to avoid clobbering out-of-band management port.
FreeBSD target is 10.2-RELEASE.
Architecture: x86_64
Machine: amd64
>Description:
NetBSD-7 in-kernel initiator (iscsid) is able to connect to a FreeBSD
10.2 ctl-provided target, but receives write errors when attempting to
newfs a slice:
# scsictl sd1 identify
/dev/rsd1d: scsibus1 target 0 lun 0 <FREEBSD, CTLDISK, 0001>
# gpt show sd1
start size index contents
0 1 PMBR
1 1 Pri GPT header
2 32 Pri GPT table
34 6
40 83884026 1 GPT part - NetBSD FFSv1/FFSv2
83884066 4196276 2 GPT part - NetBSD swap
88080342 184549385 3 GPT part - NetBSD FFSv1/FFSv2
272629727 32 Sec GPT table
272629759 1 Sec GPT header
# dkctl sd1 listwedges
/dev/rsd1d: 3 wedges:
dk0: 7c2a809e-fba9-4d45-867c-d7544833743c, 83884026 blocks at 40, type: ffs
dk1: 76c19470-cd40-4ef6-885f-ad3ede7873e0, 4196276 blocks at 83884066, type: swap
dk2: 3789d038-600c-45c6-b48c-e8639473948a, 184549385 blocks at 88080342, type: ffs
# newfs dk2
/dev/rdk2: 90112.0MB (184549384 sectors) block size 16384, fragment size 2048
using 489 cylinder groups of 184.28MB, 11794 blks, 23296 inodes.
super-block backups (for fsck_ffs -b #) at:
32, 377440, 754848, 1132256, 1509664, 1887072, 2264480, 2641888, 3019296,
.wtfs: write error for sector 7930336: Invalid argument
# dmesg | tail -1
sd1(iscsi0:0:0:0): illegal request, data = 00 00 00 00 4e 00
this happens both in the dom0 and a domU which has the iscsi target exported
as an xbd.
>How-To-Repeat:
create ZFS block device (zfs create -V130G tank/foo)
export as iscsi lun target in FreeBSD (edit /etc/ctl.conf, start ctld)
attach to NetBSD initiator (start iscsid, iscsctl {add_send_target,
refresh_targets, login})
GPT partition device (gpt create sd1, gpt add -s xxx -t ffs sd1)
add slices (dkctl ...)
format a slice (newfs dk2)
>Fix:
>Release-Note:
>Audit-Trail:
From: "Aaron J. Grier" <agrier@poofygoof.com>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/51037
Date: Sat, 2 Apr 2016 16:57:20 -0700
additional data point: this problem seems to be specific to FreeBSD. a
2GB test target with GPT partition exported from SmartOS works
correctly, and I can newfs multiple times with different block sizes.
--
Aaron J. Grier | "Not your ordinary poofy goof." | agrier@poofygoof.com
From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/51037
Date: Sun, 3 Apr 2016 06:58:00 +0000 (UTC)
agrier@poofygoof.com ("Aaron J. Grier") writes:
> additional data point: this problem seems to be specific to FreeBSD. a
> 2GB test target with GPT partition exported from SmartOS works
> correctly, and I can newfs multiple times with different block sizes.
The error message indicates that there is an issue with concurrent
commands. This can easily be a protocol violation of the initiator,
but maybe also the target offering command queueing but failing to
handle it.
--
--
Michael van Elst
Internet: mlelstv@serpens.de
"A potential Snark may lurk in every tree."
From: "Aaron J. Grier" <agrier@poofygoof.com>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/51037
Date: Mon, 11 Apr 2016 22:03:26 -0700
a tcpdump reveals the problem:
http://poofygoof.com/~agrier/attic/iscsifail.cap.bz2
NetBSD iSCSI initiator is attempting to initiate two writes
concurrently, but FreeBSD iSCSI target apparently does not support
overlapped commands.
this seems more of a problem with FreeBSD, (and I will follow-up there,)
but is there any way I can configure NetBSD to play nice in the
meantime? (I would think FreeBSD would want to allow concurrent
commands to allow pipelining...)
as a wild guess, I tried disabling write-back cache on the iscsi sd
device with scsictl, but that didn't change the observed behavior.
--
Aaron J. Grier | "Not your ordinary poofy goof." | agrier@poofygoof.com
From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/51037
Date: Tue, 12 Apr 2016 10:08:36 +0000 (UTC)
agrier@poofygoof.com ("Aaron J. Grier") writes:
> this seems more of a problem with FreeBSD, (and I will follow-up there,)
> but is there any way I can configure NetBSD to play nice in the
> meantime? (I would think FreeBSD would want to allow concurrent
> commands to allow pipelining...)
This:
sys/dev/iscsi/iscsi_main.c: adapt->adapt_max_periph = CCBS_PER_SESSION;
should be the maximum number of outstanding commands to a ISCSI device.
However, concurrent commands to a single LUN should only be generated
with TCQ.
--
--
Michael van Elst
Internet: mlelstv@serpens.de
"A potential Snark may lurk in every tree."
From: "Aaron J. Grier" <agrier@poofygoof.com>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/51037
Date: Sun, 22 May 2016 16:02:53 -0700
effective (but crude) workaround follows. more complete packet capture
available at http://poofygoof.com/~agrier/attic/freebsd_iscsi.cap.bz2 .
(compare with http://poofygoof.com/~agrier/attic/smartos_iscsi.cap.bz2).
Index: scsiconf.c
===================================================================
RCS file: /cvsroot/src/sys/dev/scsipi/scsiconf.c,v
retrieving revision 1.273
diff -u -r1.273 scsiconf.c
--- scsiconf.c 25 Jul 2014 08:10:38 -0000 1.273
+++ scsiconf.c 22 May 2016 19:24:28 -0000
@@ -705,6 +705,9 @@
"IOMEGA", "ZIP 100", "J.03"}, PQUIRK_NOLUNS|PQUIRK_NOSYNC},
{{T_DIRECT, T_REMOV,
"INSITE", "I325VM", ""}, PQUIRK_NOLUNS},
+ /* FreeBSD ctld iscsi target doesn't handle tags? */
+ {{T_DIRECT, T_FIXED,
+ "FREEBSD", "CTLDISK", "0001"}, PQUIRK_NOTAG},
/* XXX: QIC-36 tape behind Emulex adapter. Very broken. */
{{T_SEQUENTIAL, T_REMOV,
--
Aaron J. Grier | "Not your ordinary poofy goof." | agrier@poofygoof.com
Responsible-Changed-From-To: kern-bug-people->mlelstv
Responsible-Changed-By: mlelstv@NetBSD.org
Responsible-Changed-When: Thu, 26 May 2016 06:38:20 +0000
Responsible-Changed-Why:
I can replicate the problem.
From: "Aaron J. Grier" <agrier@poofygoof.com>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/51037
Date: Sun, 14 Aug 2016 20:21:23 -0700
slowly chipping away at this.
from https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=209700:
> It [FreeBSD target] supports tagged queueing of commands marked as
> tagged. For some reason the NetBSD initiator sets the "flags" field
> of SCSI Command PDU to Untagged (0). Not sure what the SCSI spec
> says, to be honest, but it might it be an initiator error?
turned on #define ISCSI_DEBUG 10 to see if I could get any clues, but
nothing obvious is sticking out, maybe another set of eyes?
here's the hopefully relevant part:
ISCSI: scsipi_request RUN_XFER
RunXfer: flags=22009, data=0xffffa0000c1b0000, datalen=65536, resid=65536, cmdlen=10, waitok=0
RunXfer: Periph_lun = 0, cmd[1] = 0, cmdlen = 10
S2C1: Send Command: CmdSN 1030, data_in 0, len 65536, totlen 0
S2C1: Send_pdu: ccb=0xffffa00002490588, pcd=2, cdsp=5, pdu=0xffffa000024bf218, pdsp=3
scsipi_req returns
ISCSI: scsipi_request RUN_XFER
RunXfer: flags=22009, data=0xffffa0000c1c0000, datalen=65536, resid=65536, cmdlen=10, waitok=0
RunXfer: Periph_lun = 0, cmd[1] = 0, cmdlen = 10
S2C1: Send Command: CmdSN 1031, data_in 0, len 65536, totlen 0
S2C1: Send_pdu: ccb=0xffffa00002490378, pcd=2, cdsp=5, pdu=0xffffa000024bf4e8, pdsp=3
scsipi_req returns
ISCSI: scsipi_request RUN_XFER
RunXfer: flags=22009, data=0xffffa0000c1d0000, datalen=65536, resid=65536, cmdlen=10, waitok=0
RunXfer: Periph_lun = 0, cmd[1] = 0, cmdlen = 10
S2C1: Send Command: CmdSN 1032, data_in 0, len 65536, totlen 0
S2C1: Send_pdu: ccb=0xffffa00002490168, pcd=2, cdsp=5, pdu=0xffffa000024bf650, pdsp=3
scsipi_req returns
read_pdu_data: data segment length = 0
S2C1: Rx Command Response rsp = 0, status = 0
AddSernum bottom 0 [1031], top 1, num 1031, diff 1
AckSernum bottom 0, top 0, num 1031 ExpSN 1032
S2C1: CCB done, ccb = 0xffffa00002490798, disp = 5
read_pdu_data: data segment length = 0
S2C1: Rx Command Response rsp = 0, status = 0
AddSernum bottom 0 [1032], top 1, num 1032, diff 1
AckSernum bottom 0, top 0, num 1032 ExpSN 1033
S2C1: CCB done, ccb = 0xffffa00002490588, disp = 5
read_pdu_data: data segment length = 254
S2C1: Rx Command Response rsp = 0, status = 2
AddSernum bottom 0 [1033], top 1, num 1033, diff 1
AckSernum bottom 0, top 0, num 1033 ExpSN 1034
S2C1: CCB done, ccb = 0xffffa00002490168, disp = 5
Scsipi_done, error=XS_SENSE, sense data=70 00 05 00...
read_pdu_data: data segment length = 0
S2C1: Rx Command Response rsp = 0, status = 0
AddSernum bottom 0 [1034], top 1, num 1034, diff 1
AckSernum bottom 0, top 0, num 1034 ExpSN 1035
S2C1: CCB done, ccb = 0xffffa00002490378, disp = 5
sd0(iscsi0:0:0:0): illegal request, data = 00 00 00 00 4e 00
ISCSI: scsipi_request RUN_XFER
RunXfer: flags=40100, data=0x0, datalen=0, resid=0, cmdlen=10, waitok=1
RunXfer: Periph_lun = 0, cmd[1] = 0, cmdlen = 10
S2C1: Send Command: CmdSN 1033, data_in 0, len 0, totlen 0
S2C1: Send_pdu: ccb=0xffffa0000248ff58, pcd=2, cdsp=5, pdu=0xffffa000024c4ab0, pdsp=3
scsipi_req returns
read_pdu_data: data segment length = 0
S2C1: Received NOP-In PDU, req_ccb=0x0, ITT=ffffffff, TTT=0, StatSN=40b
S2C1: Send NOP_Out rx_pdu=0xffffa000024c4510
S2C1: Send_pdu: ccb=0x0, pcd=0, cdsp=2, pdu=0xffffa000024c47e0, pdsp=2
S2C1: Send NOP_Out rx_pdu=0x0
S2C1: Send_pdu: ccb=0xffffa0000248fd48, pcd=2, cdsp=3, pdu=0xffffa000024c40d8, pdsp=2
read_pdu_data: data segment length = 0
S2C1: Received NOP-In PDU, req_ccb=0xffffa0000248fd48, ITT=c000234, TTT=ffffffff, StatSN=40b
S2C1: CheckCmdSN - CmdSN=1033, ExpCmdSn=1034, waiting=0xffffa000024c4ab0, flags=a0
S2C1: CheckCmdSN - CmdSN=1034, ExpCmdSn=1034, waiting=0x0, flags=80
S2C1: CCB done, ccb = 0xffffa0000248fd48, disp = 3
AddSernum bottom 0 [1035], top 1, num 1035, diff 1
AckSernum bottom 0, top 0, num 1035 ExpSN 1036
read_pdu_data: data segment length = 0
S2C1: Rx Command Response rsp = 0, status = 0
AddSernum bottom 0 [1036], top 1, num 1036, diff 1
AckSernum bottom 0, top 0, num 1036 ExpSN 1037
S2C1: CCB done, ccb = 0xffffa0000248ff58, disp = 5
read_pdu_data: data segment length = 0
S2C1: Received NOP-In PDU, req_ccb=0x0, ITT=ffffffff, TTT=0, StatSN=40d
S2C1: Send NOP_Out rx_pdu=0xffffa000024c3f70
S2C1: Send_pdu: ccb=0x0, pcd=0, cdsp=2, pdu=0xffffa000024c3598, pdsp=2
read_pdu_data: data segment length = 0
--
Aaron J. Grier | "Not your ordinary poofy goof." | agrier@poofygoof.com
From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/51037
Date: Mon, 15 Aug 2016 09:19:35 +0000 (UTC)
agrier@poofygoof.com ("Aaron J. Grier") writes:
> from https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=209700:
> > It [FreeBSD target] supports tagged queueing of commands marked as
> > tagged. For some reason the NetBSD initiator sets the "flags" field
> > of SCSI Command PDU to Untagged (0). Not sure what the SCSI spec
> > says, to be honest, but it might it be an initiator error?
>
> turned on #define ISCSI_DEBUG 10 to see if I could get any clues, but
> nothing obvious is sticking out, maybe another set of eyes?
You see that on the iSCSI layer multiple commands are sent to
the target, but (what you cannot see) the commands are marked
as untagged and therefore should be executed in-order, so no
overlap should be possible.
In -current I did some signficant changes to the iSCSI initiator
including how tagging is handled.
The first attempt was to mark all requests as tagged and simply
ordered but I ignored the actual tag values, this stopped the
'illegal request' errors immediately.
Now it passes tags from scsipi to the iSCSI layer which is probably
the right thing to do if you support taggging.
IMHO the FreeBSD target has an issue in how it handles untagged
requests, but I haven't found anything suspicious when looking
at the FreeBSD code.
--
--
Michael van Elst
Internet: mlelstv@serpens.de
"A potential Snark may lurk in every tree."
From: "Aaron J. Grier" <agrier@poofygoof.com>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/51037
Date: Thu, 18 Aug 2016 00:12:35 -0700
I did a minimal back-port from -current to propagate tag information to
pdu, and now FreeBSD target is happy, and I can newfs the drive.
running bonnie++ locks up in biolock, but that's a different issue. :)
this bug can be closed. (off to check -current.)
Index: sys/dev/iscsi/iscsi_globals.h
===================================================================
RCS file: /cvsroot/src/sys/dev/iscsi/iscsi_globals.h,v
retrieving revision 1.6
diff -u -r1.6 iscsi_globals.h
--- sys/dev/iscsi/iscsi_globals.h 29 Dec 2012 11:05:29 -0000 1.6
+++ sys/dev/iscsi/iscsi_globals.h 18 Aug 2016 07:10:29 -0000
@@ -49,6 +49,7 @@
#include <sys/device.h>
#include <dev/scsipi/scsi_all.h>
+#include <dev/scsipi/scsi_message.h>
#include <dev/scsipi/scsipi_all.h>
#include <dev/scsipi/scsiconf.h>
#include <dev/scsipi/scsipiconf.h>
@@ -59,7 +60,7 @@
/* ------------------------ Code selection constants ------------------------ */
-/* #define ISCSI_DEBUG 1 */
+#define ISCSI_DEBUG 0
#include "iscsi_perf.h"
#include "iscsi_test.h"
@@ -295,6 +296,7 @@
/* length of text data so far */
uint64_t lun; /* LUN */
+ uint32_t tag; /* Command tag */
uint8_t *cmd; /* SCSI command block */
uint16_t cmdlen; /* SCSI command block length */
bool data_in; /* if this is a read request */
Index: sys/dev/iscsi/iscsi_send.c
===================================================================
RCS file: /cvsroot/src/sys/dev/iscsi/iscsi_send.c,v
retrieving revision 1.8.12.1
diff -u -r1.8.12.1 iscsi_send.c
--- sys/dev/iscsi/iscsi_send.c 29 Sep 2014 15:45:36 -0000 1.8.12.1
+++ sys/dev/iscsi/iscsi_send.c 18 Aug 2016 07:10:29 -0000
@@ -1431,6 +1431,7 @@
if (!totlen)
pdu->Flags |= FLAG_FINAL;
+ pdu->Flags |= ccb->tag;
if (ccb->data_in)
init_sernum(&ccb->DataSN_buf);
@@ -1524,6 +1525,21 @@
DEB(10, ("RunXfer: Periph_lun = %d, cmd[1] = %x, cmdlen = %d\n",
xs->xs_periph->periph_lun, ccb->cmd[1], xs->cmdlen));
+ switch (xs->xs_tag_type) {
+ case MSG_ORDERED_Q_TAG:
+ ccb->tag = ATTR_ORDERED;
+ break;
+ case MSG_SIMPLE_Q_TAG:
+ ccb->tag = ATTR_SIMPLE;
+ break;
+ case MSG_HEAD_OF_Q_TAG:
+ ccb->tag = ATTR_HEAD_OF_QUEUE;
+ break;
+ default:
+ ccb->tag = 0;
+ break;
+ }
+
#ifdef LUN_1
ccb->lun += 0x1000000000000LL;
ccb->cmd[1] += 0x10;
--
Aaron J. Grier | "Not your ordinary poofy goof." | agrier@poofygoof.com
State-Changed-From-To: open->closed
State-Changed-By: mlelstv@NetBSD.org
State-Changed-When: Sat, 14 Mar 2020 10:41:58 +0000
State-Changed-Why:
Issue fixed in netbsd-8.
>Unformatted:
(Contact us)
$NetBSD: query-full-pr,v 1.46 2020/01/03 16:35:01 leot Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2020
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.