NetBSD Problem Report #51037

From agrier@poofygoof.com  Sat Apr  2 06:51:28 2016
Return-Path: <agrier@poofygoof.com>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (verified OK))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 977237A490
	for <gnats-bugs@gnats.NetBSD.org>; Sat,  2 Apr 2016 06:51:28 +0000 (UTC)
Message-Id: <20160402053603.A7FE916A48@pythagoras.poofy.goof.com>
Date: Fri,  1 Apr 2016 22:36:03 -0700 (PDT)
From: agrier@pythagoras.poofy.goof.com
Reply-To: agrier@poofy.goof.com
To: gnats-bugs@NetBSD.org
Subject: iscsi initiator fails sporadically with "illegal request" error with FreeBSD-10.2 ctl target
X-Send-Pr-Version: 3.95

>Number:         51037
>Category:       kern
>Synopsis:       iscsi initiator fails sporadically with "illegal request" error with FreeBSD-10.2 ctl target
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    mlelstv
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Apr 02 06:55:00 +0000 2016
>Closed-Date:    Sat Mar 14 10:41:58 +0000 2020
>Last-Modified:  Sat Mar 14 10:41:58 +0000 2020
>Originator:     agrier@pythagoras.poofy.goof.com
>Release:        NetBSD 7.0_STABLE
>Organization:
  Aaron J. Grier | "Not your ordinary poofy goof." | agrier@poofygoof.com
>Environment:


System: NetBSD pythagoras.poofy.goof.com 7.0_STABLE NetBSD 7.0_STABLE (X2200M2_DOM0) #0: Tue Nov 17 22:49:19 PST 2015 agrier@arignote.poofy.goof.com:/var/obj/7.0_STABLE/X2200M2_DOM0 amd64
X2200M2_DOM0 is XEN3_DOM0 with pseudo-device iscsi enabled, and limited
matching of bge device to avoid clobbering out-of-band management port.
FreeBSD target is 10.2-RELEASE.
Architecture: x86_64
Machine: amd64
>Description:
NetBSD-7 in-kernel initiator (iscsid) is able to connect to a FreeBSD
10.2 ctl-provided target, but receives write errors when attempting to
newfs a slice:

# scsictl sd1 identify
/dev/rsd1d: scsibus1 target 0 lun 0 <FREEBSD, CTLDISK, 0001>
# gpt show sd1
      start       size  index  contents
          0          1         PMBR
          1          1         Pri GPT header
          2         32         Pri GPT table
         34          6
         40   83884026      1  GPT part - NetBSD FFSv1/FFSv2
   83884066    4196276      2  GPT part - NetBSD swap
   88080342  184549385      3  GPT part - NetBSD FFSv1/FFSv2
  272629727         32         Sec GPT table
  272629759          1         Sec GPT header
# dkctl sd1 listwedges
/dev/rsd1d: 3 wedges:
dk0: 7c2a809e-fba9-4d45-867c-d7544833743c, 83884026 blocks at 40, type: ffs
dk1: 76c19470-cd40-4ef6-885f-ad3ede7873e0, 4196276 blocks at 83884066, type: swap
dk2: 3789d038-600c-45c6-b48c-e8639473948a, 184549385 blocks at 88080342, type: ffs
# newfs dk2
/dev/rdk2: 90112.0MB (184549384 sectors) block size 16384, fragment size 2048
        using 489 cylinder groups of 184.28MB, 11794 blks, 23296 inodes.
super-block backups (for fsck_ffs -b #) at:
32, 377440, 754848, 1132256, 1509664, 1887072, 2264480, 2641888, 3019296,
.wtfs: write error for sector 7930336: Invalid argument
# dmesg | tail -1
sd1(iscsi0:0:0:0): illegal request, data = 00 00 00 00 4e 00

this happens both in the dom0 and a domU which has the iscsi target exported
as an xbd.
>How-To-Repeat:
create ZFS block device (zfs create -V130G tank/foo)
export as iscsi lun target in FreeBSD (edit /etc/ctl.conf, start ctld)
attach to NetBSD initiator (start iscsid, iscsctl {add_send_target,
  refresh_targets, login})
GPT partition device (gpt create sd1, gpt add -s xxx -t ffs sd1)
add slices (dkctl ...)
format a slice (newfs dk2)
>Fix:


>Release-Note:

>Audit-Trail:
From: "Aaron J. Grier" <agrier@poofygoof.com>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/51037
Date: Sat, 2 Apr 2016 16:57:20 -0700

 additional data point: this problem seems to be specific to FreeBSD.  a
 2GB test target with GPT partition exported from SmartOS works
 correctly, and I can newfs multiple times with different block sizes.

 -- 
   Aaron J. Grier | "Not your ordinary poofy goof." | agrier@poofygoof.com

From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/51037
Date: Sun, 3 Apr 2016 06:58:00 +0000 (UTC)

 agrier@poofygoof.com ("Aaron J. Grier") writes:

 > additional data point: this problem seems to be specific to FreeBSD.  a
 > 2GB test target with GPT partition exported from SmartOS works
 > correctly, and I can newfs multiple times with different block sizes.

 The error message indicates that there is an issue with concurrent
 commands. This can easily be a protocol violation of the initiator,
 but maybe also the target offering command queueing but failing to
 handle it.
 -- 
 -- 
                                 Michael van Elst
 Internet: mlelstv@serpens.de
                                 "A potential Snark may lurk in every tree."

From: "Aaron J. Grier" <agrier@poofygoof.com>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/51037
Date: Mon, 11 Apr 2016 22:03:26 -0700

 a tcpdump reveals the problem:
 http://poofygoof.com/~agrier/attic/iscsifail.cap.bz2

 NetBSD iSCSI initiator is attempting to initiate two writes
 concurrently, but FreeBSD iSCSI target apparently does not support
 overlapped commands.

 this seems more of a problem with FreeBSD, (and I will follow-up there,)
 but is there any way I can configure NetBSD to play nice in the
 meantime?  (I would think FreeBSD would want to allow concurrent
 commands to allow pipelining...)

 as a wild guess, I tried disabling write-back cache on the iscsi sd
 device with scsictl, but that didn't change the observed behavior.

 -- 
   Aaron J. Grier | "Not your ordinary poofy goof." | agrier@poofygoof.com

From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/51037
Date: Tue, 12 Apr 2016 10:08:36 +0000 (UTC)

 agrier@poofygoof.com ("Aaron J. Grier") writes:

 > this seems more of a problem with FreeBSD, (and I will follow-up there,)
 > but is there any way I can configure NetBSD to play nice in the
 > meantime?  (I would think FreeBSD would want to allow concurrent
 > commands to allow pipelining...)

 This:

 sys/dev/iscsi/iscsi_main.c:   adapt->adapt_max_periph = CCBS_PER_SESSION;

 should be the maximum number of outstanding commands to a ISCSI device.

 However, concurrent commands to a single LUN should only be generated
 with TCQ.



 -- 
 -- 
                                 Michael van Elst
 Internet: mlelstv@serpens.de
                                 "A potential Snark may lurk in every tree."

From: "Aaron J. Grier" <agrier@poofygoof.com>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/51037
Date: Sun, 22 May 2016 16:02:53 -0700

 effective (but crude) workaround follows.  more complete packet capture
 available at http://poofygoof.com/~agrier/attic/freebsd_iscsi.cap.bz2 .
 (compare with http://poofygoof.com/~agrier/attic/smartos_iscsi.cap.bz2).

 Index: scsiconf.c
 ===================================================================
 RCS file: /cvsroot/src/sys/dev/scsipi/scsiconf.c,v
 retrieving revision 1.273
 diff -u -r1.273 scsiconf.c
 --- scsiconf.c	25 Jul 2014 08:10:38 -0000	1.273
 +++ scsiconf.c	22 May 2016 19:24:28 -0000
 @@ -705,6 +705,9 @@
  	 "IOMEGA", "ZIP 100",		 "J.03"}, PQUIRK_NOLUNS|PQUIRK_NOSYNC},
  	{{T_DIRECT, T_REMOV,
  	 "INSITE", "I325VM",             ""},     PQUIRK_NOLUNS},
 +	/* FreeBSD ctld iscsi target doesn't handle tags? */
 +	{{T_DIRECT, T_FIXED,
 +	 "FREEBSD", "CTLDISK",       "0001"},     PQUIRK_NOTAG},

  	/* XXX: QIC-36 tape behind Emulex adapter.  Very broken. */
  	{{T_SEQUENTIAL, T_REMOV,

 -- 
   Aaron J. Grier | "Not your ordinary poofy goof." | agrier@poofygoof.com

Responsible-Changed-From-To: kern-bug-people->mlelstv
Responsible-Changed-By: mlelstv@NetBSD.org
Responsible-Changed-When: Thu, 26 May 2016 06:38:20 +0000
Responsible-Changed-Why:
I can replicate the problem.


From: "Aaron J. Grier" <agrier@poofygoof.com>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/51037
Date: Sun, 14 Aug 2016 20:21:23 -0700

 slowly chipping away at this.

 from https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=209700:

 > It [FreeBSD target] supports tagged queueing of commands marked as
 > tagged.  For some reason the NetBSD initiator sets the "flags" field
 > of SCSI Command PDU to Untagged (0).  Not sure what the SCSI spec
 > says, to be honest, but it might it be an initiator error?

 turned on #define ISCSI_DEBUG 10 to see if I could get any clues, but
 nothing obvious is sticking out, maybe another set of eyes?

 here's the hopefully relevant part:

 ISCSI: scsipi_request RUN_XFER
 RunXfer: flags=22009, data=0xffffa0000c1b0000, datalen=65536, resid=65536, cmdlen=10, waitok=0
 RunXfer: Periph_lun = 0, cmd[1] = 0, cmdlen = 10
 S2C1: Send Command: CmdSN 1030, data_in 0, len 65536, totlen 0
 S2C1: Send_pdu: ccb=0xffffa00002490588, pcd=2, cdsp=5, pdu=0xffffa000024bf218, pdsp=3
 scsipi_req returns
 ISCSI: scsipi_request RUN_XFER
 RunXfer: flags=22009, data=0xffffa0000c1c0000, datalen=65536, resid=65536, cmdlen=10, waitok=0
 RunXfer: Periph_lun = 0, cmd[1] = 0, cmdlen = 10
 S2C1: Send Command: CmdSN 1031, data_in 0, len 65536, totlen 0
 S2C1: Send_pdu: ccb=0xffffa00002490378, pcd=2, cdsp=5, pdu=0xffffa000024bf4e8, pdsp=3
 scsipi_req returns
 ISCSI: scsipi_request RUN_XFER
 RunXfer: flags=22009, data=0xffffa0000c1d0000, datalen=65536, resid=65536, cmdlen=10, waitok=0
 RunXfer: Periph_lun = 0, cmd[1] = 0, cmdlen = 10
 S2C1: Send Command: CmdSN 1032, data_in 0, len 65536, totlen 0
 S2C1: Send_pdu: ccb=0xffffa00002490168, pcd=2, cdsp=5, pdu=0xffffa000024bf650, pdsp=3
 scsipi_req returns
 read_pdu_data: data segment length = 0
 S2C1: Rx Command Response rsp = 0, status = 0
 AddSernum bottom 0 [1031], top 1, num 1031, diff 1
 AckSernum bottom 0, top 0, num 1031 ExpSN 1032
 S2C1: CCB done, ccb = 0xffffa00002490798, disp = 5
 read_pdu_data: data segment length = 0
 S2C1: Rx Command Response rsp = 0, status = 0
 AddSernum bottom 0 [1032], top 1, num 1032, diff 1
 AckSernum bottom 0, top 0, num 1032 ExpSN 1033
 S2C1: CCB done, ccb = 0xffffa00002490588, disp = 5
 read_pdu_data: data segment length = 254
 S2C1: Rx Command Response rsp = 0, status = 2
 AddSernum bottom 0 [1033], top 1, num 1033, diff 1
 AckSernum bottom 0, top 0, num 1033 ExpSN 1034
 S2C1: CCB done, ccb = 0xffffa00002490168, disp = 5
 Scsipi_done, error=XS_SENSE, sense data=70 00 05 00...
 read_pdu_data: data segment length = 0
 S2C1: Rx Command Response rsp = 0, status = 0
 AddSernum bottom 0 [1034], top 1, num 1034, diff 1
 AckSernum bottom 0, top 0, num 1034 ExpSN 1035
 S2C1: CCB done, ccb = 0xffffa00002490378, disp = 5
 sd0(iscsi0:0:0:0): illegal request, data = 00 00 00 00 4e 00
 ISCSI: scsipi_request RUN_XFER
 RunXfer: flags=40100, data=0x0, datalen=0, resid=0, cmdlen=10, waitok=1
 RunXfer: Periph_lun = 0, cmd[1] = 0, cmdlen = 10
 S2C1: Send Command: CmdSN 1033, data_in 0, len 0, totlen 0
 S2C1: Send_pdu: ccb=0xffffa0000248ff58, pcd=2, cdsp=5, pdu=0xffffa000024c4ab0, pdsp=3
 scsipi_req returns
 read_pdu_data: data segment length = 0
 S2C1: Received NOP-In PDU, req_ccb=0x0, ITT=ffffffff, TTT=0, StatSN=40b
 S2C1: Send NOP_Out rx_pdu=0xffffa000024c4510
 S2C1: Send_pdu: ccb=0x0, pcd=0, cdsp=2, pdu=0xffffa000024c47e0, pdsp=2
 S2C1: Send NOP_Out rx_pdu=0x0
 S2C1: Send_pdu: ccb=0xffffa0000248fd48, pcd=2, cdsp=3, pdu=0xffffa000024c40d8, pdsp=2
 read_pdu_data: data segment length = 0
 S2C1: Received NOP-In PDU, req_ccb=0xffffa0000248fd48, ITT=c000234, TTT=ffffffff, StatSN=40b
 S2C1: CheckCmdSN - CmdSN=1033, ExpCmdSn=1034, waiting=0xffffa000024c4ab0, flags=a0
 S2C1: CheckCmdSN - CmdSN=1034, ExpCmdSn=1034, waiting=0x0, flags=80
 S2C1: CCB done, ccb = 0xffffa0000248fd48, disp = 3
 AddSernum bottom 0 [1035], top 1, num 1035, diff 1
 AckSernum bottom 0, top 0, num 1035 ExpSN 1036
 read_pdu_data: data segment length = 0
 S2C1: Rx Command Response rsp = 0, status = 0
 AddSernum bottom 0 [1036], top 1, num 1036, diff 1
 AckSernum bottom 0, top 0, num 1036 ExpSN 1037
 S2C1: CCB done, ccb = 0xffffa0000248ff58, disp = 5
 read_pdu_data: data segment length = 0
 S2C1: Received NOP-In PDU, req_ccb=0x0, ITT=ffffffff, TTT=0, StatSN=40d
 S2C1: Send NOP_Out rx_pdu=0xffffa000024c3f70
 S2C1: Send_pdu: ccb=0x0, pcd=0, cdsp=2, pdu=0xffffa000024c3598, pdsp=2
 read_pdu_data: data segment length = 0

 -- 
   Aaron J. Grier | "Not your ordinary poofy goof." | agrier@poofygoof.com

From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/51037
Date: Mon, 15 Aug 2016 09:19:35 +0000 (UTC)

 agrier@poofygoof.com ("Aaron J. Grier") writes:

 > from https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=209700:
 > > It [FreeBSD target] supports tagged queueing of commands marked as
 > > tagged.  For some reason the NetBSD initiator sets the "flags" field
 > > of SCSI Command PDU to Untagged (0).  Not sure what the SCSI spec
 > > says, to be honest, but it might it be an initiator error?
 > 
 > turned on #define ISCSI_DEBUG 10 to see if I could get any clues, but
 > nothing obvious is sticking out, maybe another set of eyes?

 You see that on the iSCSI layer multiple commands are sent to
 the target, but (what you cannot see) the commands are marked
 as untagged and therefore should be executed in-order, so no
 overlap should be possible.

 In -current I did some signficant changes to the iSCSI initiator
 including how tagging is handled.

 The first attempt was to mark all requests as tagged and simply
 ordered but I ignored the actual tag values, this stopped the
 'illegal request' errors immediately.

 Now it passes tags from scsipi to the iSCSI layer which is probably
 the right thing to do if you support taggging.

 IMHO the FreeBSD target has an issue in how it handles untagged
 requests, but I haven't found anything suspicious when looking
 at the FreeBSD code.

 -- 
 -- 
                                 Michael van Elst
 Internet: mlelstv@serpens.de
                                 "A potential Snark may lurk in every tree."

From: "Aaron J. Grier" <agrier@poofygoof.com>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/51037
Date: Thu, 18 Aug 2016 00:12:35 -0700

 I did a minimal back-port from -current to propagate tag information to
 pdu, and now FreeBSD target is happy, and I can newfs the drive.

 running bonnie++ locks up in biolock, but that's a different issue.  :)

 this bug can be closed.  (off to check -current.)

 Index: sys/dev/iscsi/iscsi_globals.h
 ===================================================================
 RCS file: /cvsroot/src/sys/dev/iscsi/iscsi_globals.h,v
 retrieving revision 1.6
 diff -u -r1.6 iscsi_globals.h
 --- sys/dev/iscsi/iscsi_globals.h	29 Dec 2012 11:05:29 -0000	1.6
 +++ sys/dev/iscsi/iscsi_globals.h	18 Aug 2016 07:10:29 -0000
 @@ -49,6 +49,7 @@
  #include <sys/device.h>

  #include <dev/scsipi/scsi_all.h>
 +#include <dev/scsipi/scsi_message.h>
  #include <dev/scsipi/scsipi_all.h>
  #include <dev/scsipi/scsiconf.h>
  #include <dev/scsipi/scsipiconf.h>
 @@ -59,7 +60,7 @@

  /* ------------------------ Code selection constants ------------------------ */

 -/* #define ISCSI_DEBUG      1 */
 +#define ISCSI_DEBUG      0

  #include "iscsi_perf.h"
  #include "iscsi_test.h"
 @@ -295,6 +296,7 @@
  	/* length of text data so far */

  	uint64_t		lun; /* LUN */
 +	uint32_t		tag; /* Command tag */
  	uint8_t			*cmd; /* SCSI command block */
  	uint16_t		cmdlen; /* SCSI command block length */
  	bool			data_in; /* if this is a read request */
 Index: sys/dev/iscsi/iscsi_send.c
 ===================================================================
 RCS file: /cvsroot/src/sys/dev/iscsi/iscsi_send.c,v
 retrieving revision 1.8.12.1
 diff -u -r1.8.12.1 iscsi_send.c
 --- sys/dev/iscsi/iscsi_send.c	29 Sep 2014 15:45:36 -0000	1.8.12.1
 +++ sys/dev/iscsi/iscsi_send.c	18 Aug 2016 07:10:29 -0000
 @@ -1431,6 +1431,7 @@

  	if (!totlen)
  		pdu->Flags |= FLAG_FINAL;
 +	pdu->Flags |= ccb->tag;

  	if (ccb->data_in)
  		init_sernum(&ccb->DataSN_buf);
 @@ -1524,6 +1525,21 @@
  	DEB(10, ("RunXfer: Periph_lun = %d, cmd[1] = %x, cmdlen = %d\n",
  			xs->xs_periph->periph_lun, ccb->cmd[1], xs->cmdlen));

 +	switch (xs->xs_tag_type) {
 +	case MSG_ORDERED_Q_TAG:
 +		ccb->tag = ATTR_ORDERED;
 +		break;
 +	case MSG_SIMPLE_Q_TAG:
 +		ccb->tag = ATTR_SIMPLE;
 +		break;
 +	case MSG_HEAD_OF_Q_TAG:
 +		ccb->tag = ATTR_HEAD_OF_QUEUE;
 +		break;
 +	default:
 +		ccb->tag = 0;
 +		break;
 +	}
 +
  #ifdef LUN_1
  	ccb->lun += 0x1000000000000LL;
  	ccb->cmd[1] += 0x10;

 -- 
   Aaron J. Grier | "Not your ordinary poofy goof." | agrier@poofygoof.com

State-Changed-From-To: open->closed
State-Changed-By: mlelstv@NetBSD.org
State-Changed-When: Sat, 14 Mar 2020 10:41:58 +0000
State-Changed-Why:
Issue fixed in netbsd-8.


>Unformatted:

NetBSD Home
NetBSD PR Database Search

(Contact us) $NetBSD: query-full-pr,v 1.46 2020/01/03 16:35:01 leot Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2020 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.