NetBSD Problem Report #57785

From www@netbsd.org  Tue Dec 19 03:45:34 2023
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 97B8A1A923C
	for <gnats-bugs@gnats.NetBSD.org>; Tue, 19 Dec 2023 03:45:34 +0000 (UTC)
Message-Id: <20231219034533.319971A923F@mollari.NetBSD.org>
Date: Tue, 19 Dec 2023 03:45:33 +0000 (UTC)
From: 2857@gmx.de
Reply-To: 2857@gmx.de
To: gnats-bugs@NetBSD.org
Subject: unable to use iscsi kernel initiator on sparc64
X-Send-Pr-Version: www-1.0

>Number:         57785
>Category:       port-sparc64
>Synopsis:       unable to use iscsi kernel initiator on sparc64
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    port-sparc64-maintainer
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue Dec 19 03:50:00 +0000 2023
>Last-Modified:  Tue Dec 19 09:10:01 +0000 2023
>Originator:     zip100
>Release:        NetBSD 9.3
>Organization:
>Environment:
target: NetBSD netbsdvm 9.3 NetBSD 9.3 (GENERIC) #0: Thu Aug  4 15:30:37 UTC 2022  mkrepro@mkrepro.NetBSD.org:/usr/src/sys/arch/amd64/compile/GENERIC amd64

initiator: 
NetBSD sunfish 9.3 NetBSD 9.3 (GENERIC) #0: Thu Aug  4 15:30:37 UTC 2022  mkrepro@mkrepro.NetBSD.org:/usr/src/sys/arch/sparc64/compile/GENERIC sparc64
>Description:
Although the initiator is a sparc64 machine, I don't believe this is related to endianness, though I might be wrong. Anyway, I'm just trying to follow the example from https://man.netbsd.org/iscsictl.8 .

---------------------------------------- 
HOSTNAME     ADDRESS           ROLE
sunfish      192.168.33.64     initiator
netbsdvm     192.168.33.199    target
----------------------------------------

I've set up a 256M test disk:
netbsdvm # cat /etc/iscsi/targets
# extent        file or device          start           length
extent0         /tmp/iscsi-target0      0               256MB
# target        flags   storage         netmask
target0         rw      extent0         0.0.0.0/0

netbsdvm # grep iscsi /etc/rc.conf
iscsi_target=YES


Log messages on the target:

<snip>

Dec 19 03:17:03 netbsdvm iscsi-target: > iSCSI Discovery login  successful from iqn.1994-04.org.netbsd:iscsi.sunfish:2201371340 on 192.168.33.64 disk -1, ISID 70368764559360, TSIH 1
Dec 19 03:17:03 netbsdvm iscsi-target: pid 474:/usr/src/external/bsd/iscsi/lib/../dist/src/lib/target.c:1223: ***ERROR*** CmdSN
Dec 19 03:17:03 netbsdvm iscsi-target: pid 474:/usr/src/external/bsd/iscsi/lib/../dist/src/lib/target.c:1388: ***ERROR*** logout_command_t() failed 
Dec 19 03:17:03 netbsdvm iscsi-target: pid 474:/usr/src/external/bsd/iscsi/lib/../dist/src/lib/target.c:1502: ***ERROR*** execute_t() failed 
^^^^ output after refresh_targets



Dec 19 03:17:52 netbsdvm iscsi-target: pid 474:/usr/src/external/bsd/iscsi/lib/../dist/src/lib/target.c:1108: ***ERROR*** cmd.tsih 0 not found 
Dec 19 03:17:52 netbsdvm iscsi-target: > iSCSI Normal login  successful from iqn.1994-04.org.netbsd:iscsi.sunfish:2201371340 on 192.168.33.64 disk 0, ISID 70368764559360, TSIH 2
Dec 19 03:17:52 netbsdvm iscsi-target: pid 474:/usr/src/external/bsd/iscsi/lib/../dist/src/lib/disk.c:1364: ***ERROR*** UNKNOWN OPCODE 0xa3 
^^^ output after login



Kernel log on client after running final newfs command:
[ 1655.6300054] sd0(iscsi0:0:0:0): QUEUE FULL resulted in 0 openings
[ 1655.7027396] sd0(iscsi0:0:0:0): QUEUE FULL resulted in 0 openings
[ 1655.7754745] sd0(iscsi0:0:0:0): QUEUE FULL resulted in 0 openings
[ 1655.8482093] sd0(iscsi0:0:0:0): QUEUE FULL resulted in 0 openings
[ 1655.9209439] sd0(iscsi0:0:0:0): QUEUE FULL resulted in 0 openings
[ 1655.9936783] sd0(iscsi0:0:0:0): QUEUE FULL resulted in 0 openings
[ 1656.0664131] sd0(iscsi0:0:0:0): QUEUE FULL resulted in 0 openings

(spamming endlessly on the console, machine is unresponsive via network)
>How-To-Repeat:
Let's follow the example from https://man.netbsd.org/iscsictl.8 :

sunfish # iscsictl add_send_target -a 192.168.33.199
Added Send Target 1

sunfish # iscsictl refresh_targets
OK

sunfish # iscsictl list_targets
     1: iqn.1994-04.org.netbsd.iscsi-target:target0
        2: 192.168.33.199:3260,1

sunfish # iscsictl login -P 1 
Created Session 2, Connection 1

sunfish # iscsictl list_sessions
Session 2: Target iqn.1994-04.org.netbsd.iscsi-target:target0

sunfish # newfs /dev/rsd0a
newfs: /dev/rsd0a: open for read: Device not configured

sunfish # newfs /dev/rsd0 
newfs: Unable to determine file system size

sunfish # newfs /dev/rsd0
newfs: Unable to determine file system size

sunfish # disklabel sd0
# /dev/rsd0:
type: SCSI
disk: NetBSD iSCSI    
label: fictitious
flags:
bytes/sector: 512
sectors/track: 32
tracks/cylinder: 64
sectors/cylinder: 2048
cylinders: 256
total sectors: 524288
rpm: 3600
interleave: 1
trackskew: 0
cylinderskew: 0
headswitch: 0           # microseconds
track-to-track seek: 0  # microseconds
drivedata: 0 

3 partitions:
#        size    offset     fstype [fsize bsize cpg/sgs]
 c:    524288         0     unused      0     0        # (Cyl.      0 -    255)
disklabel: boot block size 0
disklabel: super block size 0
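
As a sanity check on the label above, the capacity implied by its fields can be computed and compared with the 256MB extent configured on the target. The following is a minimal sketch, not part of the original session; the helper names `parse_label` and `label_capacity_bytes` are hypothetical, and the parsing assumes the `key: value` layout that disklabel printed above.

```python
# Hypothetical helpers: derive the capacity from `disklabel` text output.
# Assumes the "key: value" layout shown in the report.

LABEL_TEXT = """\
bytes/sector: 512
sectors/track: 32
tracks/cylinder: 64
sectors/cylinder: 2048
cylinders: 256
total sectors: 524288
"""


def parse_label(text):
    """Return the numeric 'key: value' fields of a disklabel dump."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        value = value.split("#")[0].strip()  # drop trailing comments
        if sep and value.isdigit():
            fields[key.strip()] = int(value)
    return fields


def label_capacity_bytes(text):
    """Capacity in bytes as implied by the label fields."""
    fields = parse_label(text)
    return fields["bytes/sector"] * fields["total sectors"]


if __name__ == "__main__":
    size = label_capacity_bytes(LABEL_TEXT)
    print(size // (1024 * 1024), "MiB")
```

For the label above this yields 268435456 bytes (256 MiB), consistent with the 256MB extent configured on the target.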


----------------------------------------------------------------------

Trying to fix the situation with fdisk:

sunfish # fdisk -0ua sd0
fdisk: primary partition table invalid, no magic in sector 0
fdisk: Cannot determine the number of heads
Disk: /dev/rsd0
NetBSD disklabel disk geometry:
cylinders: 256, heads: 64, sectors/track: 32 (2048 sectors/cylinder)
total sectors: 524288, bytes/sector: 512

BIOS disk geometry:
cylinders: 256, heads: 64, sectors/track: 32 (2048 sectors/cylinder)
total sectors: 524288

Partitions aligned to 2048 sector boundaries, offset 32

Do you want to change our idea of what BIOS thinks? [n]  

Partition 0:
<UNUSED>
The data for partition 0 is:
<UNUSED>
sysid: [0..255 default: 169] 
start: [0..256cyl default: 32, 0cyl, 0MB] 
size: [0..256cyl default: 524256, 256cyl, 256MB] 
Do you want to change the active partition? [n] y
Choosing 4 will make no partition active.
active partition: [0..4 default: 0] 
Are you happy with this choice? [n] y

We haven't written the MBR back to disk yet.  This is your last chance.
Partition table:
0: NetBSD (sysid 169)
    start 32, size 524256 (256 MB, Cyls 0-255), Active
        PBR is not bootable: All bytes are identical (0x00)
1: <UNUSED>
2: <UNUSED>
3: <UNUSED>
First active partition: 0
Drive serial number: 0 (0x00000000)
Should we write new partition table? [n] y

^^^^^ Could the drive serial number of 0 (0x00000000) be the problem? Also, where did the 3 empty partitions come from?


Running disklabel, apparently this should help...

sunfish # disklabel -i sd0
Enter '?' for help
partition>P
3 partitions:
#        size    offset     fstype [fsize bsize cpg/sgs]
 c:    524288         0     unused      0     0        # (Cyl.      0 -    255)
partition>e
Filesystem type [unused]: 4.2BSD
Start offset ('x' to start after partition 'x') [0c, 0s, 0M]: 
Partition size ('$' for all remaining) [0c, 0s, 0M]: $
 e:    524288         0     4.2BSD      0     0     0  # (Cyl.      0 -    255)
partition>W
Label disk [n]?y
Label written
partition>Q



And this completely locks up the SSH session, spamming QUEUE FULL on the console:
sunfish # newfs /dev/rsd0a


>Fix:

>Audit-Trail:
From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: port-sparc64/57785: unable to use iscsi kernel initiator on sparc64
Date: Tue, 19 Dec 2023 09:06:33 -0000 (UTC)

 2857@gmx.de writes:

 >I've set up a 256M test disk:
 >netbsdvm # cat /etc/iscsi/targets
 ># extent        file or device          start           length
 >extent0         /tmp/iscsi-target0      0               256MB
 ># target        flags   storage         netmask
 >target0         rw      extent0         0.0.0.0/0

 The userland iscsi-target (and -initiator) are very limited.



 >Kernel log on client after running final newfs command:
 >[ 1655.6300054] sd0(iscsi0:0:0:0): QUEUE FULL resulted in 0 openings
 >[ 1655.7027396] sd0(iscsi0:0:0:0): QUEUE FULL resulted in 0 openings
 >[ 1655.7754745] sd0(iscsi0:0:0:0): QUEUE FULL resulted in 0 openings
 >[ 1655.8482093] sd0(iscsi0:0:0:0): QUEUE FULL resulted in 0 openings
 >[ 1655.9209439] sd0(iscsi0:0:0:0): QUEUE FULL resulted in 0 openings
 >[ 1655.9936783] sd0(iscsi0:0:0:0): QUEUE FULL resulted in 0 openings
 >[ 1656.0664131] sd0(iscsi0:0:0:0): QUEUE FULL resulted in 0 openings

 This means the target is rejecting commands.


 It does work better against the istgt from pkgsrc.


 >sunfish # newfs /dev/rsd0
 >newfs: Unable to determine file system size

 This doesn't really match the disklabel output, which does return size information.
 It would be interesting to get output from 'drvctl -p sd0'.

 >sunfish # disklabel sd0
 ># /dev/rsd0:
 ...
 >bytes/sector: 512
 ...
 >total sectors: 524288
 ...

 >3 partitions:
 >#        size    offset     fstype [fsize bsize cpg/sgs]
 > c:    524288         0     unused      0     0        # (Cyl.      0 -    255)

 I get a warning about wrong partition type (not 4.2BSD).


 >Trying to fix the situation with fdisk:
 sparc64 doesn't normally use MBR partitions but Sun disklabels.


 I have set up a netbsd-10_RC1/sparc64 system for qemu.
 The target is a netbsd-9/i386 system running istgt.


 My first problem was that iscsictl login failed with a timeout,
 istgt reported a protocol error. This is because I had configured
 CHAP, which uses "big numbers" that can either be encoded as
 base-64 or as hex strings. The kernel initiator uses base-64
 by default (and accepts anything as answer), but istgt only supports
 hex strings. Setting hw.iscsi.hexbignums=1 with sysctl helps.
 NetBSD-current (and netbsd-10, after the latest pullup) switches
 the encoding automatically according to what the target uses.
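
 To illustrate the two encodings: iSCSI text values carry binary data
 either as a hex string prefixed with "0x" or as base-64 prefixed with
 "0b" (see RFC 7143). The following is a minimal Python sketch, not taken
 from the NetBSD or istgt sources; the function names are made up for
 illustration, and the 16-byte value is only a stand-in for a CHAP
 challenge.

```python
import base64


def encode_hex(data: bytes) -> str:
    """Encode a binary value as an iSCSI hex-constant ("0x...")."""
    return "0x" + data.hex()


def encode_base64(data: bytes) -> str:
    """Encode a binary value as an iSCSI base64-constant ("0b...")."""
    return "0b" + base64.b64encode(data).decode("ascii")


def decode_bignum(text: str) -> bytes:
    """Decode either encoding, telling them apart by the prefix."""
    prefix, body = text[:2].lower(), text[2:]
    if prefix == "0x":
        return bytes.fromhex(body)
    if prefix == "0b":
        return base64.b64decode(body, validate=True)
    raise ValueError("unsupported bignum encoding: " + text[:2])


if __name__ == "__main__":
    challenge = bytes(range(16))  # stand-in for a CHAP challenge
    print(encode_hex(challenge))
    print(encode_base64(challenge))
```

 A peer that only implements the "0x" branch of such a decoder would
 reject the base-64 form, which is the kind of mismatch described above.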

 [  1161.173935] scsibus0 at iscsi0: 1 target, 16 luns per target
 [  1161.193182] sd0 at scsibus0 target 0 lun 0: <NetBSD, iSCSI DISK, 0001> disk fixed
 [  1161.243748] sd0: fabricating a geometry
 [  1161.243748] sd0: 100 GB, 12800 cyl, 64 head, 32 sec, 4096 bytes/sect x 26214400 sectors
 [  1161.253557] sd0: fabricating a geometry
 [  1161.265175] sd0: GPT GUID: ff9d9e75-78aa-4710-adf5-ffd30db38cb3
 [  1161.265175] dk0 at sd0: "links", 26214389 blocks at 6, type: ffs
 [  1161.275293] sd1 at scsibus0 target 0 lun 1: <NetBSD, iSCSI DISK, 0001> disk fixed
 [  1161.323010] sd1: fabricating a geometry
 [  1161.323010] sd1: 100 GB, 12800 cyl, 64 head, 32 sec, 4096 bytes/sect x 26214400 sectors
 [  1161.323010] sd1: fabricating a geometry
 [  1161.334560] sd1: GPT GUID: 6a621763-60e5-44c9-a494-53e27f79308e
 [  1161.334560] dk1 at sd1: "rechts", 26214389 blocks at 6, type: ffs
 [  1161.334560] sd0: async, 8-bit transfers, tagged queueing
 [  1161.334560] sd1: async, 8-bit transfers, tagged queueing

 istgt has two LUNs configured for that target, so two disks
 attached.


 Trying to change the disklabel to 4.2BSD failed with "DIOCWDINFO:
 Label magic number or checksum is wrong!".
 This is misleading: the error returned was EINVAL from reading
 the disklabel, because I had configured the iscsi volume with
 4K sectors. The sparc64 disklabel routines do not support this.

 [  1918.054422] scsibus0 at iscsi0: 1 target, 16 luns per target
 [  1918.073255] sd0 at scsibus0 target 0 lun 0: <NetBSD, iSCSI DISK, 0001> disk fixed
 [  1918.114634] sd0: fabricating a geometry
 [  1918.114634] sd0: 100 GB, 102400 cyl, 64 head, 32 sec, 512 bytes/sect x 209715200 sectors
 [  1918.123510] sd0: fabricating a geometry
 [  1918.133597] sd1 at scsibus0 target 0 lun 1: <NetBSD, iSCSI DISK, 0001> disk fixed
 [  1918.174570] sd1: fabricating a geometry
 [  1918.174570] sd1: 100 GB, 102400 cyl, 64 head, 32 sec, 512 bytes/sect x 209715200 sectors
 [  1918.183509] sd1: fabricating a geometry
 [  1918.183509] sd0: async, 8-bit transfers, tagged queueing
 [  1918.183509] sd1: async, 8-bit transfers, tagged queueing
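
 The two dmesg excerpts describe the same 100 GB volumes at different
 sector sizes. The following is a minimal sketch of the arithmetic,
 assuming the fabricated geometry always uses 64 heads and 32 sectors
 per track as the log lines show. It is an illustration only, not the
 kernel's code, and the function names are hypothetical.

```python
HEADS = 64              # taken from the dmesg lines above
SECTORS_PER_TRACK = 32  # taken from the dmesg lines above


def fabricated_cylinders(total_sectors: int) -> int:
    """Cylinder count for a fixed 64-head, 32-sector geometry."""
    return total_sectors // (HEADS * SECTORS_PER_TRACK)


def capacity_bytes(sector_size: int, total_sectors: int) -> int:
    """Capacity in bytes for a given sector size and sector count."""
    return sector_size * total_sectors


if __name__ == "__main__":
    # (sector size, total sectors) as reported before and after the change
    for sector_size, total in ((4096, 26214400), (512, 209715200)):
        gib = capacity_bytes(sector_size, total) // 2**30
        print(sector_size, total, fabricated_cylinders(total), gib)
```

 Both configurations come out at 100 GiB; only the sector count and the
 fabricated cylinder count (12800 versus 102400) differ.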


 After reconfiguring the target to 512 byte sectors it still
 failed. Despite the error above, disklabel _had_ written
 the new label with the 4096 byte sector size information and
 the system was trusting that information even after the real
 disk had changed. Rewriting the disklabel with the correct
 sector size was then successful.

 # /dev/rsd0:
 type: SCSI
 disk: iSCSI DISK      
 label: fictitious
 flags:
 bytes/sector: 512
 sectors/track: 32
 tracks/cylinder: 64
 sectors/cylinder: 2048
 cylinders: 12800
 total sectors: 26214400
 rpm: 3600
 interleave: 1
 trackskew: 0
 cylinderskew: 0
 headswitch: 0           # microseconds
 track-to-track seek: 0  # microseconds
 drivedata: 0 

 3 partitions:
 #        size    offset     fstype [fsize bsize cpg/sgs]
  c:  26214400         0     4.2BSD      0     0     0  # (Cyl.      0 -  12799)

 newfs then succeeded and I could mount the volume:

 Filesystem     Size   Used  Avail %Cap Mounted on
 /dev/sd0        98G    64M    93G   0% /mnt

 N.B. istgt volume definition is now:

 [LogicalUnit2]
   Comment "LVM test 1" 
   TargetName test 
   TargetAlias "Test LUN 0"
   Mapping PortalGroup1 InitiatorGroup1
   AuthMethod CHAP
   AuthGroup AuthGroup1
   UseDigest Auto
   UnitType Disk
   QueueDepth 128
   BlockLength 512
   LUN0 Storage /dev/vg0/rlvleft 100GB
   LUN1 Storage /dev/vg0/rlvright 100GB

 Backends for the target are two LVM volumes. This (still) requires
 specifying the disk size, as istgt tries to query the disk size Linux
 style (with stat()), which does not work reliably for NetBSD device
 nodes.
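
 The stat() limitation can be sketched as follows. POSIX only defines
 st_size for regular files (plus symbolic links and shared memory
 objects), so code that relies on it needs an explicitly configured
 size for device nodes, as with the 100GB arguments above. The function
 name `backend_size` is hypothetical and this is not istgt's code.

```python
import os
import stat
import tempfile


def backend_size(path, configured_size=None):
    """Return the backing store size in bytes.

    st_size is trusted only for regular files; for anything else
    (e.g. a raw device node) fall back to the configured size.
    """
    st = os.stat(path)
    if stat.S_ISREG(st.st_mode):
        return st.st_size
    if configured_size is None:
        raise ValueError("size must be configured for " + path)
    return configured_size


if __name__ == "__main__":
    # A regular file reports its own size through stat().
    with tempfile.NamedTemporaryFile() as f:
        f.write(b"\0" * 4096)
        f.flush()
        print(backend_size(f.name))
```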

Copyright © 1994-2023 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.