NetBSD Problem Report #57785
From www@netbsd.org Tue Dec 19 03:45:34 2023
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id 97B8A1A923C
for <gnats-bugs@gnats.NetBSD.org>; Tue, 19 Dec 2023 03:45:34 +0000 (UTC)
Message-Id: <20231219034533.319971A923F@mollari.NetBSD.org>
Date: Tue, 19 Dec 2023 03:45:33 +0000 (UTC)
From: 2857@gmx.de
Reply-To: 2857@gmx.de
To: gnats-bugs@NetBSD.org
Subject: unable to use iscsi kernel initiator on sparc64
X-Send-Pr-Version: www-1.0
>Number: 57785
>Category: port-sparc64
>Synopsis: unable to use iscsi kernel initiator on sparc64
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: port-sparc64-maintainer
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Tue Dec 19 03:50:00 +0000 2023
>Last-Modified: Tue Dec 19 09:10:01 +0000 2023
>Originator: zip100
>Release: NetBSD 9.3
>Organization:
>Environment:
target: NetBSD netbsdvm 9.3 NetBSD 9.3 (GENERIC) #0: Thu Aug 4 15:30:37 UTC 2022 mkrepro@mkrepro.NetBSD.org:/usr/src/sys/arch/amd64/compile/GENERIC amd64
initiator:
NetBSD sunfish 9.3 NetBSD 9.3 (GENERIC) #0: Thu Aug 4 15:30:37 UTC 2022 mkrepro@mkrepro.NetBSD.org:/usr/src/sys/arch/sparc64/compile/GENERIC sparc64
>Description:
Although the initiator is a sparc64 machine, I don't believe this is related to endianness, though I might be wrong. I'm just trying to follow the example from https://man.netbsd.org/iscsictl.8 .
----------------------------------------
HOSTNAME ADDRESS ROLE
sunfish 192.168.33.64 initiator
netbsdvm 192.168.33.199 target
----------------------------------------
I've set up a 256M test disk:
netbsdvm # cat /etc/iscsi/targets
# extent file or device start length
extent0 /tmp/iscsi-target0 0 256MB
# target flags storage netmask
target0 rw extent0 0.0.0.0/0
netbsdvm # grep iscsi /etc/rc.conf
iscsi_target=YES
Messages logged by iscsi-target on the target:
<snip>
Dec 19 03:17:03 netbsdvm iscsi-target: > iSCSI Discovery login successful from iqn.1994-04.org.netbsd:iscsi.sunfish:2201371340 on 192.168.33.64 disk -1, ISID 70368764559360, TSIH 1
Dec 19 03:17:03 netbsdvm iscsi-target: pid 474:/usr/src/external/bsd/iscsi/lib/../dist/src/lib/target.c:1223: ***ERROR*** CmdSN
Dec 19 03:17:03 netbsdvm iscsi-target: pid 474:/usr/src/external/bsd/iscsi/lib/../dist/src/lib/target.c:1388: ***ERROR*** logout_command_t() failed
Dec 19 03:17:03 netbsdvm iscsi-target: pid 474:/usr/src/external/bsd/iscsi/lib/../dist/src/lib/target.c:1502: ***ERROR*** execute_t() failed
^^^^ output after refresh_targets
Dec 19 03:17:52 netbsdvm iscsi-target: pid 474:/usr/src/external/bsd/iscsi/lib/../dist/src/lib/target.c:1108: ***ERROR*** cmd.tsih 0 not found
Dec 19 03:17:52 netbsdvm iscsi-target: > iSCSI Normal login successful from iqn.1994-04.org.netbsd:iscsi.sunfish:2201371340 on 192.168.33.64 disk 0, ISID 70368764559360, TSIH 2
Dec 19 03:17:52 netbsdvm iscsi-target: pid 474:/usr/src/external/bsd/iscsi/lib/../dist/src/lib/disk.c:1364: ***ERROR*** UNKNOWN OPCODE 0xa3
^^^ output after login
Kernel log on client after running final newfs command:
[ 1655.6300054] sd0(iscsi0:0:0:0): QUEUE FULL resulted in 0 openings
[ 1655.7027396] sd0(iscsi0:0:0:0): QUEUE FULL resulted in 0 openings
[ 1655.7754745] sd0(iscsi0:0:0:0): QUEUE FULL resulted in 0 openings
[ 1655.8482093] sd0(iscsi0:0:0:0): QUEUE FULL resulted in 0 openings
[ 1655.9209439] sd0(iscsi0:0:0:0): QUEUE FULL resulted in 0 openings
[ 1655.9936783] sd0(iscsi0:0:0:0): QUEUE FULL resulted in 0 openings
[ 1656.0664131] sd0(iscsi0:0:0:0): QUEUE FULL resulted in 0 openings
(this repeats endlessly on the console; the machine is unresponsive over the network)
>How-To-Repeat:
Let's follow along the example from https://man.netbsd.org/iscsictl.8 :
sunfish # iscsictl add_send_target -a 192.168.33.199
Added Send Target 1
sunfish # iscsictl refresh_targets
OK
sunfish # iscsictl list_targets
1: iqn.1994-04.org.netbsd.iscsi-target:target0
2: 192.168.33.199:3260,1
sunfish # iscsictl login -P 1
Created Session 2, Connection 1
sunfish # iscsictl list_sessions
Session 2: Target iqn.1994-04.org.netbsd.iscsi-target:target0
sunfish # newfs /dev/rsd0a
newfs: /dev/rsd0a: open for read: Device not configured
sunfish # newfs /dev/rsd0
newfs: Unable to determine file system size
sunfish # newfs /dev/rsd0
newfs: Unable to determine file system size
sunfish # disklabel sd0
# /dev/rsd0:
type: SCSI
disk: NetBSD iSCSI
label: fictitious
flags:
bytes/sector: 512
sectors/track: 32
tracks/cylinder: 64
sectors/cylinder: 2048
cylinders: 256
total sectors: 524288
rpm: 3600
interleave: 1
trackskew: 0
cylinderskew: 0
headswitch: 0 # microseconds
track-to-track seek: 0 # microseconds
drivedata: 0
3 partitions:
# size offset fstype [fsize bsize cpg/sgs]
c: 524288 0 unused 0 0 # (Cyl. 0 - 255)
disklabel: boot block size 0
disklabel: super block size 0
----------------------------------------------------------------------
Trying to fix the situation with fdisk:
sunfish # fdisk -0ua sd0
fdisk: primary partition table invalid, no magic in sector 0
fdisk: Cannot determine the number of heads
Disk: /dev/rsd0
NetBSD disklabel disk geometry:
cylinders: 256, heads: 64, sectors/track: 32 (2048 sectors/cylinder)
total sectors: 524288, bytes/sector: 512
BIOS disk geometry:
cylinders: 256, heads: 64, sectors/track: 32 (2048 sectors/cylinder)
total sectors: 524288
Partitions aligned to 2048 sector boundaries, offset 32
Do you want to change our idea of what BIOS thinks? [n]
Partition 0:
<UNUSED>
The data for partition 0 is:
<UNUSED>
sysid: [0..255 default: 169]
start: [0..256cyl default: 32, 0cyl, 0MB]
size: [0..256cyl default: 524256, 256cyl, 256MB]
Do you want to change the active partition? [n] y
Choosing 4 will make no partition active.
active partition: [0..4 default: 0]
Are you happy with this choice? [n] y
We haven't written the MBR back to disk yet. This is your last chance.
Partition table:
0: NetBSD (sysid 169)
start 32, size 524256 (256 MB, Cyls 0-255), Active
PBR is not bootable: All bytes are identical (0x00)
1: <UNUSED>
2: <UNUSED>
3: <UNUSED>
First active partition: 0
Drive serial number: 0 (0x00000000)
Should we write new partition table? [n] y
^^^^^ Could the serial number of 0 (0x00000000) be the problem? Also, where did the 3 empty partitions come from?
Running disklabel, apparently this should help...
sunfish # disklabel -i sd0
Enter '?' for help
partition>P
3 partitions:
# size offset fstype [fsize bsize cpg/sgs]
c: 524288 0 unused 0 0 # (Cyl. 0 - 255)
partition>e
Filesystem type [unused]: 4.2BSD
Start offset ('x' to start after partition 'x') [0c, 0s, 0M]:
Partition size ('$' for all remaining) [0c, 0s, 0M]: $
e: 524288 0 4.2BSD 0 0 0 # (Cyl. 0 - 255)
partition>W
Label disk [n]?y
Label written
partition>Q
And this completely locks up the SSH session, spamming QUEUE FULL on the console:
sunfish # newfs /dev/rsd0a
>Fix:
>Audit-Trail:
From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: port-sparc64/57785: unable to use iscsi kernel initiator on sparc64
Date: Tue, 19 Dec 2023 09:06:33 -0000 (UTC)
2857@gmx.de writes:
>I've set up a 256M test disk:
>netbsdvm # cat /etc/iscsi/targets
># extent file or device start length
>extent0 /tmp/iscsi-target0 0 256MB
># target flags storage netmask
>target0 rw extent0 0.0.0.0/0
The userland iscsi-target (and -initiator) are very limited.
>Kernel log on client after running final newfs command:
>[ 1655.6300054] sd0(iscsi0:0:0:0): QUEUE FULL resulted in 0 openings
>[ 1655.7027396] sd0(iscsi0:0:0:0): QUEUE FULL resulted in 0 openings
>[ 1655.7754745] sd0(iscsi0:0:0:0): QUEUE FULL resulted in 0 openings
>[ 1655.8482093] sd0(iscsi0:0:0:0): QUEUE FULL resulted in 0 openings
>[ 1655.9209439] sd0(iscsi0:0:0:0): QUEUE FULL resulted in 0 openings
>[ 1655.9936783] sd0(iscsi0:0:0:0): QUEUE FULL resulted in 0 openings
>[ 1656.0664131] sd0(iscsi0:0:0:0): QUEUE FULL resulted in 0 openings
Means: the target rejects queries.
It does work better against the istgt from pkgsrc.
>sunfish # newfs /dev/rsd0
>newfs: Unable to determine file system size
doesn't really match the disklabel output which returns size information.
It would be interesting to get output from 'drvctl -p sd0'.
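For reference, the size implied by the disklabel fields in the report can be checked by hand. This is only a minimal sketch in POSIX sh; the function name label_bytes is made up for illustration and is not part of any NetBSD tool.

```shell
#!/bin/sh
# Hypothetical helper: bytes implied by a disklabel's
# "total sectors" ($1) and "bytes/sector" ($2) fields.
label_bytes() {
    echo $(( $1 * $2 ))
}

# Values taken from the 'disklabel sd0' output in the report.
bytes=$(label_bytes 524288 512)
echo "$bytes bytes = $(( bytes / 1024 / 1024 )) MiB"

# The fabricated geometry is consistent with the sector count:
# cylinders * tracks/cylinder * sectors/track
echo "geometry sectors: $(( 256 * 64 * 32 ))"
```

The result is 256 MiB, which agrees with the 256MB extent configured on the target, so the label itself does carry usable size information.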
>sunfish # disklabel sd0
># /dev/rsd0:
...
>bytes/sector: 512
...
>total sectors: 524288
...
>3 partitions:
># size offset fstype [fsize bsize cpg/sgs]
> c: 524288 0 unused 0 0 # (Cyl. 0 - 255)
I get a warning about wrong partition type (not 4.2BSD).
>Trying to fix the situation with fdisk:
sparc64 doesn't normally use MBR partitions but Sun disklabels.
I have set up a netbsd-10_RC1/sparc64 system for qemu.
The target is a netbsd-9/i386 system running istgt.
My first problem was that iscsictl login failed with a timeout,
istgt reported a protocol error. This is because I had configured
CHAP, which uses "big numbers" that can either be encoded as
base-64 or as hex strings. The kernel initiator uses base-64
by default (and accepts anything as answer), but istgt only supports
hex strings. Setting hw.iscsi.hexbignums=1 with sysctl helps.
NetBSD-current (and after the latest pullup netbsd-10) switches
the encoding automatically according to what the target uses.
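To illustrate the two encodings, here is a rough sketch in POSIX sh that renders the same octets both ways. The helper names are hypothetical, the "0x" and "0b" prefixes are as I understand the iSCSI text format (RFC 7143), and the sketch assumes od(1) and base64(1) are available. An ASCII string stands in for what would really be a binary CHAP value.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the initiator or istgt.
# Hex form: "0x" followed by two hex digits per octet.
to_hex_bignum() {
    printf '0x%s\n' "$(printf '%s' "$1" | od -An -v -tx1 | tr -d '[:space:]')"
}

# Base-64 form: "0b" followed by the base-64 text.
to_b64_bignum() {
    printf '0b%s\n' "$(printf '%s' "$1" | base64 | tr -d '[:space:]')"
}

# ASCII stand-in for a binary challenge or response.
to_hex_bignum abc
to_b64_bignum abc
```

A target that only parses the hex form will not understand the second line, which is why forcing hex on the initiator side (sysctl -w hw.iscsi.hexbignums=1) helps here.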
[ 1161.173935] scsibus0 at iscsi0: 1 target, 16 luns per target
[ 1161.193182] sd0 at scsibus0 target 0 lun 0: <NetBSD, iSCSI DISK, 0001> disk fixed
[ 1161.243748] sd0: fabricating a geometry
[ 1161.243748] sd0: 100 GB, 12800 cyl, 64 head, 32 sec, 4096 bytes/sect x 26214400 sectors
[ 1161.253557] sd0: fabricating a geometry
[ 1161.265175] sd0: GPT GUID: ff9d9e75-78aa-4710-adf5-ffd30db38cb3
[ 1161.265175] dk0 at sd0: "links", 26214389 blocks at 6, type: ffs
[ 1161.275293] sd1 at scsibus0 target 0 lun 1: <NetBSD, iSCSI DISK, 0001> disk fixed
[ 1161.323010] sd1: fabricating a geometry
[ 1161.323010] sd1: 100 GB, 12800 cyl, 64 head, 32 sec, 4096 bytes/sect x 26214400 sectors
[ 1161.323010] sd1: fabricating a geometry
[ 1161.334560] sd1: GPT GUID: 6a621763-60e5-44c9-a494-53e27f79308e
[ 1161.334560] dk1 at sd1: "rechts", 26214389 blocks at 6, type: ffs
[ 1161.334560] sd0: async, 8-bit transfers, tagged queueing
[ 1161.334560] sd1: async, 8-bit transfers, tagged queueing
istgt has two LUNs configured for that target, so two disks
attached.
Trying to change the disklabel to 4.2BSD failed with "DIOCWDINFO:
Label magic number or checksum is wrong!".
This is misleading: the error returned was EINVAL from reading
the disklabel, because I had configured the iscsi volume with
4K sectors. The sparc64 disklabel routines do not support this.
[ 1918.054422] scsibus0 at iscsi0: 1 target, 16 luns per target
[ 1918.073255] sd0 at scsibus0 target 0 lun 0: <NetBSD, iSCSI DISK, 0001> disk fixed
[ 1918.114634] sd0: fabricating a geometry
[ 1918.114634] sd0: 100 GB, 102400 cyl, 64 head, 32 sec, 512 bytes/sect x 209715200 sectors
[ 1918.123510] sd0: fabricating a geometry
[ 1918.133597] sd1 at scsibus0 target 0 lun 1: <NetBSD, iSCSI DISK, 0001> disk fixed
[ 1918.174570] sd1: fabricating a geometry
[ 1918.174570] sd1: 100 GB, 102400 cyl, 64 head, 32 sec, 512 bytes/sect x 209715200 sectors
[ 1918.183509] sd1: fabricating a geometry
[ 1918.183509] sd0: async, 8-bit transfers, tagged queueing
[ 1918.183509] sd1: async, 8-bit transfers, tagged queueing
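The two dmesg excerpts describe the same 100 GB volumes at different sector sizes. A small sketch of the arithmetic in POSIX sh (sectors_for is a made-up helper, and it assumes the sector size divides 1 GiB evenly):

```shell
#!/bin/sh
# Hypothetical helper: number of sectors in a volume.
# $1 = size in GiB, $2 = bytes per sector
sectors_for() {
    echo $(( $1 * (1073741824 / $2) ))
}

sectors_for 100 4096   # sector count with 4K sectors
sectors_for 100 512    # sector count with 512-byte sectors
```

The counts differ by a factor of 8, so a label that records one sector size describes a different byte range when it is interpreted with the other.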
After reconfiguring the target to 512 byte sectors it still
failed. Despite the error above, disklabel _had_ written
the new label with the 4096 byte sector size information and
the system was trusting that information even after the real
disk had changed. Rewriting the disklabel with the correct
sector size was then successful.
# /dev/rsd0:
type: SCSI
disk: iSCSI DISK
label: fictitious
flags:
bytes/sector: 512
sectors/track: 32
tracks/cylinder: 64
sectors/cylinder: 2048
cylinders: 12800
total sectors: 26214400
rpm: 3600
interleave: 1
trackskew: 0
cylinderskew: 0
headswitch: 0 # microseconds
track-to-track seek: 0 # microseconds
drivedata: 0
3 partitions:
# size offset fstype [fsize bsize cpg/sgs]
c: 26214400 0 4.2BSD 0 0 0 # (Cyl. 0 - 12799)
newfs then succeeded and I could mount the volume:
Filesystem Size Used Avail %Cap Mounted on
/dev/sd0 98G 64M 93G 0% /mnt
N.B. istgt volume definition is now:
[LogicalUnit2]
Comment "LVM test 1"
TargetName test
TargetAlias "Test LUN 0"
Mapping PortalGroup1 InitiatorGroup1
AuthMethod CHAP
AuthGroup AuthGroup1
UseDigest Auto
UnitType Disk
QueueDepth 128
BlockLength 512
LUN0 Storage /dev/vg0/rlvleft 100GB
LUN1 Storage /dev/vg0/rlvright 100GB
Backends for the target are two LVM volumes. This (still) requires
specifying the disk size, because istgt tries to query the disk size
Linux style (with stat()), which does not work reliably for NetBSD
device nodes.
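
As a rough illustration of that rule of thumb, the following POSIX sh sketch flags backing stores that are device nodes and therefore should get an explicit size in the LUN line. The function name is hypothetical and this is not how istgt itself decides anything.

```shell
#!/bin/sh
# Hypothetical check: is the backing store a block or character
# device node? If so, this sketch assumes the size reported by
# stat() cannot be relied on and must be given in the config.
needs_explicit_size() {
    [ -b "$1" ] || [ -c "$1" ]
}

for path in /dev/null /etc/passwd; do
    if needs_explicit_size "$path"; then
        echo "$path: device node, give the size explicitly"
    else
        echo "$path: not a device node, stat() size may be usable"
    fi
done
```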
Copyright © 1994-2023
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.