NetBSD Problem Report #56728
From www@netbsd.org Tue Feb 22 14:12:59 2022
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id A80041A9239
for <gnats-bugs@gnats.NetBSD.org>; Tue, 22 Feb 2022 14:12:59 +0000 (UTC)
Message-Id: <20220222141258.3E36A1A923C@mollari.NetBSD.org>
Date: Tue, 22 Feb 2022 14:12:58 +0000 (UTC)
From: 6bone@6bone.informatik.uni-leipzig.de
Reply-To: 6bone@6bone.informatik.uni-leipzig.de
To: gnats-bugs@NetBSD.org
Subject: iscsi redundancy doesn't work
X-Send-Pr-Version: www-1.0
>Number: 56728
>Category: bin
>Synopsis: iscsi redundancy doesn't work
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: mlelstv
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Tue Feb 22 14:15:00 +0000 2022
>Last-Modified: Sun Sep 11 22:21:30 +0000 2022
>Originator: Uwe
>Release: NetBSD 9.99.93
>Organization:
University of Leipizg
>Environment:
NetBSD 6bone.informatik.uni-leipzig.de 9.99.93 NetBSD 9.99.93 (GENERIC)
>Description:
It is not possible to connect an iscsi target redundantly.
If you try the following, the target is only connected via the first send_target:
iscsictl add_send_target -a 172.18.86.130
iscsictl add_send_target -a 172.18.86.131
iscsictl refresh_targets
iscsictl list_targets
1: iqn.1992-08.com.netapp:naclaug
2: 172.18.86.130:3260,1026
3: 172.18.86.131:3260,1027
iscsictl login -P 1
iscsictl list_sessions -c
Session 3: Target iqn.1992-08.com.netapp:naclaug
Connection 1 Address 172.18.86.130:3260,1026
If the connection to the target is broken, the system will crash a few minutes later.
###########
If you try the following, two connections to the target will be established. However, the system does not seem to recognize that it is the same target.
iscsictl add_send_target -a 172.18.86.130
iscsictl add_send_target -a 172.18.86.131
iscsictl refresh_targets
iscsictl list_targets
1: iqn.1992-08.com.netapp:naclaug
2: 172.18.86.130:3260,1026
3: 172.18.86.131:3260,1027
iscsictl login -P 2
iscsictl login -P 3
iscsictl list_sessions -c
Session 3: Target iqn.1992-08.com.netapp:naclaug
Connection 1 Address 172.18.86.130:3260,1026
Session 4: Target iqn.1992-08.com.netapp:naclaug
Connection 1 Address 172.18.86.131:3260,1027
dmesg shows:
[ 1643.344321] scsibus4 at iscsi0: 1 target, 16 luns per target
[ 1643.344321] sd3 at scsibus4 target 0 lun 11: <NETAPP, LUN C-Mode, 9700> disk fixed
[ 1643.344321] sd3: 10240 GB, 65129 cyl, 16 head, 20607 sec, 512 bytes/sect x 21474836480 sectors
[ 1643.354566] sd3: GPT GUID: d644c65b-110e-4dd0-9500-0cfc70900463
[ 1643.354566] dk0 at sd3: "NetApp01", 21474836413 blocks at 34, type: ffs
[ 1643.354566] sd3: async, 8-bit transfers, tagged queueing
[ 2057.578810] scsibus5 at iscsi0: 1 target, 16 luns per target
[ 2057.588952] sd4 at scsibus5 target 0 lun 11: <NETAPP, LUN C-Mode, 9700> disk fixed
[ 2057.588952] sd4: 10240 GB, 65129 cyl, 16 head, 20607 sec, 512 bytes/sect x 21474836480 sectors
[ 2057.588952] sd4: GPT GUID: d644c65b-110e-4dd0-9500-0cfc70900463
[ 2057.588952] dk1 at sd4: "02203a72-215e-4912-9815-a5025bdd8124", 21474836413 blocks at 34, type: ffs
[ 2057.588952] autoconfiguration error: sd4: wedge named 'NetApp01' already existed, using '02203a72-215e-4912-9815-a5025bdd8124'
[ 2057.588952] sd4: async, 8-bit transfers, tagged queueing
Unfortunately, the system does not recognize that it is the same target.
>How-To-Repeat:
>Fix:
>Release-Note:
>Audit-Trail:
From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: bin/56728: iscsi redundancy doesn't work
Date: Tue, 22 Feb 2022 16:20:22 -0000 (UTC)
6bone@6bone.informatik.uni-leipzig.de writes:
>If you try the following, the target is only connected via the first send_target:
>iscsictl add_send_target -a 172.18.86.130
>iscsictl add_send_target -a 172.18.86.131
>iscsictl refresh_targets
>iscsictl list_targets
> 1: iqn.1992-08.com.netapp:naclaug
> 2: 172.18.86.130:3260,1026
> 3: 172.18.86.131:3260,1027
>iscsictl login -P 1
>iscsictl list_sessions -c
>Session 3: Target iqn.1992-08.com.netapp:naclaug
> Connection 1 Address 172.18.86.130:3260,1026
>If the connection to the target is broken, the system will crash a few minutes later.
Do you have any messages from that crash? A Backtrace ?
From: 6bone@6bone.informatik.uni-leipzig.de
To: gnats-bugs@netbsd.org
Cc: gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: Re: bin/56728: iscsi redundancy doesn't work
Date: Tue, 22 Feb 2022 23:31:47 +0100 (CET)
On Tue, 22 Feb 2022, Michael van Elst wrote:
>
> Do you have any messages from that crash? A Backtrace ?
>
Does that help?
dmesg -M netbsd.30.core -N netbsd.30
...
[ 98560.538286] sd3d: error writing fsbn 7490271010 of
7490271010-7490271073 (sd3 bn 7490271010; cn 22717 tn 9 sn 18043)
[ 98560.538286] sd3d: error writing fsbn 7489986082 of
7489986082-7489986145 (sd3 bn 7489986082; cn 22716 tn 12 sn 1006)
[ 98560.538286] sd3d: error reading fsbn 7415496930 of
7415496930-7415496993 (sd3 bn 7415496930; cn 22490 tn 13 sn 6159)
[ 98560.538286] sd3d: error reading fsbn 7415496930 of
7415496930-7415496993 (sd3 bn 7415496930; cn 22490 tn 13 sn 6159)
[ 98591.063909] uvm_fault(0xffffffff81901bc0, 0xffff860067907000, 1) -> e
[ 98591.063909] fatal page fault in supervisor mode
[ 98591.063909] trap type 6 code 0 rip 0xffffffff8025381b cs 0x8 rflags
0x10282 cr2 0xffff860067907070 ilevel 0 rsp 0xffff860396d57c20
[ 98591.063909] curlwp 0xffff8186f0a7c940 pid 0.390 lowest kstack
0xffff860396d532c0
[ 98591.063909] panic: trap
[ 98591.063909] cpu2: Begin traceback...
[ 98591.073939] vpanic() at netbsd:vpanic+0x156
[ 98591.073939] panic() at netbsd:panic+0x3c
[ 98591.073939] trap() at netbsd:trap+0xb27
[ 98591.073939] --- trap (number 6) ---
[ 98591.073939] sd_diskstart() at netbsd:sd_diskstart+0x38
[ 98591.073939] dk_start() at netbsd:dk_start+0xed
[ 98591.073939] spec_strategy() at netbsd:spec_strategy+0x9d
[ 98591.073939] VOP_STRATEGY() at netbsd:VOP_STRATEGY+0x3c
[ 98591.073939] dkstart() at netbsd:dkstart+0x184
[ 98591.073939] spec_strategy() at netbsd:spec_strategy+0x9d
[ 98591.083939] VOP_STRATEGY() at netbsd:VOP_STRATEGY+0x3c
[ 98591.083939] bio_doread() at netbsd:bio_doread+0x96
[ 98591.083939] bread() at netbsd:bread+0x18
[ 98591.083939] ffs_update.part.0() at netbsd:ffs_update.part.0+0x129
[ 98591.083939] ffs_full_fsync() at netbsd:ffs_full_fsync+0xf2
[ 98591.083939] VOP_FSYNC() at netbsd:VOP_FSYNC+0x47
[ 98591.093940] sched_sync() at netbsd:sched_sync+0x1a2
[ 98591.093940] cpu2: End traceback...
[ 98591.093940] dumping to dev 4,1 (offset=22227071, size=12581616):
[ 98591.093940] dump <4>mfi0: workqueue busy: updates stopped
[ 98649.955719] coretemp0: workqueue busy: updates stopped
[ 98649.955719] coretemp1: workqueue busy: updates stopped
[ 98649.955719] coretemp2: workqueue busy: updates stopped
[ 98649.955719] coretemp3: workqueue busy: updates stopped
(gdb) target kvm netbsd.30.core
0xffffffff80226145 in cpu_reboot (howto=howto@entry=260,
bootstr=bootstr@entry=0x0)
at
/mnt/iscsi_iqn.1992-08.com.netapp/usr/src/sys/arch/amd64/amd64/machdep.c:720
720 dumpsys();
(gdb) bt
#0 0xffffffff80226145 in cpu_reboot (howto=howto@entry=260,
bootstr=bootstr@entry=0x0)
at
/mnt/iscsi_iqn.1992-08.com.netapp/usr/src/sys/arch/amd64/amd64/machdep.c:720
#1 0xffffffff80d5e677 in kern_reboot (howto=howto@entry=260,
bootstr=bootstr@entry=0x0)
at /mnt/iscsi_iqn.1992-08.com.netapp/usr/src/sys/kern/kern_reboot.c:73
#2 0xffffffff80da1d42 in vpanic (fmt=fmt@entry=0xffffffff813900f6 "trap",
ap=ap@entry=0xffff860396d579e8)
at /mnt/iscsi_iqn.1992-08.com.netapp/usr/src/sys/kern/subr_prf.c:290
#3 0xffffffff80da1e07 in panic (fmt=fmt@entry=0xffffffff813900f6 "trap")
at /mnt/iscsi_iqn.1992-08.com.netapp/usr/src/sys/kern/subr_prf.c:209
#4 0xffffffff80228f67 in trap (frame=0xffff860396d57b30)
at
/mnt/iscsi_iqn.1992-08.com.netapp/usr/src/sys/arch/amd64/amd64/trap.c:326
#5 0xffffffff80221023 in alltraps ()
#6 0xffff8186ef216b80 in ?? ()
#7 0xffff8185d9255dd0 in ?? ()
#8 0x0000000000000006 in ?? ()
#9 0x000000000000001e in ?? ()
#10 0xffff8186edb93a40 in ?? ()
#11 0x0000000000080537 in ?? ()
#12 0xffff8186f0981a40 in ?? ()
#13 0xffff818588fec050 in ?? ()
#14 0x0000000000000008 in ?? ()
#15 0x0000000000001000 in ?? ()
#16 0x0000000000000000 in ?? ()
(gdb) list *(0xffffffff80226145)
0xffffffff80226145 is in cpu_reboot
(/mnt/iscsi_iqn.1992-08.com.netapp/usr/src/sys/arch/amd64/amd64/machdep.c:720).
715 /* Disable interrupts. */
716 s = splhigh();
717
718 /* Do a dump if requested. */
719 if ((howto & (RB_DUMP | RB_HALT)) == RB_DUMP)
720 dumpsys();
721
722 haltsys:
723 doshutdownhooks();
724
From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: bin/56728: iscsi redundancy doesn't work
Date: Wed, 23 Feb 2022 03:58:57 -0000 (UTC)
6bone@6bone.informatik.uni-leipzig.de writes:
>On Tue, 22 Feb 2022, Michael van Elst wrote:
>>
>> Do you have any messages from that crash? A Backtrace ?
>>
>Does that help?
>[ 98560.538286] sd3d: error reading fsbn 7415496930 of
>7415496930-7415496993 (sd3 bn 7415496930; cn 22490 tn 13 sn 6159)
>[ 98591.063909] uvm_fault(0xffffffff81901bc0, 0xffff860067907000, 1) -> e
>[ 98591.063909] fatal page fault in supervisor mode
>[ 98591.063909] trap type 6 code 0 rip 0xffffffff8025381b cs 0x8 rflags
>0x10282 cr2 0xffff860067907070 ilevel 0 rsp 0xffff860396d57c20
>[ 98591.063909] curlwp 0xffff8186f0a7c940 pid 0.390 lowest kstack
Looks like sd_diskstart is running with a NULL periph pointer.
I don't see how that happens yet, but it is probably the result of
detaching the sd device while in use. The detach message is
probably not yet printed.
So that's one problem.
Detaching the sd device happens when no connection to the iscsi
server exists and no connection cannot be re-established
either.
For multiple connections to an iscsi server you need to do something
like:
add_send_target -> add target to list
refresh_targets -> get portals
login -> establish session (creates sd)
add_connection -> add redundant connection to session
the man page doesn't look correct.
# iscsictl add_send_target -a x.x.x.x
Added Send Target 1
# iscsictl refresh_targets
OK
# iscsictl list_targets
1: iqn.2007-09.jp.ne.peach.istgt:pbulk1
2: x.x.x.x:3260,1
3: iqn.2007-09.jp.ne.peach.istgt:test
4: x.x.x.x:3260,1
# iscsictl login -P 4
Created Session 2, Connection 1
# iscsictl list_sessions
Session 2: Target iqn.2007-09.jp.ne.peach.istgt:test
# iscsictl add_connection -I 2
Added Connection 2
tcp 0 0 y.y.y.y.65330 x.x.x.x.3260 ESTABLISHED
tcp 0 0 y.y.y.y.65331 x.x.x.x.3260 ESTABLISHED
# tcpdrop y.y.y.y 65530 x.x.x.x 3260
[ 793856.693477] S2C2: *** Connection Error, status=18, logout=2, state=3
[ 793856.693477] S2C2: Write failed sock 0xffff8524621c5480 (ret: 32, req: 48, resid: 48)
[ 793856.693477] S2C2: *** Connection Error, status=18, logout=-1, state=5
[ 793858.693531] S2C2: Connection ReCreated successfully - status 0
tcp 0 0 y.y.y.y.65329 x.x.x.x.3260 ESTABLISHED
tcp 0 0 y.y.y.y.65331 x.x.x.x.3260 ESTABLISHED
Not exactly the same (only a single target IP), but it shows how the
connection gets re-established.
From: 6bone@6bone.informatik.uni-leipzig.de
To: gnats-bugs@netbsd.org
Cc: gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: Re: bin/56728: iscsi redundancy doesn't work
Date: Wed, 23 Feb 2022 09:34:03 +0100 (CET)
Hello,
iscsictl add_send_target -a 172.18.86.130
iscsictl add_send_target -a 172.18.86.131
iscsictl refresh_targets
OK
iscsictl list_targets
1: iqn.1992-08.com.netapp:naclaug
2: 172.18.86.130:3260,1026
3: 172.18.86.131:3260,1027
iscsictl login -P 2
Created Session 3, Connection 1
iscsictl list_sessions
Session 3: Target iqn.1992-08.com.netapp:naclaug
iscsictl add_connection -I 3
iscsictl: add_connection: The login failed
-> crash (no drive is muted yet!)
[ 1326.174188] scsibus4 at iscsi0: 1 target, 16 luns per target
[ 1326.174188] sd3 at scsibus4 target 0 lun 11: <NETAPP, LUN C-Mode,
9700> disk fixed
[ 1326.174188] sd3: 10240 GB, 65129 cyl, 16 head, 20607 sec, 512
bytes/sect x 21474836480 sectors
[ 1326.174188] sd3: GPT GUID: d644c65b-110e-4dd0-9500-0cfc70900463
[ 1326.174188] dk0 at sd3: "NetApp01", 21474836413 blocks at 34, type:
ffs
[ 1326.174188] sd3: async, 8-bit transfers, tagged queueing
[ 1355.946583] S3C2: Login failed (rc 4)
[ 1355.946583] S3C2: *** Connection Error, status=18, logout=2, state=6
[ 1356.950793] dk0 at sd3 (NetApp01) deleted
[ 1356.950793] sd3: detached
[ 1356.950793] scsibus4: detached
[ 1358.296346] uvm_fault(0xffffffff819014c0, 0xffffd68067902000, 2) -> e
[ 1358.296346] fatal page fault in supervisor mode
[ 1358.306388] trap type 6 code 0x2 rip 0xffffffff8022d80c cs 0x8 rflags
0x10246 cr2 0xffffd680679021c0 ilevel 0 rsp 0xffffd68396627f08
[ 1358.306388] curlwp 0xfffff80d32d67a00 pid 0.554 lowest kstack
0xffffd683966232c0
[ 1358.306388] panic: trap
[ 1358.306388] cpu1: Begin traceback...
[ 1358.306388] vpanic() at netbsd:vpanic+0x156
[ 1358.306388] panic() at netbsd:panic+0x3c
[ 1358.306388] trap() at netbsd:trap+0xb27
[ 1358.306388] --- trap (number 6) ---
[ 1358.306388] mutex_enter() at netbsd:mutex_enter+0xc
[ 1358.306388] send_nop_out() at iscsi:send_nop_out+0x133
[ 1358.306388] connection_timeout() at iscsi:connection_timeout+0x4d
[ 1358.306388] iscsi_cleanup_thread() at iscsi:iscsi_cleanup_thread+0x7b2
[ 1358.306388] cpu1: End traceback...
[ 1358.306388] dumping to dev 4,1 (offset=22227071, size=12581616):
[ 1358.306388] dump <4>mfi0: workqueue busy: updates stopped
[ 1390.048733] coretemp0: workqueue busy: updates stopped
[ 1390.048733] coretemp1: workqueue busy: updates stopped
[ 1390.048733] coretemp2: workqueue busy: updates stopped
[ 1390.048733] coretemp3: workqueue busy: updates stopped
ipmi0: workqueue busy: updates stopped
(gdb) target kvm netbsd.31.core
0xffffffff80226145 in cpu_reboot (howto=howto@entry=260,
bootstr=bootstr@entry=0x0)
at
/mnt/iscsi_iqn.1992-08.com.netapp/usr/src/sys/arch/amd64/amd64/machdep.c:720
720 dumpsys();
(gdb) bt
#0 0xffffffff80226145 in cpu_reboot (howto=howto@entry=260,
bootstr=bootstr@entry=0x0)
at
/mnt/iscsi_iqn.1992-08.com.netapp/usr/src/sys/arch/amd64/amd64/machdep.c:720
#1 0xffffffff80d37917 in kern_reboot (howto=howto@entry=260,
bootstr=bootstr@entry=0x0)
at /mnt/iscsi_iqn.1992-08.com.netapp/usr/src/sys/kern/kern_reboot.c:73
#2 0xffffffff80d7afe2 in vpanic (fmt=fmt@entry=0xffffffff81390116 "trap",
ap=ap@entry=0xffffd68396627cc8)
at /mnt/iscsi_iqn.1992-08.com.netapp/usr/src/sys/kern/subr_prf.c:290
#3 0xffffffff80d7b0a7 in panic (fmt=fmt@entry=0xffffffff81390116 "trap")
at /mnt/iscsi_iqn.1992-08.com.netapp/usr/src/sys/kern/subr_prf.c:209
#4 0xffffffff80228f67 in trap (frame=0xffffd68396627e10)
at
/mnt/iscsi_iqn.1992-08.com.netapp/usr/src/sys/arch/amd64/amd64/trap.c:326
#5 0xffffffff80221023 in alltraps ()
#6 0xffffd680679021c0 in ?? ()
#7 0x0000000000000000 in ?? ()
list *(0xffffffff80226145)
0xffffffff80226145 is in cpu_reboot
(/mnt/iscsi_iqn.1992-08.com.netapp/usr/src/sys/arch/amd64/amd64/machdep.c:720).
715 /* Disable interrupts. */
716 s = splhigh();
717
718 /* Do a dump if requested. */
719 if ((howto & (RB_DUMP | RB_HALT)) == RB_DUMP)
720 dumpsys();
721
722 haltsys:
723 doshutdownhooks();
724
Thank you for your efforts
Regards
Uwe
On Wed, 23 Feb 2022, Michael van Elst wrote:
> Date: Wed, 23 Feb 2022 04:00:02 +0000 (UTC)
> From: Michael van Elst <mlelstv@serpens.de>
> Reply-To: gnats-bugs@netbsd.org
> To: gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
> 6bone@6bone.informatik.uni-leipzig.de
> Subject: [Extern] Re: bin/56728: iscsi redundancy doesn't work
>
> The following reply was made to PR bin/56728; it has been noted by GNATS.
>
> From: mlelstv@serpens.de (Michael van Elst)
> To: gnats-bugs@netbsd.org
> Cc:
> Subject: Re: bin/56728: iscsi redundancy doesn't work
> Date: Wed, 23 Feb 2022 03:58:57 -0000 (UTC)
>
> 6bone@6bone.informatik.uni-leipzig.de writes:
>
> >On Tue, 22 Feb 2022, Michael van Elst wrote:
>
> >>
> >> Do you have any messages from that crash? A Backtrace ?
> >>
>
> >Does that help?
>
>
> >[ 98560.538286] sd3d: error reading fsbn 7415496930 of
> >7415496930-7415496993 (sd3 bn 7415496930; cn 22490 tn 13 sn 6159)
> >[ 98591.063909] uvm_fault(0xffffffff81901bc0, 0xffff860067907000, 1) -> e
> >[ 98591.063909] fatal page fault in supervisor mode
> >[ 98591.063909] trap type 6 code 0 rip 0xffffffff8025381b cs 0x8 rflags
> >0x10282 cr2 0xffff860067907070 ilevel 0 rsp 0xffff860396d57c20
> >[ 98591.063909] curlwp 0xffff8186f0a7c940 pid 0.390 lowest kstack
>
>
> Looks like sd_diskstart is running with a NULL periph pointer.
> I don't see how that happens yet, but it is probably the result of
> detaching the sd device while in use. The detach message is
> probably not yet printed.
>
> So that's one problem.
>
> Detaching the sd device happens when no connection to the iscsi
> server exists and no connection cannot be re-established
> either.
>
> For multiple connections to an iscsi server you need to do something
> like:
>
> add_send_target -> add target to list
> refresh_targets -> get portals
> login -> establish session (creates sd)
> add_connection -> add redundant connection to session
>
> the man page doesn't look correct.
>
> # iscsictl add_send_target -a x.x.x.x
> Added Send Target 1
> # iscsictl refresh_targets
> OK
> # iscsictl list_targets
> 1: iqn.2007-09.jp.ne.peach.istgt:pbulk1
> 2: x.x.x.x:3260,1
> 3: iqn.2007-09.jp.ne.peach.istgt:test
> 4: x.x.x.x:3260,1
> # iscsictl login -P 4
> Created Session 2, Connection 1
> # iscsictl list_sessions
> Session 2: Target iqn.2007-09.jp.ne.peach.istgt:test
> # iscsictl add_connection -I 2
> Added Connection 2
>
> tcp 0 0 y.y.y.y.65330 x.x.x.x.3260 ESTABLISHED
> tcp 0 0 y.y.y.y.65331 x.x.x.x.3260 ESTABLISHED
>
> # tcpdrop y.y.y.y 65530 x.x.x.x 3260
>
> [ 793856.693477] S2C2: *** Connection Error, status=18, logout=2, state=3
> [ 793856.693477] S2C2: Write failed sock 0xffff8524621c5480 (ret: 32, req: 48, resid: 48)
> [ 793856.693477] S2C2: *** Connection Error, status=18, logout=-1, state=5
> [ 793858.693531] S2C2: Connection ReCreated successfully - status 0
>
>
> tcp 0 0 y.y.y.y.65329 x.x.x.x.3260 ESTABLISHED
> tcp 0 0 y.y.y.y.65331 x.x.x.x.3260 ESTABLISHED
>
> Not exactly the same (only a single target IP), but it shows how the
> connection gets re-established.
>
>
From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: bin/56728: iscsi redundancy doesn't work
Date: Wed, 23 Feb 2022 12:49:37 -0000 (UTC)
6bone@6bone.informatik.uni-leipzig.de writes:
>iscsictl list_sessions
>Session 3: Target iqn.1992-08.com.netapp:naclaug
>iscsictl add_connection -I 3
>iscsictl: add_connection: The login failed
>-> crash (no drive is muted yet!)
That's a second connection to the same portal (not the second portal
) and apparently it is exclusive and rejected by the target. Shouldn't
crash anything though.
The command
iscsictl add_connection -I 3 -P 3
should add a connection for session 3 to the second portal number 3.
I'll try to set up something similar for testing...
From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: bin/56728: iscsi redundancy doesn't work
Date: Sun, 27 Feb 2022 15:09:02 -0000 (UTC)
mlelstv@serpens.de (Michael van Elst) writes:
> The command
> iscsictl add_connection -I 3 -P 3
> should add a connection for session 3 to the second portal number 3.
>
> I'll try to set up something similar for testing...
This part does work. I can create two redundant connections, cut network
connectivity for one net (e.g. by dropping routes) and restore it and
everything stays stable.
Trying to logout from that session fails somewhat. Access to the
devices fails immediately with I/O error and after some timeout
the session is dropped and the devices detach.
Removing each connection individually with 'remove_connection' looks
more sane. When last connection is removed, the session is dropped
and the devices detach.
Simply stopping the target would also cause infinite retries.
You can trace all this by increasing hw.iscsi.debug with sysctl.
From: 6bone@6bone.informatik.uni-leipzig.de
To: gnats-bugs@netbsd.org
Cc: gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: Re: bin/56728: iscsi redundancy doesn't work
Date: Mon, 28 Feb 2022 09:08:57 +0100 (CET)
On Sun, 27 Feb 2022, Michael van Elst wrote:
> Date: Sun, 27 Feb 2022 15:10:01 +0000 (UTC)
> From: Michael van Elst <mlelstv@serpens.de>
> Reply-To: gnats-bugs@netbsd.org
> To: gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
> 6bone@6bone.informatik.uni-leipzig.de
> Subject: [Extern] Re: bin/56728: iscsi redundancy doesn't work
>
> The following reply was made to PR bin/56728; it has been noted by GNATS.
>
> From: mlelstv@serpens.de (Michael van Elst)
> To: gnats-bugs@netbsd.org
> Cc:
> Subject: Re: bin/56728: iscsi redundancy doesn't work
> Date: Sun, 27 Feb 2022 15:09:02 -0000 (UTC)
iscsictl list_targets
1: iqn.1992-08.com.netapp:naclaug
2: 172.18.86.130:3260,1026
3: 172.18.86.131:3260,1027
iscsictl list_sessions -c
Session 3: Target iqn.1992-08.com.netapp:naclaug
Connection 1 Address 172.18.86.130:3260,1026
iscsictl add_connection -I 3 -P 3
iscsictl: add_connection: The login failed
-> crash
Here is some debug output. At time 881 'iscsictl login -P 2' was run. At
1669 the iscsictl add_connection -I 3 -P 3. Nothing between.
[ 863.216616] ISCSI Ioctl cmd = c0680002
[ 863.216616] Create Connection for Session 1
[ 863.216616] S1C1: get_socket: par_sock=8, fdesc=0xfffffe6856d01680
[ 863.216616] ISCSI Ioctl cmd = c1000008
[ 863.216616] ISCSI Ioctl cmd = c1000008
[ 863.216616] ISCSI Ioctl cmd = c0080003
[ 863.216616] ISCSI: kill_session 1, status 29, logout 0, recover 0
[ 863.216616] S1C1: Kill_connection: terminating=0, status=29, logout=0,
state=3
[ 863.216616] S1C1: Kill_connection setting destroy flag
[ 863.216616] S1C1: Received Logout PDU - CCB = 0xffffd600678fad00,
otherconn=0, response=0
[ 863.216616] S1C1: Closing Socket 0xfffffe6856d01680
[ 863.216616] S1C1: TERMINATED
[ 863.216616] S1C1: Waiting for conn_idle
[ 863.216616] S1C1: Waited for conn_idle, destroy = 1
[ 863.216616] ISCSI Ioctl cmd = c018000d
[ 863.216616] S1C1: Send thread exits
[ 863.216616] Cleanup: session 1
[ 863.216616] Cleanup: Unmap session 1
[ 863.216616] Cleanup: session ended 1
[ 863.216616] ISCSI Ioctl cmd = c0680002
[ 863.216616] Create Connection for Session 2
[ 863.216616] S2C1: get_socket: par_sock=8, fdesc=0xfffffe68556c2b00
[ 863.216616] ISCSI Ioctl cmd = c018000d
[ 863.216616] ISCSI Ioctl cmd = c1000008
[ 863.216616] ISCSI Ioctl cmd = c1000008
[ 863.216616] ISCSI Ioctl cmd = c0080003
[ 863.216616] ISCSI: kill_session 2, status 29, logout 0, recover 0
[ 863.216616] S2C1: Kill_connection: terminating=0, status=29, logout=0,
state=3
[ 863.216616] S2C1: Kill_connection setting destroy flag
[ 863.216616] S2C1: Received Logout PDU - CCB = 0xffffd600678fad00,
otherconn=0, response=0
[ 863.216616] S2C1: Closing Socket 0xfffffe68556c2b00
[ 863.216616] S2C1: TERMINATED
[ 863.216616] S2C1: ISCSI Ioctl cmd = c018000d
[ 863.216616] Waiting for conn_idle
[ 863.216616] S2C1: Waited for conn_idle, destroy = 1
[ 863.216616] S2C1: Send thread exits
[ 863.216616] Cleanup: session 2
[ 863.216616] Cleanup: Unmap session 2
[ 863.216616] Cleanup: session ended 2
[ 863.216616] ISCSI Ioctl cmd = c018000d
[ 881.873938] ISCSI Ioctl cmd = c0680002
[ 881.873938] Create Connection for Session 3
[ 881.873938] S3C1: get_socket: par_sock=8, fdesc=0xfffffe6856d01680
[ 881.884021] Login: map session 3
[ 881.884021] scsibus4 at iscsi0: 1 target, 16 luns per target
[ 881.884021] sd3 at scsibus4 target 0 lun 11: <NETAPP, LUN C-Mode,
9700> disk fixed
[ 881.884021] sd3: 10240 GB, 65129 cyl, 16 head, 20607 sec, 512
bytes/sect x 21474836480 sectors
[ 881.884021] sd3: GPT GUID: d644c65b-110e-4dd0-9500-0cfc70900463
[ 881.884021] dk0 at sd3: "NetApp01", 21474836413 blocks at 34, type:
ffs
[ 881.884021] sd3: async, 8-bit transfers, tagged queueing
[ 1669.006479] ISCSI Ioctl cmd = c0680004
[ 1669.006479] Create Connection for Session 3
[ 1669.006479] S3C2: get_socket: par_sock=8, fdesc=0xfffffe68527dd7c0
[ 1669.006479] S3C2: Login problem - Class = 2, Detail = a
[ 1669.006479] S3C2: Login failed (rc 4)
[ 1669.006479] S3C2: Kill_connection: terminating=0, status=4, logout=-1,
state=0
[ 1669.006479] S3C2: Kill_connection setting destroy flag
[ 1669.006479] S3C2: Closing Socket 0xfffffe68527dd7c0
[ 1669.006479] S3C2: S3C2: Read failed (ret: 0, req: 48, out: 48)
[ 1669.006479] TERMINATED
[ 1669.006479] S3C2: ISCSI Ioctl cmd = c018000d
[ 1669.006479] S3C2: *** Connection Error, status=18, logout=2, state=6
[ 1669.006479] Waiting for conn_idle
[ 1669.006479] S3C2: Waited for conn_idle, destroy = 1
[ 1669.006479] S3C2: Send thread exits
[ 1669.006479] Cleanup: session 3
[ 1669.006479] Cleanup: Unmap session 3
[ 1669.006479] dk0 at sd3 (NetApp01) deleted
[ 1669.006479] sd3: detached
[ 1669.006479] scsibus4: detached
[ 1669.006479] Cleanup: session ended 3
[ 1669.006479] ISCSI Ioctl cmd = c018000d
>
> Trying to logout from that session fails somewhat. Access to the
> devices fails immediately with I/O error and after some timeout
> the session is dropped and the devices detach.
>
> Removing each connection individually with 'remove_connection' looks
> more sane. When last connection is removed, the session is dropped
> and the devices detach.
>
> Simply stopping the target would also cause infinite retries.
> You can trace all this by increasing hw.iscsi.debug with sysctl.
>
>
Thank you for your efforts
Regards
Uwe
From: 6bone@6bone.informatik.uni-leipzig.de
To: gnats-bugs@netbsd.org
Cc: gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: Re: bin/56728: iscsi redundancy doesn't work
Date: Wed, 2 Mar 2022 13:42:19 +0100 (CET)
Hello,
if it helps with troubleshooting, here is a screenshot of recent iscsi
crashes. I can't provoke the crash. The problem has happened to me several
times. A dump is not written. Therefore I can only offer the image.
It is always the position with "conn != zero".
https://speicherwolke.uni-leipzig.de/index.php/s/fqjRTwTXszL2pXA
Thank you for your efforts
Regards
Uwe
From: 6bone@6bone.informatik.uni-leipzig.de
To: gnats-bugs@netbsd.org
Cc: gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: Re: bin/56728: iscsi redundancy doesn't work
Date: Sat, 30 Apr 2022 11:19:41 +0200 (CEST)
Hello,
I tested with kernel version 9.99.96. The problem is still there. I now
think there are three independent problems.
Problem 1: it is possible to reproduce a kernel crash with iscsi. This is
independent of the iscsi redundancy.
Problem 2: in my environment it is not possible to set up a redundant
environment.
Problem 3: the manpage is wrong.
Details on problem 1: If you use iscsictl to open and close connections to
the iscsi target, the kernel crashes. iscsi devices are not mounted at
this time. Here is a kernel dump:
[ 40275.994983] dk0 at sd3 (NetApp01) deleted
[ 40275.994983] sd3: detached
[ 40275.994983] scsibus4: detached
[ 40282.903575] uvm_fault(0xffffffff81903e00, 0xffffa08067901000, 2) -> e
[ 40282.903575] fatal page fault in supervisor mode
[ 40282.903575] trap type 6 code 0x2 rip 0xffffffff8022d84c cs 0x8 rflags
0x10246 cr2 0xffffa080679011c0 ilevel 0 rsp 0xffffa08397034f08
[ 40282.903575] curlwp 0xffffecdb7be4db00 pid 0.581 lowest kstack
0xffffa083970302c0
[ 40282.903575] panic: trap
[ 40282.903575] cpu0: Begin traceback...
[ 40282.903575] vpanic() at netbsd:vpanic+0x183
[ 40282.903575] panic() at netbsd:panic+0x3c
[ 40282.903575] trap() at netbsd:trap+0xb27
[ 40282.903575] --- trap (number 6) ---
[ 40282.903575] mutex_enter() at netbsd:mutex_enter+0xc
[ 40282.903575] send_nop_out() at iscsi:send_nop_out+0x133
[ 40282.903575] connection_timeout() at iscsi:connection_timeout+0x4d
[ 40282.903575] iscsi_cleanup_thread() at iscsi:iscsi_cleanup_thread+0x7b2
[ 40282.903575] cpu0: End traceback...
[ 40282.903575] dumping to dev 4,1 (offset=22227071, size=12581616):
[ 40282.903575] dump <7>lagg0: link state DOWN (was UP)
Problem 2 details: We are using a NetApp iscsi cluster. Multiple Netapp
controllers are used for redundancy. These provide the same device (iqn).
If you establish a session to the second controller, NetBSD does not
recognize that it is the same iqn. NetBSD thinks it's a new device:
[ 39843.450611] scsibus5 at iscsi0: 1 target, 16 luns per target
[ 39843.450611] sd4 at scsibus5 target 0 lun 11: <NETAPP, LUN C-Mode, 9700> disk fixed
[ 39843.450611] sd4: 10240 GB, 65129 cyl, 16 head, 20607 sec, 512 bytes/sect x 21474836480 sectors
[ 39843.450611] sd4: GPT GUID: d644c65b-110e-4dd0-9500-0cfc70900463
[ 39843.450611] dk1 at sd4: "02203a72-215e-4912-9815-a5025bdd8124", 21474836413blocks at 34, type: ffs
[ 39843.450611] autoconfiguration error: sd4: wedge named 'NetApp01' already existed, using '02203a72-215e-4912-9815-a5025bdd8124'
[ 39843.450611] sd4: async, 8-bit transfers, tagged queueing
[ 39929.604877] dk1 at sd4 (02203a72-215e-4912-9815-a5025bdd8124) deleted
[ 39929.604877] sd4: detached
[ 39929.604877] scsibus5: detached
Problem 3: the man page for iscsictl contains errors. For example, the
add_connection command requires an -I parameter that is not described. The
meaning of -m is not described. There seem to be errors with other
commands as well.
Thank you for your efforts
Regards
Uwe
On Wed, 2 Mar 2022, 6bone@6bone.informatik.uni-leipzig.de wrote:
> Date: Wed, 2 Mar 2022 13:42:19 +0100 (CET)
> From: 6bone@6bone.informatik.uni-leipzig.de
> To: gnats-bugs@netbsd.org
> Cc: gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
> Subject: Re: [Extern] Re: bin/56728: iscsi redundancy doesn't work
>
> Hello,
>
> if it helps with troubleshooting, here is a screenshot of recent iscsi
> crashes. I can't provoke the crash. The problem has happened to me several
> times. A dump is not written. Therefore I can only offer the image.
>
> It is always the position with "conn != zero".
>
> https://speicherwolke.uni-leipzig.de/index.php/s/fqjRTwTXszL2pXA
>
>
> Thank you for your efforts
>
>
> Regards
> Uwe
>
From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: bin/56728: iscsi redundancy doesn't work
Date: Sat, 30 Apr 2022 10:02:23 -0000 (UTC)
6bone@6bone.informatik.uni-leipzig.de writes:
>Details on problem 1: If you use iscsictl to open and close connections to
>the iscsi target, the kernel crashes. iscsi devices are not mounted at
>this time. Here is a kernel dump:
>[ 40282.903575] trap type 6 code 0x2 rip 0xffffffff8022d84c cs 0x8 rflags
>0x10246 cr2 0xffffa080679011c0 ilevel 0 rsp 0xffffa08397034f08
>[ 40282.903575] curlwp 0xffffecdb7be4db00 pid 0.581 lowest kstack
>0xffffa083970302c0
>[ 40282.903575] panic: trap
>[ 40282.903575] cpu0: Begin traceback...
>[ 40282.903575] vpanic() at netbsd:vpanic+0x183
>[ 40282.903575] panic() at netbsd:panic+0x3c
>[ 40282.903575] trap() at netbsd:trap+0xb27
>[ 40282.903575] --- trap (number 6) ---
>[ 40282.903575] mutex_enter() at netbsd:mutex_enter+0xc
>[ 40282.903575] send_nop_out() at iscsi:send_nop_out+0x133
>[ 40282.903575] connection_timeout() at iscsi:connection_timeout+0x4d
>[ 40282.903575] iscsi_cleanup_thread() at iscsi:iscsi_cleanup_thread+0x7b2
>[ 40282.903575] cpu0: End traceback...
>[ 40282.903575] dumping to dev 4,1 (offset=22227071, size=12581616):
>[ 40282.903575] dump <7>lagg0: link state DOWN (was UP)
If that's reproducable, please provide a log with ISCSI_DEBUG.
>Problem 2 details: We are using a NetApp iscsi cluster. Multiple Netapp
>controllers are used for redundancy. These provide the same device (iqn).
>If you establish a session to the second controller, NetBSD does not
>recognize that it is the same iqn. NetBSD thinks it's a new device:
It is a new device. Linux supports multipathing (in the device mapper
driver) that detects such duplicates, handles the redundancy and
provides a virtual disk.
Unfortunately our device mapper still lacks multipathing support
(and also mirror support, but that's another story).
>Problem 3: the man page for iscsictl contains errors. For example, the
>add_connection command requires an -I parameter that is not described. The
>meaning of -m is not described. There seem to be errors with other
>commands as well.
Yes. The man page is neither correct nor complete.
Greetings,
Responsible-Changed-From-To: bin-bug-people->mlelstv
Responsible-Changed-By: mlelstv@NetBSD.org
Responsible-Changed-When: Sun, 11 Sep 2022 22:21:30 +0000
Responsible-Changed-Why:
mine
>Unformatted:
(Contact us)
$NetBSD: query-full-pr,v 1.47 2022/09/11 19:34:41 kim Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2022
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.