NetBSD Problem Report #56728

From www@netbsd.org  Tue Feb 22 14:12:59 2022
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id A80041A9239
	for <gnats-bugs@gnats.NetBSD.org>; Tue, 22 Feb 2022 14:12:59 +0000 (UTC)
Message-Id: <20220222141258.3E36A1A923C@mollari.NetBSD.org>
Date: Tue, 22 Feb 2022 14:12:58 +0000 (UTC)
From: 6bone@6bone.informatik.uni-leipzig.de
Reply-To: 6bone@6bone.informatik.uni-leipzig.de
To: gnats-bugs@NetBSD.org
Subject: iscsi redundancy doesn't work
X-Send-Pr-Version: www-1.0

>Number:         56728
>Category:       bin
>Synopsis:       iscsi redundancy doesn't work
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    mlelstv
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue Feb 22 14:15:00 +0000 2022
>Last-Modified:  Sun Sep 11 22:21:30 +0000 2022
>Originator:     Uwe
>Release:        NetBSD 9.99.93
>Organization:
University of Leipizg
>Environment:
NetBSD 6bone.informatik.uni-leipzig.de 9.99.93 NetBSD 9.99.93 (GENERIC)
>Description:
It is not possible to connect an iscsi target redundantly.

If you try the following, the target is only connected via the first send_target:

iscsictl add_send_target -a 172.18.86.130
iscsictl add_send_target -a 172.18.86.131
iscsictl refresh_targets

iscsictl list_targets
     1: iqn.1992-08.com.netapp:naclaug
        2: 172.18.86.130:3260,1026
        3: 172.18.86.131:3260,1027

iscsictl login -P 1

iscsictl list_sessions -c
Session 3: Target iqn.1992-08.com.netapp:naclaug
 Connection 1 Address 172.18.86.130:3260,1026

If the connection to the target is broken, the system will crash a few minutes later.

###########

If you try the following, two connections to the target will be established. However, the system does not seem to recognize that it is the same target.

iscsictl add_send_target -a 172.18.86.130
iscsictl add_send_target -a 172.18.86.131
iscsictl refresh_targets

iscsictl list_targets
     1: iqn.1992-08.com.netapp:naclaug
        2: 172.18.86.130:3260,1026
        3: 172.18.86.131:3260,1027

iscsictl login -P 2
iscsictl login -P 3

iscsictl list_sessions -c
Session 3: Target iqn.1992-08.com.netapp:naclaug
 Connection 1 Address 172.18.86.130:3260,1026
Session 4: Target iqn.1992-08.com.netapp:naclaug
 Connection 1 Address 172.18.86.131:3260,1027

dmesg shows:

[  1643.344321] scsibus4 at iscsi0: 1 target, 16 luns per target
[  1643.344321] sd3 at scsibus4 target 0 lun 11: <NETAPP, LUN C-Mode, 9700> disk fixed
[  1643.344321] sd3: 10240 GB, 65129 cyl, 16 head, 20607 sec, 512 bytes/sect x 21474836480 sectors
[  1643.354566] sd3: GPT GUID: d644c65b-110e-4dd0-9500-0cfc70900463
[  1643.354566] dk0 at sd3: "NetApp01", 21474836413 blocks at 34, type: ffs
[  1643.354566] sd3: async, 8-bit transfers, tagged queueing
[  2057.578810] scsibus5 at iscsi0: 1 target, 16 luns per target
[  2057.588952] sd4 at scsibus5 target 0 lun 11: <NETAPP, LUN C-Mode, 9700> disk fixed
[  2057.588952] sd4: 10240 GB, 65129 cyl, 16 head, 20607 sec, 512 bytes/sect x 21474836480 sectors
[  2057.588952] sd4: GPT GUID: d644c65b-110e-4dd0-9500-0cfc70900463
[  2057.588952] dk1 at sd4: "02203a72-215e-4912-9815-a5025bdd8124", 21474836413 blocks at 34, type: ffs
[  2057.588952] autoconfiguration error: sd4: wedge named 'NetApp01' already existed, using '02203a72-215e-4912-9815-a5025bdd8124'
[  2057.588952] sd4: async, 8-bit transfers, tagged queueing

Unfortunately, the system does not recognize that it is the same target.
>How-To-Repeat:

>Fix:

>Release-Note:

>Audit-Trail:
From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: bin/56728: iscsi redundancy doesn't work
Date: Tue, 22 Feb 2022 16:20:22 -0000 (UTC)

 6bone@6bone.informatik.uni-leipzig.de writes:

 >If you try the following, the target is only connected via the first send_target:

 >iscsictl add_send_target -a 172.18.86.130
 >iscsictl add_send_target -a 172.18.86.131
 >iscsictl refresh_targets

 >iscsictl list_targets
 >     1: iqn.1992-08.com.netapp:naclaug
 >        2: 172.18.86.130:3260,1026
 >        3: 172.18.86.131:3260,1027

 >iscsictl login -P 1

 >iscsictl list_sessions -c
 >Session 3: Target iqn.1992-08.com.netapp:naclaug
 > Connection 1 Address 172.18.86.130:3260,1026

 >If the connection to the target is broken, the system will crash a few minutes later.


 Do you have any messages from that crash? A Backtrace ?


From: 6bone@6bone.informatik.uni-leipzig.de
To: gnats-bugs@netbsd.org
Cc: gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: Re: bin/56728: iscsi redundancy doesn't work
Date: Tue, 22 Feb 2022 23:31:47 +0100 (CET)

 On Tue, 22 Feb 2022, Michael van Elst wrote:

 >
 > Do you have any messages from that crash? A Backtrace ?
 >

 Does that help?


 dmesg -M netbsd.30.core -N netbsd.30

 ...
 [ 98560.538286] sd3d: error writing fsbn 7490271010 of 
 7490271010-7490271073 (sd3 bn 7490271010; cn 22717 tn 9 sn 18043)
 [ 98560.538286] sd3d: error writing fsbn 7489986082 of 
 7489986082-7489986145 (sd3 bn 7489986082; cn 22716 tn 12 sn 1006)
 [ 98560.538286] sd3d: error reading fsbn 7415496930 of 
 7415496930-7415496993 (sd3 bn 7415496930; cn 22490 tn 13 sn 6159)
 [ 98560.538286] sd3d: error reading fsbn 7415496930 of 
 7415496930-7415496993 (sd3 bn 7415496930; cn 22490 tn 13 sn 6159)
 [ 98591.063909] uvm_fault(0xffffffff81901bc0, 0xffff860067907000, 1) -> e
 [ 98591.063909] fatal page fault in supervisor mode
 [ 98591.063909] trap type 6 code 0 rip 0xffffffff8025381b cs 0x8 rflags 
 0x10282 cr2 0xffff860067907070 ilevel 0 rsp 0xffff860396d57c20
 [ 98591.063909] curlwp 0xffff8186f0a7c940 pid 0.390 lowest kstack 
 0xffff860396d532c0
 [ 98591.063909] panic: trap
 [ 98591.063909] cpu2: Begin traceback...
 [ 98591.073939] vpanic() at netbsd:vpanic+0x156
 [ 98591.073939] panic() at netbsd:panic+0x3c
 [ 98591.073939] trap() at netbsd:trap+0xb27
 [ 98591.073939] --- trap (number 6) ---
 [ 98591.073939] sd_diskstart() at netbsd:sd_diskstart+0x38
 [ 98591.073939] dk_start() at netbsd:dk_start+0xed
 [ 98591.073939] spec_strategy() at netbsd:spec_strategy+0x9d
 [ 98591.073939] VOP_STRATEGY() at netbsd:VOP_STRATEGY+0x3c
 [ 98591.073939] dkstart() at netbsd:dkstart+0x184
 [ 98591.073939] spec_strategy() at netbsd:spec_strategy+0x9d
 [ 98591.083939] VOP_STRATEGY() at netbsd:VOP_STRATEGY+0x3c
 [ 98591.083939] bio_doread() at netbsd:bio_doread+0x96
 [ 98591.083939] bread() at netbsd:bread+0x18
 [ 98591.083939] ffs_update.part.0() at netbsd:ffs_update.part.0+0x129
 [ 98591.083939] ffs_full_fsync() at netbsd:ffs_full_fsync+0xf2
 [ 98591.083939] VOP_FSYNC() at netbsd:VOP_FSYNC+0x47
 [ 98591.093940] sched_sync() at netbsd:sched_sync+0x1a2
 [ 98591.093940] cpu2: End traceback...

 [ 98591.093940] dumping to dev 4,1 (offset=22227071, size=12581616):
 [ 98591.093940] dump <4>mfi0: workqueue busy: updates stopped
 [ 98649.955719] coretemp0: workqueue busy: updates stopped
 [ 98649.955719] coretemp1: workqueue busy: updates stopped
 [ 98649.955719] coretemp2: workqueue busy: updates stopped
 [ 98649.955719] coretemp3: workqueue busy: updates stopped


 (gdb) target kvm netbsd.30.core
 0xffffffff80226145 in cpu_reboot (howto=howto@entry=260,
      bootstr=bootstr@entry=0x0)
      at 
 /mnt/iscsi_iqn.1992-08.com.netapp/usr/src/sys/arch/amd64/amd64/machdep.c:720
 720                     dumpsys();
 (gdb) bt
 #0  0xffffffff80226145 in cpu_reboot (howto=howto@entry=260,
      bootstr=bootstr@entry=0x0)
      at 
 /mnt/iscsi_iqn.1992-08.com.netapp/usr/src/sys/arch/amd64/amd64/machdep.c:720
 #1  0xffffffff80d5e677 in kern_reboot (howto=howto@entry=260,
      bootstr=bootstr@entry=0x0)
      at /mnt/iscsi_iqn.1992-08.com.netapp/usr/src/sys/kern/kern_reboot.c:73
 #2  0xffffffff80da1d42 in vpanic (fmt=fmt@entry=0xffffffff813900f6 "trap",
      ap=ap@entry=0xffff860396d579e8)
      at /mnt/iscsi_iqn.1992-08.com.netapp/usr/src/sys/kern/subr_prf.c:290
 #3  0xffffffff80da1e07 in panic (fmt=fmt@entry=0xffffffff813900f6 "trap")
      at /mnt/iscsi_iqn.1992-08.com.netapp/usr/src/sys/kern/subr_prf.c:209
 #4  0xffffffff80228f67 in trap (frame=0xffff860396d57b30)
      at 
 /mnt/iscsi_iqn.1992-08.com.netapp/usr/src/sys/arch/amd64/amd64/trap.c:326
 #5  0xffffffff80221023 in alltraps ()
 #6  0xffff8186ef216b80 in ?? ()
 #7  0xffff8185d9255dd0 in ?? ()
 #8  0x0000000000000006 in ?? ()
 #9  0x000000000000001e in ?? ()
 #10 0xffff8186edb93a40 in ?? ()
 #11 0x0000000000080537 in ?? ()
 #12 0xffff8186f0981a40 in ?? ()
 #13 0xffff818588fec050 in ?? ()
 #14 0x0000000000000008 in ?? ()
 #15 0x0000000000001000 in ?? ()
 #16 0x0000000000000000 in ?? ()


 (gdb) list *(0xffffffff80226145)
 0xffffffff80226145 is in cpu_reboot 
 (/mnt/iscsi_iqn.1992-08.com.netapp/usr/src/sys/arch/amd64/amd64/machdep.c:720).
 715             /* Disable interrupts. */
 716             s = splhigh();
 717
 718             /* Do a dump if requested. */
 719             if ((howto & (RB_DUMP | RB_HALT)) == RB_DUMP)
 720                     dumpsys();
 721
 722     haltsys:
 723             doshutdownhooks();
 724

From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: bin/56728: iscsi redundancy doesn't work
Date: Wed, 23 Feb 2022 03:58:57 -0000 (UTC)

 6bone@6bone.informatik.uni-leipzig.de writes:

 >On Tue, 22 Feb 2022, Michael van Elst wrote:

 >>
 >> Do you have any messages from that crash? A Backtrace ?
 >>

 >Does that help?


 >[ 98560.538286] sd3d: error reading fsbn 7415496930 of 
 >7415496930-7415496993 (sd3 bn 7415496930; cn 22490 tn 13 sn 6159)
 >[ 98591.063909] uvm_fault(0xffffffff81901bc0, 0xffff860067907000, 1) -> e
 >[ 98591.063909] fatal page fault in supervisor mode
 >[ 98591.063909] trap type 6 code 0 rip 0xffffffff8025381b cs 0x8 rflags 
 >0x10282 cr2 0xffff860067907070 ilevel 0 rsp 0xffff860396d57c20
 >[ 98591.063909] curlwp 0xffff8186f0a7c940 pid 0.390 lowest kstack 


 Looks like sd_diskstart is running with a NULL periph pointer.
 I don't see how that happens yet, but it is probably the result of
 detaching the sd device while in use. The detach message is
 probably not yet printed.

 So that's one problem.

 Detaching the sd device happens when no connection to the iscsi
 server exists and no connection cannot be re-established
 either.

 For multiple connections to an iscsi server you need to do something
 like:

 add_send_target    -> add target to list
 refresh_targets    -> get portals
 login              -> establish session (creates sd)
 add_connection     -> add redundant connection to session

 the man page doesn't look correct.

 # iscsictl add_send_target -a x.x.x.x
 Added Send Target 1
 # iscsictl refresh_targets
 OK
 # iscsictl list_targets
      1: iqn.2007-09.jp.ne.peach.istgt:pbulk1
         2: x.x.x.x:3260,1
      3: iqn.2007-09.jp.ne.peach.istgt:test
         4: x.x.x.x:3260,1
 # iscsictl login -P 4
 Created Session 2, Connection 1
 # iscsictl list_sessions
 Session 2: Target iqn.2007-09.jp.ne.peach.istgt:test
 # iscsictl add_connection -I 2
 Added Connection 2

 tcp        0      0  y.y.y.y.65330       x.x.x.x.3260         ESTABLISHED
 tcp        0      0  y.y.y.y.65331       x.x.x.x.3260         ESTABLISHED

 # tcpdrop y.y.y.y 65530 x.x.x.x 3260

 [ 793856.693477] S2C2: *** Connection Error, status=18, logout=2, state=3
 [ 793856.693477] S2C2: Write failed sock 0xffff8524621c5480 (ret: 32, req: 48, resid: 48)
 [ 793856.693477] S2C2: *** Connection Error, status=18, logout=-1, state=5
 [ 793858.693531] S2C2: Connection ReCreated successfully - status 0


 tcp        0      0  y.y.y.y.65329       x.x.x.x.3260         ESTABLISHED
 tcp        0      0  y.y.y.y.65331       x.x.x.x.3260         ESTABLISHED

 Not exactly the same (only a single target IP), but it shows how the
 connection gets re-established.

From: 6bone@6bone.informatik.uni-leipzig.de
To: gnats-bugs@netbsd.org
Cc: gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: Re: bin/56728: iscsi redundancy doesn't work
Date: Wed, 23 Feb 2022 09:34:03 +0100 (CET)

 Hello,

 iscsictl add_send_target -a 172.18.86.130
 iscsictl add_send_target -a 172.18.86.131
 iscsictl refresh_targets
 OK

 iscsictl list_targets
       1: iqn.1992-08.com.netapp:naclaug
          2: 172.18.86.130:3260,1026
          3: 172.18.86.131:3260,1027

 iscsictl login -P 2
 Created Session 3, Connection 1

 iscsictl list_sessions
 Session 3: Target iqn.1992-08.com.netapp:naclaug

 iscsictl add_connection -I 3
 iscsictl: add_connection: The login failed
 -> crash (no drive is muted yet!)


 [  1326.174188] scsibus4 at iscsi0: 1 target, 16 luns per target
 [  1326.174188] sd3 at scsibus4 target 0 lun 11: <NETAPP, LUN C-Mode, 
 9700> disk fixed
 [  1326.174188] sd3: 10240 GB, 65129 cyl, 16 head, 20607 sec, 512 
 bytes/sect x 21474836480 sectors
 [  1326.174188] sd3: GPT GUID: d644c65b-110e-4dd0-9500-0cfc70900463
 [  1326.174188] dk0 at sd3: "NetApp01", 21474836413 blocks at 34, type: 
 ffs
 [  1326.174188] sd3: async, 8-bit transfers, tagged queueing
 [  1355.946583] S3C2: Login failed (rc 4)
 [  1355.946583] S3C2: *** Connection Error, status=18, logout=2, state=6
 [  1356.950793] dk0 at sd3 (NetApp01) deleted
 [  1356.950793] sd3: detached
 [  1356.950793] scsibus4: detached
 [  1358.296346] uvm_fault(0xffffffff819014c0, 0xffffd68067902000, 2) -> e
 [  1358.296346] fatal page fault in supervisor mode
 [  1358.306388] trap type 6 code 0x2 rip 0xffffffff8022d80c cs 0x8 rflags 
 0x10246 cr2 0xffffd680679021c0 ilevel 0 rsp 0xffffd68396627f08
 [  1358.306388] curlwp 0xfffff80d32d67a00 pid 0.554 lowest kstack 
 0xffffd683966232c0
 [  1358.306388] panic: trap
 [  1358.306388] cpu1: Begin traceback...
 [  1358.306388] vpanic() at netbsd:vpanic+0x156
 [  1358.306388] panic() at netbsd:panic+0x3c
 [  1358.306388] trap() at netbsd:trap+0xb27
 [  1358.306388] --- trap (number 6) ---
 [  1358.306388] mutex_enter() at netbsd:mutex_enter+0xc
 [  1358.306388] send_nop_out() at iscsi:send_nop_out+0x133
 [  1358.306388] connection_timeout() at iscsi:connection_timeout+0x4d
 [  1358.306388] iscsi_cleanup_thread() at iscsi:iscsi_cleanup_thread+0x7b2
 [  1358.306388] cpu1: End traceback...

 [  1358.306388] dumping to dev 4,1 (offset=22227071, size=12581616):
 [  1358.306388] dump <4>mfi0: workqueue busy: updates stopped
 [  1390.048733] coretemp0: workqueue busy: updates stopped
 [  1390.048733] coretemp1: workqueue busy: updates stopped
 [  1390.048733] coretemp2: workqueue busy: updates stopped
 [  1390.048733] coretemp3: workqueue busy: updates stopped
 ipmi0: workqueue busy: updates stopped

 (gdb) target kvm netbsd.31.core
 0xffffffff80226145 in cpu_reboot (howto=howto@entry=260,
      bootstr=bootstr@entry=0x0)
      at 
 /mnt/iscsi_iqn.1992-08.com.netapp/usr/src/sys/arch/amd64/amd64/machdep.c:720
 720                     dumpsys();

 (gdb) bt
 #0  0xffffffff80226145 in cpu_reboot (howto=howto@entry=260,
      bootstr=bootstr@entry=0x0)
      at 
 /mnt/iscsi_iqn.1992-08.com.netapp/usr/src/sys/arch/amd64/amd64/machdep.c:720
 #1  0xffffffff80d37917 in kern_reboot (howto=howto@entry=260,
      bootstr=bootstr@entry=0x0)
      at /mnt/iscsi_iqn.1992-08.com.netapp/usr/src/sys/kern/kern_reboot.c:73
 #2  0xffffffff80d7afe2 in vpanic (fmt=fmt@entry=0xffffffff81390116 "trap",
      ap=ap@entry=0xffffd68396627cc8)
      at /mnt/iscsi_iqn.1992-08.com.netapp/usr/src/sys/kern/subr_prf.c:290
 #3  0xffffffff80d7b0a7 in panic (fmt=fmt@entry=0xffffffff81390116 "trap")
      at /mnt/iscsi_iqn.1992-08.com.netapp/usr/src/sys/kern/subr_prf.c:209
 #4  0xffffffff80228f67 in trap (frame=0xffffd68396627e10)
      at 
 /mnt/iscsi_iqn.1992-08.com.netapp/usr/src/sys/arch/amd64/amd64/trap.c:326
 #5  0xffffffff80221023 in alltraps ()
 #6  0xffffd680679021c0 in ?? ()
 #7  0x0000000000000000 in ?? ()

 list *(0xffffffff80226145)
 0xffffffff80226145 is in cpu_reboot 
 (/mnt/iscsi_iqn.1992-08.com.netapp/usr/src/sys/arch/amd64/amd64/machdep.c:720).
 715             /* Disable interrupts. */
 716             s = splhigh();
 717
 718             /* Do a dump if requested. */
 719             if ((howto & (RB_DUMP | RB_HALT)) == RB_DUMP)
 720                     dumpsys();
 721
 722     haltsys:
 723             doshutdownhooks();
 724


 Thank you for your efforts

 Regards
 Uwe



 On Wed, 23 Feb 2022, Michael van Elst wrote:

 > Date: Wed, 23 Feb 2022 04:00:02 +0000 (UTC)
 > From: Michael van Elst <mlelstv@serpens.de>
 > Reply-To: gnats-bugs@netbsd.org
 > To: gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
 >     6bone@6bone.informatik.uni-leipzig.de
 > Subject: [Extern] Re: bin/56728: iscsi redundancy doesn't work
 > 
 > The following reply was made to PR bin/56728; it has been noted by GNATS.
 >
 > From: mlelstv@serpens.de (Michael van Elst)
 > To: gnats-bugs@netbsd.org
 > Cc:
 > Subject: Re: bin/56728: iscsi redundancy doesn't work
 > Date: Wed, 23 Feb 2022 03:58:57 -0000 (UTC)
 >
 > 6bone@6bone.informatik.uni-leipzig.de writes:
 >
 > >On Tue, 22 Feb 2022, Michael van Elst wrote:
 >
 > >>
 > >> Do you have any messages from that crash? A Backtrace ?
 > >>
 >
 > >Does that help?
 >
 >
 > >[ 98560.538286] sd3d: error reading fsbn 7415496930 of
 > >7415496930-7415496993 (sd3 bn 7415496930; cn 22490 tn 13 sn 6159)
 > >[ 98591.063909] uvm_fault(0xffffffff81901bc0, 0xffff860067907000, 1) -> e
 > >[ 98591.063909] fatal page fault in supervisor mode
 > >[ 98591.063909] trap type 6 code 0 rip 0xffffffff8025381b cs 0x8 rflags
 > >0x10282 cr2 0xffff860067907070 ilevel 0 rsp 0xffff860396d57c20
 > >[ 98591.063909] curlwp 0xffff8186f0a7c940 pid 0.390 lowest kstack
 >
 >
 > Looks like sd_diskstart is running with a NULL periph pointer.
 > I don't see how that happens yet, but it is probably the result of
 > detaching the sd device while in use. The detach message is
 > probably not yet printed.
 >
 > So that's one problem.
 >
 > Detaching the sd device happens when no connection to the iscsi
 > server exists and no connection cannot be re-established
 > either.
 >
 > For multiple connections to an iscsi server you need to do something
 > like:
 >
 > add_send_target    -> add target to list
 > refresh_targets    -> get portals
 > login              -> establish session (creates sd)
 > add_connection     -> add redundant connection to session
 >
 > the man page doesn't look correct.
 >
 > # iscsictl add_send_target -a x.x.x.x
 > Added Send Target 1
 > # iscsictl refresh_targets
 > OK
 > # iscsictl list_targets
 >      1: iqn.2007-09.jp.ne.peach.istgt:pbulk1
 >         2: x.x.x.x:3260,1
 >      3: iqn.2007-09.jp.ne.peach.istgt:test
 >         4: x.x.x.x:3260,1
 > # iscsictl login -P 4
 > Created Session 2, Connection 1
 > # iscsictl list_sessions
 > Session 2: Target iqn.2007-09.jp.ne.peach.istgt:test
 > # iscsictl add_connection -I 2
 > Added Connection 2
 >
 > tcp        0      0  y.y.y.y.65330       x.x.x.x.3260         ESTABLISHED
 > tcp        0      0  y.y.y.y.65331       x.x.x.x.3260         ESTABLISHED
 >
 > # tcpdrop y.y.y.y 65530 x.x.x.x 3260
 >
 > [ 793856.693477] S2C2: *** Connection Error, status=18, logout=2, state=3
 > [ 793856.693477] S2C2: Write failed sock 0xffff8524621c5480 (ret: 32, req: 48, resid: 48)
 > [ 793856.693477] S2C2: *** Connection Error, status=18, logout=-1, state=5
 > [ 793858.693531] S2C2: Connection ReCreated successfully - status 0
 >
 >
 > tcp        0      0  y.y.y.y.65329       x.x.x.x.3260         ESTABLISHED
 > tcp        0      0  y.y.y.y.65331       x.x.x.x.3260         ESTABLISHED
 >
 > Not exactly the same (only a single target IP), but it shows how the
 > connection gets re-established.
 >
 >

From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: bin/56728: iscsi redundancy doesn't work
Date: Wed, 23 Feb 2022 12:49:37 -0000 (UTC)

 6bone@6bone.informatik.uni-leipzig.de writes:

 >iscsictl list_sessions
 >Session 3: Target iqn.1992-08.com.netapp:naclaug

 >iscsictl add_connection -I 3
 >iscsictl: add_connection: The login failed
 >-> crash (no drive is muted yet!)


 That's a second connection to the same portal (not the second portal
 ) and apparently it is exclusive and rejected by the target. Shouldn't
 crash anything though.

 The command

 iscsictl add_connection -I 3 -P 3

 should add a connection for session 3 to the second portal number 3.

 I'll try to set up something similar for testing...

From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: bin/56728: iscsi redundancy doesn't work
Date: Sun, 27 Feb 2022 15:09:02 -0000 (UTC)

 mlelstv@serpens.de (Michael van Elst) writes:

 > The command
 > iscsictl add_connection -I 3 -P 3
 > should add a connection for session 3 to the second portal number 3.
 > 
 > I'll try to set up something similar for testing...

 This part does work. I can create two redundant connections, cut network
 connectivity for one net (e.g. by dropping routes) and restore it and
 everything stays stable.

 Trying to logout from that session fails somewhat. Access to the
 devices fails immediately with I/O error and after some timeout
 the session is dropped and the devices detach.

 Removing each connection individually with 'remove_connection' looks
 more sane. When last connection is removed, the session is dropped
 and the devices detach.

 Simply stopping the target would also cause infinite retries.
 You can trace all this by increasing hw.iscsi.debug with sysctl.

From: 6bone@6bone.informatik.uni-leipzig.de
To: gnats-bugs@netbsd.org
Cc: gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: Re: bin/56728: iscsi redundancy doesn't work
Date: Mon, 28 Feb 2022 09:08:57 +0100 (CET)

 On Sun, 27 Feb 2022, Michael van Elst wrote:

 > Date: Sun, 27 Feb 2022 15:10:01 +0000 (UTC)
 > From: Michael van Elst <mlelstv@serpens.de>
 > Reply-To: gnats-bugs@netbsd.org
 > To: gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
 >     6bone@6bone.informatik.uni-leipzig.de
 > Subject: [Extern] Re: bin/56728: iscsi redundancy doesn't work
 > 
 > The following reply was made to PR bin/56728; it has been noted by GNATS.
 >
 > From: mlelstv@serpens.de (Michael van Elst)
 > To: gnats-bugs@netbsd.org
 > Cc:
 > Subject: Re: bin/56728: iscsi redundancy doesn't work
 > Date: Sun, 27 Feb 2022 15:09:02 -0000 (UTC)

 iscsictl list_targets
       1: iqn.1992-08.com.netapp:naclaug
          2: 172.18.86.130:3260,1026
          3: 172.18.86.131:3260,1027

 iscsictl list_sessions -c
 Session 3: Target iqn.1992-08.com.netapp:naclaug
   Connection 1 Address 172.18.86.130:3260,1026

 iscsictl add_connection -I 3 -P 3
 iscsictl: add_connection: The login failed
 -> crash

 Here is some debug output. At time 881 'iscsictl login -P 2' was run. At 
 1669 the iscsictl add_connection -I 3 -P 3. Nothing between.

 [   863.216616] ISCSI Ioctl cmd = c0680002
 [   863.216616] Create Connection for Session 1
 [   863.216616] S1C1: get_socket: par_sock=8, fdesc=0xfffffe6856d01680
 [   863.216616] ISCSI Ioctl cmd = c1000008
 [   863.216616] ISCSI Ioctl cmd = c1000008
 [   863.216616] ISCSI Ioctl cmd = c0080003
 [   863.216616] ISCSI: kill_session 1, status 29, logout 0, recover 0
 [   863.216616] S1C1: Kill_connection: terminating=0, status=29, logout=0, 
 state=3
 [   863.216616] S1C1: Kill_connection setting destroy flag
 [   863.216616] S1C1: Received Logout PDU - CCB = 0xffffd600678fad00, 
 otherconn=0, response=0
 [   863.216616] S1C1: Closing Socket 0xfffffe6856d01680
 [   863.216616] S1C1: TERMINATED
 [   863.216616] S1C1: Waiting for conn_idle
 [   863.216616] S1C1: Waited for conn_idle, destroy = 1
 [   863.216616] ISCSI Ioctl cmd = c018000d
 [   863.216616] S1C1: Send thread exits
 [   863.216616] Cleanup: session 1
 [   863.216616] Cleanup: Unmap session 1
 [   863.216616] Cleanup: session ended 1
 [   863.216616] ISCSI Ioctl cmd = c0680002
 [   863.216616] Create Connection for Session 2
 [   863.216616] S2C1: get_socket: par_sock=8, fdesc=0xfffffe68556c2b00
 [   863.216616] ISCSI Ioctl cmd = c018000d
 [   863.216616] ISCSI Ioctl cmd = c1000008
 [   863.216616] ISCSI Ioctl cmd = c1000008
 [   863.216616] ISCSI Ioctl cmd = c0080003
 [   863.216616] ISCSI: kill_session 2, status 29, logout 0, recover 0
 [   863.216616] S2C1: Kill_connection: terminating=0, status=29, logout=0, 
 state=3
 [   863.216616] S2C1: Kill_connection setting destroy flag
 [   863.216616] S2C1: Received Logout PDU - CCB = 0xffffd600678fad00, 
 otherconn=0, response=0
 [   863.216616] S2C1: Closing Socket 0xfffffe68556c2b00
 [   863.216616] S2C1: TERMINATED
 [   863.216616] S2C1: ISCSI Ioctl cmd = c018000d
 [   863.216616] Waiting for conn_idle
 [   863.216616] S2C1: Waited for conn_idle, destroy = 1
 [   863.216616] S2C1: Send thread exits
 [   863.216616] Cleanup: session 2
 [   863.216616] Cleanup: Unmap session 2
 [   863.216616] Cleanup: session ended 2
 [   863.216616] ISCSI Ioctl cmd = c018000d
 [   881.873938] ISCSI Ioctl cmd = c0680002
 [   881.873938] Create Connection for Session 3
 [   881.873938] S3C1: get_socket: par_sock=8, fdesc=0xfffffe6856d01680
 [   881.884021] Login: map session 3
 [   881.884021] scsibus4 at iscsi0: 1 target, 16 luns per target
 [   881.884021] sd3 at scsibus4 target 0 lun 11: <NETAPP, LUN C-Mode, 
 9700> disk fixed
 [   881.884021] sd3: 10240 GB, 65129 cyl, 16 head, 20607 sec, 512 
 bytes/sect x 21474836480 sectors
 [   881.884021] sd3: GPT GUID: d644c65b-110e-4dd0-9500-0cfc70900463
 [   881.884021] dk0 at sd3: "NetApp01", 21474836413 blocks at 34, type: 
 ffs
 [   881.884021] sd3: async, 8-bit transfers, tagged queueing
 [  1669.006479] ISCSI Ioctl cmd = c0680004
 [  1669.006479] Create Connection for Session 3
 [  1669.006479] S3C2: get_socket: par_sock=8, fdesc=0xfffffe68527dd7c0
 [  1669.006479] S3C2: Login problem - Class = 2, Detail = a
 [  1669.006479] S3C2: Login failed (rc 4)
 [  1669.006479] S3C2: Kill_connection: terminating=0, status=4, logout=-1, 
 state=0
 [  1669.006479] S3C2: Kill_connection setting destroy flag
 [  1669.006479] S3C2: Closing Socket 0xfffffe68527dd7c0
 [  1669.006479] S3C2: S3C2: Read failed (ret: 0, req: 48, out: 48)
 [  1669.006479] TERMINATED
 [  1669.006479] S3C2: ISCSI Ioctl cmd = c018000d
 [  1669.006479] S3C2: *** Connection Error, status=18, logout=2, state=6
 [  1669.006479] Waiting for conn_idle
 [  1669.006479] S3C2: Waited for conn_idle, destroy = 1
 [  1669.006479] S3C2: Send thread exits
 [  1669.006479] Cleanup: session 3
 [  1669.006479] Cleanup: Unmap session 3
 [  1669.006479] dk0 at sd3 (NetApp01) deleted
 [  1669.006479] sd3: detached
 [  1669.006479] scsibus4: detached
 [  1669.006479] Cleanup: session ended 3
 [  1669.006479] ISCSI Ioctl cmd = c018000d


 >
 > Trying to logout from that session fails somewhat. Access to the
 > devices fails immediately with I/O error and after some timeout
 > the session is dropped and the devices detach.
 >
 > Removing each connection individually with 'remove_connection' looks
 > more sane. When last connection is removed, the session is dropped
 > and the devices detach.
 >
 > Simply stopping the target would also cause infinite retries.
 > You can trace all this by increasing hw.iscsi.debug with sysctl.
 >
 >


 Thank you for your efforts

 Regards
 Uwe

From: 6bone@6bone.informatik.uni-leipzig.de
To: gnats-bugs@netbsd.org
Cc: gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: Re: bin/56728: iscsi redundancy doesn't work
Date: Wed, 2 Mar 2022 13:42:19 +0100 (CET)

 Hello,

 if it helps with troubleshooting, here is a screenshot of recent iscsi 
 crashes. I can't provoke the crash. The problem has happened to me several 
 times. A dump is not written. Therefore I can only offer the image.

 It is always the position with "conn != zero".

 https://speicherwolke.uni-leipzig.de/index.php/s/fqjRTwTXszL2pXA


 Thank you for your efforts


 Regards
 Uwe

From: 6bone@6bone.informatik.uni-leipzig.de
To: gnats-bugs@netbsd.org
Cc: gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: Re: bin/56728: iscsi redundancy doesn't work
Date: Sat, 30 Apr 2022 11:19:41 +0200 (CEST)

 Hello,

 I tested with kernel version 9.99.96. The problem is still there. I now 
 think there are three independent problems.

 Problem 1: it is possible to reproduce a kernel crash with iscsi. This is 
 independent of the iscsi redundancy.

 Problem 2: in my environment it is not possible to set up a redundant 
 environment.

 Problem 3: the manpage is wrong.


 Details on problem 1: If you use iscsictl to open and close connections to 
 the iscsi target, the kernel crashes. iscsi devices are not mounted at 
 this time. Here is a kernel dump:


 [ 40275.994983] dk0 at sd3 (NetApp01) deleted
 [ 40275.994983] sd3: detached
 [ 40275.994983] scsibus4: detached
 [ 40282.903575] uvm_fault(0xffffffff81903e00, 0xffffa08067901000, 2) -> e
 [ 40282.903575] fatal page fault in supervisor mode
 [ 40282.903575] trap type 6 code 0x2 rip 0xffffffff8022d84c cs 0x8 rflags 
 0x10246 cr2 0xffffa080679011c0 ilevel 0 rsp 0xffffa08397034f08
 [ 40282.903575] curlwp 0xffffecdb7be4db00 pid 0.581 lowest kstack 
 0xffffa083970302c0
 [ 40282.903575] panic: trap
 [ 40282.903575] cpu0: Begin traceback...
 [ 40282.903575] vpanic() at netbsd:vpanic+0x183
 [ 40282.903575] panic() at netbsd:panic+0x3c
 [ 40282.903575] trap() at netbsd:trap+0xb27
 [ 40282.903575] --- trap (number 6) ---
 [ 40282.903575] mutex_enter() at netbsd:mutex_enter+0xc
 [ 40282.903575] send_nop_out() at iscsi:send_nop_out+0x133
 [ 40282.903575] connection_timeout() at iscsi:connection_timeout+0x4d
 [ 40282.903575] iscsi_cleanup_thread() at iscsi:iscsi_cleanup_thread+0x7b2
 [ 40282.903575] cpu0: End traceback...
 [ 40282.903575] dumping to dev 4,1 (offset=22227071, size=12581616):
 [ 40282.903575] dump <7>lagg0: link state DOWN (was UP)



 Problem 2 details: We are using a NetApp iscsi cluster. Multiple Netapp 
 controllers are used for redundancy. These provide the same device (iqn). 
 If you establish a session to the second controller, NetBSD does not 
 recognize that it is the same iqn. NetBSD thinks it's a new device:

 [ 39843.450611] scsibus5 at iscsi0: 1 target, 16 luns per target
 [ 39843.450611] sd4 at scsibus5 target 0 lun 11: <NETAPP, LUN C-Mode, 9700> disk fixed
 [ 39843.450611] sd4: 10240 GB, 65129 cyl, 16 head, 20607 sec, 512 bytes/sect x 21474836480 sectors
 [ 39843.450611] sd4: GPT GUID: d644c65b-110e-4dd0-9500-0cfc70900463
 [ 39843.450611] dk1 at sd4: "02203a72-215e-4912-9815-a5025bdd8124", 21474836413blocks at 34, type: ffs
 [ 39843.450611] autoconfiguration error: sd4: wedge named 'NetApp01' already existed, using '02203a72-215e-4912-9815-a5025bdd8124'
 [ 39843.450611] sd4: async, 8-bit transfers, tagged queueing
 [ 39929.604877] dk1 at sd4 (02203a72-215e-4912-9815-a5025bdd8124) deleted
 [ 39929.604877] sd4: detached
 [ 39929.604877] scsibus5: detached

 Problem 3: the man page for iscsictl contains errors. For example, the 
 add_connection command requires an -I parameter that is not described. The 
 meaning of -m is not described. There seem to be errors with other 
 commands as well.

 Thank you for your efforts


 Regards
 Uwe


 On Wed, 2 Mar 2022, 6bone@6bone.informatik.uni-leipzig.de wrote:

 > Date: Wed, 2 Mar 2022 13:42:19 +0100 (CET)
 > From: 6bone@6bone.informatik.uni-leipzig.de
 > To: gnats-bugs@netbsd.org
 > Cc: gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
 > Subject: Re: [Extern] Re: bin/56728: iscsi redundancy doesn't work
 > 
 > Hello,
 >
 > if it helps with troubleshooting, here is a screenshot of recent iscsi 
 > crashes. I can't provoke the crash. The problem has happened to me several 
 > times. A dump is not written. Therefore I can only offer the image.
 >
 > It is always the position with "conn != zero".
 >
 > https://speicherwolke.uni-leipzig.de/index.php/s/fqjRTwTXszL2pXA
 >
 >
 > Thank you for your efforts
 >
 >
 > Regards
 > Uwe
 >

From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: bin/56728: iscsi redundancy doesn't work
Date: Sat, 30 Apr 2022 10:02:23 -0000 (UTC)

 6bone@6bone.informatik.uni-leipzig.de writes:

 >Details on problem 1: If you use iscsictl to open and close connections to 
 >the iscsi target, the kernel crashes. iscsi devices are not mounted at 
 >this time. Here is a kernel dump:

 >[ 40282.903575] trap type 6 code 0x2 rip 0xffffffff8022d84c cs 0x8 rflags 
 >0x10246 cr2 0xffffa080679011c0 ilevel 0 rsp 0xffffa08397034f08
 >[ 40282.903575] curlwp 0xffffecdb7be4db00 pid 0.581 lowest kstack 
 >0xffffa083970302c0
 >[ 40282.903575] panic: trap
 >[ 40282.903575] cpu0: Begin traceback...
 >[ 40282.903575] vpanic() at netbsd:vpanic+0x183
 >[ 40282.903575] panic() at netbsd:panic+0x3c
 >[ 40282.903575] trap() at netbsd:trap+0xb27
 >[ 40282.903575] --- trap (number 6) ---
 >[ 40282.903575] mutex_enter() at netbsd:mutex_enter+0xc
 >[ 40282.903575] send_nop_out() at iscsi:send_nop_out+0x133
 >[ 40282.903575] connection_timeout() at iscsi:connection_timeout+0x4d
 >[ 40282.903575] iscsi_cleanup_thread() at iscsi:iscsi_cleanup_thread+0x7b2
 >[ 40282.903575] cpu0: End traceback...
 >[ 40282.903575] dumping to dev 4,1 (offset=22227071, size=12581616):
 >[ 40282.903575] dump <7>lagg0: link state DOWN (was UP)


 If that's reproducable, please provide a log with ISCSI_DEBUG.



 >Problem 2 details: We are using a NetApp iscsi cluster. Multiple Netapp 
 >controllers are used for redundancy. These provide the same device (iqn). 
 >If you establish a session to the second controller, NetBSD does not 
 >recognize that it is the same iqn. NetBSD thinks it's a new device:

 It is a new device. Linux supports multipathing (in the device mapper
 driver) that detects such duplicates, handles the redundancy and
 provides a virtual disk.

 Unfortunately our device mapper still lacks multipathing support
 (and also mirror support, but that's another story).




 >Problem 3: the man page for iscsictl contains errors. For example, the 
 >add_connection command requires an -I parameter that is not described. The 
 >meaning of -m is not described. There seem to be errors with other 
 >commands as well.

 Yes. The man page is neither correct nor complete.



 Greetings,

Responsible-Changed-From-To: bin-bug-people->mlelstv
Responsible-Changed-By: mlelstv@NetBSD.org
Responsible-Changed-When: Sun, 11 Sep 2022 22:21:30 +0000
Responsible-Changed-Why:
mine


>Unformatted:

NetBSD Home
NetBSD PR Database Search

(Contact us) $NetBSD: query-full-pr,v 1.47 2022/09/11 19:34:41 kim Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2022 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.