NetBSD Problem Report #58422

From www@netbsd.org  Fri Jul 12 09:01:30 2024
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	 key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256
	 client-signature RSA-PSS (2048 bits) client-digest SHA256)
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 4F1F71A9238
	for <gnats-bugs@gnats.NetBSD.org>; Fri, 12 Jul 2024 09:01:30 +0000 (UTC)
Message-Id: <20240712090129.010901A9239@mollari.NetBSD.org>
Date: Fri, 12 Jul 2024 09:01:28 +0000 (UTC)
From: 6bone@6bone.informatik.uni-leipzig.de
Reply-To: 6bone@6bone.informatik.uni-leipzig.de
To: gnats-bugs@NetBSD.org
Subject: kernel crash when using the iscsi initiator
X-Send-Pr-Version: www-1.0

>Number:         58422
>Category:       kern
>Synopsis:       kernel crash when using the iscsi initiator
>Confidential:   no
>Severity:       critical
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri Jul 12 09:05:00 +0000 2024
>Last-Modified:  Wed Jul 31 12:05:01 +0000 2024
>Originator:     Uwe Toenjes
>Release:        NetBSD 10.0_STABLE
>Organization:
University of Leipzig
>Environment:
NetBSD 6bone.informatik.uni-leipzig.de 10.0_STABLE NetBSD 10.0_STABLE (MYCONF10.gdb) #1: Mon Jul  8 15:34:51 CEST 2024  root@6bone.informatik.uni-leipzig.de:/usr/obj/sys/arch/amd64/compile/MYCONF10.gdb amd64
>Description:
If you use the iscsi initiator, the kernel crashes after a few days. The behavior is reproducible.

output from: dmesg -M netbsd.47.core -N /netbsd

....
[ 44363.227068] S3C1: freeing UNUSED pdu
[ 44389.708635] S3C1: freeing UNUSED pdu
[ 44396.580012] S3C1: freeing UNUSED pdu
[ 44415.335063] S3C1: freeing UNUSED pdu
[ 44431.093458] S3C1: freeing UNUSED pdu
[ 44463.477924] S3C1: freeing UNUSED pdu
[ 44464.509811] S3C1: freeing UNUSED pdu
[ 44520.645690] S3C1: freeing UNUSED pdu
[ 44531.724707] S3C1: freeing UNUSED pdu
[ 85126.220101] S-1C-1: ccb_timeout: num=1 total=1 disp=0
[ 85126.220101] uvm_fault(0xffffffff8195d9a0, 0x0, 1) -> e
[ 85126.220101] fatal page fault in supervisor mode
[ 85126.220101] trap type 6 code 0 rip 0xffffffff80d8fc5d cs 0x8 rflags 0x10202 cr2 0x10 ilevel 0 rsp 0xffff940c9666ff10
[ 85126.220101] curlwp 0xffff82c9d7b87b00 pid 0.607 lowest kstack 0xffff940c9666b2c0
[ 85126.220101] panic: trap
[ 85126.220101] cpu2: Begin traceback...
[ 85126.220101] vpanic() at netbsd:vpanic+0x183
[ 85126.220101] panic() at netbsd:panic+0x3c
[ 85126.220101] trap() at netbsd:trap+0xb24
[ 85126.220101] --- trap (number 6) ---
[ 85126.230191] mutex_enter() at netbsd:mutex_enter+0x3e
[ 85126.230191] wake_ccb() at iscsi:wake_ccb+0x8e
[ 85126.230191] iscsi_cleanup_thread() at iscsi:iscsi_cleanup_thread+0x351
[ 85126.230191] cpu2: End traceback...

[ 85126.230191] dumping to dev 4,1 (offset=22227071, size=12581616):
[ 85126.230191] dump <4>ipmi0: workqueue busy: updates stopped
[ 85171.151680] mfi0: workqueue busy: updates stopped
[ 85171.151680] coretemp0: workqueue busy: updates stopped
[ 85171.151680] coretemp1: workqueue busy: updates stopped
[ 85171.151680] coretemp2: workqueue busy: updates stopped
[ 85171.151680] coretemp3: workqueue busy: updates stopped

gdb output:

Reading symbols from /netbsd...
(gdb) target kvm netbsd.47.core
0xffffffff80237015 in cpu_reboot (howto=howto@entry=260, bootstr=bootstr@entry=0x0) at /usr/src/sys/arch/amd64/amd64/machdep.c:717
717                     dumpsys();
(gdb) bt
#0  0xffffffff80237015 in cpu_reboot (howto=howto@entry=260, bootstr=bootstr@entry=0x0) at /usr/src/sys/arch/amd64/amd64/machdep.c:717
#1  0xffffffff80d9da24 in kern_reboot (howto=howto@entry=260, bootstr=bootstr@entry=0x0) at /usr/src/sys/kern/kern_reboot.c:73
#2  0xffffffff80de7d8d in vpanic (fmt=fmt@entry=0xffffffff8139b6c8 "trap", ap=ap@entry=0xffff940c9666fcd8) at /usr/src/sys/kern/subr_prf.c:291
#3  0xffffffff80de7e62 in panic (fmt=fmt@entry=0xffffffff8139b6c8 "trap") at /usr/src/sys/kern/subr_prf.c:208
#4  0xffffffff80239e17 in trap (frame=0xffff940c9666fe20) at /usr/src/sys/arch/amd64/amd64/trap.c:326
#5  0xffffffff80231ef4 in alltraps ()
#6  0x0000000000000010 in ?? ()
#7  0xffff9400677341c8 in ?? ()
#8  0x0000000000000006 in ?? ()
#9  0x0000000000000000 in ?? ()


>How-To-Repeat:
See description. If you need additional information or the complete dump file, please let me know.
>Fix:

>Audit-Trail:
From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/58422: kernel crash when using the iscsi initiator
Date: Sat, 13 Jul 2024 07:59:10 -0000 (UTC)

 6bone@6bone.informatik.uni-leipzig.de writes:

 >NetBSD 6bone.informatik.uni-leipzig.de 10.0_STABLE NetBSD 10.0_STABLE (MYCONF10.gdb) #1: Mon Jul  8 15:34:51 CEST 2024  root@6bone.informatik.uni-leipzig.de:/usr/obj/sys/arch/amd64/compile/MYCONF10.gdb amd64

 >[ 85126.220101] cpu2: Begin traceback...
 >[ 85126.220101] vpanic() at netbsd:vpanic+0x183
 >[ 85126.220101] panic() at netbsd:panic+0x3c
 >[ 85126.220101] trap() at netbsd:trap+0xb24
 >[ 85126.220101] --- trap (number 6) ---
 >[ 85126.230191] mutex_enter() at netbsd:mutex_enter+0x3e
 >[ 85126.230191] wake_ccb() at iscsi:wake_ccb+0x8e
 >[ 85126.230191] iscsi_cleanup_thread() at iscsi:iscsi_cleanup_thread+0x351
 >[ 85126.230191] cpu2: End traceback...


 That's a use-after-free, when a connection is shut down.

 The cleanup thread handles timed out CCBs that have already
 been freed with the closing connection.

 Can you try to recreate the crash with a DIAGNOSTIC kernel and/or
 set the sysctl hw.iscsi.debug to 9 ?

 It might be easier to recreate if you know how the connection
 is shut down. Maybe some reconfig or reboot of the iscsi target
 or some network issue.

From: 6bone@6bone.informatik.uni-leipzig.de
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: Re: kern/58422: kernel crash when using the iscsi initiator
Date: Sun, 14 Jul 2024 20:58:05 +0200 (CEST)

 On Sat, 13 Jul 2024, Michael van Elst wrote:

 > Date: Sat, 13 Jul 2024 08:00:03 +0000 (UTC)
 > From: Michael van Elst <mlelstv@serpens.de>
 > Reply-To: gnats-bugs@netbsd.org
 > To: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
 >     netbsd-bugs@netbsd.org, 6bone@6bone.informatik.uni-leipzig.de
 > Subject: [Extern] Re: kern/58422: kernel crash when using the iscsi initiator
 > 
 > The following reply was made to PR kern/58422; it has been noted by GNATS.
 >
 > From: mlelstv@serpens.de (Michael van Elst)
 > To: gnats-bugs@netbsd.org
 > Cc:
 > Subject: Re: kern/58422: kernel crash when using the iscsi initiator
 > Date: Sat, 13 Jul 2024 07:59:10 -0000 (UTC)
 >
 > 6bone@6bone.informatik.uni-leipzig.de writes:
 >
 > >NetBSD 6bone.informatik.uni-leipzig.de 10.0_STABLE NetBSD 10.0_STABLE (MYCONF10.gdb) #1: Mon Jul  8 15:34:51 CEST 2024  root@6bone.informatik.uni-leipzig.de:/usr/obj/sys/arch/amd64/compile/MYCONF10.gdb amd64
 >
 > >[ 85126.220101] cpu2: Begin traceback...
 > >[ 85126.220101] vpanic() at netbsd:vpanic+0x183
 > >[ 85126.220101] panic() at netbsd:panic+0x3c
 > >[ 85126.220101] trap() at netbsd:trap+0xb24
 > >[ 85126.220101] --- trap (number 6) ---
 > >[ 85126.230191] mutex_enter() at netbsd:mutex_enter+0x3e
 > >[ 85126.230191] wake_ccb() at iscsi:wake_ccb+0x8e
 > >[ 85126.230191] iscsi_cleanup_thread() at iscsi:iscsi_cleanup_thread+0x351
 > >[ 85126.230191] cpu2: End traceback...
 >
 >
 > That's a use-after-free, when a connection is shut down.
 >
 > The cleanup thread handles timed out CCBs that have already
 > been freed with the closing connection.

 I don't know if it's relevant, but connecting the iscsi initiator occurs 
 without a noticeable message in the dmesg.

 After you start using the iscsi device, the messages "S3C1: freeing UNUSED 
 pdu" appear regularly in the dmesg.

 This goes on for a few days until the crash.

 >
 > Can you try to recreate the crash with a DIAGNOSTIC kernel and/or
 > set the sysctl hw.iscsi.debug to 9 ?
 >

 The dump from the bug report already comes from a kernel with DIAGNOSTIC 
 enabled. I will try to cause the crash with hw.iscsi.debug=9.

 > It might be easier to recreate if you know how the connection
 > is shut down. Maybe some reconfig or reboot of the iscsi target
 > or some network issue.
 >

 The iscsi connection was only started once during boot. Then it ran until 
 it crashed. The iscsi target is a NetApp system that runs very stable. I 
 think a network problem is unlikely.


 Thank you for your efforts

 Regards
 Uwe

From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/58422: kernel crash when using the iscsi initiator
Date: Mon, 15 Jul 2024 04:42:28 -0000 (UTC)

 6bone@6bone.informatik.uni-leipzig.de writes:

 >The dump from the bug report already comes from a kernel with DIAGNOSTIC 
 >enabled. I will try to cause the crash with hw.iscsi.debug=9.

 This is a bit strange. A DIAGNOSTIC kernel should have triggered an
 assertion in wake_ccb() instead of crashing with an UVM fault.

 The reason is probably that your kernel doesn't have iscsi builtin,
 but loaded as a module (which is built without DIAGNOSTIC).

From: 6bone@6bone.informatik.uni-leipzig.de
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: Re: kern/58422: kernel crash when using the iscsi initiator
Date: Mon, 15 Jul 2024 11:29:00 +0200 (CEST)

 On Mon, 15 Jul 2024, Michael van Elst wrote:

 > Date: Mon, 15 Jul 2024 04:45:01 +0000 (UTC)
 > From: Michael van Elst <mlelstv@serpens.de>
 > Reply-To: gnats-bugs@netbsd.org
 > To: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
 >     netbsd-bugs@netbsd.org, 6bone@6bone.informatik.uni-leipzig.de
 > Subject: [Extern] Re: kern/58422: kernel crash when using the iscsi initiator
 > 
 > The following reply was made to PR kern/58422; it has been noted by GNATS.
 >
 > From: mlelstv@serpens.de (Michael van Elst)
 > To: gnats-bugs@netbsd.org
 > Cc:
 > Subject: Re: kern/58422: kernel crash when using the iscsi initiator
 > Date: Mon, 15 Jul 2024 04:42:28 -0000 (UTC)
 >
 > 6bone@6bone.informatik.uni-leipzig.de writes:
 >
 > >The dump from the bug report already comes from a kernel with DIAGNOSTIC
 > >enabled. I will try to cause the crash with hw.iscsi.debug=9.
 >
 > This is a bit strange. A DIAGNOSTIC kernel should have triggered an
 > assertion in wake_ccb() instead of crashing with an UVM fault.
 >
 > The reason is probably that your kernel doesn't have iscsi builtin,
 > but loaded as a module (which is built without DIAGNOSTIC).
 >
 >
 You're right. The iscsi is used as a module. I'm creating a new kernel 
 with iscsi built in.

 You're probably right in your assumption that it's a network problem. I 
 noticed that when the iscsi load is high, the load on CPU 0 due to 
 interrupts is extremely high.

 load averages:  3.73,  3.73,  2.93;               up 0+11:53:45        08:59:41
 53 processes: 1 runnable, 50 sleeping, 2 on CPU
 CPU0 states:  0.0% user,  0.0% nice,  4.8% system, 95.2% interrupt,  0.0% idle
 CPU1 states:  0.0% user,  0.0% nice, 95.6% system,  0.0% interrupt,  4.4% idle
 CPU2 states:  0.0% user,  0.0% nice, 89.5% system,  1.0% interrupt,  9.5% idle
 CPU3 states:  0.0% user,  0.0% nice, 96.8% system,  0.0% interrupt,  3.2% idle
 Memory: 24G Act, 12G Inact, 17M Wired, 24M Exec, 36G File, 19M Free
 Swap: 59G Total, 59G Free / Pools: 8105M Used / Network: 1400K In, 3312K 
 Out

    PID USERNAME PRI NICE   SIZE   RES STATE       TIME   WCPU    CPU  COMMAND
 0     root     123    0     0K  422M CPU/3     805:15   238%   238%  [system]
 10668 mirror    25    0   298M  248M RUN/1       6:42 56.69% 56.69%  rsync
 ...

 With the high CPU load, network packets can probably be lost.


 Thank you for your efforts

 Regards
 Uwe

From: 6bone@6bone.informatik.uni-leipzig.de
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: Re: kern/58422: kernel crash when using the iscsi initiator
Date: Sat, 20 Jul 2024 07:41:59 +0200 (CEST)

 I have an new dump. Output from "dmesg -T -M netbsd.48.core"

 ...
 [Sun Jul 21 21:21:37 CEST 2024] S3C1: freeing UNUSED pdu
 [Sun Jul 21 21:21:37 CEST 2024] S3C1: freeing UNUSED pdu
 [Sun Jul 21 21:21:38 CEST 2024] S3C1: freeing UNUSED pdu
 [Sun Jul 21 21:21:38 CEST 2024] S3C1: freeing UNUSED pdu
 [Sun Jul 21 21:21:38 CEST 2024] S3C1: freeing UNUSED pdu
 [Sun Jul 21 21:21:59 CEST 2024] S3C1: freeing UNUSED pdu
 [Sun Jul 21 21:22:08 CEST 2024] S3C1: freeing UNUSED pdu
 [Sun Jul 21 21:22:08 CEST 2024] S3C1: freeing UNUSED pdu
 [Sun Jul 21 21:22:38 CEST 2024] S3C1: freeing UNUSED pdu
 [Sun Jul 21 21:22:42 CEST 2024] S3C1: freeing UNUSED pdu
 [Sun Jul 21 21:22:45 CEST 2024] S3C1: freeing UNUSED pdu
 [Sun Jul 21 21:22:47 CEST 2024] S3C1: freeing UNUSED pdu
 [Sun Jul 21 21:22:48 CEST 2024] S3C1: freeing UNUSED pdu
 [Sun Jul 21 23:46:26 CEST 2024] S-1C-1: ccb_timeout: num=1 total=1 disp=0
 [Sun Jul 21 23:46:26 CEST 2024] panic: kernel diagnostic assertion "conn 
 != NULL" failed: file "/usr/src/sys/dev/iscsi/iscsi_utils.c", line 403
 [Sun Jul 21 23:46:26 CEST 2024] cpu1: Begin traceback...
 [Sun Jul 21 23:46:26 CEST 2024] vpanic() at netbsd:vpanic+0x183
 [Sun Jul 21 23:46:26 CEST 2024] kern_assert() at netbsd:kern_assert+0x4b
 [Sun Jul 21 23:46:26 CEST 2024] wake_ccb() at netbsd:wake_ccb+0x1e8
 [Sun Jul 21 23:46:26 CEST 2024] iscsi_cleanup_thread() at netbsd:iscsi_cleanup_thread+0x5b4
 [Sun Jul 21 23:46:26 CEST 2024] cpu1: End traceback...

 [Sun Jul 21 23:46:26 CEST 2024] dumping to dev 4,1 (offset=22227071, size=12581616):
 [Sun Jul 21 23:46:26 CEST 2024] dump <4>ipmi0: workqueue busy: updates stopped
 mfi0: workqueue busy: updates stopped
 [Sun Jul 21 23:47:13 CEST 2024] coretemp0: workqueue busy: updates stopped
 [Sun Jul 21 23:47:13 CEST 2024] coretemp1: workqueue busy: updates stopped
 [Sun Jul 21 23:47:13 CEST 2024] coretemp3: workqueue busy: updates stopped

 gdb:
 Reading symbols from /netbsd...
 (No debugging symbols found in /netbsd)
 (gdb) target kvm netbsd.48.core
 0xffffffff80237015 in cpu_reboot ()
 (gdb) bt
 #0  0xffffffff80237015 in cpu_reboot ()
 #1  0xffffffff80d9da24 in kern_reboot ()
 #2  0xffffffff80de7d8d in vpanic ()
 #3  0xffffffff80fbd20f in kern_assert ()
 #4  0xffffffff80f1d3d5 in wake_ccb ()
 #5  0xffffffff80f0e2d7 in iscsi_cleanup_thread ()
 #6  0xffffffff8020e787 in lwp_trampoline ()
 #7  0x0000000000000000 in ?? ()


 Regards
 Uwe

From: 6bone@6bone.informatik.uni-leipzig.de
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: Re: kern/58422: kernel crash when using the iscsi initiator
Date: Sat, 20 Jul 2024 08:20:25 +0200 (CEST)

 now the correkt gdb output:

 Type "apropos word" to search for commands related to "word"...
 Reading symbols from /netbsd...
 (gdb) target kvm netbsd.48.core
 0xffffffff80237015 in cpu_reboot (howto=howto@entry=260,
      bootstr=bootstr@entry=0x0) at 
 /usr/src/sys/arch/amd64/amd64/machdep.c:717
 717                     dumpsys();
 (gdb) bt
 #0  0xffffffff80237015 in cpu_reboot (howto=howto@entry=260,
      bootstr=bootstr@entry=0x0) at 
 /usr/src/sys/arch/amd64/amd64/machdep.c:717
 #1  0xffffffff80d9da24 in kern_reboot (howto=howto@entry=260,
      bootstr=bootstr@entry=0x0) at /usr/src/sys/kern/kern_reboot.c:73
 #2  0xffffffff80de7d8d in vpanic (
      fmt=0xffffffff8139b748 "kernel %sassertion \"%s\" failed: file \"%s\", 
 line %d ", ap=ap@entry=0xffffd08c96806f28) at 
 /usr/src/sys/kern/subr_prf.c:291
 #3  0xffffffff80fbd20f in kern_assert (
      fmt=fmt@entry=0xffffffff8139b748 "kernel %sassertion \"%s\" failed: 
 file \"%s\", line %d ") at /usr/src/sys/lib/libkern/kern_assert.c:51
 #4  0xffffffff80f1d3d5 in wake_ccb (ccb=0xffffd0806769f430, status=24)
      at /usr/src/sys/dev/iscsi/iscsi_utils.c:403
 #5  0xffffffff80f0e2d7 in iscsi_cleanup_thread (par=<optimized out>)
      at /usr/src/sys/dev/iscsi/iscsi_ioctl.c:1823
 #6  0xffffffff8020e787 in lwp_trampoline ()
 #7  0x0000000000000000 in ?? ()
 (gdb)

From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/58422: kernel crash when using the iscsi initiator
Date: Sat, 20 Jul 2024 07:32:25 -0000 (UTC)

 6bone@6bone.informatik.uni-leipzig.de writes:

 >(gdb) target kvm netbsd.48.core
 >0xffffffff80237015 in cpu_reboot ()
 >(gdb) bt
 >#0  0xffffffff80237015 in cpu_reboot ()
 >#1  0xffffffff80d9da24 in kern_reboot ()
 >#2  0xffffffff80de7d8d in vpanic ()
 >#3  0xffffffff80fbd20f in kern_assert ()
 >#4  0xffffffff80f1d3d5 in wake_ccb ()
 >#5  0xffffffff80f0e2d7 in iscsi_cleanup_thread ()
 >#6  0xffffffff8020e787 in lwp_trampoline ()
 >#7  0x0000000000000000 in ?? ()


 Can you try this patch ?


 Index: sys/dev/iscsi/iscsi_ioctl.c
 ===================================================================
 RCS file: /cvsroot/src/sys/dev/iscsi/iscsi_ioctl.c,v
 retrieving revision 1.34
 diff -p -u -r1.34 iscsi_ioctl.c
 --- sys/dev/iscsi/iscsi_ioctl.c	25 Nov 2023 10:08:27 -0000	1.34
 +++ sys/dev/iscsi/iscsi_ioctl.c	20 Jul 2024 07:29:07 -0000
 @@ -1694,8 +1694,8 @@ ccb_timeout_start(ccb_t *ccb, int ticks)
  void
  ccb_timeout_stop(ccb_t *ccb)
  {
 -	callout_stop(&ccb->ccb_timeout);
  	mutex_enter(&iscsi_cleanup_mtx);
 +	callout_halt(&ccb->ccb_timeout, &iscsi_cleanup_mtx);
  	if (ccb->ccb_timedout == TOUT_QUEUED) {
  		TAILQ_REMOVE(&iscsi_timeout_ccb_list, ccb, ccb_tchain);
  		ccb->ccb_timedout = TOUT_NONE;
 Index: sys/dev/iscsi/iscsi_send.c
 ===================================================================
 RCS file: /cvsroot/src/sys/dev/iscsi/iscsi_send.c,v
 retrieving revision 1.40
 diff -p -u -r1.40 iscsi_send.c
 --- sys/dev/iscsi/iscsi_send.c	25 Nov 2023 10:08:27 -0000	1.40
 +++ sys/dev/iscsi/iscsi_send.c	20 Jul 2024 07:29:07 -0000
 @@ -1723,19 +1723,19 @@ ccb_timeout(ccb_t *ccb)
  {
  	connection_t *conn = ccb->ccb_connection;

 +	if (conn == NULL) {
 +		/* XXX Should never happen */
 +		printf("ccb_timeout: num=%d total=%d disp=%d invalid ccb=%p\n",
 +			ccb->ccb_num_timeouts+1, ccb->ccb_total_tries,
 +			ccb->ccb_disp, ccb);
 +		return;
 +	}
 +
  	ccb->ccb_total_tries++;

  	DEBC(conn, 0, ("ccb_timeout: num=%d total=%d disp=%d\n",
  		ccb->ccb_num_timeouts+1, ccb->ccb_total_tries, ccb->ccb_disp));

 -	/*
 -	 * XXX can we time out after connection is closed ?
 -	 */
 -	if (conn == NULL) {
 -		wake_ccb(ccb, ISCSI_STATUS_TIMEOUT);
 -		return;
 -	}
 -
  	if (++ccb->ccb_num_timeouts > MAX_CCB_TIMEOUTS ||
  		ccb->ccb_total_tries > MAX_CCB_TRIES ||
  		ccb->ccb_disp <= CCBDISP_FREE ||

From: 6bone@6bone.informatik.uni-leipzig.de
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: Re: kern/58422: kernel crash when using the iscsi initiator
Date: Mon, 22 Jul 2024 08:26:29 +0200 (CEST)

 On Sat, 20 Jul 2024, Michael van Elst wrote:

 >
 > Can you try this patch ?
 >

 It seems like the patch was applied at least once. dmesg reports:

 [ 123362.515673] ccb_timeout: num=1 total=0 disp=0 invalid ccb=0xffff8980676abe28

 The kernel continued running without crashing.


 Regards
 Uwe

From: 6bone@6bone.informatik.uni-leipzig.de
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: Re: kern/58422: kernel crash when using the iscsi initiator
Date: Wed, 24 Jul 2024 08:16:49 +0200 (CEST)

 On Mon, 22 Jul 2024, 6bone@6bone.informatik.uni-leipzig.de wrote:

 The code from the patch was executed again:

 [ 312501.042855] ccb_timeout: num=1 total=0 disp=0 invalid ccb=0xffff8980676ad5a8

 The kernel did not crash.


 Regards
 Uwe

From: 6bone@6bone.informatik.uni-leipzig.de
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: Re: kern/58422: kernel crash when using the iscsi initiator
Date: Sat, 27 Jul 2024 18:52:12 +0200 (CEST)

 The next crash. The kernel contains the patch from the case.

 dmesg -M netbsd.49.core
 ....
 [ 95493.709585] S3C1: freeing UNUSED pdu
 [ 95499.766892] S3C1: freeing UNUSED pdu
 [ 95500.317270] S3C1: freeing UNUSED pdu
 [ 95506.245037] S3C1: freeing UNUSED pdu
 [ 95511.421411] S3C1: freeing UNUSED pdu
 [ 95512.393054] Received Reject PDU, reason = 4, data_len = 48
 [ 95512.393054] S3C1: *** Connection Error, status=25, logout=1, state=3
 [ 95512.393054] sd3(iscsi0:0:0:11): generic HBA error
 [ 95512.393054] sd3d: error writing fsbn 34674059682 of 34674059682-34674059745 (sd3 bn 34674059682; cn 16930693 tn 13 sn 2)
 [ 95519.402427] sd3d: error writing fsbn 35353610530 of 35353610530-35353610593 (sd3 bn 35353610530; cn 17262505 tn 9 sn 2)
 [ 95522.475421] S3C1: *** Connection Error, status=24, logout=-1, state=5
 [ 95524.408023] sd3d: error reading fsbn 33006352546 of 33006352546-33006352609 (sd3 bn 33006352546; cn 16116383 tn 5 sn 2)
 [ 95524.498026] S3C1: Resend ccb 0xffffc280676b72b8 (37) - updating CmdSN old 21405452, new 21405454
 [ 95524.498026] S3C1: Connection ReCreated successfully - status 0
 [ 95525.691696] panic: ufsdirhash_findfree: free mismatch
 [ 95525.691696] cpu2: Begin traceback...
 [ 95525.691696] vpanic() at netbsd:vpanic+0x183
 [ 95525.691696] panic() at netbsd:panic+0x3c
 [ 95525.691696] ufsdirhash_findfree() at netbsd:ufsdirhash_findfree+0x240
 [ 95525.691696] ufs_lookup() at netbsd:ufs_lookup+0x6ac
 [ 95525.701692] VOP_LOOKUP() at netbsd:VOP_LOOKUP+0x44
 [ 95525.701692] lookup_once() at netbsd:lookup_once+0x1a6
 [ 95525.701692] namei_tryemulroot() at netbsd:namei_tryemulroot+0xb00
 [ 95525.701692] namei() at netbsd:namei+0x29
 [ 95525.701692] vn_open() at netbsd:vn_open+0x133
 [ 95525.701692] do_open() at netbsd:do_open+0xc3
 [ 95525.711693] do_sys_openat() at netbsd:do_sys_openat+0x74
 [ 95525.711693] sys_open() at netbsd:sys_open+0x24
 [ 95525.711693] syscall() at netbsd:syscall+0x196
 [ 95525.711693] --- syscall (number 5) ---
 [ 95525.711693] netbsd:syscall+0x196:
 [ 95525.711693] cpu2: End traceback...

 [ 95525.711693] dumping to dev 4,1 (offset=22227071, size=12581616):
 [ 95525.711693] dump S3C1: *** Connection Error, status=24, logout=-1, 
 state=3


 Reading symbols from /netbsd...
 (gdb) target kvm netbsd.49.core
 0xffffffff80237015 in cpu_reboot (howto=howto@entry=260,
      bootstr=bootstr@entry=0x0) at 
 /usr/src/sys/arch/amd64/amd64/machdep.c:717
 717                     dumpsys();
 (gdb) bt
 #0  0xffffffff80237015 in cpu_reboot (howto=howto@entry=260, 
 bootstr=bootstr@entry=0x0) at /usr/src/sys/arch/amd64/amd64/machdep.c:717
 #1  0xffffffff80d9da24 in kern_reboot (howto=howto@entry=260, 
 bootstr=bootstr@entry=0x0) at /usr/src/sys/kern/kern_reboot.c:73
 #2  0xffffffff80de7d8d in vpanic (fmt=fmt@entry=0xffffffff814b5c80 
 "ufsdirhash_findfree: free mismatch", ap=ap@entry=0xffffc28c97682908) at 
 /usr/src/sys/kern/subr_prf.c:291
 #3  0xffffffff80de7e62 in panic (fmt=fmt@entry=0xffffffff814b5c80 
 "ufsdirhash_findfree: free mismatch") at /usr/src/sys/kern/subr_prf.c:208
 #4  0xffffffff80d16d61 in ufsdirhash_findfree 
 (ip=ip@entry=0xffffe5b4c53e07f8, slotneeded=60, 
 slotsize=slotsize@entry=0xffffc28c97682a54)
      at /usr/src/sys/ufs/ufs/ufs_dirhash.c:585
 #5  0xffffffff80d1ae04 in slot_findfree (dp=0xffffe5b4c53e07f8, 
 slot=0xffffc28c97682a4c) at /usr/src/sys/ufs/ufs/ufs_lookup.c:134
 #6  ufs_lookup (v=<optimized out>) at 
 /usr/src/sys/ufs/ufs/ufs_lookup.c:451
 #7  0xffffffff80e67c1c in VOP_LOOKUP (dvp=dvp@entry=0xffffe5b4c66b2040, 
 vpp=vpp@entry=0xffffc28c97682b38, cnp=cnp@entry=0xffffc28c97682d70) at 
 /usr/src/sys/kern/vnode_if.c:443
 #8  0xffffffff80e440f0 in lookup_once 
 (state=state@entry=0xffffc28c97682cc0, searchdir=<optimized out>, 
 newsearchdir_ret=newsearchdir_ret@entry=0xffffc28c97682c18,
      foundobj_ret=foundobj_ret@entry=0xffffc28c97682c20, 
 newsearchdir_locked_ret=newsearchdir_locked_ret@entry=0xffffc28c97682c17) 
 at /usr/src/sys/kern/vfs_lookup.c:1150
 #9  0xffffffff80e45277 in namei_oneroot (isnfsd=0, inhibitmagic=0, 
 neverfollow=0, state=<optimized out>) at 
 /usr/src/sys/kern/vfs_lookup.c:1545
 #10 namei_tryemulroot (state=state@entry=0xffffc28c97682cc0, 
 neverfollow=neverfollow@entry=0, inhibitmagic=inhibitmagic@entry=0, 
 isnfsd=isnfsd@entry=0)
      at /usr/src/sys/kern/vfs_lookup.c:1905
 #11 0xffffffff80e4724c in namei (ndp=ndp@entry=0xffffc28c97682d20) at 
 /usr/src/sys/kern/vfs_lookup.c:1941
 #12 0xffffffff80e5f40c in vn_open (at_dvp=at_dvp@entry=0x0, 
 pb=pb@entry=0xffffe5b6f4a7d340, nmode=nmode@entry=16, 
 fmode=fmode@entry=2563, cmode=384,
      ret_vp=ret_vp@entry=0xffffc28c97682ec8, 
 ret_domove=ret_domove@entry=0xffffc28c97682eb7, 
 ret_fd=ret_fd@entry=0xffffc28c97682eb8) at 
 /usr/src/sys/kern/vfs_vnops.c:219
 #13 0xffffffff80e520af in do_open (l=l@entry=0xffffe5b7ea77f5c0, dvp=0x0, 
 pb=0xffffe5b6f4a7d340, open_flags=open_flags@entry=2562, 
 open_mode=open_mode@entry=384,
      fd=fd@entry=0xffffc28c97682f7c) at 
 /usr/src/sys/kern/vfs_syscalls.c:1752
 #14 0xffffffff80e521cb in do_sys_openat (l=0xffffe5b7ea77f5c0, 
 fdat=fdat@entry=-100, path=<optimized out>, flags=2562, mode=384, 
 fd=fd@entry=0xffffc28c97682f7c)
      at /usr/src/sys/kern/vfs_syscalls.c:1835
 #15 0xffffffff80e52319 in sys_open (l=<optimized out>, uap=<optimized 
 out>, retval=0xffffc28c97682fb0) at /usr/src/sys/kern/vfs_syscalls.c:1856
 #16 0xffffffff8058b8e5 in sy_call (rval=0xffffc28c97682fb0, 
 uap=0xffffc28c97683000, l=0xffffe5b7ea77f5c0, sy=0xffffffff8187d1d8 
 <sysent+120>)
      at /usr/src/sys/sys/syscallvar.h:65
 #17 sy_invoke (code=5, rval=0xffffc28c97682fb0, uap=0xffffc28c97683000, 
 l=0xffffe5b7ea77f5c0, sy=0xffffffff8187d1d8 <sysent+120>) at 
 /usr/src/sys/sys/syscallvar.h:94
 #18 syscall (frame=0xffffc28c97683000) at 
 /usr/src/sys/arch/x86/x86/syscall.c:138
 #19 0xffffffff8020e6bd in handle_syscall ()
 #20 0x00007f7fffdb42e0 in ?? ()
 #21 0x0000000000000a02 in ?? ()
 #22 0x0000000000000180 in ?? ()
 #23 0x000072709817957a in ?? ()
 #24 0x0000000000000000 in ?? ()



 Regards
 Uwe

From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/58422: kernel crash when using the iscsi initiator
Date: Sat, 27 Jul 2024 20:26:07 -0000 (UTC)

 6bone@6bone.informatik.uni-leipzig.de writes:

 >[ 95512.393054] Received Reject PDU, reason = 4, data_len = 48
 >[ 95512.393054] S3C1: *** Connection Error, status=25, logout=1, state=3

 Protocol error (status=25)

 Maybe your iscsi target can tell more.


 >[ 95512.393054] sd3(iscsi0:0:0:11): generic HBA error
 >[ 95512.393054] sd3d: error writing fsbn 34674059682 of 34674059682-34674059745 (sd3 bn 34674059682; cn 16930693 tn 13 sn 2)
 >[ 95519.402427] sd3d: error writing fsbn 35353610530 of 35353610530-35353610593 (sd3 bn 35353610530; cn 17262505 tn 9 sn 2)

 As the result some pending I/Os get aborted.


 >[ 95522.475421] S3C1: *** Connection Error, status=24, logout=-1, state=5

 Timeout (status=24)


 >[ 95524.408023] sd3d: error reading fsbn 33006352546 of 33006352546-33006352609 (sd3 bn 33006352546; cn 16116383 tn 5 sn 2)

 Another I/O error as consequence.


 >[ 95524.498026] S3C1: Resend ccb 0xffffc280676b72b8 (37) - updating CmdSN old 21405452, new 21405454
 >[ 95524.498026] S3C1: Connection ReCreated successfully - status 0

 iscsi recovers, but the result is like with any failing disk.


 >[ 95525.691696] panic: ufsdirhash_findfree: free mismatch

 The filesystem crashes for some inconsistent data on the drive.


From: 6bone@6bone.informatik.uni-leipzig.de
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: Re: kern/58422: kernel crash when using the iscsi initiator
Date: Wed, 31 Jul 2024 13:19:12 +0200 (CEST)

 On Sat, 27 Jul 2024, Michael van Elst wrote:
 >

 Since applying the patch I haven't had any further iscsi based kernel 
 crashes.

 Can the patch be included in the official kernel?


 Thank you for your efforts

 Regards
 Uwe

From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/58422: kernel crash when using the iscsi initiator
Date: Wed, 31 Jul 2024 12:01:17 -0000 (UTC)

 6bone@6bone.informatik.uni-leipzig.de writes:

 >Since applying the patch I haven't had any further iscsi based kernel 
 >crashes.

 >Can the patch be included in the official kernel?

 The patch mostly prevents the crash, but every time you see the
 debug message, that apparently is a race with a possibly leak
 of a command control block.

NetBSD Home
NetBSD PR Database Search

(Contact us) $NetBSD: query-full-pr,v 1.47 2022/09/11 19:34:41 kim Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2024 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.