NetBSD Problem Report #58422
From www@netbsd.org Fri Jul 12 09:01:30 2024
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256
client-signature RSA-PSS (2048 bits) client-digest SHA256)
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id 4F1F71A9238
for <gnats-bugs@gnats.NetBSD.org>; Fri, 12 Jul 2024 09:01:30 +0000 (UTC)
Message-Id: <20240712090129.010901A9239@mollari.NetBSD.org>
Date: Fri, 12 Jul 2024 09:01:28 +0000 (UTC)
From: 6bone@6bone.informatik.uni-leipzig.de
Reply-To: 6bone@6bone.informatik.uni-leipzig.de
To: gnats-bugs@NetBSD.org
Subject: kernel crash when using the iscsi initiator
X-Send-Pr-Version: www-1.0
>Number: 58422
>Category: kern
>Synopsis: kernel crash when using the iscsi initiator
>Confidential: no
>Severity: critical
>Priority: medium
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Fri Jul 12 09:05:00 +0000 2024
>Last-Modified: Wed Jul 31 12:05:01 +0000 2024
>Originator: Uwe Toenjes
>Release: NetBSD 10.0_STABLE
>Organization:
University of Leipzig
>Environment:
NetBSD 6bone.informatik.uni-leipzig.de 10.0_STABLE NetBSD 10.0_STABLE (MYCONF10.gdb) #1: Mon Jul 8 15:34:51 CEST 2024 root@6bone.informatik.uni-leipzig.de:/usr/obj/sys/arch/amd64/compile/MYCONF10.gdb amd64
>Description:
If you use the iscsi initiator, the kernel crashes after a few days. The behavior is reproducible.
output from: dmesg -M netbsd.47.core -N /netbsd
....
[ 44363.227068] S3C1: freeing UNUSED pdu
[ 44389.708635] S3C1: freeing UNUSED pdu
[ 44396.580012] S3C1: freeing UNUSED pdu
[ 44415.335063] S3C1: freeing UNUSED pdu
[ 44431.093458] S3C1: freeing UNUSED pdu
[ 44463.477924] S3C1: freeing UNUSED pdu
[ 44464.509811] S3C1: freeing UNUSED pdu
[ 44520.645690] S3C1: freeing UNUSED pdu
[ 44531.724707] S3C1: freeing UNUSED pdu
[ 85126.220101] S-1C-1: ccb_timeout: num=1 total=1 disp=0
[ 85126.220101] uvm_fault(0xffffffff8195d9a0, 0x0, 1) -> e
[ 85126.220101] fatal page fault in supervisor mode
[ 85126.220101] trap type 6 code 0 rip 0xffffffff80d8fc5d cs 0x8 rflags 0x10202 cr2 0x10 ilevel 0 rsp 0xffff940c9666ff10
[ 85126.220101] curlwp 0xffff82c9d7b87b00 pid 0.607 lowest kstack 0xffff940c9666b2c0
[ 85126.220101] panic: trap
[ 85126.220101] cpu2: Begin traceback...
[ 85126.220101] vpanic() at netbsd:vpanic+0x183
[ 85126.220101] panic() at netbsd:panic+0x3c
[ 85126.220101] trap() at netbsd:trap+0xb24
[ 85126.220101] --- trap (number 6) ---
[ 85126.230191] mutex_enter() at netbsd:mutex_enter+0x3e
[ 85126.230191] wake_ccb() at iscsi:wake_ccb+0x8e
[ 85126.230191] iscsi_cleanup_thread() at iscsi:iscsi_cleanup_thread+0x351
[ 85126.230191] cpu2: End traceback...
[ 85126.230191] dumping to dev 4,1 (offset=22227071, size=12581616):
[ 85126.230191] dump <4>ipmi0: workqueue busy: updates stopped
[ 85171.151680] mfi0: workqueue busy: updates stopped
[ 85171.151680] coretemp0: workqueue busy: updates stopped
[ 85171.151680] coretemp1: workqueue busy: updates stopped
[ 85171.151680] coretemp2: workqueue busy: updates stopped
[ 85171.151680] coretemp3: workqueue busy: updates stopped
gdb output:
Reading symbols from /netbsd...
(gdb) target kvm netbsd.47.core
0xffffffff80237015 in cpu_reboot (howto=howto@entry=260, bootstr=bootstr@entry=0x0) at /usr/src/sys/arch/amd64/amd64/machdep.c:717
717 dumpsys();
(gdb) bt
#0 0xffffffff80237015 in cpu_reboot (howto=howto@entry=260, bootstr=bootstr@entry=0x0) at /usr/src/sys/arch/amd64/amd64/machdep.c:717
#1 0xffffffff80d9da24 in kern_reboot (howto=howto@entry=260, bootstr=bootstr@entry=0x0) at /usr/src/sys/kern/kern_reboot.c:73
#2 0xffffffff80de7d8d in vpanic (fmt=fmt@entry=0xffffffff8139b6c8 "trap", ap=ap@entry=0xffff940c9666fcd8) at /usr/src/sys/kern/subr_prf.c:291
#3 0xffffffff80de7e62 in panic (fmt=fmt@entry=0xffffffff8139b6c8 "trap") at /usr/src/sys/kern/subr_prf.c:208
#4 0xffffffff80239e17 in trap (frame=0xffff940c9666fe20) at /usr/src/sys/arch/amd64/amd64/trap.c:326
#5 0xffffffff80231ef4 in alltraps ()
#6 0x0000000000000010 in ?? ()
#7 0xffff9400677341c8 in ?? ()
#8 0x0000000000000006 in ?? ()
#9 0x0000000000000000 in ?? ()
>How-To-Repeat:
See description. If you need additional information or the complete dump file, please let me know.
>Fix:
>Audit-Trail:
From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/58422: kernel crash when using the iscsi initiator
Date: Sat, 13 Jul 2024 07:59:10 -0000 (UTC)
6bone@6bone.informatik.uni-leipzig.de writes:
>NetBSD 6bone.informatik.uni-leipzig.de 10.0_STABLE NetBSD 10.0_STABLE (MYCONF10.gdb) #1: Mon Jul 8 15:34:51 CEST 2024 root@6bone.informatik.uni-leipzig.de:/usr/obj/sys/arch/amd64/compile/MYCONF10.gdb amd64
>[ 85126.220101] cpu2: Begin traceback...
>[ 85126.220101] vpanic() at netbsd:vpanic+0x183
>[ 85126.220101] panic() at netbsd:panic+0x3c
>[ 85126.220101] trap() at netbsd:trap+0xb24
>[ 85126.220101] --- trap (number 6) ---
>[ 85126.230191] mutex_enter() at netbsd:mutex_enter+0x3e
>[ 85126.230191] wake_ccb() at iscsi:wake_ccb+0x8e
>[ 85126.230191] iscsi_cleanup_thread() at iscsi:iscsi_cleanup_thread+0x351
>[ 85126.230191] cpu2: End traceback...
That's a use-after-free, when a connection is shut down.
The cleanup thread handles timed out CCBs that have already
been freed with the closing connection.
Can you try to recreate the crash with a DIAGNOSTIC kernel and/or
set the sysctl hw.iscsi.debug to 9 ?
It might be easier to recreate if you know how the connection
is shut down. Maybe some reconfig or reboot of the iscsi target
or some network issue.
From: 6bone@6bone.informatik.uni-leipzig.de
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: Re: kern/58422: kernel crash when using the iscsi initiator
Date: Sun, 14 Jul 2024 20:58:05 +0200 (CEST)
On Sat, 13 Jul 2024, Michael van Elst wrote:
> Date: Sat, 13 Jul 2024 08:00:03 +0000 (UTC)
> From: Michael van Elst <mlelstv@serpens.de>
> Reply-To: gnats-bugs@netbsd.org
> To: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
> netbsd-bugs@netbsd.org, 6bone@6bone.informatik.uni-leipzig.de
> Subject: [Extern] Re: kern/58422: kernel crash when using the iscsi initiator
>
> The following reply was made to PR kern/58422; it has been noted by GNATS.
>
> From: mlelstv@serpens.de (Michael van Elst)
> To: gnats-bugs@netbsd.org
> Cc:
> Subject: Re: kern/58422: kernel crash when using the iscsi initiator
> Date: Sat, 13 Jul 2024 07:59:10 -0000 (UTC)
>
> 6bone@6bone.informatik.uni-leipzig.de writes:
>
> >NetBSD 6bone.informatik.uni-leipzig.de 10.0_STABLE NetBSD 10.0_STABLE (MYCONF10.gdb) #1: Mon Jul 8 15:34:51 CEST 2024 root@6bone.informatik.uni-leipzig.de:/usr/obj/sys/arch/amd64/compile/MYCONF10.gdb amd64
>
> >[ 85126.220101] cpu2: Begin traceback...
> >[ 85126.220101] vpanic() at netbsd:vpanic+0x183
> >[ 85126.220101] panic() at netbsd:panic+0x3c
> >[ 85126.220101] trap() at netbsd:trap+0xb24
> >[ 85126.220101] --- trap (number 6) ---
> >[ 85126.230191] mutex_enter() at netbsd:mutex_enter+0x3e
> >[ 85126.230191] wake_ccb() at iscsi:wake_ccb+0x8e
> >[ 85126.230191] iscsi_cleanup_thread() at iscsi:iscsi_cleanup_thread+0x351
> >[ 85126.230191] cpu2: End traceback...
>
>
> That's a use-after-free, when a connection is shut down.
>
> The cleanup thread handles timed out CCBs that have already
> been freed with the closing connection.
I don't know if it's relevant, but connecting the iscsi initiator occurs
without a noticeable message in the dmesg.
After you start using the iscsi device, the messages "S3C1: freeing UNUSED
pdu" appear regularly in the dmesg.
This goes on for a few days until the crash.
>
> Can you try to recreate the crash with a DIAGNOSTIC kernel and/or
> set the sysctl hw.iscsi.debug to 9 ?
>
The dump from the bug report already comes from a kernel with DIAGNOSTIC
enabled. I will try to cause the crash with hw.iscsi.debug=9.
> It might be easier to recreate if you know how the connection
> is shut down. Maybe some reconfig or reboot of the iscsi target
> or some network issue.
>
The iscsi connection was only started once during boot. Then it ran until
it crashed. The iscsi target is a NetApp system that runs very stable. I
think a network problem is unlikely.
Thank you for your efforts
Regards
Uwe
From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/58422: kernel crash when using the iscsi initiator
Date: Mon, 15 Jul 2024 04:42:28 -0000 (UTC)
6bone@6bone.informatik.uni-leipzig.de writes:
>The dump from the bug report already comes from a kernel with DIAGNOSTIC
>enabled. I will try to cause the crash with hw.iscsi.debug=9.
This is a bit strange. A DIAGNOSTIC kernel should have triggered an
assertion in wake_ccb() instead of crashing with an UVM fault.
The reason is probably that your kernel doesn't have iscsi builtin,
but loaded as a module (which is built without DIAGNOSTIC).
From: 6bone@6bone.informatik.uni-leipzig.de
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: Re: kern/58422: kernel crash when using the iscsi initiator
Date: Mon, 15 Jul 2024 11:29:00 +0200 (CEST)
On Mon, 15 Jul 2024, Michael van Elst wrote:
> Date: Mon, 15 Jul 2024 04:45:01 +0000 (UTC)
> From: Michael van Elst <mlelstv@serpens.de>
> Reply-To: gnats-bugs@netbsd.org
> To: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
> netbsd-bugs@netbsd.org, 6bone@6bone.informatik.uni-leipzig.de
> Subject: [Extern] Re: kern/58422: kernel crash when using the iscsi initiator
>
> The following reply was made to PR kern/58422; it has been noted by GNATS.
>
> From: mlelstv@serpens.de (Michael van Elst)
> To: gnats-bugs@netbsd.org
> Cc:
> Subject: Re: kern/58422: kernel crash when using the iscsi initiator
> Date: Mon, 15 Jul 2024 04:42:28 -0000 (UTC)
>
> 6bone@6bone.informatik.uni-leipzig.de writes:
>
> >The dump from the bug report already comes from a kernel with DIAGNOSTIC
> >enabled. I will try to cause the crash with hw.iscsi.debug=9.
>
> This is a bit strange. A DIAGNOSTIC kernel should have triggered an
> assertion in wake_ccb() instead of crashing with an UVM fault.
>
> The reason is probably that your kernel doesn't have iscsi builtin,
> but loaded as a module (which is built without DIAGNOSTIC).
>
>
You're right. The iscsi is used as a module. I'm creating a new kernel
with iscsi built in.
You're probably right in your assumption that it's a network problem. I
noticed that when the iscsi load is high, the load on CPU 0 due to
interrupts is extremely high.
load averages: 3.73, 3.73, 2.93; up 0+11:53:45 08:59:41
53 processes: 1 runnable, 50 sleeping, 2 on CPU
CPU0 states: 0.0% user, 0.0% nice, 4.8% system, 95.2% interrupt, 0.0% idle
CPU1 states: 0.0% user, 0.0% nice, 95.6% system, 0.0% interrupt, 4.4% idle
CPU2 states: 0.0% user, 0.0% nice, 89.5% system, 1.0% interrupt, 9.5% idle
CPU3 states: 0.0% user, 0.0% nice, 96.8% system, 0.0% interrupt, 3.2% idle
Memory: 24G Act, 12G Inact, 17M Wired, 24M Exec, 36G File, 19M Free
Swap: 59G Total, 59G Free / Pools: 8105M Used / Network: 1400K In, 3312K
Out
PID USERNAME PRI NICE SIZE RES STATE TIME WCPU CPU COMMAND
0 root 123 0 0K 422M CPU/3 805:15 238% 238% [system]
10668 mirror 25 0 298M 248M RUN/1 6:42 56.69% 56.69% rsync
...
With the high CPU load, network packets can probably be lost.
Thank you for your efforts
Regards
Uwe
From: 6bone@6bone.informatik.uni-leipzig.de
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: Re: kern/58422: kernel crash when using the iscsi initiator
Date: Sat, 20 Jul 2024 07:41:59 +0200 (CEST)
I have an new dump. Output from "dmesg -T -M netbsd.48.core"
...
[Sun Jul 21 21:21:37 CEST 2024] S3C1: freeing UNUSED pdu
[Sun Jul 21 21:21:37 CEST 2024] S3C1: freeing UNUSED pdu
[Sun Jul 21 21:21:38 CEST 2024] S3C1: freeing UNUSED pdu
[Sun Jul 21 21:21:38 CEST 2024] S3C1: freeing UNUSED pdu
[Sun Jul 21 21:21:38 CEST 2024] S3C1: freeing UNUSED pdu
[Sun Jul 21 21:21:59 CEST 2024] S3C1: freeing UNUSED pdu
[Sun Jul 21 21:22:08 CEST 2024] S3C1: freeing UNUSED pdu
[Sun Jul 21 21:22:08 CEST 2024] S3C1: freeing UNUSED pdu
[Sun Jul 21 21:22:38 CEST 2024] S3C1: freeing UNUSED pdu
[Sun Jul 21 21:22:42 CEST 2024] S3C1: freeing UNUSED pdu
[Sun Jul 21 21:22:45 CEST 2024] S3C1: freeing UNUSED pdu
[Sun Jul 21 21:22:47 CEST 2024] S3C1: freeing UNUSED pdu
[Sun Jul 21 21:22:48 CEST 2024] S3C1: freeing UNUSED pdu
[Sun Jul 21 23:46:26 CEST 2024] S-1C-1: ccb_timeout: num=1 total=1 disp=0
[Sun Jul 21 23:46:26 CEST 2024] panic: kernel diagnostic assertion "conn
!= NULL" failed: file "/usr/src/sys/dev/iscsi/iscsi_utils.c", line 403
[Sun Jul 21 23:46:26 CEST 2024] cpu1: Begin traceback...
[Sun Jul 21 23:46:26 CEST 2024] vpanic() at netbsd:vpanic+0x183
[Sun Jul 21 23:46:26 CEST 2024] kern_assert() at netbsd:kern_assert+0x4b
[Sun Jul 21 23:46:26 CEST 2024] wake_ccb() at netbsd:wake_ccb+0x1e8
[Sun Jul 21 23:46:26 CEST 2024] iscsi_cleanup_thread() at netbsd:iscsi_cleanup_thread+0x5b4
[Sun Jul 21 23:46:26 CEST 2024] cpu1: End traceback...
[Sun Jul 21 23:46:26 CEST 2024] dumping to dev 4,1 (offset=22227071, size=12581616):
[Sun Jul 21 23:46:26 CEST 2024] dump <4>ipmi0: workqueue busy: updates stopped
mfi0: workqueue busy: updates stopped
[Sun Jul 21 23:47:13 CEST 2024] coretemp0: workqueue busy: updates stopped
[Sun Jul 21 23:47:13 CEST 2024] coretemp1: workqueue busy: updates stopped
[Sun Jul 21 23:47:13 CEST 2024] coretemp3: workqueue busy: updates stopped
gdb:
Reading symbols from /netbsd...
(No debugging symbols found in /netbsd)
(gdb) target kvm netbsd.48.core
0xffffffff80237015 in cpu_reboot ()
(gdb) bt
#0 0xffffffff80237015 in cpu_reboot ()
#1 0xffffffff80d9da24 in kern_reboot ()
#2 0xffffffff80de7d8d in vpanic ()
#3 0xffffffff80fbd20f in kern_assert ()
#4 0xffffffff80f1d3d5 in wake_ccb ()
#5 0xffffffff80f0e2d7 in iscsi_cleanup_thread ()
#6 0xffffffff8020e787 in lwp_trampoline ()
#7 0x0000000000000000 in ?? ()
Regards
Uwe
From: 6bone@6bone.informatik.uni-leipzig.de
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: Re: kern/58422: kernel crash when using the iscsi initiator
Date: Sat, 20 Jul 2024 08:20:25 +0200 (CEST)
now the correkt gdb output:
Type "apropos word" to search for commands related to "word"...
Reading symbols from /netbsd...
(gdb) target kvm netbsd.48.core
0xffffffff80237015 in cpu_reboot (howto=howto@entry=260,
bootstr=bootstr@entry=0x0) at
/usr/src/sys/arch/amd64/amd64/machdep.c:717
717 dumpsys();
(gdb) bt
#0 0xffffffff80237015 in cpu_reboot (howto=howto@entry=260,
bootstr=bootstr@entry=0x0) at
/usr/src/sys/arch/amd64/amd64/machdep.c:717
#1 0xffffffff80d9da24 in kern_reboot (howto=howto@entry=260,
bootstr=bootstr@entry=0x0) at /usr/src/sys/kern/kern_reboot.c:73
#2 0xffffffff80de7d8d in vpanic (
fmt=0xffffffff8139b748 "kernel %sassertion \"%s\" failed: file \"%s\",
line %d ", ap=ap@entry=0xffffd08c96806f28) at
/usr/src/sys/kern/subr_prf.c:291
#3 0xffffffff80fbd20f in kern_assert (
fmt=fmt@entry=0xffffffff8139b748 "kernel %sassertion \"%s\" failed:
file \"%s\", line %d ") at /usr/src/sys/lib/libkern/kern_assert.c:51
#4 0xffffffff80f1d3d5 in wake_ccb (ccb=0xffffd0806769f430, status=24)
at /usr/src/sys/dev/iscsi/iscsi_utils.c:403
#5 0xffffffff80f0e2d7 in iscsi_cleanup_thread (par=<optimized out>)
at /usr/src/sys/dev/iscsi/iscsi_ioctl.c:1823
#6 0xffffffff8020e787 in lwp_trampoline ()
#7 0x0000000000000000 in ?? ()
(gdb)
From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/58422: kernel crash when using the iscsi initiator
Date: Sat, 20 Jul 2024 07:32:25 -0000 (UTC)
6bone@6bone.informatik.uni-leipzig.de writes:
>(gdb) target kvm netbsd.48.core
>0xffffffff80237015 in cpu_reboot ()
>(gdb) bt
>#0 0xffffffff80237015 in cpu_reboot ()
>#1 0xffffffff80d9da24 in kern_reboot ()
>#2 0xffffffff80de7d8d in vpanic ()
>#3 0xffffffff80fbd20f in kern_assert ()
>#4 0xffffffff80f1d3d5 in wake_ccb ()
>#5 0xffffffff80f0e2d7 in iscsi_cleanup_thread ()
>#6 0xffffffff8020e787 in lwp_trampoline ()
>#7 0x0000000000000000 in ?? ()
Can you try this patch ?
Index: sys/dev/iscsi/iscsi_ioctl.c
===================================================================
RCS file: /cvsroot/src/sys/dev/iscsi/iscsi_ioctl.c,v
retrieving revision 1.34
diff -p -u -r1.34 iscsi_ioctl.c
--- sys/dev/iscsi/iscsi_ioctl.c 25 Nov 2023 10:08:27 -0000 1.34
+++ sys/dev/iscsi/iscsi_ioctl.c 20 Jul 2024 07:29:07 -0000
@@ -1694,8 +1694,8 @@ ccb_timeout_start(ccb_t *ccb, int ticks)
void
ccb_timeout_stop(ccb_t *ccb)
{
- callout_stop(&ccb->ccb_timeout);
mutex_enter(&iscsi_cleanup_mtx);
+ callout_halt(&ccb->ccb_timeout, &iscsi_cleanup_mtx);
if (ccb->ccb_timedout == TOUT_QUEUED) {
TAILQ_REMOVE(&iscsi_timeout_ccb_list, ccb, ccb_tchain);
ccb->ccb_timedout = TOUT_NONE;
Index: sys/dev/iscsi/iscsi_send.c
===================================================================
RCS file: /cvsroot/src/sys/dev/iscsi/iscsi_send.c,v
retrieving revision 1.40
diff -p -u -r1.40 iscsi_send.c
--- sys/dev/iscsi/iscsi_send.c 25 Nov 2023 10:08:27 -0000 1.40
+++ sys/dev/iscsi/iscsi_send.c 20 Jul 2024 07:29:07 -0000
@@ -1723,19 +1723,19 @@ ccb_timeout(ccb_t *ccb)
{
connection_t *conn = ccb->ccb_connection;
+ if (conn == NULL) {
+ /* XXX Should never happen */
+ printf("ccb_timeout: num=%d total=%d disp=%d invalid ccb=%p\n",
+ ccb->ccb_num_timeouts+1, ccb->ccb_total_tries,
+ ccb->ccb_disp, ccb);
+ return;
+ }
+
ccb->ccb_total_tries++;
DEBC(conn, 0, ("ccb_timeout: num=%d total=%d disp=%d\n",
ccb->ccb_num_timeouts+1, ccb->ccb_total_tries, ccb->ccb_disp));
- /*
- * XXX can we time out after connection is closed ?
- */
- if (conn == NULL) {
- wake_ccb(ccb, ISCSI_STATUS_TIMEOUT);
- return;
- }
-
if (++ccb->ccb_num_timeouts > MAX_CCB_TIMEOUTS ||
ccb->ccb_total_tries > MAX_CCB_TRIES ||
ccb->ccb_disp <= CCBDISP_FREE ||
From: 6bone@6bone.informatik.uni-leipzig.de
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: Re: kern/58422: kernel crash when using the iscsi initiator
Date: Mon, 22 Jul 2024 08:26:29 +0200 (CEST)
On Sat, 20 Jul 2024, Michael van Elst wrote:
>
> Can you try this patch ?
>
It seems like the patch was applied at least once. dmesg reports:
[ 123362.515673] ccb_timeout: num=1 total=0 disp=0 invalid ccb=0xffff8980676abe28
The kernel continued running without crashing.
Regards
Uwe
From: 6bone@6bone.informatik.uni-leipzig.de
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: Re: kern/58422: kernel crash when using the iscsi initiator
Date: Wed, 24 Jul 2024 08:16:49 +0200 (CEST)
On Mon, 22 Jul 2024, 6bone@6bone.informatik.uni-leipzig.de wrote:
The code from the patch was executed again:
[ 312501.042855] ccb_timeout: num=1 total=0 disp=0 invalid ccb=0xffff8980676ad5a8
The kernel did not crash.
Regards
Uwe
From: 6bone@6bone.informatik.uni-leipzig.de
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: Re: kern/58422: kernel crash when using the iscsi initiator
Date: Sat, 27 Jul 2024 18:52:12 +0200 (CEST)
The next crash. The kernel contains the patch from the case.
dmesg -M netbsd.49.core
....
[ 95493.709585] S3C1: freeing UNUSED pdu
[ 95499.766892] S3C1: freeing UNUSED pdu
[ 95500.317270] S3C1: freeing UNUSED pdu
[ 95506.245037] S3C1: freeing UNUSED pdu
[ 95511.421411] S3C1: freeing UNUSED pdu
[ 95512.393054] Received Reject PDU, reason = 4, data_len = 48
[ 95512.393054] S3C1: *** Connection Error, status=25, logout=1, state=3
[ 95512.393054] sd3(iscsi0:0:0:11): generic HBA error
[ 95512.393054] sd3d: error writing fsbn 34674059682 of 34674059682-34674059745 (sd3 bn 34674059682; cn 16930693 tn 13 sn 2)
[ 95519.402427] sd3d: error writing fsbn 35353610530 of 35353610530-35353610593 (sd3 bn 35353610530; cn 17262505 tn 9 sn 2)
[ 95522.475421] S3C1: *** Connection Error, status=24, logout=-1, state=5
[ 95524.408023] sd3d: error reading fsbn 33006352546 of 33006352546-33006352609 (sd3 bn 33006352546; cn 16116383 tn 5 sn 2)
[ 95524.498026] S3C1: Resend ccb 0xffffc280676b72b8 (37) - updating CmdSN old 21405452, new 21405454
[ 95524.498026] S3C1: Connection ReCreated successfully - status 0
[ 95525.691696] panic: ufsdirhash_findfree: free mismatch
[ 95525.691696] cpu2: Begin traceback...
[ 95525.691696] vpanic() at netbsd:vpanic+0x183
[ 95525.691696] panic() at netbsd:panic+0x3c
[ 95525.691696] ufsdirhash_findfree() at netbsd:ufsdirhash_findfree+0x240
[ 95525.691696] ufs_lookup() at netbsd:ufs_lookup+0x6ac
[ 95525.701692] VOP_LOOKUP() at netbsd:VOP_LOOKUP+0x44
[ 95525.701692] lookup_once() at netbsd:lookup_once+0x1a6
[ 95525.701692] namei_tryemulroot() at netbsd:namei_tryemulroot+0xb00
[ 95525.701692] namei() at netbsd:namei+0x29
[ 95525.701692] vn_open() at netbsd:vn_open+0x133
[ 95525.701692] do_open() at netbsd:do_open+0xc3
[ 95525.711693] do_sys_openat() at netbsd:do_sys_openat+0x74
[ 95525.711693] sys_open() at netbsd:sys_open+0x24
[ 95525.711693] syscall() at netbsd:syscall+0x196
[ 95525.711693] --- syscall (number 5) ---
[ 95525.711693] netbsd:syscall+0x196:
[ 95525.711693] cpu2: End traceback...
[ 95525.711693] dumping to dev 4,1 (offset=22227071, size=12581616):
[ 95525.711693] dump S3C1: *** Connection Error, status=24, logout=-1,
state=3
Reading symbols from /netbsd...
(gdb) target kvm netbsd.49.core
0xffffffff80237015 in cpu_reboot (howto=howto@entry=260,
bootstr=bootstr@entry=0x0) at
/usr/src/sys/arch/amd64/amd64/machdep.c:717
717 dumpsys();
(gdb) bt
#0 0xffffffff80237015 in cpu_reboot (howto=howto@entry=260,
bootstr=bootstr@entry=0x0) at /usr/src/sys/arch/amd64/amd64/machdep.c:717
#1 0xffffffff80d9da24 in kern_reboot (howto=howto@entry=260,
bootstr=bootstr@entry=0x0) at /usr/src/sys/kern/kern_reboot.c:73
#2 0xffffffff80de7d8d in vpanic (fmt=fmt@entry=0xffffffff814b5c80
"ufsdirhash_findfree: free mismatch", ap=ap@entry=0xffffc28c97682908) at
/usr/src/sys/kern/subr_prf.c:291
#3 0xffffffff80de7e62 in panic (fmt=fmt@entry=0xffffffff814b5c80
"ufsdirhash_findfree: free mismatch") at /usr/src/sys/kern/subr_prf.c:208
#4 0xffffffff80d16d61 in ufsdirhash_findfree
(ip=ip@entry=0xffffe5b4c53e07f8, slotneeded=60,
slotsize=slotsize@entry=0xffffc28c97682a54)
at /usr/src/sys/ufs/ufs/ufs_dirhash.c:585
#5 0xffffffff80d1ae04 in slot_findfree (dp=0xffffe5b4c53e07f8,
slot=0xffffc28c97682a4c) at /usr/src/sys/ufs/ufs/ufs_lookup.c:134
#6 ufs_lookup (v=<optimized out>) at
/usr/src/sys/ufs/ufs/ufs_lookup.c:451
#7 0xffffffff80e67c1c in VOP_LOOKUP (dvp=dvp@entry=0xffffe5b4c66b2040,
vpp=vpp@entry=0xffffc28c97682b38, cnp=cnp@entry=0xffffc28c97682d70) at
/usr/src/sys/kern/vnode_if.c:443
#8 0xffffffff80e440f0 in lookup_once
(state=state@entry=0xffffc28c97682cc0, searchdir=<optimized out>,
newsearchdir_ret=newsearchdir_ret@entry=0xffffc28c97682c18,
foundobj_ret=foundobj_ret@entry=0xffffc28c97682c20,
newsearchdir_locked_ret=newsearchdir_locked_ret@entry=0xffffc28c97682c17)
at /usr/src/sys/kern/vfs_lookup.c:1150
#9 0xffffffff80e45277 in namei_oneroot (isnfsd=0, inhibitmagic=0,
neverfollow=0, state=<optimized out>) at
/usr/src/sys/kern/vfs_lookup.c:1545
#10 namei_tryemulroot (state=state@entry=0xffffc28c97682cc0,
neverfollow=neverfollow@entry=0, inhibitmagic=inhibitmagic@entry=0,
isnfsd=isnfsd@entry=0)
at /usr/src/sys/kern/vfs_lookup.c:1905
#11 0xffffffff80e4724c in namei (ndp=ndp@entry=0xffffc28c97682d20) at
/usr/src/sys/kern/vfs_lookup.c:1941
#12 0xffffffff80e5f40c in vn_open (at_dvp=at_dvp@entry=0x0,
pb=pb@entry=0xffffe5b6f4a7d340, nmode=nmode@entry=16,
fmode=fmode@entry=2563, cmode=384,
ret_vp=ret_vp@entry=0xffffc28c97682ec8,
ret_domove=ret_domove@entry=0xffffc28c97682eb7,
ret_fd=ret_fd@entry=0xffffc28c97682eb8) at
/usr/src/sys/kern/vfs_vnops.c:219
#13 0xffffffff80e520af in do_open (l=l@entry=0xffffe5b7ea77f5c0, dvp=0x0,
pb=0xffffe5b6f4a7d340, open_flags=open_flags@entry=2562,
open_mode=open_mode@entry=384,
fd=fd@entry=0xffffc28c97682f7c) at
/usr/src/sys/kern/vfs_syscalls.c:1752
#14 0xffffffff80e521cb in do_sys_openat (l=0xffffe5b7ea77f5c0,
fdat=fdat@entry=-100, path=<optimized out>, flags=2562, mode=384,
fd=fd@entry=0xffffc28c97682f7c)
at /usr/src/sys/kern/vfs_syscalls.c:1835
#15 0xffffffff80e52319 in sys_open (l=<optimized out>, uap=<optimized
out>, retval=0xffffc28c97682fb0) at /usr/src/sys/kern/vfs_syscalls.c:1856
#16 0xffffffff8058b8e5 in sy_call (rval=0xffffc28c97682fb0,
uap=0xffffc28c97683000, l=0xffffe5b7ea77f5c0, sy=0xffffffff8187d1d8
<sysent+120>)
at /usr/src/sys/sys/syscallvar.h:65
#17 sy_invoke (code=5, rval=0xffffc28c97682fb0, uap=0xffffc28c97683000,
l=0xffffe5b7ea77f5c0, sy=0xffffffff8187d1d8 <sysent+120>) at
/usr/src/sys/sys/syscallvar.h:94
#18 syscall (frame=0xffffc28c97683000) at
/usr/src/sys/arch/x86/x86/syscall.c:138
#19 0xffffffff8020e6bd in handle_syscall ()
#20 0x00007f7fffdb42e0 in ?? ()
#21 0x0000000000000a02 in ?? ()
#22 0x0000000000000180 in ?? ()
#23 0x000072709817957a in ?? ()
#24 0x0000000000000000 in ?? ()
Regards
Uwe
From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/58422: kernel crash when using the iscsi initiator
Date: Sat, 27 Jul 2024 20:26:07 -0000 (UTC)
6bone@6bone.informatik.uni-leipzig.de writes:
>[ 95512.393054] Received Reject PDU, reason = 4, data_len = 48
>[ 95512.393054] S3C1: *** Connection Error, status=25, logout=1, state=3
Protocol error (status=25)
Maybe your iscsi target can tell more.
>[ 95512.393054] sd3(iscsi0:0:0:11): generic HBA error
>[ 95512.393054] sd3d: error writing fsbn 34674059682 of 34674059682-34674059745 (sd3 bn 34674059682; cn 16930693 tn 13 sn 2)
>[ 95519.402427] sd3d: error writing fsbn 35353610530 of 35353610530-35353610593 (sd3 bn 35353610530; cn 17262505 tn 9 sn 2)
As the result some pending I/Os get aborted.
>[ 95522.475421] S3C1: *** Connection Error, status=24, logout=-1, state=5
Timeout (status=24)
>[ 95524.408023] sd3d: error reading fsbn 33006352546 of 33006352546-33006352609 (sd3 bn 33006352546; cn 16116383 tn 5 sn 2)
Another I/O error as consequence.
>[ 95524.498026] S3C1: Resend ccb 0xffffc280676b72b8 (37) - updating CmdSN old 21405452, new 21405454
>[ 95524.498026] S3C1: Connection ReCreated successfully - status 0
iscsi recovers, but the result is like with any failing disk.
>[ 95525.691696] panic: ufsdirhash_findfree: free mismatch
The filesystem crashes for some inconsistent data on the drive.
From: 6bone@6bone.informatik.uni-leipzig.de
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: Re: kern/58422: kernel crash when using the iscsi initiator
Date: Wed, 31 Jul 2024 13:19:12 +0200 (CEST)
On Sat, 27 Jul 2024, Michael van Elst wrote:
>
Since applying the patch I haven't had any further iscsi based kernel
crashes.
Can the patch be included in the official kernel?
Thank you for your efforts
Regards
Uwe
From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/58422: kernel crash when using the iscsi initiator
Date: Wed, 31 Jul 2024 12:01:17 -0000 (UTC)
6bone@6bone.informatik.uni-leipzig.de writes:
>Since applying the patch I haven't had any further iscsi based kernel
>crashes.
>Can the patch be included in the official kernel?
The patch mostly prevents the crash, but every time you see the
debug message, that apparently is a race with a possibly leak
of a command control block.
(Contact us)
$NetBSD: query-full-pr,v 1.47 2022/09/11 19:34:41 kim Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2024
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.