NetBSD Problem Report #58043
From paul@whooppee.com Sat Mar 16 15:01:50 2024
Return-Path: <paul@whooppee.com>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id B87FB1A924F
for <gnats-bugs@gnats.NetBSD.org>; Sat, 16 Mar 2024 15:01:50 +0000 (UTC)
Message-Id: <20240316150148.8DD545E33C5@speedy.whooppee.com>
Date: Sat, 16 Mar 2024 08:01:48 -0700 (PDT)
From: paul@whooppee.com
Reply-To: paul@whooppee.com
To: gnats-bugs@NetBSD.org
Subject: kernel crash in -current
X-Send-Pr-Version: 3.95
>Number: 58043
>Category: kern
>Synopsis: kernel crash in assert_sleepable() in -current, dk(4) driver?
>Confidential: no
>Severity: serious
>Priority: high
>Responsible: hannken
>State: needs-pullups
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sat Mar 16 15:05:00 +0000 2024
>Closed-Date:
>Last-Modified: Sun Aug 18 18:00:02 +0000 2024
>Originator: Paul Goyette
>Release: NetBSD 10.99.10
>Organization:
+---------------------+--------------------------+----------------------+
| Paul Goyette (.sig) | PGP Key fingerprint: | E-mail addresses: |
| (Retired) | 1B11 1849 721C 56C8 F63A | paul@whooppee.com |
| Software Developer | 6E2E 05FD 15CE 9F2D 5102 | pgoyette@netbsd.org |
| & Network Engineer | | pgoyette99@gmail.com |
+---------------------+--------------------------+----------------------+
>Environment:
System: NetBSD speedy.whooppee.com 10.99.10 NetBSD 10.99.10 (SPEEDY 2024-03-13 18:25:47 UTC) #0: Wed Mar 13 20:05:25 UTC 2024 paul@speedy.whooppee.com:/build/netbsd-local/obj/amd64/sys/arch/amd64/compile/SPEEDY amd64
Architecture: x86_64
Machine: amd64
>Description:
At unpredictable times, but always under heavy disk load (i.e.,
build.sh running with -j30), I am seeing random crashes. I
have a crash dump from one of these crashes, and the stack trace
seems to implicate the disk driver:
Crash version 10.99.10, image version 10.99.10.
crash: _kvm_kvatop(0)
Kernel compiled without options LOCKDEBUG.
System panicked: dump forced via kernel debugger
Backtrace from time of crash is available.
crash> bt
end() at 0
kern_reboot() at kern_reboot+0x87
db_sync_cmd() at db_sifting_cmd
db_command() at db_command+0x123
db_command_loop() at db_command_loop+0x1c7
db_trap() at db_trap+0xcc
kdb_trap() at kdb_trap+0x106
trap() at trap+0x2de
--- trap (number 1) ---
breakpoint() at breakpoint+0x5
vpanic() at vpanic+0x173
panic() at printf_nostamp
assert_sleepable() at assert_sleepable+0x99
pool_cache_get_paddr() at pool_cache_get_paddr+0x13c
end() at ffffffff813ad275
bdev_strategy() at bdev_strategy+0x81
spec_strategy() at spec_strategy+0x6e
VOP_STRATEGY() at VOP_STRATEGY+0x3c
dkstart() at dkstart+0x13e
dkiodone() at dkiodone+0xa6
lddone() at lddone+0x10
nvme_q_complete() at nvme_q_complete+0xff
softint_dispatch() at softint_dispatch+0x112
DDB lost frame for Xsoftintr+0x4c, trying 0xffffd220dfd9d0f0
Xsoftintr() at Xsoftintr+0x4c
--- interrupt ---
0:
I've had several other similar crashes, although I haven't
saved dump details. All stack traces seem to have pointed
in the same area, and all fail at the assert_sleepable().
Config and/or dmesg are available. One item of note is that
this machine contains multiple SSDs, and in one case I have
a ccd(4) of two 2-TB CCD partitions (each of which occupies
a complete SSD device).
>How-To-Repeat:
No specific recipe to reproduce; it seems random when
under high disk activity.
>Fix:
Please. In fact, pretty-please.
>Release-Note:
>Audit-Trail:
From: "J. Hannken-Illjes" <hannken@mailbox.org>
To: NetBSD GNATS <gnats-bugs@netbsd.org>
Cc:
Subject: Re: kern/58043: kernel crash in -current
Date: Sat, 16 Mar 2024 17:25:27 +0100
This looks like a GPT labeled ccd: "dkX at ccdX"
Here we get softint_dispatch -> dkstart -> ccdstart -> ccdbuffer -> pool_cache_get via
sys/dev/ccd.c:844 and sys/dev/ccd.c:932
Trying to allocate memory with PR_WAITOK here panics, as a sleeping
allocation from softint is forbidden.
--
J. Hannken-Illjes (hannken@mailbox.org)
From: Taylor R Campbell <riastradh@NetBSD.org>
To: Paul Goyette <paul@whooppee.com>
Cc: gnats-bugs@NetBSD.org, netbsd-bugs@NetBSD.org
Subject: Re: kern/58043: kernel crash in assert_sleepable() in -current, dk(4) driver?
Date: Sat, 16 Mar 2024 16:27:25 +0000
Annoyingly, the part of the stack trace we really want here -- the
part which would tell us where something called pool_cache_get(_paddr)
-- has been obscured:
> assert_sleepable() at assert_sleepable+0x99
> pool_cache_get_paddr() at pool_cache_get_paddr+0x13c
> end() at ffffffff813ad275
> bdev_strategy() at bdev_strategy+0x81
My best guess from the rest of the stack trace:
> spec_strategy() at spec_strategy+0x6e
> VOP_STRATEGY() at VOP_STRATEGY+0x3c
> dkstart() at dkstart+0x13e
> dkiodone() at dkiodone+0xa6
> lddone() at lddone+0x10
> nvme_q_complete() at nvme_q_complete+0xff
is that the missing part looks something like this:
nvme_ns_dobio
ld_nvme_start
ld_diskstart
dk_start (note: not dkstart)
dk_strategy
ldstrategy
There's a call to bus_dmamap_load here which looks like, in this stack
trace, it will pass BUS_DMA_WAITOK because ld_nvme_start doesn't pass
NVME_NS_CTX_F_POLL. I wonder whether this should unconditionally pass
BUS_DMA_NOWAIT instead? After all, the dmamap is created with
BUS_DMA_ALLOCNOW so maybe there should be no need for allocation here.
(And I wonder whether maybe bus_dmamap_load should assert_sleepable if
you pass BUS_DMA_WAITOK, to shake out more of these paths early.)
Can you try the attached patch?
diff --git a/sys/dev/ic/nvme.c b/sys/dev/ic/nvme.c
index d41c88296dbd..29f877c01031 100644
--- a/sys/dev/ic/nvme.c
+++ b/sys/dev/ic/nvme.c
@@ -786,8 +786,7 @@ nvme_ns_dobio(struct nvme_softc *sc, uint16_t nsid, void *cookie,
dmap = ccb->ccb_dmamap;
error = bus_dmamap_load(sc->sc_dmat, dmap, data,
datasize, NULL,
- (ISSET(flags, NVME_NS_CTX_F_POLL) ?
- BUS_DMA_NOWAIT : BUS_DMA_WAITOK) |
+ BUS_DMA_NOWAIT |
(ISSET(flags, NVME_NS_CTX_F_READ) ?
BUS_DMA_READ : BUS_DMA_WRITE));
if (error) {
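To make the proposed change concrete, here is a minimal userland sketch of the flag selection before and after the patch. The bit values and the names MODEL_*, flags_before() and flags_after() are hypothetical stand-ins chosen for illustration; they are not the real bus_dma(9) or nvme(4) definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit values; the real ones live in the kernel headers. */
#define MODEL_DMA_WAITOK 0x0u
#define MODEL_DMA_NOWAIT 0x1u
#define MODEL_DMA_READ   0x2u
#define MODEL_DMA_WRITE  0x4u
#define MODEL_CTX_F_READ 0x1u
#define MODEL_CTX_F_POLL 0x2u

/* Before the patch: only polled requests ask for a non-sleeping load. */
uint32_t
flags_before(uint32_t ctx)
{
	return ((ctx & MODEL_CTX_F_POLL) ? MODEL_DMA_NOWAIT : MODEL_DMA_WAITOK) |
	    ((ctx & MODEL_CTX_F_READ) ? MODEL_DMA_READ : MODEL_DMA_WRITE);
}

/* After the patch: every request asks for a non-sleeping load. */
uint32_t
flags_after(uint32_t ctx)
{
	uint32_t flags = MODEL_DMA_NOWAIT |
	    ((ctx & MODEL_CTX_F_READ) ? MODEL_DMA_READ : MODEL_DMA_WRITE);

	assert(flags & MODEL_DMA_NOWAIT);	/* holds for any ctx */
	return flags;
}
```

In this model a non-polled read gets the "may sleep" behaviour from flags_before() and the "must not sleep" behaviour from flags_after(), which is the only difference the patch is meant to introduce.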
From: Paul Goyette <paul@whooppee.com>
To: Taylor R Campbell <riastradh@NetBSD.org>
Cc: gnats-bugs@NetBSD.org, netbsd-bugs@NetBSD.org
Subject: Re: kern/58043: kernel crash in assert_sleepable() in -current,
dk(4) driver?
Date: Sat, 16 Mar 2024 10:18:36 -0700 (PDT)
I thought the stack trace looked like it wasn't complete! Here is the
backtrace using gdb - perhaps more useful?
#0 0xffffffff80239b95 in cpu_reboot (howto=howto@entry=256,
bootstr=bootstr@entry=0x0)
at /build/netbsd-local/src_ro/sys/arch/amd64/amd64/machdep.c:708
#1 0xffffffff806a84f5 in kern_reboot (howto=howto@entry=256,
bootstr=bootstr@entry=0x0)
at /build/netbsd-local/src_ro/sys/kern/kern_reboot.c:91
#2 0xffffffff80588d23 in db_sync_cmd (addr=<optimized out>,
have_addr=<optimized out>, count=<optimized out>, modif=<optimized out>)
at /build/netbsd-local/src_ro/sys/ddb/db_command.c:1651
#3 0xffffffff805894ca in db_command (
last_cmdp=last_cmdp@entry=0xffffd220dfd9c958)
at /build/netbsd-local/src_ro/sys/ddb/db_command.c:970
#4 0xffffffff80589abf in db_execute_commandlist (
cmdlist=0xffffffff80e353e0 <db_cmd_on_enter> "bt; show reg; sync")
at /build/netbsd-local/src_ro/sys/ddb/db_command.c:466
#5 db_command_loop () at /build/netbsd-local/src_ro/sys/ddb/db_command.c:618
#6 0xffffffff8058dc98 in db_trap (type=type@entry=1, code=code@entry=0)
at /build/netbsd-local/src_ro/sys/ddb/db_trap.c:91
#7 0xffffffff80236a54 in kdb_trap (type=type@entry=1, code=code@entry=0,
regs=regs@entry=0xffffd220dfd9cc10)
at /build/netbsd-local/src_ro/sys/arch/amd64/amd64/db_interface.c:251
#8 0xffffffff8023c066 in trap (frame=0xffffd220dfd9cc10)
at /build/netbsd-local/src_ro/sys/arch/amd64/amd64/trap.c:314
#9 0xffffffff80234a24 in alltraps ()
#10 0xffffffff80235365 in breakpoint ()
#11 0xffffffff806ef1be in vpanic (
fmt=fmt@entry=0xffffffff80b34a1b "%s: %s caller=%p",
ap=ap@entry=0xffffd220dfd9cd48)
at /build/netbsd-local/src_ro/sys/kern/subr_prf.c:286
#12 0xffffffff806ef29d in panic (
fmt=fmt@entry=0xffffffff80b34a1b "%s: %s caller=%p")
at /build/netbsd-local/src_ro/sys/kern/subr_prf.c:209
#13 0xffffffff8069349d in assert_sleepable ()
at /build/netbsd-local/src_ro/sys/kern/kern_lock.c:109
#14 0xffffffff806ec0e7 in pool_cache_get_paddr (pc=0xfffff7cf1a829540,
flags=1, pap=0x0) at
/build/netbsd-local/src_ro/sys/kern/subr_pool.c:2721
#15 0xffffffff813ad275 in ?? ()
#16 0x000000000000003a in ?? ()
#17 0x000000009662dc80 in ?? ()
#18 0xfffff7cf1a6644e8 in ?? ()
#19 0x000000009662dcba in ?? ()
#20 0xffffd220bf420000 in ?? ()
#21 0x0000000000001000 in ?? ()
#22 0xffffd220dfd9ce70 in ?? ()
#23 0x0000000000000100 in ?? ()
#24 0xfffff7cf1b4a85c0 in ?? ()
#25 0xffffffff813a84e0 in ?? ()
#26 0xfffff7cf1a35b478 in ?? ()
#27 0xfffff7cf1a35b360 in ?? ()
#28 0xffffd220dfd9ced0 in ?? ()
#29 0xffffffff806d8331 in bdev_strategy (bp=0xfffff7cf1b04be80)
at /build/netbsd-local/src_ro/sys/kern/subr_devsw.c:1267
Backtrace stopped: frame did not save the PC
> Can you try the attached patch?
I will test this out later today and report back.
From: Taylor R Campbell <riastradh@NetBSD.org>
To: Paul Goyette <paul@whooppee.com>
Cc: gnats-bugs@NetBSD.org, netbsd-bugs@NetBSD.org
Subject: Re: kern/58043: kernel crash in assert_sleepable() in -current,
dk(4) driver?
Date: Sat, 16 Mar 2024 17:52:17 +0000
> Date: Sat, 16 Mar 2024 10:18:36 -0700 (PDT)
> From: Paul Goyette <paul@whooppee.com>
>
> I thought the stack trace looked like it wasn't complete! Here is the
> backtrace using gdb - perhaps more useful?
>
> #14 0xffffffff806ec0e7 in pool_cache_get_paddr (pc=0xfffff7cf1a829540,
> flags=1, pap=0x0) at
> /build/netbsd-local/src_ro/sys/kern/subr_pool.c:2721
> #15 0xffffffff813ad275 in ?? ()
> ...
> #28 0xffffd220dfd9ced0 in ?? ()
> #29 0xffffffff806d8331 in bdev_strategy (bp=0xfffff7cf1b04be80)
> at /build/netbsd-local/src_ro/sys/kern/subr_devsw.c:1267
Nope, this one isn't much help either...
Are you loading drivers from modules? Maybe it would help if you used
src/sys/gdbscripts/modload to load debug data from the modules?
(gdb) source /path/to/src/sys/gdbscripts/modload
(gdb) modload
From: Paul Goyette <paul@whooppee.com>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/58043: kernel crash in assert_sleepable() in -current,
dk(4) driver?
Date: Sat, 16 Mar 2024 10:59:36 -0700 (PDT)
On Sat, 16 Mar 2024, Taylor R Campbell wrote:
> >
> > #14 0xffffffff806ec0e7 in pool_cache_get_paddr (pc=0xfffff7cf1a829540,
> > flags=1, pap=0x0) at
> > /build/netbsd-local/src_ro/sys/kern/subr_pool.c:2721
> > #15 0xffffffff813ad275 in ?? ()
> > ...
> > #28 0xffffd220dfd9ced0 in ?? ()
> > #29 0xffffffff806d8331 in bdev_strategy (bp=0xfffff7cf1b04be80)
> > at /build/netbsd-local/src_ro/sys/kern/subr_devsw.c:1267
>
> Nope, this one isn't much help either...
>
> Are you loading drivers from modules? Maybe it would help if you used
> src/sys/gdbscripts/modload to load debug data from the modules?
>
> (gdb) source /path/to/src/sys/gdbscripts/modload
> (gdb) modload
Nope, this is from a stock GENERIC kernel, all modules built-in. At
least, I think it is!
There are a couple of local patches, but nothing in this vicinity.
From: Paul Goyette <paul@whooppee.com>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/58043: kernel crash in assert_sleepable() in -current,
dk(4) driver?
Date: Sat, 16 Mar 2024 11:13:55 -0700 (PDT)
Hmm, looks like this wasn't GENERIC after all!
I ran modload, and then did another back-trace. Looks a bit better...
#0 0xffffffff80239b95 in cpu_reboot (howto=howto@entry=256, bootstr=bootstr@entry=0x0) at /build/netbsd-local/src_ro/sys/arch/amd64/amd64/machdep.c:708
#1 0xffffffff806a84f5 in kern_reboot (howto=howto@entry=256, bootstr=bootstr@entry=0x0) at /build/netbsd-local/src_ro/sys/kern/kern_reboot.c:91
#2 0xffffffff80588d23 in db_sync_cmd (addr=<optimized out>, have_addr=<optimized out>, count=<optimized out>, modif=<optimized out>) at /build/netbsd-local/src_ro/sys/ddb/db_command.c:1651
#3 0xffffffff805894ca in db_command (last_cmdp=last_cmdp@entry=0xffffd220dfd9c958) at /build/netbsd-local/src_ro/sys/ddb/db_command.c:970
#4 0xffffffff80589abf in db_execute_commandlist (cmdlist=0xffffffff80e353e0 <db_cmd_on_enter> "bt; show reg; sync") at /build/netbsd-local/src_ro/sys/ddb/db_command.c:466
#5 db_command_loop () at /build/netbsd-local/src_ro/sys/ddb/db_command.c:618
#6 0xffffffff8058dc98 in db_trap (type=type@entry=1, code=code@entry=0) at /build/netbsd-local/src_ro/sys/ddb/db_trap.c:91
#7 0xffffffff80236a54 in kdb_trap (type=type@entry=1, code=code@entry=0, regs=regs@entry=0xffffd220dfd9cc10) at /build/netbsd-local/src_ro/sys/arch/amd64/amd64/db_interface.c:251
#8 0xffffffff8023c066 in trap (frame=0xffffd220dfd9cc10) at /build/netbsd-local/src_ro/sys/arch/amd64/amd64/trap.c:314
#9 0xffffffff80234a24 in alltraps ()
#10 0xffffffff80235365 in breakpoint ()
#11 0xffffffff806ef1be in vpanic (fmt=fmt@entry=0xffffffff80b34a1b "%s: %s caller=%p", ap=ap@entry=0xffffd220dfd9cd48) at /build/netbsd-local/src_ro/sys/kern/subr_prf.c:286
#12 0xffffffff806ef29d in panic (fmt=fmt@entry=0xffffffff80b34a1b "%s: %s caller=%p") at /build/netbsd-local/src_ro/sys/kern/subr_prf.c:209
#13 0xffffffff8069349d in assert_sleepable () at /build/netbsd-local/src_ro/sys/kern/kern_lock.c:109
#14 0xffffffff806ec0e7 in pool_cache_get_paddr (pc=0xfffff7cf1a829540, flags=flags@entry=1, pap=pap@entry=0x0) at /build/netbsd-local/src_ro/sys/kern/subr_pool.c:2721
#15 0xffffffff813ad275 in ccdbuffer (bcount=4096, addr=0xffffd220bf420000, bn=5046122874, bp=0xfffff7cf1b4a85c0, cs=0xfffff7cf1b04be40) at /build/netbsd-local/src_ro/sys/dev/ccd.c:932
#16 ccdstart (cs=0xfffff7cf1b04be40) at /build/netbsd-local/src_ro/sys/dev/ccd.c:844
#17 0xffffffff806d8331 in bdev_strategy (bp=0xfffff7cf1b4a85c0) at /build/netbsd-local/src_ro/sys/kern/subr_devsw.c:1267
#18 0xffffffff8076f142 in spec_strategy (v=<optimized out>) at /build/netbsd-local/src_ro/sys/miscfs/specfs/spec_vnops.c:1508
#19 0xffffffff80762459 in VOP_STRATEGY (vp=vp@entry=0xfffff7cf1c61cb00, bp=bp@entry=0xfffff7cf1b4a85c0) at /build/netbsd-local/src_ro/sys/kern/vnode_if.c:1733
#20 0xffffffff8077226d in dkstart (sc=0xfffff7cf1a35b248) at /build/netbsd-local/src_ro/sys/dev/dkwedge/dk.c:1626
#21 0xffffffff80772f69 in dkiodone (bp=<optimized out>) at /build/netbsd-local/src_ro/sys/dev/dkwedge/dk.c:1658
#22 0xffffffff802e186a in lddone (sc=0xfffff7cf188dcb40, bp=<optimized out>) at /build/netbsd-local/src_ro/sys/dev/ld.c:527
#23 0xffffffff802f0930 in nvme_q_complete (sc=0xffffd200fac10000, q=0xfffff7cf1717a600) at /build/netbsd-local/src_ro/sys/dev/ic/nvme.c:1541
#24 0xffffffff806b6bb1 in softint_execute (s=3, l=0xfffff7de5cf6b800) at /build/netbsd-local/src_ro/sys/kern/kern_softint.c:599
#25 softint_dispatch (pinned=<optimized out>, s=3) at /build/netbsd-local/src_ro/sys/kern/kern_softint.c:848
#26 0xffffffff8023475c in Xsoftintr ()
From: "J. Hannken-Illjes" <hannken@mailbox.org>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/58043: kernel crash in assert_sleepable() in -current, dk(4)
driver?
Date: Mon, 18 Mar 2024 10:52:09 +0100
Paul,
please try the attached patch -- if it prevents assert_sleepable() from
firing, it is the call to CCD_GETBUF(), a.k.a. pool_cache_get(ccd_cache,
PR_WAITOK), from softint context that has to be fixed.
--
J. Hannken-Illjes - hannken@mailbox.org
ccd_defer
Always defer requests to the helper thread so ccdstart() doesn't
get called from softint anymore.
diff -r 8b8d2498ffd9 -r 2ec4e85f1120 sys/dev/ccd.c
--- sys/dev/ccd.c
+++ sys/dev/ccd.c
@@ -777,17 +777,10 @@ ccdstrategy(struct buf *bp)
return;
}
- /* Defer to thread if system is low on memory. */
+ /* Always defer to thread. */
bufq_put(cs->sc_bufq, bp);
- if (__predict_false(ccdbackoff(cs))) {
- mutex_exit(cs->sc_iolock);
-#ifdef DEBUG
- if (ccddebug & CCDB_FOLLOW)
- printf("ccdstrategy: holding off on I/O\n");
-#endif
- return;
- }
- ccdstart(cs);
+ cv_broadcast(&cs->sc_push);
+ mutex_exit(cs->sc_iolock);
}
static void
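The "always defer to the helper thread" idea in the patch above can be sketched in userland with POSIX threads. This is only a model under stated assumptions: struct model_softc, defer_strategy(), defer_worker() and defer_demo() are hypothetical names, and pthread mutexes and condition variables stand in for the kernel's sc_iolock and sc_push. It is not the ccd(4) code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_QCAP 16

struct model_softc {
	pthread_mutex_t iolock;		/* stands in for cs->sc_iolock */
	pthread_cond_t  push;		/* stands in for cs->sc_push */
	int    queue[MODEL_QCAP];	/* stands in for the bufq */
	size_t head, tail;		/* empty when head == tail */
	bool   stop;
	int    handled_sum;		/* sum of request ids the worker handled */
};

/* Caller side: only queue the request and wake the worker, never start I/O. */
void
defer_strategy(struct model_softc *sc, int id)
{
	pthread_mutex_lock(&sc->iolock);
	assert((sc->tail + 1) % MODEL_QCAP != sc->head);	/* not full */
	sc->queue[sc->tail] = id;
	sc->tail = (sc->tail + 1) % MODEL_QCAP;
	pthread_cond_broadcast(&sc->push);
	pthread_mutex_unlock(&sc->iolock);
}

/* Worker side: runs in thread context, where sleeping would be allowed. */
void *
defer_worker(void *arg)
{
	struct model_softc *sc = arg;

	pthread_mutex_lock(&sc->iolock);
	for (;;) {
		while (sc->head == sc->tail && !sc->stop)
			pthread_cond_wait(&sc->push, &sc->iolock);
		if (sc->head == sc->tail)
			break;		/* queue drained and stop requested */
		int id = sc->queue[sc->head];
		sc->head = (sc->head + 1) % MODEL_QCAP;
		sc->handled_sum += id;	/* "start" the request */
	}
	pthread_mutex_unlock(&sc->iolock);
	return NULL;
}

/* Submit ids 1..n, shut the worker down, and report what it handled. */
int
defer_demo(int n)
{
	struct model_softc sc = { .head = 0 };
	pthread_t worker;

	assert(n >= 0 && n < MODEL_QCAP);
	pthread_mutex_init(&sc.iolock, NULL);
	pthread_cond_init(&sc.push, NULL);
	if (pthread_create(&worker, NULL, defer_worker, &sc) != 0)
		return -1;
	for (int id = 1; id <= n; id++)
		defer_strategy(&sc, id);
	pthread_mutex_lock(&sc.iolock);
	sc.stop = true;
	pthread_cond_broadcast(&sc.push);
	pthread_mutex_unlock(&sc.iolock);
	pthread_join(worker, NULL);
	pthread_cond_destroy(&sc.push);
	pthread_mutex_destroy(&sc.iolock);
	return sc.handled_sum;
}
```

The design trade-off is the one the patch makes: the submitting context never does work that might sleep, at the cost of a thread wakeup for every request.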
From: Paul Goyette <paul@whooppee.com>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/58043: kernel crash in assert_sleepable() in -current,
dk(4) driver?
Date: Mon, 18 Mar 2024 05:03:17 -0700 (PDT)
I've already converted my config to raid0 instead of ccd, so I am sorry
that I am unable to test the patch. :-(
From: triaxx@NetBSD.org
To: Paul Goyette <paul@whooppee.com>, gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/58043: kernel crash in assert_sleepable() in -current, dk(4)
driver?
Date: Wed, 27 Mar 2024 11:55:11 +0100
I have a system with:
$ dmesg -t | grep ccd
dk7 at wd1: "ccd0p0", 488397088 blocks at 40, type: ccd
dk8 at wd2: "ccd0p1", 488397088 blocks at 40, type: ccd
ccd0: Interleaving 2 components (63 block interleave)
ccd0: /dev/dk7 (488397042 blocks)
ccd0: /dev/dk8 (488397042 blocks)
ccd0: total 976794084 blocks
ccd0: GPT GUID: 546b7b1d-bf71-46a2-9788-226ebaf7ae2d
dk12 at ccd0: "ccd0", 976794008 blocks at 40, type: ffs
The patch fixes the issue with the kernel that crashes on this system.
From: triaxx@NetBSD.org
To: Paul Goyette <paul@whooppee.com>, gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/58043: kernel crash in assert_sleepable() in -current, dk(4)
driver?
Date: Wed, 27 Mar 2024 14:01:19 +0100
I have two kernels on this system: netbsd-GENERIC and
netbsd-GOLIATH-noamdgpu.
Surprisingly, GOLIATH-noamdgpu crashes whereas GENERIC doesn't. The diff
between the two configurations is:
1c1
< ### START CONFIG FILE "/usr/src/sys/arch/amd64/conf/GENERIC"
---
> ### START CONFIG FILE "/home/triaxx/NetBSD/src/sys/arch/amd64/conf/GOLIATH-noamdgpu"
118,119c118,119
< #options DEBUG # expensive debugging checks/support
< #options LOCKDEBUG # expensive locking checks/support
---
> options DEBUG # expensive debugging checks/support
> options LOCKDEBUG # expensive locking checks/support
130,131c130,131
< #options KGDB # remote debugger
< #options KGDB_DEVNAME="\"com\"",KGDB_DEVADDR=0x3f8,KGDB_DEVRATE=9600
---
> options KGDB # remote debugger
> options KGDB_DEVNAME="\"com\"",KGDB_DEVADDR=0x3f8,KGDB_DEVRATE=9600
255c255
< #options ACPIVERBOSE # verbose ACPI configuration messages
---
> options ACPIVERBOSE # verbose ACPI configuration messages
461,462c461,462
< i915drmkms* at pci? dev ? function ?
< intelfb* at intelfbbus?
---
> #i915drmkms* at pci? dev ? function ?
> #intelfb* at intelfbbus?
464,465c464,465
< radeon* at pci? dev ? function ?
< radeondrmkmsfb* at radeonfbbus?
---
> #radeon* at pci? dev ? function ?
> #radeondrmkmsfb* at radeonfbbus?
470,471c470,471
< nouveau* at pci? dev ? function ?
< nouveaufb* at nouveaufbbus?
---
> #nouveau* at pci? dev ? function ?
> #nouveaufb* at nouveaufbbus?
1245c1245
< ### END CONFIG FILE "/usr/src/sys/arch/amd64/conf/GENERIC"
---
> ### END CONFIG FILE "/home/triaxx/NetBSD/src/sys/arch/amd64/conf/GOLIATH-noamdgpu"
From: "J. Hannken-Illjes" <hannken@mailbox.org>
To: NetBSD GNATS <gnats-bugs@netbsd.org>
Cc:
Subject: Re: kern/58043: kernel crash in assert_sleepable() in -current, dk(4)
driver?
Date: Wed, 27 Mar 2024 14:21:38 +0100
The crash is from ASSERT_SLEEPABLE() which is enabled for option DEBUG only.
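A minimal sketch of why one kernel configuration trips the check and another does not, assuming the check is a macro that compiles to nothing unless a debug option is set. All names here (MODEL_DEBUG, MODEL_ASSERT_SLEEPABLE, model_in_softint and so on) are hypothetical and only model the behaviour; this is not the kernel's definition.

```c
#include <assert.h>
#include <stdbool.h>

bool model_in_softint;	/* hypothetical: true while "in softint" */
int  model_violations;	/* counts what would have been a panic */

/* Stand-in for assert_sleepable(): record the violation instead of panicking. */
void
model_assert_sleepable(void)
{
	if (model_in_softint)
		model_violations++;
}

/* Compiled to nothing unless MODEL_DEBUG is defined at build time. */
#ifdef MODEL_DEBUG
#define MODEL_ASSERT_SLEEPABLE()	model_assert_sleepable()
#else
#define MODEL_ASSERT_SLEEPABLE()	do { } while (0)
#endif

/* A sleeping allocation attempted from "softint" context. */
int
model_alloc_from_softint(void)
{
	model_in_softint = true;
	MODEL_ASSERT_SLEEPABLE();	/* the check a WAITOK allocation would make */
	model_in_softint = false;
	assert(!model_in_softint);
	return model_violations;
}
```

Built without -DMODEL_DEBUG, the first call returns 0 because the check is compiled out; built with it, the first call returns 1. The underlying bug is present either way, it is only reported in the debug build.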
--
J. Hannken-Illjes - hannken@mailbox.org
From: "Juergen Hannken-Illjes" <hannken@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/58043 CVS commit: src/sys/dev
Date: Sun, 31 Mar 2024 14:56:41 +0000
Module Name: src
Committed By: hannken
Date: Sun Mar 31 14:56:41 UTC 2024
Modified Files:
src/sys/dev: ccd.c
Log Message:
Using a ccd(4) with GPT (dk* at ccd*) the disk framework will call
ccdstrategy() -> ccdstart() -> ccdbuffer() from softint context.
Allocating the buffer with PR_WAITOK here is forbidden.
Change ccdstart() / ccdbuffer() to report failure back to caller and
pass PR_WAITOK / PR_NOWAIT as an additional argument.
Call ccdstart() with PR_NOWAIT from ccdstrategy() and on error defer
to the kthread. Call ccdstart() with PR_WAITOK from kthread so requests
from kthread always succeed to allocate the buffers.
Remove the (non working) throttling on low memory as it is no longer needed.
Fixes PR kern/58043 "kernel crash in assert_sleepable() in -current,
dk(4) driver?"
To generate a diff of this commit:
cvs rdiff -u -r1.189 -r1.190 src/sys/dev/ccd.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
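The approach described in the log message (try without waiting from the strategy path, defer to the thread on failure, and let the thread wait) can be sketched as a single-threaded userland model. Everything here is an assumption made for illustration: struct model_ccd, model_start(), model_strategy(), model_thread() and the MODEL_* constants are hypothetical, and "waiting" is modelled by simply making a buffer available. It is not the committed ccd.c change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NOWAIT 0
#define MODEL_WAITOK 1
#define MODEL_QCAP   8

struct model_ccd {
	int    free_bufs;		/* buffers available without waiting */
	int    queue[MODEL_QCAP];	/* pending request ids, FIFO */
	size_t head, tail;		/* empty when head == tail */
	bool   thread_kicked;		/* set when work was handed to the thread */
	int    started;			/* requests that obtained a buffer */
};

/* Try to start every queued request; stop at the first failed allocation. */
int
model_start(struct model_ccd *cs, int waitflag)
{
	while (cs->head != cs->tail) {
		if (cs->free_bufs == 0) {
			if (waitflag == MODEL_NOWAIT)
				return -1;	/* caller must defer */
			cs->free_bufs++;	/* models sleeping until one is free */
		}
		cs->free_bufs--;
		cs->head = (cs->head + 1) % MODEL_QCAP;
		cs->started++;
	}
	return 0;
}

/* May run in "softint" context: never waits, defers on failure. */
void
model_strategy(struct model_ccd *cs, int id)
{
	assert((cs->tail + 1) % MODEL_QCAP != cs->head);	/* not full */
	cs->queue[cs->tail] = id;
	cs->tail = (cs->tail + 1) % MODEL_QCAP;
	if (model_start(cs, MODEL_NOWAIT) != 0)
		cs->thread_kicked = true;
}

/* Runs in "thread" context: allowed to wait, so it always drains the queue. */
void
model_thread(struct model_ccd *cs)
{
	int error = model_start(cs, MODEL_WAITOK);

	assert(error == 0);
	(void)error;
	cs->thread_kicked = false;
}
```

Compared with always deferring, this keeps the fast path in the caller's context when a buffer is immediately available and only pays for a thread handoff when the non-waiting allocation fails.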
From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/58043 CVS commit: [netbsd-10] src/sys/dev
Date: Thu, 18 Apr 2024 18:24:31 +0000
Module Name: src
Committed By: martin
Date: Thu Apr 18 18:24:31 UTC 2024
Modified Files:
src/sys/dev [netbsd-10]: ccd.c
Log Message:
Pull up following revision(s) (requested by hannken in ticket #669):
sys/dev/ccd.c: revision 1.190
Using a ccd(4) with GPT (dk* at ccd*) the disk framework will call
ccdstrategy() -> ccdstart() -> ccdbuffer() from softint context.
Allocating the buffer with PR_WAITOK here is forbidden.
Change ccdstart() / ccdbuffer() to report failure back to caller and
pass PR_WAITOK / PR_NOWAIT as an additional argument.
Call ccdstart() with PR_NOWAIT from ccdstrategy() and on error defer
to the kthread. Call ccdstart() with PR_WAITOK from kthread so requests
from kthread always succeed to allocate the buffers.
Remove the (non working) throttling on low memory as it is no longer needed.
Fixes PR kern/58043 "kernel crash in assert_sleepable() in -current,
dk(4) driver?"
To generate a diff of this commit:
cvs rdiff -u -r1.189 -r1.189.4.1 src/sys/dev/ccd.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: open->needs-pullups
State-Changed-By: riastradh@NetBSD.org
State-Changed-When: Tue, 23 Jul 2024 22:20:23 +0000
State-Changed-Why:
Fixed in HEAD and pulled up to 10; does this need pullup-9 too?
Responsible-Changed-From-To: kern-bug-people->hannken
Responsible-Changed-By: riastradh@NetBSD.org
Responsible-Changed-When: Sun, 18 Aug 2024 16:55:29 +0000
Responsible-Changed-Why:
Can you assess whether this is reasonable to pull up to 9?
From: "J. Hannken-Illjes" <hannken@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: riastradh@NetBSD.org
Subject: Re: kern/58043 (kernel crash in assert_sleepable() in -current,
dk(4) driver?)
Date: Sun, 18 Aug 2024 17:55:25 +0000
On Sun, Aug 18, 2024 at 04:55:30PM +0000, riastradh@NetBSD.org wrote:
> Synopsis: kernel crash in assert_sleepable() in -current, dk(4) driver?
>
> Responsible-Changed-From-To: kern-bug-people->hannken
> Responsible-Changed-By: riastradh@NetBSD.org
> Responsible-Changed-When: Sun, 18 Aug 2024 16:55:29 +0000
> Responsible-Changed-Why:
> Can you assess whether this is reasonable to pull up to 9?
As the assertion fires on DEBUG kernels only, and the diff does not
apply cleanly to -9 and therefore needs testing, we should wait
until someone hits this assertion on -9 and then prepare the
test and pullup.
--
J. Hannken-Illjes - hannken@netbsd.org
>Unformatted:
Copyright © 1994-2024
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.