NetBSD Problem Report #55839
From paul@whooppee.com Thu Dec 3 22:37:38 2020
Return-Path: <paul@whooppee.com>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id 23D331A9217
for <gnats-bugs@gnats.NetBSD.org>; Thu, 3 Dec 2020 22:37:38 +0000 (UTC)
Message-Id: <20201203223735.B5ECB30F2C4@speedy.whooppee.com>
Date: Thu, 3 Dec 2020 14:37:35 -0800 (PST)
From: paul@whooppee.com
Reply-To: paul@whooppee.com
To: gnats-bugs@NetBSD.org
Subject: nvme(4) panic on amd64 9.99.76 when loaded as a module
X-Send-Pr-Version: 3.95
>Number: 55839
>Category: kern
>Synopsis: nvme(4) panic on amd64 9.99.76 when loaded as a module
>Confidential: no
>Severity: serious
>Priority: high
>Responsible: kern-bug-people
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Thu Dec 03 22:40:00 +0000 2020
>Closed-Date: Tue Dec 08 07:35:44 +0000 2020
>Last-Modified: Tue Dec 08 07:35:44 +0000 2020
>Originator: Paul Goyette
>Release: NetBSD 9.99.76
>Organization:
+--------------------+--------------------------+-----------------------+
| Paul Goyette | PGP Key fingerprint: | E-mail addresses: |
| (Retired) | FA29 0E3B 35AF E8AE 6651 | paul@whooppee.com |
| Software Developer | 0786 F758 55DE 53BA 7731 | pgoyette@netbsd.org |
+--------------------+--------------------------+-----------------------+
>Environment:
System: NetBSD speedy.whooppee.com 9.99.76 NetBSD 9.99.76 (SPEEDY 2020-12-03 16:50:31 UTC) #0: Thu Dec 3 19:39:16 UTC 2020 paul@speedy.whooppee.com:/build/netbsd-local/obj/amd64/sys/arch/amd64/compile/SPEEDY amd64
Architecture: x86_64
Machine: amd64
>Description:
This machine contains a nvme(4) device:
nvme0 at pci3 dev 0 function 0: Samsung Electronics (3rd vendor ID) product a804 (rev. 0x00)
nvme0: NVMe 1.2
nvme0: for admin queue interrupting at msix6 vec 0
nvme0: Samsung SSD 960 PRO 512GB, firmware 2B6QCXP7, serial S3EWNX0K108171P
When loading the nvme(4) module using modload(8), I get the
following crash:
panic: kernel diagnostic assertion "ns->ident == NULL" failed: file "/build/netbsd-local/src_ro/sys/dev/ic/nvme.c", line 670
Backtrace shows:
vpanic() at vpanic+0x156
__x86_indirect_thunk_rax() at __x86_indirect_thunk_rax
nvme_ns_identify() at nvme_ns_identify+0x24f
nvme_rescan() at nvme_rescan+0xc0
config_cfdata_attach() at config_cfdata_attach+0xc3
config_init_component() at config_init_component+0x7a
module_do_load() at module_do_load+0x5c9
module_load() at module_load+0x85
handle_modctl_load() at handle_modctl_load+0x157
sys_modctl() at sys_modctl+0x324
syscall() at syscall+0x23e
--- syscall (number 246) ---
syscall+0x23e:
gdb shows
0x1d0f is in nvme_ns_identify (/build/netbsd-local/src_ro/sys/dev/ic/nvme.c:637).
632 KASSERT(ccb != NULL); /* it's a bug if we don't have spare ccb here */
633
634 mem = nvme_dmamem_alloc(sc, sizeof(*identify));
635 if (mem == NULL) {
636 nvme_ccb_put(sc->sc_admin_q, ccb);
637 return ENOMEM;
638 }
639
640 memset(&sqe, 0, sizeof(sqe));
641 sqe.opcode = NVM_ADMIN_IDENTIFY;
(gdb)
gdb seems unable to disassemble things, but objdump shows
0000000000001ac0 <nvme_ns_identify>:
1ac0: 55 push %rbp
1ac1: 48 89 e5 mov %rsp,%rbp
1ac4: 41 57 push %r15
1ac6: 41 56 push %r14
...
1d08: 31 c0 xor %eax,%eax
1d0a: e8 00 00 00 00 callq 1d0f <nvme_ns_identify+0x24f>
1d0b: R_X86_64_PLT32 kern_assert-0x4
1d0f: 4c 8b 5d 80 mov -0x80(%rbp),%r11
1d13: e9 2e ff ff ff jmpq 1c46 <nvme_ns_identify+0x186>
0000000000001d18 <nvme_rescan>:
(A crash-dump file and kernel-with-symbol-table is available for
further investigation, if needed.)
>How-To-Repeat:
Boot a 9.99.76 amd64 kernel built _without_ built-in nvme module,
and then try to load the module. (It is unknown if the problem
occurs with built-in nvme module.)
>Fix:
No fix currently known.
>Release-Note:
>Audit-Trail:
From: Paul Goyette <paul@whooppee.com>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/55839: nvme(4) panic on amd64 9.99.76 when loaded as a
module
Date: Fri, 4 Dec 2020 07:38:23 -0800 (PST)
>> How-To-Repeat:
> Boot a 9.99.76 amd64 kernel built _without_ built-in nvme module,
> and then try to load the module. (It is unknown if the problem
> occurs with built-in nvme module.)
I have just confirmed that booting a GENERIC kernel (with the nvme
module "built-in" does _not_ panic. It boots normally, at least to
single-user.
+--------------------+--------------------------+-----------------------+
| Paul Goyette | PGP Key fingerprint: | E-mail addresses: |
| (Retired) | FA29 0E3B 35AF E8AE 6651 | paul@whooppee.com |
| Software Developer | 0786 F758 55DE 53BA 7731 | pgoyette@netbsd.org |
+--------------------+--------------------------+-----------------------+
From: Paul Goyette <paul@whooppee.com>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/55839: nvme(4) panic on amd64 9.99.76 when loaded as a
module
Date: Fri, 4 Dec 2020 10:34:53 -0800 (PST)
Backtrace from gdb (with argument names & values):
.rodata_addr = 0xffffffff80e77000
add symbol table from file
"/stand/amd64/9.99.76/modules/wapbl/wapbl.kmod" at
.text_addr = 0xffffffff80ea0000
.data_addr = 0xffffffff80e98000
.rodata_addr = 0xffffffff80e99000
(gdb) bt
#0 0xffffffff80223945 in cpu_reboot (howto=howto@entry=256,
bootstr=bootstr@entry=0x0 <nvme_ns_identify.cold>)
at /build/netbsd-local/src_ro/sys/arch/amd64/amd64/machdep.c:713
#1 0xffffffff8060e5c5 in kern_reboot (howto=256,
bootstr=bootstr@entry=0x0 <nvme_ns_identify.cold>)
at /build/netbsd-local/src_ro/sys/kern/kern_reboot.c:73
#2 0xffffffff8054de29 in db_reboot_cmd (addr=<optimized out>,
have_addr=<optimized out>, count=<optimized out>, modif=<optimized out>)
at /build/netbsd-local/src_ro/sys/ddb/db_command.c:1471
#3 0xffffffff8054e637 in db_command (
last_cmdp=last_cmdp@entry=0xffffffff80a2b640 <db_last_command>)
at /build/netbsd-local/src_ro/sys/ddb/db_command.c:955
#4 0xffffffff8054e98c in db_command_loop ()
at /build/netbsd-local/src_ro/sys/ddb/db_command.c:614
#5 0xffffffff80552503 in db_trap (type=type@entry=1, code=code@entry=0)
at /build/netbsd-local/src_ro/sys/ddb/db_trap.c:91
#6 0xffffffff80220b4f in kdb_trap (type=type@entry=1, code=code@entry=0,
regs=regs@entry=0xffffb70927ed29f0)
at /build/netbsd-local/src_ro/sys/arch/amd64/amd64/db_interface.c:249
#7 0xffffffff80225c76 in trap (frame=0xffffb70927ed29f0)
at /build/netbsd-local/src_ro/sys/arch/amd64/amd64/trap.c:315
#8 0xffffffff8021ead3 in alltraps ()
#9 0xffffffff8021f415 in breakpoint ()
#10 0xffffffff8064e2e3 in vpanic (
fmt=0xffffffff80faa060 <cfattach_ioconf_nvme_pci> "kernel %sassertion \"%s\" failed: file \"%s\", line %d ", ap=ap@entry=0xffffb70927ed2b28)
at /build/netbsd-local/src_ro/sys/kern/subr_prf.c:288
#11 0xffffffff807217b3 in kern_assert (
fmt=fmt@entry=0x100 <error: Cannot access memory at address 0x100>)
at /build/netbsd-local/src_ro/sys/lib/libkern/kern_assert.c:51
#12 0xffffffff80fb2d0f in nvme_ns_identify (sc=0xffffb701045e3000, nsid=1)
at /build/netbsd-local/src_ro/sys/dev/ic/nvme.c:637
#13 0xffffffff80fb2dd8 in nvme_rescan (self=0xffffdaaf83c67780,
attr=<optimized out>, flags=<optimized out>)
at /build/netbsd-local/src_ro/sys/dev/ic/nvme.c:490
#14 0xffffffff80635220 in rescan_with_cfdata (cf=0xffffffff80f9b000)
at /build/netbsd-local/src_ro/sys/kern/subr_autoconf.c:2995
#15 config_cfdata_attach (cf=0xffffffff80f9b000, scannow=<optimized out>)
at /build/netbsd-local/src_ro/sys/kern/subr_autoconf.c:928
#16 0xffffffff806352e6 in config_init_component (cfdriverv=0xffffffff80f9c0c0,
cfattachv=0xffffffff80f9c0e0, cfdatav=0xffffffff80f9b000)
at /build/netbsd-local/src_ro/sys/kern/subr_autoconf.c:388
#17 0xffffffff805ffc79 in module_do_load (
name=name@entry=0xffffdaaf845b8068 "ld_nvme", isdep=isdep@entry=false,
flags=flags@entry=0, props=props@entry=0xffffdaaf832133b8,
modp=modp@entry=0x0 <nvme_ns_identify.cold>,
modclass=modclass@entry=MODULE_CLASS_ANY, autoload=autoload@entry=false)
at /build/netbsd-local/src_ro/sys/kern/kern_module.c:1363
#18 0xffffffff806006c3 in module_load (
filename=filename@entry=0xffffdaaf845b8068 "ld_nvme", flags=flags@entry=0,
props=props@entry=0xffffdaaf832133b8,
modclass=modclass@entry=MODULE_CLASS_ANY)
at /build/netbsd-local/src_ro/sys/kern/kern_module.c:678
#19 0xffffffff8065e207 in handle_modctl_load (ml_filename=<optimized out>,
ml_flags=0, ml_props=<optimized out>, ml_propslen=190)
at /build/netbsd-local/src_ro/sys/kern/sys_module.c:99
#20 0xffffffff8065e54a in sys_modctl (l=<optimized out>,
uap=0xffffb70927ed3000, retval=<optimized out>)
at /build/netbsd-local/src_ro/sys/kern/sys_module.c:285
#21 0xffffffff802a9cfe in sy_call (rval=0xffffb70927ed2fb0,
uap=0xffffb70927ed3000, l=0xffffdaaf83cc52c0,
sy=0xffffffff80a347d0 <sysent+5904>)
at /build/netbsd-local/src_ro/sys/sys/syscallvar.h:65
#22 sy_invoke (code=246, rval=0xffffb70927ed2fb0, uap=0xffffb70927ed3000,
l=0xffffdaaf83cc52c0, sy=0xffffffff80a347d0 <sysent+5904>)
at /build/netbsd-local/src_ro/sys/sys/syscallvar.h:94
#23 syscall (frame=0xffffb70927ed3000)
at /build/netbsd-local/src_ro/sys/arch/x86/x86/syscall.c:138
#24 0xffffffff802096ad in handle_syscall ()
(gdb)
+--------------------+--------------------------+-----------------------+
| Paul Goyette | PGP Key fingerprint: | E-mail addresses: |
| (Retired) | FA29 0E3B 35AF E8AE 6651 | paul@whooppee.com |
| Software Developer | 0786 F758 55DE 53BA 7731 | pgoyette@netbsd.org |
+--------------------+--------------------------+-----------------------+
From: Paul Goyette <paul@whooppee.com>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/55839: nvme(4) panic on amd64 9.99.76 when loaded as a
module
Date: Fri, 4 Dec 2020 12:21:11 -0800 (PST)
Interestingly, when nvme(4) module is loaded, it seems to go through
nvme_rescan() twice. The panic occurs the second time through (and
that is where I got all the previous stack traces).
For the _first_ time through, here is the backtrace. Note that it
results from a call to the _parent_ device's pci_rescan().
[ 17.422137] cpu1: Begin traceback...
[ 17.422137] vpanic() at netbsd:vpanic+0x156
[ 17.422137] __x86_indirect_thunk_rax() at
netbsd:__x86_indirect_thunk_rax
[ 17.422137] nvme_ns_identify() at nvme:nvme_ns_identify+0x1b3
[ 17.422137] nvme_rescan() at nvme:nvme_rescan+0xc0
[ 17.422137] nvme_attach() at nvme:nvme_attach+0xa22
[ 17.422137] nvme_pci_attach() at nvme:nvme_pci_attach+0x383
[ 17.422137] config_attach_loc() at netbsd:config_attach_loc+0x17e
[ 17.422137] pci_probe_device() at netbsd:pci_probe_device+0x585
[ 17.422137] pci_enumerate_bus() at netbsd:pci_enumerate_bus+0x1b5
[ 17.422137] pcirescan() at netbsd:pcirescan+0x4e
[ 17.422137] config_cfdata_attach() at netbsd:config_cfdata_attach+0xc3
[ 17.432137] config_init_component() at netbsd:config_init_component+0x7a
[ 17.432137] nvme_modcmd() at nvme:nvme_modcmd+0x71
[ 17.432137] module_do_load() at netbsd:module_do_load+0x5c9
[ 17.432137] module_do_load() at netbsd:module_do_load+0x48b
[ 17.432137] module_load() at netbsd:module_load+0x85
[ 17.432137] handle_modctl_load() at netbsd:handle_modctl_load+0x157
[ 17.432137] sys_modctl() at netbsd:sys_modctl+0x324
[ 17.432137] syscall() at netbsd:syscall+0x23e
[ 17.432137] --- syscall (number 246) ---
[ 17.432137] netbsd:syscall+0x23e:
[ 17.432137] cpu1: End traceback...
+--------------------+--------------------------+-----------------------+
| Paul Goyette | PGP Key fingerprint: | E-mail addresses: |
| (Retired) | FA29 0E3B 35AF E8AE 6651 | paul@whooppee.com |
| Software Developer | 0786 F758 55DE 53BA 7731 | pgoyette@netbsd.org |
+--------------------+--------------------------+-----------------------+
From: Paul Goyette <paul@whooppee.com>
To: gnats-bugs@netbsd.org
Cc: kardell@netbsd.org
Subject: Re: kern/55839: nvme(4) panic on amd64 9.99.76 when loaded as a
module
Date: Fri, 4 Dec 2020 13:10:54 -0800 (PST)
FWIW, I suspect that thet fix for kern/55674 is the cause of this
problem. Frank, any chance you can come up with something that
works for loadable-module as well as for built-in?
+--------------------+--------------------------+-----------------------+
| Paul Goyette | PGP Key fingerprint: | E-mail addresses: |
| (Retired) | FA29 0E3B 35AF E8AE 6651 | paul@whooppee.com |
| Software Developer | 0786 F758 55DE 53BA 7731 | pgoyette@netbsd.org |
+--------------------+--------------------------+-----------------------+
From: "Frank Kardel" <kardel@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/55839 CVS commit: src/sys/dev/ic
Date: Fri, 4 Dec 2020 23:03:12 +0000
Module Name: src
Committed By: kardel
Date: Fri Dec 4 23:03:11 UTC 2020
Modified Files:
src/sys/dev/ic: nvme.c
Log Message:
PR kern/55839:
handle multiple nvme_rescan()s correctly by doing the
name-space identify only once per nsid.
fixes issue where modloading triggers multiple
rescans.
To generate a diff of this commit:
cvs rdiff -u -r1.52 -r1.53 src/sys/dev/ic/nvme.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: open->closed
State-Changed-By: pgoyette@NetBSD.org
State-Changed-When: Sun, 06 Dec 2020 03:41:16 +0000
State-Changed-Why:
Fix confirmed and committed - thanks, Frank!
State-Changed-From-To: closed->pending-pullups
State-Changed-By: kardel@NetBSD.org
State-Changed-When: Mon, 07 Dec 2020 13:01:11 +0000
State-Changed-Why:
[pullup-8 #1633] pullup fix for PR kern/55839 (nvme(4) panic when loaded as module)
[pullup-9 #1144] pullup fix for PR kern/55839 (nvme(4) panic when loaded as module)
From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/55839 CVS commit: [netbsd-9] src/sys/dev/ic
Date: Mon, 7 Dec 2020 20:04:08 +0000
Module Name: src
Committed By: martin
Date: Mon Dec 7 20:04:07 UTC 2020
Modified Files:
src/sys/dev/ic [netbsd-9]: nvme.c
Log Message:
Pull up following revision(s) (requested by kardel in ticket #1144):
sys/dev/ic/nvme.c: revision 1.53
PR kern/55839:
handle multiple nvme_rescan()s correctly by doing the
name-space identify only once per nsid.
fixes issue where modloading triggers multiple
rescans.
To generate a diff of this commit:
cvs rdiff -u -r1.44.2.4 -r1.44.2.5 src/sys/dev/ic/nvme.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/55839 CVS commit: [netbsd-8] src/sys/dev/ic
Date: Mon, 7 Dec 2020 20:05:36 +0000
Module Name: src
Committed By: martin
Date: Mon Dec 7 20:05:36 UTC 2020
Modified Files:
src/sys/dev/ic [netbsd-8]: nvme.c
Log Message:
Pull up following revision(s) (requested by kardel in ticket #1633):
sys/dev/ic/nvme.c: revision 1.53
PR kern/55839:
handle multiple nvme_rescan()s correctly by doing the
name-space identify only once per nsid.
fixes issue where modloading triggers multiple
rescans.
To generate a diff of this commit:
cvs rdiff -u -r1.30.2.8 -r1.30.2.9 src/sys/dev/ic/nvme.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: pending-pullups->closed
State-Changed-By: kardel@NetBSD.org
State-Changed-When: Tue, 08 Dec 2020 07:35:44 +0000
State-Changed-Why:
pullups done.
>Unformatted:
(Contact us)
$NetBSD: query-full-pr,v 1.46 2020/01/03 16:35:01 leot Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2020
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.