NetBSD Problem Report #55839

From paul@whooppee.com  Thu Dec  3 22:37:38 2020
Return-Path: <paul@whooppee.com>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 23D331A9217
	for <gnats-bugs@gnats.NetBSD.org>; Thu,  3 Dec 2020 22:37:38 +0000 (UTC)
Message-Id: <20201203223735.B5ECB30F2C4@speedy.whooppee.com>
Date: Thu,  3 Dec 2020 14:37:35 -0800 (PST)
From: paul@whooppee.com
Reply-To: paul@whooppee.com
To: gnats-bugs@NetBSD.org
Subject: nvme(4) panic on amd64 9.99.76 when loaded as a module
X-Send-Pr-Version: 3.95

>Number:         55839
>Category:       kern
>Synopsis:       nvme(4) panic on amd64 9.99.76 when loaded as a module
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    kern-bug-people
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Dec 03 22:40:00 +0000 2020
>Closed-Date:    Tue Dec 08 07:35:44 +0000 2020
>Last-Modified:  Tue Dec 08 07:35:44 +0000 2020
>Originator:     Paul Goyette
>Release:        NetBSD 9.99.76
>Organization:
+--------------------+--------------------------+-----------------------+
| Paul Goyette       | PGP Key fingerprint:     | E-mail addresses:     |
| (Retired)          | FA29 0E3B 35AF E8AE 6651 | paul@whooppee.com     |
| Software Developer | 0786 F758 55DE 53BA 7731 | pgoyette@netbsd.org   |
+--------------------+--------------------------+-----------------------+
>Environment:


System: NetBSD speedy.whooppee.com 9.99.76 NetBSD 9.99.76 (SPEEDY 2020-12-03 16:50:31 UTC) #0: Thu Dec  3 19:39:16 UTC 2020 paul@speedy.whooppee.com:/build/netbsd-local/obj/amd64/sys/arch/amd64/compile/SPEEDY amd64
Architecture: x86_64
Machine: amd64
>Description:
	This machine contains a nvme(4) device:

	 nvme0 at pci3 dev 0 function 0: Samsung Electronics (3rd vendor ID) product a804 (rev. 0x00)
	 nvme0: NVMe 1.2
	 nvme0: for admin queue interrupting at msix6 vec 0
	 nvme0: Samsung SSD 960 PRO 512GB, firmware 2B6QCXP7, serial S3EWNX0K108171P

	When loading the nvme(4) module using modload(8), I get the
	following crash:

	panic: kernel diagnostic assertion "ns->ident == NULL" failed: file "/build/netbsd-local/src_ro/sys/dev/ic/nvme.c", line 670

	Backtrace shows:

	vpanic() at vpanic+0x156
	__x86_indirect_thunk_rax() at __x86_indirect_thunk_rax
	nvme_ns_identify() at nvme_ns_identify+0x24f
	nvme_rescan() at nvme_rescan+0xc0
	config_cfdata_attach() at config_cfdata_attach+0xc3
	config_init_component() at config_init_component+0x7a
	module_do_load() at module_do_load+0x5c9
	module_load() at module_load+0x85
	handle_modctl_load() at handle_modctl_load+0x157
	sys_modctl() at sys_modctl+0x324
	syscall() at syscall+0x23e
	--- syscall (number 246) ---
	syscall+0x23e:

	gdb shows

	0x1d0f is in nvme_ns_identify (/build/netbsd-local/src_ro/sys/dev/ic/nvme.c:637).
	632             KASSERT(ccb != NULL); /* it's a bug if we don't have spare ccb here */
	633
	634             mem = nvme_dmamem_alloc(sc, sizeof(*identify));
	635             if (mem == NULL) {
	636                     nvme_ccb_put(sc->sc_admin_q, ccb);
	637                     return ENOMEM;
	638             }
	639
	640             memset(&sqe, 0, sizeof(sqe));
	641             sqe.opcode = NVM_ADMIN_IDENTIFY;
	(gdb) 

	gdb seems unable to disassemble things, but objdump shows

	0000000000001ac0 <nvme_ns_identify>:
	    1ac0:       55                      push   %rbp
	    1ac1:       48 89 e5                mov    %rsp,%rbp
	    1ac4:       41 57                   push   %r15
	    1ac6:       41 56                   push   %r14
	...
	    1d08:       31 c0                   xor    %eax,%eax
	    1d0a:       e8 00 00 00 00          callq  1d0f <nvme_ns_identify+0x24f>
	                        1d0b: R_X86_64_PLT32    kern_assert-0x4
	    1d0f:       4c 8b 5d 80             mov    -0x80(%rbp),%r11
	    1d13:       e9 2e ff ff ff          jmpq   1c46 <nvme_ns_identify+0x186>

	0000000000001d18 <nvme_rescan>:

	(A crash-dump file and kernel-with-symbol-table is available for
	further investigation, if needed.)
>How-To-Repeat:
	Boot a 9.99.76 amd64 kernel built _without_ built-in nvme module,
	and then try to load the module.  (It is unknown if the problem
	occurs with built-in nvme module.)
>Fix:
	No fix currently known.

>Release-Note:

>Audit-Trail:

From: Paul Goyette <paul@whooppee.com>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/55839: nvme(4) panic on amd64 9.99.76 when loaded as a
 module
Date: Fri, 4 Dec 2020 07:38:23 -0800 (PST)

 >> How-To-Repeat:
 > 	Boot a 9.99.76 amd64 kernel built _without_ built-in nvme module,
 > 	and then try to load the module.  (It is unknown if the problem
 > 	occurs with built-in nvme module.)

 I have just confirmed that booting a GENERIC kernel (with the nvme
 module "built-in" does _not_ panic.  It boots normally, at least to
 single-user.


 +--------------------+--------------------------+-----------------------+
 | Paul Goyette       | PGP Key fingerprint:     | E-mail addresses:     |
 | (Retired)          | FA29 0E3B 35AF E8AE 6651 | paul@whooppee.com     |
 | Software Developer | 0786 F758 55DE 53BA 7731 | pgoyette@netbsd.org   |
 +--------------------+--------------------------+-----------------------+

From: Paul Goyette <paul@whooppee.com>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/55839: nvme(4) panic on amd64 9.99.76 when loaded as a
 module
Date: Fri, 4 Dec 2020 10:34:53 -0800 (PST)

 Backtrace from gdb (with argument names & values):

          .rodata_addr = 0xffffffff80e77000
 add symbol table from file 
 "/stand/amd64/9.99.76/modules/wapbl/wapbl.kmod" at
          .text_addr = 0xffffffff80ea0000
          .data_addr = 0xffffffff80e98000
          .rodata_addr = 0xffffffff80e99000
 (gdb) bt
 #0  0xffffffff80223945 in cpu_reboot (howto=howto@entry=256,
      bootstr=bootstr@entry=0x0 <nvme_ns_identify.cold>)
      at /build/netbsd-local/src_ro/sys/arch/amd64/amd64/machdep.c:713
 #1  0xffffffff8060e5c5 in kern_reboot (howto=256,
      bootstr=bootstr@entry=0x0 <nvme_ns_identify.cold>)
      at /build/netbsd-local/src_ro/sys/kern/kern_reboot.c:73
 #2  0xffffffff8054de29 in db_reboot_cmd (addr=<optimized out>,
      have_addr=<optimized out>, count=<optimized out>, modif=<optimized out>)
      at /build/netbsd-local/src_ro/sys/ddb/db_command.c:1471
 #3  0xffffffff8054e637 in db_command (
      last_cmdp=last_cmdp@entry=0xffffffff80a2b640 <db_last_command>)
      at /build/netbsd-local/src_ro/sys/ddb/db_command.c:955
 #4  0xffffffff8054e98c in db_command_loop ()
      at /build/netbsd-local/src_ro/sys/ddb/db_command.c:614
 #5  0xffffffff80552503 in db_trap (type=type@entry=1, code=code@entry=0)
      at /build/netbsd-local/src_ro/sys/ddb/db_trap.c:91
 #6  0xffffffff80220b4f in kdb_trap (type=type@entry=1, code=code@entry=0,
      regs=regs@entry=0xffffb70927ed29f0)
      at /build/netbsd-local/src_ro/sys/arch/amd64/amd64/db_interface.c:249
 #7  0xffffffff80225c76 in trap (frame=0xffffb70927ed29f0)
      at /build/netbsd-local/src_ro/sys/arch/amd64/amd64/trap.c:315
 #8  0xffffffff8021ead3 in alltraps ()
 #9  0xffffffff8021f415 in breakpoint ()
 #10 0xffffffff8064e2e3 in vpanic (
      fmt=0xffffffff80faa060 <cfattach_ioconf_nvme_pci> "kernel %sassertion \"%s\" failed: file \"%s\", line %d ", ap=ap@entry=0xffffb70927ed2b28)
      at /build/netbsd-local/src_ro/sys/kern/subr_prf.c:288
 #11 0xffffffff807217b3 in kern_assert (
      fmt=fmt@entry=0x100 <error: Cannot access memory at address 0x100>)
      at /build/netbsd-local/src_ro/sys/lib/libkern/kern_assert.c:51
 #12 0xffffffff80fb2d0f in nvme_ns_identify (sc=0xffffb701045e3000, nsid=1)
      at /build/netbsd-local/src_ro/sys/dev/ic/nvme.c:637
 #13 0xffffffff80fb2dd8 in nvme_rescan (self=0xffffdaaf83c67780,
      attr=<optimized out>, flags=<optimized out>)
      at /build/netbsd-local/src_ro/sys/dev/ic/nvme.c:490
 #14 0xffffffff80635220 in rescan_with_cfdata (cf=0xffffffff80f9b000)
      at /build/netbsd-local/src_ro/sys/kern/subr_autoconf.c:2995
 #15 config_cfdata_attach (cf=0xffffffff80f9b000, scannow=<optimized out>)
      at /build/netbsd-local/src_ro/sys/kern/subr_autoconf.c:928
 #16 0xffffffff806352e6 in config_init_component (cfdriverv=0xffffffff80f9c0c0,
      cfattachv=0xffffffff80f9c0e0, cfdatav=0xffffffff80f9b000)
      at /build/netbsd-local/src_ro/sys/kern/subr_autoconf.c:388
 #17 0xffffffff805ffc79 in module_do_load (
      name=name@entry=0xffffdaaf845b8068 "ld_nvme", isdep=isdep@entry=false,
      flags=flags@entry=0, props=props@entry=0xffffdaaf832133b8,
      modp=modp@entry=0x0 <nvme_ns_identify.cold>,
      modclass=modclass@entry=MODULE_CLASS_ANY, autoload=autoload@entry=false)
      at /build/netbsd-local/src_ro/sys/kern/kern_module.c:1363
 #18 0xffffffff806006c3 in module_load (
      filename=filename@entry=0xffffdaaf845b8068 "ld_nvme", flags=flags@entry=0,
      props=props@entry=0xffffdaaf832133b8,
      modclass=modclass@entry=MODULE_CLASS_ANY)
      at /build/netbsd-local/src_ro/sys/kern/kern_module.c:678
 #19 0xffffffff8065e207 in handle_modctl_load (ml_filename=<optimized out>,
      ml_flags=0, ml_props=<optimized out>, ml_propslen=190)
      at /build/netbsd-local/src_ro/sys/kern/sys_module.c:99
 #20 0xffffffff8065e54a in sys_modctl (l=<optimized out>,
      uap=0xffffb70927ed3000, retval=<optimized out>)
      at /build/netbsd-local/src_ro/sys/kern/sys_module.c:285
 #21 0xffffffff802a9cfe in sy_call (rval=0xffffb70927ed2fb0,
      uap=0xffffb70927ed3000, l=0xffffdaaf83cc52c0,
      sy=0xffffffff80a347d0 <sysent+5904>)
      at /build/netbsd-local/src_ro/sys/sys/syscallvar.h:65
 #22 sy_invoke (code=246, rval=0xffffb70927ed2fb0, uap=0xffffb70927ed3000,
      l=0xffffdaaf83cc52c0, sy=0xffffffff80a347d0 <sysent+5904>)
      at /build/netbsd-local/src_ro/sys/sys/syscallvar.h:94
 #23 syscall (frame=0xffffb70927ed3000)
      at /build/netbsd-local/src_ro/sys/arch/x86/x86/syscall.c:138
 #24 0xffffffff802096ad in handle_syscall ()
 (gdb)



 +--------------------+--------------------------+-----------------------+
 | Paul Goyette       | PGP Key fingerprint:     | E-mail addresses:     |
 | (Retired)          | FA29 0E3B 35AF E8AE 6651 | paul@whooppee.com     |
 | Software Developer | 0786 F758 55DE 53BA 7731 | pgoyette@netbsd.org   |
 +--------------------+--------------------------+-----------------------+

From: Paul Goyette <paul@whooppee.com>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/55839: nvme(4) panic on amd64 9.99.76 when loaded as a
 module
Date: Fri, 4 Dec 2020 12:21:11 -0800 (PST)

 Interestingly, when nvme(4) module is loaded, it seems to go through
 nvme_rescan() twice.  The panic occurs the second time through (and
 that is where I got all the previous stack traces).

 For the _first_ time through, here is the backtrace.  Note that it
 results from a call to the _parent_ device's pci_rescan().

 [    17.422137] cpu1: Begin traceback...
 [    17.422137] vpanic() at netbsd:vpanic+0x156
 [    17.422137] __x86_indirect_thunk_rax() at 
 netbsd:__x86_indirect_thunk_rax
 [    17.422137] nvme_ns_identify() at nvme:nvme_ns_identify+0x1b3
 [    17.422137] nvme_rescan() at nvme:nvme_rescan+0xc0
 [    17.422137] nvme_attach() at nvme:nvme_attach+0xa22
 [    17.422137] nvme_pci_attach() at nvme:nvme_pci_attach+0x383
 [    17.422137] config_attach_loc() at netbsd:config_attach_loc+0x17e
 [    17.422137] pci_probe_device() at netbsd:pci_probe_device+0x585
 [    17.422137] pci_enumerate_bus() at netbsd:pci_enumerate_bus+0x1b5
 [    17.422137] pcirescan() at netbsd:pcirescan+0x4e
 [    17.422137] config_cfdata_attach() at netbsd:config_cfdata_attach+0xc3
 [    17.432137] config_init_component() at netbsd:config_init_component+0x7a
 [    17.432137] nvme_modcmd() at nvme:nvme_modcmd+0x71
 [    17.432137] module_do_load() at netbsd:module_do_load+0x5c9
 [    17.432137] module_do_load() at netbsd:module_do_load+0x48b
 [    17.432137] module_load() at netbsd:module_load+0x85
 [    17.432137] handle_modctl_load() at netbsd:handle_modctl_load+0x157
 [    17.432137] sys_modctl() at netbsd:sys_modctl+0x324
 [    17.432137] syscall() at netbsd:syscall+0x23e
 [    17.432137] --- syscall (number 246) ---
 [    17.432137] netbsd:syscall+0x23e:
 [    17.432137] cpu1: End traceback...



 +--------------------+--------------------------+-----------------------+
 | Paul Goyette       | PGP Key fingerprint:     | E-mail addresses:     |
 | (Retired)          | FA29 0E3B 35AF E8AE 6651 | paul@whooppee.com     |
 | Software Developer | 0786 F758 55DE 53BA 7731 | pgoyette@netbsd.org   |
 +--------------------+--------------------------+-----------------------+

From: Paul Goyette <paul@whooppee.com>
To: gnats-bugs@netbsd.org
Cc: kardell@netbsd.org
Subject: Re: kern/55839: nvme(4) panic on amd64 9.99.76 when loaded as a
 module
Date: Fri, 4 Dec 2020 13:10:54 -0800 (PST)

 FWIW, I suspect that thet fix for kern/55674 is the cause of this
 problem.  Frank, any chance you can come up with something that
 works for loadable-module as well as for built-in?


 +--------------------+--------------------------+-----------------------+
 | Paul Goyette       | PGP Key fingerprint:     | E-mail addresses:     |
 | (Retired)          | FA29 0E3B 35AF E8AE 6651 | paul@whooppee.com     |
 | Software Developer | 0786 F758 55DE 53BA 7731 | pgoyette@netbsd.org   |
 +--------------------+--------------------------+-----------------------+

From: "Frank Kardel" <kardel@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/55839 CVS commit: src/sys/dev/ic
Date: Fri, 4 Dec 2020 23:03:12 +0000

 Module Name:	src
 Committed By:	kardel
 Date:		Fri Dec  4 23:03:11 UTC 2020

 Modified Files:
 	src/sys/dev/ic: nvme.c

 Log Message:
 PR kern/55839:
 handle multiple nvme_rescan()s correctly by doing the
 name-space identify only once per nsid.

 fixes issue where modloading triggers multiple
 rescans.


 To generate a diff of this commit:
 cvs rdiff -u -r1.52 -r1.53 src/sys/dev/ic/nvme.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: open->closed
State-Changed-By: pgoyette@NetBSD.org
State-Changed-When: Sun, 06 Dec 2020 03:41:16 +0000
State-Changed-Why:
Fix confirmed and committed - thanks, Frank!


State-Changed-From-To: closed->pending-pullups
State-Changed-By: kardel@NetBSD.org
State-Changed-When: Mon, 07 Dec 2020 13:01:11 +0000
State-Changed-Why:
[pullup-8 #1633] pullup fix for PR kern/55839 (nvme(4) panic when loaded as module)
[pullup-9 #1144] pullup fix for PR kern/55839 (nvme(4) panic when loaded as module)


From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/55839 CVS commit: [netbsd-9] src/sys/dev/ic
Date: Mon, 7 Dec 2020 20:04:08 +0000

 Module Name:	src
 Committed By:	martin
 Date:		Mon Dec  7 20:04:07 UTC 2020

 Modified Files:
 	src/sys/dev/ic [netbsd-9]: nvme.c

 Log Message:
 Pull up following revision(s) (requested by kardel in ticket #1144):

 	sys/dev/ic/nvme.c: revision 1.53

 PR kern/55839:

 handle multiple nvme_rescan()s correctly by doing the
 name-space identify only once per nsid.
 fixes issue where modloading triggers multiple
 rescans.


 To generate a diff of this commit:
 cvs rdiff -u -r1.44.2.4 -r1.44.2.5 src/sys/dev/ic/nvme.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/55839 CVS commit: [netbsd-8] src/sys/dev/ic
Date: Mon, 7 Dec 2020 20:05:36 +0000

 Module Name:	src
 Committed By:	martin
 Date:		Mon Dec  7 20:05:36 UTC 2020

 Modified Files:
 	src/sys/dev/ic [netbsd-8]: nvme.c

 Log Message:
 Pull up following revision(s) (requested by kardel in ticket #1633):

 	sys/dev/ic/nvme.c: revision 1.53

 PR kern/55839:

 handle multiple nvme_rescan()s correctly by doing the
 name-space identify only once per nsid.
 fixes issue where modloading triggers multiple
 rescans.


 To generate a diff of this commit:
 cvs rdiff -u -r1.30.2.8 -r1.30.2.9 src/sys/dev/ic/nvme.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: pending-pullups->closed
State-Changed-By: kardel@NetBSD.org
State-Changed-When: Tue, 08 Dec 2020 07:35:44 +0000
State-Changed-Why:
pullups done.


>Unformatted:

NetBSD Home
NetBSD PR Database Search

(Contact us) $NetBSD: query-full-pr,v 1.46 2020/01/03 16:35:01 leot Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2020 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.