NetBSD Problem Report #55185

From martin@aprisoft.de  Sat Apr 18 12:46:13 2020
Return-Path: <martin@aprisoft.de>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 1D3951A9213
	for <gnats-bugs@gnats.NetBSD.org>; Sat, 18 Apr 2020 12:46:13 +0000 (UTC)
Message-Id: <20200418124603.3AA9B5CC803@emmas.aprisoft.de>
Date: Sat, 18 Apr 2020 14:46:03 +0200 (CEST)
From: martin@NetBSD.org
Reply-To: martin@NetBSD.org
To: gnats-bugs@NetBSD.org
Subject: Can not boot a LOCKDEBUG kernel
X-Send-Pr-Version: 3.95

>Number:         55185
>Category:       kern
>Synopsis:       Can not boot a LOCKDEBUG kernel
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Apr 18 12:50:00 +0000 2020
>Last-Modified:  Sat May 02 09:20:01 +0000 2020
>Originator:     Martin Husemann
>Release:        NetBSD 9.99.56
>Organization:
The NetBSD Foundation, Inc.
>Environment:
System: NetBSD seven-days-to-the-wolves.aprisoft.de 9.99.56 NetBSD 9.99.56 (GENERIC) #382: Mon Apr 13 13:20:58 CEST 2020 martin@seven-days-to-the-wolves.aprisoft.de:/work/src/sys/arch/amd64/compile/GENERIC amd64
Architecture: x86_64
Machine: amd64
>Description:

I am trying to boot a DIAGNOSTIC/DEBUG/LOCKDEBUG kernel, but it fails:

[  20.3553975] nouveau0: info: NVIDIA GF104 (0c4100a1)
[  20.5502778] nouveau0: info: bios: version 70.04.2e.00.04
[  20.6302982] nouveau0: interrupting at ioapic0 pin 18 (nouveau0)
[  20.7049504] nouveau0: info: fb: 1024 MiB GDDR5
[  20.7803395] Zone  kernel: Available graphics memory: 5351108 kiB
[  20.8524188] Zone   dma32: Available graphics memory: 2097152 kiB
[  20.9241763] nouveau0: info: DRM: VRAM: 1024 MiB
[  20.9782555] nouveau0: info: DRM: GART: 1048576 MiB
[  21.0354542] nouveau0: info: DRM: TMDS table version 2.0
[  21.0978532] nouveau0: info: DRM: DCB version 4.0
[  21.1529706] nouveau0: info: DRM: DCB outp 00: 02000300 00000000
[  21.2236891] nouveau0: info: DRM: DCB outp 01: 01000302 00020030
[  21.2944070] nouveau0: info: DRM: DCB outp 02: 04011380 00000000
[  21.3651260] nouveau0: info: DRM: DCB outp 03: 08011382 00020030
[  21.4358446] nouveau0: info: DRM: DCB outp 04: 02022362 00020010
[  21.5065625] nouveau0: info: DRM: DCB conn 00: 00001030
[  21.5679195] Kernel lock error: _kernel_lock,244: spinout

[  21.5679195] lock address : 0xffffffff814772c0 type     :               spin
[  21.5679195] initialized  : 0xffffffff80dab089
[  21.5679195] shared holds :                  0 exclusive:                  1
[  21.5679195] shares wanted:                  0 exclusive:                  2
[  21.5679195] relevant cpu :                  1 last held:                  0
[  21.5679195] relevant lwp : 0xffffe7f8670ca200 last held: 0xffffe7f867935280
[  21.5679195] last locked* : 0xffffffff8094061e unlocked : 0xffffffff806b6cc9
[  21.5679195] curcpu holds :                  0 wanted by: 0xffffe7f8670ca200
[  21.5679195] nouveau0: info: DRM: DCB conn 01: 00010130

[  22.3291805] nouveau0: info: DRM: DCB conn 02: 00002261
[  22.3291805] panic: kern info: [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[  22.3926188] LOCKDEBUG: Kernel lock error: _kernel_lock,244: spinoutkern info: [drm] Driver supports precise vblank timestamp query.

[  22.6318130] cpu1: Begin traceback...
[  22.6708693] vpanic() at netbsd:vpanic+0x178
[  22.7208839] snprintf() at netbsd:snprintfnouveau0: info: DRM: MM: using COPY1 for buffer copies

[  22.8309142] lockdebug_more() at netbsd:lockdebug_morenouveau0: info: No connectors reported connected with modes

[  22.9542059] kern info: [drm] Cannot find any crtc or sizes - going 1024x768
[  22.9542059] _kernel_lock(nouveaufb0 at nouveau0) at 
[  23.0914823] netbsd:_kernel_lock[  23.0914823] +0x244
[  23.1510024] frag6_fasttimo(wsdisplay0 at nouveaufb0) at  kbdmux 1netbsd:frag6_fasttimo
[  23.2287592] +0x1a
[  23.2734816] pffasttimo() at netbsd:pffasttimo+0x47
[  23.3310639] callout_softclock() at netbsd:callout_softclock+0x124
[  23.4010725] softint_dispatch() at netbsd:softint_dispatch+0x12c
[  23.4710921] DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xffff87013f1590f0
[  23.5611172] Xsoftintr() at netbsd:Xsoftintr+0x4f
[  23.6111312] --- interrupt ---
[  23.6511425] 0:
[  23.6711481] cpu1: End traceback...
[  23.7111592] fatal breakpoint trap in supervisor mode
[  23.7711760] trap type 1 code 0 rip 0xffffffff8021f55d cs 0x8 rflags 0x202 cr2 0 ilevel 0x2 rsp 0xffff87013f158e70
[  23.8912096] curlwp 0xffffe7f8670ca200 pid 0.23 lowest kstack 0xffff87013f1552c0
Stopped in pid 0.23 (system) at netbsd:breakpoint+0x5:  leave
db{1}> bt
breakpoint() at netbsd:breakpoint+0x5
vpanic() at netbsd:vpanic+0x178
snprintf() at netbsd:snprintf
lockdebug_more() at netbsd:lockdebug_more
_kernel_lock() at netbsd:_kernel_lock+0x244
frag6_fasttimo() at netbsd:frag6_fasttimo+0x1a
pffasttimo() at netbsd:pffasttimo+0x47
callout_softclock() at netbsd:callout_softclock+0x124
softint_dispatch() at netbsd:softint_dispatch+0x12c
DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xffff87013f1590f0
Xsoftintr() at netbsd:Xsoftintr+0x4f
--- interrupt ---
0:
db{1}> mach cpu 0
using CPU 0
db{1}> bt
spllower() at netbsd:spllower+0x8
tsleep() at netbsd:tsleep+0x53
iic_smbus_intr_thread() at netbsd:iic_smbus_intr_thread+0x52
db{1}> mach cpu 2
using CPU 2
db{1}> bt
x86_stihlt() at netbsd:x86_stihlt+0x6
acpicpu_cstate_idle_enter() at netbsd:acpicpu_cstate_idle_enter+0xd1
acpicpu_cstate_idle() at netbsd:acpicpu_cstate_idle+0xba
idle_loop() at netbsd:idle_loop+0x14d
cpu_hatch() at netbsd:cpu_hatch+0x187
md_root_setconf() at netbsd:md_root_setconf
db{1}> mach cpu 3
using CPU 3
db{1}> bt
x86_stihlt() at netbsd:x86_stihlt+0x6
acpicpu_cstate_idle_enter() at netbsd:acpicpu_cstate_idle_enter+0xd1
acpicpu_cstate_idle() at netbsd:acpicpu_cstate_idle+0xba
idle_loop() at netbsd:idle_loop+0x14d
cpu_hatch() at netbsd:cpu_hatch+0x187
md_root_setconf() at netbsd:md_root_setconf
db{1}> mach cpu 4
using CPU 4
db{1}> bt
x86_stihlt() at netbsd:x86_stihlt+0x6
acpicpu_cstate_idle_enter() at netbsd:acpicpu_cstate_idle_enter+0xd1
acpicpu_cstate_idle() at netbsd:acpicpu_cstate_idle+0xba
idle_loop() at netbsd:idle_loop+0x14d
cpu_hatch() at netbsd:cpu_hatch+0x187
md_root_setconf() at netbsd:md_root_setconf
db{1}> mach cpu 5
using CPU 5
db{1}> bt
x86_stihlt() at netbsd:x86_stihlt+0x6
acpicpu_cstate_idle_enter() at netbsd:acpicpu_cstate_idle_enter+0xd1
acpicpu_cstate_idle() at netbsd:acpicpu_cstate_idle+0xba
idle_loop() at netbsd:idle_loop+0x14d
cpu_hatch() at netbsd:cpu_hatch+0x187
md_root_setconf() at netbsd:md_root_setconf



>How-To-Repeat:
s/a

>Fix:
n/a

>Audit-Trail:
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/55185: Can not boot a LOCKDEBUG kernel
Date: Sat, 18 Apr 2020 14:56:00 +0200

 On Sat, Apr 18, 2020 at 12:50:01PM +0000, martin@NetBSD.org wrote:
 > [  21.5679195] lock address : 0xffffffff814772c0 type     :               spin
 > [  21.5679195] initialized  : 0xffffffff80dab089
 > [  21.5679195] shared holds :                  0 exclusive:                  1
 > [  21.5679195] shares wanted:                  0 exclusive:                  2
 > [  21.5679195] relevant cpu :                  1 last held:                  0
 > [  21.5679195] relevant lwp : 0xffffe7f8670ca200 last held: 0xffffe7f867935280
 > [  21.5679195] last locked* : 0xffffffff8094061e unlocked : 0xffffffff806b6cc9
 > [  21.5679195] curcpu holds :                  0 wanted by: 0xffffe7f8670ca200

 0xffffffff8094061e is the KERNEL_LOCK in kern_sleepq.c:321 (sleepq_block).

 Martin

From: Jaromír Doleček <jaromir.dolecek@gmail.com>
To: "gnats-bugs@NetBSD.org" <gnats-bugs@netbsd.org>
Cc: 
Subject: Re: kern/55185: Can not boot a LOCKDEBUG kernel
Date: Sat, 18 Apr 2020 15:09:27 +0200

 This works for me as a workaround: it runs the mountroot thread
 without taking KERNEL_LOCK(), which avoids the spinout.

 http://www.netbsd.org/~jdolecek/autoconf_mountroot_thread.diff

 Jaromir

From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/55185: Can not boot a LOCKDEBUG kernel
Date: Sat, 18 Apr 2020 17:27:27 +0200

 Duh, should have remembered that discussion.

 In my case "no nouveau*" in the config did the trick as well.

 Martin

From: Patrick Welche <prlw1@cam.ac.uk>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/55185: Can not boot a LOCKDEBUG kernel
Date: Sat, 18 Apr 2020 18:29:55 +0100

 For nearly the same model:

 nouveau0 at pci9 dev 0 function 0: NVIDIA GeForce GTX 680 (rev. 0xa1)
 nouveau0: info: NVIDIA GK104 (0e4000a2)

 I also see the spinout, but everything seems fine without LOCKDEBUG.
 Presumably I should apply jdolecek's patch...

From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/55185: Can not boot a LOCKDEBUG kernel
Date: Sat, 2 May 2020 11:18:26 +0200

 There was an inverted condition in LOCKDEBUG; r1.171 of sys/kern/kern_lock.c
 should fix this for real.
