NetBSD Problem Report #55185
From martin@aprisoft.de Sat Apr 18 12:46:13 2020
Return-Path: <martin@aprisoft.de>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id 1D3951A9213
for <gnats-bugs@gnats.NetBSD.org>; Sat, 18 Apr 2020 12:46:13 +0000 (UTC)
Message-Id: <20200418124603.3AA9B5CC803@emmas.aprisoft.de>
Date: Sat, 18 Apr 2020 14:46:03 +0200 (CEST)
From: martin@NetBSD.org
Reply-To: martin@NetBSD.org
To: gnats-bugs@NetBSD.org
Subject: Can not boot a LOCKDEBUG kernel
X-Send-Pr-Version: 3.95
>Number: 55185
>Category: kern
>Synopsis: Can not boot a LOCKDEBUG kernel
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sat Apr 18 12:50:00 +0000 2020
>Last-Modified: Sat May 02 09:20:01 +0000 2020
>Originator: Martin Husemann
>Release: NetBSD 9.99.56
>Organization:
The NetBSD Foundation, Inc.
>Environment:
System: NetBSD seven-days-to-the-wolves.aprisoft.de 9.99.56 NetBSD 9.99.56 (GENERIC) #382: Mon Apr 13 13:20:58 CEST 2020 martin@seven-days-to-the-wolves.aprisoft.de:/work/src/sys/arch/amd64/compile/GENERIC amd64
Architecture: x86_64
Machine: amd64
>Description:
I am trying to boot a DIAGNOSTIC/DEBUG/LOCKDEBUG kernel, but it panics with a LOCKDEBUG "spinout" on _kernel_lock while nouveau0 attaches:
[ 20.3553975] nouveau0: info: NVIDIA GF104 (0c4100a1)
[ 20.5502778] nouveau0: info: bios: version 70.04.2e.00.04
[ 20.6302982] nouveau0: interrupting at ioapic0 pin 18 (nouveau0)
[ 20.7049504] nouveau0: info: fb: 1024 MiB GDDR5
[ 20.7803395] Zone kernel: Available graphics memory: 5351108 kiB
[ 20.8524188] Zone dma32: Available graphics memory: 2097152 kiB
[ 20.9241763] nouveau0: info: DRM: VRAM: 1024 MiB
[ 20.9782555] nouveau0: info: DRM: GART: 1048576 MiB
[ 21.0354542] nouveau0: info: DRM: TMDS table version 2.0
[ 21.0978532] nouveau0: info: DRM: DCB version 4.0
[ 21.1529706] nouveau0: info: DRM: DCB outp 00: 02000300 00000000
[ 21.2236891] nouveau0: info: DRM: DCB outp 01: 01000302 00020030
[ 21.2944070] nouveau0: info: DRM: DCB outp 02: 04011380 00000000
[ 21.3651260] nouveau0: info: DRM: DCB outp 03: 08011382 00020030
[ 21.4358446] nouveau0: info: DRM: DCB outp 04: 02022362 00020010
[ 21.5065625] nouveau0: info: DRM: DCB conn 00: 00001030
[ 21.5679195] Kernel lock error: _kernel_lock,244: spinout
[ 21.5679195] lock address : 0xffffffff814772c0 type : spin
[ 21.5679195] initialized : 0xffffffff80dab089
[ 21.5679195] shared holds : 0 exclusive: 1
[ 21.5679195] shares wanted: 0 exclusive: 2
[ 21.5679195] relevant cpu : 1 last held: 0
[ 21.5679195] relevant lwp : 0xffffe7f8670ca200 last held: 0xffffe7f867935280
[ 21.5679195] last locked* : 0xffffffff8094061e unlocked : 0xffffffff806b6cc9
[ 21.5679195] curcpu holds : 0 wanted by: 0xffffe7f8670ca200
[ 21.5679195] nouveau0: info: DRM: DCB conn 01: 00010130
[ 22.3291805] nouveau0: info: DRM: DCB conn 02: 00002261
[ 22.3291805] panic: kern info: [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[ 22.3926188] LOCKDEBUG: Kernel lock error: _kernel_lock,244: spinoutkern info: [drm] Driver supports precise vblank timestamp query.
[ 22.6318130] cpu1: Begin traceback...
[ 22.6708693] vpanic() at netbsd:vpanic+0x178
[ 22.7208839] snprintf() at netbsd:snprintfnouveau0: info: DRM: MM: using COPY1 for buffer copies
[ 22.8309142] lockdebug_more() at netbsd:lockdebug_morenouveau0: info: No connectors reported connected with modes
[ 22.9542059] kern info: [drm] Cannot find any crtc or sizes - going 1024x768
[ 22.9542059] _kernel_lock(nouveaufb0 at nouveau0) at
[ 23.0914823] netbsd:_kernel_lock[ 23.0914823] +0x244
[ 23.1510024] frag6_fasttimo(wsdisplay0 at nouveaufb0) at kbdmux 1netbsd:frag6_fasttimo
[ 23.2287592] +0x1a
[ 23.2734816] pffasttimo() at netbsd:pffasttimo+0x47
[ 23.3310639] callout_softclock() at netbsd:callout_softclock+0x124
[ 23.4010725] softint_dispatch() at netbsd:softint_dispatch+0x12c
[ 23.4710921] DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xffff87013f1590f0
[ 23.5611172] Xsoftintr() at netbsd:Xsoftintr+0x4f
[ 23.6111312] --- interrupt ---
[ 23.6511425] 0:
[ 23.6711481] cpu1: End traceback...
[ 23.7111592] fatal breakpoint trap in supervisor mode
[ 23.7711760] trap type 1 code 0 rip 0xffffffff8021f55d cs 0x8 rflags 0x202 cr2 0 ilevel 0x2 rsp 0xffff87013f158e70
[ 23.8912096] curlwp 0xffffe7f8670ca200 pid 0.23 lowest kstack 0xffff87013f1552c0
Stopped in pid 0.23 (system) at netbsd:breakpoint+0x5: leave
db{1}> bt
breakpoint() at netbsd:breakpoint+0x5
vpanic() at netbsd:vpanic+0x178
snprintf() at netbsd:snprintf
lockdebug_more() at netbsd:lockdebug_more
_kernel_lock() at netbsd:_kernel_lock+0x244
frag6_fasttimo() at netbsd:frag6_fasttimo+0x1a
pffasttimo() at netbsd:pffasttimo+0x47
callout_softclock() at netbsd:callout_softclock+0x124
softint_dispatch() at netbsd:softint_dispatch+0x12c
DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xffff87013f1590f0
Xsoftintr() at netbsd:Xsoftintr+0x4f
--- interrupt ---
0:
db{1}> mach cpu 0
using CPU 0
db{1}> bt
spllower() at netbsd:spllower+0x8
tsleep() at netbsd:tsleep+0x53
iic_smbus_intr_thread() at netbsd:iic_smbus_intr_thread+0x52
db{1}> mach cpu 2
using CPU 2
db{1}> bt
x86_stihlt() at netbsd:x86_stihlt+0x6
acpicpu_cstate_idle_enter() at netbsd:acpicpu_cstate_idle_enter+0xd1
acpicpu_cstate_idle() at netbsd:acpicpu_cstate_idle+0xba
idle_loop() at netbsd:idle_loop+0x14d
cpu_hatch() at netbsd:cpu_hatch+0x187
md_root_setconf() at netbsd:md_root_setconf
db{1}> mach cpu 3
using CPU 3
db{1}> bt
x86_stihlt() at netbsd:x86_stihlt+0x6
acpicpu_cstate_idle_enter() at netbsd:acpicpu_cstate_idle_enter+0xd1
acpicpu_cstate_idle() at netbsd:acpicpu_cstate_idle+0xba
idle_loop() at netbsd:idle_loop+0x14d
cpu_hatch() at netbsd:cpu_hatch+0x187
md_root_setconf() at netbsd:md_root_setconf
db{1}> mach cpu 4
using CPU 4
db{1}> bt
x86_stihlt() at netbsd:x86_stihlt+0x6
acpicpu_cstate_idle_enter() at netbsd:acpicpu_cstate_idle_enter+0xd1
acpicpu_cstate_idle() at netbsd:acpicpu_cstate_idle+0xba
idle_loop() at netbsd:idle_loop+0x14d
cpu_hatch() at netbsd:cpu_hatch+0x187
md_root_setconf() at netbsd:md_root_setconf
db{1}> mach cpu 5
using CPU 5
db{1}> bt
x86_stihlt() at netbsd:x86_stihlt+0x6
acpicpu_cstate_idle_enter() at netbsd:acpicpu_cstate_idle_enter+0xd1
acpicpu_cstate_idle() at netbsd:acpicpu_cstate_idle+0xba
idle_loop() at netbsd:idle_loop+0x14d
cpu_hatch() at netbsd:cpu_hatch+0x187
md_root_setconf() at netbsd:md_root_setconf
>How-To-Repeat:
See above.
>Fix:
n/a
>Audit-Trail:
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/55185: Can not boot a LOCKDEBUG kernel
Date: Sat, 18 Apr 2020 14:56:00 +0200
On Sat, Apr 18, 2020 at 12:50:01PM +0000, martin@NetBSD.org wrote:
> [ 21.5679195] lock address : 0xffffffff814772c0 type : spin
> [ 21.5679195] initialized : 0xffffffff80dab089
> [ 21.5679195] shared holds : 0 exclusive: 1
> [ 21.5679195] shares wanted: 0 exclusive: 2
> [ 21.5679195] relevant cpu : 1 last held: 0
> [ 21.5679195] relevant lwp : 0xffffe7f8670ca200 last held: 0xffffe7f867935280
> [ 21.5679195] last locked* : 0xffffffff8094061e unlocked : 0xffffffff806b6cc9
> [ 21.5679195] curcpu holds : 0 wanted by: 0xffffe7f8670ca200
0xffffffff8094061e is the KERNEL_LOCK in kern_sleepq.c:321 (sleepq_block).
Martin
From: =?UTF-8?B?SmFyb23DrXIgRG9sZcSNZWs=?= <jaromir.dolecek@gmail.com>
To: "gnats-bugs@NetBSD.org" <gnats-bugs@netbsd.org>
Cc:
Subject: Re: kern/55185: Can not boot a LOCKDEBUG kernel
Date: Sat, 18 Apr 2020 15:09:27 +0200
This works for me as a workaround: it runs the mountroot thread
without taking KERNEL_LOCK(), which avoids the spinout.
http://www.netbsd.org/~jdolecek/autoconf_mountroot_thread.diff
Jaromir
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/55185: Can not boot a LOCKDEBUG kernel
Date: Sat, 18 Apr 2020 17:27:27 +0200
Duh, should have remembered that discussion.
In my case "no nouveau*" in the config did the trick as well.
Martin
From: Patrick Welche <prlw1@cam.ac.uk>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/55185: Can not boot a LOCKDEBUG kernel
Date: Sat, 18 Apr 2020 18:29:55 +0100
For nearly the same model:
nouveau0 at pci9 dev 0 function 0: NVIDIA GeForce GTX 680 (rev. 0xa1)
nouveau0: info: NVIDIA GK104 (0e4000a2)
I also see the spinout, but everything seems fine without LOCKDEBUG.
Presumably I should apply jdolecek's patch...
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/55185: Can not boot a LOCKDEBUG kernel
Date: Sat, 2 May 2020 11:18:26 +0200
There was an inverted condition in LOCKDEBUG; r1.171 of sys/kern/kern_lock.c
should fix this for real.
Copyright © 1994-2020
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.