NetBSD Problem Report #45353

From www@NetBSD.org  Sat Sep 10 00:28:35 2011
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
	by www.NetBSD.org (Postfix) with ESMTP id B269D63BBFF
	for <gnats-bugs@gnats.NetBSD.org>; Sat, 10 Sep 2011 00:28:35 +0000 (UTC)
Message-Id: <20110910002834.5479163BB48@www.NetBSD.org>
Date: Sat, 10 Sep 2011 00:28:34 +0000 (UTC)
From: jasper@pointless.net
Reply-To: jasper@pointless.net
To: gnats-bugs@NetBSD.org
Subject: booting with -x on netbsd-current amd64 panics.
X-Send-Pr-Version: www-1.0

>Number:         45353
>Category:       kern
>Synopsis:       booting with -x on netbsd-current amd64 panics.
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Sep 10 00:30:00 +0000 2011
>Originator:     Jasper Wallace
>Release:        5.99.55
>Organization:
Pointless.net
>Environment:
NetBSD monstrosity 5.99.55 NetBSD 5.99.55 (MONSTROSITY) #0: Fri Apr 22 10:52:22 BST 2011  jasper@monstrosity:/usr/build/obj/sys/arch/amd64/compile/MONSTROSITY amd64

>Description:
I upgraded a machine running -current from 5.99.49 to 5.99.55 to pick up the select race fix, but the new kernel panics when booted with -x.

A GENERIC kernel built from the same tree doesn't panic on a uniprocessor i386 machine, nor on a multiprocessor amd64 machine. However, the machine that does panic has two RAIDframe arrays, and the others don't.

Unfortunately, the machine with the problem is in use, so I can't take it offline much. However, if someone can provide instructions for getting more information, I can follow them when I get a chance.

dmesg and DDB output:

crypto: driver 0 registers alg 22 flags 0 maxoplen 0
raidattach: Asked for 8 units
Kernelized RAIDframe activated
Kernel lock error: _kernel_lock: spinout

lock address : 0xffffffff807d7740 type     :               spin
initialized  : 0xffffffff802519fc
shared holds :                  0 exclusive:                  1
shares wanted:                  0 exclusive:                  3
current cpu  :                  1 last held:                  0
current lwp  : 0xffff800053fe10a0 last held: 0xffffffff8079e5c0
last locked* : 0xffffffff803fbcd1 unlocked : 0xffffffff80289a3e
curcpu holds :                  0 wanted by: 0xffff800053fe10a0

panic: LOCKDEBUG
fatal breakpoint trap in supervisor mode
trap type 1 code 0 rip ffffffff80172e2d cs 8 rflags 246 cr2  0 cpl 2 rsp ffff800054016af0
Stopped in pid 0.19 (system) at netbsd:breakpoint+0x5:  leave
db{1}> db{1}> machine cpu
addr                dev   id  flags  ipis  curlwp              fpcurlwp
0xffffffff80772c80  cpu0  0   7009   0     0xffff800051fb2840  0x0
0xffff800053abc1c0  cpu1  1   b002   0     0xffff800053fe10a0  0x0
0xffff8000529b3040  cpu2  2   f002   0     0xffff800053fe20c0  0x0
0xffff800054071040  cpu3  3   f002   0     0xffff800053fe6100  0x0

fatal breakpoint trap in supervisor mode
trap type 1 code 0 rip ffffffff80172e2d cs 8 rflags 246 cr2  0 cpl 2 rsp ffff800054016af0

----------------------------

Kernel lock error: _kernel_lock: spinout

lock address : 0xffffffff807d7740 type     :               spin
initialized  : 0xffffffff802519fc
shared holds :                  0 exclusive:                  1
shares wanted:                  0 exclusive:                  3
current cpu  :                  0 last held:                  2
current lwp  : 0xffff800051fb4860 last held: 0xffffffff8079e5c0
last locked* : 0xffffffff803fbcd1 unlocked : 0xffffffff80289a3e
curcpu holds :                  0 wanted by: 0xffff800051fb4860

panic: LOCKDEBUG
fatal breakpoint trap in supervisor mode

db{0}> machine cpu 0
using CPU 0
db{0}> bt
breakpoint() at netbsd:breakpoint+0x5
comintr() at netbsd:comintr+0x527
Xintr_ioapic_edge1() at netbsd:Xintr_ioapic_edge1+0xee
--- interrupt ---
Xspllower() at netbsd:Xspllower+0xe
intr_biglock_wrapper() at netbsd:intr_biglock_wrapper+0x16
Xintr_ioapic_level3() at netbsd:Xintr_ioapic_level3+0xf6
--- interrupt ---
Xspllower() at netbsd:Xspllower+0xe
trap() at netbsd:trap+0x57d
fatal page fault in supervisor mode
trap type 6 code 0 rip ffffffff8017f49e cs 8 rflags 10246 cr2  ba cpl 8 rsp ffff8000525a1910
kernel: page fault trap, code=0
Faulted in DDB; continuing...
db{0}> machine cpu 1
using CPU 1
db{0}> bt
__cpu_simple_lock() at netbsd:__cpu_simple_lock+0x9
_kernel_lock() at netbsd:_kernel_lock+0x16f
sleepq_block() at netbsd:sleepq_block+0x1b3
iic_smbus_intr_thread() at netbsd:iic_smbus_intr_thread+0x86
db{0}> machine cpu 2
using CPU 2
db{0}> bt
spllower() at netbsd:spllower
sleepq_abort() at netbsd:sleepq_abort+0x31
ltsleep() at netbsd:ltsleep+0x6c
wdc_exec_command() at netbsd:wdc_exec_command+0x16c
wd_flushcache() at netbsd:wd_flushcache+0xaa
wdlastclose() at netbsd:wdlastclose+0x1e
wdclose() at netbsd:wdclose+0x80
bdev_close() at netbsd:bdev_close+0x49
spec_close() at netbsd:spec_close+0x22f
VOP_CLOSE() at netbsd:VOP_CLOSE+0x62
rf_find_raid_components() at netbsd:rf_find_raid_components+0x398
rf_autoconfig() at netbsd:rf_autoconfig+0x2f
config_finalize() at netbsd:config_finalize+0x90
main() at netbsd:main+0x391
db{0}> machine cpu 3
using CPU 3
db{0}> bt
_atomic_inc_32_nv() at netbsd:_atomic_inc_32_nv+0x9
_kernel_lock() at netbsd:_kernel_lock+0x16f
sleepq_block() at netbsd:sleepq_block+0x1b3
iic_smbus_intr_thread() at netbsd:iic_smbus_intr_thread+0x86
db{0}> 

-------------------------------------
Kernel lock error: _kernel_lock: spinout

lock address : 0xffffffff807da740 type     :               spin
initialized  : 0xffffffff802519fc
shared holds :                  0 exclusive:                  1
shares wanted:                  0 exclusive:                  4
current cpu  :                  1 last held:                  2
current lwp  : 0xffff800053fe10a0 last held: 0xffffffff807a05c0
last locked* : 0xffffffff803fbcb1 unlocked : 0xffffffff80289a3e
curcpu holds :                  0 wanted by: 0xffff800053fe10a0

panic: LOCKDEBUG
fatal breakpoint trap in supervisor mode
trap type 1 code 0 rip ffffffff80172e2d cs 8 rflags 246 cr2  0 cpl 2 rsp ffff800054016af0
Stopped in pid 0.19 (system) at netbsd:breakpoint+0x5:  leave
db{1}> 

ffff800053fe10a0 = softclk/1

db{1}> bt/a 0xffff800053fe10a0
trace: pid 0 lid 19 at 0xffff800054016af0
breakpoint() at netbsd:breakpoint+0x5
panic() at netbsd:panic+0x27f
lockdebug_locked() at netbsd:lockdebug_locked
_kernel_lock() at netbsd:_kernel_lock+0x16f
nd6_timer() at netbsd:nd6_timer+0x3f
callout_softclock() at netbsd:callout_softclock+0x1ff
softint_dispatch() at netbsd:softint_dispatch+0xd0
DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xffff800054016d70
Xsoftintr() at netbsd:Xsoftintr+0x4f

--------------------------------------

Kernel lock error: _kernel_lock: spinout

lock address : 0xffffffff807da740 type     :               spin
initialized  : 0xffffffff802519fc
shared holds :                  0 exclusive:                  1
shares wanted:                  0 exclusive:                  2
current cpu  :                  0 last held:                  2
current lwp  : 0xffff800051fb4860 last held: 0xffffffff807a05c0
last locked* : 0xffffffff803fbcb1 unlocked : 0xffffffff80289a3e
curcpu holds :                  0 wanted by: 0xffff800051fb4860
crypto: driver 0 registers alg 31 flags 0 maxoplen 0

crypto:LOCKDErUG
modetot dp vype0  codete ria ff11fffag0102e2xops 8 0
crs 24: dr2  r 0 r 2 rterffalg000525ags 0
Stopped in pid 0.5 (system) at  netbsd:breakpoint+0x5:  leave
db{0}> ps/l says:

0    >   5 7   0       200   ffff800051fb4860          softclk/0

db{0}> bt/a ffff800051fb4860
trace: pid 0 lid 5 at 0xffff8000525a2b10
breakpoint() at netbsd:breakpoint+0x5
panic() at netbsd:panic+0x27f
lockdebug_locked() at netbsd:lockdebug_locked
_kernel_lock() at netbsd:_kernel_lock+0x16f
callout_softclock() at netbsd:callout_softclock+0x394
softint_dispatch() at netbsd:softint_dispatch+0xd0
DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xffff8000525a2d70
Xsoftintr() at netbsd:Xsoftintr+0x4f
--- interrupt ---
0:

db{0}> bt/a ffffffff807a05c0
trace: pid 0 lid 1 at 0xffffffff80952e60
crypto_register() at netbsd:crypto_register+0xe7
swcryptoattach() at netbsd:swcryptoattach+0x52e
config_finalize() at netbsd:config_finalize+0x6a
main() at netbsd:main+0x391

db{0}> bt/a ffff800051fb2840
trace: pid 0 lid 2 at 0xffff800052594bf0
pmap_pageidlezero() at netbsd:pmap_pageidlezero+0x60
uvm_pageidlezero() at netbsd:uvm_pageidlezero+0x1d3
idle_loop() at netbsd:idle_loop+0x183

db{0}> bt/a ffff800051fbc080
trace: pid 0 lid 16 at 0xffff800054008be8
acpicpu_cstate_idle_enter() at netbsd:acpicpu_cstate_idle_enter+0x4b
acpicpu_cstate_idle() at netbsd:acpicpu_cstate_idle+0xc4
idle_loop() at netbsd:idle_loop+0x196
Bad frame pointer: 0xffff800051fbc080

---------------------

There's a more detailed log here (it includes the full dmesg of a successful boot):

http://pointless.net/~jasper/Xterm.log.limpit.2011.09.01.22.54.09.2015

It's an xterm log full of ^M and ^H characters, so I didn't paste it here.

>How-To-Repeat:

Compile an amd64 kernel from a tree updated with cvs update -Pd -D "2011-08-06 06:40", then boot it with -x on a machine with RAIDframe arrays.

(The compile date in the Environment section is wrong; I'm running an older kernel at the moment.)

>Fix:
No idea.
