NetBSD Problem Report #57926
From www@netbsd.org Sun Feb 11 16:42:53 2024
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id 5DDE01A9238
for <gnats-bugs@gnats.NetBSD.org>; Sun, 11 Feb 2024 16:42:53 +0000 (UTC)
Message-Id: <20240211164251.9FD9D1A9239@mollari.NetBSD.org>
Date: Sun, 11 Feb 2024 16:42:51 +0000 (UTC)
From: hashikaw@mail.ru
Reply-To: hashikaw@mail.ru
To: gnats-bugs@NetBSD.org
Subject: ppc405: lockdebug crash because cpu_index(curcpu()) is invalid (not 0) value.
X-Send-Pr-Version: www-1.0
>Number: 57926
>Category: port-evbppc
>Synopsis: ppc405: lockdebug crash because cpu_index(curcpu()) is invalid (not 0) value.
>Confidential: no
>Severity: critical
>Priority: medium
>Responsible: port-evbppc-maintainer
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sun Feb 11 16:45:00 +0000 2024
>Last-Modified: Fri May 10 13:10:01 +0000 2024
>Originator: Kouichi Hashikawa
>Release: 10.0-RC3
>Organization:
>Environment:
>Description:
On an OPENBLOCKS266 (PPC405GPr) with a LOCKDEBUG kernel,
the kernel sometimes crashes with "mutex_vector_exit: not held by current CPU"
(or sometimes with "mutex_vector_enter,517: uninitialized lock ...").
ddb shows the following message; the "relevant cpu" value is cpu_index(curcpu()).
Why has the index changed from 0?
Mutex error: mutex_vector_exit,730: not held by current CPU
[ 2261.9775931] lock address : 7f7ff98
[ 2261.9775931] type : spin
[ 2261.9775931] initialized : netbsd:sched_cpuattach+0x138
[ 2261.9775931] shared holds : 0 exclusive: 1
[ 2261.9775931] shares wanted: 0 exclusive: 0
[ 2261.9775931] relevant cpu : 187 last held: 0
[ 2261.9775931] relevant lwp : 0x0000000001b30400 last held: 0x0000000007ef4940
[ 2261.9775931] last locked* : netbsd:softint_thread+0x14c
[ 2261.9775931] unlocked : netbsd:mi_switch+0x190
[ 2261.9775931] owner field : 000000000000000000 wait/spin: 0/1
[ 2261.9775931] panic: LOCKDEBUG: Mutex error: mutex_vector_exit,730: not held by current CPU
[ 2261.9775931] cpu187: Begin traceback...
[ 2261.9775931] 0x0086bdc0: at vpanic+0x160
[ 2261.9775931] 0x0086bdf0: at panic+0x58
[ 2261.9775931] 0x0086be30: at lockdebug_abort1+0x160
[ 2261.9775931] 0x0086be60: at mutex_exit+0x1e4
[ 2261.9775931] 0x0086be80: at mi_switch+0x18c
[ 2261.9775931] 0x0086bec0: at syscall+0x314
[ 2261.9775931] 0x0086bf20: user SC trap #3 by 0xfdbbc128: srr1=0xc030
[ 2261.9775931] r1=0xfffeaff0 cr=0x44004242 xer=0 ctr=0xfdbbc120 esr=0x800000 pid=0xba
Examining the struct cpu_info at 0x5e3580, I get 0xbb = 187 as cpu_index(curcpu()):
db> x/bm 0x5e3580,256
netbsd:cpu_info: 00 00 00 bb 00 00 00 00 07 ef 4f 40 00 50 01 20
..........O@.P.
This 187 is ctx + 1, using the ctx value reported by "machine ctx" in ddb (nbawk has ctx 186):
db> show procs
PID LID S CPU FLAGS STRUCT LWP * NAME WAIT
19192>19192 7 187 40000 1b30400 nbawk
22446 22446 3 187 180 74db0c0 nbmake pipe_rd
18258 18258 3 187 180 74db3c0 sh wait
db> mac ctx
process 0x23f56a8:pid:19192 pmap:0x6cd6c00 ctx:186 nbawk
process 0x23f4f28:pid:22446 pmap:0x7f67b00 ctx:181 nbmake
process 0x23f42a8:pid:18258 pmap:0x7f66e80 ctx:180 sh
process 0x23f51a8:pid:20474 pmap:0x6cd6fc0 ctx:115 nbmake
I cannot find why cpu_index(curcpu()) is being overwritten.
>How-To-Repeat:
>Fix:
>Audit-Trail:
From: Kouichi Hashikawa <hashikaw@mail.ru>
To: gnats-bugs@netbsd.org, netbsd-bugs@netbsd.org
Cc:
Subject: Re: port-evbppc/57926: ppc405: lockdebug crash because cpu_index(curcpu()) is invalid (not 0) value.
Date: Fri, 10 May 2024 22:08:17 +0900
cc -E src/external/gpl3/gcc/dist/libgcc/config/rs6000/crtresxgpr.S
(gcc's copy)
...
.type _restgpr_29_x,@function; .globl _restgpr_29_x; _restgpr_29_x: .hidden _restgpr_29_x; lwz 29,-12(11)
.type _restgpr_30_x,@function; .globl _restgpr_30_x; _restgpr_30_x: .hidden _restgpr_30_x; lwz 30,-8(11)
.type _restgpr_31_x,@function; .globl _restgpr_31_x; _restgpr_31_x: .hidden _restgpr_31_x; lwz 0,4(11) <---
lwz 31,-4(11) <---
mtlr 0
mr 1,11
blr
...
cc -E src/sys/lib/libkern/arch/powerpc/gprsavrest.S
(NetBSD's copy)
.hidden _restgpr_29_x; .text; .align 2; .globl _restgpr_29_x; .type _restgpr_29_x,@function; _restgpr_29_x:; lwz 29,(-4*(32-29))(11)
.hidden _restgpr_30_x; .text; .align 2; .globl _restgpr_30_x; .type _restgpr_30_x,@function; _restgpr_30_x:; lwz 30,(-4*(32-30))(11)
.hidden _restgpr_31_x; .text; .align 2; .globl _restgpr_31_x; .type _restgpr_31_x,@function; _restgpr_31_x:; lwz 31,(-4*(32-31))(11) <---
lwz 0,4(11) <---
mtlr 0
mr 1,11
blr
When I swap the order of lwz 31,(-4)(11) and lwz 0,4(11) in gprsavrest.S, the kernel runs more stably.
(If I instead comment out lwz 0,4(11); mtlr 0; mr 1,11, as in gcc's crtresgpr.S, the kernel also runs more stably.)
I don't understand what's happening...
# However, hptide's UDMA is unstable (bus-master DMA error: missing interrupt);
# PIO mode 4 is mostly stable, but stops under heavy load.
Copyright © 1994-2024
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.