NetBSD Problem Report #57926

From www@netbsd.org  Sun Feb 11 16:42:53 2024
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 5DDE01A9238
	for <gnats-bugs@gnats.NetBSD.org>; Sun, 11 Feb 2024 16:42:53 +0000 (UTC)
Message-Id: <20240211164251.9FD9D1A9239@mollari.NetBSD.org>
Date: Sun, 11 Feb 2024 16:42:51 +0000 (UTC)
From: hashikaw@mail.ru
Reply-To: hashikaw@mail.ru
To: gnats-bugs@NetBSD.org
Subject: ppc405: lockdebug crash because cpu_index(curcpu()) is invalid (not 0) value.
X-Send-Pr-Version: www-1.0

>Number:         57926
>Category:       port-evbppc
>Synopsis:       ppc405: lockdebug crash because cpu_index(curcpu()) is invalid (not 0) value.
>Confidential:   no
>Severity:       critical
>Priority:       medium
>Responsible:    port-evbppc-maintainer
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Feb 11 16:45:00 +0000 2024
>Last-Modified:  Fri May 10 13:10:01 +0000 2024
>Originator:     Kouichi Hashikawa
>Release:        10.0-RC3
>Organization:
>Environment:
>Description:
On an OPENBLOCKS266 (PPC405GPr) running a LOCKDEBUG kernel,
the kernel sometimes crashes with "mutex_vector_exit: not held by current CPU"
(or sometimes "mutex_vector_enter,517: uninitialized lock...").

ddb shows the following message; the "relevant cpu" is cpu_index(curcpu()).
Why has the index changed from 0?

Mutex error: mutex_vector_exit,730: not held by current CPU

[ 2261.9775931] lock address : 7f7ff98
[ 2261.9775931] type         : spin
[ 2261.9775931] initialized  : netbsd:sched_cpuattach+0x138
[ 2261.9775931] shared holds :                  0 exclusive:                  1
[ 2261.9775931] shares wanted:                  0 exclusive:                  0
[ 2261.9775931] relevant cpu :                187 last held:                  0
[ 2261.9775931] relevant lwp : 0x0000000001b30400 last held: 0x0000000007ef4940
[ 2261.9775931] last locked* : netbsd:softint_thread+0x14c
[ 2261.9775931] unlocked     : netbsd:mi_switch+0x190
[ 2261.9775931] owner field  : 000000000000000000 wait/spin:                0/1

[ 2261.9775931] panic: LOCKDEBUG: Mutex error: mutex_vector_exit,730: not held by current CPU
[ 2261.9775931] cpu187: Begin traceback...
[ 2261.9775931] 0x0086bdc0: at vpanic+0x160
[ 2261.9775931] 0x0086bdf0: at panic+0x58
[ 2261.9775931] 0x0086be30: at lockdebug_abort1+0x160
[ 2261.9775931] 0x0086be60: at mutex_exit+0x1e4
[ 2261.9775931] 0x0086be80: at mi_switch+0x18c
[ 2261.9775931] 0x0086bec0: at syscall+0x314
[ 2261.9775931] 0x0086bf20: user SC trap #3 by 0xfdbbc128: srr1=0xc030
[ 2261.9775931]             r1=0xfffeaff0 cr=0x44004242 xer=0 ctr=0xfdbbc120 esr=0x800000 pid=0xba

Examining 0x5e3580 (struct cpu_info), I get 0xbb = 187 as cpu_index(curcpu()).

db> x/bm 0x5e3580,256
netbsd:cpu_info:        00 00 00 bb 00 00 00 00 07 ef 4f 40 00 50 01 20  ..........O@.P.

'187' is ctx+1, where ctx is the value reported by 'machine ctx' in ddb.

db> show procs
PID    LID S CPU     FLAGS       STRUCT LWP *               NAME WAIT
19192>19192 7 187     40000            1b30400              nbawk
22446 22446 3 187       180            74db0c0             nbmake pipe_rd
18258 18258 3 187       180            74db3c0                 sh wait
db> mac ctx
process 0x23f56a8:pid:19192 pmap:0x6cd6c00 ctx:186 nbawk
process 0x23f4f28:pid:22446 pmap:0x7f67b00 ctx:181 nbmake
process 0x23f42a8:pid:18258 pmap:0x7f66e80 ctx:180 sh
process 0x23f51a8:pid:20474 pmap:0x6cd6fc0 ctx:115 nbmake

I cannot find why cpu_index(curcpu()) is overwritten...

>How-To-Repeat:

>Fix:

>Audit-Trail:
From: Kouichi Hashikawa <hashikaw@mail.ru>
To: gnats-bugs@netbsd.org, netbsd-bugs@netbsd.org
Cc: 
Subject: Re: port-evbppc/57926: ppc405: lockdebug crash because cpu_index(curcpu()) is invalid (not 0) value.
Date: Fri, 10 May 2024 22:08:17 +0900

 cc -E src/external/gpl3/gcc/dist/libgcc/config/rs6000/crtresxgpr.S
 (version of gcc)

 ...
 .type _restgpr_29_x,@function; .globl _restgpr_29_x; _restgpr_29_x: .hidden _restgpr_29_x; lwz 29,-12(11)
 .type _restgpr_30_x,@function; .globl _restgpr_30_x; _restgpr_30_x: .hidden _restgpr_30_x; lwz 30,-8(11)
 .type _restgpr_31_x,@function; .globl _restgpr_31_x; _restgpr_31_x: .hidden _restgpr_31_x; lwz 0,4(11)   <---
   lwz 31,-4(11)   <---
   mtlr 0
   mr 1,11
   blr
 ...

 cc -E src/sys/lib/libkern/arch/powerpc/gprsavrest.S
 (version of NetBSD)

 .hidden _restgpr_29_x; .text; .align 2; .globl _restgpr_29_x; .type _restgpr_29_x,@function; _restgpr_29_x:; lwz 29,(-4*(32-29))(11)
 .hidden _restgpr_30_x; .text; .align 2; .globl _restgpr_30_x; .type _restgpr_30_x,@function; _restgpr_30_x:; lwz 30,(-4*(32-30))(11)
 .hidden _restgpr_31_x; .text; .align 2; .globl _restgpr_31_x; .type _restgpr_31_x,@function; _restgpr_31_x:; lwz 31,(-4*(32-31))(11)   <---
 lwz 0,4(11)   <---
 mtlr 0
 mr 1,11
 blr

 When I change the lwz 31,(-4)(11) and lwz 0,4(11) lines in gprsavrest.S,
 the kernel is more stable.
 (When I comment out lwz 0,4(11); mtlr 0; mr 1,11, as in gcc's crtresgpr.S,
 the kernel is also more stable...)

 I don't understand what's happening...

 # However, hptide's UDMA is unstable (bus-master DMA error: missing interrupt);
 # PIO-4 is almost stable, but stops under heavy load...
