NetBSD Problem Report #44387

From www@NetBSD.org  Fri Jan 14 16:47:24 2011
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
	by www.NetBSD.org (Postfix) with ESMTP id DD10E63B883
	for <gnats-bugs@gnats.NetBSD.org>; Fri, 14 Jan 2011 16:47:23 +0000 (UTC)
Message-Id: <20110114164723.02F2463B873@www.NetBSD.org>
Date: Fri, 14 Jan 2011 16:47:22 +0000 (UTC)
From: riz@NetBSD.org
Reply-To: riz@NetBSD.org
To: gnats-bugs@NetBSD.org
Subject: some pthread mutex tests fail on ppc platforms
X-Send-Pr-Version: www-1.0

>Number:         44387
>Notify-List:    uwe@NetBSD.org
>Category:       port-powerpc
>Synopsis:       some pthread mutex tests fail on ppc platforms
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    chs
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri Jan 14 16:50:00 +0000 2011
>Closed-Date:    Sun Mar 05 16:21:17 +0000 2017
>Last-Modified:  Wed Apr 19 17:05:01 +0000 2017
>Originator:     Jeff Rizzo
>Release:        5.99.43
>Organization:
>Environment:
NetBSD powerbookg4 5.99.43 NetBSD 5.99.43 (GENERIC) #1: Fri Jan 14 08:25:34 PST 2011  riz@hack.lan:/Users/riz/Documents/code/netbsd/obj/sys/arch/macppc/compile/GENERIC macppc

>Description:
Using a kernel that is *not* built with "options DIAGNOSTIC", the
regression tests in /usr/tests/lib/libpthread/t_mutex fail on various
powerpc platforms.  The failing test creates two threads that each
increment a variable protected by a mutex; when it fails (it does not
fail 100% of the time), both threads appear to be in the "parked"
state.  (I have also seen one thread parked and one in iowait.)
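
For readers without the test source at hand, the pattern described above
can be sketched roughly as follows.  This is a minimal illustration, not
the actual t_mutex.c code: the names counter_lock, incrementer and
run_mutex_counter are made up for this sketch, and the real test's
thread setup may differ.

```c
#include <pthread.h>
#include <stddef.h>

/* Shared state: a counter guarded by a mutex (illustrative names). */
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static long counter;
static long iterations_per_thread;

/* Each thread adds 1 to the counter iterations_per_thread times. */
static void *
incrementer(void *arg)
{
	(void)arg;
	for (long i = 0; i < iterations_per_thread; i++) {
		pthread_mutex_lock(&counter_lock);
		counter++;
		pthread_mutex_unlock(&counter_lock);
	}
	return NULL;
}

/*
 * Run two incrementing threads to completion and return the final
 * count, or -1 if a thread could not be created or joined.
 */
long
run_mutex_counter(long iterations)
{
	pthread_t t[2];

	counter = 0;
	iterations_per_thread = iterations;
	for (int i = 0; i < 2; i++)
		if (pthread_create(&t[i], NULL, incrementer, NULL) != 0)
			return -1;
	for (int i = 0; i < 2; i++)
		if (pthread_join(t[i], NULL) != 0)
			return -1;
	return counter;
}
```

With a working mutex implementation, run_mutex_counter(n) returns 2*n.
The failure reported here is a hang rather than a wrong count: the
threads never finish, so the joins never return.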

I confirmed this on two different macppc boxes I have, and I have at
least one test run from an ofppc box running 5.99.42 (without
DIAGNOSTIC).

For whatever reason, building the kernel with DIAGNOSTIC seems to hide
the problem.  I have not yet determined why; no additional output is
available.

In case it matters: one machine has a PPC 7400, one has a 7447, and
one has a 750CX CPU.
>How-To-Repeat:
On a machine with a GENERIC kernel:

cd /usr/tests/lib/libpthread
powerbookg4:riz  /usr/tests/lib/libpthread> ./t_mutex mutex2
1: Mutex-test 2
1: Thread 0xffe00000
2: Second thread (0xefa00000). Count is 10000000

<hang indefinitely>

Sometimes it does not hang; the frequency of hangs seems to vary by
machine and kernel.  You may also wish to run the entire pthread test
suite from that directory with "atf-run | atf-report".  It will take
about 10 minutes (because of the mutex test hangs).
>Fix:
None given.

>Release-Note:

>Audit-Trail:
From: "Jeff Rizzo" <riz@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/44387 CVS commit: src/tests/lib/libpthread
Date: Mon, 21 Feb 2011 21:43:42 +0000

 Module Name:	src
 Committed By:	riz
 Date:		Mon Feb 21 21:43:41 UTC 2011

 Modified Files:
 	src/tests/lib/libpthread: t_mutex.c

 Log Message:
 mutex2/mutex3 are expected to fail on powerpc because of
 PR port-powerpc/44387.

 XXX the ugly sleep at the end is because ATF will mark an un-triggered
 race condition (ie, the test passes unexpectedly) as a test failure otherwise.


 To generate a diff of this commit:
 cvs rdiff -u -r1.3 -r1.4 src/tests/lib/libpthread/t_mutex.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: Valery Ushakov <uwe@stderr.spb.ru>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: port-powerpc/44387: some pthread mutex tests fail on ppc
 platforms
Date: Thu, 19 May 2016 04:43:31 +0300

 I did a bit of mixing and matching of normal and -DDIAGNOSTIC objects
 and apparently the bug does not manifest with -DDIAGNOSTIC kern_mutex.c

 So it's probably -DFULL that is enabled by -DDIAGNOSTIC, though I
 haven't tried narrowing it further down.

 -uwe

From: "Chuck Silvers" <chs@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/44387 CVS commit: src/sys/arch/powerpc/powerpc
Date: Tue, 28 Feb 2017 17:35:29 +0000

 Module Name:	src
 Committed By:	chs
 Date:		Tue Feb 28 17:35:29 UTC 2017

 Modified Files:
 	src/sys/arch/powerpc/powerpc: locore_subr.S

 Log Message:
 in cpu_switchto() and the fast-softint context switch code,
 put back the stwcx. instruction to clear the reservation.
 we used to have this in the old cpu_switch() until it was
 if-0'd in 2003 and removed completely in 2007.
 this fixes hangs I've seen where a softint thread is
 blocked waiting for a mutex that is not held.
 this should also fix PR 44387.


 To generate a diff of this commit:
 cvs rdiff -u -r1.54 -r1.55 src/sys/arch/powerpc/powerpc/locore_subr.S

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.
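
The commit above restores a stwcx. in the context switch path to clear
the CPU's reservation.  As a rough illustration of why a reservation
that survives a context switch is dangerous, here is a toy model in
plain C.  It is not the kernel code and is not claimed to be the exact
failure sequence behind this PR: struct cpu, lwarx(), stwcx(),
context_switch() and stale_store_succeeds() are hypothetical names that
only mimic the load-reserve/store-conditional idea, and the model drops
the reservation directly instead of issuing a dummy store-conditional.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model of one CPU's reservation state (not kernel code). */
struct cpu {
	bool reserved;
	const int *addr;
};

/* Load-and-reserve: read the word and record a reservation on it. */
static int
lwarx(struct cpu *c, const int *p)
{
	c->reserved = true;
	c->addr = p;
	return *p;
}

/* Store-conditional: store only if reserved on p; always clear. */
static bool
stwcx(struct cpu *c, int *p, int v)
{
	bool ok = c->reserved && c->addr == p;

	c->reserved = false;
	if (ok)
		*p = v;
	return ok;
}

/* Context switch; optionally drop the reservation. */
static void
context_switch(struct cpu *c, bool clear_reservation)
{
	if (clear_reservation)
		c->reserved = false;
}

/*
 * Returns true if thread A's stale store-conditional succeeds, i.e.
 * A believes it took a lock that thread B already holds.
 */
bool
stale_store_succeeds(bool clear_on_switch)
{
	struct cpu c = { false, NULL };
	int lock = 0;				/* 0 = free, 1 = held */

	int seen_by_a = lwarx(&c, &lock);	/* A sees the lock free */
	context_switch(&c, clear_on_switch);	/* and is preempted. */

	if (lwarx(&c, &lock) == 0)		/* B acquires the lock. */
		(void)stwcx(&c, &lock, 1);
	context_switch(&c, clear_on_switch);

	(void)lwarx(&c, &lock);		/* C sees it held, no stwcx. */
	context_switch(&c, clear_on_switch);	/* Back to A. */

	/* A resumes its sequence where it left off. */
	return seen_by_a == 0 && stwcx(&c, &lock, 1);
}
```

In this model, stale_store_succeeds(false) reports that thread A's
store goes through on the reservation left behind by thread C, while
stale_store_succeeds(true) makes A's store fail so that A would retry
its load-reserve and see the lock as held.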

From: "Chuck Silvers" <chs@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/44387 CVS commit: src/tests/lib/libpthread
Date: Sun, 5 Mar 2017 16:08:23 +0000

 Module Name:	src
 Committed By:	chs
 Date:		Sun Mar  5 16:08:23 UTC 2017

 Modified Files:
 	src/tests/lib/libpthread: t_mutex.c

 Log Message:
 reenable mutex2 and mutex3 on powerpc now that PR 44387 is fixed.


 To generate a diff of this commit:
 cvs rdiff -u -r1.15 -r1.16 src/tests/lib/libpthread/t_mutex.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

Responsible-Changed-From-To: port-powerpc-maintainer->chs
Responsible-Changed-By: chs@NetBSD.org
Responsible-Changed-When: Sun, 05 Mar 2017 16:21:17 +0000
Responsible-Changed-Why:
I fixed this.


State-Changed-From-To: open->closed
State-Changed-By: chs@NetBSD.org
State-Changed-When: Sun, 05 Mar 2017 16:21:17 +0000
State-Changed-Why:
the problem is confirmed fixed by Frank Wille.


From: Valery Ushakov <uwe@stderr.spb.ru>
To: gnats-bugs@NetBSD.org
Cc: chs@NetBSD.org
Subject: Re: port-powerpc/44387 (some pthread mutex tests fail on ppc
 platforms)
Date: Sun, 5 Mar 2017 19:23:51 +0300

 On Sun, Mar 05, 2017 at 16:21:18 +0000, chs@NetBSD.org wrote:

 > State-Changed-From-To: open->closed
 > State-Changed-By: chs@NetBSD.org
 > State-Changed-When: Sun, 05 Mar 2017 16:21:17 +0000
 > State-Changed-Why:
 > the problem is confirmed fixed by Frank Wille.

 Thanks!

 This probably needs to stay open until pullups are done?

 -uwe

From: "Soren Jacobsen" <snj@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/44387 CVS commit: [netbsd-7] src/sys/arch/powerpc/powerpc
Date: Wed, 19 Apr 2017 17:02:43 +0000

 Module Name:	src
 Committed By:	snj
 Date:		Wed Apr 19 17:02:43 UTC 2017

 Modified Files:
 	src/sys/arch/powerpc/powerpc [netbsd-7]: locore_subr.S

 Log Message:
 Pull up following revision(s) (requested by phx in ticket #1382):
 	sys/arch/powerpc/powerpc/locore_subr.S: revision 1.55
 in cpu_switchto() and the fast-softint context switch code,
 put back the stwcx. instruction to clear the reservation.
 we used to have this in the old cpu_switch() until it was
 if-0'd in 2003 and removed completely in 2007.
 this fixes hangs I've seen where a softint thread is
 blocked waiting for a mutex that is not held.
 this should also fix PR 44387.


 To generate a diff of this commit:
 cvs rdiff -u -r1.54 -r1.54.2.1 src/sys/arch/powerpc/powerpc/locore_subr.S

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

>Unformatted:
