NetBSD Problem Report #37930
From martin@duskware.de Thu Jan 31 11:44:45 2008
Return-Path: <martin@duskware.de>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
by narn.NetBSD.org (Postfix) with ESMTP id 5FE8263B8BD
for <gnats-bugs@gnats.netbsd.org>; Thu, 31 Jan 2008 11:44:45 +0000 (UTC)
Message-Id: <20080131113912.0DCA963B8A2@narn.NetBSD.org>
Date: Thu, 31 Jan 2008 11:39:12 +0000 (UTC)
From: ad@netbsd.org
Reply-To: ad@netbsd.org
To: netbsd-bugs-owner@NetBSD.org
Subject: sparc mutex stubs are broken on MP
X-Send-Pr-Version: www-1.0
>Number: 37930
>Category: port-sparc
>Synopsis: sparc mutex stubs are broken on MP
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: port-sparc-maintainer
>State: suspended
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Thu Jan 31 11:45:01 +0000 2008
>Closed-Date:
>Last-Modified: Thu Feb 14 14:16:00 +0000 2008
>Originator: Andrew Doran
>Release: 4.99.52
>Organization:
The NetBSD Project
>Environment:
n/a
>Description:
The sparc mutex stubs are broken on MP systems. Usually they
work like this:
- grab the mutex interlock
- if that succeeds, store the full owning LWP address in mtx_owner
On an MP system the following can occur:
cpu2 acquire kernel_lock
cpu1 grab mutex interlock
cpu1 device interrupt occurs before setting mtx_owner
cpu1 spin on kernel_lock trying to process device interrupt
cpu2 spin or block trying to acquire mutex 'half held' by cpu1
-> potential deadlock
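The window described above can be illustrated with a minimal userland sketch in C11. It is not the sparc stub code: the struct layout, the stub_* names and the use of atomic_exchange() as a stand-in for a byte test-and-set are all assumptions made for illustration. It only shows that, between the two steps, other CPUs see a mutex that is locked but has no recorded owner.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: an interlock byte plus a separate owner word. */
struct stub_mutex {
	_Atomic unsigned char mtx_interlock;	/* step 1: test-and-set byte */
	_Atomic uintptr_t mtx_owner;		/* step 2: owning LWP, 0 if unset */
};

/* Step 1: grab the interlock. Returns true if it was free. */
bool
stub_grab_interlock(struct stub_mutex *m)
{
	return atomic_exchange(&m->mtx_interlock, 0xff) == 0;
}

/* Step 2: record the owner. Only valid while holding the interlock. */
void
stub_set_owner(struct stub_mutex *m, uintptr_t lwp)
{
	assert(atomic_load(&m->mtx_interlock) != 0);
	atomic_store(&m->mtx_owner, lwp);
}

/*
 * True in the window between step 1 and step 2. An interrupt taken here
 * leaves the mutex locked with no owner for other CPUs to inspect.
 */
bool
stub_half_held(struct stub_mutex *m)
{
	return atomic_load(&m->mtx_interlock) != 0 &&
	    atomic_load(&m->mtx_owner) == 0;
}
```

In the scenario above, cpu1 is interrupted while stub_half_held() would return true, and cpu2 then waits on a mutex whose holder cannot make progress until kernel_lock is released.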
>How-To-Repeat:
Code inspection.
>Fix:
- Implement 'restart' for mutex_enter() in all the interrupt stubs.
This would be very tricky.
or:
- Change the sparc to __HAVE_SIMPLE_MUTEXES.
- Use atomic_cas_ptr() to acquire and release mutexes.
- Work on optimizing atomic_cas_ptr().
I'll create a patch.
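As a rough sketch of the second option, the following userland C11 code acquires and releases a mutex with a single compare-and-swap on the owner word. The C11 atomic_compare_exchange_strong() call stands in for atomic_cas_ptr(), and the sketch_* names and struct layout are hypothetical. It is not the patch referred to above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified mutex: one word holding the owner, 0 if free. */
struct sketch_mutex {
	_Atomic uintptr_t mtx_owner;
};

/* Try to take the mutex for `self`. Returns true on success. */
bool
sketch_mutex_tryenter(struct sketch_mutex *mtx, uintptr_t self)
{
	uintptr_t expected = 0;

	/* One step: free -> owned by self, or fail. No intermediate state. */
	return atomic_compare_exchange_strong(&mtx->mtx_owner, &expected, self);
}

/* Release the mutex. The caller must be the current owner. */
void
sketch_mutex_exit(struct sketch_mutex *mtx, uintptr_t self)
{
	uintptr_t expected = self;
	bool ok;

	ok = atomic_compare_exchange_strong(&mtx->mtx_owner, &expected, 0);
	assert(ok);
	(void)ok;
}
```

Because ownership changes in one atomic step, an interrupt can arrive before or after the acquisition but never in the middle of it, provided the compare-and-swap itself cannot be interrupted part way through.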
>Release-Note:
>Audit-Trail:
From: Andrew Doran <ad@netbsd.org>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: port-sparc/37930: sparc mutex stubs are broken on MP
Date: Thu, 31 Jan 2008 11:59:20 +0000
Here's a patch:
http://www.netbsd.org/~ad/sparc.diff
Responsible-Changed-From-To: port-sparc-maintainer->ad
Responsible-Changed-By: ad@narn.netbsd.org
Responsible-Changed-When: Thu, 14 Feb 2008 14:05:00 +0000
Responsible-Changed-Why:
take
From: Andrew Doran <ad@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/37930 CVS commit: src/sys/arch/sparc
Date: Thu, 14 Feb 2008 14:07:35 +0000 (UTC)
Module Name: src
Committed By: ad
Date: Thu Feb 14 14:07:35 UTC 2008
Modified Files:
src/sys/arch/sparc/conf: files.sparc
src/sys/arch/sparc/include: mutex.h rwlock.h
Log Message:
Make sparc use atomic_cas_ulong() for mutex and rwlock operations, and
disable the custom mutex/rwlock code. PR port-sparc/37930. ok martin@
To generate a diff of this commit:
cvs rdiff -r1.144 -r1.145 src/sys/arch/sparc/conf/files.sparc
cvs rdiff -r1.7 -r1.8 src/sys/arch/sparc/include/mutex.h
cvs rdiff -r1.3 -r1.4 src/sys/arch/sparc/include/rwlock.h
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
Responsible-Changed-From-To: ad->port-sparc-maintainer
Responsible-Changed-By: ad@narn.netbsd.org
Responsible-Changed-When: Thu, 14 Feb 2008 14:16:00 +0000
Responsible-Changed-Why:
back to port-sparc-maintainer for the time being.
State-Changed-From-To: open->suspended
State-Changed-By: ad@narn.netbsd.org
State-Changed-When: Thu, 14 Feb 2008 14:16:00 +0000
State-Changed-Why:
mrg asked me to leave this open because it could use optimization.
We could either make the custom mutex scheme work properly, or optimize
atomic_cas_ulong() so that it does not need to disable interrupts. Both of
these would mean changing the interrupt handling code to complete and/or
restart a CAS operation if it's interrupted. I think it would be better to
optimize atomic_cas_ulong() since it would mean we're focusing on one
routine instead of many.
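For readers unfamiliar with why a CAS routine would need to disable interrupts at all, here is a minimal userland sketch in C11 of a compare-and-swap emulated on top of a test-and-set interlock. It is an illustration under stated assumptions, not the NetBSD implementation: sketch_cas_ulong(), the single global interlock, and the sketch_intr_disable()/sketch_intr_restore() placeholders (which only toggle a flag here) are all hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical global interlock protecting every emulated CAS. */
static atomic_flag cas_interlock = ATOMIC_FLAG_INIT;

/* Placeholders for raising and restoring the interrupt level. */
static int intr_blocked;

static int
sketch_intr_disable(void)
{
	int s = intr_blocked;

	intr_blocked = 1;
	return s;
}

static void
sketch_intr_restore(int s)
{
	intr_blocked = s;
}

/* Emulated CAS: returns the value found at *ptr before the operation. */
unsigned long
sketch_cas_ulong(volatile unsigned long *ptr, unsigned long expected,
    unsigned long newval)
{
	unsigned long old;
	int s;

	assert(ptr != NULL);

	/*
	 * Block interrupts first, so that a handler on this CPU cannot
	 * spin forever on an interlock held by the code it interrupted.
	 */
	s = sketch_intr_disable();
	while (atomic_flag_test_and_set_explicit(&cas_interlock,
	    memory_order_acquire))
		continue;

	old = *ptr;
	if (old == expected)
		*ptr = newval;

	atomic_flag_clear_explicit(&cas_interlock, memory_order_release);
	sketch_intr_restore(s);
	return old;
}
```

The optimization discussed above would drop the interrupt-disable step and instead have the interrupt entry code notice that it interrupted this sequence, then complete or restart it.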
>Unformatted:
Copyright © 1994-2007
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.