NetBSD Problem Report #35363

From johan@giantfoo.org  Fri Jan  5 21:52:06 2007
Return-Path: <johan@giantfoo.org>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
	by narn.NetBSD.org (Postfix) with ESMTP id 5172163B880
	for <gnats-bugs@gnats.NetBSD.org>; Fri,  5 Jan 2007 21:52:06 +0000 (UTC)
Message-Id: <20070105215205.23F98FC96@pangu.giantfoo.org>
Date: Fri,  5 Jan 2007 15:52:05 -0600 (CST)
From: johan@giantfoo.org
Reply-To: johan@giantfoo.org
To: gnats-bugs@NetBSD.org
Subject: 3.1_STABLE problems on dual nocache 50 MHz SuperSPARC (390Z50)
X-Send-Pr-Version: 3.95

>Number:         35363
>Category:       port-sparc
>Synopsis:       MP broken for some 50 MHz SuperSPARC (390Z50) processors on 3.1
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    port-sparc-maintainer
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri Jan 05 21:55:00 +0000 2007
>Closed-Date:    Wed Jun 09 06:06:35 +0000 2021
>Last-Modified:  Wed Jun 09 06:06:35 +0000 2021
>Originator:     Johan A. van Zanten
>Release:        NetBSD 3.1_STABLE
>Organization:
Hail Eris!
>Environment:


System: NetBSD vishnu 3.1_STABLE NetBSD 3.1_STABLE (MANGOLASSI.MP) #0: Sun Nov  5 17:37:17 CST 2006  johan@pangu:/tew/003/src/NetBSD/NetBSD-3/src/sys/arch/sparc/compile/MANGOLASSI.MP sparc

Architecture: sparc
Machine: sparc
>Description:

  When the second processor is activated on multi-processor 3.1_STABLE
sun4m systems, NetBSD behaves erratically, with some programs seg faulting.
The system is extremely unstable and not usable, but it does not panic.

 Please see my email to port-sparc:

http://mail-index.netbsd.org/port-sparc/2007/01/02/0000.html

 Confirmation of the problem by Michael-John Turner:

http://mail-index.netbsd.org/port-sparc/2007/01/02/0002.html

Original report:
http://mail-index.netbsd.org/port-sparc/2006/12/29/0004.html

Please note that not all 50 MHz SuperSPARC processors trigger the problem.
I have the same OS build running without problems on a dual "390Z55"
system.  My understanding is that a significant difference between CPUs
identified as "390Z50" and "390Z55" is that the '55 has 1 MB of ecache per
CPU, and the '50 has none.

See: http://mbus.sunhelp.org/modules/index.htm#super

 Also, please note that i had a dual 390Z50 system running NetBSD
2.0.2_STABLE without problems, under significant load (Internet-connected
DNS server and MX, as well as KDC), before i upgraded to 3.1_STABLE.
Michael-John Turner's message above also suggests that the problem may
have been introduced between 2.x and 3.x.

>How-To-Repeat:

  Install a NetBSD 3.1 MP kernel (GENERIC.MP produces the problem) on a
sparc with two 390Z50 50 MHz processors.
  Boot the system multi-user.

>Fix:

  Removing the second CPU eliminates the problem, as does switching to
multiple CPUs of a different type, such as "390Z55" or other, faster SPARC
CPUs, all of which appear to have cache (and a cache controller).

>Release-Note:

>Audit-Trail:

State-Changed-From-To: open->feedback
State-Changed-By: mrg@NetBSD.org
State-Changed-When: Sat, 20 Mar 2021 16:02:51 +0000
State-Changed-Why:
netbsd/sparc got many many smp fixes since 3.1 days.  if you still have
the system, does this problem still occur for you?  thanks.


From: Tobias Ulmer <tobiasu@tmux.org>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: port-sparc/35363 (MP broken for some 50 MHz SuperSPARC (390Z50)
 processors on 3.1)
Date: Mon, 12 Apr 2021 03:14:06 +0200

 On Sat, Mar 20, 2021 at 04:02:51PM +0000, mrg@NetBSD.org wrote:
 > Synopsis: MP broken for some 50 MHz SuperSPARC (390Z50) processors on 3.1
 > 
 > State-Changed-From-To: open->feedback
 > State-Changed-By: mrg@NetBSD.org
 > State-Changed-When: Sat, 20 Mar 2021 16:02:51 +0000
 > State-Changed-Why:
 > netbsd/sparc got many many smp fixes since 3.1 days.  if you still have
 > the system, does this problem still occur for you?  thanks.

 I have a set of these CPUs I believe (390Z50 is the cache?), but they're
 not in a machine.

 They had the same issue the last time I tested them on 8.99something
 (iirc).  Nonsensical segfaults that can't be nailed down.  Same behavior
 on OpenBSD.  true(1) dumping core, sh around setjmp/longjmp.

 My money is on a silicon bug that's not handled by the known errata.

 I've attempted (half-heartedly) to find a workaround in the SunOS source,
 but could not do it.

 Happy to ship these modules to a developer who wants to take a stab at
 it.

State-Changed-From-To: feedback->closed
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Wed, 09 Jun 2021 06:06:35 +0000
State-Changed-Why:
The submitter wrote to the gnats administrator mailbox to say:

Hello,

Sadly, i do not have any sun4m systems anymore, so i cannot test.

Regardless, thank you for the fixes! 

  best, johan


>Unformatted:
