NetBSD Problem Report #45255

From mrg@eterna.com.au  Mon Aug 15 02:23:23 2011
Return-Path: <mrg@eterna.com.au>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
	by www.NetBSD.org (Postfix) with ESMTP id A9B9E63BE67
	for <gnats-bugs@gnats.NetBSD.org>; Mon, 15 Aug 2011 02:23:23 +0000 (UTC)
Message-Id: <20110815022321.669C23752A@splode.eterna.com.au>
Date: Mon, 15 Aug 2011 12:23:21 +1000 (EST)
From: mrg@eterna.com.au
Reply-To: mrg@eterna.com.au
To: gnats-bugs@gnats.NetBSD.org
Subject: GCC 4.5.3 and sparc re-triggers the NULL savefpstate IPI issue
X-Send-Pr-Version: 3.95

>Number:         45255
>Category:       port-sparc
>Synopsis:       GCC 4.5.3 and sparc re-triggers the NULL savefpstate IPI issue
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    mrg
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Aug 15 02:25:00 +0000 2011
>Closed-Date:    Sat Jun 20 17:30:52 +0000 2020
>Last-Modified:  Sat Jun 20 17:30:52 +0000 2020
>Originator:     matthew green
>Release:        NetBSD 5.99.55
>Organization:
people's front against (bozotic) www (softwar foundation)
>Environment:
System: NetBSD russian-intervention.eterna23.net 5.99.55 NetBSD 5.99.55 (_russian_) #7: Fri Aug 12 14:07:48 PDT 2011  mrg@xotica.eterna23.net:/var/obj/sparc/usr/src3/sys/arch/sparc/compile/_russian_ sparc
Architecture: sparc
Machine: sparc
>Description:

	we need to revert this commit (again):

Module Name:    src
Committed By:   mrg
Date:           Mon Aug 15 02:19:45 UTC 2011

Modified Files:
        src/sys/arch/sparc/sparc: cpu.c cpuvar.h genassym.cf locore.s

Log Message:
re-introduce the NULL savefpstate IPI checks and evcnts.  something
is Wrong with GCC 4.5.3 and these trigger.  i haven't seen anything
else particularly wrong so for now this will allow sparc to switch
to GCC 4.5, which otherwise seems to be working very well for me.

sigh.  i'm going to file a PR to research what is really wrong here.


To generate a diff of this commit:
cvs rdiff -u -r1.233 -r1.234 src/sys/arch/sparc/sparc/cpu.c
cvs rdiff -u -r1.89 -r1.90 src/sys/arch/sparc/sparc/cpuvar.h
cvs rdiff -u -r1.66 -r1.67 src/sys/arch/sparc/sparc/genassym.cf
cvs rdiff -u -r1.264 -r1.265 src/sys/arch/sparc/sparc/locore.s

	but first we need to figure out the underlying problem that
	this commit works around, which GCC 4.5.3 re-introduces.

	my first guess is that some asm() has poor constraints, or
	that there is some other netbsd code bug.  i doubt it is a
	GCC bug, but you never really know.

>How-To-Repeat:

	use a sparc kernel built with GCC 4.5.3 heavily.  without
	the above change, it should panic fairly often.  with the
	above change, event counters will tell you how many times
	the panic was avoided.
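
	as a rough illustration only, the workaround pattern described
	above (skip a NULL request and bump an event counter instead of
	panicking) can be sketched in plain userland C.  every name here
	(cpu_sketch, fpstate_sketch, ci_savefpstate_null,
	savefpstate_checked) is hypothetical and is not taken from the
	real cpu.c, cpuvar.h or locore.s:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a saved FPU state; not the real structure. */
struct fpstate_sketch {
	int saved;
};

/* Hypothetical per-CPU data with one event counter. */
struct cpu_sketch {
	uint64_t ci_savefpstate_null;	/* NULL requests that were skipped */
};

/*
 * Handle a "save FPU state" request.  A NULL target is counted and
 * skipped rather than dereferenced, which is the crash the check avoids.
 * Returns 1 if the state was saved, 0 if the request was skipped.
 */
static int
savefpstate_checked(struct cpu_sketch *ci, struct fpstate_sketch *fs)
{
	assert(ci != NULL);

	if (fs == NULL) {
		ci->ci_savefpstate_null++;
		return 0;
	}
	fs->saved = 1;	/* placeholder for the real register save */
	return 1;
}
```

	on a running system the real counters should be visible with
	"vmstat -e"; the sketch only shows why a non-zero count means
	a panic was avoided, not how the kernel implements it.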

>Fix:

>Release-Note:

>Audit-Trail:
From: Hauke Fath <hf@spg.tu-darmstadt.de>
To: gnats-bugs@NetBSD.org
Cc: port-sparc-maintainer@NetBSD.org, gnats-admin@NetBSD.org
Subject: Re: port-sparc/45255: GCC 4.5.3 and sparc re-triggers the NULL
 savefpstate IPI issue
Date: Mon, 15 Aug 2011 13:55:39 +0200

 At 2:25 +0000 on 15.08.2011, mrg@eterna.com.au wrote:
 >	use a sparc kernel built with GCC 4.5.3 heavily.  without
 >	the above change, it should panic fairly often.

 ...interesting. I have seen these NULL savefpstate panics on my ss20 (2x
 SM71) ever since I upgraded to HEAD. Just last weekend, I switched to a UP
 kernel to at least be able to build a few packages. Should have sent-pr
 earlier, I guess.

 Are you positive this is an exclusively gcc 4.5 related issue?

 	hauke

 -- 
      The ASCII Ribbon Campaign                    Hauke Fath
 ()     No HTML/RTF in email            Institut für Nachrichtentechnik
 /\     No Word docs in email                     TU Darmstadt
      Respect for open standards              Ruf +49-6151-16-3281

From: matthew green <mrg@eterna.com.au>
To: Hauke Fath <hf@spg.tu-darmstadt.de>
Cc: port-sparc-maintainer@NetBSD.org, gnats-admin@NetBSD.org,
    gnats-bugs@NetBSD.org
Subject: re: port-sparc/45255: GCC 4.5.3 and sparc re-triggers the NULL savefpstate IPI issue
Date: Tue, 16 Aug 2011 04:07:43 +1000

 > At 2:25 +0000 on 15.08.2011, mrg@eterna.com.au wrote:
 > >	use a sparc kernel built with GCC 4.5.3 heavily.  without
 > >	the above change, it should panic fairly often.
 > 
 > ...interesting. I have seen these NULL savefpstate panics on my ss20 (2x
 > SM71) ever since I upgraded to HEAD. Just last weekend, I switched to a UP
 > kernel to at least be able to build a few packages. Should have sent-pr
 > earlier, I guess.

 hmmm, i haven't seen those panics for several months, not since i
 made this change:

 revision 1.225
 date: 2011/01/22 12:13:25;  author: mrg;  state: Exp;  lines: +3 -3
 convert xpmsg_lock to IPL_SCHED.  the old spl/simple_lock code ran at
 splsched(), and this significantly helps with stability under load
 when running with multiple active CPUs.

 > Are you positive this is an exclusively gcc 4.5 related issue?

 not any more.  when it fails for you, can you get the backtraces
 of both CPUs and post them?  thanks.


 .mrg.

From: Hauke Fath <hauke@Espresso.Rhein-Neckar.DE>
To: gnats-bugs@NetBSD.org
Cc: port-sparc-maintainer@NetBSD.org, gnats-admin@NetBSD.org,
        mrg@eterna.com.au
Subject: re: port-sparc/45255: GCC 4.5.3 and sparc re-triggers the NULL
 savefpstate IPI issue
Date: Wed, 17 Aug 2011 09:36:20 +0200

 At 18:10 +0000 on 15.8.2011, matthew green wrote:
 > > ...interesting. I have seen these NULL savefpstate panics on my ss20 (2x
 > > SM71) ever since I upgraded to HEAD. Just last weekend, I switched to a UP
 > > kernel to at least be able to build a few packages. Should have sent-pr
 > > earlier, I guess.
 >
 > hmmm, i haven't seen those panics for several months, not since i
 > made this change:
 >
 > revision 1.225
 > date: 2011/01/22 12:13:25;  author: mrg;  state: Exp;  lines: +3 -3
 > convert xpmsg_lock to IPL_SCHED.  the old spl/simple_lock code ran at
 > splsched(), and this significantly helps with stability under load
 > when running with multiple active CPUs.

 I had been seeing several reboots every day, and the machine wouldn't stay
 up long enough to build a non-trivial package, or make it through the
 nightly Amanda backup run.

 Since your change, I haven't seen a single kernel panic.

 > > Are you positive this is an exclusively gcc 4.5 related issue?
 >
 > not any more.  when it fails for you, can you get the backtraces
 > of both CPUs and post them?  thanks.

 I'll look into that next week.

 	hauke

 --
 "It's never straight up and down"     (DEVO)


Responsible-Changed-From-To: port-sparc-maintainer->mrg
Responsible-Changed-By: mrg@NetBSD.org
Responsible-Changed-When: Mon, 14 Jan 2019 00:19:45 +0000
Responsible-Changed-Why:
i'll see if this is still a problem.


State-Changed-From-To: open->closed
State-Changed-By: mrg@NetBSD.org
State-Changed-When: Sat, 20 Jun 2020 17:30:52 +0000
State-Changed-Why:
i removed this hack a year or so ago, seems fine.


>Unformatted:
