NetBSD Problem Report #54719
From gson@gson.org Mon Nov 25 14:08:18 2019
Return-Path: <gson@gson.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id E75697A16D
for <gnats-bugs@gnats.NetBSD.org>; Mon, 25 Nov 2019 14:08:18 +0000 (UTC)
Message-Id: <20191125140729.8BF4F253F37@guava.gson.org>
Date: Mon, 25 Nov 2019 16:07:29 +0200 (EET)
From: gson@gson.org (Andreas Gustafsson)
Reply-To: gson@gson.org (Andreas Gustafsson)
To: gnats-bugs@NetBSD.org
Subject: sparc64 fails to boot since switch to gcc8
X-Send-Pr-Version: 3.95
>Number: 54719
>Category: port-sparc64
>Synopsis: sparc64 fails to boot since switch to gcc8
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: martin
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Mon Nov 25 14:10:00 +0000 2019
>Closed-Date: Thu Dec 05 09:23:22 +0000 2019
>Last-Modified: Thu Dec 05 09:23:22 +0000 2019
>Originator: Andreas Gustafsson
>Release: NetBSD-current, source date >= 2019.11.16.10.23.36
>Organization:
>Environment:
System: NetBSD
Architecture: sparc64
Machine: sparc64
>Description:
The TNF sparc64 testbed is failing to boot from the installation media
since this commit:
2019.11.16.10.23.36 mrg src/share/mk/bsd.own.mk 1.1162
with the commit message
sparc & sparc64 -> GCC 8.
The console log messages look like this:
Welcome to OpenBIOS v1.1 built on Jul 1 2019 17:08
Type 'help' for detailed information
Trying cdrom:f...
Not a bootable ELF image
Not a bootable a.out image
Loading FCode image...
Loaded 7514 bytes
entry point is 0x4000
Evaluating FCode...
NetBSD IEEE 1275 Multi-FS Bootblock
Version $NetBSD: bootblk.fth,v 1.15 2015/08/20 05:40:08 dholland Exp $
..Unhandled Exception 0x0000000000000030
PC = 0x00000000ffd0f2b8 NPC = 0x00000000ffd0f2bc
Stopping execution
This does not affect sparc, only sparc64.
>How-To-Repeat:
>Fix:
>Release-Note:
>Audit-Trail:
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@netbsd.org
Cc: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Subject: Re: port-sparc64/54719: sparc64 fails to boot since switch to gcc8
Date: Mon, 25 Nov 2019 16:19:12 +0100
On Mon, Nov 25, 2019 at 02:10:00PM +0000, Andreas Gustafsson wrote:
> >Number: 54719
> >Category: port-sparc64
> >Synopsis: sparc64 fails to boot since switch to gcc8
> Welcome to OpenBIOS v1.1 built on Jul 1 2019 17:08
> Type 'help' for detailed information
> Trying cdrom:f...
> Not a bootable ELF image
> Not a bootable a.out image
>
> Loading FCode image...
> Loaded 7514 bytes
> entry point is 0x4000
> Evaluating FCode...
> NetBSD IEEE 1275 Multi-FS Bootblock
> Version $NetBSD: bootblk.fth,v 1.15 2015/08/20 05:40:08 dholland Exp $
> ..Unhandled Exception 0x0000000000000030
> PC = 0x00000000ffd0f2b8 NPC = 0x00000000ffd0f2bc
> Stopping execution
>
> This does not affect sparc, only sparc64.
FWIW: it works fine on real hardware (with OpenFirmware).
Mark, could you have a look and tell us what firmware call goes wrong and
what its args are?
Should be enough to boot the sparc64 HEAD iso image (from
http://nycdn.netbsd.org/pub/NetBSD-daily/HEAD/latest/images/NetBSD-9.99.18-sparc64.iso)
in qemu.
Thanks!
Martin
From: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
To: Martin Husemann <martin@duskware.de>, gnats-bugs@netbsd.org
Cc:
Subject: Re: port-sparc64/54719: sparc64 fails to boot since switch to gcc8
Date: Mon, 25 Nov 2019 20:52:15 +0000
On 25/11/2019 15:19, Martin Husemann wrote:
> On Mon, Nov 25, 2019 at 02:10:00PM +0000, Andreas Gustafsson wrote:
>>> Number: 54719
>>> Category: port-sparc64
>>> Synopsis: sparc64 fails to boot since switch to gcc8
>> Welcome to OpenBIOS v1.1 built on Jul 1 2019 17:08
>> Type 'help' for detailed information
>> Trying cdrom:f...
>> Not a bootable ELF image
>> Not a bootable a.out image
>>
>> Loading FCode image...
>> Loaded 7514 bytes
>> entry point is 0x4000
>> Evaluating FCode...
>> NetBSD IEEE 1275 Multi-FS Bootblock
>> Version $NetBSD: bootblk.fth,v 1.15 2015/08/20 05:40:08 dholland Exp $
>> ..Unhandled Exception 0x0000000000000030
>> PC = 0x00000000ffd0f2b8 NPC = 0x00000000ffd0f2bc
>> Stopping execution
>>
>> This does not affect sparc, only sparc64.
>
> FWIW: it works fine on real hardware (with OpenFirmware).
>
> Mark, could you have a look and tell us what firmware call goes wrong and
> what its args are?
>
> Should be enough to boot the sparc64 HEAD iso image (from
> http://nycdn.netbsd.org/pub/NetBSD-daily/HEAD/latest/images/NetBSD-9.99.18-sparc64.iso)
> in qemu.
>
> Thanks!
>
> Martin
Hi Martin,
After a fun few hours trying to debug this, I've managed to figure out what's going
on and it's related to the way in which OpenBIOS switches contexts.
What is happening is that right at the end of the FCode compiled from bootblk.fth we
run do-boot which loads /ofwboot into RAM and then executes "init-program" to set the
saved state context, which in OpenBIOS is implemented on a context stack.
The problem is that once "init-program" returns after setting the saved stack
context, OpenBIOS pops the previous context, the one that was executing the FCode,
before switching to the new one. Unfortunately bootblk.fth loads ofwboot at
load-base, the same memory the FCode itself was loaded into, which means that
instead of returning to the FCode to read the final end0 (0x0) byte that terminates
the interpreter, we start trying to execute the contents of ofwboot as FCode...
Comparing with previous NetBSD versions, it seems that with those we just got lucky:
we hit a 0x0 byte before too long, switched context to ofwboot, and everything worked
fine. With a gcc-8 compiled ofwboot from the latest HEAD ISO it seems we hit a 0x90
"type" token, which pulls a bogus address from the stack and then crashes.
Ultimately it makes sense for OpenBIOS to not use a context stack for launching
client program contexts and instead use a single fixed context since this is
evidently what OBP does - but that's going to take a bit of time to fix.
The quickest solution I can think of for now is to use a special linker script for
ofwboot that places the first section at offset 0x2000 in the ELF binary, after the
first 8k page. Since the offending byte is at offset 0x1d58, this ensures, as a
temporary measure, that on return from "init-program" we hit a zero byte, which
should allow the context switch to succeed.
ATB,
Mark.
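The diagnosis above can be illustrated with a small model. This is only a
sketch, not OpenBIOS code: it assumes the interpreter resumes at a single
fixed offset into the memory that now holds ofwboot and looks only at the
first byte it finds there. The function names and the filler image are
hypothetical; the token values 0x00 (end0) and 0x90 (type), the offset
0x1d58 and the 0x2000 padding are taken from the report. Note also that the
"Loaded 7514 bytes" in the console log is 0x1d5a, so the reported offset lies
just before the end of where the FCode used to be, which is at least
consistent with the interpreter resuming near the final end0 byte, though the
report does not say so explicitly.

```python
# Simplified model of the failure described in this PR. This is NOT OpenBIOS
# code; all function names here are hypothetical.

END0_TOKEN = 0x00       # FCode end0: terminates the interpreter (per the report)
TYPE_TOKEN = 0x90       # FCode "type": the token reported to crash
RESUME_OFFSET = 0x1D58  # offset of the offending byte, from the report
PAGE_SIZE = 0x2000      # the "first 8k page" from the report


def classify_resume_byte(image: bytes, resume_offset: int = RESUME_OFFSET) -> str:
    """Say what the interpreter would see first when it resumes in `image`."""
    if not 0 <= resume_offset < len(image):
        raise ValueError("resume offset lies outside the image")
    token = image[resume_offset]
    if token == END0_TOKEN:
        return "end0"
    if token == TYPE_TOKEN:
        return "type"
    return f"other:0x{token:02x}"


def pad_first_page(image: bytes, page_size: int = PAGE_SIZE) -> bytes:
    """Model the suggested workaround: a zero-filled page before the contents."""
    return bytes(page_size) + image


# Stand-in for a gcc 8 built ofwboot: arbitrary filler with 0x90 at the offset.
bad_image = bytearray(b"\x11" * 0x3000)
bad_image[RESUME_OFFSET] = TYPE_TOKEN
bad_image = bytes(bad_image)

print(classify_resume_byte(bad_image))                  # type
print(classify_resume_byte(pad_first_page(bad_image)))  # end0
```

In this model the unpadded image yields the "type" token and the padded one
yields end0, which mirrors why moving the first section past the first 8k
page is expected to let the context switch succeed.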
Responsible-Changed-From-To: port-sparc64-maintainer->martin
Responsible-Changed-By: martin@NetBSD.org
Responsible-Changed-When: Tue, 26 Nov 2019 06:24:30 +0000
Responsible-Changed-Why:
Take
From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/54719 CVS commit: src/sys/arch/sparc/stand/ofwboot
Date: Thu, 28 Nov 2019 14:21:25 +0000
Module Name: src
Committed By: martin
Date: Thu Nov 28 14:21:25 UTC 2019
Modified Files:
src/sys/arch/sparc/stand/ofwboot: srt0.s
Log Message:
Provide a mostly-zeroed page at the start of the text segment, to work around
an OpenBIOS bug, see PR port-sparc64/54719 for details.
To generate a diff of this commit:
cvs rdiff -u -r1.7 -r1.8 src/sys/arch/sparc/stand/ofwboot/srt0.s
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
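A quick way to check a "mostly-zeroed" first page in a built image might look
like the following. This is a hedged sketch only: the actual change to srt0.s
is not reproduced here, the helper name is hypothetical, and the 4-byte
non-zero prefix in the example is an invented stand-in for whatever must
remain at the start of the text segment.

```python
# Hypothetical helper: list the non-zero byte offsets in the first page of an
# image, so one can confirm the offset named in this PR falls on a zero byte.

def nonzero_offsets(image: bytes, page_size: int = 0x2000) -> list:
    """Return offsets of all non-zero bytes within the first `page_size` bytes."""
    return [i for i, b in enumerate(image[:page_size]) if b != 0]


# Invented example: four non-zero bytes at the very start, zeros up to the
# end of the first 8k page, then arbitrary contents.
example = b"\x01\x02\x03\x04" + bytes(0x2000 - 4) + b"\xff" * 64

offsets = nonzero_offsets(example)
print(offsets)            # [0, 1, 2, 3]
print(0x1D58 in offsets)  # False
```

To use it on a real file, read the bytes of the relevant segment first; how to
locate that segment inside the ELF file is outside the scope of this sketch.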
State-Changed-From-To: open->feedback
State-Changed-By: martin@NetBSD.org
State-Changed-When: Thu, 28 Nov 2019 14:55:20 +0000
State-Changed-Why:
Should be fixed, waiting for official test results to confirm
State-Changed-From-To: feedback->closed
State-Changed-By: gson@NetBSD.org
State-Changed-When: Thu, 05 Dec 2019 09:23:22 +0000
State-Changed-Why:
Fixed by martin's commit of srt0.s 1.8, thanks.
>Unformatted: