NetBSD Problem Report #53303
From gson@gson.org Mon May 21 19:31:19 2018
Return-Path: <gson@gson.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id BAF5B7A1C8
for <gnats-bugs@gnats.NetBSD.org>; Mon, 21 May 2018 19:31:19 +0000 (UTC)
Message-Id: <20180521193114.C158A98B6A5@guava.gson.org>
Date: Mon, 21 May 2018 22:31:14 +0300 (EEST)
From: gson@gson.org (Andreas Gustafsson)
Reply-To: gson@gson.org (Andreas Gustafsson)
To: gnats-bugs@NetBSD.org
Subject: Some evbarm-earmv7hf test runs jump to 0x04000000
X-Send-Pr-Version: 3.95
>Number: 53303
>Category: port-evbarm
>Synopsis: Some evbarm-earmv7hf test runs jump to 0x04000000
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: port-evbarm-maintainer
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Mon May 21 19:35:00 +0000 2018
>Closed-Date: Mon May 16 07:15:56 +0000 2022
>Last-Modified: Mon May 16 07:15:56 +0000 2022
>Originator: Andreas Gustafsson
>Release: NetBSD-current
>Organization:
>Environment:
System: NetBSD
Architecture: arm
Machine: evbarm
>Description:
The TNF testbed for evbarm-earmv7hf runs the tests in qemu from a
preinstalled disk image that is automatically resized on boot, which
requires a reboot. The problem is that this reboot sometimes fails
with an error message from qemu, as in the following console output:
[ 2.2273094] kern.module.path=/stand/evbarm/8.99.17/modules
Fri May 18 03:13:45 UTC 2018
Starting root file system check:
/dev/rld0a: file system is clean; not checking
Growing ld0 MBR partition #1 (1112MB -> 1824MB)
Growing ld0 disklabel (1336MB -> 2048MB)
Resizing /
reboot:drebootedebybroot**************************************************| 100%
[ 70.2810153] rebooting...
qemu-system-arm: Trying to execute code outside RAM or ROM at 0x04000000
This usually means one of the following happened:
(1) You told QEMU to execute a kernel for the wrong machine type, and it crashed on startup (eg trying to run a raspberry pi kernel on a versatilepb QEMU machine)
(2) You didn't give QEMU a kernel or BIOS filename at all, and QEMU executed a ROM full of no-op instructions until it fell off the end
(3) Your guest kernel has a bug and crashed by jumping off into nowhere
This is almost always one of the first two, so check your command line and that you are using the right type of kernel for this machine.
If you think option (3) is likely then you can try debugging your guest with the -d debug options; in particular -d guest_errors will cause the log to include a dump of the guest register state at this point.
The full log is at:
http://releng.netbsd.org/b5reports/evbarm-earmv7hf/2018/2018.05.17.08.24.28/test.log
I don't know if this is a bug in NetBSD or in qemu.
>How-To-Repeat:
Review testbed logs.
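As a rough aid for reviewing the logs, the failure signature quoted above
(a "rebooting..." line followed by the qemu "outside RAM or ROM" error) can
be searched for mechanically. The following is only a sketch, not part of
the testbed: the function name find_failed_reboots and the window parameter
are made up for illustration, and it assumes a qemu version that still
prints this message.

```python
import re

# Matches the qemu error text quoted in this PR and captures the address.
QEMU_ERROR = re.compile(
    r"qemu-system-\w+: Trying to execute code outside RAM or ROM at (0x[0-9a-fA-F]+)"
)


def find_failed_reboots(log_text, window=3):
    """Return the addresses of qemu errors seen shortly after a reboot.

    Hypothetical helper: a hit is a line containing "rebooting..." that is
    followed, within `window` lines, by the qemu error message.
    """
    lines = log_text.splitlines()
    hits = []
    for i, line in enumerate(lines):
        if "rebooting..." not in line:
            continue
        # Only look at the few lines right after the reboot message.
        for follow in lines[i + 1 : i + 1 + window]:
            match = QEMU_ERROR.search(follow)
            if match:
                hits.append(match.group(1))
                break
    return hits
```

Called on the text of a test.log, it returns one address per failed reboot,
or an empty list if the signature is not present.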
>Fix:
>Release-Note:
>Audit-Trail:
From: Andreas Gustafsson <gson@gson.org>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: port-evbarm/53303: Some evbarm-earmv7hf test runs jump to 0x04000000
Date: Mon, 18 Jun 2018 18:50:54 +0300
This may not be ARM specific after all - i386 also sometimes gets the same
message from qemu (but with a different address) when it attempts to reboot.
For example, from http://releng.netbsd.org/b5reports/i386/2018/2018.06.17.13.12.25/test.log :
[ 6727.3164359] dumping to dev 0,1 offset 2680
[ 6727.3164359] dump 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 succeeded
[ 6727.3164359] rebooting...
qemu-system-i386: Trying to execute code outside RAM or ROM at 0xefffff53
This usually means one of the following happened:
(1) You told QEMU to execute a kernel for the wrong machine type, and it crashed on startup (eg trying to run a raspberry pi kernel on a versatilepb QEMU machine)
(2) You didn't give QEMU a kernel or BIOS filename at all, and QEMU executed a ROM full of no-op instructions until it fell off the end
(3) Your guest kernel has a bug and crashed by jumping off into nowhere
This is almost always one of the first two, so check your command line and that you are using the right type of kernel for this machine.
If you think option (3) is likely then you can try debugging your guest with the -d debug options; in particular -d guest_errors will cause the log to include a dump of the guest register state at this point.
Execution cannot continue; stopping here.
--
Andreas Gustafsson, gson@gson.org
From: Andreas Gustafsson <gson@gson.org>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: port-evbarm/53303: Some evbarm-earmv7hf test runs jump to 0x04000000
Date: Sun, 27 Jan 2019 17:37:21 +0200
Recent versions of qemu will no longer print the "Trying to execute
code outside RAM or ROM" message, as the code to do that was removed
in the following qemu commit:
commit 20cb6ae4724d05cbbda0d9ceec7e357d646b6886
Author: Peter Maydell <peter.maydell@linaro.org>
Date: Tue Aug 14 17:17:19 2018 +0100
accel/tcg: Return -1 for execution from MMIO regions in get_page_addr_code()
Now that all the callers can handle get_page_addr_code() returning -1,
remove all the code which tries to handle execution from MMIO regions
or small-MMU-region RAM areas. This will mean that we can correctly
execute from these areas, rather than ending up either aborting QEMU
or delivering an incorrect guest exception.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Tested-by: Cédric Le Goater <clg@kaod.org>
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20180710160013.26559-6-peter.maydell@linaro.org
The problem remains, though - attempts by NetBSD to reboot itself
still frequently fail, but now by silently hanging instead of printing
the "Trying to execute code outside RAM or ROM" message.
--
Andreas Gustafsson, gson@gson.org
State-Changed-From-To: open->feedback
State-Changed-By: jmcneill@NetBSD.org
State-Changed-When: Sun, 15 May 2022 19:27:32 +0000
State-Changed-Why:
Is this issue still present? Do you see it with the 'virt' machine type, or only 'vexpress-a15'?
From: Andreas Gustafsson <gson@gson.org>
To: jmcneill@NetBSD.org
Cc: gnats-bugs@NetBSD.org
Subject: Re: port-evbarm/53303 (Some evbarm-earmv7hf test runs jump to 0x04000000)
Date: Mon, 16 May 2022 09:32:18 +0300
jmcneill@NetBSD.org wrote:
> Is this issue still present?
Apparently not. Looking at the logs of the evbarm-earmv7hf tests on
lyta.netbsd.org, the last case of the VM hanging immediately after
reboot during the install process was when testing source date
2019.10.23.05.20.52:
http://releng.netbsd.org/b5reports/evbarm-earmv7hf/commits-2019.10.html#2019.10.23.05.20.52
It's not clear if it was fixed by a change to NetBSD or by qemu
being upgraded from 3.1.0 to 4.1.0.
Mostly for my own reference, the command I used to search the logs was:
lyta /bracket/evbarm-earmv7hf/results $ find -s . -name bracket.db | xargs grep install_status=1 | sed 's!^\(./..../...................\).*$!\1/install.log.gz!' | xargs zgrep -C1 'rebooting'
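The sed step in that pipeline is the least obvious part: it keeps the
leading "./YYYY/<19-character source date>" prefix of each grep output line
and replaces the rest with "/install.log.gz". A minimal Python sketch of
the same rewrite follows. The function name install_log_path is
hypothetical, and the directory layout is only inferred from the regular
expression, not confirmed.

```python
import re

# Same shape as the sed pattern: any char, "/", 4 chars, "/", 19 chars.
# As in the original, "." matches any character, not just a literal dot.
PREFIX = re.compile(r"^(./..../.{19}).*$")


def install_log_path(grep_line):
    """Map one grep output line to the install log in the same directory.

    Hypothetical helper mirroring the sed substitution; lines that do not
    match are returned unchanged, as sed would leave them.
    """
    match = PREFIX.match(grep_line)
    if match is None:
        return grep_line
    return match.group(1) + "/install.log.gz"
```

For instance, an illustrative input line such as
"./2019/2019.10.23.05.20.52/bracket.db:install_status=1" would be mapped to
"./2019/2019.10.23.05.20.52/install.log.gz", which is then passed to zgrep.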
> Do you see it with the 'virt' machine type, or only 'vexpress-a15'?
I have not run regular tests using the 'virt' machine type.
--
Andreas Gustafsson, gson@gson.org
State-Changed-From-To: feedback->closed
State-Changed-By: gson@NetBSD.org
State-Changed-When: Mon, 16 May 2022 07:15:56 +0000
State-Changed-Why:
The problem no longer occurs.
>Unformatted:
Copyright © 1994-2020
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.