NetBSD Problem Report #53303

From gson@gson.org  Mon May 21 19:31:19 2018
Return-Path: <gson@gson.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id BAF5B7A1C8
	for <gnats-bugs@gnats.NetBSD.org>; Mon, 21 May 2018 19:31:19 +0000 (UTC)
Message-Id: <20180521193114.C158A98B6A5@guava.gson.org>
Date: Mon, 21 May 2018 22:31:14 +0300 (EEST)
From: gson@gson.org (Andreas Gustafsson)
Reply-To: gson@gson.org (Andreas Gustafsson)
To: gnats-bugs@NetBSD.org
Subject: Some evbarm-earmv7hf test runs jump to 0x04000000
X-Send-Pr-Version: 3.95

>Number:         53303
>Category:       port-evbarm
>Synopsis:       Some evbarm-earmv7hf test runs jump to 0x04000000
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    port-evbarm-maintainer
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon May 21 19:35:00 +0000 2018
>Closed-Date:    Mon May 16 07:15:56 +0000 2022
>Last-Modified:  Mon May 16 07:15:56 +0000 2022
>Originator:     Andreas Gustafsson
>Release:        NetBSD-current
>Organization:

>Environment:
System: NetBSD
Architecture: arm
Machine: evbarm
>Description:

The TNF testbed for evbarm-earmv7hf runs the tests in qemu from a
preinstalled disk image that is automatically resized on boot, which
requires a reboot.  The problem is that this reboot sometimes fails
with an error message from qemu, as in the following console output:

  [   2.2273094] kern.module.path=/stand/evbarm/8.99.17/modules
  Fri May 18 03:13:45 UTC 2018
  Starting root file system check:
  /dev/rld0a: file system is clean; not checking
  Growing ld0 MBR partition #1 (1112MB -> 1824MB)
  Growing ld0 disklabel (1336MB -> 2048MB)
  Resizing /
  reboot:drebootedebybroot**************************************************| 100%
  [  70.2810153] rebooting...
  qemu-system-arm: Trying to execute code outside RAM or ROM at 0x04000000
  This usually means one of the following happened:

  (1) You told QEMU to execute a kernel for the wrong machine type, and it crashed on startup (eg trying to run a raspberry pi kernel on a versatilepb QEMU machine)
  (2) You didn't give QEMU a kernel or BIOS filename at all, and QEMU executed a ROM full of no-op instructions until it fell off the end
  (3) Your guest kernel has a bug and crashed by jumping off into nowhere

  This is almost always one of the first two, so check your command line and that you are using the right type of kernel for this machine.
  If you think option (3) is likely then you can try debugging your guest with the -d debug options; in particular -d guest_errors will cause the log to include a dump of the guest register state at this point.

The full log is at:

  http://releng.netbsd.org/b5reports/evbarm-earmv7hf/2018/2018.05.17.08.24.28/test.log

I don't know if this is a bug in NetBSD or in qemu.
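
When comparing failures across logs, the address qemu reports can be
extracted mechanically.  The following is a minimal sketch, not part of
the testbed: the function name qemu_fault_addrs is hypothetical, and it
assumes the message wording shown in the console output above.

```shell
# Hypothetical helper: print the address from every qemu
# "Trying to execute code outside RAM or ROM" line found in the
# given log files, or on standard input if no files are given.
qemu_fault_addrs() {
    sed -n 's/^.*Trying to execute code outside RAM or ROM at \(0x[0-9a-fA-F]*\).*$/\1/p' "$@"
}
```

For example, "qemu_fault_addrs test.log | sort | uniq -c" would show
whether the jump always lands at the same address.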

>How-To-Repeat:

Review testbed logs.

>Fix:

>Release-Note:

>Audit-Trail:
From: Andreas Gustafsson <gson@gson.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: port-evbarm/53303: Some evbarm-earmv7hf test runs jump to 0x04000000
Date: Mon, 18 Jun 2018 18:50:54 +0300

 This may not be ARM-specific after all - i386 also sometimes gets the same
 message from qemu (but with a different address) when it attempts to reboot.
 For example, from http://releng.netbsd.org/b5reports/i386/2018/2018.06.17.13.12.25/test.log :

   [ 6727.3164359] dumping to dev 0,1 offset 2680
   [ 6727.3164359] dump 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 succeeded


   [ 6727.3164359] rebooting...
   qemu-system-i386: Trying to execute code outside RAM or ROM at 0xefffff53
   This usually means one of the following happened:

   (1) You told QEMU to execute a kernel for the wrong machine type, and it crashed on startup (eg trying to run a raspberry pi kernel on a versatilepb QEMU machine)
   (2) You didn't give QEMU a kernel or BIOS filename at all, and QEMU executed a ROM full of no-op instructions until it fell off the end
   (3) Your guest kernel has a bug and crashed by jumping off into nowhere

   This is almost always one of the first two, so check your command line and that you are using the right type of kernel for this machine.
   If you think option (3) is likely then you can try debugging your guest with the -d debug options; in particular -d guest_errors will cause the log to include a dump of the guest register state at this point.

   Execution cannot continue; stopping here.

 -- 
 Andreas Gustafsson, gson@gson.org

From: Andreas Gustafsson <gson@gson.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: port-evbarm/53303: Some evbarm-earmv7hf test runs jump to 0x04000000
Date: Sun, 27 Jan 2019 17:37:21 +0200

 Recent versions of qemu will no longer print the "Trying to execute
 code outside RAM or ROM" message, as the code to do that was removed
 in the following qemu commit:

   commit 20cb6ae4724d05cbbda0d9ceec7e357d646b6886
   Author: Peter Maydell <peter.maydell@linaro.org>
   Date:   Tue Aug 14 17:17:19 2018 +0100

       accel/tcg: Return -1 for execution from MMIO regions in get_page_addr_code()

       Now that all the callers can handle get_page_addr_code() returning -1,
       remove all the code which tries to handle execution from MMIO regions
       or small-MMU-region RAM areas. This will mean that we can correctly
       execute from these areas, rather than ending up either aborting QEMU
       or delivering an incorrect guest exception.

       Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
       Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
       Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
       Tested-by: Cédric Le Goater <clg@kaod.org>
       Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
       Message-id: 20180710160013.26559-6-peter.maydell@linaro.org

 The problem remains, though - attempts by NetBSD to reboot itself
 still frequently fail, but now by silently hanging instead of printing
 the "Trying to execute code outside RAM or ROM" message.
 -- 
 Andreas Gustafsson, gson@gson.org

State-Changed-From-To: open->feedback
State-Changed-By: jmcneill@NetBSD.org
State-Changed-When: Sun, 15 May 2022 19:27:32 +0000
State-Changed-Why:
Is this issue still present? Do you see it with the 'virt' machine type, or only 'vexpress-a15'?


From: Andreas Gustafsson <gson@gson.org>
To: jmcneill@NetBSD.org
Cc: gnats-bugs@NetBSD.org
Subject: Re: port-evbarm/53303 (Some evbarm-earmv7hf test runs jump to 0x04000000)
Date: Mon, 16 May 2022 09:32:18 +0300

 jmcneill@NetBSD.org wrote:
 > Is this issue still present?

 Apparently not.  Looking at the logs of the evbarm-earmv7hf tests on
 lyta.netbsd.org, the last case of the VM hanging immediately after
 reboot during the install process was when testing source date
 2019.10.23.05.20.52:

   http://releng.netbsd.org/b5reports/evbarm-earmv7hf/commits-2019.10.html#2019.10.23.05.20.52

 It's not clear if it was fixed by a change to NetBSD or by qemu
 being upgraded from 3.1.0 to 4.1.0.

 Mostly for my own reference, the command I used to search the logs was:

   lyta /bracket/evbarm-earmv7hf/results $ find -s . -name bracket.db | xargs grep install_status=1 | sed 's!^\(./..../...................\).*$!\1/install.log.gz!' | xargs zgrep -C1 'rebooting'
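
 The same search can be written out step by step.  This is a sketch
 rather than the command actually used: the function name
 find_reboot_failures is hypothetical, sort(1) stands in for the
 BSD-specific "find -s", gzip -dc piped into grep stands in for zgrep,
 and it assumes (as the search above implies) that install_status=1 in
 bracket.db marks a failed install.

```shell
# Hypothetical helper: list "rebooting" lines, with one line of context,
# from the install logs of runs whose bracket.db has install_status=1.
# Usage: find_reboot_failures /bracket/evbarm-earmv7hf/results
find_reboot_failures() {
    (
        cd "$1" || exit 1
        # Each run lives in ./YYYY/YYYY.MM.DD.HH.MM.SS/ next to its bracket.db.
        find . -name bracket.db | sort |
        while IFS= read -r db; do
            # Skip runs where the install step is not marked as failed.
            grep -q 'install_status=1' "$db" || continue
            log="${db%/bracket.db}/install.log.gz"
            [ -f "$log" ] || continue
            echo "== $log"
            # A log without a match is not an error.
            gzip -dc "$log" | grep -C1 'rebooting' || true
        done
    )
}
```

 Unlike the one-liner, this prints each log name on its own header
 line instead of prefixing every matching line with it.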

 > Do you see it with the 'virt' machine type, or only 'vexpress-a15'?

 I have not run regular tests using the 'virt' machine type.
 -- 
 Andreas Gustafsson, gson@gson.org

State-Changed-From-To: feedback->closed
State-Changed-By: gson@NetBSD.org
State-Changed-When: Mon, 16 May 2022 07:15:56 +0000
State-Changed-Why:
The problem no longer occurs.


>Unformatted:
