NetBSD Problem Report #56063

From gson@gson.org  Thu Mar 18 14:16:10 2021
Return-Path: <gson@gson.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 2157F1A9217
	for <gnats-bugs@gnats.NetBSD.org>; Thu, 18 Mar 2021 14:16:10 +0000 (UTC)
Message-Id: <20210318141605.0D0EE253F67@guava.gson.org>
Date: Thu, 18 Mar 2021 16:16:05 +0200 (EET)
From: gson@gson.org (Andreas Gustafsson)
Reply-To: gson@gson.org (Andreas Gustafsson)
To: gnats-bugs@NetBSD.org
Subject: Xen boot fails with "heap full"
X-Send-Pr-Version: 3.95

>Number:         56063
>Category:       port-xen
>Synopsis:       Xen boot fails with "heap full"
>Confidential:   no
>Severity:       critical
>Priority:       low
>Responsible:    port-xen-maintainer
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Mar 18 14:20:00 +0000 2021
>Last-Modified:  Fri Jun 10 13:00:01 +0000 2022
>Originator:     Andreas Gustafsson
>Release:        NetBSD 8.2, 9.0, 9.1
>Organization:

>Environment:
System: NetBSD
Architecture: x86_64
Machine: amd64
>Description:

When I try to boot any NetBSD release newer than 8.1 as a Xen 4.11
dom0 on an HP DL360 G7 server, the boot fails with a "heap full"
message.  For example, with 9.1:

  >> NetBSD/x86 BIOS Boot, Revision 5.11 (Sun Oct 18 19:24:30 UTC 2020) (from NetBSD 9.1)
  >> Memory: 637/3668992 k

       1. Xen
       2. Boot normally
       3. Boot single user
       4. Drop to boot prompt

  Choose an option; RETURN for default; SPACE to stop countdown.
  Option 1 will be chosen in 0 seconds. 4 seconds. 3 seconds. 2 seconds. 1 seconds. 0 seconds. 0 seconds.
  |/-\|/-\|/-\|/-\2666632|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|+1339256/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|=0x3d20ec
  /-\|/-\|/-\|/-\|Loading /netbsd-XEN3_DOM0.gz /-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/heap full (0x6a800+32768)

The boot blocks are the freshly installed ones of the OS version in question.

I bisected this on the -8 branch, and was able to narrow down the time
when the bug appeared on the branch to the interval that started with
the pullup

  2019.09.18.16.30.33 martin src/sys/arch/x86/acpi/acpi_machdep.c 1.18.6.1

and ended with

  2019.10.04.11.34.18 martin src/sys/arch/i386/stand/pxeboot/start_pxe.S 1.6.48.1

I am unable to narrow it down further because the pullup at
2019.09.18.16.30.33 broke the build, and when the build was fixed,
PXE booting (which my automated test relies on) was broken until fixed
at 2019.10.04.11.34.18.

I understand that others have been able to boot the versions that are
not working for me, so this is probably hardware or firmware dependent
in some way.  For example, given the nature of the first pullup, it
could have something to do with the contents of the ACPI tables.

I am marking this critical because the system doesn't even boot, yet
low priority because I just happened to run into it in the course of
unrelated testing and don't have any actual plans to run Xen on the
machine in question for any purpose other than to demonstrate that it
doesn't work.  If the bug impacts you, feel free to update the
priority accordingly.

>How-To-Repeat:

>Fix:

>Audit-Trail:
From: Andreas Gustafsson <gson@gson.org>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: port-xen/56063: Xen boot fails with "heap full"
Date: Thu, 23 Sep 2021 16:21:38 +0300

 As Manuel Bouyer hinted on the port-xen list in
 http://mail-index.netbsd.org/port-xen/2021/09/22/msg010182.html, this
 bug appears to only affect booting Xen with a compressed dom0 kernel.
 A work-around is to gunzip netbsd-XEN3_DOM0.gz and adjust boot.cfg
 accordingly.
 -- 
 Andreas Gustafsson, gson@gson.org

From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@netbsd.org
Cc: port-xen-maintainer@netbsd.org, gnats-admin@netbsd.org,
        netbsd-bugs@netbsd.org
Subject: Re: port-xen/56063: Xen boot fails with "heap full"
Date: Wed, 8 Jun 2022 17:24:55 +0200

 --/d5GHCpi5NVeYRo/
 Content-Type: text/plain; charset=us-ascii
 Content-Disposition: inline

 On Thu, Mar 18, 2021 at 02:20:00PM +0000, Andreas Gustafsson wrote:
 > >Description:
 > 
 > When I try to boot any NetBSD release newer than 8.1 as a Xen 4.11
 > dom0 on a HP DL360 G7 server, the boot fails with a "heap full"
 > message.  For example, with 9.1:
 > 
 >   >> NetBSD/x86 BIOS Boot, Revision 5.11 (Sun Oct 18 19:24:30 UTC 2020) (from NetBSD 9.1)
 >   >> Memory: 637/3668992 k
 > 
 >        1. Xen
 >        2. Boot normally
 >        3. Boot single user
 >        4. Drop to boot prompt
 > 
 >   Choose an option; RETURN for default; SPACE to stop countdown.
 >   Option 1 will be chosen in 0 seconds. 4 seconds. 3 seconds. 2 seconds. 1 seconds. 0 seconds. 0 seconds.
 >   |/-\|/-\|/-\|/-\2666632|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|+1339256/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|=0x3d20ec
 >   /-\|/-\|/-\|/-\|Loading /netbsd-XEN3_DOM0.gz /-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/heap full (0x6a800+32768)
 > 
 > The boot blocks are the freshly installed ones of the OS version in case.

 Hello,
 Are you still seeing this with a recent kernel and /boot?
 If so, could you try the attached patch?

 I can't reproduce it myself ...

 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 ans d'experience feront toujours la difference
 --

 --/d5GHCpi5NVeYRo/
 Content-Type: text/plain; charset=us-ascii
 Content-Disposition: attachment; filename=diff

 Index: stand/boot/Makefile.boot
 ===================================================================
 RCS file: /cvsroot/src/sys/arch/i386/stand/boot/Makefile.boot,v
 retrieving revision 1.75
 diff -u -p -u -r1.75 Makefile.boot
 --- stand/boot/Makefile.boot	6 Sep 2020 07:20:28 -0000	1.75
 +++ stand/boot/Makefile.boot	8 Jun 2022 15:14:01 -0000
 @@ -81,7 +81,7 @@ CPPFLAGS+= -DLIBSA_ENABLE_LS_OP
  # The biosboot code is linked to 'virtual' address of zero and is
  # loaded at physical address 0x10000.
  # XXX The heap values should be determined from _end.
 -SAMISCCPPFLAGS+= -DHEAP_START=0x40000 -DHEAP_LIMIT=0x70000
 +SAMISCCPPFLAGS+= -DHEAP_START=0x40000 -DHEAP_LIMIT=0x80000
  SAMISCCPPFLAGS+= -DLIBSA_PRINTF_LONGLONG_SUPPORT
  SAMISCMAKEFLAGS+= SA_USE_CREAD=yes	# Read compressed kernels
  SAMISCMAKEFLAGS+= SA_INCLUDE_NET=no	# Netboot via TFTP, NFS

 --/d5GHCpi5NVeYRo/--

From: Andreas Gustafsson <gson@gson.org>
To: gnats-bugs@netbsd.org
Cc: Manuel Bouyer <bouyer@antioche.eu.org>
Subject: Re: port-xen/56063: Xen boot fails with "heap full"
Date: Thu, 9 Jun 2022 18:22:06 +0300

 Manuel Bouyer wrote:
 >  are you still seeing this with a recent kernel and /boot ?

 I am unable to test because of PR 56873.
 -- 
 Andreas Gustafsson, gson@gson.org

From: Andreas Gustafsson <gson@gson.org>
To: gnats-bugs@netbsd.org
Cc: bouyer@antioche.eu.org
Subject: Re: port-xen/56063: Xen boot fails with "heap full"
Date: Fri, 10 Jun 2022 15:56:21 +0300

 Manuel Bouyer wrote:
 >  I can't reproduce it myself ...

 Booting a compressed dom0 kernel works for me, too, as of source date
 2022.06.09.17.39.21.  Logs from the test run are here:

   https://www.gson.org/netbsd/bugs/xen/results/2022-06-10/

 It would be good to know which commit fixed it so that the fix could
 be pulled up.
 -- 
 Andreas Gustafsson, gson@gson.org
