NetBSD Problem Report #54886

From gson@gson.org  Thu Jan 23 12:48:23 2020
Return-Path: <gson@gson.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 383937A160
	for <gnats-bugs@gnats.NetBSD.org>; Thu, 23 Jan 2020 12:48:23 +0000 (UTC)
Message-Id: <20200123124817.CF0D4253F3E@guava.gson.org>
Date: Thu, 23 Jan 2020 14:48:17 +0200 (EET)
From: gson@gson.org (Andreas Gustafsson)
Reply-To: gson@gson.org (Andreas Gustafsson)
To: gnats-bugs@NetBSD.org
Subject: i386 testbed hangs running ATF tests
X-Send-Pr-Version: 3.95

>Number:         54886
>Category:       misc
>Synopsis:       i386 testbed hangs running ATF tests
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    ad
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Jan 23 12:50:00 +0000 2020
>Closed-Date:    
>Last-Modified:  Fri Apr 24 13:55:51 +0000 2020
>Originator:     Andreas Gustafsson
>Release:        NetBSD-current, source date >= 2020.01.16.16.47.19
>Organization:
>Environment:
System: NetBSD
Architecture: i386
Machine: i386
>Description:

On the TNF i386 testbed, the ATF test runs have been hanging since the
following commits:

  2020.01.16.16.47.19 martin src/usr.sbin/sysinst/bsddisklabel.c 1.35
  2020.01.16.16.47.19 martin src/usr.sbin/sysinst/defs.h 1.51
  2020.01.16.16.47.19 martin src/usr.sbin/sysinst/disks.c 1.60
  2020.01.16.16.47.19 martin src/usr.sbin/sysinst/main.c 1.20

The hang consistently occurs at the time of the
bin/sh/t_patterns:filename_expansion test case.

Since the above commits cause a tmpfs to be mounted at /tmp and the
tests are run in a qemu VM with 128 MB of RAM, 25% of which will
now be allocated to the tmpfs, I'm guessing this is related to
the tmpfs filling up or the system running low on available memory.
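
To make the arithmetic behind this guess concrete, here is a minimal
sketch of the default sizing, assuming only the 25%-of-RAM figure given
above.  The function name tmpfs_default_mb is hypothetical and is not
taken from sysinst.

```shell
#!/bin/sh
# Hypothetical helper (not sysinst code): size in MB of a tmpfs that
# takes a fixed percentage of RAM.  The 25% default is the figure
# quoted in this report.
tmpfs_default_mb() {
	ram_mb=$1
	pct=${2:-25}
	echo $(( ram_mb * pct / 100 ))
}

# The testbed VM has 128 MB of RAM.
tmpfs_default_mb 128    # prints 32
```

So on the testbed the tmpfs may grow to 32 MB, leaving at most 96 MB of
RAM for everything else.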

The pmax and hpcmips tests are also failing to complete since around
the same time.

The hang does not occur when running the tests on real hardware with
plenty of RAM.

For logs, see:

  http://releng.netbsd.org/b5reports/i386/commits-2020.01.html#2020.01.16.16.47.19

>How-To-Repeat:

>Fix:

>Release-Note:

>Audit-Trail:
From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/54886 CVS commit: src/usr.sbin/sysinst
Date: Fri, 24 Jan 2020 07:31:15 +0000

 Module Name:	src
 Committed By:	martin
 Date:		Fri Jan 24 07:31:15 UTC 2020

 Modified Files:
 	src/usr.sbin/sysinst: bsddisklabel.c defs.h disks.c

 Log Message:
 Factor out all RAM size thresholds as defines to avoid magic numbers.
 To work around PR misc/54886 bump the threshold for a tmpfs /tmp mount
 up to 256 MB.


 To generate a diff of this commit:
 cvs rdiff -u -r1.36 -r1.37 src/usr.sbin/sysinst/bsddisklabel.c
 cvs rdiff -u -r1.52 -r1.53 src/usr.sbin/sysinst/defs.h
 cvs rdiff -u -r1.60 -r1.61 src/usr.sbin/sysinst/disks.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: Andreas Gustafsson <gson@gson.org>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: misc/54886: i386 testbed hangs running ATF tests
Date: Sat, 25 Jan 2020 17:40:33 +0200

 The testbed no longer hangs since martin bumped the threshold for
 creating the tmpfs to 256 MB:

   commit 2020.01.24.07.31.15 martin src/usr.sbin/sysinst/bsddisklabel.c 1.37
   commit 2020.01.24.07.31.15 martin src/usr.sbin/sysinst/defs.h 1.53
   commit 2020.01.24.07.31.15 martin src/usr.sbin/sysinst/disks.c 1.61

 However, if you manually configure a system with 128 MB RAM and a
 32 MB tmpfs, it will still hang, so there's still a bug and the
 PR should remain open.
 -- 
 Andreas Gustafsson, gson@gson.org

Responsible-Changed-From-To: misc-bug-people->ad
Responsible-Changed-By: ad@NetBSD.org
Responsible-Changed-When: Wed, 26 Feb 2020 21:58:09 +0000
Responsible-Changed-Why:
take


From: Andreas Gustafsson <gson@gson.org>
To: "Martin Husemann" <martin@netbsd.org>
Cc: gnats-bugs@netbsd.org
Subject: Re: PR/54886 CVS commit: src/usr.sbin/sysinst
Date: Wed, 11 Mar 2020 11:35:50 +0200

 Martin,

 On Jan 24, you wrote:
 >  Log Message:
 >  Factor out all RAM size thresholds as defines to avoid magic numbers.
 >  To work around PR misc/54886 bump the threshold for a tmpfs /tmp mount
 >  up to 256 MB.
 >  
 >  To generate a diff of this commit:
 >  cvs rdiff -u -r1.36 -r1.37 src/usr.sbin/sysinst/bsddisklabel.c
 >  cvs rdiff -u -r1.52 -r1.53 src/usr.sbin/sysinst/defs.h
 >  cvs rdiff -u -r1.60 -r1.61 src/usr.sbin/sysinst/disks.c

 Could you bump this a bit further, for example to 512 MB?  The amd64
 port no longer reliably installs in 128 MB (it gets "out of swap"
 errors during set extraction), and if you increase memory to 256 MB to
 work around this, the tmpfs /tmp activates, and although the system
 does not hang as it did in PR 54886, many of the ATF file system tests
 fail with ENOSPC.
 -- 
 Andreas Gustafsson, gson@gson.org

From: Martin Husemann <martin@duskware.de>
To: Andreas Gustafsson <gson@gson.org>
Cc: gnats-bugs@netbsd.org, ad@NetBSD.org
Subject: Re: PR/54886 CVS commit: src/usr.sbin/sysinst
Date: Wed, 11 Mar 2020 10:49:13 +0100

 On Wed, Mar 11, 2020 at 11:35:50AM +0200, Andreas Gustafsson wrote:
 > Martin,
 > 
 > On Jan 24, you wrote:
 > >  Log Message:
 > >  Factor out all RAM size thresholds as defines to avoid magic numbers.
 > >  To work around PR misc/54886 bump the threshold for a tmpfs /tmp mount
 > >  up to 256 MB.
 > >  
 > >  To generate a diff of this commit:
 > >  cvs rdiff -u -r1.36 -r1.37 src/usr.sbin/sysinst/bsddisklabel.c
 > >  cvs rdiff -u -r1.52 -r1.53 src/usr.sbin/sysinst/defs.h
 > >  cvs rdiff -u -r1.60 -r1.61 src/usr.sbin/sysinst/disks.c
 > 
 > Could you bump this a bit further, for example to 512 MB?  The amd64
 > port no longer reliably installs in 128 MB (it gets "out of swap"
 > errors during set extraction), and if you increase memory to 256 MB to
 > work around this, the tmpfs /tmp activates, and although the system
 > does not hang as it did in PR 54886, many of the ATF file system tests
 > fail with ENOSPC.

 I think there are two issues here that we should address separately.

 First, with 128 MB RAM the swap partition is obviously created too small;
 sysinst probably scales it to 256 MB, and we should maybe have an MD lower
 limit for it (if space on the target disk permits).

 The second issue is ATF tests requiring at least a certain /tmp size, and
 25% of RAM (the default) not being enough for that. Do you have any idea
 what minimal /tmp size is needed? I can increase the default tmpfs size on
 lower ram machines to more than 25%, or as you suggested just raise the
 threshold. With enough swap available, a bigger tmpfs is no problem (I have
 locally used very high percent-of-ram tmpfs setups on big memory machines
 to build completely in a tmpfs).

 Martin

From: Martin Husemann <martin@duskware.de>
To: Andreas Gustafsson <gson@gson.org>
Cc: gnats-bugs@netbsd.org, ad@NetBSD.org
Subject: Re: PR/54886 CVS commit: src/usr.sbin/sysinst
Date: Wed, 11 Mar 2020 17:55:37 +0100

 Another thing we should do: make more tests deal with restricted /tmp sizes.
 This would for example help my landisk test runs a lot (but I haven't gotten
 around to it). We have such checks for a few individual tests already.

 Martin

From: Andreas Gustafsson <gson@gson.org>
To: Martin Husemann <martin@duskware.de>
Cc: gnats-bugs@netbsd.org
Subject: Re: PR/54886 CVS commit: src/usr.sbin/sysinst
Date: Wed, 11 Mar 2020 19:42:01 +0200

 Martin Husemann wrote:
 >  Do you have any idea what minimal /tmp size is needed?

 I'm running some tests to find out but they will still take a while to
 finish.
 -- 
 Andreas Gustafsson, gson@gson.org

From: Andreas Gustafsson <gson@gson.org>
To: Martin Husemann <martin@duskware.de>
Cc: gnats-bugs@netbsd.org
Subject: Re: PR/54886 CVS commit: src/usr.sbin/sysinst
Date: Thu, 12 Mar 2020 10:28:10 +0200

 Martin Husemann wrote:
 > Do you have any idea what minimal /tmp size is needed?

 A test run of -current in a qemu VM with 512 MB RAM completed with no
 more failures than usual, so a 128 MB /tmp is big enough (and a 64 MB
 /tmp is not).
 -- 
 Andreas Gustafsson, gson@gson.org

From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/54886 CVS commit: src/usr.sbin/sysinst
Date: Mon, 16 Mar 2020 06:48:18 +0000

 Module Name:	src
 Committed By:	martin
 Date:		Mon Mar 16 06:48:18 UTC 2020

 Modified Files:
 	src/usr.sbin/sysinst: defs.h

 Log Message:
 PR misc/54886: bump threshold for automatic/default creation of a tmpfs /tmp
 up slightly (to 384 MB ram). This will make sure the default install has
 a > 64 MB /tmp available (number pulled out of thin air, 64 MB is the minimum
 required by the ZFS tests).


 To generate a diff of this commit:
 cvs rdiff -u -r1.56 -r1.57 src/usr.sbin/sysinst/defs.h

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.
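
 The threshold rule described in the log message above can be sketched
 roughly as follows.  This is an illustration only: the variable and
 function names are hypothetical (the real defines are in sysinst's
 defs.h and are not reproduced here), and the ">=" comparison is an
 assumption.  The 384 MB threshold comes from the log message; the 25%
 share is the default mentioned earlier in this PR.

```shell
#!/bin/sh
# Hypothetical names; not the defines used in sysinst.
TMPFS_RAM_THRESHOLD_MB=384   # threshold from the log message above
TMPFS_DEFAULT_PCT=25         # default share of RAM, per this PR

# Print the size in MB of the default tmpfs /tmp, or 0 if no tmpfs
# would be created.  Assumption: tmpfs is used when RAM >= threshold.
default_tmp_tmpfs_mb() {
	ram_mb=$1
	if [ "$ram_mb" -ge "$TMPFS_RAM_THRESHOLD_MB" ]; then
		echo $(( ram_mb * TMPFS_DEFAULT_PCT / 100 ))
	else
		echo 0
	fi
}

# Show the outcome for a few RAM sizes discussed in this PR.
for ram in 128 256 384 512; do
	echo "$ram MB RAM -> $(default_tmp_tmpfs_mb "$ram") MB tmpfs /tmp"
done
```

 Under these assumptions the smallest default tmpfs /tmp is 96 MB,
 which is above the 64 MB that the ZFS tests are said to need.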

From: Martin Husemann <martin@duskware.de>
To: Andreas Gustafsson <gson@gson.org>
Cc: gnats-bugs@netbsd.org, ad@NetBSD.org
Subject: Re: PR/54886 CVS commit: src/usr.sbin/sysinst
Date: Mon, 16 Mar 2020 07:53:06 +0100

 On Wed, Mar 11, 2020 at 05:55:37PM +0100, Martin Husemann wrote:
 > Another thing we should do: make more tests deal with restricted /tmp sizes.
 > This would for example help my landisk test runs a lot (but I haven't gotten
 > around to it). We have such checks for a few individual tests already.

 I did two things:

  - modified all tests that failed with a 32 MB /tmp to check for
    available space up front and skip if there is not enough

  - as the ZFS tests cannot work with less than 64 MB of free
    space in /tmp, bumped the threshold for default creation of a tmpfs
    /tmp in sysinst to 384 MB of available RAM (with the 25% default,
    that gives a tmpfs /tmp of at least 96 MB)
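
 An up-front space check of the kind described in the first item could
 look roughly like the sketch below.  It is not the code that was
 committed to the tests; the helper name have_free_kb is hypothetical,
 and it assumes df(1) supports the POSIX -k and -P options.

```shell
#!/bin/sh
# Hypothetical helper, not the code committed to the NetBSD tests.
# Succeeds if the file system holding directory $1 has at least
# $2 kilobytes available.
have_free_kb() {
	dir=$1
	need_kb=$2
	# With -P, df prints a header and then one line per file system;
	# field 4 is the available space, in 1024-byte units due to -k.
	avail_kb=$(df -kP "$dir" | awk 'NR == 2 { print $4 }')
	[ -n "$avail_kb" ] && [ "$avail_kb" -ge "$need_kb" ]
}

# Possible use in an atf-sh test body (64 MB is the ZFS figure above):
#   have_free_kb /tmp 65536 || atf_skip "less than 64 MB free in /tmp"
```

 Checking before the test writes anything lets it be reported as
 skipped rather than failing with ENOSPC partway through.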

 Martin

State-Changed-From-To: open->feedback
State-Changed-By: ad@NetBSD.org
State-Changed-When: Mon, 20 Apr 2020 19:18:02 +0000
State-Changed-Why:
do you consider this fixed?


From: Andreas Gustafsson <gson@gson.org>
To: ad@NetBSD.org
Cc: gnats-bugs@netbsd.org
Subject: Re: misc/54886 (i386 testbed hangs running ATF tests)
Date: Tue, 21 Apr 2020 16:07:42 +0300

 ad@NetBSD.org wrote:
 > do you consider this fixed?

 I consider it worked around, but not fixed.  For example, the
 following still hangs as of source date 2020.04.21.06.45.16:

   host$ qemu-system-i386 -m 128 [...other args to boot a -current/i386 image...]
   guest# mount_tmpfs -s 33M tmpfs /tmp
   guest# cd /usr/tests/bin/sh && atf-run t_patterns

 -- 
 Andreas Gustafsson, gson@gson.org

State-Changed-From-To: feedback->open
State-Changed-By: maya@NetBSD.org
State-Changed-When: Fri, 24 Apr 2020 13:55:51 +0000
State-Changed-Why:
Feedback provided.


>Unformatted:
