NetBSD Problem Report #49141

From gson@gson.org  Fri Aug 22 14:36:20 2014
Return-Path: <gson@gson.org>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (verified OK))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 70841AEA54
	for <gnats-bugs@gnats.NetBSD.org>; Fri, 22 Aug 2014 14:36:20 +0000 (UTC)
Message-Id: <20140822143609.7C5F275E2E@guava.gson.org>
Date: Fri, 22 Aug 2014 17:36:09 +0300 (EEST)
From: gson@gson.org (Andreas Gustafsson)
Reply-To: gson@gson.org (Andreas Gustafsson)
To: gnats-bugs@gnats.NetBSD.org
Subject: lib/librumpclient/t_exec/threxec test randomly fails
X-Send-Pr-Version: 3.95

>Number:         49141
>Category:       bin
>Synopsis:       lib/librumpclient/t_exec/threxec test randomly fails
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    bin-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri Aug 22 14:40:00 +0000 2014
>Last-Modified:  Thu Nov 24 00:45:01 +0000 2016
>Originator:     Andreas Gustafsson
>Release:        NetBSD-current
>Organization:

>Environment:
System: NetBSD
Architecture: i386
Machine: i386
>Description:

The lib/librumpclient/t_exec/threxec test case has been randomly
failing ever since it was first created.  What happens is that the 
h_execthr program sometimes hangs until ATF times out and kills it.

Here is the log from the first recorded failure, from the day the
test was committed.  This is from my own testbed since the TNF one
didn't exist yet:

  http://www.gson.org/netbsd/bugs/build/i386/2011/2011.03.08.22.21.52/test.html#lib_librumpclient_t_exec_threxec

Here is the log from a recent failure on the TNF testbed:

  http://releng.netbsd.org/b5reports/i386/build/2014.08.21.22.00.30/test.html#lib_librumpclient_t_exec_threxec

The test also sometimes fails in the amd64 and sparc runs, but less
often than in the i386 ones.  Perhaps this has something to do with
the i386 VM having less memory than the others.  Anyway, I'm also
seeing this when running the tests on the bare metal, so it's clearly
not a qemu issue.  The test's 300-second timeout does not appear to be
too short, either, because when the test passes, it does so quickly,
typically in 30 seconds or less.

>How-To-Repeat:

Run the lib/librumpclient/t_exec test repeatedly.
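As one way to automate that, the following C sketch reruns a command until it hangs. It is not part of the NetBSD test suite: run_with_timeout() and find_hang() are hypothetical names. The hang threshold is an assumption based on the observation above that passing runs finish in about 30 seconds, so a run still going well past that (say 120 seconds) can be treated as hung rather than slow. It also assumes h_execthr is run directly, with RUMP_SERVER pointing at an already running rump_server.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Outcome of one run of the program under test. */
enum run_result { RUN_OK, RUN_FAILED, RUN_TIMED_OUT, RUN_ERROR };

/*
 * Fork and exec argv[0] (looked up via PATH), then poll for its exit
 * for up to timeout_sec seconds. A child still running at the deadline
 * is treated as hung: it is killed with SIGKILL and reaped.
 */
static enum run_result
run_with_timeout(char *const argv[], unsigned timeout_sec)
{
	pid_t pid = fork();
	if (pid == -1)
		return RUN_ERROR;
	if (pid == 0) {
		execvp(argv[0], argv);
		_exit(127);		/* exec failed */
	}

	/* Poll every 100 ms so a hang is detected close to the deadline. */
	const struct timespec tick = { 0, 100 * 1000 * 1000 };
	unsigned long ticks_left = (unsigned long)timeout_sec * 10;
	for (;;) {
		int status;
		pid_t r = waitpid(pid, &status, WNOHANG);
		if (r == pid)
			return (WIFEXITED(status) && WEXITSTATUS(status) == 0)
			    ? RUN_OK : RUN_FAILED;
		if (r == -1 && errno != EINTR)
			return RUN_ERROR;
		if (ticks_left-- == 0)
			break;
		nanosleep(&tick, NULL);
	}

	kill(pid, SIGKILL);
	waitpid(pid, NULL, 0);
	return RUN_TIMED_OUT;
}

/*
 * Run argv up to `iterations` times, stopping at the first hang.
 * Returns the 1-based iteration that hung, or 0 if none did.
 */
static unsigned
find_hang(char *const argv[], unsigned iterations, unsigned timeout_sec)
{
	for (unsigned i = 1; i <= iterations; i++)
		if (run_with_timeout(argv, timeout_sec) == RUN_TIMED_OUT)
			return i;
	return 0;
}
```

For example, with argv set to { "./h_execthr", NULL }, find_hang(argv, 1000, 120) would report the first run that hung, or 0. Since restarting the rump_server was observed to clear a stuck state, runs after a detected hang may not be independent of it.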

>Fix:

>Audit-Trail:
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: bin/49141: lib/librumpclient/t_exec/threxec test randomly fails
Date: Mon, 21 Nov 2016 06:37:09 +0000

 On Fri, Aug 22, 2014 at 02:40:00PM +0000, Andreas Gustafsson wrote:
  > The lib/librumpclient/t_exec/threxec test case has been randomly
  > failing ever since it was first created.  What happens is that the 
  > h_execthr program sometimes hangs until ATF times out and kills it.

 I tried running it outside of atf, as follows:

    % setenv RUMP_SERVER unix://csock
    % rump_server -lrumpnet -lrumpnet_net -lrumpnet_netinet -lrumpdev \
         -lrumpvfs $RUMP_SERVER
    % ./obj.amd64/h_execthr

 It fails during initialization unless you unlimit maxthread, since it
 creates a lot of threads. After that, it hung every time in what
 turned out to be the rump_sys___sysctl() call in getproc() in
 h_execthr.c. The cause was some inexplicable rump-side problem,
 since killing and restarting the rump_server made it start working.
 Now it runs reliably.

 I'm going to make it print what it's doing in the hopes that we can
 get a line on where it's hanging when it happens in a full test run.

 -- 
 David A. Holland
 dholland@netbsd.org
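The `unlimit maxthread` step can also be done from C, for example in a wrapper that launches h_execthr, because resource limits are inherited across fork and exec. The sketch below assumes that csh's maxthread corresponds to RLIMIT_NTHR, NetBSD's per-user thread limit, and falls back to RLIMIT_NPROC elsewhere (Linux counts threads against it). raise_thread_limit() is a hypothetical helper. It only raises the soft limit to the hard limit, which any process may do; if the hard limit itself is too low, this will not help without privileges.

```c
#define _DEFAULT_SOURCE		/* expose RLIMIT_NPROC on glibc */
#include <sys/resource.h>

/*
 * Raise the soft thread limit to the hard limit, roughly what
 * `unlimit maxthread` achieves in csh for an unprivileged user.
 * Uses RLIMIT_NTHR where it exists (NetBSD), else RLIMIT_NPROC.
 * Returns 0 on success, -1 on failure.
 */
static int
raise_thread_limit(void)
{
#if defined(RLIMIT_NTHR) || defined(RLIMIT_NPROC)
# ifdef RLIMIT_NTHR
	const int resource = RLIMIT_NTHR;
# else
	const int resource = RLIMIT_NPROC;
# endif
	struct rlimit rl;

	if (getrlimit(resource, &rl) == -1)
		return -1;
	rl.rlim_cur = rl.rlim_max;	/* soft up to hard needs no privilege */
	return setrlimit(resource, &rl);
#else
	return 0;			/* no suitable limit on this platform */
#endif
}
```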

From: "David A. Holland" <dholland@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/49141 CVS commit: src/tests/lib/librumpclient
Date: Mon, 21 Nov 2016 06:38:18 +0000

 Module Name:	src
 Committed By:	dholland
 Date:		Mon Nov 21 06:38:18 UTC 2016

 Modified Files:
 	src/tests/lib/librumpclient: h_execthr.c

 Log Message:
 As a debugging measure for PR 49141, log what this is doing as it runs
 to stdout. Hopefully this will get reported when the test fails in the
 testbed rather than just causing ATF to report that it printed
 unexpected output.


 To generate a diff of this commit:
 cvs rdiff -u -r1.5 -r1.6 src/tests/lib/librumpclient/h_execthr.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: bin/49141: lib/librumpclient/t_exec/threxec test randomly fails
Date: Thu, 24 Nov 2016 00:41:22 +0000

 Wrong PR number, sorry.

    ------

 From: "David A. Holland" <dholland@netbsd.org>
 To: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
 	prlw1@cam.ac.uk
 Cc: 
 Subject: PR/49140 CVS commit: src/tests/lib/librumpclient
 Date: Thu, 24 Nov 2016 00:40:00 +0000 (UTC)

 The following reply was made to PR kern/49140; it has been noted by GNATS.

 From: "David A. Holland" <dholland@netbsd.org>
 To: gnats-bugs@gnats.NetBSD.org
 Cc: 
 Subject: PR/49140 CVS commit: src/tests/lib/librumpclient
 Date: Thu, 24 Nov 2016 00:37:30 +0000

  Module Name:	src
  Committed By:	dholland
  Date:		Thu Nov 24 00:37:29 UTC 2016

  Modified Files:
  	src/tests/lib/librumpclient: h_execthr.c

  Log Message:
  Turn off the PR 49140 logging, because it itself makes the test fail.

  As usual, ATF is actively interfering with test debugging. Almost all
  runs in the past few days have failed this test with "stdout not
  empty". In one run it timed out:
  http://releng.netbsd.org/b5reports/i386/build/2016.11.22.06.51.14/test.html
  but in this case ATF helpfully suppressed the log data.

  Maybe if someone can figure out how to make the test hang reliably
  then they can turn the logging on again and run it outside of ATF to
  see what's happening.

  In the meantime this problem is not likely to get fixed until we have
  a less obstructive testing framework.


  To generate a diff of this commit:
  cvs rdiff -u -r1.6 -r1.7 src/tests/lib/librumpclient/h_execthr.c

  Please note that diffs are not public domain; they are subject to the
  copyright notices on the relevant files.
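One possible way to keep such logging available without tripping ATF's "stdout not empty" check is to leave it off by default and enable it only when the program is run by hand. The sketch below is a hypothetical illustration, not what h_execthr.c does: debug_log() and the H_EXECTHR_DEBUG environment variable are invented names. It flushes after every message because output still sitting in a stdio buffer is lost when a hung process is killed by a signal, and the last line written is exactly the one that would show where it hangs.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Progress logging that is off by default, so a run under ATF leaves
 * the output stream empty, but can be enabled by setting the
 * (hypothetical) H_EXECTHR_DEBUG environment variable.
 * Returns the number of characters written (0 when disabled).
 */
static int
debug_log(FILE *out, const char *fmt, ...)
{
	const char *env = getenv("H_EXECTHR_DEBUG");
	if (env == NULL || *env == '\0')
		return 0;

	va_list ap;
	va_start(ap, fmt);
	int n = vfprintf(out, fmt, ap);
	va_end(ap);

	/* Flush now so the message survives if the process is killed. */
	fflush(out);
	return n < 0 ? 0 : n;
}
```

A call such as debug_log(stdout, "calling sysctl in getproc\n") placed before each suspect step would then print nothing in the testbed and a step-by-step trace when run by hand with the variable set.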

Copyright © 1994-2014 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.