NetBSD Problem Report #49141
From gson@gson.org Fri Aug 22 14:36:20 2014
Return-Path: <gson@gson.org>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (verified OK))
by mollari.NetBSD.org (Postfix) with ESMTPS id 70841AEA54
for <gnats-bugs@gnats.NetBSD.org>; Fri, 22 Aug 2014 14:36:20 +0000 (UTC)
Message-Id: <20140822143609.7C5F275E2E@guava.gson.org>
Date: Fri, 22 Aug 2014 17:36:09 +0300 (EEST)
From: gson@gson.org (Andreas Gustafsson)
Reply-To: gson@gson.org (Andreas Gustafsson)
To: gnats-bugs@gnats.NetBSD.org
Subject: lib/librumpclient/t_exec/threxec test randomly fails
X-Send-Pr-Version: 3.95
>Number: 49141
>Category: bin
>Synopsis: lib/librumpclient/t_exec/threxec test randomly fails
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: bin-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Fri Aug 22 14:40:00 +0000 2014
>Last-Modified: Thu Nov 24 00:45:01 +0000 2016
>Originator: Andreas Gustafsson
>Release: NetBSD-current
>Organization:
>Environment:
System: NetBSD
Architecture: i386
Machine: i386
>Description:
The lib/librumpclient/t_exec/threxec test case has been randomly
failing ever since it was first created. What happens is that the
h_execthr program sometimes hangs until ATF times out and kills it.
Here is the log from the first recorded failure, from the day the
test was committed. This is from my own testbed since the TNF one
didn't exist yet:
http://www.gson.org/netbsd/bugs/build/i386/2011/2011.03.08.22.21.52/test.html#lib_librumpclient_t_exec_threxec
Here is the log from a recent failure on the TNF testbed:
http://releng.netbsd.org/b5reports/i386/build/2014.08.21.22.00.30/test.html#lib_librumpclient_t_exec_threxec
The test also sometimes fails in the amd64 and sparc runs, but less
often than in the i386 ones. Perhaps this has something to do with
the i386 VM having less memory than the others. Anyway, I'm also
seeing this when running the tests on bare metal, so it's clearly
not a qemu issue. The test's 300-second timeout does not appear to be
too short, either, because when the test passes, it does so quickly,
typically in 30 seconds or less.
>How-To-Repeat:
Run the lib/librumpclient/t_exec test repeatedly.
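One way to do that is with a small shell loop that counts failing runs.
This is only a sketch: the function name repeat_count_failures is made
up for illustration, and the atf-run invocation in the trailing comment
assumes that the tests are installed under /usr/tests and that atf-run
exits non-zero when a test case fails or times out.

```sh
#!/bin/sh
# Run a command repeatedly and print how many of the runs failed.
# Usage: repeat_count_failures RUNS COMMAND [ARGS...]
# (hypothetical helper, not part of ATF or the NetBSD test suite)
repeat_count_failures() {
    runs=$1
    shift
    failures=0
    i=1
    while [ "$i" -le "$runs" ]; do
        # Discard the command's output; only its exit status is counted.
        if ! "$@" >/dev/null 2>&1; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    echo "$failures"
}

# Assumed invocation on a system with the tests installed:
#   cd /usr/tests/lib/librumpclient &&
#       repeat_count_failures 100 atf-run t_exec
```

Since the failure is intermittent, a large run count may be needed
before a hang shows up, and each hanging run will take the full ATF
timeout to complete.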
>Fix:
>Audit-Trail:
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: bin/49141: lib/librumpclient/t_exec/threxec test randomly fails
Date: Mon, 21 Nov 2016 06:37:09 +0000
On Fri, Aug 22, 2014 at 02:40:00PM +0000, Andreas Gustafsson wrote:
> The lib/librumpclient/t_exec/threxec test case has been randomly
> failing ever since it was first created. What happens is that the
> h_execthr program sometimes hangs until ATF times out and kills it.
I tried running it outside of atf, as follows:
% setenv RUMP_SERVER unix://csock
% rump_server -lrumpnet -lrumpnet_net -lrumpnet_netinet -lrumpdev \
-lrumpvfs $RUMP_SERVER
% ./obj.amd64/h_execthr
It fails during initialization unless you unlimit maxthread (since it
creates a lot of threads)... after that it was hanging every time, in
what turned out to be the rump_sys___sysctl() call in getproc() in
h_execthr.c... for some inexplicable rump reason, as killing and
restarting the rump_server made it start working. Now it runs
reliably.
I'm going to make it print what it's doing in the hopes that we can
get a line on where it's hanging when it happens in a full test run.
--
David A. Holland
dholland@netbsd.org
From: "David A. Holland" <dholland@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/49141 CVS commit: src/tests/lib/librumpclient
Date: Mon, 21 Nov 2016 06:38:18 +0000
Module Name: src
Committed By: dholland
Date: Mon Nov 21 06:38:18 UTC 2016
Modified Files:
src/tests/lib/librumpclient: h_execthr.c
Log Message:
As a debugging measure for PR 49141, log what this is doing as it runs
to stdout. Hopefully this will get reported when the test fails in the
testbed rather than just causing ATF to report that it printed
unexpected output.
To generate a diff of this commit:
cvs rdiff -u -r1.5 -r1.6 src/tests/lib/librumpclient/h_execthr.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: bin/49141: lib/librumpclient/t_exec/threxec test randomly fails
Date: Thu, 24 Nov 2016 00:41:22 +0000
Wrong PR number, sorry.
------
From: "David A. Holland" <dholland@netbsd.org>
To: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
prlw1@cam.ac.uk
Cc:
Subject: PR/49140 CVS commit: src/tests/lib/librumpclient
Date: Thu, 24 Nov 2016 00:40:00 +0000 (UTC)
The following reply was made to PR kern/49140; it has been noted by GNATS.
From: "David A. Holland" <dholland@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/49140 CVS commit: src/tests/lib/librumpclient
Date: Thu, 24 Nov 2016 00:37:30 +0000
Module Name: src
Committed By: dholland
Date: Thu Nov 24 00:37:29 UTC 2016
Modified Files:
src/tests/lib/librumpclient: h_execthr.c
Log Message:
Turn off the PR 49140 logging, because it itself makes the test fail.
As usual, ATF is actively interfering with test debugging. Almost all
runs in the past few days have failed this test with "stdout not
empty". In one run it timed out:
http://releng.netbsd.org/b5reports/i386/build/2016.11.22.06.51.14/test.html
but in this case ATF helpfully suppressed the log data.
Maybe if someone can figure out how to make the test hang reliably
then they can turn the logging on again and run it outside of ATF to
see what's happening.
In the meantime this problem is not likely to get fixed until we have
a less obstructive testing framework.
To generate a diff of this commit:
cvs rdiff -u -r1.6 -r1.7 src/tests/lib/librumpclient/h_execthr.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.