NetBSD Problem Report #56506

From gson@gson.org  Wed Nov 17 20:24:11 2021
Return-Path: <gson@gson.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 404811A9239
	for <gnats-bugs@gnats.NetBSD.org>; Wed, 17 Nov 2021 20:24:11 +0000 (UTC)
Message-Id: <20211117202356.6853D254286@guava.gson.org>
Date: Wed, 17 Nov 2021 22:23:56 +0200 (EET)
From: gson@gson.org (Andreas Gustafsson)
Reply-To: gson@gson.org (Andreas Gustafsson)
To: gnats-bugs@NetBSD.org
Subject: sys/rc/t_rc_d_cli tests randomly fail
X-Send-Pr-Version: 3.95

>Number:         56506
>Category:       bin
>Synopsis:       sys/rc/t_rc_d_cli tests randomly fail
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    bin-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Nov 17 20:25:01 +0000 2021
>Last-Modified:  Sat Feb 26 16:25:01 +0000 2022
>Originator:     Andreas Gustafsson
>Release:        NetBSD-current
>Organization:

>Environment:
System: NetBSD
Architecture: i386
Machine: i386
>Description:

On one of my testbeds, a physical i386 laptop, various test cases of
the sys/rc/t_rc_d_cli test program fail randomly.  The log output from
a typical failure is here:

  https://www.gson.org/netbsd/bugs/build/i386-laptop/2021/2021.11.14.18.36.13/test.html#sys_rc_t_rc_d_cli_default_stop_no_args

In this case, the default_restart_no_args test case failed with the
error message "h_simple not running?".

This looks like a race condition in rc.subr, which in some cases
checks whether a service is running by examining the output of ps(1).
When ps runs, the process running a newly started service will have
forked, but it may not yet have completed an exec(), and if so, it
will not show up in the ps output under the expected name.
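
The race described above can be illustrated with a small sh sketch.
It is not the rc.subr code, whose real matching logic is more involved.
The function names ps_output_has_name and wait_for_name are
hypothetical, and the sketch assumes the process name is the first word
of the command column.  A single snapshot taken between fork and exec
shows "(sh)" and fails the check; retrying a bounded number of times
gives the exec a chance to complete.

```shell
# Hypothetical helper, not part of rc.subr: read "pid command" lines on
# stdin and succeed if the first word of any command has basename $1.
ps_output_has_name() {
    awk -v want="$1" '
        {
            cmd = $2
            sub(/^.*\//, "", cmd)    # strip any leading directory
            if (cmd == want) found = 1
        }
        END { exit(found ? 0 : 1) }
    '
}

# Hypothetical retry wrapper: run the snapshot command given after the
# first two arguments up to $2 times, one second apart, until a process
# named $1 shows up in its output.
wait_for_name() {
    _name=$1
    _tries=$2
    shift 2
    while :; do
        if "$@" | ps_output_has_name "$_name"; then
            return 0
        fi
        _tries=$((_tries - 1))
        [ "$_tries" -gt 0 ] || return 1
        sleep 1
    done
}

# Example use (assumed ps invocation):
#   wait_for_name h_simple 5 ps -ax -o pid= -o command=
```

Polling only narrows the window; it is shown here to make the timing
dependence concrete, not as a proposed fix.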

To test this theory, I modified rc.subr to save the ps output to a
file using tee(1), and found that when the test fails, the ps output
shows a process with the name "(sh)" in place of the expected
"h_simple".
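
The kind of instrumentation used for this check can be sketched as
follows.  This is not the actual modification made to rc.subr; the
function name log_snapshot and the log file path are hypothetical.
The idea is only that tee(1) appends a copy of each snapshot to a file
while passing the output through unchanged to whatever consumes it.

```shell
# Hypothetical debugging helper: run the command given after the first
# argument, append a copy of its output to the file $1, and pass the
# output through unchanged on stdout.
log_snapshot() {
    _log=$1
    shift
    "$@" | tee -a "$_log"
}

# Example use (assumed ps invocation and log path):
#   log_snapshot /tmp/ps.log ps -ax -o pid= -o command=
```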

>How-To-Repeat:

  cd /usr/tests/sys/rc
  while atf-run t_rc_d_cli:default_stop_no_args; do true; done

The :default_stop_no_args part is only supported on -current;
omit it if testing on a release.  Repeat on different machines
until you find one that happens to have the right timing for
the test to fail.
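
A bounded variant of the loop above may be more convenient when
comparing machines, since it reports how many runs succeeded before the
first failure.  This is a sketch only; the function name
run_until_failure and the limit of 1000 runs are arbitrary choices, and
it relies on the command's exit status just as the loop above does.

```shell
# Hypothetical helper: run the command given after the first argument
# up to $1 times and stop at the first failing run.
run_until_failure() {
    _max=$1
    shift
    _n=0
    while [ "$_n" -lt "$_max" ]; do
        if ! "$@"; then
            echo "failed after $_n successful runs"
            return 1
        fi
        _n=$((_n + 1))
    done
    echo "no failure in $_max runs"
    return 0
}

# Example use:
#   cd /usr/tests/sys/rc
#   run_until_failure 1000 atf-run t_rc_d_cli:default_stop_no_args
```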

>Fix:

>Audit-Trail:
From: "Andreas Gustafsson" <gson@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/56506 CVS commit: src/tests/sys/rc
Date: Sat, 26 Feb 2022 16:21:59 +0000

 Module Name:	src
 Committed By:	gson
 Date:		Sat Feb 26 16:21:59 UTC 2022

 Modified Files:
 	src/tests/sys/rc: t_rc_d_cli.sh

 Log Message:
 Mark randomly failing test cases as expected failures with a reference
 to PR bin/56506.


 To generate a diff of this commit:
 cvs rdiff -u -r1.4 -r1.5 src/tests/sys/rc/t_rc_d_cli.sh

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.
