NetBSD Problem Report #59255

From www@netbsd.org  Sun Apr  6 14:16:11 2025
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	 key-exchange X25519 server-signature RSA-PSS (2048 bits)
	 client-signature RSA-PSS (2048 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 070DA1A9239
	for <gnats-bugs@gnats.NetBSD.org>; Sun,  6 Apr 2025 14:16:11 +0000 (UTC)
Message-Id: <20250406141609.C729D1A923C@mollari.NetBSD.org>
Date: Sun,  6 Apr 2025 14:16:09 +0000 (UTC)
From: campbell+netbsd@mumble.net
Reply-To: campbell+netbsd@mumble.net
To: gnats-bugs@NetBSD.org
Subject: tests/lib/librumpclient/t_exec: intermittent failures
X-Send-Pr-Version: www-1.0

>Number:         59255
>Category:       misc
>Synopsis:       tests/lib/librumpclient/t_exec: intermittent failures
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    misc-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Apr 06 14:20:00 +0000 2025
>Last-Modified:  Mon Apr 07 16:00:02 +0000 2025
>Originator:     Taylor R Campbell
>Release:        current
>Organization:
The RumpBSD Execution
>Environment:
>Description:
Various test cases in tests/lib/librumpclient/t_exec have been intermittently failing for a while:

https://releng.netbsd.org/b5reports/i386/2025/2025.03.02.08.14.26/test.html#lib_librumpclient_t_exec_vfork
https://releng.netbsd.org/b5reports/i386/2025/2025.03.02.14.13.22/test.html#lib_librumpclient_t_exec_vfork
https://releng.netbsd.org/b5reports/i386/2025/2025.03.04.00.41.00/test.html#lib_librumpclient_t_exec_vfork
https://releng.netbsd.org/b5reports/i386/2025/2025.03.04.16.40.46/test.html#lib_librumpclient_t_exec_vfork
https://releng.netbsd.org/b5reports/i386/2025/2025.03.08.19.09.46/test.html#lib_librumpclient_t_exec_threxec
https://releng.netbsd.org/b5reports/i386/2025/2025.03.09.18.50.20/test.html#lib_librumpclient_t_exec_vfork
https://releng.netbsd.org/b5reports/i386/2025/2025.03.09.18.58.18/test.html#lib_librumpclient_t_exec_cloexec
https://releng.netbsd.org/b5reports/i386/2025/2025.03.09.22.06.28/test.html#lib_librumpclient_t_exec_cloexec
https://releng.netbsd.org/b5reports/i386/2025/2025.03.10.05.06.02/test.html#lib_librumpclient_t_exec_vfork
https://releng.netbsd.org/b5reports/i386/2025/2025.03.11.05.48.26/test.html#lib_librumpclient_t_exec_vfork
https://releng.netbsd.org/b5reports/i386/2025/2025.03.11.14.13.45/test.html#lib_librumpclient_t_exec_cloexec
https://releng.netbsd.org/b5reports/i386/2025/2025.03.12.07.57.05/test.html#lib_librumpclient_t_exec_vfork
https://releng.netbsd.org/b5reports/i386/2025/2025.03.14.06.40.51/test.html#lib_librumpclient_t_exec_vfork
https://releng.netbsd.org/b5reports/i386/2025/2025.03.14.18.50.03/test.html#lib_librumpclient_t_exec_threxec
https://releng.netbsd.org/b5reports/i386/2025/2025.03.18.07.58.09/test.html#lib_librumpclient_t_exec_vfork
https://releng.netbsd.org/b5reports/i386/2025/2025.03.19.18.15.27/test.html#lib_librumpclient_t_exec_cloexec
https://releng.netbsd.org/b5reports/i386/2025/2025.03.21.07.09.58/test.html#lib_librumpclient_t_exec_vfork
https://releng.netbsd.org/b5reports/i386/2025/2025.03.24.00.13.58/test.html#lib_librumpclient_t_exec_vfork
https://releng.netbsd.org/b5reports/i386/2025/2025.03.26.00.05.56/test.html#lib_librumpclient_t_exec_vfork
https://releng.netbsd.org/b5reports/i386/2025/2025.03.29.11.51.54/test.html#lib_librumpclient_t_exec_threxec
https://releng.netbsd.org/b5reports/i386/2025/2025.03.29.17.29.20/test.html#lib_librumpclient_t_exec_vfork
https://releng.netbsd.org/b5reports/i386/2025/2025.03.29.21.45.08/test.html#lib_librumpclient_t_exec_vfork
https://releng.netbsd.org/b5reports/i386/2025/2025.03.30.14.13.59/test.html#lib_librumpclient_t_exec_threxec
https://releng.netbsd.org/b5reports/i386/2025/2025.03.30.16.23.13/test.html#lib_librumpclient_t_exec_threxec
https://releng.netbsd.org/b5reports/i386/2025/2025.03.31.13.03.23/test.html#lib_librumpclient_t_exec_vfork
https://releng.netbsd.org/b5reports/i386/2025/2025.03.31.14.46.42/test.html#lib_librumpclient_t_exec_threxec
https://releng.netbsd.org/b5reports/i386/2025/2025.04.01.23.02.29/test.html#lib_librumpclient_t_exec_vfork
https://releng.netbsd.org/b5reports/i386/2025/2025.04.02.17.44.07/test.html#lib_librumpclient_t_exec_vfork
https://releng.netbsd.org/b5reports/i386/2025/2025.04.03.14.51.37/test.html#lib_librumpclient_t_exec_vfork
https://releng.netbsd.org/b5reports/i386/2025/2025.04.03.17.49.49/test.html#lib_librumpclient_t_exec_vfork
https://releng.netbsd.org/b5reports/i386/2025/2025.04.04.18.57.01/test.html#lib_librumpclient_t_exec_vfork
https://releng.netbsd.org/b5reports/i386/2025/2025.04.06.03.33.51/test.html#lib_librumpclient_t_exec_cloexec
>How-To-Repeat:
cd /usr/tests/lib/librumpclient
atf-run t_exec | atf-report
>Fix:
Yes, please!

>Audit-Trail:
From: Robert Elz <kre@munnari.OZ.AU>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: misc/59255: tests/lib/librumpclient/t_exec: intermittent failures
Date: Mon, 07 Apr 2025 02:00:06 +0700

     Date:        Sun,  6 Apr 2025 14:20:01 +0000 (UTC)
     From:        campbell+netbsd@mumble.net
     Message-ID:  <20250406142001.10B3B1A923E@mollari.NetBSD.org>


   | Various test cases in tests/lib/librumpclient/t_exec have
   | been intermittently failing for a while:

 Probably forever.   I took a look at this one in particular a
 while ago, there looks to be a fairly obvious race condition in
 the test - it vforks, then the child and parent each set argv[0]
 and re-exec the test file again - the test case looks to see that
 both processes have a shared socket open - or something like that.

 But that test happens when the parent exits, simply assuming that by
 that time the child will have had time to establish its setup (since,
 being vfork(), it has to exec() or exit() before the parent gets
 control back from the vfork()), and often, that works, but not always.
 When it doesn't, when the test code (the script) looks to see the state
 of the sockets it isn't what it is expecting.

 I would have fixed it at the time (a few months ago now), but I couldn't
 determine exactly what the test was supposed to be testing, and different
 ways to overcome the problem might break whatever that was supposed to
 be (rendering the test even more useless than it currently is).

 If I were to do anything, I'd probably just delete the whole test as
 being basically useless.

 I didn't examine the other test cases (some of which also intermittently
 fail), but I would not be at all surprised to find much the same problem
 in them (except perhaps the very basic one, which does almost nothing and
 probably never fails).

 kre

From: Taylor R Campbell <riastradh@NetBSD.org>
To: Robert Elz <kre@munnari.OZ.AU>
Cc: gnats-bugs@netbsd.org, misc-bug-people@netbsd.org, gnats-admin@netbsd.org,
	netbsd-bugs@netbsd.org
Subject: Re: misc/59255: tests/lib/librumpclient/t_exec: intermittent failures
Date: Mon, 7 Apr 2025 12:51:52 +0000


 > Date: Mon, 07 Apr 2025 02:00:06 +0700
 > From: Robert Elz <kre@munnari.OZ.AU>
 > 
 >     Date:        Sun,  6 Apr 2025 14:20:01 +0000 (UTC)
 >     From:        campbell+netbsd@mumble.net
 >     Message-ID:  <20250406142001.10B3B1A923E@mollari.NetBSD.org>
 > 
 > 
 >   | Various test cases in tests/lib/librumpclient/t_exec have
 >   | been intermittently failing for a while:
 > 
 > Probably forever.   I took a look at this one in particular a
 > while ago, there looks to be a fairly obvious race condition in
 > the test - it vforks, then the child and parent each set argv[0]
 > and re-exec the test file again - the test case looks to see that
 > both processes have a shared socket open - or something like that.
 > 
 > But that test happens when the parent exits, simply assuming that by
 > that time the child will have had time to establish its setup (since,
 > being vfork(), it has to exec() or exit() before the parent gets
 > control back from the vfork()), and often, that works, but not always.
 > When it doesn't, when the test code (the script) looks to see the state
 > of the sockets it isn't what it is expecting.

 The sequence of events is something like this:

 1. vfork()
 2.   vfork returns in child
 3.   child sets childran=1
 4.   child sends HANDSHAKE_FORK to rump_server
 5.     server sets up file descriptors
 6.   child receives HANDSHAKE_FORK reply from rump_server
 7.   child calls rumpclient_exec
 8.     child calls execve (no communication with rump_server)

 At this point, the child and parent run in parallel:

 9(a). child calls rumpclient_init -> lwproc_execnotify to tell
       rump_server its p_comm (and whatever else, like closing
       O_CLOEXEC descriptors)
 9(b). vfork returns in parent, parent exits, test runs rump.sockstat

 The test fails if 9(b) runs before 9(a) so rump.sockstat still shows
 the old p_comm rather than the new p_comm.

 We can ensure these are sequenced, preserving the non-rumpy vfork(2)
 semantics, by creating a pipe shared between parent and child.  The
 attached patch implements this.
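A minimal sketch of the sequencing idea in plain POSIX C, without rump: the parent blocks in read() on a pipe until the exec'd child writes one byte, or until the child exits and read() sees EOF. This is not the attached patch (which uses a socketpair inside librumpclient); the helper name run_and_wait_for_notify, the use of /bin/sh as the re-executed program, and descriptor 3 as the notification channel are all hypothetical. It uses fork() rather than vfork() because the child calls dup2() and close() before exec, which vfork() does not portably permit.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run /bin/sh -c SCRIPT in a child whose fd 3 is
 * the write end of a pipe, and block until the exec'd child either
 * writes a byte to fd 3 (the "exec finished" notification) or exits.
 * Returns 1 if notified, 0 if the child exited without writing, -1 on
 * error.  The child's wait status is stored in *statusp.
 */
static int
run_and_wait_for_notify(const char *script, int *statusp)
{
	int fd[2];
	pid_t pid;
	ssize_t n;
	char c;

	if (pipe(fd) == -1)
		return -1;

	switch ((pid = fork())) {
	case -1:
		(void)close(fd[0]);
		(void)close(fd[1]);
		return -1;
	case 0:
		/* Child: keep only the write end, moved to fd 3. */
		(void)close(fd[0]);
		if (fd[1] != 3) {
			if (dup2(fd[1], 3) == -1)
				_exit(127);
			(void)close(fd[1]);
		}
		execl("/bin/sh", "sh", "-c", script, (char *)NULL);
		_exit(127);
	}

	/*
	 * Parent: close our copy of the write end first, so that read()
	 * returns 0 (EOF) if the child exits without writing anything,
	 * instead of blocking forever.
	 */
	(void)close(fd[1]);
	do
		n = read(fd[0], &c, 1);
	while (n == -1 && errno == EINTR);
	(void)close(fd[0]);

	while (waitpid(pid, statusp, 0) == -1) {
		if (errno != EINTR)
			return -1;
	}
	return n == 1 ? 1 : (n == 0 ? 0 : -1);
}
```

For example, run_and_wait_for_notify("printf x >&3", &st) returns 1 once the shell has written its byte, while run_and_wait_for_notify("exit 0", &st) returns 0. Closing the parent's copy of the write end before reading is what turns a silently exiting child into EOF rather than a hang; rumpclient_waitforexec_parent in the patch closes WAITFOREXECFD_CHILD first for the same reason.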

 (I have not been able to reproduce the failure at all in a VM on my
 laptop, though, after thousands of trials, so I can't confirm it
 eliminates the symptom.)

 That said, I'm not entirely sure that p_comm access is _guaranteed_ to
 be ready by the time a vforked execve(2) wakes the parent.  There is a
 similar question about psstrings with posix_spawn:
 https://gnats.netbsd.org/59175

 But it's not really that costly to add this additional logic to
 rumpclient to dispense with the question altogether; it's more for
 testing and experiments than performance.

 [Attachment: pr59255-rumpvforkexecwait.patch]

 # HG changeset patch
 # User Taylor R Campbell <riastradh@NetBSD.org>
 # Date 1744029106 0
 #      Mon Apr 07 12:31:46 2025 +0000
 # Branch trunk
 # Node ID a51ca88c1069b743be56446e0cc8b6e102719bec
 # Parent  cce289282f347391206ac4392309e42775902465
 # EXP-Topic riastradh-pr59255-rumpexectests
 librumpclient: Make rumpclient_vfork wait for child to finish exec.

 `Finish exec' here means making it into rumpclient_init past
 lwproc_execnotify.

 No change to rumpclient_fork, other than reserving thread-local
 storage for one int object to pass a file descriptor from
 rumpclient_vfork to rumpclient_exec.

 PR misc/59255: tests/lib/librumpclient/t_exec: intermittent failures

 diff -r cce289282f34 -r a51ca88c1069 distrib/sets/lists/base/shl.mi
 --- a/distrib/sets/lists/base/shl.mi	Mon Apr 07 01:54:02 2025 +0000
 +++ b/distrib/sets/lists/base/shl.mi	Mon Apr 07 12:31:46 2025 +0000
 @@ -90,7 +90,7 @@
  ./lib/libradius.so.5.0				base-sys-shlib		dynamicroot
  ./lib/librumpclient.so				base-sys-shlib		dynamicroot,rump
  ./lib/librumpclient.so.0			base-sys-shlib		dynamicroot,rump
 -./lib/librumpclient.so.0.0			base-sys-shlib		dynamicroot,rump
 +./lib/librumpclient.so.0.1			base-sys-shlib		dynamicroot,rump
  ./lib/librumpres.so				base-sys-shlib		dynamicroot,rump
  ./lib/librumpres.so.0				base-sys-shlib		dynamicroot,rump
  ./lib/librumpres.so.0.0				base-sys-shlib		dynamicroot,rump
 diff -r cce289282f34 -r a51ca88c1069 distrib/sets/lists/debug/shl.mi
 --- a/distrib/sets/lists/debug/shl.mi	Mon Apr 07 01:54:02 2025 +0000
 +++ b/distrib/sets/lists/debug/shl.mi	Mon Apr 07 12:31:46 2025 +0000
 @@ -29,7 +29,7 @@
  ./usr/libdata/debug/lib/libprop.so.1.2.debug			comp-sys-debug	debug,dynamicroot
  ./usr/libdata/debug/lib/libpthread.so.1.5.debug			comp-sys-debug	debug,dynamicroot
  ./usr/libdata/debug/lib/libradius.so.5.0.debug			comp-sys-debug	debug,dynamicroot
 -./usr/libdata/debug/lib/librumpclient.so.0.0.debug		comp-rump-debug	debug,dynamicroot,rump
 +./usr/libdata/debug/lib/librumpclient.so.0.1.debug		comp-rump-debug	debug,dynamicroot,rump
  ./usr/libdata/debug/lib/librumpres.so.0.0.debug			comp-rump-debug	debug,dynamicroot,rump
  ./usr/libdata/debug/lib/libterminfo.so.2.0.debug		comp-sys-debug	debug,dynamicroot
  ./usr/libdata/debug/lib/libumem.so.0.0.debug			comp-zfs-debug	debug,dynamicroot,zfs
 diff -r cce289282f34 -r a51ca88c1069 lib/librumpclient/rumpclient.c
 --- a/lib/librumpclient/rumpclient.c	Mon Apr 07 01:54:02 2025 +0000
 +++ b/lib/librumpclient/rumpclient.c	Mon Apr 07 12:31:46 2025 +0000
 @@ -83,6 +83,7 @@
  
  #define HOSTOPS
  int	(*host_socket)(int, int, int);
 +int	(*host_socketpair)(int, int, int, int *);
  int	(*host_close)(int);
  int	(*host_connect)(int, const struct sockaddr *, socklen_t);
  int	(*host_fcntl)(int, int, ...);
 @@ -121,9 +122,13 @@ static struct spclient clispc = {
  static int holyfd = -1;
  static sigset_t fullset;
  
 +static __thread int waitforexecnotifyparentfd = -1;
 +
  static int doconnect(void);
  static int handshake_req(struct spclient *, int, void *, int, bool);
  
 +static void waitforexec_notify(int);
 +
  /*
   * Default: don't retry.  Most clients can't handle it
   * (consider e.g. fds suddenly going missing).
 @@ -864,6 +869,7 @@ rumpclient_init(void)
  	int error;
  	int rv = -1;
  	int hstype;
 +	int notifyparentfd = -1;
  	pid_t mypid;
  
  	/*
 @@ -911,6 +917,7 @@ rumpclient_init(void)
  	FINDSYM(socket)
  #endif
  
 +	FINDSYM(socketpair)
  	FINDSYM(close)
  	FINDSYM(connect)
  	FINDSYM(fcntl)
 @@ -960,7 +967,8 @@ rumpclient_init(void)
  		goto out;
  
  	if ((p = getenv("RUMPCLIENT__EXECFD")) != NULL) {
 -		sscanf(p, "%d,%d", &clispc.spc_fd, &holyfd);
 +		sscanf(p, "%d,%d,%d", &clispc.spc_fd, &holyfd,
 +		    &notifyparentfd);
  		unsetenv("RUMPCLIENT__EXECFD");
  		hstype = HANDSHAKE_EXEC;
  	} else {
 @@ -980,6 +988,12 @@ rumpclient_init(void)
  	}
  	rv = 0;
  
 +	/*
 +	 * Notify the parent that exec has completed, in case it is
 +	 * waiting on vfork.
 +	 */
 +	waitforexec_notify(notifyparentfd);
 +
   out:
  	if (rv == -1)
  		init_done =3D 0;
 @@ -1023,6 +1037,129 @@ rumpclient_prefork(void)
  	return rpf;
  }
  
 +#define	WAITFOREXECFD_PARENT	0
 +#define	WAITFOREXECFD_CHILD	1
 +
 +/*
 + * rumpclient_waitforexec_prepare(waitforexecfd)
 + *
 + *	Called from the parent before forking when the parent wants to
 + *	wait for exec to complete in the child, for vfork(2) semantics.
 + *	Initialize waitforexecfd[0] and waitforexecfd[1] with file
 + *	descriptors for use with rumpclient_waitforexec_parent/child
 + *	(or rumpclient_waitforexec_cancel).
 + */
 +int
 +rumpclient_waitforexec_prepare(int waitforexecfd[2])
 +{
 +
 +	return host_socketpair(PF_LOCAL, SOCK_STREAM, 0, waitforexecfd);
 +}
 +
 +/*
 + * rumpclient_waitforexec_cancel(waitforexecfd)
 + *
 + *	Called from the parent after rumpclient_waitforexec_prepare if
 + *	fork/vfork fails.
 + */
 +void
 +rumpclient_waitforexec_cancel(int waitforexecfd[2])
 +{
 +
 +	(void)host_close(waitforexecfd[WAITFOREXECFD_PARENT]);
 +	(void)host_close(waitforexecfd[WAITFOREXECFD_CHILD]);
 +}
 +
 +/*
 + * rumpclient_waitforexec_child(waitforexecfd)
 + *
 + *	Called by the child between rumpclient_vfork and
 + *	rumpclient_exec with the fds created by
 + *	rumpclient_waitforexec_prepare.
 + */
 +void
 +rumpclient_waitforexec_child(int waitforexecfd[2])
 +{
 +
 +	/*
 +	 * Close the parent's fd -- we don't need it any more.
 +	 */
 +	(void)host_close(waitforexecfd[WAITFOREXECFD_PARENT]);
 +
 +	/*
 +	 * Record the fd we will use to notify the parent after exec,
 +	 * passed through the RUMPCLIENT__EXECFD environment variable.
 +	 *
 +	 * We are running as the single thread of a child process, but
 +	 * we still share address space with the parent, so it's really
 +	 * multithreaded.  waitforexecnotifyparentfd is a thread-local
 +	 * variable to obviate any need for serialization here.
 +	 */
 +	waitforexecnotifyparentfd = waitforexecfd[WAITFOREXECFD_CHILD];
 +}
 +
 +/*
 + * rumpclient_waitforexec_parent(waitforexecfd)
 + *
 + *	Called by the parent after rumpclient_vfork to wait for the
 + *	child to call rumpclient_exec, or exit.
 + */
 +void
 +rumpclient_waitforexec_parent(int waitforexecfd[2])
 +{
 +	char c;
 +
 +	/*
 +	 * Close the child's end for writing to us so we won't hang
 +	 * forever if the child exits without writing anything.  Next,
 +	 * read from our end to wait until the child has written or
 +	 * exited.
 +	 */
 +	(void)host_close(waitforexecfd[WAITFOREXECFD_CHILD]);
 +	(void)host_read(waitforexecfd[WAITFOREXECFD_PARENT], &c, 1);
 +
 +	/*
 +	 * Close our end now that we've read from it, and out of
 +	 * paranoia, clear it out of waitforexecnotifyparentfd.
 +	 *
 +	 * We must wait until _after_ the read to clear out
 +	 * waitforexecnotifyparentfd because the child shares address
 +	 * space until it execs.
 +	 */
 +	(void)host_close(waitforexecfd[WAITFOREXECFD_PARENT]);
 +	waitforexecnotifyparentfd = -1;
 +}
 +
 +/*
 + * waitforexec_notify(notifyparentfd)
 + *
 + *	Called by the child after exec.  Wakes the parent waiting in
 + *	rumpclient_waitforexec_parent, if any.
 + */
 +static void
 +waitforexec_notify(int notifyparentfd)
 +{
 +	char b = 0;
 +	struct iovec iov = { .iov_base = &b, .iov_len = 1 };
 +	struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1 };
 +
 +	/*
 +	 * If there's no notifyparentfd, because the child was created
 +	 * with fork rather than vfork, nothing to do.
 +	 */
 +	if (notifyparentfd == -1)
 +		return;
 +
 +	/*
 +	 * Parent is waiting for exec to complete.  Notify them that
 +	 * exec has completed -- but if the parent has died, don't
 +	 * raise SIGPIPE; just move on.  After this we have no more
 +	 * need of the connection, so close the fd.
 +	 */
 +	(void)host_sendmsg(notifyparentfd, &msg, MSG_NOSIGNAL);
 +	(void)host_close(notifyparentfd);
 +}
 +
  int
  rumpclient_fork_init(struct rumpclient_fork *rpf)
  {
 @@ -1148,7 +1285,7 @@ pid_t
  rumpclient_fork(void)
  {
  
 -	return rumpclient__dofork(fork);
 +	return rumpclient__dofork(fork, /*waitforexec*/0);
  }
  
  /*
 @@ -1166,8 +1303,8 @@ rumpclient_exec(const char *path, char *
  	size_t nelem;
  	int rv, sverrno;
  
 -	snprintf(buf, sizeof(buf), "RUMPCLIENT__EXECFD=%d,%d",
 -	    clispc.spc_fd, holyfd);
 +	snprintf(buf, sizeof(buf), "RUMPCLIENT__EXECFD=%d,%d,%d",
 +	    clispc.spc_fd, holyfd, waitforexecnotifyparentfd);
  	envstr = malloc(strlen(buf)+1);
  	if (envstr == NULL) {
  		return ENOMEM;
 diff -r cce289282f34 -r a51ca88c1069 lib/librumpclient/rumpclient.h
 --- a/lib/librumpclient/rumpclient.h	Mon Apr 07 01:54:02 2025 +0000
 +++ b/lib/librumpclient/rumpclient.h	Mon Apr 07 12:31:46 2025 +0000
 @@ -48,7 +48,7 @@ typedef RUMP_REGISTER_T register_t;
  
  struct rumpclient_fork;
  
 -#define rumpclient_vfork() rumpclient__dofork(vfork)
 +#define rumpclient_vfork() rumpclient__dofork(vfork, /*waitforexec*/1)
  
  #ifdef __BEGIN_DECLS
  __BEGIN_DECLS
 @@ -63,6 +63,10 @@ struct rumpclient_fork *rumpclient_prefo
  int			rumpclient_fork_init(struct rumpclient_fork *);
  void			rumpclient_fork_cancel(struct rumpclient_fork *);
  void			rumpclient_fork_vparent(struct rumpclient_fork *);
 +int			rumpclient_waitforexec_prepare(int[2]);
 +void			rumpclient_waitforexec_cancel(int[2]);
 +void			rumpclient_waitforexec_child(int[2]);
 +void			rumpclient_waitforexec_parent(int[2]);
  
  pid_t rumpclient_fork(void);
  int rumpclient_exec(const char *, char *const [], char *const[]);
 @@ -86,21 +90,31 @@ int rumpclient__closenotify(int *, enum
   * run in the caller's stackframe.
   */
  static __attribute__((__always_inline__)) __returns_twice inline pid_t
 -rumpclient__dofork(pid_t (*forkfn)(void))
 +rumpclient__dofork(pid_t (*forkfn)(void), int waitforexec)
  {
  	struct rumpclient_fork *rf;
  	pid_t pid;
  	int childran = 0;
 +	int waitforexecfd[2];
  
  	if (!(rf = rumpclient_prefork()))
  		return -1;
 -                
 +	if (waitforexec) {
 +		if (rumpclient_waitforexec_prepare(waitforexecfd) == -1) {
 +			rumpclient_fork_cancel(rf);
 +			return -1;
 +		}
 +	}
  	switch ((pid = forkfn())) {
  	case -1:
 +		if (waitforexec)
 +			rumpclient_waitforexec_cancel(waitforexecfd);
  		rumpclient_fork_cancel(rf);
  		break;
  	case 0:
  		childran = 1;
 +		if (waitforexec)
 +			rumpclient_waitforexec_child(waitforexecfd);
  		if (rumpclient_fork_init(rf) == -1)
  			pid = -1;
  		break;
 @@ -108,6 +122,8 @@ rumpclient__dofork(pid_t (*forkfn)(void)
  		/* XXX: multithreaded vforker?  do they exist? */
  		if (childran)
  			rumpclient_fork_vparent(rf);
 +		if (waitforexec)
 +			rumpclient_waitforexec_parent(waitforexecfd);
  		break;
  	}
  
 diff -r cce289282f34 -r a51ca88c1069 lib/librumpclient/shlib_version
 --- a/lib/librumpclient/shlib_version	Mon Apr 07 01:54:02 2025 +0000
 +++ b/lib/librumpclient/shlib_version	Mon Apr 07 12:31:46 2025 +0000
 @@ -1,4 +1,4 @@
  #	$NetBSD: shlib_version,v 1.1 2010/11/04 21:01:29 pooka Exp $
  #
  major=0
 -minor=0
 +minor=1

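A minimal sketch of the child-side half of the handshake in the attached patch, reduced to plain POSIX C: encode and parse a "spc_fd,holyfd,notifyfd" value like the one the patch stores in RUMPCLIENT__EXECFD, then wake the parent with a one-byte send() using MSG_NOSIGNAL. The helper names encode_execfd, parse_execfd and notify_parent are hypothetical. parse_execfd sets the notification descriptor to -1 before parsing, so a missing value, or a two-field value written by an older librumpclient, never leaves it uninitialized; like waitforexec_notify, notify_parent treats -1 as "nothing to do".

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Format "spc_fd,holyfd,notifyfd" into BUF; 0 on success, -1 if truncated. */
static int
encode_execfd(char *buf, size_t len, int spcfd, int holyfd, int notifyfd)
{
	int n = snprintf(buf, len, "%d,%d,%d", spcfd, holyfd, notifyfd);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Parse a value produced by encode_execfd.  *notifyfdp defaults to -1
 * (no parent waiting), so an absent value or an older two-field value
 * such as "5,6" is handled safely.  Returns 0 if at least the first
 * two fields were parsed, -1 otherwise.
 */
static int
parse_execfd(const char *s, int *spcfdp, int *holyfdp, int *notifyfdp)
{
	*notifyfdp = -1;
	if (s == NULL)
		return -1;
	return sscanf(s, "%d,%d,%d", spcfdp, holyfdp, notifyfdp) >= 2 ? 0 : -1;
}

/*
 * After exec: wake a parent blocked reading the other end of FD, then
 * close FD.  MSG_NOSIGNAL turns a vanished parent into an EPIPE error
 * rather than a SIGPIPE.  Returns 0 if FD is -1 or the byte was sent,
 * else -1 with errno as set by send().
 */
static int
notify_parent(int fd)
{
	char b = 0;
	ssize_t n;
	int sverrno;

	if (fd == -1)
		return 0;
	n = send(fd, &b, 1, MSG_NOSIGNAL);
	sverrno = errno;
	(void)close(fd);
	errno = sverrno;
	return n == 1 ? 0 : -1;
}
```

On the other side, the patch's rumpclient_waitforexec_parent closes its copy of the child's end and then blocks in host_read until this byte arrives or the child's end is closed by exit.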

From: Robert Elz <kre@munnari.OZ.AU>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: misc/59255: tests/lib/librumpclient/t_exec: intermittent failures
Date: Mon, 07 Apr 2025 22:56:03 +0700

     Date:        Mon,  7 Apr 2025 12:55:01 +0000 (UTC)
     From:        "Taylor R Campbell via gnats" <gnats-admin@NetBSD.org>
     Message-ID:  <20250407125501.54C371A923C@mollari.NetBSD.org>


   |  The sequence of events is something like this:

   |  9(b). vfork returns in parent, parent exits, test runs rump.sockstat

 As I recall it, 9(b) is a little more complicated, but those are just
 incidental details; in essence that's exactly it, and

   |  The test fails if 9(b) runs before 9(a) so rump.sockstat still shows
   |  the old p_comm rather than the new p_comm.

 Yes, that was my conclusion.   On most systems this is probably rare: the
 child is already using the CPU and would normally just keep on running,
 while the parent has been sleeping and needs to get itself scheduled.
 That's likely why you can't make it fail in local tests.   b5 is something
 of an unusual environment - I haven't attempted to look, but it could be
 that the probability of failure is higher when b5 is simultaneously
 doing several other parallel builds/test runs when the test is run, and
 much less likely to fail when it is (for b5) relatively idle (or even
 perhaps vice versa).

   |  We can ensure these are sequenced, preserving the non-rumpy vfork(2)
   |  semantics, by creating a pipe shared between parent and child.  The
   |  attached patch implements this.

 That's one way - what's needed is some way for the child to inform the
 parent that it has completed its task, and is ready for the script to
 test the results.   A pipe can achieve that, so could sending a (caught)
 signal from the child to the parent (which would not require any kind of
 detour via rump).   There are other, more heavyweight possibilities.
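A minimal sketch of the caught-signal alternative, again in plain POSIX C and not taken from any patch: the parent blocks SIGUSR1 and SIGCHLD before forking, the exec'd child (here a hypothetical /bin/sh script) runs kill -s USR1 "$PPID", and the parent collects the signal with sigwait(). Waiting for SIGCHLD as well keeps the parent from hanging if the child exits without signalling, the case a pipe handles for free via EOF. The helper names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* No-op handler: with a handler installed, a blocked SIGCHLD is kept pending. */
static void
noop_handler(int sig)
{
	(void)sig;
}

/* If SIG is pending, accept it with sigwait() and return 1; else 0. */
static int
consume_if_pending(int sig)
{
	sigset_t pend, one;
	int got;

	if (sigpending(&pend) == -1 || !sigismember(&pend, sig))
		return 0;
	sigemptyset(&one);
	sigaddset(&one, sig);
	return sigwait(&one, &got) == 0;
}

/*
 * Hypothetical helper: run /bin/sh -c SCRIPT and wait until the exec'd
 * child sends SIGUSR1 to its parent, or exits.  Returns 1 if signalled,
 * 0 if the child exited without signalling, -1 on error.
 */
static int
run_and_wait_for_signal(const char *script, int *statusp)
{
	struct sigaction sa, old_chld;
	sigset_t set, oldmask;
	pid_t pid;
	int sig, ret = -1;

	/* Block both signals *before* forking so neither can be lost. */
	sigemptyset(&set);
	sigaddset(&set, SIGUSR1);
	sigaddset(&set, SIGCHLD);
	if (sigprocmask(SIG_BLOCK, &set, &oldmask) == -1)
		return -1;
	sa.sa_handler = noop_handler;
	sigemptyset(&sa.sa_mask);
	sa.sa_flags = SA_NOCLDSTOP;
	(void)sigaction(SIGCHLD, &sa, &old_chld);

	switch ((pid = fork())) {
	case -1:
		break;
	case 0:
		/* Child: the shell should start with the caller's mask. */
		(void)sigprocmask(SIG_SETMASK, &oldmask, NULL);
		execl("/bin/sh", "sh", "-c", script, (char *)NULL);
		_exit(127);
	default:
		/*
		 * The child signals before it exits, so if SIGCHLD is
		 * returned first, SIGUSR1 may still be pending.
		 */
		if (sigwait(&set, &sig) == 0)
			ret = (sig == SIGUSR1 || consume_if_pending(SIGUSR1));
		while (waitpid(pid, statusp, 0) == -1 && errno == EINTR)
			continue;
		(void)consume_if_pending(SIGCHLD);	/* drain */
		break;
	}

	(void)sigaction(SIGCHLD, &old_chld, NULL);
	(void)sigprocmask(SIG_SETMASK, &oldmask, NULL);
	return ret;
}
```

Note that this may also swallow a SIGCHLD generated by some other child of the caller; a pipe has no such interaction with process-wide signal state.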

 But before doing any of that I think we really need to understand the
 purpose of the test.  If it is to test that sockstat can get owner info
 from sockets, that can be done with a much simpler test.   If it is to
 test that vfork works, that can also be done with a much simpler test
 (the one that is there now would, I believe, be satisfied by fork()
 instead, whereas we need vfork() to have vfork() properties and not just
 be fork()).  If it is to test that exec passes args that can be parsed,
 that can also be done with a much simpler test.
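A minimal sketch of one such much simpler check that tells vfork() apart from fork(), using the same idea as the childran variable in rumpclient__dofork: the child stores to a variable in the parent's stack frame and exits, and the parent inspects it afterwards. Under vfork() the store is visible because the child borrows the parent's address space; under fork() it is not. Strictly, POSIX leaves a vfork() child's store to anything other than the returned pid undefined, so this relies on the traditional BSD and Linux behaviour that rumpclient itself depends on. The helper name child_store_visible is hypothetical.

```c
#define _DEFAULT_SOURCE	/* glibc: vfork() is no longer in POSIX.1-2008 */
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Return 1 if a store made by the child is visible to the parent after
 * the child exits (vfork-like), 0 if it is not (fork-like), -1 on error.
 */
static int
child_store_visible(int use_vfork)
{
	volatile int childran = 0;	/* written only by the child */
	int status;
	pid_t pid;

	if (use_vfork)
		pid = vfork();
	else
		pid = fork();

	if (pid == -1)
		return -1;
	if (pid == 0) {
		/* Child: just a store and _exit(), nothing else. */
		childran = 1;
		_exit(0);
	}
	if (waitpid(pid, &status, 0) == -1)
		return -1;
	return childran;
}
```

child_store_visible(1) should return 1 and child_store_visible(0) should return 0; a vfork() that silently degraded to fork() would make the first call return 0.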

 I just cannot fathom what the test is actually testing.   Without knowing
 that, what ought to be done to it remains mysterious, which is why I just
 gave up on looking at it.

   |  That said, I'm not entirely sure that p_comm access is _guaranteed_ to
   |  be ready by the time a vforked execve(2) wakes the parent.

 Aside from the race condition above, I didn't look further, so that may
 indeed also be an issue.

   |  But it's not really that costly to add this additional logic to
   |  rumpclient to dispense with the question altogether; it's more for
   |  testing and experiments than performance.

 Yes, and while keeping the test runtime on b5 down to something reasonable
 (i.e., not adding anything not really necessary for a test) is a good thing,
 it's hard to see any changes here making any material difference.

 kre

Copyright © 1994-2026 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.