NetBSD Problem Report #48892

From martin@duskware.de  Wed Jun 11 08:22:59 2014
Return-Path: <martin@duskware.de>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (verified OK))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 5B09BA64F0
	for <gnats-bugs@gnats.NetBSD.org>; Wed, 11 Jun 2014 08:22:59 +0000 (UTC)
From: martin@NetBSD.org
Reply-To: martin@NetBSD.org
To: gnats-bugs@NetBSD.org
Subject: some tests will not clean up rump server processes
X-Send-Pr-Version: 3.95

>Number:         48892
>Notify-List:    riastradh@NetBSD.org
>Category:       bin
>Synopsis:       some tests will not clean up rump server processes
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    bin-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Jun 11 08:25:00 +0000 2014
>Closed-Date:    
>Last-Modified:  Sat Apr 26 02:20:03 +0000 2025
>Originator:     Martin Husemann
>Release:        NetBSD 6.99.43
>Organization:
The NetBSD Foundation, Inc.
>Environment:
System: NetBSD unpluged.duskware.de 6.99.43 NetBSD 6.99.43 (UNPLUGED) #5: Tue Jun 10 20:13:06 CEST 2014 martin@seven-days-to-the-wolves.aprisoft.de:/usr/src/sys/arch/evbarm/compile/UNPLUGED evbarm
Architecture: earm
Machine: evbarm
>Description:
Some tests, for example /usr/tests/fs/nfs/t_rquotad, create
background server processes via rump_server or similar. There seems to be
no proper atf cleanup path used in these tests (or it does not work).

Now consider this fragment of t_rquotad:

        #now try a quota(8) call
        export RUMPHIJACK='blanket=/mnt,socket=all,path=/rump,vfs=getvfsstat'
        for q in ${expect} ; do
                local id=$(id -${q})
                atf_check -s exit:0 \
-o "match:/mnt        0       10    40960               1      20   51200      $
-o "match:Disk quotas for .*: $" \
                    quota -${q} -v
        done

and note the cleanup code after it:

        unset LD_PRELOAD
        rump_quota_shutdown

Unfortunately this means that if the check fails (the -o match: disagrees with
the output) the test will abort and rump_quota_shutdown will not be invoked.

Worse: imagine you are using a (limited) tmpfs on /tmp. The leftover
rump_server (or rump_* mounts) will keep the filesystem image open, even
after atf has automatically removed the working directory. So your tmpfs
fills up and more tests fail...

>How-To-Repeat:
Currently a diskless system with / on nfs and /tmp on tmpfs is enough to
trigger bogus output from quota(1), so this will trigger the issue. Once
that is fixed: just modify the match expression to mismatch and run the
test, then pgrep for rump_server.
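
The pgrep step can be wrapped in a small helper for repeated runs. This is a minimal sketch, not part of the test suite: the function name `report_leftovers` is hypothetical, and it assumes a `pgrep` that supports `-x` (exact name match).

```sh
#!/bin/sh
# Hypothetical helper: report processes whose name matches exactly.
# Defaults to rump_server, the process this PR is about.
report_leftovers() {
	name=${1:-rump_server}
	if pids=$(pgrep -x "$name"); then
		# Word splitting joins the PIDs onto one line.
		echo "leftover $name:" $pids
		return 1
	fi
	echo "no leftover $name"
}
```

Calling `report_leftovers` after the modified test has run returns non-zero if a server survived.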

>Fix:
Use atf's explicit cleanup handling to shut down the rump servers.
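
Why a separate cleanup step helps can be shown with a plain sh analog. This is only a sketch of the idea, not the atf-sh API: `check_fail`, `run_case`, the two body functions, and the state file standing in for a running rump_server are all hypothetical. In atf-sh itself, the equivalent is to declare the test case with a cleanup routine, which is run even when the body fails.

```sh
#!/bin/sh
# Plain sh analog of a test body plus a separate cleanup step.
# All names here are hypothetical; this is not the atf-sh API.

# File standing in for "a rump_server is running".
state_file=${TMPDIR:-/tmp}/rump_demo_state.$$

# Stand-in for a failed atf_check: abort the body at once.
check_fail() {
	echo "check failed: $*" >&2
	exit 1
}

# Body in the style of the original test: shutdown is the last step.
body_inline_shutdown() {
	echo started > "$state_file"
	check_fail "quota output mismatch"
	echo stopped > "$state_file"	# never reached
}

# Body that leaves shutdown to the cleanup step.
body_only() {
	echo started > "$state_file"
	check_fail "quota output mismatch"
}

cleanup() {
	echo stopped > "$state_file"
}

# Run the body in a subshell, then the optional cleanup regardless of
# how the body ended. Returns the body's exit status.
run_case() {
	status=0
	( "$1" ) 2>/dev/null || status=$?
	if [ -n "${2:-}" ]; then
		( "$2" )
	fi
	return "$status"
}
```

After `run_case body_inline_shutdown` the state file still reads `started`; after `run_case body_only cleanup` it reads `stopped`, because the cleanup runs even though the body aborted.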

>Release-Note:

>Audit-Trail:
From: "Andreas Gustafsson" <gson@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/48892 CVS commit: src/tests/fs/nfs
Date: Thu, 20 Aug 2020 07:32:40 +0000

 Module Name:	src
 Committed By:	gson
 Date:		Thu Aug 20 07:32:40 UTC 2020

 Modified Files:
 	src/tests/fs/nfs: t_rquotad.sh

 Log Message:
 Add cleanup of possible leftover rump processes, replacing the
 non-working cleanup code just removed from ffs_common.sh.  Fixes
 PR bin/48892 with respect to the t_rquotad test.


 To generate a diff of this commit:
 cvs rdiff -u -r1.7 -r1.8 src/tests/fs/nfs/t_rquotad.sh

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: open->feedback
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Wed, 21 Jul 2021 05:03:30 +0000
State-Changed-Why:
is this fixed?


From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: bin/48892 (some tests will not clean up rump server processes)
Date: Wed, 21 Jul 2021 10:22:35 +0200

 Not sure how to describe this properly - for the concrete test mentioned
 in the PR: yes.

 But we have disabled *lots* of tests in the main test runs because they
 leave rump_server processes around (and there are likely a few more PRs
 for some individual cases).

 Not sure it is worth to have this blanket PR open or how to systematically
 collect all issues best.

 Martin

State-Changed-From-To: feedback->open
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Sat, 24 Jul 2021 03:27:22 +0000
State-Changed-Why:
feed is back


From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: bin/48892 (some tests will not clean up rump server processes)
Date: Sat, 24 Jul 2021 03:27:04 +0000

 On Wed, Jul 21, 2021 at 08:25:01AM +0000, Martin Husemann wrote:
  >  Not sure how to describe this properly - for the concrete test mentioned
  >  in the PR: yes.
  >  
  >  But we have disabled *lots* of tests in the main test runs because they
  >  leave rump_server processes around (and there are likely a few more PRs
  >  for some individual cases).
  >  
  >  Not sure it is worth to have this blanket PR open or how to systematically
  >  collect all issues best.

 Bleh. Ok, I'm going to suggest the following: Someone(TM) gather a
 list of tests that are explicitly disabled, and file them in a new PR.
 We'll leave this one open until that eventually happens.

 We could also file a PR on atf noting that it ought to attend to this
 automatically (given that a good chunk of its justification for
 existence is that it "sandboxes" and "cleans up after" tests) but I
 doubt that will accomplish anything. :-|

 -- 
 David A. Holland
 dholland@netbsd.org

From: Andreas Gustafsson <gson@gson.org>
To: dholland-bugs@netbsd.org, martin@NetBSD.org
Cc: gnats-bugs@netbsd.org
Subject: Re: bin/48892 (some tests will not clean up rump server processes)
Date: Sat, 24 Jul 2021 16:11:09 +0300

 David Holland wrote:
 >  Bleh. Ok, I'm going to suggest the following: Someone(TM) gather a
 >  list of tests that are explicitly disabled, and file them in a new PR.
 >  We'll leave this one open until that eventually happens.

 Gathering a list is fine, but I don't see what would be accomplished
 by replacing the present PR with a new one reporting the same problem,
 other than losing history showing how past instances of the problem
 were fixed.

 As for the list, I could only find three test cases skipped due to
 leftover rump_server processes, all in the same test:

   ./rump/rumpkern/t_sp.sh:test_case_skip stress_long kern/50350 "leftover rump_server"
   ./rump/rumpkern/t_sp.sh:test_case_skip stress_killer kern/55356 "leftover rump_server"
   ./rump/rumpkern/t_sp.sh:test_case_skip reconnect kern/55304 "leftover rump_server"

 Perhaps martin can find more?
 -- 
 Andreas Gustafsson, gson@gson.org

From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: bin/48892 (some tests will not clean up rump server processes)
Date: Sun, 25 Jul 2021 01:18:03 +0000

 On Sat, Jul 24, 2021 at 01:15:02PM +0000, Andreas Gustafsson wrote:
  >  Gathering a list is fine, but I don't see what would be accomplished
  >  by replacing the present PR with a new one reporting the same problem,
  >  other than losing history showing how past instances of the problem
  >  were fixed.

 Maybe none, but sometimes it's helpful to not have the first chunk of
 the PR make it look like the problem's been fixed.

  >  As for the list, I could only find three test cases skipped due to
  >  leftover rump_server processes, all in the same test:
  >  
  >    ./rump/rumpkern/t_sp.sh:test_case_skip stress_long kern/50350 "leftover rump_server"
  >    ./rump/rumpkern/t_sp.sh:test_case_skip stress_killer kern/55356 "leftover rump_server"
  >    ./rump/rumpkern/t_sp.sh:test_case_skip reconnect kern/55304 "leftover rump_server"

 Thanks.

 I don't understand how any of these scripts manage to work, but
 hopefully someone does.

 -- 
 David A. Holland
 dholland@netbsd.org

From: Taylor R Campbell <riastradh@NetBSD.org>
To: martin@NetBSD.org
Cc: gson@NetBSD.org, dholland-bugs@NetBSD.org, uwe@NetBSD.org
Subject: Re: bin/48892: some tests will not clean up rump server processes
Date: Sat, 26 Apr 2025 02:16:07 +0000


 The attached patch teaches rump_server to respect an environment
 variable RUMPDAEMON_KEEPSESSION so that it does not do setsid (just
 setting the variable is enough, any value will do).

 That way, when atf kills the test process's process group (which it
 already does), it will kill any rump_server processes spawned by the
 test.

 I haven't patched all of the several hundred uses of rump_server
 throughout src/tests to set it.  But, until that is done, you could
 test this patch by running the tests with

 RUMPDAEMON_KEEPSESSION= atf-run

 and see if the troublesome stress-killers still leave rump_servers
 around.
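
Until the individual tests are updated, one way to set the variable per invocation is a small wrapper. This is a sketch under assumptions: it presumes the attached patch is applied, `start_rump_server` is a hypothetical name, and `RUMP_SERVER_BIN` is an override invented here for illustration.

```sh
#!/bin/sh
# Hypothetical wrapper: start rump_server with RUMPDAEMON_KEEPSESSION
# defined, so that (with the patch applied) it skips setsid and stays
# in the caller's process group.
# RUMP_SERVER_BIN is an assumed override, not an existing variable.
start_rump_server() {
	env RUMPDAEMON_KEEPSESSION=1 "${RUMP_SERVER_BIN:-rump_server}" "$@"
}
```

A test would then call, for example, `start_rump_server unix://commsock` where it currently calls `rump_server` directly; the URL is only an example.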

 Another alternative -- kludgier but perhaps more effective since it
 covers more than just rump_server -- would be to LD_PRELOAD a library
 that overrides:

 #include <unistd.h>
 pid_t setsid(void) { return getpid(); }


 # HG changeset patch
 # User Taylor R Campbell <riastradh@NetBSD.org>
 # Date 1745629371 0
 #      Sat Apr 26 01:02:51 2025 +0000
 # Branch trunk
 # Node ID 0651c236b598cad1b91312f3fce68600bdec9095
 # Parent  fa66a8de28195fb9d8985a1c997f6a4b3fd8da4d
 # EXP-Topic riastradh-pr48892-atfservers
 rump: New environment variable RUMPDAEMON_KEEPSESSION.

 If defined, the server will remain in the same session and process
 group as the caller when it daemonizes.

 This way, we can define it during test runs so that all the
 rump_server processes are in the same process group as the atf test
 itself -- and so even if the cleanups fail, when atf kills the
 process group with killpg, the servers should terminate more
 reliably

 PR bin/48892: some tests will not clean up rump server processes

 diff -r fa66a8de2819 -r 0651c236b598 lib/librumpuser/rumpuser_daemonize.c
 --- a/lib/librumpuser/rumpuser_daemonize.c	Thu Apr 24 18:37:59 2025 +0000
 +++ b/lib/librumpuser/rumpuser_daemonize.c	Sat Apr 26 01:02:51 2025 +0000
 @@ -112,7 +112,8 @@ rumpuser_daemonize_begin(void)
  
  	switch (fork()) {
  	case 0:
 -		if (setsid() == -1) {
 +		if (getenv("RUMPDAEMON_KEEPSESSION") == NULL &&
 +		    _setsid() == -1) {
  			rumpuser_daemonize_done(errno);
  		}
  		rv = 0;
 diff -r fa66a8de2819 -r 0651c236b598 usr.bin/rump_allserver/rump_allserver.1
 --- a/usr.bin/rump_allserver/rump_allserver.1	Thu Apr 24 18:37:59 2025 +0000
 +++ b/usr.bin/rump_allserver/rump_allserver.1	Sat Apr 26 01:02:51 2025 +0000
 @@ -211,6 +211,19 @@ After use,
  .Nm
  can be made to exit using
  .Xr rump.halt 1 .
 +.Sh ENVIRONMENT
 +The following environment variables affect
 +.Nm rump_server
 +and
 +.Nm rump_allserver :
 +.Bl -tag -width Ev
 +.It Ev RUMPDAEMON_KEEPSESSION
 +If defined, the server will remain in the same session and process
 +group as the caller when it daemonizes.
 +By default, when the server daemonizes it will enter a new session and
 +process group as with
 +.Xr setsid 2 .
 +.El
  .Sh EXAMPLES
  Start a server and load the tmpfs file system module, and halt the
  server immediately afterwards:


>Unformatted:

Copyright © 1994-2025 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.