NetBSD Problem Report #59081

From www@netbsd.org  Sun Feb 16 20:22:37 2025
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	 key-exchange X25519 server-signature RSA-PSS (2048 bits)
	 client-signature RSA-PSS (2048 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id B0F6F1A923D
	for <gnats-bugs@gnats.NetBSD.org>; Sun, 16 Feb 2025 20:22:37 +0000 (UTC)
Message-Id: <20250216202236.780351A923E@mollari.NetBSD.org>
Date: Sun, 16 Feb 2025 20:22:36 +0000 (UTC)
From: rbranco@suse.de
Reply-To: rbranco@suse.de
To: gnats-bugs@NetBSD.org
Subject: Add close_range() system call
X-Send-Pr-Version: www-1.0

>Number:         59081
>Category:       kern
>Synopsis:       Add close_range() system call
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    kern-bug-people
>State:          open
>Class:          change-request
>Submitter-Id:   net
>Arrival-Date:   Sun Feb 16 20:25:00 +0000 2025
>Last-Modified:  Sat Jul 19 19:25:01 +0000 2025
>Originator:     Ricardo Branco
>Release:        
>Organization:
>Environment:
NetBSD netbsdx.fritz.box 10.99.12 NetBSD 10.99.12 (CUSTOM) amd64

>Description:
Add close_range() system call

Adapt the existing close_range code in compat_linux to use the new native system call.

An existing test case for closefrom(3) was adapted and extended to cover close_range(2).
>How-To-Repeat:

>Fix:
https://github.com/NetBSD/src/pull/43

>Audit-Trail:
From: "David H. Gutteridge" <david@gutteridge.ca>
To: Gnats Bugs <gnats-bugs@netbsd.org>
Cc: 
Subject: Re: kern/59081: Add close_range() system call
Date: Tue, 25 Mar 2025 17:41:35 -0400

 It would be nice to have this available natively, for sure. I was asked
 by an upstream project why NetBSD didn't have this.

 Thanks,

 Dave

From: =?UTF-8?Q?J=C3=B6rg_Sonnenberger?= <joerg@bec.de>
To: gnats-bugs@netbsd.org, kern-bug-people@netbsd.org,
 gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, rbranco@suse.de
Cc: 
Subject: Re: kern/59081: Add close_range() system call
Date: Wed, 26 Mar 2025 16:46:37 +0100

 On 3/25/25 10:45 PM, David H. Gutteridge via gnats wrote:
 > The following reply was made to PR kern/59081; it has been noted by GNATS.
 > 
 > From: "David H. Gutteridge" <david@gutteridge.ca>
 > To: Gnats Bugs <gnats-bugs@netbsd.org>
 > Cc:
 > Subject: Re: kern/59081: Add close_range() system call
 > Date: Tue, 25 Mar 2025 17:41:35 -0400
 > 
 >   It would be nice to have this available natively, for sure. I was asked
 >   by an upstream project why NetBSD didn't have this.

 I've never seen a use case that closefrom(3) doesn't cover.

 Joerg

From: Ricardo Branco <rbranco@suse.de>
To: =?UTF-8?Q?J=C3=B6rg_Sonnenberger?= <joerg@bec.de>, gnats-bugs@netbsd.org,
 kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Cc: 
Subject: Re: kern/59081: Add close_range() system call
Date: Wed, 26 Mar 2025 17:00:42 +0100

 closefrom doesn't handle CLOSE_RANGE_CLOEXEC.

 I see close_range being used by container projects.


 Best,

 Ricardo

 On 3/26/25 4:46 PM, Jörg Sonnenberger wrote:
 >
 >
 > On 3/25/25 10:45 PM, David H. Gutteridge via gnats wrote:
 >> The following reply was made to PR kern/59081; it has been noted by 
 >> GNATS.
 >>
 >> From: "David H. Gutteridge" <david@gutteridge.ca>
 >> To: Gnats Bugs <gnats-bugs@netbsd.org>
 >> Cc:
 >> Subject: Re: kern/59081: Add close_range() system call
 >> Date: Tue, 25 Mar 2025 17:41:35 -0400
 >>
 >>   It would be nice to have this available natively, for sure. I was 
 >> asked
 >>   by an upstream project why NetBSD didn't have this.
 >
 > I've never seen a use case that closefrom(3) doesn't cover.
 >
 > Joerg

From: Thomas Klausner <wiz@NetBSD.org>
To: NetBSD bugtracking <gnats-bugs@NetBSD.org>
Cc: 
Subject: Re: kern/59081: Add close_range() system call
Date: Wed, 26 Mar 2025 17:49:16 +0100

 On Wed, Mar 26, 2025 at 04:05:01PM +0000, Ricardo Branco via gnats wrote:
 >  closefrom doesn't handle CLOSE_RANGE_CLOEXEC.

 Is that the same as:

 int fd;
 for (fd=start; fd<end; fd++) {
     if (fcntl(fd, F_SETFD, FD_CLOEXEC) == -1) {
         /* TODO: error handling */
     }
 }

 ?
  Thomas
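
Roughly, but not exactly. A hedged sketch of a closer userland approximation
of CLOSE_RANGE_CLOEXEC follows; set_cloexec_range() is a hypothetical helper,
not part of any patch in this PR. It differs from the loop above in three
ways: the range is inclusive, as in close_range(first, last, flags) on Linux
and FreeBSD; descriptors that are not open are skipped instead of being
treated as errors; and the existing descriptor flags are read with F_GETFD
and preserved. Unlike a system call, it still costs up to two fcntl() calls
per descriptor number and is not atomic with respect to other threads.

```c
#include <errno.h>
#include <fcntl.h>

/*
 * Hypothetical helper, for illustration only: set FD_CLOEXEC on every
 * open descriptor in the inclusive range [first, last].
 * Returns 0 on success, -1 with errno set on failure.
 */
static int
set_cloexec_range(int first, int last)
{
	int fd, flags;

	if (first < 0 || first > last) {
		errno = EINVAL;
		return -1;
	}
	for (fd = first; ; fd++) {
		flags = fcntl(fd, F_GETFD);
		if (flags == -1) {
			/* EBADF means "not open": skip it, as close_range does. */
			if (errno != EBADF)
				return -1;
		} else if (fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == -1) {
			return -1;
		}
		if (fd == last)		/* test here so fd++ cannot overflow */
			break;
	}
	return 0;
}
```

A caller would typically clamp last to the process's descriptor limit before
calling this, since walking up to INT_MAX one number at a time is exactly the
cost that a kernel-side implementation avoids.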

From: Christos Zoulas <christos@zoulas.com>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
 netbsd-bugs@netbsd.org, rbranco@suse.de
Subject: Re: kern/59081: Add close_range() system call
Date: Wed, 26 Mar 2025 13:00:18 -0400

 FreeBSD and Linux have it. This says python wants it: 
 https://reviews.freebsd.org/D21627

 christos

From: Christos Zoulas <christos@zoulas.com>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org,
 gnats-admin@netbsd.org,
 netbsd-bugs@netbsd.org,
 rbranco@suse.de
Subject: Re: kern/59081: Add close_range() system call
Date: Thu, 27 Mar 2025 17:20:09 -0400

 Here is a complete patch: https://www.netbsd.org/~christos/close_range.diff

 christos


From: Ricardo Branco <rbranco@suse.de>
To: Christos Zoulas <christos@zoulas.com>, gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/59081: Add close_range() system call
Date: Sun, 30 Mar 2025 21:57:18 +0200

 Looks good to me. The only thing missing is the test for 
 CLOSE_RANGE_UNSHARE.

 On 3/27/25 10:20 PM, Christos Zoulas wrote:
 > Here is a complete patch: https://www.netbsd.org/~christos/close_range.diff
 >
 > christos

From: Taylor R Campbell <riastradh@NetBSD.org>
To: =?UTF-8?Q?J=C3=B6rg_Sonnenberger?= <joerg@bec.de>
Cc: gnats-bugs@NetBSD.org, netbsd-bugs@NetBSD.org,
	Ricardo Branco <rbranco@suse.de>,
	"David H. Gutteridge" <david@gutteridge.ca>,
	Christos Zoulas <christos@zoulas.com>
Subject: Re: kern/59081: Add close_range() system call
Date: Sun, 30 Mar 2025 21:41:05 +0000

 > Date: Wed, 26 Mar 2025 16:46:37 +0100
 > From: Jörg Sonnenberger <joerg@bec.de>
 >
 > > It would be nice to have this available natively, for sure. I was asked
 > > by an upstream project why NetBSD didn't have this.
 >
 > I've never seen a use case that closefrom(3) doesn't cover.

 I wish the motivation were more clearly spelled out.  My best guess is
 the following:

 Suppose you want to create a process with a specific fd mapping.  It
 is not necessarily contiguous: for example, with librumphijack, we
 deliberately use two separate ranges of file descriptors, one for
 `host' fds (e.g., the socket to talk to the rump server) and one for
 `rump' fds (interpreted by the rump server); these are separated by a
 large number to reduce the chance of collision.

 So, the fd mapping might look like this:

 parent                 child
 ------                 -----
 0 (stdin)              0 (stdin)
 3 (output file)        1 (stdout)
 3 (output file)        2 (stderr)
 4 (rump socket)        65536

 This shape of mapping is, really, the right interface for a program
 running a subprocess, and I was always disappointed that
 posix_spawn(2) had a sequence of open/dup2/close actions instead of
 such a mapping.

 How do you effect this mapping?

 With closefrom(2), you might do something like this:

 	bitmap_t keepopen = {0}
 	int maxfd = -1
 	for (entry in map) {
 		bitmap_set(&keepopen, entry.child)
 		/* Track the highest kept fd, even if it needs no dup2. */
 		maxfd = MAX(maxfd, entry.child)
 		if (entry.child == entry.parent)
 			continue
 		/* If target entry.child is needed as a source, dup. */
 		for (entry1 in map) {
 			if (entry.child == entry1.parent)
 				entry1.parent = dup(entry1.parent)
 		}
 		dup2(entry.parent, entry.child)
 	}
 	for (fd = 0; fd < maxfd; fd++) {
 		if (!bitmap_isset(&keepopen, fd))
 			close(fd)
 	}
 	closefrom(maxfd + 1)

 With close_range(2), you can instead do:

 	close_range(0, UINT_MAX, CLOSE_RANGE_CLOEXEC)
 	for (entry in map) {
 		if (entry.child != entry.parent) {
 			/* If target entry.child is needed as a source, dup. */
 			for (entry1 in map) {
 				if (entry.child == entry1.parent)
 					entry1.parent = dup_cloexec(entry.child)
 			}
 			dup2(entry.parent, entry.child)
 		}
 		/* Clear FD_CLOEXEC, i.e., keep it open on exec. */
 		fcntl(entry.child, F_SETFD,
 		    fcntl(entry.child, F_GETFD) & ~FD_CLOEXEC)
 	}

 (The inner loop could be eliminated, of course, by first indexing the
 parent sources in linear time and then updating a parent->replacement
 map as we go so the whole thing runs in linear rather than quadratic
 time and never dups the same source repeatedly.  But this is the same
 for both algorithms; it doesn't distinguish closefrom(2) from
 close_range(2).)

 Here's an example of the second algorithm in the real world:

 https://github.com/GNOME/vte/blob/b23aaaeeca588439d4579f4ed06c1f4850219fc5/src/spawn.cc#L380-L385
 https://github.com/GNOME/vte/blob/b23aaaeeca588439d4579f4ed06c1f4850219fc5/src/spawn.cc#L437-L505

 One advantage of the second algorithm with close_range(2) is that it
 doesn't require computing any auxiliary data structure for a
 (potentially sparse) bit map in userland, and doesn't require userland
 to iterate over a (potentially large and sparse) range of file
 descriptors below the first one to closefrom(2).

 One advantage of the first algorithm with closefrom(2) is that it makes
 only one traversal over the whole fd table (userland loop + closefrom), while
 the second algorithm with close_range(2) has two -- close_range(2)
 traverses it once to set CLOEXEC, and then in the subsequent exec, the
 kernel traverses it once more to interpret CLOEXEC.  Maybe the kernel
 traversal is cheaper so that doesn't matter.

 So, it's not a priori clear to me that one algorithm wins over the
 other in performance with large fd tables.  But close_range(2) is a
 little more convenient for implementing the interface that is really
 useful.
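
A minimal C sketch of the close_range(2) variant described above, under
stated assumptions: struct fdmap, cloexec_all() and apply_fdmap() are
hypothetical names introduced here, child descriptor numbers in the map are
assumed to be distinct, and the function is meant to run in the child between
fork and exec. Where close_range with CLOSE_RANGE_CLOEXEC is unavailable or
fails, the sketch falls back to an fcntl() loop that is capped at 65536
descriptors purely to keep the example bounded.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* close_range() on glibc >= 2.34 */
#endif
#include <fcntl.h>
#include <limits.h>
#include <stddef.h>
#include <unistd.h>

struct fdmap {			/* hypothetical type, for illustration */
	int parent;		/* descriptor in the calling process */
	int child;		/* descriptor number wanted after exec */
};

/* Mark every open descriptor close-on-exec. */
static int
cloexec_all(void)
{
#ifdef CLOSE_RANGE_CLOEXEC
	if (close_range(0, UINT_MAX, CLOSE_RANGE_CLOEXEC) == 0)
		return 0;
	/* Not supported at run time: fall through to the loop. */
#endif
	long max = sysconf(_SC_OPEN_MAX);
	if (max < 0 || max > 65536)
		max = 65536;	/* assumption: cap chosen for this sketch only */
	for (int fd = 0; fd < max; fd++) {
		int fl = fcntl(fd, F_GETFD);
		if (fl != -1 && fcntl(fd, F_SETFD, fl | FD_CLOEXEC) == -1)
			return -1;
	}
	return 0;
}

/* Rearrange descriptors so that only the mapped ones survive exec. */
static int
apply_fdmap(struct fdmap *map, size_t n)
{
	if (cloexec_all() == -1)
		return -1;
	for (size_t i = 0; i < n; i++) {
		int fl;

		if (map[i].child != map[i].parent) {
			int moved = -1;

			/* If the target is still needed as a source, move it. */
			for (size_t j = i + 1; j < n; j++) {
				if (map[j].parent != map[i].child)
					continue;
				if (moved == -1 && (moved = fcntl(map[i].child,
				    F_DUPFD_CLOEXEC, 0)) == -1)
					return -1;
				map[j].parent = moved;
			}
			if (dup2(map[i].parent, map[i].child) == -1)
				return -1;
		}
		/* Clear FD_CLOEXEC, i.e., keep it open on exec. */
		fl = fcntl(map[i].child, F_GETFD);
		if (fl == -1 ||
		    fcntl(map[i].child, F_SETFD, fl & ~FD_CLOEXEC) == -1)
			return -1;
	}
	return 0;
}
```

Temporary duplicates are created with F_DUPFD_CLOEXEC, so they are closed by
the exec along with everything else that is not in the map; no explicit
cleanup pass over the descriptor table is needed.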

From: Ricardo Branco <rbranco@suse.de>
To: Christos Zoulas <christos@zoulas.com>, gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/59081: Add close_range() system call
Date: Sat, 26 Apr 2025 19:42:35 +0200

 Can we merge?


 Best,

 R

 On 3/27/25 10:20 PM, Christos Zoulas wrote:
 > Here is a complete patch: https://www.netbsd.org/~christos/close_range.diff
 >
 > christos

From: Ricardo Branco <rbranco@suse.de>
To: gnats-bugs@netbsd.org, kern-bug-people@netbsd.org,
 gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
 Christos Zoulas <christos@zoulas.com>
Cc: 
Subject: Re: kern/59081: Add close_range() system call
Date: Sun, 13 Jul 2025 21:49:23 +0200

 On 3/27/25 10:25 PM, Christos Zoulas via gnats wrote:
 > The following reply was made to PR kern/59081; it has been noted by GNATS.
 >
 > From: Christos Zoulas <christos@zoulas.com>
 > To: gnats-bugs@netbsd.org
 > Cc: kern-bug-people@netbsd.org,
 >   gnats-admin@netbsd.org,
 >   netbsd-bugs@netbsd.org,
 >   rbranco@suse.de
 > Subject: Re: kern/59081: Add close_range() system call
 > Date: Thu, 27 Mar 2025 17:20:09 -0400
 >
 >   Here is a complete patch: https://www.netbsd.org/~christos/close_range.diff
 >   
 >   christos

 The whole approach to CLOSE_RANGE_UNSHARE is broken, which means that
 the current behaviour of the Linux emulation is broken too.

 It doesn't respect the fd range and unshares the whole filedesc structure:

 https://github.com/NetBSD/src/blob/trunk/sys/compat/linux/common/linux_misc.c#L2090

 The diff in https://www.netbsd.org/~christos/close_range.diff is doing 
 the same.

 Linux limits CLOSE_RANGE_UNSHARE to the specified range:

 https://github.com/torvalds/linux/blob/master/fs/file.c#L788

 So either we consider my patch in its original form or drop this attempt to
 extend CLOSE_RANGE_UNSHARE to NetBSD.

 Either way, the current Linux code needs fixing, which my patch does.

 Best,
 Ricardo.

From: Ricardo Branco <rbranco@suse.de>
To: Christos Zoulas <christos@zoulas.com>, gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/59081: Add close_range() system call
Date: Sat, 19 Jul 2025 10:51:42 +0200

 On 3/27/25 10:20 PM, Christos Zoulas wrote:
 > Here is a complete patch: https://www.netbsd.org/~christos/close_range.diff
 >
 > christos

 I updated the PR to add support for CLOSE_RANGE_CLOFORK just as FreeBSD 
 & Illumos implement it.

 Best,

From: Ricardo Branco <rbranco@suse.de>
To: Christos Zoulas <christos@zoulas.com>, gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/59081: Add close_range() system call
Date: Sat, 19 Jul 2025 21:24:26 +0200

 On 3/27/25 10:20 PM, Christos Zoulas wrote:
 > Here is a complete patch: https://www.netbsd.org/~christos/close_range.diff
 >
 > christos

 I uploaded a fcntl version (no syscall) here:
 https://github.com/NetBSD/src/pull/56

 It doesn't touch the Linux code.  I'll leave that to another PR.

 Best,
 Ricardo
