NetBSD Problem Report #49017
From christos@astron.com Fri Jul 18 17:07:54 2014
Return-Path: <christos@astron.com>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (verified OK))
by mollari.NetBSD.org (Postfix) with ESMTPS id DB6A0A5672
for <gnats-bugs@gnats.NetBSD.org>; Fri, 18 Jul 2014 17:07:53 +0000 (UTC)
Message-Id: <20140718155148.475D614B68@quasar.astron.com>
Date: Fri, 18 Jul 2014 15:51:48 +0000 (UTC)
From: christos@netbsd.org
Reply-To: christos@netbsd.org
To: gnats-bugs@NetBSD.org
Subject: vfork does not suspend all threads
X-Send-Pr-Version: 3.95
>Number: 49017
>Category: kern
>Synopsis: vfork does not suspend all threads
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Fri Jul 18 17:10:00 +0000 2014
>Last-Modified: Mon Apr 10 00:40:00 +0000 2017
>Originator: Christos Zoulas
>Release: NetBSD 6.99.47
>Organization:
You've been vforked!
>Environment:
System: NetBSD quasar.astron.com 6.99.47 NetBSD 6.99.47 (QUASAR) #2: Fri Jul 18 08:23:06 EDT 2014 christos@quasar.astron.com:/usr/src/sys/arch/amd64/compile/QUASAR amd64
Architecture: x86_64
Machine: amd64
>Description:
vfork is supposed to suspend the parent while the child is preparing for
exec. The parent is resumed after the child execs or exits. This is done
so that the memory image shared between the parent and the child is not
changed by the parent while the child is preparing to exec. As the synopsis
says, in a threaded parent not all threads are suspended.
>How-To-Repeat:
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <err.h>

#define NTHREAD 2

/* Print a heartbeat once a second so it is visible whether the thread runs. */
void *
worker(void *arg) {
	size_t *i = arg;
	for (size_t j = 0;; j++) {
		printf("[%d] %zu %zu\n", getpid(), *i, j);
		sleep(1);
	}
}

int
main(int argc, char *argv[])
{
	pthread_t t[NTHREAD];
	size_t id[NTHREAD];	/* stable per-thread index; &i would race with the loop */

	for (size_t i = 0; i < NTHREAD; i++) {
		id[i] = i;
		pthread_create(&t[i], NULL, worker, &id[i]);
	}
	switch (vfork()) {
	case -1:
		err(1, "vfork");
	case 0:
		/* FALLTHROUGH: the child neither execs nor exits */
	default:
		sleep(100000);
		break;
	}
	return 0;
}
>Fix:
>Audit-Trail:
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/49017: vfork does not suspend all threads
Date: Fri, 18 Jul 2014 19:38:00 +0200
Isn't calling vfork in a threaded program just a no-no?
Martin
From: Nico Williams <Nico.Williams@twosigma.com>
To: <gnats-bugs@netbsd.org>
Cc:
Subject: Re: kern/49017: vfork does not suspend all threads
Date: Wed, 5 Apr 2017 18:42:25 +0000
Please do NOT stop any threads in the vfork() parent other than the one that
called vfork(). Also, allow me to make an argument for this.
First, I suppose I should look at rationales for stopping all threads in a
vfork() parent. I can think of two (but if I'm missing some, let me know!): a)
that the man page always said that the parent process is stopped, ergo it must
now mean "all threads in the parent process", and b) that the set of safe
functions to call in the vfork() child is made unacceptably smaller by not
stopping all threads in the parent.
Before refuting my strawman rationales for stopping all threads, I'll explain
why stopping all threads is highly undesirable: it kills performance, the very
reason for vfork()'s existence.
There are several ways to use vfork() to spawn children in a high-performance
way:
- First, obviously, in posix_spawn().
It would be terrible to have to stop all of a JVM's many threads just to
spawn a child, and would negate some of vfork()'s massive performance
advantage over fork().
Why should unrelated threads in the parent suffer? (This gets to the safety
issues which I posit might motivate stopping all parent threads, and which I
address below.) Even if there were a strong safety argument for this, we
should aim to make it go away as the performance rationale for using vfork()
is extremely important in real life cases.
(I should point out that, for example, Linux's vfork() does not stop all
other threads in the parent. I can provide a test program that demonstrates
this.)
- Second, one can implement a very fast popen()-like API that uses a threaded
taskq where threads pre-vfork(), enabling a program to spawn processes
faster than with posix_spawn(): without blocking for the child to spin up
then execve()-or-_exit() -- the threads that pre-call vfork() block that
way, but the threads that dispatch the requests to the pre-vfork()ed
children do not block at all, they only call write(2) to write the job to a
pipe to the child.
See my gist about this where I describe this in detail and propose a new
function with this signature:
pid_t avfork(int (*start_routine)(void *), void *arg);
and provide a partial implementation based on a pre-vforking threaded taskq:
https://gist.github.com/nicowilliams/a8a07b0fc75df05f684c23c18d7db234
There is such a very fast popen()-like implementation here:
https://github.com/famzah/popen-noshell (warning GPLv3)
that uses clone() on Linux to get something very much like the avfork() that
I argue for. Its author needs to be able to spawn thousands of processes
very quickly sometimes (see
https://github.com/famzah/popen-noshell/issues/11#issuecomment-287235234).
Now, to knock down my strawman rationales for stopping all threads in the
vfork() parent:
- Regarding (a), pre-threads vfork() man page text saying "stops the parent
process" should not be interpreted as meaning "all threads" now that we have
a threaded world. Clearly the original authors could not have meant that,
nor for that matter would they have meant that only the thread that called
vfork() in the parent must be stopped. We must decide this matter de novo.
Clearly the thread that called vfork() must be stopped until the child
execve()s or _exit()s. That much is utterly clear: because two schedulable
threads/entities simply cannot share a stack concurrently. So we only need
to decide whether other threads in the parent must also be stopped, and the
original man page text simply can't guide us as to that as it predates
threads.
- Regarding (b), it may already be the case that the set of functions that may
safely be called in the vfork() child is somewhat smaller than the set of
functions that may be called in a fork() child. Since POSIX has deprecated
vfork(), we don't know what that set is (though we can inspect earlier POSIX
standards) and may now define it to our liking.
In any case, the set of async-signal-safe functions defined by POSIX looks
like it should be safe to call in a vfork() child on any reasonable OS since
all of them should be system calls that do not affect the shared address
space (or anything else that might still be shared between the child and the
parent): http://pubs.opengroup.org/onlinepubs/009695399/functions/xsh_chap02_04.html
As an aside, the child probably also does not want to change FD
or FL flags with fcntl() for file descriptors shared with the parent. And
it should also not use the horrible POSIX file locking, though that's mostly
because nothing should use the horrible POSIX file locking! This aside
brought to you by intense feelings of disgust elicited by POSIX file locking.
Note that there are a number of functions NOT INCLUDED in the standard list
of async-signal-safe functions:
- pthread_*()
- brk(), sbrk(), mmap(), munmap(), mprotect()
- the heap allocator (quite naturally, since it might need to call
brk()/sbrk() and/or mmap()/munmap(), or pthread_*() functions, none of
which are async-signal-safe)
which means that the scariest functions one might call on the child-side of
vfork() are by definition (e.g., the old POSIX vfork() specification)
already not safe to call on the child-side of vfork().
In any case, again, NetBSD is free to further narrow the set of functions
that are safe to call in the child-side of vfork() should that be necessary.
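To illustrate the point about async-signal-safe calls, here is a minimal
sketch (not from the thread) of a vfork() child that restricts itself to
execve() and _exit(), both of which are on the POSIX async-signal-safe list.
The helper name spawn_wait() is hypothetical, and the sketch assumes the
caller only wants the child's exit status.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/*
 * Hypothetical helper: run `path` with `argv` through vfork() and return
 * the child's exit status, or -1 on failure or abnormal termination.
 */
int
spawn_wait(const char *path, char *const argv[])
{
	int status;
	pid_t pid = vfork();

	if (pid == -1)
		return -1;
	if (pid == 0) {
		/* Child: shares the parent's memory, so only exec or _exit. */
		execve(path, argv, environ);
		_exit(127);	/* exec failed; never return or call exit() */
	}
	/* Parent: the calling thread resumes here after the exec or _exit. */
	if (waitpid(pid, &status, 0) == -1)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For instance, calling spawn_wait("/bin/sh", argv) with argv set to
{ "sh", "-c", "exit 3", NULL } should return 3 on a system that has /bin/sh.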
The biggest problem with vfork(), really, is that unsafe signal handlers in the
parent might run in the child before the child can block them. This could be
bad even if all threads in the parent are stopped.
Indeed, I would argue that the set of functions that are safe to call in an
asynchronous signal handler (as opposed to the child-side of fork() or vfork())
is smaller than that which POSIX says. The only things I ever do in the signal
handlers I write are:
- write to sig_atomic_t variables
- call write(2) to write a single byte into a pipe that is used in the
application's event loop
If the application does not have an event loop I do sometimes ensure that
there's a thread blocking on read(2) on the other side of that pipe.
- call write(2) to write to stderr
- call _exit(2)
If I had my way those would be the only things I'd allow in signal handlers in
POSIX! (And then we'd have to give a new name to the async-signal-safe
function set that we reuse to define the functions that are safe to call in
various other contexts such as the child-side of fork()!)
Thanks for taking the time to read this -- it's probably too long, and I
apologize about that. If I'm wrong about something here, please let me know!
Nico
--
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/49017: vfork does not suspend all threads
Date: Wed, 5 Apr 2017 21:17:17 +0200
On Wed, Apr 05, 2017 at 06:45:01PM +0000, Nico Williams wrote:
> There are several ways to use vfork() to spawn children in a high-performance
> way:
>
> - First, obviously, in posix_spawn().
At least in NetBSD posix_spawn() is completely unrelated to vfork().
No one is suggesting to stop any thread in a process doing posix_spawn().
Using vfork() in a program with multiple active threads is madness,
posix_spawn() is the only sensible way.
Martin
From: Kamil Rytarowski <n54@gmx.com>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/49017: vfork does not suspend all threads
Date: Wed, 5 Apr 2017 21:35:03 +0200
On 05.04.2017 21:20, Martin Husemann wrote:
> The following reply was made to PR kern/49017; it has been noted by GNATS.
>
> From: Martin Husemann <martin@duskware.de>
> To: gnats-bugs@NetBSD.org
> Cc:
> Subject: Re: kern/49017: vfork does not suspend all threads
> Date: Wed, 5 Apr 2017 21:17:17 +0200
>
> On Wed, Apr 05, 2017 at 06:45:01PM +0000, Nico Williams wrote:
> > There are several ways to use vfork() to spawn children in a high-performance
> > way:
> >
> > - First, obviously, in posix_spawn().
>
> At least in NetBSD posix_spawn() is completely unrelated to vfork().
> No one is suggesting to stop any thread in a process doing posix_spawn().
>
> Using vfork() in a program with multiple active threads is madness,
> posix_spawn() is the only sensible way.
>
> Martin
Well vfork(2) is supposed to suspend a parent process.
"The parent process is suspended while the child is using its resources."
-- vfork(2)
It's out of POSIX so it's rather harsh to dictate behavior change.
Also going for your proposal is imho violating the thread-process model in
NetBSD. It's a Linux concept to emulate threads with clone(2), while they
are still regular processes.
In ptrace(2) we have two interfaces: PTRACE_FORK and PTRACE_VFORK. The
only difference between them is whether the parent process
(with all threads) has been suspended or not, no matter which
original syscall or API was used (clone(2), __clone(2), posix_spawn(2),
fork(2)...).
I think you might be interested in designing something like _lwp_vfork().
From: christos@zoulas.com (Christos Zoulas)
To: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org,
gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Cc:
Subject: Re: kern/49017: vfork does not suspend all threads
Date: Wed, 5 Apr 2017 16:26:04 -0400
On Apr 5, 7:20pm, martin@duskware.de (Martin Husemann) wrote:
-- Subject: Re: kern/49017: vfork does not suspend all threads
| Using vfork() in a program with multiple active threads is madness,
| posix_spawn() is the only sensible way.
Why is that? We allow it, so it should do something reasonable/useful...
Or we should not allow it...
christos
From: christos@zoulas.com (Christos Zoulas)
To: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org,
gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Cc:
Subject: Re: kern/49017: vfork does not suspend all threads
Date: Wed, 5 Apr 2017 16:38:32 -0400
On Apr 5, 7:35pm, n54@gmx.com (Kamil Rytarowski) wrote:
-- Subject: Re: kern/49017: vfork does not suspend all threads
| Well vfork(2) is supposed to suspend a parent process.
| "The parent process is suspended while the child is using its resources."
|
| -- vfork(2)
|
| It's out of POSIX so it's rather harsh to dictate behavior change.
This wording predates threads and it is not specified in ToG:
http://pubs.opengroup.org/onlinepubs/009695399/functions/vfork.html
I.e. the suspension of the parent is historical behavior mandated by
implementation convenience; it would be difficult to make anything work
reliably if the parent was altering the stack frame the child is currently
executing.
| Also going for your proposal is imho violating thread-process model in
| NetBSD. It's Linux concept to emulate threads with clone(2), while they
| are still regular processes.
Well, they are not regular processes; linux just does not differentiate
between threads and processes by re-using the proc structure to describe
both. In both implementations they end up sharing vmspace, file descriptors,
etc.
| In ptrace(2) we have two interfaces: PTRACE_FORK and PTRACE_VFORK. The
| difference between them is only in the point whether the parent process
| (with all threads) has been suspended or not. No matter what the
| original syscall or API was used (clone(2), __clone(2), posix_spawn(2),
| fork(2)...).
That is orthogonal; in fork() the parent is not suspended and ptrace has
no problem with that. In vfork() only the thread executing vfork() is,
and again ptrace has no problem with that. Whether the other
threads should be suspended is the question here. Nico claims that it
is not harmful if they keep running, and that it is actually beneficial.
I have come to the realization that this is true if the child is careful
not to alter the parent data (which has always been the case). The question is:
Can the other threads harm the child or the parent while it is vfork()ing?
I can't think of a way.
| I think you might be interested in designing something like _lwp_vfork().
How does this solve the problem? What does _lwp_vfork() fork? Does it create
a new thread? In what process context?
christos
From: Joerg Sonnenberger <joerg@bec.de>
To: Christos Zoulas <christos@zoulas.com>
Cc: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org,
gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/49017: vfork does not suspend all threads
Date: Wed, 5 Apr 2017 23:03:19 +0200
On Wed, Apr 05, 2017 at 04:38:32PM -0400, Christos Zoulas wrote:
> Can the other threads harm the child or the parent while it is vfork()ing?
>
> I can't think of a way.
Race conditions in mutex code. Keep in mind that the park/unpark events
are per-process.
I fully support the proposal to block all threads in the parent. If you
want to create child processes from multi-threaded programs and care at
all about performance, use posix_spawn. How it is implemented is an
implementation detail and independent of vfork.
Joerg
From: Nico Williams <Nico.Williams@twosigma.com>
To: <gnats-bugs@netbsd.org>
Cc:
Subject: Re: kern/49017: vfork does not suspend all threads
Date: Wed, 5 Apr 2017 20:45:53 +0000
On Wed, Apr 05, 2017 at 06:42:25PM +0000, Nico Williams wrote:
> Please do NOT stop any threads in the vfork() parent other than the one that
> called vfork(). Also, allow me to make an argument for this.
>
> First, I suppose I should look at rationales for stopping all threads in a
> vfork() parent. [...]
Let me respond to each response so far:
Martin Husemann <martin@duskware.de> wrote:
> At least in NetBSD posix_spawn() is completely unrelated to vfork().
> No one is suggesting to stop any thread in a process doing posix_spawn().
I'm glad that's the case (it seems posix_spawn() is a system call in NetBSD).
The only relevance of this to vfork() is that you could obsolete it (but
probably not remove it for a long time yet), but then why make any changes to
vfork()? Are there applications that are breaking that couldn't be fixed in
some other way?
(ISTR Thor telling me that the kernel-coded posix_spawn() reduced performance
and that it was removed. It seems I remembered half-correctly. He did tell me
that the kernel implementation is slower than the user-land implementation.)
> Using vfork() in a program with multiple active threads is madness,
> posix_spawn() is the only sensible way.
Such a statement requires explanation. I was careful to provide explanation
for my possibly-extraordinary-seeming statements about this.
Kamil Rytarowski wrote:
> Well vfork(2) is supposed to suspend a parent process.
>
> "The parent process is suspended while the child is using its resources."
I addressed this very specifically. I believe it's incorrect to say that
because vfork() absolutely must stop the parent thread that called it, and
because the man page said so without referring to threads because it long
predated threads, that it must mean "stop all parent threads" now that we have
threads.
Please recall that an incorrect "vfork() is harmful" meme caused it to be
removed (broken, actually) in 4.4BSD, and later to be re-added in all
subsequent BSDs. Cargo culting "stops the parent process" now is just that.
> It's out of POSIX so it's rather harsh to dictate behavior change.
How is it harsh? It's out of POSIX -- that means you're _free_ to change it.
Moreover, since some OSes don't stop all threads in the parent (e.g., Linux and
NetBSD), one perfectly legitimate thing to do at the Open Group would be to
seek to modify the specification to allow it, if POSIX still specified it at
all. Participants do this all the time. And even POSIX never said that
vfork() stops all threads in the parent process -- it merely cargo copied the
original text from BSD and then added a scary warning that one must never use
vfork(), as if fork() were somehow trivial to use safely (it is not).
> Also going for your proposal is imho violating thread-process model in
> NetBSD. It's Linux concept to emulate threads with clone(2), while they
> are still regular processes.
The only way to make sense of this statement is that you're taking text from
the pre-threads vfork() man page about stopping the parent process and
interpreting that in terms of the threaded process model. But that's WRONG.
The pre-threads vfork() man page is pre-threaded process model. In a threaded
world it is perfectly sensible to revisit that text rather than cargo cult it.
> In ptrace(2) we have two interfaces: PTRACE_FORK and PTRACE_VFORK. The
> difference between them is only in the point whether the parent process
> (with all threads) has been suspended or not. [...]
How does that force vfork() (when not being ptrace'd) to also stop all threads
in the parent?? I don't follow.
Thanks,
Nico
--
From: Nico Williams <Nico.Williams@twosigma.com>
To: <gnats-bugs@netbsd.org>
Cc:
Subject: Re: kern/49017: vfork does not suspend all threads
Date: Wed, 5 Apr 2017 20:57:03 +0000
Kamil Rytarowski also wrote:
> I think you might be interested in designing something like _lwp_vfork().
Actually, I am interested in something like that:
pid_t avfork(int (*)(void *), void *);
It would have similar constraints for the child as vfork(). E.g., no allocator
calls, etc.
avfork()'s main benefit over vfork() is that it wouldn't have to stop even the
thread that called it in the parent.
avfork() can even be implemented in terms of threads and vfork()... if only
vfork() doesn't stop all threads in the parent. See:
https://gist.github.com/nicowilliams/a8a07b0fc75df05f684c23c18d7db234
for a partial implementation and lots more about vfork() in general.
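As a rough illustration of "avfork() in terms of threads and vfork()", the
sketch below lets a helper thread call vfork() so that the caller of avfork()
is not the thread that gets suspended. It is a simplified sketch written for
this note, not the implementation from the gist: struct avfork_req,
avfork_helper() and exit_with_status() are hypothetical names, and the child
reports its pid over a pipe using only write(2), close(2) and _exit(2). On a
system that suspends every thread of the vfork() parent, the caller would
still block until the child execs or exits.

```c
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>	/* for waitpid() in callers */
#include <unistd.h>

/* Request handed to the helper thread (hypothetical layout). */
struct avfork_req {
	int (*fn)(void *);	/* routine to run in the child */
	void *arg;
	int rfd;		/* read end of the pid-reporting pipe */
	int wfd;		/* write end of the pid-reporting pipe */
};

/* Helper thread: it, not the avfork() caller, is suspended by vfork(). */
static void *
avfork_helper(void *p)
{
	struct avfork_req *r = p;
	ssize_t n;
	pid_t pid = vfork();

	if (pid == 0) {
		/* Child: report our pid, drop the pipe, run fn, then exit. */
		pid_t self = getpid();
		n = write(r->wfd, &self, sizeof(self));
		(void)n;
		close(r->rfd);
		close(r->wfd);
		_exit(r->fn(r->arg));
	}
	if (pid == -1) {
		/* Report the failure so the caller does not block forever. */
		n = write(r->wfd, &pid, sizeof(pid));
		(void)n;
	}
	/* Reached once the child has exec'd or exited, or vfork() failed. */
	close(r->wfd);
	free(r);
	return NULL;
}

pid_t
avfork(int (*fn)(void *), void *arg)
{
	struct avfork_req *r;
	pthread_t t;
	pid_t pid = -1;
	int pfd[2];

	if (pipe(pfd) == -1)
		return -1;
	if ((r = malloc(sizeof(*r))) == NULL)
		goto fail;
	r->fn = fn;
	r->arg = arg;
	r->rfd = pfd[0];
	r->wfd = pfd[1];
	if (pthread_create(&t, NULL, avfork_helper, r) != 0) {
		free(r);
		goto fail;
	}
	pthread_detach(t);
	/* Return as soon as the child (or the helper) reports a pid. */
	if (read(pfd[0], &pid, sizeof(pid)) != (ssize_t)sizeof(pid))
		pid = -1;
	close(pfd[0]);
	return pid;
fail:
	close(pfd[0]);
	close(pfd[1]);
	return -1;
}

/* Example start routine: exit with the status passed in arg. */
int
exit_with_status(void *arg)
{
	return (int)(intptr_t)arg;
}
```

A caller could run avfork(exit_with_status, (void *)(intptr_t)7) and later
collect the child with waitpid(). The sketch ignores EINTR on read(2) and
descriptor leaks into children created concurrently by other threads.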
Nico
--
From: Nico Williams <Nico.Williams@twosigma.com>
To: <gnats-bugs@netbsd.org>
Cc:
Subject: Re: kern/49017: vfork does not suspend all threads
Date: Wed, 5 Apr 2017 21:24:06 +0000
Joerg Sonnenberger <joerg@bec.de> wrote:
> Race conditions in mutex code. Keep in mind that the park/unpark events
> are per-process.
As I mentioned in my original post, pthread_mutex_*lock() and such are NOT in
the async-signal-safe function set, therefore vfork() children cannot safely
call them, and if they don't call them, how can they cause other threads in the
parent any problems?
(An OS could certainly make pthread functions work in the vfork() child,
provided that neither child nor parent exit due to signals, say, while holding a
lock. However, it is not necessary to do this as pthread functions are already
not async-signal-safe.)
Nico
--
From: Kamil Rytarowski <n54@gmx.com>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/49017: vfork does not suspend all threads
Date: Thu, 6 Apr 2017 01:45:41 +0200
On 05.04.2017 22:40, Christos Zoulas wrote:
> The following reply was made to PR kern/49017; it has been noted by GNATS.
>
> From: christos@zoulas.com (Christos Zoulas)
> To: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org,
> gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
> Cc:
> Subject: Re: kern/49017: vfork does not suspend all threads
> Date: Wed, 5 Apr 2017 16:38:32 -0400
>
> On Apr 5, 7:35pm, n54@gmx.com (Kamil Rytarowski) wrote:
> -- Subject: Re: kern/49017: vfork does not suspend all threads
>
> | Well vfork(2) is supposed to suspend a parent process.
> | "The parent process is suspended while the child is using its resources."
> |
> | -- vfork(2)
> |
> | It's out of POSIX so it's rather harsh to dictate behavior change.
>
> This wording predates threads and it is not specified in ToG:
>
> http://pubs.opengroup.org/onlinepubs/009695399/functions/vfork.html
>
> I.e. the suspension of the parent is historical behavior mandated by
> implementation convenience; it would be difficult to make anything working
> reliably if the parent was altering the stack frame the child is currently
> executing.
>
> | Also going for your proposal is imho violating thread-process model in
> | NetBSD. It's Linux concept to emulate threads with clone(2), while they
> | are still regular processes.
>
> Well, they are not regular processes; linux just does not differentiate
> between threads and processes by re-using the proc structure to describe
> both. In both implementations they end up sharing vmspace, file descriptors,
> etc.
>
> | In ptrace(2) we have two interfaces: PTRACE_FORK and PTRACE_VFORK. The
> | difference between them is only in the point whether the parent process
> | (with all threads) has been suspended or not. No matter what the
> | original syscall or API was used (clone(2), __clone(2), posix_spawn(2),
> | fork(2)...).
>
> That is orthogonal; in fork() the parent is not suspended and ptrace has
> no problem with that. In vfork() only the thread executing vfork() is,
> and again ptrace has no problem with that. The semantics if the other
> threads should be suspended is the question here. Nico claims that it
> is not harmful if they are, and it is actually beneficial. I have come
> to the realization that this is true if the child is careful not to
> alter the parent data (which has been always the case). The question is:
>
> Can the other threads harm the child or the parent while it is vfork()ing?
>
> I can't think of a way.
>
> | I think you might be interested in designing something like _lwp_vfork().
>
> How does this solve the problem? What does _lwp_vfork() fork? Does it create
> a new thread? In what process context?
I don't have strong opinions.
If we can ensure that vfork(2) is always safe first, then we can go for
it... assuming that someone has to do the work.
Regarding _lwp_vfork(): I got the impression that optimizing forking
today is like optimizing thread creation. There is LWP_VFORK in the kernel.
I know that threads and processes are different; however, there are
still Unices out there without POSIX threads, like Minix.
>
> christos
From: Robert Elz <kre@munnari.OZ.AU>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/49017: vfork does not suspend all threads
Date: Thu, 06 Apr 2017 10:08:40 +0700
It appears to me as if the discussion on this has largely divided
along traditional lines - those who (fundamentally) believe that
vfork() is an evil hack, and should be removed (ideally) but in the
meantime restricted in all ways possible to limit its use, so that
hopefully it can go away some day not too far away - and those who
believe vfork() is a valuable tool, which should be exploited in
all ways possible, and enhanced where practical to make it even more
useful.
As long as that division persists, there will never be consensus on
anything related to vfork().
I certainly agree that the man page description of vfork() as
"stopping the parent process" is irrelevant to the current issue; as
others have said, that was written in an age when there were no threads,
not even as a possibility on the horizon, and was just never changed again.
Whatever relationship might have existed between vfork() and threads seems
to have simply been ignored (because vfork() proponents, and thread
proponents, seem to largely be disjoint sets, or were.)
Before going further, I think it useful to actually understand vfork() a
little better - first it is (kind of) badly named, as in most respects it
is not a fork() type function at all, it is rather a kind of setjmp()/longjmp()
with some peculiar semantics - and a new proc struct (and hence new pid,
and anything that affects only the proc struct, like setuid() being magic).
That is, in reality, what happens is that the "parent" process both stalls,
and continues running (just like with setjmp()) until the terminating
condition occurs (the longjmp() equiv - _exit() or exec()) - and (here
I disagree with Nico) the "child" process after a vfork() can do just about
anything that would be safe between a setjmp() and longjmp(), unless the
operation does something which would require a proc struct alteration which
is also reflected in user space (so brk() is bad).
There's no need to restrict vfork() children to async signal safe operations
(the process limiting itself that way certainly won't hurt it, but it is
not required) - it can do anything that the parent can do that affects only
its internal (userland) state, or which affects purely the proc struct
state in the kernel (so it can close files, or change the "close on exec"
state, but not other file status flags).
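The distinction drawn above between "close on exec" state and other file
status flags can be sketched as follows. This is an illustration only,
assuming (as the paragraph implies) that the vfork() child gets its own copy
of the descriptor table while the underlying open file is shared with the
parent; flags_seen_by_parent() is a hypothetical name.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * The vfork() child marks `fd` close-on-exec and sets O_NONBLOCK, then
 * exits. Returns a bitmask of what the parent observes afterwards:
 * bit 0 = FD_CLOEXEC set, bit 1 = O_NONBLOCK set, or -1 on error.
 */
int
flags_seen_by_parent(int fd)
{
	int status;
	pid_t pid = vfork();

	if (pid == -1)
		return -1;
	if (pid == 0) {
		fcntl(fd, F_SETFD, FD_CLOEXEC);	/* descriptor flag: per process */
		fcntl(fd, F_SETFL, O_NONBLOCK);	/* status flag: shared open file */
		_exit(0);
	}
	if (waitpid(pid, &status, 0) == -1)
		return -1;
	return ((fcntl(fd, F_GETFD) & FD_CLOEXEC) ? 1 : 0) |
	    ((fcntl(fd, F_GETFL) & O_NONBLOCK) ? 2 : 0);
}
```

Under that assumption the parent should see only bit 1 set: the close-on-exec
change stays private to the child, while the O_NONBLOCK change leaks back
through the shared open file.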
Of course, the parent needs to expect all of this - it needs to co-operate
with the child, just as is required with a setjmp()/longjmp(), and understand
just what the "child" might have done with the memory image after it gets
a chance to observe.
What all this means to a threaded process, is that overall I'm of the opinion
that only the parent thread should block (just as if the parent thread used
a setjmp()) and whatever sync with other threads should be just the same for
the child of the vfork() as it would have been for the thread had it not done
a vfork(), with the sole exception that the child cannot use, or rely upon,
anything that uses kernel process private state (and hence cannot access, or
change, anything which would be protected by such a mechanism). In-process
spin locks would be safe; syscall-activated sleeping locks would not be.
kre
ps: unrelated here, but the one facility missing from vfork() that would make
it more useful, would be a "complete the fork" sys call, which would turn the
vfork() into a fork() (dup the addr space) and be a third "wakeup the parent"
operation.
From: Martin Husemann <martin@duskware.de>
To: Christos Zoulas <christos@zoulas.com>
Cc: gnats-bugs@NetBSD.org
Subject: Re: kern/49017: vfork does not suspend all threads
Date: Thu, 6 Apr 2017 11:03:09 +0200
On Wed, Apr 05, 2017 at 04:26:04PM -0400, Christos Zoulas wrote:
> On Apr 5, 7:20pm, martin@duskware.de (Martin Husemann) wrote:
> -- Subject: Re: kern/49017: vfork does not suspend all threads
>
> | Using vfork() in a program with multiple active threads is madness,
> | posix_spawn() is the only sensible way.
>
> Why is that? We allow it, so it should do something reasonable/useful...
> Or we should not allow it...
We now have two processes with active threads and a shared vmspace.
This sounds completely out of spec for the unix process model to me
and I'd call it madness.
Martin
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/49017: vfork does not suspend all threads
Date: Thu, 6 Apr 2017 11:04:48 +0200
On Wed, Apr 05, 2017 at 10:05:01PM +0000, Nico Williams wrote:
> (ISTR Thor telling me that the kernel-coded posix_spawn() reduced performance
> and that it was removed. It seems I remembered half-correctly. He did tell me
> that the kernel implementation is slower than the user-land implementation.)
Are you sure that was about NetBSD?
Would be nice to see numbers for that, I would be surprised.
Martin
From: christos@zoulas.com (Christos Zoulas)
To: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org,
gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Cc:
Subject: Re: kern/49017: vfork does not suspend all threads
Date: Thu, 6 Apr 2017 07:16:36 -0400
On Apr 6, 3:10am, kre@munnari.OZ.AU (Robert Elz) wrote:
-- Subject: Re: kern/49017: vfork does not suspend all threads
| That is, in reality, what happens is that the "parent" process both stalls,
| and continues running (just like with setjmp()) until the terminating
| condition occurs (the longjmp() equiv - _exit() or exec()) - and (here
| I disagree with Nico) the "child" process after a vfork() can do just about
| anything that would be safe between a setjmp() and longjmp(), unless the
| operation does something which would require a proc struct alteration which
| is also reflected in user space (so brk() is bad).
It is much more restricted than that. For example if the child
continues and returns from its current stack frame to other higher
up frames, it makes changes to them so when the parent resumes it
finds an inconsistent state. So one of the restrictions is "you
can't safely return from the current stack frame"... It is also
very similar (as you said) to setjmp and longjmp as far as the
current function frame is concerned (and register liveness) but
the compiler takes care of that.
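To make the stack-frame restriction concrete, here is a minimal sketch (not code from this PR) of the one pattern that stays within it: the child calls exec or _exit() directly from the frame that called vfork() and never returns from it. The helper name run_shell_command and the choice of /bin/sh are assumptions made purely for illustration.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run "sh -c cmd" through vfork().
 * Returns the child's exit status, or -1 on error or abnormal exit.
 */
int run_shell_command(const char *cmd)
{
    int status;
    pid_t pid = vfork();

    if (pid == -1)
        return -1;

    if (pid == 0) {
        /*
         * Child: it borrows the parent's address space and stack.
         * It must not return from this function; it may only exec
         * or _exit() from this same frame.
         */
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
        _exit(127);             /* reached only if exec failed */
    }

    /* Parent: resumes here once the child has exec'd or exited. */
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, run_shell_command("exit 7") would be expected to return 7 on a system where /bin/sh exists.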
| What all this means to a threaded process, is that overall I'm of the opinion
| that only the parent thread should block (just as if the parent thread used
| a setjmp()) and whatever sync with other threads should be just the same for
| the child of the vfork() as it would have been for the thread had it not done
| a vfork(), with the sole exception that the child cannot use, or rely upon,
| anything that uses kernel process private state (and hence cannot access, or
| change, anything which would be protected by such a mechanism). In-process
| spin locks would be safe; syscall-activated sleeping locks would not be.
We agree there.
| ps: unrelated here, but the one facility missing from vfork() that would make
| it more useful, would be a "complete the fork" sys call, which would turn the
| vfork() into a fork() (dup the addr space) and be a third "wakeup the parent"
| operation.
The difficulty with that is that the child needs to know that it has been
vforked instead of forked... So the child needs code to handle both cases
in general which is annoying.
christos
From: Joerg Sonnenberger <joerg@bec.de>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org, christos@netbsd.org
Subject: Re: kern/49017: vfork does not suspend all threads
Date: Thu, 6 Apr 2017 13:22:21 +0200
On Thu, Apr 06, 2017 at 03:10:01AM +0000, Robert Elz wrote:
> There's no need to restrict vfork() children to async signal safe operations
> (the process limiting itself that way certainly won't hurt it, but it is
> not required) - it can do anything that the parent can do that affects only
> its internal (userland) state, or which affects purely the proc struct
> state in the kernel (so it can close files, or change the "close on exec"
> state, but not other file status flags).
The problem is that many of the functions outside the "async signal
safe" category are allowed to do exactly such things. Anything using
mutexes will not work correctly after vfork. That is an implementation
restriction related to how NetBSD's libpthread works, but it is certainly
not the only possible pitfall.
I don't classify vfork in general as a hack -- it is a building block
with a number of serious restrictions. Not working well with threads is
just another one of those restrictions.
Joerg
From: Robert Elz <kre@munnari.OZ.AU>
To: christos@zoulas.com (Christos Zoulas)
Cc: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/49017: vfork does not suspend all threads
Date: Thu, 06 Apr 2017 21:15:08 +0700
Date: Thu, 6 Apr 2017 07:16:36 -0400
From: christos@zoulas.com (Christos Zoulas)
Message-ID: <20170406111636.92DF217FDA8@rebar.astron.com>
| It is much more restricted than that. For example if the child
| continues and returns from its current stack frame to other higher
| up frames, it makes changes to them so when the parent resumes it
| finds an inconsistent state.
Just the same as setjmp() ... no question but that there are restrictions
after a vfork(), but they're not as strict (not nearly as strict) as async
signal safe (which needs to be prepared to execute with the "parent" in
any state at all - here we know the parent is blocked in vfork() so is
not currently doing anything.)
| | would be a "complete the fork" sys call, which would turn the
| | vfork() into a fork()
| The difficulty with that is that the child needs to know that it has been
| vforked instead of forked... So the child needs code to handle both cases
| in general which is annoying.
Huh? The code is
	if (vfork() == 0) {
		/* here I am the child */
	}
How does the child possibly not know it has been vforked? What I
suggested (as some kind of dream, not a serious proposal, or not without
trying it anyway) would be to allow
	if (vfork() == 0) {
		/* child, with parent blocked and shared mem */
		/* code here, probably tests to decide what next */
		vfork_into_fork();
		/* now parent is unblocked, and child has its own addr space */
	}
(and no, I don't think "vfork_into_fork()" is a sensible name...)
Joerg Sonnenberger <joerg@bec.de> said:
| The problem is that many of the functions outside the "async signal safe"
| category are allowed to do exactly such things.
Sure, but there are plenty that are, much more than what is safe in a
signal handler.
There's no question but that using vfork() requires some care (or very
simple operations) but that's not a reason to restrict it more than what
it already is.
For the PR that started all this, my suggestion would be to simply fix the
man page, and close it.
kre
From: Nico Williams <Nico.Williams@twosigma.com>
To: <gnats-bugs@netbsd.org>
Cc:
Subject: Re: kern/49017: vfork does not suspend all threads
Date: Thu, 6 Apr 2017 15:01:56 +0000
Robert Elz <kre@munnari.OZ.AU> wrote:
> It appears to me as if the discussion on this has largely divided
> along traditional lines - those who (fundamentally) believe that
> vfork() is an evil hack, and should be removed (ideally) but in the
Well, no one seems to have said that, though several seem wedded to
pre-threaded-model text about stopping the parent process and then extending it
to the threaded model in the obvious, but horrible way.
> meantime restricted in all ways possible to limit its use, so that
> hopefully it can go away some day not too far away - and those who
> believe vfork() is a valuable tool, which should be exploited in
> all ways possible, and enhanced where practical to make it even more
> useful.
I'll take it much further: it is fork() that is EVIL, and vfork() that is GOOD.
Here's my rationale for such an extraordinary statement:
https://gist.github.com/nicowilliams/a8a07b0fc75df05f684c23c18d7db234
Briefly: fork()'s copying and/or COW are terrible and would never have been
necessary had Dennis Ritchie et al. thought of vfork()'s semantics. In the
beginning processes were so small that fork()'s copying was not obviously a
horrible feature. Besides, fork() has a ton of safety issues (which I mostly
did not address in that gist, but I could and will) that make vfork() look like
not so bad. Anyone who says that vfork() is impossibly hard to use correctly
without pointing out that fork() is in some ways harder still to use
correctly... just hasn't had the misfortune of having to fix fork-safety
problems in real-life, production code (I have).
Now, vfork() is... clumsy because of the stack sharing silliness, but it
predates threads, so its authors probably did not realize that taking a
callback function and argument to run in a new stack would have been a superior
design -- I forgive them this oversight because vfork() is nonetheless awesome
goodness, and all the more so if it doesn't stop all other threads in the
parent process.
I didn't always think this way.
I used to think: "haha, look at CreateProcess() on Windows, what a disaster,
how terribly hard it is to use, and all because they had the elegance of
fork()+exec()". But that was wrong. fork() is inelegant. Whereas vfork()+
exec() is quite elegant.
Nico
--
From: Nico Williams <Nico.Williams@twosigma.com>
To: <gnats-bugs@netbsd.org>
Cc:
Subject: Re: kern/49017: vfork does not suspend all threads
Date: Thu, 6 Apr 2017 15:26:53 +0000
Robert Elz <kre@munnari.OZ.AU> wrote:
> There's no need to restrict vfork() children to async signal safe operations
> (the process limiting itself that way certainly won't hurt it, but it is
It is certainly necessary to restrict what a fork() child can do when the
parent process has built up a lot of state... If the fork() parent has built
very little state, if the fork() call comes early in its life, then it is safe
to do just about anything on the child-side.
OTOH, if the fork() parent has built a lot of state, possibly with various
possibly-unknown-to-it libraries, it may not be safest to call anything but
async-signal-safe functions. And that's why POSIX says that's what the child
can do.
(Some APIs are explicitly fork-unsafe (e.g., PKCS#11); using them on the parent
side of fork() means that one cannot use them on the child side without
reinitializing the library or execve()'ing.)
Heck, "async-signal-safe" is a somewhat misleading concept because fork() and
vfork() do not atomically block signals on the child side, so a number of
async-signal-safe functions are actually very much unsafe to call in a signal
handler without first checking that getpid() returns the expected PID.
With few exceptions, the only things I ever do in a signal handler are: check
if getpid() returns the expected PID, write to sig_atomic_t global variables,
and/or write(2) a single byte to a pipe the other end of which is handled by an
event loop.
"Async-signal-safe" is what POSIX calls the set of functions that it thinks
are safe to call in signal handlers and the child side of fork(). Even if it
is too small a set, it is useful enough a concept that we can use it to talk
about what kinds of things are safe to do on the child side of vfork().
> not required) - it can do anything that the parent can do that affects only
> its internal (userland) state, or which affects purely the proc struct
> state in the kernel (so it can close files, or change the "close on exec"
> state, but not other file status flags).
Certainly one can make safe use of some functions outside the async-signal-safe
set in the fork()/vfork() child sides. It does help to have some idea of what
might go wrong when one does that. POSIX defines such a set in part so that
one can write portable code without having to know much about particular OSes.
Thor pointed out to me yesterday that using mutexes on the child side of
vfork() should have the same sorts of semantics and dangers as using shared
mutexes, so one should not categorically dismiss the use of mutexes on the
child side of vfork(). I agree with Thor on this, though I would generally
discourage the use of shared mutexes anyways.
Nico
--
From: Nico Williams <Nico.Williams@twosigma.com>
To: <gnats-bugs@netbsd.org>
Cc:
Subject: Re: kern/49017: vfork does not suspend all threads
Date: Thu, 6 Apr 2017 15:36:41 +0000
Martin Husemann <martin@duskware.de> wrote:
> On Wed, Apr 05, 2017 at 04:26:04PM -0400, Christos Zoulas wrote:
> > On Apr 5, 7:20pm, martin@duskware.de (Martin Husemann) wrote:
> > -- Subject: Re: kern/49017: vfork does not suspend all threads
> >
> > | Using vfork() in a program with multiple active threads is madness,
> > | posix_spawn() is the only sensible way.
> >
> > Why is that? We allow it, so it should do something reasonable/useful...
> > Or we should not allow it...
>
> We now have two processes, each with active threads, and a shared vmspace.
> This sounds completely out of spec for the unix process model to me,
> and I'd call it madness.
"sounds like"
Everything that BSD ever did prior to the advent of POSIX and other such
standards... was "completely out of spec" for Unix.
That is precisely how one innovates: by stepping outside the spec.
Your rejection of this seems emotional rather than thought out. It happens
because we're humans; naturally I do this too sometimes.
I urge you to read what I've written rather than merely react to the one-line
summary of the proposal.
Please leave behind the idea that vfork() is dangerous. It absolutely is not.
Decades of experience with it bear that out:
- posix_spawn() on Linux, Solaris, Illumos, NetBSD (before posix_spawn()
became a system call), and other BSDs -- all use vfork() internally
- many apps use vfork(), including, famously, csh (now, I know, csh is evil,
but that it successfully and safely uses vfork() cannot be denied)
- you can search online nowadays for more vfork()-using code, and you can look
at https://github.com/famzah/popen-noshell (warning: GPLv3), including the
long discussion I had with the author in the issues
I've yet to see a single bug report anywhere about vfork() not stopping all
other parent threads causing an application to break. Objectively speaking,
without such a report, and without POSIX saying so (it does not! POSIX removed
vfork()), NetBSD should not make that change!
Nico
--
From: Martin Husemann <martin@duskware.de>
To: Robert Elz <kre@munnari.OZ.AU>
Cc: Christos Zoulas <christos@zoulas.com>, gnats-bugs@NetBSD.org,
kern-bug-people@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/49017: vfork does not suspend all threads
Date: Thu, 6 Apr 2017 17:59:36 +0200
On Thu, Apr 06, 2017 at 09:15:08PM +0700, Robert Elz wrote:
> For the PR that started all this, my suggestion would be to simply fix the
> man page, and close it.
That gets my vote too.
Martin
From: Nico Williams <Nico.Williams@twosigma.com>
To: <gnats-bugs@netbsd.org>
Cc:
Subject: Re: kern/49017: vfork does not suspend all threads
Date: Thu, 6 Apr 2017 16:57:14 +0000
Robert Elz <kre@munnari.OZ.AU> wrote:
> [ description of vfork_into_fork() elided ]
That's a neat idea, but I don't think it's needed. I can't think of why I
would ever need it or any time that I could have used it.
What I really want is
pid_t avfork(int (*)(void *), void *);
which is like vfork() but allocates a new stack, calls the given callback in it
just like pthread_create() would, and does not stop any threads in the parent,
not even the one that called it. The 'a' stands for "asynchronous".
Note that avfork() would have much the same constraints for the child as
vfork() does, except, naturally, that the avfork() child could return while the
vfork() child cannot.
I have written portable multi-processed daemons that build on Unix and Windows.
What I do on Windows is I spawn() the child processes, exec'ing the same
executable as the parent and passing in information needed by the child via
arguments or a pipe. On Unix I get lazy and fork(), but what I do on Windows
would work just as well on Unix. You can see this here:
https://github.com/heimdal/heimdal/blob/master/lib/roken/detach.c
Nico
--
From: Robert Elz <kre@munnari.OZ.AU>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/49017: vfork does not suspend all threads
Date: Fri, 07 Apr 2017 10:07:46 +0700
Date: Thu, 6 Apr 2017 15:05:01 +0000 (UTC)
From: Nico Williams <Nico.Williams@twosigma.com>
Message-ID: <20170406150501.356437A2B8@mollari.NetBSD.org>
| I'll take it much further: it is fork() that is EVIL, and vfork()
| that is GOOD.
|
| Here's my rationale for such an extraordinary statement:
| https://gist.github.com/nicowilliams/a8a07b0fc75df05f684c23c18d7db234
All that shows is that fork() is (or can be) expensive, which is hardly
news, nothing at all about evilness, in fact, the closest I can see that
it comes are these sentences ...
Since back then programs and processes were small that inelegance was
easy to overlook. But now processes tend to be huge, and that makes
copying even just a parent's resident set, and page table fiddling for
the rest, extremely expensive.
That's kind of like saying that Ferraris are evil, because they cost
too much if all you do is drive them grocery shopping once a week...
Being expensive (even too expensive to routinely use) is not evil, just
not the right (or perhaps even, just not the best) choice in many cases.
If anything is "evil" from your text (IMO) it would be "But now processes
tend to be huge" - that is the problem, not fork().
But fork() permits
	if (fork() > 0)
		_exit(0);
which most of the other methods (I know nothing of clone(), so that one
possibly excepted) do not - expensive perhaps (in some cases) but useful
nevertheless.
| Briefly: fork()'s copying and/or COW are terrible and would never have been
| necessary had Dennis Ritchie et al. thought of vfork()'s semantics.
While I suspect that fork() is really Ken's, not Dennis's (irrelevant here)
I kind of doubt that. First because neither of them is/was in any way
deficient in their thinking (simply ignoring a possibility like that is
not something I would expect) and second, because fork(), expensive or not,
is simply far more general than vfork().
| Besides, fork() has a ton of safety issues (which I mostly
| did not address in that gist,
Nor anywhere else I have seen - I'm sure it is possible to write code
badly enough that fork() would cause problems, (and it is certainly
possible to make a mess using buffered I/O) but almost all of that is
trivially overcome.
| Now, vfork() is... clumsy because of the stack sharing silliness, but it
| predates threads, so its authors probably did not realize that taking a
| callback function and argument to run in a new stack would have been a
| superior design
When vfork() was designed, the total (guaranteed) address space was just
64KB (text, data, stack, all combined). Duplicating stacks (adding an
extra stack - and if you want to be able to return in the child, it actually
means copying the existing stack, while adjusting any self-referencing pointers
that occur there) would have been laughed away as absurd.
In another message Nico.Williams@twosigma.com (kind of) quotes me:
| Robert Elz <kre@munnari.OZ.AU> wrote:
| > [ description of vfork_into_fork() elided ]
and then says...
| That's a neat idea, but I don't think it's needed. I can't think of why I
| would ever need it or any time that I could have used it.
Maybe you never would have, but I know of one immediate use - that is /bin/sh
Our sh uses vfork() whenever it can (for the obvious reason) but sometimes,
while evaluating the code to be executed in the sub-shell, it discovers that
it simply cannot do that after a vfork() and really needs a whole new process.
What happens now is that the child sets a magic "do me again using fork()"
flag (in the parent's address space, which it shares of course) and then
exits. The parent observes the flag, fork()'s, and the child starts all over
again. If the child could have simply converted its vfork() state into a
fork() state that wacky dance would not be needed.
Now, of course, the shell could avoid this by examining the tree of commands
to be executed before the fork()/vfork() (it does that for the very common
cases that will certainly require fork() rather than vfork()) but that would
mean duplicating the whole process, initially just to discover which kind of
fork() is required, and then again to actually do the work - for every
sub-shell invocation (more or less every command executed that isn't a
function) and all this for a relatively rare circumstance.
| What I really want is
| pid_t avfork(int (*)(void *), void *);
| which is like vfork() but allocates a new stack, calls the given callback
| in it just like pthread_create() would, and does not stop any threads in
| the parent, not even the one that called it.
I have no objection to that, go ahead, write the code for it, and submit
it, it sounds useful enough to consider at least.
But...
| Note that avfork() would have much the same constraints for the child as
| vfork() does, except, naturally, that the avfork() child could return while
| the vfork() child cannot.
Return to what? You're having it execute a callback, are you saying that
that function can return? Return to where exactly? And what does that
mean? What would be the difference between
child = avfork(func, &sp);
and
if ((child = avfork(&sp)) == 0) func();
??
If there's none, why the need for the callback? If avfork() cannot
actually return in the child, so the second is not possible, then neither
can func() right?
kre
From: Nico Williams <Nico.Williams@twosigma.com>
To: <gnats-bugs@netbsd.org>
Cc:
Subject: Re: kern/49017: vfork does not suspend all threads
Date: Fri, 7 Apr 2017 16:35:04 +0000
Robert Elz <kre@munnari.OZ.AU> wrote:
> | I'll take it much further: it is fork() that is EVIL, and vfork()
> | that is GOOD.
> |
> | Here's my rationale for such an extraordinary statement:
> | https://gist.github.com/nicowilliams/a8a07b0fc75df05f684c23c18d7db234
>
> All that shows is that fork() is (or can be) expensive, which is hardly
> news, nothing at all about evilness, in fact, the closest I can see that
> it comes are these sentences ...
It isn't just expensive. I didn't go into detail about fork-safety
issues, but those are real enough, and also unnecessary.
Fork-safety issues are a necessary result of sharing state from a
starting snapshot of it (which is easy enough to do if you're a small
shell, but quite difficult if you're a large process with many libraries
loaded, many unbeknownst to the original program).
> [...]
>
> That's kind of like saying that Ferraris are evil, because they cost
> too much if all you do is drive them grocery shopping once a week...
To me it's "evil" (I know, hyperbole; a lifeless tool can't really be
evil) because of the fork-safety issues. Admittedly I did not go into
much detail on those. The inherently-slow design does not exactly help
either.
The Unix community has basically been saying "fork() good, vfork() bad"
for decades. In this we have been sorely mistaken. At the very least,
vfork() is not bad.
> If anything is "evil" from your text (IMO) it would be "But now processes
> tend to be huge" - that is the problem, not fork().
Certainly "slow" is not good. However, layering issues involving APIs
with shortcomings are a fact of life because we are too willing to
re-use code.
Layering issues in Java and similar are something else (legendary?); I
won't go into them. You might object to JVMs in the first place, so
let's not go there.
Layering issues in C can still be quite complex though! Here's one
case:
- main program
-> getXbyY() name service switch
-> LDAP plugin
-> OpenSSL
-> SASL
-> SASL GSS plugin
-> GSS-API
-> Kerberos
-> OpenSSL
Here the main program source is simple-looking, but turns out to be
complex at run-time. "But use nscd!" Yes, but nscd itself looks like
this internally.
Slight API deficiencies in various layers in this example mean that
passing down configuration, or intent to _exit() a fork() parent, and so
on, is basically impossible. One could open-code all of it to avoid
this. One could say "screw TLS, GSS, Kerberos, I'll use IPsec, and
open-code everything", but IPsec is actually the hardest of these
security protocols to use correctly, and anyways, open-coding everything
will take a lot of time and effort.
> But fork() permits
>
> if (fork() > 0)
> _exit(0);
Yes! This is true, this is very helpful, and you'll see I make use of
this myself.
Nothing, mind you, really prevents vfork() from supporting the same,
except that the parent must block :(
Naturally, "if (avfork() > 0) _exit(0);" would be cheaper :)
One can also daemonize ("detach from tty", whatever) by doing vfork()
and then exec(self). That's effectively how one has to do such things
on Windows due to its lack of fork() (though perhaps now with their WSL
thing to support Ubuntu on Windows they now have a fork()??).
I've written code that does this, including open source code (e.g., in
Heimdal).
> | Briefly: fork()'s copying and/or COW are terrible and would never have been
> | necessary had Dennis Ritchie et al. thought of vfork()'s semantics.
>
> While I suspect that fork() is really Ken's, not Dennis's (irrelevant here)
> I kind of doubt that. First because neither of them is/was in any way
> deficient in their thinking (simply ignoring a possibility like that is
Oh, I certainly did not mean to imply that they were!
They are/were luminaries who gave us the best OS of its time, with the
best derivative lineage since. For this I am ever thankful.
That does not mean that they can't have made mistakes (e.g., the lack of
a "create time" for files!), including ones they simply would not have
recognized as mistakes then, but which perhaps later it turns out could
have been designed differently to stand the test of time.
> not something I would expect) and second, because fork(), expensive or not,
> is simply far more general than vfork().
We could have done without fork(). But we could not have done without a
monstrosity like CreateProcess() unless we had either fork() or vfork().
Better then to have fork() than not, but even better to have vfork() to
begin with. vfork() was a bit of brilliance that had to come from
outside New Jersey.
I speculate that the brilliance of fork() in the beginning lay in making
it easy to develop programs like shells by placing the critical process
spawning code in user-land as opposed to kernel-land.
> | Besides, fork() has a ton of safety issues (which I mostly
> | did not address in that gist,
>
> Nor anywhere else I have seen - I'm sure it is possible to write code
> badly enough that fork() would cause problems, (and it is certainly
> possible to make a mess using buffered I/O) but almost all of that is
> trivially overcome.
Is this PR the right place to do this? (A bit late to ask that, I know.) I
promise to write up a gist about fork-safety some time soon.
The gist of it is this: sharing state based on a one-time snapshot +
shared file descriptors can be devilishly difficult, if not impossible
to do.
A classic example is PKCS#11 and cryptography APIs in general. Recall
the complex layering mentioned above: there may not be a way for the
code that calls fork() to re-setup state that cannot be shared.
PKCS#11 explicitly says that the child-side of fork() MUST call
C_Initialize() and lose all its previous state. This follows in part
because the API might internally communicate with a device (e.g., a TPM,
smartcard, other token) via a file descriptor, and it would be difficult
to have two processes communicate with said device over the same file
descriptor in non-atomic ways (the fd not being anything like a
SOCK_DGRAM fd).
Even if you arrange to re-open the device on the child side, your open
sessions will need to be re-logged-in!
Even if you arrange to establish new sessions by reference to old
sessions, some cryptographic primitives fail catastrophically when
reused incorrectly...
So one can use pthread_atfork() (e.g., libpkcs11 in Illumos uses it to
automatically re-initialize on the child-side of fork()) to avoid a lot
of these issues, but again, suppose you want to do
	if (fork() > 0)
		_exit(0);
But how do you indicate intent to continue with pre-fork() state in the
child and not the parent?
If the PKCS#11 / whatever state is buried N>2 layers deep then
indicating intent to exit the parent can be impossible to do.
Now, PKCS#11-using libraries could be made to use pthread_atfork() to
reestablish state on the child side of fork(), but again, some things
can't safely be reused, so intent to exit one or the other side of
fork() is critical.
We could have a variant of fork() that runs the pthread_atfork() child
handlers in the parent and the parent handlers in the child... but that
would have other weirdness.
So if you want to exit the parent, then the only thing that actually
works is this: fork() early, before complex state is setup.
This brings me to a related issue: daemon() is bad. It's bad because
either the parent exits before the child is ready or complex state must
be setup before daemon() that might not survive the fork(). Oops. A
decade ago in Solaris/Illumos we adopted an alternative design (which I
use in Heimdal now) where two functions are used: one that fork()s and
has the parent wait for the child to signal readiness, and the other
(executed in the child) that signals readiness:
	daemon_prep();	/* Returns here in the child-side of fork(); waits in
			   read(2) on a pipe in the parent */
	<setup code>
	/*
	 * Tell the parent waiting inside daemon_prep() that the child is ready.
	 *
	 * The parent will exit. If we exit, the parent will notice and exit with
	 * an error.
	 */
	daemon_ready();
This has no fork-safety issues because all the code with fork-safety
issues happens on the child-side of an early fork().
This is extremely convenient:
# kdc && echo ready
ready
# kinit -k && echo yes
yes
#
When you get the shell prompt back that means the service is either
running or failed to start. There is no way you can get the prompt back
and the service subsequently fails to start.
Whereas using daemon() this can happen:
# kdc && echo ready
ready
# kinit || echo no
no
#
We adopted this approach in Solaris/Illumos because we replaced the SysV
init and inetd system with a new one (SMF) that understands
inter-service dependencies and does not want to start a service until
its dependencies are running. And that means needing to know precisely
that a service has started, and the way we do that is by having the
service's main program behave as described above.
(SMF also has a process grouping mechanism for representing
multi-process services. This is used to, among other things, detect
crashes of some such processes in order to restart the service.)
One need not like/adopt SMF in order to appreciate/adopt the
daemon_prep()/daemon_ready() approach.
> | Now, vfork() is... clumsy because of the stack sharing silliness, but it
> | predates threads, so its authors probably did not realize that taking a
> | callback function and argument to run in a new stack would have been a
> | superior design
>
> When vfork() was designed, the total (guaranteed) address space was just
> 64KB (text, data, stack, all combined). Duplicating stacks (adding an
> extra stack - and if you want to be able to return in the child, it actually
> means copying the existing stack, while adjusting any self-referencing pointers
> that occur there) would have been laughed away as absurd.
The extra stack can be tiny, since one expects the child to
exec-or-exit. But sure, I understand. OTOH, copying as in fork()
isn't exactly light on resource usage either!
> In another message Nico.Williams@twosigma.com (kind of) quotes me:
> | Robert Elz <kre@munnari.OZ.AU> wrote:
> | > [ description of vfork_into_fork() elided ]
>
> and then says...
>
> | That's a neat idea, but I don't think it's needed. I can't think of why I
> | would ever need it or any time that I could have used it.
>
> Maybe you never would have, but I know of one immediate use - that is /bin/sh
>
> Our sh [...]
Aha, thanks. I get that a fork-me-after-all system call would simplify
that shell. That seems like a valid use case indeed (even if there are
other ways to handle this).
> | What I really want is
> | pid_t avfork(int (*)(void *), void *);
> | which is like vfork() but allocates a new stack, calls the given callback
> | in it just like pthread_create() would, and does not stop any threads in
> | the parent, not even the one that called it.
>
> I have no objection to that, go ahead, write the code for it, and submit
> it, it sounds useful enough to consider at least.
>
> But...
>
> | Note that avfork() would have much the same constraints for the child as
> | vfork() does, except, naturally, that the avfork() child could return while
> | the vfork() child cannot.
>
> Return to what? You're having it execute a callback, are you saying that
> that function can return? Return to where exactly? And what does that
> mean? What would be the difference between
When main() returns, the program exits.
When the callback function in pthread_create() returns, the thread
exits.
Ditto with avfork(): when the callback returns, the child process exits.
> child = avfork(func, &sp);
> and
> if ((child = avfork(&sp)) == 0) func();
> ??
func() has to run in a separate stack in order to avoid having to stop the
parent thread that called avfork(). Sharing a stack is the reason that
the vfork() parent must stop while the child goes on.
avfork() looks almost exactly like pthread_create() (minus pthread_attr_t).
> If there's none, why the need for the callback? If avfork() cannot
The callback is the function to call on a new stack in the child. Same as with
pthread_create(), only creating a child process that shares the parent's
address space just like vfork().
avfork() is like a combination of pthread_create() and vfork().
> actually return in the child, so the second is not possible, then neither
> can func() right?
The func() is expected to execve() or _exit(), just like vfork()
children. But it can also return since it is a C function! And just
like main(), if it returns, the process (the child in this case) exits.
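To make that concrete, here is a rough user-land sketch of avfork()
built on clone(2) as provided by Linux/glibc (NetBSD has a similar
clone(2)). This is only an illustration under assumptions: the stack
size is arbitrary, the stack is assumed to grow downwards, and the
stack is deliberately leaked on success because the parent cannot free
it until the child has exec'd or exited. child_main() is a made-up
callback.

```c
#define _GNU_SOURCE		/* for clone() on Linux/glibc */
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Arbitrary size; the child is expected to exec or exit promptly. */
#define AVFORK_STACK_SIZE (64 * 1024)

/*
 * Hypothetical avfork(): run fn(arg) in a child process that shares the
 * parent's address space (CLONE_VM) but runs on its own stack. The parent
 * is not suspended. If fn returns, the child exits with fn's return value.
 */
static pid_t
avfork(int (*fn)(void *), void *arg)
{
	char *stack;
	pid_t pid;

	assert(fn != NULL);
	stack = malloc(AVFORK_STACK_SIZE);
	if (stack == NULL)
		return -1;
	/* Assumes a downward-growing stack, so pass the top of the block. */
	pid = clone(fn, stack + AVFORK_STACK_SIZE, CLONE_VM | SIGCHLD, arg);
	if (pid == -1)
		free(stack);
	/* On success the stack is leaked; see the note above. */
	return pid;
}

/* Made-up callback: store through the shared pointer, then return. */
static int
child_main(void *arg)
{
	*(volatile int *)arg = 42;	/* visible to the parent: memory is shared */
	return 7;			/* becomes the child's exit status */
}
```

A parent that calls avfork(child_main, &x) and then waitpid()s should
see exit status 7 and x == 42, with the calling thread never having
been stopped.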
Nico
--
From: Robert Elz <kre@munnari.OZ.AU>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/49017: vfork does not suspend all threads
Date: Sat, 08 Apr 2017 22:19:36 +0700
Date: Fri, 7 Apr 2017 20:40:01 +0000 (UTC)
From: Nico Williams <Nico.Williams@twosigma.com>
Message-ID: <20170407204001.1F0147A2B8@mollari.NetBSD.org>
| Fork-safety issues are a necessary result of sharing state from a
| starting snapshot of it
But that kind of issue is the same whatever kind of fork() (with the
possible exception of the current vfork()) is used, there's nothing in
any of that which is peculiar to fork() over avfork() or perhaps even
lwp_create().
| Better then to have fork() than not, but even better to have vfork() to
| begin with.
I disagree with that, fork() is essential, vfork() is a nice optimisation
that is sometimes useful.
| That does not mean that they can't have made mistakes (e.g., the lack of
| a "create time" for files!),
You really don't want to get me started on that one ... "create time"
(aka the current birthtime in UFS2) is the greatest crock of sh*t of
all time. I have never yet been able to find anyone who could explain
a use case for that nonsense that actually corresponds to anything that
has ever been implemented, or is even implementable.
That is, it is easy to explain what would be useful for a create time,
but no-one has ever implemented it in a way that those uses work, and
it is probably impossible (since much of what is actually desired depends
on intangibles of what is going on inside the user's head.) On the other
hand, as the current UFS2 illustrates, implementing something called
a birthtime (or create time) is easy, it just doesn't correspond to
anything actually useful in practice (which is why it is the most
underused filesystem feature of all - probably the least used feature of
the whole system, including the exotic stuff.)
| Is this PR right place to do this?
No...
| When main() returns, the program exits.
| When the callback function in pthread_create() returns, the thread
| exits.
That's not what we mean when we say the child of a vfork() cannot return,
what we mean is that it cannot unwind its stack, if one says the child
can return (as it can after fork()) then the two processes can continue
in parallel, each doing their own thing (as much as that makes sense to
the logic). "You can return, but that means you exit" is not particularly
useful.
kre
From: Nico Williams <Nico.Williams@twosigma.com>
To: <gnats-bugs@netbsd.org>
Cc:
Subject: Re: kern/49017: vfork does not suspend all threads
Date: Mon, 10 Apr 2017 00:38:43 +0000
Robert Elz <kre@munnari.OZ.AU> wrote:
> | Fork-safety issues are a necessary result of sharing state from a
> | starting snapshot of it
>
> But that kind of issue is the same whatever kind of fork() (with the
> possible exception of the current vfork()) is used, there's nothing in
> any of that which is peculiar to fork() over avfork() or perhaps even
> lwp_create().
vfork() has no fork-safety issues: the child must execve() or _exit(),
and the restrictions on what functions it can call before then make it
very difficult to do anything other than set up the execve(). The
vfork() child certainly can't make any PKCS#11 function calls or what
have you.
avfork() would have the same execve()-or-_exit() requirement as
vfork(), except _perhaps_ modified as "child-must-execve()-or-_exit()-
OR-parent-must-_exit()".
Any library code that calls getpid() to detect forks would be tripped up
in the avfork() child, though, so I think we can't really make that
relaxation. But aside from that, if the parent _exit()s, then the
avfork() child should be free from fork-safety concerns.
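For reference, the getpid()-based check I have in mind looks roughly
like the sketch below. The names lib_init() and lib_in_new_process()
are made up for illustration and are not taken from any particular
library. After a fork() the check correctly fires in the child; in an
avfork() child it would fire too, but there the library's state is
shared with the parent, so any "reinitialise after fork" step would
clobber the parent's state as well.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* pid of the process that initialised the (hypothetical) library. */
static pid_t lib_owner_pid;

/* Record which process owns the library state. */
static void
lib_init(void)
{
	lib_owner_pid = getpid();
}

/*
 * Returns 1 when called from a process other than the one that called
 * lib_init(), i.e. when the library concludes that a fork has happened.
 */
static int
lib_in_new_process(void)
{
	return getpid() != lib_owner_pid;
}
```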
> | Better then to have fork() than not, but even better to have vfork() to
> | begin with.
>
> I disagree with that, fork() is essential, vfork() is a nice optimisation
> that is sometimes useful.
posix_spawn()/_spawn()/CreateProcess() demonstrate that fork() isn't
essential. fork() was essential to speeding up development of shells by
moving the spawning code into user-land -- that's my theory.
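As an illustration of spawning without the caller ever invoking
fork(), here is a small sketch using posix_spawn(). The helper name
spawn_and_wait() is made up; it passes no file actions or attributes
and simply hands the parent's environment to the new program.

```c
#include <assert.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/*
 * Start the program at argv[0] with the given arguments and wait for it.
 * Returns the program's exit status, or -1 if it could not be spawned or
 * did not exit normally.
 */
static int
spawn_and_wait(char *const argv[])
{
	pid_t pid;
	int status;

	/* posix_spawn() returns an error number, not -1, on failure. */
	if (posix_spawn(&pid, argv[0], NULL, NULL, argv, environ) != 0)
		return -1;
	if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```

For example, calling it with { "/bin/sh", "-c", "exit 3", NULL }
should return 3.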
> | That does not mean that they can't have made mistakes (e.g., the lack of
> | a "create time" for files!),
>
> You really don't want to get me started on that one ... "create time"
> (aka the current birthtime in UFS2) is the greatest crock of sh*t of
> all time. I have never yet been able to find anyone who could explain
> a use case for that nonsense that actually corresponds to anything that
> has ever been implemented, or is even implementable.
Really? Maybe I'll ask you off-list. I'm quite curious.
> | Is this PR right place to do this?
>
> No...
Agreed.
> | When main() returns, the program exits.
> | When the callback function in pthread_create() returns, the thread
> | exits.
>
> That's not what we mean when we say the child of a vfork() cannot return,
> what we mean is that it cannot unwind its stack, if one says the child
Yes, but the reason it can't return is the shared stack. avfork() would
have no shared stack, therefore it wouldn't have that problem.
> can return (as it can after fork()) then the two processes can continue
> in parallel, each doing their own thing (as much as that makes sense to
> the logic). "You can return, but that means you exit" is not particularly
> useful.
It's what happens with main() and pthread_create(). It's quite
sensible.
Nico
--
Copyright © 1994-2014
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.