NetBSD Problem Report #47428

From gdt@ir.bbn.com  Wed Jan  9 22:49:52 2013
Return-Path: <gdt@ir.bbn.com>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
	by www.NetBSD.org (Postfix) with ESMTP id D455863BC0E
	for <gnats-bugs@gnats.NetBSD.org>; Wed,  9 Jan 2013 22:49:52 +0000 (UTC)
Message-Id: <20130109224949.A2D66AECC@fnord.ir.bbn.com>
Date: Wed,  9 Jan 2013 17:49:49 -0500 (EST)
From: gdt@ir.bbn.com
Reply-To: gdt@ir.bbn.com
To: gnats-bugs@gnats.NetBSD.org
Subject: malloc locking is trouble with threads anad fork
X-Send-Pr-Version: 3.95

>Number:         47428
>Category:       lib
>Synopsis:       malloc locking is trouble with threads anad fork
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    lib-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Jan 09 22:50:01 +0000 2013
>Last-Modified:  Wed Jul 16 19:10:07 +0000 2014
>Originator:     Bev Schwartz (via Greg Troxel)
>Release:        NetBSD 6_STABLE
>Organization:
    Greg Troxel <gdt@ir.bbn.com>
>Environment:


(netbsd-6 branch, i386, but issue is MI)

>Description:

(This PR is being filed for the record with contents from a
tech-userlevel message.)

From: Beverly Schwartz <bschwart@bbn.com>
Subject: Should we use _malloc_prefork and _malloc_postfork?
To: tech-userlevel@NetBSD.org
Cc: Beverly Schwartz <bschwart@bbn.com>
Date: Mon, 7 Jan 2013 15:25:43 -0500 (2 days, 2 hours, 18 minutes ago)

We had a problem with a Python2.6 program running on NetBSD-6
where the main application would fork() and then exec() some
other NetBSD program.  We would occassionally observe the
child process hang.  Running on a single core, this would
occur maybe once a week.  When run on multi-core, this would
occur once every few hours.  Our Python program is multi-threaded
(about 15 threads), and we use Python 2.6 built from pkgsrc,
so it is using native pthreads.

After a lot of debugging I traced it down to an
interaction between pthreads, the malloc library, and
the malloc library's use of semaphores and locks and the
rules for what can and can not be done after forking.

The IEEE Std 1003.1 Specification says that only async-signal-safe
calls may be called in an after-fork child before exec is called.
http://pubs.opengroup.org/onlinepubs/009695399/functions/fork.html
The number of async-signal-safe calls are few, and malloc and free
are not included in this list.
https://www.securecoding.cert.org/confluence/display/seccode/SIG30-C.+Call+only+asynchronous-safe+functions+within+signal+handlers

In practice, this requirement is only necessary if the program
is threaded, and if there is more than 1 thread running.  Thus,
daemonization (using double fork) at the beginning of a program
before it launches its threads is okay, even though it is technically
not compliant with the spec.

Python does *not* register handlers with pthread_atfork, but it
does call its own function PyOS_AfterFork after calling fork.
This is in the C code that implements Python, so calling fork in
Python (or anything that calls fork such as the functions in the
subprocess module) guarantees a call to PyOS_AfterFork.
PyOS_AfterFork calls functions which are not async-signal-safe.
This is true for both Python2.* and Python3.*.

Visual inspection of PyOS_AfterFork uncovered a number of calls
to functions that are *not* async-signal-safe.  However, the
vast majority of the problem calls are to malloc and free.

When a child is forked, it gets a copy of the executing thread
only.  If another thread happens to hold a mutex in the malloc
module, and then a child tries to do a malloc or free, the child
will hang, because the thread which holds the mutex will never
come alive and free it.

I have attached a simple non-compliant program to illustrate the
problem with malloc and free. In this program, there are threads
(four by default, configurabale via command line) which just call
malloc and free in a loop. The main thread is a loop which forks a
child and waits for it.  The child does a single malloc and then exits.
The parent process will print the pid of the child forked, and then
print the pid and status after waiting for the child. Without the
changes recommended below, the program will eventually hang while
waiting for a child.  When this happens, if the child is subsequently
killed with a signal, the wait call in the parent will return a
non-zero status, and execution of the parent will resume.

Has anyone had experience using _malloc_prefork and
_malloc_postfork? Is there any reason why they are not registered in
pthread_init?  While the Posix specification doesn't require that we
add these handlers, in practice, they are very useful because there
is so much non-compliant software out there. (Python is only one
example; I have found others.)  It's a simple and reasonable change
that avoids the vast majority of bugs caused by non-compliant programs.

I have tried running my non-compliant program with the modifications
described [below]. My program has not hung once.

My non-compliant program runs correctly on my Mac (OS X 10.8.2) and on
Fedora 15 Linux installed on an x86 host.  I haven't looked into whether
that's because of an explicit locking strategy or not.

The key hard question is whether a tiny performance hit for fork is a
good trade for making buggy programs work better.  Given that fork is
relatively rare, and many uses of fork are likely to be buggy, it seems
like this change would be a net win.

Bev Schwartz
BBN Technologies
bschwart@bbn.com

>How-To-Repeat:

See
http://mail-index.netbsd.org/tech-userlevel/2013/01/07/msg007117.html
for a test program.

>Fix:

A simple workaround is to take all the malloc module locks before
calling fork, then release them after the fork. This guarantees that
the locks are held in the executing thread, and thus the child will
be able to release them. This is what the _malloc_prefork and
_malloc_postfork functions do.

In both malloc.c and jemalloc.c, there are _malloc_prefork and
_malloc_postfork functions defined, but they are unused. They are
also not documented, and I am guessing not intended for general use.

Taking the malloc module locks must be the last thing done before
calling fork, and releasing the locks must be the first thing done
after calling fork. This is so other functions registered with
pthread_atfork can use malloc and free in their functions. This can
be guaranteed by registering _malloc_prefork and _malloc_postfork in
pthread_init. pthread_init is called before main, so no user program
can possibly get a pthread_atfork handler registered before the ones
in pthread_init. Here is the diff for pthread.c:

diff --git a/netbsd/src/lib/libpthread/pthread.c b/netbsd/src/lib/libpthread/pthread.c
index 4880957..6ba7a00 100644
--- a/netbsd/src/lib/libpthread/pthread.c
+++ b/netbsd/src/lib/libpthread/pthread.c
@@ -55,6 +55,9 @@ __RCSID("$NetBSD: pthread.c,v 1.125.4.1 2012/05/07 03:12:33 riz Exp $");
 #include "pthread.h"
 #include "pthread_int.h"

+/* Need to use libc-private names for malloc pre/post fork operations. */
+#include "../libc/include/extern.h"
+
 pthread_rwlock_t pthread__alltree_lock = PTHREAD_RWLOCK_INITIALIZER;
 RB_HEAD(__pthread__alltree, __pthread_st) pthread__alltree;

@@ -230,6 +233,7 @@ pthread__init(void)
        }

        /* Tell libc that we're here and it should role-play accordingly. */
+       pthread_atfork(_malloc_prefork, _malloc_postfork, _malloc_postfork);
        pthread_atfork(NULL, NULL, pthread__fork_callback);
        __isthreaded = 1;
 }


I did find one subtle bug in the _malloc_prefork and
_malloc_postfork implementation in jemalloc.c. In both the prefork
and postfork functions, the arenas_mtx is taken and released. I am
wondering why it isn't just held in the prefork call to be released
in the postfork call. Here's the diff for jemalloc.c:

diff --git a/netbsd/src/lib/libc/stdlib/jemalloc.c b/netbsd/src/lib/libc/stdlib/jemalloc.c
index 81dbf45..7905d9d 100644
--- a/netbsd/src/lib/libc/stdlib/jemalloc.c
+++ b/netbsd/src/lib/libc/stdlib/jemalloc.c
@@ -3915,7 +3915,6 @@ _malloc_prefork(void)
                if (arenas[i] != NULL)
                        malloc_mutex_lock(&arenas[i]->mtx);
        }
-       malloc_mutex_unlock(&arenas_mtx);

        malloc_mutex_lock(&base_mtx);

@@ -3933,7 +3932,6 @@ _malloc_postfork(void)

        malloc_mutex_unlock(&base_mtx);

-       malloc_mutex_lock(&arenas_mtx);
        for (i = 0; i < narenas; i++) {
                if (arenas[i] != NULL)
                        malloc_mutex_unlock(&arenas[i]->mtx);


>Release-Note:

>Audit-Trail:

From: christos@zoulas.com (Christos Zoulas)
To: gnats-bugs@NetBSD.org, lib-bug-people@netbsd.org, 
	gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Cc: 
Subject: Re: lib/47428: malloc locking is trouble with threads anad fork
Date: Wed, 9 Jan 2013 17:52:50 -0500

 On Jan 9, 10:50pm, gdt@ir.bbn.com (gdt@ir.bbn.com) wrote:
 -- Subject: lib/47428: malloc locking is trouble with threads anad fork

 Do you have a test program that shows the problem?

 christos

From: Taylor R Campbell <campbell+netbsd@mumble.net>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: lib/47428: malloc locking is trouble with threads anad fork
Date: Wed, 25 Jun 2014 00:04:13 +0000

 This is a multi-part message in MIME format.
 --=_nXJoC1A/QFB8fYHpRw1+roM0kwFmhARk

 Here's a test program, attached.

 As is, it timed out in 93 of 100 trials on my 8-thread Ivy Bridge i7
 laptop and 91 of 100 trials on my 8-thread Ivy Bridge i7 desktop.

 When I changed the `#if 0' to `#if 1', so that it installs the malloc
 atfork handlers first, it timed out in only 27 of 100 trials on my
 laptop and only 26 of 100 trials on my desktop.

 When I changed the `#if 0' to `#if 1' and modified _malloc_prefork and
 _malloc_postfork to avoid dropping arenas_mtx pre-fork and regrabbing
 it post-fork, it timed out in 0 of 100 trials on my desktop.

 (I didn't much feel like messing with libc on my laptop because I am
 using it to type this message.  I could probably make the timeout more
 reliable with the arenas_mtx locking bug, but I spent too much time
 doing this experiment already.)

 --=_nXJoC1A/QFB8fYHpRw1+roM0kwFmhARk
 Content-Type: text/plain; charset="ISO-8859-1"; name="mallocfork"
 Content-Transfer-Encoding: quoted-printable
 Content-Disposition: attachment; filename="mallocfork.c"

 #include <sys/cdefs.h>
 #include <sys/sysctl.h>
 #include <sys/wait.h>

 #include <err.h>
 #include <errno.h>
 #include <inttypes.h>
 #include <pthread.h>
 #include <signal.h>
 #include <stdbool.h>
 #include <stdlib.h>
 #include <unistd.h>

 static void *
 mallocloop_thread(void *arg __unused)
 {

 	for (;;) {
 		free(malloc(1));
 		pthread_testcancel();
 	}

 	return NULL;
 }

 static bool	timed_out =3D false;
 static pid_t	child_pid =3D 0;

 static void
 alarm_handler(int signo __unused)
 {

 	if (child_pid) {
 		timed_out =3D true;
 		if (kill(child_pid, SIGKILL) =3D=3D -1)
 			warn("kill(%"PRIdMAX", SIGKILL)", (intmax_t)child_pid);
 	} else {
 		errx(1, "timed out waiting for threads");
 	}
 }

 extern void	_malloc_prefork(void);
 extern void	_malloc_postfork(void);

 int
 main(int argc __unused, char **argv __unused)
 {
 	const int mib[] =3D { CTL_HW, HW_NCPU };
 	unsigned ncpu;
 	size_t len;
 	pthread_t *threads;
 	unsigned t, i;
 	int status;
 	int error;

 #if 0
 	error =3D pthread_atfork(_malloc_prefork, _malloc_postfork,
 	    _malloc_postfork);
 	if (error) {
 		errno =3D error;
 		err(1, "pthread_atfork");
 	}
 #endif

 	if (signal(SIGALRM, &alarm_handler) =3D=3D SIG_ERR)
 		err(1, "signal(SIGALRM)");

 	len =3D sizeof ncpu;
 	if (sysctl(mib, __arraycount(mib), &ncpu, &len, NULL, 0) =3D=3D -1)
 		err(1, "sysctl(hw.ncpu)");

 	threads =3D calloc(ncpu, sizeof threads[0]);
 	if (threads =3D=3D NULL)
 		err(1, "calloc");

 	for (t =3D 0; t < ncpu; t++) {
 		error =3D pthread_create(&threads[t], NULL, &mallocloop_thread,
 		    NULL);
 		if (error) {
 			errno =3D error;
 			err(1, "pthread_create");
 		}
 	}

 	for (i =3D 0; i < 100; i++) {
 		child_pid =3D fork();
 		switch (child_pid) {
 		case -1:
 			err(1, "fork");
 		case 0:		/* child */
 			free(malloc(1));
 			_exit(0);
 		default:	/* parent */
 			break;
 		}
 		if (alarm(3) =3D=3D (unsigned)-1)
 			err(1, "alarm");
 		if (waitpid(child_pid, &status, 0) =3D=3D -1)
 			err(1, "waitpid");
 		child_pid =3D 0;
 		if (alarm(0) =3D=3D (unsigned)-1)
 			err(1, "alarm");
 		if (timed_out)
 			errx(1, "timed out waiting for child");
 		if (WIFEXITED(status)) {
 			if (WEXITSTATUS(status) !=3D 0) {
 				errx(1, "child exited with %d",
 				    WEXITSTATUS(status));
 			}
 			continue;
 		}
 		if (WIFSIGNALED(status)) {
 			errx(1, "child terminated on signal %d",
 			    WTERMSIG(status));
 		}
 	}

 	for (t =3D 0; t < ncpu; t++) {
 		error =3D pthread_cancel(threads[t]);
 		if (error) {
 			errno =3D error;
 			err(1, "pthread_cancel");
 		}

 		if (alarm(3) =3D=3D (unsigned)-1)
 			err(1, "alarm");
 		error =3D pthread_join(threads[t], NULL);
 		if (error) {
 			errno =3D error;
 			err(1, "pthread_join");
 		}
 		if (alarm(0) =3D=3D (unsigned)-1)
 			err(1, "alarm");
 	}

 	return 0;
 }

 --=_nXJoC1A/QFB8fYHpRw1+roM0kwFmhARk--

From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/47428 CVS commit: src/lib/libc/stdlib
Date: Wed, 16 Jul 2014 19:09:53 +0000

 Module Name:	src
 Committed By:	riastradh
 Date:		Wed Jul 16 19:09:53 UTC 2014

 Modified Files:
 	src/lib/libc/stdlib: jemalloc.c

 Log Message:
 Hold arenas_mtx across (still unused) _malloc_pre/postfork (PR 47428).


 To generate a diff of this commit:
 cvs rdiff -u -r1.32 -r1.33 src/lib/libc/stdlib/jemalloc.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

>Unformatted:
 bschwart@bbn.com

NetBSD Home
NetBSD PR Database Search

(Contact us) $NetBSD: query-full-pr,v 1.39 2013/11/01 18:47:49 spz Exp $
$NetBSD: gnats_config.sh,v 1.8 2006/05/07 09:23:38 tsutsui Exp $
Copyright © 1994-2007 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.