NetBSD Problem Report #43409
From www@NetBSD.org Thu Jun 3 10:32:44 2010
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
by www.NetBSD.org (Postfix) with ESMTP id E3E1363B8EB
for <gnats-bugs@gnats.NetBSD.org>; Thu, 3 Jun 2010 10:32:43 +0000 (UTC)
Message-Id: <20100603103243.9A9BF63B8E3@www.NetBSD.org>
Date: Thu, 3 Jun 2010 10:32:43 +0000 (UTC)
From: pooka@iki.fi
Reply-To: pooka@iki.fi
To: gnats-bugs@NetBSD.org
Subject: jemalloc x (threads + rlimit) = perpetual ENOMEM
X-Send-Pr-Version: www-1.0
>Number: 43409
>Category: bin
>Synopsis: jemalloc x (threads + rlimit) = perpetual ENOMEM
>Confidential: no
>Severity: serious
>Priority: high
>Responsible: bin-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Thu Jun 03 10:35:00 +0000 2010
>Last-Modified: Sun Aug 28 08:55:01 +0000 2011
>Originator: Antti Kantee
>Release: 5.0
>Organization:
>Environment:
i386
>Description:
Under some condition(s), a multithreaded program can trigger a perpetual
ENOMEM from posix_memalign() (and probably from malloc() too). The
problem persists even if the program releases a lot more memory than it
tries to allocate.
(The following analysis refers to the program under How-To-Repeat.)
If we run the program under ktrace with MALLOC_OPTIONS=U, which makes
malloc log its calls via utrace(2), we first see the following
backend allocation failure:
9283 1 a.out CALL mmap(0,0x100000,3,0x14001002,0xffffffff,0,0,0)
9283 1 a.out RET mmap -1 errno 12 Cannot allocate memory
9283 1 a.out CALL break(0x9000000)
9283 1 a.out RET break 0
9283 1 a.out CALL utrace(0xbbbb4c51,0xbfbfdda0,0xc)
9283 1 a.out MISC malloc: 12, 00000000001000000010f008
9283 1 a.out RET utrace 0
mmap(MAP_ANON) failed, but since break() was still successful, the
allocation could be carried out. Then:
9283 1 a.out CALL mmap(0,0x100000,3,0x14001002,0xffffffff,0,0,0)
9283 1 a.out RET mmap -1 errno 12 Cannot allocate memory
9283 1 a.out CALL break(0x9100000)
9283 1 a.out RET break -1 errno 12 Cannot allocate memory
9283 1 a.out CALL utrace(0xbbbb4c51,0xbfbfdda0,0xc)
9283 1 a.out MISC malloc: 12, 000000000010000000000000
9283 1 a.out RET utrace 0
Now break() fails too, so the allocation fails. This causes our
program to release its "emergency" memory:
9283 1 a.out CALL utrace(0xbbbb4c51,0xbfbfdda4,0xc)
9283 1 a.out MISC malloc: 12, 005090bb0000000000000000
9283 1 a.out RET utrace 0
9283 1 a.out CALL utrace(0xbbbb4c51,0xbfbfdda4,0xc)
9283 1 a.out MISC malloc: 12, 006090bb0000000000000000
9283 1 a.out RET utrace 0
[.....]
It then retries the allocation:
9283 1 a.out CALL mmap(0,0x100000,3,0x14001002,0xffffffff,0,0,0)
9283 1 a.out RET mmap -1 errno 12 Cannot allocate memory
9283 1 a.out CALL utrace(0xbbbb4c51,0xbfbfdda0,0xc)
9283 1 a.out MISC malloc: 12, 000000000010000000000000
9283 1 a.out RET utrace 0
However, the memory is not allocated from the recently freed fragments
(even though they are the same size as what we are trying to allocate);
instead, more memory is requested from the backend. Since none has
been returned to the backend, this request fails. All further requests
to malloc anything fail as well.
Note that we freed the memory from the same thread we are attempting
to reallocate it from. ktrace shows no other calls to malloc between
free() and the next call to posix_memalign(). That makes me unsure
whether this is really a malloc problem or something else.
Also, ideally a thread would be able to steal or use memory from
other arenas once backend memory has been exhausted. I did not read
the code closely enough to see whether this is supported.
>How-To-Repeat:
Run the following program:
=== snip ===
#include <sys/types.h>
#include <sys/sysctl.h>
#include <sys/mman.h>
#include <sys/resource.h>

#include <err.h>
#include <errno.h>
#include <kvm.h>
#include <limits.h>
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

/* "helper" thread: hammer the allocator with page-sized, page-aligned blocks */
static void *
mythread(void *arg)
{
	void *v;

	for (;;) {
		if (posix_memalign(&v, 4096, 4096) == 0)
			free(v);
	}
}

int
main(int argc, char *argv[])
{
	char buf[_POSIX2_LINE_MAX];
	struct kinfo_proc2 *kp;
	kvm_t *kd;
	struct rlimit rl;
	pthread_t pt;
	int cnt, error;

	/* scope out current size, give us 16megs more */
#define MOOOORE (16*1024*1024)
	kd = kvm_openfiles(NULL, NULL, NULL, KVM_NO_FILES, buf);
	if (kd == NULL)
		errx(1, "kvm_openfiles: %s", buf);
	kp = kvm_getproc2(kd, KERN_PROC_PID, getpid(), sizeof(*kp), &cnt);
	if (kp == NULL)
		errx(1, "kvm_getproc2: %s", kvm_geterr(kd));

	/* cap the address space at the current size plus MOOOORE */
	if (getrlimit(RLIMIT_AS, &rl) == -1)
		err(1, "getrlimit");
	rl.rlim_cur = kp->p_vm_vsize + MOOOORE;
	if (setrlimit(RLIMIT_AS, &rl) == -1)
		err(1, "setrlimit");

	/* cap the data segment (break) at MOOOORE */
	if (getrlimit(RLIMIT_DATA, &rl) == -1)
		err(1, "getrlimit");
	rl.rlim_cur = rl.rlim_max = MOOOORE;
	if (setrlimit(RLIMIT_DATA, &rl) == -1)
		err(1, "setrlimit");

	if ((error = pthread_create(&pt, NULL, mythread, NULL)) != 0) {
		errno = error;
		err(1, "pthread_create");
	}

	{
		void *store[8];
		void *v;
		int i;

		/* set aside 8 pages of "emergency" memory */
		for (i = 0; i < 8; i++) {
			if ((error = posix_memalign(&store[i], 4096, 4096)) != 0) {
				errno = error;
				err(1, "emergency allocation");
			}
		}
		/* exhaust the allocator */
		while (posix_memalign(&v, 4096, 4096) == 0)
			continue;
		/* release the emergency memory ... */
		for (i = 0; i < 8; i++)
			free(store[i]);
		/* ... and retry; posix_memalign() returns the error, not errno */
		if ((error = posix_memalign(&v, 4096, 4096)) != 0) {
			errno = error;
			err(1, "fail");
		}
	}
	return 0;
}
=== snip ===
Note that the condition triggers quite rarely:
pain-rustique:186:~> repeat 500 ./a.out
a.out: fail: Cannot allocate memory
a.out: fail: Cannot allocate memory
a.out: fail: Cannot allocate memory
a.out: fail: Cannot allocate memory
pain-rustique:187:~>
i.e. in 496 of 500 runs it did not show up. I could not trigger the
problem without the "helper" thread.
>Fix:
currently unknown
>Audit-Trail:
From: Matthew Mondor <mm_lists@pulsar-zone.net>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: bin/43409: jemalloc x (threads + rlimit) = perpetual ENOMEM
Date: Sun, 28 Aug 2011 03:50:46 -0400
I ran the test for a fair while but was unable to reproduce the problem
on recent netbsd-5 i386 (no SMP). Does it still trigger for someone with
SMP? If so, it could be that the problem only occurs with higher
concurrency, i.e. a race condition...
Thanks,
--
Matt
Copyright © 1994-2007
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.