NetBSD Problem Report #55670
From he@smistad.uninett.no Sat Sep 19 18:37:27 2020
Return-Path: <he@smistad.uninett.no>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id 4CCF71A9239
for <gnats-bugs@gnats.NetBSD.org>; Sat, 19 Sep 2020 18:37:27 +0000 (UTC)
Message-Id: <20200919183722.D486F43EAA7@smistad.uninett.no>
Date: Sat, 19 Sep 2020 20:37:22 +0200 (CEST)
From: he@NetBSD.org
Reply-To: he@NetBSD.org
To: gnats-bugs@NetBSD.org
Subject: pthread_create / pthread_join test may wedge
X-Send-Pr-Version: 3.95
>Number: 55670
>Category: kern
>Synopsis: pthread_create / pthread_join test may wedge
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sat Sep 19 18:40:00 +0000 2020
>Last-Modified: Sat Sep 19 19:55:01 +0000 2020
>Originator: he@NetBSD.org
>Release: NetBSD 9.0_STABLE
>Organization:
I Try...
>Environment:
System: NetBSD smistad.uninett.no 9.0_STABLE NetBSD 9.0_STABLE (GENERIC) #0: Sat May 30 02:09:41 CEST 2020 he@smistad.uninett.no:/usr/obj/sys/arch/amd64/compile/GENERIC amd64
Architecture: x86_64
Machine: amd64
>Description:
This simple program adapted from
https://github.com/rust-lang/rust/issues/76600#issuecomment-695335502
--------------------
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define N 800

static pthread_t threads[N];

static void *run(void *arg) {
    return malloc(1024);
}

int main() {
    for (int i = 0; i != N; ++i)
        assert(pthread_create(&threads[i], NULL, run, NULL) == 0);
    for (int i = 0; i != N; ++i)
        assert(pthread_join(threads[i], NULL) == 0);
}
--------------------
when built with "cc -pthread t.c" and run repeatedly, may
eventually wedge (the program, not the system). When this
happens, "ps sdw" shows one thread stuck in Z state, and the
others in "parked" state:
UID   PID  PPID CPU LID NLWP PRI NI     VSZ   RSS WCHAN  STAT TTY     LTIME COMMAND
169  6279  5683   0   1    1  85  0   27172  2816 ttyraw I    pts/6 0:00.01 - -tcsh
169  7549 29786   0   1    1  85  0   27320  2968 pause  I    pts/8 0:00.06 `-- -tcsh
169 11316  7549   0 618   13  43  0 3405132 23264 parked I-   pts/8 0:00.00 |-- ./a.out
169 11316  7549   0 614   13  43  0 3405132 23264 parked I-   pts/8 0:00.00 |-- ./a.out
169 11316  7549   0 570   13  43  0 3405132 23264 parked I-   pts/8 0:00.00 |-- ./a.out
169 11316  7549   0 436   13  43  0 3405132 23264 -      Z-   pts/8 0:00.00 |-- ./a.out
169 11316  7549   0 414   13  43  0 3405132 23264 parked I-   pts/8 0:00.00 |-- ./a.out
169 11316  7549   0 399   13  43  0 3405132 23264 parked I-   pts/8 0:00.00 |-- ./a.out
169 11316  7549   0 386   13  43  0 3405132 23264 parked I-   pts/8 0:00.00 |-- ./a.out
169 11316  7549   0 371   13  43  0 3405132 23264 parked I-   pts/8 0:00.00 |-- ./a.out
169 11316  7549   0 343   13  43  0 3405132 23264 parked I-   pts/8 0:00.00 |-- ./a.out
169 11316  7549   0 317   13  43  0 3405132 23264 parked I-   pts/8 0:00.00 |-- ./a.out
169 11316  7549   0 313   13  43  0 3405132 23264 parked I-   pts/8 0:00.00 |-- ./a.out
169 11316  7549   0 301   13  43  0 3405132 23264 parked I-   pts/8 0:00.00 |-- ./a.out
169 11316  7549   0   1   13  43  0 3405132 23264 parked I    pts/8 0:00.01 |-- ./a.out
In my case (i7 4th gen, 4 cores, 8 with HT), I had to try 25
times before hitting the wedge.
The original reproducer had N at just 4, and I could not get
it to wedge with that on my host (I did more than 10000
attempts).
>How-To-Repeat:
See above.
>Fix:
Sorry, don't know.
>Audit-Trail:
From: Joerg Sonnenberger <joerg@bec.de>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org
Subject: Re: kern/55670: pthread_create / pthread_join test may wedge
Date: Sat, 19 Sep 2020 21:54:49 +0200
On Sat, Sep 19, 2020 at 06:40:00PM +0000, he@NetBSD.org wrote:
> This simple program adapted from
>
> https://github.com/rust-lang/rust/issues/76600#issuecomment-695335502
No surprise. There are a variety of locking issues in both jemalloc
variants in 9.0. Many (all?) are fixed in current.
Joerg