NetBSD Problem Report #55670

From he@smistad.uninett.no  Sat Sep 19 18:37:27 2020
Return-Path: <he@smistad.uninett.no>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 4CCF71A9239
	for <gnats-bugs@gnats.NetBSD.org>; Sat, 19 Sep 2020 18:37:27 +0000 (UTC)
Message-Id: <20200919183722.D486F43EAA7@smistad.uninett.no>
Date: Sat, 19 Sep 2020 20:37:22 +0200 (CEST)
From: he@NetBSD.org
Reply-To: he@NetBSD.org
To: gnats-bugs@NetBSD.org
Subject: pthread_create / pthread_join test may wedge
X-Send-Pr-Version: 3.95

>Number:         55670
>Category:       kern
>Synopsis:       pthread_create / pthread_join test may wedge
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Sep 19 18:40:00 +0000 2020
>Last-Modified:  Sat Sep 19 19:55:01 +0000 2020
>Originator:     he@NetBSD.org
>Release:        NetBSD 9.0_STABLE
>Organization:
   I Try...
>Environment:
System: NetBSD smistad.uninett.no 9.0_STABLE NetBSD 9.0_STABLE (GENERIC) #0: Sat May 30 02:09:41 CEST 2020 he@smistad.uninett.no:/usr/obj/sys/arch/amd64/compile/GENERIC amd64
Architecture: x86_64
Machine: amd64
>Description:
	This simple program adapted from

	https://github.com/rust-lang/rust/issues/76600#issuecomment-695335502

--------------------
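#include <pthread.h>
#include <stdlib.h>

#define N 800

static pthread_t threads[N];

/* Each thread allocates 1 KiB and returns it; the result is never freed. */
static void *run(void *arg) {
        (void)arg;
        return malloc(1024);
}

int main(void) {
        /*
         * Check the return values explicitly rather than inside assert(),
         * so the calls are not compiled out when NDEBUG is defined.
         */
        for (int i = 0; i != N; ++i)
                if (pthread_create(&threads[i], NULL, run, NULL) != 0)
                        abort();
        for (int i = 0; i != N; ++i)
                if (pthread_join(threads[i], NULL) != 0)
                        abort();
        return 0;
}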
--------------------

	when built with "cc -pthread t.c" and run repeatedly, may
	eventually wedge (the program, not the system).  When this
	happens, "ps sdw" shows one LWP (436 in the listing below)
	stuck in the zombie (Z) state and all the others in the
	"parked" state:

UID   PID  PPID   CPU LID NLWP PRI NI     VSZ    RSS WCHAN  STAT TTY      LTIME COMMAND
169  6279  5683     0   1    1  85  0   27172   2816 ttyraw I    pts/6  0:00.01 - -tcsh 
169  7549 29786     0   1    1  85  0   27320   2968 pause  I    pts/8  0:00.06   `-- -tcsh 
169 11316  7549     0 618   13  43  0 3405132  23264 parked I-   pts/8  0:00.00     |-- ./a.out 
169 11316  7549     0 614   13  43  0 3405132  23264 parked I-   pts/8  0:00.00     |-- ./a.out 
169 11316  7549     0 570   13  43  0 3405132  23264 parked I-   pts/8  0:00.00     |-- ./a.out 
169 11316  7549     0 436   13  43  0 3405132  23264 -      Z-   pts/8  0:00.00     |-- ./a.out 
169 11316  7549     0 414   13  43  0 3405132  23264 parked I-   pts/8  0:00.00     |-- ./a.out 
169 11316  7549     0 399   13  43  0 3405132  23264 parked I-   pts/8  0:00.00     |-- ./a.out 
169 11316  7549     0 386   13  43  0 3405132  23264 parked I-   pts/8  0:00.00     |-- ./a.out 
169 11316  7549     0 371   13  43  0 3405132  23264 parked I-   pts/8  0:00.00     |-- ./a.out 
169 11316  7549     0 343   13  43  0 3405132  23264 parked I-   pts/8  0:00.00     |-- ./a.out 
169 11316  7549     0 317   13  43  0 3405132  23264 parked I-   pts/8  0:00.00     |-- ./a.out 
169 11316  7549     0 313   13  43  0 3405132  23264 parked I-   pts/8  0:00.00     |-- ./a.out 
169 11316  7549     0 301   13  43  0 3405132  23264 parked I-   pts/8  0:00.00     |-- ./a.out 
169 11316  7549     0   1   13  43  0 3405132  23264 parked I    pts/8  0:00.01     |-- ./a.out 

	On my machine (4th-generation Intel Core i7, 4 cores, 8 threads
	with Hyper-Threading), it took 25 attempts to hit the wedge.

	The original reproducer used N = 4; with that value I could
	not make it wedge on my host in more than 10000 attempts.


>How-To-Repeat:
	See above.
>Fix:
	Sorry, don't know.

>Audit-Trail:
From: Joerg Sonnenberger <joerg@bec.de>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
	netbsd-bugs@netbsd.org
Subject: Re: kern/55670: pthread_create / pthread_join test may wedge
Date: Sat, 19 Sep 2020 21:54:49 +0200

 On Sat, Sep 19, 2020 at 06:40:00PM +0000, he@NetBSD.org wrote:
 > >Description:
 > 	This simple program adapted from
 > 
 > 	https://github.com/rust-lang/rust/issues/76600#issuecomment-695335502

 No surprise. There are a variety of locking issues in both jemalloc
 variants in 9.0. Many (all?) are fixed in current.

 Joerg

Copyright © 1994-2020 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.