NetBSD Problem Report #53998

From www@NetBSD.org  Thu Feb 21 20:08:39 2019
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id A31DC7A1BC
	for <gnats-bugs@gnats.NetBSD.org>; Thu, 21 Feb 2019 20:08:39 +0000 (UTC)
Message-Id: <20190221200838.BBF197A1F0@mollari.NetBSD.org>
Date: Thu, 21 Feb 2019 20:08:38 +0000 (UTC)
From: joel.bertrand@systella.fr
Reply-To: joel.bertrand@systella.fr
To: gnats-bugs@NetBSD.org
Subject: sem_init() fails with error -1
X-Send-Pr-Version: www-1.0

>Number:         53998
>Category:       kern
>Synopsis:       sem_init() fails with error -1
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Feb 21 20:10:00 +0000 2019
>Last-Modified:  Wed Jul 17 08:20:01 +0000 2019
>Originator:     BERTRAND Joël
>Release:        NetBSD-8 (stable)
>Organization:
>Environment:
NetBSD schwarz.systella.fr 8.0_STABLE NetBSD 8.0_STABLE (CUSTOM) #28: Fri Feb  1 14:16:44 CET 2019  root@legendre.systella.fr:/usr/src/netbsd-8/obj/sys/arch/amd64/compile/CUSTOM amd64
>Description:
Hello,

A program I have written (which runs perfectly under Linux and Solaris) aborts randomly on NetBSD. While debugging it, I found that sem_init() can return -1 for no apparent reason.

My program uses pthread_create() to start several threads. In each thread, I initialize a semaphore with sem_init(). This semaphore is destroyed with sem_destroy() just before pthread_exit().

In each thread, I use fork() followed by execve() to launch a shell script.

My last run stopped with:

Interruption 16884 depuis 2 <- 16884th thread !
LAST ERROR: Unknown error: 4294967295 <- errno set by sem_init()
AT librpl_lancement_thread() FROM gestion_threads-conv.c LINE 86

Line 86 is:
if (sem_init(&((*s_etat_processus).semaphore_fork), 0, 0) != 0)
{
...
}

librt never sets errno to -1 (for a non-shared semaphore it can only set EINVAL or ENOSPC).
I don't understand why sem_init() fails, nor why errno is -1. I suspect a bug in librt or in the kernel itself.

Best regards,

JB
>How-To-Repeat:
I can provide my test program if needed.
>Fix:

>Audit-Trail:
From: "Christos Zoulas" <christos@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/53998 CVS commit: src/sys/kern
Date: Thu, 21 Feb 2019 16:49:23 -0500

 Module Name:	src
 Committed By:	christos
 Date:		Thu Feb 21 21:49:23 UTC 2019

 Modified Files:
 	src/sys/kern: uipc_sem.c

 Log Message:
 PR/53998: Joel Bertrand: Return ENOSPC when SEM_NSEMS_MAX is exceeded
 instead of -1.


 To generate a diff of this commit:
 cvs rdiff -u -r1.53 -r1.54 src/sys/kern/uipc_sem.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: BERTRAND Joël <joel.bertrand@systella.fr>
To: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
        netbsd-bugs@netbsd.org
Cc: 
Subject: Re: PR/53998 CVS commit: src/sys/kern
Date: Fri, 22 Feb 2019 00:03:52 +0100

 Christos Zoulas wrote:
 >  Log Message:
 >  PR/53998: Joel Bertrand: Return ENOSPC when SEM_NSEMS_MAX is exceeded
 >  instead of -1.

 	Christos,

 	Thanks a lot for this patch, but I'm not really sure it will fix this
 issue. Of course, with your patch, sem_init() will set errno to ENOSPC
 instead of -1, but I don't understand how a memory allocation error can
 occur in my code. I have verified that my test program doesn't leak
 memory and, when sem_init() fails, the system has more than 10GB free.

 	To be clear, I create a semaphore with sem_init() in each thread, but
 I also destroy it with sem_destroy() (which returns 0) before
 pthread_exit().

 	Best regards,

 	JB

From: Christos Zoulas <christos@zoulas.com>
To: "gnats-bugs@netbsd.org" <gnats-bugs@NetBSD.org>
Cc: kern-bug-people@netbsd.org,
 gnats-admin@netbsd.org,
 netbsd-bugs@netbsd.org,
 joel.bertrand@systella.fr
Subject: Re: PR/53998 CVS commit: src/sys/kern
Date: Thu, 21 Feb 2019 18:58:55 -0500

 Yes, I understand; I am documenting that this also fails when the
 maximum number of semaphores is exceeded. So perhaps we have a leak
 somewhere. Can you share your test program?

 christos

From: BERTRAND Joël <joel.bertrand@systella.fr>
To: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
        netbsd-bugs@netbsd.org
Cc: 
Subject: Re: PR/53998 CVS commit: src/sys/kern
Date: Fri, 22 Feb 2019 08:55:23 +0100

 Christos Zoulas wrote:
 >  Yes, I understand, I am documenting that this also fails when the
 >  max number of semaphors is exceeded. So perhaps we have a leak
 >  somewhere. Can you share your test program?

 	Of course, I can share this code. It is a test program written in
 RPL/2. I will put both (RPL/2 sources and test program) on an anonymous
 ftp server as soon as possible.

 	Regards,

 	JB

From: BERTRAND Joël <joel.bertrand@systella.fr>
To: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
        netbsd-bugs@netbsd.org
Cc: 
Subject: Re: PR/53998 CVS commit: src/sys/kern
Date: Fri, 22 Feb 2019 10:42:35 +0100

 	OK,

 	I have uploaded the test program to fermat.systella.fr (anonymous FTP).
 Please note that this server runs OpenVMS 7.3, not NetBSD, so don't
 forget to use the BIN command (binary mode) before downloading the tarball.

 	Two files:
 - RPL2-CURRENT.TGZ (script language)
 - TEST-SCHWARZ.RPL (program written in RPL/2)

 	To build RPL/2, you need gcc (I use gcc 8.2.0 from pkgsrc) with
 gfortran and g++, plus bash, automake, autoconf, libtool, uuencode,
 patch and gmake. Untar the tarball; this creates an rpl subdirectory.

 	In the rpl directory, run autogen.sh. You can then create another
 directory (for example ../build) and run the configure script from there:
 ../rpl/configure --disable-motif --disable-rplcas --enable-final-encoding=UTF-8

 	The configure script takes a long time. Without the --prefix option,
 the program is installed in /usr/local (you can install RPL elsewhere,
 but then you have to fix the shebang line in test-schwarz.rpl).

 	When configure is done, run gmake and gmake install (not BSD make).
 The build also takes a long time, as it builds some libraries that are
 not installed by default on several operating systems. You can then run
 test-schwarz.rpl and wait until the error occurs (often between 10000
 and 40000 threads). This program starts threads with the SPAWN
 instruction, with only two running in parallel (plus, of course, the
 main thread). Each thread evaluates "echo test" (with the SYSEVAL
 instruction, which uses fork() and execve() rather than system()). The
 SPAWN instruction starts its child thread in the lancement_thread()
 function (src/gestion_threads.c). The faulty semaphore is created at line
 80 and destroyed at line 1053. I have verified that sem_destroy() is
 called and returns without error. The first error always comes from
 sem_init().

 	By default, rpl will print a message like:

 LAST ERROR: Unknown error: 4294967295
 ERROR 2001 AT librpl_lancement_thread() FROM gestion_threads-conv.c LINE 86
 [21624-124346313330688] librpl_lancement_thread() from
 gestion_threads-conv.c at line 86: BACKTRACE only defined in glibc

 when the error occurs. In this case, the error was raised at line 86 of
 src/gestion_threads.c.

 	Please note that there could be a memory leak somewhere in RPL/2.
 However, valgrind (on Linux) doesn't report any leak, and I have run the
 same test script for several days without any trouble (more than 3E6
 threads) on both Linux (amd64) and Solaris (sparc).

 	I have seen another bug that could be related to this one. Sometimes,
 while debugging this program, RPL/2 aborts directly at line 322 of
 src/rpl.c (it's the same semaphore, but created for the main thread).
 The only workaround is to reboot NetBSD. I suppose sem_destroy() doesn't
 really destroy the semaphore, or the kernel doesn't free the associated
 memory even after all processes that used this semaphore have terminated.

 	Best regards,

 	JB

From: BERTRAND Joël <joel.bertrand@systella.fr>
To: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
        netbsd-bugs@netbsd.org
Cc: 
Subject: Re: PR/53998 CVS commit: src/sys/kern
Date: Sat, 23 Feb 2019 11:04:16 +0100

 	Hello,

 	I have found another strange behavior with POSIX semaphores that could
 be related to this PR. I have attached a small test program, which only
 calls fork() twice to daemonize itself.

 	Please note that it creates two named semaphores, one in the main
 program and another one in the daemonized process.

 schwarz# gcc daemon.c -g -lrt
 schwarz# ./a.out
 > /TEST-29002
 Parent process
 < /TEST-29002
 [1]   Segmentation fault (core dumped) ./a.out
 schwarz# End of second process
 Daemonized process
 > /TEST-19064
 Closing semaphore
 < /TEST-19064

 	sem_close() raises a segfault:

 Program terminated with signal SIGSEGV, Segmentation fault.
 #0  0x0000759346a017b8 in sem_close () from /usr/lib/librt.so.1
 (gdb) bt
 #0  0x0000759346a017b8 in sem_close () from /usr/lib/librt.so.1
 #1  0x0000000000400c1a in sem_delete (semaphore=0x7f7fff6b44f0) at
 daemon.c:36
 #2  0x0000000000400d1c in main () at daemon.c:84

 	To be clear, each sem_close() call triggers a segfault, and a.out
 generates two a.out.core files (one for the main program, the second one
 for the daemonized process).

 	The same program runs without crashing on a Linux workstation:

 hilbert:[~] > ./a.out
 > /TEST-26345
 Parent process
 < /TEST-26345
 sem_close: Invalid argument
 After sem_close()
 End of parent process
 hilbert:[~] > End of second process
 Daemonized process
 > /TEST-26347
 Closing semaphore
 < /TEST-26347
 sem_close: Invalid argument
 After sem_close()
 End

 	Best regards,

 	JKB


 [Attachment: daemon.c]

 #include <stdio.h>
 #include <stdlib.h>
 #include <unistd.h>
 #include <semaphore.h>
 #include <sys/stat.h>
 #include <fcntl.h>

 static void
 sem_create(sem_t *semaphore)
 {
 	char	name[15];

 	sprintf(name, "/TEST-%d", getpid());
 	printf("> %s\n", name);
 	fflush(NULL);

 	if ((semaphore = sem_open(name, O_RDWR | O_CREAT | O_EXCL,
 			S_IRUSR | S_IWUSR, 0)) == SEM_FAILED)
 	{
 		perror("sem_open()");
 		fflush(NULL);
 	}

 	return;
 }

 static void
 sem_delete(sem_t *semaphore)
 {
 	char	name[15];

 	sprintf(name, "/TEST-%d", getpid());
 	printf("< %s\n", name);
 	fflush(NULL);

 	if (sem_close(semaphore) != 0)
 	{
 		perror("sem_close");
 		fflush(NULL);
 	}

 	printf("After sem_close()\n");
 	sem_unlink(name);

 	return;
 }

 int
 main()
 {
 	int		pid;
 	sem_t	semaphore;

 	sem_create(&semaphore);

 	pid = fork();

 	if (pid > 0)
 	{
 		printf("Parent process\n");
 		sem_delete(&semaphore);
 		printf("End of parent process\n");
 		fflush(NULL);
 		_exit(EXIT_SUCCESS);
 	}

 	sleep(1);

 	setsid();
 	pid = fork();

 	if (pid > 0)
 	{
 		printf("End of second process\n");
 		fflush(NULL);
 		_exit(EXIT_SUCCESS);
 	}

 	printf("Daemonized process\n");
 	fflush(NULL);
 	sem_create(&semaphore);
 	printf("Closing semaphore\n");
 	fflush(NULL);
 	sem_delete(&semaphore);
 	printf("End\n");
 	fflush(NULL);

 	sleep(1);

 	exit(EXIT_SUCCESS);
 }


From: Christos Zoulas <christos@zoulas.com>
To: "gnats-bugs@netbsd.org" <gnats-bugs@NetBSD.org>
Cc: kern-bug-people@netbsd.org,
 gnats-admin@netbsd.org,
 netbsd-bugs@netbsd.org,
 joel.bertrand@systella.fr
Subject: Re: PR/53998 CVS commit: src/sys/kern
Date: Sat, 23 Feb 2019 09:43:40 -0500

 The original program was incorrect; here's a fixed version that works as expected.

 christos

 #include <stdio.h>
 #include <stdlib.h>
 #include <unistd.h>
 #include <semaphore.h>
 #include <sys/stat.h>
 #include <fcntl.h>

 static void
 sem_create(sem_t **semaphore)
 {
         char    name[15];

         sprintf(name, "/TEST-%d", getpid());
         printf("> %s\n", name);
         fflush(NULL);

         if ((*semaphore = sem_open(name, O_RDWR | O_CREAT | O_EXCL,
                         S_IRUSR | S_IWUSR, 0)) == SEM_FAILED)
         {
                 perror("sem_open()");
                 fflush(NULL);
         }

         return;
 }

 static void
 sem_delete(sem_t **semaphore)
 {
         char    name[15];

         sprintf(name, "/TEST-%d", getpid());
         printf("< %s\n", name);
         fflush(NULL);

         if (sem_close(*semaphore) != 0)
         {
                 perror("sem_close");
                 fflush(NULL);
         }

         printf("After sem_close()\n");
         sem_unlink(name);

         return;
 }

 int
 main()
 {
         int             pid;
         sem_t   *semaphore;

         sem_create(&semaphore);

         pid = fork();

         if (pid > 0)
         {
                 printf("Parent process\n");
                 sem_delete(&semaphore);
                 printf("End of parent process\n");
                 fflush(NULL);
                 _exit(EXIT_SUCCESS);
         }

         sleep(1);

         setsid();
         pid = fork();

         if (pid > 0)
         {
                 printf("End of second process\n");
                 fflush(NULL);
                 _exit(EXIT_SUCCESS);
         }

         printf("Daemonized process\n");
         fflush(NULL);
         sem_create(&semaphore);
         printf("Closing semaphore\n");
         fflush(NULL);
         sem_delete(&semaphore);
         printf("End\n");
         fflush(NULL);

         sleep(1);

         exit(EXIT_SUCCESS);
 }

From: BERTRAND Joël <joel.bertrand@systella.fr>
To: Christos Zoulas <christos@zoulas.com>,
        "gnats-bugs@netbsd.org" <gnats-bugs@NetBSD.org>
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: PR/53998 CVS commit: src/sys/kern
Date: Sat, 23 Feb 2019 19:51:10 +0100

 Christos Zoulas wrote:
 > The original program was incorrect; here's a fixed version that works as expected.

 	Oops, sorry, I don't know how I forgot a pointer. That being said, my
 original program does use sem_t *, and it still fails.

 	I have tried to reproduce the issue I see in RPL/2 in a simple program
 (daemon.c, attached).

 schwarz# gcc daemon.c -lrt
 schwarz# ./a.out
 > /TEST-23204-1
 > /TEST-23204-2
 Parent process
 < /TEST-23204-1
 After sem_close()
 < /TEST-23204-2
 After sem_close()
 End of parent process
 schwarz# End of second process
 Daemonized process
 > /TEST-7787-1
 sem_open(): Unknown error: 4294967295
 > /TEST-7787-2
 sem_open(): Unknown error: 4294967295
 Closing semaphore
 < /TEST-7787-1

 and a core file is generated.

 	Oops ;-)

 	To trigger this issue, you have to define more than one semaphore.
 With only ONE semaphore, the program runs fine. With more than one, I
 suppose sem_close() causes some memory corruption.

 	sem_close() is called in the child processes because sem_unlink() alone
 is not enough to free the resources.

 	Best regards,

 	JKB

 [Attachment: daemon.c]

 #include <stdio.h>
 #include <stdlib.h>
 #include <unistd.h>
 #include <semaphore.h>
 #include <sys/stat.h>
 #include <fcntl.h>

 static void
 sem_create(sem_t **semaphore, int s)
 {
 	char	name[15];

 	sprintf(name, "/TEST-%d-%d", getpid(), s);
 	printf("> %s\n", name);
 	fflush(NULL);

 	if ((*semaphore = sem_open(name, O_RDWR | O_CREAT | O_EXCL,
 			S_IRUSR | S_IWUSR, 0)) == SEM_FAILED)
 	{
 		perror("sem_open()");
 		fflush(NULL);
 	}

 	return;
 }

 static void
 sem_delete(sem_t **semaphore, int s)
 {
 	char	name[15];

 	sprintf(name, "/TEST-%d-%d", getpid(), s);
 	printf("< %s\n", name);
 	fflush(NULL);

 	if (sem_close(*semaphore) != 0)
 	{
 		perror("sem_close");
 		fflush(NULL);
 	}

 	printf("After sem_close()\n");
 	sem_unlink(name);

 	return;
 }

 int
 main()
 {
 	int		pid;
 	sem_t	*semaphore1;
 	sem_t	*semaphore2;

 	sem_create(&semaphore1, 1);
 	sem_create(&semaphore2, 2);

 	pid = fork();

 	if (pid > 0)
 	{
 		printf("Parent process\n");
 		sem_delete(&semaphore1, 1);
 		sem_delete(&semaphore2, 2);
 		printf("End of parent process\n");
 		_exit(EXIT_SUCCESS);
 	}

 	sleep(1);

 	setsid();
 	pid = fork();

 	if (pid > 0)
 	{
 		sem_close(semaphore1);
 		sem_close(semaphore2);
 		printf("End of second process\n");
 		fflush(NULL);
 		_exit(EXIT_SUCCESS);
 	}

 	sem_close(semaphore1);
 	sem_close(semaphore2);

 	printf("Daemonized process\n");
 	fflush(NULL);
 	sem_create(&semaphore1, 1);
 	sem_create(&semaphore2, 2);
 	printf("Closing semaphore\n");
 	fflush(NULL);
 	sem_delete(&semaphore1, 1);
 	sem_delete(&semaphore2, 2);
 	printf("End\n");
 	fflush(NULL);

 	sleep(1);

 	exit(EXIT_SUCCESS);
 }


From: christos@zoulas.com (Christos Zoulas)
To: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org, 
	gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, 
	joel.bertrand@systella.fr
Cc: 
Subject: Re: PR/53998 CVS commit: src/sys/kern
Date: Sat, 23 Feb 2019 23:58:56 -0500

 On Feb 23,  6:55pm, joel.bertrand@systella.fr (BERTRAND Joël) wrote:
 -- Subject: Re: PR/53998 CVS commit: src/sys/kern

 The problem is that p_nsems is zeroed out on fork, so we end up with
 negative counts when we close in the child (which are interpreted as
 overflow). The following patch moves p_nsems to the part of struct proc
 that is copied on creation.

 christos

 Index: proc.h
 ===================================================================
 RCS file: /cvsroot/src/sys/sys/proc.h,v
 retrieving revision 1.350
 diff -u -p -u -r1.350 proc.h
 --- proc.h	5 Dec 2018 18:16:51 -0000	1.350
 +++ proc.h	24 Feb 2019 04:56:30 -0000
 @@ -316,7 +316,6 @@ struct proc {
  	pid_t 		p_vfpid_done;	/* :: vforked done pid */
  	lwpid_t		p_lwp_created;	/* :: lwp created */
  	lwpid_t		p_lwp_exited;	/* :: lwp exited */
 -	u_int		p_nsems;	/* Count of semaphores */
  	char		*p_path;	/* :: full pathname of executable */

  /*
 @@ -338,7 +337,7 @@ struct proc {

  	vaddr_t		p_psstrp;	/* :: address of process's ps_strings */
  	u_int		p_pax;		/* :: PAX flags */
 -
 +	u_int		p_nsems;	/* Count of semaphores */
  	int		p_xexit;	/* p: exit code */
  /*
   * End area that is copied on creation

From: BERTRAND Joël <joel.bertrand@systella.fr>
To: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
        netbsd-bugs@netbsd.org
Cc: 
Subject: Re: PR/53998 CVS commit: src/sys/kern
Date: Sun, 24 Feb 2019 18:19:09 +0100

 	Thanks a lot, Christos. Your patch fixes the test program but not this
 PR. I have rebuilt all of NetBSD-8 with this patch, and my RPL/2 test
 program still aborts randomly in sem_init():

 LAST ERROR: Unknown error: 4294967295
 ERROR 2001 AT librpl_lancement_thread() FROM gestion_threads-conv.c LINE 86
 [22596-137493181526016] librpl_lancement_thread() from
 gestion_threads-conv.c at line 86: BACKTRACE only defined in glibc

 	I haven't applied your first patch, the one that returns ENOSPC.

 	Best regards,

 	JB

From: christos@zoulas.com (Christos Zoulas)
To: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org, 
	gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, 
	joel.bertrand@systella.fr
Cc: 
Subject: Re: PR/53998 CVS commit: src/sys/kern
Date: Sun, 24 Feb 2019 15:27:55 -0500

 On Feb 24,  5:20pm, joel.bertrand@systella.fr (BERTRAND Joël) wrote:
 -- Subject: Re: PR/53998 CVS commit: src/sys/kern

 I can see the race conditions for p_nsems, but since this is intended
 just as a guard to prevent abuse, it is simple enough to avoid it for now...

 christos

 Index: uipc_sem.c
 ===================================================================
 RCS file: /cvsroot/src/sys/kern/uipc_sem.c,v
 retrieving revision 1.54
 diff -u -p -u -r1.54 uipc_sem.c
 --- uipc_sem.c	21 Feb 2019 21:49:23 -0000	1.54
 +++ uipc_sem.c	24 Feb 2019 20:25:43 -0000
 @@ -511,7 +511,11 @@ ksem_free(ksem_t *ks)
  	kmem_free(ks, sizeof(ksem_t));

  	atomic_dec_uint(&nsems_total);
 - 	atomic_dec_uint(&curproc->p_nsems);	
 + 	while ((int)atomic_dec_uint_nv(&curproc->p_nsems) < 0) {
 +		printf("%s: pid=%d sem=%p negative refcount %u\n", __func__,
 +		    curproc->p_pid, ks, curproc->p_nsems);
 +		atomic_inc_uint(&curproc->p_nsems);
 +	}
  }

  #define	KSEM_ID_IS_PSHARED(id)		\

From: BERTRAND Joël <joel.bertrand@systella.fr>
To: Christos Zoulas <christos@zoulas.com>, gnats-bugs@NetBSD.org,
        kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
        netbsd-bugs@netbsd.org
Cc: 
Subject: Re: PR/53998 CVS commit: src/sys/kern
Date: Mon, 25 Feb 2019 15:00:35 +0100

 	Christos,

 	I have applied both patches (you will find attached a patch against -8).

 	This partially fixes this PR: my test program can now create more
 threads before the first sem_init() failure. But I think there is another
 race condition somewhere in the semaphore subsystem.

 	After the sem_init() failure, there is no specific message in dmesg or
 in the log files.

 	I don't understand why atomic_dec_uint(&nsems_total) and
 atomic_inc_uint(&nsems_total) are not protected by a lock.

 	Best regards,

 	JB

 [Attachment: PR53998.patch]

 ? sys/arch/amd64/conf/CUSTOM
 Index: sys/kern/uipc_sem.c
 ===================================================================
 RCS file: /cvsroot/src/sys/kern/uipc_sem.c,v
 retrieving revision 1.47
 diff -u -r1.47 uipc_sem.c
 --- sys/kern/uipc_sem.c	31 Oct 2016 15:08:45 -0000	1.47
 +++ sys/kern/uipc_sem.c	25 Feb 2019 12:25:45 -0000
 @@ -374,7 +374,11 @@
  	kmem_free(ks, sizeof(ksem_t));

  	atomic_dec_uint(&nsems_total);
 - 	atomic_dec_uint(&curproc->p_nsems);	
 +	while((int)atomic_dec_uint_nv(&curproc->p_nsems) < 0) {
 +		printf("%s: pid=%d sem=%p negative refcount %u\n", __func__,
 +				curproc->p_pid, ks, curproc->p_nsems);
 +		atomic_inc_uint(&curproc->p_nsems);
 +	}
  }

  int
 Index: sys/sys/proc.h
 ===================================================================
 RCS file: /cvsroot/src/sys/sys/proc.h,v
 retrieving revision 1.340.6.1
 diff -u -r1.340.6.1 proc.h
 --- sys/sys/proc.h	12 Apr 2018 13:42:49 -0000	1.340.6.1
 +++ sys/sys/proc.h	25 Feb 2019 12:25:46 -0000
 @@ -314,7 +314,6 @@
  	pid_t 		p_vfpid_done;	/* :: vforked done pid */
  	lwpid_t		p_lwp_created;	/* :: lwp created */
  	lwpid_t		p_lwp_exited;	/* :: lwp exited */
 -	u_int		p_nsems;	/* Count of semaphores */

  /*
   * End area that is zeroed on creation
 @@ -335,7 +334,7 @@

  	vaddr_t		p_psstrp;	/* :: address of process's ps_strings */
  	u_int		p_pax;		/* :: PAX flags */
 -
 +	u_int		p_nsems;	/* Count of semaphores */
  	int		p_xexit;	/* p: exit code */
  /*
   * End area that is copied on creation


From: christos@zoulas.com (Christos Zoulas)
To: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org, 
	gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, 
	joel.bertrand@systella.fr
Cc: 
Subject: Re: PR/53998 CVS commit: src/sys/kern
Date: Mon, 25 Feb 2019 09:22:40 -0500

 On Feb 25,  2:05pm, joel.bertrand@systella.fr (BERTRAND Joël) wrote:
 -- Subject: Re: PR/53998 CVS commit: src/sys/kern

 |  	I don't understand why atomic_dec_uint(&nsems_total) or
 |  atomic_inc_uint(&nsems_total) are not protected by a lock.

 Yes, the whole thing is racy... But atomics don't need to be protected
 by locks... How does it fail now? With ENFILE?

 christos

From: BERTRAND Joël <joel.bertrand@systella.fr>
To: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
        netbsd-bugs@netbsd.org
Cc: 
Subject: Re: PR/53998 CVS commit: src/sys/kern
Date: Mon, 25 Feb 2019 15:38:19 +0100

 Christos Zoulas wrote:
 >  Yes, the whole thing is racy... But atomics don't need to be protected
 >  by locks... How does it fail now ENFILE?

 LAST ERROR: Unknown error: 4294967295 (-1)

 Same errno.

From: BERTRAND Joël <joel.bertrand@systella.fr>
To: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
        netbsd-bugs@netbsd.org
Cc: 
Subject: Re: PR/53998 CVS commit: src/sys/kern
Date: Thu, 28 Feb 2019 23:32:01 +0100

 	Hello,

 	Attached is a quick and dirty program that shows the sem_init() bug.
 If I remove the fork() in the child thread, sem_init() runs fine.

 	Best regards,

 	JB

 [Attachment: Makefile]

 semaphore: semaphore.o
 	gcc -g -o $@ $+ -lpthread

 clean:
 	rm -f *.o semaphore

 %.o: %.c
 	gcc -g -c $< -Wall -o $@

 [Attachment: semaphore.c]

 #include <stdio.h>
 #include <stdlib.h>
 #include <unistd.h>
 #include <stdbool.h>
 #include <semaphore.h>
 #include <pthread.h>
 #include <string.h>

 #define NUMBER_OF_THREADS	8

 typedef struct
 {
 	pthread_t		tid;
 	volatile bool	active;
 	long long		i;
 	sem_t			s;
 } thread_arg_t;

 thread_arg_t targ[NUMBER_OF_THREADS];

 void *
 thread(void *arg)
 {
 	char			**arguments;

 	pid_t			pid;
 	thread_arg_t	*ta = arg;

 	if (sem_init(&(ta->s), 0, 0) != 0)
 	{
 		perror("sem_init()");
 		_exit(EXIT_FAILURE);
 	}

 	pid = fork();

 	if (pid < 0)
 	{
 		perror("fork()");
 		_exit(EXIT_FAILURE);
 	}
 	else if (pid == 0)
 	{
 		// Child

 		if ((arguments = malloc(3 * sizeof(char *))) == NULL)
 		{
 			perror("malloc()");
 			_exit(EXIT_FAILURE);
 		}

 		if ((arguments[0] = malloc(5 * sizeof(char))) == NULL)
 		{
 			perror("malloc(0)");
 			_exit(EXIT_FAILURE);
 		}

 		if ((arguments[1] = malloc(40 * sizeof(char))) == NULL)
 		{
 			perror("malloc(1)");
 			_exit(EXIT_FAILURE);
 		}

 		strcpy(arguments[0], "echo");
 		sprintf(arguments[1], "< %lld >", ta->i);
 		arguments[2] = NULL;

 		execvp(arguments[0], arguments);
 		perror("execvp()");
 		_exit(EXIT_FAILURE);
 	}

 	if (sem_destroy(&(ta->s)) != 0)
 	{
 		perror("sem_destroy()");
 		_exit(EXIT_FAILURE);
 	}

 	ta->active = false;
 	pthread_exit(NULL);
 }

 int
 main()
 {
 	int			i;
 	long long	nt;

 	for(nt = 0;;)
 	{
 		for(i = 0; i < NUMBER_OF_THREADS; i++)
 		{
 			if (targ[i].active == false)
 			{
 				break;
 			}
 		}

 		if (i < NUMBER_OF_THREADS)
 		{
 			printf("Free slot : %d (%lld threads)\n", i, ++nt);
 			targ[i].active = true;
 			targ[i].i = nt;

 			if (pthread_create(&(targ[i].tid), NULL, thread, &(targ[i])) != 0)
 			{
 				perror("pthread_create()");
 				exit(EXIT_FAILURE);
 			}

 			if (pthread_detach(targ[i].tid) != 0)
 			{
 				perror("pthread_detach()");
 				exit(EXIT_FAILURE);
 			}
 		}
 		else
 		{
 			usleep(1000);
 		}
 	}

 	exit(EXIT_SUCCESS);
 }


From: "Christos Zoulas" <christos@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/53998 CVS commit: src/sys
Date: Thu, 28 Feb 2019 22:03:19 -0500

 Module Name:	src
 Committed By:	christos
 Date:		Fri Mar  1 03:03:19 UTC 2019

 Modified Files:
 	src/sys/kern: kern_uidinfo.c uipc_sem.c
 	src/sys/sys: param.h proc.h uidinfo.h

 Log Message:
 PR/53998: Joel Bertrand:  Limit the number of semaphores on a
 per-user basis not a per-process.  We cannot really keep track on
 a per-process basis because a parent process can create the semaphore
 and a child can free it taking credit for it.  There is also a
 similar issue about resource exhaustion if we limited the number
 of lwps per process as opposed to per user (which we don't).


 To generate a diff of this commit:
 cvs rdiff -u -r1.10 -r1.11 src/sys/kern/kern_uidinfo.c
 cvs rdiff -u -r1.54 -r1.55 src/sys/kern/uipc_sem.c
 cvs rdiff -u -r1.582 -r1.583 src/sys/sys/param.h
 cvs rdiff -u -r1.350 -r1.351 src/sys/sys/proc.h
 cvs rdiff -u -r1.3 -r1.4 src/sys/sys/uidinfo.h

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.
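The accounting problem described in the log message can be illustrated with a toy model. This is a hypothetical sketch, not NetBSD kernel code: the struct counters type and the parent_creates(), child_destroys() and simulate() helpers are invented for illustration. Suppose a semaphore is charged to the process that creates it but credited to the process that frees it. Then the creator's count only grows, while a count kept per user (both processes share a uid) stays balanced:

```c
#include <assert.h>

/*
 * Hypothetical model, not NetBSD kernel code: compare charging a
 * semaphore per process with charging it per user.
 */
struct counters {
	int parent;	/* per-process count of the creating parent */
	int child;	/* per-process count of the forked child */
	int user;	/* per-user count shared by both (same uid) */
};

/* The parent creates a semaphore: both schemes charge one unit. */
static void parent_creates(struct counters *c)
{
	c->parent++;
	c->user++;
}

/* The child frees it: per-process accounting credits the child,
 * per-user accounting credits the same user that was charged. */
static void child_destroys(struct counters *c)
{
	c->child--;
	c->user--;
}

/* Run n create/free cycles and return the resulting counters. */
static struct counters simulate(int n)
{
	struct counters c = { 0, 0, 0 };

	for (int i = 0; i < n; i++) {
		parent_creates(&c);
		child_destroys(&c);
	}
	assert(c.user == 0);	/* the per-user count always balances */
	return c;
}
```

With simulate(3), the parent's per-process count ends at 3 and the child's at -3, while the per-user count is back to 0. Under a per-process limit, the parent would eventually be refused new semaphores even though none remain allocated.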

From: =?UTF-8?Q?BERTRAND_Jo=c3=abl?= <joel.bertrand@systella.fr>
To: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
        netbsd-bugs@netbsd.org
Cc: 
Subject: Re: PR/53998 CVS commit: src/sys
Date: Fri, 1 Mar 2019 08:57:17 +0100

 	Thanks a lot for this patch, Christos. I cannot test it for two or
 three days, as my NetBSD server is currently running a huge Spice
 simulation, but I'll come back with feedback.

 	Best regards,

 	JB

From: =?UTF-8?Q?BERTRAND_Jo=c3=abl?= <joel.bertrand@systella.fr>
To: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
        netbsd-bugs@netbsd.org
Cc: 
Subject: Re: PR/53998 CVS commit: src/sys
Date: Sat, 9 Mar 2019 08:56:11 +0100

 	Hello Christos,

 	I have built a new kernel with your patches (-current, not -8: I
 don't have enough time to port your patches to the -8 kernel, and my
 attempt to do so failed). Thus, my test workstation now runs a -8
 userland with a -current kernel:

 schwarz:[~] > uname -a
 NetBSD schwarz.systella.fr 8.99.35 NetBSD 8.99.35 (GENERIC) #2: Fri Mar  8 13:08:03 CET 2019  root@legendre.systella.fr:/usr/src/netbsd-current/obj/sys/arch/amd64/compile/GENERIC amd64

 	My RPL/2 test aborts, but not in sem_init(). I have rebuilt my quick
 and dirty stress test, semaphore.c, adding a sleep(1) line just before
 sem_destroy() in thread() to make sure the forked process has
 terminated before the parent thread ends.

 	Now, semaphore.c aborts with:
 Free slot : 6 (911 threads)
 Free slot : 7 (912 threads)
 < 905 >
 < 907 >
 < 906 >
 < 910 >
 < 908 >
 < 909 >
 < 911 >
 < 912 >
 Free slot : 0 (913 threads)
 Free slot : 1 (914 threads)
 Free slot : 2 (915 threads)
 Free slot : 3 (916 threads)
 Free slot : 4 (917 threads)
 Free slot : 5 (918 threads)
 Free slot : 6 (919 threads)
 Free slot : 7 (920 threads)
 < 913 >
 < 914 >
 fork(): Resource temporarily unavailable

 	I suppose that one second is enough to run "echo <thread>". That being
 said, when semaphore.c aborts, you can see that there are no longer 8
 forked processes.

 	Note that the program always aborts after roughly 900 to 1000 forked
 processes.

 	Best regards,

 	JB
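The fork() failure above ("Resource temporarily unavailable") is EAGAIN. POSIX specifies that error for two cases: the system lacks the resources to create another process, or a system-wide or per-user process limit would be exceeded. The following is a minimal sketch for checking the limits that apply to the test process. It assumes a system that provides RLIMIT_NPROC (NetBSD and Linux do, but it is not part of POSIX), and print_process_limits() is a hypothetical helper name:

```c
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>
#include <unistd.h>

/*
 * Print the per-user process limit (RLIMIT_NPROC, a BSD/Linux
 * extension) and POSIX CHILD_MAX as reported by sysconf().
 * Returns 0 on success, -1 if getrlimit() fails.
 */
static int print_process_limits(void)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_NPROC, &rl) != 0) {
		perror("getrlimit(RLIMIT_NPROC)");
		return -1;
	}

	if (rl.rlim_cur == RLIM_INFINITY)
		printf("RLIMIT_NPROC soft limit: unlimited\n");
	else
		printf("RLIMIT_NPROC soft limit: %llu\n",
		    (unsigned long long)rl.rlim_cur);

	/* sysconf() returns -1 when the limit is indeterminate. */
	printf("sysconf(_SC_CHILD_MAX): %ld\n", sysconf(_SC_CHILD_MAX));
	return 0;
}
```

Comparing the printed soft limit with the number of processes the user already owns shows how much headroom fork() has left.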

From: Christos Zoulas <christos@zoulas.com>
To: "gnats-bugs@netbsd.org" <gnats-bugs@NetBSD.org>
Cc: kern-bug-people@netbsd.org,
 gnats-admin@netbsd.org,
 netbsd-bugs@netbsd.org,
 joel.bertrand@systella.fr
Subject: Re: PR/53998 CVS commit: src/sys
Date: Sat, 9 Mar 2019 22:19:16 -0500

 This is not a bug. Nobody is waiting for those processes (they all end
 up zombies), so eventually you run out.

 [10:18pm] 2463>diff -u semaphore.c{,.new}
 --- semaphore.c 2019-03-09 22:18:25.682843288 -0500
 +++ semaphore.c.new     2019-03-09 22:18:11.078956324 -0500
 @@ -5,6 +5,7 @@
  #include <semaphore.h>
  #include <pthread.h>
  #include <string.h>
 +#include <sys/wait.h>
 
  #define NUMBER_OF_THREADS      8
 
 @@ -22,6 +23,7 @@
  thread(void *arg)
  {
         char                    **arguments;
 +       int status;
 
         pid_t                   pid;
         thread_arg_t    *ta = arg;
 @@ -75,6 +77,7 @@
                 perror("sem_destroy()");
                 _exit(EXIT_FAILURE);
         }
 +       wait(&status);
 
         ta->active = false;
         pthread_exit(NULL);
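The patch above reaps the children with wait(), which collects whichever child of the process exits first. Since every thread forks, a thread may therefore collect a sibling's child, although the total count still balances. The following is a minimal sketch of the per-child alternative: waitpid() on a specific pid, retrying on EINTR. spawn_and_reap() is a hypothetical helper, not part of the test program. It also checks that once a child has been collected, a second waitpid() on the same pid fails with ECHILD:

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that exits at once with `code`, then reap that
 * specific child. Until it is reaped, the exited child remains a
 * zombie. Returns the child's exit status, or -1 on error.
 */
static int spawn_and_reap(int code)
{
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0)
		_exit(code);		/* child: exit immediately */

	int status;

	/* Retry if a signal interrupts the wait. */
	while (waitpid(pid, &status, 0) == -1) {
		if (errno != EINTR)
			return -1;
	}

	/* The child has been collected: a second wait finds nothing. */
	if (waitpid(pid, NULL, WNOHANG) != -1 || errno != ECHILD)
		return -1;

	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```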

From: =?UTF-8?Q?BERTRAND_Jo=c3=abl?= <joel.bertrand@systella.fr>
To: Christos Zoulas <christos@zoulas.com>,
        "gnats-bugs@netbsd.org" <gnats-bugs@NetBSD.org>
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: PR/53998 CVS commit: src/sys
Date: Sun, 10 Mar 2019 11:09:24 +0100


 Christos Zoulas wrote:
 > This is not a bug. Nobody is waiting for those processes (they all end up zombies), so eventually you run out.

 	Certainly. I left out waitpid() to keep my stress test simple. When
 the test program aborts, some child processes are in the run state (not
 the zombie state) and use a lot of CPU time. gdb shows that these
 processes stall in /usr/libexec/ld.elf_so, without any further
 information (no debugging symbols found).

 	That being said, my complete program does wait for the child process
 with waitpid(), at line 4138 of src/instructions_s1.c:

         while(waitpid(pid, &status, 0) == -1)
         {
             if (errno != EINTR)
             {
                 (*s_etat_processus).erreur_systeme = d_es_processus;
                 return;
             }
         }

 	If you run the attached program, you will get the following output:

 < 454 >
 < 456 >
 < 452 >
 < 450 >
 < 453 >
 Free slot : 0 (457 threads)
 Free slot : 1 (458 threads)
 Free slot : 2 (459 threads)
 Free slot : 3 (460 threads)
 Free slot : 4 (461 threads)
 Free slot : 5 (462 threads)
 Free slot : 6 (463 threads)
 Free slot : 7 (464 threads)
 < 458 >
 < 460 >
 < 459 >
 < 457 >
 < 464 >
 < 461 >
 < 463 >
 < 462 >
 Free slot : 0 (465 threads)
 Free slot : 1 (466 threads)
 Free slot : 2 (467 threads)
 Free slot : 3 (468 threads)
 Free slot : 4 (469 threads)
 Free slot : 5 (470 threads)
 Free slot : 6 (471 threads)
 dead lock detected   <-------------- ?
 Free slot : 7 (472 threads)
 < 466 >
 < 467 >
 < 469 >

 	After some time:

 < 20301 >
 Free slot : 2 (20302 threads)
 < 20302 >
 Free slot : 5 (20303 threads)
 < 20303 >
 Free slot : 0 (20304 threads)
 Free slot : 6 (20305 threads)
 < 20304 >
 < 20305 >
 Free slot : 4 (20306 threads)
 < 20306 >
 Free slot : 2 (20307 threads)
 < 20307 >
 Free slot : 5 (20308 threads)
 < 20308 >
 Free slot : 0 (20309 threads)
 Free slot : 6 (20310 threads)
 < 20310 >
 < 20309 >
 Free slot : 4 (20311 threads)
 < 20311 >
 Free slot : 2 (20312 threads)
 < 20312 >
 Free slot : 5 (20313 threads)
 < 20313 >
 Free slot : 0 (20314 threads)
 Free slot : 6 (20315 threads)
 < 20314 >
 < 20315 >
 Free slot : 4 (20316 threads)
 < 20316 >
 Free slot : 2 (20317 threads)
 < 20317 >
 Free slot : 5 (20318 threads)
 < 20318 >
 Free slot : 0 (20319 threads)
 Free slot : 6 (20320 threads) <- only 5 active processes
 < 20320 >
 < 20319 >
 ....
 Free slot : 0 (31041 threads)
 < 31041 >
 Free slot : 2 (31042 threads)
 Free slot : 5 (31043 threads)
 Free slot : 6 (31044 threads) <- only 4
 < 31043 >
 < 31042 >
 < 31044 >
 Free slot : 0 (31045 threads)
 < 31045 >
 Free slot : 2 (31046 threads)
 Free slot : 5 (31047 threads)
 Free slot : 6 (31048 threads)
 < 31047 >
 < 31048 >
 < 31046 >
 Free slot : 0 (31049 threads)
 < 31049 >
 Free slot : 2 (31050 threads)
 Free slot : 5 (31051 threads)
 Free slot : 6 (31052 threads)
 < 31052 >
 < 31051 >
 < 31050 >
 Free slot : 0 (31053 threads)
 < 31053 >
 Free slot : 2 (31054 threads)
 Free slot : 5 (31055 threads)
 Free slot : 6 (31056 threads)
 < 31055 >
 < 31054 >
 < 31056 >
 Free slot : 0 (31057 threads)
 < 31057 >
 Free slot : 2 (31058 threads)
 Free slot : 5 (31059 threads)
 ...
 Free slot : 2 (214116 threads)
 Free slot : 5 (214117 threads)
 < 214116 >
 < 214117 >
 Free slot : 2 (214118 threads)
 Free slot : 5 (214119 threads)
 < 214118 >
 < 214119 >
 Free slot : 2 (214120 threads)
 Free slot : 5 (214121 threads)
 < 214120 >
 < 214121 >
 Free slot : 2 (214122 threads)
 Free slot : 5 (214123 threads)
 < 214122 >
 < 214123 >
 Free slot : 2 (214124 threads) <- only 2

 	Some children seem to be blocked:

 schwarz# ps auwx | grep semaphore | grep -v grep
 root      4544  0.0  0.0 79096   404 ttyp0  I+    9:08AM 0:00.00 ./semaphore
 root      5678  0.0  0.0 79096   400 ttyp0  I+    8:59AM 0:00.00 ./semaphore
 root      6754  0.0  0.0 79096   400 ttyp0  I+    9:01AM 0:00.00 ./semaphore
 root      7623  0.0  0.0 79096  1564 ttyp0  Sl+   8:58AM 2:33.14 ./semaphore
 root     10765  0.0  0.0 79096   392 ttyp0  I+    8:59AM 0:00.00 ./semaphore
 root     24954  0.0  0.0 79096   396 ttyp0  I+   10:20AM 0:00.00 ./semaphore
 root     26642  0.0  0.0 79096   396 ttyp0  I+   10:25AM 0:00.00 ./semaphore
 schwarz#  gdb -p 5678
 GNU gdb (GDB) 7.12
 Attaching to process 5678
 Reading symbols from /root/./semaphore...done.
 Reading symbols from /usr/lib/libpthread.so.1...(no debugging symbols found)...done.
 Reading symbols from /usr/lib/libc.so.12...(no debugging symbols found)...done.
 Reading symbols from /usr/libexec/ld.elf_so...(no debugging symbols found)...done.
 [Switching to LWP 1]
 0x00007f7e5c00ae3a in ___lwp_park60 () from /usr/libexec/ld.elf_so
 (gdb) bt
 #0  0x00007f7e5c00ae3a in ___lwp_park60 () from /usr/libexec/ld.elf_so
 #1  0x00007f7e5c001db8 in _rtld_shared_enter () from /usr/libexec/ld.elf_so
 #2  0x00007f7e5c000b17 in _rtld_bind () from /usr/libexec/ld.elf_so
 #3  0x00007f7e5c0007cd in _rtld_bind_start () from /usr/libexec/ld.elf_so
 #4  0x0000000000000246 in ?? ()
 #5  0x0000798a4243eb0a in _lwp_ctl () from /usr/lib/libc.so.12
 #6  0x0000000000400000 in ?? ()
 #7  0x00000000000000a8 in ?? ()
 #8  0x0000000000000018 in ?? ()
 #9  0x0000000000000000 in ?? ()
 (gdb)

 	It is possible that I have misunderstood something about process and
 thread management...

 	Best regards,

 	JB

 Attachment: semaphore.c

 #include <stdio.h>
 #include <stdlib.h>
 #include <unistd.h>
 #include <stdbool.h>
 #include <semaphore.h>
 #include <pthread.h>
 #include <string.h>
 #include <sys/wait.h>

 #define NUMBER_OF_THREADS	8

 typedef struct
 {
 	pthread_t		tid;
 	volatile bool	active;
 	long long		i;
 	sem_t			s;
 } thread_arg_t;

 thread_arg_t targ[NUMBER_OF_THREADS];

 void *
 thread(void *arg)
 {
 	char			**arguments;

 	int				status;
 	pid_t			pid;
 	thread_arg_t	*ta = arg;

 	if (sem_init(&(ta->s), 0, 0) != 0)
 	{
 		perror("sem_init()");
 		_exit(EXIT_FAILURE);
 	}

 	pid = fork();

 	if (pid < 0)
 	{
 		perror("fork()");
 		_exit(EXIT_FAILURE);
 	}
 	else if (pid == 0)
 	{
 		// Child

 		if ((arguments = malloc(3 * sizeof(char *))) == NULL)
 		{
 			perror("malloc()");
 			_exit(EXIT_FAILURE);
 		}

 		if ((arguments[0] = malloc(5 * sizeof(char))) == NULL)
 		{
 			perror("malloc(0)");
 			_exit(EXIT_FAILURE);
 		}

 		if ((arguments[1] = malloc(40 * sizeof(char))) == NULL)
 		{
 			perror("malloc(1)");
 			_exit(EXIT_FAILURE);
 		}

 		strcpy(arguments[0], "echo");
 		sprintf(arguments[1], "< %lld >", ta->i);
 		arguments[2] = NULL;

 		execvp(arguments[0], arguments);
 		perror("execvp()");
 		_exit(EXIT_FAILURE);
 	}

 	if (waitpid(pid, &status, 0) == -1)
 	{
 		perror("waitpid()");
 		_exit(EXIT_FAILURE);
 	}

 	usleep(100000);

 	if (sem_destroy(&(ta->s)) != 0)
 	{
 		perror("sem_destroy()");
 		_exit(EXIT_FAILURE);
 	}

 	ta->active = false;
 	pthread_exit(NULL);
 }

 int
 main()
 {
 	int			i;
 	long long	nt;

 	for(nt = 0;;)
 	{
 		for(i = 0; i < NUMBER_OF_THREADS; i++)
 		{
 			if (targ[i].active == false)
 			{
 				break;
 			}
 		}

 		if (i < NUMBER_OF_THREADS)
 		{
 			printf("Free slot : %d (%lld threads)\n", i, ++nt);
 			targ[i].active = true;
 			targ[i].i = nt;

 			if (pthread_create(&(targ[i].tid), NULL, thread, &(targ[i])) != 0)
 			{
 				perror("pthread_create()");
 				exit(EXIT_FAILURE);
 			}

 			if (pthread_detach(targ[i].tid) != 0)
 			{
 				perror("pthread_detach()");
 				exit(EXIT_FAILURE);
 			}
 		}
 		else
 		{
 			usleep(1000);
 		}
 	}

 	exit(EXIT_SUCCESS);
 }

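In the gdb backtrace above, the stalled child (process 5678, still named ./semaphore, so it had not yet reached exec) is parked in ld.elf_so. The call chain is ___lwp_park60, called from _rtld_shared_enter, called from _rtld_bind, which means the child was blocked while the dynamic linker resolved a symbol. POSIX only guarantees async-signal-safe functions in the child of a multithreaded process between fork() and exec. The attached program calls malloc(), strcpy() and sprintf() there.

The following is a minimal sketch that builds the argument vector before fork(), so that the child only calls execv() and _exit(). The helper name run_echo() and the fixed /bin/echo path (in place of execvp()'s PATH search) are assumptions. This does not by itself rule out the dynamic-linker lock, because the child's first call to execv() can still go through lazy binding. Linking with -Wl,-z,now, which asks for all symbols to be resolved at startup, is one way to test that hypothesis.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run "/bin/echo < n >" in a child process and wait for it.
 * All buffers are prepared before fork(), so the child only calls
 * execv() and _exit(), both async-signal-safe.
 * Assumption: echo lives at /bin/echo on the target system.
 * Returns the child's exit status, or -1 on error.
 */
static int run_echo(long long n)
{
	char prog[] = "echo";
	char label[40];
	char *argv[] = { prog, label, NULL };

	snprintf(label, sizeof(label), "< %lld >", n);

	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		execv("/bin/echo", argv);
		_exit(127);	/* exec failed; avoid stdio in the child */
	}

	int status;

	/* Retry if a signal interrupts the wait. */
	while (waitpid(pid, &status, 0) == -1) {
		if (errno != EINTR)
			return -1;
	}
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling run_echo(42) prints "< 42 >" from the child and returns 0 once the child has been reaped.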

From: Havard Eidnes <he@NetBSD.org>
To: joel.bertrand@systella.fr
Cc: christos@zoulas.com, gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org,
 gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: PR/53998 CVS commit: src/sys
Date: Sun, 10 Mar 2019 13:09:10 +0100 (CET)

 > Free slot : 3 (468 threads)
 > Free slot : 4 (469 threads)
 > Free slot : 5 (470 threads)
 > Free slot : 6 (471 threads)
 > dead lock detected   <-------------- ?

 This message comes from ld.elf_so.  It may be spuriously seen
 during a rust build (which aborts the build in that case).

 Joerg says about this:

   Multi-threaded programs forking in one thread while another
   thread holds an exclusive lock would be required. It might be
   necessary for the other thread to be the main thread, so that
   _lwp_self() will be the same LWP ID, but that's just
   speculation at this point.

 Don't know if that brings you closer to figuring out the root
 cause.

 Regards,

 - Håvard

From: =?UTF-8?Q?BERTRAND_Jo=c3=abl?= <joel.bertrand@systella.fr>
To: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
        netbsd-bugs@netbsd.org
Cc: 
Subject: Re: PR/53998 CVS commit: src/sys
Date: Wed, 8 May 2019 11:33:25 +0200

 Havard Eidnes wrote:
 >  Joerg says about this:
 >  
 >    Multi-threaded programs forking in one thread while another
 >    thread holds an exclusive lock would be required. It might be
 >    necessary for the other thread to be the main thread, so that
 >    _lwp_self() will be the same LWP ID, but that's just
 >    speculation at this point.
 >  
 >  Don't know if that brings you closer to figuring out the root
 >  cause.
 >  
 >  Regards,
 >  
 >  - H=E5vard
 >  
 > 

 	I have added an exclusive lock (mutex) to protect fork(). I no longer
 see this message, but some threads or processes still stall.

 	Regards,

 	JKB
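Joerg's hypothesis is that fork() runs while another thread holds an exclusive lock, so the child inherits a lock that no thread in it will ever release. For locks the application itself owns, the usual POSIX remedy is pthread_atfork(). The lock is taken in the prepare handler and released in both the parent and child handlers, so fork() never snapshots it in the middle of a critical section.

The following is a minimal sketch in which a hypothetical app_lock stands in for the program's own mutex, and the handler and helper names are also invented. It cannot reach locks internal to libc or ld.elf_so, so it does not address a lock held inside the dynamic linker itself.

```c
#include <assert.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical application lock shared by several threads. */
static pthread_mutex_t app_lock = PTHREAD_MUTEX_INITIALIZER;

/* Runs in the forking thread just before fork(): waits until no
 * other thread is inside a critical section guarded by app_lock. */
static void prepare_fork(void)
{
	pthread_mutex_lock(&app_lock);
}

/* Runs in the parent after fork(): release the lock again. */
static void after_fork_parent(void)
{
	pthread_mutex_unlock(&app_lock);
}

/* Runs in the child after fork(): its only thread is the one that
 * took the lock in prepare_fork(), so it releases it. */
static void after_fork_child(void)
{
	pthread_mutex_unlock(&app_lock);
}

/* Register the handlers once, before any thread calls fork().
 * Returns 0 on success or an error number from pthread_atfork(). */
static int register_fork_handlers(void)
{
	return pthread_atfork(prepare_fork, after_fork_parent,
	    after_fork_child);
}

/* Fork a child that exits at once and reap it; with the handlers
 * registered, app_lock is taken and released around fork(). */
static int fork_and_reap(void)
{
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0)
		_exit(0);

	int status;

	return waitpid(pid, &status, 0) == pid ? 0 : -1;
}
```

After fork_and_reap() returns, the parent can take app_lock again, because after_fork_parent() released it.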

From: =?UTF-8?Q?BERTRAND_Jo=c3=abl?= <joel.bertrand@systella.fr>
To: gnats-bugs@netbsd.org, netbsd-bugs@netbsd.org, kern-bug-people@netbsd.org
Cc: 
Subject: Re: PR/53998 CVS commit: src/sys
Date: Wed, 17 Jul 2019 10:19:35 +0200

 	Hello,

 	Has anyone tested the program I sent? I would be curious to know
 whether someone can reproduce this bug. I have tried to fix it, without
 success.

 	Best regards,

 	JB
