NetBSD Problem Report #50308

From paul@pokey.whooppee.com  Tue Oct  6 23:55:05 2015
Return-Path: <paul@pokey.whooppee.com>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (verified OK))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 4EE9BA5674
	for <gnats-bugs@gnats.NetBSD.org>; Tue,  6 Oct 2015 23:55:05 +0000 (UTC)
Message-Id: <20151006235401.19E5816E74@pokey.whooppee.com>
Date: Wed,  7 Oct 2015 07:54:01 +0800 (PHT)
From: paul@whooppee.com
Reply-To: paul@whooppee.com
To: gnats-bugs@NetBSD.org
Subject: PS_STOPEXIT processes don't correctly handle p_nstopchild
X-Send-Pr-Version: 3.95

>Number:         50308
>Category:       kern
>Synopsis:       PS_STOPEXIT processes don't correctly handle p_nstopchild
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    pgoyette
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Oct 07 00:00:00 +0000 2015
>Closed-Date:    Sat Nov 07 22:18:56 +0000 2015
>Last-Modified:  Sat Nov 07 22:18:56 +0000 2015
>Originator:     paul@whooppee.com
>Release:        NetBSD 7.99.21
>Organization:
+------------------+--------------------------+-------------------------+
| Paul Goyette     | PGP Key fingerprint:     | E-mail addresses:       |
| (Retired)        | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com    |
| Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd.org  |
+------------------+--------------------------+-------------------------+
>Environment:


System: NetBSD pokey.whooppee.com 7.99.21 NetBSD 7.99.21 (POKEY 2015-09-20 02:09:58) #0: Sun Sep 20 14:01:44 PHT 2015 paul@pokey.whooppee.com:/build/netbsd-local/obj/amd64/sys/arch/amd64/compile/POKEY amd64
Architecture: x86_64
Machine: amd64
>Description:
	In src/sys/kern/kern_exit.c, processes flagged with PS_STOPEXIT
	do not update the parent process's p_nstopchild counter.  So
	the parent process might not process its entire p_children list
	and thus might not reap all of its exitted children.

>How-To-Repeat:
	Found by code-inspection.

>Fix:
Index: kern/kern_exit.c
===================================================================
RCS file: /cvsroot/src/sys/kern/kern_exit.c,v
retrieving revision 1.245
diff -u -p -r1.245 kern_exit.c
--- kern/kern_exit.c    2 Oct 2015 16:54:15 -0000       1.245
+++ kern/kern_exit.c    6 Oct 2015 11:38:04 -0000
@@ -227,7 +227,10 @@ exit1(struct lwp *l, int rv)
 	if (__predict_false(p->p_sflag & PS_STOPEXIT)) {
 		KERNEL_UNLOCK_ALL(l, &l->l_biglocks);
 		sigclearall(p, &contsigmask, &kq);
+		mutex_enter(proc_lock);
 		p->p_waited = 0;
+		p->p_pptr->p_nstopchild++;
+		mutex_exit(proc_lock);
 		membar_producer();
 		p->p_stat = SSTOP;
 		lwp_lock(l);



>Release-Note:

>Audit-Trail:

From: Paul Goyette <paul@vps1.whooppee.com>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/50308: PS_STOPEXIT processes don't correctly handle
 b_nstopchild
Date: Fri, 9 Oct 2015 12:40:42 +0800 (PHT)

 The previously suggested patch is not quite correct, since there could
 be a deadlock waiting for the proc_lock mutex.  The following patch
 addresses this deadlock:

 diff -u -p -r1.245 kern_exit.c
 --- kern_exit.c 2 Oct 2015 16:54:15 -0000       1.245
 +++ kern_exit.c 9 Oct 2015 04:35:42 -0000
 @@ -227,7 +227,15 @@ exit1(struct lwp *l, int rv)
   	if (__predict_false(p->p_sflag & PS_STOPEXIT)) {
   		KERNEL_UNLOCK_ALL(l, &l->l_biglocks);
   		sigclearall(p, &contsigmask, &kq);
 +
 +		if (!mutex_tryenter(proc_lock)) {
 +			mutex_exit(p->p_lock);
 +			mutex_enter(proc_lock);
 +			mutex_enter(p->p_lock);
 +		}
   		p->p_waited = 0;
 +		p->p_pptr->p_nstopchild++;
 +		mutex_exit(proc_lock);
   		membar_producer();
   		p->p_stat = SSTOP;
   		lwp_lock(l);

 Additionally, I question whether or not the call to membar_producer()
 is still required here, after insert the appropriate mutex code?



 +------------------+--------------------------+-------------------------+
 | Paul Goyette     | PGP Key fingerprint:     | E-mail addresses:       |
 | (Retired)        | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com    |
 | Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd.org  |
 +------------------+--------------------------+-------------------------+

Responsible-Changed-From-To: kern-bug-people->pgoyette
Responsible-Changed-By: pgoyette@NetBSD.org
Responsible-Changed-When: Sun, 11 Oct 2015 23:31:34 +0000
Responsible-Changed-Why:
I have the fix.


From: "Paul Goyette" <pgoyette@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/50308 CVS commit: src/sys/kern
Date: Tue, 13 Oct 2015 00:28:22 +0000

 Module Name:	src
 Committed By:	pgoyette
 Date:		Tue Oct 13 00:28:22 UTC 2015

 Modified Files:
 	src/sys/kern: kern_exit.c

 Log Message:
 For processes marked with PS_STOPEXIT, update the process's p_waited
 value, and update its parent's p_nstopchild value when marking the
 process's p_stat to SSTOP.  The process needed to be SACTIVE to get
 here, so this transition represents an additional process for which
 the parent needs to wait.

 Fixes PR kern/50308

 Pullups will be requested for:

        NetBSD-7, -6, -6-0, -6-1, -5, -5-0, -5-1, and -5-2


 To generate a diff of this commit:
 cvs rdiff -u -r1.246 -r1.247 src/sys/kern/kern_exit.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: open->pending-pullups
State-Changed-By: pgoyette@NetBSD.org
State-Changed-When: Tue, 13 Oct 2015 01:35:12 +0000
State-Changed-Why:
Committed to head, pending pullups


State-Changed-From-To: pending-pullups->closed
State-Changed-By: pgoyette@NetBSD.org
State-Changed-When: Sat, 07 Nov 2015 22:18:56 +0000
State-Changed-Why:
Fixed, pullups completed.


>Unformatted:

NetBSD Home
NetBSD PR Database Search

(Contact us) $NetBSD: query-full-pr,v 1.39 2013/11/01 18:47:49 spz Exp $
$NetBSD: gnats_config.sh,v 1.8 2006/05/07 09:23:38 tsutsui Exp $
Copyright © 1994-2014 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.