NetBSD Problem Report #50308
From paul@pokey.whooppee.com Tue Oct 6 23:55:05 2015
Return-Path: <paul@pokey.whooppee.com>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (verified OK))
by mollari.NetBSD.org (Postfix) with ESMTPS id 4EE9BA5674
for <gnats-bugs@gnats.NetBSD.org>; Tue, 6 Oct 2015 23:55:05 +0000 (UTC)
Message-Id: <20151006235401.19E5816E74@pokey.whooppee.com>
Date: Wed, 7 Oct 2015 07:54:01 +0800 (PHT)
From: paul@whooppee.com
Reply-To: paul@whooppee.com
To: gnats-bugs@NetBSD.org
Subject: PS_STOPEXIT processes don't correctly handle p_nstopchild
X-Send-Pr-Version: 3.95
>Number: 50308
>Category: kern
>Synopsis: PS_STOPEXIT processes don't correctly handle p_nstopchild
>Confidential: no
>Severity: serious
>Priority: high
>Responsible: pgoyette
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Wed Oct 07 00:00:00 +0000 2015
>Closed-Date: Sat Nov 07 22:18:56 +0000 2015
>Last-Modified: Sat Nov 07 22:18:56 +0000 2015
>Originator: paul@whooppee.com
>Release: NetBSD 7.99.21
>Organization:
+------------------+--------------------------+-------------------------+
| Paul Goyette | PGP Key fingerprint: | E-mail addresses: |
| (Retired) | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com |
| Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd.org |
+------------------+--------------------------+-------------------------+
>Environment:
System: NetBSD pokey.whooppee.com 7.99.21 NetBSD 7.99.21 (POKEY 2015-09-20 02:09:58) #0: Sun Sep 20 14:01:44 PHT 2015 paul@pokey.whooppee.com:/build/netbsd-local/obj/amd64/sys/arch/amd64/compile/POKEY amd64
Architecture: x86_64
Machine: amd64
>Description:
In src/sys/kern/kern_exit.c, processes flagged with PS_STOPEXIT
do not update the parent process's p_nstopchild counter. So
the parent process might not process its entire p_children list
and thus might not reap all of its exitted children.
>How-To-Repeat:
Found by code-inspection.
>Fix:
Index: kern/kern_exit.c
===================================================================
RCS file: /cvsroot/src/sys/kern/kern_exit.c,v
retrieving revision 1.245
diff -u -p -r1.245 kern_exit.c
--- kern/kern_exit.c 2 Oct 2015 16:54:15 -0000 1.245
+++ kern/kern_exit.c 6 Oct 2015 11:38:04 -0000
@@ -227,7 +227,10 @@ exit1(struct lwp *l, int rv)
if (__predict_false(p->p_sflag & PS_STOPEXIT)) {
KERNEL_UNLOCK_ALL(l, &l->l_biglocks);
sigclearall(p, &contsigmask, &kq);
+ mutex_enter(proc_lock);
p->p_waited = 0;
+ p->p_pptr->p_nstopchild++;
+ mutex_exit(proc_lock);
membar_producer();
p->p_stat = SSTOP;
lwp_lock(l);
>Release-Note:
>Audit-Trail:
From: Paul Goyette <paul@vps1.whooppee.com>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/50308: PS_STOPEXIT processes don't correctly handle
b_nstopchild
Date: Fri, 9 Oct 2015 12:40:42 +0800 (PHT)
The previously suggested patch is not quite correct, since there could
be a deadlock waiting for the proc_lock mutex. The following patch
addresses this deadlock:
diff -u -p -r1.245 kern_exit.c
--- kern_exit.c 2 Oct 2015 16:54:15 -0000 1.245
+++ kern_exit.c 9 Oct 2015 04:35:42 -0000
@@ -227,7 +227,15 @@ exit1(struct lwp *l, int rv)
if (__predict_false(p->p_sflag & PS_STOPEXIT)) {
KERNEL_UNLOCK_ALL(l, &l->l_biglocks);
sigclearall(p, &contsigmask, &kq);
+
+ if (!mutex_tryenter(proc_lock)) {
+ mutex_exit(p->p_lock);
+ mutex_enter(proc_lock);
+ mutex_enter(p->p_lock);
+ }
p->p_waited = 0;
+ p->p_pptr->p_nstopchild++;
+ mutex_exit(proc_lock);
membar_producer();
p->p_stat = SSTOP;
lwp_lock(l);
Additionally, I question whether or not the call to membar_producer()
is still required here, after insert the appropriate mutex code?
+------------------+--------------------------+-------------------------+
| Paul Goyette | PGP Key fingerprint: | E-mail addresses: |
| (Retired) | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com |
| Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd.org |
+------------------+--------------------------+-------------------------+
Responsible-Changed-From-To: kern-bug-people->pgoyette
Responsible-Changed-By: pgoyette@NetBSD.org
Responsible-Changed-When: Sun, 11 Oct 2015 23:31:34 +0000
Responsible-Changed-Why:
I have the fix.
From: "Paul Goyette" <pgoyette@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/50308 CVS commit: src/sys/kern
Date: Tue, 13 Oct 2015 00:28:22 +0000
Module Name: src
Committed By: pgoyette
Date: Tue Oct 13 00:28:22 UTC 2015
Modified Files:
src/sys/kern: kern_exit.c
Log Message:
For processes marked with PS_STOPEXIT, update the process's p_waited
value, and update its parent's p_nstopchild value when marking the
process's p_stat to SSTOP. The process needed to be SACTIVE to get
here, so this transition represents an additional process for which
the parent needs to wait.
Fixes PR kern/50308
Pullups will be requested for:
NetBSD-7, -6, -6-0, -6-1, -5, -5-0, -5-1, and -5-2
To generate a diff of this commit:
cvs rdiff -u -r1.246 -r1.247 src/sys/kern/kern_exit.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: open->pending-pullups
State-Changed-By: pgoyette@NetBSD.org
State-Changed-When: Tue, 13 Oct 2015 01:35:12 +0000
State-Changed-Why:
Committed to head, pending pullups
State-Changed-From-To: pending-pullups->closed
State-Changed-By: pgoyette@NetBSD.org
State-Changed-When: Sat, 07 Nov 2015 22:18:56 +0000
State-Changed-Why:
Fixed, pullups completed.
>Unformatted:
(Contact us)
$NetBSD: query-full-pr,v 1.39 2013/11/01 18:47:49 spz Exp $
$NetBSD: gnats_config.sh,v 1.8 2006/05/07 09:23:38 tsutsui Exp $
Copyright © 1994-2014
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.