NetBSD Problem Report #55510
From www@netbsd.org Tue Jul 21 20:37:23 2020
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id E0AE71A9213
for <gnats-bugs@gnats.NetBSD.org>; Tue, 21 Jul 2020 20:37:23 +0000 (UTC)
Message-Id: <20200721203723.0445B1A9217@mollari.NetBSD.org>
Date: Tue, 21 Jul 2020 20:37:23 +0000 (UTC)
From: n54@gmx.com
Reply-To: n54@gmx.com
To: gnats-bugs@NetBSD.org
Subject: pg_jobc going negative and crashing the kernel
X-Send-Pr-Version: www-1.0
>Number: 55510
>Category: kern
>Synopsis: pg_jobc going negative and crashing the kernel
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Tue Jul 21 20:40:00 +0000 2020
>Last-Modified: Tue Jun 28 16:52:22 +0000 2022
>Originator: Kamil Rytarowski
>Release: NetBSD 9.99.69 amd64
>Organization:
The NetBSD Foundation, Inc.
>Environment:
NetBSD chieftec 9.99.69 NetBSD 9.99.69 (GENERIC) #0: Tue Jul 21 21:32:24 CEST 2020 root@chieftec:/public/netbsd-root/sys/arch/amd64/compile/GENERIC amd64
>Description:
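pg_jobc counts the processes in a process group that qualify the group for
job control, that is, those whose parent is in a different process group of
the same session.
Under certain circumstances it can go negative, which crashes the kernel.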
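As a rough illustration of the counter described above, the following is a
user-space toy model, not NetBSD kernel code. The names toy_pgrp, toy_proc,
qualifies, jobc_acquire and jobc_release are hypothetical. It assumes the
usual BSD rule that a process qualifies its group when its parent sits in a
different process group of the same session, and it shows how one unmatched
decrement would take the counter below zero.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only; these are not the kernel's struct pgrp / struct proc. */
struct toy_pgrp {
	int session;	/* id of the session the group belongs to */
	int jobc;	/* models pg_jobc */
};

struct toy_proc {
	struct toy_pgrp *pgrp;		/* group the process is a member of */
	struct toy_proc *parent;	/* NULL when there is no parent */
};

/*
 * Returns 1 if p qualifies its group for job control: the parent is in a
 * different process group of the same session. Returns 0 otherwise.
 */
static int
qualifies(const struct toy_proc *p)
{
	const struct toy_proc *pp = p->parent;

	assert(p->pgrp != NULL);
	return pp != NULL && pp->pgrp != NULL && pp->pgrp != p->pgrp &&
	    pp->pgrp->session == p->pgrp->session;
}

/* Account for one more qualifying member. */
static void
jobc_acquire(struct toy_pgrp *pg)
{
	pg->jobc++;
}

/*
 * Account for one fewer qualifying member. Returns 0 on success and -1,
 * leaving the counter untouched, if the decrement would make it negative.
 */
static int
jobc_release(struct toy_pgrp *pg)
{
	if (pg->jobc <= 0)
		return -1;
	pg->jobc--;
	return 0;
}
```

In this model a second jobc_release() for the same process is refused. A
counter that is decremented unconditionally would instead reach -1.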
>How-To-Repeat:
https://www.netbsd.org/~kamil/ptrace/pg_jobc-crash.c
>Fix:
N/A
>Release-Note:
>Audit-Trail:
Here is the history of FreeBSD working on and fixing this bug:
commit 5844bd058aed6f3d0c8cbbddd6aa95993ece0189
Author: Konstantin Belousov <kib@FreeBSD.org>
Date: Tue Dec 29 02:41:56 2020 +0200
jobc: rework detection of orphaned groups.
Instead of trying to maintain pg_jobc counter on each process group
update (and sometimes before), just calculate the counter when needed.
Still, for the benefit of the signal delivery code, explicitly mark
orphaned groups as such with the new process group flag.
This way we prevent bugs in the corner cases where updates to the counter
were missed due to complicated configuration of p_pptr/p_opptr/real_parent
(debugger).
Since we need to iterate over all children of the process on exit, this
change mostly affects the process group entry and leave, where we need
to iterate all process group members to detect orphaned status.
(For MFC, keep pg_jobc around but unused).
Reported by: jhb
Reviewed by: jilles
Tested by: pho
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D27871
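The approach in the commit message above (compute the orphaned status when
needed and record it in a group flag) might be sketched as follows. This is
a user-space toy model, not the FreeBSD change itself. All names (toy_pgrp,
toy_proc, pgrp_calc_orphaned, pgrp_enter, pgrp_leave) are hypothetical, and
the sketch only refreshes the group being entered or left. A real
implementation would also have to revisit the groups of the moving process's
children.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_pgrp;

struct toy_proc {
	struct toy_pgrp *pgrp;		/* current group, NULL if none */
	struct toy_proc *parent;	/* NULL when there is no parent */
	struct toy_proc *next_in_pgrp;	/* next member of the same group */
};

struct toy_pgrp {
	int session;			/* id of the owning session */
	struct toy_proc *members;	/* singly linked member list */
	bool orphaned;			/* models the process group flag */
};

/* True if p's parent is in a different group of the same session. */
static bool
qualifies(const struct toy_proc *p)
{
	const struct toy_proc *pp = p->parent;

	return pp != NULL && pp->pgrp != NULL && pp->pgrp != p->pgrp &&
	    pp->pgrp->session == p->pgrp->session;
}

/* Walk all members; the group is orphaned if none of them qualifies. */
static bool
pgrp_calc_orphaned(const struct toy_pgrp *pg)
{
	const struct toy_proc *p;

	for (p = pg->members; p != NULL; p = p->next_in_pgrp)
		if (qualifies(p))
			return false;
	return true;
}

/* Add p to pg and recompute the flag instead of adjusting a counter. */
static void
pgrp_enter(struct toy_pgrp *pg, struct toy_proc *p)
{
	assert(pg != NULL && p->pgrp == NULL);
	p->pgrp = pg;
	p->next_in_pgrp = pg->members;
	pg->members = p;
	pg->orphaned = pgrp_calc_orphaned(pg);
}

/* Remove p from its group and recompute the flag for that group. */
static void
pgrp_leave(struct toy_proc *p)
{
	struct toy_pgrp *pg = p->pgrp;
	struct toy_proc **pp;

	if (pg == NULL)
		return;
	for (pp = &pg->members; *pp != NULL; pp = &(*pp)->next_in_pgrp) {
		if (*pp == p) {
			*pp = p->next_in_pgrp;
			break;
		}
	}
	p->next_in_pgrp = NULL;
	p->pgrp = NULL;
	pg->orphaned = pgrp_calc_orphaned(pg);
}
```

Because the flag is always derived from the current member list, there is no
running counter that a missed update could push negative. The cost is a walk
over the group's members on every enter and leave.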
commit c5bc28b27333e892b44e767c6188c214f09d5e7c
Author: Konstantin Belousov <kib@FreeBSD.org>
Date: Sat Aug 22 21:32:11 2020 +0000
Fix several issues with process group orphanage.
An attempt to add assertions that pgrp->pg_jobc counters do not
underflow in r361967, reverted in r362910, pointed out bugs in the
handling of job control. Peter Holm was able to narrow down the
problem to a very easy reproduction with timeout(1), which uses reaping.
The following list of problems with the calculation of pg_jobc, which
directs SIGHUP/SIGCONT delivery for orphaned process groups, was
identified:
- Re-calculation of the orphaned status for children of an exiting parent
was wrong, but mostly unnoticed when all children were reparented to
init(8). When a child can be reparented to a different process which
could affect the child's job control state, it was not properly
accounted for in pg_jobc.
- Lockless check for exiting process' parent process group is racy
because nothing prevents the parent from changing its group
membership.
- Exited process is left in the process group, until waited. This
affects other calculations of pg_jobc.
Split the handling of job control status between a process changing its
process group and a process exiting. Calculate increments and decrements
for pg_jobc by checking the orphanage exactly instead of assuming process
group membership for children and parent. Move the call to killjobc()
later under the proctree_lock. Mark the exiting process in killjobc()
with a new flag P_TREE_GRPEXITED and skip it for all pg_jobc
calculations after the flag is set.
Add a checker that independently recalculates the pg_jobc value and compares
it with the memoized process group state. This is enabled under INVARIANTS.
Reviewed by: jilles
Discussed with: kevans
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D26116
Notes:
svn path=/head/; revision=364495
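The checker mentioned in the commit message above could look roughly like
this in a user-space toy model. It is a sketch, not the FreeBSD code. The
names toy_pgrp, toy_proc, grp_exited, pgrp_recount and pgrp_jobc_consistent
are hypothetical, and the grp_exited field only stands in for the
P_TREE_GRPEXITED flag named in the commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_pgrp;

struct toy_proc {
	struct toy_pgrp *pgrp;		/* group the process is a member of */
	struct toy_proc *parent;	/* NULL when there is no parent */
	struct toy_proc *next_in_pgrp;	/* next member of the same group */
	bool grp_exited;		/* stands in for P_TREE_GRPEXITED */
};

struct toy_pgrp {
	int session;			/* id of the owning session */
	struct toy_proc *members;	/* singly linked member list */
	int jobc;			/* memoized counter, models pg_jobc */
};

/* True if p's parent is in a different group of the same session. */
static bool
qualifies(const struct toy_proc *p)
{
	const struct toy_proc *pp = p->parent;

	assert(p->pgrp != NULL);
	return pp != NULL && pp->pgrp != NULL && pp->pgrp != p->pgrp &&
	    pp->pgrp->session == p->pgrp->session;
}

/*
 * Recount the qualifying members from scratch, skipping members that are
 * already marked as exited.
 */
static int
pgrp_recount(const struct toy_pgrp *pg)
{
	const struct toy_proc *p;
	int cnt = 0;

	for (p = pg->members; p != NULL; p = p->next_in_pgrp) {
		if (p->grp_exited)
			continue;
		if (qualifies(p))
			cnt++;
	}
	return cnt;
}

/* Compare the memoized counter with the independent recount. */
static bool
pgrp_jobc_consistent(const struct toy_pgrp *pg)
{
	return pg->jobc == pgrp_recount(pg);
}
```

A debug build could call pgrp_jobc_consistent() after every counter update
and stop on the first mismatch, which points closer to the faulty update
than a later underflow would.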
commit 58199a70529d2e1a1cbcd0d86dc5877b4e45f48e
Author: Mateusz Guzik <mjg@FreeBSD.org>
Date: Fri Jul 3 09:23:11 2020 +0000
ifdef out pg_jobc assertions added in r361967
They trigger for some people, the bug is not obvious, there are no takers
for fixing it, the issue already had to be there for years beforehand and
is low priority.
Notes:
svn path=/head/; revision=362910
commit 90a08d6cad6f761a4fd91d5ac16382b1ad705dcf
Author: Mateusz Guzik <mjg@FreeBSD.org>
Date: Tue Jun 9 15:17:23 2020 +0000
Assert on pg_jobc state.
Stolen from NetBSD.
Notes:
svn path=/head/; revision=361967
>Unformatted: