NetBSD Problem Report #55510

From www@netbsd.org  Tue Jul 21 20:37:23 2020
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id E0AE71A9213
	for <gnats-bugs@gnats.NetBSD.org>; Tue, 21 Jul 2020 20:37:23 +0000 (UTC)
Message-Id: <20200721203723.0445B1A9217@mollari.NetBSD.org>
Date: Tue, 21 Jul 2020 20:37:23 +0000 (UTC)
From: n54@gmx.com
Reply-To: n54@gmx.com
To: gnats-bugs@NetBSD.org
Subject: pg_jobc going negative and crashing the kernel
X-Send-Pr-Version: www-1.0

>Number:         55510
>Category:       kern
>Synopsis:       pg_jobc going negative and crashing the kernel
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue Jul 21 20:40:00 +0000 2020
>Last-Modified:  Tue Jun 28 16:52:22 +0000 2022
>Originator:     Kamil Rytarowski
>Release:        NetBSD 9.99.69 amd64
>Organization:
The NetBSD Foundation, Inc.
>Environment:
NetBSD chieftec 9.99.69 NetBSD 9.99.69 (GENERIC) #0: Tue Jul 21 21:32:24 CEST 2020  root@chieftec:/public/netbsd-root/sys/arch/amd64/compile/GENERIC amd64
>Description:
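pg_jobc counts the processes in a process group that qualify the group
for job control, i.e. those whose parent is in a different process group
of the same session.

Under certain circumstances the counter can go negative, which crashes
the kernel.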
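
To illustrate how an incrementally maintained counter of this kind can
underflow, here is a minimal user-space sketch. It is not kernel code and
not a diagnosis of this crash: the types pgrp_model and proc_model and the
functions qualifies(), model_enter() and model_leave() are hypothetical
names invented for the example. The assumption is that the counter is
adjusted only on group entry and leave, based on the parent seen at that
moment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model, not the kernel's struct pgrp/struct proc. */
struct pgrp_model {
	int session;	/* session the group belongs to */
	int jobc;	/* incrementally maintained counter, like pg_jobc */
};

struct proc_model {
	struct pgrp_model *pgrp;	/* current process group */
	struct proc_model *parent;	/* current parent, may be NULL */
};

/* True if p's parent is in a different group of the same session. */
static int
qualifies(const struct proc_model *p)
{
	const struct proc_model *pp = p->parent;

	return pp != NULL && pp->pgrp != NULL && pp->pgrp != p->pgrp &&
	    pp->pgrp->session == p->pgrp->session;
}

/* Put p into pg and count it if it qualifies at this moment. */
static void
model_enter(struct proc_model *p, struct pgrp_model *pg)
{
	assert(pg != NULL);
	p->pgrp = pg;
	if (qualifies(p))
		pg->jobc++;
}

/* Take p out of its group and uncount it if it qualifies at this moment. */
static void
model_leave(struct proc_model *p)
{
	assert(p->pgrp != NULL);
	if (qualifies(p))
		p->pgrp->jobc--;
	p->pgrp = NULL;
}
```

If a process enters a group while it does not qualify, its parent pointer
is then changed without adjusting the counter, and it later leaves while
it does qualify, the decrement has no matching increment and jobc drops
below zero.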
>How-To-Repeat:
https://www.netbsd.org/~kamil/ptrace/pg_jobc-crash.c
>Fix:
N/A

>Release-Note:

>Audit-Trail:

For reference, here is the FreeBSD commit history for the same bug,
newest first:

commit 5844bd058aed6f3d0c8cbbddd6aa95993ece0189
Author: Konstantin Belousov <kib@FreeBSD.org>
Date:   Tue Dec 29 02:41:56 2020 +0200

    jobc: rework detection of orphaned groups.

    Instead of trying to maintain pg_jobc counter on each process group
    update (and sometimes before), just calculate the counter when needed.
    Still, for the benefit of the signal delivery code, explicitly mark
    orphaned groups as such with the new process group flag.

    This way we prevent bugs in the corner cases where updates to the counter
    were missed due to complicated configuration of p_pptr/p_opptr/real_parent
    (debugger).

    Since we need to iterate over all children of the process on exit, this
    change mostly affects the process group entry and leave, where we need
    to iterate all process group members to detect orphaned status.

    (For MFC, keep pg_jobc around but unused).

    Reported by:    jhb
    Reviewed by:    jilles
    Tested by:      pho
    MFC after:      2 weeks
    Sponsored by:   The FreeBSD Foundation
    Differential Revision:  https://reviews.freebsd.org/D27871
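
The approach described in the commit above (compute the count on demand
and record only an "orphaned" flag on the group) can be sketched in user
space as follows. This is an illustration under stated assumptions, not
the FreeBSD implementation: struct pgm, struct procm, PGM_ORPHANED,
pgm_calc_jobc() and pgm_update_orphaned() are hypothetical names, and the
member list is a plain singly linked list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PGM_ORPHANED 0x1u	/* hypothetical flag: group is orphaned */

struct pgm;

struct procm {
	struct pgm *pgrp;	/* process group of this process */
	struct procm *parent;	/* parent process, may be NULL */
	struct procm *pg_next;	/* next member of the same group */
};

struct pgm {
	int session;		/* session the group belongs to */
	unsigned flags;		/* PGM_ORPHANED or 0 */
	struct procm *members;	/* head of the member list */
};

/* True if p's parent is in a different group of the same session. */
static bool
procm_qualifies(const struct procm *p)
{
	const struct procm *pp = p->parent;

	return pp != NULL && pp->pgrp != NULL && pp->pgrp != p->pgrp &&
	    pp->pgrp->session == p->pgrp->session;
}

/* Count qualifying members by walking the group; nothing is memoized. */
static int
pgm_calc_jobc(const struct pgm *pg)
{
	int n = 0;

	assert(pg != NULL);
	for (const struct procm *p = pg->members; p != NULL; p = p->pg_next)
		if (procm_qualifies(p))
			n++;
	return n;
}

/* Recompute the flag; meant to be called after membership or parent changes. */
static void
pgm_update_orphaned(struct pgm *pg)
{
	if (pgm_calc_jobc(pg) == 0)
		pg->flags |= PGM_ORPHANED;
	else
		pg->flags &= ~PGM_ORPHANED;
}
```

Because the count is derived from the current state each time, it cannot
go negative. The cost is a walk over all group members on each update.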

commit c5bc28b27333e892b44e767c6188c214f09d5e7c
Author: Konstantin Belousov <kib@FreeBSD.org>
Date:   Sat Aug 22 21:32:11 2020 +0000

    Fix several issues with process group orphanage.

    Attempt of adding assertions that pgrp->pg_jobc counters do not
    underflow in r361967, reverted in r362910, points out bugs in the
    handling of job control.  Peter Holm was able to narrow down the
    problem to very easy reproduction with timeout(1) which uses reaping.

    The following list of problems with calculation of pg_jobc which
    directs SIGHUP/SIGCONT delivery for orphaned process group was
    identified:
    - Re-calculation of the orphaned status for children of exiting parent
      was wrong, but mostly unnoticed when all children were reparented to
      init(8).  When child can be reparented to a different process which
      could affect the child's job control state, it was not properly
      accounted for in pg_jobc.
    - Lockless check for exiting process' parent process group is racy
      because nothing prevents the parent from changing its group
      membership.
    - Exited process is left in the process group, until waited. This
      affects other calculations of pg_jobc.

    Split handling of job control status on process changing its process
    group, and process exiting.  Calculate increments and decrements for
    pg_jobc by exact checking the orphanage instead of assuming process
    group membership for children and parent.  Move the call to killjobc()
    later under the proctree_lock.  Mark exiting process in killjobc()
    with a new flag P_TREE_GRPEXITED and skip it for all pg_jobc
    calculations after the flag is set.

    Add checker that independently recalculates pg_jobc value and compares
    it with the memoized process group state. This is enabled under INVARIANTS.

    Reviewed by:    jilles
    Discussed with: kevans
    Tested by:      pho
    Sponsored by:   The FreeBSD Foundation
    MFC after:      2 weeks
    Differential revision:  https://reviews.freebsd.org/D26116

Notes:
    svn path=/head/; revision=364495

commit 58199a70529d2e1a1cbcd0d86dc5877b4e45f48e
Author: Mateusz Guzik <mjg@FreeBSD.org>
Date:   Fri Jul 3 09:23:11 2020 +0000

    ifdef out pg_jobc assertions added in r361967

    They trigger for some people, the bug is not obvious, there are no takers
    for fixing it, the issue already had to be there for years beforehand and
    is low priority.

Notes:
    svn path=/head/; revision=362910

commit 90a08d6cad6f761a4fd91d5ac16382b1ad705dcf
Author: Mateusz Guzik <mjg@FreeBSD.org>
Date:   Tue Jun 9 15:17:23 2020 +0000

    Assert on pg_jobc state.

    Stolen from NetBSD.

Notes:
    svn path=/head/; revision=361967



>Unformatted:
