NetBSD Problem Report #58091
From www@netbsd.org Sat Mar 30 13:31:46 2024
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id 95CA11A923B
for <gnats-bugs@gnats.NetBSD.org>; Sat, 30 Mar 2024 13:31:46 +0000 (UTC)
Message-Id: <20240330133145.957871A923C@mollari.NetBSD.org>
Date: Sat, 30 Mar 2024 13:31:45 +0000 (UTC)
From: michael.dusan@gmail.com
Reply-To: michael.dusan@gmail.com
To: gnats-bugs@NetBSD.org
Subject: after fork/execve or posix_spawn, parent kill(child, SIGTERM) has race condition making it unreliable
X-Send-Pr-Version: www-1.0
>Number: 58091
>Category: kern
>Synopsis: after fork/execve or posix_spawn, parent kill(child, SIGTERM) has race condition making it unreliable
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sat Mar 30 13:35:00 +0000 2024
>Originator: Michael Dusan
>Release:
>Organization:
Zig Software Foundation
>Environment:
NetBSD netbsd100-amd64 10.0_RC6 NetBSD 10.0_RC6 (GENERIC) #0: Tue Mar 12 10:19:02 UTC 2024 mkrepro@mkrepro.NetBSD.org:/usr/src/sys/arch/amd64/compile/GENERIC amd64
NetBSD netbsd93-amd64 9.3 NetBSD 9.3 (GENERIC) #0: Thu Aug 4 15:30:37 UTC 2022 mkrepro@mkrepro.NetBSD.org:/usr/src/sys/arch/amd64/compile/GENERIC amd64
>Description:
The parent forks and execs a child and, as its first action afterwards, sends SIGTERM to the child. Roughly 3 times out of a million, the child never receives the signal.
The variant using posix_spawn reproduces much more frequently on NetBSD 10.0_RC6, and more frequently on NetBSD 9.3, than the fork/execve variant.
I was unable to reproduce this bug on Arch Linux, macOS 14.0, FreeBSD 14.4, OpenBSD 7.4, or DragonFly 6.4.
Under ktrace I was able to see the bug much more frequently (with the Zig code that motivated this report), and observed that the bug manifests when the parent's `kill()` call appears close to the child's `execve()` call in the ktrace output, i.e. immediately preceding it.
It seems that the signal is lost somewhere in kernel execve preparation.
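One way to probe this hypothesis is to hold back `kill()` until the child's `execve()` has gone through. The following is only a sketch under assumptions, not something from the original test runs: `kill_after_exec` is a hypothetical helper, and it relies on the common close-on-exec pipe trick, which tells the parent only that the exec has at least reached the point where close-on-exec descriptors are closed. If "whups" never shows up with this ordering, that would be consistent with the signal being lost during exec preparation.

```c
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

/*
 * Hypothetical helper, not part of the original report.
 * Returns 1 if the child died from a signal, 0 if it ended some other way,
 * and -1 if setup, exec, kill, or waiting failed.
 */
static int kill_after_exec(void) {
    int fds[2];
    if (pipe(fds) == -1)
        return -1;
    /* The write end is closed automatically once execve succeeds. */
    if (fcntl(fds[1], F_SETFD, FD_CLOEXEC) == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        char *argv[] = { "sleep", "10", NULL };
        char *envp[] = { NULL };
        close(fds[0]);
        execve("/bin/sleep", argv, envp);
        /* execve failed: send one byte so the parent can tell. */
        char c = 1;
        if (write(fds[1], &c, 1) != 1) {
            /* nothing more we can do here */
        }
        _exit(127);
    }

    close(fds[1]);
    char c;
    ssize_t n;
    do {
        n = read(fds[0], &c, 1);   /* 0 (EOF) means the exec went through */
    } while (n == -1 && errno == EINTR);
    close(fds[0]);

    int exec_ok = (n == 0);
    int kill_ok = exec_ok && kill(pid, SIGTERM) == 0;

    /* Always reap the child, even when something above failed. */
    int status;
    if (waitpid(pid, &status, 0) == -1 || !kill_ok)
        return -1;
    return WIFSIGNALED(status) ? 1 : 0;
}
```

Calling `kill_after_exec()` in the same kind of loop as the reproducer, and printing "whups" whenever it returns 0, would give a direct comparison against the unsynchronised version.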
>How-To-Repeat:
0. caution: running this reproducer may hose the system. In another incarnation it would end my ssh session (and other sessions to the same NetBSD system), requiring a reboot
1. see the attached bug.c code below
2. cc -o bug bug.c
3. in shell `repeat 1000 ./bug`
4. over time, the output "whups" indicates child did not end due to signal
5. it sometimes helps to keep the system busy, e.g. by concurrently running step #3 in another shell
6. I usually observe 2 or 3 "whups" per invocation
7. testing env 1: qemu VM netbsd 10.0_RC6 as "8 core" guest
8. testing env 2: qemu VM netbsd 9.3 amd64 as "8 core" guest
9. VM host: archlinux, AMD Ryzen 9 7900X 12-Core Processor
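Step 4 only reports that the child was not killed by a signal. To see how the child ended instead, the wait status can be decoded further. The helper below is a minimal sketch; `describe_status` is a hypothetical name and is not part of the attached programs. Presumably a lost SIGTERM would show up as a normal exit once sleep runs to completion, but that is an assumption, not something stated above.

```c
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper: render a waitpid() status as text into buf. */
static void describe_status(int status, char *buf, size_t len) {
    if (WIFSIGNALED(status))
        snprintf(buf, len, "killed by signal %d", WTERMSIG(status));
    else if (WIFEXITED(status))
        snprintf(buf, len, "exited with status %d", WEXITSTATUS(status));
    else
        snprintf(buf, len, "unexpected wait status 0x%x", (unsigned)status);
}
```

In the reproducers, the "whups!" branch could call this helper and print the resulting text alongside the message.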
///////////////////////////////////////////////////////////////////////////////
// bug.c
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>
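static void doit(void) {
    pid_t pid = fork();
    if (pid == -1) {
        // do not fall through: kill(-1, ...) would signal every process we may signal
        fprintf(stderr, "fork: errno=%d\n", errno);
        return;
    }
    if (pid == 0) {
        // we are child
        char *argv[] = { "sleep", "10", NULL };
        execve("/bin/sleep", argv, NULL);
        // only reached if execve failed; do not return into the parent's loop
        _exit(127);
    } else {
        // we are parent
        if (kill(pid, SIGTERM) == -1) {
            fprintf(stderr, "kill: errno=%d\n", errno);
            return;
        }
        int status;
        if (waitpid(pid, &status, 0) == -1) {
            fprintf(stderr, "waitpid: errno=%d\n", errno);
            return;
        }
        // the child should have been terminated by SIGTERM
        if (!WIFSIGNALED(status)) {
            fprintf(stderr, "whups!\n");
        }
    }
}

int main(void) {
    for (int i = 0; i < 1000; i++) {
        doit();
    }
    return 0;
}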
///////////////////////////////////////////////////////////////////////////////
// bug_posix.c
// this variant uses `posix_spawn()` instead of fork/execve
// here it's set to do 1 million iterations
// netbsd 10.0_RC3 emits "whups" over a hundred times on average
// netbsd 9.3 emits "whups" maybe 20 times on average
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>
#include <spawn.h>
#include <sys/wait.h>
static void doit(void) {
    char *argv[] = { "sleep", "1", NULL };
    pid_t pid;
    // posix_spawn returns an error number on failure rather than -1
    int err = posix_spawn(&pid, "/bin/sleep", NULL, NULL, argv, NULL);
    if (err != 0) {
        fprintf(stderr, "posix_spawn: error=%d\n", err);
        return;
    }
    if (kill(pid, SIGTERM) == -1) {
        fprintf(stderr, "kill: errno=%d\n", errno);
        return;
    }
    int status;
    if (waitpid(pid, &status, 0) == -1) {
        fprintf(stderr, "waitpid: errno=%d\n", errno);
        return;
    }
    // the child should have been terminated by SIGTERM
    if (!WIFSIGNALED(status)) {
        fprintf(stderr, "whups!\n");
    }
}

int main(void) {
    for (int i = 0; i < 1000000; i++) {
        doit();
    }
    return 0;
}
>Fix: