NetBSD Problem Report #55241
From gson@gson.org Thu May 7 11:15:00 2020
Return-Path: <gson@gson.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id 08B391A9213
for <gnats-bugs@gnats.NetBSD.org>; Thu, 7 May 2020 11:14:59 +0000 (UTC)
Message-Id: <20200507111454.5F890253F45@guava.gson.org>
Date: Thu, 7 May 2020 14:14:54 +0300 (EEST)
From: gson@gson.org (Andreas Gustafsson)
Reply-To: gson@gson.org (Andreas Gustafsson)
To: gnats-bugs@NetBSD.org
Subject: Many t_ptrace_wait* test cases fail since Apr 16
X-Send-Pr-Version: 3.95
>Number: 55241
>Category: lib
>Synopsis: Many t_ptrace_wait* test cases fail since Apr 16
>Confidential: no
>Severity: serious
>Priority: high
>Responsible: kamil
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Thu May 07 11:15:01 +0000 2020
>Closed-Date: Sun May 17 17:46:56 +0000 2020
>Last-Modified: Mon May 25 17:10:01 +0000 2020
>Originator: Andreas Gustafsson
>Release: NetBSD-current, source date >= 2020.04.16.14.39.58
>Organization:
>Environment:
System: NetBSD
Architecture: i386
Machine: i386
>Description:
On Apr 18, an automated report was sent to current-users, saying:
> The newly failing test cases are:
>
> lib/libc/sys/t_ptrace_wait4:fork10
> lib/libc/sys/t_ptrace_wait4:fork2
> lib/libc/sys/t_ptrace_wait4:fork4
> lib/libc/sys/t_ptrace_wait4:fork6
> lib/libc/sys/t_ptrace_wait4:fork8
> lib/libc/sys/t_ptrace_wait4:fork_singalmasked
> lib/libc/sys/t_ptrace_wait4:unrelated_tracer_fork16
> lib/libc/sys/t_ptrace_wait6:fork14
> lib/libc/sys/t_ptrace_wait6:fork4
> lib/libc/sys/t_ptrace_wait6:fork_singalmasked
> lib/libc/sys/t_ptrace_wait6:unrelated_tracer_fork14
> lib/libc/sys/t_ptrace_wait6:unrelated_tracer_fork2
> lib/libc/sys/t_ptrace_wait6:unrelated_tracer_fork4
> lib/libc/sys/t_ptrace_waitid:fork14
> lib/libc/sys/t_ptrace_waitid:fork2
> lib/libc/sys/t_ptrace_waitid:fork4
> lib/libc/sys/t_ptrace_waitid:fork6
> lib/libc/sys/t_ptrace_waitid:fork8
> lib/libc/sys/t_ptrace_waitid:fork_singalmasked
> lib/libc/sys/t_ptrace_waitid:unrelated_tracer_fork10
> lib/libc/sys/t_ptrace_waitid:unrelated_tracer_fork12
> lib/libc/sys/t_ptrace_waitid:unrelated_tracer_fork8
> lib/libc/sys/t_ptrace_waitpid:fork10
> lib/libc/sys/t_ptrace_waitpid:fork12
> lib/libc/sys/t_ptrace_waitpid:fork14
> lib/libc/sys/t_ptrace_waitpid:fork16
> lib/libc/sys/t_ptrace_waitpid:fork2
> lib/libc/sys/t_ptrace_waitpid:fork4
> lib/libc/sys/t_ptrace_waitpid:fork6
> lib/libc/sys/t_ptrace_waitpid:fork8
> lib/libc/sys/t_ptrace_waitpid:fork_singalignored
> lib/libc/sys/t_ptrace_waitpid:fork_singalmasked
> lib/libc/sys/t_ptrace_waitpid:unrelated_tracer_fork2
> lib/libc/sys/t_ptrace_waitpid:unrelated_tracer_fork8
>
[...]
>
> The following commits were made between the last successful test and
> the failed test:
>
> 2020.04.16.14.39.58 joerg src/lib/libc/gen/pthread_atfork.c,v 1.13
> 2020.04.16.14.39.58 joerg src/libexec/ld.elf_so/rtld.c,v 1.204
> 2020.04.16.14.39.58 joerg src/libexec/ld.elf_so/rtld.h,v 1.139
> 2020.04.16.14.39.58 joerg src/libexec/ld.elf_so/symbols.map,v 1.3
The following day, Joerg wrote:
> AFAICT the tests are bad. They fail significantly more often when
> compiled with optimisations than without, strongly suggesting race
> conditions involved. The original change here certainly changes the
> timing and I am aware of one potential bug in it, but that bug is most
> definitely not exercised by the test cases.
These test cases (and other t_ptrace_wait* test cases that were reported
in separate mails) are still failing:
http://releng.netbsd.org/b5reports/i386/2020/2020.05.06.09.18.10/test.html#failed-tcs-summary
Most of them fail only under qemu, but the fork_signalmasked test
cases are also failing on real hardware:
http://www.gson.org/netbsd/bugs/build/amd64-baremetal/2020/2020.05.06.20.40.33/test.html#failed-tcs-summary
>How-To-Repeat:
>Fix:
>Release-Note:
>Audit-Trail:
From: Andreas Gustafsson <gson@gson.org>
To: kamil@NetBSD.org
Cc: gnats-bugs@netbsd.org
Subject: Re: lib/55241: Many t_ptrace_wait* test cases fail since Apr 16
Date: Thu, 7 May 2020 14:22:07 +0300
Hi Kamil,
Do you accept Joerg's assertion that the test failures of lib/55241
happen because "the tests are bad"?
--
Andreas Gustafsson, gson@gson.org
From: Joerg Sonnenberger <joerg@bec.de>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: lib/55241: Many t_ptrace_wait* test cases fail since Apr 16
Date: Thu, 7 May 2020 14:12:35 +0200
On Thu, May 07, 2020 at 11:15:01AM +0000, Andreas Gustafsson wrote:
> Most of them fail only under qemu, but the fork_signalmasked test
> cases are also failing on real hardware:
>
> http://www.gson.org/netbsd/bugs/build/amd64-baremetal/2020/2020.05.06.20.40.33/test.html#failed-tcs-summary
That's fork_singalmasked, just to save others the time.
Joerg
From: Kamil Rytarowski <n54@gmx.com>
To: Andreas Gustafsson <gson@gson.org>, kamil@NetBSD.org
Cc: gnats-bugs@netbsd.org
Subject: Re: lib/55241: Many t_ptrace_wait* test cases fail since Apr 16
Date: Thu, 7 May 2020 14:12:47 +0200
On 07.05.2020 13:22, Andreas Gustafsson wrote:
> Hi Kamil,
>
> Do you accept Joerg's assertion that the test failures of lib/55241
> happen because "the tests are bad"?
>
This claim does not contain any analysis. The same test scenarios are
verified with vfork, clone, and posix_spawn.
From: Joerg Sonnenberger <joerg@bec.de>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: lib/55241: Many t_ptrace_wait* test cases fail since Apr 16
Date: Thu, 7 May 2020 14:40:23 +0200
On Thu, May 07, 2020 at 11:15:01AM +0000, Andreas Gustafsson wrote:
> These test cases (and other t_ptrace_wait* test cases that were reported
> in separate mails) are still failing:
>
> http://releng.netbsd.org/b5reports/i386/2020/2020.05.06.09.18.10/test.html#failed-tcs-summary
>
> Most of them fail only under qemu, but the fork_signalmasked test
> cases are also failing on real hardware:
>
> http://www.gson.org/netbsd/bugs/build/amd64-baremetal/2020/2020.05.06.20.40.33/test.html#failed-tcs-summary
AFAICT, the fork_singalmasked tests make invalid assumptions,
essentially:
    get signal mask
    fork()
    set signal mask of child immediately on return to userland
This can't work in a world where the dynamic linker has to protect
against reentrancy, as it also has to deal with TLS access from signal
handlers. The tests can either defer the check, e.g. by self-suspending
the child immediately after fork returns, or drop this part, but the
observation they are trying to make is certainly only valid for the
actual fork system call, not the library frontend.
Joerg
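The deferral suggested above can be sketched as follows. This is a
minimal illustration in Python rather than the C of the actual tests,
and it is not the change that was committed; the function name
child_mask_after_self_stop and the choice of SIGUSR1 are assumptions
made for the example. The child stops itself only after fork() has
returned to user code, so any temporary signal blocking inside the
library frontend should be over by the time the mask is examined.

```python
import os
import signal


def child_mask_after_self_stop(sig=signal.SIGUSR1):
    """Block `sig`, fork, and let the child report whether `sig` is still
    blocked after it has stopped itself and been resumed.

    Returns (stopped, inherited). Hypothetical helper for illustration.
    """
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, {sig})
    try:
        pid = os.fork()
        if pid == 0:
            # Child: fork() has returned to user code at this point.
            code = 2
            try:
                # Self-suspend so the observer has a well-defined stop point.
                os.kill(os.getpid(), signal.SIGSTOP)
                # Query the current mask without changing it.
                mask = signal.pthread_sigmask(signal.SIG_BLOCK, set())
                code = 0 if sig in mask else 1
            finally:
                os._exit(code)
        # Parent: wait until the child reports the stop.
        _, status = os.waitpid(pid, os.WUNTRACED)
        stopped = os.WIFSTOPPED(status) and os.WSTOPSIG(status) == signal.SIGSTOP
        # Resume the child and collect its verdict from the exit status.
        os.kill(pid, signal.SIGCONT)
        _, status = os.waitpid(pid, 0)
        inherited = os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0
        return stopped, inherited
    finally:
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
```

A real tracer would inspect the child's mask through ptrace at the stop;
here the child reports it through its exit status, because Python has no
ptrace binding in its standard library.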
From: Joerg Sonnenberger <joerg@bec.de>
To: gnats-bugs@netbsd.org
Cc: lib-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org, Andreas Gustafsson <gson@gson.org>
Subject: Re: lib/55241: Many t_ptrace_wait* test cases fail since Apr 16
Date: Thu, 7 May 2020 14:57:13 +0200
On Thu, May 07, 2020 at 11:30:02AM +0000, Andreas Gustafsson wrote:
> Do you accept Joerg's assertion that the test failures of lib/55241
> happen because "the tests are bad"?
The important point is my claim that a lot of those test failures
disappear depending on the optimizer settings used. That alone is a
strong indicator of bad test cases, independent of any functional aspect
involved here.
Joerg
From: Kamil Rytarowski <n54@gmx.com>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: lib/55241: Many t_ptrace_wait* test cases fail since Apr 16
Date: Thu, 7 May 2020 15:25:47 +0200
On 07.05.2020 14:57, Joerg Sonnenberger wrote:
> On Thu, May 07, 2020 at 11:30:02AM +0000, Andreas Gustafsson wrote:
>> Do you accept Joerg's assertion that the test failures of lib/55241
>> happen because "the tests are bad"?
>
> It is important that I claimed that a lot of those test failures
> disappear depending on the optimizer settings used. That alone is a
> strong indicator of bad test cases, independent of any functional aspect
> involved here.
>
> Joerg
>
I totally disagree with such reasoning. By that logic, any random
kernel race would indicate a broken userland.
From: Christos Zoulas <christos@zoulas.com>
To: gnats-bugs@netbsd.org
Cc: lib-bug-people@netbsd.org,
gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org,
Andreas Gustafsson <gson@gson.org>
Subject: Re: lib/55241: Many t_ptrace_wait* test cases fail since Apr 16
Date: Thu, 7 May 2020 09:41:12 -0400
Do the tests fail if run with LD_BIND_NOW or linked statically?
christos
Responsible-Changed-From-To: lib-bug-people->kamil
Responsible-Changed-By: kamil@NetBSD.org
Responsible-Changed-When: Tue, 12 May 2020 12:02:48 +0200
Responsible-Changed-Why:
Take.
fork_singalmasked was already fixed in src/tests/lib/libc/sys/t_ptrace_fork_wait.h r.1.2
From: Kamil Rytarowski <n54@gmx.com>
To: gnats-bugs@netbsd.org, Christos Zoulas <christos@netbsd.org>,
Joerg Sonnenberger <joerg@netbsd.org>, Martin Husemann <martin@netbsd.org>
Cc:
Subject: Re: lib/55241 (Many t_ptrace_wait* test cases fail since Apr 16)
Date: Thu, 14 May 2020 02:36:21 +0200
I think I have narrowed down the problem to signal races in the
test/kernel and I'm working on a fix.
Christos asked about LD_BIND_NOW, but that one is not related.
From: Joerg Sonnenberger <joerg@bec.de>
To: Kamil Rytarowski <n54@gmx.com>
Cc: gnats-bugs@netbsd.org, Christos Zoulas <christos@netbsd.org>,
Joerg Sonnenberger <joerg@netbsd.org>,
Martin Husemann <martin@netbsd.org>
Subject: Re: lib/55241 (Many t_ptrace_wait* test cases fail since Apr 16)
Date: Thu, 14 May 2020 02:59:24 +0200
On Thu, May 14, 2020 at 02:36:21AM +0200, Kamil Rytarowski wrote:
> I think, I have narrowed down the problem to signal races in the
> test/kernel and I'm working on a fix.
>
> Christos asked about LD_BIND_NOW, but that one is not related.
LD_BIND_NOW doesn't affect anything.
Joerg
State-Changed-From-To: open->closed
State-Changed-By: kamil@NetBSD.org
State-Changed-When: Thu, 14 May 2020 21:23:23 +0200
State-Changed-Why:
Fixed in t_ptrace_fork_wait.h r.1.3.
From: "Kamil Rytarowski" <kamil@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/55241 CVS commit: src/tests/lib/libc/sys
Date: Thu, 14 May 2020 19:21:35 +0000
Module Name: src
Committed By: kamil
Date: Thu May 14 19:21:35 UTC 2020
Modified Files:
src/tests/lib/libc/sys: t_ptrace_fork_wait.h
Log Message:
Ignore interception of the SIGCHLD signals.
SIGCHLD, once blocked, is discarded by the kernel because it has the
SA_IGNORE property. During the fork(2) operation all signals can be
briefly blocked and missed (unless a signal handler is registered
in the traced child). This leads to a race in this test if the test
intends to catch SIGCHLD.
Fixes PR lib/55241 by Andreas Gustafsson
To generate a diff of this commit:
cvs rdiff -u -r1.2 -r1.3 src/tests/lib/libc/sys/t_ptrace_fork_wait.h
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
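The exception mentioned in the log message (a registered signal handler)
can be illustrated with a small sketch. It is written in Python rather
than the C of the test suite, the function name
blocked_sigchld_kept_with_handler is hypothetical, and it only shows the
case where a handler is installed: a blocked SIGCHLD then stays pending
and can be collected later. It does not reproduce the discard case, whose
behaviour is kernel-specific, and it must be called from the main thread
because Python only allows handlers to be installed there.

```python
import os
import signal


def blocked_sigchld_kept_with_handler(timeout=5.0):
    """Install a SIGCHLD handler, block SIGCHLD, fork a child that exits,
    and check that the signal is still pending afterwards."""
    # With a handler registered the disposition is neither SIG_IGN nor the
    # default "ignore", so a blocked SIGCHLD is held pending.
    old_handler = signal.signal(signal.SIGCHLD, lambda signo, frame: None)
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGCHLD})
    try:
        pid = os.fork()
        if pid == 0:
            os._exit(0)  # child exits at once, generating SIGCHLD
        # Synchronously collect the pending signal; None means it timed out.
        info = signal.sigtimedwait({signal.SIGCHLD}, timeout)
        os.waitpid(pid, 0)  # reap the child
        return info is not None and info.si_signo == signal.SIGCHLD
    finally:
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
        # getsignal/signal return None for handlers not installed from Python.
        if old_handler is not None:
            signal.signal(signal.SIGCHLD, old_handler)
```

signal.sigtimedwait is not available on every platform (macOS lacks it),
so the sketch assumes a system that provides it.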
From: Andreas Gustafsson <gson@gson.org>
To: kamil@netbsd.org
Cc: gnats-bugs@netbsd.org
Subject: Re: PR/55241 CVS commit: src/tests/lib/libc/sys
Date: Sat, 16 May 2020 19:07:16 +0300
Kamil Rytarowski wrote:
> SIGCHLD, once blocked, is discarded by the kernel because it has the
> SA_IGNORE property. During the fork(2) operation all signals can be
> briefly blocked and missed (unless a signal handler is registered
> in the traced child). This leads to a race in this test if the test
> intends to catch SIGCHLD.
>
> Fixes PR lib/55241 by Andreas Gustafsson
>
>
> To generate a diff of this commit:
> cvs rdiff -u -r1.2 -r1.3 src/tests/lib/libc/sys/t_ptrace_fork_wait.h
This commit fixed many, but not all of the failures reported in this PR.
For example, lib/libc/sys/t_ptrace_wait4:fork_singalmasked failed in
the test run immediately after the commit:
http://releng.netbsd.org/b5reports/i386/commits-2020.05.html#2020.05.14.19.21.35
http://releng.netbsd.org/b5reports/i386/2020/2020.05.14.19.21.35/test.html#lib_libc_sys_t_ptrace_wait4_fork_singalmasked
--
Andreas Gustafsson, gson@gson.org
State-Changed-From-To: closed->open
State-Changed-By: gson@NetBSD.org
State-Changed-When: Sat, 16 May 2020 16:11:26 +0000
State-Changed-Why:
Not all of the test cases are fixed.
From: "Kamil Rytarowski" <kamil@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/55241 CVS commit: src/tests/lib/libc/sys
Date: Sat, 16 May 2020 19:08:20 +0000
Module Name: src
Committed By: kamil
Date: Sat May 16 19:08:20 UTC 2020
Modified Files:
src/tests/lib/libc/sys: t_ptrace_fork_wait.h
Log Message:
Ignore interception of SIGCHLD signals in the debugger
Set SIGPASS for SIGCHLD for the traced child in the following tests:
- posix_spawn_singalmasked
- posix_spawn_singalignored
- fork_singalmasked
- fork_singalignored
- vfork_singalmasked
- vfork_singalignored
- vforkdone_singalmasked
- vforkdone_singalignored
There is a race in which SIGCHLD might be blocked during forking and dropped.
PR/55241 by Andreas Gustafsson
To generate a diff of this commit:
cvs rdiff -u -r1.3 -r1.4 src/tests/lib/libc/sys/t_ptrace_fork_wait.h
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Kamil Rytarowski" <kamil@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/55241 CVS commit: src/tests/lib/libc/sys
Date: Sat, 16 May 2020 23:10:27 +0000
Module Name: src
Committed By: kamil
Date: Sat May 16 23:10:26 UTC 2020
Modified Files:
src/tests/lib/libc/sys: t_ptrace_fork_wait.h
Log Message:
Ignore interception of SIGCHLD signals in the debugger
Set SIGPASS for SIGCHLD for the traced child in the following tests:
- unrelated_tracer_fork*
- unrelated_tracer_vfork*
- unrelated_tracer_posix_spawn*
There is a race in which SIGCHLD might be blocked during forking and dropped.
PR/55241 by Andreas Gustafsson
To generate a diff of this commit:
cvs rdiff -u -r1.5 -r1.6 src/tests/lib/libc/sys/t_ptrace_fork_wait.h
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: open->feedback
State-Changed-By: kamil@NetBSD.org
State-Changed-When: Sun, 17 May 2020 01:17:37 +0200
State-Changed-Why:
Should be fixed in t_ptrace_fork_wait.h 1.6.
State-Changed-From-To: feedback->closed
State-Changed-By: gson@NetBSD.org
State-Changed-When: Sun, 17 May 2020 17:46:56 +0000
State-Changed-Why:
Fixed, thanks.
From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/55241 CVS commit: [netbsd-9] src/tests/lib/libc/sys
Date: Mon, 25 May 2020 17:06:53 +0000
Module Name: src
Committed By: martin
Date: Mon May 25 17:06:52 UTC 2020
Modified Files:
src/tests/lib/libc/sys [netbsd-9]: t_ptrace_wait.c
Log Message:
Apply patch, requested by kamil in ticket #925:
Adaptation of:
tests/lib/libc/sys/t_ptrace_fork_wait.h 1.3,1.4,1.6
Ignore interception of SIGCHLD signals in the debugger
There is a race in which SIGCHLD might be blocked during forking and dropped.
PR/55241 by Andreas Gustafsson
To generate a diff of this commit:
cvs rdiff -u -r1.131.2.7 -r1.131.2.8 src/tests/lib/libc/sys/t_ptrace_wait.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
>Unformatted:
Copyright © 1994-2020
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.