NetBSD Problem Report #55241

From gson@gson.org  Thu May  7 11:15:00 2020
Return-Path: <gson@gson.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 08B391A9213
	for <gnats-bugs@gnats.NetBSD.org>; Thu,  7 May 2020 11:14:59 +0000 (UTC)
Message-Id: <20200507111454.5F890253F45@guava.gson.org>
Date: Thu,  7 May 2020 14:14:54 +0300 (EEST)
From: gson@gson.org (Andreas Gustafsson)
Reply-To: gson@gson.org (Andreas Gustafsson)
To: gnats-bugs@NetBSD.org
Subject: Many t_ptrace_wait* test cases fail since Apr 16
X-Send-Pr-Version: 3.95

>Number:         55241
>Category:       lib
>Synopsis:       Many t_ptrace_wait* test cases fail since Apr 16
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    kamil
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu May 07 11:15:01 +0000 2020
>Closed-Date:    Sun May 17 17:46:56 +0000 2020
>Last-Modified:  Mon May 25 17:10:01 +0000 2020
>Originator:     Andreas Gustafsson
>Release:        NetBSD-current, source date >= 2020.04.16.14.39.58
>Organization:
>Environment:
System: NetBSD
Architecture: i386
Machine: i386
>Description:

On Apr 18, an automated report was sent to current-users, saying:

>  The newly failing test cases are:
>
>     lib/libc/sys/t_ptrace_wait4:fork10
>     lib/libc/sys/t_ptrace_wait4:fork2
>     lib/libc/sys/t_ptrace_wait4:fork4
>     lib/libc/sys/t_ptrace_wait4:fork6
>     lib/libc/sys/t_ptrace_wait4:fork8
>     lib/libc/sys/t_ptrace_wait4:fork_singalmasked
>     lib/libc/sys/t_ptrace_wait4:unrelated_tracer_fork16
>     lib/libc/sys/t_ptrace_wait6:fork14
>     lib/libc/sys/t_ptrace_wait6:fork4
>     lib/libc/sys/t_ptrace_wait6:fork_singalmasked
>     lib/libc/sys/t_ptrace_wait6:unrelated_tracer_fork14
>     lib/libc/sys/t_ptrace_wait6:unrelated_tracer_fork2
>     lib/libc/sys/t_ptrace_wait6:unrelated_tracer_fork4
>     lib/libc/sys/t_ptrace_waitid:fork14
>     lib/libc/sys/t_ptrace_waitid:fork2
>     lib/libc/sys/t_ptrace_waitid:fork4
>     lib/libc/sys/t_ptrace_waitid:fork6
>     lib/libc/sys/t_ptrace_waitid:fork8
>     lib/libc/sys/t_ptrace_waitid:fork_singalmasked
>     lib/libc/sys/t_ptrace_waitid:unrelated_tracer_fork10
>     lib/libc/sys/t_ptrace_waitid:unrelated_tracer_fork12
>     lib/libc/sys/t_ptrace_waitid:unrelated_tracer_fork8
>     lib/libc/sys/t_ptrace_waitpid:fork10
>     lib/libc/sys/t_ptrace_waitpid:fork12
>     lib/libc/sys/t_ptrace_waitpid:fork14
>     lib/libc/sys/t_ptrace_waitpid:fork16
>     lib/libc/sys/t_ptrace_waitpid:fork2
>     lib/libc/sys/t_ptrace_waitpid:fork4
>     lib/libc/sys/t_ptrace_waitpid:fork6
>     lib/libc/sys/t_ptrace_waitpid:fork8
>     lib/libc/sys/t_ptrace_waitpid:fork_singalignored
>     lib/libc/sys/t_ptrace_waitpid:fork_singalmasked
>     lib/libc/sys/t_ptrace_waitpid:unrelated_tracer_fork2
>     lib/libc/sys/t_ptrace_waitpid:unrelated_tracer_fork8
>
[...]
>
> The following commits were made between the last successful test and
> the failed test:
>
>     2020.04.16.14.39.58 joerg src/lib/libc/gen/pthread_atfork.c,v 1.13
>     2020.04.16.14.39.58 joerg src/libexec/ld.elf_so/rtld.c,v 1.204
>     2020.04.16.14.39.58 joerg src/libexec/ld.elf_so/rtld.h,v 1.139
>     2020.04.16.14.39.58 joerg src/libexec/ld.elf_so/symbols.map,v 1.3

The following day, Joerg wrote:
> AFAICT the tests are bad. They fail significantly more often when
> compiled with optimisations than without, strongly suggesting race
> conditions involved. The original change here certainly changes the
> timing and I am aware of one potential bug in it, but that bug is most
> definitely not exercised by the test cases.

These test cases (and other t_ptrace_wait* test cases that were reported
in separate mails) are still failing:

  http://releng.netbsd.org/b5reports/i386/2020/2020.05.06.09.18.10/test.html#failed-tcs-summary

Most of them fail only under qemu, but the fork_signalmasked test
cases are also failing on real hardware:

  http://www.gson.org/netbsd/bugs/build/amd64-baremetal/2020/2020.05.06.20.40.33/test.html#failed-tcs-summary

>How-To-Repeat:

>Fix:

>Release-Note:

>Audit-Trail:
From: Andreas Gustafsson <gson@gson.org>
To: kamil@NetBSD.org
Cc: gnats-bugs@netbsd.org
Subject: Re: lib/55241: Many t_ptrace_wait* test cases fail since Apr 16
Date: Thu, 7 May 2020 14:22:07 +0300

 Hi Kamil, 

 Do you accept Joerg's assertion that the test failures of lib/55241
 happen because "the tests are bad"?
 -- 
 Andreas Gustafsson, gson@gson.org

From: Joerg Sonnenberger <joerg@bec.de>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: lib/55241: Many t_ptrace_wait* test cases fail since Apr 16
Date: Thu, 7 May 2020 14:12:35 +0200

 On Thu, May 07, 2020 at 11:15:01AM +0000, Andreas Gustafsson wrote:
 > Most of them fail only under qemu, but the fork_signalmasked test
 > cases are also failing on real hardware:
 > 
 >   http://www.gson.org/netbsd/bugs/build/amd64-baremetal/2020/2020.05.06.20.40.33/test.html#failed-tcs-summary

 That's fork_singalmasked, just to save others the time.

 Joerg

From: Kamil Rytarowski <n54@gmx.com>
To: Andreas Gustafsson <gson@gson.org>, kamil@NetBSD.org
Cc: gnats-bugs@netbsd.org
Subject: Re: lib/55241: Many t_ptrace_wait* test cases fail since Apr 16
Date: Thu, 7 May 2020 14:12:47 +0200

 On 07.05.2020 13:22, Andreas Gustafsson wrote:
 > Hi Kamil,
 >
 > Do you accept Joerg's assertion that the test failures of lib/55241
 > happen because "the tests are bad"?
 >

 This claim does not contain any analysis. The same test scenarios are
 verified with vfork, clone, posix_spawn.

From: Joerg Sonnenberger <joerg@bec.de>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: lib/55241: Many t_ptrace_wait* test cases fail since Apr 16
Date: Thu, 7 May 2020 14:40:23 +0200

 On Thu, May 07, 2020 at 11:15:01AM +0000, Andreas Gustafsson wrote:
 > These test cases (and other t_ptrace_wait* test cases that were reported
 > in separate mails) are still failing:
 > 
 >   http://releng.netbsd.org/b5reports/i386/2020/2020.05.06.09.18.10/test.html#failed-tcs-summary
 > 
 > Most of them fail only under qemu, but the fork_signalmasked test
 > cases are also failing on real hardware:
 > 
 >   http://www.gson.org/netbsd/bugs/build/amd64-baremetal/2020/2020.05.06.20.40.33/test.html#failed-tcs-summary

 AFAICT, the fork_singalmasked tests make invalid assumptions,
 essentially:

     get signal mask
     fork()
     set signal mask of child immediately on return to userland

 This can't work in a world where the dynamic linker has to protect
 against reentrance as it also has to deal with TLS access from signal
 handlers. The tests can either defer the check e.g. by self-suspending
 the child immediately after fork returns or drop this part, but the
 observation they are trying to make is certainly only valid for the
 actual fork system call, not the library frontend.

 Joerg
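
 The deferral suggested above can be sketched without ptrace. This is
 only an illustration under stated assumptions, not the actual ATF test:
 the function name check_mask_after_self_stop and the choice of SIGUSR1
 are hypothetical, and the parent stands in for the tracer. The child
 stops itself right after fork() returns and inspects its signal mask
 only once it has been resumed.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the NetBSD test suite).
 * Returns 0 if the child still has SIGUSR1 blocked when it checks after
 * being stopped and resumed, 1 if it does not, 2 if the child could not
 * query its mask, and -1 on an error in the parent.
 */
int
check_mask_after_self_stop(void)
{
	sigset_t block, saved, cur;
	pid_t pid;
	int status;

	/* Block SIGUSR1 (an arbitrary choice) so the child inherits the mask. */
	sigemptyset(&block);
	sigaddset(&block, SIGUSR1);
	if (sigprocmask(SIG_BLOCK, &block, &saved) == -1)
		return -1;

	pid = fork();
	if (pid == 0) {
		/* Child: suspend first, so the check is deferred past fork(). */
		raise(SIGSTOP);
		if (sigprocmask(SIG_SETMASK, NULL, &cur) == -1)
			_exit(2);
		_exit(sigismember(&cur, SIGUSR1) == 1 ? 0 : 1);
	}

	/* Parent: restore its own mask before doing anything else. */
	sigprocmask(SIG_SETMASK, &saved, NULL);
	if (pid == -1)
		return -1;

	/* Wait for the child to stop, resume it, then collect its verdict. */
	if (waitpid(pid, &status, WUNTRACED) != pid || !WIFSTOPPED(status))
		return -1;
	if (kill(pid, SIGCONT) == -1)
		return -1;
	if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```

 A caller would treat a return value of 0 as "mask inherited". The stop
 gives an observer a well-defined point at which the fork() library
 wrapper has finished whatever it does to the signal mask.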

From: Joerg Sonnenberger <joerg@bec.de>
To: gnats-bugs@netbsd.org
Cc: lib-bug-people@netbsd.org, gnats-admin@netbsd.org,
	netbsd-bugs@netbsd.org, Andreas Gustafsson <gson@gson.org>
Subject: Re: lib/55241: Many t_ptrace_wait* test cases fail since Apr 16
Date: Thu, 7 May 2020 14:57:13 +0200

 On Thu, May 07, 2020 at 11:30:02AM +0000, Andreas Gustafsson wrote:
 >  Do you accept Joerg's assertion that the test failures of lib/55241
 >  happen because "the tests are bad"?

 It is important that I claimed that a lot of those test failures
 disappear depending on the optimizer settings used. That alone is a
 strong indicator of bad test cases, independent of any functional aspect
 involved here.

 Joerg

From: Kamil Rytarowski <n54@gmx.com>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: lib/55241: Many t_ptrace_wait* test cases fail since Apr 16
Date: Thu, 7 May 2020 15:25:47 +0200

 On 07.05.2020 14:57, Joerg Sonnenberger wrote:
 > On Thu, May 07, 2020 at 11:30:02AM +0000, Andreas Gustafsson wrote:
 >>  Do you accept Joerg's assertion that the test failures of lib/55241
 >>  happen because "the tests are bad"?
 >
 > It is important that I claimed that a lot of those test failures
 > disappear depending on the optimizer settings used. That alone is a
 > strong indicator of bad test cases, independent of any functional aspect
 > involved here.
 >
 > Joerg
 >

 I totally disagree with such reasoning. With such a claim, any random
 kernel race would indicate broken userland.

From: Christos Zoulas <christos@zoulas.com>
To: gnats-bugs@netbsd.org
Cc: lib-bug-people@netbsd.org,
 gnats-admin@netbsd.org,
 netbsd-bugs@netbsd.org,
 Andreas Gustafsson <gson@gson.org>
Subject: Re: lib/55241: Many t_ptrace_wait* test cases fail since Apr 16
Date: Thu, 7 May 2020 09:41:12 -0400

 Do the tests fail if run with LD_BIND_NOW or linked statically?

 christos

Responsible-Changed-From-To: lib-bug-people->kamil
Responsible-Changed-By: kamil@NetBSD.org
Responsible-Changed-When: Tue, 12 May 2020 12:02:48 +0200
Responsible-Changed-Why:
Take.
fork_singalmasked already fixed in src/tests/lib/libc/sys/t_ptrace_fork_wait.h r.1.2


From: Kamil Rytarowski <n54@gmx.com>
To: gnats-bugs@netbsd.org, Christos Zoulas <christos@netbsd.org>,
 Joerg Sonnenberger <joerg@netbsd.org>, Martin Husemann <martin@netbsd.org>
Cc: 
Subject: Re: lib/55241 (Many t_ptrace_wait* test cases fail since Apr 16)
Date: Thu, 14 May 2020 02:36:21 +0200

 I think, I have narrowed down the problem to signal races in the
 test/kernel and I'm working on a fix.

 Christos asked about LD_BIND_NOW, but that one is not related.

From: Joerg Sonnenberger <joerg@bec.de>
To: Kamil Rytarowski <n54@gmx.com>
Cc: gnats-bugs@netbsd.org, Christos Zoulas <christos@netbsd.org>,
	Joerg Sonnenberger <joerg@netbsd.org>,
	Martin Husemann <martin@netbsd.org>
Subject: Re: lib/55241 (Many t_ptrace_wait* test cases fail since Apr 16)
Date: Thu, 14 May 2020 02:59:24 +0200

 On Thu, May 14, 2020 at 02:36:21AM +0200, Kamil Rytarowski wrote:
 > I think, I have narrowed down the problem to signal races in the
 > test/kernel and I'm working on a fix.
 > 
 > Christos asked about LD_BIND_NOW, but that one is not related.

 LD_BIND_NOW doesn't affect anything.

 Joerg

State-Changed-From-To: open->closed
State-Changed-By: kamil@NetBSD.org
State-Changed-When: Thu, 14 May 2020 21:23:23 +0200
State-Changed-Why:
Fixed in t_ptrace_fork_wait.h r.1.3.


From: "Kamil Rytarowski" <kamil@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/55241 CVS commit: src/tests/lib/libc/sys
Date: Thu, 14 May 2020 19:21:35 +0000

 Module Name:	src
 Committed By:	kamil
 Date:		Thu May 14 19:21:35 UTC 2020

 Modified Files:
 	src/tests/lib/libc/sys: t_ptrace_fork_wait.h

 Log Message:
 Ignore interception of the SIGCHLD signals.

 SIGCHLD once blocked is discarded by the kernel as it has the
 SA_IGNORE property. During the fork(2) operation all signals can be
 shortly blocked and missed (unless there is a registered signal
 handler in the traced child). This leads to a race in this test if
 there would be an intention to catch SIGCHLD.

 Fixes PR lib/55241 by Andreas Gustafsson


 To generate a diff of this commit:
 cvs rdiff -u -r1.2 -r1.3 src/tests/lib/libc/sys/t_ptrace_fork_wait.h

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.
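
 The parenthetical in the log message above, that a registered handler
 keeps the signal from being missed, can be illustrated with a small
 sketch. It is hedged accordingly: the names on_sigchld and
 blocked_sigchld_stays_pending are hypothetical, and the contrasting
 case (SIGCHLD blocked with the default disposition) is not asserted
 here because its outcome depends on the kernel. With a handler
 installed, a SIGCHLD generated while blocked stays pending and can be
 collected with sigwait().

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* No-op handler: it only moves SIGCHLD off its default disposition. */
static void
on_sigchld(int signo)
{
	(void)signo;
}

/*
 * Hypothetical helper (not part of the NetBSD test suite).
 * Returns 1 if the SIGCHLD generated while blocked was collected by
 * sigwait(), 0 if sigwait() returned another signal, -1 on error.
 */
int
blocked_sigchld_stays_pending(void)
{
	struct sigaction sa, old_sa;
	sigset_t set, saved;
	pid_t pid;
	int sig = 0, status, result = -1;

	/* Register a handler so SIGCHLD is not in its default state. */
	sa.sa_handler = on_sigchld;
	sa.sa_flags = 0;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGCHLD, &sa, &old_sa) == -1)
		return -1;

	/* Block SIGCHLD before the child exists, so none can slip past. */
	sigemptyset(&set);
	sigaddset(&set, SIGCHLD);
	if (sigprocmask(SIG_BLOCK, &set, &saved) == -1) {
		sigaction(SIGCHLD, &old_sa, NULL);
		return -1;
	}

	pid = fork();
	if (pid == 0)
		_exit(0);	/* child: exit at once, generating SIGCHLD */

	if (pid > 0) {
		/* The signal is blocked, so collect it synchronously. */
		if (sigwait(&set, &sig) == 0)
			result = (sig == SIGCHLD);
		waitpid(pid, &status, 0);
	}

	/* Restore the caller's mask and SIGCHLD disposition. */
	sigprocmask(SIG_SETMASK, &saved, NULL);
	sigaction(SIGCHLD, &old_sa, NULL);
	return result;
}
```

 Because sigwait() consumes the pending signal, the handler body never
 runs in this sketch; its registration alone is what matters.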

From: Andreas Gustafsson <gson@gson.org>
To: kamil@netbsd.org
Cc: gnats-bugs@netbsd.org
Subject: Re: PR/55241 CVS commit: src/tests/lib/libc/sys
Date: Sat, 16 May 2020 19:07:16 +0300

 Kamil Rytarowski wrote:
 >  SIGCHLD once blocked is discarded by the kernel as it has the
 >  SA_IGNORE property. During the fork(2) operation all signals can be
 >  shortly blocked and missed (unless there is a registered signal
 >  handler in the traced child). This leads to a race in this test if
 >  there would be an intention to catch SIGCHLD.
 >  
 >  Fixes PR lib/55241 by Andreas Gustafsson
 >  
 >  
 >  To generate a diff of this commit:
 >  cvs rdiff -u -r1.2 -r1.3 src/tests/lib/libc/sys/t_ptrace_fork_wait.h

 This commit fixed many, but not all of the failures reported in this PR.
 For example, lib/libc/sys/t_ptrace_wait4:fork_singalmasked failed in
 the test run immediately after the commit:

   http://releng.netbsd.org/b5reports/i386/commits-2020.05.html#2020.05.14.19.21.35

   http://releng.netbsd.org/b5reports/i386/2020/2020.05.14.19.21.35/test.html#lib_libc_sys_t_ptrace_wait4_fork_singalmasked

 -- 
 Andreas Gustafsson, gson@gson.org

State-Changed-From-To: closed->open
State-Changed-By: gson@NetBSD.org
State-Changed-When: Sat, 16 May 2020 16:11:26 +0000
State-Changed-Why:
Not all of the test cases are fixed.


From: "Kamil Rytarowski" <kamil@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/55241 CVS commit: src/tests/lib/libc/sys
Date: Sat, 16 May 2020 19:08:20 +0000

 Module Name:	src
 Committed By:	kamil
 Date:		Sat May 16 19:08:20 UTC 2020

 Modified Files:
 	src/tests/lib/libc/sys: t_ptrace_fork_wait.h

 Log Message:
 Ignore interception of SIGCHLD signals in the debugger

 Set SIGPASS for SIGCHLD for the traced child in the following tests:

  - posix_spawn_singalmasked
  - posix_spawn_singalignored
  - fork_singalmasked
  - fork_singalignored
  - vfork_singalmasked
  - vfork_singalignored
  - vforkdone_singalmasked
  - vforkdone_singalignored

 There is a race that SIGCHLD might be blocked during forking and dropped.

 PR/55241 by Andreas Gustafsson


 To generate a diff of this commit:
 cvs rdiff -u -r1.3 -r1.4 src/tests/lib/libc/sys/t_ptrace_fork_wait.h

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Kamil Rytarowski" <kamil@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/55241 CVS commit: src/tests/lib/libc/sys
Date: Sat, 16 May 2020 23:10:27 +0000

 Module Name:	src
 Committed By:	kamil
 Date:		Sat May 16 23:10:26 UTC 2020

 Modified Files:
 	src/tests/lib/libc/sys: t_ptrace_fork_wait.h

 Log Message:
 Ignore interception of SIGCHLD signals in the debugger

 Set SIGPASS for SIGCHLD for the traced child in the following tests:
  - unrelated_tracer_fork*
  - unrelated_tracer_vfork*
  - unrelated_tracer_posix_spawn*

 There is a race that SIGCHLD might be blocked during forking and dropped.

 PR/55241 by Andreas Gustafsson


 To generate a diff of this commit:
 cvs rdiff -u -r1.5 -r1.6 src/tests/lib/libc/sys/t_ptrace_fork_wait.h

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: open->feedback
State-Changed-By: kamil@NetBSD.org
State-Changed-When: Sun, 17 May 2020 01:17:37 +0200
State-Changed-Why:
Should be fixed in t_ptrace_fork_wait.h 1.6.


State-Changed-From-To: feedback->closed
State-Changed-By: gson@NetBSD.org
State-Changed-When: Sun, 17 May 2020 17:46:56 +0000
State-Changed-Why:
Fixed, thanks.


From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/55241 CVS commit: [netbsd-9] src/tests/lib/libc/sys
Date: Mon, 25 May 2020 17:06:53 +0000

 Module Name:	src
 Committed By:	martin
 Date:		Mon May 25 17:06:52 UTC 2020

 Modified Files:
 	src/tests/lib/libc/sys [netbsd-9]: t_ptrace_wait.c

 Log Message:
 Apply patch, requested by kamil in ticket #925:
 Adaption of:

 	tests/lib/libc/sys/t_ptrace_fork_wait.h	1.3,1.4,1.6

 Ignore interception of SIGCHLD signals in the debugger

 There is a race that SIGCHLD might be blocked during forking and dropped.

 PR/55241 by Andreas Gustafsson


 To generate a diff of this commit:
 cvs rdiff -u -r1.131.2.7 -r1.131.2.8 src/tests/lib/libc/sys/t_ptrace_wait.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

>Unformatted:
