NetBSD Problem Report #54893

From gson@gson.org  Sat Jan 25 08:12:42 2020
Return-Path: <gson@gson.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id BE7CF8B486
	for <gnats-bugs@gnats.NetBSD.org>; Sat, 25 Jan 2020 08:12:42 +0000 (UTC)
Message-Id: <20200125081239.A0046253F50@guava.gson.org>
Date: Sat, 25 Jan 2020 10:12:39 +0200 (EET)
From: gson@gson.org (Andreas Gustafsson)
Reply-To: gson@gson.org (Andreas Gustafsson)
To: gnats-bugs@NetBSD.org
Subject: New t_ptrace_wait*/resume1 failures since Jan 12
X-Send-Pr-Version: 3.95

>Number:         54893
>Category:       bin
>Synopsis:       New t_ptrace_wait*/resume1 failures since Jan 12
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    kamil
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Jan 25 08:15:00 +0000 2020
>Closed-Date:    Tue Feb 11 00:45:55 +0000 2020
>Last-Modified:  Tue Feb 11 00:50:01 +0000 2020
>Originator:     Andreas Gustafsson
>Release:        NetBSD-current, source date >= 2020.01.12.22.03.23
>Organization:

>Environment:
System: NetBSD
Architecture: x86_64
Machine: amd64
>Description:

Several lib/libc/sys/t_ptrace_wait*:resume1 test cases started failing
with the following commits:

  commit 2020.01.12.22.03.22 ad src/sys/kern/kern_exec.c 1.488
  commit 2020.01.12.22.03.22 ad src/sys/kern/kern_runq.c 1.58
  commit 2020.01.12.22.03.23 ad src/sys/sys/lwp.h 1.196
  commit 2020.01.12.22.03.23 ad src/sys/sys/sched.h 1.87

The lib/libc/sys/t_ptrace_wait3:resume1 and
lib/libc/sys/t_ptrace_wait4:resume1 test cases have failed in every
run, and some other resume1 tests have been failing randomly.

These failures are still occurring as of source date 2020.01.25.00.12.42.

For logs, see:

  http://www.gson.org/netbsd/bugs/build/amd64-baremetal/commits-2020.01.html#2020.01.12.22.03.23

See PR 50350 for another issue that (re)appeared with the same commits.

>How-To-Repeat:

>Fix:

>Release-Note:

>Audit-Trail:

From: Andrew Doran <ad@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/54893: New t_ptrace_wait*/resume1 failures since Jan 12
Date: Tue, 28 Jan 2020 19:02:37 +0000

 It looks to me like there is a race in this test case, which the scheduler
 in -current has exposed, because now if any CPU is idle a new LWP will be
 running within microseconds of _lwp_create().

 ===> Working:

 child (LID 1)
                 DPRINTF("Before creating new in child\n");
                 FORKEE_ASSERT(_lwp_create(&uc, 0, &lid) == 0);

  22139      1 t_ptrace_wait CALL  _lwp_create(0x7f7fff1659d0,0,0x7f7fff1659b8)
  22139      1 t_ptrace_wait RET   _lwp_create 0

 child (LID 1):
                 CHILD_TO_PARENT("Message", fds, msg);

 child (LID 2)
         the_lwp_id = _lwp_self();

         raise(SIGTRAP);

         _lwp_exit();

  22139      1 t_ptrace_wait CALL  close(5)
  22139      1 t_ptrace_wait RET   close 0
  22139      1 t_ptrace_wait CALL  close(6)
  22139      2 t_ptrace_wait CALL  _lwp_self
  22139      1 t_ptrace_wait RET   close 0
  22139      2 t_ptrace_wait RET   _lwp_self 2
  22139      2 t_ptrace_wait CALL  _lwp_self
  22139      1 t_ptrace_wait CALL  write(7,0x7f7fff1659b3,1)
  22139      2 t_ptrace_wait RET   _lwp_self 2
  22139      1 t_ptrace_wait GIO   fd 7 wrote 1 bytes
        "\M-^"
  22139      2 t_ptrace_wait CALL  _lwp_kill(2,5)
  22139      1 t_ptrace_wait RET   write 1
  22139      2 t_ptrace_wait RET   _lwp_kill 0
  22139      1 t_ptrace_wait CALL  read(4,0x7f7fff1659b3,1)

 Look at that, amazin'.  Both LWPs stop here.  Later on in the parent:

 parent (LID 1)

   5671      1 t_ptrace_wait CALL  read(6,0x7f7fff1659b3,1)
   5671      1 t_ptrace_wait GIO   fd 6 read 1 bytes
        "\M-^"

 ===> Failing:

  20223      1 t_ptrace_wait CALL  _lwp_create(0x7f7fffb27280,0,0x7f7fffb27268)
  20223      1 t_ptrace_wait RET   _lwp_create 0
  20223      1 t_ptrace_wait CALL  close(5)
  20223      1 t_ptrace_wait RET   close 0
  20223      2 t_ptrace_wait CALL  _lwp_self
  20223      2 t_ptrace_wait RET   _lwp_self 2
  20223      2 t_ptrace_wait CALL  _lwp_self
  20223      2 t_ptrace_wait RET   _lwp_self 2
  20223      2 t_ptrace_wait CALL  _lwp_kill(2,5)
  20223      2 t_ptrace_wait RET   _lwp_kill 0

 Oops!

 ...

 later in the parent:

   8999      1 t_ptrace_wait CALL  read(6,0x7f7fffb27263,1)

 ^ hangs here

Responsible-Changed-From-To: kern-bug-people->bin-bug-people
Responsible-Changed-By: ad@NetBSD.org
Responsible-Changed-When: Tue, 28 Jan 2020 19:13:20 +0000
Responsible-Changed-Why:
Not a kernel problem from what I can see


Responsible-Changed-From-To: bin-bug-people->kamil
Responsible-Changed-By: kamil@NetBSD.org
Responsible-Changed-When: Wed, 29 Jan 2020 05:07:56 +0100
Responsible-Changed-Why:
Take.


State-Changed-From-To: open->closed
State-Changed-By: kamil@NetBSD.org
State-Changed-When: Tue, 11 Feb 2020 01:45:55 +0100
State-Changed-Why:
Fixed in src/tests/lib/libc/sys/t_ptrace_wait.c r1.155, src/tests/lib/libc/sys/t_ptrace_wait.h r1.19.


From: "Kamil Rytarowski" <kamil@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/54893 CVS commit: src/tests/lib/libc/sys
Date: Tue, 11 Feb 2020 00:41:37 +0000

 Module Name:	src
 Committed By:	kamil
 Date:		Tue Feb 11 00:41:37 UTC 2020

 Modified Files:
 	src/tests/lib/libc/sys: t_ptrace_wait.c t_ptrace_wait.h

 Log Message:
 Rewrite the t_ptrace_wait resume1 test

 Switch to pthread functions.
 Rename to 'resume'.
 Synchronize the threads with pthread barriers.
 Avoid race in the test.

 Test passes correctly.

 Fixes PR bin/54893 by Andreas Gustafsson


 To generate a diff of this commit:
 cvs rdiff -u -r1.154 -r1.155 src/tests/lib/libc/sys/t_ptrace_wait.c
 cvs rdiff -u -r1.18 -r1.19 src/tests/lib/libc/sys/t_ptrace_wait.h

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

>Unformatted:

NetBSD Home
NetBSD PR Database Search

(Contact us) $NetBSD: query-full-pr,v 1.46 2020/01/03 16:35:01 leot Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2020 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.