NetBSD Problem Report #52932

From Manuel.Bouyer@lip6.fr  Wed Jan 17 17:53:46 2018
Return-Path: <Manuel.Bouyer@lip6.fr>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 2BE347A19A
	for <gnats-bugs@gnats.NetBSD.org>; Wed, 17 Jan 2018 17:53:46 +0000 (UTC)
Message-Id: <20180117175342.0F571A936@armandeche.soc.lip6.fr>
Date: Wed, 17 Jan 2018 18:53:41 +0100 (MET)
From: Manuel.Bouyer@lip6.fr
Reply-To: Manuel.Bouyer@lip6.fr
To: gnats-bugs@NetBSD.org
Subject: tests hang on netbsd-8
X-Send-Pr-Version: 3.95

>Number:         52932
>Category:       bin
>Synopsis:       tests hang on netbsd-8
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    bin-bug-people
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Jan 17 17:55:00 +0000 2018
>Closed-Date:    Mon Mar 05 19:07:13 +0000 2018
>Last-Modified:  Mon Mar 05 19:07:13 +0000 2018
>Originator:     Manuel Bouyer
>Release:        NetBSD 8.0_BETA
>Organization:
>Environment:
System: NetBSD 8.0_BETA
Architecture: x86_64, i386, sparc at least
>Description:
	As shown at
	http://www-soc.lip6.fr/~bouyer/NetBSD-tests/qemu/netbsd-8/
	for several weeks now, and up to 201801120300Z, test runs on
	the netbsd-8 branch under anita have consistently hung at:
[...]
lib/libc/sys/t_ptrace_wait (273/721): 81 test cases
    attach3: [0.111435s] Passed.
    attach4: [0.098737s] Passed.
    eventmask1: [0.103849s] Passed.
    eventmask2: [0.101645s] Passed.
    eventmask3: sorry, pid 17367 was killed: orphaned traced process
[0.096996s] Expected failure: PR kern/51630: /usr/src/tests/lib/libc/sys/t_ptrace_wait.c:1122: ptrace(PT_SET_EVENT_MASK, child, &set_event, len) != -1 not met
    eventmask4: [0.092491s] Passed.
    eventmask5: [0.102978s] Passed.
    eventmask6: [0.103471s] Passed.
    fork2: [0.121065s] Passed.
    fpregs1: [0.091827s] Passed.
    fpregs2: [0.103025s] Passed.
    getsigmask1: [0.095784s] Passed.
    getsigmask2: [0.091091s] Passed.
    io_read_auxv1: [0.113325s] Passed.
    io_read_d1: [0.090043s] Passed.
    io_read_d2: [0.099908s] Passed.
    io_read_d3: [0.101111s] Passed.
    io_read_d4: [0.094436s] Passed.
    io_read_d_write_d_handshake1: [0.101792s] Passed.
    io_read_d_write_d_handshake2: [0.096378s] Passed.
    io_read_i1: [0.102845s] Passed.
    io_read_i2: [0.099814s] Passed.
    io_read_i3: [0.090768s] Passed.
    io_read_i4: [0.096865s] Passed.
    io_write_d1: [0.100540s] Passed.
    io_write_d2: [0.100000s] Passed.
    io_write_d3: [0.097249s] Passed.
    io_write_d4: [0.094930s] Passed.
    kill1: [0.081539s] Passed.
    kill2: [0.097514s] Passed.
    lwp_create1: [0.116340s] Passed.
    lwp_exit1: [0.117504s] Passed.
    lwpinfo1: [0.096785s] Passed.
    read_d1: [0.103719s] Passed.
    read_d2: [0.103068s] Passed.
    read_d3: [0.104957s] Passed.
    read_d4: [0.101519s] Passed.
    read_d_write_d_handshake1: [0.098761s] Passed.
    read_d_write_d_handshake2: [0.098112s] Passed.
    read_i1: [0.110516s] Passed.
    read_i2: [0.098445s] Passed.
    read_i3: [0.104074s] Passed.
    read_i4: [0.100000s] Passed.
    regs1: [0.098383s] Passed.
    regs2: [0.102669s] Passed.
    regs3: [0.097452s] Passed.
    regs4: [0.098211s] Passed.
    regs5: [0.100004s] Passed.
    resume1: sorry, pid 16196 was killed: orphaned traced process

    From here, no progress is made, and anita times out.
    The last successful run on i386 was on 2017-12-09.

    Interestingly, running the tests on Xen succeeds, but this may be because
    ptrace has other issues with Xen (here resume1 fails with an expected
    timeout: PR kern/51995)

>How-To-Repeat:
	run anita with a qemu vmm.
>Fix:
	please

>Release-Note:

>Audit-Trail:
From: Andreas Gustafsson <gson@gson.org>
To: Manuel.Bouyer@lip6.fr, gnats-bugs@NetBSD.org
Cc: 
Subject: Re: bin/52932: tests hang on netbsd-8
Date: Wed, 17 Jan 2018 20:49:15 +0200

 This problem first appeared in -current with src/sys/kern/kern_lwp.c 1.191,
 which was then pulled up: http://releng.netbsd.org/cgi-bin/req-8.cgi?show=417

 The problem was worked around in -current by disabling the hanging
 test in src/tests/lib/libc/sys/t_ptrace_wait.c 1.11, but apparently
 this change has not been pulled up.

 Has anyone actually conclusively determined whether the kern_lwp.c
 commit or the test is at fault?
 -- 
 Andreas Gustafsson, gson@gson.org

From: christos@zoulas.com (Christos Zoulas)
To: gnats-bugs@NetBSD.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, 
	Manuel.Bouyer@lip6.fr
Cc: 
Subject: Re: bin/52932: tests hang on netbsd-8
Date: Wed, 17 Jan 2018 13:52:11 -0500

 On Jan 17,  6:50pm, gson@gson.org (Andreas Gustafsson) wrote:
 -- Subject: Re: bin/52932: tests hang on netbsd-8

 | The following reply was made to PR bin/52932; it has been noted by GNATS.
 | 
 | From: Andreas Gustafsson <gson@gson.org>
 | To: Manuel.Bouyer@lip6.fr, gnats-bugs@NetBSD.org
 | Cc: 
 | Subject: Re: bin/52932: tests hang on netbsd-8
 | Date: Wed, 17 Jan 2018 20:49:15 +0200
 | 
 |  This problem first appeared in -current with src/sys/kern/kern_lwp.c 1.191,
 |  which was then pulled up: http://releng.netbsd.org/cgi-bin/req-8.cgi?show=417
 |  
 |  The problem was worked around in -current by disabling the hanging
 |  test in src/tests/lib/libc/sys/t_ptrace_wait.c 1.11, but apparently
 |  this change has not been pulled up.
 |  
 |  Has anyone actually conclusively determined whether the kern_lwp.c
 |  commit or the test is at fault?

 Well, the conclusion I've reached is that there are more races in
 ptrace(2) that are exposed by the kern_lwp.c change. Reverting the change
 makes regular (non-ptraced) binaries hang randomly, such as go. Choose
 your poison... Would you rather have the ptrace test hang, or a regular
 program?

 christos

From: Kamil Rytarowski <n54@gmx.com>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: bin/52932: tests hang on netbsd-8
Date: Wed, 17 Jan 2018 19:56:54 +0100

 On 17.01.2018 19:50, Andreas Gustafsson wrote:
 > The following reply was made to PR bin/52932; it has been noted by GNATS.
 >
 > From: Andreas Gustafsson <gson@gson.org>
 > To: Manuel.Bouyer@lip6.fr, gnats-bugs@NetBSD.org
 > Cc:
 > Subject: Re: bin/52932: tests hang on netbsd-8
 > Date: Wed, 17 Jan 2018 20:49:15 +0200
 >
 >  This problem first appeared in -current with src/sys/kern/kern_lwp.c 1.191,
 >  which was then pulled up: http://releng.netbsd.org/cgi-bin/req-8.cgi?show=417
 >
 >  The problem was worked around in -current by disabling the hanging
 >  test in src/tests/lib/libc/sys/t_ptrace_wait.c 1.11, but apparently
 >  this change has not been pulled up.
 >
 >  Has anyone actually conclusively determined whether the kern_lwp.c
 >  commit or the test is at fault?
 >  --
 >  Andreas Gustafsson, gson@gson.org

 Please merge needed patches from HEAD - disabling failing/hanging
 ptrace(2) tests.

From: Andreas Gustafsson <gson@gson.org>
To: gnats-bugs@NetBSD.org
Cc: Manuel.Bouyer@lip6.fr, christos@NetBSD.org, kamil@NetBSD.org
Subject: Re: bin/52932: tests hang on netbsd-8
Date: Wed, 17 Jan 2018 20:58:33 +0200

 > Has anyone actually conclusively determined whether the kern_lwp.c
 > commit or the test is at fault?

 Now CC:ing christos@ and kamil@, which I forgot to do in my previous mail.
 -- 
 Andreas Gustafsson, gson@gson.org

From: Kamil Rytarowski <n54@gmx.com>
To: Andreas Gustafsson <gson@gson.org>, gnats-bugs@NetBSD.org
Cc: Manuel.Bouyer@lip6.fr, christos@NetBSD.org, kamil@NetBSD.org
Subject: Re: bin/52932: tests hang on netbsd-8
Date: Wed, 17 Jan 2018 20:06:11 +0100

 On 17.01.2018 19:58, Andreas Gustafsson wrote:
 >> Has anyone actually conclusively determined whether the kern_lwp.c
 >> commit or the test is at fault?
 >
 > Now CC:ing christos@ and kamil@, which I forgot to do in my previous mail.

 Already replied (please mark the tests as skipped). We got fixes for golang
 that expose buggy behavior of ptrace(2) with threads. This does not break
 ptrace(2) any more than it already was.

From: Kamil Rytarowski <n54@gmx.com>
To: 
Cc: gnats-bugs@NetBSD.org, christos@NetBSD.org
Subject: Re: bin/52932: tests hang on netbsd-8
Date: Wed, 17 Jan 2018 20:25:39 +0100

 On 17.01.2018 20:18, Andreas Gustafsson wrote:
 > Kamil Rytarowski wrote:
 >> Already replied (please mark the tests as skipped). We got fixes for golang
 >> that expose buggy behavior of ptrace(2) with threads. This does not break
 >> ptrace(2) any more than it already was.
 >
 > Does this buggy behavior of ptrace(2) have a PR?
 >
 > There's also the question of why the hanging test case didn't just
 > time out, but caused the entire test suite to hang.  Is this a
 > consequence of the ptrace(2) bug, or a bug in ATF?

 It's a consequence of buggy ptrace(2) internals (and it will be
 addressed in the future, likely post 8.0).

 ptrace(2) PT_RESUME is not reliable
 gnats.netbsd.org/51995

From: Andreas Gustafsson <gson@gson.org>
To: Kamil Rytarowski <n54@gmx.com>
Cc: gnats-bugs@NetBSD.org,
    Manuel.Bouyer@lip6.fr,
    christos@NetBSD.org
Subject: Re: bin/52932: tests hang on netbsd-8
Date: Wed, 17 Jan 2018 21:18:52 +0200

 Kamil Rytarowski wrote:
 > Already replied (please mark the tests as skipped). We got fixes for golang
 > that expose buggy behavior of ptrace(2) with threads. This does not break
 > ptrace(2) any more than it already was.

 Does this buggy behavior of ptrace(2) have a PR?

 There's also the question of why the hanging test case didn't just
 time out, but caused the entire test suite to hang.  Is this a
 consequence of the ptrace(2) bug, or a bug in ATF?
 -- 
 Andreas Gustafsson, gson@gson.org

From: christos@zoulas.com (Christos Zoulas)
To: Andreas Gustafsson <gson@gson.org>, Kamil Rytarowski <n54@gmx.com>
Cc: gnats-bugs@NetBSD.org, Manuel.Bouyer@lip6.fr
Subject: Re: bin/52932: tests hang on netbsd-8
Date: Wed, 17 Jan 2018 16:23:42 -0500

 On Jan 17,  9:18pm, gson@gson.org (Andreas Gustafsson) wrote:
 -- Subject: Re: bin/52932: tests hang on netbsd-8

 | Kamil Rytarowski wrote:
 | > Already replied (please mark the tests as skipped). We got fixes for golang
 | > that expose buggy behavior of ptrace(2) with threads. This does not break
 | > ptrace(2) any more than it already was.
 | 
 | Does this buggy behavior of ptrace(2) have a PR?
 | 
 | There's also the question of why the hanging test case didn't just
 | time out, but caused the entire test suite to hang.  Is this a
 | consequence of the ptrace(2) bug, or a bug in ATF?

 Well, it should eventually time out, but that's up to the framework to
 do properly. The only way to guarantee a timeout is to fork a process and
 watch it. I am not sure if ATF does this.

 christos
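
 The "fork a process and watch it" approach mentioned above can be
 sketched roughly as follows. This is only an illustration under stated
 assumptions, not ATF's actual implementation: run_with_timeout(),
 demo_pass() and demo_hang() are hypothetical names introduced here, and
 only POSIX calls are used. The parent polls with WNOHANG instead of
 blocking, and after sending SIGKILL it retries the reap only a bounded
 number of times, so the watcher itself should not hang even if the child
 cannot be reaped promptly.

```c
#define _POSIX_C_SOURCE 200809L

#include <sys/types.h>
#include <sys/wait.h>

#include <errno.h>
#include <signal.h>
#include <time.h>
#include <unistd.h>

#define TICK_MS		10	/* polling interval in milliseconds */
#define REAP_TRIES	100	/* bounded wait for the killed child */

/* Sleep for one polling tick; an early wakeup is harmless here. */
static void
tick(void)
{
	const struct timespec ts = { 0, TICK_MS * 1000L * 1000L };

	(void)nanosleep(&ts, NULL);
}

/*
 * Hypothetical helper: run fn() in a forked child and watch it.
 * Returns the child's exit status (0..255), -1 on error or if the
 * child died from a signal, and -2 if it was killed after timeout_sec.
 */
static int
run_with_timeout(int (*fn)(void), unsigned int timeout_sec)
{
	unsigned long waited_ms = 0;
	int status, i;
	pid_t pid, r;

	pid = fork();
	if (pid == -1)
		return -1;
	if (pid == 0)
		_exit(fn());	/* child: run the test body */

	/* Parent: poll instead of blocking in waitpid(). */
	for (;;) {
		r = waitpid(pid, &status, WNOHANG);
		if (r == pid)
			return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
		if (r == -1 && errno != EINTR)
			return -1;
		if (waited_ms >= timeout_sec * 1000UL)
			break;
		tick();
		waited_ms += TICK_MS;
	}

	/* Timed out: kill the child, then try to reap it, but not forever. */
	(void)kill(pid, SIGKILL);
	for (i = 0; i < REAP_TRIES; i++) {
		if (waitpid(pid, &status, WNOHANG) != 0)
			break;	/* reaped, or an error we cannot fix */
		tick();
	}
	return -2;
}

/* Hypothetical test bodies used to exercise the helper. */
static int
demo_pass(void)
{
	return 0;
}

static int
demo_hang(void)
{
	for (;;)
		pause();	/* never returns on its own */
}
```

 With these definitions, run_with_timeout(demo_pass, 5) reports the
 child's exit status, while run_with_timeout(demo_hang, 1) gives up after
 about one second and reports a timeout. Whether this would have helped
 with the hang in this PR depends on where the test was actually stuck,
 which the thread does not establish.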

State-Changed-From-To: open->feedback
State-Changed-By: maya@NetBSD.org
State-Changed-When: Fri, 02 Mar 2018 15:05:45 +0000
State-Changed-Why:
Please retry after https://v4.freshbsd.org/commit/netbsd/src/3QyQwLw3g4TCEhsA


State-Changed-From-To: feedback->closed
State-Changed-By: bouyer@NetBSD.org
State-Changed-When: Mon, 05 Mar 2018 19:07:13 +0000
State-Changed-Why:
Seems to be fixed in the 20180226 build


>Unformatted:
