NetBSD Problem Report #51420

From martin@duskware.de  Wed Aug 17 13:06:09 2016
Return-Path: <martin@duskware.de>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (verified OK))
	by mollari.NetBSD.org (Postfix) with ESMTPS id E19407A266
	for <gnats-bugs@gnats.NetBSD.org>; Wed, 17 Aug 2016 13:06:08 +0000 (UTC)
Date: Wed, 17 Aug 2016 15:05:56 CEST
From: martin@NetBSD.org
Reply-To: martin@NetBSD.org
To: gnats-bugs@NetBSD.org
Subject: gdb hangs waiting for traced threaded child under compat_netbsd32
X-Send-Pr-Version: 3.95

>Number:         51420
>Category:       kern
>Synopsis:       gdb hangs waiting for traced threaded child under compat_netbsd32
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kamil
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Aug 17 13:10:00 +0000 2016
>Closed-Date:    Thu Aug 06 20:49:26 +0000 2020
>Last-Modified:  Thu Aug 06 20:49:26 +0000 2020
>Originator:     Martin Husemann
>Release:        NetBSD 7.99.35
>Organization:
The NetBSD Foundation, Inc.
>Environment:
System: NetBSD supremacy.duskware.de 7.99.35 NetBSD 7.99.35 (ERLITE) #170: Wed Aug 17 10:29:15 CEST 2016 martin@night-owl.duskware.de:/usr/src/sys/arch/evbmips/compile/ERLITE evbmips
Architecture: mips64eb
Machine: evbmips
>Description:

Running a compat_netbsd32 process under gdb makes gdb miss the child's
termination. The traced child waits to be collected, but the parent
never wakes up.

>How-To-Repeat:
> gdb dig
(gdb) run

>Fix:
n/a

>Release-Note:

>Audit-Trail:

State-Changed-From-To: open->feedback
State-Changed-By: maya@NetBSD.org
State-Changed-When: Tue, 08 Nov 2016 19:09:44 +0000
State-Changed-Why:
ok now?


State-Changed-From-To: feedback->open
State-Changed-By: martin@NetBSD.org
State-Changed-When: Sat, 12 Nov 2016 11:16:06 +0000
State-Changed-Why:
Better, but not quite there yet


From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/51420 (gdb hangs waiting for traced threaded child under compat_netbsd32)
Date: Sat, 12 Nov 2016 12:15:21 +0100

 It no longer hangs, but the child still fails to run:

 > gdb dig
 GNU gdb (GDB) 7.10.1
 [..]
 (gdb) run
 Starting program: /usr/bin/dig 
 warning: GDB can't find the start of the function at 0x77d59eb8.

     GDB is unable to find the start of the function at 0x77d59eb8
 and thus can't determine the size of that function's stack frame.
 This means that GDB may be unable to access that stack frame, or
 the frames below it.
     This problem is most likely caused by an invalid program counter or
 stack pointer.
     However, if you think GDB should simply search farther back
 from 0x77d59eb8 for code which looks like the beginning of a
 function, you can increase the range of the search using the `set
 heuristic-fence-post' command.

 Program received signal SIGTRAP, Trace/breakpoint trap.
 0x77d59eb8 in ?? ()
 (gdb) bt
 #0  0x77d59eb8 in ?? ()


 and the console logs:

 trap: pid 110(dig): sig 5: cause=0x24 epc=0x787d26f0 va=0x787d26f0
 registers:
 [ 0]=00000000 [ 1]=00000000 [ 2]=00000000 [ 3]=787cb85c
 [ 4]=00000003 [ 5]=7fff6b68 [ 6]=00000000 [ 7]=00000000
 [ 8]=00000001 [ 9]=00000000 [10]=00000006 [11]=77d00248
 [12]=00000001 [13]=00000000 [14]=00000000 [15]=787fc500
 [16]=787f47e0 [17]=7fff6c30 [18]=787f0000 [19]=0003b0f0
 [20]=787f4b90 [21]=787f47bc [22]=00000000 [23]=787f4b90
 [24]=0000012a [25]=787d26f0 [26]=00000000 [27]=00000000
 [28]=787fc500 [29]=7fff6b68 [30]=787cc000 [31]=787d5344
 trap: pid 110(dig): sig 5: cause=0x24 epc=0x787d5344 va=0x787d5344
 registers:
 [ 0]=00000000 [ 1]=00000000 [ 2]=00000000 [ 3]=787cb85c
 [ 4]=00000003 [ 5]=7fff6b68 [ 6]=00000000 [ 7]=00000000
 [ 8]=00000001 [ 9]=00000000 [10]=00000006 [11]=77d00248
 [12]=00000001 [13]=00000000 [14]=00000000 [15]=787fc500
 [16]=787f47e0 [17]=7fff6c30 [18]=787f0000 [19]=0003b0f0
 [20]=787f4b90 [21]=787f47bc [22]=00000000 [23]=787f4b90
 [24]=0000012a [25]=787d26f0 [26]=00000000 [27]=00000000
 [28]=787fc500 [29]=7fff6b68 [30]=787cc000 [31]=787d5344
 trap: pid 110(dig): sig 5: cause=0x24 epc=0x787d26f0 va=0x787d26f0
 registers:
 [ 0]=00000000 [ 1]=00000000 [ 2]=00000001 [ 3]=ffffffffffffffff
 [ 4]=787f4a08 [ 5]=00000000 [ 6]=ffffffff80000001 [ 7]=00000000
 [ 8]=7fff6920 [ 9]=77b981d4 [10]=780e4a38 [11]=00000078
 [12]=00000001 [13]=00000000 [14]=00000000 [15]=787fc500
 [16]=77b9f148 [17]=00000001 [18]=00000001 [19]=787d26f0
 [20]=787f4ce8 [21]=787c3008 [22]=00000000 [23]=00000008
 [24]=0000106c [25]=787d26f0 [26]=00000000 [27]=00000000
 [28]=787fc500 [29]=7fff6958 [30]=00000000 [31]=787d636c
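
The cause=0x24 value in these trap lines is the MIPS Cause register. As a hedged sketch (the helper names are hypothetical and this is not code from the NetBSD tree), the exception code can be taken from bits 6..2 and the branch-delay flag from bit 31. For 0x24 this gives ExcCode 9 (Bp, a break instruction) with BD clear, which is consistent with the reported sig 5 (SIGTRAP) being raised at the epc address itself.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helpers for reading the "cause=" field of the trap lines
 * above; not taken from the NetBSD sources. On MIPS the Cause register
 * holds the exception code (ExcCode) in bits 6..2 and the branch-delay
 * flag (BD) in bit 31.
 */
static unsigned mips_exc_code(uint32_t cause)
{
    return (cause >> 2) & 0x1fu;
}

static unsigned mips_branch_delay(uint32_t cause)
{
    return (cause >> 31) & 1u;
}

/* Names for a few common ExcCode values; everything else is "other". */
static const char *mips_exc_name(unsigned code)
{
    assert(code < 32);
    switch (code) {
    case 0:  return "Int";   /* interrupt */
    case 4:  return "AdEL";  /* address error on load or instruction fetch */
    case 5:  return "AdES";  /* address error on store */
    case 8:  return "Sys";   /* syscall instruction */
    case 9:  return "Bp";    /* break instruction */
    case 10: return "RI";    /* reserved instruction */
    case 13: return "Tr";    /* trap instruction */
    default: return "other";
    }
}
```

For example, mips_exc_code(0x24) returns 9 and mips_branch_delay(0x24) returns 0, so the trap was taken at the break instruction at epc rather than in a branch delay slot.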


 But I can kill the process and exit gdb properly, and dig then runs normally without gdb:

 (gdb) kill
 Kill the program being debugged? (y or n) y
 (gdb) quit
 > dig

 ; <<>> DiG 9.10.4-P3 <<>>
 ;; global options: +cmd
 ;; Got answer:
 ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 4234
 ;; flags: qr rd ra; QUERY: 1, ANSWER: 13, AUTHORITY: 0, ADDITIONAL: 27

 ;; OPT PSEUDOSECTION:
 ; EDNS: version: 0, flags:; udp: 4096
 ;; QUESTION SECTION:
 ;.                              IN      NS

 ;; ANSWER SECTION:
 .                       109337  IN      NS      a.root-servers.net.
 .                       109337  IN      NS      j.root-servers.net.
 .                       109337  IN      NS      c.root-servers.net.
 [..]


 Martin

Responsible-Changed-From-To: kern-bug-people->kamil
Responsible-Changed-By: kamil@NetBSD.org
Responsible-Changed-When: Sat, 07 Oct 2017 00:10:22 +0200
Responsible-Changed-Why:
Take.


State-Changed-From-To: open->feedback
State-Changed-By: kamil@NetBSD.org
State-Changed-When: Wed, 05 Jun 2019 01:03:07 +0200
State-Changed-Why:
Please test it now on NetBSD 8.99.42 with gdb or gdb.old. Is the problem gone now?


From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@netbsd.org
Cc: kamil@netbsd.org, netbsd-bugs@netbsd.org, gnats-admin@netbsd.org,
	martin@NetBSD.org
Subject: Re: kern/51420 (gdb hangs waiting for traced threaded child under
 compat_netbsd32)
Date: Wed, 5 Jun 2019 07:25:57 +0200

 Not easy to test right now; both gdb versions (or ptrace) are broken in
 -current on sparc64.

 Martin

From: Kamil Rytarowski <n54@gmx.com>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/51420 (gdb hangs waiting for traced threaded child under
 compat_netbsd32)
Date: Wed, 5 Jun 2019 17:33:35 +0200

 On 05.06.2019 07:30, Martin Husemann wrote:
 > The following reply was made to PR kern/51420; it has been noted by GNATS.
 >
 > From: Martin Husemann <martin@duskware.de>
 > To: gnats-bugs@netbsd.org
 > Cc: kamil@netbsd.org, netbsd-bugs@netbsd.org, gnats-admin@netbsd.org,
 > 	martin@NetBSD.org
 > Subject: Re: kern/51420 (gdb hangs waiting for traced threaded child under
 >  compat_netbsd32)
 > Date: Wed, 5 Jun 2019 07:25:57 +0200
 >
 >  Not easy to test right now, both gdb versions (or ptrace) are broken in
 >  -current on sparc64.
 >
 >  Martin
 >
 >

 This report is for MIPS.

 >Environment:
 System: NetBSD supremacy.duskware.de 7.99.35 NetBSD 7.99.35 (ERLITE)
 #170: Wed Aug 17 10:29:15 CEST 2016
 martin@night-owl.duskware.de:/usr/src/sys/arch/evbmips/compile/ERLITE
 evbmips
 Architecture: mips64eb
 Machine: evbmips

From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/51420 (gdb hangs waiting for traced threaded child under
 compat_netbsd32)
Date: Wed, 5 Jun 2019 18:35:45 +0200

 Oops, I got confused by too many of my PRs. Will test and report!

 Martin

From: "Kamil Rytarowski" <kamil@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/51420 CVS commit: src/tests/lib/libc/sys
Date: Sun, 13 Oct 2019 04:05:39 +0000

 Module Name:	src
 Committed By:	kamil
 Date:		Sun Oct 13 04:05:39 UTC 2019

 Modified Files:
 	src/tests/lib/libc/sys: t_ptrace_wait.c

 Log Message:
 Enable TEST_LWP_ENABLED in t_ptrace_wait*

 The LWP events (created, exited) are now reliable in my local tests.

 PR kern/51420
 PR kern/51995


 To generate a diff of this commit:
 cvs rdiff -u -r1.135 -r1.136 src/tests/lib/libc/sys/t_ptrace_wait.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/51420 CVS commit: [netbsd-9] src
Date: Wed, 23 Oct 2019 19:25:39 +0000

 Module Name:	src
 Committed By:	martin
 Date:		Wed Oct 23 19:25:39 UTC 2019

 Modified Files:
 	src/sys/kern [netbsd-9]: kern_sig.c sys_ptrace_common.c
 	src/tests/lib/libc/sys [netbsd-9]: t_ptrace_wait.c

 Log Message:
 Pull up following revision(s) (requested by kamil in ticket #366):

 	tests/lib/libc/sys/t_ptrace_wait.c: revision 1.136
 	sys/kern/kern_sig.c: revision 1.373
 	tests/lib/libc/sys/t_ptrace_wait.c: revision 1.138
 	tests/lib/libc/sys/t_ptrace_wait.c: revision 1.139
 	sys/kern/kern_sig.c: revision 1.376
 	tests/lib/libc/sys/t_ptrace_wait.c: revision 1.140
 	sys/kern/sys_ptrace_common.c: revision 1.64

 Fix typo in a comment

 Enable TEST_LWP_ENABLED in t_ptrace_wait*
 The LWP events (created, exited) are now reliable in my local tests.
 PR kern/51420
 PR kern/51995

 Remove the short-circuit lwp_exit() path from sigswitch()

 sigswitch() can be called from exit1() through:

    ttywait()->ttysleep()->cv_timedwait_sig()->sleepq_block()->issignal()->sigswitch()

 Calling lwp_exit() for the last LWP triggers exit1() again, and this causes
 a panic. Debugger-related signals have short-circuit demise paths in
 eventswitch() and other functions, before sigswitch() is called.

 This change restores the original behavior, but it remains an open question
 whether the kernel crash is a red herring and the real problem is
 misbehavior of ttywait().
 This should fix PR kern/54618, reported by David H. Gutteridge.

 Fix a race condition when handling concurrent LWP signals and add a test

 Fix a race condition that caused PT_GET_SIGINFO to return incorrect
 information when multiple signals were delivered concurrently
 to different LWPs.  Add a regression test that verifies that when 50
 threads concurrently use pthread_kill() on themselves, the debugger
 receives all signals with correct information.

 The kernel uses separate signal queues for each LWP.  However,
 the signal context used to implement PT_GET_SIGINFO is stored in 'struct
 proc' and therefore common to all LWPs in the process.  Previously,
 this member was filled in kpsignal2(), i.e. when the signal was sent.

 This meant that if another LWP managed to send another signal
 concurrently, the data was overwritten before the process was stopped.

 As a result, PT_GET_SIGINFO did not report the correct LWP and signal
 (it could even report a different signal than wait()).  This can be
 reproduced quite reliably with 20 LWPs, but it can also occur with
 as few as 10.

 This patch moves the setting of the signal context to issignal(), just
 before the process is actually stopped.  The data is taken from the
 per-LWP or per-process signal queue.  The added test confirms that the
 debugger correctly receives all signals, and that PT_GET_SIGINFO reports
 both the correct LWP and the correct signal number.
 Reviewed by kamil.
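
The race described in this commit message can be modelled deterministically. The following is a toy, single-threaded sketch under stated assumptions, not the kernel change itself: struct model_proc, send_old()/stop_old() and send_new()/stop_new() are hypothetical names, each LWP has a one-entry queue, and the per-process queue is omitted. The old pair fills the shared context when the signal is sent (as kpsignal2() did); the new pair fills it from the stopping LWP's own queue just before the stop (as the patched issignal() does).

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical toy model of the PT_GET_SIGINFO race; not NetBSD kernel
 * code. Each LWP has a one-entry signal queue; the process has a single
 * shared "signal context" slot that the debugger would read.
 */
#define MODEL_MAX_LWPS 8

struct sig_record {
    int lwp;   /* LWP the signal was directed at */
    int signo; /* signal number */
};

struct model_proc {
    struct sig_record shared_ctx;            /* per-process, read by the debugger */
    struct sig_record queue[MODEL_MAX_LWPS]; /* per-LWP pending signal */
    bool pending[MODEL_MAX_LWPS];
};

static void enqueue(struct model_proc *p, int lwp, int signo)
{
    assert(lwp >= 0 && lwp < MODEL_MAX_LWPS);
    p->queue[lwp].lwp = lwp;
    p->queue[lwp].signo = signo;
    p->pending[lwp] = true;
}

/* Old scheme: fill the shared context at send time (cf. kpsignal2()). */
static void send_old(struct model_proc *p, int lwp, int signo)
{
    enqueue(p, lwp, signo);
    p->shared_ctx = p->queue[lwp]; /* any later sender overwrites this */
}

/* Old scheme: stopping exposes whatever the shared slot currently holds. */
static struct sig_record stop_old(struct model_proc *p, int lwp)
{
    assert(lwp >= 0 && lwp < MODEL_MAX_LWPS && p->pending[lwp]);
    p->pending[lwp] = false;
    return p->shared_ctx;
}

/* New scheme: sending only queues the signal. */
static void send_new(struct model_proc *p, int lwp, int signo)
{
    enqueue(p, lwp, signo);
}

/* New scheme: fill the shared context from the stopping LWP's own queue
 * just before the stop (cf. issignal()). */
static struct sig_record stop_new(struct model_proc *p, int lwp)
{
    assert(lwp >= 0 && lwp < MODEL_MAX_LWPS && p->pending[lwp]);
    p->shared_ctx = p->queue[lwp];
    p->pending[lwp] = false;
    return p->shared_ctx;
}
```

Replaying the problematic interleaving (LWP 0 is signalled, then LWP 1 is signalled before LWP 0 stops) makes stop_old(p, 0) report LWP 1 and its signal, while stop_new(p, 0) reports LWP 0 and its own signal.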

 Remove preprocessor switch TEST_VFORK_ENABLED in t_ptrace_wait*
 vfork(2) tests are now enabled always and confirmed to be stable.

 Remove preprocessor switch TEST_LWP_ENABLED in t_ptrace_wait*
 LWP tests are now enabled always and confirmed to be stable.


 To generate a diff of this commit:
 cvs rdiff -u -r1.364.2.7 -r1.364.2.8 src/sys/kern/kern_sig.c
 cvs rdiff -u -r1.58.2.8 -r1.58.2.9 src/sys/kern/sys_ptrace_common.c
 cvs rdiff -u -r1.131.2.5 -r1.131.2.6 src/tests/lib/libc/sys/t_ptrace_wait.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: feedback->closed
State-Changed-By: kamil@NetBSD.org
State-Changed-When: Thu, 06 Aug 2020 22:49:26 +0200
State-Changed-Why:
Assumed fixed over the last 4 years. If problems remain or new ones appear, please file a new bug report.


>Unformatted:

Copyright © 1994-2020 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.