NetBSD Problem Report #43128

From www@NetBSD.org  Mon Apr  5 15:27:37 2010
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
	by www.NetBSD.org (Postfix) with ESMTP id D61D263B86C
	for <gnats-bugs@gnats.NetBSD.org>; Mon,  5 Apr 2010 15:27:37 +0000 (UTC)
Message-Id: <20100405152737.9C08263B11D@www.NetBSD.org>
Date: Mon,  5 Apr 2010 15:27:37 +0000 (UTC)
From: paul_koning@dell.com
Reply-To: paul_koning@dell.com
To: gnats-bugs@NetBSD.org
Subject: Threads support in ptrace() is insufficient for gdb to debug threaded live apps
X-Send-Pr-Version: www-1.0

>Number:         43128
>Category:       kern
>Synopsis:       Threads support in ptrace() is insufficient for gdb to debug threaded live apps
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Apr 05 15:30:00 +0000 2010
>Closed-Date:    Tue Apr 06 13:51:20 +0000 2010
>Last-Modified:  Fri Apr 09 06:00:03 +0000 2010
>Originator:     Paul Koning
>Release:        5.0
>Organization:
Dell
>Environment:
NetBSD pkvm50 5.0 NetBSD 5.0 (GENERIC) #34: Mon Apr  5 10:26:00 EDT 2010  root@pkvm50:/usr/obj/sys/arch/i386/compile/GENERIC i386
>Description:
I have a working implementation of thread debug support in GDB 7.0.

In the process of doing this I found that the ptrace syscall is missing a couple of features needed for debugging of threaded applications.  In particular, PT_STEP and PT_CONTINUE do not have a way to control which threads to operate on.

As discussed on the tech-kern list, what is needed is the ability to step or continue a single thread (leaving any others stopped) and also the ability to step a specified thread (while allowing other threads to continue).

I have implemented changes to sys_ptrace to provide these missing features.  With these changes, GDB can now successfully debug threaded programs (which will cure PR bin/30756).  I'm ready to submit the GDB changes to the FSF but I cannot do this unless the ptrace changes required for this are accepted by NetBSD.  I'm submitting my changes as part of this PR, and I propose these as the solution to the issue.  If a different solution is preferred, that's fine so long as it provides the same capabilities (then I can adjust GDB to match what is approved).

Part of the patch I include here is corrections to the ptrace manpage.  Partly this documents what I added, and partly it corrects inaccuracies in the PT_LWPINFO description.
>How-To-Repeat:
Attempt to debug a live app with GDB.
>Fix:
--- sys/sys/lwp.h.orig	2009-02-05 20:54:09.000000000 -0500
+++ sys/sys/lwp.h	2010-04-05 10:14:35.000000000 -0400
@@ -284,6 +284,7 @@
 int	lwp_trylock(lwp_t *);
 void	lwp_addref(lwp_t *);
 void	lwp_delref(lwp_t *);
+void	lwp_delref2(lwp_t *);
 void	lwp_drainrefs(lwp_t *);
 bool	lwp_alive(lwp_t *);
 lwp_t	*lwp_find_first(proc_t *);
@@ -293,6 +294,7 @@
 void	lwpinit(void);
 int 	lwp_wait1(lwp_t *, lwpid_t, lwpid_t *, int);
 void	lwp_continue(lwp_t *);
+void	lwp_unstop(lwp_t *);
 void	cpu_setfunc(lwp_t *, void (*)(void *), void *);
 void	startlwp(void *);
 void	upcallret(lwp_t *);
--- sys/kern/kern_sig.c.orig	2010-03-18 12:25:46.000000000 -0400
+++ sys/kern/kern_sig.c	2010-03-18 13:44:12.000000000 -0400
@@ -1674,9 +1674,10 @@

 	/*
 	 * If we are no longer being traced, or the parent didn't
-	 * give us a signal, look for more signals.
+	 * give us a signal, or we're stopping, look for more signals.
 	 */
-	if ((p->p_slflag & PSL_TRACED) == 0 || p->p_xstat == 0)
+	if ((p->p_slflag & PSL_TRACED) == 0 || p->p_xstat == 0 ||
+	    (p->p_sflag & PS_STOPPING) != 0)
 		return 0;

 	/*
--- sys/kern/kern_lwp.c.orig	2010-03-22 19:25:30.000000000 -0400
+++ sys/kern/kern_lwp.c	2010-04-05 10:14:41.000000000 -0400
@@ -354,6 +354,44 @@
 }

 /*
+ * Restart a stopped LWP.
+ *
+ * Must be called with p_lock held, and the LWP NOT locked.  Will unlock the
+ * LWP before return.
+ */
+void
+lwp_unstop(struct lwp *l)
+{
+	struct proc *p = l->l_proc;
+    
+	KASSERT(mutex_owned(proc_lock));
+	KASSERT(mutex_owned(p->p_lock));
+
+	lwp_lock(l);
+
+	/* If not stopped, then just bail out. */
+	if (l->l_stat != LSSTOP) {
+		lwp_unlock(l);
+		return;
+	}
+
+	p->p_stat = SACTIVE;
+	p->p_sflag &= ~PS_STOPPING;
+
+	if (!p->p_waited)
+		p->p_pptr->p_nstopchild--;
+
+	if (l->l_wchan == NULL) {
+		/* setrunnable() will release the lock. */
+		setrunnable(l);
+	} else {
+		l->l_stat = LSSLEEP;
+		p->p_nrlwps++;
+		lwp_unlock(l);
+	}
+}
+
+/*
  * Wait for an LWP within the current process to exit.  If 'lid' is
  * non-zero, we are waiting for a specific LWP.
  *
@@ -1368,11 +1406,25 @@
 	struct proc *p = l->l_proc;

 	mutex_enter(p->p_lock);
+	lwp_delref2(l);
+	mutex_exit(p->p_lock);
+}
+
+/*
+ * Remove one reference to an LWP.  If this is the last reference,
+ * then we must finalize the LWP's death.  The proc mutex is held
+ * on entry.
+ */
+void
+lwp_delref2(struct lwp *l)
+{
+	struct proc *p = l->l_proc;
+
+	KASSERT(mutex_owned(p->p_lock));
 	KASSERT(l->l_stat != LSZOMB);
 	KASSERT(l->l_refcnt > 0);
 	if (--l->l_refcnt == 0)
 		cv_broadcast(&p->p_lwpcv);
-	mutex_exit(p->p_lock);
 }

 /*
--- sys/kern/sys_process.c.orig	2010-03-18 12:25:53.000000000 -0400
+++ sys/kern/sys_process.c	2010-04-05 10:15:24.000000000 -0400
@@ -155,7 +155,7 @@
 		syscallarg(int) data;
 	} */
 	struct proc *p = l->l_proc;
-	struct lwp *lt;
+	struct lwp *lt, *lt2;
 	struct proc *t;				/* target process */
 	struct uio uio;
 	struct iovec iov;
@@ -163,7 +163,8 @@
 	struct ptrace_lwpinfo pl;
 	struct vmspace *vm;
 	int error, write, tmp, req, pheld;
-	int signo;
+	int signo = 0;
+	int resume_all;
 	ksiginfo_t ksi;
 #ifdef COREDUMP
 	char *path;
@@ -393,6 +394,7 @@
 	write = 0;
 	*retval = 0;
 	tmp = 0;
+	resume_all = 1;

 	switch (req) {
 	case  PT_TRACE_ME:
@@ -528,6 +530,44 @@
 		p->p_trace_enabled = trace_is_enabled(p);

 		/*
+		 * Pick up the LWPID, if supplied.  There are two cases:
+		 * data < 0 : step or continue single thread, lwp = -data
+		 * data > 0 in PT_STEP : step this thread, continue others
+		 * For operations other than PT_STEP, data > 0 means
+		 * data is the signo to deliver to the process.
+		 */
+		tmp = SCARG(uap, data);
+		if (tmp >= 0) {
+#ifdef PT_STEP
+			if (req == PT_STEP)
+				signo = 0;
+			else
+#endif
+			{
+				signo = tmp;
+				tmp = 0;	/* don't search for LWP */
+			}
+		}
+		else
+			tmp = -tmp;
+		
+		if (tmp > 0) {
+			if (req == PT_DETACH) {
+				error = EINVAL;
+				break;
+			}
+			lwp_delref2 (lt);
+			lt = lwp_find(t, tmp);
+			if (lt == NULL) {
+				error = ESRCH;
+				break;
+			}
+			lwp_addref(lt);
+			resume_all = 0;
+			signo = 0;
+		}
+			
+		/*
 		 * From the 4.4BSD PRM:
 		 * "The data argument is taken as a signal number and the
 		 * child's execution continues at location addr as if it
@@ -540,7 +580,7 @@
 		 */

 		/* Check that the data is a valid signal number or zero. */
-		if (SCARG(uap, data) < 0 || SCARG(uap, data) >= NSIG) {
+		if (signo < 0 || signo >= NSIG) {
 			error = EINVAL;
 			break;
 		}
@@ -557,7 +597,17 @@
 #ifdef PT_STEP
 		/*
 		 * Arrange for a single-step, if that's requested and possible.
+		 * More precisely, set the single step status as requested for
+		 * the requested thread, and clear it for other threads.
 		 */
+		LIST_FOREACH(lt2, &t->p_lwps, l_sibling) {
+			if (lt != lt2) 
+			{
+				lwp_lock(lt2);
+				process_sstep(lt2, 0);
+				lwp_unlock(lt2);
+			}
+		}
 		error = process_sstep(lt, req == PT_STEP);
 		if (error) {
 			uvm_lwp_rele(lt);
@@ -579,8 +629,6 @@
 			/* not being traced any more */
 			t->p_opptr = NULL;
 		}
-
-		signo = SCARG(uap, data);
 	sendsig:
 		/* Finally, deliver the requested signal (or none). */
 		if (t->p_stat == SSTOP) {
@@ -590,7 +638,10 @@
 			 * an LWP runs to see it.
 			 */
 			t->p_xstat = signo;
-			proc_unstop(t);
+			if (resume_all)
+				proc_unstop(t);
+			else
+				lwp_unstop(lt);
 		} else if (signo != 0) {
 			KSI_INIT_EMPTY(&ksi);
 			ksi.ksi_signo = signo;
--- lib/libc/sys/ptrace.2.orig	2010-04-05 10:23:27.000000000 -0400
+++ lib/libc/sys/ptrace.2	2010-04-05 10:33:17.000000000 -0400
@@ -149,7 +149,9 @@
 to indicate that execution is to pick up where it left off.
 .Fa data
 provides a signal number to be delivered to the traced process as it
-resumes execution, or 0 if no signal is to be sent.
+resumes execution, or 0 if no signal is to be sent.  If a negative
+value is supplied, that is the negative of the LWP ID of the thread to
+be resumed, and only that thread executes.
 .It Dv PT_KILL
 The traced process terminates, as if
 .Dv PT_CONTINUE
@@ -256,8 +258,8 @@
 call currently does not stop the child process so it can generate
 inconsistent data.
 .It Dv PT_LWPINFO
-Returns information about the specific thread from the process specified
-in the
+Returns information about a thread from the list of threads for the
+process specified in the
 .Fa pid
 argument.
 The
@@ -274,8 +276,15 @@
 .Pp
 where
 .Fa pl_lwpid
-contains the thread for which to get info.
+contains a thread LWP ID.  Information is returned for the thread
+following the one with the specified ID in the process thread list,
+or for the first thread if
+.Fa pl_lwpid
+is 0.
 Upon return
+.Fa pl_lwpid
+contains the LWP ID of the thread that was found, or 0 if there is
+no thread after the one whose LWP ID was supplied in the call.
 .Fa pl_event
 contains the event that stopped the thread.
 Possible
@@ -303,6 +312,14 @@
 Execution continues as in request PT_CONTINUE; however
 as soon as possible after execution of at least one
 instruction, execution stops again.
+If the
+.Fa data
+argument is greater than 0, it contains the LWP ID of the thread to be
+stepped, and any other threads are continued.  If the 
+.Fa data
+argument is less than zero, it contains the negative of the LWP ID of
+the
+thread to be stepped, and only that thread executes.
 .It Dv PT_GETREGS
 This request reads the traced process' machine registers into the
 .Dq Li "struct reg"
@@ -310,6 +327,10 @@
 .Aq Pa machine/reg.h )
 pointed to by
 .Fa addr .
+The 
+.Fa data
+argument contains the LWP ID of the thread whose registers are to
+be read.  If zero is supplied, the first thread of the process is read.
 .It Dv PT_SETREGS
 This request is the converse of
 .Dv PT_GETREGS ;
@@ -319,6 +340,10 @@
 .Aq Pa machine/reg.h )
 pointed to by
 .Fa addr .
+The 
+.Fa data
+argument contains the LWP ID of the thread whose registers are to
+be written.  If zero is supplied, the first thread of the process is written.
 .It Dv PT_GETFPREGS
 This request reads the traced process' floating-point registers into
 the
@@ -327,6 +352,11 @@
 .Aq Pa machine/reg.h )
 pointed to by
 .Fa addr .
+The 
+.Fa data
+argument contains the LWP ID of the thread whose registers are to
+be read.  If zero is supplied, the first thread of the process is 
+read.
 .It Dv PT_SETFPREGS
 This request is the converse of
 .Dv PT_GETFPREGS ;
@@ -336,6 +366,11 @@
 .Aq Pa machine/reg.h )
 pointed to by
 .Fa addr .
+The 
+.Fa data
+argument contains the LWP ID of the thread whose registers are to
+be written.  If zero is supplied, the first thread of the process is 
+written.
 .\" .It Dv PT_SYSCALL
 .\" This request is like
 .\" .Dv PT_CONTINUE

>Release-Note:

>Audit-Trail:

State-Changed-From-To: open->closed
State-Changed-By: christos@NetBSD.org
State-Changed-When: Tue, 06 Apr 2010 09:51:20 -0400
State-Changed-Why:
patch applied, many thanks!


From: Christos Zoulas <christos@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/43128 CVS commit: src
Date: Tue, 6 Apr 2010 09:50:22 -0400

 Module Name:	src
 Committed By:	christos
 Date:		Tue Apr  6 13:50:22 UTC 2010

 Modified Files:
 	src/lib/libc/sys: ptrace.2
 	src/sys/kern: kern_lwp.c kern_sig.c sys_process.c
 	src/sys/sys: lwp.h

 Log Message:
 PR/43128: Paul Koning: Threads support in ptrace() is insufficient for gdb to
 debug threaded live apps: Add an optional lwpid in PT_STEP and PT_CONTINUE to
 indicate which lwp to operate on, and implement the glue required to make it
 work.


 To generate a diff of this commit:
 cvs rdiff -u -r1.31 -r1.32 src/lib/libc/sys/ptrace.2
 cvs rdiff -u -r1.141 -r1.142 src/sys/kern/kern_lwp.c
 cvs rdiff -u -r1.304 -r1.305 src/sys/kern/kern_sig.c
 cvs rdiff -u -r1.153 -r1.154 src/sys/kern/sys_process.c
 cvs rdiff -u -r1.128 -r1.129 src/sys/sys/lwp.h

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: Takahiro Kambe <taca@back-street.net>
To: gnats-bugs@NetBSD.org, christos@NetBSD.org
Cc: 
Subject: Re: kern/43128 (Threads support in ptrace() is insufficient for
 gdb to debug threaded live apps)
Date: Wed, 07 Apr 2010 14:37:28 +0900 (JST)

 In message <20100406135122.1450C63B873@www.NetBSD.org>
 	on Tue,  6 Apr 2010 13:51:22 +0000 (UTC),
 	christos@NetBSD.org wrote:
 > State-Changed-From-To: open->closed
 > State-Changed-By: christos@NetBSD.org
 > State-Changed-When: Tue, 06 Apr 2010 09:51:20 -0400
 > State-Changed-Why:
 > patch applied, many thanks!
 Since original problem was on NetBSD 5.0, it should be pull-up to
 netbsd-5 branch?

 Best regards.

 -- 
 Takahiro Kambe <taca@back-street.net>

From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: PR/43128 CVS commit: src
Date: Fri, 9 Apr 2010 05:56:50 +0000

 On Tue, Apr 06, 2010 at 01:55:02PM +0000, Christos Zoulas wrote:
  > PR/43128: Paul Koning: Threads support in ptrace() is insufficient
  > for gdb to debug threaded live apps: Add an optional lwpid in
  > PT_STEP and PT_CONTINUE to indicate which lwp to operate on, and
  > implement the glue required to make it work.

 Does this close PR 30756?

 -- 
 David A. Holland
 dholland@netbsd.org

>Unformatted:

NetBSD Home
NetBSD PR Database Search

(Contact us) $NetBSD: query-full-pr,v 1.39 2013/11/01 18:47:49 spz Exp $
$NetBSD: gnats_config.sh,v 1.8 2006/05/07 09:23:38 tsutsui Exp $
Copyright © 1994-2007 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.