NetBSD Problem Report #42992

From www@NetBSD.org  Wed Mar 17 19:15:27 2010
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
	by www.NetBSD.org (Postfix) with ESMTP id E567B63B86C
	for <gnats-bugs@gnats.NetBSD.org>; Wed, 17 Mar 2010 19:15:26 +0000 (UTC)
Message-Id: <20100317191526.B0E9063B11D@www.NetBSD.org>
Date: Wed, 17 Mar 2010 19:15:26 +0000 (UTC)
From: paul_koning@dell.com
Reply-To: paul_koning@dell.com
To: gnats-bugs@NetBSD.org
Subject: KGDB does not work once interrupts are enabled
X-Send-Pr-Version: www-1.0

>Number:         42992
>Category:       kern
>Synopsis:       KGDB does not work once interrupts are enabled
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Mar 17 19:20:00 +0000 2010
>Last-Modified:  Thu Mar 18 22:45:01 +0000 2010
>Originator:     Paul Koning
>Release:        5.0
>Organization:
Dell
>Environment:
NetBSD pkvm50 5.0 NetBSD 5.0 (GENERIC) #9: Wed Mar 17 12:11:58 EDT 2010  root@pkvm50:/usr/obj/sys/arch/i386/compile/GENERIC i386

>Description:
kgdb uses polled I/O for talking to the remote gdb.  Early on that works fine, but once interrupts are enabled, each incoming character also triggers an interrupt to the com port driver, so that driver and the polled driver race to grab the character.  The polled driver loses most of the time, and even a single lost byte is enough to break the remote protocol.  The result is that KGDB is non-functional once the kernel switches to interrupt-driven I/O.
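
To illustrate why one lost byte is fatal: the GDB remote serial protocol frames each packet as "$<payload>#<checksum>", where the checksum is the modulo-256 sum of the payload bytes written as two hex digits.  The following is a minimal user-space sketch of a frame check, not code from kgdb_stub.c; the names rsp_checksum() and rsp_frame_ok() are hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Modulo-256 sum of the payload bytes (the arithmetic wraps in 8 bits). */
static unsigned char rsp_checksum(const char *payload, size_t len)
{
	unsigned char sum = 0;

	assert(payload != NULL);
	for (size_t i = 0; i < len; i++)
		sum += (unsigned char)payload[i];
	return sum;
}

/*
 * Check a complete frame of the form "$<payload>#<two hex digits>".
 * Returns 1 when the framing characters are in place and the checksum
 * matches, 0 otherwise.
 */
int rsp_frame_ok(const char *frame)
{
	size_t len = strlen(frame);
	unsigned int want;

	if (len < 4 || frame[0] != '$' || frame[len - 3] != '#')
		return 0;
	if (sscanf(frame + len - 2, "%2x", &want) != 1)
		return 0;
	return rsp_checksum(frame + 1, len - 4) == (unsigned char)want;
}
```

With this check, the intact frame "$m0,4#fd" is accepted, while the same frame with the comma missing ("$m04#fd") or with the '#' missing ("$m0,4fd") is rejected, which matches the partial packets described under How-To-Repeat.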

>How-To-Repeat:
Build a kernel with KGDB.  Start it with -d, attach kgdb, set a breakpoint at some spot that will be hit once the OS is up.  Continue.  The breakpoint will be hit, but then gdb will not communicate successfully (in particular, fetching registers after the break signal fails) so it appears hung.  Running gdb with "set debug remote 1" shows the failure to communicate fairly well, and turning on DEBUG_KGDB in kgdb_stub.c shows it even more clearly (you see partial gdb packets instead of complete ones -- the missing bytes were sucked up by comintr().)

>Fix:
The solution is to put a splserial() early in kgdb_trap with the matching splx() at the two exits.  This cures the problem and makes KGDB work reliably.
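
The shape of that change can be sketched in user space as below.  This is a model under stated assumptions, not the actual patch: every model_* name is hypothetical, the numeric priority levels are invented, and the two exit paths shown (debugger not attached, normal return) are only guesses at what the real kgdb_trap does.  The point is the bracket: raise the priority level first, and restore the saved level on every path out.

```c
#include <assert.h>

/* Invented numeric levels; the kernel's real values differ per port. */
#define MODEL_IPL_NONE   0
#define MODEL_IPL_SERIAL 6

static int model_ipl = MODEL_IPL_NONE;      /* current priority level */
static int model_debugger_attached = 1;     /* hypothetical exit condition */
static int model_ipl_seen_in_loop = -1;     /* recorded for inspection */

/* Stand-in for splserial(): raise the level, return the previous one. */
static int model_splserial(void)
{
	int old = model_ipl;

	if (model_ipl < MODEL_IPL_SERIAL)
		model_ipl = MODEL_IPL_SERIAL;   /* never lowers the level */
	return old;
}

/* Stand-in for splx(): restore a previously saved level. */
static void model_splx(int s)
{
	model_ipl = s;
}

/* Stand-in for the polled command loop; serial interrupts must be blocked. */
static void model_command_loop(void)
{
	assert(model_ipl >= MODEL_IPL_SERIAL);
	model_ipl_seen_in_loop = model_ipl;
}

int model_kgdb_trap(int type)
{
	int s = model_splserial();      /* block the com interrupt early */

	(void)type;
	if (!model_debugger_attached) {
		model_splx(s);          /* first exit */
		return 0;
	}
	model_command_loop();
	model_splx(s);                  /* second exit */
	return 1;
}
```

Saving the old level and passing it back to splx(), rather than dropping to a fixed level, keeps the bracket correct when the trap is taken with the level already raised.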

>Audit-Trail:
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/42992: KGDB does not work once interrupts are enabled
Date: Wed, 17 Mar 2010 19:22:55 +0000

 On Wed, Mar 17, 2010 at 07:20:01PM +0000, paul_koning@dell.com wrote:
  > The solution is to put a splserial() early in kgdb_trap with the
  > matching splx() at the two exits.  This cures the problem and makes
  > KGDB work reliably.

 That's not going to fix it on a multiprocessor machine though, at
 least in general. Does this matter? I don't remember if kgdb is
 supposed to be multiprocessor-safe.

 -- 
 David A. Holland
 dholland@netbsd.org

From: "Paul Koning" <Paul_Koning@Dell.com>
To: <gnats-bugs@NetBSD.org>,
	<kern-bug-people@netbsd.org>,
	<gnats-admin@netbsd.org>,
	<netbsd-bugs@netbsd.org>
Cc: 
Subject: RE: kern/42992: KGDB does not work once interrupts are enabled
Date: Wed, 17 Mar 2010 15:37:05 -0400

 Agreed, it's a partial fix, but a useful one.

 The most natural way for KGDB to behave is that a breakpoint stops all
 CPUs.  So the full fix would mean telling the other CPUs to stop.  I
 don't know how that is done, or if there is a way to do that.

 	paul

 > -----Original Message-----
 > From: David Holland [mailto:dholland-bugs@netbsd.org]
 > Sent: Wednesday, March 17, 2010 3:25 PM
 > To: kern-bug-people@netbsd.org; gnats-admin@netbsd.org;
 > netbsd-bugs@netbsd.org; Paul Koning
 > Subject: Re: kern/42992: KGDB does not work once interrupts are enabled
 >
 > The following reply was made to PR kern/42992; it has been noted by
 > GNATS.
 >
 > From: David Holland <dholland-bugs@netbsd.org>
 > To: gnats-bugs@NetBSD.org
 > Cc:
 > Subject: Re: kern/42992: KGDB does not work once interrupts are enabled
 > Date: Wed, 17 Mar 2010 19:22:55 +0000
 >
 >  On Wed, Mar 17, 2010 at 07:20:01PM +0000, paul_koning@dell.com wrote:
 >   > The solution is to put a splserial() early in kgdb_trap with the
 >   > matching splx() at the two exits.  This cures the problem and makes
 >   > KGDB work reliably.
 >
 >  That's not going to fix it on a multiprocessor machine though, at
 >  least in general. Does this matter? I don't remember if kgdb is
 >  supposed to be multiprocessor-safe.
 >
 >  --
 >  David A. Holland
 >  dholland@netbsd.org
 >

From: matthew green <mrg@eterna.com.au>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
    netbsd-bugs@netbsd.org
Subject: re: kern/42992: KGDB does not work once interrupts are enabled
Date: Thu, 18 Mar 2010 10:34:56 +1100

 kgdb entry should be changed to do this, it sounds like:

 	- tell all other cpus to pause
 	- go splhigh
 	- call into the normal loop.

 i would look at how DDB handles multiple CPUs.


 .mrg.
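
As a rough illustration of the entry sequence proposed in the message above, the following user-space model runs the three steps in order and undoes them in reverse on the way out.  It is a sketch under stated assumptions, not kernel code: all model_* names, the CPU count, and the numeric priority levels are hypothetical, and "pausing" is modelled as a flag per CPU rather than a real inter-processor interrupt.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NCPU     4   /* hypothetical CPU count */
#define MODEL_IPL_NONE 0   /* invented numeric levels */
#define MODEL_IPL_HIGH 7

static bool model_cpu_paused[MODEL_NCPU];   /* true while a CPU is parked */
static int  model_cur_ipl = MODEL_IPL_NONE;
static int  model_loop_runs;                /* times the loop was entered */

/* Stand-in for "tell all other cpus to pause". */
static void model_pause_others(int self)
{
	for (int i = 0; i < MODEL_NCPU; i++)
		model_cpu_paused[i] = (i != self);
}

static void model_resume_others(void)
{
	for (int i = 0; i < MODEL_NCPU; i++)
		model_cpu_paused[i] = false;
}

/* Stand-in for splhigh(): raise to the top level, return the old one. */
static int model_splhigh(void)
{
	int old = model_cur_ipl;

	model_cur_ipl = MODEL_IPL_HIGH;
	return old;
}

static void model_splx(int s)
{
	model_cur_ipl = s;
}

/* Stand-in for the normal kgdb loop; asserts the state it relies on. */
static void model_command_loop(int self)
{
	assert(model_cur_ipl == MODEL_IPL_HIGH);
	for (int i = 0; i < MODEL_NCPU; i++)
		assert(model_cpu_paused[i] == (i != self));
	model_loop_runs++;
}

void model_kgdb_enter(int self)
{
	model_pause_others(self);       /* 1. tell all other cpus to pause */
	int s = model_splhigh();        /* 2. go splhigh */
	model_command_loop(self);       /* 3. call into the normal loop */
	model_splx(s);                  /* undo in reverse order on exit */
	model_resume_others();
}
```

The model leaves out what happens when two CPUs hit a breakpoint at the same time; a real implementation would need some way to pick a single owner of the debugger before pausing the rest.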

From: "Paul Koning" <Paul_Koning@Dell.com>
To: "matthew green" <mrg@eterna.com.au>,
	<gnats-bugs@NetBSD.org>
Cc: <kern-bug-people@netbsd.org>,
	<gnats-admin@netbsd.org>,
	<netbsd-bugs@netbsd.org>
Subject: RE: kern/42992: KGDB does not work once interrupts are enabled
Date: Thu, 18 Mar 2010 14:57:26 -0400

 It looks like DDB doesn't handle multiple CPUs.  More precisely, it
 displays the CPU number in its prompt, but I don't see any code to stop
 other CPUs.

 	paul

 > -----Original Message-----
 > From: netbsd-bugs-owner@NetBSD.org
 > [mailto:netbsd-bugs-owner@NetBSD.org] On Behalf Of matthew green
 > Sent: Wednesday, March 17, 2010 7:35 PM
 > To: gnats-bugs@NetBSD.org
 > Cc: kern-bug-people@netbsd.org; gnats-admin@netbsd.org;
 > netbsd-bugs@netbsd.org
 > Subject: re: kern/42992: KGDB does not work once interrupts are enabled
 >
 >
 > kgdb entry should be changed to do this, it sounds like:
 >
 > 	- tell all other cpus to pause
 > 	- go splhigh
 > 	- call into the normal loop.
 >
 > i would look at how DDB handles multiple CPUs.
 >
 >
 > .mrg.

From: matthew green <mrg@eterna.com.au>
To: "Paul Koning" <Paul_Koning@Dell.com>
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
    netbsd-bugs@netbsd.org, gnats-bugs@NetBSD.org
Subject: re: kern/42992: KGDB does not work once interrupts are enabled
Date: Fri, 19 Mar 2010 09:40:58 +1100

    It looks like DDB doesn't handle multiple CPUs.  More precisely, it
    displays the CPU number in its prompt, but I don't see any code to stop
    other CPUs.

 look in eg i386/db_interface.c for db_suspend_others() and
 db_resume_others()..

 the same method is used on many of our MP platforms.


 .mrg.
