NetBSD Problem Report #44260

From gson@gson.org  Tue Dec 21 17:51:01 2010
Return-Path: <gson@gson.org>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
	by www.NetBSD.org (Postfix) with ESMTP id BCD0F63B89C
	for <gnats-bugs@gnats.NetBSD.org>; Tue, 21 Dec 2010 17:51:01 +0000 (UTC)
Message-Id: <20101221175057.B7D5A75E41@guava.gson.org>
Date: Tue, 21 Dec 2010 19:50:57 +0200 (EET)
From: gson@gson.org (Andreas Gustafsson)
Reply-To: gson@gson.org (Andreas Gustafsson)
To: gnats-bugs@gnats.NetBSD.org
Subject: ddb stack trace from interrupt context is broken
X-Send-Pr-Version: 3.95

>Number:         44260
>Category:       kern
>Synopsis:       ddb stack trace from interrupt context is broken
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    port-i386-maintainer
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue Dec 21 17:55:00 +0000 2010
>Closed-Date:    
>Last-Modified:  Tue Apr 21 14:55:01 +0000 2020
>Originator:     Andreas Gustafsson
>Release:        NetBSD-current
>Organization:
>Environment:
System: NetBSD
Architecture: i386
Machine: i386
>Description:

When entering ddb via an interrupt, such as when sending a break
sequence to a serial console, the ddb "trace" command only shows
the call stack as far back as the interrupt; it does not show the
functions that were executing when the interrupt occurred.

This used to work, albeit not very reliably.  Using automated binary
search, I have tracked down the regression to a set of commits by ad
on 2009.03.07.22.02.16 - 2009.03.07.22.02.17, with the commit message:

  Make ddb compile and work in userspace. Mostly this is comprised of three
  types of changes:

  - Add a few new methods to replace stuff like p_find(), CPU_INFO_FOREACH.

  - Use db_read_bytes() instead of accessing kernel structures directly,
    and similar changes.

  - Add ifdef _KERNEL where the above hasn't been done, and an XXX comment.

This is on i386 (emulated by qemu); other architectures may or may not
be affected.
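
The binary search over source dates mentioned above can be sketched
roughly as follows.  This is an illustration only, not the tooling that
was actually used: trace_is_complete is a hypothetical callable that
boots a build from the given source date, enters ddb, runs "trace" once,
and reports whether the trace was complete.  Because even good builds
produce a full trace only some of the time, each build is tried several
times before it is declared bad.  Assuming a good build succeeds about
half the time, eight failed attempts in a row would misclassify it with
a probability of roughly 0.4%.

```python
def first_bad(builds, trace_is_complete, attempts=8):
    """Return the index of the first build whose trace never completes.

    builds            -- source dates, oldest first; builds[0] is assumed
                         to be good and builds[-1] is assumed to be bad
    trace_is_complete -- hypothetical callable: tests one build once and
                         returns True if the stack trace was complete
    attempts          -- trials per build before declaring it bad
    """
    def is_good(build):
        # One success is enough: a good build may still fail some trials,
        # but (by assumption) a bad build never succeeds.
        return any(trace_is_complete(build) for _ in range(attempts))

    good, bad = 0, len(builds) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_good(builds[mid]):
            good = mid
        else:
            bad = mid
    return bad
```

The search needs on the order of log2(N) builds for N candidate source
dates, each tested at most "attempts" times.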

Here are a couple of stack traces illustrating the problem:

  db{0}> trace
  breakpoint(0,c4091f3c,c04a6193,c42b3e3c,71,c0ed3000,c0ed4000,800,c42e67c6,c4091f
  1c) at netbsd:breakpoint+0x4
  comintr(c42b3d10,c4091f30,7,c0af0010,c3e80030,c0ad0010,c04a0010,c0a4e84c,c3e8ed2
  0,c4062d0c) at netbsd:comintr+0x5a6
  Xintr_ioapic_edge7() at netbsd:Xintr_ioapic_edge7+0xb5
  --- interrupt ---
  0
  db{0}> 

  db{0}> trace
  breakpoint(c0a4e800,dce,4d10ce23,c42b3e3c,71,c0ed3004,c0ed4000,800,c0afc2c6,c3e8
  ed20) at netbsd:breakpoint+0x4
  comintr(c42b3d10,c4062cc8,0,0,0,0,0,0,0,0) at netbsd:comintr+0x5a6
  DDB lost frame for netbsd:Xintr_ioapic_edge7+0xb5, trying 0xc4091f74
  Xintr_ioapic_edge7() at netbsd:Xintr_ioapic_edge7+0xb5
  fatal page fault in supervisor mode
  trap type 6 code 0 eip c024b860 cs 8 eflags 246 cr2 0 ilevel 8
  kernel: supervisor trap page fault, code=0
  Faulted in DDB; continuing...
  db{0}> 

For comparison, here is a stack trace from a working version:

  db{0}> trace
  breakpoint(0,3f8,5,c0499e8d,ca18c108,ca5aaa2c,71,c0f74010,c0f75000,800) at netbs
  d:breakpoint+0x4
  comintr(ca5aa910,ca17ecc8,0,0,0,0,0,0,0,0) at netbsd:comintr+0x5b5
  DDB lost frame for netbsd:Xintr_legacy4+0xbb, trying 0xca5a0f74
  Xintr_legacy4() at netbsd:Xintr_legacy4+0xbb
  --- interrupt ---
  --- switch to interrupt stack ---
  x86_stihlt(ca189c80,0,c098d240,ca189c80,c047c1e0,ca189c80,0,c01002e1,ca189c80,0)
   at netbsd:x86_stihlt+0x5
  idle_loop(ca189c80,0,c01002cd,0,c01002cd,0,0,0,0,0) at netbsd:idle_loop+0xe6
  db{0}> 

With builds from sources from before 2009.03.07.22.02.16, the stack
trace successfully reaches either the "main" or the "idle_loop"
function approximately half the time.  Using sources from
2009.03.07.22.02.17 or later, I have never seen the stack trace
reach either of those functions.
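
The success criterion used above (the trace reaches "main" or
"idle_loop") could be checked mechanically along these lines.  This is a
minimal sketch that assumes the frame format seen in the traces in this
report ("at netbsd:symbol+0xoffset") and assumes that long lines were
wrapped by the console without losing any characters; the function names
frames and reaches_root are made up for illustration.

```python
import re

# Functions that the report treats as the bottom of a complete trace.
ROOT_FUNCTIONS = ("main", "idle_loop")

# A frame ends in "at netbsd:symbol+0xoffset".  The "DDB lost frame for
# netbsd:..." diagnostic lacks the leading "at" and is not counted.
FRAME_RE = re.compile(r"at netbsd:(\w+)\+0x")


def frames(trace_text):
    """Return the function names of the frames in ddb "trace" output."""
    # Undo console line wrapping so that a symbol split across two lines
    # ("at netbs" / "d:breakpoint+0x4") is matched as one piece.
    joined = trace_text.replace("\r", "").replace("\n", "")
    return FRAME_RE.findall(joined)


def reaches_root(trace_text):
    """True if the trace reaches one of ROOT_FUNCTIONS."""
    return any(name in ROOT_FUNCTIONS for name in frames(trace_text))
```

The input is expected to be the raw console output, without the
two-space indentation used for the traces quoted in this report.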

>How-To-Repeat:

pkg_add py-anita
anita --sets=kern-GENERIC,modules,base,etc interact http://nyftp.netbsd.org/pub/NetBSD-daily/HEAD/201012201100Z/i386/
[wait for a login prompt]
[press "control-a b" to send a break sequence to the virtual serial console]
trace

>Fix:

>Release-Note:

>Audit-Trail:

Responsible-Changed-From-To: kern-bug-people->haad
Responsible-Changed-By: haad@NetBSD.org
Responsible-Changed-When: Fri, 07 Jan 2011 08:06:21 +0000
Responsible-Changed-Why:
I will try to fix this problem; since I can talk with ad@, it might be easier
for me than for others.


From: Andreas Gustafsson <gson@gson.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/44260: ddb stack trace from interrupt context is broken
Date: Fri, 25 Nov 2011 19:13:08 +0200

 This bug has been fixed.  I'm not 100% sure which commit fixed it,
 but yamt's commit of src/sys/arch/i386/i386/db_machdep.c 1.3 with the
 commit message "fix backtrace of interrupt" on April 14, 2011 is a
 prime candidate.

 The test of sending a break sequence on a serial console at the login
 prompt now produces a full backtrace all the way to the idle loop
 about 30% of the time.  It still fails to do so about 70% of the time,
 but from March 2009 until April 2011, it failed 100% of the time, so
 this is progress, and since this PR was specifically about it failing
 100% of the time, it can now be closed.
 -- 
 Andreas Gustafsson, gson@gson.org

State-Changed-From-To: open->closed
State-Changed-By: gson@NetBSD.org
State-Changed-When: Fri, 25 Nov 2011 17:57:54 +0000
State-Changed-Why:
Stack traces from interrupt context work again, at least some of the time.


Responsible-Changed-From-To: haad->port-i386-maintainer
Responsible-Changed-By: gson@NetBSD.org
Responsible-Changed-When: Tue, 21 Apr 2020 14:46:50 +0000
Responsible-Changed-Why:
It's been nine years and the bug has been fixed in between.


State-Changed-From-To: closed->open
State-Changed-By: gson@NetBSD.org
State-Changed-When: Tue, 21 Apr 2020 14:53:56 +0000
State-Changed-Why:
It's broken again.


From: Andreas Gustafsson <gson@NetBSD.org>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/44260 (ddb stack trace from interrupt context is broken)
Date: Tue, 21 Apr 2020 17:50:42 +0300

 The problem originally reported in PR 44260 and fixed in 2011 is back.
 --
 Andreas Gustafsson, gson@NetBSD.org

>Unformatted:
