NetBSD Problem Report #10313

Received: (qmail 22384 invoked from network); 7 Jun 2000 22:01:11 -0000
Message-Id: <200006072159.RAA00728@zorkmid.mit.edu>
Date: Wed, 7 Jun 2000 17:59:45 -0400 (EDT)
From: John Hawkinson <jhawk@mit.edu>
Reply-To: jhawk@mit.edu
To: gnats-bugs@gnats.netbsd.org
Subject: gdb can't trace through trap()
X-Send-Pr-Version: 3.95

>Number:         10313
>Category:       port-i386
>Synopsis:       gdb can't trace through trap()
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    port-i386-maintainer
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Jun 07 22:02:00 +0000 2000
>Closed-Date:    Sun Jun 11 07:51:22 +0000 2000
>Last-Modified:  Sun Jun 11 07:51:22 +0000 2000
>Originator:     John Hawkinson
>Release:        NetBSD 1.4.2
>Organization:
MIT
>Environment:

System: NetBSD zorkmid.mit.edu 1.4ZA NetBSD 1.4ZA (ZORKMID-$Revision: 1.13 $) #180: Wed Jun 7 16:31:23 EDT 2000 jhawk@zorkmid.mit.edu:/usr/local/current-src/sys/arch/i386/compile/ZORKMID i386


>Description:
	gdb (a) can't seem to print out kernel stack traces through
trap() calls. (b) Its frame manipulation/selection seems to be
broken as well, as "backtrace" only seems to give the original
traceback.

>How-To-Repeat:
	Crash your kernel and get a dump. Then:

zorkmid# gdb netbsd.29
GNU gdb 4.17
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386--netbsd"...(no debugging symbols found)...
(gdb) target kcore netbsd.29.core
panic: trap
#0  0x100 in ?? ()
(gdb) where
#0  0x100 in ?? ()
#1  0xc03049cb in cpu_reboot ()
#2  0xc01bc97d in panic ()
#3  0xc030dd2d in trap ()
(gdb) 

	This is almost totally useless. Of course, we can steal
the trace from the message buffer now that it's there:

(gdb) printf "%s\n",msgbufp+*(msgbufp+4)-1000
y0: screen 1 added (80x25, vt100 emulation)
wsdisplay0: screen 2 added (80x25, vt100 emulation)
wsdisplay0: screen 3 added (80x25, vt100 emulation)
wsdisplay0: screen 4 added (80x25, vt100 emulation)
wsmux1: connecting to wsdisplay0
uvm_fault(0xc04fd728, 0xc57d9000, 0, 1) -> 1
fatal page fault in supervisor mode
trap type 6 code 0 eip c0457d9a cs 8 eflags 10202 cr2 c57d9000 cpl 0
panic: trap
Begin traceback...
_trap() at _trap+0x1e5
--- trap (number 6) ---
_pcmcia_scan_cis(c0746200,c04594fc,c5633ed0,ffffffff,0) at _pcmcia_scan_cis+0x1a6
_pcmcia_read_cis(c0746200,c0746aac,c072f380,c072f380,ffffffff) at _pcmcia_read_cis+0x9c
_pcmcia_card_attach(c0746200) at _pcmcia_card_attach+0x27
_cardslot_event_thread(c072f380) at _cardslot_event_thread+0x1e9
End traceback...
syncing disks... 1 1 done

dumping to dev 0,1 offset 396196
dump 63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 
(gdb) 
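
As a cross-check on the traceback recovered from the message buffer, the
following is a minimal sketch (not part of any NetBSD tool) that parses the
ddb-style lines into symbol names, arguments, and decimal offsets. The
traceback text is copied from the output above; the function name
parse_traceback and the regular expression are illustrative assumptions.

```python
import re

# Traceback lines copied from the message buffer output above.
TRACEBACK = """\
_trap() at _trap+0x1e5
--- trap (number 6) ---
_pcmcia_scan_cis(c0746200,c04594fc,c5633ed0,ffffffff,0) at _pcmcia_scan_cis+0x1a6
_pcmcia_read_cis(c0746200,c0746aac,c072f380,c072f380,ffffffff) at _pcmcia_read_cis+0x9c
_pcmcia_card_attach(c0746200) at _pcmcia_card_attach+0x27
_cardslot_event_thread(c072f380) at _cardslot_event_thread+0x1e9
"""

# Matches "<symbol>(<args>) at <symbol>+0x<offset>".
LINE_RE = re.compile(r"^(\w+)\(([^)]*)\) at (\w+)\+0x([0-9a-f]+)$")


def parse_traceback(text):
    """Return (symbol, args, offset) tuples; non-frame lines are skipped."""
    frames = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # e.g. the "--- trap (number 6) ---" marker
        name = m.group(3)
        if name.startswith("_"):
            name = name[1:]  # the traceback shows a leading underscore that gdb omits
        args = [int(a, 16) for a in m.group(2).split(",") if a]
        frames.append((name, args, int(m.group(4), 16)))
    return frames


frames = parse_traceback(TRACEBACK)
for name, args, offset in frames:
    # Printed in the same "symbol + decimal offset" form as gdb's "info symbol".
    print(f"{name} + {offset}")
```

Printing the offsets in decimal makes them directly comparable with the
"info symbol" results that gdb reports for the return addresses further
down in this report.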

	But really, that's just too sick.  That was (a). As for (b),
well, we presume that the problem is that trap() stores its base
pointer differently from a normal frame. So we should be able to pull
it out and hand it to gdb and have gdb just do the right thing. So if
we go to the trap() frame:

(gdb) where
#0  0x100 in ?? ()
#1  0xc03049cb in cpu_reboot ()
#2  0xc01bc97d in panic ()
#3  0xc030dd2d in trap ()
(gdb) frame 3
#3  0xc030dd2d in trap ()
(gdb) info frame
Stack level 3, frame at 0xc5633c2c:
 eip = 0xc030dd2d in trap; saved eip 0xc0100f29
 caller of frame at 0xc5633bec
 Arglist at 0xc5633c2c, args: 
 Locals at 0xc5633c2c, Previous frame's sp is 0x0
 Saved registers:
  ebx at 0xc5633bf8, ebp at 0xc5633c2c, esi at 0xc5633bfc, edi at 0xc5633c00,
  eip at 0xc5633c30
(gdb) 

	Incidentally, why does "ebp" match up with the location of the
frame? Shouldn't ebp be the pointer to the calling frame, not the
current one? It's like this for frame 2 as well, so it's not just
a fluke with the trap frame. According to 'struct trapframe':

struct trapframe {
        int     tf_es;
        int     tf_ds;
        int     tf_edi;
        int     tf_esi;
        int     tf_ebp;
        int     tf_ebx;
        int     tf_edx;
        int     tf_ecx;
        int     tf_eax;
        int     tf_trapno;
        /* below portion defined in 386 hardware */
        int     tf_err;
        int     tf_eip;
 ...

and a regular frame would be:

struct i386_frame {
        struct i386_frame       *f_frame;
        int                     f_retaddr;
        int                     f_arg0;
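
To make the field offsets in these two layouts concrete, here is a minimal
sketch that models only the fields quoted above with Python's ctypes and
prints each byte offset. The class names Trapframe and I386Frame are
illustrative, every field is assumed to be a 32-bit word (as on i386), and
the part of struct trapframe elided above is not modelled.

```python
import ctypes


class Trapframe(ctypes.Structure):
    # Only the fields quoted above; the real struct continues past tf_eip.
    _fields_ = [(name, ctypes.c_int32) for name in (
        "tf_es", "tf_ds", "tf_edi", "tf_esi", "tf_ebp", "tf_ebx",
        "tf_edx", "tf_ecx", "tf_eax", "tf_trapno", "tf_err", "tf_eip",
    )]


class I386Frame(ctypes.Structure):
    # f_frame is a pointer in the kernel; modelled here as a 32-bit word
    # (assumption: 4-byte pointers, as on i386).
    _fields_ = [
        ("f_frame", ctypes.c_uint32),
        ("f_retaddr", ctypes.c_int32),
        ("f_arg0", ctypes.c_int32),
    ]


for struct in (Trapframe, I386Frame):
    print(struct.__name__)
    for name, _ in struct._fields_:
        # Each ctypes field descriptor exposes its byte offset.
        print(f"  {name:10s} +{getattr(struct, name).offset}")
```

Under these assumptions tf_ebp is the fifth word, so it sits 4*4 bytes from
the start of the structure, while f_retaddr in a regular frame sits 4 bytes
in.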

We should get the called frame as fp+(4*4):

(gdb) x/x 0xc5633c2c+(4*4)
0xc5633c3c:     0xc5633f40
(gdb) frame 0xc5633f40
#0  0x0 in ?? ()
(gdb) where
#0  0x100 in ?? ()
#1  0xc03049cb in cpu_reboot ()
#2  0xc01bc97d in panic ()
#3  0xc030dd2d in trap ()
(gdb) info frame
Stack level 0, frame at 0xc5633f40:
 eip = 0x0; saved eip 0xc0456797
 (FRAMELESS), called by frame at 0xc5633f40
 Arglist at 0xc5633f40, args: 
 Locals at 0xc5633f40, Previous frame's sp is 0x0
 Saved registers:
  ebp at 0xc5633f40, eip at 0xc5633f44

	Notice that "where" completely ignored the
specified frame. "info frame" declares the current frame as level 0,
as it should when we've specified a frame arbitrarily. I suppose perhaps
the problem is that "where" is trying to follow the frame up by es
and losing.

Investigating further, if we print out the frame for trap():

(gdb) x/12x 0xc5633c2c
0xc5633c2c:     0xc5633eb8      0xc0100f29      0xc0740010      0x00000010
0xc5633c3c:     0xc5633f40      0x000000cd      0xc5633eb8      0x0000000d
0xc5633c4c:     0x00000030      0xc57d8000      0x00001000      0x00000006
(gdb) info symbol 0xc0100f29
calltrap + 11 in section .text

It's clear that 0x6 is not a valid eip, so this must be
a regular frame. So why didn't gdb agree? Moving up the stack:

(gdb) x/12x 0xc5633eb8
0xc5633eb8:     0xc5633f40      0xc0457924      0xc0746200      0xc04594fc
0xc5633ec8:     0xc5633ed0      0xffffffff      0x00000000      0x00000000
0xc5633ed8:     0x00000000      0x00000000      0x00000000      0x00000000
(gdb) info symbol 0xc0457924
pcmcia_read_cis + 156 in section .text
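
The check applied by eye to the two dumps above can be sketched as follows:
read the word where tf_eip would sit if the memory were a struct trapframe
(the 12th word) and the word where f_retaddr would sit if it were a regular
frame (the 2nd word), and see which one looks like a kernel text address.
The text range used here is only a guess based on the addresses that appear
in this report, not a value taken from the kernel, and the helper names are
hypothetical.

```python
# First twelve 32-bit words at each frame address, copied from the dumps above.
DUMPS = {
    0xc5633c2c: [0xc5633eb8, 0xc0100f29, 0xc0740010, 0x00000010,
                 0xc5633f40, 0x000000cd, 0xc5633eb8, 0x0000000d,
                 0x00000030, 0xc57d8000, 0x00001000, 0x00000006],
    0xc5633eb8: [0xc5633f40, 0xc0457924, 0xc0746200, 0xc04594fc,
                 0xc5633ed0, 0xffffffff, 0x00000000, 0x00000000,
                 0x00000000, 0x00000000, 0x00000000, 0x00000000],
}

TF_EIP_INDEX = 11    # tf_eip is the 12th int in struct trapframe
F_RETADDR_INDEX = 1  # f_retaddr is the 2nd word in struct i386_frame

# ASSUMPTION: rough kernel text bounds guessed from addresses in this report.
TEXT_LO, TEXT_HI = 0xc0100000, 0xc0500000


def plausible_eip(addr):
    """True if addr falls inside the assumed kernel text range."""
    return TEXT_LO <= addr < TEXT_HI


def classify(words):
    """Guess which layout a 12-word dump matches."""
    as_trapframe = plausible_eip(words[TF_EIP_INDEX])
    as_regular = plausible_eip(words[F_RETADDR_INDEX])
    if as_regular and not as_trapframe:
        return "regular"
    if as_trapframe and not as_regular:
        return "trapframe"
    return "ambiguous"


for addr, words in DUMPS.items():
    print(f"{addr:#x}: tf_eip slot {words[TF_EIP_INDEX]:#x}, "
          f"f_retaddr slot {words[F_RETADDR_INDEX]:#x} -> {classify(words)}")
```

This is a heuristic only: a data word that happens to fall in the assumed
range would fool it, so it supports the manual reading rather than replacing
it.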

Zoinks! Also not a trapframe. What gives? If everything is a regular
frame, then this should all Just Work(tm), right?

(gdb) x/12x 0xc5633f40
0xc5633f40:     0xc5633f6c      0xc0456797      0xc0746200      0xc0746aac
0xc5633f50:     0xc072f380      0xc072f380      0xffffffff      0x00000000
0xc5633f60:     0xc5633f94      0xc0449d6b      0xc072f3c4      0xc5633f94
(gdb) info symbol 0xc0456797
pcmcia_card_attach + 39 in section .text
(gdb) info symbol 0xc5633f94
No symbol matches 0xc5633f94.

(gdb) x/12x 0xc5633f6c
0xc5633f6c:     0xc5633f94      0xc0449ed9      0xc0746200      0xc072f380
0xc5633f7c:     0xc0449cf0      0x00000000      0x00000000      0xc072f3c4
0xc5633f8c:     0x00000000      0xc07b69b0      0x00000000      0xc010034b
(gdb) info symbol 0xc0449ed9
cardslot_event_thread + 489 in section .text
(gdb) info symbol 0xc010034b
proc_trampoline + 3 in section .text

And here we come to an end:

(gdb) x/12x 0xc5633f94
0xc5633f94:     0x00000000      0xc010034b      0xc072f380      0x0000c063
0xc5633fa4:     0x00000000      0x00000000      0xc0100345      0x00000000
0xc5633fb4:     0x00000000      0x00000000      0x00000000      0x00000000
(gdb) info symbol 0xc010034b
proc_trampoline + 3 in section .text

If we assume the previous frame was a trapframe instead, we get garbage:

(gdb) x/12x 0xc0449cf0
0xc0449cf0 <cardslot_event_thread>:     0x83e58955      0x565710ec      0x08758b53      0x01f845c7
0xc0449d00 <cardslot_event_thread+16>:  0x83000000      0x0f00407e      0x00029b84      0x444e8d00
0xc0449d10 <cardslot_event_thread+32>:  0x90f44d89      0xfa80158b      0xd089c050      0xaee0050b
(gdb) info symbol 0x565710ec
No symbol matches 0x565710ec.
(gdb) info symbol 0xaee0050b
No symbol matches 0xaee0050b.
(gdb) 

But I suppose this makes sense as the end of the stack,
and it is what ddb prints.
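
The manual walk above amounts to following the saved-ebp chain of regular
frames until it reaches zero. Here is a minimal sketch of that loop,
assuming the saved ebp is the first word of each frame and the return
address is the second; it uses only the words and "info symbol" results
shown above. The names FRAMES, SYMBOLS, and walk are illustrative, and a
real gdb script would read memory from the core file instead of a table.

```python
# (saved ebp, return address) pairs read from the dumps above,
# keyed by frame address.
FRAMES = {
    0xc5633c2c: (0xc5633eb8, 0xc0100f29),
    0xc5633eb8: (0xc5633f40, 0xc0457924),
    0xc5633f40: (0xc5633f6c, 0xc0456797),
    0xc5633f6c: (0xc5633f94, 0xc0449ed9),
    0xc5633f94: (0x00000000, 0xc010034b),
}

# Results of the "info symbol" commands above.
SYMBOLS = {
    0xc0100f29: "calltrap + 11",
    0xc0457924: "pcmcia_read_cis + 156",
    0xc0456797: "pcmcia_card_attach + 39",
    0xc0449ed9: "cardslot_event_thread + 489",
    0xc010034b: "proc_trampoline + 3",
}


def walk(fp, frames, max_depth=64):
    """Yield (frame address, return address) while following saved ebp."""
    depth = 0
    # Stop at a null frame pointer, an address we have no data for,
    # or after max_depth frames (guards against a looping chain).
    while fp != 0 and fp in frames and depth < max_depth:
        next_fp, retaddr = frames[fp]
        yield fp, retaddr
        fp = next_fp
        depth += 1


# Start from the trap() frame reported by "info frame".
chain = list(walk(0xc5633c2c, FRAMES))
for fp, retaddr in chain:
    print(f"frame {fp:#010x}  return to {SYMBOLS.get(retaddr, hex(retaddr))}")
```

The depth limit is a design choice of this sketch, so that a corrupted
stack whose frame pointers loop cannot hang the walk.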


>Fix:
	I suppose I can write a gdb script for printing out stack
frame traces. It would probably be useful even if gdb were fixed. I
guess I'll do up something and check it in under
/sys/arch/i386/gdbscripts/; Sigh...
>Release-Note:
>Audit-Trail:

From: John Hawkinson <jhawk@netbsd.org>
To: source-changes@netbsd.org
Cc:  
Subject: Re: port-i386/10313: gdb can't trace through trap()
Date: Wed, 7 Jun 2000 20:15:41 -0700 (PDT)

 Module Name:	syssrc
 Committed By:	jhawk
 Date:		Thu Jun  8 03:15:40 UTC 2000

 Added Files:
 	syssrc/sys/arch/i386/gdbscripts: stack

 Log Message:
 gdb script for backtracing an i386 kernel stack.
 Useful when "where" in gdb fails to cross trap()s,
 e.g. port-i386/10313


 To generate a diff of this commit:
 cvs rdiff -r0 -r1.1 syssrc/sys/arch/i386/gdbscripts/stack

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

 --[64191]--
State-Changed-From-To: open->closed 
State-Changed-By: jhawk 
State-Changed-When: Sun Jun 11 00:50:59 PDT 2000 
State-Changed-Why:  
Dupe of 9367 
>Unformatted:
