NetBSD Problem Report #52560
From gson@gson.org Tue Sep 19 17:49:45 2017
Return-Path: <gson@gson.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id BA3B87A21F
for <gnats-bugs@gnats.NetBSD.org>; Tue, 19 Sep 2017 17:49:45 +0000 (UTC)
Message-Id: <20170919174939.A2678989281@guava.gson.org>
Date: Tue, 19 Sep 2017 20:49:39 +0300 (EEST)
From: gson@gson.org (Andreas Gustafsson)
Reply-To: gson@gson.org (Andreas Gustafsson)
To: gnats-bugs@NetBSD.org
Subject: gdb kernel backtrace fails to show function where trap occurred
X-Send-Pr-Version: 3.95
>Number: 52560
>Category: port-i386
>Synopsis: gdb kernel backtrace fails to show function where trap occurred
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: port-i386-maintainer
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Tue Sep 19 17:50:00 +0000 2017
>Closed-Date:
>Last-Modified: Sat Feb 24 13:15:00 +0000 2018
>Originator: Andreas Gustafsson
>Release: NetBSD-current, source date 2017.09.19.02.44.14
>Organization:
>Environment:
System: NetBSD
Architecture: i386
Machine: i386
>Description:
When an i386 kernel has crashed due to an errant pointer and dumped
core, and you examine the core file using gdb, the backtrace correctly
displays every function in the call stack *except* the one where the
error actually occurred.
For example, if I modify sys_reboot() so that it deliberately
dereferences an invalid pointer, and then invoke the reboot syscall,
at the time of the crash the console will correctly display a
backtrace that includes sys_reboot():
panic: trap
cpu0: Begin traceback...
vpanic(c109de38,c8f7cd78,c8f7cd78,c8f7ce04,c011f7d3,c109de38,c8f7ce10,c8f7ce10,2,e) at netbsd:vpanic+0x1bb
vpanic(c109de38,c8f7ce10,c8f7ce10,2,e,c8f7ce10,c8f7cdb4,c1f5a1e4,c8f7a000,c0bc739b) at netbsd:vpanic
trap() at netbsd:trap+0x27a
--- trap (number 6) ---
sys_reboot(c2086540,c8f7cf74,c8f7cf6c,ffff0ff0,c8f7cf3c,c01675de,c16b0e20,c2086540,c8f7cf74,c8f7cf6c) at netbsd:sys_reboot+0xa9
sy_call(c16b0e20,c2086540,c8f7cf74,c8f7cf6c,c016773d,86540,c2086540,c8f7cf9c,c0167885,c16b0e20) at c016750e
sy_invoke(c16b0e20,c2086540,c8f7cf74,c8f7cf6c,d0,0,c2086540,c1f5a1e4,d0,c16b0e20) at netbsd:sy_invoke+0xbb
syscall() at netbsd:syscall+0xd7
--- syscall (number 208) ---
bab5a397:
cpu0: End traceback...
but when I later examine the crash dump with gdb, it displays an
incorrect address (0xc8f7ce10, the same value as the frame pointer
passed to trap() in frame #4) and "??" in place of sys_reboot():
#4 0xc011f7d3 in trap (frame=0xc8f7ce10)
at /usr/src/sys/arch/i386/i386/trap.c:324
#5 0xc011400f in alltraps ()
#6 0xc8f7ce10 in ?? ()
#7 0xc016750e in sy_call (sy=0xc16b0e20 <sysent+4160>, l=0xc2086540,
I am marking this bug as critical because it is making a large class
of kernel bugs much harder to fix.
I first noticed this problem while filing PR 52553. Since the
procedure for reproducing that issue requires specific hardware
(athn), below is an alternative procedure that does not.
>How-To-Repeat:
Apply the following patch:
Index: src/sys/kern/kern_xxx.c
===================================================================
RCS file: /bracket/repo/src/sys/kern/kern_xxx.c,v
retrieving revision 1.73
diff -u -r1.73 kern_xxx.c
--- src/sys/kern/kern_xxx.c 29 Oct 2015 00:27:08 -0000 1.73
+++ src/sys/kern/kern_xxx.c 19 Sep 2017 13:57:19 -0000
@@ -67,6 +67,10 @@
0, NULL, NULL, NULL)) != 0)
return (error);
+ /* Abuse AB_DEBUG for testing trap handling */
+ if ((SCARG(uap, opt) & AB_DEBUG))
+ *((char *)1) = 0;
+
/*
* Only use the boot string if RB_STRING is set.
*/
Build an i386 release with build.sh -V MKDEBUG=yes -V COPTS=-g.
(Or just a kernel; I built a full release because that is what
I have fully automated.)
Install it, boot it, log in as root, and issue the command "reboot -x".
This will cause a trap in sys_reboot(), a core dump, and a reboot.
Log in as root again and issue the commands
cd /var/crash
gunzip *.gz
gdb /netbsd
(gdb) target kvm netbsd.0.core
(gdb) bt
Notice how sys_reboot() does not appear in the backtrace.
>Fix:
>Release-Note:
>Audit-Trail:
From: Andreas Gustafsson <gson@gson.org>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: port-i386/52560: gdb kernel backtrace fails to show function where trap occurred
Date: Wed, 20 Sep 2017 16:24:34 +0300
Here are a couple more pieces of information from running some
additional tests:
1. The problem is not new; a system built from sources dated
2012.06.09.22.49.18 fails the same way.
2. The amd64 port is not affected.
--
Andreas Gustafsson, gson@gson.org
From: "Maxime Villard" <maxv@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/52560 CVS commit: src/sys/arch
Date: Fri, 23 Feb 2018 09:00:56 +0000
Module Name: src
Committed By: maxv
Date: Fri Feb 23 09:00:56 UTC 2018
Modified Files:
src/sys/arch/amd64/conf: Makefile.amd64
src/sys/arch/i386/conf: Makefile.i386
Log Message:
Add -fno-shrink-wrap, to force GCC to push the frames at the very beginning
of the functions. Otherwise DDB is unable to display a correct stack trace
if a fault occurred in a function before the frame was pushed.
Discussed on tech-kern@, flag suggested by Krister Walfridsson. Should fix
PR/52560.
To generate a diff of this commit:
cvs rdiff -u -r1.64 -r1.65 src/sys/arch/amd64/conf/Makefile.amd64
cvs rdiff -u -r1.187 -r1.188 src/sys/arch/i386/conf/Makefile.i386
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: open->feedback
State-Changed-By: maxv@NetBSD.org
State-Changed-When: Fri, 23 Feb 2018 09:16:51 +0000
State-Changed-Why:
Can you re-test with the fix I committed?
State-Changed-From-To: feedback->open
State-Changed-By: maxv@NetBSD.org
State-Changed-When: Fri, 23 Feb 2018 18:12:18 +0000
State-Changed-Why:
back to open
From: Andreas Gustafsson <gson@gson.org>
To: maxv@NetBSD.org
Cc: gnats-bugs@NetBSD.org
Subject: Re: port-i386/52560 (gdb kernel backtrace fails to show function where trap occurred)
Date: Sat, 24 Feb 2018 11:18:39 +0200
maxv@NetBSD.org wrote:
> Can you re-test with the fix I committed?
I see that the change was reverted, but since I had already started a
test run of source date 2018.02.23.09.00.56, I might as well report
the results. The change you committed did not fix the problem; the
function where the trap occurred was still not identified:
(gdb) bt
#0 maybe_dump (howto=260) at /usr/src/sys/arch/i386/i386/machdep.c:746
#1 0xc011c48d in cpu_reboot (howto=260, bootstr=0x0)
at /usr/src/sys/arch/i386/i386/machdep.c:765
#2 0xc0c02a5a in vpanic (fmt=0xc10b0678 "trap",
ap=0xceae8d78 "\020\216\256\316\020\216\256\316\002")
at /usr/src/sys/kern/subr_prf.c:342
#3 0xc0c0288c in panic (fmt=0xc10b0678 "trap")
at /usr/src/sys/kern/subr_prf.c:258
#4 0xc011fdd8 in trap (frame=0xceae8e10)
at /usr/src/sys/arch/i386/i386/trap.c:323
#5 0xc0114492 in alltraps ()
#6 0xceae8e10 in ?? ()
#7 0xc0168313 in sy_call (sy=0xc16cda40 <sysent+4160>, l=0xc22cd540,
uap=0xceae8f74, rval=0xceae8f6c) at /usr/src/sys/sys/syscallvar.h:65
#8 0xc01683e3 in sy_invoke (sy=0xc16cda40 <sysent+4160>, l=0xc22cd540,
uap=0xceae8f74, rval=0xceae8f6c, code=208)
at /usr/src/sys/sys/syscallvar.h:94
#9 0xc016868a in syscall (frame=0xceae8fa8)
at /usr/src/sys/arch/x86/x86/syscall.c:140
#10 0xc01006a9 in Xsyscall ()
(gdb)
In the commit message, you wrote:
> Otherwise DDB is unable to display a correct stack trace
> if a fault occurred in a function before the frame was pushed.
DDB _is_ able to display a correct stack trace, or at least it was at
the time when the PR was filed, as shown at the beginning of the PR.
The problem only affects gdb.
--
Andreas Gustafsson, gson@gson.org
From: Maxime Villard <max@m00nbsd.net>
To: Andreas Gustafsson <gson@gson.org>
Cc: gnats-bugs@NetBSD.org
Subject: Re: port-i386/52560 (gdb kernel backtrace fails to show function
where trap occurred)
Date: Sat, 24 Feb 2018 11:43:30 +0100
On 24/02/2018 at 10:18, Andreas Gustafsson wrote:
> maxv@NetBSD.org wrote:
>> Can you re-test with the fix I committed?
>
> In the commit message, you wrote:
>> Otherwise DDB is unable to display a correct stack trace
>> if a fault occurred in a function before the frame was pushed.
>
> DDB _is_ able to display a correct stack trace, or at least it was at
> the time when the PR was filed, as shown at the beginning of the PR.
> The problem only affects gdb.
In fact, your problem is that the _current_ function is not displayed in
GDB, whereas my patch fixed cases where DDB could not display the _previous_
function. I still put the PR in feedback because I wanted to know whether
it would somehow fix the issue you were seeing.
Maxime
>Unformatted: