NetBSD Problem Report #41591

From root@raeburn.org  Sat Jun 13 20:07:37 2009
Return-Path: <root@raeburn.org>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
	by www.NetBSD.org (Postfix) with ESMTP id EAED363B8B4
	for <gnats-bugs@gnats.NetBSD.org>; Sat, 13 Jun 2009 20:07:36 +0000 (UTC)
Message-Id: <200906132007.n5DK7VdG001315@raeburn.org>
Date: Sat, 13 Jun 2009 16:07:31 -0400 (EDT)
From: raeburn@raeburn.org
Reply-To: raeburn@raeburn.org
To: gnats-bugs@gnats.NetBSD.org
Cc: raeburn@raeburn.org
Subject: nested-panic loop, no reboot
X-Send-Pr-Version: 3.95

>Number:         41591
>Category:       kern
>Synopsis:       nested-panic loop, no reboot
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Jun 13 20:10:00 +0000 2009
>Originator:     Ken Raeburn
>Release:        NetBSD 5.0
>Organization:
>Environment:
System: NetBSD raeburn.org 5.0 NetBSD 5.0 (GENERIC) #0: Sun Apr 26 18:50:08 UTC 2009 builds@b6.netbsd.org:/home/builds/ab/netbsd-5-0-RELEASE/i386/200904260229Z-obj/home/builds/ab/netbsd-5-0-RELEASE/src/sys/arch/i386/compile/GENERIC i386
Architecture: i386
Machine: i386
>Description:

While handling a panic (probably a copy_mbuf one like the one I just
reported), the kernel faults while trying to write out a crash dump,
panics again, and tries to write out a crash dump again. This repeats
until I hit a reset button (which this system doesn't appear to have)
or power-cycle it (so no kernel messages remain in memory).

When I got to the console, it was filled with repetitions of:

trap type 6 code 2 eip c05400b4 cs 8 eflags 10246 cr2 cd12e600 ilevel 8
panic: trap
Faulted in mid-traceback: aborting
dumping to dev 0,1 offset 8
dump fatal page fault in supervisor mode
trap type 6 code 2 eip c05400b4 cs 8 eflags 10246 cr2 cd12e600 ilevel 8
[...]

where c05400b4 is:

0xc054008f <dodumpsys+719>:     call   0xc052d7b0 <pmap_extract>
0xc0540094 <dodumpsys+724>:     test   %al,%al
0xc0540096 <dodumpsys+726>:     je     0xc05400b6 <dodumpsys+758>
0xc0540098 <dodumpsys+728>:     mov    0xfffffff0(%ebp),%ecx
0xc054009b <dodumpsys+731>:     mov    0xc0b15060,%eax
0xc05400a0 <dodumpsys+736>:     mov    %ecx,%edx
0xc05400a2 <dodumpsys+738>:     shr    $0xf,%edx
0xc05400a5 <dodumpsys+741>:     shr    $0xc,%ecx
0xc05400a8 <dodumpsys+744>:     add    %eax,%edx
0xc05400aa <dodumpsys+746>:     and    $0x7,%ecx
0xc05400ad <dodumpsys+749>:     mov    $0x1,%eax
0xc05400b2 <dodumpsys+754>:     shl    %cl,%eax
0xc05400b4 <dodumpsys+756>:     or     %al,(%edx)         **********
0xc05400b6 <dodumpsys+758>:     add    $0x1000,%ebx
0xc05400bc <dodumpsys+764>:     jne    0xc0540080 <dodumpsys+704>
0xc05400be <dodumpsys+766>:     jmp    0xc053feb3 <dodumpsys+243>

so I'm guessing it's in the setbit call in the loop in
sparse_dump_mark, the only place in dumpsys.c where I see a call to
pmap_extract; either sparse_dump_physmap is a bad pointer or
p/PAGE_SIZE is out of range.
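
For reference, the arithmetic in the disassembly above (shr $0xf for the
byte index, shr $0xc / and $0x7 / shl %cl for the bit mask, then the or
into the bitmap) can be sketched in plain C as below. This is only an
illustration of the setbit-style indexing, not the dumpsys.c code: the
names bitmap_byte, bitmap_mask and mark_page_checked are hypothetical,
4 KiB pages are assumed, and the bounds check is an addition showing one
way the out-of-range case could be caught instead of faulting.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12	/* assumption: 4 KiB pages, as on i386 */

/* Byte index into the bitmap: (pa >> 12) / 8 == pa >> 15 ("shr $0xf"). */
static size_t
bitmap_byte(uint32_t pa)
{
	return (size_t)(pa >> (PAGE_SHIFT + 3));
}

/* Bit within that byte: 1 << ((pa >> 12) & 7) ("shr $0xc; and $0x7; shl"). */
static uint8_t
bitmap_mask(uint32_t pa)
{
	return (uint8_t)(1u << ((pa >> PAGE_SHIFT) & 7u));
}

/*
 * Hypothetical checked variant of the marking step: refuse to write if
 * the bitmap pointer is NULL or the page index lies beyond the bitmap,
 * rather than doing the unchecked "or %al,(%edx)".
 */
static bool
mark_page_checked(uint8_t *bitmap, size_t bitmap_len, uint32_t pa)
{
	size_t idx = bitmap_byte(pa);

	if (bitmap == NULL || idx >= bitmap_len)
		return false;
	bitmap[idx] |= bitmap_mask(pa);
	return true;
}
```

With a 4-byte bitmap, marking physical address 0x9000 sets bit 1 of
byte 1, while 0x20000 maps to byte 4 and is rejected.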

This obviously makes 5.0 worse for my router than 4.0.1, which just
rebooted on panic. :-(

>How-To-Repeat:
	?
>Fix:
	Once a dump has been started, dumping should be disabled for
	any further panic calls.
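
	A minimal sketch of such a guard, in standalone C rather than
	kernel code. The names dump_attempted, dump_once and
	do_dump_stub are hypothetical and do not come from the NetBSD
	sources; the point is only that the flag is set before the
	dump starts, so a fault inside the dump finds it already set
	on the nested panic.

```c
#include <stdbool.h>

/* Hypothetical flag: set once any dump attempt has begun. */
static bool dump_attempted;

/* Counts calls to the stand-in dump routine (for illustration only). */
static int dump_calls;

/* Stand-in for the real dump routine. */
static void
do_dump_stub(void)
{
	dump_calls++;
}

/*
 * Run do_dump at most once. Returns true if a dump was attempted on
 * this call, false if an earlier call had already started one.
 */
static bool
dump_once(void (*do_dump)(void))
{
	if (dump_attempted)
		return false;
	/* Set the flag BEFORE dumping, so a nested panic skips the dump. */
	dump_attempted = true;
	do_dump();
	return true;
}
```

	The first call runs the dump; any later call, including one
	made from a panic raised inside the dump itself, returns
	without dumping and can proceed to the reboot.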
