NetBSD Problem Report #8939
Received: (qmail 13560 invoked from network); 3 Dec 1999 09:22:52 -0000
Message-Id: <199912030922.KAA00226@natteravn.runit.sintef.no>
Date: Fri, 3 Dec 1999 10:22:21 +0100 (MET)
From: jarle@runit.no
Reply-To: jarle@runit.no
To: gnats-bugs@gnats.netbsd.org
Subject: unexpected machine check
X-Send-Pr-Version: 3.95
>Number: 8939
>Category: port-alpha
>Synopsis: unexpected machine check
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: port-alpha-maintainer
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Fri Dec 03 01:24:01 +0000 1999
>Closed-Date:
>Last-Modified: Tue Nov 09 11:28:24 +0000 2004
>Originator: Jarle Greipsland
>Release: 1999-11-27
>Organization:
>Environment:
System: NetBSD honey.runit.sintef.no 1.4P NetBSD 1.4P (HONEY) #14: Thu Dec 2 22:03:19 CET 1999 jarle@honey.runit.sintef.no:/usr/src/sys/arch/alpha/compile/HONEY alpha
>Description:
While running a couple of crashme processes the following suddenly appeared
on the console:
unexpected machine check:
mces = 0x1
vector = 0x670
param = 0xfffffc0000006060
pc = 0xfffffc00003004c8
ra = 0x120001ac8
curproc = 0xfffffc00054f1558
pid = 332, comm = crashme
panic: machine check
Stopped in crashme at cpu_Debugger+0x4: ret zero,(ra)
db> trace
cpu_Debugger() at cpu_Debugger+0x4
panic() at panic+0xe4
machine_check() at machine_check+0x1fc
interrupt() at interrupt+0x134
XentInt() at XentInt+0x1c
--- interrupt (from ipl 0) ---
XentArith() at XentArith
--- arithmetic trap ---
*ABS*() at 0
db> ps
PID PPID PGRP UID S FLAGS COMMAND WAIT
>How-To-Repeat:
Run a couple of crashme processes ...
>Fix:
No idea. However, the problem seems reproducible, so I may be able to
provide additional information in case someone tells me what they want to
know and how to get at it.
-jarle
--
"Arithmetic is being able to count up to twenty without taking off your
shoes." -- Mickey Mouse
>Release-Note:
>Audit-Trail:
From: nathanw@MIT.EDU (Nathan J. Williams)
To: jarle@runit.no
Cc: gnats-bugs@gnats.netbsd.org
Subject: port-alpha/8939
Date: 03 Dec 1999 11:55:18 -0500
jarle@runit.no writes:
> mces = 0x1
> vector = 0x670
> param = 0xfffffc0000006060
> pc = 0xfffffc00003004c8
> ra = 0x120001ac8
> curproc = 0xfffffc00054f1558
> pid = 332, comm = crashme
>
> panic: machine check
> Stopped in crashme at cpu_Debugger+0x4: ret zero,(ra)
> db> trace
> No idea. However, the problem seems reproducible, so I may be able to
> provide additional information in case someone tells me what they want to
> know and how to get at it.
If you can do "examine" of the printed value of param plus 0x10 (in
the case above, "examine 0xfffffc0000006070"), it will print out the
machine check code, which indicates why it triggered the machine
check. This will help debug it.
- Nathan
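
For reference, the address arithmetic described above can be sketched in a few lines of Python. This is only an illustration: the 0x10 offset and the sample param value are taken from this report, while the names MCHK_CODE_OFFSET and mchk_code_address are hypothetical and do not come from the NetBSD sources.

```python
# Hypothetical helper: compute the address to pass to ddb's "examine"
# command, given the "param" value printed by the machine check handler.
# The 0x10 offset is the one quoted in the message above; whether it holds
# for other machine check logout formats is not established here.
MCHK_CODE_OFFSET = 0x10


def mchk_code_address(param: int) -> int:
    """Return param plus the offset of the machine check code."""
    return param + MCHK_CODE_OFFSET


if __name__ == "__main__":
    param = 0xFFFFFC0000006060  # value printed in this report
    print("examine", hex(mchk_code_address(param)))
    # prints: examine 0xfffffc0000006070
```

The printed address matches the "examine 0xfffffc0000006070" command suggested above.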
From: Jarle Greipsland <jarle@runit.sintef.no>
To: nathanw@MIT.EDU
Cc: gnats-bugs@gnats.netbsd.org
Subject: Re: port-alpha/8939
Date: Fri, 03 Dec 1999 18:22:30 +0100
Nathan J Williams writes:
> If you can do "examine" of the printed value of param plus 0x10 (in
> the case above, "examine 0xfffffc0000006070"), it will print out the
> machine check code, which indicates why it triggered the machine
> check. This will help debug it.
Here's the output:
unexpected machine check:
mces = 0x1
vector = 0x670
param = 0xfffffc0000006060
pc = 0xfffffc00003004c8
ra = 0x120001ac8
curproc = 0xfffffc0004d4a030
pid = 2533, comm = crashme
panic: machine check
Stopped in crashme at cpu_Debugger+0x4: ret zero,(ra)
db> trace
cpu_Debugger() at cpu_Debugger+0x4
panic() at panic+0xe4
machine_check() at machine_check+0x1fc
interrupt() at interrupt+0x134
XentInt() at XentInt+0x1c
--- interrupt (from ipl 0) ---
XentArith() at XentArith
--- arithmetic trap
fatal kernel trap:
trap entry = 0x2 (memory management fault)
a0 = 0xfffffe00044de0a8
a1 = 0x1
a2 = 0x0
pc = 0xfffffc00004f6e48
ra = 0xfffffc00004f6e2c
curproc = 0xfffffc0004d4a030
pid = 2533, comm = crashme
Caught exception in ddb.
db> examine 0xfffffc0000006070
0xfffffc0000006070: 8e
db>
Do you need more information?
-jarle
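
To compare the two machine check dumps in this report, a small parser along the following lines may help. It is a rough sketch, assuming the "name = 0xvalue" layout shown on the console; the function names parse_mchk_dump and common_fields are hypothetical, and the sample text is copied from the two dumps above.

```python
import re

# Matches console lines of the form "name = 0xHEX", as printed in this report.
FIELD_RE = re.compile(r"^\s*(\w+)\s*=\s*(0x[0-9a-fA-F]+)\s*$")


def parse_mchk_dump(text):
    """Collect the hexadecimal fields of a machine check dump into a dict."""
    fields = {}
    for line in text.splitlines():
        match = FIELD_RE.match(line)
        if match:
            fields[match.group(1)] = int(match.group(2), 16)
    return fields


def common_fields(a, b):
    """Return the sorted names of fields with identical values in both dumps."""
    return sorted(k for k in a if k in b and a[k] == b[k])


FIRST = """\
mces = 0x1
vector = 0x670
param = 0xfffffc0000006060
pc = 0xfffffc00003004c8
ra = 0x120001ac8
curproc = 0xfffffc00054f1558
"""

SECOND = """\
mces = 0x1
vector = 0x670
param = 0xfffffc0000006060
pc = 0xfffffc00003004c8
ra = 0x120001ac8
curproc = 0xfffffc0004d4a030
"""

if __name__ == "__main__":
    print(common_fields(parse_mchk_dump(FIRST), parse_mchk_dump(SECOND)))
    # prints: ['mces', 'param', 'pc', 'ra', 'vector']
```

In both dumps everything except curproc (and the pid) is identical, which fits the earlier observation that the problem seems reproducible.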
From: "R.o.s.s H.a.r.v.e.y" <ross@ghs.com>
To: jarle@runit.no, nathanw@mit.edu, port-alpha-maintainer@netbsd.org
Cc: gnats-bugs@gnats.netbsd.org, jarle@runit.sintef.no
Subject: re: port-alpha/8939
Date: Wed, 23 Jan 2002 14:36:56 -0800 (PST)
First, I have to apologize to Jarle for doubting the SW explanation
at first.
I believe I traced this problem to a PALcode bug. (I wasn't _totally_
wrong before; the bug does appear to be due to the PALcode incorrectly
dealing with the fact that the arithmetic trap has a higher hardware
priority than any other synchronous exception.)
Anyway, I put a workaround into the kernel in mid-2001. Can you try
and confirm that this is finally fixed?
(I really mean it this time! :-)
r.o.s.s
From: "R.o.s.s H.a.r.v.e.y" <ross@ghs.com>
To: jarle@runit.no, nathanw@mit.edu, port-alpha-maintainer@netbsd.org,
ross@sigmet.ghs.com
Cc: gnats-bugs@gnats.netbsd.org, jarle@runit.sintef.no
Subject: re: port-alpha/8939
Date: Wed, 23 Jan 2002 15:36:35 -0800 (PST)
Argh. Don't bother doing any tests just yet. While I know I fixed one
crashme problem, it looks like there is another one. I just crashed my
alpha at home remotely ... I won't know exactly what the story is until
I get back there.
r.o.s.s
Responsible-Changed-From-To: port-alpha-maintainer (NetBSD/alpha Portmaster)->port-alpha-maintainer
Responsible-Changed-By: soren@narn.netbsd.org
Responsible-Changed-When: Tue, 09 Nov 2004 11:28:24 +0000
Responsible-Changed-Why:
Removed extraneous text in field which confused the gnats-html script.
>Unformatted:
>332 285 285 666 2 0x4006 crashme
328 243 243 666 2 0x4006 crashme
291 242 291 0 3 0x4086 csh ttyin
290 233 290 666 3 0x4186 top select
285 233 285 666 3 0x4086 crashme wait
243 233 243 666 3 0x4086 crashme wait
242 241 242 16073 3 0x4086 csh pause
241 238 238 16073 3 0x4184 xterm select
238 236 238 16073 3 0x4084 csh pause
236 215 215 0 3 0x84 sshd1 select
233 229 233 666 3 0x4086 csh pause
229 224 229 0 3 0x4086 csh pause
224 223 224 16073 3 0x4086 bash wait
223 220 220 16073 3 0x4184 xterm select
220 218 220 16073 3 0x4084 csh pause
218 215 215 0 3 0x84 sshd1 select
217 1 217 0 3 0x4086 csh ttyin
215 1 215 0 3 0x84 sshd1 select
203 1 203 0 3 0x84 inetd select
200 1 200 0 3 0x84 xntpd pause
198 1 198 0 3 0x184 cron nanosle
168 167 167 0 3 0x84 lfs_cleanerd segment
167 1 167 0 3 0x84 lfs_cleanerd wait
164 1 164 0 3 0x84 mount_mfs mfsidl
158 1 158 0 3 0x84 portmap select
151 1 151 0 2 0x84 syslogd
4 0 0 0 3 0x20204 ioflush syncer
3 0 0 0 3 0x20204 reaper reaper
2 0 0 0 3 0x20204 pagedaemon daemon_
1 0 1 0 3 0x4084 init wait
0 -1 0 0 3 0x20204 swapper schedul
The system in question reports at the top of the dmesg output:
Digital AlphaPC 164 500 MHz
8192 byte page size, 1 processor.
total memory = 128 MB
(2472 KB reserved for PROM, 125 MB used by NetBSD)
avail memory = 113 MB
using 816 buffers containing 6528 KB of memory
mainbus0 (root)
cpu0 at mainbus0: ID 0 (primary), 21164A-2 (pass 2)