NetBSD Problem Report #8939

Received: (qmail 13560 invoked from network); 3 Dec 1999 09:22:52 -0000
Message-Id: <199912030922.KAA00226@natteravn.runit.sintef.no>
Date: Fri, 3 Dec 1999 10:22:21 +0100 (MET)
From: jarle@runit.no
Reply-To: jarle@runit.no
To: gnats-bugs@gnats.netbsd.org
Subject: unexpected machine check
X-Send-Pr-Version: 3.95

>Number:         8939
>Category:       port-alpha
>Synopsis:       unexpected machine check
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    port-alpha-maintainer
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri Dec 03 01:24:01 +0000 1999
>Closed-Date:    
>Last-Modified:  Tue Nov 09 11:28:24 +0000 2004
>Originator:     Jarle Greipsland
>Release:        1999-11-27
>Organization:

>Environment:

System: NetBSD honey.runit.sintef.no 1.4P NetBSD 1.4P (HONEY) #14: Thu Dec  2 22:03:19 CET 1999     jarle@honey.runit.sintef.no:/usr/src/sys/arch/alpha/compile/HONEY alpha


>Description:
While running a couple of crashme processes, the following suddenly appeared
on the console:

unexpected machine check:

    mces    = 0x1
    vector  = 0x670
    param   = 0xfffffc0000006060
    pc      = 0xfffffc00003004c8
    ra      = 0x120001ac8
    curproc = 0xfffffc00054f1558
        pid = 332, comm = crashme

panic: machine check
Stopped in crashme at   cpu_Debugger+0x4:       ret     zero,(ra)
db> trace
cpu_Debugger() at cpu_Debugger+0x4
panic() at panic+0xe4
machine_check() at machine_check+0x1fc
interrupt() at interrupt+0x134
XentInt() at XentInt+0x1c
--- interrupt (from ipl 0) ---
XentArith() at XentArith
--- arithmetic trap ---
*ABS*() at 0
db> ps
 PID             PPID       PGRP        UID S   FLAGS          COMMAND    WAIT
>How-To-Repeat:
Run a couple of crashme processes ...

>Fix:


No idea.  However, the problem seems reproducible, so I may be able to
provide additional information in case someone tells me what they want to
know and how to get at it.
						-jarle
-- 
"Arithmetic is being able to count up to twenty without taking off your
 shoes."			-- Mickey Mouse
>Release-Note:
>Audit-Trail:

From: nathanw@MIT.EDU (Nathan J. Williams)
To: jarle@runit.no
Cc: gnats-bugs@gnats.netbsd.org
Subject: port-alpha/8939
Date: 03 Dec 1999 11:55:18 -0500

 jarle@runit.no writes:

 >     mces    = 0x1
 >     vector  = 0x670
 >     param   = 0xfffffc0000006060
 >     pc      = 0xfffffc00003004c8
 >     ra      = 0x120001ac8
 >     curproc = 0xfffffc00054f1558
 >         pid = 332, comm = crashme
 > 
 > panic: machine check
 > Stopped in crashme at   cpu_Debugger+0x4:       ret     zero,(ra)
 > db> trace

 > No idea.  However, the problem seems reproducible, so I may be able to
 > provide additional information in case someone tells me what they want to
 > know and how to get at it.

 If you can do "examine" of the printed value of param plus 0x10 (in
 the case above, "examine 0xfffffc0000006070"), it will print out the
 machine check code, which indicates why it triggered the machine
 check. This will help debug it.

         - Nathan
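
The address arithmetic described above can be sketched in a few lines of
Python. This is only an illustration: the 0x10 offset is taken from the
message above and is assumed to be fixed, and the function names
(mcheck_code_address, ddb_examine_command) are hypothetical helpers, not part
of NetBSD or ddb.

```python
# Offset from the machine check "param" value to the machine check code.
# Taken from the message above; treated here as a fixed assumption.
MCHECK_CODE_OFFSET = 0x10

# Alpha addresses are 64 bits wide, so keep the result in that range.
ADDRESS_MASK = 0xFFFFFFFFFFFFFFFF


def mcheck_code_address(param: int) -> int:
    """Return the address holding the machine check code for a given param."""
    return (param + MCHECK_CODE_OFFSET) & ADDRESS_MASK


def ddb_examine_command(param: int) -> str:
    """Build the ddb command line that would display the machine check code."""
    return f"examine {mcheck_code_address(param):#x}"


if __name__ == "__main__":
    # param value as printed in the panic message of this report
    print(ddb_examine_command(0xFFFFFC0000006060))
```

With the param value from this report, the helper yields the same command
quoted in the message: examine 0xfffffc0000006070.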

From: Jarle Greipsland <jarle@runit.sintef.no>
To: nathanw@MIT.EDU
Cc: gnats-bugs@gnats.netbsd.org
Subject: Re: port-alpha/8939
Date: Fri, 03 Dec 1999 18:22:30 +0100

 Nathan J Williams writes:
 > If you can do "examine" of the printed value of param plus 0x10 (in
 > the case above, "examine 0xfffffc0000006070"), it will print out the
 > machine check code, which indicates why it triggered the machine
 > check. This will help debug it.

 Here's the output:

 unexpected machine check:

     mces    = 0x1
     vector  = 0x670
     param   = 0xfffffc0000006060
     pc      = 0xfffffc00003004c8
     ra      = 0x120001ac8
     curproc = 0xfffffc0004d4a030
         pid = 2533, comm = crashme

 panic: machine check
 Stopped in crashme at   cpu_Debugger+0x4:       ret     zero,(ra)
 db> trace
 cpu_Debugger() at cpu_Debugger+0x4
 panic() at panic+0xe4
 machine_check() at machine_check+0x1fc
 interrupt() at interrupt+0x134
 XentInt() at XentInt+0x1c
 --- interrupt (from ipl 0) ---
 XentArith() at XentArith
 --- arithmetic trap
 fatal kernel trap:

     trap entry = 0x2 (memory management fault)
     a0         = 0xfffffe00044de0a8
     a1         = 0x1
     a2         = 0x0
     pc         = 0xfffffc00004f6e48
     ra         = 0xfffffc00004f6e2c
     curproc    = 0xfffffc0004d4a030
         pid = 2533, comm = crashme

 Caught exception in ddb.
 db> examine 0xfffffc0000006070
 0xfffffc0000006070:     8e
 db> 

 Do you need more information?
 					-jarle


From: "R.o.s.s  H.a.r.v.e.y" <ross@ghs.com>
To: jarle@runit.no, nathanw@mit.edu, port-alpha-maintainer@netbsd.org
Cc: gnats-bugs@gnats.netbsd.org, jarle@runit.sintef.no
Subject: re: port-alpha/8939
Date: Wed, 23 Jan 2002 14:36:56 -0800 (PST)

 First, I have to apologize to Jarle for doubting the SW explanation
 at first.

 I believe I traced this problem to a PALcode bug. (I wasn't _totally_
 wrong before; the bug does appear to be due to the PALcode incorrectly
 dealing with the fact that the arithmetic trap has a higher hardware
 priority than any other synchronous exception.)

 Anyway, I put a workaround into the kernel in mid-2001. Can you try
 and confirm that this is finally fixed? 

 (I really mean it this time! :-)

         r.o.s.s

From: "R.o.s.s  H.a.r.v.e.y" <ross@ghs.com>
To: jarle@runit.no, nathanw@mit.edu, port-alpha-maintainer@netbsd.org,
   ross@sigmet.ghs.com
Cc: gnats-bugs@gnats.netbsd.org, jarle@runit.sintef.no
Subject: re: port-alpha/8939
Date: Wed, 23 Jan 2002 15:36:35 -0800 (PST)

 Argh. Don't bother doing any tests just yet. While I know I fixed one
 crashme problem, it looks like there is another one. I just crashed my
 alpha at home remotely ... I won't know exactly what the story is until
 I get back there.

 	r.o.s.s
Responsible-Changed-From-To: port-alpha-maintainer (NetBSD/alpha Portmaster)->port-alpha-maintainer
Responsible-Changed-By: soren@narn.netbsd.org
Responsible-Changed-When: Tue, 09 Nov 2004 11:28:24 +0000
Responsible-Changed-Why:
Removed extraneous text in field which confused the gnats-html script.


>Unformatted:
 >332              285        285        666 2  0x4006          crashme
  328              243        243        666 2  0x4006          crashme
  291              242        291          0 3  0x4086              csh   ttyin
  290              233        290        666 3  0x4186              top  select
  285              233        285        666 3  0x4086          crashme    wait
  243              233        243        666 3  0x4086          crashme    wait
  242              241        242      16073 3  0x4086              csh   pause
  241              238        238      16073 3  0x4184            xterm  select
  238              236        238      16073 3  0x4084              csh   pause
  236              215        215          0 3    0x84            sshd1  select
  233              229        233        666 3  0x4086              csh   pause
  229              224        229          0 3  0x4086              csh   pause
  224              223        224      16073 3  0x4086             bash    wait
  223              220        220      16073 3  0x4184            xterm  select
  220              218        220      16073 3  0x4084              csh   pause
  218              215        215          0 3    0x84            sshd1  select
  217                1        217          0 3  0x4086              csh   ttyin
  215                1        215          0 3    0x84            sshd1  select
  203                1        203          0 3    0x84            inetd  select
  200                1        200          0 3    0x84            xntpd   pause
  198                1        198          0 3   0x184             cron nanosle
  168              167        167          0 3    0x84     lfs_cleanerd segment
  167                1        167          0 3    0x84     lfs_cleanerd    wait
  164                1        164          0 3    0x84        mount_mfs  mfsidl
  158                1        158          0 3    0x84          portmap  select
  151                1        151          0 2    0x84          syslogd
  4                  0          0          0 3 0x20204          ioflush  syncer
  3                  0          0          0 3 0x20204           reaper  reaper
  2                  0          0          0 3 0x20204       pagedaemon daemon_
  1                  0          1          0 3  0x4084             init    wait
  0                 -1          0          0 3 0x20204          swapper schedul

 The system in question reports at the top of the dmesg output:

 Digital AlphaPC 164 500 MHz
 8192 byte page size, 1 processor.
 total memory = 128 MB
 (2472 KB reserved for PROM, 125 MB used by NetBSD)
 avail memory = 113 MB
 using 816 buffers containing 6528 KB of memory
 mainbus0 (root)
 cpu0 at mainbus0: ID 0 (primary), 21164A-2 (pass 2)

Copyright © 1994-2007 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.