NetBSD Problem Report #6799
Received: (qmail 22702 invoked from network); 13 Jan 1999 00:27:17 -0000
Message-Id: <199901122358.PAA00476@jules.nas.nasa.gov>
Date: Tue, 12 Jan 1999 15:58:09 -0800 (PST)
From: Matthew Jacob <mjacob@nas.nasa.gov>
Reply-To: mjacob@netbsd.org
To: gnats-bugs@gnats.netbsd.org
Subject: unexpected reboot with 'kernel stack not valid'
X-Send-Pr-Version: 3.95
>Number: 6799
>Category: port-alpha
>Synopsis: under light load Alpha 8200 panics with 'kernel stack not valid'
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: thorpej
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Tue Jan 12 16:35:01 +0000 1999
>Closed-Date:
>Last-Modified: Tue Oct 13 15:51:18 +0000 2020
>Originator:
>Release: source cvs updated as of Jan-12-1999, 1000PST.
>Organization:
NASA Ames Research Center
>Environment:
System: NetBSD jules.nas.nasa.gov 1.3I NetBSD 1.3I (JULES) #0: Tue Jan 12 10:36:40 PST 1999 mjacob@mathom.nas.nasa.gov:/space/NetBSD-current/src/sys/arch/alpha/compile/JULES alpha
>Description:
This has been around for a while but it's time to get serious about it.
Under moderately light load (building src), the kernel reboots with:
halted CPU 8

halt code = 2
kernel stack not valid halt
PC = fffffc00004eadac
For this kernel, this PC is 0xc bytes past the start of uvm_fault
(fffffc00004eada0 T uvm_fault).
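The distance between the halt PC and the start of uvm_fault can be checked
with a small subtraction. This is only a sketch: the function name
sym_offset is hypothetical, and it assumes both addresses share the
fffffc00 upper half (true for the two values quoted above), so only the
low 32 bits are subtracted.

```sh
#!/bin/sh
# Sketch: offset of a halt PC from the start of a kernel symbol.
# Assumption: both addresses begin with fffffc00, so the prefix is
# stripped and only the low 32 bits are subtracted. This avoids relying
# on how a given shell treats constants above 2^63.
sym_offset() {
	pc_lo=${1#fffffc00}
	sym_lo=${2#fffffc00}
	printf '0x%x\n' $(( 0x$pc_lo - 0x$sym_lo ))
}

# Halt PC and the uvm_fault symbol address from this report.
sym_offset fffffc00004eadac fffffc00004eada0
```

For the values in this report the result is 0xc, i.e. the halt happens
within the first few instructions of uvm_fault.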
The config file (JULES) is actually just a copy of the ALPHA config file.
This particular machine has 2GB primary memory and ~2GB swap. I tried
to repeat the build with mfs on /tmp not mounted, but got the same
panic.
>How-To-Repeat:
Run a build on the NetBSD /usr/src with the following script:
#!/bin/sh
#
# A Script to do a nightly build of the NetBSD source tree (without
# trashing the running system)
#
set -a
# The location of your source tree.
BSDSRCDIR=${BSDSRCDIR-/usr/src}
# The location of the object files produced by the build.
BSDOBJDIR=${BSDOBJDIR-/usr/obj}
# For the initial build, which doesn't include those crypto
# files which may not be exported from the US and Canada.
#EXPORTABLE_SYSTEM=1
# These two aren't really necessary; they just make life
# easier if/when you rebuild later.
BUILD=1
UPDATE=1
# The following variables must be set in the environment;
# /etc/mk.conf will not do!
# Where the installed files go.
DESTDIR=${DESTDIR-/proto}
# Where the .tgz files built for the release go.
RELEASEDIR=${RELEASEDIR-/release}
#
# Set PATH and LD_LIBRARY_PATH to use the built items when possible.
# Strictly speaking this should all be done twice.
#
DD=${DESTDIR}
PATH=${DD}/sbin:${DD}/usr/sbin:${DD}/usr/local/bin:${DD}/bin
PATH=$PATH:${DD}/usr/bin:${DD}/usr/local/sbin:/sbin:/usr/sbin
PATH=$PATH:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin
LD_LIBRARY_PATH=${DD}/usr/lib:${LD_LIBRARY_PATH}
cd ${BSDSRCDIR} && make obj && make build
>Fix:
>Release-Note:
>Audit-Trail:
From: Matthew Jacob <mjacob@feral.com>
To: port-alpha@netbsd.org
Cc: gnats-bugs@netbsd.org
Subject: port-alpha/6799
Date: Wed, 3 Feb 1999 11:02:38 -0800 (PST)
Possibly related to this, but possibly not, here's a double fault
into the PROM. The PC in question is prom_enter. It makes me wonder
whether the switch to UVM causes problems when re-entering the PROM
code, which we have to do because we haven't finished the console UART
support (zsc at gbus...).
CPU 8 halted
halt code = 6
double error halt
PC = fffffc00005aa7f0 <<<<< prom_enter
Haltcode 6 Double Machine Check
03022636 WATCH$: 02-03-99 02:38:54
00006302
00000200 Frame Size
00000000 Flag bits
00000118 CPU Area Offset
000001a0 System Area Offset
0000fff0 MCheck Reason Mask
00000001 MCheck Frame Rev
EV5 IPRs:
exc_addr: fffffc00 005aa7f0 exc_sum: 00000000 00000000
exc_mask: 00000000 00000000 isr: 00000000 00100000
icsr: 00000061 60000000 icpe_stat: 00000000 00002000
dcpe_stat: 00000000 00000000 va: fffffe00 07533fb8
mm_stat: 00000000 00016e91 sc_addr: ffffff00 0001d28f
sc_stat: 00000000 00000000 bc_tag_addr: ffffff80 00cfcfff
ei_addr: ffffff00 0011575f ei_stat: fffffff0 04ffffff
fill_syn: 00000000 00009000 ld_lock: ffffff00 01783a7f
pal_base: 00000000 00018000 sys_ipr1: 00000000 00510008
TLEP CSRs:
tldev: 51008011 tlber: 00800490
tlcnr: 00000140 tlvid: 00000098
tlesr0: 00400303 tlesr1: 00400c0c
tlesr2: 00406060 tlesr3: 00409090
tlepaerr: 00040200 tlepderr: 00000000
tlepmerr: 00000000 tlvmg: 00000000
tlintrmask0: 000001ff tlintrsum0: 00000802
tlintrmask1: 00000000 tlintrsum1: 00000000
TLSB Node 4
TLDEV 51008011 TLBER 00800490
TLESR0 00400303 TLESR1 00400c0c
TLESR2 00406060 TLESR3 00409090
TLEPAERR 00040200 TLEPDERR 00000000
TLEPMERR 00000000 TLEPWERR0 deadbeef
TLEPWERR1 deadbeef TLEPWERR2 deadbeef
TLEPWERR3 deadbeef
TLSB Node 5
TLDEV 00005000 TLBER 00100000
TLESR0 00000303 TLESR1 00000c0c
TLESR2 00006060 TLESR3 00009090
TLFADR0 00115740 TLFADR1 07850000
TLVID 00000080 TLMIR 80000001
MCR 00000234 MER 00000001
TLSB Node 8
TLDEV 00002020 TLBER 00000000
TLESR0 00000000 TLESR1 00000000
TLESR2 00000000 TLESR3 00000000
ICCNSE 00000000 ICCWTR 00000000
IDPNSE0 00000006 IDPNSE1 00000006
IDPNSE2 00000000 IDPNSE3 00000000
IOP Node 8 Hose 0
PCIERR0 00000000 PCIERR1 00000000
IOP Node 8 Hose 1
PCIERR0 00004001 PCIERR1 00020000 PCIERR2 00000000
CPU 8 has 2 Halt Data Log entries
From: Matthew Jacob <mjacob@nas.nasa.gov>
To: gnats-bugs@netbsd.org
Cc:
Subject: port-alpha/6799
Date: Wed, 25 Aug 1999 16:40:56 -0700 (PDT)
Update on this: it is still happening quite regularly with Release 1.4,
but I've gathered hwrpb info in case that helps. It's been trivial
to reproduce for 9 months now, so the holdup is not a lack of information.
Panic message was:
CPU 8 halted
halt code = 2
kernel stack not valid halt
PC = fffffc000039528c (lockmgr + 0x12)
>>>show hwrpb
HWRPB is at 2000
00002000 hwrpb
0 00000000 00002000 Physical address of base of HWRPB
8 00000042 50525748 Identifying string 'HWRPB'
16 00000000 00000009 HWRPB version number
24 00000000 00002FA8 HWPRB size
32 00000000 00000008 ID of primary processor
40 00000000 00002000 System page size in bytes
48 00000000 00000022 Physical address size in bits
56 00000000 0000007F Maximum ASN value
64 49343030 3237494E System serial number
72 00000000 00004944
80 00000000 0000000C Alpha system type
88 00000000 00001065 system subtype
96 00000000 00000000 System revision
104 00000000 00400000 Interval clock interrupt frequency
112 00000000 1A153F00 Cycle Counter frequency
120 FFFFFFFC 00000000 Virtual page table base
128 00000000 00000000 Reserved for architecture use, SBZ
136 00000000 00000140 Offset to Translation Buffer Hint Block
144 00000000 00000010 Number of processor supported
152 00000000 00000280 Size of Per-CPU Slots in bytes
160 00000000 00000180 Offset to Per-CPU Slots
168 00000000 00000002 Number of CTBs in CTB table
176 00000000 00000160 Size of largest CTB in CTB table
184 00000000 00002980 Offset to Console Terminal Block
192 00000000 00002C40 Offset to Console Routine Block
200 00000000 00002CA0 Offset to Memory Data Descriptors
208 00000000 00000000 Offset to Configuration Data Table
216 00000000 00139AC0 Offset to FRU Table
224 00000000 00000000 Starting VA of SAVE_TERM routine
232 00000000 00000000 Procedure Value of SAVE_TERM routine
240 FFFFFC00 00300FB4 Starting VA of RESTORE_TERM routine
248 00000000 00000001 Procedure Value of RESTORE_TERM routine
256 00000000 00000000 VA of restart routine
264 00000000 00000000 Restart procedure value
272 00000000 00000000 Reserved to System Software
280 00000000 00000000 Reserved to Hardware
288 49342C6E 9D23DD2C Checksum of HWRPB
296 00000000 00000000 RX Ready bitmask
304 00000000 00000000 TX Ready bitmask
312 00000000 00002EE8 Offset to DSRDB
00003580 slot at index 8
FFFFFC00 005FFEF8 KSP
00000000 00000000 ESP
00000000 00000589 SSP
00000000 00000000 USP
00000000 00000000 PTBR
00000000 00000000 ASN
00000000 00000000 ASTEN_SR
00000000 00000000 FEN
00000000 00000000 CC
00000000 00000000 SCRATCH [0]
00000000 00000000 SCRATCH [1]
00000000 00000000 SCRATCH [2]
00000000 00000000 SCRATCH [3]
00000000 00000000 SCRATCH [4]
00000000 00000000 SCRATCH [5]
000001EE 00000000 SCRATCH [6]
0 Boot in progress
1 Restart capable
1 Processor available
1 Processor present
0 Operator halted
1 Context valid
1 Palcode valid
1 Palcode memory valid
1 Palcode loaded
0 Reserved MBZ
0 Halt requested
0 Reserved MBZ
0 Reserved MBZ
00000000 00000000 PAL_MEM_LEN
00000000 00000000 PAL_SCR_LEN
00000000 00000000 PAL_MEM_ADR
00000000 00000000 PAL_SCR_ADR
00100005 00020116 PAL_REV
00000002 00000007 CPU_TYPE
00000000 00000007 CPU_VAR
00000000 00000000 CPU_REV
30353133 32375941 SERIAL_NUM
00000000 00003432 SERIAL_NUM
00000000 00008AB8 PAL_LOGOUT
00000000 00000690 PAL_LOGOUT_LEN
00000000 70474000 HALT_PCBB
FFFFFC00 0039528C HALT_PC
00000000 000004F0 HALT_PS
FFFFFC00 005CC0B0 HALT_ARGLIST
FFFFFC00 00493368 HALT_RETURN
FFFFFC00 00395280 HALT_VALUE
00000000 00000002 HALTCODE
00000000 00000000 RSVD_SW
00000000 RXLEN
00000000 TXLEN
00003800 slot at index 9
00000000 00000000 KSP
00000000 00000000 ESP
00000000 00000000 SSP
00000000 00000000 USP
00000000 00000000 PTBR
00000000 00000000 ASN
00000000 00000000 ASTEN_SR
00000000 00000000 FEN
00000000 00000000 CC
00000000 00000000 SCRATCH [0]
00000000 00000000 SCRATCH [1]
00000000 00000000 SCRATCH [2]
00000000 00000000 SCRATCH [3]
00000000 00000000 SCRATCH [4]
00000000 00000000 SCRATCH [5]
000001CC 00000000 SCRATCH [6]
0 Boot in progress
0 Restart capable
1 Processor available
1 Processor present
0 Operator halted
0 Context valid
1 Palcode valid
1 Palcode memory valid
1 Palcode loaded
0 Reserved MBZ
0 Halt requested
0 Reserved MBZ
0 Reserved MBZ
00000000 00000000 PAL_MEM_LEN
00000000 00000000 PAL_SCR_LEN
00000000 00000000 PAL_MEM_ADR
00000000 00000000 PAL_SCR_ADR
0010000A 00010114 PAL_REV
00000002 00000007 CPU_TYPE
00000000 00000003 CPU_VAR
00000000 00000000 CPU_REV
30353133 32375941 SERIAL_NUM
00000000 00003432 SERIAL_NUM
00000000 00009148 PAL_LOGOUT
00000000 00000690 PAL_LOGOUT_LEN
00000000 00003800 HALT_PCBB
00000000 00000000 HALT_PC
00000000 00001F00 HALT_PS
00000000 00000000 HALT_ARGLIST
00000000 00000000 HALT_RETURN
00000000 00000000 HALT_VALUE
00000000 00000000 HALTCODE
00000000 00000000 RSVD_SW
00000007 RXLEN
00000013 TXLEN
00004980 console terminal block
00000000 00000002 TYPE
00000000 00000000 ID
00000000 00000000 RSVD
00000000 00000060 DEV_DEP_LEN
00000000 F4000000 CSR
00000000 000006C0 TX_SCB_OFFSET
00000000 00000680 RX_SCB_OFFSET
00000000 00002580 BAUD
00000000 00000000 PUTS_STATUS
00000000 00000000 GETC_STATUS
00004C40 console routine block
00000000 10064210 VDISPATCH
00000000 00066210 PDISPATCH
00000000 10064220 VFIXUP
00000000 00066220 PFIXUP
00000000 00000002 ENTRIES
00000000 00000153 PAGES
00000000 10000000 V_ADDRESS
00000000 00002000 P_ADDRESS
00000000 000000FF PAGE_COUNT
00000000 101FE000 V_ADDRESS
00000000 7FF58000 P_ADDRESS
00000000 00000054 PAGE_COUNT
00004CA0 memory descriptor
00000000 90214F62 CHECKSUM
00000000 00000000 IMP_DATA_PA
00000000 00000003 CLUSTER_COUNT
00000000 00000000 START_PFN
00000000 00000100 PFN_COUNT
00000000 00000000 TEST_COUNT
00000000 00000000 BITMAP_VA
00000000 00000000 BITMAP_PA
00000000 00000000 BITMAP_CHKSUM
00000000 00000001 USAGE
0 bad page(s)
00000000 00000100 START_PFN
00000000 0003FEAC PFN_COUNT
00000000 0003FEAC TEST_COUNT
00000000 101FE000 BITMAP_VA
00000000 7FF58000 BITMAP_PA
FFFFFFFF FFFFF005 BITMAP_CHKSUM
00000000 00000000 USAGE
0 bad page(s)
00000000 0003FFAC START_PFN
00000000 00000054 PFN_COUNT
00000000 00000000 TEST_COUNT
00000000 00000000 BITMAP_VA
00000000 00000000 BITMAP_PA
00000000 00000000 BITMAP_CHKSUM
00000000 00000001 USAGE
0 bad page(s)
00004EE8 Dynamic System Recognition Data block
00000000 00000619 SMM
00000000 00000018 Offset to LURT
00000000 00000068 Offset to Name Count
00000000 00000009 LURT Count
00000000 00000834 LURT Column 1
FFFFFFFF FFFFFFFF LURT Column 2
FFFFFFFF FFFFFFFF LURT Column 3
FFFFFFFF FFFFFFFF LURT Column 4
FFFFFFFF FFFFFFFF LURT Column 5
FFFFFFFF FFFFFFFF LURT Column 6
FFFFFFFF FFFFFFFF LURT Column 7
00000000 0000047E LURT Column 8
00000000 0000047E LURT Column 9
00000000 00000016 Name Count
Platform Name = AlphaServer 8200 5/440
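As a cross-check of the dump above, the three memory clusters should add
up to the 2GB stated in the description, and the cycle counter frequency
should agree with the 5/440 in the platform name. A minimal sketch in sh,
with all values copied from the dump; the function names mem_mib and
cc_mhz are hypothetical.

```sh
#!/bin/sh
# Sketch: sanity-check two HWRPB fields from the 'show hwrpb' output.

# Total memory in MiB: sum of the three PFN_COUNT values times the
# "System page size in bytes".
mem_mib() {
	echo $(( ( $1 + $2 + $3 ) * $4 / 1048576 ))
}

# "Cycle Counter frequency" in MHz, truncated to one decimal digit.
cc_mhz() {
	hz=$(( $1 ))
	echo "$(( hz / 1000000 )).$(( hz % 1000000 / 100000 ))"
}

mem_mib 0x100 0x3FEAC 0x54 0x2000
cc_mhz 0x1A153F00
```

This prints 2048 (MiB) and 437.6 (MHz), so the firmware's view of memory
and clock rate is consistent with the machine as described.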
State-Changed-From-To: open->feedback
State-Changed-By: fair
State-Changed-When: Thu Jan 17 23:44:31 PST 2002
State-Changed-Why:
Here we are, a tad over two years later - does this problem still occur
in NetBSD 1.5.2 or -current?
State-Changed-From-To: feedback->open
State-Changed-By: mjacob
State-Changed-When: Fri Jan 18 09:41:07 PST 2002
State-Changed-Why:
The problem will likely continue to exist until we get a working
zs driver for the console. The presumed cause is that the constant
callbacks into the PROM for the serial console hit some problem
because we've changed a lot of the mappings, etc.
Responsible-Changed-From-To: port-alpha-maintainer->thorpej
Responsible-Changed-By: thorpej@NetBSD.org
Responsible-Changed-When: Tue, 13 Oct 2020 15:51:18 +0000
Responsible-Changed-Why:
Take.
>Unformatted:
Copyright © 1994-2017
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.