NetBSD Problem Report #41795

From he@smistad.uninett.no  Tue Jul 28 12:24:56 2009
Return-Path: <he@smistad.uninett.no>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
	by www.NetBSD.org (Postfix) with ESMTP id 092AD63B879
	for <gnats-bugs@gnats.NetBSD.org>; Tue, 28 Jul 2009 12:24:56 +0000 (UTC)
Message-Id: <20090728122452.375D03D09F@smistad.uninett.no>
Date: Tue, 28 Jul 2009 14:24:52 +0200 (CEST)
From: he@NetBSD.org
Reply-To: he@NetBSD.org
To: gnats-bugs@gnats.NetBSD.org
Subject: Infrequent machine checks on SATALink 3512 on AlphaPC 164LX
X-Send-Pr-Version: 3.95

>Number:         41795
>Category:       port-alpha
>Synopsis:       Infrequent machine checks on SATA-using AlphaPC 164LX
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    port-alpha-maintainer
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue Jul 28 12:25:00 +0000 2009
>Originator:     Havard Eidnes
>Release:        NetBSD 5.0
>Organization:
	I try...
>Environment:
System: NetBSD albatross.urc.uninett.no 5.0 (GENERIC-$Revision: 1.325$) #2: Fri May 15 16:13:36 CEST 2009 he@albatross.urc.uninett.no:/usr/obj/sys/arch/alpha/compile/GENERIC alpha
Architecture: alpha
Machine: alpha
>Description:
	As the most convenient method to equip an AlphaPC with lots of
	disk space, I've installed a SATA PCI controller.

	However, the machine now panics with a machine check with
	some regularity (roughly one to a few times per week).  Below
	follows some corresponding DDB output as well as the dmesg for
	the machine in question.

	This machine is otherwise occupied doing pkgsrc bulk builds,
	and cleaning up and restarting the builds is becoming an
	annoyance.

	Does anyone have any hints as to what the root cause of these
	problems might be?  It appears wdcintr() is always on the
	stack when this happens.  Judging by the saved register
	contents, I suspect the culprit is the "previous" instruction,
	shown in the last disassembly below.  Why this might happen,
	or how the machine could get into this state, I do not know,
	and I would appreciate some hints.

	The SATA PCI controller is a garden-variety PC-style
	controller, and I have had good experiences with a similar
	controller (model Silicon Image SATALink 3112 rev 0x02) in a
	Sun Ultra 420R.  Could this be a problem with the particular
	model of controller that I'm using (SATALink 3512)?


	Anyway, as promised, here's some ddb and the dmesg output:

unexpected machine check:

    mces    = 0x1
    vector  = 0x670
    param   = 0xfffffc0000006068
    pc      = 0xfffffc0000887174
    ra      = 0xfffffc00007ecc2c
    code    = 0x98
    curlwp = 0xfffffc00044d1400
        pid = 23093.1, comm = pkg_add

panic: machine check
Stopped in pid 23093.1 (pkg_add) at     netbsd:cpu_Debugger+0x4: ret zero,(ra)
db: tra
cpu_Debugger() at netbsd:cpu_Debugger+0x4
panic() at netbsd:panic+0x240
machine_check() at netbsd:machine_check+0x2c4
interrupt() at netbsd:interrupt+0x248
XentInt() at netbsd:XentInt+0x1c
--- interrupt (from ipl 4) ---
cia_swiz_io_read_1() at netbsd:cia_swiz_io_read_1+0x14
pciide_dma_finish() at netbsd:pciide_dma_finish+0x6c
wdcintr() at netbsd:wdcintr+0xa0
pciide_pci_intr() at netbsd:pciide_pci_intr+0x80
alpha_shared_intr_dispatch() at netbsd:alpha_shared_intr_dispatch+0x5c
eb164_iointr() at netbsd:eb164_iointr+0x38
interrupt() at netbsd:interrupt+0xb8
XentInt() at netbsd:XentInt+0x1c
--- interrupt (from ipl 0) ---
pool_cache_get_paddr() at netbsd:pool_cache_get_paddr+0x3c
vmem_alloc() at netbsd:vmem_alloc+0x64
kmem_alloc() at netbsd:kmem_alloc+0x48
kmem_zalloc() at netbsd:kmem_zalloc+0x20
ufs_balloc_range() at netbsd:ufs_balloc_range+0xdc
ffs_write() at netbsd:ffs_write+0x690
VOP_WRITE() at netbsd:VOP_WRITE+0x40
vn_write() at netbsd:vn_write+0x10c
dofilewrite() at netbsd:dofilewrite+0x94
sys_write() at netbsd:sys_write+0xa8
syscall_plain() at netbsd:syscall_plain+0x160
XentSys() at netbsd:XentSys+0x5c
--- syscall (4) ---
--- user mode ---
db: x/i 0xfffffc0000887174
netbsd:cia_swiz_io_read_1+0x14: zapnot  v0,#0xf,v0
db: show reg
v0          0x7
t0          0x1
t1          0x1
t2          0xfffffc001ffea000
t3          0xfffffc001ffeae6b
t4          0
t5          0xfffffc0000b02166  ustir_speeds+0xbea
t6          0
t7          0
s0          0xfffffc0000bf1b00  msgbufenabled
s1          0x104
s2          0xfffffc0000befac8  db_onpanic
s3          0xfffffc0000adbc9a  irqfmt.10449+0x4e4
s4          0x670
s5          0xfffffc0000006068
s6          0xfffffc0000b02165  ustir_speeds+0xbe9
a0          0x7
a1          0x7ffffe42c0003f8
a2          0
a3          0x8
a4          0x3
a5          0x8
t8          0x1f
t9          0x8
t10         0xcccccccccccccccd
t11         0x1f
ra          0xfffffc00006c8150  panic+0x240
t12         0xfffffc00008c4560  cpu_Debugger
at          0x63697373
gp          0xfffffc0000beb2c0  __link_set_bufq_strats_sym_bufq_strat_dummy+0x8008
sp          0xfffffe000ecb36f8
pc          0xfffffc00008c4564  cpu_Debugger+0x4
ps          0x7
ai          0x1f
pv          0xfffffc00008c4560  cpu_Debugger
netbsd:cpu_Debugger+0x4:        ret     zero,(ra)
db: x/i 0xfffffc0000887170
netbsd:cia_swiz_io_read_1+0x10: ldl     v0,0(t0)
db: 
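
	As a cross-check of the "previous instruction" reasoning, the
	following minimal sketch recomputes the address examined in the
	last x/i above.  It assumes only that Alpha instructions are
	fixed-width 32-bit words; the names in it are illustrative and
	are not kernel symbols.  It does not establish that the ldl is
	actually the faulting instruction, only that it is the one
	immediately preceding the reported pc.

```python
# Alpha instructions are fixed-width 32-bit words, so the instruction
# preceding a given pc sits exactly 4 bytes earlier.
ALPHA_INSN_BYTES = 4


def previous_insn(pc: int) -> int:
    """Return the address of the instruction immediately before pc."""
    return pc - ALPHA_INSN_BYTES


# Values copied from the machine check printout and the DDB session.
reported_pc = 0xFFFFFC0000887174  # "pc" in the machine check printout
symbol_offset = 0x14              # DDB shows this as cia_swiz_io_read_1+0x14

# Derive the function's start address from the symbolic offset.
func_base = reported_pc - symbol_offset

prev = previous_insn(reported_pc)
prev_offset = prev - func_base

print(f"previous instruction: {prev:#x} (cia_swiz_io_read_1+{prev_offset:#x})")
```

	The result agrees with the address given to the last x/i
	command, where DDB shows "ldl v0,0(t0)".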

	Boot messages:

Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
    2006, 2007, 2008
    The NetBSD Foundation, Inc.  All rights reserved.
Copyright (c) 1982, 1986, 1989, 1991, 1993
    The Regents of the University of California.  All rights reserved.

NetBSD 5.0 (GENERIC-$Revision: 1.325 $) #2: Fri May 15 16:13:36 CEST 2009
        he@albatross.urc.uninett.no:/usr/obj/sys/arch/alpha/compile/GENERIC
Digital AlphaPC 164LX 533 MHz, s/n 
8192 byte page size, 1 processor.
total memory = 512 MB
(2128 KB reserved for PROM, 509 MB used by NetBSD)
avail memory = 492 MB
mainbus0 (root)
cpu0 at mainbus0: ID 0 (primary), 21164A-2
cpu0: Architecture extensions: 1<BWX>
cia0 at mainbus0: DECchip 2117x Core Logic Chipset (Pyxis), pass 1
cia0: extended capabilities: 1<BWEN>
cia0: using BWX for PCI config access
pci0 at cia0 bus 0
satalink0 at pci0 dev 5 function 0
satalink0: Silicon Image SATALink 3512 (rev. 0x01)
satalink0: using eb164 irq 2 for native-PCI interrupt
atabus0 at satalink0 channel 0
atabus1 at satalink0 channel 1
mpt0 at pci0 dev 7 function 0: Symbios Logic 53c1020/53c1030
mpt0: applying 1030 quirk
mpt0: interrupting at eb164 irq 1
scsibus0 at mpt0: 16 targets, 8 luns per target
mpt1 at pci0 dev 7 function 1: Symbios Logic 53c1020/53c1030
mpt1: applying 1030 quirk
mpt1: interrupting at eb164 irq 8
scsibus1 at mpt1: 16 targets, 8 luns per target
sio0 at pci0 dev 8 function 0: Intel 82378ZB System I/O (rev. 0x43)
tlp0 at pci0 dev 9 function 0: DECchip 21143 Ethernet, pass 3.0
tlp0: interrupting at eb164 irq 3
tlp0: DEC DE500-BA, Ethernet address 08:00:2b:c5:9f:84
tlp0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
cmdide0 at pci0 dev 11 function 0
cmdide0: CMD Technology PCI0646 (rev. 0x01)
cmdide0: primary channel wired to compatibility mode
cmdide0: primary channel interrupting at isa irq 14
atabus2 at cmdide0 channel 0
cmdide0: secondary channel wired to compatibility mode
cmdide0: secondary channel interrupting at isa irq 15
atabus3 at cmdide0 channel 1
isa0 at sio0
lpt0 at isa0 port 0x3bc-0x3bf irq 7
com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, working fifo
com0: console
com1 at isa0 port 0x2f8-0x2ff irq 3: ns16550a, working fifo
pckbc0 at isa0 port 0x60-0x64
attimer0 at isa0 port 0x40-0x43: AT Timer
pcppi0 at isa0 port 0x61
midi0 at pcppi0: PC speaker (CPU-intensive output)
spkr0 at pcppi0
isabeep0 at pcppi0
fdc0 at isa0 port 0x3f0-0x3f7 irq 6 drq 2
mcclock0 at isa0 port 0x70-0x71: mc146818 compatible time-of-day clock
attimer0: attached to pcppi0
satalink0: port 0: device present, speed: 1.5Gb/s
scsibus0: waiting 2 seconds for devices to settle...
scsibus1: waiting 2 seconds for devices to settle...
wd0 at atabus0 drive 0: <SAMSUNG HD103UJ>
wd0: 931 GB, 1938021 cyl, 16 head, 63 sec, 512 bytes/sect x 1953525168 sectors
fd0 at fdc0 drive 0: 1.44MB, 80 cyl, 2 head, 18 sec
wd1 at atabus2 drive 0: <WDC WD800JB-00CRA1>
wd1: 76319 MB, 155061 cyl, 16 head, 63 sec, 512 bytes/sect x 156301488 sectors
sd0 at scsibus1 target 0 lun 0: <SEAGATE, ST336605LC, 2200> disk fixed
sd0: 34732 MB, 29550 cyl, 4 head, 601 sec, 512 bytes/sect x 71132959 sectors
sd0: sync (12.50ns offset 63), 16-bit (160.000MB/s) transfers, tagged queueing
sd1 at scsibus1 target 1 lun 0: <HITACHI, DX32CJ-36MC, A2T2> disk fixed
sd1: 35256 MB, 15314 cyl, 12 head, 392 sec, 512 bytes/sect x 72205440 sectors
sd1: sync (12.50ns offset 126), 16-bit (160.000MB/s) transfers, tagged queueing
sd2 at scsibus1 target 2 lun 0: <SEAGATE, ST336705LC, 5028> disk fixed
sd2: 34732 MB, 19036 cyl, 8 head, 467 sec, 512 bytes/sect x 71132960 sectors
sd2: sync (12.50ns offset 63), 16-bit (160.000MB/s) transfers, tagged queueing
sd3 at scsibus1 target 12 lun 0: <SEAGATE, SX336704LC, BC0A> disk fixed
sd3: 34732 MB, 14100 cyl, 12 head, 420 sec, 512 bytes/sect x 71132960 sectors
sd3: sync (25.00ns offset 63), 16-bit (80.000MB/s) transfers, tagged queueing
Kernelized RAIDframe activated
raid0: RAID Level 0
raid0: Components: /dev/sd0a /dev/sd1a /dev/sd2a /dev/sd3a
raid0: Total Sectors: 284529152 (138930 MB)
root on wd1a dumps on wd1b
root file system type: ffs
WARNING: clock gained 20 days
WARNING: CHECK AND RESET THE DATE!


>How-To-Repeat:
	Continually run pkgsrc bulk builds on a machine equipped with
	this hardware, and watch it panic with a machine check similar
	to the one above one to two times per week.
>Fix:
	Sorry, none supplied; would appreciate hints.
