NetBSD Problem Report #47640

From www@NetBSD.org  Sun Mar 10 17:53:59 2013
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
	by www.NetBSD.org (Postfix) with ESMTP id 2F85263EC24
	for <gnats-bugs@gnats.NetBSD.org>; Sun, 10 Mar 2013 17:53:59 +0000 (UTC)
Message-Id: <20130310175358.0C90563EC24@www.NetBSD.org>
Date: Sun, 10 Mar 2013 17:53:57 +0000 (UTC)
From: roop@tamasi.org
Reply-To: roop@tamasi.org
To: gnats-bugs@NetBSD.org
Subject: System will lock up to keyboard/console and network sessions. Machine is on, console is on. Does not ping. 
X-Send-Pr-Version: www-1.0

>Number:         47640
>Category:       port-sgimips
>Synopsis:       System will lock up to keyboard/console and network sessions. Machine is on, console is on. Does not ping.
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    port-sgimips-maintainer
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Mar 10 17:55:01 +0000 2013
>Originator:     Rob
>Release:        6.0.1
>Organization:
>Environment:
NetBSD machine 6.0.1 NetBSD 6.0.1 (GENERIC32_IP3x) sgimips
mainbus0 (root): SGI-IP32 [SGI, a], 1 processor
cpu0 at mainbus0: MIPS R12000 CPU (0xe35) Rev. 3.5 with unknown FPC type (0x900) Rev. 0.0
>Description:
Observed:
- case1: soon after starting X; started twm, then started compiling vim. Xorg.log ends with a block of ~1000 null chars.
- case2: soon after executing atf-run; left ~2000 ^@ chars at the end of the file.
- case3: far into executing atf-run.
- case4: compiling vim; its dependency digest-20111104 produced a library sha1.so that file(1) reports as type "data", filled with many ^@ chars.
- case5: compiling new 6.0.1 kernel; all the compiling completed, then it froze while echoing to .gdbinit, see [1].
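
The ^@ counts in the cases above were estimated by eye; they could be measured instead. A minimal sketch, assuming a POSIX sh with tr(1) and wc(1). The function name nul_count is hypothetical and is not an existing NetBSD utility:

```shell
#!/bin/sh
# nul_count FILE: print how many NUL (^@) bytes FILE contains.
# Hypothetical helper for illustration; not part of any NetBSD tool.
nul_count() {
    total=$(wc -c < "$1")                  # every byte in the file
    kept=$(tr -d '\000' < "$1" | wc -c)    # bytes left once NULs are deleted
    echo $(( total - kept ))               # difference = number of NUL bytes
}
```

It would be called as "nul_count FILE" on the damaged log or object file. Note that it counts NULs anywhere in the file, not only in the trailing block.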

# cd /usr/tests
# atf-run | atf-report 2>&1 | tee -a /home/oo/report-atf2.txt 

No panic(9) observed.
No core found by savecore(8).

/etc/sysctl.conf
ddb.onpanic?=2

Power button does not work to power off or reset.
Must power cycle to restart.

Unplugging and replugging the keyboard connector causes all lights to flash once.
No further response after that: CAPSLOCK doesn't light, etc.

Other info:
Each lockup is punctuated by an audible disk write of about 0.25 seconds.
Filesystem: log replay on reboot always seems successful.
Started a 6.0.1 kernel build while writing this up; it got almost to the end before the machine locked up. The last few object files were corrupted.
Also observing the kernel message "/netbsd: crime: cpu error 4 at address 68326508". Ten instances so far in 2 days; the address is different each time, between 68326508 and 83730988 so far. The system continues running after these messages, so they do not seem to be directly related to this crash.

[1] End of kernel compile:
#   compile  MITBSGIG32_IP32/zlib.o
gcc -G 0 -mno-abicalls -msoft-float -ffixed-24 -ffreestanding -fno-zero-initialized-in-bss -g -O2 -fno-strict-aliasing -std=gnu99 -Werror -Wall -Wno-main -Wno-format-zero-length -Wpointer-arith -Wmissing-prototypes -Wstrict-prototypes -Wold-style-definition -Wswitch -Wshadow -Wcast-qual -Wwrite-strings -Wno-unreachable-code -Wno-pointer-sign -Wno-attributes -Wno-sign-compare -march=mips3 -mtune=vr5000 -Dsgimips -I. -I../../../../../common/include -I../../../../arch -I../../../.. -nostdinc -DMIPS3_ENABLE_CLOCK_INTR -DMIPS3 -DMAXUSERS=32 -D_KERNEL -D_KERNEL_OPT -std=gnu99 -I../../../../lib/libkern/../../../common/lib/libc/quad -I../../../../lib/libkern/../../../common/lib/libc/string -I../../../../lib/libkern/../../../common/lib/libc/arch/mips/string -I../../../../dist/ipf -c ../../../../net/zlib.c

#    create  MITBSGIG32_IP32/.gdbinit
rm -f .gdbinit
echo "source ../../../../gdbscripts/bdump" >> .gdbinit
echo "source ../../../../gdbscripts/cpus" >> .gdbinit
echo "source ../../../../gdbscripts/kdump" >> .gdbinit
echo "source ../../../../gdbscripts/lwps" >> .gdbinit
echo "source ../../../../gdbscripts/module" >> .gdbinit
Read from remote host xxxx: Operation timed out

Subsequently, make returned:
wsmouse.o: file not recognized: File format not recognized
then, after rm and make:
wsmux.o: file not recognized: File format not recognized
then, after rm and make again:
xform.o: file not recognized: File format not recognized

Does this suggest that the signature of the crash is null data being written out on a flush?
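
Rather than finding the damaged object files one at a time through repeated rm and make, the build directory could be scanned in one pass for .o files that do not begin with the ELF magic bytes (7f 45 4c 46). A minimal sketch, assuming a POSIX sh with dd(1), od(1), tr(1) and find(1). The names is_elf and list_bad_objects are hypothetical:

```shell
#!/bin/sh
# is_elf FILE: succeed if FILE begins with the 4-byte ELF magic 7f 45 4c 46.
# Hypothetical helper for illustration only.
is_elf() {
    # Read the first 4 bytes and render them as hex with no separators.
    magic=$(dd if="$1" bs=4 count=1 2>/dev/null | od -An -tx1 | tr -d ' \t\n')
    [ "$magic" = "7f454c46" ]
}

# list_bad_objects DIR: print every *.o file under DIR that is not ELF,
# which includes empty files and files that start with NUL bytes.
list_bad_objects() {
    find "$1" -type f -name '*.o' | while IFS= read -r obj; do
        is_elf "$obj" || printf '%s\n' "$obj"
    done
}
```

Running "list_bad_objects ." from the kernel compile directory would list the candidates to rm before the next make. This only checks the first four bytes, so a file with a valid header and a NUL-filled tail would not be caught.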

>How-To-Repeat:
cd /usr/tests && atf-run | atf-report
Compiling kernel, vim, digest

>Fix:
