NetBSD Problem Report #35120

From www@NetBSD.org  Sat Nov 25 19:10:07 2006
Return-Path: <www@NetBSD.org>
Received: by narn.NetBSD.org (Postfix, from userid 31301)
	id 40DDD63B90D; Sat, 25 Nov 2006 19:10:07 +0000 (UTC)
Message-Id: <20061125191007.40DDD63B90D@narn.NetBSD.org>
Date: Sat, 25 Nov 2006 19:10:07 +0000 (UTC)
From: thomas.feddersen@t-online.de
Reply-To: thomas.feddersen@t-online.de
To: gnats-bugs@NetBSD.org
Subject: NetBSD crashes on filetransfer / Intel MoBo with ICH8
X-Send-Pr-Version: www-1.0

>Number:         35120
>Category:       kern
>Synopsis:       NetBSD crashes on filetransfer / Intel MoBo with ICH8
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Nov 25 19:15:00 +0000 2006
>Closed-Date:    Tue Nov 28 19:26:18 +0000 2006
>Last-Modified:  Tue Nov 28 19:26:18 +0000 2006
>Originator:     Thomas Feddersen
>Release:        4.99.4
>Organization:
Dipl.-Ing. Thomas Feddersen, Consulting Engineer (Beratender Ingenieur)
>Environment:
NetBSD Ximus.feddersen.xx 4.99.4 NetBSD 4.99.4 (GENERIC.MPACPI) #0: Tue Nov 21 18:36:20 CET 2006  bouyer@blues.lip6.fr:/Volumes/data/bouyer/tmp/i386/obj/Volumes/data/bouyer/current/src/sys/arch/i386/compile/GENERIC.MPACPI i386
>Description:
From the replies to my previous post "does NetBSD support Intel ICH8?"
http://mail-index.netbsd.org/netbsd-users/2006/11/21/0000.html
I gathered that NetBSD works on ICH8, at least to some extent.

Is this assumption correct or should I change to older hardware?

So I went ahead and got myself a system with an Intel DG965WHMKR motherboard and installed a Core 2 Duo E6300, six SATA drives and one ATAPI DVD drive in it. I installed the release built by Manuel Bouyer, which is available at
ftp://ftp-asim.lip6.fr/outgoing/bouyer/i386/

The system runs all right; memtest86+ completed 20 passes with 0 errors. I was able to set up a RAID 5 array of 6 drives and use pkgsrc to build samba and mc.

When I copy or write a large amount of data, whether between disks (pax), within one disk (tar -xzvpf pkgsrc.tar.gz -C /usr) or over the network (Samba, putting a single file), the operation starts and after about 30 seconds the system drops into the debugger. The behaviour is the same whether or not the drives involved are part of the RAID.

The error messages vary; here is what I copied off the console screen:
---errormessage-1----------------------------
uvm_fault(0cx09ead40, 0x4090c000, 1) -> 0xe
kernel: supervisor trap page fault, code =0
Stopped in pid 24.1 (pagedaemon) at netbsd:genfs_putpages+0x68a: movl 0x24(%esi),%edx
---errormessage-2----------------------------
uvm_fault(0cx09ead40, 0x4090c000, 2) -> 0xe
kernel: supervisor trap page fault, code =0
Stopped in pid 23.1 (ioflush) at netbsd:genfs_putpages+0xae2: movl %eax,0x14(%edx)
---errormessage-3----------------------------
panic: lockmgr: non-zero exclusive count
Stopped in pid 25.1 (ioflush) at netbsd:cpu_Debugger+0x4:   popl %ebp
---errormessage-4----------------------------
uvm_fault(0cx09ead40, 0x45030000, 1) -> 0xe
kernel: supervisor trap page fault, code =0
Stopped in pid 25.1 (ioflush) at netbsd:pvtree_SPLAY_MINMAX+0x61: movl 0(%edx),%ecx
---errormessage-5----------------------------
dmode 8124 mode 8124 dgen 599057a2 gen 599057a2
size b92 blocks 8
ino 307684 ipref 302805
panic: ffs_valloc: dup alloc
Stopped in pid 681.1 (pax) at netbsd:cpu_Debugger+0x4:  popl  %eb
---/selection of errormessages-----------------------------

I am unable to recover from these faults and have to reboot. There are no core dumps and no relevant entries in /var/log/messages. The file system checks during the following boot usually find errors.

One more thing that may or may not be related:
/var/run/dmesg.boot is garbled (a continuous run of "M^?M^?") at the beginning and at the end, although the messages scroll legibly across the console screen. This behaviour also occurs under NetBSD-LiveCD-2007. Portions of /var/log/messages are also garbled:

Ximus# less /var/log/messages
... snip...
Nov 25 19:01:41 Ximus /netbsd: com0 at acpi0 (PNP0501-1)
Nov 25 19:01:41 Ximus /netbsd: com0: io 0x3ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿ
[... the line continues with a long run of further 0xFF bytes, displayed by less as "ÿ" ...]
Nov 25 19:01:41 Ximus savecore: no core dump
Nov 25 19:01:41 Ximus ntpd[476]: ntpd 4.2.2p2-o  (1)
Nov 25 19:01:42 Ximus ntpd[442]: precision = 1.676 usec
Nov 25 19:01:42 Ximus ntpd[442]: Listening on interface wildcard, 0.0.0.0#123 Disabled
Nov 25 19:01:42 Ximus ntpd[442]: Listening on interface wildcard, ::#123 Disabled
Nov 25 19:01:42 Ximus ntpd[442]: Listening on interface rtk0, fe80::250:fcff:fe6f:ff7f#123 Enabled
Nov 25 19:01:42 Ximus ntpd[442]: Listening on interface rtk0, 192.168.1.26#123 Enabled
Nov 25 19:01:42 Ximus ntpd[442]: Listening on interface lo0, 127.0.0.1#123 Enabled
Nov 25 19:01:42 Ximus ntpd[442]: Listening on interface lo0, ::1#123 Enabled
Nov 25 19:01:42 Ximus ntpd[442]: Listening on interface lo0, fe80::1#123 Enabled
Nov 25 19:01:42 Ximus ntpd[442]: kernel time sync status 2040
Nov 25 19:01:42 Ximus ntpd[442]: frequency initialized 31.196 PPM from /var/db/ntp.drift
Nov 25 19:01:42 Ximus /netbsd: ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿ

>How-To-Repeat:
Set up a system like the one above, install NetBSD 4.99.4, mount wd1 on /mnt and write a lot of data, for example:
Ximus# cd /; pax -v -X -rw -pe / /mnt
or
Ximus# tar -xzvpf pkgsrc.tar.gz -C /usr
or
put a large file via Samba

On a second screen, run
Ximus# systat -w1 iostat
to watch what is happening: the disk is written at roughly 30,000 KB/s.
(The error also occurs when you are not watching.)
>Fix:
I do not have the slightest idea.

>Release-Note:

>Audit-Trail:
From: "Thomas Feddersen" <Thomas.Feddersen@t-online.de>
To: <gnats-bugs@NetBSD.org>
Cc: <bouyer@antioche.eu.org>
Subject: RE: kern/35120
Date: Tue, 28 Nov 2006 20:22:28 +0100

 Dear kern-bug-people,

 the PR can be closed. It was a hardware problem:

 I removed the AENEON (1Rx16 PC2-4200U-444-11) memory and 
 replaced it with KINGSTON (PC2-4200 CL4 240).
 -> The system no longer drops into the debugger.

 One question remains: is there a reliable memory tester? Memtest86+ was not
 reliable in this case.

 Please excuse the inconvenience caused.

 Kind regards
 Thomas Feddersen

State-Changed-From-To: open->closed
State-Changed-By: bouyer@netbsd.org
State-Changed-When: Tue, 28 Nov 2006 19:26:18 +0000
State-Changed-Why:
Closed at the submitter's request; it was incompatible memory.


>Unformatted:

Copyright © 1994-2007 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.