NetBSD Problem Report #45609

From darcy@netbsd.org  Sun Nov 13 13:10:33 2011
Return-Path: <darcy@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
	by www.NetBSD.org (Postfix) with ESMTP id 073FD63B8A8
	for <gnats-bugs@gnats.NetBSD.org>; Sun, 13 Nov 2011 13:10:33 +0000 (UTC)
Message-Id: <20111113131032.EBFA614A20D@mail.netbsd.org>
Date: Sun, 13 Nov 2011 13:10:32 +0000 (UTC)
From: darcy@NetBSD.org
Reply-To: darcy@NetBSD.org
To: gnats-bugs@gnats.NetBSD.org
Subject: Multiple system errors on multiple machines
X-Send-Pr-Version: 3.95

>Number:         45609
>Category:       kern
>Synopsis:       Newly installed systems are failing
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Nov 13 13:15:00 +0000 2011
>Last-Modified:  Sun Feb 12 13:50:08 +0000 2012
>Originator:     D'Arcy J.M. Cain
>Release:        NetBSD 5.1_STABLE (Nov  6 13:19:33 UTC 2010)
>Organization:
The NetBSD Project
D'Arcy J.M. Cain <darcy@NetBSD.org>
http://www.NetBSD.org/
>Environment:
System: NetBSD shell.vex.net 5.1 NetBSD 5.1 (GENERIC) #0: Sat Nov  6 13:19:33 UTC 2010  builds@b6.netbsd.org:/home/builds/ab/netbsd-5-1-RELEASE/amd64/201011061943Z-obj/home/builds/ab/netbsd-5-1-RELEASE/src/sys/arch/amd64/compile/GENERIC amd64
Architecture: x86_64
Machine: amd64
>Description:
I sent this PR yesterday but it has not appeared in the PR database
so I am trying again from homeworld.  I also have more details.

I recently switched a bunch of servers over to NetBSD from FreeBSD.  I
installed the latest ISO from NetBSD.org.  So far three of the systems
have crashed.  Two of them had a custom kernel with this config:

# $Id$
# General config for 64 bit kernels on Vex.Net

include "arch/amd64/conf/GENERIC"

#ident          "INSTALL-$Revision: 1.80 $"

options SEMMNI=5000
options SEMMNS=1200
options SEMUME=100
options SEMMNU=300
options SHMMAXPGS=4096

pseudo-device   pf                      # PF packet filter

Some of the crashes happened before I added the pf device to the kernel.

Then I was working on the latest machine.  I made a few config changes,
set up rc.conf and rebooted.  Here is what I got.  Note that this was
transcribed from a photo taken with my iPhone, so I can't guarantee accuracy.

Oct 16 05:24:51 shutdown: reboot by darcy:
uvm_fault(0xffff800056daee80, 0x0, 1) -> e
fatal page fault in supervisor mode
trap type 6 code 0 rip ffffffff8035413f cs 8 rflags 10246 cr2 0 cpl 0 rsp fff
800056ee69b0
panic: trap
Begin traceback...
uvm_fault(0xffff800056daee80, 0x0, 1) -> e
fatal page fault in supervisor mode
trap type 6 code 0 rip ffffffff80529cbc cs 8 rflags 10246 cr2 0 cpl 0 rsp fff
00056ee6550
panic: trap
Faulted in mid-traceback: aborting...
dumping to dev 0,1 offset 50336072
dump 8190...
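
For readers who want to pull the register values out of a transcript like the one above, here is a minimal sketch in Python. It is not part of the original report. The function name parse_trap_line is hypothetical, and treating "type" as decimal and the other fields as hexadecimal is an assumption about how the kernel prints them. The rsp field is left out because it is wrapped across two lines in the transcript. On x86, cr2 holds the address that caused a page fault, so "cr2 0" together with the 0x0 argument to uvm_fault suggests an access at address 0.

```python
import re

# Fields of interest from the "trap type ..." console line.
# rsp is omitted because the transcript wraps it across two lines.
FIELDS = ("type", "code", "rip", "cs", "rflags", "cr2", "cpl")


def parse_trap_line(line):
    """Return a dict mapping field name to integer value.

    Assumption: "type" is printed in decimal, all other fields in hex.
    Fields that are missing from the line are simply skipped.
    """
    out = {}
    for name in FIELDS:
        m = re.search(r"\b" + name + r" ([0-9a-fA-F]+)", line)
        if m is None:
            continue
        base = 10 if name == "type" else 16
        out[name] = int(m.group(1), base)
    return out


if __name__ == "__main__":
    line = ("trap type 6 code 0 rip ffffffff8035413f cs 8 "
            "rflags 10246 cr2 0 cpl 0 rsp fff")
    regs = parse_trap_line(line)
    print("rip = %#x, faulting address (cr2) = %#x" % (regs["rip"], regs["cr2"]))
```

The rip value is the one to look up in the kernel's symbol table; the second trap in the transcript can be parsed the same way.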

It took about three hours to do a dump.  On each machine the netbsd.#.gz file
is 10 bytes long.  I uncompressed the core file and ran gdb.  Here is the output
of "gdb /netbsd netbsd.0.core":

"/var/crash/netbsd.0.core" is not a core dump: File format not recognized

Here is the dmesg immediately after the reboot:

Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
    2006, 2007, 2008, 2009, 2010
    The NetBSD Foundation, Inc.  All rights reserved.
Copyright (c) 1982, 1986, 1989, 1991, 1993
    The Regents of the University of California.  All rights reserved.

NetBSD 5.1 (GENERIC) #0: Sat Nov  6 13:19:33 UTC 2010
        builds@b6.netbsd.org:/home/builds/ab/netbsd-5-1-RELEASE/amd64/201011061943Z-obj/home/builds/ab/netbsd-5-1-RELEASE/src/sys/arch/amd64/compile/GENERIC
total memory = 8190 MB
avail memory = 7925 MB
timecounter: Timecounters tick every 10.000 msec
timecounter: Timecounter "i8254" frequency 1193182 Hz quality 100
SMBIOS rev. 2.31 @ 0xdc010 (57 entries)
HP ProLiant DL140 G3 (417756-001)
mainbus0 (root)
cpu0 at mainbus0 apid 0: Intel 686-class, 2992MHz, id 0xf64
cpu1 at mainbus0 apid 2: Intel 686-class, 2992MHz, id 0xf64
cpu2 at mainbus0 apid 1: Intel 686-class, 2992MHz, id 0xf64
cpu3 at mainbus0 apid 3: Intel 686-class, 2992MHz, id 0xf64
ioapic0 at mainbus0 apid 4: pa 0xfec00000, version 20, 24 pins
ioapic1 at mainbus0 apid 5: pa 0xfec80000, version 20, 24 pins
acpi0 at mainbus0: Intel ACPICA 20080321
acpi0: X/RSDT: OemId <PTLTD ,  RSDT  ,06040000>, AslId < LTP,00000000>
acpi0: SCI interrupting at int 9
acpi0: fixed-feature power button present
timecounter: Timecounter "ACPI-Fast" frequency 3579545 Hz quality 1000
ACPI-Fast 24-bit timer
MI0 (IPI0001) at acpi0 not configured
pcppi1 at acpi0 (SPKR, PNP0800): io 0x61
midi0 at pcppi1: PC speaker (CPU-intensive output)
sysbeep0 at pcppi1
attimer1 at acpi0 (TIME, PNP0100): io 0x40-0x43,0x50-0x53 irq 0
COMA (PNP0501) at acpi0 not configured
acpibut0 at acpi0 (PWRB, PNP0C0C): ACPI Power Button
attimer1: attached to pcppi1
pci0 at mainbus0 bus 0: configuration mode 1
pci0: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
pchb0 at pci0 dev 0 function 0
pchb0: vendor 0x8086 product 0x25c0 (rev. 0x13)
ppb0 at pci0 dev 2 function 0: vendor 0x8086 product 0x25f7 (rev. 0x13)
pci1 at ppb0 bus 1
pci1: i/o space, memory space enabled, rd/line, wr/inv ok
ppb1 at pci1 dev 0 function 0: vendor 0x8086 product 0x3500 (rev. 0x01)
pci2 at ppb1 bus 2
pci2: i/o space, memory space enabled, rd/line, wr/inv ok
ppb2 at pci2 dev 0 function 0: vendor 0x8086 product 0x3510 (rev. 0x01)
pci3 at ppb2 bus 3
pci3: i/o space, memory space enabled, rd/line, wr/inv ok
ppb3 at pci1 dev 0 function 3: vendor 0x8086 product 0x350c (rev. 0x01)
ppb3: disabling notification events
pci4 at ppb3 bus 11
pci4: i/o space, memory space enabled, rd/line, wr/inv ok
ppb4 at pci0 dev 3 function 0: vendor 0x8086 product 0x25e3 (rev. 0x13)
pci5 at ppb4 bus 12
pci5: i/o space, memory space enabled, rd/line, wr/inv ok
ppb5 at pci0 dev 4 function 0: vendor 0x8086 product 0x25fa (rev. 0x13)
pci6 at ppb5 bus 16
pci6: i/o space, memory space enabled, rd/line, wr/inv ok
ppb6 at pci0 dev 5 function 0: vendor 0x8086 product 0x25e5 (rev. 0x13)
pci7 at ppb6 bus 17
pci7: i/o space, memory space enabled, rd/line, wr/inv ok
ppb7 at pci0 dev 6 function 0: vendor 0x8086 product 0x25e6 (rev. 0x13)
pci8 at ppb7 bus 18
pci8: i/o space, memory space enabled, rd/line, wr/inv ok
ppb8 at pci0 dev 7 function 0: vendor 0x8086 product 0x25e7 (rev. 0x13)
pci9 at ppb8 bus 19
pci9: i/o space, memory space enabled, rd/line, wr/inv ok
pchb1 at pci0 dev 16 function 0
pchb1: vendor 0x8086 product 0x25f0 (rev. 0x13)
pchb2 at pci0 dev 16 function 1
pchb2: vendor 0x8086 product 0x25f0 (rev. 0x13)
pchb3 at pci0 dev 16 function 2
pchb3: vendor 0x8086 product 0x25f0 (rev. 0x13)
pchb4 at pci0 dev 17 function 0
pchb4: vendor 0x8086 product 0x25f1 (rev. 0x13)
pchb5 at pci0 dev 19 function 0
pchb5: vendor 0x8086 product 0x25f3 (rev. 0x13)
pchb6 at pci0 dev 21 function 0
pchb6: vendor 0x8086 product 0x25f5 (rev. 0x13)
pchb7 at pci0 dev 22 function 0
pchb7: vendor 0x8086 product 0x25f6 (rev. 0x13)
ppb9 at pci0 dev 28 function 0: vendor 0x8086 product 0x2690 (rev. 0x09)
ppb9: disabling notification events
pci10 at ppb9 bus 30
pci10: i/o space, memory space enabled, rd/line, wr/inv ok
bge0 at pci10 dev 0 function 0: Broadcom BCM5721 Gigabit Ethernet
bge0: interrupting at ioapic0 pin 16
bge0: ASIC BCM5750 B1 (0x4101), Ethernet address 00:18:fe:29:00:40
bge0: setting short Tx thresholds
brgphy0 at bge0 phy 1: BCM5750 1000BASE-T media interface, rev. 0
brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
ppb10 at pci0 dev 28 function 1: vendor 0x8086 product 0x2692 (rev. 0x09)
ppb10: disabling notification events
pci11 at ppb10 bus 31
pci11: i/o space, memory space enabled, rd/line, wr/inv ok
bge1 at pci11 dev 0 function 0: Broadcom BCM5721 Gigabit Ethernet
bge1: interrupting at ioapic0 pin 17
bge1: ASIC BCM5750 B1 (0x4101), Ethernet address 00:18:fe:29:00:41
bge1: setting short Tx thresholds
brgphy1 at bge1 phy 1: BCM5750 1000BASE-T media interface, rev. 0
brgphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
uhci0 at pci0 dev 29 function 0: vendor 0x8086 product 0x2688 (rev. 0x09)
uhci0: interrupting at ioapic0 pin 23
usb0 at uhci0: USB revision 1.0
uhci1 at pci0 dev 29 function 1: vendor 0x8086 product 0x2689 (rev. 0x09)
uhci1: interrupting at ioapic0 pin 23
usb1 at uhci1: USB revision 1.0
uhci2 at pci0 dev 29 function 2: vendor 0x8086 product 0x268a (rev. 0x09)
uhci2: interrupting at ioapic0 pin 23
usb2 at uhci2: USB revision 1.0
ehci0 at pci0 dev 29 function 7: vendor 0x8086 product 0x268c (rev. 0x09)
ehci0: interrupting at ioapic0 pin 23
ehci0: EHCI version 1.0
ehci0: companion controllers, 2 ports each: uhci0 uhci1 uhci2
usb3 at ehci0: USB revision 2.0
ppb11 at pci0 dev 30 function 0: vendor 0x8086 product 0x244e (rev. 0xd9)
pci12 at ppb11 bus 32
pci12: i/o space, memory space enabled
vga0 at pci12 dev 2 function 0: vendor 0x102b product 0x0522 (rev. 0x02)
wsdisplay0 at vga0 kbdmux 1: console (80x25, vt100 emulation)
wsmux1: connecting to wsdisplay0
drm at vga0 not configured
ichlpcib0 at pci0 dev 31 function 0
ichlpcib0: vendor 0x8086 product 0x2670 (rev. 0x09)
timecounter: Timecounter "ichlpcib0" frequency 3579545 Hz quality 1000
ichlpcib0: 24-bit timer
ichlpcib0: TCO (watchdog) timer configured.
piixide0 at pci0 dev 31 function 2
piixide0: Intel 631xESB/632xESB Serial ATA Controller (rev. 0x09)
piixide0: bus-master DMA support present
piixide0: primary channel wired to compatibility mode
piixide0: primary channel interrupting at ioapic0 pin 14
atabus0 at piixide0 channel 0
piixide0: secondary channel wired to compatibility mode
piixide0: secondary channel interrupting at ioapic0 pin 15
atabus1 at piixide0 channel 1
ichsmb0 at pci0 dev 31 function 3: vendor 0x8086 product 0x269b (rev. 0x09)
ichsmb0: interrupting at ioapic0 pin 19
iic0 at ichsmb0: I2C bus
isa0 at ichlpcib0
com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, working fifo
pckbc0 at isa0 port 0x60-0x64
pckbd0 at pckbc0 (kbd slot)
pckbc0: using irq 1 for kbd slot
wskbd0 at pckbd0: console keyboard, using wsdisplay0
timecounter: Timecounter "clockinterrupt" frequency 100 Hz quality 0
timecounter: Timecounter "TSC" frequency 2992707720 Hz quality 3000
uhub0 at usb0: vendor 0x8086 UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
uhub1 at usb1: vendor 0x8086 UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 2 ports with 2 removable, self powered
uhub2 at usb2: vendor 0x8086 UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub2: 2 ports with 2 removable, self powered
uhub3 at usb3: vendor 0x8086 EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
uhub3: 6 ports with 6 removable, self powered
uhidev0 at uhub0 port 1 configuration 1 interface 0
uhidev0: LITEON Technology USB Multimedia Keyboard, rev 1.10/1.01, addr 2, iclass 3/1
ukbd0 at uhidev0
wskbd1 at ukbd0 mux 1
wskbd1: connecting to wsdisplay0
uhidev1 at uhub2 port 1 configuration 1 interface 0
uhidev1: ServerEngines SE USB Device, rev 1.10/0.01, addr 2, iclass 3/1
ukbd1 at uhidev1
wd0 at atabus0 drive 0: <HUA721010KLA330>
wd0: drive supports 16-sector PIO transfers, LBA48 addressing
wd0: 931 GB, 1938021 cyl, 16 head, 63 sec, 512 bytes/sect x 1953525168 sectors
wd0: 32-bit data port
wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133)
wd0(piixide0:0:0): using PIO mode 4, Ultra-DMA mode 6 (Ultra/133) (using DMA)
atapibus0 at atabus1: 2 targets
cd0 at atapibus0 drive 0: <CD-224E-N, , C.AA> cdrom removable
cd0: 32-bit data port
cd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 2 (Ultra/33)
cd0(piixide0:1:0): using PIO mode 4, Ultra-DMA mode 2 (Ultra/33) (using DMA)
Kernelized RAIDframe activated
pad0: outputs: 44100Hz, 16-bit, stereo
audio0 at pad0: half duplex, playback, capture
wskbd2 at ukbd1 mux 1
wskbd2: connecting to wsdisplay0
uhidev2 at uhub2 port 1 configuration 1 interface 1
uhidev2: ServerEngines SE USB Device, rev 1.10/0.01, addr 2, iclass 3/1
ums0 at uhidev2: 8 buttons and Z dir
wsmouse0 at ums0 mux 0
boot device: wd0
root on wd0a dumps on wd0b
root file system type: ffs
wsdisplay0: screen 1 added (80x25, vt100 emulation)
wsdisplay0: screen 2 added (80x25, vt100 emulation)
wsdisplay0: screen 3 added (80x25, vt100 emulation)
wsdisplay0: screen 4 added (80x25, vt100 emulation)
Accounting started

>How-To-Repeat:
No idea.  It seems random.
>Fix:
Unknown

>Audit-Trail:
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/45609: Multiple system errors on multiple machines
Date: Sun, 13 Nov 2011 17:38:34 +0000

 On Sun, Nov 13, 2011 at 01:15:00PM +0000, darcy@NetBSD.org wrote:
  > Oct 16 05:24:51 shutdown: reboot by darcy:
  > uvm_fault(0xffff800056daee80, 0x0, 1) -> e
  > fatal page fault in supervisor mode
  > trap type 6 code 0 rip ffffffff8035413f cs 8 rflags 10246 cr2 0 cpl 0 rsp fff
  > 800056ee69b0
  > panic: trap
  > Begin traceback...
  > uvm_fault(0xffff800056daee80, 0x0, 1) -> e
  > fatal page fault in supervisor mode
  > trap type 6 code 0 rip ffffffff80529cbc cs 8 rflags 10246 cr2 0 cpl 0 rsp fff
                           ^^^^^^^^^^^^^^^^
  > 00056ee6550
  > panic: trap
  > Faulted in mid-traceback: aborting...
  > dumping to dev 0,1 offset 50336072
  > dump 8190...

 Given that the dump broke, can you use nm -n or objdump -d on the
 kernel to figure out where the program counter (rip) value is?

 (I guess since it's a release kernel someone else could check, but I
 don't have it on hand myself)

 -- 
 David A. Holland
 dholland@netbsd.org
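
The lookup suggested in the message above (using "nm -n" to find which function contains the rip value) can be sketched as follows. This is an illustration only, not something posted in the thread: the helper names parse_nm and lookup are hypothetical, and the sample symbol table uses made-up names and addresses rather than anything from the real GENERIC kernel. The idea is that "nm -n" lists symbols sorted by address, so the enclosing symbol is the last one whose address is at or below the program counter.

```python
import bisect


def parse_nm(text):
    """Parse `nm -n` style output into a sorted list of (address, name)."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        # Defined symbols have three columns: address, type, name.
        # Undefined symbols have no address column and are skipped.
        if len(parts) != 3:
            continue
        addr, _symtype, name = parts
        try:
            symbols.append((int(addr, 16), name))
        except ValueError:
            continue
    symbols.sort()
    return symbols


def lookup(symbols, pc):
    """Return (name, offset) of the last symbol at or below pc, or None."""
    addrs = [addr for addr, _name in symbols]
    i = bisect.bisect_right(addrs, pc) - 1
    if i < 0:
        return None
    addr, name = symbols[i]
    return name, pc - addr


# Made-up symbol table for illustration; NOT taken from the real kernel.
SAMPLE = """\
0000000000001000 T example_func_a
0000000000001040 T example_func_b
0000000000001100 T example_func_c
"""

if __name__ == "__main__":
    name, offset = lookup(parse_nm(SAMPLE), 0x1052)
    print("%s+%#x" % (name, offset))
```

In practice the input would be the output of "nm -n" run on the kernel image, and the addresses to look up would be the two rip values from the panic, ffffffff8035413f and ffffffff80529cbc.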

From: D'Arcy Cain <darcy@NetBSD.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/45609
Date: Sat, 19 Nov 2011 17:33:34 -0500

 I pulled it off the distribution CD and ran objdump on it.  You can find 
 the result at http://gollum.vex.net/GENERIC.dump.

 I was starting to suspect that this was NFS related.  This morning I ran 
 a "pkg_rolling_replace -u" on five systems running 5.1 AMD64.  Four of 
 the systems NFS-mounted pkgsrc from the fifth.  The four NFS clients
 crashed while doing the updates.

 -- 
 D'Arcy J.M. Cain <darcy@NetBSD.org>
 http://www.NetBSD.org/

From: Matthias Scheler <tron@NetBSD.org>
To: D'Arcy Cain <darcy@NetBSD.org>
Cc: Manuel Bouyer <bouyer@antioche.eu.org>, gnats-bugs@NetBSD.org,
	yamt@NetBSD.org, netbsd-bugs@NetBSD.org, gnats-admin@NetBSD.org
Subject: Re: kern/45609
Date: Sun, 12 Feb 2012 13:48:06 +0000

 On Sun, Feb 12, 2012 at 08:40:38AM -0500, D'Arcy Cain wrote:
 > On 12-02-11 07:50 AM, Manuel Bouyer wrote:
 > >it's not a duplicate of kern/45093: kern/45093 is a real kernel
 > 
 > Is it possible that kern/45609 is related to kern/45093?

 No, I don't think so. PR kern/45093 caused kernel hangs, not crashes.

 It could be a bge(4) bug, in which case a NetBSD 5.1_STABLE
 snapshot (and *not* 5.1.2) might be worth a try.

 	Kind regards

 -- 
 Matthias Scheler                                  http://zhadum.org.uk/
