NetBSD Problem Report #46885

From dtyson@darkstar.anduin.org.uk  Fri Aug 31 21:03:03 2012
Return-Path: <dtyson@darkstar.anduin.org.uk>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
	by www.NetBSD.org (Postfix) with ESMTP id 65AE463B86D
	for <gnats-bugs@gnats.NetBSD.org>; Fri, 31 Aug 2012 21:03:03 +0000 (UTC)
Message-Id: <20120831210300.393AB5948@darkstar.anduin.org.uk>
Date: Fri, 31 Aug 2012 22:03:00 +0100 (BST)
From: dtyson@wirralcavinggroup.org.uk
Reply-To: dtyson@wirralcavinggroup.org.uk
To: gnats-bugs@gnats.NetBSD.org
Subject: NetBSD 6.0_RC1 spontaneously reboots as kernel starts to load
X-Send-Pr-Version: 3.95

>Number:         46885
>Category:       kern
>Synopsis:       NetBSD 6.0_RC1 spontaneously reboots as kernel starts to load
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri Aug 31 21:05:00 +0000 2012
>Closed-Date:    Sat Sep 08 15:19:45 +0000 2012
>Last-Modified:  Sat Sep 08 15:19:45 +0000 2012
>Originator:     Dave Tyson
>Release:        NetBSD 6.0_RC1
>Organization:
	Wirral Caving Group
>Environment:
System: NetBSD darkstar.anduin.org.uk 6.0_RC1 NetBSD 6.0_RC1 (GENERICD) #0: Fri Aug 31 12:48:38 BST 2012 root@darkstar.anduin.org.uk:/usr/obj/sys/arch/i386/compile/GENERICD i386
Architecture: i386
Machine: i386
>Description:
	The problem occurs on a particular desktop system using a standard
Intel D865GLC/D865PE50 motherboard with a Pentium 4 2.6GHz processor. The
system has been stable for many years; it ran NetBSD 5 and was upgraded to
NetBSD 6 when it was tagged. It was regularly updated from source and worked
fine with a GENERIC kernel.
After cvs'ing up to RC1 and building a new GENERIC kernel, the system
reboots just after the kernel loading message, but before the
version announcement.

Investigation showed that a GENERIC kernel with options DIAGNOSTIC would boot
successfully and the problem never showed up before as all NetBSD 6 beta/beta2
GENERIC kernels had options DIAGNOSTIC on by default and it was only removed
for RC1.

Testing with a serial console while loading a GENERIC kernel shows the standard
segment size messages followed by:

Loading /stand/i386/6.0/modules/ffs/ffs.kmod

The system reboots immediately after this with no other messages, i.e. there is
no 'Loaded initial symtab...' line.

Compiling a kernel with options DIAGNOSTIC commented out and options DEBUG
uncommented and testing that works perfectly as well!

As part of a sanity check I noticed that the root fs (128M) was FFSV1, so I
trashed and recreated it as FFSV2, reloaded it and updated the boot block.
Nothing changed: GENERIC still failed, whereas GENERIC with DEBUG etc. worked.

Dmesg from DEBUG kernel below:

NetBSD 6.0_RC1 (GENERICD) #0: Fri Aug 31 12:48:38 BST 2012
        root@darkstar.anduin.org.uk:/usr/obj/sys/arch/i386/compile/GENERICD
total memory = 1022 MB
avail memory = 992 MB
timecounter: Timecounters tick every 10.000 msec
cprng kernel: WARNING insufficient entropy at creation.
timecounter: Timecounter "i8254" frequency 1193182 Hz quality 100
RM plc                                                          (                       )
mainbus0 (root)
cpu0 at mainbus0 apid 0: Intel(R) Pentium(R) 4 CPU 2.60GHz, id 0xf29
cpu1 at mainbus0 apid 1: Intel(R) Pentium(R) 4 CPU 2.60GHz, id 0xf29
ioapic0 at mainbus0 apid 2: pa 0xfec00000, version 20, 24 pins
acpi0 at mainbus0: Intel ACPICA 20110623
acpi0: X/RSDT: OemId <INTEL ,D865GLC ,20050804>, AslId <MSFT,00000097>
acpi0: SCI interrupting at int 9
timecounter: Timecounter "ACPI-Safe" frequency 3579545 Hz quality 900
attimer1 at acpi0 (TMR, PNP0100): io 0x40-0x43 irq 0
pckbc1 at acpi0 (PS2K, PNP0303) (kbd port): io 0x60,0x64 irq 1
pcppi1 at acpi0 (SPKR, PNP0800): io 0x61
midi0 at pcppi1: PC speaker
sysbeep0 at pcppi1
npx1 at acpi0 (COPR, PNP0C04): io 0xf0-0xff irq 13
npx1: reported by CPUID; using exception 16
FDC0 (PNP0700) at acpi0 not configured
UAR1 (PNP0501) at acpi0 not configured
LPT (PNP0400) at acpi0 not configured
SYSR (PNP0C02) at acpi0 not configured
FWH (INT0800) at acpi0 not configured
OSYS (PNP0C02) at acpi0 not configured
SYSM (PNP0C01) at acpi0 not configured
acpibut0 at acpi0 (SLPB, PNP0C0E-29): ACPI Sleep Button
apm0 at acpi0: Power Management spec V1.2
pckbd0 at pckbc1 (kbd slot)
pckbc1: using irq 1 for kbd slot
wskbd0 at pckbd0: console keyboard
attimer1: attached to pcppi1
pci0 at mainbus0 bus 0: configuration mode 1
pci0: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
pchb0 at pci0 dev 0 function 0: vendor 0x8086 product 0x2570 (rev. 0x02)
agp0 at pchb0: aperture at 0xf8000000, size 0x4000000
ppb0 at pci0 dev 1 function 0: vendor 0x8086 product 0x2571 (rev. 0x02)
pci1 at ppb0 bus 1
pci1: i/o space, memory space enabled
vga1 at pci1 dev 0 function 0: vendor 0x10de product 0x0170 (rev. 0xa3)
wsdisplay0 at vga1 kbdmux 1: console (80x25, vt100 emulation), using wskbd0
wsmux1: connecting to wsdisplay0
drm at vga1 not configured
uhci0 at pci0 dev 29 function 0: vendor 0x8086 product 0x24d2 (rev. 0x02)
uhci0: interrupting at ioapic0 pin 16
usb0 at uhci0: USB revision 1.0
uhci1 at pci0 dev 29 function 1: vendor 0x8086 product 0x24d4 (rev. 0x02)
uhci1: interrupting at ioapic0 pin 19
usb1 at uhci1: USB revision 1.0
uhci2 at pci0 dev 29 function 2: vendor 0x8086 product 0x24d7 (rev. 0x02)
uhci2: interrupting at ioapic0 pin 18
usb2 at uhci2: USB revision 1.0
uhci3 at pci0 dev 29 function 3: vendor 0x8086 product 0x24de (rev. 0x02)
uhci3: interrupting at ioapic0 pin 16
usb3 at uhci3: USB revision 1.0
ehci0 at pci0 dev 29 function 7: vendor 0x8086 product 0x24dd (rev. 0x02)
ehci0: interrupting at ioapic0 pin 23
ehci0: EHCI version 1.0
ehci0: companion controllers, 2 ports each: uhci0 uhci1 uhci2 uhci3
usb4 at ehci0: USB revision 2.0
ppb1 at pci0 dev 30 function 0: vendor 0x8086 product 0x244e (rev. 0xc2)
pci2 at ppb1 bus 2
pci2: i/o space, memory space enabled
bktr0 at pci2 dev 0 function 0
bktr0: interrupting at ioapic0 pin 21
bktr0: Warning - card vendor 0x0000 (model 0x0000) unknown.
bktr0: Detected a DPL34-1@-@0 at 0x84
bktr0: Intel Smart Video III/VideoLogic Captivator PCI, <no> tuner, dpl3518a dolby.
vendor 0x109e product 0x0878 (miscellaneous multimedia, revision 0x11) at pci2 dev 0 function 1 not configured
adv1 at pci2 dev 1 function 0: AdvanSys ABP-9xxUA SCSI adapter
adv1: interrupting at ioapic0 pin 22
scsibus0 at adv1: 8 targets, 8 luns per target
fwohci0 at pci2 dev 2 function 0: vendor 0x11c1 product 0x5811 (rev. 0x61)
fwohci0: interrupting at ioapic0 pin 17
fwohci0: OHCI version 1.0 (ROM=1)
fwohci0: No. of Isochronous channels is 8.
fwohci0: EUI64 30:bd:05:02:00:00:1a:7e
fwohci0: Phy 1394a available S400, 3 ports.
fwohci0: Link S400, max_rec 2048 bytes.
ieee1394if0 at fwohci0: IEEE1394 bus
fwip0 at ieee1394if0: IP over IEEE1394
fwohci0: Initiate bus reset
fxp0 at pci2 dev 8 function 0: Intel PRO/100 VM Network Controller with 82562ET/EZ PHY (rev. 0x01)
fxp0: interrupting at ioapic0 pin 20
fxp0: Ethernet address 00:0c:f1:6c:45:cb
inphy0 at fxp0 phy 1: i82562ET 10/100 media interface, rev. 0
inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
ichlpcib0 at pci0 dev 31 function 0: vendor 0x8086 product 0x24d0 (rev. 0x02)
timecounter: Timecounter "ichlpcib0" frequency 3579545 Hz quality 1000
ichlpcib0: 24-bit timer
ichlpcib0: TCO (watchdog) timer configured.
gpio0 at ichlpcib0: 64 pins
fwhrng0 at ichlpcib0: Intel Firmware Hub Random Number Generator
piixide0 at pci0 dev 31 function 1: Intel 82801EB IDE Controller (ICH5) (rev. 0x02)
piixide0: bus-master DMA support present
piixide0: primary channel configured to compatibility mode
piixide0: primary channel interrupting at ioapic0 pin 14
atabus0 at piixide0 channel 0
piixide0: secondary channel configured to compatibility mode
piixide0: secondary channel interrupting at ioapic0 pin 15
atabus1 at piixide0 channel 1
piixide1 at pci0 dev 31 function 2: Intel 82801EB Serial ATA Controller (rev. 0x02)
piixide1: bus-master DMA support present
piixide1: primary channel configured to native-PCI mode
piixide1: using ioapic0 pin 18 for native-PCI interrupt
atabus2 at piixide1 channel 0
piixide1: secondary channel configured to native-PCI mode
atabus3 at piixide1 channel 1
ichsmb0 at pci0 dev 31 function 3: vendor 0x8086 product 0x24d3 (rev. 0x02)
ichsmb0: interrupting at ioapic0 pin 17
iic0 at ichsmb0: I2C bus
auich0 at pci0 dev 31 function 5: i82801EB (ICH5) AC-97 Audio
auich0: interrupting at ioapic0 pin 17
auich0: ac97: Analog Devices AD1985 codec; headphone, 20 bit DAC, no 3D stereo
auich0: ac97: ext id 0x3c7<AMAP,LDAC,SDAC,CDAC,SPDIF,DRA,VRA>
isa0 at ichlpcib0
lpt0 at isa0 port 0x378-0x37b irq 7
com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, working fifo
fdc0 at isa0 port 0x3f0-0x3f7 irq 6 drq 2
acpicpu0 at cpu0: ACPI CPU
acpicpu0: C1: HLT, lat   0 us, pow     0 mW
acpicpu0: T0: I/O, lat   1 us, pow     0 mW, 100 %
acpicpu0: T1: I/O, lat   1 us, pow     0 mW,  88 %
acpicpu0: T2: I/O, lat   1 us, pow     0 mW,  76 %
acpicpu0: T3: I/O, lat   1 us, pow     0 mW,  64 %
acpicpu0: T4: I/O, lat   1 us, pow     0 mW,  52 %
acpicpu0: T5: I/O, lat   1 us, pow     0 mW,  40 %
acpicpu0: T6: I/O, lat   1 us, pow     0 mW,  28 %
acpicpu0: T7: I/O, lat   1 us, pow     0 mW,  16 %
acpicpu1 at cpu1: ACPI CPU
fwohci0: BUS reset
fwohci0: node_id=0xc800ffc0, gen=1, CYCLEMASTER mode
ieee1394if0: 1 nodes, maxhop <= 0 cable IRM irm(0) (me)
ieee1394if0: bus manager 0
timecounter: Timecounter "clockinterrupt" frequency 100 Hz quality 0
scsibus0: waiting 2 seconds for devices to settle...
auich0: measured ac97 link rate at 48000 Hz
audio0 at auich0: full duplex, playback, capture, mmap, independent
fd0 at fdc0 drive 0: 1.44MB, 80 cyl, 2 head, 18 sec
uhub0 at usb1: vendor 0x8086 UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
uhub1 at usb0: vendor 0x8086 UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 2 ports with 2 removable, self powered
uhub2 at usb3: vendor 0x8086 UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub2: 2 ports with 2 removable, self powered
uhub3 at usb2: vendor 0x8086 UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub3: 2 ports with 2 removable, self powered
wd0 at atabus0 drive 0
uhub4 at usb4: vendor 0x8086 EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
uhub4: 8 ports with 8 removable, self powered
wd0: <Maxtor 6L080L0>
wd0: drive supports 16-sector PIO transfers, LBA addressing
wd0: 78167 MB, 158816 cyl, 16 head, 63 sec, 512 bytes/sect x 160086528 sectors
wd0: 32-bit data port
wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133)
wd1 at atabus0 drive 1
wd1: <FUJITSU MPE3064AT>
wd1: drive supports 16-sector PIO transfers, LBA addressing
wd1: 6187 MB, 13410 cyl, 15 head, 63 sec, 512 bytes/sect x 12672450 sectors
wd1: 32-bit data port
wd1: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 4 (Ultra/66)
wd0(piixide0:0:0): using PIO mode 4, Ultra-DMA mode 5 (Ultra/100) (using DMA)
wd1(piixide0:0:1): using PIO mode 4, Ultra-DMA mode 4 (Ultra/66) (using DMA)
atapibus0 at atabus1: 2 targets
cd0 at atapibus0 drive 0: <CD-RW BCE1610IM, , VER A.2> cdrom removable
cd0: 32-bit data port
cd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 2 (Ultra/33)
cd1 at atapibus0 drive 1: <HL-DT-ST DVDRAM GSA-4167B, 00DA7A5F5015, DL13> cdrom removable
cd1: 32-bit data port
cd1: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 2 (Ultra/33)
cd0(piixide0:1:0): using PIO mode 4, Ultra-DMA mode 2 (Ultra/33) (using DMA)
cd1(piixide0:1:1): using PIO mode 4, Ultra-DMA mode 2 (Ultra/33) (using DMA)
umass0 at uhub4 port 5 configuration 1 interface 0
umass0: Generic Mass Storage Device, rev 2.00/1.00, addr 2
umass0: using SCSI over Bulk-Only
scsibus1 at umass0: 2 targets, 1 lun per target
sd0 at scsibus1 target 0 lun 0: <Generic, Storage Device, 0.00> disk removable
sd0: drive offline
sd0: unable to open device, error = 19
uscanner0 at uhub1 port 2
uscanner0: vendor 0x055f product 0x0006, rev 1.00/1.00, addr 2
Kernelized RAIDframe activated
cprng sysctl: WARNING insufficient entropy at creation.
findroot: unable to read block 80035831 of dev wd1 (22)
opendisk: can't open dev sd0 (19)
opendisk: can't open dev sd0 (19)
boot device: wd0
root on wd0a dumps on wd0b
mountroot: trying smbfs...
mountroot: trying ntfs...
mountroot: trying nfs...
mountroot: trying msdos...
mountroot: trying lfs...
mountroot: trying ext2fs...
mountroot: trying ffs...
root file system type: ffs
init: copying out path `/sbin/init' 11
uhidev0 at uhub2 port 2 configuration 1 interface 0
uhidev0: Logitech USB Receiver, rev 2.00/22.00, addr 2, iclass 3/1
ums0 at uhidev0: 16 buttons, W and Z dirs
wsmouse0 at ums0 mux 0
uhidev1 at uhub2 port 2 configuration 1 interface 1
uhidev1: Logitech USB Receiver, rev 2.00/22.00, addr 2, iclass 3/0
uhidev1: 17 report ids
uhid0 at uhidev1 reportid 3: input=4, output=0, feature=0
uhid1 at uhidev1 reportid 16: input=6, output=6, feature=0
uhid2 at uhidev1 reportid 17: input=19, output=19, feature=0
wsdisplay0: screen 1 added (80x25, vt100 emulation)
wsdisplay0: screen 2 added (80x25, vt100 emulation)
wsdisplay0: screen 3 added (80x25, vt100 emulation)
wsdisplay0: screen 4 added (80x25, vt100 emulation)

Two older and slightly slower, but otherwise similar, machines that were
upgraded to RC1 in the same way work fine and are in production as
web servers.

>How-To-Repeat:
Compile a standard RC1 GENERIC kernel on this particular system. Try to boot it
and watch the machine reboot.
>Fix:
Use a kernel with options DIAGNOSTIC or options DEBUG defined.

I will try and go back to an early snapshot of 6 and try compiling a GENERIC
kernel with DIAGNOSTIC commented out and see if that exhibits the same symptoms.

>Release-Note:

>Audit-Trail:
From: "Martin S. Weber" <Ephaeton@gmx.net>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/46885: NetBSD 6.0_RC1 spontaneously reboots as kernel starts to load
Date: Fri, 31 Aug 2012 19:19:57 -0400

 On Fri, Aug 31, 2012 at 09:05:01PM +0000, dtyson@wirralcavinggroup.org.uk wrote:
 > (...)
 > After cvs'ing up to RC1 and building a new GENERIC kernel this would cause
 > the system to reboot just after the kernel loading message, but before the
 > version announcement.
 > (...)

 This looks like port-i386/46061. Try my "fix" - go to amd64 (keep in mind to
 clean up your /dev before you do that!)

 Regards,
 -Martin

From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@NetBSD.org, gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org
Subject: Re: kern/46885: NetBSD 6.0_RC1 spontaneously reboots as kernel
 starts to load
Date: Sat, 1 Sep 2012 23:13:44 +0200

 On Fri, Aug 31, 2012 at 09:05:01PM +0000, dtyson@wirralcavinggroup.org.uk wrote:
 > 	Problem occurs on a particular desktop system using a standard
 > Intel D865GLC/D865PE50 motherboard, Pentium 4 2.6Ghz processor. System has 
 > been stable for many years and ran NetBSD 5 and was upgraded to NetBSD 6
 > when it was tagged. Regularly updated from source and worked fine with
 > GENERIC kernel.
 > After cvs'ing up to RC1 and building a new GENERIC kernel this would cause
 > the system to reboot just after the kernel loading message, but before the
 > version announcement.
 > 
 > Investigation showed that a GENERIC kernel with options DIAGNOSTIC would boot
 > successfully and the problem never showed up before as all NetBSD 6 beta/beta2
 > GENERIC kernels had options DIAGNOSTIC on by default and it was only removed
 > for RC1.

 I can confirm this on a DELL optiplex with a "Intel(R) Celeron(R) CPU 2.80GHz".


 > [...]
 > Compiling a kernel with options DIAGNOSTIC commented out and options DEBUG
 > uncommented and testing that works perfectly as well!

 I looked at low-level code that would have
 #if defined(DIAGNOSTIC) || defined(DEBUG)
 but didn't spot anything (the only file I found in i386 and x86 which
 has something like that is apmcall.S, which is not compiled in).

 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 ans d'experience feront toujours la difference
 --

From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@NetBSD.org, gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org
Subject: Re: kern/46885: NetBSD 6.0_RC1 spontaneously reboots as kernel
 starts to load
Date: Sun, 2 Sep 2012 00:39:04 +0200

 On Sat, Sep 01, 2012 at 11:13:44PM +0200, Manuel Bouyer wrote:
 > I can confirm this on a DELL optiplex with a "Intel(R) Celeron(R) CPU 2.80GHz".
 > 
 > 
 > > [...]
 > > Compiling a kernel with options DIAGNOSTIC commented out and options DEBUG
 > > uncommented and testing that works perfectly as well!
 > 
 > I looked at low-level code that would have
 > #if defined(DIAGNOSTIC) || defined(DEBUG)
 > but didn't spot anything (the only file I found in i386 and x86 which
 > have something like that is apmcall.S, which is not compiled in).

 I did some experiments:
 - a kernel with the files from i386/i386 and *acpi*.o compiled -DDEBUG
   hangs instead of resetting
 - a kernel with kern_*.o,  *acpi*.o and uvm*.o compiled with -DDEBUG
   (and everything else without -DDEBUG) resets just after the first line
   of green message (which I think is about symbols, but it resets too fast
   to really be sure)
 - a kernel with everything but kern_*.o,  *acpi*.o and uvm*.o compiled with
   -DDEBUG boots

 So, at first glance, it looks like DEBUG or DIAGNOSTIC doesn't make it
 work because it changes code, but because it changes sizes.

 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 ans d'experience feront toujours la difference
 --

From: Robert Elz <kre@munnari.OZ.AU>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/46885: NetBSD 6.0_RC1 spontaneously reboots as kernel starts to load
Date: Sun, 02 Sep 2012 08:18:32 +0700

     Date:        Sat,  1 Sep 2012 22:40:06 +0000 (UTC)
     From:        Manuel Bouyer <bouyer@antioche.eu.org>
     Message-ID:  <20120901224006.04E7863B86D@www.NetBSD.org>

 What I am seeing might be completely unrelated, but ...

   |  - a kernel with kern_*.o,  *acpi*.o and uvm*.o compiled with -DDEBUG
   |    (and everything else without -DDEBUG) resets just after the first line
   |    of green message (which I think is about symbols, but it resets too fast
   |    to really be sure)

 I haven't tried different compilation options, just the standard CD
 install kernels from normal i386 and amd64 builds (but booting both
 with and without ACPI enabled), but I have been seeing this same symptom
 with both -current and NetBSD 6 (BETA and BETA2, yet to try RC1) for
 the past couple of months (I started just before BETA was updated to
 BETA2).

 I hadn't reported yet, as unlike others, I'm seeing this on a new laptop,
 that has never booted or run NetBSD (native, it works fine booted in a
 virtualbox), so I was assuming that this system is simply too new for
 NetBSD (it is a UEFI bios, boots from GPT disc, not that that would matter
 for a CD install boot at this stage, the disc would not yet be detected).

 This is an ASUS Core-i7, 12GB - it boots and runs linux fine, and
 boots a live FreeBSD from CD (planning to try installing that, have been
 slack as I read somewhere that FreeBSD, and I was guessing NetBSD too,
 doesn't properly handle UEFI bios booting - that's what I had been
 assuming my problem was related to).

 In any case, the boot loader loads the kernel fine, transfers to it,
 one green line seems to be printed (hard to be sure what is going on,
 it is visible for just milliseconds, it seems), then reset and reboot.

 If it happens that what I am seeing is a symptom of the same problem
 that is being reported here, then at least I have evidence that it is
 not just i386 (since I mostly want to boot an amd64 version) and not
 just the one reported processor model.

 kre

From: "Valeriy E. Ushakov" <uwe@stderr.spb.ru>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/46885: NetBSD 6.0_RC1 spontaneously reboots as kernel starts to load
Date: Sun, 2 Sep 2012 19:22:32 +0400

 On Sat, Sep 01, 2012 at 22:40:06 +0000, Manuel Bouyer wrote:

 >  So, at first glance, it looks like DEBUG or DIAGNOSTIC doesn't make it
 >  work because it changes code, but because it changes sizes.

 This and port-i386/46061 look (at least at the first glance) similar
 to what I see with VBox.  I have a bit out of date current tree, from
 mid-July or so.  A GENERIC kernel with some more stuff added to it
 (PCIVERBOSE and some more) would not boot, getting stuck in an endless
 trap loop pretty early in uvm init.  Trimming the kernel size a bit
 would make it boot.  Now, this well may be a VBox bug, but I wonder if
 you can try to boot a trimmed kernel config on your machine to see if
 it boots (unfortunately I will only be able to test this myself on
 real hardware in a few days from now).

 -uwe

From: "Valeriy E. Ushakov" <uwe@stderr.spb.ru>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/46885: NetBSD 6.0_RC1 spontaneously reboots as kernel starts to load
Date: Mon, 3 Sep 2012 01:12:55 +0400

 On Sun, Sep 02, 2012 at 15:25:02 +0000, Valeriy E. Ushakov wrote:

 >  This and port-i386/46061 look (at least at the first glance) similar
 >  to what I see with VBox.  I have a bit out of date current tree, from
 >  mid-July or so.  A GENERIC kernel with some more stuff added to it
 >  (PCIVERBOSE and some more) would not boot, getting stuck in an endless
 >  trap loop pretty early in uvm init.  Trimming the kernel size a bit
 >  would make it boot.  Now, this well may be a VBox bug, but I wonder if
 >  you can try to boot a trimmed kernel config on your machine to see if
 >  it boots (unfortunately I will only be able to test this myself on
 >  real hardware in a few days from now).

 I did a few more experiments under vbox --recompile-supervisor
 (i.e. ~= qemu interpreter) and it looks sensitive to size.

 Playing with the following config:


   include     "arch/i386/conf/GENERIC"
   options     PCIVERBOSE

   # this line has 64 bytes ######################################
   ###############################################################
   ###############################################################
   ...

 I do get bootable kernel without those comment lines added to beef up
 the kernel size.  With 12KB worth of comments I get unbootable kernel.
 With 16KB it's bootable again.  Current current will probably have
 different numbers, but you get the idea.

 -uwe

From: David Laight <david@l8s.co.uk>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
	netbsd-bugs@netbsd.org, dtyson@wirralcavinggroup.org.uk
Subject: Re: kern/46885: NetBSD 6.0_RC1 spontaneously reboots as kernel starts to load
Date: Sun, 2 Sep 2012 22:35:04 +0100

 On Sun, Sep 02, 2012 at 09:15:07PM +0000, Valeriy E. Ushakov wrote:
 >  
 >  I did a few more experiments under vbox --recompile-supervisor
 >  (i.e. ~= qemu interpreter) and it looks sensitive to size.
 >  
 >  Playing with the following config:
 >  
 >  
 >    include     "arch/i386/conf/GENERIC"
 >    options     PCIVERBOSE
 >  
 >    # this line has 64 bytes ######################################
 >    ###############################################################
 >    ###############################################################
 >    ...
 >  
 >  I do get bootable kernel without those comment lines added to beef up
 >  the kernel size.  With 12KB worth of comments I get unbootable kernel.
 >  With 16KB it's bootable again.  Current current will probably have
 >  different numbers, but you get the idea.

 Maybe nothing is flushing the dcache (or invalidating the icache)
 before the kernel starts?
 That would lead to obscure size based fubars.

 With vbox/qemu it might be possible to work out which!

 Another issue I have seen (a long time ago) was memory overwrites
 caused by rx ethernet frames when an ethernet chip didn't get reset.

 	David

 -- 
 David Laight: david@l8s.co.uk

From: "Valeriy E. Ushakov" <uwe@stderr.spb.ru>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/46885: NetBSD 6.0_RC1 spontaneously reboots as kernel starts to load
Date: Mon, 3 Sep 2012 06:56:59 +0400

 On Mon, Sep 03, 2012 at 01:12:55 +0400, Valeriy E. Ushakov wrote:

 > On Sun, Sep 02, 2012 at 15:25:02 +0000, Valeriy E. Ushakov wrote:
 > 
 > >  This and port-i386/46061 look (at least at the first glance) similar
 > >  to what I see with VBox.  I have a bit out of date current tree, from
 > >  mid-July or so.  A GENERIC kernel with some more stuff added to it
 > >  (PCIVERBOSE and some more) would not boot, getting stuck in an endless
 > >  trap loop pretty early in uvm init.  Trimming the kernel size a bit
 > >  would make it boot.  Now, this well may be a VBox bug, but I wonder if
 > >  you can try to boot a trimmed kernel config on your machine to see if
 > >  it boots (unfortunately I will only be able to test this myself on
 > >  real hardware in a few days from now).
 > 
 > I did a few more experiments under vbox --recompile-supervisor
 > (i.e. ~= qemu interpreter) and it looks sensitive to size.
 > 
 > Playing with the following config:
 > 
 > 
 >   include     "arch/i386/conf/GENERIC"
 >   options     PCIVERBOSE
 > 
 >   # this line has 64 bytes ######################################
 >   ###############################################################
 >   ###############################################################
 >   ...
 > 
 > I do get bootable kernel without those comment lines added to beef up
 > the kernel size.  With 12KB worth of comments I get unbootable kernel.
 > With 16KB it's bootable again.  Current current will probably have
 > different numbers, but you get the idea.

 After poking around with vbox debugger I think I see what the problem
 is, though the exact details are to be worked out by someone with a
 clue.

 uvm_page_init calls uvm_pageboot_alloc that calls pmap_growkernel.
 pmap_growkernel rounds up the requested number with x86_round_pdr,
 arranges for pmap to have necessary PTP pages and returns the rounded
 number that uvm_pageboot_alloc assigns to uvm_maxkaddr.

 What I think happens is that uvm_page_init miscalculates its argument
 to uvm_pageboot_alloc, asking for slightly less memory.  The problem
 happens when the requested (wrong) number is just below a 4MB boundary.

 If we ask pmap_growkernel e.g. for 0xc1401000, it will round it up to
 0xc1800000, so when later we come asking for a page at, say,
 0xc1402000, everything is ok.  But if the kernel size is just right,
 we will ask pmap_growkernel for exactly 0xc1400000.  Later we will
 come asking for a page at 0xc1401000 and we don't have a PTP that maps
 [0xc1400000, 0xc1800000) range in the pmap.
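
 The arithmetic above can be checked with a small standalone C sketch.
 It is not the kernel code: PDR_SIZE (the 4MB boundary mentioned above),
 round_pdr() and the toy grow_kernel()/is_covered() helpers are names
 assumed here for illustration, standing in for x86_round_pdr and
 pmap_growkernel.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed for illustration: growth happens in 4MB units. */
#define PDR_SIZE UINT32_C(0x400000)

/* Round an address up to the next 4MB boundary (no-op if already aligned). */
uint32_t
round_pdr(uint32_t va)
{
	return (va + PDR_SIZE - 1) & ~(PDR_SIZE - 1);
}

/* Toy stand-in for pmap_growkernel: returns the new end of covered VA. */
uint32_t
grow_kernel(uint32_t maxkvaddr)
{
	return round_pdr(maxkvaddr);
}

/* True if the page at va lies below the limit returned by grow_kernel(). */
bool
is_covered(uint32_t limit, uint32_t va)
{
	return va < limit;
}
```

 With these definitions, grow_kernel(0xc1401000) yields 0xc1800000, so a
 later page at 0xc1402000 is covered.  grow_kernel(0xc1400000) yields
 0xc1400000 unchanged, so a page at 0xc1401000 is not.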

 To trigger this bug your kernel has to be just the right size.  To try
 to reproduce this problem, check the value passed to pmap_growkernel
 from uvm_pageboot_alloc.  If necessary, trim your kernel to have a
 number just below 0xc1400000 (or 0xc1800000, etc).  Pad the kernel
 with an array (for small padding you can add comment lines to your
 config, as I mentioned in my previous comment).
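
 A padding array of the kind mentioned above might look like the
 following sketch.  The names kernel_pad, KERNEL_PAD_BYTES and
 kernel_pad_size() are hypothetical, and the 12KB size only echoes the
 figure from the earlier experiment; the amount actually needed depends
 on the kernel being built.

```c
#include <stddef.h>

/* Hypothetical padding size, echoing the 12KB figure from the experiment. */
#define KERNEL_PAD_BYTES (12 * 1024)

/*
 * Assumption: a non-zero initialiser makes the array occupy initialised
 * data in the image instead of zero-filled bss, and volatile discourages
 * the compiler from discarding it.
 */
volatile unsigned char kernel_pad[KERNEL_PAD_BYTES] = { 1 };

/* Referencing the array from code keeps it from being treated as unused. */
size_t
kernel_pad_size(void)
{
	return sizeof(kernel_pad);
}
```

 Changing KERNEL_PAD_BYTES and rebuilding moves the end of the kernel by
 roughly that amount, which is the knob the experiment needs.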

 -uwe

From: "Valeriy E. Ushakov" <uwe@stderr.spb.ru>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/46885: NetBSD 6.0_RC1 spontaneously reboots as kernel starts to load
Date: Mon, 3 Sep 2012 20:19:37 +0400

 This seems to be fixed by src/sys/uvm/uvm_km.c at revision 1.131.
 Is it a netbsd-6 pull-up candidate?

 Thanks, matt@

 -uwe

From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/46885: NetBSD 6.0_RC1 spontaneously reboots as kernel
 starts to load
Date: Mon, 3 Sep 2012 16:40:39 +0000

 On Mon, Sep 03, 2012 at 04:20:05PM +0000, Valeriy E. Ushakov wrote:
  >  This seems to be fixed by src/sys/uvm/uvm_km.c at revision 1.131
  >  netbsd-6 pull up candidate?

 yes please

 -- 
 David A. Holland
 dholland@netbsd.org

From: Lars Heidieker <lars@heidieker.de>
To: gnats-bugs@NetBSD.org
Cc: David Holland <dholland-bugs@netbsd.org>, kern-bug-people@netbsd.org, 
 gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, 
 dtyson@wirralcavinggroup.org.uk
Subject: Re: kern/46885: NetBSD 6.0_RC1 spontaneously reboots as kernel starts
 to load
Date: Wed, 05 Sep 2012 22:25:24 +0200

 On 09/03/2012 06:45 PM, David Holland wrote:
 > The following reply was made to PR kern/46885; it has been noted by GNATS.
 > 
 > From: David Holland <dholland-bugs@netbsd.org>
 > To: gnats-bugs@NetBSD.org
 > Cc: 
 > Subject: Re: kern/46885: NetBSD 6.0_RC1 spontaneously reboots as kernel
 >  starts to load
 > Date: Mon, 3 Sep 2012 16:40:39 +0000
 > 
 >  On Mon, Sep 03, 2012 at 04:20:05PM +0000, Valeriy E. Ushakov wrote:
 >   >  This seems to be fixed by src/sys/uvm/uvm_km.c at revision 1.131
 >   >  netbsd-6 pull up candidate?
 >  
 >  yes please
 >  
 >  

 Hi,

 Here is a revised version of the patch.  It grows the kernel once, after
 the kmem_arena is created but before any other arena imports from it.
 Such an import would call through uvm_km_kmem_alloc to back the VA with
 memory, hence growing the kernel in there works as well.
 Once another map entry is inserted we grow beyond kmem_arena anyway,
 so do it right away.
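
 The idea can be illustrated with a toy model in standalone C.
 Everything in it is assumed for illustration (struct toy_kva, the toy_*
 functions and the 4MB PDR_SIZE granularity); the real change is the
 patch that follows.

```c
#include <stdbool.h>
#include <stdint.h>

#define PDR_SIZE UINT32_C(0x400000)	/* assumed 4MB growth granularity */

/* Toy model of the state the patch touches; not the kernel's own data. */
struct toy_kva {
	uint32_t maxkaddr;	/* stands in for uvm_maxkaddr */
	uint32_t kmembase;	/* start of the kmem arena */
	uint32_t kmemsize;	/* size of the kmem arena */
};

/* Stand-in for pmap_growkernel: round the request up to 4MB. */
uint32_t
toy_growkernel(uint32_t va)
{
	return (va + PDR_SIZE - 1) & ~(PDR_SIZE - 1);
}

/* Bootstrap step: grow once so that the whole arena is covered. */
void
toy_bootstrap(struct toy_kva *k)
{
	if (k->maxkaddr < k->kmembase + k->kmemsize)
		k->maxkaddr = toy_growkernel(k->kmembase + k->kmemsize);
}

/* Allocation-time check: the range must already be covered. */
bool
toy_alloc_ok(const struct toy_kva *k, uint32_t va, uint32_t size)
{
	return k->maxkaddr >= va + size;
}
```

 For example, with maxkaddr and kmembase both at 0xc1400000 and a 16MB
 arena (hypothetical numbers), toy_bootstrap() raises maxkaddr to
 0xc2400000, after which toy_alloc_ok() holds for every page inside the
 arena.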

 Index: uvm/uvm_km.c
 ===================================================================
 RCS file: /cvsroot/src/sys/uvm/uvm_km.c,v
 retrieving revision 1.134
 diff -u -p -r1.134 uvm_km.c
 --- uvm/uvm_km.c	4 Sep 2012 13:37:41 -0000	1.134
 +++ uvm/uvm_km.c	5 Sep 2012 15:21:02 -0000
 @@ -329,6 +329,18 @@ uvm_km_bootstrap(vaddr_t start, vaddr_t
  	kmem_arena = vmem_create("kmem", kmembase, kmemsize, PAGE_SIZE,
  	    NULL, NULL, NULL,
  	    0, VM_NOSLEEP | VM_BOOTSTRAP, IPL_VM);
 +#ifdef PMAP_GROWKERNEL
 +	/*
 +	 * kmem_arena VA allocations happen independently of uvm_map.
 +	 * grow kernel to accommodate the kmem_arena.
 +	 */
 +	if (uvm_maxkaddr < kmembase + kmemsize) {
 +		uvm_maxkaddr = pmap_growkernel(kmembase + kmemsize);
 +		KASSERTMSG(uvm_maxkaddr >= kmembase + kmemsize,
 +		    "%#"PRIxVADDR" %#"PRIxVADDR" %#"PRIxVSIZE,
 +		    uvm_maxkaddr, kmembase, kmemsize);
 +	}
 +#endif

  	vmem_init(kmem_arena);

 @@ -782,18 +794,12 @@ again:

  #ifdef PMAP_GROWKERNEL
  	/*
 -	 * These VA allocations happen independently of uvm_map so if this allocation
 -	 * extends beyond the current limit, then allocate more resources for it.
 -	 * This can only happen while the kmem_map is the only map entry in the
 -	 * kernel_map because as soon as another map entry is created, uvm_map_prepare
 -	 * will set uvm_maxkaddr to an address beyond the kmem_map.
 -	 */
 -	if (uvm_maxkaddr < va + size) {
 -		uvm_maxkaddr = pmap_growkernel(va + size);
 -		KASSERTMSG(uvm_maxkaddr >= va + size,
 -		    "%#"PRIxVADDR" %#"PRIxPTR" %#zx",
 -		    uvm_maxkaddr, va, size);
 -	}
 +	 * These VA allocations happen independently of uvm_map
 +	 * so this allocation must not extend beyond the current limit.
 +	 */
 +	KASSERTMSG(uvm_maxkaddr >= va + size,
 +	    "%#"PRIxVADDR" %#"PRIxPTR" %#zx",
 +	    uvm_maxkaddr, va, size);
  #endif

  	loopva = va;

From: Dave Tyson <dtyson@wirralcavinggroup.org.uk>
To: gnats-bugs@netbsd.org
Cc: netbsd-bugs@netbsd.org,
 kern-bug-people@netbsd.org
Subject: Re: kern/46885: NetBSD 6.0_RC1 spontaneously reboots as kernel starts to load
Date: Sat, 8 Sep 2012 10:59:48 +0100

 The fixes to uvm_km.c and uvm_map.c which have just been pulled up to NetBSD-6
 appear to clear the problem. This PR can be closed.

 Thanks,
 Dave

 -- 
 =====================================================================
 Phone: 07805784357
 Open Source O/S: www.netbsd.org
 Caving: http://www.wirralcavinggroup.org.uk
 =====================================================================

State-Changed-From-To: open->closed
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Sat, 08 Sep 2012 15:19:45 +0000
State-Changed-Why:
Confirmed fixed, thanks.


>Unformatted:
