NetBSD Problem Report #52266
From gson@gson.org Wed May 31 12:50:35 2017
Return-Path: <gson@gson.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (verified OK))
by mollari.NetBSD.org (Postfix) with ESMTPS id 122057A1BE
for <gnats-bugs@gnats.NetBSD.org>; Wed, 31 May 2017 12:50:35 +0000 (UTC)
Message-Id: <20170531125029.A997A743D38@guava.gson.org>
Date: Wed, 31 May 2017 15:50:29 +0300 (EEST)
From: gson@gson.org (Andreas Gustafsson)
Reply-To: gson@gson.org (Andreas Gustafsson)
To: gnats-bugs@NetBSD.org
Subject: Double fault early in boot with Transmeta Crusoe CPU
X-Send-Pr-Version: 3.95
>Number: 52266
>Category: port-i386
>Synopsis: Double fault early in boot with Transmeta Crusoe CPU
>Confidential: no
>Severity: critical
>Priority: medium
>Responsible: nonaka
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Wed May 31 12:55:00 +0000 2017
>Closed-Date: Sat Jul 22 12:03:49 +0000 2017
>Last-Modified: Sat Jul 22 12:03:49 +0000 2017
>Originator: Andreas Gustafsson
>Release: NetBSD-current, source date >= 2017.05.23.08.54.39
>Organization:
>Environment:
System: NetBSD
Architecture: i386
Machine: i386
>Description:
I have an old NEC Versa Daylite laptop with a Transmeta Crusoe CPU
running NetBSD/i386. I recently tried to upgrade the kernel to
-current, but it crashed very early in the boot, immediately after
printing the kernel segment sizes, with the following message
(transcribed manually):
Fatal double fault in supervisor mode
Trap type 13 code 0xc0116f29 eip 0x8 cs 0x296 eflags 0xc0150010 cr2 0 ilevel 0 esp 0xc08b0030
curlwp 0xc1231920 pid 0 lid 1 lowest kstack 0xc149e2c0
kernel: user trap double fault, code=0
Stopped in pid 0.1 (system) at 8: invalid address
db{0}>
The keyboard does not respond to ddb commands at this point.
By bisection, I have determined that the problem appeared with the
following recent commits:
2017.05.23.08.54.38 nonaka src/sys/arch/amd64/amd64/db_interface.c 1.25
2017.05.23.08.54.38 nonaka src/sys/arch/amd64/amd64/mainbus.c 1.38
2017.05.23.08.54.38 nonaka src/sys/arch/amd64/amd64/vector.S 1.49
2017.05.23.08.54.38 nonaka src/sys/arch/amd64/include/i82093reg.h 1.8
2017.05.23.08.54.38 nonaka src/sys/arch/i386/i386/db_interface.c 1.72
2017.05.23.08.54.38 nonaka src/sys/arch/i386/i386/mainbus.c 1.103
2017.05.23.08.54.38 nonaka src/sys/arch/i386/i386/vector.S 1.69
2017.05.23.08.54.39 nonaka src/sys/arch/i386/include/i82093reg.h 1.10
2017.05.23.08.54.39 nonaka src/sys/arch/x86/include/cpuvar.h 1.50
2017.05.23.08.54.39 nonaka src/sys/arch/x86/include/i82489var.h 1.19
2017.05.23.08.54.39 nonaka src/sys/arch/x86/include/intr.h 1.50
2017.05.23.08.54.39 nonaka src/sys/arch/x86/include/mpacpi.h 1.11
2017.05.23.08.54.39 nonaka src/sys/arch/x86/pci/msipic.c 1.9
2017.05.23.08.54.39 nonaka src/sys/arch/x86/x86/cpu.c 1.125
2017.05.23.08.54.39 nonaka src/sys/arch/x86/x86/lapic.c 1.58
2017.05.23.08.54.39 nonaka src/sys/arch/x86/x86/pmc.c 1.7
2017.05.23.08.54.39 nonaka src/sys/arch/x86/x86/tprof_amdpmi.c 1.7
2017.05.23.08.54.39 nonaka src/sys/arch/x86/x86/tprof_pmi.c 1.14
2017.05.23.08.54.39 nonaka src/sys/arch/xen/include/intr.h 1.40
2017.05.23.08.54.39 nonaka src/sys/arch/xen/include/mpacpi.h 1.2
2017.05.23.08.54.39 nonaka src/sys/arch/xen/x86/intr.c 1.31
2017.05.23.08.54.39 nonaka src/sys/arch/xen/x86/mainbus.c 1.19
>How-To-Repeat:
Attempt to boot NetBSD-current/i386 on a Transmeta Crusoe CPU.
>Fix:
>Release-Note:
>Audit-Trail:
From: coypu@sdf.org
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: port-i386/52266: Double fault early in boot with Transmeta
Crusoe CPU
Date: Wed, 31 May 2017 13:17:37 +0000
BTW, you can hard code DDB_COMMANDONENTER
options DDB_COMMANDONENTER="bt"
From: Andreas Gustafsson <gson@gson.org>
To: coypu@sdf.org
Cc: gnats-bugs@NetBSD.org
Subject: Re: port-i386/52266: Double fault early in boot with Transmeta
Crusoe CPU
Date: Wed, 31 May 2017 17:54:33 +0300
coypu@sdf.org wrote:
> BTW, you can hard code DDB_COMMANDONENTER
> options DDB_COMMANDONENTER="bt"
I have now tried that, but it just made the machine spontaneously
reboot.
--
Andreas Gustafsson, gson@gson.org
Responsible-Changed-From-To: port-i386-maintainer->nonaka
Responsible-Changed-By: gson@NetBSD.org
Responsible-Changed-When: Wed, 31 May 2017 14:56:04 +0000
Responsible-Changed-Why:
Problem started with nonaka's commits
From: Kimihiro Nonaka <nonakap@gmail.com>
To: "gnats-bugs@netbsd.org" <gnats-bugs@netbsd.org>
Cc: port-i386-maintainer@netbsd.org,
"gnats-admin@netbsd.org" <gnats-admin@netbsd.org>, "netbsd-bugs@netbsd.org" <netbsd-bugs@netbsd.org>
Subject: Re: port-i386/52266: Double fault early in boot with Transmeta Crusoe CPU
Date: Thu, 1 Jun 2017 12:37:06 +0900
Could you send the dmesg and the output of "cpuctl identify 0" from the old kernel?
From: Andreas Gustafsson <gson@gson.org>
To: nonaka@NetBSD.org
Cc: gnats-bugs@NetBSD.org
Subject: Re: port-i386/52266: Double fault early in boot with Transmeta Crusoe CPU
Date: Thu, 1 Jun 2017 09:27:57 +0300
Kimihiro Nonaka wrote:
> Could you send the dmesg and the output of "cpuctl identify 0" from the old kernel?
dmesg:
Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
2006, 2007, 2008, 2009, 2010, 2011, 2012
The NetBSD Foundation, Inc. All rights reserved.
Copyright (c) 1982, 1986, 1989, 1991, 1993
The Regents of the University of California. All rights reserved.
NetBSD 6.1.5 (GUNNEL) #0: Fri Jun 12 15:45:43 EEST 2015
gson@guido.araneus.fi:/bracket/prod/6.1.5-edis/i386/obj/sys/arch/i386/compile/GUNNEL
total memory = 175 MB
avail memory = 159 MB
timecounter: Timecounters tick every 10.000 msec
timecounter: Timecounter "i8254" frequency 1193182 Hz quality 100
NEC VUS6BCE-00B-000 ( )
mainbus0 (root)
cpu0 at mainbus0: Transmeta(tm) Crusoe(tm) Processor TMTM5600, id 0x543
acpi0 at mainbus0: Intel ACPICA 20110623
acpi0: X/RSDT: OemId <NEC ,ND000034,06040005>, AslId < LTP,00000000>
LNK1: ACPI: Found matching pin for 0.3.INTA at func 0: 255
LNK2: ACPI: Found matching pin for 0.4.INTA at func 0: 11
LNK3: ACPI: Found matching pin for 0.5.INTA at func 0: 10
LNK1: ACPI: Found matching pin for 0.6.INTA at func 0: 9
LNKU: ACPI: Found matching pin for 0.20.INTA at func 0: 5
acpi0: SCI interrupting at int 9
timecounter: Timecounter "ACPI-Safe" frequency 3579545 Hz quality 900
attimer1 at acpi0 (TIME, PNP0100): io 0x40-0x43 irq 0
npx1 at acpi0 (MATH, PNP0C04): io 0xf0-0xfe irq 13
npx1: reported by CPUID; using exception 16
pcppi1 at acpi0 (SPKR, PNP0800): io 0x61
midi0 at pcppi1: PC speaker
sysbeep0 at pcppi1
USKB (PNP0303) at acpi0 not configured
SYSR (PNP0C02) at acpi0 not configured
acpilid0 at acpi0 (LID, PNP0C0D): ACPI Lid Switch
acpiacad0 at acpi0 (ADP, ACPI0003): ACPI AC Adapter
acpibat0 at acpi0 (BAT2, PNP0C0A-2): ACPI Battery
acpifan0 at acpi0 (LRA0, PNP0C0B): ACPI Fan
acpitz0 at acpi0 (THRM)
acpitz0: levels: critical 100.0 C, passive cooling
apm0 at acpi0: Power Management spec V1.2
attimer1: attached to pcppi1
pci0 at mainbus0 bus 0: configuration mode 1
pci0: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
pchb0 at pci0 dev 0 function 0: vendor 0x1279 product 0x0395 (rev. 0x00)
vendor 0x1279 product 0x0396 (RAM memory) at pci0 dev 0 function 1 not configured
vendor 0x1279 product 0x0397 (RAM memory) at pci0 dev 0 function 2 not configured
cbb0 at pci0 dev 3 function 0: vendor 0x104c product 0xac50 (rev. 0x01)
eso0 at pci0 dev 4 function 0: ESS Solo-1 PCI AudioDrive ES1946 Revision E
eso0: interrupting at irq 11
eso0: mapping Audio 1 DMA using VC I/O space at 0x1480
audio0 at eso0: full duplex, playback, capture, mmap, independent
opl0 at eso0: model OPL3
midi1 at opl0: ESO Yamaha OPL3
mpu0 at eso0
midi2 at mpu0: ESO MPU-401 MIDI UART
joy0 at eso0
joy0: joystick not connected
vga1 at pci0 dev 5 function 0: vendor 0x1002 product 0x4c52 (rev. 0x64)
wsdisplay0 at vga1 kbdmux 1: console (80x25, vt100 emulation)
wsmux1: connecting to wsdisplay0
mach64drm0 at vga1: Rage Mobility P/M
mach64drm0: Initialized mach64 2.0.0 20060718
fxp0 at pci0 dev 6 function 0: i82559S Ethernet (rev. 0x09)
fxp0: interrupting at irq 9
fxp0: May need receiver lock-up workaround
fxp0: Ethernet address 00:10:a4:16:62:84
inphy0 at fxp0 phy 1: i82555 10/100 media interface, rev. 4
inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
vendor 0x115d product 0x002b (serial communications, interface 0x02) at pci0 dev 6 function 1 not configured
pcib0 at pci0 dev 7 function 0: vendor 0x10b9 product 0x1533 (rev. 0x00)
aceride0 at pci0 dev 16 function 0: Acer Labs M5229 UDMA IDE Controller (rev. 0xc3)
aceride0: bus-master DMA support present
aceride0: using PIO transfers above 137GB as workaround for 48bit DMA access bug, expect reduced performance
aceride0: primary channel wired to compatibility mode
aceride0: primary channel interrupting at irq 14
atabus0 at aceride0 channel 0
aceride0: secondary channel wired to compatibility mode
aceride0: secondary channel interrupting at irq 15
atabus1 at aceride0 channel 1
alipm0 at pci0 dev 17 function 0: 74KHz clock
iic0 at alipm0: I2C bus
ohci0 at pci0 dev 20 function 0: vendor 0x10b9 product 0x5237 (rev. 0x03)
ohci0: interrupting at irq 5
ohci0: OHCI version 1.0, legacy support
usb0 at ohci0: USB revision 1.0
cbb0: cacheline 0x8 lattimer 0x51
cbb0: bhlc 0x25108
cbb0: interrupting at irq 9
cardslot0 at cbb0
cardbus0 at cardslot0: bus 1
pcmcia0 at cardslot0
isa0 at pcib0
pckbc0 at isa0 port 0x60-0x64
pckbd0 at pckbc0 (kbd slot)
pckbc0: using irq 1 for kbd slot
wskbd0 at pckbd0: console keyboard, using wsdisplay0
pms0 at pckbc0 (aux slot)
pckbc0: using irq 12 for aux slot
wsmouse0 at pms0 mux 0
acpicpu0 at cpu0: ACPI CPU
acpicpu0: C1: HLT, lat 0 us, pow 0 mW
acpicpu0: C2: I/O, lat 10 us, pow 0 mW
acpicpu0: C3: I/O, lat 32 us, pow 0 mW
acpicpu0: T0: I/O, lat 1 us, pow 0 mW, 100 %
acpicpu0: T1: I/O, lat 1 us, pow 0 mW, 88 %
acpicpu0: T2: I/O, lat 1 us, pow 0 mW, 76 %
acpicpu0: T3: I/O, lat 1 us, pow 0 mW, 64 %
acpicpu0: T4: I/O, lat 1 us, pow 0 mW, 52 %
acpicpu0: T5: I/O, lat 1 us, pow 0 mW, 40 %
acpicpu0: T6: I/O, lat 1 us, pow 0 mW, 28 %
acpicpu0: T7: I/O, lat 1 us, pow 0 mW, 16 %
timecounter: Timecounter "clockinterrupt" frequency 100 Hz quality 0
acpiacad0: AC adapter online.
uhub0 at usb0: vendor 0x10b9 OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
wi0 at pcmcia0 function 0: <Lucent Technologies, WaveLAN/IEEE, Version 01.01, >
wi0: 802.11 address 00:60:1d:f2:27:87
wi0: using Lucent Technologies, WaveLAN/IEEE
wi0: Lucent Firmware: Station (4.52.1)
wi0: 11b rates: 1Mbps 2Mbps 5.5Mbps 11Mbps
wd0 at atabus0 drive 0
wd0: <IC25N020ATCS04-0>
wd0: drive supports 16-sector PIO transfers, LBA addressing
wd0: 19077 MB, 41344 cyl, 15 head, 63 sec, 512 bytes/sect x 39070080 sectors
wd0: 32-bit data port
wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100)
wd0(aceride0:0:0): using PIO mode 4, Ultra-DMA mode 4 (Ultra/66) (using DMA)
Kernelized RAIDframe activated
boot device: wd0
root on wd0a dumps on wd0b
root file system type: ffs
acpibat0: normal capacity on 'charge state'
wsdisplay0: screen 1 added (80x25, vt100 emulation)
wsdisplay0: screen 2 added (80x25, vt100 emulation)
wsdisplay0: screen 3 added (80x25, vt100 emulation)
wsdisplay0: screen 4 added (80x25, vt100 emulation)
Output from "cpuctl identify 0":
cpu0: Transmeta Crusoe (586-class), 592.78 MHz, id 0x543
cpu0: Processor revision 1.3.1.3
cpu0: Code Morphing Software Rev: 4.1.4-7-51
cpu0: 20000805 23:30 official release 4.1.4#2
cpu0: LongRun <600MHz 1600mV 100%>
cpu0: features 0x84803f<FPU,VME,DE,PSE,TSC,MSR,CMOV,PN,MMX>
cpu0: "Transmeta(tm) Crusoe(tm) Processor TMTM5600"
cpu0: serial number 0000-0543-0000-0F29-0A21-475A
cpu0: Initial APIC ID 0
cpu0: family 05 model 04 extfamily 00 extmodel 00 stepping 03
cpu0: UCode version: ?
--
Andreas Gustafsson, gson@gson.org
From: "John D. Baker" <jdbaker@mylinuxisp.com>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: port-i386/52266: Double fault early in boot with Transmeta Crusoe
CPU
Date: Mon, 26 Jun 2017 09:12:05 -0500 (CDT)
As reported here:
http://mail-index.netbsd.org/current-users/2017/06/24/msg031962.html
this problem also affects the i486-class CPU found in the Soekris net4501.
A kernel built with sources from 2017.05.23.08.54.37 boots and runs
fine. A kernel built with sources from 2017.05.23.08.54.40 fails
immediately after printing the segment sizes (and any module load
messages) with:
fatal privileged instruction fault in sufatal double fault in supervisor mode
trap type 13 code 0xc0117238 eip 0x8 cs 0x246 eflags 0xc017a4d1 cr2 0 ilevel 0x8 esp 0xc0405f20
curlwp 0xc0413ba0 pid 0 lid 1 lowest kstack 0xc058c2c0
kernel: user trap double fault, code=0
Stopped in pid 0.1 (system) at 8: invalid address
db{0}>
Attempting 'bt' immediately reboots the machine.
Output of 'cpuctl identify 0' under last working kernel:
cpu0: highest basic info 00000001
cpu0: AMD Am5x86 W/B 133/160 (486-class)
cpu0: family 0x4 model 0xf stepping 0x4 (id 0x4f4)
cpu0: features 0x1<FPU>
cpu0: Initial APIC ID 0
"/var/run/dmesg.boot" with last working kernel:
Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017
The NetBSD Foundation, Inc. All rights reserved.
Copyright (c) 1982, 1986, 1989, 1991, 1993
The Regents of the University of California. All rights reserved.
NetBSD 7.99.72 (NET4501) #3: Sun Jun 25 23:25:56 CDT 2017
sysop@plex760.technoskunk.fur:/r0/build/nbsd-tst/obj/i386/sys/arch/i386/compile/NET4501
total memory = 65148 KB
avail memory = 59352 KB
timecounter: Timecounters tick every 10.000 msec
timecounter: Timecounter "i8254" frequency 1189200 Hz quality 100
Generic PC
mainbus0 (root)
cpu0 at mainbus0
cpu0: AMD 486-class, id 0x4f4
cpu0: package 0, core 0, smt 0
elansc0 at mainbus0 bus 0: AMD Elan SC520 System Controller
elansc0: product 0 stepping 1.1, CPU clock 133MHz
gpio0 at elansc0: 32 pins
pci0 at elansc0 bus 0: configuration mode 1
pci0: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
vendor 1022 product 3000 (host bridge) at pci0 dev 0 function 0 not configured
sip0 at pci0 dev 18 function 0: NatSemi DP83815 10/100 Ethernet, rev 00
sip0: interrupting at irq 10
sip0: Ethernet address xx:xx:xx:xx:xx:xx
nsphyter0 at sip0 phy 0: DP83815 10/100 media interface, rev. 1
nsphyter0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
sip1 at pci0 dev 19 function 0: NatSemi DP83815 10/100 Ethernet, rev 00
sip1: interrupting at irq 11
sip1: Ethernet address xx:xx:xx:xx:xx:xx
nsphyter1 at sip1 phy 0: DP83815 10/100 media interface, rev. 1
nsphyter1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
sip2 at pci0 dev 20 function 0: NatSemi DP83815 10/100 Ethernet, rev 00
sip2: interrupting at irq 5
sip2: Ethernet address xx:xx:xx:xx:xx:xx
nsphyter2 at sip2 phy 0: DP83815 10/100 media interface, rev. 1
nsphyter2: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
isa0 at mainbus0
com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, working fifo
com0: console
com1 at isa0 port 0x2f8-0x2ff irq 3: ns16550a, working fifo
wdc0 at isa0 port 0x1f0-0x1f7 irq 14
atabus0 at wdc0 channel 0
timecounter: Timecounter "clockinterrupt" frequency 100 Hz quality 0
wd0 at atabus0 drive 0
wd0: <ELITE PRO CF CARD 8GB>
wd0: drive supports 1-sector PIO transfers, LBA addressing
wd0: 7647 MB, 15538 cyl, 16 head, 63 sec, 512 bytes/sect x 15662304 sectors
wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100)
boot device: sip0
root on sip0
nfs_boot: trying DHCP/BOOTP
nfs_boot: DHCP next-server: a.b.c.d
nfs_boot: my_name=net4501d
nfs_boot: my_domain=technoskunk.fur
nfs_boot: my_addr=e.f.g.h
nfs_boot: my_mask=m.n.o.p
nfs_boot: gateway=r.p.q.t
root on a.b.c.d:/r0/diskless/net4501d
root file system type: nfs
kern.module.path=/stand/i386/7.99.72/modules
--
|/"\ John D. Baker, KN5UKS NetBSD Darwin/MacOS X
|\ / jdbaker[snail]mylinuxisp[flyspeck]com OpenBSD FreeBSD
| X No HTML/proprietary data in email. BSD just sits there and works!
|/ \ GPGkeyID: D703 4A7E 479F 63F8 D3F4 BD99 9572 8F23 E4AD 1645
From: Simon Burge <simonb@NetBSD.org>
To: Kimihiro Nonaka <nonakap@gmail.com>
Cc: "gnats-bugs@netbsd.org" <gnats-bugs@netbsd.org>,
port-i386-maintainer@netbsd.org,
"gnats-admin@netbsd.org" <gnats-admin@netbsd.org>,
"netbsd-bugs@netbsd.org" <netbsd-bugs@netbsd.org>
Subject: Re: port-i386/52266: Double fault early in boot with Transmeta Crusoe CPU
Date: Sat, 08 Jul 2017 17:10:33 +1000
Kimihiro Nonaka wrote:
> Could you send the dmesg and the output of "cpuctl identify 0" from the old kernel?
I get the same problem on my Soekris net4801 with the following cpu:
cpu0: highest basic info 00000002
cpu0: highest extended info 80000005
cpu0: "Geode(TM) Integrated Processor by National Semi"
cpu0: National Semiconductor Geode GX1 (586-class)
cpu0: family 0x5 model 0x4 stepping 0 (id 0x540)
cpu0: features 0x808131<FPU,TSC,MSR,CX8,CMOV,MMX>
cpu0: ITLB 1 4KB entries 112-way
cpu0: Initial APIC ID 0
If I change lapic_is_x2apic() to unconditionally return false, my
Soekris boots (at least to single user mode).
I tried to get lapic_is_x2apic() to store the value of the MSR it
reads by changing that function to:
uint64_t x2apic_msr;

bool
lapic_is_x2apic(void)
{
	x2apic_msr = rdmsr(MSR_APICBASE);
	return false;
}
but that faulted/panicked too, though slightly differently:
> boot net8 -s
17730720+696076+839124 [776736+802655]=0x13e1cbc
fatal protection faufatal double fault in supervisor mode
trap type 13 code 0xc0118298 eip 0x8 cs 0x246 eflags 0xc054bbf6 cr2 0 ilevel 0x8 esp 0xc11ea760
curlwp 0xc125f360 pid 0 lid 1 lowest kstack 0xc14e32c0
kernel: user trap double fault, code=0
Stopped in pid 0.1 (system) at 8: invalid address
db{0}>
The chopped off "fatal protection fau" is new. Could the rdmsr() itself
be faulting then??
Is there any further info I can get to help?
Cheers,
Simon.
From: Kimihiro Nonaka <nonakap@gmail.com>
To: "gnats-bugs@netbsd.org" <gnats-bugs@netbsd.org>
Cc: NONAKA Kimihiro <nonaka@netbsd.org>, "gnats-admin@netbsd.org" <gnats-admin@netbsd.org>,
"netbsd-bugs@netbsd.org" <netbsd-bugs@netbsd.org>, Andreas Gustafsson <gson@gson.org>
Subject: Re: port-i386/52266: Double fault early in boot with Transmeta Crusoe CPU
Date: Sat, 8 Jul 2017 20:06:04 +0900
Hi,
2017-07-08 18:15 GMT+09:00 Simon Burge <simonb@netbsd.org>:
> If I change lapic_is_x2apic() to unconditionally return false, my
> Soekris boots (at least to single user mode).
>
> I tried to get lapic_is_x2apic() to store the value of the MSR it
> reads by changing that function to:
>
> uint64_t x2apic_msr;
>
> bool
> lapic_is_x2apic(void)
> {
> 	x2apic_msr = rdmsr(MSR_APICBASE);
> 	return false;
> }
>
> but that faulted/panicked too, though slightly differently:
>
> > boot net8 -s
> 17730720+696076+839124 [776736+802655]=0x13e1cbc
> fatal protection faufatal double fault in supervisor mode
> trap type 13 code 0xc0118298 eip 0x8 cs 0x246 eflags 0xc054bbf6 cr2 0 ilevel 0x8 esp 0xc11ea760
> curlwp 0xc125f360 pid 0 lid 1 lowest kstack 0xc14e32c0
> kernel: user trap double fault, code=0
> Stopped in pid 0.1 (system) at 8: invalid address
> db{0}>
>
> The chopped off "fatal protection fau" is new. Could the rdmsr() itself
> be faulting then??
>
> Is there any further info I can get to help?
Could you try the following patch?
diff --git a/sys/arch/x86/x86/lapic.c b/sys/arch/x86/x86/lapic.c
index 20822a67184..372e9f8c0c2 100644
--- a/sys/arch/x86/x86/lapic.c
+++ b/sys/arch/x86/x86/lapic.c
@@ -235,10 +235,12 @@ lapic_enable_x2apic(void)
 bool
 lapic_is_x2apic(void)
 {
-	uint64_t r;
+	uint64_t msr;
 
-	r = rdmsr(MSR_APICBASE);
-	return (r & (APICBASE_EN | APICBASE_EXTD)) == (APICBASE_EN | APICBASE_EXTD);
+	if (rdmsr_safe(MSR_APICBASE, &msr) == EFAULT)
+		return false;
+	return (msr & (APICBASE_EN | APICBASE_EXTD)) ==
+	    (APICBASE_EN | APICBASE_EXTD);
 }
 
 /*
Regards,
--
Kimihiro Nonaka
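The check in the patch above can be modelled outside the kernel. The following is only a sketch: struct model_cpu, model_rdmsr_safe() and model_lapic_is_x2apic() are hypothetical stand-ins, not NetBSD functions, and the constant values are taken from the architectural definition of the IA32_APIC_BASE MSR (address 0x1b, bit 11 = APIC global enable, bit 10 = x2APIC enable); the NetBSD headers are assumed to match. It shows that a read which fails with EFAULT is treated as "not x2APIC", and that both bits must be set for the function to return true.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values, from the architectural IA32_APIC_BASE definition. */
#define MSR_APICBASE	0x01b		/* IA32_APIC_BASE */
#define APICBASE_EXTD	0x00000400	/* bit 10: x2APIC mode enable */
#define APICBASE_EN	0x00000800	/* bit 11: APIC global enable */

/* Hypothetical model of a CPU: does the MSR exist, and what does it hold? */
struct model_cpu {
	bool has_apicbase_msr;
	uint64_t apicbase;
};

/*
 * Hypothetical stand-in for rdmsr_safe(9): returns 0 and stores the
 * value on success, or EFAULT where a real read would have trapped.
 */
static int
model_rdmsr_safe(const struct model_cpu *cpu, unsigned int msr,
    uint64_t *value)
{
	if (msr != MSR_APICBASE || !cpu->has_apicbase_msr)
		return EFAULT;
	*value = cpu->apicbase;
	return 0;
}

/* Same logic as the patched lapic_is_x2apic(), run against the model. */
static bool
model_lapic_is_x2apic(const struct model_cpu *cpu)
{
	uint64_t msr;

	if (model_rdmsr_safe(cpu, MSR_APICBASE, &msr) == EFAULT)
		return false;
	return (msr & (APICBASE_EN | APICBASE_EXTD)) ==
	    (APICBASE_EN | APICBASE_EXTD);
}
```

In this model a CPU without the MSR, and a CPU with only APICBASE_EN set, both report false; only a value with both APICBASE_EN and APICBASE_EXTD set reports true.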
From: "NONAKA Kimihiro" <nonaka@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/52266 CVS commit: src/sys/arch/x86/x86
Date: Sat, 8 Jul 2017 14:35:33 +0000
Module Name: src
Committed By: nonaka
Date: Sat Jul 8 14:35:33 UTC 2017
Modified Files:
src/sys/arch/x86/x86: lapic.c
Log Message:
PR/52266: use rdmsr_safe(9) instead of rdmsr(9) for old machine.
tested by simonb@
To generate a diff of this commit:
cvs rdiff -u -r1.58 -r1.59 src/sys/arch/x86/x86/lapic.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: Simon Burge <simonb@NetBSD.org>
To: Kimihiro Nonaka <nonakap@gmail.com>
Cc: "gnats-bugs@netbsd.org" <gnats-bugs@netbsd.org>,
"gnats-admin@netbsd.org" <gnats-admin@netbsd.org>,
"netbsd-bugs@netbsd.org" <netbsd-bugs@netbsd.org>,
Andreas Gustafsson <gson@gson.org>
Subject: Re: port-i386/52266: Double fault early in boot with Transmeta Crusoe CPU
Date: Sun, 09 Jul 2017 00:24:35 +1000
Kimihiro Nonaka wrote:
> Could you try the following patch?
>
> diff --git a/sys/arch/x86/x86/lapic.c b/sys/arch/x86/x86/lapic.c
> index 20822a67184..372e9f8c0c2 100644
> --- a/sys/arch/x86/x86/lapic.c
> +++ b/sys/arch/x86/x86/lapic.c
> @@ -235,10 +235,12 @@ lapic_enable_x2apic(void)
>  bool
>  lapic_is_x2apic(void)
>  {
> -	uint64_t r;
> +	uint64_t msr;
>
> -	r = rdmsr(MSR_APICBASE);
> -	return (r & (APICBASE_EN | APICBASE_EXTD)) == (APICBASE_EN | APICBASE_EXTD);
> +	if (rdmsr_safe(MSR_APICBASE, &msr) == EFAULT)
> +		return false;
> +	return (msr & (APICBASE_EN | APICBASE_EXTD)) ==
> +	    (APICBASE_EN | APICBASE_EXTD);
>  }
>
>  /*
This works (tested on netbsd-8 branch). Thank you!
Can you please commit and pull up to the netbsd-8 branch?
Cheers,
Simon.
From: "John D. Baker" <jdbaker@mylinuxisp.com>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: port-i386/52266: Double fault early in boot with Transmeta Crusoe
CPU
Date: Sat, 8 Jul 2017 13:59:41 -0500 (CDT)
On Mon, 26 Jun 2017, John D. Baker wrote:
> this problem also affects the i486-class CPU found in the Soekris net4501.
I rebuilt with the patch, but my net4501 still panics.
Perhaps I need to remove the kernel objdirs?
I see the patch has been committed, so I'll update and try again.
From: "John D. Baker" <jdbaker@mylinuxisp.com>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: port-i386/52266: Double fault early in boot with Transmeta Crusoe
CPU
Date: Sat, 8 Jul 2017 17:52:43 -0500 (CDT)
On Sat, 8 Jul 2017, John D. Baker wrote:
> On Mon, 26 Jun 2017, John D. Baker wrote:
>
> > this problem also affects the i486-class CPU found in the Soekris net4501.
>
> I rebuilt with the patch, but my net4501 still panics.
>
> Perhaps I need to remove the kernel objdirs?
>
> I see the patch has been committed, so I'll update and try again.
I couldn't update just yet, but removed the kernel objdirs and
rebuilt.
No change. My net4501 still panics the same as before. Perhaps it's
hitting another instruction, or reading another register, that is present
in 586+ CPUs (Geode GX1, Crusoe) but not in 486 CPUs (AMD Am5x86 W/B)?
The panic message remains:
> boot /netbsd.tst -s
17775532+697548+846740 [821024+839243+13086]=0x14060c0
fatal privileged instruction fault fatal double fault in supervisor mode
trap type 13 code 0xc0118298 eip 0x8 cs 0x246 eflags 0xc054cd46 cr2 0 ilevel 0x8 esp 0xc11f5760
curlwp 0xc126a820 pid 0 lid 1 lowest kstack 0xc15772c0
kernel: user trap double fault, code=0
Stopped in pid 0.1 (system) at 8: invalid address
From: "John D. Baker" <jdbaker@mylinuxisp.com>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: port-i386/52266: Double fault early in boot with Transmeta Crusoe
CPU
Date: Sat, 8 Jul 2017 19:37:35 -0500 (CDT)
On Sat, 8 Jul 2017, John D. Baker wrote:
> I couldn't update just yet, but removed the kernel objdirs and
> rebuilt.
There was some confusion over whether I'd patched the file or not.
I updated lapic.c only to be sure I got the most recent one and
rebuilt GENERIC and my custom kernel based on the NET4501 config
from scratch.
The net4501 still panics as follows:
> boot /netbsd.tst -s
17775628+697548+846740 [1040847+780528+806439]=0x14eefd8
fatal privileged instruction fault in supervisofatal double fault in supervisor mode
trap type 13 code 0xc0118298 eip 0x8 cs 0x246 eflags 0xc054cd46 cr2 0 ilevel 0x8 esp 0xc11f5760
curlwp 0xc126a820 pid 0 lid 1 lowest kstack 0xc165f2c0
kernel: user trap double fault, code=0
I should be able to do a complete update and rebuild soon, just to
make sure.
From: "John D. Baker" <jdbaker@mylinuxisp.com>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: port-i386/52266: Double fault early in boot with Transmeta Crusoe
CPU
Date: Sat, 8 Jul 2017 23:43:27 -0500 (CDT)
On Sat, 8 Jul 2017, John D. Baker wrote:
> I should be able to do a complete update and rebuild soon, just to
> make sure.
Fully updated with:
$NetBSD: lapic.c,v 1.59 2017/07/08 14:35:33 nonaka Exp $
the net4501 still panics early in the boot process (GENERIC kernel):
> boot /netbsd.tst -s
17775624+697548+846740 [821040+839258+13087]=0x14060e0
fatal privileged instruction fault in supervisor mode
trap type 0 code 0 eip 0xcfatal double fault in supervisor mode
trap type 13 code 0xc0118298 eip 0x8 cs 0x246 eflags 0xc054cd46 cr2 0 ilevel 0x8 esp 0xc11f5760
curlwp 0xc126a820 pid 0 lid 1 lowest kstack 0xc15772c0
kernel: user trap double fault, code=0
Stopped in pid 0.1 (system) at 8: invalid address
This time, instead of rebooting immediately, trying to get a backtrace
produces:
db{0}> bt
panic: kernel diagnostic assertion "l->l_nopreempt > 0" failed: file "/x/current/src/sys/sys/lwp.h", line 513
fatal breakpoint trap in supervisor mode
trap type 1 code 0 eip 0xc0118734 cs 0x8 eflags 0x246 cr2 0x1c ilevel 0x8 esp 0xc126aa08
curlwp 0xc126a820 pid 733240715 lid 102 lowest kstack 0xc126ab10
fatal page fault in supervisor modet c0118734: popl %ebp
after which it reboots.
From: "John D. Baker" <jdbaker@mylinuxisp.com>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: port-i386/52266: Double fault early in boot with AMD Am5x86 CPU
Date: Sun, 9 Jul 2017 00:42:13 -0500 (CDT)
Just to double check, I grabbed:
http://nycdn.netbsd.org/pub/NetBSD-daily/HEAD/201707082110Z/i386/binary/kernel/netbsd-GENERIC.gz
and it panics on the net4501 the same as my locally-built kernels:
> boot /netbsd.tst -s
17769700+697548+846740 [821024+839243+13086]=0x14050c0
fatal prfatal double fault in supervisor mode
trap type 13 code 0xc0118298 eip 0x8 cs 0x246 eflags 0xc054cd46 cr2 0 ilevel 0x8 esp 0xc11f4760
curlwp 0xc1269820 pid 0 lid 1 lowest kstack 0xc15762c0
kernel: user trap double fault, code=0
Stopped in pid 0.1 (system) at 8: invalid address
db{0}> bt
panic: kernel diagnostic assertion "l->l_nopreempt > 0" failed: file "/usr/src/sys/sys/lwp.h", line 513
fatal breakpoint trap in supervisor mode
trap type 1 code 0 eip 0xc0118734 cs 0x8 eflags 0x246 cr2 0x1c ilevel 0x8 esp 0xc1269a08
curlwp 0xc1269820 pid 733240715 lid 96 lowest kstack 0xc1269b10
fatal page fault in supervisor mode c0118734: popl %ebp
The last line is overprinted several times and then the machine
reboots.
From: Andreas Gustafsson <gson@gson.org>
To: nonaka@NetBSD.org
Cc: gnats-bugs@NetBSD.org
Subject: Re: port-i386/52266: Double fault early in boot with Transmeta Crusoe CPU
Date: Sun, 9 Jul 2017 17:43:01 +0300
I tested a kernel built from source date 2017.07.08.14.35.33 (which
has src/sys/arch/x86/x86/lapic.c 1.59) on my Crusoe laptop, and it now
reboots within a fraction of a second after the kernel starts, too
fast to read the console messages.
--
Andreas Gustafsson, gson@gson.org
From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/52266 CVS commit: [netbsd-8] src/sys/arch/x86/x86
Date: Mon, 10 Jul 2017 12:26:21 +0000
Module Name: src
Committed By: martin
Date: Mon Jul 10 12:26:21 UTC 2017
Modified Files:
src/sys/arch/x86/x86 [netbsd-8]: lapic.c
Log Message:
Pull up following revision(s) (requested by nonaka in ticket #110):
sys/arch/x86/x86/lapic.c: revision 1.59
PR/52266: use rdmsr_safe(9) instead of rdmsr(9) for old machine.
tested by simonb@
To generate a diff of this commit:
cvs rdiff -u -r1.58 -r1.58.2.1 src/sys/arch/x86/x86/lapic.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: Kimihiro Nonaka <nonakap@gmail.com>
To: "gnats-bugs@netbsd.org" <gnats-bugs@netbsd.org>
Cc: NONAKA Kimihiro <nonaka@netbsd.org>, "gnats-admin@netbsd.org" <gnats-admin@netbsd.org>,
"netbsd-bugs@netbsd.org" <netbsd-bugs@netbsd.org>, Andreas Gustafsson <gson@gson.org>
Subject: Re: port-i386/52266: Double fault early in boot with AMD Am5x86 CPU
Date: Tue, 11 Jul 2017 18:12:37 +0900
Hi,
2017-07-09 14:45 GMT+09:00 John D. Baker <jdbaker@mylinuxisp.com>:
> Just to double check, I grabbed:
>
> http://nycdn.netbsd.org/pub/NetBSD-daily/HEAD/201707082110Z/i386/binary/kernel/netbsd-GENERIC.gz
>
> and it panics on the net4501 the same as my locally-built kernels:
Could you try the following patch?
diff --git a/sys/arch/x86/x86/lapic.c b/sys/arch/x86/x86/lapic.c
index 415bb65b4e5..f87de042054 100644
--- a/sys/arch/x86/x86/lapic.c
+++ b/sys/arch/x86/x86/lapic.c
@@ -237,7 +237,8 @@ lapic_is_x2apic(void)
 {
 	uint64_t msr;
 
-	if (rdmsr_safe(MSR_APICBASE, &msr) == EFAULT)
+	if (!ISSET(cpu_feature[0], CPUID_MSR) ||
+	    rdmsr_safe(MSR_APICBASE, &msr) == EFAULT)
 		return false;
 	return (msr & (APICBASE_EN | APICBASE_EXTD)) ==
 	    (APICBASE_EN | APICBASE_EXTD);
Regards,
--
Kimihiro Nonaka
From: Andreas Gustafsson <gson@gson.org>
To: Kimihiro Nonaka <nonakap@gmail.com>
Cc: "gnats-bugs\@netbsd.org" <gnats-bugs@netbsd.org>
Subject: Re: port-i386/52266: Double fault early in boot with AMD Am5x86 CPU
Date: Tue, 11 Jul 2017 16:31:20 +0300
Kimihiro Nonaka wrote:
> Could you try the following patch?
>
> diff --git a/sys/arch/x86/x86/lapic.c b/sys/arch/x86/x86/lapic.c
> index 415bb65b4e5..f87de042054 100644
> --- a/sys/arch/x86/x86/lapic.c
> +++ b/sys/arch/x86/x86/lapic.c
> @@ -237,7 +237,8 @@ lapic_is_x2apic(void)
> {
> uint64_t msr;
>
> - if (rdmsr_safe(MSR_APICBASE, &msr) == EFAULT)
> + if (!ISSET(cpu_feature[0], CPUID_MSR) ||
> + rdmsr_safe(MSR_APICBASE, &msr) == EFAULT)
> return false;
> return (msr & (APICBASE_EN | APICBASE_EXTD)) ==
> (APICBASE_EN | APICBASE_EXTD);
>
My Crusoe still reboots immediately after loading the kernel, even
with the patch.
--
Andreas Gustafsson, gson@gson.org
From: "John D. Baker" <jdbaker@mylinuxisp.com>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: port-i386/52266: Double fault early in boot with AMD Am5x86 CPU
Date: Tue, 11 Jul 2017 10:48:54 -0500 (CDT)
On Tue, 11 Jul 2017 18:12:37 +0900, Kimihiro Nonaka <nonakap@gmail.com>
wrote:
> On 2017-07-09 14:45 GMT+09:00 John D. Baker <jdbaker@mylinuxisp.com>:
>
> > Just to double check, I grabbed:
> >
> > http://nycdn.netbsd.org/pub/NetBSD-daily/HEAD/201707082110Z/i386/binary/kernel/netbsd-GENERIC.gz
> >
> > and it panics on the net4501 the same as my locally-built kernels:
>
> Could you try the following patch?
>
> diff --git a/sys/arch/x86/x86/lapic.c b/sys/arch/x86/x86/lapic.c
> index 415bb65b4e5..f87de042054 100644
> --- a/sys/arch/x86/x86/lapic.c
> +++ b/sys/arch/x86/x86/lapic.c
> @@ -237,7 +237,8 @@ lapic_is_x2apic(void)
> {
> uint64_t msr;
>
> - if (rdmsr_safe(MSR_APICBASE, &msr) == EFAULT)
> + if (!ISSET(cpu_feature[0], CPUID_MSR) ||
> + rdmsr_safe(MSR_APICBASE, &msr) == EFAULT)
> return false;
> return (msr & (APICBASE_EN | APICBASE_EXTD)) ==
> (APICBASE_EN | APICBASE_EXTD);
With the above patch, my net4501 boots -current (8.99.1) again!
Thanks for the patch. Please commit and pull up to netbsd-8!
--
|/"\ John D. Baker, KN5UKS NetBSD Darwin/MacOS X
|\ / jdbaker[snail]mylinuxisp[flyspeck]com OpenBSD FreeBSD
| X No HTML/proprietary data in email. BSD just sits there and works!
|/ \ GPGkeyID: D703 4A7E 479F 63F8 D3F4 BD99 9572 8F23 E4AD 1645
From: Andreas Gustafsson <gson@gson.org>
To: nonaka@NetBSD.org
Cc: gnats-bugs@NetBSD.org
Subject: Re: port-i386/52266: Double fault early in boot with AMD Am5x86 CPU
Date: Tue, 11 Jul 2017 20:34:11 +0300
Nonaka,
I added some debug printfs to lapic_is_x2apic(), and found that
on my Crusoe machine, cpu_feature[0] has the value 0x0084803f.
If I change the "ISSET(cpu_feature[0], CPUID_MSR)"
in your patch to "ISSET(cpu_feature[0], CPUID_APIC)", the
kernel boots successfully.
--
Andreas Gustafsson, gson@gson.org
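The reported cpu_feature[0] value can be decoded by hand to see why the
CPUID_MSR test did not help on the Crusoe. The following is a minimal
sketch, assuming the standard CPUID leaf 1 EDX layout (MSR is bit 5,
APIC is bit 9); the constants are defined locally here, and
feature_present() and CRUSOE_FEATURES are hypothetical names used only
for illustration, not kernel identifiers.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID leaf 1 EDX feature bits (standard x86 layout). */
#define CPUID_MSR	0x00000020u	/* bit 5: RDMSR/WRMSR supported */
#define CPUID_APIC	0x00000200u	/* bit 9: on-chip local APIC present */

/* cpu_feature[0] as reported for the Crusoe in this PR. */
#define CRUSOE_FEATURES	0x0084803fu

/* Hypothetical helper: true if every bit in mask is set in features. */
static bool
feature_present(uint32_t features, uint32_t mask)
{
	return (features & mask) == mask;
}
```

With this value, feature_present(CRUSOE_FEATURES, CPUID_MSR) is true and
feature_present(CRUSOE_FEATURES, CPUID_APIC) is false, so a guard on
CPUID_MSR still lets the MSR read happen, while a guard on CPUID_APIC
skips it.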
From: "John D. Baker" <jdbaker@mylinuxisp.com>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: port-i386/52266: Double fault early in boot with AMD Am5x86 CPU
Date: Tue, 11 Jul 2017 17:18:13 -0500 (CDT)
Looking at the reports so far and the meaning of each of the various
CPUID_* bits, it looks like:
AMD Am5x86 (Elan SC520): No MSR, No APIC
NS Geode: MSR, No APIC, rdmsr_safe() works
TM Crusoe: MSR, No APIC, rdmsr_safe() fails
Are there any CPUs which implement APIC w/o MSR? If not, then perhaps
the condition could be made:
if (!(ISSET(cpu_feature[0], CPUID_APIC) &&
    ISSET(cpu_feature[0], CPUID_MSR)) ||
    rdmsr_safe(MSR_APICBASE, &msr) == EFAULT)
        return false;
to satisfy the above cases?
--
|/"\ John D. Baker, KN5UKS NetBSD Darwin/MacOS X
|\ / jdbaker[snail]mylinuxisp[flyspeck]com OpenBSD FreeBSD
| X No HTML/proprietary data in email. BSD just sits there and works!
|/ \ GPGkeyID: D703 4A7E 479F 63F8 D3F4 BD99 9572 8F23 E4AD 1645
From: "John D. Baker" <jdbaker@mylinuxisp.com>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: port-i386/52266: Double fault early in boot with AMD Am5x86 CPU
Date: Tue, 11 Jul 2017 17:40:17 -0500 (CDT)
On Tue, 11 Jul 2017, John D. Baker wrote:
> Looking at the reports so far and the meaning of each of the various
> CPUID_* bits, it looks like:
>
> AMD Am5x86 (Elan SC520): No MSR, No APIC
> NS Geode: MSR, No APIC, rdmsr_safe() works
> TM Crusoe: MSR, No APIC, rdmsr_safe() fails
>
> Are there any CPUs which implement APIC w/o MSR?
Thinking about it more, if the CPU doesn't implement APIC, why bother
reading MSR at all? As such, Andreas' change to the patch would also
satisfy all the cases above and is simpler.
Since the x2apic code hinges on successfully reading MSR to obtain
MSR_APICBASE, if there's no APIC of which to obtain the base, return
false.
Or is there some subtlety I'm missing?
--
|/"\ John D. Baker, KN5UKS NetBSD Darwin/MacOS X
|\ / jdbaker[snail]mylinuxisp[flyspeck]com OpenBSD FreeBSD
| X No HTML/proprietary data in email. BSD just sits there and works!
|/ \ GPGkeyID: D703 4A7E 479F 63F8 D3F4 BD99 9572 8F23 E4AD 1645
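The comparison of the candidate guards against the three reported CPUs
can be sketched as a small userland model. This is only an illustration
under stated assumptions: the struct, enum, and function names are
hypothetical, and the Am5x86 entry for rdmsr_safe() is inferred from the
report earlier in this PR that the net4501 still panicked with lapic.c
1.59.

```c
#include <stdbool.h>
#include <stddef.h>

/* One row of the table above; field names are illustrative only. */
struct cpu_case {
	const char *name;
	bool has_msr;		/* CPUID reports MSR */
	bool has_apic;		/* CPUID reports APIC */
	bool rdmsr_safe_ok;	/* rdmsr_safe(MSR_APICBASE) survives */
};

static const struct cpu_case cases[] = {
	{ "AMD Am5x86 (Elan SC520)", false, false, false },
	{ "NS Geode",                true,  false, true  },
	{ "TM Crusoe",               true,  false, false },
};

enum guard { GUARD_NONE, GUARD_MSR, GUARD_APIC };

/* True if the guard lets execution reach rdmsr_safe(). */
static bool
reaches_rdmsr(const struct cpu_case *c, enum guard g)
{
	switch (g) {
	case GUARD_MSR:
		return c->has_msr;
	case GUARD_APIC:
		return c->has_apic;
	default:
		return true;
	}
}

/* Count the table rows on which the check would crash the boot. */
static size_t
count_failures(enum guard g)
{
	size_t i, n = 0;

	for (i = 0; i < sizeof(cases) / sizeof(cases[0]); i++) {
		if (reaches_rdmsr(&cases[i], g) && !cases[i].rdmsr_safe_ok)
			n++;
	}
	return n;
}
```

In this model the unguarded read fails on two of the three machines, the
CPUID_MSR guard still fails on the Crusoe, and the CPUID_APIC guard
fails on none of them, which matches the test results reported in this
PR.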
From: Kimihiro Nonaka <nonakap@gmail.com>
To: "gnats-bugs@netbsd.org" <gnats-bugs@netbsd.org>
Cc: NONAKA Kimihiro <nonaka@netbsd.org>, "gnats-admin@netbsd.org" <gnats-admin@netbsd.org>,
"netbsd-bugs@netbsd.org" <netbsd-bugs@netbsd.org>, Andreas Gustafsson <gson@gson.org>
Subject: Re: port-i386/52266: Double fault early in boot with AMD Am5x86 CPU
Date: Wed, 12 Jul 2017 11:09:23 +0900
2017-07-12 7:45 GMT+09:00 John D. Baker <jdbaker@mylinuxisp.com>:
> > Looking at the reports so far and the meaning of each of the various
> > CPUID_* bits, it looks like:
> >
> > AMD Am5x86 (Elan SC520): No MSR, No APIC
> > NS Geode: MSR, No APIC, rdmsr_safe() works
> > TM Crusoe: MSR, No APIC, rdmsr_safe() fails
> >
> > Are there any CPUs which implement APIC w/o MSR?
>
> Thinking about it more, if the CPU doesn't implement APIC, why bother
> reading MSR at all? As such, Andreas' change to the patch would also
> satisfy all the cases above and is simpler.
I agree.
Updated the patch.
diff --git a/sys/arch/x86/x86/lapic.c b/sys/arch/x86/x86/lapic.c
index 415bb65b4e5..e3423d8ce07 100644
--- a/sys/arch/x86/x86/lapic.c
+++ b/sys/arch/x86/x86/lapic.c
@@ -237,7 +237,8 @@ lapic_is_x2apic(void)
 {
 	uint64_t msr;
 
-	if (rdmsr_safe(MSR_APICBASE, &msr) == EFAULT)
+	if (!ISSET(cpu_feature[0], CPUID_APIC) ||
+	    rdmsr_safe(MSR_APICBASE, &msr) == EFAULT)
 		return false;
 	return (msr & (APICBASE_EN | APICBASE_EXTD)) ==
 	    (APICBASE_EN | APICBASE_EXTD);
Regards,
--
Kimihiro Nonaka
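For reference, the final return expression in the patch requires both
the enable bit and the x2APIC bit to be set in the MSR value. A minimal
standalone sketch, assuming the documented IA32_APIC_BASE bit positions
(bit 10 for x2APIC mode, bit 11 for global enable); the constants are
redefined locally and apicbase_is_x2apic() is a hypothetical name, not
the kernel function:

```c
#include <stdbool.h>
#include <stdint.h>

/* IA32_APIC_BASE MSR bits (assumed to match the kernel's definitions). */
#define APICBASE_EXTD	0x00000400ULL	/* bit 10: x2APIC mode enabled */
#define APICBASE_EN	0x00000800ULL	/* bit 11: APIC globally enabled */

/* True only if the APIC is enabled and running in x2APIC mode. */
static bool
apicbase_is_x2apic(uint64_t msr)
{
	return (msr & (APICBASE_EN | APICBASE_EXTD)) ==
	    (APICBASE_EN | APICBASE_EXTD);
}
```

A value with only the enable bit set, such as 0xfee00800, is treated as
plain xAPIC mode; only a value with both bits set, such as 0xfee00c00,
counts as x2APIC.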
From: "NONAKA Kimihiro" <nonaka@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/52266 CVS commit: src/sys/arch/x86/x86
Date: Thu, 13 Jul 2017 00:44:14 +0000
Module Name: src
Committed By: nonaka
Date: Thu Jul 13 00:44:14 UTC 2017
Modified Files:
src/sys/arch/x86/x86: lapic.c
Log Message:
PR/52266: Before access MSR[APICBASE], need to check if APIC is present.
To generate a diff of this commit:
cvs rdiff -u -r1.59 -r1.60 src/sys/arch/x86/x86/lapic.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/52266 CVS commit: [netbsd-8] src/sys/arch/x86/x86
Date: Fri, 14 Jul 2017 08:41:18 +0000
Module Name: src
Committed By: martin
Date: Fri Jul 14 08:41:18 UTC 2017
Modified Files:
src/sys/arch/x86/x86 [netbsd-8]: lapic.c
Log Message:
Pull up following revision(s) (requested by nonaka in ticket #135):
sys/arch/x86/x86/lapic.c: revision 1.60
PR/52266: Before access MSR[APICBASE], need to check if APIC is present.
To generate a diff of this commit:
cvs rdiff -u -r1.58.2.1 -r1.58.2.2 src/sys/arch/x86/x86/lapic.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: Kimihiro Nonaka <nonakap@gmail.com>
To: Andreas Gustafsson <gson@gson.org>
Cc: NONAKA Kimihiro <nonaka@netbsd.org>, "gnats-bugs@netbsd.org" <gnats-bugs@netbsd.org>
Subject: Re: port-i386/52266: Double fault early in boot with Transmeta Crusoe CPU
Date: Fri, 21 Jul 2017 17:06:52 +0900
Could you try src/sys/arch/x86/x86/lapic.c r1.60?
2017-07-09 23:43 GMT+09:00 Andreas Gustafsson <gson@gson.org>:
> I tested a kernel built from source date 2017.07.08.14.35.33 (which
> has src/sys/arch/x86/x86/lapic.c 1.59) on my Crusoe laptop, and it now
> reboots within a fraction of a second after the kernel starts, too
> fast to read the console messages.
> --
> Andreas Gustafsson, gson@gson.org
From: Andreas Gustafsson <gson@gson.org>
To: Kimihiro Nonaka <nonakap@gmail.com>
Cc: NONAKA Kimihiro <nonaka@netbsd.org>,
"gnats-bugs\@netbsd.org" <gnats-bugs@netbsd.org>
Subject: Re: port-i386/52266: Double fault early in boot with Transmeta Crusoe CPU
Date: Sat, 22 Jul 2017 14:57:40 +0300
Kimihiro Nonaka wrote:
> Could you try src/sys/arch/x86/x86/lapic.c r1.60?
I have now tested a kernel built from 2017.07.21.02.51.12 sources,
which include lapic.c 1.60, and it booted fine. Thank you!
--
Andreas Gustafsson, gson@gson.org
State-Changed-From-To: open->closed
State-Changed-By: gson@NetBSD.org
State-Changed-When: Sat, 22 Jul 2017 12:03:49 +0000
State-Changed-Why:
The bug has been fixed.
>Unformatted:
Copyright © 1994-2014
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.