NetBSD Problem Report #54331

From www@netbsd.org  Sat Jun 29 23:10:06 2019
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 7682B7A17C
	for <gnats-bugs@gnats.NetBSD.org>; Sat, 29 Jun 2019 23:10:06 +0000 (UTC)
Message-Id: <20190629231004.4279E7A1B3@mollari.NetBSD.org>
Date: Sat, 29 Jun 2019 23:10:03 +0000 (UTC)
From: david@gutteridge.ca
Reply-To: david@gutteridge.ca
To: gnats-bugs@NetBSD.org
Subject: macppc MP kernels fail to boot successfully in -current (8.99.49)
X-Send-Pr-Version: www-1.0

>Number:         54331
>Category:       port-macppc
>Synopsis:       macppc MP kernels fail to boot successfully in -current (8.99.49)
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    port-macppc-maintainer
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Jun 29 23:15:00 +0000 2019
>Closed-Date:    Sun May 31 19:31:17 +0000 2020
>Last-Modified:  Sun May 31 19:31:17 +0000 2020
>Originator:     David H. Gutteridge
>Release:        8.99.49/macppc
>Organization:
>Environment:
(See following dmesg details.)
>Description:
I'm unable to boot a -current MP macppc kernel. It initially hits
"autoconfiguration error: 2:CPU 1 didn't start 2", which generates a
brief backtrace at the db prompt. Choosing "continue" after this gets
the boot much farther, but then it starts outputting garbled text at
random points on the console, and reboots.

Note this is not an issue with 8.1. Its GENERIC.MP kernel works quite
well on the same hardware. This is also not an issue with the UP
GENERIC kernel on the same hardware, 8.99.49 works fine (so far as I've
tested it). So there's a regression somewhere in the MP code.

GENERIC.MP kernel with DIAGNOSTIC and DEBUG enabled:

[   1.0000000] Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
[   1.0000000]     2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017,
[   1.0000000]     2018, 2019 The NetBSD Foundation, Inc.  All rights reserved.
[   1.0000000] Copyright (c) 1982, 1986, 1989, 1991, 1993
[   1.0000000]     The Regents of the University of California.  All rights reserved.

[   1.0000000] NetBSD 8.99.49 (DEBUG) #2: Fri Jun 28 23:54:10 EDT 2019
[   1.0000000] 	disciple@arcusxvi.nonus-porta.net:/home/disciple/netbsd-current/src/sys/arch/macppc/compile/obj/DEBUG
[   1.0000000] total memory = 512 MB
[   1.0000000] avail memory = 480 MB
[   1.0000000] timecounter: Timecounters tick every 10.000 msec
[   1.0000000] Kernelized RAIDframe activated
[   1.0000000] found openpic PIC at 80040000
[   1.0000000] OpenPIC Version 1.2: Supports 4 CPUs and 64 interrupt sources.
[   1.0000000] bootpath: /pci@f4000000/ata-6@d/disk@0:0/netbsd-current-mp-debug
[   1.0000000] mainbus0 (root)
[   1.0000000] cpu0 at mainbus0: 7455 (Revision 2.1), ID 0 (primary)
[   1.0000000] cpu0: HID0 0x84d0c1bc<EMCP,TBEN,HIGH_BAT_EN,NAP,DPM,ICE,DCE,XBSEN,SGE,BTIC,LRSTK,FOLD,BHT>, powersave: 1
[   1.0000000] cpu0: 866.67 MHz, 256KB L2 cache no parity, 1MB no-parity L3 cache (DDR SRAM) at 4:1 ratio
[   1.0000000] cpu1 at mainbus0: 7455 (Revision 2.1), ID 1
[   1.0000000] cpu1: HID0 0x84d0c1bc<EMCP,TBEN,HIGH_BAT_EN,NAP,DPM,ICE,DCE,XBSEN,SGE,BTIC,LRSTK,FOLD,BHT>, powersave: 1
[   1.0000000] cpu1: 866.67 MHz, 256KB L2 cache no parity, 1MB no-parity L3 cache (DDR SRAM) at 4:1 ratio
[   1.0000000] autoconfiguration error: 2:CPU 1 didn't start 2
Stopped in pid 0.1 (system) at netbsd:cpu_spinup+0x2b8          lwz     r29,  0x1
14(r1)
db{0}> bt
0x00dd8d30: at config_attach_loc+0x1d4
0x00dd8d60: at config_found_sm_loc+0x5c
0x00dd8d90: at mainbus_attach+0x74
0x00dd8e60: at config_attach_loc+0x1d4
0x00dd8e90: at cpu_configure+0x20
0x00dd8ea0: at main+0x314
0x00dd8f10: at 0x1000b4
db{0}> continue
[   1.0000000] memory0 at mainbus0: len=512
[   1.0000000] spdmem0 at memory0
[   1.0000000] spdmem0: DDR SDRAM, no parity or ECC, 256MB, 267MHz (PC-2100)
[   1.0000000] spdmem0: 13 rows, 10 cols, 1 ranks, 4 banks/chip, 7.5ns cycle time
[   1.0000000] spdmem0: tAA-tRCD-tRP-tRAS: 1-3-3-6
[   1.0000000] spdmem0: voltage SSTL 2.5V, refresh time 7.8us (self-refreshing)
[   1.0000000] spdmem1 at memory0
[   1.0000000] spdmem1: DDR SDRAM, no parity or ECC, 256MB, 267MHz (PC-2100)
[   1.0000000] spdmem1: 13 rows, 10 cols, 1 ranks, 4 banks/chip, 7.5ns cycle time
[   1.0000000] spdmem1: tAA-tRCD-tRP-tRAS: 1-3-3-6
[   1.0000000] spdmem1: voltage SSTL 2.5V, refresh time 7.8us (self-refreshing)
[   1.0000000] uni_n0 at mainbus0 address 0xf8000000
[   1.0000000] ki2c0 at uni_n0 address 0xf8001000
[   1.0000000] iic0 at ki2c0: I2C bus
[   1.0000000] dbcool0 at iic0 addr 0x2c
[   1.0000000] dbcool0: ADM1030 dBCool(tm) Controller (rev 0x0083)
[   1.0000000] i2c-hwclock at iic0 addr 0x65 not configured
[   1.0000000] admtemp0 at iic0 addr 0x49: LM84 temperature sensor: id. 0x4b, rev. 0x00
[   1.0000000] uninorth0 at mainbus0
[   1.0000000] pci0 at uninorth0 bus 0
[   1.0000000] pci0: i/o space, memory space enabled
[   1.0000000] pchb0 at pci0 dev 11 function 0
[   1.0000000] pchb0: Apple Computer UniNorth AGP Bridge (rev. 0x00)
[   1.0000000] genfb0 at pci0 dev 16 function 0: NVIDIA GeForce4 MX 420 (rev. 0xa3)
[   1.0000000] genfb0: framebuffer at 0x98004000, size 1360x768, depth 8, stride 1536
[   1.0000000] wsdisplay0 at genfb0 kbdmux 1: console (default, vt100 emulation)
[   1.0000000] wsmux1: connecting to wsdisplay0
[   1.0000000] drm at genfb0 not configured
[   1.0000000] uninorth1 at mainbus0
[   1.0000000] pci1 at uninorth1 bus 0
[   1.0000000] pci1: i/o space, memory space enabled
[   1.0000000] pchb1 at pci1 dev 11 function 0
[   1.0000000] pchb1: Apple Computer UniNorth Host-PCI Bridge (rev. 0x00)
[   1.0000000] obio0 at pci1 dev 23 function 0: addr 0x80000000
[   1.0000000] obio0: enabling KeyLargo internal modem
[   1.0000000] zsc0 at obio0 irq 22,23
[   1.0000000] zstty0 at zsc0 channel 0
[   1.0000000] zstty1 at zsc0 channel 1
[   1.0000000] snapper0 at obio0 offset 0x10000: irq 30,1,2
[   1.0000000] pmu0 at obio0 offset 0x16000 irq 47:  rev. 12
[   1.0000000] pmu0: initializing IIC bus
[   1.0000000] iic1 at pmu0: I2C bus
[   1.0000000] i2c-hwclock at iic1 addr 0x6a not configured
[   1.0000000] i2c-hwclock at iic1 addr 0x64 not configured
[   1.0000000] pmu0: initializing RTC
[   1.0000000] pmu0: power-mgt not configured
[   1.0000000] ki2c1 at obio0
[   1.0000000] iic2 at ki2c1: I2C bus
[   1.0000000] deq0 at iic2 addr 0x35 Audio Codec (tas3004)
[   1.0000000] wdc0 at obio0 offset 0x1f000 irq 19, level triggered: DMA transfer
[   1.0000000] atabus0 at wdc0 channel 0
[   1.0000000] wdc1 at obio0 offset 0x20000 irq 20, level triggered: DMA transfer
[   1.0000000] atabus1 at wdc1 channel 0
[   1.0000000] snapper0: codec: TAS3004
[   1.0000000] audio0 at snapper0: playback, capture, full duplex
[   1.0000000] audio0: slinear_be:16 -> slinear_be:16 2ch 48000Hz, blk 21ms for playback
[   1.0000000] audio0: slinear_be:16 <- slinear_be:16 2ch 48000Hz, blk 21ms for recording
[   1.0000000] spkr0 at audio0: PC Speaker (synthesized)
[   1.0000000] wsbell at spkr0 not configured
[   1.0000000] ohci0 at pci1 dev 24 function 0: Apple Computer KeyLargo USB Controller (rev. 0x00)
[   1.0000000] csr: 02000006
[   1.0000000] genppc_pci_intr_map: pin: 1, line: 27
[   1.0000000] ohci0: interrupting at irq 27
[   1.0000000] ohci0: OHCI version 1.0
[   1.0000000] usb0 at ohci0: USB revision 1.0
[   1.0000000] ohci1 at pci1 dev 25 function 0: Apple Computer KeyLargo USB Controller (rev. 0x00)
[   1.0000000] csr: 02000006
[   1.0000000] genppc_pci_intr_map: pin: 1, line: 28
[   1.0000000] ohci1: interrupting at irq 28
[   1.0000000] ohci1: OHCI version 1.0
[   1.0000000] usb1 at ohci1: USB revision 1.0
[   1.0000000] uninorth2 at mainbus0
[   1.0000000] pci2 at uninorth2 bus 0
[   1.0000000] pci2: i/o space, memory space enabled
[   1.0000000] pchb2 at pci2 dev 11 function 0
[   1.0000000] pchb2: Apple Computer UniNorth Host-PCI Bridge (rev. 0x00)
[   1.0000000] wdc2 at pci2 dev 13 function 0genppc_pci_intr_map: pin: 1, line: 39
[   1.0000000] : interrupting at irq 39
[   1.0000000] atabus2 at wdc2 channel 0
[   1.0000000] fwohci0 at pci2 dev 14 function 0: Apple Computer UniNorth Firewire (rev. 0x01)
[   1.0000000] genppc_pci_intr_map: pin: 1, line: 40
[   1.0000000] fwohci0: interrupting at irq 40
[   1.0000000] fwohci0: OHCI version 1.10 (ROM=0)
[   1.0000000] fwohci0: No. of Isochronous channels is 8.
[   1.0000000] fwohci0: EUI64 00:03:93:ff:fe:dc:b7:e2
[   1.0000000] fwohci0: Phy 1394a available S400, 3 ports.
[   1.0000000] fwohci0: Link S400, max_rec 2048 bytes.
[   1.0000000] ieee1394if0 at fwohci0: IEEE1394 bus
[   1.0000000] fwip0 at ieee1394if0: IP over IEEE1394
[   1.0000000] fwohci0: Initiate bus reset
[   1.0000000] fwohci0: Clear PME# now
[   1.0000000] gem0 at pci2 dev 15 function 0: Apple Computer GMAC Ethernet (rev. 0x00)
[   1.0000000] genppc_pci_intr_map: pin: 1, line: 41
[   1.0000000] gem0: interrupting at irq 41
[   1.0000000] brgphy0 at gem0 phy 0: BCM5421 1000BASE-T media interface, rev. 1
[   1.0000000] brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
[   1.0000000] gem0: Ethernet address 00:03:93:dc:b7:e2, 10KB RX fifo, 4KB TX fifo
[   1.0000000] vmmask 677c0000 schedmask 7ffc0000 highmask 7ffc0000
[   1.0000000] timecounter: Timecounter "clockinterrupt" frequency 100 Hz quality 0
[   1.0000030] timecounter: Timecounter "mftb" frequency 33304156 Hz quality 100
[   1.0000203] fwohci0: BUS reset
[   1.0000203] fwohci0: node_id=0xc800ffc0, gen=2, CYCLEMASTER mode
[   1.0000203] ieee1394if0: 1 nodes, maxhop <= 0 cable IRM irm(0) (me)
[   1.0000203] ieee1394if0: bus manager 0
[   1.0000203] cpu1 started
[   1.0800202] panic: kernel diagnostic assertion "virq != 0" failed: file "/home/disciple/netbsd-current/src/sys/arch/powerpc/pic/intr.c", line 561 
[   1.1099928] cpu1: Begin traceback...
[   1.1199932] 0x00dd9d40: at vpanic+0x144
[   1.1299929] uhub0 at usb0: NetBSD (00000x)0 0OdHdC9Id 7r0o:o t hub (0000), class 9/0, rev 1.00/1.00, adadtr  k1e
[   1.1299929] rn_assert+0x60
[   1.1299929] uhub0: 2 ports with 2 removable, self powered
[   1.1299929] uhub1 at usb1: NetBSD (0000) OHCI root hub0 x(0000d0d09)d,b 0c:l ass 9/0, rev 1.00/1.00, addr 1
[   1.1299929] at pic_handle_intr+0x200
[   1.1299929] uhub1: 2 ports with 2 removable, self powered
[   1.1299929] uhub0: autoconfiguration error: port 1 reset 0fxa0i0ldedd9
[   1.1299929] e00: at trapstart+0x6b0
[   1.1299929] uhub1: autoconfiguration error: port 1 reset 0fxa0i0ldedd
[   1.1299929] 9ed0: at kprintf+0x7f8
[   1.3700513] at0axp0i0bduds90f 0a0t :a tabus1: 2 targets
[   1.3700513] at trapstart+0x800
[   1.3700513] [c d 0  1a.t3 7a0t0a51p3i]b u0sx00 0ddrdi9vfed0 0: : <PHILIPS CDD5101, H9246047JNVGA,a tA 4c.pu2>_ chadrtcomh +r0xe2mo1vca
[   1.3700513] ble
[   1.3700513] 0x00dd9ff0: at cpucd0: drive supports PIO mode _4spinup_tr, DMA mode 2
[[      11..55110000334466]]  mcpoonlifn[0e+] 0=x 03cx
[   1.5100346] 11823, cyc = 4 (120 ns), act = 3 (75 ns), inact = 1
[   1.5100346] cpu1: End traceback...
[   1.5100346] cd0(wdc1:0:0): using PIO mode 4, DMA mode 2 (using DMA)
[   1.5100346] wd0 at atabus2 drive 0
[   1.5100346] Failed to pause: cpu0
[   1.5100346] wd0: <WDC WD800JB-00JJC0>
[   1.5100346] wd0: drive supports 16-sector PIO transfers, LBA addressing
[   1.5100346] wd0: 76319 MB, 155061 cyl, 16 head, 63 sec, 512 bytes/sect x 156301488 sectors
[   1.8000446] wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100)
[   1.8000446] wd1 at atabus2 drive 1
[   1.8000446] wd1: <IBM-IC35L120AVVA07-0>
[   1.8000446] wd1: drive supports 16-sector PIO transfers, LBA addressing
[   1.8000446] wd1: 115 GB, 239340 cyl, 16 head, 63 sec, 512 bytes/sect x 241254720 sectors
[   1.9499934] wd1: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100)
[   1.9499934] wd0(wdc2:0:0): using PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100) (using DMA)
[   1.9499934] wd1(wdc2:0:1): using PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100) (using DMA)
[   1.9699929] dumpsys: TBD
[   1.9999936] rebooting

GENERIC UP kernel output:

[   1.0000000] Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
[   1.0000000]     2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017,
[   1.0000000]     2018, 2019 The NetBSD Foundation, Inc.  All rights reserved.
[   1.0000000] Copyright (c) 1982, 1986, 1989, 1991, 1993
[   1.0000000]     The Regents of the University of California.  All rights reserved.

[   1.0000000] NetBSD 8.99.49 (GENERIC) #1: Sat Jun 29 15:55:20 EDT 2019
[   1.0000000] 	disciple@arcusxvi.nonus-porta.net:/home/disciple/netbsd-current/src/sys/arch/macppc/compile/obj/GENERIC
[   1.0000000] total memory = 512 MB
[   1.0000000] avail memory = 482 MB
[   1.0000000] timecounter: Timecounters tick every 10.000 msec
[   1.0000000] Kernelized RAIDframe activated
[   1.0000000] found openpic PIC at 80040000
[   1.0000000] OpenPIC Version 1.2: Supports 4 CPUs and 64 interrupt sources.
[   1.0000000] bootpath: /pci@f4000000/ata-6@d/disk@0:0/netbsd-current-up
[   1.0000000] mainbus0 (root)
[   1.0000000] cpu0 at mainbus0: 7455 (Revision 2.1), ID 0 (primary)
[   1.0000000] cpu0: HID0 0x84d0c1bc<EMCP,TBEN,HIGH_BAT_EN,NAP,DPM,ICE,DCE,XBSEN,SGE,BTIC,LRSTK,FOLD,BHT>, powersave: 1
[   1.0000000] cpu0: 866.67 MHz, 256KB L2 cache no parity, 1MB no-parity L3 cache (DDR SRAM) at 4:1 ratio
[   1.0000000] cpu1 at mainbus0: ID 1
[   1.0000000] cpu1: processor off-line; multiprocessor support not present in kernel
[   1.0000000] memory0 at mainbus0: len=512
[   1.0000000] spdmem0 at memory0
[   1.0000000] spdmem0: DDR SDRAM, no parity or ECC, 256MB, 267MHz (PC-2100)
[   1.0000000] spdmem0: 13 rows, 10 cols, 1 ranks, 4 banks/chip, 7.5ns cycle time
[   1.0000000] spdmem0: tAA-tRCD-tRP-tRAS: 1-3-3-6
[   1.0000000] spdmem0: voltage SSTL 2.5V, refresh time 7.8us (self-refreshing)
[   1.0000000] spdmem1 at memory0
[   1.0000000] spdmem1: DDR SDRAM, no parity or ECC, 256MB, 267MHz (PC-2100)
[   1.0000000] spdmem1: 13 rows, 10 cols, 1 ranks, 4 banks/chip, 7.5ns cycle time
[   1.0000000] spdmem1: tAA-tRCD-tRP-tRAS: 1-3-3-6
[   1.0000000] spdmem1: voltage SSTL 2.5V, refresh time 7.8us (self-refreshing)
[   1.0000000] uni_n0 at mainbus0 address 0xf8000000
[   1.0000000] ki2c0 at uni_n0 address 0xf8001000
[   1.0000000] iic0 at ki2c0: I2C bus
[   1.0000000] dbcool0 at iic0 addr 0x2c
[   1.0000000] dbcool0: ADM1030 dBCool(tm) Controller (rev 0x0083)
[   1.0000000] i2c-hwclock at iic0 addr 0x65 not configured
[   1.0000000] admtemp0 at iic0 addr 0x49: LM84 temperature sensor: id. 0x4b, rev. 0x00
[   1.0000000] uninorth0 at mainbus0
[   1.0000000] pci0 at uninorth0 bus 0
[   1.0000000] pci0: i/o space, memory space enabled
[   1.0000000] pchb0 at pci0 dev 11 function 0
[   1.0000000] pchb0: Apple Computer UniNorth AGP Bridge (rev. 0x00)
[   1.0000000] genfb0 at pci0 dev 16 function 0: NVIDIA GeForce4 MX 420 (rev. 0xa3)
[   1.0000000] genfb0: framebuffer at 0x98004000, size 1360x768, depth 8, stride 1536
[   1.0000000] wsdisplay0 at genfb0 kbdmux 1: console (default, vt100 emulation)
[   1.0000000] wsmux1: connecting to wsdisplay0
[   1.0000000] drm at genfb0 not configured
[   1.0000000] uninorth1 at mainbus0
[   1.0000000] pci1 at uninorth1 bus 0
[   1.0000000] pci1: i/o space, memory space enabled
[   1.0000000] pchb1 at pci1 dev 11 function 0
[   1.0000000] pchb1: Apple Computer UniNorth Host-PCI Bridge (rev. 0x00)
[   1.0000000] obio0 at pci1 dev 23 function 0: addr 0x80000000
[   1.0000000] obio0: enabling KeyLargo internal modem
[   1.0000000] zsc0 at obio0 irq 22,23
[   1.0000000] zstty0 at zsc0 channel 0
[   1.0000000] zstty1 at zsc0 channel 1
[   1.0000000] snapper0 at obio0 offset 0x10000: irq 30,1,2
[   1.0000000] pmu0 at obio0 offset 0x16000 irq 47:  rev. 12
[   1.0000000] pmu0: initializing IIC bus
[   1.0000000] iic1 at pmu0: I2C bus
[   1.0000000] i2c-hwclock at iic1 addr 0x6a not configured
[   1.0000000] i2c-hwclock at iic1 addr 0x64 not configured
[   1.0000000] pmu0: initializing RTC
[   1.0000000] pmu0: power-mgt not configured
[   1.0000000] ki2c1 at obio0
[   1.0000000] iic2 at ki2c1: I2C bus
[   1.0000000] deq0 at iic2 addr 0x35 Audio Codec (tas3004)
[   1.0000000] wdc0 at obio0 offset 0x1f000 irq 19, level triggered: DMA transfer
[   1.0000000] atabus0 at wdc0 channel 0
[   1.0000000] wdc1 at obio0 offset 0x20000 irq 20, level triggered: DMA transfer
[   1.0000000] atabus1 at wdc1 channel 0
[   1.0000000] snapper0: codec: TAS3004
[   1.0000000] audio0 at snapper0: playback, capture, full duplex
[   1.0000000] audio0: slinear_be:16 -> slinear_be:16 2ch 48000Hz, blk 21ms for playback
[   1.0000000] audio0: slinear_be:16 <- slinear_be:16 2ch 48000Hz, blk 21ms for recording
[   1.0000000] spkr0 at audio0: PC Speaker (synthesized)
[   1.0000000] wsbell at spkr0 not configured
[   1.0000000] ohci0 at pci1 dev 24 function 0: Apple Computer KeyLargo USB Controller (rev. 0x00)
[   1.0000000] ohci0: interrupting at irq 27
[   1.0000000] ohci0: OHCI version 1.0
[   1.0000000] usb0 at ohci0: USB revision 1.0
[   1.0000000] ohci1 at pci1 dev 25 function 0: Apple Computer KeyLargo USB Controller (rev. 0x00)
[   1.0000000] ohci1: interrupting at irq 28
[   1.0000000] ohci1: OHCI version 1.0
[   1.0000000] usb1 at ohci1: USB revision 1.0
[   1.0000000] uninorth2 at mainbus0
[   1.0000000] pci2 at uninorth2 bus 0
[   1.0000000] pci2: i/o space, memory space enabled
[   1.0000000] pchb2 at pci2 dev 11 function 0
[   1.0000000] pchb2: Apple Computer UniNorth Host-PCI Bridge (rev. 0x00)
[   1.0000000] wdc2 at pci2 dev 13 function 0: interrupting at irq 39
[   1.0000000] atabus2 at wdc2 channel 0
[   1.0000000] fwohci0 at pci2 dev 14 function 0: Apple Computer UniNorth Firewire (rev. 0x01)
[   1.0000000] fwohci0: interrupting at irq 40
[   1.0000000] fwohci0: OHCI version 1.10 (ROM=0)
[   1.0000000] fwohci0: No. of Isochronous channels is 8.
[   1.0000000] fwohci0: EUI64 00:03:93:ff:fe:dc:b7:e2
[   1.0000000] fwohci0: Phy 1394a available S400, 3 ports.
[   1.0000000] fwohci0: Link S400, max_rec 2048 bytes.
[   1.0000000] ieee1394if0 at fwohci0: IEEE1394 bus
[   1.0000000] fwip0 at ieee1394if0: IP over IEEE1394
[   1.0000000] fwohci0: Initiate bus reset
[   1.0000000] fwohci0: Clear PME# now
[   1.0000000] gem0 at pci2 dev 15 function 0: Apple Computer GMAC Ethernet (rev. 0x00)
[   1.0000000] gem0: interrupting at irq 41
[   1.0000000] brgphy0 at gem0 phy 0: BCM5421 1000BASE-T media interface, rev. 1
[   1.0000000] brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
[   1.0000000] gem0: Ethernet address 00:03:93:dc:b7:e2, 10KB RX fifo, 4KB TX fifo
[   1.0000000] vmmask 677c0000 schedmask 7ffc0000 highmask 7ffc0000
[   1.0000000] timecounter: Timecounter "clockinterrupt" frequency 100 Hz quality 0
[   1.0000030] timecounter: Timecounter "mftb" frequency 33304213 Hz quality 100
[   1.0000188] fwohci0: BUS reset
[   1.0000188] fwohci0: node_id=0xc800ffc0, gen=2, CYCLEMASTER mode
[   1.0000188] ieee1394if0: 1 nodes, maxhop <= 0 cable IRM irm(0) (me)
[   1.0000188] ieee1394if0: bus manager 0
[   1.1099931] uhub0 at usb0: NetBSD (0000) OHCI root hub (0000), class 9/0, rev 1.00/1.00, addr 1
[   1.1099931] uhub0: 2 ports with 2 removable, self powered
[   1.1099931] uhub1 at usb1: NetBSD (0000) OHCI root hub (0000), class 9/0, rev 1.00/1.00, addr 1
[   1.1099931] uhub1: 2 ports with 2 removable, self powered
[   1.1099931] IPsec: Initialized Security Association Processing.
[   1.1799927] atapibus0 at atabus1: 2 targets
[   1.2100003] cd0 at atapibus0 drive 0: <PHILIPS CDD5101, H9246047JNVGA, A4.2> cdrom removable
[   1.2100003] cd0: drive supports PIO mode 4, DMA mode 2
[   1.2100003] cd0(wdc1:0:0): using PIO mode 4, DMA mode 2 (using DMA)
[   1.2100003] wd0 at atabus2 drive 0
[   1.2100003] wd0: <WDC WD800JB-00JJC0>
[   1.2100003] wd0: drive supports 16-sector PIO transfers, LBA addressing
[   1.2100003] wd0: 76319 MB, 155061 cyl, 16 head, 63 sec, 512 bytes/sect x 156301488 sectors
[   1.3099933] wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100)
[   1.3099933] wd1 at atabus2 drive 1
[   1.3099933] wd1: <IBM-IC35L120AVVA07-0>
[   1.3099933] wd1: drive supports 16-sector PIO transfers, LBA addressing
[   1.3099933] wd1: 115 GB, 239340 cyl, 16 head, 63 sec, 512 bytes/sect x 241254720 sectors
[   1.3800037] wd1: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100)
[   1.3800037] wd0(wdc2:0:0): using PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100) (using DMA)
[   1.3800037] wd1(wdc2:0:1): using PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100) (using DMA)
[   3.4399993] uhidev0 at uhub1 port 1 configuration 1 interface 0
[   3.4399993] uhidev0: Logitech (0x46d) USB-PS/2 Optical Mouse (0xc00e), rev 2.00/11.10, addr 2, iclass 3/1
[   3.4399993] uhub2 at uhub0 port 1: Apple, Inc. (0x5ac) Keyboard Hub (0x1006), class 9/0, rev 2.00/96.15, addr 2
[   3.5100094] ums0 at uhidev0: 3 buttons and Z dir
[   3.5100094] wsmouse0 at ums0 mux 0
[   3.5100094] uhub2: 3 ports with 2 removable, bus powered
[   4.8800033] uhidev1 at uhub2 port 2 configuration 1 interface 0
[   4.8800033] uhidev1: Apple, Inc (0x5ac) Apple Keyboard (0x220), rev 2.00/0.69, addr 3, iclass 3/1
[   4.8800033] ukbd0 at uhidev1
[   5.2900034] wskbd0 at ukbd0: console keyboard, using wsdisplay0
[   5.2900034] uhidev2 at uhub2 port 2 configuration 1 interface 1
[   5.2900034] uhidev2: Apple, Inc (0x5ac) Apple Keyboard (0x220), rev 2.00/0.69, addr 3, iclass 3/0
[   5.3500123] uhid0 at uhidev2: input=1, output=0, feature=0
[   5.3500123] swwdog0: software watchdog initialized
[   5.4400041] boot device: wd0
[   5.4400041] root on wd0a dumps on wd0b
[   5.4800100] Supported file systems: mfs lfs ffs ext2fs nfs umap procfs overlay null kernfs fdesc union tmpfs smbfs puffs ptyfs ntfs msdos efs cd9660 coda
[   5.4800100] no file system for wd0 (dev 0xa00)
[   5.4800100] cannot mount root, error = 79
[   5.4800100] root device (default wd0a): wd0b
[   5.5700058] dump device (default wd0b): wd0f
[   6.5000709] file system (default generic): 
[   7.4400821]  dumps on wd0f
[   7.7601033] root file system type: ffs
[   7.7601033] kern.module.path=/stand/macppc/8.99.49/modules
[   7.7601033] WARNING: clock gained 19 days
[   7.8201039] init path (default /sbin/init): 
[   7.8601049] init: trying /sbin/init
[  20.7801459] wsdisplay0: screen 1 added (default, vt100 emulation)
[  20.7801459] wsdisplay0: screen 2 added (default, vt100 emulation)
[  20.7801459] wsdisplay0: screen 3 added (default, vt100 emulation)
[  20.7901492] wsdisplay0: screen 4 added (default, vt100 emulation)

>How-To-Repeat:
Boot an MP kernel in -current on a G4 Mac with two CPUs.
>Fix:

>Release-Note:

>Audit-Trail:
From: Michael <macallan@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: port-macppc/54331: macppc MP kernels fail to boot successfully
 in -current (8.99.49)
Date: Mon, 1 Jul 2019 02:18:41 -0400

 Hmm, this might be fallout from me adding G5 SMP support a while ago,
 and unfortunately I don't have a working SMP G4.
 IIRC I had to slightly change the way we find the GPIOs that control
 each CPU's reset pin ( since of course there are a few differences in
 the OF variants used on G5 ) - please send me the ofctl -p output of
 your machine, maybe there is something obvious.

 have fun
 Michael

From: "David H. Gutteridge" <david@gutteridge.ca>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: port-macppc/54331: macppc MP kernels fail to boot successfully
 in -current (8.99.49)
Date: Sun, 07 Jul 2019 17:43:54 -0400

 This is different than I thought, there are actually two separate
 issues involved. The failure to successfully boot an MP kernel can be
 duplicated with 8.1_STABLE: when DEBUG and DIAGNOSTIC are enabled, I
 get a very similar initial backtrace, and the exact same behaviour
 (and, from what I can see, output) before the machine reboots. I hadn't
 previously tried a debugging kernel with 8.x, since it otherwise works
 fine for me. So this isn't a regression with -current, it's pre-
 existing.

 With -current, the distinction is a non-debug MP kernel fails to
 complete booting. It hangs before running /sbin/init. It continues to
 process hardware-related events (e.g., I can unplug and plug in USB
 devices, and it continues to handle these correctly), but otherwise
 just stalls, without any output to indicate why.

 So, perhaps in order to answer why the -current non-debug MP kernel
 isn't working, we need to figure out why MP debug kernels aren't
 working first.

 Dave


From: Michael <macallan@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: port-macppc/54331: macppc MP kernels fail to boot successfully
 in -current (8.99.49)
Date: Mon, 8 Jul 2019 02:46:18 -0400

 >  So, perhaps in order to answer why the -current non-debug MP kernel
 >  isn't working, we need to figure out why MP debug kernels aren't
 >  working first.

 Hmm, that's a good data point, which narrows things down quite a bit.
 Hopefully I'll have time for a closer look on my next weekend ( which
 starts Wednesday ).
 Yes, this is basically a SIGKEEPALIVE from me.

From: "David H. Gutteridge" <david@gutteridge.ca>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: port-macppc/54331: macppc MP kernels fail to boot successfully
 in -current (8.99.49)
Date: Wed, 10 Jul 2019 00:55:58 -0400

 I tried a couple more things, which didn't really yield that much more.
 First, I built an MP debug kernel from the netbsd-7 branch. It also
 panics; however, the panic happens later. (It involves a locking issue
 with USB devices. If this is unrelated to everything else, I don't
 think anyone necessarily wants to chase new issues with 7.x.)

 To get around one of the predictable panics with -current, I commented
 out
 KASSERT(virq != 0);
 at line 561 of sys/arch/powerpc/pic/intr.c. This gives me a kernel that
 behaves the same as an MP non-debug -current kernel. That is, it
 freezes while devices are attaching, after the hard drives have been
 recognized. It doesn't seem to output anything significant that's
 different from the non-debug kernel.

 Dave


From: Michael <macallan@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: port-macppc/54331: macppc MP kernels fail to boot successfully
 in -current (8.99.49)
Date: Thu, 25 Jul 2019 13:12:31 -0400

 I finally had a look at the code and the logs - looks like a race
 between cpu_setup() on the newly hatched CPU and delay(200000) in
 cpu_spinup(), where we just wait a fixed time and then bail out if the
 other CPU isn't ready. This would explain why the 2nd CPU still wakes
 up later on but then things go south because we didn't install the IPI
 handler.

 The following patch adds a period of polling h->hatch_running for a
 while instead of just blindly increasing the delay().

 Index: cpu_subr.c
 ===================================================================
 RCS file: /cvsroot/src/sys/arch/powerpc/oea/cpu_subr.c,v
 retrieving revision 1.99
 diff -u -w -r1.99 cpu_subr.c
 --- cpu_subr.c  6 Feb 2019 07:32:50 -0000       1.99
 +++ cpu_subr.c  25 Jul 2019 17:03:14 -0000
 @@ -1381,6 +1385,17 @@
         __asm volatile ("dcbst 0,%0"::"r"(&h->hatch_running):"memory");
         __asm volatile ("sync; isync");
  #endif
 +       int hatch_bail = 0;
 +       while ((h->hatch_running < 1) && (hatch_bail < 100000)) {
 +               delay(1);
 +               hatch_bail++;
 +#ifdef CACHE_PROTO_MEI
 +               __asm volatile ("dcbi 0,%0"::"r"(&h->hatch_running):"memory");
 +               __asm volatile ("sync; isync");
 +               __asm volatile ("dcbst 0,%0"::"r"(&h->hatch_running):"memory");
 +               __asm volatile ("sync; isync");
 +#endif
 +       }
         if (h->hatch_running < 1) {
  #ifdef CACHE_PROTO_MEI
                 __asm volatile ("dcbi
                 0,%0"::"r"(&cpu_spinstart_ack):"memory");

From: "David H. Gutteridge" <david@gutteridge.ca>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: port-macppc/54331: macppc MP kernels fail to boot successfully
 in -current (8.99.49)
Date: Mon, 29 Jul 2019 19:38:01 -0400

 On Thu, 2019-07-25 at 17:15 +0000, Michael wrote:
 > The following reply was made to PR port-macppc/54331; it has been noted by GNATS.
 > 
 > From: Michael <macallan@netbsd.org>
 > To: gnats-bugs@netbsd.org
 > Cc: 
 > Subject: Re: port-macppc/54331: macppc MP kernels fail to boot successfully
 >  in -current (8.99.49)
 > Date: Thu, 25 Jul 2019 13:12:31 -0400
 > 
 >  I finally had a look at the code and the logs - looks like a race
 >  between cpu_setup() on the newly hatched CPU and delay(200000) in
 >  cpu_spinup(), where we just wait a fixed time and then bail out if the
 >  other CPU isn't ready. This would explain why the 2nd CPU still wakes
 >  up later on but then things go south because we didn't install the IPI
 >  handler.
 >  
 >  The following patch adds a period of polling h->hatch_running for a
 >  while instead of just blindly increasing the delay().
 >  
 >  Index: cpu_subr.c
 >  ===================================================================
 >  RCS file: /cvsroot/src/sys/arch/powerpc/oea/cpu_subr.c,v
 >  retrieving revision 1.99
 >  diff -u -w -r1.99 cpu_subr.c
 >  --- cpu_subr.c  6 Feb 2019 07:32:50 -0000       1.99
 >  +++ cpu_subr.c  25 Jul 2019 17:03:14 -0000
 >  @@ -1381,6 +1385,17 @@
 >          __asm volatile ("dcbst 0,%0"::"r"(&h->hatch_running):"memory");
 >          __asm volatile ("sync; isync");
 >   #endif
 >  +       int hatch_bail = 0;
 >  +       while ((h->hatch_running < 1) && (hatch_bail < 100000)) {
 >  +               delay(1);
 >  +               hatch_bail++;
 >  +#ifdef CACHE_PROTO_MEI
 >  +               __asm volatile ("dcbi 0,%0"::"r"(&h->hatch_running):"memory");
 >  +               __asm volatile ("sync; isync");
 >  +               __asm volatile ("dcbst 0,%0"::"r"(&h->hatch_running):"memory");
 >  +               __asm volatile ("sync; isync");
 >  +#endif
 >  +       }
 >          if (h->hatch_running < 1) {
 >   #ifdef CACHE_PROTO_MEI
 >                  __asm volatile ("dcbi
 >                  0,%0"::"r"(&cpu_spinstart_ack):"memory");

 Hi Michael,

 Your patch fixes the initial panic; it now continues booting without
 dropping into the debugger, and it doesn't "spontaneously" reboot at
 any point. Now the debug and non-debug MP kernels behave identically,
 which is that they hang right after USB devices are recognized,
 without any output to indicate why. (My debug kernel includes the
 DEBUG, DIAGNOSTIC, and LOCKDEBUG options.)

 Separately, I've now hit an occasion where 8.1_STABLE without debugging
 options also hangs, which I hadn't seen before. It gets farther than
 -current: it mounts file systems and tries /sbin/init, and hangs there.
 I'm going to try some more tests with it and see what happens.

 Thanks,

 Dave


From: "David H. Gutteridge" <david@gutteridge.ca>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: port-macppc/54331: macppc MP kernels fail to boot successfully
 in -current (8.99.49)
Date: Wed, 31 Jul 2019 02:27:28 -0400

 On Mon, 2019-07-29 at 19:38 -0400, David H. Gutteridge wrote:
 > 
 > Separately, I've now hit an occasion where 8.1_STABLE without debugging
 > options also hangs, which I hadn't seen before. It gets farther than
 > -current: it mounts file systems and tries /sbin/init, and hangs there.
 > I'm going to try some more tests with it and see what happens.

 Hi Michael,

 I can't reproduce that last hanging issue with a non-debug 8.1_STABLE
 MP kernel. Every reboot worked without trouble. (Perhaps there's some
 rare timing condition involved, or perhaps the hardware is getting
 flaky.)

 I also tried applying your patch to an 8.1 MP kernel (with DEBUG,
 DIAGNOSTIC, and LOCKDEBUG options enabled), and that gets me past the
 related original panic there, too. As the 8.1 kernel continues booting,
 it gets to the point where it needs to request user input (I have non-
 standard partitioning in this test setup), at which point it panics
 with the following backtrace:

 panic: kernel diagnostic assertion "mutex_owned(&sc->sc_lock)" failed: file "/home/disciple/netbsd-8/src/sys/dev/usb/ohci.c", line 590
 Stopped in pid 0.1 (system) at  netbsd:vpanic+0x140:     addi     r4,  r0,  0x0
 0x00c56a90: at kern_assert+0x68
 0x00c56ad0: at ohci_reset_std_chain+0x518
 0x00c56b30: at ohci_device_intr_start+0xd8
 0x00c56b70: at usbd_start_next+0x6c
 0x00c56b80: at ohci_softintr+0x6a0
 0x00c56be0: at ohci_intr1+0x1f0
 0x00c56c00: at ohci_poll+0x70
 0x00c56c10: at ukbd_cngetc+0x80
 0x00c56c30: at wskbd_cngetc+0x114
 0x00c56c70: at cngetc+0x58
 0x00c56c90: at cngetsn+0x6c
 0x00c56cc0: at setroot+0x174
 0x00c56ea0: at main+0x62c
 0x00c56f10: at 0x1000b4

 (Since -current hangs around USB controller attachment, I wondered if
 that and this new 8.1 panic are somehow related.)

 Regards,

 Dave


From: matthew green <mrg@eterna.com.au>
To: gnats-bugs@netbsd.org
Cc: port-macppc-maintainer@netbsd.org, gnats-admin@netbsd.org,
    netbsd-bugs@netbsd.org, david@gutteridge.ca
Subject: re: port-macppc/54331: macppc MP kernels fail to boot successfully in -current (8.99.49)
Date: Thu, 01 Aug 2019 17:26:53 +1000

 >  panic: kernel diagnostic assertion "mutex_owned(&sc->sc_lock)" failed: =
 file "/home/disciple/netbsd-8/src/sys/dev/usb/ohci.c", line 590

 i think i see the problem here.

 >  Stopped in pid 0.1 (system) at  netbsd:vpanic+0x140:     addi     r4,  =
 r0,  0x0
 >  0x00c56a90: at kern_assert+0x68
 >  0x00c56ad0: at ohci_reset_std_chain+0x518
 >  0x00c56b30: at ohci_device_intr_start+0xd8

 ohci_device_intr_start() only takes the lock when not polling, so
 this assert needs to be adjusted for that.

 can you try this patch?  it probably won't fix the original hang,
 but it should fix this issue.  it's against netbsd-8 though i
 am fairly sure that the code is the same in -current.


 .mrg.


 Index: ohci.c
 =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
 =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
 =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
 RCS file: /cvsroot/src/sys/dev/usb/ohci.c,v
 retrieving revision 1.273.6.4
 diff -p -u -r1.273.6.4 ohci.c
 --- ohci.c	27 Sep 2018 14:52:26 -0000	1.273.6.4
 +++ ohci.c	1 Aug 2019 07:25:46 -0000
 @@ -587,7 +587,7 @@ ohci_reset_std_chain(ohci_softc_t *sc, s
  	OHCIHIST_FUNC(); OHCIHIST_CALLED();
  	DPRINTF("start len=3D%jd", alen, 0, 0, 0);
  =

 -	KASSERT(mutex_owned(&sc->sc_lock));
 +	KASSERT(sc->sc_bus.ub_usepolling || mutex_owned(&sc->sc_lock));
  =

  	DPRINTFN(8, "addr=3D%jd endpt=3D%jd len=3D%jd speed=3D%jd",
  	    xfer->ux_pipe->up_dev->ud_addr,
 @@ -1675,7 +1675,7 @@ ohci_device_bulk_done(struct usbd_xfer *
  	int isread =3D
  	    (UE_GET_DIR(xfer->ux_pipe->up_endpoint->ue_edesc->bEndpointAddress) =
 =3D=3D UE_DIR_IN);
  =

 -	KASSERT(mutex_owned(&sc->sc_lock));
 +	KASSERT(sc->sc_bus.ub_usepolling || mutex_owned(&sc->sc_lock));
  =

  	OHCIHIST_FUNC(); OHCIHIST_CALLED();
  	DPRINTFN(10, "xfer=3D%#jx, actlen=3D%jd", (uintptr_t)xfer, xfer->ux_actl=
 en,

From: "Michael Lorenz" <macallan@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/54331 CVS commit: src/sys/arch/powerpc/oea
Date: Fri, 2 Aug 2019 05:08:08 +0000

 Module Name:	src
 Committed By:	macallan
 Date:		Fri Aug  2 05:08:07 UTC 2019

 Modified Files:
 	src/sys/arch/powerpc/oea: cpu_subr.c

 Log Message:
 first step to address PR54331:
 poll h->hatch_running for a bit instead of blindly relying on a fixed timeout
 for secondary CPUs to wake up and get ready

 needs more testing, possibly pullup


 To generate a diff of this commit:
 cvs rdiff -u -r1.99 -r1.100 src/sys/arch/powerpc/oea/cpu_subr.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "David H. Gutteridge" <david@gutteridge.ca>
To: gnats-bugs@netbsd.org, matthew green <mrg@eterna.com.au>
Cc: 
Subject: Re: port-macppc/54331: macppc MP kernels fail to boot successfully
 in -current (8.99.49)
Date: Sun, 11 Aug 2019 18:24:16 -0400

 On Thu, 2019-08-01 at 17:26 +1000, matthew green wrote:
 > >  panic: kernel diagnostic assertion "mutex_owned(&sc->sc_lock)" failed: file "/home/disciple/netbsd-8/src/sys/dev/usb/ohci.c", line 590
 > 
 > i think i see the problem here.
 > 
 > >  Stopped in pid 0.1 (system) at  netbsd:vpanic+0x140:     addi     r4,  r0,  0x0
 > >  0x00c56a90: at kern_assert+0x68
 > >  0x00c56ad0: at ohci_reset_std_chain+0x518
 > >  0x00c56b30: at ohci_device_intr_start+0xd8
 > 
 > ohci_device_intr_start() only takes the lock when not polling, so
 > this assert needs to be adjusted for that.
 > 
 > can you try this patch?  it probably won't fix the original hang,
 > but it should fix this issue.  it's against netbsd-8 though i
 > am fairly sure that the code is the same in -current.
 > 
 > 
 > .mrg.
 > 
 > 
 > Index: ohci.c
 > ===================================================================
 > RCS file: /cvsroot/src/sys/dev/usb/ohci.c,v
 > retrieving revision 1.273.6.4
 > diff -p -u -r1.273.6.4 ohci.c
 > --- ohci.c	27 Sep 2018 14:52:26 -0000	1.273.6.4
 > +++ ohci.c	1 Aug 2019 07:25:46 -0000
 > @@ -587,7 +587,7 @@ ohci_reset_std_chain(ohci_softc_t *sc, s
 >  	OHCIHIST_FUNC(); OHCIHIST_CALLED();
 >  	DPRINTF("start len=%jd", alen, 0, 0, 0);
 >  
 > -	KASSERT(mutex_owned(&sc->sc_lock));
 > +	KASSERT(sc->sc_bus.ub_usepolling || mutex_owned(&sc->sc_lock));
 >  
 >  	DPRINTFN(8, "addr=%jd endpt=%jd len=%jd speed=%jd",
 >  	    xfer->ux_pipe->up_dev->ud_addr,
 > @@ -1675,7 +1675,7 @@ ohci_device_bulk_done(struct usbd_xfer *
 >  	int isread =
 >  	    (UE_GET_DIR(xfer->ux_pipe->up_endpoint->ue_edesc->bEndpointAddress) == UE_DIR_IN);
 >  
 > -	KASSERT(mutex_owned(&sc->sc_lock));
 > +	KASSERT(sc->sc_bus.ub_usepolling || mutex_owned(&sc->sc_lock));
 >  
 >  	OHCIHIST_FUNC(); OHCIHIST_CALLED();
 >  	DPRINTFN(10, "xfer=%#jx, actlen=%jd", (uintptr_t)xfer, xfer->ux_actlen,

 Hi,

 Your patch fixed the remaining issue booting an MP+DEBUG+DIAGNOSTIC+
 LOCKDEBUG kernel built from the netbsd-8 branch. Thanks!

 With this plus macallan@'s earlier patch applied, I don't see any
 remaining issues for netbsd-8, so we've narrowed this PR to what's in
 HEAD/netbsd-9 now.

 Dave


From: "matthew green" <mrg@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/54331 CVS commit: src/sys/dev/usb
Date: Sun, 11 Aug 2019 22:55:03 +0000

 Module Name:	src
 Committed By:	mrg
 Date:		Sun Aug 11 22:55:03 UTC 2019

 Modified Files:
 	src/sys/dev/usb: ohci.c

 Log Message:
 relax an assert to be mutex_owned || polling.

 half of the patch i sent for netbsd-8 in PR#54331.
 the other half is already applied.


 To generate a diff of this commit:
 cvs rdiff -u -r1.289 -r1.290 src/sys/dev/usb/ohci.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: matthew green <mrg@eterna.com.au>
To: "David H. Gutteridge" <david@gutteridge.ca>
Cc: gnats-bugs@netbsd.org
Subject: re: port-macppc/54331: macppc MP kernels fail to boot successfully in -current (8.99.49)
Date: Mon, 12 Aug 2019 09:07:32 +1000

 > Your patch fixed the remaining issue booting an MP+DEBUG+DIAGNOSTIC+
 > LOCKDEBUG kernel built from the netbsd-8 branch. Thanks!

 great news.  i sent a pullup request.  i had actually fixed one
 of the two chunks back in march, but both are commited now.

 > With this plus macallan@'s earlier patch applied, I don't see any
 > remaining issues for netbsd-8, so we've narrowed this PR to what's in
 > HEAD/netbsd-9 now.

 i will get the above additional fix into netbsd-9.

 can you update us with the status of -current/-9?  the system
 hangs after probing USB still?

 if so and if you can drop to DDB, can you inspect the
 "config_pending" variable?  eg, "x/x config_pending".  if it
 is zero, that means that some device configuration has called
 config_incr() without a matching config_decr(), which makes the
 code that allows mountroot and then init to run to wait.  this
 is normally for eg, boot device probe, but it tends to indicate
 that there is a lack of interupts, allowing a device probe to
 complete, but really, anything that stops the driver from
 getting to its config_decr() moment will do this.


 .mrg.

From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/54331 CVS commit: [netbsd-8] src/sys/dev/usb
Date: Mon, 12 Aug 2019 17:19:02 +0000

 Module Name:	src
 Committed By:	martin
 Date:		Mon Aug 12 17:19:02 UTC 2019

 Modified Files:
 	src/sys/dev/usb [netbsd-8]: ohci.c

 Log Message:
 Pull up following revision(s) (requested by mrg in ticket #1334):

 	sys/dev/usb/ohci.c: revision 1.289
 	sys/dev/usb/ohci.c: revision 1.290

 adjust KASSERT() for sc_lock to handle polling case.
 should fix usb kdb in ddb.  reported by martin.

  -

 relax an assert to be mutex_owned || polling.

 half of the patch i sent for netbsd-8 in PR#54331.
 the other half is already applied.


 To generate a diff of this commit:
 cvs rdiff -u -r1.273.6.4 -r1.273.6.5 src/sys/dev/usb/ohci.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "David H. Gutteridge" <david@gutteridge.ca>
To: gnats-bugs@netbsd.org, mrg@eterna.com.au
Cc: 
Subject: Re: port-macppc/54331: macppc MP kernels fail to boot successfully
 in -current (8.99.49)
Date: Mon, 12 Aug 2019 22:24:53 -0400

 On Sun, 2019-08-11 at 23:10 +0000, matthew green wrote:
 > The following reply was made to PR port-macppc/54331; it has been noted by GNATS.
 > 
 > From: matthew green <mrg@eterna.com.au>
 > To: "David H. Gutteridge" <david@gutteridge.ca>
 > Cc: gnats-bugs@netbsd.org
 > Subject: re: port-macppc/54331: macppc MP kernels fail to boot successfully in -current (8.99.49)
 > Date: Mon, 12 Aug 2019 09:07:32 +1000
 > 
 >  > Your patch fixed the remaining issue booting an MP+DEBUG+DIAGNOSTIC+
 >  > LOCKDEBUG kernel built from the netbsd-8 branch. Thanks!
 >  
 >  great news.  i sent a pullup request.  i had actually fixed one
 >  of the two chunks back in march, but both are commited now.
 >  
 >  > With this plus macallan@'s earlier patch applied, I don't see any
 >  > remaining issues for netbsd-8, so we've narrowed this PR to what's in
 >  > HEAD/netbsd-9 now.
 >  
 >  i will get the above additional fix into netbsd-9.
 >  
 >  can you update us with the status of -current/-9?  the system
 >  hangs after probing USB still?
 >  
 >  if so and if you can drop to DDB, can you inspect the
 >  "config_pending" variable?  eg, "x/x config_pending".  if it
 >  is zero, that means that some device configuration has called
 >  config_incr() without a matching config_decr(), which makes the
 >  code that allows mountroot and then init to run to wait.  this
 >  is normally for eg, boot device probe, but it tends to indicate
 >  that there is a lack of interupts, allowing a device probe to
 >  complete, but really, anything that stops the driver from
 >  getting to its config_decr() moment will do this.

 I built an MP+DEBUG+DIAGNOSTIC+LOCKDEBUG 9.99.7 kernel with the patches
 included, and it's hanging after USB devices are attached. I can unplug
 and reattach USB devices and it continues to handle those events fine,
 but it never actually mounts (or prompts to mount) the hard drive. I
 don't see anything useful in the output to indicate why. I'm not able
 to get into ddb as far as I know. That is, typing something on the USB
 keyboard that's attached doesn't result in anything happening.

 Dave


From: matthew green <mrg@eterna.com.au>
To: "David H. Gutteridge" <david@gutteridge.ca>
Cc: gnats-bugs@netbsd.org
Subject: re: port-macppc/54331: macppc MP kernels fail to boot successfully in -current (8.99.49)
Date: Tue, 13 Aug 2019 12:46:06 +1000

 > I built an MP+DEBUG+DIAGNOSTIC+LOCKDEBUG 9.99.7 kernel with the patches
 > included, and it's hanging after USB devices are attached. I can unplug
 > and reattach USB devices and it continues to handle those events fine,
 > but it never actually mounts (or prompts to mount) the hard drive. I
 > don't see anything useful in the output to indicate why. I'm not able
 > to get into ddb as far as I know. That is, typing something on the USB
 > keyboard that's attached doesn't result in anything happening.

 does it still have the cpu failure attach messages in early boot?

 here's a simple way to see if i'm right about config_pending not
 being reduced at this point:

 in config_finalize() there is this loop:

 	while (config_pending != 0)
 		cv_wait(&config_misc_cv, &config_misc_lock);

 if you convert it to a cv_timedwait() that sleeps for hz (or
 even hz*2 or hz*5) and then prints config_pending, eg:

 	while (config_pending != 0) {
 		cv_timedwait(&config_misc_cv, &config_misc_lock, hz);
 		printf("%s: config_pending still %d\n", __func__,
 		    config_pending);
 	}

 (untested/compiled).  this would at least give us any idea on
 where to look next.


 .mrg.

From: "David H. Gutteridge" <david@gutteridge.ca>
To: gnats-bugs@netbsd.org, matthew green <mrg@eterna.com.au>
Cc: 
Subject: Re: port-macppc/54331: macppc MP kernels fail to boot successfully
 in -current (8.99.49)
Date: Mon, 12 Aug 2019 23:17:10 -0400

 On Tue, 2019-08-13 at 12:46 +1000, matthew green wrote:
 > > I built an MP+DEBUG+DIAGNOSTIC+LOCKDEBUG 9.99.7 kernel with the
 > > patches
 > > included, and it's hanging after USB devices are attached. I can
 > > unplug
 > > and reattach USB devices and it continues to handle those events
 > > fine,
 > > but it never actually mounts (or prompts to mount) the hard drive. I
 > > don't see anything useful in the output to indicate why. I'm not
 > > able
 > > to get into ddb as far as I know. That is, typing something on the
 > > USB
 > > keyboard that's attached doesn't result in anything happening.
 > 
 > does it still have the cpu failure attach messages in early boot?

 The early boot issue with CPU attachment is fixed.

 > here's a simple way to see if i'm right about config_pending not
 > being reduced at this point:
 > 
 > in config_finalize() there is this loop:
 > 
 > 	while (config_pending != 0)
 > 		cv_wait(&config_misc_cv, &config_misc_lock);
 > 
 > if you convert it to a cv_timedwait() that sleeps for hz (or
 > even hz*2 or hz*5) and then prints config_pending, eg:
 > 
 > 	while (config_pending != 0) {
 > 		cv_timedwait(&config_misc_cv, &config_misc_lock, hz);
 > 		printf("%s: config_pending still %d\n", __func__,
 > 		    config_pending);
 > 	}
 > 
 > (untested/compiled).  this would at least give us any idea on
 > where to look next.

 Okay, I'll try that out, thanks!

 Dave


From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: port-macppc/54331: macppc MP kernels fail to boot successfully
 in -current
Date: Fri, 16 Aug 2019 11:09:29 +0200

 I seem to have the same problem (with 9.99.9 on a silver tower dual G4 mac).

 I have serial console though and can get into ddb.

 Your suggested patch shows:

 [   1.0000227] cpu1 started
 [   1.0599986] uhub0 at usb1: NetBSD (0000) OHCI root hub (0000), class 9/0, rev 1.00/1.00, addr 1
 [   1.0799956] uhub1 at usb0: NetBSD (0000) OHCI root hub (0000), class 9/0, rev 1.00/1.00, addr 1
 [   2.1799924] config_finalize: config_pending still 3
 [   3.1799924] config_finalize: config_pending still 3
 [   4.1699925] wd0 at atabus0 drive 0
 [   4.1828592] config_finalize: config_pending still 3
 [   4.1828592] wd0: <WDC WD2000JB-00FUA0>
 [   4.1828592] wd0: 186 GB, 387621 cyl, 16 head, 63 sec, 512 bytes/sect x 390721968 sectors
 [   4.2499928] atapibus0 at atabus1: 2 targets
 [   4.2799925] cd0 at atapibus0 drive 0: <HL-DT-STDVD-ROM GDR8161B, , 0100> cdrom removable
 [   4.2999962] config_finalize: config_pending still 0

 and then everything seems idle but cpu1 stuck in softclk:

 0       22 3   0       200           3f9760a0               iic0 iicintr
 0       21 3   1       200           3f980c00            xcall/1 xcall
 0       20 1   1       200           3f980920          softser/1
 0    >  19 7   1       200           3f980640          softclk/1
 0       18 1   1       200           3f980360          softbio/1
 0       17 1   1       200           3f980080          softnet/1
 0    >  16 7   1       201           3f994be0             idle/1
 0       15 3   0       200           3f994900             sysmon smtaskq
 0       14 3   1       200           3f994620         pmfsuspend pmfsuspend
 0       13 3   0       200           3f994340           pmfevent pmfevent
 0       12 3   0       200           3f994060         sopendfree sopendfr
 0       11 3   1       200           3fb27bc0           nfssilly nfssilly
 0       10 3   0       200           3fb278e0            cachegc cachegc
 0        9 3   0       200           3fb27600             vdrain vdrain
 0        8 3   0       200           3fb27320          modunload mod_unld
 0        7 3   0       200           3fb27040            xcall/0 xcall
 0        6 1   0       200           3fb31ba0          softser/0
 0        5 1   0       200           3fb318c0          softclk/0
 0        4 1   0       200           3fb315e0          softbio/0
 0        3 1   0       200           3fb31300          softnet/0
 0    >   2 7   0       201           3fb31020             idle/0
 0        1 3   1       200             b85b80            swapper tstile

 db{0}> bt/a 3f980640
 trace: pid 0 lid 19 at 0x10047eb0
 0x10047ee0: at softint_dispatch+0xa8
 0x10047f20: at softint_fast_dispatch+0xdc
 saved LR(0xff3ffb79) is invalid.
 db{0}> x/i softint_dispatch+0xa0     
 netbsd:softint_dispatch+0xa0:   lwz     r3, 0x8(r31)  
 db{0}>  
 netbsd:softint_dispatch+0xa4:   mtspr   ctr, r9,      
 db{0}>  
 netbsd:softint_dispatch+0xa8:   bctrl


 and looking at netbsd.gdb:           

 (gdb) list *(softint_dispatch+0xa8)  
 0x68f388 is in softint_dispatch (../../../../kern/kern_softint.c:592).
 587                             }    
 588                     } else if (!havelock) {           
 589                             KERNEL_LOCK(1, l);       
 590                             havelock = true;
 591                     }
 592                     (*sh->sh_func)(sh->sh_arg);
 593

 Martin

From: "David H. Gutteridge" <david@gutteridge.ca>
To: gnats-bugs@netbsd.org, matthew green <mrg@eterna.com.au>
Cc: 
Subject: Re: port-macppc/54331: macppc MP kernels fail to boot successfully
 in -current (8.99.49)
Date: Wed, 20 Nov 2019 23:55:44 -0500

 I tried a build of 9.99.17 from earlier this week, and the situation
 has improved. It no longer hits any issues during device attachment,
 and begins running rc scripts. At that point, however, as it's bringing
 up the gem0 device, it panics there, like so:

 [   20.6900578] panic: pr_phinpage_check: [mclpl] item 0x1fb2f040 not part of pool
 [   20.6900578] cpu0: Begin traceback...
 [   20.6900578] 0x10007da0: at vpanic+0x144
 [   20.7300428] 0x10007dd0: at panic+0x50
 [   20.7300428] 0x10007e20: at pool_cache_put_paddr+0x25c
 [   20.7600388] 0x10007e50: at m_ext_free+0x130
 [   20.7600388] 0x10007e60: at m_free+0x9c
 [   20.7600388] 0x10007e70: at m_freem.part.8+0xc
 [   20.7600388] 0x10007e80: at ether_input+0x67c
 [   20.7600388] 0x10007eb0: at if_percpuq_softint+0xb4
 [   20.7600388] 0x10007ed0: at softint_dispatch+0x1d0
 [   20.7600388] 0x10007f20: at softint_fast_dispatch+0xdc
 [   20.8700561] 0x10007fe8: at 0xfffffffc
 [   20.8700561] cpu0: End traceback...
 Stopped in pid 0.3 (system) at  netbsd:vpanic+0x148:    or       r3,  r29,  r29
 gem0: receive error: RX overflow sc->rxptr 0, complete 4

 Regards,

 Dave


State-Changed-From-To: open->closed
State-Changed-By: gutteridge@NetBSD.org
State-Changed-When: Sun, 31 May 2020 19:31:17 +0000
State-Changed-Why:
Original issue seems to be addressed. gem0 issue now tracked in PR 55326.

>Unformatted:

NetBSD Home
NetBSD PR Database Search

(Contact us) $NetBSD: query-full-pr,v 1.46 2020/01/03 16:35:01 leot Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2020 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.