NetBSD Problem Report #53183

From www@NetBSD.org  Sat Apr 14 20:40:49 2018
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 9C2077A10D
	for <gnats-bugs@gnats.NetBSD.org>; Sat, 14 Apr 2018 20:40:49 +0000 (UTC)
Message-Id: <20180414204048.48D467A1EE@mollari.NetBSD.org>
Date: Sat, 14 Apr 2018 20:40:48 +0000 (UTC)
From: venture37@geeklan.co.uk
Reply-To: venture37@geeklan.co.uk
To: gnats-bugs@NetBSD.org
Subject: System stops servicing I/O requests and eventually deadlocks
X-Send-Pr-Version: www-1.0

>Number:         53183
>Category:       kern
>Synopsis:       System stops servicing I/O requests and eventually deadlocks
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    jdolecek
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Apr 14 20:45:00 +0000 2018
>Closed-Date:    Sun Jan 06 23:04:25 +0000 2019
>Last-Modified:  Sun Jan 06 23:04:25 +0000 2019
>Originator:     Sevan Janiyan
>Release:        NetBSD-HEAD
>Organization:
>Environment:
i386 build
>Description:
Following on from the nvme deadlock PR kern/52769 [1], I have spent some more time trying to gather information on how my system deadlocks when I cvs update. An easy trigger is updating a tree which has a lot of catching up to do, especially in src/external. The system is still technically alive but it will not perform any disk I/O operations. That is, in X11, I can have one window displaying top(1), another running iostat(1), and a few others running a cvs update of a pkgsrc and a src tree at the same time; eventually the system will stop servicing I/O, but the top & iostat windows will continue to operate, showing the system is completely idle.

I suspected the system on which I frequently hit this issue, a ThinkPad flashed with coreboot. To rule this machine out, I moved the SSD to another ThinkPad which does not run coreboot, and the issue was present there too. It could well be the SSD at fault, but I experienced the same problem on an ageing SATA HDD before investing in the SSD. Admittedly, I have not ruled out being doubly unlucky by testing a third disk or system. I did attempt to use VirtualBox on macOS, and in both of 2 attempts I ended up hard resetting the host. It seems that 2 concurrent CVS checkouts are too much.

I was previously using the discard and log mount options, but stopped using them and the issues persisted.

Once the system deadlocks, it's possible to enter ddb once; after resuming, the system eventually locks hard.

[1] http://mail-index.netbsd.org/netbsd-bugs/2018/03/04/msg055906.html
>How-To-Repeat:
Start two concurrent checkouts or updates from CVS (pkgsrc & src tree)
Optional (guarantees failure in my case): add some CPU load, I've been trying to recompile a kernel with -j2 on a dual core system
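
A minimal sketch of the steps above as a shell script. The paths (/usr/src, /usr/pkgsrc), the GENERIC kernel config, the build.sh invocation, and the DRY_RUN switch are assumptions for illustration and are not taken from the report; the build step also assumes the toolchain has already been built.

```shell
#!/bin/sh
# Sketch only. All defaults below are assumptions, not values from the report.
SRC_DIR="${SRC_DIR:-/usr/src}"          # assumed location of the src checkout
PKGSRC_DIR="${PKGSRC_DIR:-/usr/pkgsrc}" # assumed location of the pkgsrc checkout
KERNEL_CONF="${KERNEL_CONF:-GENERIC}"   # assumed kernel configuration name
DRY_RUN="${DRY_RUN:-1}"                 # 1 = only print the commands (default)

# Start a command in the given directory in the background,
# or just print it when DRY_RUN=1.
start_job() {
    dir=$1; shift
    if [ "$DRY_RUN" = 1 ]; then
        echo "(cd $dir && $*) &"
    else
        (cd "$dir" && "$@") &
    fi
}

start_all() {
    # Two concurrent CVS updates (src and pkgsrc).
    start_job "$SRC_DIR" cvs -q update -dP
    start_job "$PKGSRC_DIR" cvs -q update -dP
    # Optional CPU load: kernel build with 2 jobs on a dual core system.
    start_job "$SRC_DIR" ./build.sh -U -j 2 kernel="$KERNEL_CONF"
    # Wait for the background jobs (returns immediately in dry-run mode).
    wait
}

start_all
```

By default the script only prints what it would run. Setting DRY_RUN=0 starts the jobs for real, which on an affected system may hang the machine.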
>Fix:

>Release-Note:

>Audit-Trail:
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/53183: System stops servicing I/O requests and eventually
 deadlocks
Date: Sun, 15 Apr 2018 11:21:35 +0200

 On Sat, Apr 14, 2018 at 08:45:00PM +0000, venture37@geeklan.co.uk wrote:
 > Once the system deadlocks, it's possible to enter ddb once, after
 > resuming, the system eventually locks hard.

 We need a kernel crash dump from the locked up state, and preferably a
 corresponding netbsd.gdb.

 Since you can get into ddb, could you try to create that and make it
 available?

 Martin

From: Sevan Janiyan <venture37@geeklan.co.uk>
To: "gnats-bugs@NetBSD.org" <gnats-bugs@NetBSD.org>
Cc: 
Subject: Re: kern/53183: System stops servicing I/O requests and eventually deadlocks
Date: Sun, 15 Apr 2018 13:32:13 +0100

 > On 15 Apr 2018, at 10:25, Martin Husemann <martin@duskware.de> wrote:
 >
 > We need a kernel crash dump from the locked up state, and preferably a
 > corresponding netbsd.gdb.

 Ok

 > Since you can get into ddb, could you try to create that and make it
 > available?

 I tried but can't.
 I've set sysctl ddb.onpanic=1
 when the system hangs, I enter ddb & issue sync, instead of dumping to disk, I get
 Dumping to dev0,1 offset 1588
 Dump fatal page fault in supervisor mode
 Trap type 6 code 0x2 eip 0xc0118429 cs 0x8 eflags 0x10246 cr2 0xdc576860 ilevel 0x8 esp 0xc13100c0
 curlwp 0xc370fd20 pid 0 lid 2 lowest kstack 0xdb02a2c0
 kernel: supervisor trap page fault, code=0
 stopped in pid 0.2 (system) at netbsd:dodumpsys+0x323: orb %dl,0(%eax)


From: Jaromír Doleček <jaromir.dolecek@gmail.com>
To: "gnats-bugs@NetBSD.org" <gnats-bugs@netbsd.org>
Cc: 
Subject: Re: kern/53183: System stops servicing I/O requests and eventually deadlocks
Date: Mon, 16 Apr 2018 23:20:08 +0200

 Can you please provide dmesg also, particularly the type of the SATA
 controller? I have a report of satalink(4) deadlocking after the
 conversion of wd(4) to use dksubr, which I have not yet been able to
 investigate further. Might be the same or similar issue.

From: Sevan Janiyan <venture37@geeklan.co.uk>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/53183: System stops servicing I/O requests and eventually
 deadlocks
Date: Mon, 16 Apr 2018 22:58:13 +0100

 On 16/04/2018 22:25, Jaromír Doleček wrote:
 > The following reply was made to PR kern/53183; it has been noted by GNATS.
 > 
 > From: Jaromír Doleček <jaromir.dolecek@gmail.com>
 > To: "gnats-bugs@NetBSD.org" <gnats-bugs@netbsd.org>
 > Cc: 
 > Subject: Re: kern/53183: System stops servicing I/O requests and eventually deadlocks
 > Date: Mon, 16 Apr 2018 23:20:08 +0200
 > 
 >  Can you please provide dmesg also, particularly the type of the SATA
 >  controller? I have a report of satalink(4) deadlocking after the
 >  conversion of wd(4) to use dksubr, which I have not yet been able to
 >  investigate further. Might be the same or similar issue.
 >  
 > 

 ThinkPad x60s flashed with coreboot.

 [    1.000000] NetBSD 8.99.14 (GENERIC) #0: Sat Apr 14 01:53:38 UTC 2018
 [    1.000000]
 mkrepro@mkrepro.NetBSD.org:/usr/src/sys/arch/i386/compile/GENERIC
 [    1.000000] total memory = 2038 MB
 [    1.000000] avail memory = 1984 MB
 [    1.000000] timecounter: Timecounters tick every 10.000 msec
 [    1.000000] Kernelized RAIDframe activated
 [    1.000000] running cgd selftest aes-xts-256 aes-xts-512 done
 [    1.000000] timecounter: Timecounter "i8254" frequency 1193182 Hz
 quality 100
 [    1.000003] LENOVO 17045LG (ThinkPad X60s)
 [    1.000003] mainbus0 (root)
 [    1.000003] ACPI: RSDP 0x00000000000F61B0 000024 (v02 CORE  )
 [    1.000003] ACPI: XSDT 0x000000007F7260E0 000054 (v01 CORE   COREBOOT
 00000000 CORE 00000000)
 [    1.000003] ACPI: FACP 0x000000007F7292A0 0000F4 (v04 CORE   COREBOOT
 00000000 CORE 00000000)
 [    1.000003] ACPI: DSDT 0x000000007F726280 00301E (v03 COREv4 COREBOOT
 20090419 INTL 20160318)
 [    1.000003] ACPI: FACS 0x000000007F726240 000040
 [    1.000003] ACPI: SSDT 0x000000007F7293A0 000531 (v02 CORE   COREBOOT
 0000002A CORE 0000002A)
 [    1.000003] ACPI: MCFG 0x000000007F7298E0 00003C (v01 CORE   COREBOOT
 00000000 CORE 00000000)
 [    1.000003] ACPI: TCPA 0x000000007F729920 000032 (v02 CORE   COREBOOT
 00000000 CORE 00000000)
 [    1.000003] ACPI: APIC 0x000000007F729960 000068 (v01 CORE   COREBOOT
 00000000 CORE 00000000)
 [    1.000003] ACPI: HPET 0x000000007F72B9D0 000038 (v01 CORE   COREBOOT
 00000000 CORE 00000000)
 [    1.000003] ACPI: 2 ACPI AML tables successfully acquired and loaded
 [    1.000003] ioapic0 at mainbus0 apid 2: pa 0xfec00000, version 0x20,
 24 pins
 [    1.000003] cpu0 at mainbus0 apid 0
 [    1.000003] cpu0: Genuine Intel(R) CPU           L2400  @ 1.66GHz, id
 0x6e8
 [    1.000003] cpu0: package 0, core 0, smt 0
 [    1.000003] cpu1 at mainbus0 apid 1
 [    1.000003] cpu1: Genuine Intel(R) CPU           L2400  @ 1.66GHz, id
 0x6e8
 [    1.000003] cpu1: package 0, core 1, smt 0
 [    1.000003] acpi0 at mainbus0: Intel ACPICA 20180313
 [    1.000003] acpi0: X/RSDT: OemId <CORE  ,COREBOOT,00000000>, AslId
 <CORE,00000000>
 [    1.000003] acpi0: MCFG: segment 0, bus 0-63, address 0x00000000f0000000
 [    1.000003] acpi0: SCI interrupting at int 9
 [    1.000003] timecounter: Timecounter "ACPI-Fast" frequency 3579545 Hz
 quality 1000
 [    1.007519] hpet0 at acpi0: high precision event timer (mem
 0xfed00000-0xfed00400)
 [    1.007519] timecounter: Timecounter "hpet0" frequency 14318180 Hz
 quality 2000
 [    1.007683] acpiec0 at acpi0 (EC, PNP0C09-0): io 0x62,0x66
 [    1.007683] PDRC (PNP0C02) at acpi0 not configured
 [    1.007683] acpivga0 at acpi0 (GFX0): ACPI Display Adapter
 [    1.007683] acpiout0 at acpivga0 (VGA0, 0x0100): ACPI Display Output
 Device
 [    1.007683] acpiout1 at acpivga0 (TV0, 0x0240): ACPI Display Output
 Device
 [    1.007683] acpiout2 at acpivga0 (LCD0, 0x0410): ACPI Display Output
 Device
 [    1.007683] acpiout2: brightness levels: [0-15]
 [    1.007683] acpivga0: connected output devices:
 [    1.007683] acpivga0:   0x0100 (acpiout0): VGA Analog Monitor, index
 0, port 0, head 0, bios detect
 [    1.007683] acpivga0:   0x0240 (acpiout1): TV/HDTV Monitor, index 0,
 port 4, head 0, bios detect
 [    1.007683] acpivga0:   0x0410 (acpiout2): Int. Digital Flat Panel,
 index 0, port 1, head 0, bios detect
 [    1.007683] thinkpad0 at acpi0 (HKEY, IBM0068)
 [    1.007683] acpiacad0 at acpi0 (AC, ACPI0003-0): ACPI AC Adapter
 [    1.007683] acpibat0 at acpi0 (BAT0, PNP0C0A-0): ACPI Battery
 [    1.007683] acpibat0: SANYO LION rechargeable battery
 [    1.007683] acpibat0: granularity: low->warn 0.001 Wh, warn->full
 0.001 Wh
 [    1.007683] acpibat1 at acpi0 (BAT1, PNP0C0A-0): ACPI Battery
 [    1.007683] acpibut0 at acpi0 (SLPB, PNP0C0E): ACPI Sleep Button
 [    1.007683] acpilid0 at acpi0 (LID, PNP0C0D): ACPI Lid Switch
 [    1.007683] FWH (INT0800) at acpi0 not configured
 [    1.007683] MATH (PNP0C04) at acpi0 not configured
 [    1.007683] LDRC (PNP0C02) at acpi0 not configured
 [    1.007683] attimer1 at acpi0 (TIMR, PNP0100): io 0x40-0x43,0x50-0x53
 irq 0
 [    1.007683] pckbc1 at acpi0 (PS2K, PNP0303) (kbd port): io 0x60,0x64
 irq 1
 [    1.007683] pckbc2 at acpi0 (PS2M, PNP0F13) (aux port): irq 12
 [    1.007683] COMA (PNP0501) at acpi0 not configured
 [    1.007683] acpiacad1 at acpi0 (DOCK, ACPI0003-0): ACPI AC Adapter
 [    1.007683] acpitz0 at acpi0 (THM0): cpu0 cpu1
 [    1.007683] acpitz0: active cooling level 0: 80.0C
 [    1.007683] acpitz0: levels: critical 100.0 C, passive 90.0 C
 [    1.007683] acpifan0 at acpi0 (FAN, PNP0C0B): ACPI Fan
 [    1.007683] acpitz1 at acpi0 (THM1): cpu0 cpu1
 [    1.007683] acpitz1: levels: critical 99.0 C, passive 94.0 C, passive
 cooling
 [    1.007683] CTBL (BOOT0000) at acpi0 not configured
 [    1.007683] apm0 at acpi0: Power Management spec V1.2
 [    1.007683] pckbd0 at pckbc1 (kbd slot)
 [    1.007683] pckbc1: using irq 1 for kbd slot
 [    1.007683] wskbd0 at pckbd0: console keyboard
 [    1.007683] pms0 at pckbc1 (aux slot)
 [    1.007683] pms0: Failed to initialize an ALPS device.
 [    1.007683] pckbc1: using irq 12 for aux slot
 [    1.007683] wsmouse0 at pms0 mux 0
 [    1.007683] pci0 at mainbus0 bus 0: configuration mode 1
 [    1.007683] pci0: i/o space, memory space enabled, rd/line, rd/mult,
 wr/inv ok
 [    1.007683] pchb0 at pci0 dev 0 function 0: vendor 8086 product 27a0
 (rev. 0x03)
 [    1.007683] agp0 at pchb0: i915-family chipset
 [    1.007683] agp0: detected 7932k stolen memory
 [    1.007683] agp0: aperture at 0xd0000000, size 0x10000000
 [    1.007683] i915drmkms0 at pci0 dev 2 function 0: vendor 8086 product
 27a2 (rev. 0x03)
 [    1.007683] drm: Memory usable by graphics device = 256M
 [    1.007683] drm: Supports vblank timestamp caching Rev 2 (21.10.2013).
 [    1.007683] drm: Driver supports precise vblank timestamp query.
 [    1.007683] i915drmkms0: interrupting at ioapic0 pin 16 (i915)
 [    1.007683] drm: initialized overlay support
 [    1.007683] intelfb0 at i915drmkms0
 [    1.007683] i915drmkms0: info: registered panic notifier
 [    1.007683] intelfb0: framebuffer at 0xdb35b000, size 1024x768, depth
 32, stride 4096
 [    1.007683] wsdisplay0 at intelfb0 kbdmux 1: console (default, vt100
 emulation), using wskbd0
 [    1.007683] wsmux1: connecting to wsdisplay0
 [    1.007683] vendor 8086 product 27a6 (miscellaneous display, revision
 0x03) at pci0 dev 2 function 1 not configured
 [    1.007683] hdaudio0 at pci0 dev 27 function 0: HD Audio Controller
 [    1.007683] hdaudio0: interrupting at msi0 vec 0
 [    1.007683] hdafg0 at hdaudio0: vendor 11d4 product 1981
 [    1.007683] hdafg0: DAC00 2ch: Speaker [Jack & Built-In]
 [    1.007683] hdafg0: ADC01 2ch: CD [Built-In], Mic In [Jack & Built-In]
 [    1.007683] hdafg0: DIG02 2ch: SPDIF Out [Jack]
 [    1.007683] hdafg0: 2ch/2ch 8000Hz 11025Hz 16000Hz 22050Hz 32000Hz
 44100Hz 48000Hz PCM16 PCM20 PCM24 AC3
 [    1.007683] audio0 at hdafg0: full duplex, playback, capture, mmap,
 independent
 [    1.007683] hdafg0: Virtual format configured - Format SLINEAR,
 precision 16, channels 2, frequency 48000
 [    1.007683] hdafg0: Latency: 128 milliseconds
 [    1.007683] hdvsmfg at hdaudio0 not configured
 [    1.007683] ppb0 at pci0 dev 28 function 0: vendor 8086 product 27d0
 (rev. 0x02)
 [    1.007683] ppb0: PCI Express capability version 1 <Root Port of
 PCI-E Root Complex> x1 @ 2.5GT/s
 [    1.007683] pci1 at ppb0 bus 1
 [    1.007683] pci1: i/o space, memory space enabled, rd/line, wr/inv ok
 [    1.007683] wm0 at pci1 dev 0 function 0: Intel i82573L Gigabit
 Ethernet (rev. 0x00)
 [    1.007683] wm0: interrupting at msi1 vec 0
 [    1.007683] wm0: PCI-Express bus
 [    1.007683] wm0: ASPM L0s and L1 are disabled to workaround the errata.
 [    1.007683] wm0: 64 words (8 address bits) SPI EEPROM
 [    1.007683] wm0: Ethernet address 00:16:d3:
 [    1.007683] wm0: 0x2a4440<SPI,IOH_VALID,PCIE,ASF_FIRM,AMT,WOL>
 [    1.007683] makphy0 at wm0 phy 1: Marvell 88E1111 Gigabit PHY, rev. 2
 [    1.007683] makphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX,
 1000baseT, 1000baseT-FDX, auto
 [    1.007683] ppb1 at pci0 dev 28 function 1: vendor 8086 product 27d2
 (rev. 0x02)
 [    1.007683] ppb1: PCI Express capability version 1 <Root Port of
 PCI-E Root Complex> x1 @ 2.5GT/s
 [    1.007683] pci2 at ppb1 bus 2
 [    1.007683] pci2: i/o space, memory space enabled, rd/line, wr/inv ok
 [    1.007683] ral0 at pci2 dev 0 function 0: vendor 1814 product 3090
 (rev. 0x00)
 [    1.007683] ral0: interrupting at ioapic0 pin 17
 [    1.007683] ral0: 802.11 address 6c:62:6d:
 [    1.007683] ral0: MAC/BBP RT3090 (rev 0x3213), RF RT3020 (MIMO 1T1R)
 [    1.007683] ppb2 at pci0 dev 28 function 2: vendor 8086 product 27d4
 (rev. 0x02)
 [    1.007683] ppb2: PCI Express capability version 1 <Root Port of
 PCI-E Root Complex> x1 @ 2.5GT/s
 [    1.007683] pci3 at ppb2 bus 3
 [    1.007683] pci3: i/o space, memory space enabled, rd/line, wr/inv ok
 [    1.007683] ppb3 at pci0 dev 28 function 3: vendor 8086 product 27d6
 (rev. 0x02)
 [    1.007683] ppb3: PCI Express capability version 1 <Root Port of
 PCI-E Root Complex> x1 @ 2.5GT/s
 [    1.007683] pci4 at ppb3 bus 4
 [    1.007683] pci4: i/o space, memory space enabled, rd/line, wr/inv ok
 [    1.007683] uhci0 at pci0 dev 29 function 0: vendor 8086 product 27c8
 (rev. 0x02)
 [    1.007683] uhci0: interrupting at ioapic0 pin 16
 [    1.007683] usb0 at uhci0: USB revision 1.0
 [    1.007683] uhci1 at pci0 dev 29 function 1: vendor 8086 product 27c9
 (rev. 0x02)
 [    1.007683] uhci1: interrupting at ioapic0 pin 17
 [    1.007683] usb1 at uhci1: USB revision 1.0
 [    1.007683] uhci2 at pci0 dev 29 function 2: vendor 8086 product 27ca
 (rev. 0x02)
 [    1.007683] uhci2: interrupting at ioapic0 pin 18
 [    1.007683] usb2 at uhci2: USB revision 1.0
 [    1.007683] uhci3 at pci0 dev 29 function 3: vendor 8086 product 27cb
 (rev. 0x02)
 [    1.007683] uhci3: interrupting at ioapic0 pin 19
 [    1.007683] usb3 at uhci3: USB revision 1.0
 [    1.007683] ehci0 at pci0 dev 29 function 7: vendor 8086 product 27cc
 (rev. 0x02)
 [    1.007683] ehci0: interrupting at ioapic0 pin 19
 [    1.007683] ehci0: EHCI version 1.0
 [    1.007683] ehci0: 4 companion controllers, 2 ports each: uhci0 uhci1
 uhci2 uhci3
 [    1.007683] usb4 at ehci0: USB revision 2.0
 [    1.007683] ppb4 at pci0 dev 30 function 0: vendor 8086 product 2448
 (rev. 0xe2)
 [    1.007683] pci5 at ppb4 bus 5
 [    1.007683] pci5: i/o space, memory space enabled
 [    1.007683] cbb0 at pci5 dev 0 function 0: vendor 1180 product 0476
 (rev. 0xb4)
 [    1.007683] fwohci0 at pci5 dev 0 function 1: vendor 1180 product
 0552 (rev. 0x09)
 [    1.007683] fwohci0: interrupting at ioapic0 pin 17
 [    1.007683] fwohci0: OHCI version 1.10 (ROM=0)
 [    1.007683] fwohci0: No. of Isochronous channels is 4.
 [    1.007683] fwohci0: EUI64 00:00:00:00:00:00:00:00
 [    1.007683] fwohci0: Phy 1394a available S400, 2 ports.
 [    1.007683] fwohci0: Link S400, max_rec 2048 bytes.
 [    1.007683] ieee1394if0 at fwohci0: IEEE1394 bus
 [    1.007683] fwip0 at ieee1394if0: IP over IEEE1394
 [    1.007683] fwohci0: Initiate bus reset
 [    1.007683] sdhc0 at pci5 dev 0 function 2: vendor 1180 product 0822
 (rev. 0x18)
 [    1.007683] sdhc0: interrupting at ioapic0 pin 18
 [    1.007683] sdhc0: SDHC 1.0, rev 2, SDMA, 33000 kHz, 3.3V, 512 byte
 blocks
 [    1.007683] sdmmc0 at sdhc0 slot 0
 [    1.007683] vendor 1180 product 0843 (miscellaneous system) at pci5
 dev 0 function 3 not configured
 [    1.007683] cbb0: cacheline 0x0 lattimer 0x40
 [    1.007683] cbb0: bhlc 0x824000
 [    1.007683] cbb0: interrupting at ioapic0 pin 16
 [    1.007683] cardslot0 at cbb0
 [    1.007683] cardbus0 at cardslot0: bus 6
 [    1.007683] pcmcia0 at cardslot0
 [    1.007683] ichlpcib0 at pci0 dev 31 function 0: vendor 8086 product
 27b9 (rev. 0x02)
 [    1.007683] timecounter: Timecounter "ichlpcib0" frequency 3579545 Hz
 quality 1000
 [    1.007683] ichlpcib0: 24-bit timer
 [    1.007683] tco0 at ichlpcib0: TCO (watchdog) timer configured.
 [    1.007683] tco0: Min/Max interval 1/367 seconds
 [    1.007683] piixide0 at pci0 dev 31 function 1: Intel 82801GB/GR IDE
 Controller (ICH7) (rev. 0x02)
 [    1.007683] piixide0: bus-master DMA support present
 [    1.007683] piixide0: primary channel configured to compatibility mode
 [    1.007683] piixide0: primary channel ignored (disabled)
 [    1.007683] piixide0: secondary channel configured to compatibility mode
 [    1.007683] piixide0: secondary channel ignored (disabled)
 [    1.007683] ahcisata0 at pci0 dev 31 function 2: vendor 8086 product
 27c5 (rev. 0x02)
 [    1.007683] ahcisata0: interrupting at ioapic0 pin 16
 [    1.007683] ahcisata0: AHCI revision 1.10, 4 ports, 32 slots, CAP
 0xdf12ff03<PSC,SSC,PMD,SPM,ISS=0x1=Gen1,SCLO,SAL,SALP,SSS,SMPS,SNCQ,S64A>
 [    1.007683] atabus0 at ahcisata0 channel 0
 [    1.007683] ichsmb0 at pci0 dev 31 function 3: vendor 8086 product
 27da (rev. 0x02)
 [    1.007683] ichsmb0: interrupting at ioapic0 pin 23
 [    1.007683] iic0 at ichsmb0: I2C bus
 [    1.007683] isa0 at ichlpcib0
 [    1.007683] tpm0 at isa0 iomem 0xfed40000-0xfed44fff irq 7: ATML
 97SC3203 rev 0x5
 [    1.007683] pcppi0 at isa0 port 0x61
 [    1.007683] midi0 at pcppi0: PC speaker
 [    1.007683] sysbeep0 at pcppi0
 [    1.007683] isapnp0 at isa0 port 0x279
 [    1.007683] attimer1: attached to pcppi0
 [    1.007683] isapnp0: no ISA Plug 'n Play devices found
 [    1.007683] acpicpu0 at cpu0: ACPI CPU
 [    1.007683] acpicpu0: C1: FFH, lat   1 us, pow  1000 mW
 [    1.007683] acpicpu0: C2: FFH, lat   1 us, pow   500 mW
 [    1.007683] acpicpu0: C3: FFH, lat  17 us, pow   250 mW
 [    1.007683] acpicpu0: P0: FFH, lat   1 us, pow 31000 mW, 1666 MHz
 [    1.007683] acpicpu0: P1: FFH, lat   1 us, pow 22050 mW, 1333 MHz
 [    1.007683] acpicpu0: P2: FFH, lat   1 us, pow 13100 mW, 1000 MHz
 [    1.007683] acpicpu0: T0: I/O, lat   1 us, pow     0 mW, 100 %
 [    1.007683] acpicpu0: T1: I/O, lat   1 us, pow     0 mW,  88 %
 [    1.007683] acpicpu0: T2: I/O, lat   1 us, pow     0 mW,  76 %
 [    1.007683] acpicpu0: T3: I/O, lat   1 us, pow     0 mW,  64 %
 [    1.007683] acpicpu0: T4: I/O, lat   1 us, pow     0 mW,  52 %
 [    1.007683] acpicpu0: T5: I/O, lat   1 us, pow     0 mW,  40 %
 [    1.007683] acpicpu0: T6: I/O, lat   1 us, pow     0 mW,  28 %
 [    1.007683] acpicpu0: T7: I/O, lat   1 us, pow     0 mW,  16 %
 [    1.007683] coretemp0 at cpu0: thermal sensor, 1 C resolution, Tjmax=100
 [    1.007683] acpicpu1 at cpu1: ACPI CPU
 [    1.007683] coretemp1 at cpu1: thermal sensor, 1 C resolution, Tjmax=100
 [    1.007683] fwohci0: BUS reset
 [    1.007683] fwohci0: node_id=0xc800ffc0, gen=1, CYCLEMASTER mode
 [    1.007683] ieee1394if0: 1 nodes, maxhop <= 0 cable IRM irm(0) (me)
 [    1.007683] ieee1394if0: bus manager 0
 [    1.007683] DRM error in i915_irq_handler: pipe A underrun
 [    1.007683] timecounter: Timecounter "clockinterrupt" frequency 100
 Hz quality 0
 [    1.616804] IPsec: Initialized Security Association Processing.
 [    1.626933] uhub0 at usb0: NetBSD (0000) UHCI root hub (0000), class
 9/0, rev 1.00/1.00, addr 1
 [    1.626933] uhub0: 2 ports with 2 removable, self powered
 [    1.626933] uhub1 at usb1: NetBSD (0000) UHCI root hub (0000), class
 9/0, rev 1.00/1.00, addr 1
 [    1.626933] uhub1: 2 ports with 2 removable, self powered
 [    1.626933] uhub2 at usb2: NetBSD (0000) UHCI root hub (0000), class
 9/0, rev 1.00/1.00, addr 1
 [    1.626933] uhub2: 2 ports with 2 removable, self powered
 [    1.626933] uhub3 at usb3: NetBSD (0000) UHCI root hub (0000), class
 9/0, rev 1.00/1.00, addr 1
 [    1.626933] uhub3: 2 ports with 2 removable, self powered
 [    1.626933] uhub4 at usb4: NetBSD (0000) EHCI root hub (0000), class
 9/0, rev 2.00/1.00, addr 1
 [    1.626933] uhub4: 8 ports with 8 removable, self powered
 [    1.706986] acpiacad0: AC adapter offline.
 [    1.706986] ahcisata0 port 0: device present, speed: 1.5Gb/s
 [    2.927783] ehci0: handing over full speed device on port 8 to uhci3
 [    3.227979] wd0 at atabus0 drive 0
 [    3.227979] wd0: <SanDisk SDSSDP064G>
 [    3.227979] wd0: drive supports 1-sector PIO transfers, LBA48 addressing
 [    3.227979] wd0: 61057 MB, 124053 cyl, 16 head, 63 sec, 512
 bytes/sect x 125045424 sectors
 [    3.227979] wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA
 mode 6 (Ultra/133), NCQ (32 tags)
 [    3.227979] wd0(ahcisata0:0:0): using PIO mode 4, DMA mode 2,
 Ultra-DMA mode 6 (Ultra/133) (using DMA), NCQ (31 tags)
 [    3.227979] WARNING: 1 error while detecting hardware; check system log.
 [    3.227979] boot device: wd0
 [    3.227979] root on wd0a dumps on wd0b
 [    3.227979] root file system type: ffs
 [    3.227979] kern.module.path=/stand/i386/8.99.14/modules
 [    3.237987] clock: unknown CMOS layout
 [    3.238033] ral0: 11b rates: 1Mbps 2Mbps 5.5Mbps 11Mbps
 [    3.238033] ral0: 11g rates: 1Mbps 2Mbps 5.5Mbps 11Mbps 6Mbps 9Mbps
 12Mbps 18Mbps 24Mbps 36Mbps 48Mbps 54Mbps
 [    3.378078] Kernel RNG "kernel" long run test FAILURE: Run of 26 0s found
 [    3.378078] cprng kernel: failed statistical RNG test
 [    3.778340] wsdisplay0: screen 1 added (default, vt100 emulation)
 [    3.778340] wsdisplay0: screen 2 added (default, vt100 emulation)
 [    3.788347] wsdisplay0: screen 3 added (default, vt100 emulation)
 [    3.788347] wsdisplay0: screen 4 added (default, vt100 emulation)
 [    4.879060] ugen0 at uhub3 port 2
 [    4.879060] ugen0: STMicroelectronics (0x483) Biometric Coprocessor
 (0x2016), rev 1.00/0.01, addr 2





 I also tested the SSD in a ThinkPad x61s but I don't have a recent dmesg
 from that system at hand (can get it over the next couple of days if
 needed). Here's a slightly old one from last year
 http://dmesgd.nycbug.org/index.cgi?do=view&id=3360


 Sevan

From: Jaromír Doleček <jaromir.dolecek@gmail.com>
To: "gnats-bugs@NetBSD.org" <gnats-bugs@netbsd.org>
Cc: 
Subject: Re: kern/53183: System stops servicing I/O requests and eventually deadlocks
Date: Tue, 17 Apr 2018 00:42:07 +0200

 Thanks, yours is AHCI, so a different case.

 Can you try this patch?

 http://www.netbsd.org/~jdolecek/wd_flush_lock.diff

 This is what I discovered by inspecting the wd(4) changes made for the
 dksubr conversion. I don't really expect this to change anything, but
 it is worth a try.

 The next step would be to try with the following two files downgraded
 to the revisions before the dksubr changes, i.e.:
 sys/dev/ata/wd.c rev. 1.433
 sys/dev/ata/wdvar.h rev. 1.44

 Jaromir

From: Sevan Janiyan <venture37@geeklan.co.uk>
To: "gnats-bugs@NetBSD.org" <gnats-bugs@NetBSD.org>
Cc: 
Subject: Re: kern/53183: System stops servicing I/O requests and eventually deadlocks
Date: Sat, 21 Apr 2018 01:59:17 +0100

 Thanks for the pointers, Jaromir.

 > On 16 Apr 2018, at 23:45, Jaromír Doleček <jaromir.dolecek@gmail.com> wrote:
 >
 > The next step would be to try with the following two files downgraded
 > to the revisions before the dksubr changes, i.e.:
 > sys/dev/ata/wd.c rev. 1.433
 > sys/dev/ata/wdvar.h rev. 1.44

 Tried the suggested patch, then reverted to the versions above.
 No difference; the system still managed to lock up.
 Judging by the dates, these changes went in about 6 months ago. The
 problem has been around longer than 6 months for me.


 Sevan

Responsible-Changed-From-To: kern-bug-people->jdolecek
Responsible-Changed-By: jdolecek@NetBSD.org
Responsible-Changed-When: Fri, 01 Jun 2018 18:20:36 +0000
Responsible-Changed-Why:
I'll look at this, unless someone else beats me to it.


State-Changed-From-To: open->feedback
State-Changed-By: jdolecek@NetBSD.org
State-Changed-When: Sun, 03 Jun 2018 18:39:55 +0000
State-Changed-Why:
Can you confirm whether dev/ata/wd.c rev. 1.439 makes any difference on your
system?


From: "Jaromir Dolecek" <jdolecek@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/53183 CVS commit: src/sys/dev/ata
Date: Sun, 3 Jun 2018 18:38:36 +0000

 Module Name:	src
 Committed By:	jdolecek
 Date:		Sun Jun  3 18:38:36 UTC 2018

 Modified Files:
 	src/sys/dev/ata: wd.c

 Log Message:
 take mutex around check for pending flush, as the code before dksubr
 conversion had, to avoid possible race

 on my system doesn't really change behaviour, besides the test runs
 being slightly faster (3x parallel pkgsrc archive extraction, up
 to 5% difference), though that can just be noise

 done as part of investigation for PR kern/53183 by Sevan Janiyan


 To generate a diff of this commit:
 cvs rdiff -u -r1.438 -r1.439 src/sys/dev/ata/wd.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: Sevan Janiyan <venture37@geeklan.co.uk>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/53183 (System stops servicing I/O requests and eventually
 deadlocks)
Date: Sun, 3 Jun 2018 23:48:06 +0100

 On 03/06/2018 19:39, jdolecek@NetBSD.org wrote:
 > Can you confirm whether dev/ata/wd.c rev. 1.439 makes any difference on your
 > system?

 No, same behavior - system deadlocks.


 Sevan

State-Changed-From-To: feedback->open
State-Changed-By: jdolecek@NetBSD.org
State-Changed-When: Mon, 04 Jun 2018 06:36:01 +0000
State-Changed-Why:
Feedback provided.


State-Changed-From-To: open->feedback
State-Changed-By: jdolecek@NetBSD.org
State-Changed-When: Thu, 15 Nov 2018 07:17:08 +0000
State-Changed-Why:
There were some changes in the ATA stack which might have fixed this.
Can you recheck?


State-Changed-From-To: feedback->closed
State-Changed-By: sevan@NetBSD.org
State-Changed-When: Sun, 06 Jan 2019 23:04:25 +0000
State-Changed-Why:
There have been several large commits to src/external since the fixes being tested landed in the tree. I have been unable to reproduce the deadlocks since, updating src/xsrc/pkgsrc and co
Thank you for working on this, my x60s is usable again.


>Unformatted:
