NetBSD Problem Report #52769

From martin@aprisoft.de  Tue Nov 28 14:59:32 2017
Return-Path: <martin@aprisoft.de>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id E0F357A187
	for <gnats-bugs@gnats.NetBSD.org>; Tue, 28 Nov 2017 14:59:31 +0000 (UTC)
Message-Id: <20171128145921.5BA555CC761@emmas.aprisoft.de>
Date: Tue, 28 Nov 2017 15:59:21 +0100 (CET)
From: martin@NetBSD.org
Reply-To: martin@NetBSD.org
To: gnats-bugs@NetBSD.org
Subject: hang with an ffs stored in an nvme device
X-Send-Pr-Version: 3.95

>Number:         52769
>Category:       kern
>Synopsis:       hang with an ffs stored in an nvme device
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    jdolecek
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue Nov 28 15:00:00 +0000 2017
>Closed-Date:    Sat Mar 17 09:13:44 +0000 2018
>Last-Modified:  Sat Mar 17 09:13:44 +0000 2018
>Originator:     Martin Husemann
>Release:        NetBSD 8.99.7
>Organization:
The NetBSD Foundation, Inc.
>Environment:
System: NetBSD seven-days-to-the-wolves.aprisoft.de 8.99.7 NetBSD 8.99.7 (GENERIC) #177: Thu Nov 23 09:03:25 CET 2017 martin@seven-days-to-the-wolves.aprisoft.de:/work/src/sys/arch/amd64/compile/GENERIC amd64
Architecture: x86_64
Machine: amd64
>Description:

I got two cvs update process hanging hard (unkillable) in this state:

Memory: 7098M Act, 385M Inact, 11M Wired, 12M Exec, 7439M File, 6607M Free
Swap: 

  PID USERNAME PRI NICE   SIZE   RES STATE      TIME   WCPU    CPU COMMAND
    0 root       0    0     0K   25M CPU/5      1:23  0.00%  0.00% [system]
22820 martin    95    0    47M 8952K bioloc/5   0:15  0.00%  0.00% cvs
18588 martin   127    0    48M 9124K bioloc/2   0:08  0.00%  0.00% cvs

Tried to reboot, which didn't work.

So I dropped into ddb and got:

Stopped in pid 0.2 (system) at  netbsd:breakpoint+0x5:  leave
db{0}> ps
PID    LID S CPU     FLAGS       STRUCT LWP *               NAME WAIT
29181    1 3   0         0   ffffe403e1d3c200             reboot tstile
3481     1 3   0        80   ffffe40368d5e080               tcsh ttyraw
18588    1 3   2   1000000   ffffe4040092db80                cvs biolock
14652    1 3   3        80   ffffe403be513640                 sh wait
13756    1 3   1        80   ffffe40397f5f460          ssh-agent select
7734     1 3   0        80   ffffe4038e491460               tcsh pause
22820    1 3   5   1000000   ffffe403ccf032c0                cvs biolock
db{0}> bt /a 0xffffe4040092db80
trace: pid 18588 lid 1 at 0xffffe4011e02b920
sleepq_block() at netbsd:sleepq_block+0x97
cv_timedwait() at netbsd:cv_timedwait+0xa9
bbusy() at netbsd:bbusy+0x7b
getblk() at netbsd:getblk+0x5a
bio_doread() at netbsd:bio_doread+0x1d
bread() at netbsd:bread+0x1a
ffs_update.part.3() at netbsd:ffs_update.part.3+0x112
ffs_truncate() at netbsd:ffs_truncate+0xd91
ufs_truncate_retry() at netbsd:ufs_truncate_retry+0x79
ufs_inactive() at netbsd:ufs_inactive+0x1ac
VOP_INACTIVE() at netbsd:VOP_INACTIVE+0x33
vrelel() at netbsd:vrelel+0x168
ufs_remove() at netbsd:ufs_remove+0xab
VOP_REMOVE() at netbsd:VOP_REMOVE+0x37
do_sys_unlinkat.isra.5() at netbsd:do_sys_unlinkat.isra.5+0x1eb
syscall() at netbsd:syscall+0x1bc
--- syscall (number 10) ---
7a93d4312a7a:

and:

db{0}> bt /a ffffe403ccf032c0
trace: pid 22820 lid 1 at 0xffffe4011de34820
sleepq_block() at netbsd:sleepq_block+0x97
cv_timedwait() at netbsd:cv_timedwait+0xa9
bbusy() at netbsd:bbusy+0x7b
getblk() at netbsd:getblk+0x5a
bio_doread() at netbsd:bio_doread+0x1d
bread() at netbsd:bread+0x1a
ffs_alloccg() at netbsd:ffs_alloccg+0xcc
ffs_hashalloc() at netbsd:ffs_hashalloc+0x28
ffs_alloc() at netbsd:ffs_alloc+0x11e
ffs_balloc() at netbsd:ffs_balloc+0x570
ufs_mkdir() at netbsd:ufs_mkdir+0x22b
VOP_MKDIR() at netbsd:VOP_MKDIR+0x3b
do_sys_mkdirat.isra.7.constprop.12() at netbsd:do_sys_mkdirat.isra.7.constprop.1
2+0x167
syscall() at netbsd:syscall+0x1bc
--- syscall (number 136) ---
78c2024a686a:


The cvses where updating different trees on the same file system.

Got a crash dump and netbsd.gdb, if anyone is interestd (internal
note: seven/netbsd.0.*)


>How-To-Repeat:
n/a

>Fix:
n/a

>Release-Note:

>Audit-Trail:
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/52769: Deadlock in ffs?
Date: Wed, 29 Nov 2017 08:47:27 +0100

 Chuck pointed out that the threads are just waiting on buffers to be unlocked,
 so this is likely more a case of a missed interrupt (or similar).

 The cvs updates ran on:

 /dev/ld0a on /work type ffs (noatime, local)

 that is:

 nvme0 at pci2 dev 0 function 0: vendor 144d product a802 (rev. 0x01)
 nvme0: NVMe 1.1
 nvme0: interrupting at ioapic0 pin 16
 nvme0: SAMSUNG MZVPV256HDGL-00000, firmware BXW7300Q, serial S1XWNYAH408386
 ld0 at nvme0 nsid 1

 and a full dmesg is below.

 Martin

 Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
     2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017
     The NetBSD Foundation, Inc.  All rights reserved.
 Copyright (c) 1982, 1986, 1989, 1991, 1993
     The Regents of the University of California.  All rights reserved.

 NetBSD 8.99.7 (GENERIC) #177: Thu Nov 23 09:03:25 CET 2017
 	martin@seven-days-to-the-wolves.aprisoft.de:/work/src/sys/arch/amd64/compile/GENERIC
 total memory = 16382 MB
 avail memory = 15883 MB
 timecounter: Timecounters tick every 10.000 msec
 Kernelized RAIDframe activated
 running cgd selftest aes-xts-256 aes-xts-512 done
 timecounter: Timecounter "i8254" frequency 1193182 Hz quality 100
 System manufacturer System Product Name (System Version)
 mainbus0 (root)
 ACPI: RSDP 0x00000000000FBAD0 000024 (v02 ACPIAM)
 ACPI: XSDT 0x00000000CFE90100 00005C (v01 030811 XSDT1042 20110308 MSFT 00000097)
 ACPI: FACP 0x00000000CFE90290 0000F4 (v03 030811 FACP1042 20110308 MSFT 00000097)
 Firmware Warning (ACPI): 32/64X length mismatch in FADT/Gpe0Block: 64/32 (20171110/tbfadt-642)
 ACPI: DSDT 0x00000000CFE90450 00E040 (v01 A1585  A1585000 00000000 INTL 20060113)
 ACPI: FACS 0x00000000CFEA8000 000040
 ACPI: APIC 0x00000000CFE90390 00007C (v01 030811 APIC1042 20110308 MSFT 00000097)
 ACPI: MCFG 0x00000000CFE90410 00003C (v01 030811 OEMMCFG  20110308 MSFT 00000097)
 ACPI: OEMB 0x00000000CFEA8040 000072 (v01 030811 OEMB1042 20110308 MSFT 00000097)
 ACPI: SRAT 0x00000000CFE9F8A0 000108 (v01 AMD    FAM_F_10 00000002 AMD  00000001)
 ACPI: HPET 0x00000000CFE9F9B0 000038 (v01 030811 OEMHPET  20110308 MSFT 00000097)
 ACPI: SSDT 0x00000000CFE9F9F0 000DA4 (v01 A M I  POWERNOW 00000001 AMD  00000001)
 ACPI: Executed 3 blocks of module-level executable AML code
 ACPI: 2 ACPI AML tables successfully acquired and loaded
 ioapic0 at mainbus0 apid 6: pa 0xfec00000, version 0x21, 24 pins
 cpu0 at mainbus0 apid 0
 cpu0: AMD Phenom(tm) II X6 1075T Processor, id 0x100fa0
 cpu0: package 0, core 0, smt 0
 cpu1 at mainbus0 apid 1
 cpu1: AMD Phenom(tm) II X6 1075T Processor, id 0x100fa0
 cpu1: package 0, core 1, smt 0
 cpu2 at mainbus0 apid 2
 cpu2: AMD Phenom(tm) II X6 1075T Processor, id 0x100fa0
 cpu2: package 0, core 2, smt 0
 cpu3 at mainbus0 apid 3
 cpu3: AMD Phenom(tm) II X6 1075T Processor, id 0x100fa0
 cpu3: package 0, core 3, smt 0
 cpu4 at mainbus0 apid 4
 cpu4: AMD Phenom(tm) II X6 1075T Processor, id 0x100fa0
 cpu4: package 0, core 4, smt 0
 cpu5 at mainbus0 apid 5
 cpu5: AMD Phenom(tm) II X6 1075T Processor, id 0x100fa0
 cpu5: package 0, core 5, smt 0
 acpi0 at mainbus0: Intel ACPICA 20171110
 acpi0: X/RSDT: OemId <030811,XSDT1042,20110308>, AslId <MSFT,00000097>
 acpi0: MCFG: segment 0, bus 0-255, address 0x00000000e0000000
 acpi0: SCI interrupting at int 9
 timecounter: Timecounter "ACPI-Fast" frequency 3579545 Hz quality 1000
 hpet0 at acpi0: high precision event timer (mem 0xfed00000-0xfed00400)
 timecounter: Timecounter "hpet0" frequency 14318180 Hz quality 2000
 acpiec0 at acpi0 (EC0, PNP0C09): io 0x62,0x66
 attimer1 at acpi0 (TMR, PNP0100): io 0x40-0x43 irq 0
 pcppi1 at acpi0 (SPKR, PNP0800): io 0x61
 spkr0 at pcppi1: PC Speaker
 wsbell at spkr0 not configured
 midi0 at pcppi1: PC speaker
 sysbeep0 at pcppi1
 UAR1 (PNP0501) at acpi0 not configured
 aibs0 at acpi0 (ASOC, ATK0110-16843024): ASUSTeK AI Booster
 OMSC (PNP0C02) at acpi0 not configured
 RMSC (PNP0C02) at acpi0 not configured
 SIOR (PNP0C02) at acpi0 not configured
 PCIE (PNP0C02) at acpi0 not configured
 RMEM (PNP0C01) at acpi0 not configured
 acpibut0 at acpi0 (PWRB, PNP0C0C-170): ACPI Power Button
 acpiwmi0 at acpi0 (AOD, PNP0C14-0): ACPI WMI Interface
 acpiwmibus at acpiwmi0 not configured
 attimer1: attached to pcppi1
 pci0 at mainbus0 bus 0: configuration mode 1
 pci0: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
 pchb0 at pci0 dev 0 function 0: vendor 1002 product 5957 (rev. 0x00)
 ppb0 at pci0 dev 2 function 0: vendor 1002 product 5978 (rev. 0x00)
 ppb0: PCI Express capability version 2 <Root Port of PCI-E Root Complex> x16 @ 5.0GT/s
 ppb0: link is x2 @ 2.5GT/s
 pci1 at ppb0 bus 6
 pci1: i/o space, memory space enabled, rd/line, wr/inv ok
 nouveau0 at pci1 dev 0 function 0: vendor 10de product 0e22 (rev. 0xa1)
 nouveau0: interrupting at ioapic0 pin 18 (nouveau)
 Zone  kernel: Available graphics memory: 5488618 kiB
 Zone   dma32: Available graphics memory: 2097152 kiB
 00001030
 00010130
 00002261
 drm: Supports vblank timestamp caching Rev 2 (21.10.2013).
 drm: Driver supports precise vblank timestamp query.
 nouveau0: info: No connectors reported connected with modes
 drm: Cannot find any crtc or sizes - going 1024x768
 nouveaufb0 at nouveau0
 nouveau0: info: registered panic notifier
 nouveaufb0: framebuffer at 0xffff8001434bf000, size 1024x768, depth 32, stride 4096
 wsdisplay0 at nouveaufb0 kbdmux 1
 wsmux1: connecting to wsdisplay0
 hdaudio0 at pci1 dev 0 function 1: HD Audio Controller
 hdaudio0: interrupting at ioapic0 pin 19
 hdafg0 at hdaudio0: vendor 10de product 0012
 hdafg0: DP00 8ch: Digital Out [Jack]
 hdafg0: 8ch/0ch 48000Hz PCM16*
 hdafg1 at hdaudio0: vendor 10de product 0012
 hdafg1: DP00 8ch: Digital Out [Jack]
 hdafg1: 8ch/0ch 48000Hz PCM16*
 hdafg2 at hdaudio0: vendor 10de product 0012
 hdafg2: DP00 8ch: Digital Out [Jack]
 hdafg2: 8ch/0ch 48000Hz PCM16*
 hdafg3 at hdaudio0: vendor 10de product 0012
 hdafg3: DP00 8ch: Digital Out [Jack]
 hdafg3: 8ch/0ch 48000Hz PCM16*
 ppb1 at pci0 dev 4 function 0: vendor 1002 product 597a (rev. 0x00)
 ppb1: PCI Express capability version 2 <Root Port of PCI-E Root Complex> x4 @ 5.0GT/s
 pci2 at ppb1 bus 5
 pci2: i/o space, memory space enabled, rd/line, wr/inv ok
 nvme0 at pci2 dev 0 function 0: vendor 144d product a802 (rev. 0x01)
 nvme0: NVMe 1.1
 nvme0: interrupting at ioapic0 pin 16
 nvme0: SAMSUNG MZVPV256HDGL-00000, firmware BXW7300Q, serial S1XWNYAH408386
 ld0 at nvme0 nsid 1
 ld0: 238 GB, 31130 cyl, 255 head, 63 sec, 512 bytes/sect x 500118192 sectors
 ppb2 at pci0 dev 9 function 0: vendor 1002 product 597e (rev. 0x00)
 ppb2: PCI Express capability version 2 <Root Port of PCI-E Root Complex> x2 @ 5.0GT/s
 ppb2: link is x1 @ 2.5GT/s
 pci3 at ppb2 bus 4
 pci3: i/o space, memory space enabled, rd/line, wr/inv ok
 jmide0 at pci3 dev 0 function 0: vendor 197b product 2361 (rev. 0x10)
 jmide0: 1 PATA port, 1 SATA port
 jmide0: interrupting at ioapic0 pin 17
 ahcisata0 at jmide0
 ahcisata0: AHCI revision 1.10, 1 port, 32 slots, CAP 0xc722ff00<PSC,SSC,PMD,SPM,ISS=0x2=Gen2,SCLO,SAL,SALP,SNCQ,S64A>
 atabus0 at ahcisata0 channel 0
 jmide0: PCI IDE interface used
 jmide0: bus-master DMA support present
 jmide0: primary channel wired to native-PCI mode
 jmide0: primary channel is unused
 jmide0: secondary channel wired to native-PCI mode
 jmide0: secondary channel is PATA
 atabus1 at jmide0 channel 1
 ppb3 at pci0 dev 10 function 0: vendor 1002 product 597f (rev. 0x00)
 ppb3: PCI Express capability version 2 <Root Port of PCI-E Root Complex> x2 @ 5.0GT/s
 ppb3: link is x1 @ 5.0GT/s
 pci4 at ppb3 bus 3
 pci4: i/o space, memory space enabled, rd/line, wr/inv ok
 xhci0 at pci4 dev 0 function 0: vendor 1033 product 0194 (rev. 0x03)
 xhci0: interrupting at ioapic0 pin 18
 xhci0: xHCI version 0.96
 usb0 at xhci0: USB revision 3.0
 usb1 at xhci0: USB revision 2.0
 ahcisata1 at pci0 dev 17 function 0: vendor 1002 product 4391 (rev. 0x40)
 ahcisata1: interrupting at ioapic0 pin 19
 ahcisata1: 64-bit DMA
 ahcisata1: AHCI revision 1.20, 6 ports, 32 slots, CAP 0xf732ff05<PSC,SSC,PMD,SPM,ISS=0x3=Gen3,SCLO,SAL,SALP,SMPS,SSNTF,SNCQ,S64A>
 atabus2 at ahcisata1 channel 0
 atabus3 at ahcisata1 channel 1
 atabus4 at ahcisata1 channel 2
 atabus5 at ahcisata1 channel 3
 atabus6 at ahcisata1 channel 4
 atabus7 at ahcisata1 channel 5
 ohci0 at pci0 dev 18 function 0: vendor 1002 product 4397 (rev. 0x00)
 ohci0: interrupting at ioapic0 pin 18
 ohci0: OHCI version 1.0, legacy support
 usb2 at ohci0: USB revision 1.0
 ehci0 at pci0 dev 18 function 2: vendor 1002 product 4396 (rev. 0x00)
 ehci0: interrupting at ioapic0 pin 17
 ehci0: dropped intr workaround enabled
 ehci0: EHCI version 1.0
 ehci0: 1 companion controller, 5 ports: ohci0
 usb3 at ehci0: USB revision 2.0
 ohci1 at pci0 dev 19 function 0: vendor 1002 product 4397 (rev. 0x00)
 ohci1: interrupting at ioapic0 pin 18
 ohci1: OHCI version 1.0, legacy support
 usb4 at ohci1: USB revision 1.0
 ehci1 at pci0 dev 19 function 2: vendor 1002 product 4396 (rev. 0x00)
 ehci1: interrupting at ioapic0 pin 17
 ehci1: dropped intr workaround enabled
 ehci1: EHCI version 1.0
 ehci1: 1 companion controller, 5 ports: ohci1
 usb5 at ehci1: USB revision 2.0
 piixpm0 at pci0 dev 20 function 0: vendor 1002 product 4385 (rev. 0x42)
 piixpm0: polling (SB800)
 iic0 at piixpm0: I2C bus
 iic1 at piixpm0: I2C bus
 iic2 at piixpm0: I2C bus
 iic3 at piixpm0: I2C bus
 hdaudio1 at pci0 dev 20 function 2: HD Audio Controller
 hdaudio1: interrupting at ioapic0 pin 16
 hdafg4 at hdaudio1: vendor 1106 product 0440
 hdafg4: DAC00 8ch: Speaker [Jack], HP Out [Jack]
 hdafg4: ADC01 2ch: Line In [Jack], Mic In [Jack]
 hdafg4: HDMI02 2ch: Digital Out [Jack]
 hdafg4: DIG03 2ch: SPDIF Out [Jack]
 hdafg4: 8ch/2ch 48000Hz PCM16*
 audio0 at hdafg4: full duplex, playback, capture, mmap, independent
 hdafg4: Virtual format configured - Format SLINEAR, precision 16, channels 2, frequency 48000
 hdafg4: Latency: 128 milliseconds
 spkr1 at audio0: PC Speaker (synthesized)
 wsbell at spkr1 not configured
 pcib0 at pci0 dev 20 function 3: vendor 1002 product 439d (rev. 0x40)
 ppb4 at pci0 dev 20 function 4: vendor 1002 product 4384 (rev. 0x40)
 pci5 at ppb4 bus 2
 pci5: i/o space, memory space enabled
 fwohci0 at pci5 dev 8 function 0: vendor 1106 product 3044 (rev. 0xc0)
 fwohci0: interrupting at ioapic0 pin 20
 fwohci0: OHCI version 1.10 (ROM=1)
 fwohci0: No. of Isochronous channels is 4.
 fwohci0: EUI64 00:1f:c6:00:00:0a:ca:1a
 fwohci0: Phy 1394a available S400, 2 ports.
 fwohci0: Link S400, max_rec 2048 bytes.
 ieee1394if0 at fwohci0: IEEE1394 bus
 fwip0 at ieee1394if0: IP over IEEE1394
 fwohci0: Initiate bus reset
 ohci2 at pci0 dev 20 function 5: vendor 1002 product 4399 (rev. 0x00)
 ohci2: interrupting at ioapic0 pin 18
 ohci2: OHCI version 1.0, legacy support
 usb6 at ohci2: USB revision 1.0
 ppb5 at pci0 dev 21 function 0: vendor 1002 product 43a0 (rev. 0x00)
 ppb5: PCI Express capability version 2 <Root Port of PCI-E Root Complex> x1 @ 5.0GT/s
 ppb5: link is x1 @ 2.5GT/s
 pci6 at ppb5 bus 1
 pci6: i/o space, memory space enabled, rd/line, wr/inv ok
 re0 at pci6 dev 0 function 0: RealTek 8168/8111 PCIe Gigabit Ethernet (rev. 0x06)
 re0: interrupting at ioapic0 pin 16
 re0: Ethernet address bc:ae:c5:46:16:58
 re0: using 256 tx descriptors
 rgephy0 at re0 phy 7: RTL8169S/8110S/8211 1000BASE-T media interface, rev. 4
 rgephy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
 ohci3 at pci0 dev 22 function 0: vendor 1002 product 4397 (rev. 0x00)
 ohci3: interrupting at ioapic0 pin 18
 ohci3: OHCI version 1.0, legacy support
 usb7 at ohci3: USB revision 1.0
 ehci2 at pci0 dev 22 function 2: vendor 1002 product 4396 (rev. 0x00)
 ehci2: interrupting at ioapic0 pin 17
 ehci2: dropped intr workaround enabled
 ehci2: EHCI version 1.0
 ehci2: 1 companion controller, 4 ports: ohci3
 usb8 at ehci2: USB revision 2.0
 pchb1 at pci0 dev 24 function 0: vendor 1022 product 1200 (rev. 0x00)
 pchb2 at pci0 dev 24 function 1: vendor 1022 product 1201 (rev. 0x00)
 pchb3 at pci0 dev 24 function 2: vendor 1022 product 1202 (rev. 0x00)
 amdnb_misc0 at pci0 dev 24 function 3: AMD NB Misc Configuration
 amdtemp0 at amdnb_misc0: AMD CPU Temperature Sensors (Family10h)
 pchb4 at pci0 dev 24 function 4: vendor 1022 product 1204 (rev. 0x00)
 isa0 at pcib0
 com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, working fifo
 com0: console
 pckbc0 at isa0 port 0x60-0x64
 acpicpu0 at cpu0: ACPI CPU
 acpicpu0: C1: HLT, lat   0 us, pow     0 mW
 acpicpu0: C2: I/O, lat  75 us, pow     1 mW
 acpicpu0: P0: FFH, lat   4 us, pow 19507 mW, 3000 MHz
 acpicpu0: P1: FFH, lat   4 us, pow 14500 mW, 2300 MHz
 acpicpu0: P2: FFH, lat   4 us, pow 10535 mW, 1600 MHz
 acpicpu0: P3: FFH, lat   4 us, pow  6210 mW,  800 MHz
 acpicpu0: T0: I/O, lat   1 us, pow     0 mW, 100 %
 acpicpu0: T1: I/O, lat   1 us, pow     0 mW,  88 %
 acpicpu0: T2: I/O, lat   1 us, pow     0 mW,  76 %
 acpicpu0: T3: I/O, lat   1 us, pow     0 mW,  64 %
 acpicpu0: T4: I/O, lat   1 us, pow     0 mW,  52 %
 acpicpu0: T5: I/O, lat   1 us, pow     0 mW,  40 %
 acpicpu0: T6: I/O, lat   1 us, pow     0 mW,  28 %
 acpicpu0: T7: I/O, lat   1 us, pow     0 mW,  16 %
 acpicpu1 at cpu1: ACPI CPU
 acpicpu2 at cpu2: ACPI CPU
 acpicpu3 at cpu3: ACPI CPU
 acpicpu4 at cpu4: ACPI CPU
 acpicpu5 at cpu5: ACPI CPU
 fwohci0: BUS reset
 fwohci0: node_id=0xc800ffc0, gen=1, CYCLEMASTER mode
 ieee1394if0: 1 nodes, maxhop <= 0 cable IRM irm(0) (me)
 ieee1394if0: bus manager 0
 timecounter: Timecounter "clockinterrupt" frequency 100 Hz quality 0
 timecounter: Timecounter "TSC" frequency 3010199480 Hz quality 3000
 uhub0 at usb0: vendor 1033 (0x1033) xHCI Root Hub (0000), class 9/0, rev 1.00/1.00, addr 0
 uhub0: 2 ports with 2 removable, self powered
 uhub1 at usb1: vendor 1033 (0x1033) xHCI Root Hub (0000), class 9/0, rev 2.00/1.00, addr 0
 uhub1: 2 ports with 2 removable, self powered
 IPsec: Initialized Security Association Processing.
 uhub2 at usb2: vendor 1002 (0x1002) OHCI root hub (0000), class 9/0, rev 1.00/1.00, addr 1
 uhub2: 5 ports with 5 removable, self powered
 uhub3 at usb5: vendor 1002 (0x1002) EHCI root hub (0000), class 9/0, rev 2.00/1.00, addr 1
 uhub3: 5 ports with 5 removable, self powered
 uhub4 at usb6: vendor 1002 (0x1002) OHCI root hub (0000), class 9/0, rev 1.00/1.00, addr 1
 uhub4: 2 ports with 2 removable, self powered
 ahcisata1 port 2: device present, speed: 1.5Gb/s
 uhub5 at usb7: vendor 1002 (0x1002) OHCI root hub (0000), class 9/0, rev 1.00/1.00, addr 1
 uhub5: 4 ports with 4 removable, self powered
 uhub6 at usb8: vendor 1002 (0x1002) EHCI root hub (0000), class 9/0, rev 2.00/1.00, addr 1
 uhub6: 4 ports with 4 removable, self powered
 uhub7 at usb4: vendor 1002 (0x1002) OHCI root hub (0000), class 9/0, rev 1.00/1.00, addr 1
 uhub7: 5 ports with 5 removable, self powered
 ahcisata1 port 0: device present, speed: 6.0Gb/s
 uhub8 at usb3: vendor 1002 (0x1002) EHCI root hub (0000), class 9/0, rev 2.00/1.00, addr 1
 uhub8: 5 ports with 5 removable, self powered
 wd0 at atabus2 drive 0
 wd0: <ST2000DL003-9VT166>
 wd0: drive supports 16-sector PIO transfers, LBA48 addressing
 wd0: 1863 GB, 3876021 cyl, 16 head, 63 sec, 512 bytes/sect x 3907029168 sectors
 wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133), WRITE DMA FUA, NCQ (32 tags)
 wd0(ahcisata1:0:0): using PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133) (using DMA), NCQ (31 tags)
 atapibus0 at atabus4: 1 targets
 cd0 at atapibus0 drive 0: <DRW-24D5MT, K28H2EA3236, 1.10> cdrom removable
 cd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133)
 cd0(ahcisata1:2:0): using PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133) (using DMA)
 boot device: wd0
 root on wd0a dumps on wd0b
 root file system type: ffs
 kern.module.path=/stand/amd64/8.99.7/modules
 uplcom0 at uhub2 port 1
 uplcom0: Prolific Technology Inc. (0x67b) USB-Serial Controller (0x2303), rev 1.10/3.00, addr 2
 ucom0 at uplcom0
 cd0(ahcisata1:2:0):  DEFERRED ERROR, key = 0x2
 uplcom1 at uhub2 port 2
 uplcom1: Prolific Technology Inc. (0x67b) USB 2.0 To COM Device (0x2303), rev 1.10/3.00, addr 3
 ucom1 at uplcom1
 uplcom2 at uhub2 port 3
 uplcom2: Prolific Technology Inc. (0x67b) USB-Serial Controller (0x2303), rev 1.10/3.00, addr 4
 ucom2 at uplcom2
 wsdisplay0: screen 0 added (default, vt100 emulation)
 wsdisplay0: screen 1 added (default, vt100 emulation)
 wsdisplay0: screen 2 added (default, vt100 emulation)
 wsdisplay0: screen 3 added (default, vt100 emulation)
 wsdisplay0: screen 4 added (default, vt100 emulation)

From: Nick Hudson <skrll@netbsd.org>
To: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org,
 gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Cc: 
Subject: Re: kern/52769: Deadlock in ffs?
Date: Wed, 29 Nov 2017 12:46:11 +0000

 This reminds me of the pool page problem on erlite where not all the memory was
 available as direct map (and therefore PMAP_MAP_POOLPAGEable) meaning that
 in low memory situations the pool wouldn't grow. A direct map page never 
 became available.

 I fixed this by making all erlite memory available as direct map via xkseg

 http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/arch/mips/mips/pmap_machdep.c.diff?r1=1.20&r2=1.21
 http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/arch/mips/include/vmparam.h.diff?r1=1.59&r2=1.60

 Nick

From: Chuck Silvers <chuq@chuq.com>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
	netbsd-bugs@netbsd.org
Subject: Re: kern/52769: hang with an ffs stored in an nvme device
Date: Fri, 1 Dec 2017 10:45:09 -0800

 I got a copy of the dump from martin and here are some things I found.
 the two buffers that he pointed out are for different parts of the same device:

 (gdb) p *(struct buf *)0xffffe4041d8f7240
 $4 = {
   b_u = {
     u_actq = {
       tqe_next = 0xffffe4040e6b6c68, 
       tqe_prev = 0xffffe4029c033ea0
     }, 
     u_rbnode = {
       rb_nodes = {0xffffe4040e6b6c68, 0xffffe4029c033ea0}, 
       rb_info = 0x0
     }, 
     u_work = {
       wk_dummy = 0xffffe4040e6b6c68
     }
   }, 
   b_iodone = 0x0, 
   b_error = 0x0, 
   b_resid = 0x388e, 
   b_flags = 0x404, 

 #define B_ASYNC         0x00000004      /* Start I/O, do not wait. */
 #define B_COWDONE       0x00000400      /* Copy-on-write already done. */

   b_prio = 0x1, 
   b_bufsize = 0x8000, 
   b_bcount = 0x8000, 
   b_dev = 0x1300, 
   b_data = 0xffff800148420000, 
   b_blkno = 0xddd06c0, 
   b_rawblkno = 0xddd0ec0, 
   b_proc = 0x0, 
   b_saveaddr = 0x0, 
   b_private = 0x0, 
   b_dcookie = 0x0, 
   b_busy = {
     cv_opaque = {0xffffe4040092db80, 0xffffe4040092dc90, 0xffffffff8116c394}
   }, 
   b_refcnt = 0x2, 
   b_unused = 0x0, 
   b_hash = {
     le_next = 0x0, 
     le_prev = 0xffffe403f8913f40
   }, 
   b_vnbufs = {
     le_next = 0xffffe4029c033ea0, 
     le_prev = 0xffffe4040e6b6d18
   }, 
   b_freelist = {
     tqe_next = 0xffffe4041739a6c0, 
     tqe_prev = 0xffffe403b883c308
   }, 
   b_wapbllist = {
     tqe_next = 0x0, 
     tqe_prev = 0x0
   }, 
   b_lblkno = 0xddd06c0, 
   b_freelistindex = 0x1, 
   b_cflags = 0x4800010, 

 #define BC_BUSY         0x00000010      /* I/O in progress. */
 #define BC_WANTED       0x00800000      /* Process wants this buffer. */
 #define BC_VFLUSH       0x04000000      /* Buffer is being synced. */

   b_vp = 0xffffe4042b5b2bd0, 
   b_done = {
     cv_opaque = {0x0, 0xffffe4041d8f7338, 0xffffffff8116c39c}
   }, 
   b_oflags = 0x0, 
   b_objlock = 0xffffe4042b619640
 }


 (gdb) p *(struct buf *)0xffffe40294774480
 $5 = {
   b_u = {
     u_actq = {
       tqe_next = 0xffffe403b48ac368, 
       tqe_prev = 0xffffe403b48ac488
     }, 
     u_rbnode = {
       rb_nodes = {0xffffe403b48ac368, 0xffffe403b48ac488}, 
       rb_info = 0x0
     }, 
     u_work = {
       wk_dummy = 0xffffe403b48ac368
     }
   }, 
   b_iodone = 0x0, 
   b_error = 0x0, 
   b_resid = 0x49ec, 
   b_flags = 0x404, 

 #define B_ASYNC         0x00000004      /* Start I/O, do not wait. */
 #define B_COWDONE       0x00000400      /* Copy-on-write already done. */

   b_prio = 0x1, 
   b_bufsize = 0x8000, 
   b_bcount = 0x8000, 
   b_dev = 0x1300, 
   b_data = 0xffff80014cfb0000, 
   b_blkno = 0x121f1100, 
   b_rawblkno = 0x121f1900, 
   b_proc = 0x0, 
   b_saveaddr = 0x0, 
   b_private = 0x0, 
   b_dcookie = 0x0, 
   b_busy = {
     cv_opaque = {0xffffe403ccf032c0, 0xffffe403ccf033d0, 0xffffffff8116c394}
   }, 
   b_refcnt = 0x2, 
   b_unused = 0x0, 
   b_hash = {
     le_next = 0xffffe403e8c7b480, 
     le_prev = 0xffff80001d75c158
   }, 
   b_vnbufs = {
     le_next = 0xffffe403b48ac488, 
     le_prev = 0xffffe403b48ac418
   }, 
   b_freelist = {
     tqe_next = 0xffffe40412c51120, 
     tqe_prev = 0xffffe4041739a300
   }, 
   b_wapbllist = {
     tqe_next = 0x0, 
     tqe_prev = 0x0
   }, 
   b_lblkno = 0x121f1100, 
   b_freelistindex = 0x1, 
   b_cflags = 0x4800010, 

 #define BC_BUSY         0x00000010      /* I/O in progress. */
 #define BC_WANTED       0x00800000      /* Process wants this buffer. */
 #define BC_VFLUSH       0x04000000      /* Buffer is being synced. */

   b_vp = 0xffffe4042b5b2bd0, 
   b_done = {
     cv_opaque = {0x0, 0xffffe40294774578, 0xffffffff8116c39c}
   }, 
   b_oflags = 0x0, 
   b_objlock = 0xffffe4042b619640
 }



 b_dev = 0x1300 is "ld0a", which is associated with nvme0 as martin mentioned.
 the buffers are B_ASYNC, so they will be unlocked automatically by biodone().
 however the nvme driver does not have any I/O pending:

 (gdb) p *((struct nvme_softc *)nvme_cd.cd_devs[0]->dv_private)->sc_admin_q
 $9 = {
   q_sc = 0xffff80001da34000, 
   q_sq_mtx = {
     u = {
       mtxa_owner = 0x601
     }
   }, 
   q_cq_mtx = {
     u = {
       mtxa_owner = 0x601
     }
   }, 
   q_sq_dmamem = 0xffffe4042f7ceac8, 
   q_cq_dmamem = 0xffffe4042f7ceaf0, 
   q_sqtdbl = 0x1000, 
   q_cqhdbl = 0x1004, 
   q_id = 0x0, 
   q_entries = 0x20, 
   q_sq_tail = 0x4, 
   q_cq_head = 0x4, 
   q_cq_phase = 0x1, 
   q_ccb_mtx = {
     u = {
       mtxa_owner = 0x601
     }
   }, 
   q_nccbs = 0x20, 
   q_nccbs_avail = 0x21, 
   q_ccbs = 0xffffe4011d6e6000, 
   q_ccb_list = {
     sqh_first = 0xffffe4011d6e6000, 
     sqh_last = 0xffffe4011d6e6e88
   }, 
   q_ccb_prpls = 0xffffe4042f7ceb18
 }

 (gdb) p *((struct nvme_softc *)nvme_cd.cd_devs[0]->dv_private)->sc_q[0]
 $12 = {
   q_sc = 0xffff80001da34000, 
   q_sq_mtx = {
     u = {
       mtxa_owner = 0x601
     }
   }, 
   q_cq_mtx = {
     u = {
       mtxa_owner = 0x601
     }
   }, 
   q_sq_dmamem = 0xffffe4042f7ceb40, 
   q_cq_dmamem = 0xffffe4042f7ceb68, 
   q_sqtdbl = 0x1008, 
   q_cqhdbl = 0x100c, 
   q_id = 0x1, 
   q_entries = 0x400, 
   q_sq_tail = 0x3b7, 
   q_cq_head = 0x3b7, 
   q_cq_phase = 0x0, 
   q_ccb_mtx = {
     u = {
       mtxa_owner = 0x601
     }
   }, 
   q_nccbs = 0x3ff, 
   q_nccbs_avail = 0x3ff, 
   q_ccbs = 0xffff800020d5b000, 
   q_ccb_list = {
     sqh_first = 0xffff800020d78cb8, 
     sqh_last = 0xffff800020d78f10
   }, 
   q_ccb_prpls = 0xffffe4042f7ceb90
 }


 the one normal queue is empty (q_nccbs == q_nccbs_avail, and q_sq_tail == q_cq_head).
 the admin queue is a bit odd in that q_nccbs_avail is one greater than q_nccbs,
 but it does appear to also be empty since q_sq_tail == q_cq_head.


 the two buffers above, along with 554 others, are still in the bufq of ld0:

 (gdb) p ((struct ld_softc *)ld_cd.cd_devs[0]->dv_private)->sc_dksc.sc_bufq
 $1 = (struct bufq_state *) 0xffffe4042f686ce8
 (gdb) xbufq_fcfs 0xffffe4042f686ce8
 bufq 0xffffe4042f686ce8
 0xffffe4025ff23b48
 0xffffe4025ff23a28
 0xffffe4025ff23908
 0xffffe4025ff237e8
 0xffffe4025ff236c8
 ...
 0xffffe403b23b4480
 0xffffe403cfef36c0
 total buffers 556


 ld0's dk_softc also indicates that no buffers are currently in the hands of
 the underlying driver:

 (gdb) p *(struct ld_softc *)ld_cd.cd_devs[0]->dv_private
 $14 = {
   sc_dksc = {
 ...
     sc_bufq = 0xffffe4042f686ce8, 
     sc_dtype = 0xf, 
     sc_deferred = 0xffffe4025ff23c68, 
     sc_busy = 0x0, 
 ...
   sc_queuecnt = 0x0, 
 ...
   sc_maxqueuecnt = 0x3ff, 
 }


 so the bug appears to be that the dk layer is sitting on a bunch of buffers
 even though the underlying nvme driver is idle.

 one problem I see is in ld_diskstart():

 static int
 ld_diskstart(device_t dev, struct buf *bp)
 {
 	struct ld_softc *sc = device_private(dev);
 	int error;

 	if (sc->sc_queuecnt >= sc->sc_maxqueuecnt)
 		return EAGAIN;

 	if ((sc->sc_flags & LDF_MPSAFE) == 0)
 		KERNEL_LOCK(1, curlwp);

 	mutex_enter(&sc->sc_mutex);

 	if (sc->sc_queuecnt >= sc->sc_maxqueuecnt)
 		error = EAGAIN;
 	else {
 		error = (*sc->sc_start)(sc, bp);
 		if (error == 0)
 			sc->sc_queuecnt++;
 	}

 	mutex_exit(&sc->sc_mutex);

 	if ((sc->sc_flags & LDF_MPSAFE) == 0)
 		KERNEL_UNLOCK_ONE(curlwp);

 	return error;
 }



 the first sc_queuecnt check here is done without holding any locks,
 so it could race with lddone() decrementing sc_queuecnt and return EAGAIN,
 which will cause dk_start() to stop sending new I/O to the underlying
 nvme driver even though the driver queue is no longer full.
 lddone() calls dk_start(), but the buffer that ld_diskstart() is rejecting
 is not on the bufq at that point so the lddone() thread wouldn't know about it.
 just removing the unlocked check of sc_queuecnt here would fix this race.

 however, it seems unlikely that this race would result in more than 1 I/O
 becoming stuck in the bufq, and in this dump we have a bunch of them,
 so it seems very likely that there are more problems that I haven't found yet.

 the logic in dk_start() that allows two threads to be processing the bufq
 in parallel looks like it could result in losing track of a buffer entirely.
 if both threads get EAGAIN from d_diskstart then then will both try to
 save the buffer pointer in dksc->sc_deferred, and the first one will be lost.
 but that isn't the problem with the two buffers above since they are both
 still on the bufq.  the comment says this special hidey hole for one buffer
 is intended to keep the order of I/Os the same, but that isn't needed for
 correctness, and doing that in the dk layer rather than in the bufq layer
 seems poor.  if we want a way to undo a bufq_get() rather than just putting
 the buffer back on the bufq with bufq_put(), then we should add a new API
 for that rather than making callers deal with it.

 (for nvme especially it would be better to not require putting all buffers
 for a device on a single bufq at all, since that rather reduces the benefit
 of the hardware having multiple hardware queues.  but that kind of design
 change is beyond the scope of the issue at hand.)

 I'll keep looking for more bugs that could have caused this.

 -Chuck

Responsible-Changed-From-To: kern-bug-people->chs
Responsible-Changed-By: chs@NetBSD.org
Responsible-Changed-When: Fri, 01 Dec 2017 19:01:29 +0000
Responsible-Changed-Why:
I'm looking at this.


From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/52769: hang with an ffs stored in an nvme device
Date: Sat, 2 Dec 2017 09:51:03 -0000 (UTC)

 chuq@chuq.com (Chuck Silvers) writes:

 >the logic in dk_start() that allows two threads to be processing the bufq
 >in parallel looks like it could result in losing track of a buffer entirely.
 >if both threads get EAGAIN from d_diskstart then then will both try to
 >save the buffer pointer in dksc->sc_deferred, and the first one will be lost.

 Two threads cannot enter d_diskstart as the queue processing is skipped
 for sc_busy > 1. Instead the first thread will retry the queue (including
 a possible sc_deferred entry).

 -- 
 -- 
                                 Michael van Elst
 Internet: mlelstv@serpens.de
                                 "A potential Snark may lurk in every tree."

From: "Jaromir Dolecek" <jdolecek@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/52769 CVS commit: src/sys/dev
Date: Mon, 4 Dec 2017 22:15:52 +0000

 Module Name:	src
 Committed By:	jdolecek
 Date:		Mon Dec  4 22:15:52 UTC 2017

 Modified Files:
 	src/sys/dev: dksubr.c

 Log Message:
 add KASSERT() just before setting sc_deferred, to ensure this does
 not trigger problem in PR kern/52769


 To generate a diff of this commit:
 cvs rdiff -u -r1.100 -r1.101 src/sys/dev/dksubr.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: =?UTF-8?B?SmFyb23DrXIgRG9sZcSNZWs=?= <jaromir.dolecek@gmail.com>
To: Chuck Silvers <chuq@chuq.com>
Cc: gnats-bugs@netbsd.org, gnats-admin@netbsd.org
Subject: Re: kern/52769: hang with an ffs stored in an nvme device
Date: Mon, 4 Dec 2017 23:20:14 +0100

 Hello Check,

 race WRT sc_deferred shouldn't be possible as noted by Michael. But I've
 added there a KASSERT() now. Can you retry with rev. 1.101 of
 sys/dev/dksubr.c to rule out that as the issue?

 Jaromir


From: Paul Goyette <paul@whooppee.com>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/52769 
Date: Fri, 23 Feb 2018 07:40:39 +0800 (+08)

 I can't add any more debug info to this, but I can say "me too".  I just 
 encountered the same biolock hang, during a cvs update of a source tree 
 sitting on an nvme(4) device.


 +------------------+--------------------------+----------------------------+
 | Paul Goyette     | PGP Key fingerprint:     | E-mail addresses:          |
 | (Retired)        | FA29 0E3B 35AF E8AE 6651 | paul at whooppee dot com   |
 | Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd dot org |
 +------------------+--------------------------+----------------------------+

From: Paul Goyette <paul@whooppee.com>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/52769 
Date: Fri, 23 Feb 2018 08:11:33 +0800 (+08)

 On Fri, 23 Feb 2018, Paul Goyette wrote:

 > I can't add any more debug info to this, but I can say "me too".  I just 
 > encountered the same biolock hang, during a cvs update of a source tree 
 > sitting on an nvme(4) device.

 Just in case it matters, I'm on an amd64 system with MSIX interrupts:

 nvme0 at pci3 dev 0 function 0: Samsung Electronics (3rd vendor ID) product a804 (rev. 0x00)
 nvme0: NVMe 1.2
 nvme0: for admin queue interrupting at msix3 vec 0
 nvme0: Samsung SSD 960 PRO 512GB, firmware 2B6QCXP7, serial S3EWNX0K108171P
 nvme0: for io queue 1 interrupting at msix3 vec 1 affinity to cpu0
 nvme0: for io queue 2 interrupting at msix3 vec 2 affinity to cpu1
 nvme0: for io queue 3 interrupting at msix3 vec 3 affinity to cpu2
 nvme0: for io queue 4 interrupting at msix3 vec 4 affinity to cpu3
 nvme0: for io queue 5 interrupting at msix3 vec 5 affinity to cpu4
 nvme0: for io queue 6 interrupting at msix3 vec 6 affinity to cpu5
 nvme0: for io queue 7 interrupting at msix3 vec 7 affinity to cpu6
 ld0 at nvme0 nsid 1
 ld0: 476 GB, 62260 cyl, 255 head, 63 sec, 512 bytes/sect x 1000215216 sectors

From: Paul Goyette <paul@whooppee.com>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/52769 
Date: Thu, 1 Mar 2018 16:23:00 +0800 (+08)

 I ran into this one again, just a short time ago.  This time I was
 able to break into ddb and force a crash dump.  So if there is any
 debugging that anyone wants, just let me know.

 FWIW, it seems that the bug is pretty specific to the nvme drive.
 All processes that were accessing the nvme drive got hung up, yet
 processes accessing the root drive (on a "spinny" wd) were able to
 continue without problem.

 It's curious that all reported instances of this PR have involved
 a "cvs update" getting stuck in biolock, while everything else just
 ends up in tstile.  I wonder what cvs does that is so different
 from everything else?



 +------------------+--------------------------+----------------------------+
 | Paul Goyette     | PGP Key fingerprint:     | E-mail addresses:          |
 | (Retired)        | FA29 0E3B 35AF E8AE 6651 | paul at whooppee dot com   |
 | Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd dot org |
 +------------------+--------------------------+----------------------------+

From: Paul Goyette <paul@whooppee.com>
To: gnats-bugs@NetBSD.org
Cc: jdolecek@netbsd.org
Subject: Re: kern/52769 
Date: Sun, 4 Mar 2018 13:43:31 +0800 (+08)

 FWIW, I hit this PR again (third time, now) on a very up-to-date kernel. 
 I have confirmed that I have rev 1.101 of sys/dev/dksubr.c so that is 
 almost certainly not the problem

 I still have a core dump available...


 +------------------+--------------------------+----------------------------+
 | Paul Goyette     | PGP Key fingerprint:     | E-mail addresses:          |
 | (Retired)        | FA29 0E3B 35AF E8AE 6651 | paul at whooppee dot com   |
 | Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd dot org |
 +------------------+--------------------------+----------------------------+

From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/52769
Date: Sun, 4 Mar 2018 08:40:01 +0000

 On Thu, Mar 01, 2018 at 08:25:01AM +0000, Paul Goyette wrote:
  >  It's curious that all reported instances of this PR have involved
  >  a "cvs update" getting stuck in biolock, while everything else just
  >  ends up in tstile.  I wonder what cvs does that is so different
  >  from everything else?

 tstile is, most of the time, just what happens when things pile up
 behind other more interesting deadlocks. Though not always.

 cvs is probably causing it, because cvs does tons of silly metadata
 operations at high speed.

 Is this wapbl, or not? If not, they'll be synchronous and that might
 be causing extra excitement.

 -- 
 David A. Holland
 dholland@netbsd.org

From: Paul Goyette <paul@whooppee.com>
To: gnats-bugs@NetBSD.org
Cc: chs@NetBSD.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, 
    martin@NetBSD.org
Subject: Re: kern/52769
Date: Sun, 4 Mar 2018 22:35:37 +0800 (+08)

 On Sun, 4 Mar 2018, David Holland wrote:

 > The following reply was made to PR kern/52769; it has been noted by GNATS.
 >
 > From: David Holland <dholland-bugs@netbsd.org>
 > To: gnats-bugs@NetBSD.org
 > Cc:
 > Subject: Re: kern/52769
 > Date: Sun, 4 Mar 2018 08:40:01 +0000
 >
 > On Thu, Mar 01, 2018 at 08:25:01AM +0000, Paul Goyette wrote:
 >  >  It's curious that all reported instances of this PR have involved
 >  >  a "cvs update" getting stuck in biolock, while everything else just
 >  >  ends up in tstile.  I wonder what cvs does that is so different
 >  >  from everything else?
 >
 > tstile is, most of the time, just what happens when things pile up
 > behind other more interesting deadlocks. Though not always.
 >
 > cvs is probably causing it, because cvs does tons of silly metadata
 > operations at high speed.
 >
 > Is this wapbl, or not? If not, they'll be synchronous and that might
 > be causing extra excitement.

 In my case YES, this is wapbl.  Not sure about Martin's original case.

From: Martin Husemann <martin@duskware.de>
To: Paul Goyette <paul@whooppee.com>
Cc: gnats-bugs@NetBSD.org, chs@NetBSD.org
Subject: Re: kern/52769
Date: Sun, 4 Mar 2018 17:45:36 +0100

 On Sun, Mar 04, 2018 at 10:35:37PM +0800, Paul Goyette wrote:
 > In my case YES, this is wapbl.  Not sure about Martin's original case.

 No wapbl on nvme for my case. But since this is nvme driver lossage, the
 file system details will only affect the chances for it to happen, not
 the possibility.

 Martin

From: Sevan Janiyan <venture37@geeklan.co.uk>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/52769
Date: Sun, 4 Mar 2018 17:28:32 +0000

 I do not have an nvme based system but I frequently experience deadlocks
 on i386. The deadlocks are pretty catastrophic in that it clobbers the
 file system.

 At first I thought I was issue with the HDD, the disk I was originally
 running with was the 64Gb SATA disk my X60s came with. I swapped that
 out for a shiny 60GB sandisk SSD, issue continued.
 The file system was mounted with the log option on both disks.
 I switched to discard recently for TRIM on the SSD, system crashes.

 System is running with coreboot, thought it may be because of that, I
 had some oddities with DTrace previously which resolved itself after a
 coreboot update, still could be this bug reports make me think that
 issue lies elsewhere and I only experience these deadlocks when running CVS.

 In my case, the last time it happened, I was unable to enter DDB, it
 just sat there waiting. The console was responsive though.

 The issue in my case happens when there is a lot of CVS activity, that
 is triggered from updating from an old check out. I updated from
 Decembers checkout and that took out a large part of my checkout (the
 files in CVS directories get filled with snippets of other files).
 The recent GCC import triggered it.
 Wipping src/external, updating to regain the directory then updating
 once more sometimes triggers it.

 will try running / without any additional options beynod rw and see have
 I go.

 Sevan

From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/52769
Date: Sun, 4 Mar 2018 23:51:00 -0000 (UTC)

 venture37@geeklan.co.uk (Sevan Janiyan) writes:

 > I switched to discard recently for TRIM on the SSD, system crashes.

 I wouldn't use "discard" :-|



 -- 
 -- 
                                 Michael van Elst
 Internet: mlelstv@serpens.de
                                 "A potential Snark may lurk in every tree."

From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc: chs@NetBSD.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
	martin@NetBSD.org
Subject: Re: kern/52769: nvme lockup
Date: Mon, 5 Mar 2018 06:24:16 +0100

 On Sun, Mar 04, 2018 at 05:30:01PM +0000, Sevan Janiyan wrote:
 >  I do not have an nvme based system but I frequently experience deadlocks
 >  on i386. The deadlocks are pretty catastrophic in that it clobbers the
 >  file system.

 Can you please file a separate PR for that?

 Thanks!

 Martin

From: =?UTF-8?B?SmFyb23DrXIgRG9sZcSNZWs=?= <jaromir.dolecek@gmail.com>
To: Chuck Silvers <chuq@chuq.com>
Cc: gnats-bugs@netbsd.org, kern-bug-people@netbsd.org, gnats-admin@netbsd.org, 
	netbsd-bugs@netbsd.org
Subject: Re: kern/52769: hang with an ffs stored in an nvme device
Date: Wed, 14 Mar 2018 22:49:39 +0100

 --001a11444b4424f9f10567665ac6
 Content-Type: text/plain; charset="UTF-8"

 2017-12-01 19:45 GMT+01:00 Chuck Silvers <chuq@chuq.com>:
 >   q_nccbs = 0x20,
 >   q_nccbs_avail = 0x21,

 This is highly suspicious, q_nccbs_avail should always be <= q_nccbs. Good
 that the driver deadlocked, it would panic in nvme_ccb_get() once it would
 try to get the nonexisting 33th ccb from queue :)

 This got me thinking though. If the completion queue is being processed, we
 currently don't reset q_nccbs_avail until after all finished ccbs are
 processed. While this is running, any further I/O would be skipped with
 EAGAIN, if all ccbs were taken and q_nccbs_avail was 0. When the ccb
 counter is reset on the end of nvme_q_complete(), there is no outstanding
 I/O any more which would trigger another lddone() and do the queue drain,
 so the driver ceases to process anything. This scenario matches the
 described symtoms quite well.

 Can you please try patch from
 http://www.netbsd.org/~jdolecek/nvme_avail_put.diff ?

 It's compile tested only, so might need some tweaks. The idea is to reset
 the ccb counter immediatelly, so lddone() would be able to queue another
 I/O while the completion queue is being still processed. This should also
 fix ccb leak on errors - e.g. nvme_ns_dobio() calls just nvme_ccb_put()
 when bus_dmamap_load() fails, so q_nccbs_avail stays decremented from
 nvme_ccb_get().

 Jaromir

 --001a11444b4424f9f10567665ac6
 Content-Type: text/html; charset="UTF-8"
 Content-Transfer-Encoding: quoted-printable

 <div dir=3D"ltr"><br>2017-12-01 19:45 GMT+01:00 Chuck Silvers &lt;<a href=
 =3D"mailto:chuq@chuq.com">chuq@chuq.com</a>&gt;:<br>&gt; =C2=A0 q_nccbs =3D=
  0x20,<br>&gt; =C2=A0 q_nccbs_avail =3D 0x21,<br><br><div>This is highly su=
 spicious, q_nccbs_avail should always be &lt;=3D q_nccbs. Good that the dri=
 ver deadlocked, it would panic in nvme_ccb_get() once it would try to get t=
 he nonexisting 33th ccb from queue :)</div><div><br></div><div>This got me =
 thinking though. If the completion queue is being processed, we currently d=
 on&#39;t reset q_nccbs_avail until after all finished ccbs are processed. W=
 hile this is running, any further I/O would be skipped with EAGAIN, if all =
 ccbs were taken and q_nccbs_avail was 0. When the ccb counter is reset on t=
 he end of nvme_q_complete(), there is no outstanding I/O any more which wou=
 ld trigger another lddone() and do the queue drain, so the driver ceases to=
  process anything. This scenario matches the described symtoms quite well.<=
 /div><div><br></div><div>Can you please try patch from=C2=A0<a href=3D"http=
 ://www.netbsd.org/~jdolecek/nvme_avail_put.diff">http://www.netbsd.org/~jdo=
 lecek/nvme_avail_put.diff</a> ?</div><div><br></div><div>It&#39;s compile t=
 ested only, so might need some tweaks. The idea is to reset the ccb counter=
  immediatelly, so lddone() would be able to queue another I/O while the com=
 pletion queue is being still processed. This should also fix ccb leak on er=
 rors - e.g. nvme_ns_dobio() calls just nvme_ccb_put() when bus_dmamap_load(=
 ) fails, so q_nccbs_avail stays decremented from nvme_ccb_get().</div><div>=
 <br></div><div>Jaromir</div></div>

 --001a11444b4424f9f10567665ac6--

State-Changed-From-To: open->feedback
State-Changed-By: jdolecek@NetBSD.org
State-Changed-When: Thu, 15 Mar 2018 10:56:36 +0000
State-Changed-Why:
Possible patch provided, waiting for feedback.


From: Paul Goyette <paul@whooppee.com>
To: gnats-bugs@NetBSD.org
Cc: chs@NetBSD.org, netbsd-bugs@netbsd.org, martin@NetBSD.org
Subject: Re: kern/52769: hang with an ffs stored in an nvme device
Date: Fri, 16 Mar 2018 17:07:46 +0800 (+08)

   This message is in MIME format.  The first part should be readable text,
   while the remaining parts are likely unreadable without MIME-aware tools.

 --0-648942362-1521191266=:10795
 Content-Type: TEXT/PLAIN; charset=X-UNKNOWN; format=flowed
 Content-Transfer-Encoding: QUOTED-PRINTABLE

 On Wed, 14 Mar 2018, Jarom=C3=ADr Dole=C4~Mek wrote:

 > >   q_nccbs =3D 0x20,
 > >   q_nccbs_avail =3D 0x21,
 >
 > This is highly suspicious, q_nccbs_avail should always be <=3D q_nccbs. G=
 ood
 > that the driver deadlocked, it would panic in nvme_ccb_get() once it woul=
 d
 > try to get the nonexisting 33th ccb from queue :)
 >
 > This got me thinking though. If the completion queue is being processed, =
 we
 > currently don't reset q_nccbs_avail until after all finished ccbs are
 > processed. While this is running, any further I/O would be skipped with
 > EAGAIN, if all ccbs were taken and q_nccbs_avail was 0. When the ccb
 > counter is reset on the end of nvme_q_complete(), there is no outstanding
 > I/O any more which would trigger another lddone() and do the queue drain,
 > so the driver ceases to process anything. This scenario matches the
 > described symtoms quite well.
 >
 > Can you please try patch from
 > http://www.netbsd.org/~jdolecek/nvme_avail_put.diff ?

 Initial testing with this patch is looking good.  I'm currently running=20
 a 'cvs update' against the same tree in which I'm running a "build.sh=20
 -j24 release" and so far no hang.

 > It's compile tested only, so might need some tweaks. The idea is to reset
 > the ccb counter immediatelly, so lddone() would be able to queue another
 > I/O while the completion queue is being still processed. This should also
 > fix ccb leak on errors - e.g. nvme_ns_dobio() calls just nvme_ccb_put()
 > when bus_dmamap_load() fails, so q_nccbs_avail stays decremented from
 > nvme_ccb_get().

 Just based on reading the patch, it would appear to make sense to commit=20
 even if it doesn't completely fix the hang.  The patch might not be=20
 "sufficient" but it would appear to be "necessary".  :)


 +------------------+--------------------------+----------------------------=
 +
 | Paul Goyette     | PGP Key fingerprint:     | E-mail addresses:          =
 |
 | (Retired)        | FA29 0E3B 35AF E8AE 6651 | paul at whooppee dot com   =
 |
 | Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd dot org =
 |
 +------------------+--------------------------+----------------------------=
 +
 --0-648942362-1521191266=:10795--

From: "Jaromir Dolecek" <jdolecek@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/52769 CVS commit: src/sys/dev/ic
Date: Fri, 16 Mar 2018 18:49:18 +0000

 Module Name:	src
 Committed By:	jdolecek
 Date:		Fri Mar 16 18:49:18 UTC 2018

 Modified Files:
 	src/sys/dev/ic: nvme.c nvmevar.h

 Log Message:
 stop using q_nccbs_avail for deciding whether there are available ccbs;
 no need to maintain a counter _and_ q_ccb_list

 this fixes deadlock when all ccbs happen to be taken before completion
 interrupt - nvme_q_complete() increased q_nccbs_avail only after
 processing all the completed commands, by then there was nothing
 left to actually kick the disk queue again into action

 this also fixes ccb leak on command errors e.g. with bus_dmamem_alloc()
 or bus_dmamel_load() - q_nccbs_avail was never decreased on the error path

 fixes PR kern/52769 by Martin Husemann, thanks to Paul Goyette
 for testing


 To generate a diff of this commit:
 cvs rdiff -u -r1.32 -r1.33 src/sys/dev/ic/nvme.c
 cvs rdiff -u -r1.13 -r1.14 src/sys/dev/ic/nvmevar.h

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

Responsible-Changed-From-To: chs->jdolecek
Responsible-Changed-By: jdolecek@NetBSD.org
Responsible-Changed-When: Fri, 16 Mar 2018 18:54:34 +0000
Responsible-Changed-Why:
I've committed a fix.


State-Changed-From-To: feedback->pending-pullups
State-Changed-By: jdolecek@NetBSD.org
State-Changed-When: Fri, 16 Mar 2018 18:54:34 +0000
State-Changed-Why:
The variant of proposed patch was committed. While there might be other
bugs, after the change this particular problem does not happen any more.
Requested pullup to netbsd-8, ticket #636


From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/52769 CVS commit: [netbsd-8] src/sys/dev/ic
Date: Sat, 17 Mar 2018 08:11:18 +0000

 Module Name:	src
 Committed By:	martin
 Date:		Sat Mar 17 08:11:18 UTC 2018

 Modified Files:
 	src/sys/dev/ic [netbsd-8]: nvme.c nvmevar.h

 Log Message:
 Pull up following revision(s) (requested by jdolecek in ticket #636):
 	sys/dev/ic/nvme.c: revision 1.33
 	sys/dev/ic/nvmevar.h: revision 1.14
 stop using q_nccbs_avail for deciding whether there are available ccbs;
 no need to maintain a counter _and_ q_ccb_list
 this fixes deadlock when all ccbs happen to be taken before completion
 interrupt - nvme_q_complete() increased q_nccbs_avail only after
 processing all the completed commands, by then there was nothing
 left to actually kick the disk queue again into action
 this also fixes ccb leak on command errors e.g. with bus_dmamem_alloc()
 or bus_dmamel_load() - q_nccbs_avail was never decreased on the error path
 fixes PR kern/52769 by Martin Husemann, thanks to Paul Goyette
 for testing


 To generate a diff of this commit:
 cvs rdiff -u -r1.30 -r1.30.2.1 src/sys/dev/ic/nvme.c
 cvs rdiff -u -r1.13 -r1.13.6.1 src/sys/dev/ic/nvmevar.h

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: pending-pullups->closed
State-Changed-By: jdolecek@NetBSD.org
State-Changed-When: Sat, 17 Mar 2018 09:13:44 +0000
State-Changed-Why:
Pullup done. Thanks!


>Unformatted:

NetBSD Home
NetBSD PR Database Search

(Contact us) $NetBSD: query-full-pr,v 1.43 2018/01/16 07:36:43 maya Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2017 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.