NetBSD Problem Report #52769
From martin@aprisoft.de Tue Nov 28 14:59:32 2017
Return-Path: <martin@aprisoft.de>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id E0F357A187
for <gnats-bugs@gnats.NetBSD.org>; Tue, 28 Nov 2017 14:59:31 +0000 (UTC)
Message-Id: <20171128145921.5BA555CC761@emmas.aprisoft.de>
Date: Tue, 28 Nov 2017 15:59:21 +0100 (CET)
From: martin@NetBSD.org
Reply-To: martin@NetBSD.org
To: gnats-bugs@NetBSD.org
Subject: hang with an ffs stored in an nvme device
X-Send-Pr-Version: 3.95
>Number: 52769
>Category: kern
>Synopsis: hang with an ffs stored in an nvme device
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: jdolecek
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Tue Nov 28 15:00:00 +0000 2017
>Closed-Date: Sat Mar 17 09:13:44 +0000 2018
>Last-Modified: Sat Mar 17 09:13:44 +0000 2018
>Originator: Martin Husemann
>Release: NetBSD 8.99.7
>Organization:
The NetBSD Foundation, Inc.
>Environment:
System: NetBSD seven-days-to-the-wolves.aprisoft.de 8.99.7 NetBSD 8.99.7 (GENERIC) #177: Thu Nov 23 09:03:25 CET 2017 martin@seven-days-to-the-wolves.aprisoft.de:/work/src/sys/arch/amd64/compile/GENERIC amd64
Architecture: x86_64
Machine: amd64
>Description:
I got two cvs update processes hanging hard (unkillable) in this state:
Memory: 7098M Act, 385M Inact, 11M Wired, 12M Exec, 7439M File, 6607M Free
Swap:
PID USERNAME PRI NICE SIZE RES STATE TIME WCPU CPU COMMAND
0 root 0 0 0K 25M CPU/5 1:23 0.00% 0.00% [system]
22820 martin 95 0 47M 8952K bioloc/5 0:15 0.00% 0.00% cvs
18588 martin 127 0 48M 9124K bioloc/2 0:08 0.00% 0.00% cvs
Tried to reboot, which didn't work.
So I dropped into ddb and got:
Stopped in pid 0.2 (system) at netbsd:breakpoint+0x5: leave
db{0}> ps
PID LID S CPU FLAGS STRUCT LWP * NAME WAIT
29181 1 3 0 0 ffffe403e1d3c200 reboot tstile
3481 1 3 0 80 ffffe40368d5e080 tcsh ttyraw
18588 1 3 2 1000000 ffffe4040092db80 cvs biolock
14652 1 3 3 80 ffffe403be513640 sh wait
13756 1 3 1 80 ffffe40397f5f460 ssh-agent select
7734 1 3 0 80 ffffe4038e491460 tcsh pause
22820 1 3 5 1000000 ffffe403ccf032c0 cvs biolock
db{0}> bt /a 0xffffe4040092db80
trace: pid 18588 lid 1 at 0xffffe4011e02b920
sleepq_block() at netbsd:sleepq_block+0x97
cv_timedwait() at netbsd:cv_timedwait+0xa9
bbusy() at netbsd:bbusy+0x7b
getblk() at netbsd:getblk+0x5a
bio_doread() at netbsd:bio_doread+0x1d
bread() at netbsd:bread+0x1a
ffs_update.part.3() at netbsd:ffs_update.part.3+0x112
ffs_truncate() at netbsd:ffs_truncate+0xd91
ufs_truncate_retry() at netbsd:ufs_truncate_retry+0x79
ufs_inactive() at netbsd:ufs_inactive+0x1ac
VOP_INACTIVE() at netbsd:VOP_INACTIVE+0x33
vrelel() at netbsd:vrelel+0x168
ufs_remove() at netbsd:ufs_remove+0xab
VOP_REMOVE() at netbsd:VOP_REMOVE+0x37
do_sys_unlinkat.isra.5() at netbsd:do_sys_unlinkat.isra.5+0x1eb
syscall() at netbsd:syscall+0x1bc
--- syscall (number 10) ---
7a93d4312a7a:
and:
db{0}> bt /a ffffe403ccf032c0
trace: pid 22820 lid 1 at 0xffffe4011de34820
sleepq_block() at netbsd:sleepq_block+0x97
cv_timedwait() at netbsd:cv_timedwait+0xa9
bbusy() at netbsd:bbusy+0x7b
getblk() at netbsd:getblk+0x5a
bio_doread() at netbsd:bio_doread+0x1d
bread() at netbsd:bread+0x1a
ffs_alloccg() at netbsd:ffs_alloccg+0xcc
ffs_hashalloc() at netbsd:ffs_hashalloc+0x28
ffs_alloc() at netbsd:ffs_alloc+0x11e
ffs_balloc() at netbsd:ffs_balloc+0x570
ufs_mkdir() at netbsd:ufs_mkdir+0x22b
VOP_MKDIR() at netbsd:VOP_MKDIR+0x3b
do_sys_mkdirat.isra.7.constprop.12() at netbsd:do_sys_mkdirat.isra.7.constprop.12+0x167
syscall() at netbsd:syscall+0x1bc
--- syscall (number 136) ---
78c2024a686a:
The cvs processes were updating different trees on the same file system.
Got a crash dump and netbsd.gdb, if anyone is interested (internal
note: seven/netbsd.0.*)
>How-To-Repeat:
n/a
>Fix:
n/a
>Release-Note:
>Audit-Trail:
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/52769: Deadlock in ffs?
Date: Wed, 29 Nov 2017 08:47:27 +0100
Chuck pointed out that the threads are just waiting on buffers to be unlocked,
so this is likely more a case of a missed interrupt (or similar).
The cvs updates ran on:
/dev/ld0a on /work type ffs (noatime, local)
that is:
nvme0 at pci2 dev 0 function 0: vendor 144d product a802 (rev. 0x01)
nvme0: NVMe 1.1
nvme0: interrupting at ioapic0 pin 16
nvme0: SAMSUNG MZVPV256HDGL-00000, firmware BXW7300Q, serial S1XWNYAH408386
ld0 at nvme0 nsid 1
and a full dmesg is below.
Martin
Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017
The NetBSD Foundation, Inc. All rights reserved.
Copyright (c) 1982, 1986, 1989, 1991, 1993
The Regents of the University of California. All rights reserved.
NetBSD 8.99.7 (GENERIC) #177: Thu Nov 23 09:03:25 CET 2017
martin@seven-days-to-the-wolves.aprisoft.de:/work/src/sys/arch/amd64/compile/GENERIC
total memory = 16382 MB
avail memory = 15883 MB
timecounter: Timecounters tick every 10.000 msec
Kernelized RAIDframe activated
running cgd selftest aes-xts-256 aes-xts-512 done
timecounter: Timecounter "i8254" frequency 1193182 Hz quality 100
System manufacturer System Product Name (System Version)
mainbus0 (root)
ACPI: RSDP 0x00000000000FBAD0 000024 (v02 ACPIAM)
ACPI: XSDT 0x00000000CFE90100 00005C (v01 030811 XSDT1042 20110308 MSFT 00000097)
ACPI: FACP 0x00000000CFE90290 0000F4 (v03 030811 FACP1042 20110308 MSFT 00000097)
Firmware Warning (ACPI): 32/64X length mismatch in FADT/Gpe0Block: 64/32 (20171110/tbfadt-642)
ACPI: DSDT 0x00000000CFE90450 00E040 (v01 A1585 A1585000 00000000 INTL 20060113)
ACPI: FACS 0x00000000CFEA8000 000040
ACPI: APIC 0x00000000CFE90390 00007C (v01 030811 APIC1042 20110308 MSFT 00000097)
ACPI: MCFG 0x00000000CFE90410 00003C (v01 030811 OEMMCFG 20110308 MSFT 00000097)
ACPI: OEMB 0x00000000CFEA8040 000072 (v01 030811 OEMB1042 20110308 MSFT 00000097)
ACPI: SRAT 0x00000000CFE9F8A0 000108 (v01 AMD FAM_F_10 00000002 AMD 00000001)
ACPI: HPET 0x00000000CFE9F9B0 000038 (v01 030811 OEMHPET 20110308 MSFT 00000097)
ACPI: SSDT 0x00000000CFE9F9F0 000DA4 (v01 A M I POWERNOW 00000001 AMD 00000001)
ACPI: Executed 3 blocks of module-level executable AML code
ACPI: 2 ACPI AML tables successfully acquired and loaded
ioapic0 at mainbus0 apid 6: pa 0xfec00000, version 0x21, 24 pins
cpu0 at mainbus0 apid 0
cpu0: AMD Phenom(tm) II X6 1075T Processor, id 0x100fa0
cpu0: package 0, core 0, smt 0
cpu1 at mainbus0 apid 1
cpu1: AMD Phenom(tm) II X6 1075T Processor, id 0x100fa0
cpu1: package 0, core 1, smt 0
cpu2 at mainbus0 apid 2
cpu2: AMD Phenom(tm) II X6 1075T Processor, id 0x100fa0
cpu2: package 0, core 2, smt 0
cpu3 at mainbus0 apid 3
cpu3: AMD Phenom(tm) II X6 1075T Processor, id 0x100fa0
cpu3: package 0, core 3, smt 0
cpu4 at mainbus0 apid 4
cpu4: AMD Phenom(tm) II X6 1075T Processor, id 0x100fa0
cpu4: package 0, core 4, smt 0
cpu5 at mainbus0 apid 5
cpu5: AMD Phenom(tm) II X6 1075T Processor, id 0x100fa0
cpu5: package 0, core 5, smt 0
acpi0 at mainbus0: Intel ACPICA 20171110
acpi0: X/RSDT: OemId <030811,XSDT1042,20110308>, AslId <MSFT,00000097>
acpi0: MCFG: segment 0, bus 0-255, address 0x00000000e0000000
acpi0: SCI interrupting at int 9
timecounter: Timecounter "ACPI-Fast" frequency 3579545 Hz quality 1000
hpet0 at acpi0: high precision event timer (mem 0xfed00000-0xfed00400)
timecounter: Timecounter "hpet0" frequency 14318180 Hz quality 2000
acpiec0 at acpi0 (EC0, PNP0C09): io 0x62,0x66
attimer1 at acpi0 (TMR, PNP0100): io 0x40-0x43 irq 0
pcppi1 at acpi0 (SPKR, PNP0800): io 0x61
spkr0 at pcppi1: PC Speaker
wsbell at spkr0 not configured
midi0 at pcppi1: PC speaker
sysbeep0 at pcppi1
UAR1 (PNP0501) at acpi0 not configured
aibs0 at acpi0 (ASOC, ATK0110-16843024): ASUSTeK AI Booster
OMSC (PNP0C02) at acpi0 not configured
RMSC (PNP0C02) at acpi0 not configured
SIOR (PNP0C02) at acpi0 not configured
PCIE (PNP0C02) at acpi0 not configured
RMEM (PNP0C01) at acpi0 not configured
acpibut0 at acpi0 (PWRB, PNP0C0C-170): ACPI Power Button
acpiwmi0 at acpi0 (AOD, PNP0C14-0): ACPI WMI Interface
acpiwmibus at acpiwmi0 not configured
attimer1: attached to pcppi1
pci0 at mainbus0 bus 0: configuration mode 1
pci0: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
pchb0 at pci0 dev 0 function 0: vendor 1002 product 5957 (rev. 0x00)
ppb0 at pci0 dev 2 function 0: vendor 1002 product 5978 (rev. 0x00)
ppb0: PCI Express capability version 2 <Root Port of PCI-E Root Complex> x16 @ 5.0GT/s
ppb0: link is x2 @ 2.5GT/s
pci1 at ppb0 bus 6
pci1: i/o space, memory space enabled, rd/line, wr/inv ok
nouveau0 at pci1 dev 0 function 0: vendor 10de product 0e22 (rev. 0xa1)
nouveau0: interrupting at ioapic0 pin 18 (nouveau)
Zone kernel: Available graphics memory: 5488618 kiB
Zone dma32: Available graphics memory: 2097152 kiB
00001030
00010130
00002261
drm: Supports vblank timestamp caching Rev 2 (21.10.2013).
drm: Driver supports precise vblank timestamp query.
nouveau0: info: No connectors reported connected with modes
drm: Cannot find any crtc or sizes - going 1024x768
nouveaufb0 at nouveau0
nouveau0: info: registered panic notifier
nouveaufb0: framebuffer at 0xffff8001434bf000, size 1024x768, depth 32, stride 4096
wsdisplay0 at nouveaufb0 kbdmux 1
wsmux1: connecting to wsdisplay0
hdaudio0 at pci1 dev 0 function 1: HD Audio Controller
hdaudio0: interrupting at ioapic0 pin 19
hdafg0 at hdaudio0: vendor 10de product 0012
hdafg0: DP00 8ch: Digital Out [Jack]
hdafg0: 8ch/0ch 48000Hz PCM16*
hdafg1 at hdaudio0: vendor 10de product 0012
hdafg1: DP00 8ch: Digital Out [Jack]
hdafg1: 8ch/0ch 48000Hz PCM16*
hdafg2 at hdaudio0: vendor 10de product 0012
hdafg2: DP00 8ch: Digital Out [Jack]
hdafg2: 8ch/0ch 48000Hz PCM16*
hdafg3 at hdaudio0: vendor 10de product 0012
hdafg3: DP00 8ch: Digital Out [Jack]
hdafg3: 8ch/0ch 48000Hz PCM16*
ppb1 at pci0 dev 4 function 0: vendor 1002 product 597a (rev. 0x00)
ppb1: PCI Express capability version 2 <Root Port of PCI-E Root Complex> x4 @ 5.0GT/s
pci2 at ppb1 bus 5
pci2: i/o space, memory space enabled, rd/line, wr/inv ok
nvme0 at pci2 dev 0 function 0: vendor 144d product a802 (rev. 0x01)
nvme0: NVMe 1.1
nvme0: interrupting at ioapic0 pin 16
nvme0: SAMSUNG MZVPV256HDGL-00000, firmware BXW7300Q, serial S1XWNYAH408386
ld0 at nvme0 nsid 1
ld0: 238 GB, 31130 cyl, 255 head, 63 sec, 512 bytes/sect x 500118192 sectors
ppb2 at pci0 dev 9 function 0: vendor 1002 product 597e (rev. 0x00)
ppb2: PCI Express capability version 2 <Root Port of PCI-E Root Complex> x2 @ 5.0GT/s
ppb2: link is x1 @ 2.5GT/s
pci3 at ppb2 bus 4
pci3: i/o space, memory space enabled, rd/line, wr/inv ok
jmide0 at pci3 dev 0 function 0: vendor 197b product 2361 (rev. 0x10)
jmide0: 1 PATA port, 1 SATA port
jmide0: interrupting at ioapic0 pin 17
ahcisata0 at jmide0
ahcisata0: AHCI revision 1.10, 1 port, 32 slots, CAP 0xc722ff00<PSC,SSC,PMD,SPM,ISS=0x2=Gen2,SCLO,SAL,SALP,SNCQ,S64A>
atabus0 at ahcisata0 channel 0
jmide0: PCI IDE interface used
jmide0: bus-master DMA support present
jmide0: primary channel wired to native-PCI mode
jmide0: primary channel is unused
jmide0: secondary channel wired to native-PCI mode
jmide0: secondary channel is PATA
atabus1 at jmide0 channel 1
ppb3 at pci0 dev 10 function 0: vendor 1002 product 597f (rev. 0x00)
ppb3: PCI Express capability version 2 <Root Port of PCI-E Root Complex> x2 @ 5.0GT/s
ppb3: link is x1 @ 5.0GT/s
pci4 at ppb3 bus 3
pci4: i/o space, memory space enabled, rd/line, wr/inv ok
xhci0 at pci4 dev 0 function 0: vendor 1033 product 0194 (rev. 0x03)
xhci0: interrupting at ioapic0 pin 18
xhci0: xHCI version 0.96
usb0 at xhci0: USB revision 3.0
usb1 at xhci0: USB revision 2.0
ahcisata1 at pci0 dev 17 function 0: vendor 1002 product 4391 (rev. 0x40)
ahcisata1: interrupting at ioapic0 pin 19
ahcisata1: 64-bit DMA
ahcisata1: AHCI revision 1.20, 6 ports, 32 slots, CAP 0xf732ff05<PSC,SSC,PMD,SPM,ISS=0x3=Gen3,SCLO,SAL,SALP,SMPS,SSNTF,SNCQ,S64A>
atabus2 at ahcisata1 channel 0
atabus3 at ahcisata1 channel 1
atabus4 at ahcisata1 channel 2
atabus5 at ahcisata1 channel 3
atabus6 at ahcisata1 channel 4
atabus7 at ahcisata1 channel 5
ohci0 at pci0 dev 18 function 0: vendor 1002 product 4397 (rev. 0x00)
ohci0: interrupting at ioapic0 pin 18
ohci0: OHCI version 1.0, legacy support
usb2 at ohci0: USB revision 1.0
ehci0 at pci0 dev 18 function 2: vendor 1002 product 4396 (rev. 0x00)
ehci0: interrupting at ioapic0 pin 17
ehci0: dropped intr workaround enabled
ehci0: EHCI version 1.0
ehci0: 1 companion controller, 5 ports: ohci0
usb3 at ehci0: USB revision 2.0
ohci1 at pci0 dev 19 function 0: vendor 1002 product 4397 (rev. 0x00)
ohci1: interrupting at ioapic0 pin 18
ohci1: OHCI version 1.0, legacy support
usb4 at ohci1: USB revision 1.0
ehci1 at pci0 dev 19 function 2: vendor 1002 product 4396 (rev. 0x00)
ehci1: interrupting at ioapic0 pin 17
ehci1: dropped intr workaround enabled
ehci1: EHCI version 1.0
ehci1: 1 companion controller, 5 ports: ohci1
usb5 at ehci1: USB revision 2.0
piixpm0 at pci0 dev 20 function 0: vendor 1002 product 4385 (rev. 0x42)
piixpm0: polling (SB800)
iic0 at piixpm0: I2C bus
iic1 at piixpm0: I2C bus
iic2 at piixpm0: I2C bus
iic3 at piixpm0: I2C bus
hdaudio1 at pci0 dev 20 function 2: HD Audio Controller
hdaudio1: interrupting at ioapic0 pin 16
hdafg4 at hdaudio1: vendor 1106 product 0440
hdafg4: DAC00 8ch: Speaker [Jack], HP Out [Jack]
hdafg4: ADC01 2ch: Line In [Jack], Mic In [Jack]
hdafg4: HDMI02 2ch: Digital Out [Jack]
hdafg4: DIG03 2ch: SPDIF Out [Jack]
hdafg4: 8ch/2ch 48000Hz PCM16*
audio0 at hdafg4: full duplex, playback, capture, mmap, independent
hdafg4: Virtual format configured - Format SLINEAR, precision 16, channels 2, frequency 48000
hdafg4: Latency: 128 milliseconds
spkr1 at audio0: PC Speaker (synthesized)
wsbell at spkr1 not configured
pcib0 at pci0 dev 20 function 3: vendor 1002 product 439d (rev. 0x40)
ppb4 at pci0 dev 20 function 4: vendor 1002 product 4384 (rev. 0x40)
pci5 at ppb4 bus 2
pci5: i/o space, memory space enabled
fwohci0 at pci5 dev 8 function 0: vendor 1106 product 3044 (rev. 0xc0)
fwohci0: interrupting at ioapic0 pin 20
fwohci0: OHCI version 1.10 (ROM=1)
fwohci0: No. of Isochronous channels is 4.
fwohci0: EUI64 00:1f:c6:00:00:0a:ca:1a
fwohci0: Phy 1394a available S400, 2 ports.
fwohci0: Link S400, max_rec 2048 bytes.
ieee1394if0 at fwohci0: IEEE1394 bus
fwip0 at ieee1394if0: IP over IEEE1394
fwohci0: Initiate bus reset
ohci2 at pci0 dev 20 function 5: vendor 1002 product 4399 (rev. 0x00)
ohci2: interrupting at ioapic0 pin 18
ohci2: OHCI version 1.0, legacy support
usb6 at ohci2: USB revision 1.0
ppb5 at pci0 dev 21 function 0: vendor 1002 product 43a0 (rev. 0x00)
ppb5: PCI Express capability version 2 <Root Port of PCI-E Root Complex> x1 @ 5.0GT/s
ppb5: link is x1 @ 2.5GT/s
pci6 at ppb5 bus 1
pci6: i/o space, memory space enabled, rd/line, wr/inv ok
re0 at pci6 dev 0 function 0: RealTek 8168/8111 PCIe Gigabit Ethernet (rev. 0x06)
re0: interrupting at ioapic0 pin 16
re0: Ethernet address bc:ae:c5:46:16:58
re0: using 256 tx descriptors
rgephy0 at re0 phy 7: RTL8169S/8110S/8211 1000BASE-T media interface, rev. 4
rgephy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
ohci3 at pci0 dev 22 function 0: vendor 1002 product 4397 (rev. 0x00)
ohci3: interrupting at ioapic0 pin 18
ohci3: OHCI version 1.0, legacy support
usb7 at ohci3: USB revision 1.0
ehci2 at pci0 dev 22 function 2: vendor 1002 product 4396 (rev. 0x00)
ehci2: interrupting at ioapic0 pin 17
ehci2: dropped intr workaround enabled
ehci2: EHCI version 1.0
ehci2: 1 companion controller, 4 ports: ohci3
usb8 at ehci2: USB revision 2.0
pchb1 at pci0 dev 24 function 0: vendor 1022 product 1200 (rev. 0x00)
pchb2 at pci0 dev 24 function 1: vendor 1022 product 1201 (rev. 0x00)
pchb3 at pci0 dev 24 function 2: vendor 1022 product 1202 (rev. 0x00)
amdnb_misc0 at pci0 dev 24 function 3: AMD NB Misc Configuration
amdtemp0 at amdnb_misc0: AMD CPU Temperature Sensors (Family10h)
pchb4 at pci0 dev 24 function 4: vendor 1022 product 1204 (rev. 0x00)
isa0 at pcib0
com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, working fifo
com0: console
pckbc0 at isa0 port 0x60-0x64
acpicpu0 at cpu0: ACPI CPU
acpicpu0: C1: HLT, lat 0 us, pow 0 mW
acpicpu0: C2: I/O, lat 75 us, pow 1 mW
acpicpu0: P0: FFH, lat 4 us, pow 19507 mW, 3000 MHz
acpicpu0: P1: FFH, lat 4 us, pow 14500 mW, 2300 MHz
acpicpu0: P2: FFH, lat 4 us, pow 10535 mW, 1600 MHz
acpicpu0: P3: FFH, lat 4 us, pow 6210 mW, 800 MHz
acpicpu0: T0: I/O, lat 1 us, pow 0 mW, 100 %
acpicpu0: T1: I/O, lat 1 us, pow 0 mW, 88 %
acpicpu0: T2: I/O, lat 1 us, pow 0 mW, 76 %
acpicpu0: T3: I/O, lat 1 us, pow 0 mW, 64 %
acpicpu0: T4: I/O, lat 1 us, pow 0 mW, 52 %
acpicpu0: T5: I/O, lat 1 us, pow 0 mW, 40 %
acpicpu0: T6: I/O, lat 1 us, pow 0 mW, 28 %
acpicpu0: T7: I/O, lat 1 us, pow 0 mW, 16 %
acpicpu1 at cpu1: ACPI CPU
acpicpu2 at cpu2: ACPI CPU
acpicpu3 at cpu3: ACPI CPU
acpicpu4 at cpu4: ACPI CPU
acpicpu5 at cpu5: ACPI CPU
fwohci0: BUS reset
fwohci0: node_id=0xc800ffc0, gen=1, CYCLEMASTER mode
ieee1394if0: 1 nodes, maxhop <= 0 cable IRM irm(0) (me)
ieee1394if0: bus manager 0
timecounter: Timecounter "clockinterrupt" frequency 100 Hz quality 0
timecounter: Timecounter "TSC" frequency 3010199480 Hz quality 3000
uhub0 at usb0: vendor 1033 (0x1033) xHCI Root Hub (0000), class 9/0, rev 1.00/1.00, addr 0
uhub0: 2 ports with 2 removable, self powered
uhub1 at usb1: vendor 1033 (0x1033) xHCI Root Hub (0000), class 9/0, rev 2.00/1.00, addr 0
uhub1: 2 ports with 2 removable, self powered
IPsec: Initialized Security Association Processing.
uhub2 at usb2: vendor 1002 (0x1002) OHCI root hub (0000), class 9/0, rev 1.00/1.00, addr 1
uhub2: 5 ports with 5 removable, self powered
uhub3 at usb5: vendor 1002 (0x1002) EHCI root hub (0000), class 9/0, rev 2.00/1.00, addr 1
uhub3: 5 ports with 5 removable, self powered
uhub4 at usb6: vendor 1002 (0x1002) OHCI root hub (0000), class 9/0, rev 1.00/1.00, addr 1
uhub4: 2 ports with 2 removable, self powered
ahcisata1 port 2: device present, speed: 1.5Gb/s
uhub5 at usb7: vendor 1002 (0x1002) OHCI root hub (0000), class 9/0, rev 1.00/1.00, addr 1
uhub5: 4 ports with 4 removable, self powered
uhub6 at usb8: vendor 1002 (0x1002) EHCI root hub (0000), class 9/0, rev 2.00/1.00, addr 1
uhub6: 4 ports with 4 removable, self powered
uhub7 at usb4: vendor 1002 (0x1002) OHCI root hub (0000), class 9/0, rev 1.00/1.00, addr 1
uhub7: 5 ports with 5 removable, self powered
ahcisata1 port 0: device present, speed: 6.0Gb/s
uhub8 at usb3: vendor 1002 (0x1002) EHCI root hub (0000), class 9/0, rev 2.00/1.00, addr 1
uhub8: 5 ports with 5 removable, self powered
wd0 at atabus2 drive 0
wd0: <ST2000DL003-9VT166>
wd0: drive supports 16-sector PIO transfers, LBA48 addressing
wd0: 1863 GB, 3876021 cyl, 16 head, 63 sec, 512 bytes/sect x 3907029168 sectors
wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133), WRITE DMA FUA, NCQ (32 tags)
wd0(ahcisata1:0:0): using PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133) (using DMA), NCQ (31 tags)
atapibus0 at atabus4: 1 targets
cd0 at atapibus0 drive 0: <DRW-24D5MT, K28H2EA3236, 1.10> cdrom removable
cd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133)
cd0(ahcisata1:2:0): using PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133) (using DMA)
boot device: wd0
root on wd0a dumps on wd0b
root file system type: ffs
kern.module.path=/stand/amd64/8.99.7/modules
uplcom0 at uhub2 port 1
uplcom0: Prolific Technology Inc. (0x67b) USB-Serial Controller (0x2303), rev 1.10/3.00, addr 2
ucom0 at uplcom0
cd0(ahcisata1:2:0): DEFERRED ERROR, key = 0x2
uplcom1 at uhub2 port 2
uplcom1: Prolific Technology Inc. (0x67b) USB 2.0 To COM Device (0x2303), rev 1.10/3.00, addr 3
ucom1 at uplcom1
uplcom2 at uhub2 port 3
uplcom2: Prolific Technology Inc. (0x67b) USB-Serial Controller (0x2303), rev 1.10/3.00, addr 4
ucom2 at uplcom2
wsdisplay0: screen 0 added (default, vt100 emulation)
wsdisplay0: screen 1 added (default, vt100 emulation)
wsdisplay0: screen 2 added (default, vt100 emulation)
wsdisplay0: screen 3 added (default, vt100 emulation)
wsdisplay0: screen 4 added (default, vt100 emulation)
From: Nick Hudson <skrll@netbsd.org>
To: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org,
gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Cc:
Subject: Re: kern/52769: Deadlock in ffs?
Date: Wed, 29 Nov 2017 12:46:11 +0000
This reminds me of the pool page problem on erlite, where not all of the memory
was available in the direct map (and therefore PMAP_MAP_POOLPAGEable), which
meant that in low-memory situations the pool couldn't grow: a direct-map page
never became available.
I fixed this by making all erlite memory available as direct map via xkseg
http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/arch/mips/mips/pmap_machdep.c.diff?r1=1.20&r2=1.21
http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/arch/mips/include/vmparam.h.diff?r1=1.59&r2=1.60
Nick
From: Chuck Silvers <chuq@chuq.com>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org
Subject: Re: kern/52769: hang with an ffs stored in an nvme device
Date: Fri, 1 Dec 2017 10:45:09 -0800
I got a copy of the dump from martin and here are some things I found.
the two buffers that he pointed out are for different parts of the same device:
(gdb) p *(struct buf *)0xffffe4041d8f7240
$4 = {
b_u = {
u_actq = {
tqe_next = 0xffffe4040e6b6c68,
tqe_prev = 0xffffe4029c033ea0
},
u_rbnode = {
rb_nodes = {0xffffe4040e6b6c68, 0xffffe4029c033ea0},
rb_info = 0x0
},
u_work = {
wk_dummy = 0xffffe4040e6b6c68
}
},
b_iodone = 0x0,
b_error = 0x0,
b_resid = 0x388e,
b_flags = 0x404,
#define B_ASYNC 0x00000004 /* Start I/O, do not wait. */
#define B_COWDONE 0x00000400 /* Copy-on-write already done. */
b_prio = 0x1,
b_bufsize = 0x8000,
b_bcount = 0x8000,
b_dev = 0x1300,
b_data = 0xffff800148420000,
b_blkno = 0xddd06c0,
b_rawblkno = 0xddd0ec0,
b_proc = 0x0,
b_saveaddr = 0x0,
b_private = 0x0,
b_dcookie = 0x0,
b_busy = {
cv_opaque = {0xffffe4040092db80, 0xffffe4040092dc90, 0xffffffff8116c394}
},
b_refcnt = 0x2,
b_unused = 0x0,
b_hash = {
le_next = 0x0,
le_prev = 0xffffe403f8913f40
},
b_vnbufs = {
le_next = 0xffffe4029c033ea0,
le_prev = 0xffffe4040e6b6d18
},
b_freelist = {
tqe_next = 0xffffe4041739a6c0,
tqe_prev = 0xffffe403b883c308
},
b_wapbllist = {
tqe_next = 0x0,
tqe_prev = 0x0
},
b_lblkno = 0xddd06c0,
b_freelistindex = 0x1,
b_cflags = 0x4800010,
#define BC_BUSY 0x00000010 /* I/O in progress. */
#define BC_WANTED 0x00800000 /* Process wants this buffer. */
#define BC_VFLUSH 0x04000000 /* Buffer is being synced. */
b_vp = 0xffffe4042b5b2bd0,
b_done = {
cv_opaque = {0x0, 0xffffe4041d8f7338, 0xffffffff8116c39c}
},
b_oflags = 0x0,
b_objlock = 0xffffe4042b619640
}
(gdb) p *(struct buf *)0xffffe40294774480
$5 = {
b_u = {
u_actq = {
tqe_next = 0xffffe403b48ac368,
tqe_prev = 0xffffe403b48ac488
},
u_rbnode = {
rb_nodes = {0xffffe403b48ac368, 0xffffe403b48ac488},
rb_info = 0x0
},
u_work = {
wk_dummy = 0xffffe403b48ac368
}
},
b_iodone = 0x0,
b_error = 0x0,
b_resid = 0x49ec,
b_flags = 0x404,
#define B_ASYNC 0x00000004 /* Start I/O, do not wait. */
#define B_COWDONE 0x00000400 /* Copy-on-write already done. */
b_prio = 0x1,
b_bufsize = 0x8000,
b_bcount = 0x8000,
b_dev = 0x1300,
b_data = 0xffff80014cfb0000,
b_blkno = 0x121f1100,
b_rawblkno = 0x121f1900,
b_proc = 0x0,
b_saveaddr = 0x0,
b_private = 0x0,
b_dcookie = 0x0,
b_busy = {
cv_opaque = {0xffffe403ccf032c0, 0xffffe403ccf033d0, 0xffffffff8116c394}
},
b_refcnt = 0x2,
b_unused = 0x0,
b_hash = {
le_next = 0xffffe403e8c7b480,
le_prev = 0xffff80001d75c158
},
b_vnbufs = {
le_next = 0xffffe403b48ac488,
le_prev = 0xffffe403b48ac418
},
b_freelist = {
tqe_next = 0xffffe40412c51120,
tqe_prev = 0xffffe4041739a300
},
b_wapbllist = {
tqe_next = 0x0,
tqe_prev = 0x0
},
b_lblkno = 0x121f1100,
b_freelistindex = 0x1,
b_cflags = 0x4800010,
#define BC_BUSY 0x00000010 /* I/O in progress. */
#define BC_WANTED 0x00800000 /* Process wants this buffer. */
#define BC_VFLUSH 0x04000000 /* Buffer is being synced. */
b_vp = 0xffffe4042b5b2bd0,
b_done = {
cv_opaque = {0x0, 0xffffe40294774578, 0xffffffff8116c39c}
},
b_oflags = 0x0,
b_objlock = 0xffffe4042b619640
}
b_dev = 0x1300 is "ld0a", which is associated with nvme0 as martin mentioned.
the buffers are B_ASYNC, so they will be unlocked automatically by biodone().
however the nvme driver does not have any I/O pending:
(gdb) p *((struct nvme_softc *)nvme_cd.cd_devs[0]->dv_private)->sc_admin_q
$9 = {
q_sc = 0xffff80001da34000,
q_sq_mtx = {
u = {
mtxa_owner = 0x601
}
},
q_cq_mtx = {
u = {
mtxa_owner = 0x601
}
},
q_sq_dmamem = 0xffffe4042f7ceac8,
q_cq_dmamem = 0xffffe4042f7ceaf0,
q_sqtdbl = 0x1000,
q_cqhdbl = 0x1004,
q_id = 0x0,
q_entries = 0x20,
q_sq_tail = 0x4,
q_cq_head = 0x4,
q_cq_phase = 0x1,
q_ccb_mtx = {
u = {
mtxa_owner = 0x601
}
},
q_nccbs = 0x20,
q_nccbs_avail = 0x21,
q_ccbs = 0xffffe4011d6e6000,
q_ccb_list = {
sqh_first = 0xffffe4011d6e6000,
sqh_last = 0xffffe4011d6e6e88
},
q_ccb_prpls = 0xffffe4042f7ceb18
}
(gdb) p *((struct nvme_softc *)nvme_cd.cd_devs[0]->dv_private)->sc_q[0]
$12 = {
q_sc = 0xffff80001da34000,
q_sq_mtx = {
u = {
mtxa_owner = 0x601
}
},
q_cq_mtx = {
u = {
mtxa_owner = 0x601
}
},
q_sq_dmamem = 0xffffe4042f7ceb40,
q_cq_dmamem = 0xffffe4042f7ceb68,
q_sqtdbl = 0x1008,
q_cqhdbl = 0x100c,
q_id = 0x1,
q_entries = 0x400,
q_sq_tail = 0x3b7,
q_cq_head = 0x3b7,
q_cq_phase = 0x0,
q_ccb_mtx = {
u = {
mtxa_owner = 0x601
}
},
q_nccbs = 0x3ff,
q_nccbs_avail = 0x3ff,
q_ccbs = 0xffff800020d5b000,
q_ccb_list = {
sqh_first = 0xffff800020d78cb8,
sqh_last = 0xffff800020d78f10
},
q_ccb_prpls = 0xffffe4042f7ceb90
}
the one normal queue is empty (q_nccbs == q_nccbs_avail, and q_sq_tail == q_cq_head).
the admin queue is a bit odd in that q_nccbs_avail is one greater than q_nccbs,
but it does appear to also be empty since q_sq_tail == q_cq_head.
the two buffers above, along with 554 others, are still in the bufq of ld0:
(gdb) p ((struct ld_softc *)ld_cd.cd_devs[0]->dv_private)->sc_dksc.sc_bufq
$1 = (struct bufq_state *) 0xffffe4042f686ce8
(gdb) xbufq_fcfs 0xffffe4042f686ce8
bufq 0xffffe4042f686ce8
0xffffe4025ff23b48
0xffffe4025ff23a28
0xffffe4025ff23908
0xffffe4025ff237e8
0xffffe4025ff236c8
...
0xffffe403b23b4480
0xffffe403cfef36c0
total buffers 556
ld0's dk_softc also indicates that no buffers are currently in the hands of
the underlying driver:
(gdb) p *(struct ld_softc *)ld_cd.cd_devs[0]->dv_private
$14 = {
sc_dksc = {
...
sc_bufq = 0xffffe4042f686ce8,
sc_dtype = 0xf,
sc_deferred = 0xffffe4025ff23c68,
sc_busy = 0x0,
...
sc_queuecnt = 0x0,
...
sc_maxqueuecnt = 0x3ff,
}
so the bug appears to be that the dk layer is sitting on a bunch of buffers
even though the underlying nvme driver is idle.
one problem I see is in ld_diskstart():
static int
ld_diskstart(device_t dev, struct buf *bp)
{
	struct ld_softc *sc = device_private(dev);
	int error;

	if (sc->sc_queuecnt >= sc->sc_maxqueuecnt)
		return EAGAIN;

	if ((sc->sc_flags & LDF_MPSAFE) == 0)
		KERNEL_LOCK(1, curlwp);

	mutex_enter(&sc->sc_mutex);

	if (sc->sc_queuecnt >= sc->sc_maxqueuecnt)
		error = EAGAIN;
	else {
		error = (*sc->sc_start)(sc, bp);
		if (error == 0)
			sc->sc_queuecnt++;
	}

	mutex_exit(&sc->sc_mutex);

	if ((sc->sc_flags & LDF_MPSAFE) == 0)
		KERNEL_UNLOCK_ONE(curlwp);

	return error;
}
the first sc_queuecnt check here is done without holding any locks,
so it could race with lddone() decrementing sc_queuecnt and return EAGAIN,
which will cause dk_start() to stop sending new I/O to the underlying
nvme driver even though the driver queue is no longer full.
lddone() calls dk_start(), but the buffer that ld_diskstart() is rejecting
is not on the bufq at that point so the lddone() thread wouldn't know about it.
just removing the unlocked check of sc_queuecnt here would fix this race.
however, it seems unlikely that this race would result in more than 1 I/O
becoming stuck in the bufq, and in this dump we have a bunch of them,
so it seems very likely that there are more problems that I haven't found yet.
the logic in dk_start() that allows two threads to be processing the bufq
in parallel looks like it could result in losing track of a buffer entirely.
if both threads get EAGAIN from d_diskstart then they will both try to
save the buffer pointer in dksc->sc_deferred, and the first one will be lost.
but that isn't the problem with the two buffers above since they are both
still on the bufq. the comment says this special hidey hole for one buffer
is intended to keep the order of I/Os the same, but that isn't needed for
correctness, and doing that in the dk layer rather than in the bufq layer
seems poor. if we want a way to undo a bufq_get() rather than just putting
the buffer back on the bufq with bufq_put(), then we should add a new API
for that rather than making callers deal with it.
(for nvme especially it would be better to not require putting all buffers
for a device on a single bufq at all, since that rather reduces the benefit
of the hardware having multiple hardware queues. but that kind of design
change is beyond the scope of the issue at hand.)
I'll keep looking for more bugs that could have caused this.
-Chuck
Responsible-Changed-From-To: kern-bug-people->chs
Responsible-Changed-By: chs@NetBSD.org
Responsible-Changed-When: Fri, 01 Dec 2017 19:01:29 +0000
Responsible-Changed-Why:
I'm looking at this.
From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/52769: hang with an ffs stored in an nvme device
Date: Sat, 2 Dec 2017 09:51:03 -0000 (UTC)
chuq@chuq.com (Chuck Silvers) writes:
>the logic in dk_start() that allows two threads to be processing the bufq
>in parallel looks like it could result in losing track of a buffer entirely.
>if both threads get EAGAIN from d_diskstart then they will both try to
>save the buffer pointer in dksc->sc_deferred, and the first one will be lost.
Two threads cannot enter d_diskstart as the queue processing is skipped
for sc_busy > 1. Instead the first thread will retry the queue (including
a possible sc_deferred entry).
--
Michael van Elst
Internet: mlelstv@serpens.de
"A potential Snark may lurk in every tree."
From: "Jaromir Dolecek" <jdolecek@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/52769 CVS commit: src/sys/dev
Date: Mon, 4 Dec 2017 22:15:52 +0000
Module Name: src
Committed By: jdolecek
Date: Mon Dec 4 22:15:52 UTC 2017
Modified Files:
src/sys/dev: dksubr.c
Log Message:
add KASSERT() just before setting sc_deferred, to ensure this does
not trigger problem in PR kern/52769
To generate a diff of this commit:
cvs rdiff -u -r1.100 -r1.101 src/sys/dev/dksubr.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: =?UTF-8?B?SmFyb23DrXIgRG9sZcSNZWs=?= <jaromir.dolecek@gmail.com>
To: Chuck Silvers <chuq@chuq.com>
Cc: gnats-bugs@netbsd.org, gnats-admin@netbsd.org
Subject: Re: kern/52769: hang with an ffs stored in an nvme device
Date: Mon, 4 Dec 2017 23:20:14 +0100
Hello Chuck,
the race WRT sc_deferred shouldn't be possible, as noted by Michael, but I've
added a KASSERT() there now. Can you retry with rev. 1.101 of
sys/dev/dksubr.c to rule that out as the issue?
Jaromir
From: Paul Goyette <paul@whooppee.com>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/52769
Date: Fri, 23 Feb 2018 07:40:39 +0800 (+08)
I can't add any more debug info to this, but I can say "me too". I just
encountered the same biolock hang, during a cvs update of a source tree
sitting on an nvme(4) device.
+------------------+--------------------------+----------------------------+
| Paul Goyette | PGP Key fingerprint: | E-mail addresses: |
| (Retired) | FA29 0E3B 35AF E8AE 6651 | paul at whooppee dot com |
| Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd dot org |
+------------------+--------------------------+----------------------------+
From: Paul Goyette <paul@whooppee.com>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/52769
Date: Fri, 23 Feb 2018 08:11:33 +0800 (+08)
On Fri, 23 Feb 2018, Paul Goyette wrote:
> I can't add any more debug info to this, but I can say "me too". I just
> encountered the same biolock hang, during a cvs update of a source tree
> sitting on an nvme(4) device.
Just in case it matters, I'm on an amd64 system with MSIX interrupts:
nvme0 at pci3 dev 0 function 0: Samsung Electronics (3rd vendor ID) product a804 (rev. 0x00)
nvme0: NVMe 1.2
nvme0: for admin queue interrupting at msix3 vec 0
nvme0: Samsung SSD 960 PRO 512GB, firmware 2B6QCXP7, serial S3EWNX0K108171P
nvme0: for io queue 1 interrupting at msix3 vec 1 affinity to cpu0
nvme0: for io queue 2 interrupting at msix3 vec 2 affinity to cpu1
nvme0: for io queue 3 interrupting at msix3 vec 3 affinity to cpu2
nvme0: for io queue 4 interrupting at msix3 vec 4 affinity to cpu3
nvme0: for io queue 5 interrupting at msix3 vec 5 affinity to cpu4
nvme0: for io queue 6 interrupting at msix3 vec 6 affinity to cpu5
nvme0: for io queue 7 interrupting at msix3 vec 7 affinity to cpu6
ld0 at nvme0 nsid 1
ld0: 476 GB, 62260 cyl, 255 head, 63 sec, 512 bytes/sect x 1000215216 sectors
From: Paul Goyette <paul@whooppee.com>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/52769
Date: Thu, 1 Mar 2018 16:23:00 +0800 (+08)
I ran into this one again, just a short time ago. This time I was
able to break into ddb and force a crash dump. So if there is any
debugging that anyone wants, just let me know.
FWIW, it seems that the bug is pretty specific to the nvme drive.
All processes that were accessing the nvme drive got hung up, yet
processes accessing the root drive (on a "spinny" wd) were able to
continue without problem.
It's curious that all reported instances of this PR have involved
a "cvs update" getting stuck in biolock, while everything else just
ends up in tstile. I wonder what cvs does that is so different
from everything else?
+------------------+--------------------------+----------------------------+
| Paul Goyette | PGP Key fingerprint: | E-mail addresses: |
| (Retired) | FA29 0E3B 35AF E8AE 6651 | paul at whooppee dot com |
| Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd dot org |
+------------------+--------------------------+----------------------------+
From: Paul Goyette <paul@whooppee.com>
To: gnats-bugs@NetBSD.org
Cc: jdolecek@netbsd.org
Subject: Re: kern/52769
Date: Sun, 4 Mar 2018 13:43:31 +0800 (+08)
FWIW, I hit this PR again (third time, now) on a very up-to-date kernel.
I have confirmed that I have rev 1.101 of sys/dev/dksubr.c so that is
almost certainly not the problem
I still have a core dump available...
+------------------+--------------------------+----------------------------+
| Paul Goyette | PGP Key fingerprint: | E-mail addresses: |
| (Retired) | FA29 0E3B 35AF E8AE 6651 | paul at whooppee dot com |
| Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd dot org |
+------------------+--------------------------+----------------------------+
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/52769
Date: Sun, 4 Mar 2018 08:40:01 +0000
On Thu, Mar 01, 2018 at 08:25:01AM +0000, Paul Goyette wrote:
> It's curious that all reported instances of this PR have involved
> a "cvs update" getting stuck in biolock, while everything else just
> ends up in tstile. I wonder what cvs does that is so different
> from everything else?
tstile is, most of the time, just what happens when things pile up
behind other more interesting deadlocks. Though not always.
cvs is probably causing it, because cvs does tons of silly metadata
operations at high speed.
Is this wapbl, or not? If not, they'll be synchronous and that might
be causing extra excitement.
--
David A. Holland
dholland@netbsd.org
From: Paul Goyette <paul@whooppee.com>
To: gnats-bugs@NetBSD.org
Cc: chs@NetBSD.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
martin@NetBSD.org
Subject: Re: kern/52769
Date: Sun, 4 Mar 2018 22:35:37 +0800 (+08)
On Sun, 4 Mar 2018, David Holland wrote:
> The following reply was made to PR kern/52769; it has been noted by GNATS.
>
> From: David Holland <dholland-bugs@netbsd.org>
> To: gnats-bugs@NetBSD.org
> Cc:
> Subject: Re: kern/52769
> Date: Sun, 4 Mar 2018 08:40:01 +0000
>
> On Thu, Mar 01, 2018 at 08:25:01AM +0000, Paul Goyette wrote:
> > It's curious that all reported instances of this PR have involved
> > a "cvs update" getting stuck in biolock, while everything else just
> > ends up in tstile. I wonder what cvs does that is so different
> > from everything else?
>
> tstile is, most of the time, just what happens when things pile up
> behind other more interesting deadlocks. Though not always.
>
> cvs is probably causing it, because cvs does tons of silly metadata
> operations at high speed.
>
> Is this wapbl, or not? If not, they'll be synchronous and that might
> be causing extra excitement.
In my case YES, this is wapbl. Not sure about Martin's original case.
From: Martin Husemann <martin@duskware.de>
To: Paul Goyette <paul@whooppee.com>
Cc: gnats-bugs@NetBSD.org, chs@NetBSD.org
Subject: Re: kern/52769
Date: Sun, 4 Mar 2018 17:45:36 +0100
On Sun, Mar 04, 2018 at 10:35:37PM +0800, Paul Goyette wrote:
> In my case YES, this is wapbl. Not sure about Martin's original case.
No wapbl on nvme for my case. But since this is nvme driver lossage, the
file system details will only affect the chances for it to happen, not
the possibility.
Martin
From: Sevan Janiyan <venture37@geeklan.co.uk>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/52769
Date: Sun, 4 Mar 2018 17:28:32 +0000
I do not have an nvme based system but I frequently experience deadlocks
on i386. The deadlocks are pretty catastrophic in that it clobbers the
file system.
At first I thought it was an issue with the HDD; the disk I was originally
running with was the 64 GB SATA disk my X60s came with. I swapped that
out for a shiny 60 GB SanDisk SSD, and the issue continued.
The file system was mounted with the log option on both disks.
I recently switched to mounting with discard for TRIM on the SSD, and the system crashes.
The system is running with coreboot, so I thought it might be because of that; I
had some oddities with DTrace previously which resolved themselves after a
coreboot update. It still could be, but this bug report makes me think the
issue lies elsewhere, since I only experience these deadlocks when running CVS.
In my case, the last time it happened, I was unable to enter DDB, it
just sat there waiting. The console was responsive though.
The issue in my case happens when there is a lot of CVS activity,
triggered by updating from an old checkout. I updated from
December's checkout and that took out a large part of my checkout (the
files in the CVS directories get filled with snippets of other files).
The recent GCC import triggered it.
Wiping src/external, updating to regain the directory, and then updating
once more sometimes triggers it.
I will try running / without any additional mount options beyond rw and see how
I go.
Sevan
From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/52769
Date: Sun, 4 Mar 2018 23:51:00 -0000 (UTC)
venture37@geeklan.co.uk (Sevan Janiyan) writes:
> I switched to discard recently for TRIM on the SSD, system crashes.
I wouldn't use "discard" :-|
--
Michael van Elst
Internet: mlelstv@serpens.de
"A potential Snark may lurk in every tree."
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc: chs@NetBSD.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
martin@NetBSD.org
Subject: Re: kern/52769: nvme lockup
Date: Mon, 5 Mar 2018 06:24:16 +0100
On Sun, Mar 04, 2018 at 05:30:01PM +0000, Sevan Janiyan wrote:
> I do not have an nvme based system but I frequently experience deadlocks
> on i386. The deadlocks are pretty catastrophic in that it clobbers the
> file system.
Can you please file a separate PR for that?
Thanks!
Martin
From: =?UTF-8?B?SmFyb23DrXIgRG9sZcSNZWs=?= <jaromir.dolecek@gmail.com>
To: Chuck Silvers <chuq@chuq.com>
Cc: gnats-bugs@netbsd.org, kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org
Subject: Re: kern/52769: hang with an ffs stored in an nvme device
Date: Wed, 14 Mar 2018 22:49:39 +0100
2017-12-01 19:45 GMT+01:00 Chuck Silvers <chuq@chuq.com>:
> q_nccbs = 0x20,
> q_nccbs_avail = 0x21,
This is highly suspicious; q_nccbs_avail should always be <= q_nccbs. Good
that the driver deadlocked, as it would panic in nvme_ccb_get() once it
tried to get the nonexistent 33rd ccb from the queue :)
This got me thinking though. While the completion queue is being processed, we
currently don't reset q_nccbs_avail until after all finished ccbs are
processed. While this is running, any further I/O would be skipped with
EAGAIN if all ccbs were taken and q_nccbs_avail was 0. When the ccb
counter is reset at the end of nvme_q_complete(), there is no outstanding
I/O any more that would trigger another lddone() and do the queue drain,
so the driver ceases to process anything. This scenario matches the
described symptoms quite well.
Can you please try the patch from
http://www.netbsd.org/~jdolecek/nvme_avail_put.diff ?
It's compile-tested only, so it might need some tweaks. The idea is to reset
the ccb counter immediately, so lddone() would be able to queue another
I/O while the completion queue is still being processed. This should also
fix a ccb leak on errors - e.g. nvme_ns_dobio() just calls nvme_ccb_put()
when bus_dmamap_load() fails, so q_nccbs_avail stays decremented from
nvme_ccb_get().
Jaromir
State-Changed-From-To: open->feedback
State-Changed-By: jdolecek@NetBSD.org
State-Changed-When: Thu, 15 Mar 2018 10:56:36 +0000
State-Changed-Why:
Possible patch provided, waiting for feedback.
From: Paul Goyette <paul@whooppee.com>
To: gnats-bugs@NetBSD.org
Cc: chs@NetBSD.org, netbsd-bugs@netbsd.org, martin@NetBSD.org
Subject: Re: kern/52769: hang with an ffs stored in an nvme device
Date: Fri, 16 Mar 2018 17:07:46 +0800 (+08)
On Wed, 14 Mar 2018, Jaromír Doleček wrote:
> > q_nccbs = 0x20,
> > q_nccbs_avail = 0x21,
>
> This is highly suspicious; q_nccbs_avail should always be <= q_nccbs. Good
> that the driver deadlocked, as it would panic in nvme_ccb_get() once it
> tried to get the nonexistent 33rd ccb from the queue :)
>
> This got me thinking though. While the completion queue is being processed, we
> currently don't reset q_nccbs_avail until after all finished ccbs are
> processed. While this is running, any further I/O would be skipped with
> EAGAIN if all ccbs were taken and q_nccbs_avail was 0. When the ccb
> counter is reset at the end of nvme_q_complete(), there is no outstanding
> I/O any more that would trigger another lddone() and do the queue drain,
> so the driver ceases to process anything. This scenario matches the
> described symptoms quite well.
>
> Can you please try the patch from
> http://www.netbsd.org/~jdolecek/nvme_avail_put.diff ?
Initial testing with this patch is looking good. I'm currently running
a 'cvs update' against the same tree in which I'm running a "build.sh
-j24 release" and so far no hang.
> It's compile-tested only, so it might need some tweaks. The idea is to reset
> the ccb counter immediately, so lddone() would be able to queue another
> I/O while the completion queue is still being processed. This should also
> fix a ccb leak on errors - e.g. nvme_ns_dobio() just calls nvme_ccb_put()
> when bus_dmamap_load() fails, so q_nccbs_avail stays decremented from
> nvme_ccb_get().
Just based on reading the patch, it would appear to make sense to commit
even if it doesn't completely fix the hang. The patch might not be
"sufficient" but it would appear to be "necessary". :)
+------------------+--------------------------+----------------------------+
| Paul Goyette | PGP Key fingerprint: | E-mail addresses: |
| (Retired) | FA29 0E3B 35AF E8AE 6651 | paul at whooppee dot com |
| Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd dot org |
+------------------+--------------------------+----------------------------+
From: "Jaromir Dolecek" <jdolecek@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/52769 CVS commit: src/sys/dev/ic
Date: Fri, 16 Mar 2018 18:49:18 +0000
Module Name: src
Committed By: jdolecek
Date: Fri Mar 16 18:49:18 UTC 2018
Modified Files:
src/sys/dev/ic: nvme.c nvmevar.h
Log Message:
stop using q_nccbs_avail for deciding whether there are available ccbs;
no need to maintain a counter _and_ q_ccb_list
this fixes deadlock when all ccbs happen to be taken before completion
interrupt - nvme_q_complete() increased q_nccbs_avail only after
processing all the completed commands, by then there was nothing
left to actually kick the disk queue again into action
this also fixes ccb leak on command errors e.g. with bus_dmamem_alloc()
or bus_dmamap_load() - q_nccbs_avail was never decreased on the error path
fixes PR kern/52769 by Martin Husemann, thanks to Paul Goyette
for testing
To generate a diff of this commit:
cvs rdiff -u -r1.32 -r1.33 src/sys/dev/ic/nvme.c
cvs rdiff -u -r1.13 -r1.14 src/sys/dev/ic/nvmevar.h
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
Responsible-Changed-From-To: chs->jdolecek
Responsible-Changed-By: jdolecek@NetBSD.org
Responsible-Changed-When: Fri, 16 Mar 2018 18:54:34 +0000
Responsible-Changed-Why:
I've committed a fix.
State-Changed-From-To: feedback->pending-pullups
State-Changed-By: jdolecek@NetBSD.org
State-Changed-When: Fri, 16 Mar 2018 18:54:34 +0000
State-Changed-Why:
A variant of the proposed patch was committed. While there might be other
bugs, after the change this particular problem does not happen any more.
Requested pullup to netbsd-8, ticket #636
From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/52769 CVS commit: [netbsd-8] src/sys/dev/ic
Date: Sat, 17 Mar 2018 08:11:18 +0000
Module Name: src
Committed By: martin
Date: Sat Mar 17 08:11:18 UTC 2018
Modified Files:
src/sys/dev/ic [netbsd-8]: nvme.c nvmevar.h
Log Message:
Pull up following revision(s) (requested by jdolecek in ticket #636):
sys/dev/ic/nvme.c: revision 1.33
sys/dev/ic/nvmevar.h: revision 1.14
stop using q_nccbs_avail for deciding whether there are available ccbs;
no need to maintain a counter _and_ q_ccb_list
this fixes deadlock when all ccbs happen to be taken before completion
interrupt - nvme_q_complete() increased q_nccbs_avail only after
processing all the completed commands, by then there was nothing
left to actually kick the disk queue again into action
this also fixes ccb leak on command errors e.g. with bus_dmamem_alloc()
or bus_dmamap_load() - q_nccbs_avail was never decreased on the error path
fixes PR kern/52769 by Martin Husemann, thanks to Paul Goyette
for testing
To generate a diff of this commit:
cvs rdiff -u -r1.30 -r1.30.2.1 src/sys/dev/ic/nvme.c
cvs rdiff -u -r1.13 -r1.13.6.1 src/sys/dev/ic/nvmevar.h
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: pending-pullups->closed
State-Changed-By: jdolecek@NetBSD.org
State-Changed-When: Sat, 17 Mar 2018 09:13:44 +0000
State-Changed-Why:
Pullup done. Thanks!
>Unformatted: