NetBSD Problem Report #48270

From www@NetBSD.org  Fri Oct  4 05:53:18 2013
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
	(using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "Postmaster NetBSD.org" (verified OK))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 077F970F1C
	for <gnats-bugs@gnats.NetBSD.org>; Fri,  4 Oct 2013 05:53:18 +0000 (UTC)
Message-Id: <20131004055315.5435D710CD@mollari.NetBSD.org>
Date: Fri,  4 Oct 2013 05:53:15 +0000 (UTC)
From: m.ramakers@gmail.com
Reply-To: m.ramakers@gmail.com
To: gnats-bugs@NetBSD.org
Subject: umass / ehci effectively hangs system during stresstest
X-Send-Pr-Version: www-1.0

>Number:         48270
>Category:       kern
>Synopsis:       umass / ehci effectively hangs system during stresstest
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    skrll
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri Oct 04 05:55:00 +0000 2013
>Closed-Date:    Thu Jan 05 11:21:33 +0000 2017
>Last-Modified:  Thu Jan 05 11:21:33 +0000 2017
>Originator:     Michai Ramakers
>Release:        6.0.1
>Organization:
>Environment:
NetBSD lime.LAN 6.0.1 NetBSD 6.0.1 (GENERIC) amd64
>Description:
When connecting a SATA disk in a USB 2.0 enclosure, mounting it, and running...

  while ( true ); do tar xzf a_linux_kernel_tree.tgz; rm -rf linux*; done

...in 3 terminals, each with a different working directory (within the umass mount), messages like this are eventually generated:

  umass0: Invalid CSW: tag 1631723562 should be 2236794
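
For context: in the Bulk-Only Transport (BBB) protocol, each command block wrapper (CBW) carries a tag that the device must echo in the matching command status wrapper (CSW), so the message above indicates that the tag in the received CSW did not match the tag that was sent. The following is a minimal standalone sketch of such a check, not the NetBSD umass code. The struct, enum, and function names are hypothetical; the signature constant and status values are those of the USB Mass Storage Bulk-Only specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* dCSWSignature value defined by the Bulk-Only Transport spec ("USBS"). */
#define CSW_SIGNATURE 0x53425355u

/* Hypothetical, already-decoded view of the 13-byte CSW. */
struct csw {
	uint32_t signature;	/* dCSWSignature */
	uint32_t tag;		/* dCSWTag, must echo the CBW tag */
	uint32_t residue;	/* dCSWDataResidue */
	uint8_t  status;	/* bCSWStatus: 0 passed, 1 failed, 2 phase error */
};

enum csw_check { CSW_OK, CSW_BAD_SIGNATURE, CSW_BAD_TAG, CSW_BAD_STATUS };

/* Validate a CSW against the tag of the CBW it is supposed to answer. */
static enum csw_check
csw_validate(const struct csw *csw, uint32_t expected_tag)
{
	assert(csw != NULL);

	if (csw->signature != CSW_SIGNATURE)
		return CSW_BAD_SIGNATURE;
	if (csw->tag != expected_tag)
		return CSW_BAD_TAG;	/* the case reported above */
	if (csw->status > 2)
		return CSW_BAD_STATUS;	/* values 3..255 are reserved */
	return CSW_OK;
}
```

With the values from the message above, `csw_validate()` on a CSW whose tag is 1631723562 and an expected tag of 2236794 would return `CSW_BAD_TAG`.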

The processes effectively stop reading and writing; the 'tar' processes are in the 'uninterruptible wait' state ('D') according to ps(1). Unmounting the device works, but produces these messages:

  umass0: BBB reset failed, IOERROR
  umass0: BBB bulk-in clear stall failed, IOERROR
  umass0: BBB bulk-out clear stall failed, IOERROR
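
The three failures above appear to correspond to the three steps of the Bulk-Only "reset recovery" sequence: a class-specific Bulk-Only Mass Storage Reset on the interface, followed by Clear Feature (ENDPOINT_HALT) on the bulk-in endpoint and then on the bulk-out endpoint. The sketch below only builds the control setup packets for those steps, using the request codes from the USB and Bulk-Only specifications. The struct and function names are hypothetical, and this is not the NetBSD implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Standard 8-byte USB control setup packet (fields in host order here). */
struct usb_setup {
	uint8_t  bmRequestType;
	uint8_t  bRequest;
	uint16_t wValue;
	uint16_t wIndex;
	uint16_t wLength;
};

/* Step 1: Bulk-Only Mass Storage Reset, a class request to the interface. */
static struct usb_setup
bbb_reset_request(uint8_t interface)
{
	/* 0x21 = host-to-device, class, interface; 0xff = reset request */
	struct usb_setup s = { 0x21, 0xff, 0, interface, 0 };
	return s;
}

/* Steps 2 and 3: CLEAR_FEATURE(ENDPOINT_HALT) addressed to one endpoint. */
static struct usb_setup
clear_halt_request(uint8_t endpoint_address)
{
	/* Bit 7 is the direction, bits 3..0 the number; 6..4 are reserved. */
	assert((endpoint_address & 0x70) == 0);

	/* 0x02 = host-to-device, standard, endpoint; 0x01 = CLEAR_FEATURE;
	 * wValue 0 selects the ENDPOINT_HALT feature. */
	struct usb_setup s = { 0x02, 0x01, 0, endpoint_address, 0 };
	return s;
}
```

A caller would issue `bbb_reset_request()` first and then `clear_halt_request()` once for each bulk endpoint. The endpoint addresses depend on the device's descriptors and are not given in this report.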

Initially, I/O to other disks still worked, but the next morning I found the box unresponsive when trying to access any disk, with the console filled with the above messages.

The device, when attached:

---

Oct  3 18:13:10 lime /netbsd: umass0 at uhub4 port 1 configuration 1 interface 0
Oct  3 18:13:10 lime /netbsd: umass0: Sunplus Innovation Technology USB to Serial-ATA bridge, rev 2.00/1.32, addr 2
Oct  3 18:13:10 lime /netbsd: umass0: using SCSI over Bulk-Only
Oct  3 18:13:10 lime /netbsd: scsibus0 at umass0: 2 targets, 1 lun per target
Oct  3 18:13:10 lime /netbsd: sd0 at scsibus0 target 0 lun 0: <SAMSUNG, HD204UI, 0200> disk fixed
Oct  3 18:13:10 lime /netbsd: sd0: 1863 GB, 500 cyl, 8 head, 32 sec, 512 bytes/sect x 3907029168 sectors

---

/var/run/dmesg.boot from the next boot:

---

Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
    2006, 2007, 2008, 2009, 2010, 2011, 2012
    The NetBSD Foundation, Inc.  All rights reserved.
Copyright (c) 1982, 1986, 1989, 1991, 1993
    The Regents of the University of California.  All rights reserved.

NetBSD 6.0.1 (GENERIC)
total memory = 1919 MB
avail memory = 1848 MB
timecounter: Timecounters tick every 10.000 msec
timecounter: Timecounter "i8254" frequency 1193182 Hz quality 100
System manufacturer System Product Name (System Version)
mainbus0 (root)
cpu0 at mainbus0 apid 0: Intel(R) Core(TM)2 Duo CPU     E4400  @ 2.00GHz, id 0x6fd
cpu1 at mainbus0 apid 1: Intel(R) Core(TM)2 Duo CPU     E4400  @ 2.00GHz, id 0x6fd
ioapic0 at mainbus0 apid 2: pa 0xfec00000, version 3, 24 pins
ioapic1 at mainbus0 apid 3: pa 0xfecc0000, version 3, 24 pins
acpi0 at mainbus0: Intel ACPICA 20110623
acpi0: X/RSDT: OemId <A_M_I_,OEMRSDT ,08000708>, AslId <MSFT,00000097>
ACPI Warning: Incorrect checksum in table [OEMB] - 0x20, should be 0x13 (20110623/tbutils-282)
acpi0: SCI interrupting at int 9
timecounter: Timecounter "ACPI-Fast" frequency 3579545 Hz quality 1000
hpet0 at acpi0: high precision event timer (mem 0xfed00000-0xfed00400)
timecounter: Timecounter "hpet0" frequency 14318180 Hz quality 2000
pcppi1 at acpi0 (SPKR, PNP0800): io 0x61
midi0 at pcppi1: PC speaker
sysbeep0 at pcppi1
FDC (PNP0700) at acpi0 not configured
LPTE (PNP0401) at acpi0 not configured
RMSC (PNP0C02) at acpi0 not configured
aibs0 at acpi0 (ASOC, ATK0110-16843024): ASUSTeK AI Booster
attimer1 at acpi0 (TMR, PNP0100): io 0x40-0x43
SIOR (PNP0C02) at acpi0 not configured
UAR1 (PNP0501) at acpi0 not configured
OMSC (PNP0C02) at acpi0 not configured
PCIE (PNP0C02) at acpi0 not configured
acpibut0 at acpi0 (SLPB, PNP0C0E): ACPI Sleep Button
RMEM (PNP0C01) at acpi0 not configured
acpibut1 at acpi0 (PWRB, PNP0C0C-170): ACPI Power Button
acpiwdrt0 at acpi0: mem 0xfed01000,0xfed01004
acpiwdrt0: PCI 0:000:00:0 vendor 0x1106 product 0x3372
acpiwdrt0: watchdog interval 1-1023 sec.
attimer1: attached to pcppi1
pci0 at mainbus0 bus 0: configuration mode 1
pci0: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
pchb0 at pci0 dev 0 function 0: vendor 0x1106 product 0x0364 (rev. 0x00)
agp0 at pchb0: aperture at 0xf0000000, size 0x8000000
pchb1 at pci0 dev 0 function 1: vendor 0x1106 product 0x1364 (rev. 0x00)
pchb2 at pci0 dev 0 function 2: vendor 0x1106 product 0x2364 (rev. 0x00)
pchb3 at pci0 dev 0 function 3: vendor 0x1106 product 0x3364 (rev. 0x00)
pchb4 at pci0 dev 0 function 4: vendor 0x1106 product 0x4364 (rev. 0x00)
vendor 0x1106 product 0x5364 (interrupt system, interface 0x20) at pci0 dev 0 function 5 not configured
pchb5 at pci0 dev 0 function 6: vendor 0x1106 product 0x6364 (rev. 0x00)
pchb6 at pci0 dev 0 function 7: vendor 0x1106 product 0x7364 (rev. 0x00)
ppb0 at pci0 dev 1 function 0: vendor 0x1106 product 0xb198 (rev. 0x00)
pci1 at ppb0 bus 1
pci1: i/o space, memory space enabled, rd/line, wr/inv ok
vga0 at pci1 dev 0 function 0: vendor 0x1106 product 0x3371 (rev. 0x01)
wsdisplay0 at vga0 kbdmux 1: console (80x25, vt100 emulation)
wsmux1: connecting to wsdisplay0
drm at vga0 not configured
ppb1 at pci0 dev 2 function 0: vendor 0x1106 product 0xa364 (rev. 0x80)
ppb1: PCI Express 1.0 <Root Port of PCI-E Root Complex>
pci2 at ppb1 bus 2
pci2: i/o space, memory space enabled, rd/line, wr/inv ok
ppb2 at pci0 dev 3 function 0: vendor 0x1106 product 0xc364 (rev. 0x80)
ppb2: PCI Express 1.0 <Root Port of PCI-E Root Complex>
pci3 at ppb2 bus 3
pci3: i/o space, memory space enabled, rd/line, wr/inv ok
viaide0 at pci0 dev 15 function 0: VIA Technologies VT8237S SATA Controller (rev. 0x00)
viaide0: bus-master DMA support present
viaide0: primary channel configured to native-PCI mode
viaide0: using ioapic0 pin 21 for native-PCI interrupt
atabus0 at viaide0 channel 0
viaide0: secondary channel configured to native-PCI mode
atabus1 at viaide0 channel 1
viaide1 at pci0 dev 15 function 1
viaide1: VIA Technologies unknown VIA ATA controller
viaide1: bus-master DMA support present
viaide1: primary channel configured to compatibility mode
viaide1: primary channel interrupting at ioapic0 pin 14
atabus2 at viaide1 channel 0
viaide1: secondary channel configured to compatibility mode
viaide1: secondary channel interrupting at ioapic0 pin 15
atabus3 at viaide1 channel 1
uhci0 at pci0 dev 16 function 0: vendor 0x1106 product 0x3038 (rev. 0xb0)
uhci0: interrupting at ioapic0 pin 20
usb0 at uhci0: USB revision 1.0
uhci1 at pci0 dev 16 function 1: vendor 0x1106 product 0x3038 (rev. 0xb0)
uhci1: interrupting at ioapic0 pin 22
usb1 at uhci1: USB revision 1.0
uhci2 at pci0 dev 16 function 2: vendor 0x1106 product 0x3038 (rev. 0xb0)
uhci2: interrupting at ioapic0 pin 21
usb2 at uhci2: USB revision 1.0
uhci3 at pci0 dev 16 function 3: vendor 0x1106 product 0x3038 (rev. 0xb0)
uhci3: interrupting at ioapic0 pin 23
usb3 at uhci3: USB revision 1.0
ehci0 at pci0 dev 16 function 4: vendor 0x1106 product 0x3104 (rev. 0x90)
ehci0: interrupting at ioapic0 pin 21
ehci0: dropped intr workaround enabled
ehci0: EHCI version 1.0
ehci0: companion controllers, 2 ports each: uhci0 uhci1 uhci2 uhci3
usb4 at ehci0: USB revision 2.0
pcib0 at pci0 dev 17 function 0: vendor 0x1106 product 0x3372 (rev. 0x00)
pchb7 at pci0 dev 17 function 7: vendor 0x1106 product 0x287e (rev. 0x00)
vr0 at pci0 dev 18 function 0: vendor 0x1106 product 0x3065 (rev. 0x7c)
vr0: interrupting at ioapic0 pin 23
vr0: Ethernet address: 00:1b:fc:c8:cf:a3
atphy0 at vr0 phy 1: L2 10/100 PHY, rev. 1
atphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
pchb8 at pci0 dev 19 function 0: vendor 0x1106 product 0x337b (rev. 0x00)
ppb3 at pci0 dev 19 function 1: vendor 0x1106 product 0x337a (rev. 0x00)
pci4 at ppb3 bus 4
pci4: i/o space, memory space enabled
isa0 at pcib0
lpt0 at isa0 port 0x378-0x37b irq 7
com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, working fifo
pckbc0 at isa0 port 0x60-0x64
fdc0 at isa0 port 0x3f0-0x3f7 irq 6 drq 2
pci5 at mainbus0 bus 128
pci5: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
hdaudio0 at pci5 dev 1 function 0: HD Audio Controller
hdaudio0: interrupting at ioapic0 pin 17
hdafg0 at hdaudio0: Realtek ALC662
hdafg0: DAC00 2ch: Speaker [Jack]
hdafg0: DAC01 2ch: HP Out [Jack]
hdafg0: DIG02 2ch: SPDIF Out [Built-In]
hdafg0: ADC03 2ch: Line In [Jack], Mic In [Jack]
hdafg0: ADC04 2ch: Mic In [Jack]
hdafg0: 2ch/2ch 44100Hz 48000Hz 96000Hz PCM16 PCM20 PCM24 AC3
audio0 at hdafg0: full duplex, playback, capture, independent
acpicpu0 at cpu0: ACPI CPU
acpicpu0: C1: HLT, lat   0 us, pow     0 mW
acpicpu0: P0: FFH, lat  10 us, pow 88000 mW, 2000 MHz
acpicpu0: P1: FFH, lat  10 us, pow 71808 mW, 1600 MHz
acpicpu0: P2: FFH, lat  10 us, pow 56408 mW, 1200 MHz
coretemp0 at cpu0: thermal sensor, 1 C resolution
acpicpu1 at cpu1: ACPI CPU
coretemp1 at cpu1: thermal sensor, 1 C resolution
timecounter: Timecounter "clockinterrupt" frequency 100 Hz quality 0
timecounter: Timecounter "TSC" frequency 2000117900 Hz quality 3000
fd0 at fdc0 drive 0: 1.44MB, 80 cyl, 2 head, 18 sec
uhub0 at usb1: vendor 0x1106 UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
uhub1 at usb3: vendor 0x1106 UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 2 ports with 2 removable, self powered
uhub2 at usb0: vendor 0x1106 UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub2: 2 ports with 2 removable, self powered
uhub3 at usb2: vendor 0x1106 UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub3: 2 ports with 2 removable, self powered
uhub4 at usb4: vendor 0x1106 EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
uhub4: 8 ports with 8 removable, self powered
viaide0 port 0: device present, speed: 3.0Gb/s
wd0 at atabus0 drive 0
wd0: <SAMSUNG HD204UI>
wd0: drive supports 16-sector PIO transfers, LBA48 addressing
wd0: 1863 GB, 3876021 cyl, 16 head, 63 sec, 512 bytes/sect x 3907029168 sectors
wd0: 32-bit data port
wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133)
wd0(viaide0:0:0): using PIO mode 4, Ultra-DMA mode 6 (Ultra/133) (using DMA)
atapibus0 at atabus2: 2 targets
cd0 at atapibus0 drive 0: <TSSTcorpDVD-ROM SH-D162D, , SB00> cdrom removable
cd0: 32-bit data port
cd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 2 (Ultra/33)
cd0(viaide1:0:0): using PIO mode 4, DMA mode 2 (using DMA)
Kernelized RAIDframe activated
pad0: outputs: 44100Hz, 16-bit, stereo
audio1 at pad0: half duplex, playback, capture
boot device: wd0
root on wd0a dumps on wd0b
/: replaying log to memory
root file system type: ffs
/: replaying log to disk
uhidev0 at uhub3 port 1 configuration 1 interface 0
uhidev0: Microsoft Comfort Curve Keyboard 2000, rev 2.00/1.73, addr 2, iclass 3/1
ukbd0 at uhidev0
wskbd0 at ukbd0: console keyboard, using wsdisplay0
uhidev1 at uhub3 port 1 configuration 1 interface 1
uhidev1: Microsoft Comfort Curve Keyboard 2000, rev 2.00/1.73, addr 2, iclass 3/0
uhidev1: 1 report ids
uhid0 at uhidev1 reportid 1: input=7, output=0, feature=0
wsdisplay0: screen 1 added (80x25, vt100 emulation)
wsdisplay0: screen 2 added (80x25, vt100 emulation)
wsdisplay0: screen 3 added (80x25, vt100 emulation)
wsdisplay0: screen 4 added (80x25, vt100 emulation)
/share: replaying log to disk
aibs0: warning under limit on 'CHASSIS FAN Speed'

---

>How-To-Repeat:
As listed in the Description above.
>Fix:

>Release-Note:

>Audit-Trail:
From: Nick Hudson <skrll@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: m.ramakers@gmail.com, kern-bug-people@netbsd.org, 
 gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/48270: umass / ehci effectively hangs system during stresstest
Date: Wed, 16 Oct 2013 08:25:45 +0100

 On 10/04/13 06:55, m.ramakers@gmail.com wrote:

 >> Number:         48270
 >> Synopsis:       umass / ehci effectively hangs system during stresstest
 >>

 Hi,

 Can you see if the attached patch helps it recover, please?

 Why it needs to recover is another matter.

 Nick

 Attachment: usbdi.c.diff

 Index: sys/dev/usb/usbdi.c
 ===================================================================
 RCS file: /cvsroot/src/sys/dev/usb/usbdi.c,v
 retrieving revision 1.159
 diff -u -p -r1.159 usbdi.c
 --- sys/dev/usb/usbdi.c	4 Oct 2013 12:47:04 -0000	1.159
 +++ sys/dev/usb/usbdi.c	16 Oct 2013 07:22:10 -0000
 @@ -296,10 +296,16 @@ usbd_transfer(usbd_xfer_handle xfer)
  	if (!(flags & USBD_NO_COPY) && size != 0 && !usbd_xfer_isread(xfer))
  		memcpy(KERNADDR(dmap, 0), xfer->buffer, size);

 -	/* xfer is not valid after the transfer method unless synchronous */
 +	/*
 +	 * xfer is not valid after the transfer method unless synchronous, or
 +	 * there was an error
 +	 */
  	err = pipe->methods->transfer(xfer);

  	if (err != USBD_IN_PROGRESS && err) {
 +		pipe->running = 0;
 + 		SIMPLEQ_REMOVE_HEAD(&pipe->queue, next);
 +
  		/* The transfer has not been queued, so free buffer. */
  		if (xfer->rqflags & URQ_AUTO_DMABUF) {
  			struct usbd_bus *bus = pipe->device->bus;


 --------------000506050201040900010809--
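
A toy model of what the diff above appears to address, under the assumption that the transfer method queues the xfer and marks the pipe as running before trying to start it. If starting then fails and nothing undoes those two steps, the failed xfer stays at the head of the queue and every later xfer is only queued behind it, never started. All names below are hypothetical; this is not the NetBSD code, and completion handling is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum status { ST_OK, ST_IN_PROGRESS, ST_IOERROR };

struct toy_xfer {
	struct toy_xfer *next;
	bool fail_on_start;	/* simulate the controller rejecting the xfer */
	bool started;
};

struct toy_pipe {
	struct toy_xfer *head, *tail;
	bool running;
};

static void
enqueue(struct toy_pipe *p, struct toy_xfer *x)
{
	x->next = NULL;
	if (p->tail != NULL)
		p->tail->next = x;
	else
		p->head = x;
	p->tail = x;
}

static void
remove_head(struct toy_pipe *p)
{
	assert(p->head != NULL);
	p->head = p->head->next;
	if (p->head == NULL)
		p->tail = NULL;
}

/* Transfer method: queue the xfer, start it only if the pipe is idle. */
static enum status
toy_method_transfer(struct toy_pipe *p, struct toy_xfer *x)
{
	enqueue(p, x);
	if (p->running)
		return ST_IN_PROGRESS;	/* would start when the head completes */
	p->running = true;
	if (x->fail_on_start)
		return ST_IOERROR;	/* still queued, pipe still "running" */
	x->started = true;
	return ST_IN_PROGRESS;
}

/* Caller: with `recover` set, undo the queueing on error, as the diff does. */
static enum status
toy_transfer(struct toy_pipe *p, struct toy_xfer *x, bool recover)
{
	enum status err = toy_method_transfer(p, x);

	if (err != ST_IN_PROGRESS && err != ST_OK && recover) {
		p->running = false;
		remove_head(p);
	}
	return err;
}
```

Without recovery, an xfer submitted after a failed one is accepted but never started; with recovery, it starts normally. That would be consistent with processes stuck in uninterruptible wait, although the report does not confirm that this was the cause.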

Responsible-Changed-From-To: kern-bug-people->skrll
Responsible-Changed-By: skrll@NetBSD.org
Responsible-Changed-When: Sun, 08 Mar 2015 11:31:34 +0000
Responsible-Changed-Why:
Take


State-Changed-From-To: open->feedback
State-Changed-By: skrll@NetBSD.org
State-Changed-When: Sun, 21 Feb 2016 13:19:48 +0000
State-Changed-Why:
Please try -current and report back.  A change similar to the suggested diff
has been committed.


From: Michai Ramakers <m.ramakers@gmail.com>
To: gnats-bugs@netbsd.org
Cc: skrll@netbsd.org, netbsd-bugs@netbsd.org, gnats-admin@netbsd.org
Subject: Re: kern/48270 (umass / ehci effectively hangs system during stresstest)
Date: Sun, 21 Feb 2016 19:28:24 +0100

 I won't be able to test this the coming week for sure; perhaps the
 week after that.

From: Michai Ramakers <m.ramakers@gmail.com>
To: gnats-bugs@gnats.netbsd.org
Cc: 
Subject: Re: kern/48270
Date: Thu, 18 Aug 2016 07:58:46 +0200

 Hello,

 I realise review/retest by me has been pending now for a long time -
 sorry for this.

 However, here's more inconvenient news perhaps: I don't foresee being
 able to test this. The box in question has been reinstalled, the
 umass-medium is gone, and to be honest I'm too busy to try to
 reproduce the symptom using new hardware.

 Kind regards and thank you for putting in effort,
 Michai Ramakers

 -- 
 Web: http://sheep-thrills.net/
 Twitter: https://twitter.com/MichaiRamakers
 LinkedIn: http://nl.linkedin.com/in/michai

State-Changed-From-To: feedback->closed
State-Changed-By: skrll@NetBSD.org
State-Changed-When: Thu, 05 Jan 2017 11:21:33 +0000
State-Changed-Why:
No chance of feedback


>Unformatted:

Copyright © 1994-2014 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.