NetBSD Problem Report #32626

From Ephaeton@gmx.net  Wed Jan 25 13:58:16 2006
Return-Path: <Ephaeton@gmx.net>
Received: from mail.gmx.net (mail.gmx.de [213.165.64.21])
	by narn.netbsd.org (Postfix) with SMTP id 934AE63B879
	for <gnats-bugs@gnats.NetBSD.org>; Wed, 25 Jan 2006 13:58:13 +0000 (UTC)
Message-Id: <200601251358.k0PDw9UK023160@circe.entropie.net>
Date: Wed, 25 Jan 2006 14:58:09 +0100 (CET)
From: "Martin S. Weber" <Ephaeton@gmx.net>
Reply-To: Ephaeton@gmx.net
To: gnats-bugs@netbsd.org
Subject: ehci + umass + stress = umass stall cycle
X-Send-Pr-Version: 3.95

>Number:         32626
>Notify-List:    gson@gson.org
>Category:       kern
>Synopsis:       ehci + umass + stress = umass stall cycle
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Jan 25 14:00:00 +0000 2006
>Closed-Date:    Sat May 28 05:27:01 +0000 2022
>Last-Modified:  Sat May 28 05:27:01 +0000 2022
>Originator:     Martin S. Weber
>Release:        NetBSD 3.99.15
>Organization:
	Entropie Regensburg!

>Environment:

The "Via" machine:

uhci0 at pci0 dev 2 function 0: VIA Technologies VT83C572 USB Controller (rev. 0x61)
uhci0: interrupting at ioapic1 pin 2 (irq 11)
usb0 at uhci0: USB revision 1.0
uhci1 at pci0 dev 2 function 1: VIA Technologies VT83C572 USB Controller (rev. 0x61)
uhci1: interrupting at ioapic1 pin 3 (irq 11)
usb1 at uhci1: USB revision 1.0
ehci0 at pci0 dev 2 function 2: VIA Technologies VT8237 EHCI USB Controller (rev. 0x63)
ehci0: interrupting at ioapic1 pin 2 (irq 11)
ehci0: BIOS has given up ownership
ehci0: EHCI version 1.0
ehci0: companion controllers, 2 ports each: uhci0 uhci1
usb2 at ehci0: USB revision 2.0
ohci0 at pci0 dev 15 function 2: ServerWorks OSB4/CSB5 USB Host Controller (rev. 0x04)
ohci0: interrupting at ioapic1 pin 1 (irq 9)
ohci0: OHCI version 1.0, legacy support
usb3 at ohci0: USB revision 1.0

The "Intel" machine:

ehci0 at pci0 dev 29 function 7: Intel 82801EB/ER USB EHCI Controller (rev. 0x02)
ehci0: interrupting at ioapic0 pin 23 (irq 5)
ehci0: BIOS has given up ownership
ehci0: EHCI version 1.0
ehci0: companion controllers, 2 ports each: uhci0 uhci1 uhci2 uhci3
usb4 at ehci0: USB revision 2.0

The "umass":

umass0 at uhub5 port 3 configuration 1 interface 0
umass0: Genesys Logic USB TO IDE, rev 2.00/0.33, addr 3
umass0: using SCSI over Bulk-Only
scsibus0 at umass0: 2 targets, 1 lun per target
sd0 at scsibus0 target 0 lun 0: <IC25N080, ATMR04-0, 0811> disk fixed
sd0: fabricating a geometry
sd0: 76319 MB, 76319 cyl, 64 head, 32 sec, 512 bytes/sect x 156301488 sectors
sd0: fabricating a geometry




System: NetBSD circe.entropie.net 3.99.15 NetBSD 3.99.15 (CIRCE.LEAN) #0: Sat Jan 21 14:53:01 CET 2006 root@circe.entropie.net:/src/obj/sys/arch/i386/compile/CIRCE.LEAN i386
Architecture: i386
Machine: i386
>Description:

Stressing a umass device attached to an ehci controller can make the device
stall permanently. Adding the ..broken_intr_workaround makes the problem harder
to reproduce, but it is still possible.

Note: the output below is from the Intel controller. The workaround as
imported for VIA controllers was/is also active for this Intel controller
(by means of a patched ehci_pci, as suggested by Juan RP)!

Symptoms: At some point the umass begins stalling and never recovers.
The stalls make it impossible for the system to sync the drive, and thus also
impossible to umount it. It is furthermore impossible to kill anything
trying to write to the drive (the processes are stuck in biowait / ffsync),
not even with kill -9.
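
To see which processes are wedged, the process list can be filtered for
uninterruptible sleepers. The following is only a sketch: the helper name
stuck_procs is made up for illustration, and it assumes ps output whose first
line is a header and whose second column is the process state.

```sh
# stuck_procs: hypothetical helper, not part of any system tool.
# Reads ps output on stdin (header line first, state in column 2) and keeps
# the header plus every process in state D (uninterruptible disk wait).
stuck_procs() {
    awk 'NR == 1 || $2 ~ /^D/'
}
```

Typical use would be "ps -axo pid,stat,wchan,command | stuck_procs"; processes
hit by this problem would be expected to show a wait channel such as biowait.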

Shutting down the system cleanly is impossible because the fs cannot be synced /
umounted. Hitting ctrl+alt+esc while it struggles and trying a 'sync' by hand
makes the machine hang completely (i.e. hard powerdown required).

This is reproducible with both Intel and VIA controllers (see dmesgs in Environment).

Output from an EHCIDEBUG + UMASSDEBUG kernel with ehcidebug = umassdebug = 0
on the machine with the Intel Controller:

umass0: BBB reset failed, TIMEOUT
umass0: BBB bulk-in clear stall failed, TIMEOUT
umass0: BBB bulk-out clear stall failed, TIMEOUT
umass0: BBB reset failed, TIMEOUT
umass0: BBB bulk-in clear stall failed, TIMEOUT
umass0: BBB bulk-out clear stall failed, TIMEOUT
umass0: BBB reset failed, TIMEOUT
umass0: BBB bulk-in clear stall failed, TIMEOUT
umass0: BBB bulk-out clear stall failed, TIMEOUT
umass0: BBB reset failed, TIMEOUT
umass0: BBB bulk-in clear stall failed, TIMEOUT
umass0: BBB bulk-out clear stall failed, TIMEOUT
....(ad nauseam)
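
How often the cycle has repeated can be tallied from the kernel messages. A
minimal sketch, assuming the messages are fed on stdin (for example from
dmesg); the function name count_bbb_failures is made up for illustration:

```sh
# count_bbb_failures: hypothetical helper that counts the three umass BBB
# failure messages seen on stdin and prints one summary line.
count_bbb_failures() {
    awk '
        /BBB reset failed/                { reset++ }
        /BBB bulk-in clear stall failed/  { bin++ }
        /BBB bulk-out clear stall failed/ { bout++ }
        END { printf "reset=%d bulk-in=%d bulk-out=%d\n", reset, bin, bout }
    '
}
```

For example, "dmesg | count_bbb_failures" prints a single line with the three
counters; roughly equal counts suggest the reset / clear-stall cycle shown
above is simply repeating.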

Output from an EHCIDEBUG + UMASSDEBUG kernel with ehcidebug = umassdebug = 1
on the machine with the Intel Controller (WITHOUT WORKAROUND):

ehci0 at pci0 dev 29 function 7: Intel 82801EB/ER USB EHCI Controller (rev. 0x02)
ehci0: offs=32
ehci0: interrupting at ioapic0 pin 23 (irq 5)
ehci_pci_attach: companion uhci0
ehci_pci_attach: companion uhci1
ehci_pci_attach: companion uhci2
ehci_pci_attach: companion uhci3
ehci_dump_caps: legsup=0x00000001 legctlsts=0xc0000000
ehci0: BIOS has given up ownership
ehci_init: start
ehci0: EHCI version 1.0
ehci_init: sparams=0x104208
ehci0: companion controllers, 2 ports each: uhci0 uhci1 uhci2 uhci3
ehci_init: cparams=0x6871
ehci0: resetting
ehci0: flsize=1024
QH(0xcaeafe40) at 0x01914e40:
  link=0x01914e42<QH>
  endp=0x0000a000
    addr=0x00 inact=0 endpt=0 eps=2 dtc=0 hrecl=1
    mpl=0x0 ctl=0 nrl=0
  endphub=0x00000000
    smask=0x00 cmask=0x00 huba=0x00 port=0 mult=0
  curqtd=0x00000001<T>
Overlay qTD:
  next=0x00000001<T> altnext=0x00000001<T>
  status=0x00000040: toggle=0 bytes=0x0 ioc=0 c_page=0x0
    cerr=0 pid=0 stat=0x40<HALTED>
  buffer[0]=0x00000000
  buffer[1]=0x00000000
  buffer[2]=0x00000000
  buffer[3]=0x00000000
  buffer[4]=0x00000000
usb4 at ehci0: USB revision 2.0
(...)
(mount)
ehci_alloc_sqtd_chain: start len=65536
....
(start writing [rsync])
ehci_alloc_sqtd_chain: start len=65536
....
(start reading, too [*] * 1)
umass0: BBB reset failed, IOERROR
ehci_device_clear_toggle: epipe=0xc178da80 status=0x8c008148
umass0: BBB bulk-in clear stall failed, IOERROR
ehci_device_clear_toggle: epipe=0xc178d080 status=0x8c00
umass0: BBB bulk-out clear stall failed, IOERROR
umass0: BBB reset failed, IOERROR
ehci_device_clear_toggle: epipe=0xc178da80 status=0x8c008148
umass0: BBB bulk-in clear stall failed, IOERROR
ehci_device_clear_toggle: epipe=0xc178d080 status=0x1f8049
umass0: BBB bulk-out clear stall failed, IOERROR
umass0: BBB reset failed, IOERROR
ehci_device_clear_toggle: epipe=0xc178da80 status=0x8c008148
umass0: BBB bulk-in clear stall failed, IOERROR
ehci_device_clear_toggle: epipe=0xc178d080 status=0x1f8049
umass0: BBB bulk-out clear stall failed, IOERROR
umass0: BBB reset failed, IOERROR
...
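
The status words in the output above can be split into fields by hand. The
sketch below assumes that the status printed by ehci_device_clear_toggle is
the qTD token of the QH overlay, and uses the token layout from the EHCI 1.0
specification (toggle in bit 31, byte count in bits 30:16, ioc in bit 15,
c_page in bits 14:12, cerr in bits 11:10, pid in bits 9:8, status in bits
7:0). The function name and the flag labels are made up for illustration.

```sh
# decode_qtd_status: hypothetical helper that splits an EHCI qTD token
# (given as a hex or decimal number) into its fields.
decode_qtd_status() {
    s=$(( $1 ))
    stat=$(( $s & 0xff ))
    flags=""
    # status bits from bit 7 down to bit 0, written as value:label
    for pair in 128:ACTIVE 64:HALTED 32:BUFERR 16:BABBLE \
                8:XACTERR 4:MISSED 2:SPLIT 1:PING; do
        bit=${pair%%:*}
        name=${pair#*:}
        if [ $(( $stat & $bit )) -ne 0 ]; then
            flags="${flags:+$flags,}$name"
        fi
    done
    printf 'toggle=%d bytes=%d ioc=%d c_page=%d cerr=%d pid=%d stat=0x%02x<%s>\n' \
        $(( ($s >> 31) & 1 )) $(( ($s >> 16) & 0x7fff )) $(( ($s >> 15) & 1 )) \
        $(( ($s >> 12) & 7 )) $(( ($s >> 10) & 3 )) $(( ($s >> 8) & 3 )) \
        "$stat" "$flags"
}
```

Under those assumptions, "decode_qtd_status 0x8c008148" reports pid=1 (IN)
with stat=0x48<HALTED,XACTERR>, and "decode_qtd_status 0x1f8049" reports
pid=0 (OUT) with stat=0x49<HALTED,XACTERR,PING>, i.e. both bulk pipes are
halted after a transaction error.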

Output from an EHCIDEBUG + UMASSDEBUG kernel with ehcidebug = umassdebug = 1
on the machine with the Intel Controller (*WITH* WORKAROUND):

(mount, start writing [rsync])
(also have reader [*] * 2, 3 at times run)
(all fine so far. Ok Seems stressable. Stop those [*]s.)
ehci_intrlist_timeout
ehci_alloc_sqtd_chain: start len=65536

(and here xmms tries to read the "next" mp3 from the umass)

ehci_alloc_sqtd_chain: start len=65536
ehci_intrlist_timeout
...(about 50 follow)...
ehci_intrlist_timeout
ehci_timeout: exfer=0xc1654900
ehci_timeout_task: xfer=0xc1654900
ehci_abort_xfer: xfer=0xc1654900 pipe=0xc1660000
ehci_intr1: door bell
ehci_intrlist_timeout
...(more follow)...
ehci_intrlist_timeout
ehci_timeout: exfer=0xc168a200
ehci_intrlist_timeout
ehci_timeout_task: xfer=0xc168a200
ehci_abort_xfer: xfer=0xc168a200 pipe=0xc1660180
ehci_intr1: door bell
ehci_idone: aborted xfer=0xc168a200
umass0: BBB reset failed, TIMEOUT
ehci_device_clear_toggle: epipe=0xc1660000 status=0x6008d80
...
ehci_intrlist_timeout
ehci_timeout: exfer=0xc168af00
ehci_intrlist_timeout
ehci_timeout_task: xfer=0xc168af00
ehci_abort_xfer: xfer=0xc168af00 pipe=0xc1660180
ehci_intr1: door bell
ehci_idone: aborted xfer=0xc168af00
umass0: BBB bulk-in clear stall failed, TIMEOUT
ehci_device_clear_toggle: epipe=0xc1660380 status=0x8c00
ehci_intrlist_timeout
....
ehci_intrlist_timeout
ehci_timeout: exfer=0xc1654300
ehci_timeout_task: xfer=0xc1654300
ehci_abort_xfer: xfer=0xc1654300 pipe=0xc1660180
ehci_intr1: door bell
umass0: BBB bulk-out clear stall failed, TIMEOUT
ehci_intrlist_timeout




Basically the same happens with the VIA controller (although due to lack of
syncing I lost the debug output when I had to kill the machine).

I *do* have a dump of a kernel which could not shut down cleanly (don't ask
how I managed to create that one), which can of course be made available to
interested parties.

See also:
	http://mail-index.netbsd.org/current-users/2006/01/23/0014.html
	http://mail-index.netbsd.org/current-users/2006/01/24/0007.html

And
	kern/26681 (which probably should be re-opened)

>How-To-Repeat:
I have a umass with a lot of files two directory levels deep, and thus use:
(ksh speak)
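stress() {
	x="$1"				# tag for this stresser's junk file
	while :; do
		for i in */*/*; do
			# pick roughly every fifth file at random
			[ $(( $RANDOM % 5 )) -eq 3 ] || continue
			cat "$i" > /tmp/junk.$x
		done
		echo "Run $x done.."
		sleep $(( $RANDOM % 30 ))	# pause 0-29 seconds between runs
	done
}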

Run a couple of these stress()ers on the drive. Additionally I run the
following two commands:

rsync -avz /usr/src .

in some subdir on the drive as well as

while :; do
	dd if=/dev/zero of=junk.img bs=1m count=10240 progress=1
	rm -f junk.img
done

Just leave these running. After anywhere from "some seconds" to "some hours"
the drive will stumble.
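
Starting "a couple" of stressers by hand gets tedious; a small helper can do
it in one go. This is only a sketch, and the name start_workers is made up
for illustration. It starts the given command COUNT times in the background,
passes the worker index as the last argument, and prints each PID.

```sh
# start_workers COUNT CMD [ARGS...]: hypothetical helper that starts COUNT
# background copies of CMD, appending the worker index (1..COUNT) as the
# last argument, and prints the PID of each copy on its own line.
start_workers() {
    count=$1
    shift
    i=1
    while [ "$i" -le "$count" ]; do
        "$@" "$i" &
        echo "$!"
        i=$(( i + 1 ))
    done
}
```

In the shell where stress() is defined, "start_workers 3 stress" would start
three stressers writing to /tmp/junk.1 through /tmp/junk.3.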

Note: I am not sure whether the drive itself is just "broken" and is
experiencing something like a "**ide: lost interrupt". Either way, the system
shouldn't go down like that.

>Fix:
Uh, dunno.


>Release-Note:

>Audit-Trail:
From: Carl Brewer <carl@bl.echidna.id.au>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/32626
Date: Thu, 13 Apr 2006 17:00:57 +1000

 does this relate at all to kern/31003 ?

From: "Martin S. Weber" <Ephaeton@gmx.net>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
	netbsd-bugs@netbsd.org
Subject: Re: kern/32626
Date: Sat, 15 Apr 2006 21:20:04 +0200

 Hoi,

 On Thu, Apr 13, 2006 at 07:05:16AM +0000, Carl Brewer wrote:
 > The following reply was made to PR kern/32626; it has been noted by GNATS.
 > 
 > From: Carl Brewer <carl@bl.echidna.id.au>
 > To: gnats-bugs@NetBSD.org
 > Cc: 
 > Subject: Re: kern/32626
 > Date: Thu, 13 Apr 2006 17:00:57 +1000
 > 
 >  does this relate at all to kern/31003 ?

 Possibly; at least the messages are the same, it matches one of the panics I
 observed before I sent a PR (or any message to the mailing list), and my HDD
 also has no external power supply.

 Then again the cause for mine is totally different it seems - I'm not playing
 laptop games, I'm talking about desktop machines which keep running all the
 time and seemingly stumble over a "slow read", "bad sector" or whatever evil
 of the hdd I have.

 So ... 50/50 ? As no one ever looked into 32626 before, no idea :)

 Btw, I'm *still* volunteering to assist with whatever testing that might help!

 Regards,

 -Martin

From: "Martin S. Weber" <Ephaeton@gmx.net>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/32626
Date: Wed, 29 Feb 2012 20:29:46 -0500

 PR can be closed. It is over 6 years old and the window of opportunity
 to dig deeper into this has closed some 4 years ago. I no longer have
 access to the machine(s) in question, nor the hardware that was used
 to trigger the condition(s). This bug will thus remain a mystery.

State-Changed-From-To: open->closed
State-Changed-By: wiz@NetBSD.org
State-Changed-When: Thu, 01 Mar 2012 10:38:47 +0000
State-Changed-Why:
Hardware gone, submitter requests PR to be closed.


From: "Jonathan A. Kollasch" <jakllsch@kollasch.net>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/32626: ehci + umass + stress = umass stall cycle
Date: Thu, 1 Mar 2012 06:34:26 -0600

 I have reproduced this bug within the last one to two years.

From: Thomas Klausner <wiz@NetBSD.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/32626: ehci + umass + stress = umass stall cycle
Date: Thu, 1 Mar 2012 15:12:02 +0100

 On Thu, Mar 01, 2012 at 12:35:02PM +0000, Jonathan A. Kollasch wrote:
 > From: "Jonathan A. Kollasch" <jakllsch@kollasch.net>
 > To: gnats-bugs@netbsd.org
 > Subject: Re: kern/32626: ehci + umass + stress = umass stall cycle
 > Date: Thu, 1 Mar 2012 06:34:26 -0600
 > 
 >  I have reproduced this bug within the last one to two years.

 Hm. I think from a GNATS point of view it's probably best if you
 submit a similar bug report referencing this one, otherwise I don't
 know how to get Martin out of the loop (who's probably not interested
 in all the details of this now).
  Thomas

From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/32626: ehci + umass + stress = umass stall cycle
Date: Thu, 1 Mar 2012 21:52:03 +0000

 On Thu, Mar 01, 2012 at 02:15:03PM +0000, Thomas Klausner wrote:
  >  On Thu, Mar 01, 2012 at 12:35:02PM +0000, Jonathan A. Kollasch wrote:
  >  > From: "Jonathan A. Kollasch" <jakllsch@kollasch.net>
  >  > To: gnats-bugs@netbsd.org
  >  > Subject: Re: kern/32626: ehci + umass + stress = umass stall cycle
  >  > Date: Thu, 1 Mar 2012 06:34:26 -0600
  >  > 
  >  >  I have reproduced this bug within the last one to two years.
  >  
  >  Hm. I think from a GNATS point of view it's probably best if you
  >  submit a similar bug report referencing this one, otherwise I don't
  >  know how to get Martin out of the loop (who's probably not interested
  >  in all the details of this now).

 There is no useful way, at least for now. I'm not sure that creating a
 new PR is really the best solution though.

 -- 
 David A. Holland
 dholland@netbsd.org

From: "Martin S. Weber" <Ephaeton@gmx.net>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/32626: ehci + umass + stress = umass stall cycle
Date: Thu, 1 Mar 2012 23:15:54 -0500

 On Thu, Mar 01, 2012 at 09:55:01PM +0000, David Holland wrote:
 > The following reply was made to PR kern/32626; it has been noted by GNATS.
 > 
 > From: David Holland <dholland-bugs@netbsd.org>
 > To: gnats-bugs@NetBSD.org
 > Cc: 
 > Subject: Re: kern/32626: ehci + umass + stress = umass stall cycle
 > Date: Thu, 1 Mar 2012 21:52:03 +0000
 > 
 >  On Thu, Mar 01, 2012 at 02:15:03PM +0000, Thomas Klausner wrote:
 >   >  On Thu, Mar 01, 2012 at 12:35:02PM +0000, Jonathan A. Kollasch wrote:
 >   >  > From: "Jonathan A. Kollasch" <jakllsch@kollasch.net>
 >   >  > To: gnats-bugs@netbsd.org
 >   >  > Subject: Re: kern/32626: ehci + umass + stress = umass stall cycle
 >   >  > Date: Thu, 1 Mar 2012 06:34:26 -0600
 >   >  > 
 >   >  >  I have reproduced this bug within the last one to two years.
 >   >  
 >   >  Hm. I think from a GNATS point of view it's probably best if you
 >   >  submit a similar bug report referencing this one, otherwise I don't
 >   >  know how to get Martin out of the loop (who's probably not interested
 >   >  in all the details of this now).
 >  
 >  There is no useful way, at least for now. I'm not sure that creating a
 >  new PR is really the best solution though.

 If you still see this, keep it open and append, I don't mind the noise :)

 -Martin

State-Changed-From-To: closed->open
State-Changed-By: wiz@NetBSD.org
State-Changed-When: Fri, 02 Mar 2012 09:39:39 +0000
State-Changed-Why:
"Jonathan A. Kollasch" <jakllsch@kollasch.net> still sees this.


State-Changed-From-To: open->closed
State-Changed-By: mbalmer@NetBSD.org
State-Changed-When: Sat, 03 Mar 2012 09:40:13 +0000
State-Changed-Why:
Closed at submitter's request.


State-Changed-From-To: closed->open
State-Changed-By: mbalmer@NetBSD.org
State-Changed-When: Sat, 03 Mar 2012 09:42:25 +0000
State-Changed-Why:
re-open.  I missed the discussion that followed the submitter's request
to close this PR.


From: Matteo Beccati <php@beccati.com>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/32626
Date: Mon, 15 Apr 2013 09:28:20 +0200

 Hi,

 after upgrading to 6.1 RC2, I've started experiencing the same issue.
 Every night, during the generation of the daily report, the USB hard
 drive generates quite a lot of these errors:

 umass0: BBB reset failed, TIMEOUT
 umass0: BBB bulk-in clear stall failed, TIMEOUT
 umass0: BBB bulk-out clear stall failed, TIMEOUT

 After that I cannot cleanly shutdown, nor kill -9 some processes. I
 don't recall this happening with 6.0, but maybe the drive just had less
 data (i.e. there are backup copies of the old filesystems now).

 Here's a dmesg output for reference.

 NetBSD 6.1_RC2 (GENERIC)
 total memory = 2685 MB
 avail memory = 2628 MB
 timecounter: Timecounters tick every 10.000 msec
 timecounter: Timecounter "i8254" frequency 1193182 Hz quality 100
 System manufacturer System Product Name (System Version)
 mainbus0 (root)
 ACPI Warning: Optional field Pm2ControlBlock has zero address or length: 0x0000000000000000/0x1 (20110623/tbfadt-586)
 cpu0 at mainbus0 apid 0: AMD E-350 Processor, id 0x500f10
 cpu1 at mainbus0 apid 1: AMD E-350 Processor, id 0x500f10
 ioapic0 at mainbus0 apid 0: pa 0xfec00000, version 21, 24 pins
 acpi0 at mainbus0: Intel ACPICA 20110623
 acpi0: X/RSDT: OemId <ALASKA,   A M I,01072009>, AslId <AMI ,00010013>
 ACPI Error: [RAMB] Namespace lookup failure, AE_NOT_FOUND (20110623/psargs-392)
 ACPI Exception: AE_NOT_FOUND, Could not execute arguments for [RAMW] (Region) (20110623/nsinit-380)
 ioapic0 reenabling
 acpi0: SCI interrupting at int 9
 timecounter: Timecounter "ACPI-Fast" frequency 3579545 Hz quality 1000
 hpet0 at acpi0: high precision event timer (mem 0xfed00000-0xfed00400)
 timecounter: Timecounter "hpet0" frequency 14318180 Hz quality 2000
 AMDN (PNP0C01) at acpi0 not configured
 S800 (PNP0C02) at acpi0 not configured
 SIO1 (PNP0C02) at acpi0 not configured
 attimer1 at acpi0 (TMR, PNP0100): io 0x40-0x43 irq 0
 pcppi1 at acpi0 (SPKR, PNP0800): io 0x61
 midi0 at pcppi1: PC speaker
 sysbeep0 at pcppi1
 RMSC (PNP0C02) at acpi0 not configured
 npx1 at acpi0 (COPR, PNP0C04): io 0xf0-0xff irq 13
 npx1: reported by CPUID; using exception 16
 NBRM (PNP0C02) at acpi0 not configured
 UAR1 (PNP0501) at acpi0 not configured
 BROD (PNP0C02) at acpi0 not configured
 acpibut0 at acpi0 (PWRB, PNP0C0C-170): ACPI Power Button
 RMEM (PNP0C01) at acpi0 not configured
 OMSC (PNP0C02) at acpi0 not configured
 acpiwmi0 at acpi0 (AMW0, PNP0C14-ASUSWMI): ACPI WMI Interface
 wmieeepc0 at acpiwmi0: Asus Eee PC WMI mappings
 apm0 at acpi0: Power Management spec V1.2
 attimer1: attached to pcppi1
 pci0 at mainbus0 bus 0: configuration mode 1
 pci0: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
 pchb0 at pci0 dev 0 function 0: vendor 0x1022 product 0x1510 (rev. 0x00)
 vga1 at pci0 dev 1 function 0: vendor 0x1002 product 0x9802 (rev. 0x00)
 wsdisplay0 at vga1 kbdmux 1: console (80x25, vt100 emulation)
 wsmux1: connecting to wsdisplay0
 drm at vga1 not configured
 hdaudio0 at pci0 dev 1 function 1: HD Audio Controller
 hdaudio0: interrupting at ioapic0 pin 19
 hdafg0 at hdaudio0: ATI R6xx HDMI
 hdafg0: HDMI00 2ch: Digital Out [Jack]
 hdafg0: 2ch/0ch 32000Hz 44100Hz 48000Hz PCM16 AC3
 ppb0 at pci0 dev 4 function 0: vendor 0x1022 product 0x1512 (rev. 0x00)
 ppb0: PCI Express 2.0 <Root Port of PCI-E Root Complex>
 pci1 at ppb0 bus 1
 pci1: i/o space, memory space enabled, rd/line, wr/inv ok
 ahcisata0 at pci0 dev 17 function 0: vendor 0x1002 product 0x4391 (rev. 0x40)
 ahcisata0: interrupting at ioapic0 pin 19
 ahcisata0: AHCI revision 1.20, 6 ports, 32 slots, CAP 0xf732ff05<PSC,SSC,PMD,SPM,ISS=0x3=Gen3,SCLO,SAL,SALP,SMPS,SSNTF,SNCQ,S64A>
 atabus0 at ahcisata0 channel 0
 atabus1 at ahcisata0 channel 1
 atabus2 at ahcisata0 channel 2
 atabus3 at ahcisata0 channel 3
 atabus4 at ahcisata0 channel 4
 atabus5 at ahcisata0 channel 5
 ohci0 at pci0 dev 18 function 0: vendor 0x1002 product 0x4397 (rev. 0x00)
 ohci0: interrupting at ioapic0 pin 18
 ohci0: OHCI version 1.0, legacy support
 usb0 at ohci0: USB revision 1.0
 ehci0 at pci0 dev 18 function 2: vendor 0x1002 product 0x4396 (rev. 0x00)
 ehci0: interrupting at ioapic0 pin 17
 ehci0: dropped intr workaround enabled
 ehci0: BIOS has given up ownership
 ehci0: EHCI version 1.0
 ehci0: companion controller, 5 ports each: ohci0
 usb1 at ehci0: USB revision 2.0
 ohci1 at pci0 dev 19 function 0: vendor 0x1002 product 0x4397 (rev. 0x00)
 ohci1: interrupting at ioapic0 pin 18
 ohci1: OHCI version 1.0, legacy support
 usb2 at ohci1: USB revision 1.0
 ehci1 at pci0 dev 19 function 2: vendor 0x1002 product 0x4396 (rev. 0x00)
 ehci1: interrupting at ioapic0 pin 17
 ehci1: dropped intr workaround enabled
 ehci1: BIOS has given up ownership
 ehci1: EHCI version 1.0
 ehci1: companion controller, 5 ports each: ohci1
 usb3 at ehci1: USB revision 2.0
 piixpm0 at pci0 dev 20 function 0: vendor 0x1002 product 0x4385 (rev. 0x42)
 piixpm0: polling (SB800)
 iic0 at piixpm0: I2C bus
 pcib0 at pci0 dev 20 function 3: vendor 0x1002 product 0x439d (rev. 0x40)
 ppb1 at pci0 dev 20 function 4: vendor 0x1002 product 0x4384 (rev. 0x40)
 pci2 at ppb1 bus 2
 pci2: i/o space, memory space enabled
 ohci2 at pci0 dev 20 function 5: vendor 0x1002 product 0x4399 (rev. 0x00)
 ohci2: interrupting at ioapic0 pin 18
 ohci2: OHCI version 1.0, legacy support
 usb4 at ohci2: USB revision 1.0
 ppb2 at pci0 dev 21 function 0: vendor 0x1002 product 0x43a0 (rev. 0x00)
 ppb2: PCI Express 2.0 <Root Port of PCI-E Root Complex>
 pci3 at ppb2 bus 3
 pci3: i/o space, memory space enabled, rd/line, wr/inv ok
 ath0 at pci3 dev 0 function 0: Atheros 9285
 ath0: interrupting at ioapic0 pin 16
 ath0: 11b rates: 1Mbps 2Mbps 5.5Mbps 11Mbps
 ath0: 11g rates: 1Mbps 2Mbps 5.5Mbps 11Mbps 6Mbps 9Mbps 12Mbps 18Mbps 24Mbps 36Mbps 48Mbps 54Mbps
 ath0: mac 192.2 phy 14.0 radio 12.0
 ppb3 at pci0 dev 21 function 1: vendor 0x1002 product 0x43a1 (rev. 0x00)
 ppb3: PCI Express 2.0 <Root Port of PCI-E Root Complex>
 pci4 at ppb3 bus 4
 pci4: i/o space, memory space enabled, rd/line, wr/inv ok
 re0 at pci4 dev 0 function 0: RealTek 8168/8111 PCIe Gigabit Ethernet (rev. 0x06)
 re0: interrupting at ioapic0 pin 17
 re0: Ethernet address 14:da:e9:6e:3b:9d
 re0: using 256 tx descriptors
 rgephy0 at re0 phy 7: RTL8169S/8110S/8211 1000BASE-T media interface, rev. 4
 rgephy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
 ppb4 at pci0 dev 21 function 2: vendor 0x1002 product 0x43a2 (rev. 0x00)
 ppb4: PCI Express 2.0 <Root Port of PCI-E Root Complex>
 pci5 at ppb4 bus 5
 pci5: i/o space, memory space enabled, rd/line, wr/inv ok
 vendor 0x1033 product 0x0194 (USB serial bus, interface 0x30, revision 0x03) at pci5 dev 0 function 0 not configured
 ppb5 at pci0 dev 21 function 3: vendor 0x1002 product 0x43a3 (rev. 0x00)
 ppb5: PCI Express 2.0 <Root Port of PCI-E Root Complex>
 pci6 at ppb5 bus 6
 pci6: i/o space, memory space enabled, rd/line, wr/inv ok
 vendor 0x1033 product 0x0194 (USB serial bus, interface 0x30, revision 0x03) at pci6 dev 0 function 0 not configured
 ohci3 at pci0 dev 22 function 0: vendor 0x1002 product 0x4397 (rev. 0x00)
 ohci3: interrupting at ioapic0 pin 18
 ohci3: OHCI version 1.0, legacy support
 usb5 at ohci3: USB revision 1.0
 ehci2 at pci0 dev 22 function 2: vendor 0x1002 product 0x4396 (rev. 0x00)
 ehci2: interrupting at ioapic0 pin 17
 ehci2: dropped intr workaround enabled
 ehci2: BIOS has given up ownership
 ehci2: EHCI version 1.0
 ehci2: companion controller, 4 ports each: ohci3
 usb6 at ehci2: USB revision 2.0
 pchb1 at pci0 dev 24 function 0: vendor 0x1022 product 0x1700 (rev. 0x43)
 pchb2 at pci0 dev 24 function 1: vendor 0x1022 product 0x1701 (rev. 0x00)
 pchb3 at pci0 dev 24 function 2: vendor 0x1022 product 0x1702 (rev. 0x00)
 pchb4 at pci0 dev 24 function 3: vendor 0x1022 product 0x1703 (rev. 0x00)
 amdtemp0 at pchb4: AMD CPU Temperature Sensors (Family14h)
 pchb5 at pci0 dev 24 function 4: vendor 0x1022 product 0x1704 (rev. 0x00)
 pchb6 at pci0 dev 24 function 5: vendor 0x1022 product 0x1718 (rev. 0x00)
 pchb7 at pci0 dev 24 function 6: vendor 0x1022 product 0x1716 (rev. 0x00)
 pchb8 at pci0 dev 24 function 7: vendor 0x1022 product 0x1719 (rev. 0x00)
 isa0 at pcib0
 com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, working fifo
 pckbc0 at isa0 port 0x60-0x64
 acpicpu0 at cpu0: ACPI CPU
 acpicpu0: C1: HLT, lat   0 us, pow     0 mW
 acpicpu0: C2: I/O, lat 100 us, pow     0 mW
 acpicpu0: P0: FFH, lat   1 us, pow  4940 mW, 1600 MHz
 acpicpu0: P1: FFH, lat   1 us, pow  3282 mW, 1280 MHz
 acpicpu0: P2: FFH, lat   1 us, pow  1400 mW,  800 MHz
 acpicpu0: T0: I/O, lat   1 us, pow     0 mW, 100 %
 acpicpu0: T1: I/O, lat   1 us, pow     0 mW,  88 %
 acpicpu0: T2: I/O, lat   1 us, pow     0 mW,  76 %
 acpicpu0: T3: I/O, lat   1 us, pow     0 mW,  64 %
 acpicpu0: T4: I/O, lat   1 us, pow     0 mW,  52 %
 acpicpu0: T5: I/O, lat   1 us, pow     0 mW,  40 %
 acpicpu0: T6: I/O, lat   1 us, pow     0 mW,  28 %
 acpicpu0: T7: I/O, lat   1 us, pow     0 mW,  16 %
 acpicpu1 at cpu1: ACPI CPU
 timecounter: Timecounter "clockinterrupt" frequency 100 Hz quality 0
 timecounter: Timecounter "TSC" frequency 1600616250 Hz quality 3000
 uhub0 at usb0: vendor 0x1002 OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
 uhub0: 5 ports with 5 removable, self powered
 uhub1 at usb1: vendor 0x1002 EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
 uhub1: 5 ports with 5 removable, self powered
 uhub2 at usb2: vendor 0x1002 OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
 uhub2: 5 ports with 5 removable, self powered
 uhub3 at usb3: vendor 0x1002 EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
 uhub3: 5 ports with 5 removable, self powered
 uhub4 at usb4: vendor 0x1002 OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
 uhub4: 2 ports with 2 removable, self powered
 uhub5 at usb5: vendor 0x1002 OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
 uhub5: 4 ports with 4 removable, self powered
 uhub6 at usb6: vendor 0x1002 EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
 uhub6: 4 ports with 4 removable, self powered
 ahcisata0 port 0: device present, speed: 3.0Gb/s
 wd0 at atabus0 drive 0
 wd0: <KINGSTON SNV425S264GB>
 wd0: drive supports 16-sector PIO transfers, LBA48 addressing
 wd0: 61057 MB, 124053 cyl, 16 head, 63 sec, 512 bytes/sect x 125045424 sectors
 wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100)
 wd0(ahcisata0:0:0): using PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100) (using DMA)
 umass0 at uhub1 port 3 configuration 1 interface 0
 umass0: JMicron USB to ATA/ATAPI bridge, rev 2.00/1.00, addr 2
 umass0: using SCSI over Bulk-Only
 scsibus0 at umass0: 2 targets, 1 lun per target
 sd0 at scsibus0 target 0 lun 0: <TOSHIBA, MK6459GSX, > disk fixed
 sd0: 596 GB, 16383 cyl, 16 head, 63 sec, 512 bytes/sect x 1250263728 sectors
 aubtfwl0 at uhub5 port 3
 uhub7 at uhub1 port 5: vendor 0x0409 product 0x005a, class 9/0, rev 2.00/1.00, addr 3
 uhub7: single transaction translator
 uhub7: 4 ports with 4 removable, self powered
 uhidev0 at uhub7 port 3 configuration 1 interface 0
 uhidev0: Logitech USB Receiver, rev 2.00/12.01, addr 4, iclass 3/1
 ukbd0 at uhidev0
 wskbd0 at ukbd0: console keyboard, using wsdisplay0
 uhidev1 at uhub7 port 3 configuration 1 interface 1
 uhidev1: Logitech USB Receiver, rev 2.00/12.01, addr 4, iclass 3/1
 uhidev1: 8 report ids
 ums0 at uhidev1 reportid 2: 16 buttons, W and Z dirs
 wsmouse0 at ums0 mux 0
 uhid0 at uhidev1 reportid 3: input=4, output=0, feature=0
 uhid1 at uhidev1 reportid 4: input=1, output=0, feature=0
 uhid2 at uhidev1 reportid 8: input=1, output=0, feature=0
 uhidev2 at uhub7 port 3 configuration 1 interface 2
 uhidev2: Logitech USB Receiver, rev 2.00/12.01, addr 4, iclass 3/0
 uhidev2: 33 report ids
 uhid3 at uhidev2 reportid 16: input=6, output=6, feature=0
 uhid4 at uhidev2 reportid 17: input=19, output=19, feature=0
 uhid5 at uhidev2 reportid 32: input=14, output=14, feature=0
 uhid6 at uhidev2 reportid 33: input=31, output=31, feature=0
 uhidev3 at uhub7 port 4 configuration 1 interface 0
 uhidev3: Logitech USB Keyboard, rev 1.10/79.00, addr 5, iclass 3/1
 ukbd1 at uhidev3
 wskbd1 at ukbd1 mux 1
 wskbd1: connecting to wsdisplay0
 uhidev4 at uhub7 port 4 configuration 1 interface 1
 uhidev4: Logitech USB Keyboard, rev 1.10/79.00, addr 5, iclass 3/0
 uhidev4: 5 report ids
 uhid7 at uhidev4 reportid 3: input=2, output=0, feature=0
 uhid8 at uhidev4 reportid 4: input=1, output=0, feature=0
 uhid9 at uhidev4 reportid 5: input=0, output=0, feature=5
 uhidev5 at uhub7 port 4 configuration 1 interface 2
 uhidev5: Logitech USB Keyboard, rev 1.10/79.00, addr 5, iclass 3/1
 ums1 at uhidev5: 5 buttons and Z dir
 wsmouse1 at ums1 mux 0
 Kernelized RAIDframe activated
 boot device: wd0
 root on wd0a dumps on wd0b
 root file system type: ffs
 aubtfwl0: ath3k-1.fw open fail 2
 wsdisplay0: screen 1 added (80x25, vt100 emulation)
 wsdisplay0: screen 2 added (80x25, vt100 emulation)
 wsdisplay0: screen 3 added (80x25, vt100 emulation)
 wsdisplay0: screen 4 added (80x25, vt100 emulation)

From: "Martin S. Weber" <Ephaeton@gmx.net>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/32626
Date: Tue, 16 Apr 2013 12:31:27 -0400

 On Tue, Apr 16, 2013 at 04:25:03PM +0000, Matteo Beccati wrote:
 > The following reply was made to PR kern/32626; it has been noted by GNATS.
 > 
 > From: Matteo Beccati <php@beccati.com>
 > To: gnats-bugs@netbsd.org
 > Cc: 
 > Subject: Re: kern/32626
 > Date: Mon, 15 Apr 2013 09:28:20 +0200
 > 
 >  Hi,
 >  
 >  after upgrading to 6.1 RC2, I've started experiencing the same issue.
 >  Every night, during the generation of the daily report, the USB hard
 >  drive generates quite a lot of these errors:
 >  
 >  umass0: BBB reset failed, TIMEOUT
 >  umass0: BBB bulk-in clear stall failed, TIMEOUT
 >  umass0: BBB bulk-out clear stall failed, TIMEOUT
 >  
 >  After that I cannot cleanly shutdown, nor kill -9 some processes. I
 >  don't recall this happening with 6.0, but maybe the drive just had less
 >  data (i.e. there are backup copies of the old filesystems now).

 I believe this is simply really really badly broken hardware. Backup your data :)

 Regards,
 -Martin

From: Matteo Beccati <php@beccati.com>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/32626
Date: Wed, 17 Apr 2013 09:58:48 +0200

 > From: "Martin S. Weber" <Ephaeton%gmx.net@localhost>
 >  > From: Matteo Beccati <php%beccati.com@localhost>
 >  >  
 >  >  after upgrading to 6.1 RC2, I've started experiencing the same issue.
 >  >  Every night, during the generation of the daily report, the USB hard
 >  >  drive generates quite a lot of these errors:
 >  >  
 >  >  umass0: BBB reset failed, TIMEOUT
 >  >  umass0: BBB bulk-in clear stall failed, TIMEOUT
 >  >  umass0: BBB bulk-out clear stall failed, TIMEOUT
 >  >  
 >  >  After that I cannot cleanly shutdown, nor kill -9 some processes. I
 >  >  don't recall this happening with 6.0, but maybe the drive just had less
 >  >  data (i.e. there are backup copies of the old filesystems now).
 >  
 >  I believe this is simply really really badly broken hardware. Backup your data 
 > :)

 I would have expected a different message for disk faults...

 Anyway, I've picked up a new USB disk and started transferring data to
 it. Not very surprisingly, both HDs now generate the same kind of error
 when stressed, usually either one of the two starts triggering timeouts
 and the other one follows soon after.


 Cheers
 -- 
 Matteo Beccati

 Development & Consulting - http://www.beccati.com/

From: Matteo Beccati <php@beccati.com>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/32626
Date: Fri, 19 Apr 2013 11:04:23 +0200

 >  > From: "Martin S. Weber" <Ephaeton%gmx.net@localhost>
 >  >  I believe this is simply really really badly broken hardware. Backup your data 
 >  > :)

 Well, it turns out you were right. It's been two days since I've
 disconnected the old USB drive and I had no errors since then.

 What baffles me, is how the USB peripheral was able to make the system
 unstable and even prevent "kill -9" from working.

 Anyway, sorry for the noise.


 Cheers
 -- 
 Matteo Beccati

 Development & Consulting - http://www.beccati.com/

From: Matteo Beccati <php@beccati.com>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/32626
Date: Sat, 20 Apr 2013 20:41:06 +0200

 On 19/04/2013 11:05, Matteo Beccati wrote:
 >  Well, it turns out you were right. It's been two days since I've
 >  disconnected the old USB drive and I had no errors since then.

 The above was a bit premature. Last night's daily report triggered the
 error with the new drive too:

 Apr 20 03:19:43 epia /netbsd: umass0: BBB reset failed, TIMEOUT
 Apr 20 03:20:49 epia /netbsd: umass0: BBB bulk-in clear stall failed, TIMEOUT
 Apr 20 03:21:55 epia /netbsd: umass0: BBB bulk-out clear stall failed, TIMEOUT
 (... and so on ...)

 Could it be that the AMD FCH A50M (Hudson M1) is doing something funny,
 similarly to the VIA chipset from the OP?


 Cheers
 -- 
 Matteo Beccati

 Development & Consulting - http://www.beccati.com/

From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/32626
Date: Sun, 6 Oct 2013 11:51:24 +0000

 On Fri, Apr 19, 2013 at 09:05:03AM +0000, Matteo Beccati wrote:
  >  Well, it turns out you were right. It's been two days since I've
  >  disconnected the old USB drive and I had no errors since then.
  >  
  >  What baffles me, is how the USB peripheral was able to make the system
  >  unstable and even prevent "kill -9" from working.

 It shouldn't. But presumably there are one or more problems such that
 some set of responses a device can make puts the USB stack into a
 broken state, and then everything else is going to go down the drain
 too (slowly, or not so slowly, depending)... it would be good to
 figure out what, because infinite loops of "umass: BBB reset failed"
 and so forth appear in quite a few PRs.

 -- 
 David A. Holland
 dholland@netbsd.org

From: scole_mail <scole_mail@gmx.com>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/32626
Date: Fri, 6 Feb 2015 19:33:29 -0500

 Just in case anyone is still looking at this, I've got the same issue
 on my powermac 7200:

 NetBSD pm7200 7.99.4 NetBSD 7.99.4 (GENERIC-$Revision: 1.7 $)

 uhci0 at pci0 dev 13 function 0: VIA Technologies VT83C572 USB Controller (rev. 0x61)
 uhci0: interrupting at irq 23
 usb0 at uhci0: USB revision 1.0
 uhci1 at pci0 dev 13 function 1: VIA Technologies VT83C572 USB Controller (rev. 0x61)
 uhci1: interrupting at irq 23
 usb1 at uhci1: USB revision 1.0
 ehci0 at pci0 dev 13 function 2: VIA Technologies VT8237 EHCI USB Controller (rev. 0x63)
 ehci0: interrupting at irq 23
 ehci0: dropped intr workaround enabled
 ehci0: EHCI version 1.0
 ehci0: companion controllers, 2 ports each: uhci0 uhci1
 usb2 at ehci0: USB revision 2.0
 ...
 umass0 at uhub2 port 4 configuration 1 interface 0
 umass0: Kingston DataTraveler 2.0, rev 2.00/1.00, addr 2
 umass0: using SCSI over Bulk-Only
 scsibus1 at umass0: 2 targets, 1 lun per target

 umass0: BBB reset failed, TIMEOUT
 umass0: BBB bulk-in clear stall failed, TIMEOUT
 umass0: BBB bulk-out clear stall failed, TIMEOUT
 umass0: BBB reset failed, TIMEOUT
 umass0: BBB bulk-in clear stall failed, TIMEOUT
 umass0: BBB bulk-out clear stall failed, TIMEOUT

 I'm not positive it's related, but I seemed to get more errors after
 detaching a USB keyboard.

 Thanks

State-Changed-From-To: open->feedback
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Thu, 26 May 2022 03:46:56 +0000
State-Changed-Why:
Have you (or has anyone) seen problems of this kind recently? After many
rounds of umass fixes I think we're now pretty much on top of this.


From: "Martin S. Weber" <Ephaeton@gmx.net>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/32626 (ehci + umass + stress = umass stall cycle)
Date: Thu, 26 May 2022 07:33:51 +0200

 I haven't been able to observe this further.

 Regards,
 -M

State-Changed-From-To: feedback->closed
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Sat, 28 May 2022 05:27:01 +0000
State-Changed-Why:
Let's call it fixed.


>Unformatted:
 	..including the ehci_pci + ehci fixes to include the
 	BROKEN_INTR_WORKAROUND.

Copyright © 1994-2020 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.