NetBSD Problem Report #37884

From jakllsch@wormulon.kollasch.net  Sun Jan 27 16:06:24 2008
Return-Path: <jakllsch@wormulon.kollasch.net>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
	by narn.NetBSD.org (Postfix) with ESMTP id 9AB3A63BADF
	for <gnats-bugs@gnats.NetBSD.org>; Sun, 27 Jan 2008 16:06:24 +0000 (UTC)
Message-Id: <20080127160622.3FCD6FE9A8@wormulon.kollasch.net>
Date: Sun, 27 Jan 2008 16:06:22 +0000 (UTC)
From: jakllsch@kollasch.net
Reply-To: jakllsch@kollasch.net
To: gnats-bugs@gnats.NetBSD.org
Subject: nvidia ehci umass
X-Send-Pr-Version: 3.95

>Number:         37884
>Category:       kern
>Synopsis:       some umass devices stall on nvidia ehci controller
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Jan 27 16:10:00 +0000 2008
>Closed-Date:    Sat Jun 12 05:18:06 +0000 2010
>Last-Modified:  Sat Jun 12 05:18:06 +0000 2010
>Originator:     Jonathan A. Kollasch
>Release:        NetBSD 4.99.31
>Organization:

>Environment:
System: NetBSD wormulon.kollasch.net 4.99.31 NetBSD 4.99.31 (WORMULON) #27: Sat Jan 5 20:17:13 UTC 2008 root@wormulon.kollasch.net:/usr/src/sys/arch/amd64/compile/WORMULON amd64
Architecture: x86_64
Machine: amd64
>Description:

Either Nvidia EHCI controllers, or our ehci(4), has a quirk in it that
causes some umass devices to stall forever.

>How-To-Repeat:

Attach almost any USB 2.0 flash drive to an Nvidia EHCI controller.
Try to use a file system on it.  Watch it permanently stall.

USB 2.0 to IDE/SATA bridges (when used with real hard drives)
do not exhibit this issue, at least not in an easily repeatable
way.  When I used a PL-2507 USB 2.0 <-> PATA umass bridge
with a UDMA/66 CompactFlash card, it stalled too.

Depending on the method of access to the device, processes
accessing the device stall in biowait (cp) or physio (dd) states.

This has occurred for me on an nForce3 board,
as well as an nForce4 board.

In the same boxes a NEC or VIA EHCI add-on card works fine.

Also, the nvidia OHCI does not exhibit this issue,
but USB 1.1 is slower than a glacier.

>Fix:

I wish I knew.

Maybe a race condition?

>Release-Note:

>Audit-Trail:
From: "Jared D. McNeill" <jmcneill@invisible.ca>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, 
 netbsd-bugs@netbsd.org
Subject: Re: kern/37884: nvidia ehci umass
Date: Sun, 27 Jan 2008 11:19:02 -0500

 jakllsch@kollasch.net wrote:
 >> Environment:
 > System: NetBSD wormulon.kollasch.net 4.99.31 NetBSD 4.99.31 (WORMULON) #27: Sat Jan 5 20:17:13 UTC 2008 root@wormulon.kollasch.net:/usr/src/sys/arch/amd64/compile/WORMULON amd64

 This is quite an old kernel. Can you please confirm that the issue is 
 still present with a 4.99.50 kernel?

 Jared

From: Izumi Tsutsui <tsutsui@ceres.dti.ne.jp>
To: jmcneill@invisible.ca
Cc: gnats-bugs@NetBSD.org, kern-bug-people@NetBSD.org, gnats-admin@NetBSD.org,
        netbsd-bugs@NetBSD.org, tsutsui@ceres.dti.ne.jp
Subject: Re: kern/37884: nvidia ehci umass
Date: Mon, 28 Jan 2008 01:43:46 +0900

 jmcneill@invisible.ca wrote:

 > > System: NetBSD wormulon.kollasch.net 4.99.31 NetBSD 4.99.31 (WORMULON) #27: Sat Jan 5 20:17:13 UTC 2008 root@wormulon.kollasch.net:/usr/src/sys/arch/amd64/compile/WORMULON amd64
 > 
 > This is quite an old kernel. Can you please confirm that the issue is 
 > still present with a 4.99.50 kernel?

 Yes, the recent kernel still has the problem.

 According to Microsoft KB925528, NVIDIA EHCI seems to have a bug
 where it can't do DMA to memory above the 2G physical address.

 Maybe we could handle the quirk by MI bus_dmatag_subregion(9)
 like if_bce.c, but the next problem is that ehci(4) driver
 completely lacks bus_dmamap_sync(9) calls, as noted in
 sys/dev/usb/TODO.
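
The 2G hypothesis above can be modelled in userspace.  The following is
only a sketch: DMA_LIMIT and dma_range_ok() are hypothetical names, not
part of ehci(4) or bus_dma(9), and the 2 GiB value is taken from the
KB925528 description as quoted here.  In the kernel, the restriction
would be expressed through bus_dmatag_subregion(9) rather than a
hand-written check.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed limit, from the KB925528 description: 2 GiB. */
#define DMA_LIMIT ((uint64_t)1 << 31)

/*
 * Return true if the physical range [paddr, paddr + len) lies entirely
 * below DMA_LIMIT, i.e. a controller with this limitation could reach
 * it without a bounce buffer.
 */
static bool
dma_range_ok(uint64_t paddr, uint64_t len)
{
	/* Written this way so that paddr + len cannot overflow. */
	return paddr < DMA_LIMIT && len <= DMA_LIMIT - paddr;
}
```

A buffer for which dma_range_ok() returns false is the case that would
need bouncing into low memory before the controller touches it.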

 Any takers?
 ---
 Izumi Tsutsui

From: Martin Husemann <martin@duskware.de>
To: Izumi Tsutsui <tsutsui@ceres.dti.ne.jp>
Cc: jmcneill@invisible.ca, gnats-bugs@NetBSD.org,
	kern-bug-people@NetBSD.org, gnats-admin@NetBSD.org,
	netbsd-bugs@NetBSD.org
Subject: Re: kern/37884: nvidia ehci umass
Date: Sun, 27 Jan 2008 18:04:01 +0100

 On Mon, Jan 28, 2008 at 01:43:46AM +0900, Izumi Tsutsui wrote:
 > According to Microsoft KB925528, NVIDIA EHCI seems to have a bug
 > where it can't do DMA to memory above the 2G physical address.

 Oh, great, that would explain it.

 > Maybe we could handle the quirk by MI bus_dmatag_subregion(9)
 > like if_bce.c, but the next problem is that ehci(4) driver
 > completely lacks bus_dmamap_sync(9) calls, as noted in
 > sys/dev/usb/TODO.

 Yes, that sounds like the way to go. I have an affected machine but won't
 come around to it within the next few weeks, so everyone is invited to
 beat me to this.

 Martin

From: jakllsch@kollasch.net
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
	netbsd-bugs@netbsd.org
Subject: Re: kern/37884: nvidia ehci umass
Date: Sun, 27 Jan 2008 11:44:18 -0600

 On Sun, Jan 27, 2008 at 05:05:03PM +0000, Martin Husemann wrote:
 > The following reply was made to PR kern/37884; it has been noted by GNATS.
 > 
 > From: Martin Husemann <martin@duskware.de>
 > To: Izumi Tsutsui <tsutsui@ceres.dti.ne.jp>
 > Cc: jmcneill@invisible.ca, gnats-bugs@NetBSD.org,
 > 	kern-bug-people@NetBSD.org, gnats-admin@NetBSD.org,
 > 	netbsd-bugs@NetBSD.org
 > Subject: Re: kern/37884: nvidia ehci umass
 > Date: Sun, 27 Jan 2008 18:04:01 +0100
 > 
 >  On Mon, Jan 28, 2008 at 01:43:46AM +0900, Izumi Tsutsui wrote:
 >  > According to Microsoft KB925528, NVIDIA EHCI seems to have a bug
 >  > where it can't do DMA to memory above the 2G physical address.

 Unless some of my 1GiB of physical memory (the Socket 754 northbridge
 can only handle 2GiB anyway) is placed up there, I shouldn't be
 having this issue.

 It also doesn't explain how it is very easy to trigger with
 flash drives, and hard to trigger with hard drives.

 >  Oh, great, that would explain it.
 >  
 >  > Maybe we could handle the quirk by MI bus_dmatag_subregion(9)
 >  > like if_bce.c, but the next problem is that ehci(4) driver
 >  > completely lacks bus_dmamap_sync(9) calls, as noted in
 >  > sys/dev/usb/TODO.

 But that might be causing it; the longer latency of a hard
 drive may allow inconsistent cache to get evicted naturally.

 But then why does it only happen with nvidia controllers?

 >  
 >  Yes, that sounds like the way to go. I have an affected machine but won't
 >  come around to it within the next few weeks, so everyone is invited to
 >  beat me to this.

 Well, now that I know where the issue may lie, I could try fixing it myself;
 however, I'm not familiar with USB host controllers yet.

 	Jonathan Kollasch

From: Izumi Tsutsui <tsutsui@ceres.dti.ne.jp>
To: jakllsch@kollasch.net
Cc: gnats-bugs@NetBSD.org, kern-bug-people@NetBSD.org, gnats-admin@NetBSD.org,
        netbsd-bugs@NetBSD.org, tsutsui@ceres.dti.ne.jp
Subject: Re: kern/37884: nvidia ehci umass
Date: Mon, 28 Jan 2008 03:00:21 +0900

 > But then why does it only happen with nvidia controllers?

 Caused by the second bug mentioned in KB925528?
 ---
 Izumi Tsutsui

From: Izumi Tsutsui <tsutsui@ceres.dti.ne.jp>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@NetBSD.org, gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org,
        jakllsch@kollasch.net, tsutsui@ceres.dti.ne.jp
Subject: Re: kern/37884: nvidia ehci umass
Date: Mon, 28 Jan 2008 03:13:29 +0900

 >  But that might be causing it; the longer latency of a hard
 >  drive may allow inconsistent cache to get evicted naturally.

 BTW, no cache flush ops are required for DMA on x86 and
 bus_dmamap_sync(9) also handles xfer between bounce buffers.
 ---
 Izumi Tsutsui

From: "Jonathan A. Kollasch" <jakllsch@NetBSD.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/37884: nvidia ehci umass
Date: Wed, 26 May 2010 03:27:50 +0000

 This bug can probably be traced to the unusual way we activate Queue
 Heads.

 We currently link the Queue Head into the Asynchronous List upon pipe
 open.  Then we use the method implemented in ehci_set_qh_qtd() to activate
 the transaction.  This is a somewhat risky method.  The controller is
 constantly reading from and writing to the chained-together Queue Heads
 that we are writing to.

 The ehci_set_qh_qtd() method of activation seems hard to derive from
 the EHCI 1.0 specification.  The apparent best-practice for activating
 Queue Heads is to link/unlink them into/from the Asynchronous List.
 Linux and FreeBSD (HPS stack) both appear to follow this suggestion.

 I've confirmed that the link/unlink method eliminates the problem
 described in this PR.  In the process of testing, it became apparent that
 completely fixing this would amount to a somewhat noticeable rototill of
 the driver.
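
The link/unlink approach described above can be illustrated with a
small userspace model of a circular list.  This is a sketch only:
struct model_qh and the model_*() functions are hypothetical stand-ins,
not the ehci(4) data structures, and the model leaves out everything
hardware-related.  In particular, a real driver would also have to wait
for the controller to stop referencing an unlinked Queue Head (the EHCI
specification provides an async advance doorbell for this) before
reusing it.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for a Queue Head. */
struct model_qh {
	struct model_qh *next;	/* models the horizontal link pointer */
};

/* Make head a one-element circular list. */
static void
model_init(struct model_qh *head)
{
	head->next = head;
}

/*
 * Link qh directly after head.  qh is made to point into the list
 * first, so the list is consistent at every step; activation happens
 * at the moment head->next is updated.
 */
static void
model_link(struct model_qh *head, struct model_qh *qh)
{
	qh->next = head->next;
	head->next = qh;
}

/* Unlink qh.  Returns 0 on success, -1 if qh is not on the list. */
static int
model_unlink(struct model_qh *head, struct model_qh *qh)
{
	struct model_qh *p = head;

	do {
		if (p->next == qh && qh != head) {
			p->next = qh->next;
			return 0;
		}
		p = p->next;
	} while (p != head);
	return -1;
}

/* Count the elements on the list, including head. */
static int
model_count(const struct model_qh *head)
{
	const struct model_qh *p = head;
	int n = 0;

	do {
		n++;
		p = p->next;
	} while (p != head);
	return n;
}
```

The point of the model is that an element is fully prepared before it
becomes reachable, in contrast to modifying an element that is already
on the list.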

State-Changed-From-To: open->analyzed
State-Changed-By: jakllsch@NetBSD.org
State-Changed-When: Thu, 27 May 2010 01:48:40 +0000
State-Changed-Why:
analysis provided


From: "Jonathan A. Kollasch" <jakllsch@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/37884 CVS commit: src/sys/dev/usb
Date: Sat, 29 May 2010 16:52:33 +0000

 Module Name:	src
 Committed By:	jakllsch
 Date:		Sat May 29 16:52:33 UTC 2010

 Modified Files:
 	src/sys/dev/usb: ehci.c

 Log Message:
 Nvidia EHCI controllers do not ignore one or more of the "Port Number",
 "Hub Address", "Split Completion Mask" fields in the Queue Head marked
 "This field is ignored by the host controller unless the EPS field
 indicates a full- or low-speed device.".

 Therefore, only populate these fields for full- and low-speed devices.

 Fixes PR#37884.
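
 The idea in the log message above can be sketched as follows.  This is
 not the committed diff: the QH_*() macros, enum eps and qh_endphub()
 are hypothetical names for illustration, and the bit positions are an
 assumption based on the Endpoint Capabilities dword of the Queue Head
 as laid out in the EHCI 1.0 specification.

```c
#include <stdint.h>

/* EPS (endpoint speed) encodings, assumed from the EHCI 1.0 Queue Head. */
enum eps { EPS_FULL = 0, EPS_LOW = 1, EPS_HIGH = 2 };

/* Assumed field positions in the Endpoint Capabilities dword. */
#define QH_SMASK(x)	((uint32_t)((x) & 0xff) << 0)	/* interrupt schedule mask */
#define QH_CMASK(x)	((uint32_t)((x) & 0xff) << 8)	/* split completion mask */
#define QH_HUBA(x)	((uint32_t)((x) & 0x7f) << 16)	/* hub address */
#define QH_PORT(x)	((uint32_t)((x) & 0x7f) << 23)	/* port number */
#define QH_MULT(x)	((uint32_t)((x) & 0x3) << 30)	/* bandwidth multiplier */

/*
 * Build the Endpoint Capabilities dword.  The split-transaction fields
 * are only populated for full- and low-speed devices; for high-speed
 * devices they are left zero instead of relying on the controller to
 * ignore them.
 */
static uint32_t
qh_endphub(enum eps speed, unsigned mult, unsigned smask,
    unsigned cmask, unsigned hubaddr, unsigned port)
{
	uint32_t v = QH_MULT(mult) | QH_SMASK(smask);

	if (speed != EPS_HIGH)
		v |= QH_CMASK(cmask) | QH_HUBA(hubaddr) | QH_PORT(port);
	return v;
}
```

 With these definitions, a high-speed endpoint gets only the multiplier
 and schedule mask, whatever hub address and port are passed in.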


 To generate a diff of this commit:
 cvs rdiff -u -r1.166 -r1.167 src/sys/dev/usb/ehci.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: analyzed->pending-pullups
State-Changed-By: jakllsch@NetBSD.org
State-Changed-When: Sat, 29 May 2010 17:04:24 +0000
State-Changed-Why:
fix committed. pullup-5 #1409


From: Jeff Rizzo <riz@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/37884 CVS commit: [netbsd-5] src/sys/dev/usb
Date: Sat, 12 Jun 2010 01:05:45 +0000

 Module Name:	src
 Committed By:	riz
 Date:		Sat Jun 12 01:05:44 UTC 2010

 Modified Files:
 	src/sys/dev/usb [netbsd-5]: ehci.c

 Log Message:
 Pull up following revision(s) (requested by jakllsch in ticket #1409):
 	sys/dev/usb/ehci.c: revision 1.167
 Nvidia EHCI controllers do not ignore one or more of the "Port Number",
 "Hub Address", "Split Completion Mask" fields in the Queue Head marked
 "This field is ignored by the host controller unless the EPS field
 indicates a full- or low-speed device.".
 Therefore, only populate these fields for full- and low-speed devices.
 Fixes PR#37884.


 To generate a diff of this commit:
 cvs rdiff -u -r1.154.4.1 -r1.154.4.2 src/sys/dev/usb/ehci.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: pending-pullups->closed
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Sat, 12 Jun 2010 05:18:06 +0000
State-Changed-Why:
Pullups complete.


>Unformatted:

(Contact us) $NetBSD: query-full-pr,v 1.39 2013/11/01 18:47:49 spz Exp $
$NetBSD: gnats_config.sh,v 1.8 2006/05/07 09:23:38 tsutsui Exp $
Copyright © 1994-2007 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.