NetBSD Problem Report #54977

From paul@whooppee.com  Mon Feb 17 16:37:35 2020
Return-Path: <paul@whooppee.com>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 806E51A9213
	for <gnats-bugs@gnats.NetBSD.org>; Mon, 17 Feb 2020 16:37:35 +0000 (UTC)
Message-Id: <20200217163704.0783830F2C3@speedy.whooppee.com>
Date: Mon, 17 Feb 2020 08:37:04 -0800 (PST)
From: paul@whooppee.com
Reply-To: paul@whooppee.com
To: gnats-bugs@NetBSD.org
Subject: xhci(4) bug: failed to create xfers
X-Send-Pr-Version: 3.95

>Number:         54977
>Category:       kern
>Synopsis:       USB umass hard drive "failed to create xfers" when attaching via xhci(4)
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    kern-bug-people
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Feb 17 16:40:00 +0000 2020
>Closed-Date:    Wed Oct 07 19:21:37 +0000 2020
>Last-Modified:  Wed Oct 07 19:21:37 +0000 2020
>Originator:     Paul Goyette
>Release:        NetBSD 9.99.46
>Organization:
+--------------------+--------------------------+-----------------------+
| Paul Goyette       | PGP Key fingerprint:     | E-mail addresses:     |
| (Retired)          | FA29 0E3B 35AF E8AE 6651 | paul@whooppee.com     |
| Software Developer | 0786 F758 55DE 53BA 7731 | pgoyette@netbsd.org   |
+--------------------+--------------------------+-----------------------+
>Environment:


System: NetBSD speedy.whooppee.com 9.99.46 NetBSD 9.99.46 (SPEEDY 2020-02-07 16:26:35 UTC) #1: Fri Feb 7 19:37:58 UTC 2020 paul@speedy.whooppee.com:/build/netbsd-local/obj/amd64/sys/arch/amd64/compile/SPEEDY amd64
Architecture: x86_64
Machine: amd64
>Description:
With a 9.99.46 kernel built from sources dated 2020-02-07 16:26:35 UTC
I get the following errors when plugging in a USB hard drive:

umass0 at uhub1 port 2 configuration 1 interface 0
umass0: Western Digital (0x1058) Ext HDD 1021 (0x1021), rev 2.00/20.02, addr 32
umass0: using SCSI over Bulk-Only
umass0: autoconfiguration error: failed to create xfers

This worked correctly with a 9.99.42 kernel built from sources dated
2020-01-25 19:35:05 UTC:

umass0 at uhub1 port 1 configuration 1 interface 0
umass0: Western Digital (0x1058) Ext HDD 1021 (0x1021), rev 2.00/20.02, addr 6
umass0: using SCSI over Bulk-Only
scsibus0 at umass0: 2 targets, 1 lun per target
sd0 at scsibus0 target 0 lun 0: <WD, Ext HDD 1021, 2002> disk fixed
sd0: fabricating a geometry
sd0: 1863 GB, 1907727 cyl, 64 head, 32 sec, 512 bytes/sect x 3907024896 sectors
sd0: fabricating a geometry

On IRC it was suggested (thanks, maya!) that the error message might be
related to memory fragmentation.  I didn't believe it (given that I have
128GB of RAM), but a quick check with top(1) showed that I had more than
100GB of 'file cache' active.  So, I unmounted all my development trees
(to force the cache to get flushed - and unmapped).  Sure enough, I was
then able to successfully mount the USB drive!

Consensus on IRC is that this is a bug in the xhci(4) driver, and is
triggered by fragmentation of kernel virtual address space.  It seems that
this has been reported/discussed before on the mailing lists, but no PR
seems to have been filed.

>How-To-Repeat:
Create fragmentation in kernel VA space, then attach a umass hard drive
to an xhci(4) USB port.
>Fix:
Unknown

>Release-Note:

>Audit-Trail:
From: "John D. Baker" <jdbaker@consolidated.net>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/54977: xhci(4) bug: failed to create xfers
Date: Mon, 17 Feb 2020 12:27:15 -0600 (CST)

 This problem is not unique to xhci(4); it also occurs with [eou]hci(4)
 devices, especially on "low"-memory machines (2, 4, 8GB used to
 be huge...) that have been running for a while before a umass(4) device
 is plugged in.

 -- 
 |/"\ John D. Baker, KN5UKS               NetBSD     Darwin/MacOS X
 |\ / jdbaker[snail]consolidated[flyspeck]net  OpenBSD            FreeBSD
 | X  No HTML/proprietary data in email.   BSD just sits there and works!
 |/ \ GPGkeyID:  D703 4A7E 479F 63F8 D3F4  BD99 9572 8F23 E4AD 1645

From: "David H. Gutteridge" <david@gutteridge.ca>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/54977
Date: Sat, 19 Sep 2020 17:11:22 -0400

 (Carrying over from PR kern/55671 -- I didn't know this was already
 reported.) I don't know if it's of any use, but I captured a USB debug
 log relating to this, at:

 http://www.netbsd.org/~gutteridge/pen_drive_attach_failure.log

 (As John D. Baker already noted here, this isn't an xhci(4)-specific
 issue, it happens with other USB controllers as well.)

From: Nick Hudson <nick.hudson@gmx.co.uk>
To: gnats-bugs@netbsd.org, kern-bug-people@netbsd.org,
 gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, paul@whooppee.com
Cc: 
Subject: Re: kern/54977
Date: Sun, 20 Sep 2020 09:26:09 +0100

 On 19/09/2020 22:15, David H. Gutteridge wrote:
 > The following reply was made to PR kern/54977; it has been noted by GNATS.
 >
 > From: "David H. Gutteridge" <david@gutteridge.ca>
 > To: gnats-bugs@netbsd.org
 > Cc:
 > Subject: Re: kern/54977
 > Date: Sat, 19 Sep 2020 17:11:22 -0400
 >
 >   (Carrying over from PR kern/55671 -- I didn't know this was already
 >   reported.) I don't know if it's of any use, but I captured a USB debug
 >   log relating to this, at:
 >
 >   http://www.netbsd.org/~gutteridge/pen_drive_attach_failure.log
 >
 >   (As John D. Baker already noted here, this isn't an xhci(4)-specific
 >   issue, it happens with other USB controllers as well.)
 >
 >

 I fixed [eou]hci to not have this problem. xhci is still problematic.
 dwc2 is also problematic and harder.

 Nick

From: Paul Goyette <paul@whooppee.com>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/54977: xhci(4) bug: failed to create xfers
Date: Wed, 7 Oct 2020 12:15:55 -0700 (PDT)

 Logging commit message to the PR:

 Module Name:    src
 Committed By:   chs
 Date:           Wed Oct  7 17:51:50 UTC 2020

 Modified Files:
          src/sys/uvm: uvm_init.c uvm_page.h uvm_pglist.c uvm_swap.c

 Log Message:
 Add a new, more aggressive allocator for uvm_pglistalloc() to allocate
 contiguous physical pages, and try this new allocator if the existing
 one fails.  The existing contig allocator only tries to allocate pages
 that are already free, which works fine shortly after boot but rarely
 works after the system has been up for a while.  The new allocator uses
 the pagedaemon to evict pages from memory in the hope that this will
 free up a range of pages that satisfies the constraints of the request.
 This should help with things like plugging in a USB device, which often
 fails for some USB controllers because they can't get contiguous memory.
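
 The two-pass strategy described in the log message can be illustrated
 with a small userland sketch. This is not the uvm_pglistalloc() code
 from the commit; the page states, the array-based memory map, and the
 function names (alloc_contig_free_only, alloc_contig_evicting,
 alloc_contig) are all hypothetical and chosen only for illustration.
 The first pass accepts only pages that are already free; the fallback
 pass also takes reclaimable (cached) pages, standing in for what the
 pagedaemon would evict.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical page states for a toy physical-memory map. */
enum page_state { PAGE_FREE, PAGE_CACHED, PAGE_INUSE };

/*
 * First pass: succeed only if `count` adjacent pages are already free.
 * Returns the index of the first page, or -1 if no such run exists.
 */
static int
alloc_contig_free_only(enum page_state *map, size_t npages, size_t count)
{
	size_t run = 0;

	for (size_t i = 0; i < npages; i++) {
		run = (map[i] == PAGE_FREE) ? run + 1 : 0;
		if (run == count) {
			size_t start = i + 1 - count;
			for (size_t j = start; j <= i; j++)
				map[j] = PAGE_INUSE;
			return (int)start;
		}
	}
	return -1;
}

/*
 * Fallback pass: a run may also contain cached pages, which are
 * "evicted" (simply overwritten here) to make the range usable.
 */
static int
alloc_contig_evicting(enum page_state *map, size_t npages, size_t count)
{
	size_t run = 0;

	for (size_t i = 0; i < npages; i++) {
		run = (map[i] != PAGE_INUSE) ? run + 1 : 0;
		if (run == count) {
			size_t start = i + 1 - count;
			for (size_t j = start; j <= i; j++)
				map[j] = PAGE_INUSE;
			return (int)start;
		}
	}
	return -1;
}

/* Try the cheap allocator first, then the aggressive one. */
static int
alloc_contig(enum page_state *map, size_t npages, size_t count)
{
	assert(count > 0 && count <= npages);

	int start = alloc_contig_free_only(map, npages, count);
	if (start < 0)
		start = alloc_contig_evicting(map, npages, count);
	return start;
}
```

 With a map in which free and cached pages alternate, a request for four
 contiguous pages fails in the first pass even though half the pages are
 free, and succeeds in the fallback pass. That mirrors the report above,
 where the attach only worked once the file cache had been flushed.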


 To generate a diff of this commit:
 cvs rdiff -u -r1.53 -r1.54 src/sys/uvm/uvm_init.c
 cvs rdiff -u -r1.106 -r1.107 src/sys/uvm/uvm_page.h
 cvs rdiff -u -r1.85 -r1.86 src/sys/uvm/uvm_pglist.c
 cvs rdiff -u -r1.199 -r1.200 src/sys/uvm/uvm_swap.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.


 From chuq@ on IRC:

 yea, that commit should effectively fix PR 54977


 +--------------------+--------------------------+-----------------------+
 | Paul Goyette       | PGP Key fingerprint:     | E-mail addresses:     |
 | (Retired)          | FA29 0E3B 35AF E8AE 6651 | paul@whooppee.com     |
 | Software Developer | 0786 F758 55DE 53BA 7731 | pgoyette@netbsd.org   |
 +--------------------+--------------------------+-----------------------+

State-Changed-From-To: open->closed
State-Changed-By: pgoyette@NetBSD.org
State-Changed-When: Wed, 07 Oct 2020 19:21:37 +0000
State-Changed-Why:
From chuq@: yea, that commit should effectively fix PR 54977


>Unformatted:
