NetBSD Problem Report #51241

From www@NetBSD.org  Tue Jun 14 23:32:03 2016
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (verified OK))
	by mollari.NetBSD.org (Postfix) with ESMTPS id DF35E7A217
	for <gnats-bugs@gnats.NetBSD.org>; Tue, 14 Jun 2016 23:32:02 +0000 (UTC)
Message-Id: <20160614233201.B06897AAA2@mollari.NetBSD.org>
Date: Tue, 14 Jun 2016 23:32:01 +0000 (UTC)
From: cyber@netbsd.org
Reply-To: cyber@netbsd.org
To: gnats-bugs@NetBSD.org
Subject: USB Drive detect failure on Tegra K1
X-Send-Pr-Version: www-1.0

>Number:         51241
>Category:       kern
>Synopsis:       USB Drive detect failure on Tegra K1
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue Jun 14 23:35:00 +0000 2016
>Last-Modified:  Sun Jun 19 08:20:01 +0000 2016
>Originator:     Erik Berls
>Release:        NetBSD armv7 7.99.30 NetBSD 7.99.30 (TEGRA.201606011320Z) evbarm
>Organization:
NetBSD
>Environment:
NetBSD armv7 7.99.30 NetBSD 7.99.30 (TEGRA) #3: Thu Jun  2 23:36:05 UTC 2016  root@armv7:/home/netbsd-src/src/sys/arch/evbarm/compile/obj/TEGRA evbarm

>Description:
A USB drive plugged into a Tegra K1 fails to probe.

dmesg: ftp://ftp.netbsd.org:/pub/NetBSD/misc/cyber/TEGRA-USB/dmesg.txt
usbhist: ftp://ftp.netbsd.org:/pub/NetBSD/misc/cyber/TEGRA-USB/usbhist.txt

>How-To-Repeat:
1. Boot Tegra
2. Plug in USB drive (In this case a WD My Book, 5TB, also tested with 2TB My Passport)
3. Note that it spits out some kernel messages but doesn't attach.

Full dmesg and vmstat -u usbhist available via ftp (see above).

>Fix:

>Audit-Trail:
From: Erik Berls <cyber@netbsd.org>
To: gnats-bugs@netbsd.org, netbsd-bugs@netbsd.org, 
 gnats-admin@netbsd.org, kern-bug-people@netbsd.org
Cc: 
Subject: Re: kern/51241: USB Drive detect failure on Tegra K1
Date: Sat, 18 Jun 2016 22:23:30 -0700

 For those playing the home game:

 (tl;dr: updated .dtb gets us further, small drive works, large drive broken)


 Using the updated .dtb suggested by Nick (should be on the wiki now), things progress a little further:

 armv7# umass0 at uhub1 port 1 configuration 1 interface 0
 umass0: Western Digital My Book 1230, rev 2.10/10.65, addr 2
 scsibus0 at umass0: 2 targets, 2 luns per target
 sd0 at scsibus0 target 0 lun 0: <WD, My Book 1230, 1065> disk fixed
 sd0(umass0:0:0:0): not ready, data = 00 00 00 00 04 01 00 00 00 00
 sd0: drive offline
 sd0: fabricating a geometry

 armv7# fdisk sd0
 ^C^C^C^C

 Process is wedged. (tstile, blocking on [scsibus0], which is stuck in a biowait under read_mbr)

 However, with another drive (2TB):
 armv7# umass0 at uhub1 port 1 configuration 1 interface 0
 umass0: Western Digital My Passport 0748, rev 2.10/10.19, addr 2
 scsibus0 at umass0: 2 targets, 2 luns per target
 sd0 at scsibus0 target 0 lun 0: <WD, My Passport 0748, 1019> disk fixed
 sd0(umass0:0:0:0): not ready, data = 00 00 00 00 04 01 00 00 00 00
 sd0: drive offline
 sd0: fabricating a geometry
 dk9 at sd0: "EFI System Partition", 409600 blocks at 40, type: msdos
 dk10 at sd0: "Passport", 3906291632 blocks at 409640, type: <unknown>
 dk11 at sd0: "Booter", 262144 blocks at 3906701272, type: <unknown>
 scsibus0 target 0 lun 1: <WD, SES Device, 1019> enclosure services fixed not configured

 (Note: the type for the second volume (dk10) is an encrypted HFS+ volume.)


 More trace files on ftp.



 -- 
 Erik Berls


From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/51241: USB Drive detect failure on Tegra K1
Date: Sun, 19 Jun 2016 08:16:43 +0000 (UTC)

 cyber@netbsd.org (Erik Berls) writes:

 >armv7# umass0 at uhub1 port 1 configuration 1 interface 0
 >umass0: Western Digital My Book 1230, rev 2.10/10.65, addr 2
 >scsibus0 at umass0: 2 targets, 2 luns per target
 >sd0 at scsibus0 target 0 lun 0: <WD, My Book 1230, 1065> disk fixed
 >sd0(umass0:0:0:0): not ready, data = 00 00 00 00 04 01 00 00 00 00
 >sd0: drive offline
 >sd0: fabricating a geometry

 >armv7# fdisk sd0
 >^C^C^C^C


 This turned out not to be a USB problem but a deadlock between
 scsipi and wedge discovery.


 scsipi (which is used by umass) creates a thread (e.g. scsibus0) that
 is used to complete scsi requests that return an error. However, before
 doing so, it probes the scsibus synchronously for targets by calling
 scsibus_config().

 The target discovery attaches scsi target drivers like sd(4),
 and the sdattach routine then scans for wedges by calling
 dkwedge_discover().

 This is where things get stuck. dkwedge_discover() accesses
 the target (to read things like a GPT) which issues scsi commands.
 This all works fine if there are no errors, but in this
 case the external USB drive answers an early access attempt
 with an error 4/1 "Logical Unit Is in Process Of Becoming Ready".
 Error handling is queued for the completion thread, but the
 thread is still doing the bus probing.
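
The hang described above can be reproduced in miniature. What follows is a minimal Python sketch, not NetBSD code: one worker thread both runs the bus probe and is the only consumer of the error-handling queue. All names (`completion_thread`, `probe_bus`, `error_queue`) are hypothetical, and a short timeout stands in for what would be an indefinite wait in the kernel.

```python
import queue
import threading

error_queue = queue.Queue()   # hypothetical: work items for the completion thread


def probe_bus(timeout):
    """Model of the synchronous probe: issue a read, get an error, wait for it."""
    done = threading.Event()
    # The device answers "not ready"; error handling is deferred to the
    # completion thread by queueing a callback that would finish the request.
    error_queue.put(done.set)
    # Wait for the request to complete. The kernel would wait without a
    # timeout; the timeout here only lets the demonstration terminate.
    return done.wait(timeout)


def completion_thread(result, timeout):
    # Step 1: probe the bus synchronously, before serving any queued work.
    result["probe_completed"] = probe_bus(timeout)
    # Step 2: only now handle queued error completions, which is too late.
    while not error_queue.empty():
        error_queue.get()()


result = {}
worker = threading.Thread(target=completion_thread, args=(result, 0.2))
worker.start()
worker.join()
print("probe completed:", result["probe_completed"])
```

The probe times out because the only thread that could run the queued handler is the one waiting for it. Without the timeout the wait would never end, which matches the wedged fdisk seen in the report.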


 To solve this we need either to

   decouple scsibus probing from the error handling thread (it was
   put there to avoid deadlocks during autoconf, but I'm not sure if
   that is still relevant, see scsipi_base.c 1.79).

 or

   decouple wedge discovery from device attachment (the current
   coupling works fine for all other devices, and decoupling may
   create race conditions between device attachment and access to
   wedges).
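
As a rough illustration of the first option only, here is a small Python model (not kernel code, all names hypothetical) in which the completion thread does nothing but drain its queue, while the probe runs in a thread of its own. An error raised during the probe can then be handled while the probe is still waiting.

```python
import queue
import threading

error_queue = queue.Queue()   # hypothetical: deferred error-handling work


def completion_thread():
    # Only handles queued completions; it never probes the bus itself.
    while True:
        work = error_queue.get()
        if work is None:          # sentinel: shut the worker down
            return
        work()


def probe_bus(timeout):
    done = threading.Event()
    error_queue.put(done.set)     # "not ready" error deferred to the worker
    return done.wait(timeout)     # returns once the worker runs the handler


worker = threading.Thread(target=completion_thread)
worker.start()

result = {}
prober = threading.Thread(
    target=lambda: result.update(probe_completed=probe_bus(timeout=5.0)))
prober.start()
prober.join()

error_queue.put(None)
worker.join()
print("probe completed:", result["probe_completed"])
```

The sketch says nothing about whether the autoconf deadlocks that motivated the original design would return; it only shows why separating the two roles removes this particular wait cycle.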

 Additionally we need to support devices that need time to become
 ready, either by waiting in autoconf() or by polling asynchronously.
 The latter would probably make the numbering of wedge devices (dkX)
 unpredictable, but for wedges that wouldn't be a new issue.
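
A polling loop of the kind meant here might look like the following Python sketch. It is an illustration under assumptions, not a proposed patch: `test_unit_ready` is a hypothetical callable that returns None on success or an (ASC, ASCQ) pair on a "not ready" error, and the retry count and delay are arbitrary.

```python
import time

# ASC/ASCQ 04h/01h: "Logical Unit Is in Process Of Becoming Ready",
# the error pair named in this report.
BECOMING_READY = (0x04, 0x01)


def wait_until_ready(test_unit_ready, attempts=10, delay=0.01):
    """Poll a device until it stops reporting "becoming ready".

    Returns the number of polls used. Raises OSError on any other
    error and TimeoutError if the device never becomes ready.
    """
    for attempt in range(1, attempts + 1):
        sense = test_unit_ready()
        if sense is None:
            return attempt
        if sense != BECOMING_READY:
            raise OSError("device error, ASC/ASCQ = %02x/%02x" % sense)
        time.sleep(delay)
    raise TimeoutError("device did not become ready")


# Simulated drive that needs three polls before it is ready.
responses = iter([BECOMING_READY, BECOMING_READY, None])
polls = wait_until_ready(lambda: next(responses))
print("ready after", polls, "polls")
```

Treating only 04h/01h as retryable keeps genuine failures from being hidden behind the retry loop.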


 -- 
                                 Michael van Elst
 Internet: mlelstv@serpens.de
                                 "A potential Snark may lurk in every tree."

Copyright © 1994-2014 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.