NetBSD Problem Report #48151
From www@NetBSD.org Fri Aug 23 16:41:45 2013
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
(using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "Postmaster NetBSD.org" (verified OK))
by mollari.NetBSD.org (Postfix) with ESMTPS id 7FA5E717C5
for <gnats-bugs@gnats.NetBSD.org>; Fri, 23 Aug 2013 16:41:45 +0000 (UTC)
Message-Id: <20130823164144.1C5E27198F@mollari.NetBSD.org>
Date: Fri, 23 Aug 2013 16:41:43 +0000 (UTC)
From: vincent@labri.fr
Reply-To: vincent@labri.fr
To: gnats-bugs@NetBSD.org
Subject: USB race condition can lead to panic
X-Send-Pr-Version: www-1.0
>Number: 48151
>Category: kern
>Synopsis: USB race condition can lead to panic
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: kern-bug-people
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Fri Aug 23 16:45:00 +0000 2013
>Closed-Date: Wed Sep 04 13:24:12 +0000 2013
>Last-Modified: Wed Sep 04 13:24:12 +0000 2013
>Originator: Aymeric Vincent
>Release: -current 6.99.23
>Organization:
>Environment:
NetBSD z.local 6.99.23 NetBSD 6.99.23 (Z) #12: Thu Aug 22 16:04:56 CEST 2013 vincent@z.local:/home/vincent/netbsd/src/sys/arch/amd64/compile/Z amd64
>Description:
usb_transfer_complete() releases the pipe lock to call the callback associated with a xfer.
Such callbacks may (see usbd_bulk_transfer_cb()) cv_broadcast() on the xfer's cv.
This will lead to usbd_transfer() being awoken and returning, which in turn will lead to the xfer being freed (and reused to confuse debugging).
All this while the xfer is still being used by usb_transfer_complete() and a crash occurs when it tries to use it again because the xfer is in an inconsistent state (being freed/reallocated/reused).
>How-To-Repeat:
Transfer lots of data for several minutes through USB (in my case using ugen), and notice a panic is triggered.
After enabling some debug output, notice that usually, ehci_idone() finishes before usbd_bulk_transfer() resumes:
ehci_idone: xfer=0xfffffe810ff7b8a8, pipe=0xfffffe810cde7d50 ready
ehci_idone: len=1024, actlen=1024, status=0x8d00
usb_freemem: large free
ehci_idone: ex=0xfffffe810ff7b8a8 done
usbd_bulk_transfer: transferred 1024
... except just before the crash, where ehci_idone() does not finish before the same xfer gets reused:
ehci_idone: xfer=0xfffffe810ff7b8a8, pipe=0xfffffe810cde7d50 ready
ehci_idone: len=1024, actlen=1024, status=0x8d00
usb_freemem: large free
usbd_bulk_transfer: transferred 1024
panic: uskbedr_nbeull kd_itargannossfteirc: asstsaerrtt iotnr a"ncsvf_iesr_ va1
l0i2d4( cbv)" tfeasi
led: ufslbe_ a"l.l.o/c.m.e/m.:. /l.a.r/gkee ranl/lkoecr n1_0c2o4n
dvar.c", line 340
cpu0: Begin traceback...
vpanic() at netbsd:vpanic+0x136
kern_assert() at netbsd:kern_assert+0x48
cv_broadcast() at netbsd:cv_broadcast+0x53
usb_transfer_complete() at netbsd:usb_transfer_complete+0x253
ehci_idone() at netbsd:ehci_idone+0x19d
ehci_softintr() at netbsd:ehci_softintr+0x40
usb_soft_intr() at netbsd:usb_soft_intr+0x24
softint_dispatch() at netbsd:softint_dispatch+0xd9
DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xfffffe80a21b7d70
Xsoftintr() at netbsd:Xsoftintr+0x4f
>Fix:
I submit this PR because it's not clear to me what is the right solution. The cv_broadcast/wakeup is done in two places :
in the callbacks called by usb_transfer_complete()
right after, directly in usb_transfer_complete()
Removing one of the two is required, but I don't know which is intended conceptually. Removing any of the two should fix the race because the only reason the xfer is reused in usb_transfer_complete() is to do this cv_broadcast()...
>Release-Note:
>Audit-Trail:
From: Nick Hudson <skrll@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: vincent@labri.fr, kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org
Subject: Re: kern/48151: USB race condition can lead to panic
Date: Sat, 24 Aug 2013 09:25:09 +0100
This is a multi-part message in MIME format.
--------------000308070401040900030405
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
On 08/23/13 17:45, vincent@labri.fr wrote:
>> Number: 48151
>> Category: kern
>> Synopsis: USB race condition can lead to panic
>> [...]
>> I submit this PR because it's not clear to me what is the right solution. The cv_broadcast/wakeup is done in two places :
>>
>> in the callbacks called by usb_transfer_complete()
>> right after, directly in usb_transfer_complete()
>>
>> Removing one of the two is required, but I don't know which is intended conceptually. Removing any of the two should fix the race because the only reason the xfer is reused in usb_transfer_complete() is to do this cv_broadcast()...
>>
>>
Looks like the callbacks can safely be removed - usbd_sync_transfer_sig
DTRT.
How about the attached diff? It fixes utoppy at the same time.
Nick
--------------000308070401040900030405
Content-Type: text/plain; charset=us-ascii;
name="diff"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
filename="diff"
Index: usbdi_util.c
===================================================================
RCS file: /cvsroot/src/sys/dev/usb/usbdi_util.c,v
retrieving revision 1.59
diff -u -p -u -r1.59 usbdi_util.c
--- usbdi_util.c 5 Jan 2013 23:34:21 -0000 1.59
+++ usbdi_util.c 24 Aug 2013 08:23:30 -0000
@@ -415,19 +415,6 @@ usbd_get_config(usbd_device_handle dev,
return (usbd_do_request(dev, &req, conf));
}
-Static void usbd_bulk_transfer_cb(usbd_xfer_handle xfer,
- usbd_private_handle priv, usbd_status status);
-Static void
-usbd_bulk_transfer_cb(usbd_xfer_handle xfer, usbd_private_handle priv,
- usbd_status status)
-{
-
- if (xfer->pipe->device->bus->lock)
- cv_broadcast(&xfer->cv);
- else
- wakeup(xfer); /* XXXSMP ok */
-}
-
usbd_status
usbd_bulk_transfer(usbd_xfer_handle xfer, usbd_pipe_handle pipe,
u_int16_t flags, u_int32_t timeout, void *buf,
@@ -435,31 +422,19 @@ usbd_bulk_transfer(usbd_xfer_handle xfer
{
usbd_status err;
- usbd_setup_xfer(xfer, pipe, 0, buf, *size,
- flags, timeout, usbd_bulk_transfer_cb);
- DPRINTFN(1, ("usbd_bulk_transfer: start transfer %d bytes\n", *size));
+ usbd_setup_xfer(xfer, pipe, 0, buf, *size, flags, timeout, NULL);
+ DPRINTFN(1, ("usbd_bulk_transfer: start transfer %d bytes\n", *size));
err = usbd_sync_transfer_sig(xfer);
+
usbd_get_xfer_status(xfer, NULL, NULL, size, NULL);
DPRINTFN(1,("usbd_bulk_transfer: transferred %d\n", *size));
if (err) {
DPRINTF(("usbd_bulk_transfer: error=%d\n", err));
usbd_clear_endpoint_stall(pipe);
}
- return (err);
-}
-
-Static void usbd_intr_transfer_cb(usbd_xfer_handle xfer,
- usbd_private_handle priv, usbd_status status);
-Static void
-usbd_intr_transfer_cb(usbd_xfer_handle xfer, usbd_private_handle priv,
- usbd_status status)
-{
- if (xfer->pipe->device->bus->lock)
- cv_broadcast(&xfer->cv);
- else
- wakeup(xfer); /* XXXSMP ok */
+ return (err);
}
usbd_status
@@ -469,11 +444,13 @@ usbd_intr_transfer(usbd_xfer_handle xfer
{
usbd_status err;
- usbd_setup_xfer(xfer, pipe, 0, buf, *size,
- flags, timeout, usbd_intr_transfer_cb);
+ usbd_setup_xfer(xfer, pipe, 0, buf, *size, flags, timeout, NULL);
+
DPRINTFN(1, ("usbd_intr_transfer: start transfer %d bytes\n", *size));
err = usbd_sync_transfer_sig(xfer);
+
usbd_get_xfer_status(xfer, NULL, NULL, size, NULL);
+
DPRINTFN(1,("usbd_intr_transfer: transferred %d\n", *size));
if (err) {
DPRINTF(("usbd_intr_transfer: error=%d\n", err));
Index: utoppy.c
===================================================================
RCS file: /cvsroot/src/sys/dev/usb/utoppy.c,v
retrieving revision 1.21
diff -u -p -u -r1.21 utoppy.c
--- utoppy.c 27 Oct 2012 17:18:38 -0000 1.21
+++ utoppy.c 24 Aug 2013 08:23:31 -0000
@@ -537,22 +537,10 @@ utoppy_bulk_transfer(usbd_xfer_handle xf
usbd_setup_xfer(xfer, pipe, 0, buf, *size, flags, timeout,
utoppy_bulk_transfer_cb);
- usbd_lock_pipe(pipe); /* don't want callback until tsleep() */
- err = usbd_transfer(xfer);
- if (err != USBD_IN_PROGRESS) {
- usbd_unlock_pipe(pipe);
- return (err);
- }
- if (pipe->device->bus->lock)
- error = cv_wait_sig(&xfer->cv, pipe->device->bus->lock);
- else
- error = tsleep((void *)xfer, PZERO, lbl, 0);
- usbd_unlock_pipe(pipe);
- if (error) {
- usbd_abort_pipe(pipe);
- return (USBD_INTERRUPTED);
- }
- usbd_get_xfer_status(xfer, NULL, NULL, size, &err);
+
+ err = usbd_sync_transfer_sig(xfer);
+
+ usbd_get_xfer_status(xfer, NULL, NULL, size, NULL);
return (err);
}
--------------000308070401040900030405--
From: vincent@labri.fr
To: Nick Hudson <skrll@netbsd.org>
Cc: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/48151: USB race condition can lead to panic
Date: Sat, 24 Aug 2013 13:27:58 +0200
Hi,
> How about the attached diff? It fixes utoppy at the same time.
I'm running with a similar patch since yesterday, so I'm fine with it
and to me you can commit it.
Can't speak for the utoppy part.
Thanks,
Aymeric
From: "Nick Hudson" <skrll@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/48151 CVS commit: src/sys/dev/usb
Date: Fri, 30 Aug 2013 12:59:19 +0000
Module Name: src
Committed By: skrll
Date: Fri Aug 30 12:59:19 UTC 2013
Modified Files:
src/sys/dev/usb: usbdi_util.c utoppy.c
Log Message:
Remove race introduced by using usbd_sync_transfer_sig with a callback
that use the xfer cv (or does wake up on the xfer). The xfer has likely
been freed or re-used.
While here update utoppy(4) to use usbd_sync_transfer_sig.
Fixes PR/48151
To generate a diff of this commit:
cvs rdiff -u -r1.60 -r1.61 src/sys/dev/usb/usbdi_util.c
cvs rdiff -u -r1.21 -r1.22 src/sys/dev/usb/utoppy.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: open->feedback
State-Changed-By: skrll@NetBSD.org
State-Changed-When: Fri, 30 Aug 2013 13:01:53 +0000
State-Changed-Why:
Fix committed. OK to close?
From: Aymeric Vincent <vincent@labri.fr>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/48151 (USB race condition can lead to panic)
Date: Wed, 04 Sep 2013 15:03:02 +0200
I could finally test a new kernel just to be sure, and yes your can
close it. Thank you.
Aymeric
State-Changed-From-To: feedback->closed
State-Changed-By: skrll@NetBSD.org
State-Changed-When: Wed, 04 Sep 2013 13:24:12 +0000
State-Changed-Why:
Confirmed fixed. thanks.
>Unformatted:
(Contact us)
$NetBSD: query-full-pr,v 1.39 2013/11/01 18:47:49 spz Exp $
$NetBSD: gnats_config.sh,v 1.8 2006/05/07 09:23:38 tsutsui Exp $
Copyright © 1994-2007
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.