NetBSD Problem Report #48151

From www@NetBSD.org  Fri Aug 23 16:41:45 2013
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
	(using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "Postmaster NetBSD.org" (verified OK))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 7FA5E717C5
	for <gnats-bugs@gnats.NetBSD.org>; Fri, 23 Aug 2013 16:41:45 +0000 (UTC)
Message-Id: <20130823164144.1C5E27198F@mollari.NetBSD.org>
Date: Fri, 23 Aug 2013 16:41:43 +0000 (UTC)
From: vincent@labri.fr
Reply-To: vincent@labri.fr
To: gnats-bugs@NetBSD.org
Subject: USB race condition can lead to panic
X-Send-Pr-Version: www-1.0

>Number:         48151
>Category:       kern
>Synopsis:       USB race condition can lead to panic
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri Aug 23 16:45:00 +0000 2013
>Closed-Date:    Wed Sep 04 13:24:12 +0000 2013
>Last-Modified:  Wed Sep 04 13:24:12 +0000 2013
>Originator:     Aymeric Vincent
>Release:        -current 6.99.23
>Organization:
>Environment:
NetBSD z.local 6.99.23 NetBSD 6.99.23 (Z) #12: Thu Aug 22 16:04:56 CEST 2013  vincent@z.local:/home/vincent/netbsd/src/sys/arch/amd64/compile/Z amd64

>Description:

usb_transfer_complete() releases the pipe lock to call the callback associated with a xfer.

Such callbacks may (see usbd_bulk_transfer_cb()) cv_broadcast() on the xfer's cv.

This will lead to usbd_transfer() being awoken and returning, which in turn will lead to the xfer being freed (and reused to confuse debugging).

All this while the xfer is still being used by usb_transfer_complete() and a crash occurs when it tries to use it again because the xfer is in an inconsistent state (being freed/reallocated/reused).

>How-To-Repeat:
Transfer lots of data for several minutes through USB (in my case using ugen), and notice a panic is triggered.

After enabling some debug output, notice that usually, ehci_idone() finishes before usbd_bulk_transfer() resumes:

ehci_idone: xfer=0xfffffe810ff7b8a8, pipe=0xfffffe810cde7d50 ready
ehci_idone: len=1024, actlen=1024, status=0x8d00
usb_freemem: large free
ehci_idone: ex=0xfffffe810ff7b8a8 done
usbd_bulk_transfer: transferred 1024

... except just before the crash, where ehci_idone() does not finish before the same xfer gets reused:

ehci_idone: xfer=0xfffffe810ff7b8a8, pipe=0xfffffe810cde7d50 ready
ehci_idone: len=1024, actlen=1024, status=0x8d00
usb_freemem: large free
usbd_bulk_transfer: transferred 1024  
panic: uskbedr_nbeull kd_itargannossfteirc:  asstsaerrtt iotnr a"ncsvf_iesr_ va1
l0i2d4( cbv)" tfeasi
led: ufslbe_ a"l.l.o/c.m.e/m.:. /l.a.r/gkee ranl/lkoecr n1_0c2o4n
dvar.c", line 340
cpu0: Begin traceback...
vpanic() at netbsd:vpanic+0x136
kern_assert() at netbsd:kern_assert+0x48
cv_broadcast() at netbsd:cv_broadcast+0x53
usb_transfer_complete() at netbsd:usb_transfer_complete+0x253
ehci_idone() at netbsd:ehci_idone+0x19d
ehci_softintr() at netbsd:ehci_softintr+0x40
usb_soft_intr() at netbsd:usb_soft_intr+0x24
softint_dispatch() at netbsd:softint_dispatch+0xd9
DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xfffffe80a21b7d70
Xsoftintr() at netbsd:Xsoftintr+0x4f

>Fix:
I submit this PR because it's not clear to me what is the right solution. The cv_broadcast/wakeup is done in two places :

in the callbacks called by usb_transfer_complete()
right after, directly in usb_transfer_complete()

Removing one of the two is required, but I don't know which is intended conceptually. Removing any of the two should fix the race because the only reason the xfer is reused in usb_transfer_complete() is to do this cv_broadcast()...

>Release-Note:

>Audit-Trail:
From: Nick Hudson <skrll@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: vincent@labri.fr, kern-bug-people@netbsd.org, gnats-admin@netbsd.org, 
 netbsd-bugs@netbsd.org
Subject: Re: kern/48151: USB race condition can lead to panic
Date: Sat, 24 Aug 2013 09:25:09 +0100

 This is a multi-part message in MIME format.
 --------------000308070401040900030405
 Content-Type: text/plain; charset=ISO-8859-1; format=flowed
 Content-Transfer-Encoding: 7bit

 On 08/23/13 17:45, vincent@labri.fr wrote:
 >> Number:         48151
 >> Category:       kern
 >> Synopsis:       USB race condition can lead to panic
 >> [...]

 >> I submit this PR because it's not clear to me what is the right solution. The cv_broadcast/wakeup is done in two places :
 >>
 >> in the callbacks called by usb_transfer_complete()
 >> right after, directly in usb_transfer_complete()
 >>
 >> Removing one of the two is required, but I don't know which is intended conceptually. Removing any of the two should fix the race because the only reason the xfer is reused in usb_transfer_complete() is to do this cv_broadcast()...
 >>
 >>

 Looks like the callbacks can safely be removed - usbd_sync_transfer_sig 
 DTRT.

 How about the attached diff? It fixes utoppy at the same time.

 Nick

 --------------000308070401040900030405
 Content-Type: text/plain; charset=us-ascii;
  name="diff"
 Content-Transfer-Encoding: 7bit
 Content-Disposition: attachment;
  filename="diff"

 Index: usbdi_util.c
 ===================================================================
 RCS file: /cvsroot/src/sys/dev/usb/usbdi_util.c,v
 retrieving revision 1.59
 diff -u -p -u -r1.59 usbdi_util.c
 --- usbdi_util.c	5 Jan 2013 23:34:21 -0000	1.59
 +++ usbdi_util.c	24 Aug 2013 08:23:30 -0000
 @@ -415,19 +415,6 @@ usbd_get_config(usbd_device_handle dev, 
  	return (usbd_do_request(dev, &req, conf));
  }

 -Static void usbd_bulk_transfer_cb(usbd_xfer_handle xfer,
 -				  usbd_private_handle priv, usbd_status status);
 -Static void
 -usbd_bulk_transfer_cb(usbd_xfer_handle xfer, usbd_private_handle priv,
 -		      usbd_status status)
 -{
 -
 -	if (xfer->pipe->device->bus->lock)
 -		cv_broadcast(&xfer->cv);
 -	else
 -		wakeup(xfer);	/* XXXSMP ok */
 -}
 -
  usbd_status
  usbd_bulk_transfer(usbd_xfer_handle xfer, usbd_pipe_handle pipe,
  		   u_int16_t flags, u_int32_t timeout, void *buf,
 @@ -435,31 +422,19 @@ usbd_bulk_transfer(usbd_xfer_handle xfer
  {
  	usbd_status err;

 -	usbd_setup_xfer(xfer, pipe, 0, buf, *size,
 -			flags, timeout, usbd_bulk_transfer_cb);
 -	DPRINTFN(1, ("usbd_bulk_transfer: start transfer %d bytes\n", *size));
 +	usbd_setup_xfer(xfer, pipe, 0, buf, *size, flags, timeout, NULL);

 +	DPRINTFN(1, ("usbd_bulk_transfer: start transfer %d bytes\n", *size));
  	err = usbd_sync_transfer_sig(xfer);
 +
  	usbd_get_xfer_status(xfer, NULL, NULL, size, NULL);
  	DPRINTFN(1,("usbd_bulk_transfer: transferred %d\n", *size));
  	if (err) {
  		DPRINTF(("usbd_bulk_transfer: error=%d\n", err));
  		usbd_clear_endpoint_stall(pipe);
  	}
 -	return (err);
 -}
 -
 -Static void usbd_intr_transfer_cb(usbd_xfer_handle xfer,
 -				  usbd_private_handle priv, usbd_status status);
 -Static void
 -usbd_intr_transfer_cb(usbd_xfer_handle xfer, usbd_private_handle priv,
 -		      usbd_status status)
 -{

 -	if (xfer->pipe->device->bus->lock)
 -		cv_broadcast(&xfer->cv);
 -	else
 -		wakeup(xfer);	/* XXXSMP ok */
 +	return (err);
  }

  usbd_status
 @@ -469,11 +444,13 @@ usbd_intr_transfer(usbd_xfer_handle xfer
  {
  	usbd_status err;

 -	usbd_setup_xfer(xfer, pipe, 0, buf, *size,
 -			flags, timeout, usbd_intr_transfer_cb);
 +	usbd_setup_xfer(xfer, pipe, 0, buf, *size, flags, timeout, NULL);
 +
  	DPRINTFN(1, ("usbd_intr_transfer: start transfer %d bytes\n", *size));
  	err = usbd_sync_transfer_sig(xfer);
 +
  	usbd_get_xfer_status(xfer, NULL, NULL, size, NULL);
 +
  	DPRINTFN(1,("usbd_intr_transfer: transferred %d\n", *size));
  	if (err) {
  		DPRINTF(("usbd_intr_transfer: error=%d\n", err));
 Index: utoppy.c
 ===================================================================
 RCS file: /cvsroot/src/sys/dev/usb/utoppy.c,v
 retrieving revision 1.21
 diff -u -p -u -r1.21 utoppy.c
 --- utoppy.c	27 Oct 2012 17:18:38 -0000	1.21
 +++ utoppy.c	24 Aug 2013 08:23:31 -0000
 @@ -537,22 +537,10 @@ utoppy_bulk_transfer(usbd_xfer_handle xf

  	usbd_setup_xfer(xfer, pipe, 0, buf, *size, flags, timeout,
  	    utoppy_bulk_transfer_cb);
 -	usbd_lock_pipe(pipe);	/* don't want callback until tsleep() */
 -	err = usbd_transfer(xfer);
 -	if (err != USBD_IN_PROGRESS) {
 -		usbd_unlock_pipe(pipe);
 -		return (err);
 -	}
 -	if (pipe->device->bus->lock)
 -		error = cv_wait_sig(&xfer->cv, pipe->device->bus->lock);
 -	else
 -		error = tsleep((void *)xfer, PZERO, lbl, 0);
 -	usbd_unlock_pipe(pipe);
 -	if (error) {
 -		usbd_abort_pipe(pipe);
 -		return (USBD_INTERRUPTED);
 -	}
 -	usbd_get_xfer_status(xfer, NULL, NULL, size, &err);
 +
 +	err = usbd_sync_transfer_sig(xfer);
 +
 +	usbd_get_xfer_status(xfer, NULL, NULL, size, NULL);
  	return (err);
  }


 --------------000308070401040900030405--

From: vincent@labri.fr
To: Nick Hudson <skrll@netbsd.org>
Cc: gnats-bugs@NetBSD.org,  kern-bug-people@netbsd.org,  gnats-admin@netbsd.org,  netbsd-bugs@netbsd.org
Subject: Re: kern/48151: USB race condition can lead to panic
Date: Sat, 24 Aug 2013 13:27:58 +0200

        Hi,

 > How about the attached diff? It fixes utoppy at the same time.

 I'm running with a similar patch since yesterday, so I'm fine with it
 and to me you can commit it.

 Can't speak for the utoppy part.

 Thanks,
  Aymeric

From: "Nick Hudson" <skrll@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/48151 CVS commit: src/sys/dev/usb
Date: Fri, 30 Aug 2013 12:59:19 +0000

 Module Name:	src
 Committed By:	skrll
 Date:		Fri Aug 30 12:59:19 UTC 2013

 Modified Files:
 	src/sys/dev/usb: usbdi_util.c utoppy.c

 Log Message:
 Remove race introduced by using usbd_sync_transfer_sig with a callback
 that use the xfer cv (or does wake up on the xfer). The xfer has likely
 been freed or re-used.

 While here update utoppy(4) to use usbd_sync_transfer_sig.

 Fixes PR/48151


 To generate a diff of this commit:
 cvs rdiff -u -r1.60 -r1.61 src/sys/dev/usb/usbdi_util.c
 cvs rdiff -u -r1.21 -r1.22 src/sys/dev/usb/utoppy.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: open->feedback
State-Changed-By: skrll@NetBSD.org
State-Changed-When: Fri, 30 Aug 2013 13:01:53 +0000
State-Changed-Why:
Fix committed. OK to close?


From: Aymeric Vincent <vincent@labri.fr>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/48151 (USB race condition can lead to panic)
Date: Wed, 04 Sep 2013 15:03:02 +0200

 I could finally test a new kernel just to be sure, and yes your can
 close it. Thank you.

  Aymeric

State-Changed-From-To: feedback->closed
State-Changed-By: skrll@NetBSD.org
State-Changed-When: Wed, 04 Sep 2013 13:24:12 +0000
State-Changed-Why:
Confirmed fixed. thanks.


>Unformatted:

NetBSD Home
NetBSD PR Database Search

(Contact us) $NetBSD: query-full-pr,v 1.39 2013/11/01 18:47:49 spz Exp $
$NetBSD: gnats_config.sh,v 1.8 2006/05/07 09:23:38 tsutsui Exp $
Copyright © 1994-2007 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.