NetBSD Problem Report #38935

From bouyer@antioche.eu.org  Tue Jun 10 18:25:10 2008
Return-Path: <bouyer@antioche.eu.org>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
	by narn.NetBSD.org (Postfix) with ESMTP id E9A1A63B8BC
	for <gnats-bugs@gnats.NetBSD.org>; Tue, 10 Jun 2008 18:25:09 +0000 (UTC)
Message-Id: <200806101825.m5AIP6jp002722@rochebonne.antioche.eu.org>
Date: Tue, 10 Jun 2008 20:25:06 +0200 (CEST)
From: Manuel Bouyer <bouyer@antioche.eu.org>
Reply-To: bouyer@antioche.eu.org
To: gnats-bugs@gnats.NetBSD.org
Subject: x86 bus_dmamap_sync() should use memory barrier
X-Send-Pr-Version: 3.95

>Number:         38935
>Category:       port-i386
>Synopsis:       x86 bus_dmamap_sync() should use memory barrier
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    bouyer
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue Jun 10 18:30:00 +0000 2008
>Closed-Date:    Sat Jun 28 17:24:47 +0000 2008
>Last-Modified:  Sat Jun 28 17:24:47 +0000 2008
>Originator:     Manuel Bouyer
>Release:        NetBSD current
>Organization:
>Environment:
System: NetBSD 
Architecture: i386/amd64/xen
Machine: i386/amd64/xen
>Description:
	the x86 implementation of bus_dmamap_sync() turn into a NOP if
	we're not using bouce-buffers (which is the common case).
	Yet some DMA devices rely on the read or write operations
	happening in order (e.g. when the driver can add DMA descripors to
	a queue while the device may also be looking at the queue).
	Typical examples are the uhci and ehci drivers (but it's not
	the only example in our tree). Without appropriate memory barrier,
	the CPU's write buffers can reorder writes in a may that makes
	the DMA descriptors in an inconsistent state from the device's view.
	speculative reads can also make memory read occur in a different
	order than the one indended, and cause the driver to misread status
	words.
>How-To-Repeat:
	code inspection
	open/close a umodem device in a loop, try to understand why the
	uhci device sometimes sees bogus DMA descriptors and halts.
>Fix:


Index: x86/bus_dma.c
===================================================================
RCS file: /cvsroot/src/sys/arch/x86/x86/bus_dma.c,v
retrieving revision 1.43
diff -u -r1.43 bus_dma.c
--- x86/bus_dma.c	4 Jun 2008 12:41:42 -0000	1.43
+++ x86/bus_dma.c	10 Jun 2008 18:22:40 -0000
@@ -722,7 +732,7 @@
 	 */
 	if (len == 0 || cookie == NULL ||
 	    (cookie->id_flags & X86_DMA_IS_BOUNCING) == 0)
-		return;
+		goto end;

 	switch (cookie->id_buftype) {
 	case X86_DMA_BUFTYPE_LINEAR:
@@ -844,6 +854,24 @@
 		printf("unknown buffer type %d\n", cookie->id_buftype);
 		panic("_bus_dmamap_sync");
 	}
+end:
+	if ((ops & (BUS_DMASYNC_PREWRITE|BUS_DMASYNC_POSTWRITE)) &&
+	    (ops & (BUS_DMASYNC_PREREAD|BUS_DMASYNC_POSTREAD))) {
+		/* both read and write, we need a full barrier */
+		x86_mfence();
+	} else if (ops & (BUS_DMASYNC_PREWRITE|BUS_DMASYNC_POSTWRITE)) {
+		/*
+		 * all past writes should have completed before this point,
+		 * and futur writes should not have started yet.
+		 */
+		x86_sfence();
+	} else if (ops & (BUS_DMASYNC_PREREAD|BUS_DMASYNC_POSTREAD)) {
+		/*
+		 * all past reads should have completed before this point,
+		 * and futur reads should not have started yet.
+		 */
+		x86_lfence();
+	}
 }

 /*

>Release-Note:

>Audit-Trail:
From: Andrew Doran <ad@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: port-i386/38935: x86 bus_dmamap_sync() should use memory barrier
Date: Tue, 10 Jun 2008 22:16:05 +0100

 On Tue, Jun 10, 2008 at 06:30:00PM +0000, Manuel Bouyer wrote:

 > 	the x86 implementation of bus_dmamap_sync() turn into a NOP if
 > 	we're not using bouce-buffers (which is the common case).
 > 	Yet some DMA devices rely on the read or write operations
 > 	happening in order (e.g. when the driver can add DMA descripors to
 > 	a queue while the device may also be looking at the queue).
 > 	Typical examples are the uhci and ehci drivers (but it's not
 > 	the only example in our tree). Without appropriate memory barrier,
 > 	the CPU's write buffers can reorder writes in a may that makes
 > 	the DMA descriptors in an inconsistent state from the device's view.

 Writes are never reordered by the CPU unless the target memory region has
 been set as one of the weakly ordered types using an MTRR or PAE equivalent.
 For write-combining regions it could happen but that should never be the
 case for memory mapped for a device. In theory it can also happen with a
 write-back (Intel ordering) region if you use the SSE instructions, but we
 only use those in kernel for zeroing free pages.

 The tech docs from Intel are really confused about this issue, but the
 memory ordering rules are (in short):

 - loads can be reordered around other loads in any direction
 - a load after a store in program order can be reordered to occur before
   the store
 - a load before a store in program order _cannot_ be reordered to occur
   after the store
 - a store is strongly ordered with respect to all other stores
 - locked transactions prevent all reordering and flush store buffers

 amd64 is (was?) pretty much the same, except loads are never reordered
 around stores. 

 > @@ -844,6 +854,24 @@
 >  		printf("unknown buffer type %d\n", cookie->id_buftype);
 >  		panic("_bus_dmamap_sync");
 >  	}
 > +end:
 > +	if ((ops & (BUS_DMASYNC_PREWRITE|BUS_DMASYNC_POSTWRITE)) &&
 > +	    (ops & (BUS_DMASYNC_PREREAD|BUS_DMASYNC_POSTREAD))) {
 > +		/* both read and write, we need a full barrier */
 > +		x86_mfence();
 > +	} else if (ops & (BUS_DMASYNC_PREWRITE|BUS_DMASYNC_POSTWRITE)) {
 > +		/*
 > +		 * all past writes should have completed before this point,
 > +		 * and futur writes should not have started yet.
 > +		 */
 > +		x86_sfence();
 > +	} else if (ops & (BUS_DMASYNC_PREREAD|BUS_DMASYNC_POSTREAD)) {
 > +		/*
 > +		 * all past reads should have completed before this point,
 > +		 * and futur reads should not have started yet.
 > +		 */
 > +		x86_lfence();
 > +	}

 I think x86_lfence() without any if() would be enough. Does it solve the
 problem for you?

 Andrew

From: Andrew Doran <ad@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: port-i386/38935: x86 bus_dmamap_sync() should use memory barrier
Date: Tue, 10 Jun 2008 22:25:12 +0100

 On Tue, Jun 10, 2008 at 09:20:02PM +0000, Andrew Doran wrote:

 >  I think x86_lfence() without any if() would be enough. Does it solve the
 >  problem for you?

 Bah, that will always evaluate to the equivalent of x86_mfence(). For this,
 membar_consumer() would a be better test since it will be patched to 'lfence'
 during boot.

 Andrew

From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@NetBSD.org
Cc: port-i386-maintainer@NetBSD.org, gnats-admin@NetBSD.org,
        netbsd-bugs@NetBSD.org
Subject: Re: port-i386/38935: x86 bus_dmamap_sync() should use memory barrier
Date: Wed, 11 Jun 2008 00:08:51 +0200

 On Tue, Jun 10, 2008 at 09:30:02PM +0000, Andrew Doran wrote:
 > The following reply was made to PR port-i386/38935; it has been noted by GNATS.
 > 
 > From: Andrew Doran <ad@netbsd.org>
 > To: gnats-bugs@netbsd.org
 > Cc: 
 > Subject: Re: port-i386/38935: x86 bus_dmamap_sync() should use memory barrier
 > Date: Tue, 10 Jun 2008 22:25:12 +0100
 > 
 >  On Tue, Jun 10, 2008 at 09:20:02PM +0000, Andrew Doran wrote:
 >  
 >  >  I think x86_lfence() without any if() would be enough. Does it solve the
 >  >  problem for you?
 >  
 >  Bah, that will always evaluate to the equivalent of x86_mfence(). For this,

 What would always evaluate to x86_mfence ? x86_lfence(), or my code ?

 >  membar_consumer() would a be better test since it will be patched to 'lfence'
 >  during boot.

 where is it defined ? I'd like to know what it would look like on
 xen (xen doesn't use x86/patch.c for now).

 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 ans d'experience feront toujours la difference
 --

From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@NetBSD.org
Cc: port-i386-maintainer@NetBSD.org, gnats-admin@NetBSD.org,
        netbsd-bugs@NetBSD.org
Subject: Re: port-i386/38935: x86 bus_dmamap_sync() should use memory barrier
Date: Thu, 12 Jun 2008 13:48:50 +0200

 On Tue, Jun 10, 2008 at 09:20:02PM +0000, Andrew Doran wrote:
 >  On Tue, Jun 10, 2008 at 06:30:00PM +0000, Manuel Bouyer wrote:
 >  
 >  > 	the x86 implementation of bus_dmamap_sync() turn into a NOP if
 >  > 	we're not using bouce-buffers (which is the common case).
 >  > 	Yet some DMA devices rely on the read or write operations
 >  > 	happening in order (e.g. when the driver can add DMA descripors to
 >  > 	a queue while the device may also be looking at the queue).
 >  > 	Typical examples are the uhci and ehci drivers (but it's not
 >  > 	the only example in our tree). Without appropriate memory barrier,
 >  > 	the CPU's write buffers can reorder writes in a may that makes
 >  > 	the DMA descriptors in an inconsistent state from the device's view.
 >  
 >  Writes are never reordered by the CPU unless the target memory region has
 >  been set as one of the weakly ordered types using an MTRR or PAE equivalent.
 >  For write-combining regions it could happen but that should never be the
 >  case for memory mapped for a device. In theory it can also happen with a
 >  write-back (Intel ordering) region if you use the SSE instructions, but we
 >  only use those in kernel for zeroing free pages.
 >  
 >  The tech docs from Intel are really confused about this issue, but the
 >  memory ordering rules are (in short):
 >  
 >  - loads can be reordered around other loads in any direction
 >  - a load after a store in program order can be reordered to occur before
 >    the store
 >  - a load before a store in program order _cannot_ be reordered to occur
 >    after the store
 >  - a store is strongly ordered with respect to all other stores
 >  - locked transactions prevent all reordering and flush store buffers
 >  
 >  amd64 is (was?) pretty much the same, except loads are never reordered
 >  around stores. 

 I've been rereading chapter 7 "memory system" of the "AMD64 Architecture
 Programmer's Manual Volume 2 (rev 3.13)". Actually loads can be reordered
 before stores from the memory POV (which is what matters for a bus-master
 device), because of the write buffers. A load can fetch the data directly
 from the write buffer, before the data is actually in cache or memory.
 A mfence or locked instruction is needed to prevent that.

 there is also a sentence in "AMD64 Architecture Programmer's Manual Volume 3
 (rev 3.13)" about lfence and mfence that I don't really understand:
 "The MFENCE instruction is weakly-ordered with respect to data and instruction
 prefetches.
 Speculative loads initiated by the processor, or specified explicitly using
 cache-prefetch instructions, can be reordered around an MFENCE."
 and
 "Speculative loads initiated by the processor, or specified explicitly using
 cache-prefetch instructions, can be reordered around an LFENCE."

 so I'm wondering if mfence/lfence is really what we need here, or
 if we'd need a locked instruction.


 >  > @@ -844,6 +854,24 @@
 >  >  		printf("unknown buffer type %d\n", cookie->id_buftype);
 >  >  		panic("_bus_dmamap_sync");
 >  >  	}
 >  > +end:
 >  > +	if ((ops & (BUS_DMASYNC_PREWRITE|BUS_DMASYNC_POSTWRITE)) &&
 >  > +	    (ops & (BUS_DMASYNC_PREREAD|BUS_DMASYNC_POSTREAD))) {
 >  > +		/* both read and write, we need a full barrier */
 >  > +		x86_mfence();
 >  > +	} else if (ops & (BUS_DMASYNC_PREWRITE|BUS_DMASYNC_POSTWRITE)) {
 >  > +		/*
 >  > +		 * all past writes should have completed before this point,
 >  > +		 * and futur writes should not have started yet.
 >  > +		 */
 >  > +		x86_sfence();
 >  > +	} else if (ops & (BUS_DMASYNC_PREREAD|BUS_DMASYNC_POSTREAD)) {
 >  > +		/*
 >  > +		 * all past reads should have completed before this point,
 >  > +		 * and futur reads should not have started yet.
 >  > +		 */
 >  > +		x86_lfence();
 >  > +	}
 >  
 >  I think x86_lfence() without any if() would be enough. Does it solve the
 >  problem for you?

 As a load can occurs before a store hit memory, I think we need
 x86_mfence() for all write cases, in fact. Assuming mfence and lfence really
 prevent speculative loads.

 -- 
 Manuel Bouyer, LIP6, Universite Paris VI.           Manuel.Bouyer@lip6.fr
      NetBSD: 26 ans d'experience feront toujours la difference
 --

Responsible-Changed-From-To: port-i386-maintainer->bouyer
Responsible-Changed-By: bouyer@NetBSD.org
Responsible-Changed-When: Sat, 28 Jun 2008 17:24:47 +0000
Responsible-Changed-Why:
I commited the patch.


State-Changed-From-To: open->closed
State-Changed-By: bouyer@NetBSD.org
State-Changed-When: Sat, 28 Jun 2008 17:24:47 +0000
State-Changed-Why:
Commited to -current; will request pullups to netbsd-4 and netbsd-3 in
a few days.


>Unformatted:

NetBSD Home
NetBSD PR Database Search

(Contact us) $NetBSD: query-full-pr,v 1.39 2013/11/01 18:47:49 spz Exp $
$NetBSD: gnats_config.sh,v 1.8 2006/05/07 09:23:38 tsutsui Exp $
Copyright © 1994-2007 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.