NetBSD Problem Report #13654

Received: (qmail 2820 invoked from network); 8 Aug 2001 16:22:28 -0000
Message-Id: <200108081626.f78GQbM25128@armandeche.lip6.fr>
Date: Wed, 8 Aug 2001 18:26:37 +0200 (MEST)
From: bouyer@antioche.lip6.fr (Manuel Bouyer)
Reply-To: bouyer@antioche.lip6.fr (Manuel Bouyer)
To: gnats-bugs@gnats.netbsd.org
Subject: problems with iommu_dvmamap_load_raw()
X-Send-Pr-Version: 3.95

>Number:         13654
>Category:       port-sparc64
>Synopsis:       problems with iommu_dvmamap_load_raw()
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    port-sparc64-maintainer
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Aug 08 16:23:00 +0000 2001
>Closed-Date:    Thu Jul 01 07:11:41 +0000 2004
>Last-Modified:  Fri Jul 02 18:01:00 +0000 2004
>Originator:     
>Release:        -current as of half an hour ago (from main CVS)
>Organization:

LIP6, Universite Paris VI.

>Environment:

System: NetBSD java 1.5X NetBSD 1.5X (JAVA) #0: Wed Aug 8 17:22:04 MEST 2001 bouyer@java:/home/cvs.netbsd.org/src/sys/arch/sparc64/compile/JAVA sparc64
Machine: Ultra5 400MHz


>Description:
	I believe there are still problems with iommu_dvmamap_load_raw().
	First, the code to compute sgsize (passed to extent_alloc) doesn't
	seem to take the offset within the pages into account: if a segment
	has a small len but crosses a page boundary (this can happen with
	mbufs), we will account for one page instead of 2 (or maybe callers
	of iommu_dvmamap_load_raw() already split this into 2 segments?
	I didn't check).
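
	For illustration, per-segment accounting that honors the offset
	within the first page could look like this (untested sketch; "left"
	is the remaining transfer size, and this still wouldn't handle the
	out-of-order case described next):

		sgsize = 0;
		for (i = 0; i < nsegs; i++) {
			bus_size_t l = min(left, segs[i].ds_len);
			/* pages touched, offset included: a small len
			   crossing a page boundary counts as 2 pages */
			sgsize += round_page((segs[i].ds_addr & PGOFSET) + l);
			left -= l;
		}
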
	Second, I'm almost sure there are problems with out-of-order segments
	(again, I'm sure this can happen with mbuf chains): if we have 3
	segments, seg[0] in page X, seg[1] in page Y != X, and seg[2] in
	page X, we'll account for 3 pages instead of 2, and we'll have 2
	entries in the IOMMU for page X.
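
	For illustration, fixing both the double accounting and the duplicate
	IOMMU entries presumably means deduplicating at page granularity
	before allocating, along these lines (hypothetical sketch;
	page_set_insert() and page_set_count() are made-up helpers that
	would ignore a page already seen):

		for (i = 0; i < nsegs; i++) {
			paddr_t a = trunc_page(segs[i].ds_addr);
			paddr_t aend = round_page(segs[i].ds_addr +
			    segs[i].ds_len);
			for (; a < aend; a += NBPG)
				page_set_insert(pages, a); /* dedups page X */
		}
		sgsize = page_set_count(pages) * NBPG;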

	While testing the tl driver on a U5 I get problems under load
	(like dd if=/dev/zero of=file bs=64k on an NFS filesystem): the
	system panics almost immediately with either a "psycho0:
	uncorrectable DMA error" or a "region not found" panic in
	extent_free.
	I added code to check that the size passed to extent_alloc and
	extent_free is the same:

Index: include/bus.h
===================================================================
RCS file: /cvsroot/syssrc/sys/arch/sparc64/include/bus.h,v
retrieving revision 1.28
diff -u -r1.28 bus.h
--- include/bus.h	2001/07/19 15:32:19	1.28
+++ include/bus.h	2001/08/08 16:06:11
@@ -1514,6 +1514,7 @@
 	void		*_dm_source;	/* source mbuf, uio, etc. needed for unload */

 	void		*_dm_cookie;	/* cookie for bus-specific functions */
+	bus_size_t	_dm_sgsize;	/* size of extent */

 	/*
 	 * PUBLIC MEMBERS: these are used by machine-independent code.
Index: dev/iommu.c
===================================================================
RCS file: /cvsroot/syssrc/sys/arch/sparc64/dev/iommu.c,v
retrieving revision 1.37
diff -u -r1.37 iommu.c
--- dev/iommu.c	2001/08/06 22:02:58	1.37
+++ dev/iommu.c	2001/08/08 16:06:11
@@ -501,6 +501,7 @@
 	err = extent_alloc(is->is_dvmamap, sgsize, align,
 	    boundary, EX_NOWAIT|EX_BOUNDZERO, (u_long *)&dvmaddr);
 	splx(s);
+	map->_dm_sgsize = sgsize;

 #ifdef DEBUG
 	if (err || (dvmaddr == (bus_addr_t)-1))	
@@ -599,6 +600,12 @@
 		pa = addr + offset + len;

 	}
+	if (sgsize != map->_dm_sgsize) {
+		printf("iommu_dvmamap_unload: sgsize %ld different from %ld\n",
+			(u_long)sgsize, (u_long)map->_dm_sgsize);
+		/* panic("iommu_dvmamap_unload"); */
+		sgsize = map->_dm_sgsize;
+	}
 	/* Flush the caches */
 	bus_dmamap_unload(t->_parent, map);

@@ -656,6 +663,7 @@
 		pa = segs[i].ds_addr + segs[i].ds_len;
 	}
 	sgsize = round_page(sgsize);
+	map->_dm_sgsize = sgsize;

 	/*
 	 * A boundary presented to bus_dmamem_alloc() takes precedence

	With this code, I get:
iommu_dvmamap_unload: sgsize 16384 different from 24576
iommu_dvmamap_unload: sgsize 16384 different from 24576
panic: psycho0: uncorrectable DMA error AFAR 1097e150 AFSR 410000ff40800000

	I tried to solve the first bug (offset not used to compute the number
	of pages) by using code cut'n'pasted from iommu_dvmamap_unload().
	Now the machine doesn't panic any more, but I get many more
	"iommu_dvmamap_unload: sgsize s1 different from s2" messages,
	with s1 being one page larger or smaller than s2; and I get
	very weird behavior from the adapter: a tcpdump on the NFS server
	shows that I get the last segment *twice*:
18:12:21.054581 java.369053902 > disco-bu.nfs: 1472 write fh 16,20/1931 8192 bytes @ 0 (frag 4368:1480@0+)
18:12:21.054582 java > disco-bu: (frag 4368:920@7400)
18:12:21.054583 java > disco-bu: (frag 4368:1480@1480+)
18:12:21.054585 java > disco-bu: (frag 4368:1480@2960+)
18:12:21.054586 java > disco-bu: (frag 4368:1480@4440+)
18:12:21.054587 java > disco-bu: (frag 4368:1480@5920+)
18:12:21.054588 java > disco-bu: (frag 4368:920@7400)
	Yes, the last fragment is inserted between the first and second ones,
	and repeated at the end. I can't explain this other than the adapter
	having read corrupted data through DMA (it DMAs the transmit list
	too). I checked at the driver level, and the list isn't corrupted
	after transmit.

	Now, why I believe the problem is in bus_dma and not in the tl driver:
	I get the exact same behavior from a tlp (21041) adapter, and from an
	epic (SMC EtherPower II).
	The HME driver doesn't have this problem because it uses statically
	allocated buffers to/from which it copies mbufs, and so doesn't
	use bus_dmamap_load_mbuf.
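
	Roughly, that approach amounts to copying each outgoing chain into
	a preallocated, already-mapped buffer before transmit (untested
	sketch; all names hypothetical except m_copydata() and
	bus_dmamap_sync()):

		m_copydata(m, 0, m->m_pkthdr.len, sc->sc_txbuf[idx]);
		bus_dmamap_sync(sc->sc_dmatag, sc->sc_txmap, txbuf_off,
		    m->m_pkthdr.len, BUS_DMASYNC_PREWRITE);

	so the static mapping is loaded once and bus_dmamap_load_mbuf()
	is never involved.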

>How-To-Repeat:
	Try to use a tl, tlp or epic (or probably any driver which uses
	bus_dmamap_load_mbuf) on a sparc64 (Ultra5 in my case).
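	For example, with /mnt an NFS mount:
		dd if=/dev/zero of=/mnt/file bs=64k
	triggers the panic here almost immediately.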
>Fix:
	I don't know at this point. Getting the algorithm to handle
	out-of-order segments in an efficient way isn't that easy, I guess.
>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: port-sparc64-maintainer->jdolecek 
Responsible-Changed-By: jdolecek 
Responsible-Changed-When: Sat May 31 19:37:46 UTC 2003 
Responsible-Changed-Why:  
I may look on this soonish. 
Responsible-Changed-From-To: jdolecek->port-sparc64-maintainer 
Responsible-Changed-By: jdolecek 
Responsible-Changed-When: Sun Jun 22 09:40:35 UTC 2003 
Responsible-Changed-Why:  
I've cooked up a fix, rest is for the sparc64 maintainer to decide. 
Patch coming in e-mail. 

From: Jaromir Dolecek <jdolecek@netbsd.org>
To: gnats-bugs@netbsd.org
Cc:  
Subject: Re: port-sparc64/13654: problems with iommu_dvmamap_load_raw()
Date: Sun, 22 Jun 2003 11:49:09 +0200 (CEST)

 --ELM1056275348-15420-0_
 Content-Transfer-Encoding: 7bit
 Content-Type: text/plain; charset=US-ASCII

 Hi,

 I've ported over the OpenBSD fix for this. Since it was unfortunately
 done as part of other busdma changes in the OpenBSD tree, the diff
 is not straightforward.

 This is confirmed working with gem(4); other network drivers
 using bus_dmamap_load_mbuf() should now work reliably too.

 Perhaps the real fix is very simple and the overhaul
 isn't necessary. Since I'm not a sparc64 busdma expert, I leave
 this to the maintainer to decide. The current code also uses
 OpenBSD's <sys/tree.h>; the attached file tree.h has to be installed
 into src/sys/sys/ in order for the change to compile. Also,
 bus_space_render_tag() could be g/c'ed; I only kept it
 to minimize differences from the OpenBSD source.

 Jaromir
 -- 
 Jaromir Dolecek <jdolecek@NetBSD.org>            http://www.NetBSD.cz/
 -=- We should be mindful of the potential goal, but as the tantric    -=-
 -=- Buddhist masters say, ``You may notice during meditation that you -=-
 -=- sometimes levitate or glow.   Do not let this distract you.''     -=-

 --ELM1056275348-15420-0_
 Content-Transfer-Encoding: 7bit
 Content-Type: text/plain; charset=ISO-8859-1
 Content-Disposition: attachment; filename=iommufix.diff
 Content-Description: 

 Index: dev/iommu.c
 ===================================================================
 RCS file: /cvsroot/src/sys/arch/sparc64/dev/iommu.c,v
 retrieving revision 1.51.4.6
 diff -u -r1.51.4.6 iommu.c
 --- dev/iommu.c	1 Dec 2002 22:18:03 -0000	1.51.4.6
 +++ dev/iommu.c	10 Jun 2003 15:32:21 -0000
 @@ -1,6 +1,8 @@
 -/*	$NetBSD: iommu.c,v 1.51.4.6 2002/12/01 22:18:03 he Exp $	*/
 +/*	$OpenBSD: iommu.c,v 1.29 2003/05/22 21:16:29 henric Exp $	*/
 +/*	$NetBSD: iommu.c,v 1.47 2002/02/08 20:03:45 eeh Exp $	*/

  /*
 + * Copyright (c) 2003 Henric Jungheim
   * Copyright (c) 2001, 2002 Eduardo Horvath
   * Copyright (c) 1999, 2000 Matthew R. Green
   * All rights reserved.
 @@ -32,14 +34,12 @@
  /*
   * UltraSPARC IOMMU support; used by both the sbus and pci code.
   */
 -#include "opt_ddb.h"
 -
  #include <sys/param.h>
  #include <sys/extent.h>
  #include <sys/malloc.h>
  #include <sys/systm.h>
  #include <sys/device.h>
 -#include <sys/proc.h>
 +#include <sys/mbuf.h>

  #include <uvm/uvm_extern.h>

 @@ -51,24 +51,68 @@
  #include <machine/autoconf.h>
  #include <machine/cpu.h>

 +#ifdef DDB
 +#include <machine/db_machdep.h>
 +#include <ddb/db_sym.h>
 +#include <ddb/db_extern.h>
 +#endif
 +
  #ifdef DEBUG
  #define IDB_BUSDMA	0x1
  #define IDB_IOMMU	0x2
  #define IDB_INFO	0x4
 -#define	IDB_SYNC	0x8
 +#define IDB_SYNC	0x8
 +#define IDB_XXX		0x10
 +#define IDB_PRINT_MAP	0x20
 +#define IDB_BREAK	0x40
  int iommudebug = 0x0;
  #define DPRINTF(l, s)   do { if (iommudebug & l) printf s; } while (0)
  #else
  #define DPRINTF(l, s)
  #endif

 -#define iommu_strbuf_flush(i, v) do {					\
 -	if ((i)->sb_flush)						\
 -		bus_space_write_8((i)->sb_is->is_bustag, (i)->sb_sb,	\
 -			STRBUFREG(strbuf_pgflush), (v));		\
 -	} while (0)
 +void iommu_enter(struct iommu_state *, struct strbuf_ctl *, vaddr_t, paddr_t,
 +    int);
 +void iommu_remove(struct iommu_state *, struct strbuf_ctl *, vaddr_t);
 +int iommu_dvmamap_sync_range(struct strbuf_ctl*, vaddr_t, bus_size_t);
 +int iommu_strbuf_flush_done(struct iommu_map_state *);
 +int iommu_dvmamap_load_seg(bus_dma_tag_t, struct iommu_state *,
 +    bus_dmamap_t, bus_dma_segment_t *, int, int, bus_size_t, bus_size_t);
 +int iommu_dvmamap_load_mlist(bus_dma_tag_t, struct iommu_state *,
 +    bus_dmamap_t, struct pglist *, int, bus_size_t, bus_size_t);
 +int iommu_dvmamap_validate_map(bus_dma_tag_t, struct iommu_state *,
 +    bus_dmamap_t);
 +void iommu_dvmamap_print_map(bus_dma_tag_t, struct iommu_state *,
 +    bus_dmamap_t);
 +int iommu_dvmamap_append_range(bus_dma_tag_t, bus_dmamap_t, paddr_t,
 +    bus_size_t, int, bus_size_t);
 +int64_t iommu_tsb_entry(struct iommu_state *, vaddr_t);
 +void strbuf_reset(struct strbuf_ctl *);
 +int iommu_iomap_insert_page(struct iommu_map_state *, paddr_t);
 +vaddr_t iommu_iomap_translate(struct iommu_map_state *, paddr_t);
 +int iommu_iomap_load_map(struct iommu_state *, struct iommu_map_state *,
 +    vaddr_t, int);
 +int iommu_iomap_unload_map(struct iommu_state *, struct iommu_map_state *);
 +struct iommu_map_state *iommu_iomap_create(int);
 +void iommu_iomap_destroy(struct iommu_map_state *);
 +void iommu_iomap_clear_pages(struct iommu_map_state *);
 +
 +/*
 + * Initiate an STC entry flush.
 + */
 +static inline void
 +iommu_strbuf_flush(struct strbuf_ctl *sb, vaddr_t va)
 +{
 +#ifdef DEBUG
 +	if (sb->sb_flush == NULL) {
 +		printf("iommu_strbuf_flush: attempting to flush w/o STC\n");
 +		return;
 +	}
 +#endif

 -static	int iommu_strbuf_flush_done __P((struct strbuf_ctl *));
 +	bus_space_write_8(sb->sb_bustag, sb->sb_sb,
 +	    STRBUFREG(strbuf_pgflush), va);
 +}

  /*
   * initialise the UltraSPARC IOMMU (SBUS or PCI):
 @@ -78,11 +122,7 @@
   *	- create a private DVMA map.
   */
  void
 -iommu_init(name, is, tsbsize, iovabase)
 -	char *name;
 -	struct iommu_state *is;
 -	int tsbsize;
 -	u_int32_t iovabase;
 +iommu_init(char *name, struct iommu_state *is, int tsbsize, u_int32_t iovabase)
  {
  	psize_t size;
  	vaddr_t va;
 @@ -121,10 +161,10 @@
  	 * contiguous.
  	 */

 -	size = NBPG<<(is->is_tsbsize);
 +	size = PAGE_SIZE << is->is_tsbsize;
  	TAILQ_INIT(&mlist);
  	if (uvm_pglistalloc((psize_t)size, (paddr_t)0, (paddr_t)-1,
 -		(paddr_t)NBPG, (paddr_t)0, &mlist, 1, 0) != 0)
 +		(paddr_t)PAGE_SIZE, (paddr_t)0, &mlist, 1, 0) != 0)
  		panic("iommu_init: no memory");

  	va = uvm_km_valloc(kernel_map, size);
 @@ -141,32 +181,27 @@
  		pmap_enter(pmap_kernel(), va, pa | PMAP_NVC,
  			VM_PROT_READ|VM_PROT_WRITE,
  			VM_PROT_READ|VM_PROT_WRITE|PMAP_WIRED);
 -		va += NBPG;
 +		va += PAGE_SIZE;
  	}
  	pmap_update(pmap_kernel());
 -	bzero(is->is_tsb, size);
 +	memset(is->is_tsb, 0, size);

  #ifdef DEBUG
 -	if (iommudebug & IDB_INFO)
 -	{
 +	if (iommudebug & IDB_INFO) {
  		/* Probe the iommu */
 -
 +		/* The address or contents of the regs...? */
  		printf("iommu regs at: cr=%lx tsb=%lx flush=%lx\n",
 -			(u_long)bus_space_read_8(is->is_bustag, is->is_iommu,
 -				offsetof (struct iommureg, iommu_cr)),
 -			(u_long)bus_space_read_8(is->is_bustag, is->is_iommu,
 -				offsetof (struct iommureg, iommu_tsb)),
 -			(u_long)bus_space_read_8(is->is_bustag, is->is_iommu,
 -				offsetof (struct iommureg, iommu_flush)));
 -		printf("iommu cr=%llx tsb=%llx\n",
 -			(unsigned long long)bus_space_read_8(is->is_bustag,
 -				is->is_iommu,
 -				offsetof (struct iommureg, iommu_cr)),
 -			(unsigned long long)bus_space_read_8(is->is_bustag,
 -				is->is_iommu,
 -				offsetof (struct iommureg, iommu_tsb)));
 -		printf("TSB base %p phys %llx\n", (void *)is->is_tsb, 
 -			(unsigned long long)is->is_ptsb);
 +		    (u_long)bus_space_vaddr(is->is_bustag, is->is_iommu) +
 +			IOMMUREG(iommu_cr),
 +		    (u_long)bus_space_vaddr(is->is_bustag, is->is_iommu) +
 +			IOMMUREG(iommu_tsb),
 +		    (u_long)bus_space_vaddr(is->is_bustag, is->is_iommu) +
 +			IOMMUREG(iommu_flush));
 +		printf("iommu cr=%lx tsb=%lx\n",
 +		    IOMMUREG_READ(is, iommu_cr),
 +		    IOMMUREG_READ(is, iommu_tsb));
 +		printf("TSB base %p phys %llx\n",
 +		    (void *)is->is_tsb, (unsigned long long)is->is_ptsb);
  		delay(1000000); /* 1 s */
  	}
  #endif
 @@ -179,306 +214,510 @@
  	/*
  	 * Now all the hardware's working we need to allocate a dvma map.
  	 */
 -	printf("DVMA map: %x to %x\n", 
 -		(unsigned int)is->is_dvmabase,
 -		(unsigned int)is->is_dvmaend);
 -	printf("IOTSB: %llx to %llx\n", 
 -		(unsigned long long)is->is_ptsb,
 -		(unsigned long long)(is->is_ptsb + size));
 +	printf("DVMA map: %x to %x\n", is->is_dvmabase, is->is_dvmaend);
 +	printf("IOTDB: %llx to %llx\n", 
 +	    (unsigned long long)is->is_ptsb,
 +	    (unsigned long long)(is->is_ptsb + size));
  	is->is_dvmamap = extent_create(name,
 -				       is->is_dvmabase, is->is_dvmaend - NBPG,
 -				       M_DEVBUF, 0, 0, EX_NOWAIT);
 +	    is->is_dvmabase, is->is_dvmaend - PAGE_SIZE,
 +	    M_DEVBUF, 0, 0, EX_NOWAIT);
  }

  /*
 - * Streaming buffers don't exist on the UltraSPARC IIi; we should have
 + * Streaming buffers don't exist on the UltraSPARC IIi/e; we should have
   * detected that already and disabled them.  If not, we will notice that
   * they aren't there when the STRBUF_EN bit does not remain.
   */
  void
 -iommu_reset(is)
 -	struct iommu_state *is;
 +iommu_reset(struct iommu_state *is)
  {
  	int i;
 -	struct strbuf_ctl *sb;

 -	/* Need to do 64-bit stores */
 -	bus_space_write_8(is->is_bustag, is->is_iommu, IOMMUREG(iommu_tsb), 
 -		is->is_ptsb);
 -
 -	/* Enable IOMMU in diagnostic mode */
 -	bus_space_write_8(is->is_bustag, is->is_iommu, IOMMUREG(iommu_cr),
 -		is->is_cr|IOMMUCR_DE);
 -
 -	for (i=0; i<2; i++) {
 -		if ((sb = is->is_sb[i])) {
 -
 -			/* Enable diagnostics mode? */
 -			bus_space_write_8(is->is_bustag, is->is_sb[i]->sb_sb, 
 -				STRBUFREG(strbuf_ctl), STRBUF_EN);
 -
 -			/* No streaming buffers? Disable them */
 -			if (bus_space_read_8(is->is_bustag, 
 -				is->is_sb[i]->sb_sb, 
 -				STRBUFREG(strbuf_ctl)) == 0) {
 -				is->is_sb[i]->sb_flush = NULL;
 -			} else {
 -				/*
 -				 * locate the pa of the flush buffer.
 -				 */
 -				(void)pmap_extract(pmap_kernel(),
 -					(vaddr_t)is->is_sb[i]->sb_flush,
 -					&is->is_sb[i]->sb_flushpa);
 -			}
 +	IOMMUREG_WRITE(is, iommu_tsb, is->is_ptsb);
 +
 +	/* Enable IOMMU */
 +	IOMMUREG_WRITE(is, iommu_cr, is->is_cr);
 +
 +	for (i = 0; i < 2; ++i) {
 +		struct strbuf_ctl *sb = is->is_sb[i];
 +
 +		if (sb == NULL)
 +			continue;
 +
 +		sb->sb_iommu = is;
 +		strbuf_reset(sb);
 +
 +
 +		if (sb->sb_flush) {
 +			char buf[64];
 +			bus_space_render_tag(sb->sb_bustag, buf, sizeof buf);
 +			printf("STC%d on %s enabled\n", i, buf);
  		}
  	}
  }

  /*
 - * Here are the iommu control routines. 
 + * Inititalize one STC.
   */
  void
 -iommu_enter(sb, va, pa, flags)
 -	struct strbuf_ctl *sb;
 -	vaddr_t va;
 -	int64_t pa;
 -	int flags;
 +strbuf_reset(struct strbuf_ctl *sb)
 +{
 +	if(sb->sb_flush == NULL)
 +		return;
 +
 +	bus_space_write_8(sb->sb_bustag, sb->sb_sb,
 +	    STRBUFREG(strbuf_ctl), STRBUF_EN);
 +
 +	membar_lookaside();
 +
 +	/* No streaming buffers? Disable them */
 +	if (bus_space_read_8(sb->sb_bustag, sb->sb_sb,
 +	    STRBUFREG(strbuf_ctl)) == 0) {
 +		sb->sb_flush = NULL;
 +	} else {
 +		/*
 +		 * locate the pa of the flush buffer
 +		 */
 +		if (pmap_extract(pmap_kernel(),
 +		    (vaddr_t)sb->sb_flush, &sb->sb_flushpa) == FALSE)
 +			sb->sb_flush = NULL;
 +	}
 +}
 +
 +/*
 + * Add an entry to the IOMMU table.
 + *
 + * The entry is marked streaming if an STC was detected and 
 + * the BUS_DMA_STREAMING flag is set.
 + */
 +void
 +iommu_enter(struct iommu_state *is, struct strbuf_ctl *sb, vaddr_t va,
 +    paddr_t pa, int flags)
  {
 -	struct iommu_state *is = sb->sb_is;
 -	int strbuf = (flags & BUS_DMA_STREAMING);
  	int64_t tte;
 +	volatile int64_t *tte_ptr = &is->is_tsb[IOTSBSLOT(va,is->is_tsbsize)];

  #ifdef DIAGNOSTIC
 -	if (va < is->is_dvmabase || va > is->is_dvmaend)
 +	if (va < is->is_dvmabase || round_page(va + PAGE_SIZE) >
 +	    is->is_dvmaend + 1)
  		panic("iommu_enter: va %#lx not in DVMA space", va);
 -#endif

 -	/* Is the streamcache flush really needed? */
 -	if (sb->sb_flush) {
 -		iommu_strbuf_flush(sb, va);
 -		iommu_strbuf_flush_done(sb);
 -	} else
 -		/* If we can't flush the strbuf don't enable it. */
 -		strbuf = 0;
 +	tte = *tte_ptr;

 -	tte = MAKEIOTTE(pa, !(flags & BUS_DMA_NOWRITE), 
 -		!(flags & BUS_DMA_NOCACHE), (strbuf));
 -#ifdef DEBUG
 -	tte |= (flags & 0xff000LL)<<(4*8);
 +	if (tte & IOTTE_V) {
 +		printf("Overwriting valid tte entry (dva %lx pa %lx "
 +		    "&tte %p tte %lx)\n", va, pa, tte_ptr, tte);
 +		extent_print(is->is_dvmamap);
 +		panic("IOMMU overwrite");
 +	}
  #endif
 -	
 +
 +	tte = MAKEIOTTE(pa, !(flags & BUS_DMA_NOWRITE),
 +	    !(flags & BUS_DMA_NOCACHE), (flags & BUS_DMA_STREAMING));
 +
  	DPRINTF(IDB_IOMMU, ("Clearing TSB slot %d for va %p\n", 
 -		       (int)IOTSBSLOT(va,is->is_tsbsize), (void *)(u_long)va));
 -	is->is_tsb[IOTSBSLOT(va,is->is_tsbsize)] = tte;
 -	bus_space_write_8(is->is_bustag, is->is_iommu, 
 -		IOMMUREG(iommu_flush), va);
 +	    (int)IOTSBSLOT(va,is->is_tsbsize), (void *)(u_long)va));
 +
 +	*tte_ptr = tte;
 +
 +	/*
 +	 * Why bother to flush this va?  It should only be relevant for
 +	 * V ==> V or V ==> non-V transitions.  The former is illegal and
 +	 * the latter is never done here.  It is true that this provides
 +	 * some protection against a misbehaving master using an address
 +	 * after it should.  The IOMMU documentations specifically warns
 +	 * that the consequences of a simultaneous IOMMU flush and DVMA
 +	 * access to the same address are undefined.  (By that argument,
 +	 * the STC should probably be flushed as well.)   Note that if
 +	 * a bus master keeps using a memory region after it has been
 +	 * unmapped, the specific behavior of the IOMMU is likely to
 +	 * be the least of our worries.
 +	 */
 +	IOMMUREG_WRITE(is, iommu_flush, va);
 +
  	DPRINTF(IDB_IOMMU, ("iommu_enter: va %lx pa %lx TSB[%lx]@%p=%lx\n",
 -		va, (long)pa, (u_long)IOTSBSLOT(va,is->is_tsbsize),
 -		(void *)(u_long)&is->is_tsb[IOTSBSLOT(va,is->is_tsbsize)],
 -		(u_long)tte));
 +	    va, (long)pa, (u_long)IOTSBSLOT(va,is->is_tsbsize), 
 +	    (void *)(u_long)&is->is_tsb[IOTSBSLOT(va,is->is_tsbsize)],
 +	    (u_long)tte));
  }

 +/*
 + * Remove an entry from the IOMMU table.
 + *
 + * The entry is flushed from the STC if an STC is detected and the TSB
 + * entry has the IOTTE_STREAM flags set.  It should be impossible for
 + * the TSB entry to have this flag set without the BUS_DMA_STREAMING
 + * flag, but better to be safe.  (The IOMMU will be ignored as long
 + * as an STC entry exists.)
 + */
 +void
 +iommu_remove(struct iommu_state *is, struct strbuf_ctl *sb, vaddr_t va)
 +{
 +	int64_t *tte_ptr = &is->is_tsb[IOTSBSLOT(va, is->is_tsbsize)];
 +	int64_t tte;
 +
 +#ifdef DIAGNOSTIC
 +	if (trunc_page(va) < is->is_dvmabase || round_page(va) >
 +	    is->is_dvmaend + 1)
 +		panic("iommu_remove: va 0x%lx not in DVMA space", (u_long)va);
 +	if (va != trunc_page(va)) {
 +		printf("iommu_remove: unaligned va: %lx\n", va);
 +		va = trunc_page(va);
 +	}
 +#endif
 +	tte = *tte_ptr;
 +
 +	DPRINTF(IDB_IOMMU, ("iommu_remove: va %lx TSB[%lx]@%p\n",
 +	    va, tte, tte_ptr));
 +
 +#ifdef DIAGNOSTIC
 +	if ((tte & IOTTE_V) == 0) {
 +		printf("Removing invalid tte entry (dva %lx &tte %p "
 +		    "tte %lx)\n", va, tte_ptr, tte);
 +		extent_print(is->is_dvmamap);
 +		panic("IOMMU remove overwrite");
 +	}
 +#endif
 +
 +	*tte_ptr = tte & ~IOTTE_V;
 +
 +	/*
 +	 * IO operations are strongly ordered WRT each other.  It is
 +	 * unclear how they relate to normal memory accesses.
 +	 */
 +	membar_storestore();
 +
 +	IOMMUREG_WRITE(is, iommu_flush, va);
 +
 +	if (sb && (tte & IOTTE_STREAM))
 +		iommu_strbuf_flush(sb, va);
 +
 +	/* Should we sync the iommu and stc here? */
 +}

  /*
 - * Find the value of a DVMA address (debug routine).
 + * Find the physical address of a DVMA address (debug routine).
   */
  paddr_t
 -iommu_extract(is, dva)
 -	struct iommu_state *is;
 -	vaddr_t dva;
 +iommu_extract(struct iommu_state *is, vaddr_t dva)
  {
  	int64_t tte = 0;

 -	if (dva >= is->is_dvmabase && dva < is->is_dvmaend)
 +	if (dva >= is->is_dvmabase && dva <= is->is_dvmaend)
  		tte = is->is_tsb[IOTSBSLOT(dva, is->is_tsbsize)];

 -	if ((tte & IOTTE_V) == 0)
 -		return ((paddr_t)-1L);
  	return (tte & IOTTE_PAMASK);
  }

  /*
 - * iommu_remove: removes mappings created by iommu_enter
 - *
 - * Only demap from IOMMU if flag is set.
 - *
 - * XXX: this function needs better internal error checking.
 + * Lookup a TSB entry for a given DVMA (debug routine).
   */
 -void
 -iommu_remove(is, va, len)
 -	struct iommu_state *is;
 -	vaddr_t va;
 -	size_t len;
 +int64_t
 +iommu_lookup_tte(struct iommu_state *is, vaddr_t dva)
  {
 +	int64_t tte = 0;
 +	
 +	if (dva >= is->is_dvmabase && dva <= is->is_dvmaend)
 +		tte = is->is_tsb[IOTSBSLOT(dva, is->is_tsbsize)];

 -#ifdef DIAGNOSTIC
 -	if (va < is->is_dvmabase || va > is->is_dvmaend)
 -		panic("iommu_remove: va 0x%lx not in DVMA space", (u_long)va);
 -	if ((long)(va + len) < (long)va)
 -		panic("iommu_remove: va 0x%lx + len 0x%lx wraps", 
 -		      (long) va, (long) len);
 -	if (len & ~0xfffffff) 
 -		panic("iommu_remove: rediculous len 0x%lx", (u_long)len);
 -#endif
 +	return (tte);
 +}

 -	va = trunc_page(va);
 -	DPRINTF(IDB_IOMMU, ("iommu_remove: va %lx TSB[%lx]@%p\n",
 -		va, (u_long)IOTSBSLOT(va, is->is_tsbsize),
 -		&is->is_tsb[IOTSBSLOT(va, is->is_tsbsize)]));
 -	while (len > 0) {
 -		DPRINTF(IDB_IOMMU, ("iommu_remove: clearing TSB slot %d "
 -			"for va %p size %lx\n",
 -			(int)IOTSBSLOT(va,is->is_tsbsize), (void *)(u_long)va,
 -			(u_long)len));
 -		if (len <= NBPG)
 -			len = 0;
 -		else
 -			len -= NBPG;
 +/*
 + * Lookup a TSB entry at a given physical address (debug routine).
 + */
 +int64_t
 +iommu_fetch_tte(struct iommu_state *is, paddr_t pa)
 +{
 +	int64_t tte = 0;
 +	
 +	if (pa >= is->is_ptsb && pa < is->is_ptsb +
 +	    (PAGE_SIZE << is->is_tsbsize)) 
 +		tte = ldxa(pa, ASI_PHYS_CACHED);

 -		/* XXX Zero-ing the entry would not require RMW */
 -		is->is_tsb[IOTSBSLOT(va,is->is_tsbsize)] &= ~IOTTE_V;
 -		bus_space_write_8(is->is_bustag, is->is_iommu, 
 -			IOMMUREG(iommu_flush), va);
 -		va += NBPG;
 -	}
 +	return (tte);
  }

 -static int 
 -iommu_strbuf_flush_done(sb)
 -	struct strbuf_ctl *sb;
 +/*
 + * Fetch a TSB entry with some sanity checking.
 + */
 +int64_t
 +iommu_tsb_entry(struct iommu_state *is, vaddr_t dva)
  {
 -	struct iommu_state *is = sb->sb_is;
 -	struct timeval cur, flushtimeout;
 +	int64_t tte;

 -#define BUMPTIME(t, usec) { \
 -	register volatile struct timeval *tp = (t); \
 -	register long us; \
 - \
 -	tp->tv_usec = us = tp->tv_usec + (usec); \
 -	if (us >= 1000000) { \
 -		tp->tv_usec = us - 1000000; \
 -		tp->tv_sec++; \
 -	} \
 +	if (dva < is->is_dvmabase && dva > is->is_dvmaend)
 +		panic("invalid dva: %llx", (long long)dva);
 +
 +	tte = is->is_tsb[IOTSBSLOT(dva,is->is_tsbsize)];
 +
 +	if ((tte & IOTTE_V) == 0)
 +		panic("iommu_tsb_entry: invalid entry %lx", dva);
 +
 +	return (tte);
  }

 -	if (!sb->sb_flush)
 -		return (0);
 -				
 +/*
 + * Initiate and then block until an STC flush synchronization has completed.
 + */
 +int 
 +iommu_strbuf_flush_done(struct iommu_map_state *ims)
 +{
 +	struct strbuf_ctl *sb = ims->ims_sb;
 +	struct strbuf_flush *sf = &ims->ims_flush;
 +	struct timeval cur, flushtimeout;
 +	struct timeval to = { 0, 500000 };
 +	u_int64_t flush;
 +	int timeout_started = 0;
 +
 +#ifdef DIAGNOSTIC
 +	if (sb == NULL) {
 +		panic("iommu_strbuf_flush_done: invalid flush buffer");
 +	}
 +#endif
 +
  	/*
  	 * Streaming buffer flushes:
  	 * 
 -	 *   1 Tell strbuf to flush by storing va to strbuf_pgflush.  If
 -	 *     we're not on a cache line boundary (64-bits):
 +	 *   1 Tell strbuf to flush by storing va to strbuf_pgflush.
  	 *   2 Store 0 in flag
  	 *   3 Store pointer to flag in flushsync
  	 *   4 wait till flushsync becomes 0x1
  	 *
 -	 * If it takes more than .5 sec, something
 -	 * went wrong.
 +	 * If it takes more than .5 sec, something went very, very wrong.
 +	 */
 +
 +	/*
 +	 * If we're reading from ASI_PHYS_CACHED, then we'll write to
 +	 * it too.  No need to tempt fate or learn about Si bugs or such.
 +	 * FreeBSD just uses normal "volatile" reads/writes...
  	 */

 -	*sb->sb_flush = 0;
 -	bus_space_write_8(is->is_bustag, sb->sb_sb, 
 -		STRBUFREG(strbuf_flushsync), sb->sb_flushpa);
 +	stxa(sf->sbf_flushpa, ASI_PHYS_CACHED, 0);
 +
 +	/*
 +	 * Insure any previous strbuf operations are complete and that 
 +	 * memory is initialized before the IOMMU uses it.
 +	 * Is this Needed?  How are IO and memory operations ordered? 
 +	 */
 +	membar_storestore();
 +
 +	bus_space_write_8(sb->sb_bustag, sb->sb_sb,
 +		    STRBUFREG(strbuf_flushsync), sf->sbf_flushpa);
 +
 +	DPRINTF(IDB_IOMMU,
 +	    ("iommu_strbuf_flush_done: flush = %lx pa = %lx\n", 
 +		ldxa(sf->sbf_flushpa, ASI_PHYS_CACHED), sf->sbf_flushpa));
 +
 +	membar_storeload();
 +       	membar_lookaside();
 +
 +	for(;;) {
 +		int i;
 +
 +		/*
 +		 * Try to shave a few instruction cycles off the average
 +		 * latency by only checking the elapsed time every few
 +		 * fetches.
 +		 */
 +		for (i = 0; i < 1000; ++i) {
 +			membar_loadload();
 +			/* Bypass non-coherent D$ */
 +			/* non-coherent...?   Huh? */
 +			flush = ldxa(sf->sbf_flushpa, ASI_PHYS_CACHED);
 +
 +			if (flush) {
 +				DPRINTF(IDB_IOMMU,
 +				    ("iommu_strbuf_flush_done: flushed\n"));
 +				return (0);
 +			}
 +		}

 -	microtime(&flushtimeout); 
 -	cur = flushtimeout;
 -	BUMPTIME(&flushtimeout, 500000); /* 1/2 sec */
 -	
 -	DPRINTF(IDB_IOMMU, ("iommu_strbuf_flush_done: flush = %lx "
 -		"at va = %lx pa = %lx now=%lx:%lx until = %lx:%lx\n",
 -		(long)*sb->sb_flush, (long)sb->sb_flush, (long)sb->sb_flushpa, 
 -		cur.tv_sec, cur.tv_usec,
 -		flushtimeout.tv_sec, flushtimeout.tv_usec));
 -
 -	/* Bypass non-coherent D$ */
 -	while ((!ldxa(sb->sb_flushpa, ASI_PHYS_CACHED)) &&
 -		timercmp(&cur, &flushtimeout, <=))
  		microtime(&cur);

 -#ifdef DIAGNOSTIC
 -	if (!ldxa(sb->sb_flushpa, ASI_PHYS_CACHED)) {
 -		printf("iommu_strbuf_flush_done: flush timeout %p, at %p\n",
 -			(void *)(u_long)*sb->sb_flush,
 -			(void *)(u_long)sb->sb_flushpa); /* panic? */
 -#ifdef DDB
 -		Debugger();
 -#endif
 +		if (timeout_started) {
 +			if (timercmp(&cur, &flushtimeout, >))
 +				panic("STC timeout at %lx (%ld)",
 +				    sf->sbf_flushpa, flush);
 +		} else {
 +			timeradd(&cur, &to, &flushtimeout);
 +			
 +			timeout_started = 1;
 +	
 +			DPRINTF(IDB_IOMMU,
 +			    ("iommu_strbuf_flush_done: flush = %lx pa = %lx "
 +				"now=%lx:%lx until = %lx:%lx\n", 
 +				ldxa(sf->sbf_flushpa, ASI_PHYS_CACHED),
 +				sf->sbf_flushpa, cur.tv_sec, cur.tv_usec, 
 +				flushtimeout.tv_sec, flushtimeout.tv_usec));
 +		}
  	}
 -#endif
 -	DPRINTF(IDB_IOMMU, ("iommu_strbuf_flush_done: flushed\n"));
 -	return (*sb->sb_flush);
  }

  /*
   * IOMMU DVMA operations, common to SBUS and PCI.
   */
  int
 -iommu_dvmamap_load(t, sb, map, buf, buflen, p, flags)
 -	bus_dma_tag_t t;
 -	struct strbuf_ctl *sb;
 +iommu_dvmamap_create(bus_dma_tag_t t, struct iommu_state *is,
 +    struct strbuf_ctl *sb, bus_size_t size, int nsegments, bus_size_t maxsegsz,
 +    bus_size_t boundary, int flags, bus_dmamap_t *dmamap)
 +{
 +	int ret;
  	bus_dmamap_t map;
 -	void *buf;
 -	bus_size_t buflen;
 -	struct proc *p;
 -	int flags;
 +	struct iommu_map_state *ims;
 +
 +	ret = bus_dmamap_create(t->_parent, size, nsegments, maxsegsz,
 +	    boundary, flags, &map);
 +
 +	if (ret)
 +		return (ret);
 +
 +	ims = iommu_iomap_create(nsegments);
 +
 +	if (ims == NULL) {
 +		bus_dmamap_destroy(t->_parent, map);
 +		return (ENOMEM);
 +	}
 +
 +	ims->ims_sb = sb;
 +	map->_dm_cookie = ims;
 +	*dmamap = map;
 +
 +	return (0);
 +}
 +
 +void
 +iommu_dvmamap_destroy(bus_dma_tag_t t, bus_dmamap_t map)
 +{
 +	/*
 +	 * The specification (man page) requires a loaded
 +	 * map to be unloaded before it is destroyed.
 +	 */
 +	if (map->dm_nsegs)
 +		bus_dmamap_unload(t, map);
 +
 +        if (map->_dm_cookie)
 +                iommu_iomap_destroy(map->_dm_cookie);
 +	map->_dm_cookie = NULL;
 +
 +	bus_dmamap_destroy(t->_parent, map);
 +}
 +
 +/*
 + * Load a contiguous kva buffer into a dmamap.  The physical pages are
 + * not assumed to be contiguous.  Two passes are made through the buffer
 + * and both call pmap_extract() for the same va->pa translations.  It
 + * is possible to run out of pa->dvma mappings; the code should be smart
 + * enough to resize the iomap (when the "flags" permit allocation).  It
 + * is trivial to compute the number of entries required (round the length
 + * up to the page size and then divide by the page size)...
 + */
 +int
 +iommu_dvmamap_load(bus_dma_tag_t t, struct iommu_state *is, bus_dmamap_t map,
 +    void *buf, bus_size_t buflen, struct proc *p, int flags)
  {
 -	struct iommu_state *is = sb->sb_is;
  	int s;
 -	int err;
 +	int err = 0;
  	bus_size_t sgsize;
 -	paddr_t curaddr;
  	u_long dvmaddr, sgstart, sgend;
  	bus_size_t align, boundary;
 -	vaddr_t vaddr = (vaddr_t)buf;
 -	int seg;
 +	struct iommu_map_state *ims = map->_dm_cookie;
  	pmap_t pmap;

 +#ifdef DIAGNOSTIC
 +	if (ims == NULL)
 +		panic("iommu_dvmamap_load: null map state");
 +#endif
 +
  	if (map->dm_nsegs) {
 -		/* Already in use?? */
 +		/*
 +		 * Is it still in use? _bus_dmamap_load should have taken care
 +		 * of this.
 +		 */
  #ifdef DIAGNOSTIC
 -		printf("iommu_dvmamap_load: map still in use\n");
 +		panic("iommu_dvmamap_load: map still in use");
  #endif
  		bus_dmamap_unload(t, map);
  	}
 +
  	/*
  	 * Make sure that on error condition we return "no valid mappings".
  	 */
  	map->dm_nsegs = 0;

 -	if (buflen > map->_dm_size) {
 +	if (buflen < 1 || buflen > map->_dm_size) {
  		DPRINTF(IDB_BUSDMA,
  		    ("iommu_dvmamap_load(): error %d > %d -- "
  		     "map size exceeded!\n", (int)buflen, (int)map->_dm_size));
  		return (EINVAL);
  	}

 -	sgsize = round_page(buflen + ((int)vaddr & PGOFSET));
 -
  	/*
  	 * A boundary presented to bus_dmamem_alloc() takes precedence
  	 * over boundary in the map.
  	 */
  	if ((boundary = (map->dm_segs[0]._ds_boundary)) == 0)
  		boundary = map->_dm_boundary;
 -	align = max(map->dm_segs[0]._ds_align, NBPG);
 -	s = splhigh();
 +	align = max(map->dm_segs[0]._ds_align, PAGE_SIZE);
 +
 +	pmap = p ? p->p_vmspace->vm_map.pmap : pmap_kernel();
 +
 +	/* Count up the total number of pages we need */
 +	iommu_iomap_clear_pages(ims);
 +	{ /* Scope */
 +		bus_addr_t a, aend;
 +		bus_addr_t addr = (vaddr_t)buf;
 +		int seg_len = buflen;
 +
 +		aend = round_page(addr + seg_len - 1);
 +		for (a = trunc_page(addr); a < aend; a += PAGE_SIZE) {
 +			paddr_t pa;
 +
 +			if (pmap_extract(pmap, a, &pa) == FALSE) {
 +				printf("iomap pmap error addr 0x%lx\n", a);
 +				iommu_iomap_clear_pages(ims);
 +				return (E2BIG);
 +			}
 +
 +			err = iommu_iomap_insert_page(ims, pa);
 +			if (err) {
 +				printf("iomap insert error: %d for "
 +				    "va 0x%lx pa 0x%lx "
 +				    "(buf %p len %ld/%lx)\n",
 +				    err, a, pa, buf, buflen, buflen);
 +				iommu_dvmamap_print_map(t, is, map);
 +				iommu_iomap_clear_pages(ims);
 +				return (E2BIG);
 +			}
 +		}
 +	}
 +	sgsize = ims->ims_map.ipm_pagecnt * PAGE_SIZE;
 +
 +	if (flags & BUS_DMA_24BIT) {
 +		sgstart = max(is->is_dvmamap->ex_start, 0xff000000);
 +		sgend = min(is->is_dvmamap->ex_end, 0xffffffff);
 +	} else {
 +		sgstart = is->is_dvmamap->ex_start;
 +		sgend = is->is_dvmamap->ex_end;
 +	}
 +
  	/* 
  	 * If our segment size is larger than the boundary we need to 
 -	 * split the transfer up int little pieces ourselves.
 +	 * split the transfer up into little pieces ourselves.
  	 */
 -	err = extent_alloc(is->is_dvmamap, sgsize, align, 
 -		(sgsize > boundary) ? 0 : boundary, 
 -		EX_NOWAIT|EX_BOUNDZERO, &dvmaddr);
 +	s = splhigh();
 +	err = extent_alloc_subregion1(is->is_dvmamap, sgstart, sgend,
 +	    sgsize, align, 0, (sgsize > boundary) ? 0 : boundary, 
 +	    EX_NOWAIT | EX_BOUNDZERO, (u_long *)&dvmaddr);
  	splx(s);

  #ifdef DEBUG
 -	if (err || (dvmaddr == (bus_addr_t)-1))	
 -	{ 
 +	if (err || (dvmaddr == (bus_addr_t)-1))	{ 
  		printf("iommu_dvmamap_load(): extent_alloc(%d, %x) failed!\n",
  		    (int)sgsize, flags);
  #ifdef DDB
 -		Debugger();
 +		if (iommudebug & IDB_BREAK)
 +			Debugger();
  #endif
  	}		
  #endif	
 @@ -492,163 +731,109 @@
  	map->_dm_dvmastart = dvmaddr;
  	map->_dm_dvmasize = sgsize;

 -	/*
 -	 * Now split the DVMA range into segments, not crossing
 -	 * the boundary.
 -	 */
 -	seg = 0;
 -	sgstart = dvmaddr + (vaddr & PGOFSET);
 -	sgend = sgstart + buflen - 1;
 -	map->dm_segs[seg].ds_addr = sgstart;
 -	DPRINTF(IDB_INFO, ("iommu_dvmamap_load: boundary %lx boundary-1 %lx "
 -		"~(boundary-1) %lx\n", (long)boundary, (long)(boundary-1), (long)~(boundary-1)));
 -	while ((sgstart & ~(boundary - 1)) != (sgend & ~(boundary - 1))) {
 -		/* Oops.  We crossed a boundary.  Split the xfer. */
 -		DPRINTF(IDB_INFO, ("iommu_dvmamap_load: "
 -			"seg %d start %lx size %lx\n", seg,
 -			(long)map->dm_segs[seg].ds_addr, 
 -			(long)map->dm_segs[seg].ds_len));
 -		map->dm_segs[seg].ds_len =
 -		    boundary - (sgstart & (boundary - 1));
 -		if (++seg >= map->_dm_segcnt) {
 -			/* Too many segments.  Fail the operation. */
 -			DPRINTF(IDB_INFO, ("iommu_dvmamap_load: "
 -				"too many segments %d\n", seg));
 -			s = splhigh();
 -			/* How can this fail?  And if it does what can we do? */
 -			err = extent_free(is->is_dvmamap,
 -				dvmaddr, sgsize, EX_NOWAIT);
 -			map->_dm_dvmastart = 0;
 -			map->_dm_dvmasize = 0;
 -			splx(s);
 -			return (E2BIG);
 -		}
 -		sgstart = roundup(sgstart, boundary);
 -		map->dm_segs[seg].ds_addr = sgstart;
 -	}
 -	map->dm_segs[seg].ds_len = sgend - sgstart + 1;
 -	DPRINTF(IDB_INFO, ("iommu_dvmamap_load: "
 -		"seg %d start %lx size %lx\n", seg,
 -		(long)map->dm_segs[seg].ds_addr, (long)map->dm_segs[seg].ds_len));
 -	map->dm_nsegs = seg+1;
  	map->dm_mapsize = buflen;

 -	if (p != NULL)
 -		pmap = p->p_vmspace->vm_map.pmap;
 -	else
 -		pmap = pmap_kernel();
 +#ifdef DEBUG
 +	iommu_dvmamap_validate_map(t, is, map);
 +#endif

 -	for (; buflen > 0; ) {
 -		/*
 -		 * Get the physical address for this page.
 -		 */
 -		if (pmap_extract(pmap, (vaddr_t)vaddr, &curaddr) == FALSE) {
 -			bus_dmamap_unload(t, map);
 -			return (-1);
 -		}
 +	if (iommu_iomap_load_map(is, ims, dvmaddr, flags))
 +		return (E2BIG);

 -		/*
 -		 * Compute the segment size, and adjust counts.
 -		 */
 -		sgsize = NBPG - ((u_long)vaddr & PGOFSET);
 -		if (buflen < sgsize)
 -			sgsize = buflen;
 +	{ /* Scope */
 +		bus_addr_t a, aend;
 +		bus_addr_t addr = (vaddr_t)buf;
 +		int seg_len = buflen;
 +
 +		aend = round_page(addr + seg_len - 1);
 +		for (a = trunc_page(addr); a < aend; a += PAGE_SIZE) {
 +			bus_addr_t pgstart;
 +			bus_addr_t pgend;
 +			paddr_t pa;
 +			int pglen;
 +
 +			/* Yuck... Redoing the same pmap_extract... */
 +			if (pmap_extract(pmap, a, &pa) == FALSE) {
 +				printf("iomap pmap error addr 0x%lx\n", a);
 +				iommu_iomap_clear_pages(ims);
 +				return (E2BIG);
 +			}

 -		DPRINTF(IDB_BUSDMA,
 -		    ("iommu_dvmamap_load: map %p loading va %p "
 -			    "dva %lx at pa %lx\n",
 -			    map, (void *)vaddr, (long)dvmaddr,
 -			    (long)(curaddr & ~(NBPG-1))));
 -		iommu_enter(sb, trunc_page(dvmaddr), trunc_page(curaddr),
 -		    flags|0x4000);
 -			
 -		dvmaddr += PAGE_SIZE;
 -		vaddr += sgsize;
 -		buflen -= sgsize;
 -	}
 -#ifdef DIAGNOSTIC
 -	for (seg = 0; seg < map->dm_nsegs; seg++) {
 -		if (map->dm_segs[seg].ds_addr < is->is_dvmabase ||
 -			map->dm_segs[seg].ds_addr > is->is_dvmaend) {
 -			printf("seg %d dvmaddr %lx out of range %x - %x\n",
 -				seg, (long)map->dm_segs[seg].ds_addr, 
 -				is->is_dvmabase, is->is_dvmaend);
 -			Debugger();
 +			pgstart = pa | (max(a, addr) & PAGE_MASK);
 +			pgend = pa | (min(a + PAGE_SIZE - 1,
 +			    addr + seg_len - 1) & PAGE_MASK);
 +			pglen = pgend - pgstart + 1;
 +
 +			if (pglen < 1)
 +				continue;
 +
 +			err = iommu_dvmamap_append_range(t, map, pgstart,
 +			    pglen, flags, boundary);
 +			if (err) {
 +				printf("iomap load seg page: %d for "
 +				    "va 0x%lx pa %lx (%lx - %lx) "
 +				    "for %d/0x%x\n",
 +				    err, a, pa, pgstart, pgend, pglen, pglen);
 +				return (err);
 +			}
  		}
  	}
 -#endif
 -	return (0);
 -}
 -

 -void
 -iommu_dvmamap_unload(t, sb, map)
 -	bus_dma_tag_t t;
 -	struct strbuf_ctl *sb;
 -	bus_dmamap_t map;
 -{
 -	struct iommu_state *is = sb->sb_is;
 -	int error, s;
 -	bus_size_t sgsize;
 +#ifdef DIAGNOSTIC
 +	iommu_dvmamap_validate_map(t, is, map);
 +#endif

 -	/* Flush the iommu */
  #ifdef DEBUG
 -	if (!map->_dm_dvmastart) {
 -		printf("iommu_dvmamap_unload: No dvmastart is zero\n");
 +	if (err)
 +		printf("**** iommu_dvmamap_load failed with error %d\n",
 +		    err);
 +	
 +	if (err || (iommudebug & IDB_PRINT_MAP)) {
 +		iommu_dvmamap_print_map(t, is, map);
  #ifdef DDB
 -		Debugger();
 +		if (iommudebug & IDB_BREAK)
 +			Debugger();
  #endif
  	}
  #endif
 -	iommu_remove(is, map->_dm_dvmastart, map->_dm_dvmasize);
 -
 -	/* Flush the caches */
 -	bus_dmamap_unload(t->_parent, map);
 -
 -	/* Mark the mappings as invalid. */
 -	map->dm_mapsize = 0;
 -	map->dm_nsegs = 0;
 -	
 -	s = splhigh();
 -	error = extent_free(is->is_dvmamap, map->_dm_dvmastart, 
 -		map->_dm_dvmasize, EX_NOWAIT);
 -	map->_dm_dvmastart = 0;
 -	map->_dm_dvmasize = 0;
 -	splx(s);
 -	if (error != 0)
 -		printf("warning: %qd of DVMA space lost\n", (long long)sgsize);

 -	/* Clear the map */
 +	return (err);
  }

 -
 +/*
 + * Load a dvmamap from an array of segs or an mlist (if the first
 + * "segs" entry's mlist is non-null).  It calls iommu_dvmamap_load_segs()
 + * or iommu_dvmamap_load_mlist() for part of the 2nd pass through the
 + * mapping.  This is ugly.  A better solution would probably be to have
 + * function pointers for implementing the traversal.  That way, there
 + * could be one core load routine for each of the three required algorithms
 + * (buffer, seg, and mlist).  That would also mean that the traversal
 + * algorithm would then only need one implementation for each algorithm
 + * instead of two (one for populating the iomap and one for populating
 + * the dvma map).
 + */
  int
 -iommu_dvmamap_load_raw(t, sb, map, segs, nsegs, flags, size)
 -	bus_dma_tag_t t;
 -	struct strbuf_ctl *sb;
 -	bus_dmamap_t map;
 -	bus_dma_segment_t *segs;
 -	int nsegs;
 -	int flags;
 -	bus_size_t size;
 +iommu_dvmamap_load_raw(bus_dma_tag_t t, struct iommu_state *is,
 +    bus_dmamap_t map, bus_dma_segment_t *segs, int nsegs, int flags,
 +    bus_size_t size)
  {
 -	struct iommu_state *is = sb->sb_is;
 -	struct vm_page *m;
 -	int i, j, s;
 +	int i, s;
  	int left;
 -	int err;
 +	int err = 0;
  	bus_size_t sgsize;
 -	paddr_t pa;
  	bus_size_t boundary, align;
  	u_long dvmaddr, sgstart, sgend;
 -	struct pglist *mlist;
 -	int pagesz = PAGE_SIZE;
 -	int npg = 0; /* DEBUG */
 +	struct iommu_map_state *ims = map->_dm_cookie;
 +
 +#ifdef DIAGNOSTIC
 +	if (ims == NULL)
 +		panic("iommu_dvmamap_load_raw: null map state");
 +#endif

  	if (map->dm_nsegs) {
  		/* Already in use?? */
  #ifdef DIAGNOSTIC
 -		printf("iommu_dvmamap_load_raw: map still in use\n");
 +		panic("iommu_dvmamap_load_raw: map still in use");
  #endif
  		bus_dmamap_unload(t, map);
  	}
 @@ -660,45 +845,87 @@
  	if ((boundary = segs[0]._ds_boundary) == 0)
  		boundary = map->_dm_boundary;

 -	align = max(segs[0]._ds_align, pagesz);
 +	align = max(segs[0]._ds_align, PAGE_SIZE);

  	/*
  	 * Make sure that on error condition we return "no valid mappings".
  	 */
  	map->dm_nsegs = 0;
 -	/* Count up the total number of pages we need */
 -	pa = segs[0].ds_addr;
 -	sgsize = 0;
 -	left = size;
 -	for (i=0; left && i<nsegs; i++) {
 -		if (round_page(pa) != round_page(segs[i].ds_addr))
 -			sgsize = round_page(sgsize);
 -		sgsize += min(left, segs[i].ds_len);
 -		left -= segs[i].ds_len;
 -		pa = segs[i].ds_addr + segs[i].ds_len;
 -	}
 -	sgsize = round_page(sgsize);

 -	s = splhigh();
 +	iommu_iomap_clear_pages(ims);
 +	if (segs[0]._ds_mlist) {
 +		struct pglist *mlist = segs[0]._ds_mlist;
 +		struct vm_page *m;
 +		for (m = TAILQ_FIRST(mlist); m != NULL;
 +		    m = TAILQ_NEXT(m,pageq)) {
 +			err = iommu_iomap_insert_page(ims, VM_PAGE_TO_PHYS(m));
 +
 +			if(err) {
 +				printf("iomap insert error: %d for "
 +				    "pa 0x%lx\n", err, VM_PAGE_TO_PHYS(m));
 +				iommu_iomap_clear_pages(ims);
 +				return (E2BIG);
 +			}
 +		}
 +	} else {
 +		/* Count up the total number of pages we need */
 +		for (i = 0, left = size; left > 0 && i < nsegs; i++) {
 +			bus_addr_t a, aend;
 +			bus_size_t len = segs[i].ds_len;
 +			bus_addr_t addr = segs[i].ds_addr;
 +			int seg_len = min(left, len);
 +
 +			if (len < 1)
 +				continue;
 +
 +			aend = round_page(addr + seg_len - 1);
 +			for (a = trunc_page(addr); a < aend; a += PAGE_SIZE) {
 +
 +				err = iommu_iomap_insert_page(ims, a);
 +				if (err) {
 +					printf("iomap insert error: %d for "
 +					    "pa 0x%lx\n", err, a);
 +					iommu_iomap_clear_pages(ims);
 +					return (E2BIG);
 +				}
 +			}
 +
 +			left -= seg_len;
 +		}
 +	}
 +	sgsize = ims->ims_map.ipm_pagecnt * PAGE_SIZE;
 +
 +	if (flags & BUS_DMA_24BIT) {
 +		sgstart = max(is->is_dvmamap->ex_start, 0xff000000);
 +		sgend = min(is->is_dvmamap->ex_end, 0xffffffff);
 +	} else {
 +		sgstart = is->is_dvmamap->ex_start;
 +		sgend = is->is_dvmamap->ex_end;
 +	}
 +
  	/* 
  	 * If our segment size is larger than the boundary we need to 
  	 * split the transfer up into little pieces ourselves.
  	 */
 -	err = extent_alloc(is->is_dvmamap, sgsize, align,
 -		(sgsize > boundary) ? 0 : boundary,
 -		((flags & BUS_DMA_NOWAIT) == 0 ? EX_WAITOK : EX_NOWAIT) |
 -		EX_BOUNDZERO, &dvmaddr);
 +	s = splhigh();
 +	err = extent_alloc_subregion1(is->is_dvmamap, sgstart, sgend,
 +	    sgsize, align, 0, (sgsize > boundary) ? 0 : boundary, 
 +	    EX_NOWAIT | EX_BOUNDZERO, (u_long *)&dvmaddr);
  	splx(s);

  	if (err != 0)
  		return (err);

  #ifdef DEBUG
 -	if (dvmaddr == (bus_addr_t)-1)	
 -	{ 
 -		printf("iommu_dvmamap_load_raw(): extent_alloc(%d, %x) failed!\n",
 -		    (int)sgsize, flags);
 -		Debugger();
 +	if (dvmaddr == (bus_addr_t)-1)	{ 
 +		printf("iommu_dvmamap_load_raw(): extent_alloc(%d, %x) "
 +		    "failed!\n", (int)sgsize, flags);
 +#ifdef DDB
 +		if (iommudebug & IDB_BREAK)
 +			Debugger();
 +#else
 +		panic("");
 +#endif
  	}		
  #endif	
  	if (dvmaddr == (bus_addr_t)-1)
 @@ -708,303 +935,601 @@
  	map->_dm_dvmastart = dvmaddr;
  	map->_dm_dvmasize = sgsize;

 -	if ((mlist = segs[0]._ds_mlist) == NULL) {
 -		u_long prev_va = NULL;
 -		paddr_t prev_pa = 0;
 -		int end = 0, offset;
 +	map->dm_mapsize = size;

 -		/*
 -		 * This segs is made up of individual physical
 -		 *  segments, probably by _bus_dmamap_load_uio() or 
 -		 * _bus_dmamap_load_mbuf().  Ignore the mlist and
 -		 * load each one individually.
 -		 */
 -		map->dm_mapsize = size;
 +#ifdef DEBUG
 +	iommu_dvmamap_validate_map(t, is, map);
 +#endif

 -		j = 0;
 -		for (i = 0; i < nsegs ; i++) {
 +	if (iommu_iomap_load_map(is, ims, dvmaddr, flags))
 +		return (E2BIG);

 -			pa = segs[i].ds_addr;
 -			offset = (pa & PGOFSET);
 -			pa = trunc_page(pa);
 -			dvmaddr = trunc_page(dvmaddr);
 -			left = min(size, segs[i].ds_len);
 -
 -			DPRINTF(IDB_INFO, ("iommu_dvmamap_load_raw: converting "
 -				"physseg %d start %lx size %lx\n", i, 
 -				(long)segs[i].ds_addr, (long)segs[i].ds_len));
 -
 -			if ((pa == prev_pa) && 
 -				((offset != 0) || (end != offset))) {
 -				/* We can re-use this mapping */
 -				dvmaddr = prev_va;
 -			}
 +	if (segs[0]._ds_mlist)
 +		err = iommu_dvmamap_load_mlist(t, is, map, segs[0]._ds_mlist,
 +		    flags, size, boundary);
 +	else
 +		err = iommu_dvmamap_load_seg(t, is, map, segs, nsegs,
 +		    flags, size, boundary);

 -			sgstart = dvmaddr + offset;
 -			sgend = sgstart + left - 1;
 +	if (err)
 +		iommu_iomap_unload_map(is, ims);

 -			/* Are the segments virtually adjacent? */
 -			if ((j > 0) && (end == offset) && 
 -				((offset == 0) || (pa == prev_pa))) {
 -				/* Just append to the previous segment. */
 -				map->dm_segs[--j].ds_len += left;
 -				DPRINTF(IDB_INFO, ("iommu_dvmamap_load_raw: "
 -					"appending seg %d start %lx size %lx\n", j,
 -					(long)map->dm_segs[j].ds_addr, 
 -					(long)map->dm_segs[j].ds_len));
 -			} else {
 -				if (j >= map->_dm_segcnt) {
 -					iommu_dvmamap_unload(t, sb, map);
 -					return (E2BIG);
 -				}
 -				map->dm_segs[j].ds_addr = sgstart;
 -				map->dm_segs[j].ds_len = left;
 -				DPRINTF(IDB_INFO, ("iommu_dvmamap_load_raw: "
 -					"seg %d start %lx size %lx\n", j,
 -					(long)map->dm_segs[j].ds_addr,
 -					(long)map->dm_segs[j].ds_len));
 -			}
 -			end = (offset + left) & PGOFSET;
 +#ifdef DIAGNOSTIC
 +	/* The map should be valid even if the load failed */
 +	if (iommu_dvmamap_validate_map(t, is, map)) {
 +		printf("load size %ld/0x%lx\n", size, size);
 +		if (segs[0]._ds_mlist)
 +			printf("mlist %p\n", segs[0]._ds_mlist);
 +		else  {
 +			long tot_len = 0;
 +			long clip_len = 0;
 +			printf("segs %p nsegs %d\n", segs, nsegs);
 +
 +			left = size;
 +			for(i = 0; i < nsegs; i++) {
 +				bus_size_t len = segs[i].ds_len;
 +				bus_addr_t addr = segs[i].ds_addr;
 +				int seg_len = min(left, len);
 +
 +				printf("addr %lx len %ld/0x%lx seg_len "
 +				    "%d/0x%x left %d/0x%x\n", addr, len, len,
 +				    seg_len, seg_len, left, left);

 -			/* Check for boundary issues */
 -			while ((sgstart & ~(boundary - 1)) !=
 -				(sgend & ~(boundary - 1))) {
 -				/* Need a new segment. */
 -				map->dm_segs[j].ds_len =
 -					boundary - (sgstart & (boundary - 1));
 -				DPRINTF(IDB_INFO, ("iommu_dvmamap_load_raw: "
 -					"seg %d start %lx size %lx\n", j,
 -					(long)map->dm_segs[j].ds_addr, 
 -					(long)map->dm_segs[j].ds_len));
 -				if (++j >= map->_dm_segcnt) {
 -					iommu_dvmamap_unload(t, sb, map);
 -					return (E2BIG);
 -				}
 -				sgstart = roundup(sgstart, boundary);
 -				map->dm_segs[j].ds_addr = sgstart;
 -				map->dm_segs[j].ds_len = sgend - sgstart + 1;
 +				left -= seg_len;
 +				
 +				clip_len += seg_len;
 +				tot_len += segs[i].ds_len;
  			}
 +			printf("total length %ld/0x%lx total seg. "
 +			    "length %ld/0x%lx\n", tot_len, tot_len, clip_len,
 +			    clip_len);
 +		}

 -			if (sgsize == 0)
 -				panic("iommu_dmamap_load_raw: size botch");
 +		if (err == 0)
 +			err = 1;
 +	}

 -			/* Now map a series of pages. */
 -			while (dvmaddr <= sgend) {
 -				DPRINTF(IDB_BUSDMA,
 -					("iommu_dvmamap_load_raw: map %p "
 -						"loading va %lx at pa %lx\n",
 -						map, (long)dvmaddr,
 -						(long)(pa)));
 -				/* Enter it if we haven't before. */
 -				if (prev_va != dvmaddr)
 -					iommu_enter(sb, prev_va = dvmaddr,
 -						prev_pa = pa,
 -						flags|(++npg<<12));
 -				dvmaddr += pagesz;
 -				pa += pagesz;
 -			}
 +#endif

 -			size -= left;
 -			++j;
 -		}
 +#ifdef DEBUG
 +	if (err)
 +		printf("**** iommu_dvmamap_load_raw failed with error %d\n",
 +		    err);
 +	
 +	if (err || (iommudebug & IDB_PRINT_MAP)) {
 +		iommu_dvmamap_print_map(t, is, map);
 +#ifdef DDB
 +		if (iommudebug & IDB_BREAK)
 +			Debugger();
 +#endif
 +	}
 +#endif
 +
 +	return (err);
 +}
 +
 +/*
 + * Insert a range of addresses into a loaded map respecting the specified
 + * boundary and alignment restrictions.  The range is specified by its 
 + * physical address and length.  The range cannot cross a page boundary.
 + * This code (along with most of the rest of the function in this file)
 + * assumes that the IOMMU page size is equal to PAGE_SIZE.
 + */
 +int
 +iommu_dvmamap_append_range(bus_dma_tag_t t, bus_dmamap_t map, paddr_t pa,
 +    bus_size_t length, int flags, bus_size_t boundary)
 +{
 +	struct iommu_map_state *ims = map->_dm_cookie;
 +	bus_addr_t sgstart, sgend, bd_mask;
 +	bus_dma_segment_t *seg = NULL;
 +	int i = map->dm_nsegs;
 +
 +#ifdef DEBUG
 +	if (ims == NULL)
 +		panic("iommu_dvmamap_append_range: null map state");
 +#endif
 +
 +	sgstart = iommu_iomap_translate(ims, pa);
 +	sgend = sgstart + length - 1;

 -		map->dm_nsegs = j;
  #ifdef DIAGNOSTIC
 -		{ int seg;
 -	for (seg = 0; seg < map->dm_nsegs; seg++) {
 -		if (map->dm_segs[seg].ds_addr < is->is_dvmabase ||
 -			map->dm_segs[seg].ds_addr > is->is_dvmaend) {
 -			printf("seg %d dvmaddr %lx out of range %x - %x\n",
 -				seg, (long)map->dm_segs[seg].ds_addr, 
 -				is->is_dvmabase, is->is_dvmaend);
 -			Debugger();
 +	if (sgstart == NULL || sgstart > sgend) {
 +		printf("append range invalid mapping for %lx "
 +		    "(0x%lx - 0x%lx)\n", pa, sgstart, sgend);
 +		map->dm_nsegs = 0;
 +		return (EINVAL);
 +	}
 +#endif
 +
 +#ifdef DEBUG
 +	if (trunc_page(sgstart) != trunc_page(sgend)) {
 +		printf("append range crossing page boundary! "
 +		    "pa %lx length %ld/0x%lx sgstart %lx sgend %lx\n",
 +		    pa, length, length, sgstart, sgend);
 +	}
 +#endif
 +
 +	/*
 +	 * We will attempt to merge this range with the previous entry
 +	 * (if there is one).
 +	 */
 +	if (i > 0) {
 +		seg = &map->dm_segs[i - 1];
 +		if (sgstart == seg->ds_addr + seg->ds_len) {
 +			length += seg->ds_len;
 +			sgstart = seg->ds_addr;
 +			sgend = sgstart + length - 1;
 +		} else
 +			seg = NULL;
 +	}
 +
 +	if (seg == NULL) {
 +		seg = &map->dm_segs[i];
 +		if (++i > map->_dm_segcnt) {
 +			printf("append range, out of segments (%d)\n", i);
 +			iommu_dvmamap_print_map(t, NULL, map);
 +			map->dm_nsegs = 0;
 +			return (ENOMEM);
  		}
  	}
 +
 +	/*
 +	 * At this point, "i" is the index of the *next* bus_dma_segment_t
 +	 * (the segment count, aka map->dm_nsegs) and "seg" points to the
 +	 * *current* entry.  "length", "sgstart", and "sgend" reflect what
 +	 * we intend to put in "*seg".  No assumptions should be made about
 +	 * the contents of "*seg".  Only "boundary" issue can change this
 +	 * and "boundary" is often zero, so explicitly test for that case
 +	 * (the test is strictly an optimization).
 +	 */ 
 +	if (boundary != 0) {
 +		bd_mask = ~(boundary - 1);
 +
 +		while ((sgstart & bd_mask) != (sgend & bd_mask)) {
 +			/*
 +			 * We are crossing a boundary so fill in the current
 +			 * segment with as much as possible, then grab a new
 +			 * one.
 +			 */
 +
 +			seg->ds_addr = sgstart;
 +			seg->ds_len = boundary - (sgstart & bd_mask);
 +
 +			sgstart += seg->ds_len; /* sgend stays the same */
 +			length -= seg->ds_len;
 +
 +			seg = &map->dm_segs[i];
 +			if (++i > map->_dm_segcnt) {
 +				printf("append range, out of segments\n");
 +				iommu_dvmamap_print_map(t, NULL, map);
 +				map->dm_nsegs = 0;
 +				return (E2BIG);
 +			}
  		}
 -#endif
 -		return (0);
  	}
 +
 +	seg->ds_addr = sgstart;
 +	seg->ds_len = length;
 +	map->dm_nsegs = i;
 +
 +	return (0);
 +}
 +
 +/*
 + * Populate the iomap from a bus_dma_segment_t array.  See note for
 + * iommu_dvmamap_load() * regarding page entry exhaustion of the iomap.
 + * This is less of a problem for load_seg, as the number of pages
 + * is usually similar to the number of segments (nsegs).
 + */
 +int
 +iommu_dvmamap_load_seg(bus_dma_tag_t t, struct iommu_state *is,
 +    bus_dmamap_t map, bus_dma_segment_t *segs, int nsegs, int flags,
 +    bus_size_t size, bus_size_t boundary)
 +{
 +	int i;
 +	int left;
 +	int seg;
 +
  	/*
 -	 * This was allocated with bus_dmamem_alloc.
 -	 * The pages are on an `mlist'.
 +	 * This segs is made up of individual physical
 +	 * segments, probably by _bus_dmamap_load_uio() or
 +	 * _bus_dmamap_load_mbuf().  Ignore the mlist and
 +	 * load each one individually.
  	 */
 -	map->dm_mapsize = size;
 -	i = 0;
 -	sgstart = dvmaddr;
 -	sgend = sgstart + size - 1;
 -	map->dm_segs[i].ds_addr = sgstart;
 -	while ((sgstart & ~(boundary - 1)) != (sgend & ~(boundary - 1))) {
 -		/* Oops.  We crossed a boundary.  Split the xfer. */
 -		map->dm_segs[i].ds_len = boundary - (sgstart & (boundary - 1));
 -		DPRINTF(IDB_INFO, ("iommu_dvmamap_load_raw: "
 -			"seg %d start %lx size %lx\n", i,
 -			(long)map->dm_segs[i].ds_addr,
 -			(long)map->dm_segs[i].ds_len));
 -		if (++i >= map->_dm_segcnt) {
 -			/* Too many segments.  Fail the operation. */
 -			s = splhigh();
 -			/* How can this fail?  And if it does what can we do? */
 -			err = extent_free(is->is_dvmamap,
 -				dvmaddr, sgsize, EX_NOWAIT);
 -			map->_dm_dvmastart = 0;
 -			map->_dm_dvmasize = 0;
 -			splx(s);
 -			return (E2BIG);
 +
 +	/*
 +	 * Keep in mind that each segment could span
 +	 * multiple pages and that these are not always
 +	 * adjacent. The code is no longer adding dvma
 +	 * aliases to the IOMMU.  The STC will not cross
 +	 * page boundaries anyway and a IOMMU table walk
 +	 * vs. what may be a streamed PCI DMA to a ring
 +	 * descriptor is probably a wash.  It eases TLB
 +	 * pressure and in the worst possible case, it is
 +	 * only as bad a non-IOMMUed architecture.  More
 +	 * importantly, the code is not quite as hairy.
 +	 * (It's bad enough as it is.)
 +	 */
 +	left = size;
 +	seg = 0;
 +	for (i = 0; left > 0 && i < nsegs; i++) {
 +		bus_addr_t a, aend;
 +		bus_size_t len = segs[i].ds_len;
 +		bus_addr_t addr = segs[i].ds_addr;
 +		int seg_len = min(left, len);
 +
 +		if (len < 1)
 +			continue;
 +
 +		aend = addr + seg_len - 1;
 +		for (a = trunc_page(addr); a < round_page(aend);
 +		    a += PAGE_SIZE) {
 +			bus_addr_t pgstart;
 +			bus_addr_t pgend;
 +			int pglen;
 +			int err;
 +
 +			pgstart = max(a, addr);
 +			pgend = min(a + PAGE_SIZE - 1, addr + seg_len - 1);
 +			pglen = pgend - pgstart + 1;
 +			
 +			if (pglen < 1)
 +				continue;
 +
 +			err = iommu_dvmamap_append_range(t, map, pgstart,
 +			    pglen, flags, boundary);
 +			if (err) {
 +				printf("iomap load seg page: %d for "
 +				    "pa 0x%lx (%lx - %lx for %d/%x\n",
 +				    err, a, pgstart, pgend, pglen, pglen);
 +				return (err);
 +			}
 +
  		}
 -		sgstart = roundup(sgstart, boundary);
 -		map->dm_segs[i].ds_addr = sgstart;
 +
 +		left -= seg_len;
  	}
 -	DPRINTF(IDB_INFO, ("iommu_dvmamap_load_raw: "
 -			"seg %d start %lx size %lx\n", i,
 -			(long)map->dm_segs[i].ds_addr, (long)map->dm_segs[i].ds_len));
 -	map->dm_segs[i].ds_len = sgend - sgstart + 1;
 +	return (0);
 +}

 +/*
 + * Populate the iomap from an mlist.  See note for iommu_dvmamap_load()
 + * regarding page entry exhaustion of the iomap.
 + */
 +int
 +iommu_dvmamap_load_mlist(bus_dma_tag_t t, struct iommu_state *is,
 +    bus_dmamap_t map, struct pglist *mlist, int flags,
 +    bus_size_t size, bus_size_t boundary)
 +{
 +	struct vm_page *m;
 +	paddr_t pa;
 +	int err;
 +
 +	/*
 +	 * This was allocated with bus_dmamem_alloc.
 +	 * The pages are on an `mlist'.
 +	 */
  	for (m = TAILQ_FIRST(mlist); m != NULL; m = TAILQ_NEXT(m,pageq)) {
 -		if (sgsize == 0)
 -			panic("iommu_dmamap_load_raw: size botch");
  		pa = VM_PAGE_TO_PHYS(m);

 -		DPRINTF(IDB_BUSDMA,
 -		    ("iommu_dvmamap_load_raw: map %p loading va %lx at pa %lx\n",
 -		    map, (long)dvmaddr, (long)(pa)));
 -		iommu_enter(sb, dvmaddr, pa, flags|0x8000);
 -			
 -		dvmaddr += pagesz;
 -		sgsize -= pagesz;
 +		err = iommu_dvmamap_append_range(t, map, pa, PAGE_SIZE,
 +		    flags, boundary);
 +		if (err) {
 +			printf("iomap load seg page: %d for pa 0x%lx "
 +			    "(%lx - %lx for %d/%x\n", err, pa, pa,
 +			    pa + PAGE_SIZE, PAGE_SIZE, PAGE_SIZE);
 +			return (err);
 +		}
  	}
 -	map->dm_mapsize = size;
 -	map->dm_nsegs = i+1;
 -#ifdef DIAGNOSTIC
 -	{ int seg;
 -	for (seg = 0; seg < map->dm_nsegs; seg++) {
 -		if (map->dm_segs[seg].ds_addr < is->is_dvmabase ||
 -			map->dm_segs[seg].ds_addr > is->is_dvmaend) {
 -			printf("seg %d dvmaddr %lx out of range %x - %x\n",
 -				seg, (long)map->dm_segs[seg].ds_addr, 
 -				is->is_dvmabase, is->is_dvmaend);
 +
 +	return (0);
 +}
 +
 +/*
 + * Unload a dvmamap.
 + */
 +void
 +iommu_dvmamap_unload(bus_dma_tag_t t, struct iommu_state *is, bus_dmamap_t map)
 +{
 +	struct iommu_map_state *ims = map->_dm_cookie;
 +	bus_addr_t dvmaddr = map->_dm_dvmastart;
 +	bus_size_t sgsize = map->_dm_dvmasize;
 +	int error, s;
 +
 +	/* Flush the iommu */
 +#ifdef DEBUG
 +	if (dvmaddr == 0) {
 +		printf("iommu_dvmamap_unload: No dvmastart\n");
 +#ifdef DDB
 +		if (iommudebug & IDB_BREAK)
  			Debugger();
 -		}
 +#endif
 +		return;
 +	}
 +	iommu_dvmamap_validate_map(t, is, map);
 +
 +	if (iommudebug & IDB_PRINT_MAP)
 +		iommu_dvmamap_print_map(t, is, map);
 +#endif /* DEBUG */
 +
 +	/* Remove the IOMMU entries */
 +	iommu_iomap_unload_map(is, ims);
 +
 +	/* Clear the iomap */
 +	iommu_iomap_clear_pages(ims);
 +
 +	bus_dmamap_unload(t->_parent, map);
 +
 +	/* Mark the mappings as invalid. */
 +	map->dm_mapsize = 0;
 +	map->dm_nsegs = 0;
 +
 +	s = splhigh();
 +	error = extent_free(is->is_dvmamap, dvmaddr, 
 +		sgsize, EX_NOWAIT);
 +	map->_dm_dvmastart = 0;
 +	map->_dm_dvmasize = 0;
 +	splx(s);
 +	if (error != 0)
 +		printf("warning: %ld bytes of DVMA space lost\n", (long)sgsize);
 +}
 +
 +/*
 + * Perform internal consistency checking on a dvmamap.
 + */
 +int
 +iommu_dvmamap_validate_map(bus_dma_tag_t t, struct iommu_state *is,
 +    bus_dmamap_t map)
 +{
 +	int err = 0;
 +	int seg;
 +
 +	if (trunc_page(map->_dm_dvmastart) != map->_dm_dvmastart) {
 +		printf("**** dvmastart address not page aligned: %lx",
 +			map->_dm_dvmastart);
 +		err = 1;
 +	}
 +	if (trunc_page(map->_dm_dvmasize) != map->_dm_dvmasize) {
 +		printf("**** dvmasize not a multiple of page size: %lx",
 +			map->_dm_dvmasize);
 +		err = 1;
 +	}
 +	if (map->_dm_dvmastart < is->is_dvmabase ||
 +	    round_page(map->_dm_dvmastart + map->_dm_dvmasize) >
 +	    is->is_dvmaend + 1) {
 +		printf("dvmaddr %lx len %lx out of range %x - %x\n",
 +			    map->_dm_dvmastart, map->_dm_dvmasize,
 +			    is->is_dvmabase, is->is_dvmaend);
 +		err = 1;
  	}
 +	for (seg = 0; seg < map->dm_nsegs; seg++) {
 +		if (map->dm_segs[seg].ds_addr == 0 ||
 +		    map->dm_segs[seg].ds_len == 0) {
 +			printf("seg %d null segment dvmaddr %lx len %lx for "
 +			    "range %lx len %lx\n",
 +			    seg,
 +			    map->dm_segs[seg].ds_addr,
 +			    map->dm_segs[seg].ds_len,
 +			    map->_dm_dvmastart, map->_dm_dvmasize);
 +			err = 1;
 +		} else if (map->dm_segs[seg].ds_addr < map->_dm_dvmastart ||
 +		    round_page(map->dm_segs[seg].ds_addr +
 +			map->dm_segs[seg].ds_len) >
 +		    map->_dm_dvmastart + map->_dm_dvmasize) {
 +			printf("seg %d dvmaddr %lx len %lx out of "
 +			    "range %lx len %lx\n",
 +			    seg,
 +			    map->dm_segs[seg].ds_addr,
 +			    map->dm_segs[seg].ds_len,
 +			    map->_dm_dvmastart, map->_dm_dvmasize);
 +			err = 1;
 +		}
  	}
 +
 +	if (err) {
 +		iommu_dvmamap_print_map(t, is, map);
 +#if defined(DDB) && defined(DEBUG)
 +		if (iommudebug & IDB_BREAK)
 +			Debugger();
  #endif
 -	return (0);
 +	}
 +
 +	return (err);
  }

  void
 -iommu_dvmamap_sync(t, sb, map, offset, len, ops)
 -	bus_dma_tag_t t;
 -	struct strbuf_ctl *sb;
 -	bus_dmamap_t map;
 -	bus_addr_t offset;
 -	bus_size_t len;
 -	int ops;
 +iommu_dvmamap_print_map(bus_dma_tag_t t, struct iommu_state *is,
 +    bus_dmamap_t map)
  {
 -	struct iommu_state *is = sb->sb_is;
 -	vaddr_t va = map->dm_segs[0].ds_addr + offset;
 -	int64_t tte;
 +	int seg, i;
 +	long full_len, source_len;
 +	struct mbuf *m;
 +
 +	printf("DVMA %x for %x, mapping %p: dvstart %lx dvsize %lx "
 +	    "size %ld/%lx maxsegsz %lx boundary %lx segcnt %d "
 +	    "flags %x type %d source %p "
 +	    "cookie %p mapsize %lx nsegs %d\n",
 +	    is ? is->is_dvmabase : 0, is ? is->is_dvmaend : 0, map,
 +	    map->_dm_dvmastart, map->_dm_dvmasize,
 +	    map->_dm_size, map->_dm_size, map->_dm_maxsegsz, map->_dm_boundary,
 +	    map->_dm_segcnt, map->_dm_flags, map->_dm_type,
 +	    map->_dm_source, map->_dm_cookie, map->dm_mapsize,
 +	    map->dm_nsegs);

 -	/*
 -	 * We only support one DMA segment; supporting more makes this code
 -         * too unweildy.
 -	 */
 +	full_len = 0;
 +	for (seg = 0; seg < map->dm_nsegs; seg++) {
 +		printf("seg %d dvmaddr %lx pa %lx len %lx (tte %lx)\n",
 +		    seg, map->dm_segs[seg].ds_addr,
 +		    is ? iommu_extract(is, map->dm_segs[seg].ds_addr) : 0,
 +		    map->dm_segs[seg].ds_len,
 +		    is ? iommu_lookup_tte(is, map->dm_segs[seg].ds_addr) : 0);
 +		full_len += map->dm_segs[seg].ds_len;
 +	}
 +	printf("total length = %ld/0x%lx\n", full_len, full_len);
 +
 +	if (map->_dm_source) switch (map->_dm_type) {
 +	case _DM_TYPE_MBUF:
 +		m = map->_dm_source;
 +		if (m->m_flags & M_PKTHDR)
 +			printf("source PKTHDR mbuf (%p) hdr len = %d/0x%x:\n",
 +			    m, m->m_pkthdr.len, m->m_pkthdr.len);
 +		else
 +			printf("source mbuf (%p):\n", m);

 -	if (ops & BUS_DMASYNC_PREREAD) {
 -		DPRINTF(IDB_SYNC,
 -		    ("iommu_dvmamap_sync: syncing va %p len %lu "
 -		     "BUS_DMASYNC_PREREAD\n", (void *)(u_long)va, (u_long)len));
 +		source_len = 0;
 +		for ( ; m; m = m->m_next) {
 +			vaddr_t vaddr = mtod(m, vaddr_t);
 +			long len = m->m_len;
 +			paddr_t pa;
 +
 +			if (pmap_extract(pmap_kernel(), vaddr, &pa))
 +				printf("kva %lx pa %lx len %ld/0x%lx\n",
 +				    vaddr, pa, len, len);
 +			else
 +				printf("kva %lx pa <invalid> len %ld/0x%lx\n",
 +				    vaddr, len, len);
 +
 +			source_len += len;
 +		}
 +
 +		if (full_len != source_len)
 +			printf("mbuf length %ld/0x%lx is %s than mapping "
 +			    "length %ld/0x%lx\n", source_len, source_len,
 +			    (source_len > full_len) ? "greater" : "less",
 +			    full_len, full_len);
 +		else
 +			printf("mbuf length %ld/0x%lx\n", source_len,
 +			    source_len);
 +		break;
 +	case _DM_TYPE_LOAD:
 +	case _DM_TYPE_SEGS:
 +	case _DM_TYPE_UIO:
 +	default:
 +		break;
 +	}
 +
 +	if (map->_dm_cookie) {
 +		struct iommu_map_state *ims = map->_dm_cookie;
 +		struct iommu_page_map *ipm = &ims->ims_map;
 +
 +		printf("page map (%p) of size %d with %d entries\n",
 +		    ipm, ipm->ipm_maxpage, ipm->ipm_pagecnt);
 +		for (i = 0; i < ipm->ipm_pagecnt; ++i) {
 +			struct iommu_page_entry *e = &ipm->ipm_map[i];
 +			printf("%d: vmaddr 0x%lx pa 0x%lx\n", i,
 +			    e->ipe_va, e->ipe_pa);
 +		}
 +	} else
 +		printf("iommu map state (cookie) is NULL\n");
 +}
 +
 +void
 +iommu_dvmamap_sync(bus_dma_tag_t t, struct iommu_state *is, bus_dmamap_t map,
 +	bus_addr_t offset, bus_size_t len, int ops)
 +{
 +	struct iommu_map_state *ims = map->_dm_cookie;
 +	struct strbuf_ctl *sb;
 +	bus_size_t count;
 +	int i, needsflush = 0;

 -		/* Nothing to do */;
 -	}
 -	if (ops & BUS_DMASYNC_POSTREAD) {
 -		DPRINTF(IDB_SYNC,
 -		    ("iommu_dvmamap_sync: syncing va %p len %lu "
 -		     "BUS_DMASYNC_POSTREAD\n", (void *)(u_long)va, (u_long)len));
  #ifdef DIAGNOSTIC
 -		if (va < is->is_dvmabase || va >= is->is_dvmaend)
 -			panic("iommu_dvmamap_sync: invalid dva %lx", va);
 +	if (ims == NULL)
 +		panic("iommu_dvmamap_sync: null map state");
  #endif
 -		tte = is->is_tsb[IOTSBSLOT(va, is->is_tsbsize)];
 +	sb = ims->ims_sb;

 -		DPRINTF(IDB_SYNC,
 -		    ("iommu_dvmamap_sync: syncing va %p len %lu "
 -		     "BUS_DMASYNC_PREWRITE\n", (void *)(u_long)va, (u_long)len));
 -
 -		/* if we have a streaming buffer, flush it here first */
 -		if ((tte & IOTTE_STREAM) && sb->sb_flush)
 -			while (len > 0) {
 -				DPRINTF(IDB_BUSDMA,
 -				    ("iommu_dvmamap_sync: flushing va %p, %lu "
 -				     "bytes left\n", (void *)(u_long)va, 
 -					    (u_long)len));
 -				iommu_strbuf_flush(sb, va);
 -				if (len <= NBPG) {
 -					iommu_strbuf_flush_done(sb);
 -					len = 0;
 -				} else
 -					len -= NBPG;
 -				va += NBPG;
 -			}
 +	if ((ims->ims_flags & IOMMU_MAP_STREAM) == 0 || (len == 0))
 +		return;
 +
 +	if (ops & (BUS_DMASYNC_PREREAD | BUS_DMASYNC_POSTWRITE))
 +		return;
 +
 +	if ((ops & (BUS_DMASYNC_POSTREAD | BUS_DMASYNC_PREWRITE)) == 0)
 +		return;
 +
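 +	/* Locate the segment containing the starting offset. */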
 +	for (i = 0; i < map->dm_nsegs; i++) {
 +		if (offset < map->dm_segs[i].ds_len)
 +			break;
 +		offset -= map->dm_segs[i].ds_len;
 +	}
 +
 +	if (i == map->dm_nsegs)
 +		panic("iommu_dvmamap_sync: too short %lu", offset);
 +
 +	for (; len > 0 && i < map->dm_nsegs; i++) {
 +		count = min(map->dm_segs[i].ds_len - offset, len);
 +		if (count > 0 && iommu_dvmamap_sync_range(sb,
 +		    map->dm_segs[i].ds_addr + offset, count))
 +			needsflush = 1;
 +		/* Only the first segment is synced from an offset. */
 +		offset = 0;
 +		len -= count;
  	}
 -	if (ops & BUS_DMASYNC_PREWRITE) {
 +
 +	if (i == map->dm_nsegs && len > 0)
 +		panic("iommu_dvmamap_sync: leftover %lu", len);
 +
 +	if (needsflush)
 +		iommu_strbuf_flush_done(ims);
 +}
 +
 +/*
 + * Flush an individual dma segment, returns non-zero if the streaming buffers
 + * need flushing afterwards.
 + */
 +int
 +iommu_dvmamap_sync_range(struct strbuf_ctl *sb, vaddr_t va, bus_size_t len)
 +{
 +	vaddr_t vaend;
  #ifdef DIAGNOSTIC
 -		if (va < is->is_dvmabase || va >= is->is_dvmaend)
 -			panic("iommu_dvmamap_sync: invalid dva %lx", va);
 -#endif
 -		tte = is->is_tsb[IOTSBSLOT(va, is->is_tsbsize)];
 +	struct iommu_state *is = sb->sb_iommu;

 -		DPRINTF(IDB_SYNC,
 -		    ("iommu_dvmamap_sync: syncing va %p len %lu "
 -		     "BUS_DMASYNC_PREWRITE\n", (void *)(u_long)va, (u_long)len));
 -
 -		/* if we have a streaming buffer, flush it here first */
 -		if ((tte & IOTTE_STREAM) && sb->sb_flush)
 -			while (len > 0) {
 -				DPRINTF(IDB_BUSDMA,
 -				    ("iommu_dvmamap_sync: flushing va %p, %lu "
 -				     "bytes left\n", (void *)(u_long)va, 
 -					    (u_long)len));
 -				iommu_strbuf_flush(sb, va);
 -				if (len <= NBPG) {
 -					iommu_strbuf_flush_done(sb);
 -					len = 0;
 -				} else
 -					len -= NBPG;
 -				va += NBPG;
 -			}
 +	if (va < is->is_dvmabase || va > is->is_dvmaend)
 +		panic("invalid va: %llx", (long long)va);
 +
 +	if ((is->is_tsb[IOTSBSLOT(va, is->is_tsbsize)] & IOTTE_STREAM) == 0) {
 +		printf("iommu_dvmamap_sync_range: attempting to flush "
 +		    "non-streaming entry\n");
 +		return (0);
  	}
 -	if (ops & BUS_DMASYNC_POSTWRITE) {
 -		DPRINTF(IDB_SYNC,
 -		    ("iommu_dvmamap_sync: syncing va %p len %lu "
 -		     "BUS_DMASYNC_POSTWRITE\n", (void *)(u_long)va, (u_long)len));
 -		/* Nothing to do */;
 +#endif
 +
 +	vaend = (va + len + PAGE_MASK) & ~PAGE_MASK;
 +	va &= ~PAGE_MASK;
 +
 +#ifdef DIAGNOSTIC
 +	if (va < is->is_dvmabase || vaend > is->is_dvmaend)
 +		panic("invalid va range: %llx to %llx (%x to %x)",
 +		    (long long)va, (long long)vaend,
 +		    is->is_dvmabase,
 +		    is->is_dvmaend);
 +#endif
 +
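 +	/* Flush the STC one page at a time over the whole range. */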
 +	for ( ; va <= vaend; va += PAGE_SIZE) {
 +		DPRINTF(IDB_BUSDMA,
 +		    ("iommu_dvmamap_sync_range: flushing va %p\n",
 +		    (void *)(u_long)va));
 +		iommu_strbuf_flush(sb, va);
  	}
 +
 +	return (1);
  }

  int
 -iommu_dvmamem_alloc(t, sb, size, alignment, boundary, segs, nsegs, rsegs, flags)
 -	bus_dma_tag_t t;
 -	struct strbuf_ctl *sb;
 -	bus_size_t size, alignment, boundary;
 -	bus_dma_segment_t *segs;
 -	int nsegs;
 -	int *rsegs;
 -	int flags;
 +iommu_dvmamem_alloc(bus_dma_tag_t t, struct iommu_state *is, bus_size_t size,
 +    bus_size_t alignment, bus_size_t boundary, bus_dma_segment_t *segs,
 +    int nsegs, int *rsegs, int flags)
  {

 -	DPRINTF(IDB_BUSDMA, ("iommu_dvmamem_alloc: sz %llx align %llx bound %llx "
 -	   "segp %p flags %d\n", (unsigned long long)size,
 -	   (unsigned long long)alignment, (unsigned long long)boundary,
 -	   segs, flags));
 +	DPRINTF(IDB_BUSDMA, ("iommu_dvmamem_alloc: sz %llx align %llx "
 +	    "bound %llx segp %p flags %d\n", (unsigned long long)size,
 +	    (unsigned long long)alignment, (unsigned long long)boundary,
 +	    segs, flags));
  	return (bus_dmamem_alloc(t->_parent, size, alignment, boundary,
 -	    segs, nsegs, rsegs, flags|BUS_DMA_DVMA));
 +	    segs, nsegs, rsegs, flags | BUS_DMA_DVMA));
  }

  void
 -iommu_dvmamem_free(t, sb, segs, nsegs)
 -	bus_dma_tag_t t;
 -	struct strbuf_ctl *sb;
 -	bus_dma_segment_t *segs;
 -	int nsegs;
 +iommu_dvmamem_free(bus_dma_tag_t t, struct iommu_state *is,
 +    bus_dma_segment_t *segs, int nsegs)
  {

  	DPRINTF(IDB_BUSDMA, ("iommu_dvmamem_free: segp %p nsegs %d\n",
 @@ -1017,20 +1542,14 @@
   * Check the flags to see whether we're streaming or coherent.
   */
  int
 -iommu_dvmamem_map(t, sb, segs, nsegs, size, kvap, flags)
 -	bus_dma_tag_t t;
 -	struct strbuf_ctl *sb;
 -	bus_dma_segment_t *segs;
 -	int nsegs;
 -	size_t size;
 -	caddr_t *kvap;
 -	int flags;
 +iommu_dvmamem_map(bus_dma_tag_t t, struct iommu_state *is,
 +    bus_dma_segment_t *segs, int nsegs, size_t size, caddr_t *kvap, int flags)
  {
  	struct vm_page *m;
  	vaddr_t va;
  	bus_addr_t addr;
  	struct pglist *mlist;
 -	int cbit;
 +	bus_addr_t cbit = 0;

  	DPRINTF(IDB_BUSDMA, ("iommu_dvmamem_map: segp %p nsegs %d size %lx\n",
  	    segs, nsegs, size));
 @@ -1049,9 +1568,10 @@
  	/* 
  	 * digest flags:
  	 */
 -	cbit = 0;
 +#if 0
  	if (flags & BUS_DMA_COHERENT)	/* Disable vcache */
  		cbit |= PMAP_NVC;
 +#endif
  	if (flags & BUS_DMA_NOCACHE)	/* sideffects */
  		cbit |= PMAP_NC;

 @@ -1066,7 +1586,8 @@
  #endif
  		addr = VM_PAGE_TO_PHYS(m);
  		DPRINTF(IDB_BUSDMA, ("iommu_dvmamem_map: "
 -		    "mapping va %lx at %llx\n", va, (unsigned long long)addr | cbit));
 +		    "mapping va %lx at %llx\n", va,
 +		    (unsigned long long)addr | cbit));
  		pmap_enter(pmap_kernel(), va, addr | cbit,
  		    VM_PROT_READ | VM_PROT_WRITE,
  		    VM_PROT_READ | VM_PROT_WRITE | PMAP_WIRED);
 @@ -1082,29 +1603,197 @@
   * Unmap DVMA mappings from kernel
   */
  void
 -iommu_dvmamem_unmap(t, sb, kva, size)
 -	bus_dma_tag_t t;
 -	struct strbuf_ctl *sb;
 -	caddr_t kva;
 -	size_t size;
 +iommu_dvmamem_unmap(bus_dma_tag_t t, struct iommu_state *is, caddr_t kva,
 +    size_t size)
  {

  	DPRINTF(IDB_BUSDMA, ("iommu_dvmamem_unmap: kvm %p size %lx\n",
  	    kva, size));

  #ifdef DIAGNOSTIC
 -	if ((u_long)kva & PGOFSET)
 +	if ((u_long)kva & PAGE_MASK)
  		panic("iommu_dvmamem_unmap");
  #endif

  	size = round_page(size);
  	pmap_remove(pmap_kernel(), (vaddr_t)kva, size);
  	pmap_update(pmap_kernel());
 -#if 0
 -	/*
 -	 * XXX ? is this necessary? i think so and i think other
 -	 * implementations are missing it.
 -	 */
  	uvm_km_free(kernel_map, (vaddr_t)kva, size);
 +}
 +
 +/*
 + * Create a new iomap.
 + */
 +struct iommu_map_state *
 +iommu_iomap_create(int n)
 +{
 +	struct iommu_map_state *ims;
 +	struct strbuf_flush *sbf;
 +	vaddr_t va;
 +
 +	if (n < 64)
 +		n = 64;
 +
 +	ims = malloc(sizeof(*ims) + (n - 1) * sizeof(ims->ims_map.ipm_map[0]),
 +		M_DEVBUF, M_NOWAIT);
 +	if (ims == NULL)
 +		return (NULL);
 +
 +	memset(ims, 0, sizeof *ims);
 +
 +	/* Initialize the map. */
 +	ims->ims_map.ipm_maxpage = n;
 +	SPLAY_INIT(&ims->ims_map.ipm_tree);
 +
 +	/* Initialize the flush area. */
 +	sbf = &ims->ims_flush;
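 +	/* Carve a 64-byte aligned flush buffer out of the 128-byte sbf_area. */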
 +	va = (vaddr_t)&sbf->sbf_area[0x40];
 +	va &= ~0x3f;
 +	pmap_extract(pmap_kernel(), va, &sbf->sbf_flushpa);
 +	sbf->sbf_flush = (void *)va;
 +
 +	return (ims);
 +}
 +
 +/*
 + * Destroy an iomap.
 + */
 +void
 +iommu_iomap_destroy(struct iommu_map_state *ims)
 +{
 +#ifdef DIAGNOSTIC
 +	if (ims->ims_map.ipm_pagecnt > 0)
 +		printf("iommu_iomap_destroy: %d page entries in use\n",
 +		    ims->ims_map.ipm_pagecnt);
  #endif
 +
 +	free(ims, M_DEVBUF);
 +}
 +
 +/*
 + * Utility function used by splay tree to order page entries by pa.
 + */
 +static inline int
 +iomap_compare(struct iommu_page_entry *a, struct iommu_page_entry *b)
 +{
 +	return ((a->ipe_pa > b->ipe_pa) ? 1 :
 +		(a->ipe_pa < b->ipe_pa) ? -1 : 0);
 +}
 +
 +SPLAY_PROTOTYPE(iommu_page_tree, iommu_page_entry, ipe_node, iomap_compare);
 +
 +SPLAY_GENERATE(iommu_page_tree, iommu_page_entry, ipe_node, iomap_compare);
 +
 +/*
 + * Insert a pa entry in the iomap.
 + */
 +int
 +iommu_iomap_insert_page(struct iommu_map_state *ims, paddr_t pa)
 +{
 +	struct iommu_page_map *ipm = &ims->ims_map;
 +	struct iommu_page_entry *e;
 +
 +	if (ipm->ipm_pagecnt >= ipm->ipm_maxpage) {
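 +		/* Map is full: succeed only if pa is already present. */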
 +		struct iommu_page_entry ipe;
 +
 +		ipe.ipe_pa = pa;
 +		if (SPLAY_FIND(iommu_page_tree, &ipm->ipm_tree, &ipe))
 +			return (0);
 +
 +		return (ENOMEM);
 +	}
 +
 +	e = &ipm->ipm_map[ipm->ipm_pagecnt];
 +
 +	e->ipe_pa = pa;
 +	e->ipe_va = 0;
 +
 +	e = SPLAY_INSERT(iommu_page_tree, &ipm->ipm_tree, e);
 +
 +	/* Duplicates are okay, but only count them once. */
 +	if (e)
 +		return (0);
 +
 +	++ipm->ipm_pagecnt;
 +
 +	return (0);
 +}
 +
 +/*
 + * Locate the iomap by filling in the pa->va mapping and inserting it
 + * into the IOMMU tables.
 + */
 +int
 +iommu_iomap_load_map(struct iommu_state *is, struct iommu_map_state *ims,
 +    vaddr_t vmaddr, int flags)
 +{
 +	struct iommu_page_map *ipm = &ims->ims_map;
 +	struct iommu_page_entry *e;
 +	struct strbuf_ctl *sb = ims->ims_sb;
 +	int i;
 +
 +	if (sb->sb_flush == NULL)
 +		flags &= ~BUS_DMA_STREAMING;
 +
 +	if (flags & BUS_DMA_STREAMING)
 +		ims->ims_flags |= IOMMU_MAP_STREAM;
 +	else
 +		ims->ims_flags &= ~IOMMU_MAP_STREAM;
 +
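 +	/* Assign consecutive DVMA pages and enter each into the IOMMU. */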
 +	for (i = 0, e = ipm->ipm_map; i < ipm->ipm_pagecnt; ++i, ++e) {
 +		e->ipe_va = vmaddr;
 +		iommu_enter(is, sb, e->ipe_va, e->ipe_pa, flags);
 +		vmaddr += PAGE_SIZE;
 +	}
 +
 +	return (0);
 +}
 +
 +/*
 + * Remove the iomap from the IOMMU.
 + */
 +int
 +iommu_iomap_unload_map(struct iommu_state *is, struct iommu_map_state *ims)
 +{
 +	struct iommu_page_map *ipm = &ims->ims_map;
 +	struct iommu_page_entry *e;
 +	struct strbuf_ctl *sb = ims->ims_sb;
 +	int i;
 +
 +	for (i = 0, e = ipm->ipm_map; i < ipm->ipm_pagecnt; ++i, ++e)
 +		iommu_remove(is, sb, e->ipe_va);
 +
 +	return (0);
  }
 +
 +/*
 + * Translate a physical address (pa) into a DVMA address.
 + */
 +vaddr_t
 +iommu_iomap_translate(struct iommu_map_state *ims, paddr_t pa)
 +{
 +	struct iommu_page_map *ipm = &ims->ims_map;
 +	struct iommu_page_entry *e;
 +	struct iommu_page_entry pe;
 +	paddr_t offset = pa & PAGE_MASK;
 +
 +	pe.ipe_pa = trunc_page(pa);
 +
 +	e = SPLAY_FIND(iommu_page_tree, &ipm->ipm_tree, &pe);
 +
 +	if (e == NULL)
 +		return (0);
 +
 +	return (e->ipe_va | offset);
 +}
 +
 +/*
 + * Clear the iomap table and tree.
 + */
 +void
 +iommu_iomap_clear_pages(struct iommu_map_state *ims)
 +{
 +	ims->ims_map.ipm_pagecnt = 0;
 +	SPLAY_INIT(&ims->ims_map.ipm_tree);
 +}
 +
 Index: dev/iommureg.h
 ===================================================================
 RCS file: /cvsroot/src/sys/arch/sparc64/dev/iommureg.h,v
 retrieving revision 1.8
 diff -u -r1.8 iommureg.h
 --- dev/iommureg.h	20 Mar 2002 18:54:47 -0000	1.8
 +++ dev/iommureg.h	10 Jun 2003 15:32:21 -0000
 @@ -95,9 +95,10 @@
  #define IOTTE_8K	0x0000000000000000LL
  #define IOTTE_STREAM	0x1000000000000000LL	/* Is page streamable? */
  #define	IOTTE_LOCAL	0x0800000000000000LL	/* Accesses to same bus segment? */
 -#define IOTTE_PAMASK	0x000001ffffffe000LL	/* Let's assume this is correct */
 +#define IOTTE_PAMASK	0x000007ffffffe000LL	/* Let's assume this is correct */
  #define IOTTE_C		0x0000000000000010LL	/* Accesses to cacheable space */
  #define IOTTE_W		0x0000000000000002LL	/* Writeable */
 +#define IOTTE_SOFTWARE 0x0000000000001F80LL	/* For software use (bits 12..7) */

  /*
   * On sun4u each bus controller has a separate IOMMU.  The IOMMU has 
 Index: dev/iommuvar.h
 ===================================================================
 RCS file: /cvsroot/src/sys/arch/sparc64/dev/iommuvar.h,v
 retrieving revision 1.11.6.1
 diff -u -r1.11.6.1 iommuvar.h
 --- dev/iommuvar.h	21 Jun 2002 06:25:57 -0000	1.11.6.1
 +++ dev/iommuvar.h	10 Jun 2003 15:32:21 -0000
 @@ -31,6 +31,8 @@
  #ifndef _SPARC64_DEV_IOMMUVAR_H_
  #define _SPARC64_DEV_IOMMUVAR_H_

 +#include <sys/tree.h>
 +
  /*
   * Streaming buffer control
   *
 @@ -40,13 +42,52 @@
   * of data.
   */
  struct strbuf_ctl {
 -	struct iommu_state	*sb_is;		/* Pointer to our iommu */
 +	bus_space_tag_t	sb_bustag;
  	bus_space_handle_t	sb_sb;		/* Handle for our regs */
 +	struct iommu_state *sb_is;
 +	struct iommu_state *sb_iommu;
  	paddr_t			sb_flushpa;	/* to flush streaming buffers */
  	volatile int64_t	*sb_flush;
  };

  /*
 + * per-map STC flush area
 + */
 +struct strbuf_flush {
 +	char	sbf_area[0x80];		/* Holds 64-byte long/aligned buffer */
 +	void	*sbf_flush;		/* Kernel virtual address of buffer */
 +	paddr_t	sbf_flushpa;		/* Physical address of buffer area */
 +};
 +
 +/* 
 + * per-map DVMA page table
 + */
 +struct iommu_page_entry {
 +	SPLAY_ENTRY(iommu_page_entry) ipe_node;
 +	paddr_t	ipe_pa;
 +	vaddr_t	ipe_va;
 +};
 +struct iommu_page_map {
 +	SPLAY_HEAD(iommu_page_tree, iommu_page_entry) ipm_tree;
 +	int ipm_maxpage;	/* Size of allocated page map */
 +	int ipm_pagecnt;	/* Number of entries in use */
 +	struct iommu_page_entry	ipm_map[1];
 +};
 +
 +/*
 + * per-map IOMMU state
 + *
 + * This is what bus_dmamap_t's _dm_cookie should point to.
 + */
 +struct iommu_map_state {
 +	struct strbuf_flush ims_flush;	/* flush should be first (alignment) */
 +	struct strbuf_ctl *ims_sb;	/* Link to parent */
 +	int ims_flags;
 +	struct iommu_page_map ims_map;	/* map must be last (array at end) */
 +};
 +#define IOMMU_MAP_STREAM	1
 +
 +/*
   * per-IOMMU state
   */
  struct iommu_state {
 @@ -68,26 +109,40 @@
  /* interfaces for PCI/SBUS code */
  void	iommu_init __P((char *, struct iommu_state *, int, u_int32_t));
  void	iommu_reset __P((struct iommu_state *));
 -void    iommu_enter __P((struct strbuf_ctl *, vaddr_t, int64_t, int));
 -void    iommu_remove __P((struct iommu_state *, vaddr_t, size_t));
  paddr_t iommu_extract __P((struct iommu_state *, vaddr_t));
 -
 -int	iommu_dvmamap_load __P((bus_dma_tag_t, struct strbuf_ctl *,
 +int64_t iommu_lookup_tte(struct iommu_state *, vaddr_t);
 +int64_t iommu_fetch_tte(struct iommu_state *, paddr_t);
 +int	iommu_dvmamap_create(bus_dma_tag_t, struct iommu_state *,
 +	    struct strbuf_ctl *, bus_size_t, int, bus_size_t, bus_size_t,
 +	    int, bus_dmamap_t *);
 +void	iommu_dvmamap_destroy(bus_dma_tag_t, bus_dmamap_t);
 +int	iommu_dvmamap_load __P((bus_dma_tag_t, struct iommu_state *,
  	    bus_dmamap_t, void *, bus_size_t, struct proc *, int));
 -void	iommu_dvmamap_unload __P((bus_dma_tag_t, struct strbuf_ctl *,
 +void	iommu_dvmamap_unload __P((bus_dma_tag_t, struct iommu_state *,
  	    bus_dmamap_t));
 -int	iommu_dvmamap_load_raw __P((bus_dma_tag_t, struct strbuf_ctl *,
 +int	iommu_dvmamap_load_raw __P((bus_dma_tag_t, struct iommu_state *,
  	    bus_dmamap_t, bus_dma_segment_t *, int, int, bus_size_t));
 -void	iommu_dvmamap_sync __P((bus_dma_tag_t, struct strbuf_ctl *,
 +void	iommu_dvmamap_sync __P((bus_dma_tag_t, struct iommu_state *,
  	    bus_dmamap_t, bus_addr_t, bus_size_t, int));
 -int	iommu_dvmamem_alloc __P((bus_dma_tag_t, struct strbuf_ctl *,
 +int	iommu_dvmamem_alloc __P((bus_dma_tag_t, struct iommu_state *,
  	    bus_size_t, bus_size_t, bus_size_t, bus_dma_segment_t *,
  	    int, int *, int));
 -void	iommu_dvmamem_free __P((bus_dma_tag_t, struct strbuf_ctl *,
 +void	iommu_dvmamem_free __P((bus_dma_tag_t, struct iommu_state *,
  	    bus_dma_segment_t *, int));
 -int	iommu_dvmamem_map __P((bus_dma_tag_t, struct strbuf_ctl *,
 +int	iommu_dvmamem_map __P((bus_dma_tag_t, struct iommu_state *,
  	    bus_dma_segment_t *, int, size_t, caddr_t *, int));
 -void	iommu_dvmamem_unmap __P((bus_dma_tag_t, struct strbuf_ctl *,
 +void	iommu_dvmamem_unmap __P((bus_dma_tag_t, struct iommu_state *,
  	    caddr_t, size_t));
 +
 +#define IOMMUREG_READ(is, reg)				\
 +	bus_space_read_8((is)->is_bustag,		\
 +		(is)->is_iommu,				\
 +		IOMMUREG(reg))	
 +
 +#define IOMMUREG_WRITE(is, reg, v)			\
 +	bus_space_write_8((is)->is_bustag,		\
 +		(is)->is_iommu,				\
 +		IOMMUREG(reg),				\
 +		(v))
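 +
 +/*
 + * Illustrative use (a sketch only; "cr" is a hypothetical local variable):
 + *
 + *	u_int64_t cr = IOMMUREG_READ(is, iommu_cr);
 + *	IOMMUREG_WRITE(is, iommu_cr, cr);
 + */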

  #endif /* _SPARC64_DEV_IOMMUVAR_H_ */
 Index: dev/psycho.c
 ===================================================================
 RCS file: /cvsroot/src/sys/arch/sparc64/dev/psycho.c,v
 retrieving revision 1.48.2.3
 diff -u -r1.48.2.3 psycho.c
 --- dev/psycho.c	21 Jun 2002 06:33:11 -0000	1.48.2.3
 +++ dev/psycho.c	10 Jun 2003 15:32:23 -0000
 @@ -101,6 +101,9 @@
  static void *psycho_intr_establish __P((bus_space_tag_t, int, int, int,
  				int (*) __P((void *)), void *));

 +static int psycho_dmamap_create(bus_dma_tag_t, bus_size_t, int, bus_size_t,
 +    bus_size_t, int, bus_dmamap_t *);
 +static void psycho_dmamap_destroy(bus_dma_tag_t, bus_dmamap_t);
  static int psycho_dmamap_load __P((bus_dma_tag_t, bus_dmamap_t, void *,
  				   bus_size_t, struct proc *, int));
  static void psycho_dmamap_unload __P((bus_dma_tag_t, bus_dmamap_t));
 @@ -864,9 +867,9 @@
  	bzero(dt, sizeof *dt);
  	dt->_cookie = pp;
  	dt->_parent = pdt;
 +	dt->_dmamap_create = psycho_dmamap_create;
 +	dt->_dmamap_destroy = psycho_dmamap_destroy;
  #define PCOPY(x)	dt->x = pdt->x
 -	PCOPY(_dmamap_create);
 -	PCOPY(_dmamap_destroy);
  	dt->_dmamap_load = psycho_dmamap_load;
  	PCOPY(_dmamap_load_mbuf);
  	PCOPY(_dmamap_load_uio);
 @@ -1133,6 +1136,24 @@
   * hooks into the iommu dvma calls.
   */
  int
 +psycho_dmamap_create(bus_dma_tag_t t, bus_size_t size,
 +    int nsegments, bus_size_t maxsegsz, bus_size_t boundary, int flags,
 +    bus_dmamap_t *dmamp)
 +{
 +	struct psycho_pbm *pp = t->_cookie;
 +	struct psycho_softc *sc = pp->pp_sc;
 +
 +	return (iommu_dvmamap_create(t, sc->sc_is, &pp->pp_sb, size,
 +	    nsegments, maxsegsz, boundary, flags, dmamp));
 +}
 +
 +void
 +psycho_dmamap_destroy(bus_dma_tag_t t, bus_dmamap_t map)
 +{
 +	iommu_dvmamap_destroy(t, map); 
 +}
 +
 +int
  psycho_dmamap_load(t, map, buf, buflen, p, flags)
  	bus_dma_tag_t t;
  	bus_dmamap_t map;
 @@ -1143,7 +1164,10 @@
  {
  	struct psycho_pbm *pp = (struct psycho_pbm *)t->_cookie;

 -	return (iommu_dvmamap_load(t, &pp->pp_sb, map, buf, buflen, p, flags));
 +	if (pp->pp_sb.sb_flush == NULL)
 +		flags &= ~BUS_DMA_STREAMING;
 +
 +	return (iommu_dvmamap_load(t, pp->pp_sb.sb_is, map, buf, buflen, p, flags));
  }

  void
 @@ -1153,7 +1177,7 @@
  {
  	struct psycho_pbm *pp = (struct psycho_pbm *)t->_cookie;

 -	iommu_dvmamap_unload(t, &pp->pp_sb, map);
 +	iommu_dvmamap_unload(t, pp->pp_sb.sb_is, map);
  }

  int
 @@ -1167,7 +1191,10 @@
  {
  	struct psycho_pbm *pp = (struct psycho_pbm *)t->_cookie;

 -	return (iommu_dvmamap_load_raw(t, &pp->pp_sb, map, segs, nsegs, flags, size));
 +	if (pp->pp_sb.sb_flush == NULL)
 +		flags &= ~BUS_DMA_STREAMING;
 +
 +	return (iommu_dvmamap_load_raw(t, pp->pp_sb.sb_is, map, segs, nsegs, flags, size));
  }

  void
 @@ -1183,11 +1210,11 @@
  	if (ops & (BUS_DMASYNC_PREREAD|BUS_DMASYNC_PREWRITE)) {
  		/* Flush the CPU then the IOMMU */
  		bus_dmamap_sync(t->_parent, map, offset, len, ops);
 -		iommu_dvmamap_sync(t, &pp->pp_sb, map, offset, len, ops);
 +		iommu_dvmamap_sync(t, pp->pp_sb.sb_is, map, offset, len, ops);
  	}
  	if (ops & (BUS_DMASYNC_POSTREAD|BUS_DMASYNC_POSTWRITE)) {
  		/* Flush the IOMMU then the CPU */
 -		iommu_dvmamap_sync(t, &pp->pp_sb, map, offset, len, ops);
 +		iommu_dvmamap_sync(t, pp->pp_sb.sb_is, map, offset, len, ops);
  		bus_dmamap_sync(t->_parent, map, offset, len, ops);
  	}

 @@ -1206,7 +1233,7 @@
  {
  	struct psycho_pbm *pp = (struct psycho_pbm *)t->_cookie;

 -	return (iommu_dvmamem_alloc(t, &pp->pp_sb, size, alignment, boundary,
 +	return (iommu_dvmamem_alloc(t, pp->pp_sb.sb_is, size, alignment, boundary,
  	    segs, nsegs, rsegs, flags));
  }

 @@ -1218,7 +1245,7 @@
  {
  	struct psycho_pbm *pp = (struct psycho_pbm *)t->_cookie;

 -	iommu_dvmamem_free(t, &pp->pp_sb, segs, nsegs);
 +	iommu_dvmamem_free(t, pp->pp_sb.sb_is, segs, nsegs);
  }

  int
 @@ -1232,7 +1259,7 @@
  {
  	struct psycho_pbm *pp = (struct psycho_pbm *)t->_cookie;

 -	return (iommu_dvmamem_map(t, &pp->pp_sb, segs, nsegs, size, kvap, flags));
 +	return (iommu_dvmamem_map(t, pp->pp_sb.sb_is, segs, nsegs, size, kvap, flags));
  }

  void
 @@ -1243,5 +1270,5 @@
  {
  	struct psycho_pbm *pp = (struct psycho_pbm *)t->_cookie;

 -	iommu_dvmamem_unmap(t, &pp->pp_sb, kva, size);
 +	iommu_dvmamem_unmap(t, pp->pp_sb.sb_is, kva, size);
  }
 Index: dev/sbus.c
 ===================================================================
 RCS file: /cvsroot/src/sys/arch/sparc64/dev/sbus.c,v
 retrieving revision 1.49.6.2
 diff -u -r1.49.6.2 sbus.c
 --- dev/sbus.c	22 Nov 2002 17:39:37 -0000	1.49.6.2
 +++ dev/sbus.c	10 Jun 2003 15:32:23 -0000
 @@ -762,7 +762,7 @@
  {
  	struct sbus_softc *sc = (struct sbus_softc *)tag->_cookie;

 -	return (iommu_dvmamap_load(tag, &sc->sc_sb, map, buf, buflen, p, flags));
 +	return (iommu_dvmamap_load(tag, sc->sc_sb.sb_is, map, buf, buflen, p, flags));
  }

  int
 @@ -776,7 +776,7 @@
  {
  	struct sbus_softc *sc = (struct sbus_softc *)tag->_cookie;

 -	return (iommu_dvmamap_load_raw(tag, &sc->sc_sb, map, segs, nsegs, flags, size));
 +	return (iommu_dvmamap_load_raw(tag, sc->sc_sb.sb_is, map, segs, nsegs, flags, size));
  }

  void
 @@ -786,7 +786,7 @@
  {
  	struct sbus_softc *sc = (struct sbus_softc *)tag->_cookie;

 -	iommu_dvmamap_unload(tag, &sc->sc_sb, map);
 +	iommu_dvmamap_unload(tag, sc->sc_sb.sb_is, map);
  }

  void
 @@ -802,11 +802,11 @@
  	if (ops & (BUS_DMASYNC_PREREAD|BUS_DMASYNC_PREWRITE)) {
  		/* Flush the CPU then the IOMMU */
  		bus_dmamap_sync(tag->_parent, map, offset, len, ops);
 -		iommu_dvmamap_sync(tag, &sc->sc_sb, map, offset, len, ops);
 +		iommu_dvmamap_sync(tag, sc->sc_sb.sb_is, map, offset, len, ops);
  	}
  	if (ops & (BUS_DMASYNC_POSTREAD|BUS_DMASYNC_POSTWRITE)) {
  		/* Flush the IOMMU then the CPU */
 -		iommu_dvmamap_sync(tag, &sc->sc_sb, map, offset, len, ops);
 +		iommu_dvmamap_sync(tag, sc->sc_sb.sb_is, map, offset, len, ops);
  		bus_dmamap_sync(tag->_parent, map, offset, len, ops);
  	}
  }
 @@ -824,7 +824,7 @@
  {
  	struct sbus_softc *sc = (struct sbus_softc *)tag->_cookie;

 -	return (iommu_dvmamem_alloc(tag, &sc->sc_sb, size, alignment, boundary,
 +	return (iommu_dvmamem_alloc(tag, sc->sc_sb.sb_is, size, alignment, boundary,
  	    segs, nsegs, rsegs, flags));
  }

 @@ -836,7 +836,7 @@
  {
  	struct sbus_softc *sc = (struct sbus_softc *)tag->_cookie;

 -	iommu_dvmamem_free(tag, &sc->sc_sb, segs, nsegs);
 +	iommu_dvmamem_free(tag, sc->sc_sb.sb_is, segs, nsegs);
  }

  int
 @@ -850,7 +850,7 @@
  {
  	struct sbus_softc *sc = (struct sbus_softc *)tag->_cookie;

 -	return (iommu_dvmamem_map(tag, &sc->sc_sb, segs, nsegs, size, kvap, flags));
 +	return (iommu_dvmamem_map(tag, sc->sc_sb.sb_is, segs, nsegs, size, kvap, flags));
  }

  void
 @@ -861,5 +861,5 @@
  {
  	struct sbus_softc *sc = (struct sbus_softc *)tag->_cookie;

 -	iommu_dvmamem_unmap(tag, &sc->sc_sb, kva, size);
 +	iommu_dvmamem_unmap(tag, sc->sc_sb.sb_is, kva, size);
  }
 Index: include/bus.h
 ===================================================================
 RCS file: /cvsroot/src/sys/arch/sparc64/include/bus.h,v
 retrieving revision 1.39
 diff -u -r1.39 bus.h
 --- include/bus.h	21 Mar 2002 00:43:42 -0000	1.39
 +++ include/bus.h	10 Jun 2003 15:32:25 -0000
 @@ -1396,6 +1396,7 @@

  #define	BUS_DMA_NOCACHE		BUS_DMA_BUS1
  #define	BUS_DMA_DVMA		BUS_DMA_BUS2	/* Don't bother with alignment */
 +#define BUS_DMA_24BIT		BUS_DMA_BUS3	/* 24bit device */

  /* Forwards needed by prototypes below. */
  struct mbuf;
 @@ -1499,6 +1500,10 @@
  #define	bus_dmamem_mmap(t, sg, n, o, p, f)			\
  	(*(t)->_dmamem_mmap)((t), (sg), (n), (o), (p), (f))

 +void	bus_space_render_tag(bus_space_tag_t, char *, size_t);
  /*
   *	bus_dmamap_t
   *
 @@ -1524,6 +1529,7 @@
  	void		*_dm_source;	/* source mbuf, uio, etc. needed for unload */

  	void		*_dm_cookie;	/* cookie for bus-specific functions */
 +	bus_size_t	_dm_sgsize;	/* size of extent */

  	/*
  	 * PUBLIC MEMBERS: these are used by machine-independent code.
 Index: sparc64/machdep.c
 ===================================================================
 RCS file: /cvsroot/src/sys/arch/sparc64/sparc64/machdep.c,v
 retrieving revision 1.119.6.4
 diff -u -r1.119.6.4 machdep.c
 --- sparc64/machdep.c	22 Nov 2002 17:14:55 -0000	1.119.6.4
 +++ sparc64/machdep.c	10 Jun 2003 15:32:26 -0000
 @@ -1297,6 +1319,12 @@
  	}
  #endif
  	return (bus_dmamap_load_raw(t, map, segs, i, (bus_size_t)len, flags));
 +}
 +
 +void
 +bus_space_render_tag(bus_space_tag_t t, char *buf, size_t len)
 +{
 +	snprintf(buf, len, "<NULL>");
  }

  /*

 --ELM1056275348-15420-0_
 Content-Transfer-Encoding: 7bit
 Content-Type: text/plain; charset=ISO-8859-2
 Content-Disposition: attachment; filename=tree.h
 Content-Description: 

 /*	$OpenBSD: tree.h,v 1.7 2002/10/17 21:51:54 art Exp $	*/
 /*
  * Copyright 2002 Niels Provos <provos@citi.umich.edu>
  * All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
  * are met:
  * 1. Redistributions of source code must retain the above copyright
  *    notice, this list of conditions and the following disclaimer.
  * 2. Redistributions in binary form must reproduce the above copyright
  *    notice, this list of conditions and the following disclaimer in the
  *    documentation and/or other materials provided with the distribution.
  *
  * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
  * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
  * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
  * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
  * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
  * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
  * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
  * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
  * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
  * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  */

 #ifndef	_SYS_TREE_H_
 #define	_SYS_TREE_H_

 /*
  * This file defines data structures for different types of trees:
  * splay trees and red-black trees.
  *
  * A splay tree is a self-organizing data structure.  Every operation
  * on the tree causes a splay to happen.  The splay moves the requested
  * node to the root of the tree and partly rebalances it.
  *
  * This has the benefit that request locality causes faster lookups as
  * the requested nodes move to the top of the tree.  On the other hand,
  * every lookup causes memory writes.
  *
  * The Balance Theorem bounds the total access time for m operations
  * and n inserts on an initially empty tree as O((m + n)lg n).  The
  * amortized cost for a sequence of m accesses to a splay tree is O(lg n).
  *
  * A red-black tree is a binary search tree with the node color as an
  * extra attribute.  It fulfills a set of conditions:
  *	- every search path from the root to a leaf consists of the
  *	  same number of black nodes,
  *	- each red node (except for the root) has a black parent,
  *	- each leaf node is black.
  *
  * Every operation on a red-black tree is bounded as O(lg n).
  * The maximum height of a red-black tree is 2lg (n+1).
  */
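
 /*
  * Illustrative splay-tree usage (a sketch only; "example_tree", "enode",
  * and "enode_cmp" are hypothetical names, not part of this header):
  *
  *	struct enode {
  *		SPLAY_ENTRY(enode) link;
  *		int key;
  *	};
  *	SPLAY_HEAD(example_tree, enode);
  *
  *	static int
  *	enode_cmp(struct enode *a, struct enode *b)
  *	{
  *		return (a->key < b->key ? -1 : a->key > b->key);
  *	}
  *
  *	SPLAY_PROTOTYPE(example_tree, enode, link, enode_cmp)
  *	SPLAY_GENERATE(example_tree, enode, link, enode_cmp)
  *
  *	struct example_tree head = SPLAY_INITIALIZER(&head);
  *	struct enode n, q, *res;
  *
  *	n.key = 42;
  *	SPLAY_INSERT(example_tree, &head, &n);
  *	q.key = 42;
  *	res = SPLAY_FIND(example_tree, &head, &q);
  */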

 #define SPLAY_HEAD(name, type)						\
 struct name {								\
 	struct type *sph_root; /* root of the tree */			\
 }

 #define SPLAY_INITIALIZER(root)						\
 	{ NULL }

 #define SPLAY_INIT(root) do {						\
 	(root)->sph_root = NULL;					\
 } while (0)

 #define SPLAY_ENTRY(type)						\
 struct {								\
 	struct type *spe_left; /* left element */			\
 	struct type *spe_right; /* right element */			\
 }

 #define SPLAY_LEFT(elm, field)		(elm)->field.spe_left
 #define SPLAY_RIGHT(elm, field)		(elm)->field.spe_right
 #define SPLAY_ROOT(head)		(head)->sph_root
 #define SPLAY_EMPTY(head)		(SPLAY_ROOT(head) == NULL)

 /* SPLAY_ROTATE_{LEFT,RIGHT} expect that tmp hold SPLAY_{RIGHT,LEFT} */
 #define SPLAY_ROTATE_RIGHT(head, tmp, field) do {			\
 	SPLAY_LEFT((head)->sph_root, field) = SPLAY_RIGHT(tmp, field);	\
 	SPLAY_RIGHT(tmp, field) = (head)->sph_root;			\
 	(head)->sph_root = tmp;						\
 } while (0)

 #define SPLAY_ROTATE_LEFT(head, tmp, field) do {			\
 	SPLAY_RIGHT((head)->sph_root, field) = SPLAY_LEFT(tmp, field);	\
 	SPLAY_LEFT(tmp, field) = (head)->sph_root;			\
 	(head)->sph_root = tmp;						\
 } while (0)

 #define SPLAY_LINKLEFT(head, tmp, field) do {				\
 	SPLAY_LEFT(tmp, field) = (head)->sph_root;			\
 	tmp = (head)->sph_root;						\
 	(head)->sph_root = SPLAY_LEFT((head)->sph_root, field);		\
 } while (0)

 #define SPLAY_LINKRIGHT(head, tmp, field) do {				\
 	SPLAY_RIGHT(tmp, field) = (head)->sph_root;			\
 	tmp = (head)->sph_root;						\
 	(head)->sph_root = SPLAY_RIGHT((head)->sph_root, field);	\
 } while (0)

 #define SPLAY_ASSEMBLE(head, node, left, right, field) do {		\
 	SPLAY_RIGHT(left, field) = SPLAY_LEFT((head)->sph_root, field);	\
 	SPLAY_LEFT(right, field) = SPLAY_RIGHT((head)->sph_root, field);\
 	SPLAY_LEFT((head)->sph_root, field) = SPLAY_RIGHT(node, field);	\
 	SPLAY_RIGHT((head)->sph_root, field) = SPLAY_LEFT(node, field);	\
 } while (0)

 /* Generates prototypes and inline functions */

 #define SPLAY_PROTOTYPE(name, type, field, cmp)				\
 void name##_SPLAY(struct name *, struct type *);			\
 void name##_SPLAY_MINMAX(struct name *, int);				\
 struct type *name##_SPLAY_INSERT(struct name *, struct type *);		\
 struct type *name##_SPLAY_REMOVE(struct name *, struct type *);		\
 									\
 /* Finds the node with the same key as elm */				\
 static __inline struct type *						\
 name##_SPLAY_FIND(struct name *head, struct type *elm)			\
 {									\
 	if (SPLAY_EMPTY(head))						\
 		return(NULL);						\
 	name##_SPLAY(head, elm);					\
 	if ((cmp)(elm, (head)->sph_root) == 0)				\
 		return (head->sph_root);				\
 	return (NULL);							\
 }									\
 									\
 static __inline struct type *						\
 name##_SPLAY_NEXT(struct name *head, struct type *elm)			\
 {									\
 	name##_SPLAY(head, elm);					\
 	if (SPLAY_RIGHT(elm, field) != NULL) {				\
 		elm = SPLAY_RIGHT(elm, field);				\
 		while (SPLAY_LEFT(elm, field) != NULL) {		\
 			elm = SPLAY_LEFT(elm, field);			\
 		}							\
 	} else								\
 		elm = NULL;						\
 	return (elm);							\
 }									\
 									\
 static __inline struct type *						\
 name##_SPLAY_MIN_MAX(struct name *head, int val)			\
 {									\
 	name##_SPLAY_MINMAX(head, val);					\
         return (SPLAY_ROOT(head));					\
 }

 /* Main splay operation.
  * Moves node close to the key of elm to top
  */
 #define SPLAY_GENERATE(name, type, field, cmp)				\
 struct type *								\
 name##_SPLAY_INSERT(struct name *head, struct type *elm)		\
 {									\
     if (SPLAY_EMPTY(head)) {						\
 	    SPLAY_LEFT(elm, field) = SPLAY_RIGHT(elm, field) = NULL;	\
     } else {								\
 	    int __comp;							\
 	    name##_SPLAY(head, elm);					\
 	    __comp = (cmp)(elm, (head)->sph_root);			\
 	    if(__comp < 0) {						\
 		    SPLAY_LEFT(elm, field) = SPLAY_LEFT((head)->sph_root, field);\
 		    SPLAY_RIGHT(elm, field) = (head)->sph_root;		\
 		    SPLAY_LEFT((head)->sph_root, field) = NULL;		\
 	    } else if (__comp > 0) {					\
 		    SPLAY_RIGHT(elm, field) = SPLAY_RIGHT((head)->sph_root, field);\
 		    SPLAY_LEFT(elm, field) = (head)->sph_root;		\
 		    SPLAY_RIGHT((head)->sph_root, field) = NULL;	\
 	    } else							\
 		    return ((head)->sph_root);				\
     }									\
     (head)->sph_root = (elm);						\
     return (NULL);							\
 }									\
 									\
 struct type *								\
 name##_SPLAY_REMOVE(struct name *head, struct type *elm)		\
 {									\
 	struct type *__tmp;						\
 	if (SPLAY_EMPTY(head))						\
 		return (NULL);						\
 	name##_SPLAY(head, elm);					\
 	if ((cmp)(elm, (head)->sph_root) == 0) {			\
 		if (SPLAY_LEFT((head)->sph_root, field) == NULL) {	\
 			(head)->sph_root = SPLAY_RIGHT((head)->sph_root, field);\
 		} else {						\
 			__tmp = SPLAY_RIGHT((head)->sph_root, field);	\
 			(head)->sph_root = SPLAY_LEFT((head)->sph_root, field);\
 			name##_SPLAY(head, elm);			\
 			SPLAY_RIGHT((head)->sph_root, field) = __tmp;	\
 		}							\
 		return (elm);						\
 	}								\
 	return (NULL);							\
 }									\
 									\
 void									\
 name##_SPLAY(struct name *head, struct type *elm)			\
 {									\
 	struct type __node, *__left, *__right, *__tmp;			\
 	int __comp;							\
 \
 	SPLAY_LEFT(&__node, field) = SPLAY_RIGHT(&__node, field) = NULL;\
 	__left = __right = &__node;					\
 \
 	while ((__comp = (cmp)(elm, (head)->sph_root))) {		\
 		if (__comp < 0) {					\
 			__tmp = SPLAY_LEFT((head)->sph_root, field);	\
 			if (__tmp == NULL)				\
 				break;					\
 			if ((cmp)(elm, __tmp) < 0){			\
 				SPLAY_ROTATE_RIGHT(head, __tmp, field);	\
 				if (SPLAY_LEFT((head)->sph_root, field) == NULL)\
 					break;				\
 			}						\
 			SPLAY_LINKLEFT(head, __right, field);		\
 		} else if (__comp > 0) {				\
 			__tmp = SPLAY_RIGHT((head)->sph_root, field);	\
 			if (__tmp == NULL)				\
 				break;					\
 			if ((cmp)(elm, __tmp) > 0){			\
 				SPLAY_ROTATE_LEFT(head, __tmp, field);	\
 				if (SPLAY_RIGHT((head)->sph_root, field) == NULL)\
 					break;				\
 			}						\
 			SPLAY_LINKRIGHT(head, __left, field);		\
 		}							\
 	}								\
 	SPLAY_ASSEMBLE(head, &__node, __left, __right, field);		\
 }									\
 									\
 /* Splay with either the minimum or the maximum element			\
  * Used to find minimum or maximum element in tree.			\
  */									\
 void name##_SPLAY_MINMAX(struct name *head, int __comp) \
 {									\
 	struct type __node, *__left, *__right, *__tmp;			\
 \
 	SPLAY_LEFT(&__node, field) = SPLAY_RIGHT(&__node, field) = NULL;\
 	__left = __right = &__node;					\
 \
 	while (1) {							\
 		if (__comp < 0) {					\
 			__tmp = SPLAY_LEFT((head)->sph_root, field);	\
 			if (__tmp == NULL)				\
 				break;					\
 			if (__comp < 0){				\
 				SPLAY_ROTATE_RIGHT(head, __tmp, field);	\
 				if (SPLAY_LEFT((head)->sph_root, field) == NULL)\
 					break;				\
 			}						\
 			SPLAY_LINKLEFT(head, __right, field);		\
 		} else if (__comp > 0) {				\
 			__tmp = SPLAY_RIGHT((head)->sph_root, field);	\
 			if (__tmp == NULL)				\
 				break;					\
 			if (__comp > 0) {				\
 				SPLAY_ROTATE_LEFT(head, __tmp, field);	\
 				if (SPLAY_RIGHT((head)->sph_root, field) == NULL)\
 					break;				\
 			}						\
 			SPLAY_LINKRIGHT(head, __left, field);		\
 		}							\
 	}								\
 	SPLAY_ASSEMBLE(head, &__node, __left, __right, field);		\
 }

 #define SPLAY_NEGINF	-1
 #define SPLAY_INF	1

 #define SPLAY_INSERT(name, x, y)	name##_SPLAY_INSERT(x, y)
 #define SPLAY_REMOVE(name, x, y)	name##_SPLAY_REMOVE(x, y)
 #define SPLAY_FIND(name, x, y)		name##_SPLAY_FIND(x, y)
 #define SPLAY_NEXT(name, x, y)		name##_SPLAY_NEXT(x, y)
 #define SPLAY_MIN(name, x)		(SPLAY_EMPTY(x) ? NULL	\
 					: name##_SPLAY_MIN_MAX(x, SPLAY_NEGINF))
 #define SPLAY_MAX(name, x)		(SPLAY_EMPTY(x) ? NULL	\
 					: name##_SPLAY_MIN_MAX(x, SPLAY_INF))

 #define SPLAY_FOREACH(x, name, head)					\
 	for ((x) = SPLAY_MIN(name, head);				\
 	     (x) != NULL;						\
 	     (x) = SPLAY_NEXT(name, head, x))

 /* Macros that define a red-black tree */
 #define RB_HEAD(name, type)						\
 struct name {								\
 	struct type *rbh_root; /* root of the tree */			\
 }

 #define RB_INITIALIZER(root)						\
 	{ NULL }

 #define RB_INIT(root) do {						\
 	(root)->rbh_root = NULL;					\
 } while (0)

 #define RB_BLACK	0
 #define RB_RED		1
 #define RB_ENTRY(type)							\
 struct {								\
 	struct type *rbe_left;		/* left element */		\
 	struct type *rbe_right;		/* right element */		\
 	struct type *rbe_parent;	/* parent element */		\
 	int rbe_color;			/* node color */		\
 }

 #define RB_LEFT(elm, field)		(elm)->field.rbe_left
 #define RB_RIGHT(elm, field)		(elm)->field.rbe_right
 #define RB_PARENT(elm, field)		(elm)->field.rbe_parent
 #define RB_COLOR(elm, field)		(elm)->field.rbe_color
 #define RB_ROOT(head)			(head)->rbh_root
 #define RB_EMPTY(head)			(RB_ROOT(head) == NULL)

 #define RB_SET(elm, parent, field) do {					\
 	RB_PARENT(elm, field) = parent;					\
 	RB_LEFT(elm, field) = RB_RIGHT(elm, field) = NULL;		\
 	RB_COLOR(elm, field) = RB_RED;					\
 } while (0)

 #define RB_SET_BLACKRED(black, red, field) do {				\
 	RB_COLOR(black, field) = RB_BLACK;				\
 	RB_COLOR(red, field) = RB_RED;					\
 } while (0)

 #ifndef RB_AUGMENT
 #define RB_AUGMENT(x)
 #endif

 #define RB_ROTATE_LEFT(head, elm, tmp, field) do {			\
 	(tmp) = RB_RIGHT(elm, field);					\
 	if ((RB_RIGHT(elm, field) = RB_LEFT(tmp, field))) {		\
 		RB_PARENT(RB_LEFT(tmp, field), field) = (elm);		\
 	}								\
 	RB_AUGMENT(elm);						\
 	if ((RB_PARENT(tmp, field) = RB_PARENT(elm, field))) {		\
 		if ((elm) == RB_LEFT(RB_PARENT(elm, field), field))	\
 			RB_LEFT(RB_PARENT(elm, field), field) = (tmp);	\
 		else							\
 			RB_RIGHT(RB_PARENT(elm, field), field) = (tmp);	\
 	} else								\
 		(head)->rbh_root = (tmp);				\
 	RB_LEFT(tmp, field) = (elm);					\
 	RB_PARENT(elm, field) = (tmp);					\
 	RB_AUGMENT(tmp);						\
 	if ((RB_PARENT(tmp, field)))					\
 		RB_AUGMENT(RB_PARENT(tmp, field));			\
 } while (0)

 #define RB_ROTATE_RIGHT(head, elm, tmp, field) do {			\
 	(tmp) = RB_LEFT(elm, field);					\
 	if ((RB_LEFT(elm, field) = RB_RIGHT(tmp, field))) {		\
 		RB_PARENT(RB_RIGHT(tmp, field), field) = (elm);		\
 	}								\
 	RB_AUGMENT(elm);						\
 	if ((RB_PARENT(tmp, field) = RB_PARENT(elm, field))) {		\
 		if ((elm) == RB_LEFT(RB_PARENT(elm, field), field))	\
 			RB_LEFT(RB_PARENT(elm, field), field) = (tmp);	\
 		else							\
 			RB_RIGHT(RB_PARENT(elm, field), field) = (tmp);	\
 	} else								\
 		(head)->rbh_root = (tmp);				\
 	RB_RIGHT(tmp, field) = (elm);					\
 	RB_PARENT(elm, field) = (tmp);					\
 	RB_AUGMENT(tmp);						\
 	if ((RB_PARENT(tmp, field)))					\
 		RB_AUGMENT(RB_PARENT(tmp, field));			\
 } while (0)

 /* Generates prototypes and inline functions */
 #define RB_PROTOTYPE(name, type, field, cmp)				\
 void name##_RB_INSERT_COLOR(struct name *, struct type *);	\
 void name##_RB_REMOVE_COLOR(struct name *, struct type *, struct type *);\
 struct type *name##_RB_REMOVE(struct name *, struct type *);		\
 struct type *name##_RB_INSERT(struct name *, struct type *);		\
 struct type *name##_RB_FIND(struct name *, struct type *);		\
 struct type *name##_RB_NEXT(struct name *, struct type *);		\
 struct type *name##_RB_MINMAX(struct name *, int);			\
 									\

 /* Main rb operation.
  * Moves node close to the key of elm to top
  */
 #define RB_GENERATE(name, type, field, cmp)				\
 void									\
 name##_RB_INSERT_COLOR(struct name *head, struct type *elm)		\
 {									\
 	struct type *parent, *gparent, *tmp;				\
 	while ((parent = RB_PARENT(elm, field)) &&			\
 	    RB_COLOR(parent, field) == RB_RED) {			\
 		gparent = RB_PARENT(parent, field);			\
 		if (parent == RB_LEFT(gparent, field)) {		\
 			tmp = RB_RIGHT(gparent, field);			\
 			if (tmp && RB_COLOR(tmp, field) == RB_RED) {	\
 				RB_COLOR(tmp, field) = RB_BLACK;	\
 				RB_SET_BLACKRED(parent, gparent, field);\
 				elm = gparent;				\
 				continue;				\
 			}						\
 			if (RB_RIGHT(parent, field) == elm) {		\
 				RB_ROTATE_LEFT(head, parent, tmp, field);\
 				tmp = parent;				\
 				parent = elm;				\
 				elm = tmp;				\
 			}						\
 			RB_SET_BLACKRED(parent, gparent, field);	\
 			RB_ROTATE_RIGHT(head, gparent, tmp, field);	\
 		} else {						\
 			tmp = RB_LEFT(gparent, field);			\
 			if (tmp && RB_COLOR(tmp, field) == RB_RED) {	\
 				RB_COLOR(tmp, field) = RB_BLACK;	\
 				RB_SET_BLACKRED(parent, gparent, field);\
 				elm = gparent;				\
 				continue;				\
 			}						\
 			if (RB_LEFT(parent, field) == elm) {		\
 				RB_ROTATE_RIGHT(head, parent, tmp, field);\
 				tmp = parent;				\
 				parent = elm;				\
 				elm = tmp;				\
 			}						\
 			RB_SET_BLACKRED(parent, gparent, field);	\
 			RB_ROTATE_LEFT(head, gparent, tmp, field);	\
 		}							\
 	}								\
 	RB_COLOR(head->rbh_root, field) = RB_BLACK;			\
 }									\
 									\
 void									\
 name##_RB_REMOVE_COLOR(struct name *head, struct type *parent, struct type *elm) \
 {									\
 	struct type *tmp;						\
 	while ((elm == NULL || RB_COLOR(elm, field) == RB_BLACK) &&	\
 	    elm != RB_ROOT(head)) {					\
 		if (RB_LEFT(parent, field) == elm) {			\
 			tmp = RB_RIGHT(parent, field);			\
 			if (RB_COLOR(tmp, field) == RB_RED) {		\
 				RB_SET_BLACKRED(tmp, parent, field);	\
 				RB_ROTATE_LEFT(head, parent, tmp, field);\
 				tmp = RB_RIGHT(parent, field);		\
 			}						\
 			if ((RB_LEFT(tmp, field) == NULL ||		\
 			    RB_COLOR(RB_LEFT(tmp, field), field) == RB_BLACK) &&\
 			    (RB_RIGHT(tmp, field) == NULL ||		\
 			    RB_COLOR(RB_RIGHT(tmp, field), field) == RB_BLACK)) {\
 				RB_COLOR(tmp, field) = RB_RED;		\
 				elm = parent;				\
 				parent = RB_PARENT(elm, field);		\
 			} else {					\
 				if (RB_RIGHT(tmp, field) == NULL ||	\
 				    RB_COLOR(RB_RIGHT(tmp, field), field) == RB_BLACK) {\
 					struct type *oleft;		\
 					if ((oleft = RB_LEFT(tmp, field)))\
 						RB_COLOR(oleft, field) = RB_BLACK;\
 					RB_COLOR(tmp, field) = RB_RED;	\
 					RB_ROTATE_RIGHT(head, tmp, oleft, field);\
 					tmp = RB_RIGHT(parent, field);	\
 				}					\
 				RB_COLOR(tmp, field) = RB_COLOR(parent, field);\
 				RB_COLOR(parent, field) = RB_BLACK;	\
 				if (RB_RIGHT(tmp, field))		\
 					RB_COLOR(RB_RIGHT(tmp, field), field) = RB_BLACK;\
 				RB_ROTATE_LEFT(head, parent, tmp, field);\
 				elm = RB_ROOT(head);			\
 				break;					\
 			}						\
 		} else {						\
 			tmp = RB_LEFT(parent, field);			\
 			if (RB_COLOR(tmp, field) == RB_RED) {		\
 				RB_SET_BLACKRED(tmp, parent, field);	\
 				RB_ROTATE_RIGHT(head, parent, tmp, field);\
 				tmp = RB_LEFT(parent, field);		\
 			}						\
 			if ((RB_LEFT(tmp, field) == NULL ||		\
 			    RB_COLOR(RB_LEFT(tmp, field), field) == RB_BLACK) &&\
 			    (RB_RIGHT(tmp, field) == NULL ||		\
 			    RB_COLOR(RB_RIGHT(tmp, field), field) == RB_BLACK)) {\
 				RB_COLOR(tmp, field) = RB_RED;		\
 				elm = parent;				\
 				parent = RB_PARENT(elm, field);		\
 			} else {					\
 				if (RB_LEFT(tmp, field) == NULL ||	\
 				    RB_COLOR(RB_LEFT(tmp, field), field) == RB_BLACK) {\
 					struct type *oright;		\
 					if ((oright = RB_RIGHT(tmp, field)))\
 						RB_COLOR(oright, field) = RB_BLACK;\
 					RB_COLOR(tmp, field) = RB_RED;	\
 					RB_ROTATE_LEFT(head, tmp, oright, field);\
 					tmp = RB_LEFT(parent, field);	\
 				}					\
 				RB_COLOR(tmp, field) = RB_COLOR(parent, field);\
 				RB_COLOR(parent, field) = RB_BLACK;	\
 				if (RB_LEFT(tmp, field))		\
 					RB_COLOR(RB_LEFT(tmp, field), field) = RB_BLACK;\
 				RB_ROTATE_RIGHT(head, parent, tmp, field);\
 				elm = RB_ROOT(head);			\
 				break;					\
 			}						\
 		}							\
 	}								\
 	if (elm)							\
 		RB_COLOR(elm, field) = RB_BLACK;			\
 }									\
 									\
 struct type *								\
 name##_RB_REMOVE(struct name *head, struct type *elm)			\
 {									\
 	struct type *child, *parent, *old = elm;			\
 	int color;							\
 	if (RB_LEFT(elm, field) == NULL)				\
 		child = RB_RIGHT(elm, field);				\
 	else if (RB_RIGHT(elm, field) == NULL)				\
 		child = RB_LEFT(elm, field);				\
 	else {								\
 		struct type *left;					\
 		elm = RB_RIGHT(elm, field);				\
 		while ((left = RB_LEFT(elm, field)))			\
 			elm = left;					\
 		child = RB_RIGHT(elm, field);				\
 		parent = RB_PARENT(elm, field);				\
 		color = RB_COLOR(elm, field);				\
 		if (child)						\
 			RB_PARENT(child, field) = parent;		\
 		if (parent) {						\
 			if (RB_LEFT(parent, field) == elm)		\
 				RB_LEFT(parent, field) = child;		\
 			else						\
 				RB_RIGHT(parent, field) = child;	\
 			RB_AUGMENT(parent);				\
 		} else							\
 			RB_ROOT(head) = child;				\
 		if (RB_PARENT(elm, field) == old)			\
 			parent = elm;					\
 		(elm)->field = (old)->field;				\
 		if (RB_PARENT(old, field)) {				\
 			if (RB_LEFT(RB_PARENT(old, field), field) == old)\
 				RB_LEFT(RB_PARENT(old, field), field) = elm;\
 			else						\
 				RB_RIGHT(RB_PARENT(old, field), field) = elm;\
 			RB_AUGMENT(RB_PARENT(old, field));		\
 		} else							\
 			RB_ROOT(head) = elm;				\
 		RB_PARENT(RB_LEFT(old, field), field) = elm;		\
 		if (RB_RIGHT(old, field))				\
 			RB_PARENT(RB_RIGHT(old, field), field) = elm;	\
 		if (parent) {						\
 			left = parent;					\
 			do {						\
 				RB_AUGMENT(left);			\
 			} while ((left = RB_PARENT(left, field)));	\
 		}							\
 		goto color;						\
 	}								\
 	parent = RB_PARENT(elm, field);					\
 	color = RB_COLOR(elm, field);					\
 	if (child)							\
 		RB_PARENT(child, field) = parent;			\
 	if (parent) {							\
 		if (RB_LEFT(parent, field) == elm)			\
 			RB_LEFT(parent, field) = child;			\
 		else							\
 			RB_RIGHT(parent, field) = child;		\
 		RB_AUGMENT(parent);					\
 	} else								\
 		RB_ROOT(head) = child;					\
 color:									\
 	if (color == RB_BLACK)						\
 		name##_RB_REMOVE_COLOR(head, parent, child);		\
 	return (old);							\
 }									\
 									\
 /* Inserts a node into the RB tree */					\
 struct type *								\
 name##_RB_INSERT(struct name *head, struct type *elm)			\
 {									\
 	struct type *tmp;						\
 	struct type *parent = NULL;					\
 	int comp = 0;							\
 	tmp = RB_ROOT(head);						\
 	while (tmp) {							\
 		parent = tmp;						\
 		comp = (cmp)(elm, parent);				\
 		if (comp < 0)						\
 			tmp = RB_LEFT(tmp, field);			\
 		else if (comp > 0)					\
 			tmp = RB_RIGHT(tmp, field);			\
 		else							\
 			return (tmp);					\
 	}								\
 	RB_SET(elm, parent, field);					\
 	if (parent != NULL) {						\
 		if (comp < 0)						\
 			RB_LEFT(parent, field) = elm;			\
 		else							\
 			RB_RIGHT(parent, field) = elm;			\
 		RB_AUGMENT(parent);					\
 	} else								\
 		RB_ROOT(head) = elm;					\
 	name##_RB_INSERT_COLOR(head, elm);				\
 	return (NULL);							\
 }									\
 									\
 /* Finds the node with the same key as elm */				\
 struct type *								\
 name##_RB_FIND(struct name *head, struct type *elm)			\
 {									\
 	struct type *tmp = RB_ROOT(head);				\
 	int comp;							\
 	while (tmp) {							\
 		comp = cmp(elm, tmp);					\
 		if (comp < 0)						\
 			tmp = RB_LEFT(tmp, field);			\
 		else if (comp > 0)					\
 			tmp = RB_RIGHT(tmp, field);			\
 		else							\
 			return (tmp);					\
 	}								\
 	return (NULL);							\
 }									\
 									\
 struct type *								\
 name##_RB_NEXT(struct name *head, struct type *elm)			\
 {									\
 	if (RB_RIGHT(elm, field)) {					\
 		elm = RB_RIGHT(elm, field);				\
 		while (RB_LEFT(elm, field))				\
 			elm = RB_LEFT(elm, field);			\
 	} else {							\
 		if (RB_PARENT(elm, field) &&				\
 		    (elm == RB_LEFT(RB_PARENT(elm, field), field)))	\
 			elm = RB_PARENT(elm, field);			\
 		else {							\
 			while (RB_PARENT(elm, field) &&			\
 			    (elm == RB_RIGHT(RB_PARENT(elm, field), field)))\
 				elm = RB_PARENT(elm, field);		\
 			elm = RB_PARENT(elm, field);			\
 		}							\
 	}								\
 	return (elm);							\
 }									\
 									\
 struct type *								\
 name##_RB_MINMAX(struct name *head, int val)				\
 {									\
 	struct type *tmp = RB_ROOT(head);				\
 	struct type *parent = NULL;					\
 	while (tmp) {							\
 		parent = tmp;						\
 		if (val < 0)						\
 			tmp = RB_LEFT(tmp, field);			\
 		else							\
 			tmp = RB_RIGHT(tmp, field);			\
 	}								\
 	return (parent);						\
 }

 #define RB_NEGINF	-1
 #define RB_INF	1

 #define RB_INSERT(name, x, y)	name##_RB_INSERT(x, y)
 #define RB_REMOVE(name, x, y)	name##_RB_REMOVE(x, y)
 #define RB_FIND(name, x, y)	name##_RB_FIND(x, y)
 #define RB_NEXT(name, x, y)	name##_RB_NEXT(x, y)
 #define RB_MIN(name, x)		name##_RB_MINMAX(x, RB_NEGINF)
 #define RB_MAX(name, x)		name##_RB_MINMAX(x, RB_INF)

 #define RB_FOREACH(x, name, head)					\
 	for ((x) = RB_MIN(name, head);					\
 	     (x) != NULL;						\
 	     (x) = name##_RB_NEXT(head, x))

 #endif	/* _SYS_TREE_H_ */

 --ELM1056275348-15420-0_--

From: matthew green <mrg@eterna.com.au>
To: gnats-bugs@gnats.netbsd.org
Cc:  
Subject: re: port-sparc64/13654
Date: Wed, 31 Mar 2004 15:53:32 +1000

 i've updated this patch to -current and it still works.  some of it had
 already been independently merged by martin, it seems.



 .mrg.


 Index: dev/iommu.c
 ===================================================================
 RCS file: /cvsroot/src/sys/arch/sparc64/dev/iommu.c,v
 retrieving revision 1.73
 diff -p -u -r1.73 iommu.c
 --- dev/iommu.c	22 Mar 2004 12:20:52 -0000	1.73
 +++ dev/iommu.c	31 Mar 2004 05:52:32 -0000
 @@ -1,6 +1,7 @@
  /*	$NetBSD: iommu.c,v 1.73 2004/03/22 12:20:52 nakayama Exp $	*/

  /*
 + * Copyright (c) 2003 Henric Jungheim
   * Copyright (c) 2001, 2002 Eduardo Horvath
   * Copyright (c) 1999, 2000 Matthew R. Green
   * All rights reserved.
 @@ -29,6 +30,8 @@
   * SUCH DAMAGE.
   */

 +/* from: OpenBSD: iommu.c,v 1.29 2003/05/22 21:16:29 henric Exp */
 +
  /*
   * UltraSPARC IOMMU support; used by both the sbus and pci code.
   */
 @@ -43,7 +46,7 @@ __KERNEL_RCSID(0, "$NetBSD: iommu.c,v 1.
  #include <sys/malloc.h>
  #include <sys/systm.h>
  #include <sys/device.h>
 -#include <sys/proc.h>
 +#include <sys/mbuf.h>

  #include <uvm/uvm_extern.h>

 @@ -55,24 +58,68 @@ __KERNEL_RCSID(0, "$NetBSD: iommu.c,v 1.
  #include <machine/autoconf.h>
  #include <machine/cpu.h>

 +#ifdef DDB
 +#include <machine/db_machdep.h>
 +#include <ddb/db_sym.h>
 +#include <ddb/db_extern.h>
 +#endif
 +
  #ifdef DEBUG
  #define IDB_BUSDMA	0x1
  #define IDB_IOMMU	0x2
  #define IDB_INFO	0x4
 -#define	IDB_SYNC	0x8
 +#define IDB_SYNC	0x8
 +#define IDB_XXX		0x10
 +#define IDB_PRINT_MAP	0x20
 +#define IDB_BREAK	0x40
  int iommudebug = 0x0;
  #define DPRINTF(l, s)   do { if (iommudebug & l) printf s; } while (0)
  #else
  #define DPRINTF(l, s)
  #endif

 -#define iommu_strbuf_flush(i, v) do {					\
 -	if ((i)->sb_flush)						\
 -		bus_space_write_8((i)->sb_is->is_bustag, (i)->sb_sb,	\
 -			STRBUFREG(strbuf_pgflush), (v));		\
 -	} while (0)
 +void iommu_enter(struct iommu_state *, struct strbuf_ctl *, vaddr_t, paddr_t,
 +    int);
 +void iommu_remove(struct iommu_state *, struct strbuf_ctl *, vaddr_t);
 +int iommu_dvmamap_sync_range(struct strbuf_ctl*, vaddr_t, bus_size_t);
 +int iommu_strbuf_flush_done(struct iommu_map_state *);
 +int iommu_dvmamap_load_seg(bus_dma_tag_t, struct iommu_state *,
 +    bus_dmamap_t, bus_dma_segment_t *, int, int, bus_size_t, bus_size_t);
 +int iommu_dvmamap_load_mlist(bus_dma_tag_t, struct iommu_state *,
 +    bus_dmamap_t, struct pglist *, int, bus_size_t, bus_size_t);
 +int iommu_dvmamap_validate_map(bus_dma_tag_t, struct iommu_state *,
 +    bus_dmamap_t);
 +void iommu_dvmamap_print_map(bus_dma_tag_t, struct iommu_state *,
 +    bus_dmamap_t);
 +int iommu_dvmamap_append_range(bus_dma_tag_t, bus_dmamap_t, paddr_t,
 +    bus_size_t, int, bus_size_t);
 +int64_t iommu_tsb_entry(struct iommu_state *, vaddr_t);
 +void strbuf_reset(struct strbuf_ctl *);
 +int iommu_iomap_insert_page(struct iommu_map_state *, paddr_t);
 +vaddr_t iommu_iomap_translate(struct iommu_map_state *, paddr_t);
 +int iommu_iomap_load_map(struct iommu_state *, struct iommu_map_state *,
 +    vaddr_t, int);
 +int iommu_iomap_unload_map(struct iommu_state *, struct iommu_map_state *);
 +struct iommu_map_state *iommu_iomap_create(int);
 +void iommu_iomap_destroy(struct iommu_map_state *);
 +void iommu_iomap_clear_pages(struct iommu_map_state *);
 +
 +/*
 + * Initiate an STC entry flush.
 + */
 +static inline void
 +iommu_strbuf_flush(struct strbuf_ctl *sb, vaddr_t va)
 +{
 +#ifdef DEBUG
 +	if (sb->sb_flush == NULL) {
 +		printf("iommu_strbuf_flush: attempting to flush w/o STC\n");
 +		return;
 +	}
 +#endif

 -static	int iommu_strbuf_flush_done __P((struct strbuf_ctl *));
 +	bus_space_write_8(sb->sb_bustag, sb->sb_sb,
 +	    STRBUFREG(strbuf_pgflush), va);
 +}

  /*
   * initialise the UltraSPARC IOMMU (SBUS or PCI):
 @@ -82,11 +129,7 @@ static	int iommu_strbuf_flush_done __P((
   *	- create a private DVMA map.
   */
  void
 -iommu_init(name, is, tsbsize, iovabase)
 -	char *name;
 -	struct iommu_state *is;
 -	int tsbsize;
 -	u_int32_t iovabase;
 +iommu_init(char *name, struct iommu_state *is, int tsbsize, u_int32_t iovabase)
  {
  	psize_t size;
  	vaddr_t va;
 @@ -147,26 +190,22 @@ iommu_init(name, is, tsbsize, iovabase)
  	memset(is->is_tsb, 0, size);

  #ifdef DEBUG
 -	if (iommudebug & IDB_INFO)
 -	{
 +	if (iommudebug & IDB_INFO) {
  		/* Probe the iommu */

  		printf("iommu regs at: cr=%lx tsb=%lx flush=%lx\n",
 -			(u_long)bus_space_read_8(is->is_bustag, is->is_iommu,
 -				offsetof (struct iommureg, iommu_cr)),
 -			(u_long)bus_space_read_8(is->is_bustag, is->is_iommu,
 -				offsetof (struct iommureg, iommu_tsb)),
 -			(u_long)bus_space_read_8(is->is_bustag, is->is_iommu,
 -				offsetof (struct iommureg, iommu_flush)));
 -		printf("iommu cr=%llx tsb=%llx\n",
 -			(unsigned long long)bus_space_read_8(is->is_bustag,
 -				is->is_iommu,
 -				offsetof (struct iommureg, iommu_cr)),
 -			(unsigned long long)bus_space_read_8(is->is_bustag,
 -				is->is_iommu,
 -				offsetof (struct iommureg, iommu_tsb)));
 -		printf("TSB base %p phys %llx\n", (void *)is->is_tsb,
 -			(unsigned long long)is->is_ptsb);
 +		    (u_long)bus_space_vaddr(is->is_bustag, is->is_iommu) +
 +			IOMMUREG(iommu_cr),
 +		    (u_long)bus_space_vaddr(is->is_bustag, is->is_iommu) +
 +			IOMMUREG(iommu_tsb),
 +		    (u_long)bus_space_vaddr(is->is_bustag, is->is_iommu) +
 +			IOMMUREG(iommu_flush));
 +		printf("iommu cr=%lx tsb=%lx\n",
 +		    IOMMUREG_READ(is, iommu_cr),
 +		    IOMMUREG_READ(is, iommu_tsb));
 +		printf("TSB base %p phys %llx\n",
 +		    (void *)is->is_tsb, (unsigned long long)is->is_ptsb);
  		delay(1000000); /* 1 s */
  	}
  #endif
 @@ -179,19 +218,17 @@ iommu_init(name, is, tsbsize, iovabase)
  	/*
  	 * Now all the hardware's working we need to allocate a dvma map.
  	 */
 -	printf("DVMA map: %x to %x\n",
 -		(unsigned int)is->is_dvmabase,
 -		(unsigned int)is->is_dvmaend);
 -	printf("IOTSB: %llx to %llx\n",
 -		(unsigned long long)is->is_ptsb,
 -		(unsigned long long)(is->is_ptsb + size));
 +	printf("DVMA map: %x to %x\n", is->is_dvmabase, is->is_dvmaend);
 +	printf("IOTDB: %llx to %llx\n", 
 +	    (unsigned long long)is->is_ptsb,
 +	    (unsigned long long)(is->is_ptsb + size));
  	is->is_dvmamap = extent_create(name,
  	    is->is_dvmabase, is->is_dvmaend - PAGE_SIZE,
  	    M_DEVBUF, 0, 0, EX_NOWAIT);
  }

  /*
 - * Streaming buffers don't exist on the UltraSPARC IIi; we should have
 + * Streaming buffers don't exist on the UltraSPARC IIi/e; we should have
   * detected that already and disabled them.  If not, we will notice that
   * they aren't there when the STRBUF_EN bit does not remain.
   */
 @@ -200,83 +237,171 @@ iommu_reset(is)
  	struct iommu_state *is;
  {
  	int i;
 -	struct strbuf_ctl *sb;

 -	/* Need to do 64-bit stores */
 -	bus_space_write_8(is->is_bustag, is->is_iommu, IOMMUREG(iommu_tsb),
 -		is->is_ptsb);
 -
 -	/* Enable IOMMU in diagnostic mode */
 -	bus_space_write_8(is->is_bustag, is->is_iommu, IOMMUREG(iommu_cr),
 -		is->is_cr|IOMMUCR_DE);
 -
 -	for (i = 0; i < 2; i++) {
 -		if ((sb = is->is_sb[i])) {
 -
 -			/* Enable diagnostics mode? */
 -			bus_space_write_8(is->is_bustag, is->is_sb[i]->sb_sb,
 -				STRBUFREG(strbuf_ctl), STRBUF_EN);
 -
 -			/* No streaming buffers? Disable them */
 -			if (bus_space_read_8(is->is_bustag,
 -				is->is_sb[i]->sb_sb,
 -				STRBUFREG(strbuf_ctl)) == 0) {
 -				is->is_sb[i]->sb_flush = NULL;
 -			} else {
 -
 -				/*
 -				 * locate the pa of the flush buffer.
 -				 */
 -				(void)pmap_extract(pmap_kernel(),
 -					(vaddr_t)is->is_sb[i]->sb_flush,
 -					&is->is_sb[i]->sb_flushpa);
 -			}
 -		}
 +	IOMMUREG_WRITE(is, iommu_tsb, is->is_ptsb);
 +
 +	/* Enable IOMMU */
 +	IOMMUREG_WRITE(is, iommu_cr, is->is_cr);
 +
 +	for (i = 0; i < 2; ++i) {
 +		struct strbuf_ctl *sb = is->is_sb[i];
 +
 +		if (sb == NULL)
 +			continue;
 +
 +		sb->sb_iommu = is;
 +		strbuf_reset(sb);
 +	}
 +}
 +  
 +/*
 + * Initialize one STC.
 + */
 +void
 +strbuf_reset(struct strbuf_ctl *sb)
 +{
 +
 +	if (sb->sb_flush == NULL)
 +		return;
 +
 +	bus_space_write_8(sb->sb_bustag, sb->sb_sb,
 +	    STRBUFREG(strbuf_ctl), STRBUF_EN);
 +
 +	membar_lookaside();
 +
 +	/* No streaming buffers? Disable them */
 +	if (bus_space_read_8(sb->sb_bustag, sb->sb_sb,
 +	    STRBUFREG(strbuf_ctl)) == 0) {
 +		sb->sb_flush = NULL;
 +	} else {
 +		/*
 +		 * locate the pa of the flush buffer
 +		 */
 +		if (pmap_extract(pmap_kernel(),
 +		    (vaddr_t)sb->sb_flush, &sb->sb_flushpa) == FALSE)
 +			sb->sb_flush = NULL;
  	}
  }

  /*
 - * Here are the iommu control routines.
 + * Add an entry to the IOMMU table.
 + *
 + * The entry is marked streaming if an STC was detected and 
 + * the BUS_DMA_STREAMING flag is set.
   */
  void
 -iommu_enter(sb, va, pa, flags)
 +iommu_enter(is, sb, va, pa, flags)
 +	struct iommu_state *is;
  	struct strbuf_ctl *sb;
  	vaddr_t va;
 -	int64_t pa;
 +	paddr_t pa;
  	int flags;
  {
 -	struct iommu_state *is = sb->sb_is;
 -	int strbuf = (flags & BUS_DMA_STREAMING);
  	int64_t tte;
 +	volatile int64_t *tte_ptr = &is->is_tsb[IOTSBSLOT(va,is->is_tsbsize)];

  #ifdef DIAGNOSTIC
 -	if (va < is->is_dvmabase || va > is->is_dvmaend)
 +	if (va < is->is_dvmabase || round_page(va + PAGE_SIZE) >
 +	    is->is_dvmaend + 1)
  		panic("iommu_enter: va %#lx not in DVMA space", va);
 -#endif

 -	/* Is the streamcache flush really needed? */
 -	if (sb->sb_flush) {
 -		iommu_strbuf_flush(sb, va);
 -		iommu_strbuf_flush_done(sb);
 -	} else
 -		/* If we can't flush the strbuf don't enable it. */
 -		strbuf = 0;
 +	tte = *tte_ptr;
 +
 +	if (tte & IOTTE_V) {
 +		printf("Overwriting valid tte entry (dva %lx pa %lx "
 +		    "&tte %p tte %lx)\n", va, pa, tte_ptr, tte);
 +		extent_print(is->is_dvmamap);
 +		panic("IOMMU overwrite");
 +	}
 +#endif

  	tte = MAKEIOTTE(pa, !(flags & BUS_DMA_NOWRITE),
 -		!(flags & BUS_DMA_NOCACHE), (strbuf));
 +	    !(flags & BUS_DMA_NOCACHE), (flags & BUS_DMA_STREAMING));
  #ifdef DEBUG
  	tte |= (flags & 0xff000LL)<<(4*8);
  #endif

  	DPRINTF(IDB_IOMMU, ("Clearing TSB slot %d for va %p\n",
 -		       (int)IOTSBSLOT(va,is->is_tsbsize), (void *)(u_long)va));
 -	is->is_tsb[IOTSBSLOT(va,is->is_tsbsize)] = tte;
 -	bus_space_write_8(is->is_bustag, is->is_iommu,
 -		IOMMUREG(iommu_flush), va);
 +	    (int)IOTSBSLOT(va,is->is_tsbsize), (void *)(u_long)va));
 +
 +	*tte_ptr = tte;
 +
 +	/*
 +	 * Why bother to flush this va?  It should only be relevant for
 +	 * V ==> V or V ==> non-V transitions.  The former is illegal and
 +	 * the latter is never done here.  It is true that this provides
 +	 * some protection against a misbehaving master using an address
 +	 * after it should.  The IOMMU documentation specifically warns
 +	 * that the consequences of a simultaneous IOMMU flush and DVMA
 +	 * access to the same address are undefined.  (By that argument,
 +	 * the STC should probably be flushed as well.)   Note that if
 +	 * a bus master keeps using a memory region after it has been
 +	 * unmapped, the specific behavior of the IOMMU is likely to
 +	 * be the least of our worries.
 +	 */
 +	IOMMUREG_WRITE(is, iommu_flush, va);
 +
  	DPRINTF(IDB_IOMMU, ("iommu_enter: va %lx pa %lx TSB[%lx]@%p=%lx\n",
 -		va, (long)pa, (u_long)IOTSBSLOT(va,is->is_tsbsize),
 -		(void *)(u_long)&is->is_tsb[IOTSBSLOT(va,is->is_tsbsize)],
 -		(u_long)tte));
 +	    va, (long)pa, (u_long)IOTSBSLOT(va,is->is_tsbsize), 
 +	    (void *)(u_long)&is->is_tsb[IOTSBSLOT(va,is->is_tsbsize)],
 +	    (u_long)tte));
 +}
 +  
 +/*
 + * Remove an entry from the IOMMU table.
 + *
 + * The entry is flushed from the STC if an STC is detected and the TSB
 + * entry has the IOTTE_STREAM flags set.  It should be impossible for
 + * the TSB entry to have this flag set without the BUS_DMA_STREAMING
 + * flag, but better to be safe.  (The IOMMU will be ignored as long
 + * as an STC entry exists.)
 + */
 +void
 +iommu_remove(is, sb, va)
 +	struct iommu_state *is;
 +	struct strbuf_ctl *sb;
 +	vaddr_t va;
 +{
 +	int64_t *tte_ptr = &is->is_tsb[IOTSBSLOT(va, is->is_tsbsize)];
 +	int64_t tte;
 +
 +#ifdef DIAGNOSTIC
 +	if (trunc_page(va) < is->is_dvmabase || round_page(va) >
 +	    is->is_dvmaend + 1)
 +		panic("iommu_remove: va 0x%lx not in DVMA space", (u_long)va);
 +	if (va != trunc_page(va)) {
 +		printf("iommu_remove: unaligned va: %lx\n", va);
 +		va = trunc_page(va);
 +	}
 +#endif
 +	tte = *tte_ptr;
 +
 +	DPRINTF(IDB_IOMMU, ("iommu_remove: va %lx TSB[%lx]@%p\n",
 +	    va, tte, tte_ptr));
 +
 +#ifdef DIAGNOSTIC
 +	if ((tte & IOTTE_V) == 0) {
 +		printf("Removing invalid tte entry (dva %lx &tte %p "
 +		    "tte %lx)\n", va, tte_ptr, tte);
 +		extent_print(is->is_dvmamap);
 +		panic("IOMMU remove overwrite");
 +	}
 +#endif
 +
 +	*tte_ptr = tte & ~IOTTE_V;
 +
 +	/*
 +	 * IO operations are strongly ordered WRT each other.  It is
 +	 * unclear how they relate to normal memory accesses.
 +	 */
 +	membar_storestore();
 +
 +	IOMMUREG_WRITE(is, iommu_flush, va);
 +
 +	if (sb && (tte & IOTTE_STREAM))
 +		iommu_strbuf_flush(sb, va);
 +
 +	/* Should we sync the iommu and stc here? */
  }

  /*
 @@ -289,155 +414,250 @@ iommu_extract(is, dva)
  {
  	int64_t tte = 0;

 -	if (dva >= is->is_dvmabase && dva < is->is_dvmaend)
 +	if (dva >= is->is_dvmabase && dva <= is->is_dvmaend)
  		tte = is->is_tsb[IOTSBSLOT(dva, is->is_tsbsize)];

 +#if 0
  	if ((tte & IOTTE_V) == 0)
  		return ((paddr_t)-1L);
 +#endif
  	return (tte & IOTTE_PAMASK);
 +}  
 +
 +/*
 + * Lookup a TSB entry for a given DVMA (debug routine).
 + */
 +int64_t
 +iommu_lookup_tte(struct iommu_state *is, vaddr_t dva)
 +{
 +	int64_t tte = 0;
 +	
 +	if (dva >= is->is_dvmabase && dva <= is->is_dvmaend)
 +		tte = is->is_tsb[IOTSBSLOT(dva, is->is_tsbsize)];
 +
 +	return (tte);
 +}
 +  
 +/*
 + * Lookup a TSB entry at a given physical address (debug routine).
 + */
 +int64_t
 +iommu_fetch_tte(struct iommu_state *is, paddr_t pa)
 +{
 +	int64_t tte = 0;
 +
 +	if (pa >= is->is_ptsb && pa < is->is_ptsb +
 +	    (PAGE_SIZE << is->is_tsbsize)) 
 +		tte = ldxa(pa, ASI_PHYS_CACHED);
 + 
 +	return (tte);
  }

  /*
 - * iommu_remove: removes mappings created by iommu_enter
 - *
 - * Only demap from IOMMU if flag is set.
 - *
 - * XXX: this function needs better internal error checking.
 + * Fetch a TSB entry with some sanity checking.
   */
 -void
 -iommu_remove(is, va, len)
 -	struct iommu_state *is;
 -	vaddr_t va;
 -	size_t len;
 +int64_t
 +iommu_tsb_entry(struct iommu_state *is, vaddr_t dva)
  {
 +	int64_t tte;

 -#ifdef DIAGNOSTIC
 -	if (va < is->is_dvmabase || va > is->is_dvmaend)
 -		panic("iommu_remove: va 0x%lx not in DVMA space", (u_long)va);
 -	if ((long)(va + len) < (long)va)
 -		panic("iommu_remove: va 0x%lx + len 0x%lx wraps",
 -		      (long) va, (long) len);
 -	if (len & ~0xfffffff)
 -		panic("iommu_remove: ridiculous len 0x%lx", (u_long)len);
 -#endif
 +	if (dva < is->is_dvmabase || dva > is->is_dvmaend)
 +		panic("invalid dva: %llx", (long long)dva);

 -	va = trunc_page(va);
 -	DPRINTF(IDB_IOMMU, ("iommu_remove: va %lx TSB[%lx]@%p\n",
 -		va, (u_long)IOTSBSLOT(va, is->is_tsbsize),
 -		&is->is_tsb[IOTSBSLOT(va, is->is_tsbsize)]));
 -	while (len > 0) {
 -		DPRINTF(IDB_IOMMU, ("iommu_remove: clearing TSB slot %d "
 -			"for va %p size %lx\n",
 -			(int)IOTSBSLOT(va,is->is_tsbsize), (void *)(u_long)va,
 -			(u_long)len));
 -		if (len <= PAGE_SIZE)
 -			len = 0;
 -		else
 -			len -= PAGE_SIZE;
 +	tte = is->is_tsb[IOTSBSLOT(dva,is->is_tsbsize)];

 -		/* XXX Zero-ing the entry would not require RMW */
 -		is->is_tsb[IOTSBSLOT(va,is->is_tsbsize)] &= ~IOTTE_V;
 -		bus_space_write_8(is->is_bustag, is->is_iommu,
 -			IOMMUREG(iommu_flush), va);
 -		va += PAGE_SIZE;
 -	}
 +	if ((tte & IOTTE_V) == 0)
 +		panic("iommu_tsb_entry: invalid entry %lx", dva);
 +
 +	return (tte);
  }

 -static int
 -iommu_strbuf_flush_done(sb)
 -	struct strbuf_ctl *sb;
 +/*
 + * Initiate and then block until an STC flush synchronization has completed.
 + */
 +int
 +iommu_strbuf_flush_done(ims)
 +	struct iommu_map_state *ims;
  {
 -	struct iommu_state *is = sb->sb_is;
 +	struct strbuf_ctl *sb = ims->ims_sb;
 +	struct strbuf_flush *sf = &ims->ims_flush;
  	struct timeval cur, flushtimeout;
 +	struct timeval to = { 0, 500000 };
 +	u_int64_t flush;
 +	int timeout_started = 0;

 -#define BUMPTIME(t, usec) { \
 -	register volatile struct timeval *tp = (t); \
 -	register long us; \
 - \
 -	tp->tv_usec = us = tp->tv_usec + (usec); \
 -	if (us >= 1000000) { \
 -		tp->tv_usec = us - 1000000; \
 -		tp->tv_sec++; \
 -	} \
 -}
 -
 -	if (!sb->sb_flush)
 -		return (0);
 +#ifdef DIAGNOSTIC
 +	if (sb == NULL) {
 +		panic("iommu_strbuf_flush_done: invalid flush buffer");
 +	}
 +#endif

  	/*
  	 * Streaming buffer flushes:
  	 *
 -	 *   1 Tell strbuf to flush by storing va to strbuf_pgflush.  If
 -	 *     we're not on a cache line boundary (64-bits):
 +	 *   1 Tell strbuf to flush by storing va to strbuf_pgflush.
  	 *   2 Store 0 in flag
  	 *   3 Store pointer to flag in flushsync
  	 *   4 wait till flushsync becomes 0x1
  	 *
 -	 * If it takes more than .5 sec, something
 -	 * went wrong.
 +	 * If it takes more than .5 sec, something went very, very wrong.
 +	 */
 +
 +	/*
 +	 * If we're reading from ASI_PHYS_CACHED, then we'll write to
 +	 * it too.  No need to tempt fate or learn about Si bugs or such.
 +	 * FreeBSD just uses normal "volatile" reads/writes...
 +	 */
 +
 +	stxa(sf->sbf_flushpa, ASI_PHYS_CACHED, 0);
 +
 +	/*
 +	 * Ensure any previous strbuf operations are complete and that
 +	 * memory is initialized before the IOMMU uses it.
 +	 * Is this needed?  How are IO and memory operations ordered?
  	 */
 +	membar_storestore();
 +
 +	bus_space_write_8(sb->sb_bustag, sb->sb_sb,
 +		    STRBUFREG(strbuf_flushsync), sf->sbf_flushpa);
 +
 +	DPRINTF(IDB_IOMMU,
 +	    ("iommu_strbuf_flush_done: flush = %lx pa = %lx\n", 
 +		ldxa(sf->sbf_flushpa, ASI_PHYS_CACHED), sf->sbf_flushpa));
 +
 +	membar_storeload();
 +       	membar_lookaside();
 +
 +	for (;;) {
 +		int i;
 +
 +		/*
 +		 * Try to shave a few instruction cycles off the average
 +		 * latency by only checking the elapsed time every few
 +		 * fetches.
 +		 */
 +		for (i = 0; i < 1000; ++i) {
 +			membar_loadload();
 +			/* Bypass non-coherent D$ */
 +			/* non-coherent...?   Huh? */
 +			flush = ldxa(sf->sbf_flushpa, ASI_PHYS_CACHED);
 +
 +			if (flush) {
 +				DPRINTF(IDB_IOMMU,
 + 				    ("iommu_strbuf_flush_done: flushed\n"));
 +				return (0);
 +			}
 +		}

 -	*sb->sb_flush = 0;
 -	bus_space_write_8(is->is_bustag, sb->sb_sb,
 -		STRBUFREG(strbuf_flushsync), sb->sb_flushpa);
 -
 -	microtime(&flushtimeout);
 -	cur = flushtimeout;
 -	BUMPTIME(&flushtimeout, 500000); /* 1/2 sec */
 -
 -	DPRINTF(IDB_IOMMU, ("iommu_strbuf_flush_done: flush = %lx "
 -		"at va = %lx pa = %lx now=%lx:%lx until = %lx:%lx\n",
 -		(long)*sb->sb_flush, (long)sb->sb_flush, (long)sb->sb_flushpa,
 -		cur.tv_sec, cur.tv_usec,
 -		flushtimeout.tv_sec, flushtimeout.tv_usec));
 -
 -	/* Bypass non-coherent D$ */
 -	while ((!ldxa(sb->sb_flushpa, ASI_PHYS_CACHED)) &&
 -		timercmp(&cur, &flushtimeout, <=))
  		microtime(&cur);

 -#ifdef DIAGNOSTIC
 -	if (!ldxa(sb->sb_flushpa, ASI_PHYS_CACHED)) {
 -		printf("iommu_strbuf_flush_done: flush timeout %p, at %p\n",
 -			(void *)(u_long)*sb->sb_flush,
 -			(void *)(u_long)sb->sb_flushpa); /* panic? */
 -#ifdef DDB
 -		Debugger();
 -#endif
 -	}
 -#endif
 -	DPRINTF(IDB_IOMMU, ("iommu_strbuf_flush_done: flushed\n"));
 -	return (*sb->sb_flush);
 +		if (timeout_started) {
 +			if (timercmp(&cur, &flushtimeout, >))
 +				panic("STC timeout at %lx (%ld)",
 +				    sf->sbf_flushpa, flush);
 +		} else {
 +			timeradd(&cur, &to, &flushtimeout);
 +
 +			timeout_started = 1;
 +
 +			DPRINTF(IDB_IOMMU,
 +			    ("iommu_strbuf_flush_done: flush = %lx pa = %lx "
 +				"now=%lx:%lx until = %lx:%lx\n", 
 +				ldxa(sf->sbf_flushpa, ASI_PHYS_CACHED),
 +				sf->sbf_flushpa, cur.tv_sec, cur.tv_usec, 
 +				flushtimeout.tv_sec, flushtimeout.tv_usec));
 +		}
 +  	}
  }
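
 Stripped of the debug scaffolding, the handshake implemented above
 reduces to roughly the following (a sketch using names from the patch;
 the real loop batches its polls and enforces the half-second timeout):

 	/* 1: iommu_strbuf_flush() already stored the va to strbuf_pgflush */
 	stxa(flushpa, ASI_PHYS_CACHED, 0);		/* 2: clear the flag */
 	membar_storestore();
 	bus_space_write_8(sb->sb_bustag, sb->sb_sb,	/* 3: point flushsync */
 	    STRBUFREG(strbuf_flushsync), flushpa);	/*    at the flag */
 	while (ldxa(flushpa, ASI_PHYS_CACHED) == 0)	/* 4: spin until the STC */
 		membar_loadload();			/*    stores 1 there */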

  /*
   * IOMMU DVMA operations, common to SBUS and PCI.
   */
  int
 -iommu_dvmamap_load(t, sb, map, buf, buflen, p, flags)
 +iommu_dvmamap_create(t, is, sb, size, nsegments, maxsegsz, boundary, flags, dmamap)
  	bus_dma_tag_t t;
 +	struct iommu_state *is;
  	struct strbuf_ctl *sb;
 +	bus_size_t size;
 +	int nsegments;
 +	bus_size_t maxsegsz;
 +	bus_size_t boundary;
 +	int flags;
 +	bus_dmamap_t *dmamap;
 +{
 +	int ret;
 +	bus_dmamap_t map;
 +	struct iommu_map_state *ims;
 +
 +	ret = bus_dmamap_create(t->_parent, size, nsegments, maxsegsz,
 +	    boundary, flags, &map);
 +
 +	if (ret)
 +		return (ret);
 +
 +	ims = iommu_iomap_create(nsegments);
 +
 +	if (ims == NULL) {
 +		bus_dmamap_destroy(t->_parent, map);
 +		return (ENOMEM);
 +	}
 +
 +	ims->ims_sb = sb;
 +	map->_dm_cookie = ims;
 +	*dmamap = map;
 +
 +	return (0);
 +}
 +
 +void
 +iommu_dvmamap_destroy(t, map)
 +	bus_dma_tag_t t;
 +	bus_dmamap_t map;
 +{
 +	/*
 +	 * The specification (man page) requires a loaded
 +	 * map to be unloaded before it is destroyed.
 +	 */
 +	if (map->dm_nsegs)
 +		bus_dmamap_unload(t, map);
 +
 +	if (map->_dm_cookie)
 +		iommu_iomap_destroy(map->_dm_cookie);
 +	map->_dm_cookie = NULL;
 +
 +	bus_dmamap_destroy(t->_parent, map);
 +}
 +
 +int
 +iommu_dvmamap_load(t, is, map, buf, buflen, p, flags)
 +	bus_dma_tag_t t;
 +	struct iommu_state *is;
  	bus_dmamap_t map;
  	void *buf;
  	bus_size_t buflen;
  	struct proc *p;
  	int flags;
  {
 -	struct iommu_state *is = sb->sb_is;
  	int s;
 -	int err;
 +	int err = 0;
  	bus_size_t sgsize;
 -	paddr_t curaddr;
  	u_long dvmaddr, sgstart, sgend;
 -	bus_size_t align, boundary, len;
 -	vaddr_t vaddr = (vaddr_t)buf;
 -	int seg;
 +	bus_size_t align, boundary;
 +	struct iommu_map_state *ims = map->_dm_cookie;
  	struct pmap *pmap;

 +#ifdef DIAGNOSTIC
 +	if (ims == NULL)
 +		panic("iommu_dvmamap_load: null map state");
 +#endif
 +
  	if (map->dm_nsegs) {
 -		/* Already in use?? */
 +		/*
 +		 * Is it still in use? _bus_dmamap_load should have taken care
 +		 * of this.
 +		 */
  #ifdef DIAGNOSTIC
 -		printf("iommu_dvmamap_load: map still in use\n");
 +		panic("iommu_dvmamap_load: map still in use");
  #endif
  		bus_dmamap_unload(t, map);
  	}
 @@ -446,15 +666,14 @@ iommu_dvmamap_load(t, sb, map, buf, bufl
  	 * Make sure that on error condition we return "no valid mappings".
  	 */
  	map->dm_nsegs = 0;
 -	if (buflen > map->_dm_size) {
 +
 +	if (buflen < 1 || buflen > map->_dm_size) {
  		DPRINTF(IDB_BUSDMA,
  		    ("iommu_dvmamap_load(): error %d > %d -- "
  		     "map size exceeded!\n", (int)buflen, (int)map->_dm_size));
  		return (EINVAL);
  	}

 -	sgsize = round_page(buflen + ((int)vaddr & PGOFSET));
 -
  	/*
  	 * A boundary presented to bus_dmamem_alloc() takes precedence
  	 * over boundary in the map.
 @@ -463,22 +682,64 @@ iommu_dvmamap_load(t, sb, map, buf, bufl
  		boundary = map->_dm_boundary;
  	align = max(map->dm_segs[0]._ds_align, PAGE_SIZE);

 +	pmap = p ? p->p_vmspace->vm_map.pmap : pmap_kernel();
 +
 +	/* Count up the total number of pages we need */
 +	iommu_iomap_clear_pages(ims);
 +	{ /* Scope */
 +		bus_addr_t a, aend;
 +		bus_addr_t addr = (vaddr_t)buf;
 +		int seg_len = buflen;
 +
 +		aend = round_page(addr + seg_len - 1);
 +		for (a = trunc_page(addr); a < aend; a += PAGE_SIZE) {
 +			paddr_t pa;
 +
 +			if (pmap_extract(pmap, a, &pa) == FALSE) {
 +				printf("iomap pmap error addr 0x%lx\n", a);
 +				iommu_iomap_clear_pages(ims);
 +				return (E2BIG);
 +			}
 +
 +			err = iommu_iomap_insert_page(ims, pa);
 +			if (err) {
 +				printf("iomap insert error: %d for "
 +				    "va 0x%lx pa 0x%lx "
 +				    "(buf %p len %ld/%lx)\n",
 +				    err, a, pa, buf, buflen, buflen);
 +				iommu_dvmamap_print_map(t, is, map);
 +				iommu_iomap_clear_pages(ims);
 +				return (E2BIG);
 +			}
 +		}
 +	}
 +	sgsize = ims->ims_map.ipm_pagecnt * PAGE_SIZE;
 +
 +	if (flags & BUS_DMA_24BIT) {
 +		sgstart = max(is->is_dvmamap->ex_start, 0xff000000);
 +		sgend = min(is->is_dvmamap->ex_end, 0xffffffff);
 +	} else {
 +		sgstart = is->is_dvmamap->ex_start;
 +		sgend = is->is_dvmamap->ex_end;
 +	}
 +
  	/*
  	 * If our segment size is larger than the boundary we need to
 -	 * split the transfer up int little pieces ourselves.
 +	 * split the transfer up into little pieces ourselves.
  	 */
  	s = splhigh();
 -	err = extent_alloc(is->is_dvmamap, sgsize, align,
 -	    (sgsize > boundary) ? 0 : boundary,
 -	    EX_NOWAIT|EX_BOUNDZERO, &dvmaddr);
 +	err = extent_alloc_subregion1(is->is_dvmamap, sgstart, sgend,
 +	    sgsize, align, 0, (sgsize > boundary) ? 0 : boundary, 
 +	    EX_NOWAIT | EX_BOUNDZERO, (u_long *)&dvmaddr);
  	splx(s);

  #ifdef DEBUG
 -	if (err || (dvmaddr == (u_long)-1)) {
 +	if (err || (dvmaddr == (bus_addr_t)-1))	{ 
  		printf("iommu_dvmamap_load(): extent_alloc(%d, %x) failed!\n",
  		    (int)sgsize, flags);
  #ifdef DDB
 -		Debugger();
 +		if (iommudebug & IDB_BREAK)
 +			Debugger();
  #endif
  	}
  #endif
 @@ -492,167 +753,115 @@ iommu_dvmamap_load(t, sb, map, buf, bufl
  	map->_dm_dvmastart = dvmaddr;
  	map->_dm_dvmasize = sgsize;

 -	/*
 -	 * Now split the DVMA range into segments, not crossing
 -	 * the boundary.
 -	 */
 -	seg = 0;
 -	sgstart = dvmaddr + (vaddr & PGOFSET);
 -	sgend = sgstart + buflen - 1;
 -	map->dm_segs[seg].ds_addr = sgstart;
 -	DPRINTF(IDB_INFO, ("iommu_dvmamap_load: boundary %lx boundary - 1 %lx "
 -	    "~(boundary - 1) %lx\n", (long)boundary, (long)(boundary - 1),
 -	    (long)~(boundary - 1)));
 -	while ((sgstart & ~(boundary - 1)) != (sgend & ~(boundary - 1))) {
 -		/* Oops.  We crossed a boundary.  Split the xfer. */
 -		len = boundary - (sgstart & (boundary - 1));
 -		map->dm_segs[seg].ds_len = len;
 -		DPRINTF(IDB_INFO, ("iommu_dvmamap_load: "
 -		    "seg %d start %lx size %lx\n", seg,
 -		    (long)map->dm_segs[seg].ds_addr,
 -		    (long)map->dm_segs[seg].ds_len));
 -		if (++seg >= map->_dm_segcnt) {
 -			/* Too many segments.  Fail the operation. */
 -			DPRINTF(IDB_INFO, ("iommu_dvmamap_load: "
 -			    "too many segments %d\n", seg));
 -			s = splhigh();
 -			/* How can this fail?  And if it does what can we do? */
 -			err = extent_free(is->is_dvmamap,
 -			    dvmaddr, sgsize, EX_NOWAIT);
 -			map->_dm_dvmastart = 0;
 -			map->_dm_dvmasize = 0;
 -			splx(s);
 -			return (E2BIG);
 -		}
 -		sgstart += len;
 -		map->dm_segs[seg].ds_addr = sgstart;
 -	}
 -	map->dm_segs[seg].ds_len = sgend - sgstart + 1;
 -	DPRINTF(IDB_INFO, ("iommu_dvmamap_load: "
 -	    "seg %d start %lx size %lx\n", seg,
 -	    (long)map->dm_segs[seg].ds_addr, (long)map->dm_segs[seg].ds_len));
 -	map->dm_nsegs = seg + 1;
  	map->dm_mapsize = buflen;

 -	if (p != NULL)
 -		pmap = p->p_vmspace->vm_map.pmap;
 -	else
 -		pmap = pmap_kernel();
 +#ifdef DEBUG
 +	iommu_dvmamap_validate_map(t, is, map);
 +#endif

 -	for (; buflen > 0; ) {
 +	if (iommu_iomap_load_map(is, ims, dvmaddr, flags))
 +		return (E2BIG);
 + 
 +	{ /* Scope */
 +		bus_addr_t a, aend;
 +		bus_addr_t addr = (vaddr_t)buf;
 +		int seg_len = buflen;
 +
 +		aend = round_page(addr + seg_len - 1);
 +		for (a = trunc_page(addr); a < aend; a += PAGE_SIZE) {
 +			bus_addr_t pgstart;
 +			bus_addr_t pgend;
 +			paddr_t pa;
 +			int pglen;
 +
 +			/* Yuck... Redoing the same pmap_extract... */
 +			if (pmap_extract(pmap, a, &pa) == FALSE) {
 +				printf("iomap pmap error addr 0x%lx\n", a);
 +				iommu_iomap_clear_pages(ims);
 +				return (E2BIG);
 +			}

 -		/*
 -		 * Get the physical address for this page.
 -		 */
 -		if (pmap_extract(pmap, (vaddr_t)vaddr, &curaddr) == FALSE) {
 -			bus_dmamap_unload(t, map);
 -			return (-1);
 +			pgstart = pa | (max(a, addr) & PAGE_MASK);
 +			pgend = pa | (min(a + PAGE_SIZE - 1,
 +			    addr + seg_len - 1) & PAGE_MASK);
 +			pglen = pgend - pgstart + 1;
 +
 +			if (pglen < 1)
 +				continue;
 +
 +			err = iommu_dvmamap_append_range(t, map, pgstart,
 +			    pglen, flags, boundary);
 +			if (err) {
 +				printf("iomap load seg page: %d for "
 +				    "va 0x%lx pa %lx (%lx - %lx) "
 +				    "for %d/0x%x\n",
 +				    err, a, pa, pgstart, pgend, pglen, pglen);
 +				return (err);
 +			}
  		}
 -
 -		/*
 -		 * Compute the segment size, and adjust counts.
 -		 */
 -		sgsize = PAGE_SIZE - ((u_long)vaddr & PGOFSET);
 -		if (buflen < sgsize)
 -			sgsize = buflen;
 -
 -		DPRINTF(IDB_BUSDMA,
 -		    ("iommu_dvmamap_load: map %p loading va %p "
 -		    "dva %lx at pa %lx\n",
 -		    map, (void *)vaddr, (long)dvmaddr,
 -		    (long)(curaddr & ~(PAGE_SIZE-1))));
 -		iommu_enter(sb, trunc_page(dvmaddr), trunc_page(curaddr),
 -		    flags|0x4000);
 -
 -		dvmaddr += PAGE_SIZE;
 -		vaddr += sgsize;
 -		buflen -= sgsize;
  	}
 +
  #ifdef DIAGNOSTIC
 -	for (seg = 0; seg < map->dm_nsegs; seg++) {
 -		if (map->dm_segs[seg].ds_addr < is->is_dvmabase ||
 -			map->dm_segs[seg].ds_addr > is->is_dvmaend) {
 -			printf("seg %d dvmaddr %lx out of range %x - %x\n",
 -			    seg, (long)map->dm_segs[seg].ds_addr,
 -			    is->is_dvmabase, is->is_dvmaend);
 -#ifdef DDB
 -			Debugger();
 -#endif
 -		}
 -	}
 +	iommu_dvmamap_validate_map(t, is, map);
  #endif
 -	return (0);
 -}

 -
 -void
 -iommu_dvmamap_unload(t, sb, map)
 -	bus_dma_tag_t t;
 -	struct strbuf_ctl *sb;
 -	bus_dmamap_t map;
 -{
 -	struct iommu_state *is = sb->sb_is;
 -	int error, s;
 -	bus_size_t sgsize = map->_dm_dvmasize;
 -
 -	/* Flush the iommu */
  #ifdef DEBUG
 -	if (!map->_dm_dvmastart) {
 -		printf("iommu_dvmamap_unload: No dvmastart is zero\n");
 +	if (err)
 +		printf("**** iommu_dvmamap_load failed with error %d\n",
 +		    err);
 +	
 +	if (err || (iommudebug & IDB_PRINT_MAP)) {
 +		iommu_dvmamap_print_map(t, is, map);
  #ifdef DDB
 -		Debugger();
 +		if (iommudebug & IDB_BREAK)
 +			Debugger();
  #endif
  	}
  #endif
 -	iommu_remove(is, map->_dm_dvmastart, map->_dm_dvmasize);
 -
 -	/* Flush the caches */
 -	bus_dmamap_unload(t->_parent, map);
 -
 -	/* Mark the mappings as invalid. */
 -	map->dm_mapsize = 0;
 -	map->dm_nsegs = 0;
 -
 -	s = splhigh();
 -	error = extent_free(is->is_dvmamap, map->_dm_dvmastart,
 -		map->_dm_dvmasize, EX_NOWAIT);
 -	map->_dm_dvmastart = 0;
 -	map->_dm_dvmasize = 0;
 -	splx(s);
 -	if (error != 0)
 -		printf("warning: %qd of DVMA space lost\n", (long long)sgsize);

 -	/* Clear the map */
 +	return (err);
  }


 +/*
 + * Load a dvmamap from an array of segs or an mlist (if the first
 + * "segs" entry's mlist is non-null).  It calls iommu_dvmamap_load_segs()
 + * or iommu_dvmamap_load_mlist() for part of the 2nd pass through the
 + * mapping.  This is ugly.  A better solution would probably be to have
 + * function pointers for implementing the traversal.  That way, there
 + * could be one core load routine for each of the three required algorithms
 + * (buffer, seg, and mlist).  That would also mean that the traversal
 + * algorithm would then only need one implementation for each algorithm
 + * instead of two (one for populating the iomap and one for populating
 + * the dvma map).
 + */
  int
 -iommu_dvmamap_load_raw(t, sb, map, segs, nsegs, flags, size)
 +iommu_dvmamap_load_raw(t, is, map, segs, nsegs, flags, size)
  	bus_dma_tag_t t;
 -	struct strbuf_ctl *sb;
 +	struct iommu_state *is;
  	bus_dmamap_t map;
  	bus_dma_segment_t *segs;
  	int nsegs;
  	int flags;
  	bus_size_t size;
  {
 -	struct iommu_state *is = sb->sb_is;
 -	struct vm_page *pg;
 -	int i, j, s;
 +	int i, s;
  	int left;
 -	int err;
 +	int err = 0;
  	bus_size_t sgsize;
 -	paddr_t pa;
  	bus_size_t boundary, align;
  	u_long dvmaddr, sgstart, sgend;
 -	struct pglist *pglist;
 -	int pagesz = PAGE_SIZE;
 -	int npg = 0; /* DEBUG */
 +	struct iommu_map_state *ims = map->_dm_cookie;
 +
 +#ifdef DIAGNOSTIC
 +	if (ims == NULL)
 +		panic("iommu_dvmamap_load_raw: null map state");
 +#endif

  	if (map->dm_nsegs) {
  		/* Already in use?? */
  #ifdef DIAGNOSTIC
 -		printf("iommu_dvmamap_load_raw: map still in use\n");
 +		panic("iommu_dvmamap_load_raw: map still in use");
  #endif
  		bus_dmamap_unload(t, map);
  	}
 @@ -664,46 +873,86 @@ iommu_dvmamap_load_raw(t, sb, map, segs,
  	if ((boundary = segs[0]._ds_boundary) == 0)
  		boundary = map->_dm_boundary;

 -	align = max(segs[0]._ds_align, pagesz);
 +	align = max(segs[0]._ds_align, PAGE_SIZE);

  	/*
  	 * Make sure that on error condition we return "no valid mappings".
  	 */
  	map->dm_nsegs = 0;
 -	/* Count up the total number of pages we need */
 -	pa = segs[0].ds_addr;
 -	sgsize = 0;
 -	left = size;
 -	for (i = 0; left && i < nsegs; i++) {
 -		if (round_page(pa) != round_page(segs[i].ds_addr))
 -			sgsize = round_page(sgsize);
 -		sgsize += min(left, segs[i].ds_len);
 -		left -= segs[i].ds_len;
 -		pa = segs[i].ds_addr + segs[i].ds_len;
 +
 +	iommu_iomap_clear_pages(ims);
 +	if (segs[0]._ds_mlist) {
 +		struct pglist *mlist = segs[0]._ds_mlist;
 +		struct vm_page *m;
 +		for (m = TAILQ_FIRST(mlist); m != NULL;
 +		    m = TAILQ_NEXT(m,pageq)) {
 +			err = iommu_iomap_insert_page(ims, VM_PAGE_TO_PHYS(m));
 +
 +			if (err) {
 +				printf("iomap insert error: %d for "
 +				    "pa 0x%lx\n", err, VM_PAGE_TO_PHYS(m));
 +				iommu_iomap_clear_pages(ims);
 +				return (E2BIG);
 +			}
 +		}
 +	} else {
 +		/* Count up the total number of pages we need */
 +		for (i = 0, left = size; left > 0 && i < nsegs; i++) {
 +			bus_addr_t a, aend;
 +			bus_size_t len = segs[i].ds_len;
 +			bus_addr_t addr = segs[i].ds_addr;
 +			int seg_len = min(left, len);
 +
 +			if (len < 1)
 +				continue;
 +
 +			aend = round_page(addr + seg_len - 1);
 +			for (a = trunc_page(addr); a < aend; a += PAGE_SIZE) {
 +
 +				err = iommu_iomap_insert_page(ims, a);
 +				if (err) {
 +					printf("iomap insert error: %d for "
 +					    "pa 0x%lx\n", err, a);
 +					iommu_iomap_clear_pages(ims);
 +					return (E2BIG);
 +				}
 +			}
 +
 +			left -= seg_len;
 +		}
 +	}
 +	sgsize = ims->ims_map.ipm_pagecnt * PAGE_SIZE;
 +
 +	if (flags & BUS_DMA_24BIT) {
 +		sgstart = max(is->is_dvmamap->ex_start, 0xff000000);
 +		sgend = min(is->is_dvmamap->ex_end, 0xffffffff);
 +	} else {
 +		sgstart = is->is_dvmamap->ex_start;
 +		sgend = is->is_dvmamap->ex_end;
  	}
 -	sgsize = round_page(sgsize);

 -	s = splhigh();
  	/*
  	 * If our segment size is larger than the boundary we need to
  	 * split the transfer up into little pieces ourselves.
  	 */
 -	err = extent_alloc(is->is_dvmamap, sgsize, align,
 -		(sgsize > boundary) ? 0 : boundary,
 -		((flags & BUS_DMA_NOWAIT) == 0 ? EX_WAITOK : EX_NOWAIT) |
 -		EX_BOUNDZERO, &dvmaddr);
 +	s = splhigh();
 + 	err = extent_alloc_subregion1(is->is_dvmamap, sgstart, sgend,
 +	    sgsize, align, 0, (sgsize > boundary) ? 0 : boundary, 
 +	    EX_NOWAIT | EX_BOUNDZERO, (u_long *)&dvmaddr);
  	splx(s);

  	if (err != 0)
  		return (err);

  #ifdef DEBUG
 -	if (dvmaddr == (u_long)-1)
 -	{
 -		printf("iommu_dvmamap_load_raw(): extent_alloc(%d, %x) failed!\n",
 -		    (int)sgsize, flags);
 +	if (dvmaddr == (bus_addr_t)-1)	{ 
 +		printf("iommu_dvmamap_load_raw(): extent_alloc(%d, %x) "
 +		    "failed!\n", (int)sgsize, flags);
  #ifdef DDB
 -		Debugger();
 +		if (iommudebug & IDB_BREAK)
 +			Debugger();
 +#else
 +		panic("");
  #endif
  	}
  #endif
 @@ -714,200 +963,503 @@ iommu_dvmamap_load_raw(t, sb, map, segs,
  	map->_dm_dvmastart = dvmaddr;
  	map->_dm_dvmasize = sgsize;

 -	if ((pglist = segs[0]._ds_mlist) == NULL) {
 -		u_long prev_va = 0UL;
 -		paddr_t prev_pa = 0;
 -		int end = 0, offset;
 +	map->dm_mapsize = size;

 -		/*
 -		 * This segs is made up of individual physical
 -		 *  segments, probably by _bus_dmamap_load_uio() or
 -		 * _bus_dmamap_load_mbuf().  Ignore the mlist and
 -		 * load each one individually.
 -		 */
 -		map->dm_mapsize = size;
 +#ifdef DEBUG
 +	iommu_dvmamap_validate_map(t, is, map);
 +#endif

 -		j = 0;
 -		for (i = 0; i < nsegs ; i++) {
 +	if (iommu_iomap_load_map(is, ims, dvmaddr, flags))
 +		return (E2BIG);

 -			pa = segs[i].ds_addr;
 -			offset = (pa & PGOFSET);
 -			pa = trunc_page(pa);
 -			dvmaddr = trunc_page(dvmaddr);
 -			left = min(size, segs[i].ds_len);
 -
 -			DPRINTF(IDB_INFO, ("iommu_dvmamap_load_raw: converting "
 -				"physseg %d start %lx size %lx\n", i,
 -				(long)segs[i].ds_addr, (long)segs[i].ds_len));
 -
 -			if ((pa == prev_pa) &&
 -				((offset != 0) || (end != offset))) {
 -				/* We can re-use this mapping */
 -				dvmaddr = prev_va;
 -			}
 +	if (segs[0]._ds_mlist)
 +		err = iommu_dvmamap_load_mlist(t, is, map, segs[0]._ds_mlist,
 +		    flags, size, boundary);
 +	else
 +		err = iommu_dvmamap_load_seg(t, is, map, segs, nsegs,
 +		    flags, size, boundary);

 -			sgstart = dvmaddr + offset;
 -			sgend = sgstart + left - 1;
 +	if (err)
 +		iommu_iomap_unload_map(is, ims);

 -			/* Are the segments virtually adjacent? */
 -			if ((j > 0) && (end == offset) &&
 -				((offset == 0) || (pa == prev_pa))) {
 -				/* Just append to the previous segment. */
 -				map->dm_segs[--j].ds_len += left;
 -				DPRINTF(IDB_INFO, ("iommu_dvmamap_load_raw: "
 -					"appending seg %d start %lx size %lx\n", j,
 -					(long)map->dm_segs[j].ds_addr,
 -					(long)map->dm_segs[j].ds_len));
 -			} else {
 -				if (j >= map->_dm_segcnt) {
 -					iommu_dvmamap_unload(t, sb, map);
 -					return (E2BIG);
 -				}
 -				map->dm_segs[j].ds_addr = sgstart;
 -				map->dm_segs[j].ds_len = left;
 -				DPRINTF(IDB_INFO, ("iommu_dvmamap_load_raw: "
 -					"seg %d start %lx size %lx\n", j,
 -					(long)map->dm_segs[j].ds_addr,
 -					(long)map->dm_segs[j].ds_len));
 +#ifdef DIAGNOSTIC
 +	/* The map should be valid even if the load failed */
 +	if (iommu_dvmamap_validate_map(t, is, map)) {
 +		printf("load size %ld/0x%lx\n", size, size);
 +		if (segs[0]._ds_mlist)
 +			printf("mlist %p\n", segs[0]._ds_mlist);
 +		else  {
 +			long tot_len = 0;
 +			long clip_len = 0;
 +			printf("segs %p nsegs %d\n", segs, nsegs);
 +
 +			left = size;
 +			for (i = 0; i < nsegs; i++) {
 +				bus_size_t len = segs[i].ds_len;
 +				bus_addr_t addr = segs[i].ds_addr;
 +				int seg_len = min(left, len);
 +
 +				printf("addr %lx len %ld/0x%lx seg_len "
 +				    "%d/0x%x left %d/0x%x\n", addr, len, len,
 +				    seg_len, seg_len, left, left);
 +
 +				left -= seg_len;
 +				
 +				clip_len += seg_len;
 +				tot_len += segs[i].ds_len;
  			}
 -			end = (offset + left) & PGOFSET;
 +			printf("total length %ld/0x%lx total seg. "
 +			    "length %ld/0x%lx\n", tot_len, tot_len, clip_len,
 +			    clip_len);
 +		}

 -			/* Check for boundary issues */
 -			while ((sgstart & ~(boundary - 1)) !=
 -				(sgend & ~(boundary - 1))) {
 -				/* Need a new segment. */
 -				map->dm_segs[j].ds_len =
 -					boundary - (sgstart & (boundary - 1));
 -				DPRINTF(IDB_INFO, ("iommu_dvmamap_load_raw: "
 -					"seg %d start %lx size %lx\n", j,
 -					(long)map->dm_segs[j].ds_addr,
 -					(long)map->dm_segs[j].ds_len));
 -				if (++j >= map->_dm_segcnt) {
 -					iommu_dvmamap_unload(t, sb, map);
 -					return (E2BIG);
 -				}
 -				sgstart = roundup(sgstart, boundary);
 -				map->dm_segs[j].ds_addr = sgstart;
 -				map->dm_segs[j].ds_len = sgend - sgstart + 1;
 -			}
 +		if (err == 0)
 +			err = 1;
 +	}

 -			if (sgsize == 0)
 -				panic("iommu_dmamap_load_raw: size botch");
 +#endif

 -			/* Now map a series of pages. */
 -			while (dvmaddr <= sgend) {
 -				DPRINTF(IDB_BUSDMA,
 -					("iommu_dvmamap_load_raw: map %p "
 -						"loading va %lx at pa %lx\n",
 -						map, (long)dvmaddr,
 -						(long)(pa)));
 -				/* Enter it if we haven't before. */
 -				if (prev_va != dvmaddr)
 -					iommu_enter(sb, prev_va = dvmaddr,
 -						prev_pa = pa,
 -						flags | (++npg << 12));
 -				dvmaddr += pagesz;
 -				pa += pagesz;
 -			}
 +#ifdef DEBUG
 +	if (err)
 +		printf("**** iommu_dvmamap_load_raw failed with error %d\n",
 +		    err);
 +	
 +	if (err || (iommudebug & IDB_PRINT_MAP)) {
 +		iommu_dvmamap_print_map(t, is, map);
 +#ifdef DDB
 +		if (iommudebug & IDB_BREAK)
 +			Debugger();
 +#endif
 +	}
 +#endif

 -			size -= left;
 -			++j;
 -		}
 +	return (err);
 +}
 +
 +/*
 + * Insert a range of addresses into a loaded map respecting the specified
 + * boundary and alignment restrictions.  The range is specified by its 
 + * physical address and length.  The range cannot cross a page boundary.
 + * This code (along with most of the rest of the functions in this file)
 + * assumes that the IOMMU page size is equal to PAGE_SIZE.
 + */
 +int
 +iommu_dvmamap_append_range(bus_dma_tag_t t, bus_dmamap_t map, paddr_t pa,
 +    bus_size_t length, int flags, bus_size_t boundary)
 +{
 +	struct iommu_map_state *ims = map->_dm_cookie;
 +	bus_addr_t sgstart, sgend, bd_mask;
 +	bus_dma_segment_t *seg = NULL;
 +	int i = map->dm_nsegs;
 +
 +#ifdef DEBUG
 +	if (ims == NULL)
 +		panic("iommu_dvmamap_append_range: null map state");
 +#endif
 +
 +	sgstart = iommu_iomap_translate(ims, pa);
 +	sgend = sgstart + length - 1;

 -		map->dm_nsegs = j;
  #ifdef DIAGNOSTIC
 -		{ int seg;
 -	for (seg = 0; seg < map->dm_nsegs; seg++) {
 -		if (map->dm_segs[seg].ds_addr < is->is_dvmabase ||
 -			map->dm_segs[seg].ds_addr > is->is_dvmaend) {
 -			printf("seg %d dvmaddr %lx out of range %x - %x\n",
 -				seg, (long)map->dm_segs[seg].ds_addr,
 -				is->is_dvmabase, is->is_dvmaend);
 -#ifdef DDB
 -			Debugger();
 +	if (sgstart == 0 || sgstart > sgend) {
 +		printf("append range invalid mapping for %lx "
 +		    "(0x%lx - 0x%lx)\n", pa, sgstart, sgend);
 +		map->dm_nsegs = 0;
 +		return (EINVAL);
 +	}
  #endif
 +
 +#ifdef DEBUG
 +	if (trunc_page(sgstart) != trunc_page(sgend)) {
 +		printf("append range crossing page boundary! "
 +		    "pa %lx length %ld/0x%lx sgstart %lx sgend %lx\n",
 +		    pa, length, length, sgstart, sgend);
 +	}
 +#endif
 +
 +	/*
 +	 * We will attempt to merge this range with the previous entry
 +	 * (if there is one).
 +	 */
 +	if (i > 0) {
 +		seg = &map->dm_segs[i - 1];
 +		if (sgstart == seg->ds_addr + seg->ds_len) {
 +			length += seg->ds_len;
 +			sgstart = seg->ds_addr;
 +			sgend = sgstart + length - 1;
 +		} else
 +			seg = NULL;
 +	}
 +
 +	if (seg == NULL) {
 +		seg = &map->dm_segs[i];
 +		if (++i > map->_dm_segcnt) {
 +			printf("append range, out of segments (%d)\n", i);
 +			iommu_dvmamap_print_map(t, NULL, map);
 +			map->dm_nsegs = 0;
 +			return (ENOMEM);
  		}
  	}
 +
 +	/*
 +	 * At this point, "i" is the index of the *next* bus_dma_segment_t
 +	 * (the segment count, aka map->dm_nsegs) and "seg" points to the
 +	 * *current* entry.  "length", "sgstart", and "sgend" reflect what
 +	 * we intend to put in "*seg".  No assumptions should be made about
 +	 * the contents of "*seg".  Only "boundary" issue can change this
 +	 * and "boundary" is often zero, so explicitly test for that case
 +	 * (the test is strictly an optimization).
 +	 */ 
 +	if (boundary != 0) {
 +		bd_mask = ~(boundary - 1);
 +
 +		while ((sgstart & bd_mask) != (sgend & bd_mask)) {
 +			/*
 +			 * We are crossing a boundary so fill in the current
 +			 * segment with as much as possible, then grab a new
 +			 * one.
 +			 */
 +
 +			seg->ds_addr = sgstart;
 +			seg->ds_len = boundary - (sgstart & bd_mask);
 +
 +			sgstart += seg->ds_len; /* sgend stays the same */
 +			length -= seg->ds_len;
 +
 +			seg = &map->dm_segs[i];
 +			if (++i > map->_dm_segcnt) {
 +				printf("append range, out of segments\n");
 +				iommu_dvmamap_print_map(t, NULL, map);
 +				map->dm_nsegs = 0;
 +				return (E2BIG);
 +			}
  		}
 -#endif
 -		return (0);
  	}

 +	seg->ds_addr = sgstart;
 +	seg->ds_len = length;
 +	map->dm_nsegs = i;
 +
 +	return (0);
 +}
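
 A worked example of the boundary clipping above, as a standalone sketch
 (the numbers are arbitrary): with boundary = 0x1000, a 0x100-byte range
 at 0x0f80 crosses one boundary and is split into two 0x80-byte segments.

 	#include <stdio.h>

 	int
 	main(void)
 	{
 		unsigned long boundary = 0x1000, bd_mask = ~(boundary - 1);
 		unsigned long sgstart = 0x0f80, length = 0x100;
 		unsigned long sgend = sgstart + length - 1;

 		while ((sgstart & bd_mask) != (sgend & bd_mask)) {
 			unsigned long len = boundary - (sgstart & (boundary - 1));
 			printf("seg at %#lx len %#lx\n", sgstart, len);
 			sgstart += len;		/* sgend stays the same */
 			length -= len;
 		}
 		printf("seg at %#lx len %#lx\n", sgstart, length);
 		return (0);
 	}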
 +
 +/*
 + * Populate the iomap from a bus_dma_segment_t array.  See note for
 + * iommu_dvmamap_load() regarding page entry exhaustion of the iomap.
 + * This is less of a problem for load_seg, as the number of pages
 + * is usually similar to the number of segments (nsegs).
 + */
 +int
 +iommu_dvmamap_load_seg(bus_dma_tag_t t, struct iommu_state *is,
 +    bus_dmamap_t map, bus_dma_segment_t *segs, int nsegs, int flags,
 +    bus_size_t size, bus_size_t boundary)
 +{
 +	int i;
 +	int left;
 +	int seg;
 +
  	/*
 -	 * This was allocated with bus_dmamem_alloc.
 -	 * The pages are on a `pglist'.
 +	 * These segs are made up of individual physical
 +	 * segments, probably by _bus_dmamap_load_uio() or
 +	 * _bus_dmamap_load_mbuf().  Ignore the mlist and
 +	 * load each one individually.
  	 */
 -	map->dm_mapsize = size;
 -	i = 0;
 -	sgstart = dvmaddr;
 -	sgend = sgstart + size - 1;
 -	map->dm_segs[i].ds_addr = sgstart;
 -	while ((sgstart & ~(boundary - 1)) != (sgend & ~(boundary - 1))) {
 -		/* Oops.  We crossed a boundary.  Split the xfer. */
 -		map->dm_segs[i].ds_len = boundary - (sgstart & (boundary - 1));
 -		DPRINTF(IDB_INFO, ("iommu_dvmamap_load_raw: "
 -			"seg %d start %lx size %lx\n", i,
 -			(long)map->dm_segs[i].ds_addr,
 -			(long)map->dm_segs[i].ds_len));
 -		if (++i >= map->_dm_segcnt) {
 -			/* Too many segments.  Fail the operation. */
 -			s = splhigh();
 -			/* How can this fail?  And if it does what can we do? */
 -			err = extent_free(is->is_dvmamap,
 -				dvmaddr, sgsize, EX_NOWAIT);
 -			map->_dm_dvmastart = 0;
 -			map->_dm_dvmasize = 0;
 -			splx(s);
 -			return (E2BIG);
 +
 +	/*
 +	 * Keep in mind that each segment could span
 +	 * multiple pages and that these are not always
 +	 * adjacent. The code is no longer adding dvma
 +	 * aliases to the IOMMU.  The STC will not cross
 +	 * page boundaries anyway and an IOMMU table walk
 +	 * vs. what may be a streamed PCI DMA to a ring
 +	 * descriptor is probably a wash.  It eases TLB
 +	 * pressure and in the worst possible case, it is
 +	 * only as bad as a non-IOMMUed architecture.  More
 +	 * importantly, the code is not quite as hairy.
 +	 * (It's bad enough as it is.)
 +	 */
 +	left = size;
 +	seg = 0;
 +	for (i = 0; left > 0 && i < nsegs; i++) {
 +		bus_addr_t a, aend;
 +		bus_size_t len = segs[i].ds_len;
 +		bus_addr_t addr = segs[i].ds_addr;
 +		int seg_len = min(left, len);
 +
 +		if (len < 1)
 +			continue;
 +
 +		aend = addr + seg_len - 1;
 +		for (a = trunc_page(addr); a < round_page(aend);
 +		    a += PAGE_SIZE) {
 +			bus_addr_t pgstart;
 +			bus_addr_t pgend;
 +			int pglen;
 +			int err;
 +
 +			pgstart = max(a, addr);
 +			pgend = min(a + PAGE_SIZE - 1, addr + seg_len - 1);
 +			pglen = pgend - pgstart + 1;
 +			
 +			if (pglen < 1)
 +				continue;
 +
 +			err = iommu_dvmamap_append_range(t, map, pgstart,
 +			    pglen, flags, boundary);
 +			if (err) {
 +				printf("iomap load seg page: %d for "
 +				    "pa 0x%lx (%lx - %lx for %d/%x\n",
 +				    err, a, pgstart, pgend, pglen, pglen);
 +				return (err);
 +			}
 +
  		}
 -		sgstart = roundup(sgstart, boundary);
 -		map->dm_segs[i].ds_addr = sgstart;
 +
 +		left -= seg_len;
  	}
 -	DPRINTF(IDB_INFO, ("iommu_dvmamap_load_raw: "
 -			"seg %d start %lx size %lx\n", i,
 -			(long)map->dm_segs[i].ds_addr, (long)map->dm_segs[i].ds_len));
 -	map->dm_segs[i].ds_len = sgend - sgstart + 1;
 +	return (0);
 +}

 -	TAILQ_FOREACH(pg, pglist, pageq) {
 -		if (sgsize == 0)
 -			panic("iommu_dmamap_load_raw: size botch");
 -		pa = VM_PAGE_TO_PHYS(pg);
 +/*
 + * Populate the iomap from an mlist.  See note for iommu_dvmamap_load()
 + * regarding page entry exhaustion of the iomap.
 + */
 +int
 +iommu_dvmamap_load_mlist(bus_dma_tag_t t, struct iommu_state *is,
 +    bus_dmamap_t map, struct pglist *mlist, int flags,
 +    bus_size_t size, bus_size_t boundary)
 +{
 +	struct vm_page *m;
 +	paddr_t pa;
 +	int err;

 -		DPRINTF(IDB_BUSDMA,
 -		    ("iommu_dvmamap_load_raw: map %p loading va %lx at pa %lx\n",
 -		    map, (long)dvmaddr, (long)(pa)));
 -		iommu_enter(sb, dvmaddr, pa, flags|0x8000);
 +	/*
 +	 * This was allocated with bus_dmamem_alloc.
 +	 * The pages are on an `mlist'.
 +	 */
 +	for (m = TAILQ_FIRST(mlist); m != NULL; m = TAILQ_NEXT(m,pageq)) {
 +		pa = VM_PAGE_TO_PHYS(m);

 -		dvmaddr += pagesz;
 -		sgsize -= pagesz;
 +		err = iommu_dvmamap_append_range(t, map, pa, PAGE_SIZE,
 +		    flags, boundary);
 +		if (err) {
 +			printf("iomap load seg page: %d for pa 0x%lx "
 +			    "(%lx - %lx for %d/%x\n", err, pa, pa,
 +			    pa + PAGE_SIZE, PAGE_SIZE, PAGE_SIZE);
 +			return (err);
 +		}
  	}
 -	map->dm_mapsize = size;
 -	map->dm_nsegs = i+1;
 -#ifdef DIAGNOSTIC
 -	{ int seg;
 -	for (seg = 0; seg < map->dm_nsegs; seg++) {
 -		if (map->dm_segs[seg].ds_addr < is->is_dvmabase ||
 -			map->dm_segs[seg].ds_addr > is->is_dvmaend) {
 -			printf("seg %d dvmaddr %lx out of range %x - %x\n",
 -				seg, (long)map->dm_segs[seg].ds_addr,
 -				is->is_dvmabase, is->is_dvmaend);
 +
 +	return (0);
 +}
 +
 +/*
 + * Unload a dvmamap.
 + */
 +void
 +iommu_dvmamap_unload(bus_dma_tag_t t, struct iommu_state *is, bus_dmamap_t map)
 +{
 +	struct iommu_map_state *ims = map->_dm_cookie;
 +	bus_addr_t dvmaddr = map->_dm_dvmastart;
 +	bus_size_t sgsize = map->_dm_dvmasize;
 +	int error, s;
 +
 +	/* Flush the iommu */
 +#ifdef DEBUG
 +	if (dvmaddr == 0) {
 +		printf("iommu_dvmamap_unload: No dvmastart\n");
  #ifdef DDB
 +		if (iommudebug & IDB_BREAK)
  			Debugger();
  #endif
 -		}
 +		return;
 +	}
 +	iommu_dvmamap_validate_map(t, is, map);
 +
 +	if (iommudebug & IDB_PRINT_MAP)
 +		iommu_dvmamap_print_map(t, is, map);
 +#endif /* DEBUG */
 +
 +	/* Remove the IOMMU entries */
 +	iommu_iomap_unload_map(is, ims);
 +
 +	/* Clear the iomap */
 +	iommu_iomap_clear_pages(ims);
 +
 +	bus_dmamap_unload(t->_parent, map);
 +
 +	/* Mark the mappings as invalid. */
 +	map->dm_mapsize = 0;
 +	map->dm_nsegs = 0;
 +
 +	s = splhigh();
 +	error = extent_free(is->is_dvmamap, dvmaddr, 
 +		sgsize, EX_NOWAIT);
 +	map->_dm_dvmastart = 0;
 +	map->_dm_dvmasize = 0;
 +	splx(s);
 +	if (error != 0)
 +		printf("warning: %ld of DVMA space lost\n", sgsize);
 +}
 +
 +/*
 + * Perform internal consistency checking on a dvmamap.
 + */
 +int
 +iommu_dvmamap_validate_map(bus_dma_tag_t t, struct iommu_state *is,
 +    bus_dmamap_t map)
 +{
 +	int err = 0;
 +	int seg;
 +
 +	if (trunc_page(map->_dm_dvmastart) != map->_dm_dvmastart) {
 +		printf("**** dvmastart address not page aligned: %lx",
 +			map->_dm_dvmastart);
 +		err = 1;
 +	}
 +	if (trunc_page(map->_dm_dvmasize) != map->_dm_dvmasize) {
 +		printf("**** dvmasize not a multiple of page size: %lx",
 +			map->_dm_dvmasize);
 +		err = 1;
 +	}
 +	if (map->_dm_dvmastart < is->is_dvmabase ||
 +	    round_page(map->_dm_dvmastart + map->_dm_dvmasize) >
 +	    is->is_dvmaend + 1) {
 +		printf("dvmaddr %lx len %lx out of range %x - %x\n",
 +			    map->_dm_dvmastart, map->_dm_dvmasize,
 +			    is->is_dvmabase, is->is_dvmaend);
 +		err = 1;
  	}
 +	for (seg = 0; seg < map->dm_nsegs; seg++) {
 +		if (map->dm_segs[seg].ds_addr == 0 ||
 +		    map->dm_segs[seg].ds_len == 0) {
 +			printf("seg %d null segment dvmaddr %lx len %lx for "
 +			    "range %lx len %lx\n",
 +			    seg,
 +			    map->dm_segs[seg].ds_addr,
 +			    map->dm_segs[seg].ds_len,
 +			    map->_dm_dvmastart, map->_dm_dvmasize);
 +			err = 1;
 +		} else if (map->dm_segs[seg].ds_addr < map->_dm_dvmastart ||
 +		    round_page(map->dm_segs[seg].ds_addr +
 +			map->dm_segs[seg].ds_len) >
 +		    map->_dm_dvmastart + map->_dm_dvmasize) {
 +			printf("seg %d dvmaddr %lx len %lx out of "
 +			    "range %lx len %lx\n",
 +			    seg,
 +			    map->dm_segs[seg].ds_addr,
 +			    map->dm_segs[seg].ds_len,
 +			    map->_dm_dvmastart, map->_dm_dvmasize);
 +			err = 1;
 +		}
  	}
 +
 +	if (err) {
 +		iommu_dvmamap_print_map(t, is, map);
 +#if defined(DDB) && defined(DEBUG)
 +		if (iommudebug & IDB_BREAK)
 +			Debugger();
  #endif
 -	return (0);
 +	}
 +
 +	return (err);
  }

 +void
 +iommu_dvmamap_print_map(bus_dma_tag_t t, struct iommu_state *is,
 +    bus_dmamap_t map)
 +{
 +	int seg, i;
 +	long full_len, source_len;
 +	struct mbuf *m;
 +
 +	printf("DVMA %x for %x, mapping %p: dvstart %lx dvsize %lx "
 +	    "size %ld/%lx maxsegsz %lx boundary %lx segcnt %d "
 +	    "flags %x type %d source %p "
 +	    "cookie %p mapsize %lx nsegs %d\n",
 +	    is ? is->is_dvmabase : 0, is ? is->is_dvmaend : 0, map,
 +	    map->_dm_dvmastart, map->_dm_dvmasize,
 +	    map->_dm_size, map->_dm_size, map->_dm_maxsegsz, map->_dm_boundary,
 +	    map->_dm_segcnt, map->_dm_flags, map->_dm_type,
 +	    map->_dm_source, map->_dm_cookie, map->dm_mapsize,
 +	    map->dm_nsegs);
 +
 +	full_len = 0;
 +	for (seg = 0; seg < map->dm_nsegs; seg++) {
 +		printf("seg %d dvmaddr %lx pa %lx len %lx (tte %lx)\n",
 +		    seg, map->dm_segs[seg].ds_addr,
 +		    is ? iommu_extract(is, map->dm_segs[seg].ds_addr) : 0,
 +		    map->dm_segs[seg].ds_len,
 +		    is ? iommu_lookup_tte(is, map->dm_segs[seg].ds_addr) : 0);
 +		full_len += map->dm_segs[seg].ds_len;
 +	}
 +	printf("total length = %ld/0x%lx\n", full_len, full_len);
 +
 +	if (map->_dm_source) switch (map->_dm_type) {
 +	case _DM_TYPE_MBUF:
 +		m = map->_dm_source;
 +		if (m->m_flags & M_PKTHDR)
 +			printf("source PKTHDR mbuf (%p) hdr len = %d/0x%x:\n",
 +			    m, m->m_pkthdr.len, m->m_pkthdr.len);
 +		else
 +			printf("source mbuf (%p):\n", m);
 +
 +		source_len = 0;
 +		for ( ; m; m = m->m_next) {
 +			vaddr_t vaddr = mtod(m, vaddr_t);
 +			long len = m->m_len;
 +			paddr_t pa;
 +
 +			if (pmap_extract(pmap_kernel(), vaddr, &pa))
 +				printf("kva %lx pa %lx len %ld/0x%lx\n",
 +				    vaddr, pa, len, len);
 +			else
 +				printf("kva %lx pa <invalid> len %ld/0x%lx\n",
 +				    vaddr, len, len);
 +
 +			source_len += len;
 +		}
 +
 +		if (full_len != source_len)
 +			printf("mbuf length %ld/0x%lx is %s than mapping "
 +			    "length %ld/0x%lx\n", source_len, source_len,
 +			    (source_len > full_len) ? "greater" : "less",
 +			    full_len, full_len);
 +		else
 +			printf("mbuf length %ld/0x%lx\n", source_len,
 +			    source_len);
 +		break;
 +	case _DM_TYPE_LOAD:
 +	case _DM_TYPE_SEGS:
 +	case _DM_TYPE_UIO:
 +	default:
 +		break;
 +	}
 +
 +	if (map->_dm_cookie) {
 +		struct iommu_map_state *ims = map->_dm_cookie;
 +		struct iommu_page_map *ipm = &ims->ims_map;
 +
 +		printf("page map (%p) of size %d with %d entries\n",
 +		    ipm, ipm->ipm_maxpage, ipm->ipm_pagecnt);
 +		for (i = 0; i < ipm->ipm_pagecnt; ++i) {
 +			struct iommu_page_entry *e = &ipm->ipm_map[i];
 +			printf("%d: vmaddr 0x%lx pa 0x%lx\n", i,
 +			    e->ipe_va, e->ipe_pa);
 +		}
 +	} else
 +		printf("iommu map state (cookie) is NULL\n");
 +}

  /*
   * Flush an individual dma segment, returns non-zero if the streaming buffers
   * need flushing afterwards.
   */
 -static int
 -iommu_dvmamap_sync_range(struct strbuf_ctl *sb, vaddr_t va, bus_size_t len)
 +int
 +iommu_dvmamap_sync_range(sb, va, len)
 +	struct strbuf_ctl *sb;
 +	vaddr_t va;
 +	bus_size_t len;
  {
  	vaddr_t vaend;
  	struct iommu_state *is = sb->sb_is;
 @@ -924,8 +1476,8 @@ iommu_dvmamap_sync_range(struct strbuf_c
  		return (0);
  	}

 -	vaend = (va + len + PGOFSET) & ~PGOFSET;
 -	va &= ~PGOFSET;
 +	vaend = (va + len + PAGE_MASK) & ~PAGE_MASK;
 +	va &= ~PAGE_MASK;

  #ifdef DIAGNOSTIC
  	if (va < is->is_dvmabase || vaend > is->is_dvmaend)
 @@ -946,18 +1498,32 @@ iommu_dvmamap_sync_range(struct strbuf_c
  }

  void
 -iommu_dvmamap_sync(t, sb, map, offset, len, ops)
 +iommu_dvmamap_sync(t, is, map, offset, len, ops)
  	bus_dma_tag_t t;
 -	struct strbuf_ctl *sb;
 +	struct iommu_state *is;
  	bus_dmamap_t map;
  	bus_addr_t offset;
  	bus_size_t len;
  	int ops;
  {
 +	struct iommu_map_state *ims = map->_dm_cookie;
 +	struct strbuf_ctl *sb;
  	bus_size_t count;
  	int i, needsflush = 0;

 -	if (!sb->sb_flush)
 +#ifdef DIAGNOSTIC
 +	if (ims == NULL)
 +		panic("iommu_dvmamap_sync: null map state");
 +#endif
 +	sb = ims->ims_sb;
 +
 +	if ((ims->ims_flags & IOMMU_MAP_STREAM) == 0 || (len == 0))
 +		return;
 +
 +	if (ops & (BUS_DMASYNC_PREREAD | BUS_DMASYNC_POSTWRITE))
 +		return;
 +
 +	if ((ops & (BUS_DMASYNC_POSTREAD | BUS_DMASYNC_PREWRITE)) == 0)
  		return;

  	for (i = 0; i < map->dm_nsegs; i++) {
 @@ -967,39 +1533,27 @@ iommu_dvmamap_sync(t, sb, map, offset, l
  	}

  	if (i == map->dm_nsegs)
 -		panic("iommu_dvmamap_sync: segment too short %llu", 
 -		    (unsigned long long)offset);
 +		panic("iommu_dvmamap_sync: too short %lu", offset);

 -	if (ops & (BUS_DMASYNC_PREREAD | BUS_DMASYNC_POSTWRITE)) {
 -		/* Nothing to do */;
 +	for (; len > 0 && i < map->dm_nsegs; i++) {
 +		count = min(map->dm_segs[i].ds_len - offset, len);
 +		if (count > 0 && iommu_dvmamap_sync_range(sb,
 +		    map->dm_segs[i].ds_addr + offset, count))
 +			needsflush = 1;
 +		len -= count;
  	}

 -	if (ops & (BUS_DMASYNC_POSTREAD | BUS_DMASYNC_PREWRITE)) {
 -
 -		for (; len > 0 && i < map->dm_nsegs; i++) {
 -			count = MIN(map->dm_segs[i].ds_len - offset, len);
 -			if (count > 0 && 
 -			    iommu_dvmamap_sync_range(sb,
 -				map->dm_segs[i].ds_addr + offset, count))
 -				needsflush = 1;
 -			offset = 0;
 -			len -= count;
 -		}
 -#ifdef DIAGNOSTIC
 -		if (i == map->dm_nsegs && len > 0)
 -			panic("iommu_dvmamap_sync: leftover %llu",
 -			    (unsigned long long)len);
 -#endif
 +	if (i == map->dm_nsegs && len > 0)
 +		panic("iommu_dvmamap_sync: leftover %lu", len);

 -		if (needsflush)
 -			iommu_strbuf_flush_done(sb);
 -	}
 +	if (needsflush)
 +		iommu_strbuf_flush_done(ims);
  }

  int
 -iommu_dvmamem_alloc(t, sb, size, alignment, boundary, segs, nsegs, rsegs, flags)
 +iommu_dvmamem_alloc(t, is, size, alignment, boundary, segs, nsegs, rsegs, flags)
  	bus_dma_tag_t t;
 -	struct strbuf_ctl *sb;
 +	struct iommu_state *is;
  	bus_size_t size, alignment, boundary;
  	bus_dma_segment_t *segs;
  	int nsegs;
 @@ -1012,13 +1566,13 @@ iommu_dvmamem_alloc(t, sb, size, alignme
  	   (unsigned long long)alignment, (unsigned long long)boundary,
  	   segs, flags));
  	return (bus_dmamem_alloc(t->_parent, size, alignment, boundary,
 -	    segs, nsegs, rsegs, flags|BUS_DMA_DVMA));
 +	    segs, nsegs, rsegs, flags | BUS_DMA_DVMA));
  }

  void
 -iommu_dvmamem_free(t, sb, segs, nsegs)
 +iommu_dvmamem_free(t, is, segs, nsegs)
  	bus_dma_tag_t t;
 -	struct strbuf_ctl *sb;
 +	struct iommu_state *is;
  	bus_dma_segment_t *segs;
  	int nsegs;
  {
 @@ -1033,9 +1587,9 @@ iommu_dvmamem_free(t, sb, segs, nsegs)
   * Check the flags to see whether we're streaming or coherent.
   */
  int
 -iommu_dvmamem_map(t, sb, segs, nsegs, size, kvap, flags)
 +iommu_dvmamem_map(t, is, segs, nsegs, size, kvap, flags)
  	bus_dma_tag_t t;
 -	struct strbuf_ctl *sb;
 +	struct iommu_state *is;
  	bus_dma_segment_t *segs;
  	int nsegs;
  	size_t size;
 @@ -1046,7 +1600,7 @@ iommu_dvmamem_map(t, sb, segs, nsegs, si
  	vaddr_t va;
  	bus_addr_t addr;
  	struct pglist *pglist;
 -	int cbit;
 +	bus_addr_t cbit = 0;

  	DPRINTF(IDB_BUSDMA, ("iommu_dvmamem_map: segp %p nsegs %d size %lx\n",
  	    segs, nsegs, size));
 @@ -1065,9 +1619,10 @@ iommu_dvmamem_map(t, sb, segs, nsegs, si
  	/*
  	 * digest flags:
  	 */
 -	cbit = 0;
 +#if 0
  	if (flags & BUS_DMA_COHERENT)	/* Disable vcache */
  		cbit |= PMAP_NVC;
 +#endif
  	if (flags & BUS_DMA_NOCACHE)	/* sideffects */
  		cbit |= PMAP_NC;

 @@ -1082,7 +1637,8 @@ iommu_dvmamem_map(t, sb, segs, nsegs, si
  #endif
  		addr = VM_PAGE_TO_PHYS(pg);
  		DPRINTF(IDB_BUSDMA, ("iommu_dvmamem_map: "
 -		    "mapping va %lx at %llx\n", va, (unsigned long long)addr | cbit));
 +		    "mapping va %lx at %llx\n", va,
 +		    (unsigned long long)addr | cbit));
  		pmap_kenter_pa(va, addr | cbit, VM_PROT_READ | VM_PROT_WRITE);
  		va += PAGE_SIZE;
  		size -= PAGE_SIZE;
 @@ -1095,9 +1651,9 @@ iommu_dvmamem_map(t, sb, segs, nsegs, si
   * Unmap DVMA mappings from kernel
   */
  void
 -iommu_dvmamem_unmap(t, sb, kva, size)
 +iommu_dvmamem_unmap(t, is, kva, size)
  	bus_dma_tag_t t;
 -	struct strbuf_ctl *sb;
 +	struct iommu_state *is;
  	caddr_t kva;
  	size_t size;
  {
 @@ -1106,7 +1662,7 @@ iommu_dvmamem_unmap(t, sb, kva, size)
  	    kva, size));

  #ifdef DIAGNOSTIC
 -	if ((u_long)kva & PGOFSET)
 +	if ((u_long)kva & PAGE_MASK)
  		panic("iommu_dvmamem_unmap");
  #endif

 @@ -1115,3 +1671,181 @@ iommu_dvmamem_unmap(t, sb, kva, size)
  	pmap_update(pmap_kernel());
  	uvm_km_free(kernel_map, (vaddr_t)kva, size);
  }
 +
 +/*
 + * Create a new iomap.
 + */
 +struct iommu_map_state *
 +iommu_iomap_create(int n)
 +{
 +	struct iommu_map_state *ims;
 +	struct strbuf_flush *sbf;
 +	vaddr_t va;
 +
 +	if (n < 64)
 +		n = 64;
 +
 +	ims = malloc(sizeof(*ims) + (n - 1) * sizeof(ims->ims_map.ipm_map[0]),
 +		M_DEVBUF, M_NOWAIT);
 +	if (ims == NULL)
 +		return (NULL);
 +
 +	memset(ims, 0, sizeof *ims);
 +
 +	/* Initialize the map. */
 +	ims->ims_map.ipm_maxpage = n;
 +	SPLAY_INIT(&ims->ims_map.ipm_tree);
 +
 +	/* Initialize the flush area. */
 +	sbf = &ims->ims_flush;
 +	va = (vaddr_t)&sbf->sbf_area[0x40];
 +	va &= ~0x3f;
 +	pmap_extract(pmap_kernel(), va, &sbf->sbf_flushpa);
 +	sbf->sbf_flush = (void *)va;
 +
 +	return (ims);
 +}
 +
 +/*
 + * Destroy an iomap.
 + */
 +void
 +iommu_iomap_destroy(struct iommu_map_state *ims)
 +{
 +
 +#ifdef DIAGNOSTIC
 +	if (ims->ims_map.ipm_pagecnt > 0)
 +		printf("iommu_iomap_destroy: %d page entries in use\n",
 +		    ims->ims_map.ipm_pagecnt);
 +#endif
 +	free(ims, M_DEVBUF);
 +}
 +
 +/*
 + * Utility function used by splay tree to order page entries by pa.
 + */
 +static inline int
 +iomap_compare(struct iommu_page_entry *a, struct iommu_page_entry *b)
 +{
 +
 +	return ((a->ipe_pa > b->ipe_pa) ? 1 :
 +		(a->ipe_pa < b->ipe_pa) ? -1 : 0);
 +}
 +
 +SPLAY_PROTOTYPE(iommu_page_tree, iommu_page_entry, ipe_node, iomap_compare);
 +
 +SPLAY_GENERATE(iommu_page_tree, iommu_page_entry, ipe_node, iomap_compare);
 +
 +/*
 + * Insert a pa entry in the iomap.
 + */
 +int
 +iommu_iomap_insert_page(struct iommu_map_state *ims, paddr_t pa)
 +{
 +	struct iommu_page_map *ipm = &ims->ims_map;
 +	struct iommu_page_entry *e;
 +
 +	if (ipm->ipm_pagecnt >= ipm->ipm_maxpage) {
 +		struct iommu_page_entry ipe;
 +
 +		ipe.ipe_pa = pa;
 +		if (SPLAY_FIND(iommu_page_tree, &ipm->ipm_tree, &ipe))
 +			return (0);
 +
 +		return (ENOMEM);
 +	}
 +
 +	e = &ipm->ipm_map[ipm->ipm_pagecnt];
 +
 +	e->ipe_pa = pa;
 +	e->ipe_va = 0;
 +
 +	e = SPLAY_INSERT(iommu_page_tree, &ipm->ipm_tree, e);
 +
 +	/* Duplicates are okay, but only count them once. */
 +	if (e)
 +		return (0);
 +
 +	++ipm->ipm_pagecnt;
 +
 +	return (0);
 +}
 +
 +/*
 + * Locate the iomap by filling in the pa->va mapping and inserting it
 + * into the IOMMU tables.
 + */
 +int
 +iommu_iomap_load_map(struct iommu_state *is, struct iommu_map_state *ims,
 +    vaddr_t vmaddr, int flags)
 +{
 +	struct iommu_page_map *ipm = &ims->ims_map;
 +	struct iommu_page_entry *e;
 +	struct strbuf_ctl *sb = ims->ims_sb;
 +	int i;
 +
 +	if (sb->sb_flush == NULL)
 +		flags &= ~BUS_DMA_STREAMING;
 +
 +	if (flags & BUS_DMA_STREAMING)
 +		ims->ims_flags |= IOMMU_MAP_STREAM;
 +	else
 +		ims->ims_flags &= ~IOMMU_MAP_STREAM;
 +
 +	for (i = 0, e = ipm->ipm_map; i < ipm->ipm_pagecnt; ++i, ++e) {
 +		e->ipe_va = vmaddr;
 +		iommu_enter(is, sb, e->ipe_va, e->ipe_pa, flags);
 +		vmaddr += PAGE_SIZE;
 +	}
 +
 +	return (0);
 +}
 +
 +/*
 + * Remove the iomap from the IOMMU.
 + */
 +int
 +iommu_iomap_unload_map(struct iommu_state *is, struct iommu_map_state *ims)
 +{
 +	struct iommu_page_map *ipm = &ims->ims_map;
 +	struct iommu_page_entry *e;
 +	struct strbuf_ctl *sb = ims->ims_sb;
 +	int i;
 +
 +	for (i = 0, e = ipm->ipm_map; i < ipm->ipm_pagecnt; ++i, ++e)
 +		iommu_remove(is, sb, e->ipe_va);
 +
 +	return (0);
 +}
 +
 +/*
 + * Translate a physical address (pa) into a DVMA address.
 + */
 +vaddr_t
 +iommu_iomap_translate(struct iommu_map_state *ims, paddr_t pa)
 +{
 +	struct iommu_page_map *ipm = &ims->ims_map;
 +	struct iommu_page_entry *e;
 +	struct iommu_page_entry pe;
 +	paddr_t offset = pa & PAGE_MASK;
 +
 +	pe.ipe_pa = trunc_page(pa);
 +
 +	e = SPLAY_FIND(iommu_page_tree, &ipm->ipm_tree, &pe);
 +
 +	if (e == NULL)
 +		return (0);
 +
 +	return (e->ipe_va | offset);
 +}
 +
 +/*
 + * Clear the iomap table and tree.
 + */
 +void
 +iommu_iomap_clear_pages(struct iommu_map_state *ims)
 +{
 +
 +	ims->ims_map.ipm_pagecnt = 0;
 +	SPLAY_INIT(&ims->ims_map.ipm_tree);
 +}
 Index: dev/iommureg.h
 ===================================================================
 RCS file: /cvsroot/src/sys/arch/sparc64/dev/iommureg.h,v
 retrieving revision 1.11
 diff -p -u -r1.11 iommureg.h
 --- dev/iommureg.h	11 Oct 2003 20:53:14 -0000	1.11
 +++ dev/iommureg.h	31 Mar 2004 05:52:32 -0000
 @@ -91,7 +91,7 @@ struct iommu_strbuf {
  #define IOTTE_8K	0x0000000000000000LL
  #define IOTTE_STREAM	0x1000000000000000LL	/* Is page streamable? */
  #define	IOTTE_LOCAL	0x0800000000000000LL	/* Accesses to same bus segment? */
 -#define IOTTE_PAMASK	0x000001ffffffe000LL	/* Let's assume this is correct */
 +#define IOTTE_PAMASK	0x000007ffffffe000LL	/* Let's assume this is correct */
  #define IOTTE_C		0x0000000000000010LL	/* Accesses to cacheable space */
  #define IOTTE_W		0x0000000000000002LL	/* Writable */

 Index: dev/iommuvar.h
 ===================================================================
 RCS file: /cvsroot/src/sys/arch/sparc64/dev/iommuvar.h,v
 retrieving revision 1.12
 diff -p -u -r1.12 iommuvar.h
 --- dev/iommuvar.h	20 Jun 2002 18:26:24 -0000	1.12
 +++ dev/iommuvar.h	31 Mar 2004 05:52:32 -0000
 @@ -31,6 +31,8 @@
  #ifndef _SPARC64_DEV_IOMMUVAR_H_
  #define _SPARC64_DEV_IOMMUVAR_H_

 +#include <sys/tree.h>
 +
  /*
   * Streaming buffer control
   *
 @@ -40,13 +42,52 @@
   * of data.
   */
  struct strbuf_ctl {
 -	struct iommu_state	*sb_is;		/* Pointer to our iommu */
 +	bus_space_tag_t	sb_bustag;
  	bus_space_handle_t	sb_sb;		/* Handle for our regs */
 +	struct iommu_state *sb_is;
 +	struct iommu_state *sb_iommu;
  	paddr_t			sb_flushpa;	/* to flush streaming buffers */
  	volatile int64_t	*sb_flush;
  };

  /*
 + * per-map STC flush area
 + */
 +struct strbuf_flush {
 +	char	sbf_area[0x80];		/* Holds 64-byte long/aligned buffer */
 +	void	*sbf_flush;		/* Kernel virtual address of buffer */
 +	paddr_t	sbf_flushpa;		/* Physical address of buffer area */
 +};
 +
 +/* 
 + * per-map DVMA page table
 + */
 +struct iommu_page_entry {
 +	SPLAY_ENTRY(iommu_page_entry) ipe_node;
 +	paddr_t	ipe_pa;
 +	vaddr_t	ipe_va;
 +};
 +struct iommu_page_map {
 +	SPLAY_HEAD(iommu_page_tree, iommu_page_entry) ipm_tree;
 +	int ipm_maxpage;	/* Size of allocated page map */
 +	int ipm_pagecnt;	/* Number of entries in use */
 +	struct iommu_page_entry	ipm_map[1];
 +};
 +
 +/*
 + * per-map IOMMU state
 + *
 + * This is what bus_dvmamap_t's _dm_cookie should be pointing to.
 + */
 +struct iommu_map_state {
 +	struct strbuf_flush ims_flush;	/* flush should be first (alignment) */
 +	struct strbuf_ctl *ims_sb;	/* Link to parent */
 +	int ims_flags;
 +	struct iommu_page_map ims_map;	/* map must be last (array at end) */
 +};
 +#define IOMMU_MAP_STREAM	1
 +
 +/*
   * per-IOMMU state
   */
  struct iommu_state {
 @@ -68,26 +109,40 @@ struct iommu_state {
  /* interfaces for PCI/SBUS code */
  void	iommu_init __P((char *, struct iommu_state *, int, u_int32_t));
  void	iommu_reset __P((struct iommu_state *));
 -void    iommu_enter __P((struct strbuf_ctl *, vaddr_t, int64_t, int));
 -void    iommu_remove __P((struct iommu_state *, vaddr_t, size_t));
  paddr_t iommu_extract __P((struct iommu_state *, vaddr_t));
 -
 -int	iommu_dvmamap_load __P((bus_dma_tag_t, struct strbuf_ctl *,
 +int64_t iommu_lookup_tte(struct iommu_state *, vaddr_t);
 +int64_t iommu_fetch_tte(struct iommu_state *, paddr_t);
 +int	iommu_dvmamap_create(bus_dma_tag_t, struct iommu_state *,
 +	    struct strbuf_ctl *, bus_size_t, int, bus_size_t, bus_size_t,
 +	    int, bus_dmamap_t *);
 +void	iommu_dvmamap_destroy(bus_dma_tag_t, bus_dmamap_t);
 +int	iommu_dvmamap_load __P((bus_dma_tag_t, struct iommu_state *,
  	    bus_dmamap_t, void *, bus_size_t, struct proc *, int));
 -void	iommu_dvmamap_unload __P((bus_dma_tag_t, struct strbuf_ctl *,
 +void	iommu_dvmamap_unload __P((bus_dma_tag_t, struct iommu_state *,
  	    bus_dmamap_t));
 -int	iommu_dvmamap_load_raw __P((bus_dma_tag_t, struct strbuf_ctl *,
 +int	iommu_dvmamap_load_raw __P((bus_dma_tag_t, struct iommu_state *,
  	    bus_dmamap_t, bus_dma_segment_t *, int, int, bus_size_t));
 -void	iommu_dvmamap_sync __P((bus_dma_tag_t, struct strbuf_ctl *,
 +void	iommu_dvmamap_sync __P((bus_dma_tag_t, struct iommu_state *,
  	    bus_dmamap_t, bus_addr_t, bus_size_t, int));
 -int	iommu_dvmamem_alloc __P((bus_dma_tag_t, struct strbuf_ctl *,
 +int	iommu_dvmamem_alloc __P((bus_dma_tag_t, struct iommu_state *,
  	    bus_size_t, bus_size_t, bus_size_t, bus_dma_segment_t *,
  	    int, int *, int));
 -void	iommu_dvmamem_free __P((bus_dma_tag_t, struct strbuf_ctl *,
 +void	iommu_dvmamem_free __P((bus_dma_tag_t, struct iommu_state *,
  	    bus_dma_segment_t *, int));
 -int	iommu_dvmamem_map __P((bus_dma_tag_t, struct strbuf_ctl *,
 +int	iommu_dvmamem_map __P((bus_dma_tag_t, struct iommu_state *,
  	    bus_dma_segment_t *, int, size_t, caddr_t *, int));
 -void	iommu_dvmamem_unmap __P((bus_dma_tag_t, struct strbuf_ctl *,
 +void	iommu_dvmamem_unmap __P((bus_dma_tag_t, struct iommu_state *,
  	    caddr_t, size_t));

 +#define IOMMUREG_READ(is, reg)				\
 +	bus_space_read_8((is)->is_bustag,		\
 +		(is)->is_iommu,				\
 +		IOMMUREG(reg))	
 +
 +#define IOMMUREG_WRITE(is, reg, v)			\
 +	bus_space_write_8((is)->is_bustag,		\
 +		(is)->is_iommu,				\
 +		IOMMUREG(reg),				\
 +		(v))
 +
  #endif /* _SPARC64_DEV_IOMMUVAR_H_ */
 Index: dev/psycho.c
 ===================================================================
 RCS file: /cvsroot/src/sys/arch/sparc64/dev/psycho.c,v
 retrieving revision 1.72
 diff -p -u -r1.72 psycho.c
 --- dev/psycho.c	28 Mar 2004 09:31:21 -0000	1.72
 +++ dev/psycho.c	31 Mar 2004 05:52:33 -0000
 @@ -112,6 +112,9 @@ static int _psycho_bus_map __P((bus_spac
  static void *psycho_intr_establish __P((bus_space_tag_t, int, int,
  				int (*) __P((void *)), void *, void(*)__P((void))));

 +static int psycho_dmamap_create(bus_dma_tag_t, bus_size_t, int, bus_size_t,
 +    bus_size_t, int, bus_dmamap_t *);
 +static void psycho_dmamap_destroy(bus_dma_tag_t, bus_dmamap_t);
  static int psycho_dmamap_load __P((bus_dma_tag_t, bus_dmamap_t, void *,
  				   bus_size_t, struct proc *, int));
  static void psycho_dmamap_unload __P((bus_dma_tag_t, bus_dmamap_t));
 @@ -1064,9 +1067,9 @@ psycho_alloc_dma_tag(pp)
  	memset(dt, 0, sizeof *dt);
  	dt->_cookie = pp;
  	dt->_parent = pdt;
 +	dt->_dmamap_create = psycho_dmamap_create;
 +	dt->_dmamap_destroy = psycho_dmamap_destroy;
  #define PCOPY(x)	dt->x = pdt->x
 -	PCOPY(_dmamap_create);
 -	PCOPY(_dmamap_destroy);
  	dt->_dmamap_load = psycho_dmamap_load;
  	PCOPY(_dmamap_load_mbuf);
  	PCOPY(_dmamap_load_uio);
 @@ -1371,6 +1374,24 @@ found:
   * hooks into the iommu dvma calls.
   */
  int
 +psycho_dmamap_create(bus_dma_tag_t t, bus_size_t size,
 +    int nsegments, bus_size_t maxsegsz, bus_size_t boundary, int flags,
 +    bus_dmamap_t *dmamp)
 +{
 +	struct psycho_pbm *pp = t->_cookie;
 +	struct psycho_softc *sc = pp->pp_sc;
 +
 +	return (iommu_dvmamap_create(t, sc->sc_is, &pp->pp_sb, size,
 +	    nsegments, maxsegsz, boundary, flags, dmamp));
 +}
 +
 +void
 +psycho_dmamap_destroy(bus_dma_tag_t t, bus_dmamap_t map)
 +{
 +	iommu_dvmamap_destroy(t, map); 
 +}
 +
 +int
  psycho_dmamap_load(t, map, buf, buflen, p, flags)
  	bus_dma_tag_t t;
  	bus_dmamap_t map;
 @@ -1381,7 +1402,10 @@ psycho_dmamap_load(t, map, buf, buflen, 
  {
  	struct psycho_pbm *pp = (struct psycho_pbm *)t->_cookie;

 -	return (iommu_dvmamap_load(t, &pp->pp_sb, map, buf, buflen, p, flags));
 +	if (pp->pp_sb.sb_flush == NULL)
 +		flags &= ~BUS_DMA_STREAMING;
 +
 +	return (iommu_dvmamap_load(t, pp->pp_sb.sb_is, map, buf, buflen, p, flags));
  }

  void
 @@ -1391,7 +1415,7 @@ psycho_dmamap_unload(t, map)
  {
  	struct psycho_pbm *pp = (struct psycho_pbm *)t->_cookie;

 -	iommu_dvmamap_unload(t, &pp->pp_sb, map);
 +	iommu_dvmamap_unload(t, pp->pp_sb.sb_is, map);
  }

  int
 @@ -1405,7 +1429,10 @@ psycho_dmamap_load_raw(t, map, segs, nse
  {
  	struct psycho_pbm *pp = (struct psycho_pbm *)t->_cookie;

 -	return (iommu_dvmamap_load_raw(t, &pp->pp_sb, map, segs, nsegs, flags, size));
 +	if (pp->pp_sb.sb_flush == NULL)
 +		flags &= ~BUS_DMA_STREAMING;
 +
 +	return (iommu_dvmamap_load_raw(t, pp->pp_sb.sb_is, map, segs, nsegs, flags, size));
  }

  void
 @@ -1421,11 +1448,11 @@ psycho_dmamap_sync(t, map, offset, len, 
  	if (ops & (BUS_DMASYNC_PREREAD|BUS_DMASYNC_PREWRITE)) {
  		/* Flush the CPU then the IOMMU */
  		bus_dmamap_sync(t->_parent, map, offset, len, ops);
 -		iommu_dvmamap_sync(t, &pp->pp_sb, map, offset, len, ops);
 +		iommu_dvmamap_sync(t, pp->pp_sb.sb_is, map, offset, len, ops);
  	}
  	if (ops & (BUS_DMASYNC_POSTREAD|BUS_DMASYNC_POSTWRITE)) {
  		/* Flush the IOMMU then the CPU */
 -		iommu_dvmamap_sync(t, &pp->pp_sb, map, offset, len, ops);
 +		iommu_dvmamap_sync(t, pp->pp_sb.sb_is, map, offset, len, ops);
  		bus_dmamap_sync(t->_parent, map, offset, len, ops);
  	}

 @@ -1444,7 +1471,7 @@ psycho_dmamem_alloc(t, size, alignment, 
  {
  	struct psycho_pbm *pp = (struct psycho_pbm *)t->_cookie;

 -	return (iommu_dvmamem_alloc(t, &pp->pp_sb, size, alignment, boundary,
 +	return (iommu_dvmamem_alloc(t, pp->pp_sb.sb_is, size, alignment, boundary,
  	    segs, nsegs, rsegs, flags));
  }

 @@ -1456,7 +1483,7 @@ psycho_dmamem_free(t, segs, nsegs)
  {
  	struct psycho_pbm *pp = (struct psycho_pbm *)t->_cookie;

 -	iommu_dvmamem_free(t, &pp->pp_sb, segs, nsegs);
 +	iommu_dvmamem_free(t, pp->pp_sb.sb_is, segs, nsegs);
  }

  int
 @@ -1470,7 +1497,7 @@ psycho_dmamem_map(t, segs, nsegs, size, 
  {
  	struct psycho_pbm *pp = (struct psycho_pbm *)t->_cookie;

 -	return (iommu_dvmamem_map(t, &pp->pp_sb, segs, nsegs, size, kvap, flags));
 +	return (iommu_dvmamem_map(t, pp->pp_sb.sb_is, segs, nsegs, size, kvap, flags));
  }

  void
 @@ -1481,5 +1508,5 @@ psycho_dmamem_unmap(t, kva, size)
  {
  	struct psycho_pbm *pp = (struct psycho_pbm *)t->_cookie;

 -	iommu_dvmamem_unmap(t, &pp->pp_sb, kva, size);
 +	iommu_dvmamem_unmap(t, pp->pp_sb.sb_is, kva, size);
  }
 Index: dev/sbus.c
 ===================================================================
 RCS file: /cvsroot/src/sys/arch/sparc64/dev/sbus.c,v
 retrieving revision 1.68
 diff -p -u -r1.68 sbus.c
 --- dev/sbus.c	21 Mar 2004 12:50:14 -0000	1.68
 +++ dev/sbus.c	31 Mar 2004 05:52:34 -0000
 @@ -763,7 +763,7 @@ sbus_dmamap_load(tag, map, buf, buflen, 
  {
  	struct sbus_softc *sc = (struct sbus_softc *)tag->_cookie;

 -	return (iommu_dvmamap_load(tag, &sc->sc_sb, map, buf, buflen, p, flags));
 +	return (iommu_dvmamap_load(tag, sc->sc_sb.sb_is, map, buf, buflen, p, flags));
  }

  int
 @@ -777,7 +777,7 @@ sbus_dmamap_load_raw(tag, map, segs, nse
  {
  	struct sbus_softc *sc = (struct sbus_softc *)tag->_cookie;

 -	return (iommu_dvmamap_load_raw(tag, &sc->sc_sb, map, segs, nsegs, flags, size));
 +	return (iommu_dvmamap_load_raw(tag, sc->sc_sb.sb_is, map, segs, nsegs, flags, size));
  }

  void
 @@ -787,7 +787,7 @@ sbus_dmamap_unload(tag, map)
  {
  	struct sbus_softc *sc = (struct sbus_softc *)tag->_cookie;

 -	iommu_dvmamap_unload(tag, &sc->sc_sb, map);
 +	iommu_dvmamap_unload(tag, sc->sc_sb.sb_is, map);
  }

  void
 @@ -803,11 +803,11 @@ sbus_dmamap_sync(tag, map, offset, len, 
  	if (ops & (BUS_DMASYNC_PREREAD|BUS_DMASYNC_PREWRITE)) {
  		/* Flush the CPU then the IOMMU */
  		bus_dmamap_sync(tag->_parent, map, offset, len, ops);
 -		iommu_dvmamap_sync(tag, &sc->sc_sb, map, offset, len, ops);
 +		iommu_dvmamap_sync(tag, sc->sc_sb.sb_is, map, offset, len, ops);
  	}
  	if (ops & (BUS_DMASYNC_POSTREAD|BUS_DMASYNC_POSTWRITE)) {
  		/* Flush the IOMMU then the CPU */
 -		iommu_dvmamap_sync(tag, &sc->sc_sb, map, offset, len, ops);
 +		iommu_dvmamap_sync(tag, sc->sc_sb.sb_is, map, offset, len, ops);
  		bus_dmamap_sync(tag->_parent, map, offset, len, ops);
  	}
  }
 @@ -825,7 +825,7 @@ sbus_dmamem_alloc(tag, size, alignment, 
  {
  	struct sbus_softc *sc = (struct sbus_softc *)tag->_cookie;

 -	return (iommu_dvmamem_alloc(tag, &sc->sc_sb, size, alignment, boundary,
 +	return (iommu_dvmamem_alloc(tag, sc->sc_sb.sb_is, size, alignment, boundary,
  	    segs, nsegs, rsegs, flags));
  }

 @@ -837,7 +837,7 @@ sbus_dmamem_free(tag, segs, nsegs)
  {
  	struct sbus_softc *sc = (struct sbus_softc *)tag->_cookie;

 -	iommu_dvmamem_free(tag, &sc->sc_sb, segs, nsegs);
 +	iommu_dvmamem_free(tag, sc->sc_sb.sb_is, segs, nsegs);
  }

  int
 @@ -851,7 +851,7 @@ sbus_dmamem_map(tag, segs, nsegs, size, 
  {
  	struct sbus_softc *sc = (struct sbus_softc *)tag->_cookie;

 -	return (iommu_dvmamem_map(tag, &sc->sc_sb, segs, nsegs, size, kvap, flags));
 +	return (iommu_dvmamem_map(tag, sc->sc_sb.sb_is, segs, nsegs, size, kvap, flags));
  }

  void
 @@ -862,5 +862,5 @@ sbus_dmamem_unmap(tag, kva, size)
  {
  	struct sbus_softc *sc = (struct sbus_softc *)tag->_cookie;

 -	iommu_dvmamem_unmap(tag, &sc->sc_sb, kva, size);
 +	iommu_dvmamem_unmap(tag, sc->sc_sb.sb_is, kva, size);
  }
 Index: include/bus.h
 ===================================================================
 RCS file: /cvsroot/src/sys/arch/sparc64/include/bus.h,v
 retrieving revision 1.45
 diff -p -u -r1.45 bus.h
 --- include/bus.h	15 Jun 2003 23:09:06 -0000	1.45
 +++ include/bus.h	31 Mar 2004 05:52:34 -0000
 @@ -1539,6 +1539,7 @@ bus_space_copy_region_stream_8(t, h1, o1
  #define	BUS_DMA_NOCACHE		0x800	/* hint: map non-cached memory */

  #define	BUS_DMA_DVMA		BUS_DMA_BUS2	/* Don't bother with alignment */
 +#define BUS_DMA_24BIT		BUS_DMA_BUS3	/* 24bit device */

  /* Forwards needed by prototypes below. */
  struct mbuf;
 @@ -1669,6 +1670,7 @@ struct sparc_bus_dmamap {
  	void		*_dm_source;	/* source mbuf, uio, etc. needed for unload */

  	void		*_dm_cookie;	/* cookie for bus-specific functions */
 +	bus_size_t	_dm_sgsize;

  	/*
  	 * PUBLIC MEMBERS: these are used by machine-independent code.
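
The heart of the patch above is the per-map page table: every physical page
a map touches is inserted into a splay tree keyed by pa, so duplicates are
detected and each page is entered and counted only once.  A minimal
user-space sketch of that idea, assuming a BSD <sys/tree.h> is available;
the names are illustrative, not the kernel's:

	#include <sys/tree.h>
	#include <stdio.h>
	#include <stdlib.h>

	struct page_entry {
		SPLAY_ENTRY(page_entry) pe_node;
		unsigned long pe_pa;		/* page frame address */
	};

	static int
	pe_compare(struct page_entry *a, struct page_entry *b)
	{
		return ((a->pe_pa > b->pe_pa) ? 1 :
		    (a->pe_pa < b->pe_pa) ? -1 : 0);
	}

	SPLAY_HEAD(page_tree, page_entry) tree = SPLAY_INITIALIZER(&tree);
	SPLAY_PROTOTYPE(page_tree, page_entry, pe_node, pe_compare);
	SPLAY_GENERATE(page_tree, page_entry, pe_node, pe_compare);

	int
	main(void)
	{
		/* hypothetical page frames; 0x4b40000 occurs twice */
		unsigned long pas[3] = { 0x4b8c000, 0x4b40000, 0x4b40000 };
		int i, count = 0;

		for (i = 0; i < 3; i++) {
			struct page_entry *e = malloc(sizeof(*e));

			e->pe_pa = pas[i];
			if (SPLAY_INSERT(page_tree, &tree, e) != NULL)
				free(e);	/* duplicate: counted once */
			else
				count++;
		}
		printf("%d distinct pages to map\n", count);	/* prints 2 */
		return (0);
	}

SPLAY_INSERT returns the existing node when the key is already present, which
is exactly the property iommu_iomap_insert_page relies on to count a shared
page only once.
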
State-Changed-From-To: open->closed 
State-Changed-By: mrg 
State-Changed-When: Fri Apr 2 17:07:22 UTC 2004 
State-Changed-Why:  
after discussions with martin & andrey, it seems this problem is 
already fixed.  i have been unable to reproduce problems on my U5/300. 

From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@gnats.netbsd.org
Cc: mrg@netbsd.org, port-sparc64-maintainer@netbsd.org, martin@netbsd.org,
  petrov@netbsd.org
Subject: Re: port-sparc64/13654
Date: Fri, 2 Apr 2004 22:20:57 +0200

 On Fri, Apr 02, 2004 at 05:08:14PM -0000, mrg@netbsd.org wrote:
 > Synopsis: problems with iommu_dvmamap_load_raw()
 > 
 > State-Changed-From-To: open->closed
 > State-Changed-By: mrg
 > State-Changed-When: Fri Apr 2 17:07:22 UTC 2004
 > State-Changed-Why: 
 > after discussions with martin & andrey, it seems this problem is
 > already fixed.  i have been unable to reproduce problems on my U5/300.

 With what interface did you check? You have to use something other than
 an hme, because the hme driver uses a static, contiguous buffer for transmit
 and receive, so you won't see the problem.

 Reading the sources, I think that if we load a mapping with 2 distinct
 segments in the same page, the same page will be mapped twice in the IOMMU.
 But this may not be a problem.

 I'll check this with a tl on sparc64 monday, at work.

 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 ans d'experience feront toujours la difference
 --
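
 To make the double mapping concrete: if each segment's page is charged and
 entered separately, a page shared by two segments is counted and mapped
 twice.  A minimal user-space sketch with hypothetical segment values (not
 kernel code):

	#include <stdio.h>

	#define PAGE_SIZE	0x2000UL	/* 8K IOMMU page, as on sparc64 */
	#define trunc_page(pa)	((pa) & ~(PAGE_SIZE - 1))

	struct seg { unsigned long pa, len; };

	int
	main(void)
	{
		/* two small segments that happen to share a physical page */
		struct seg segs[2] = {
			{ 0x4b41eb8UL, 0x148 },
			{ 0x4b41000UL, 0x022 },
		};
		unsigned long pages[2];
		int i, j, naive = 0, distinct = 0;

		for (i = 0; i < 2; i++) {
			unsigned long pg = trunc_page(segs[i].pa);

			naive++;	/* one page charged per segment */
			for (j = 0; j < distinct; j++)
				if (pages[j] == pg)
					break;	/* page already seen */
			if (j == distinct)
				pages[distinct++] = pg;
		}
		printf("charged %d pages, %d distinct\n", naive, distinct);
		return (0);
	}

 This prints "charged 2 pages, 1 distinct": the two counts differ exactly
 when segments share a page, which is also when the same page would be
 entered twice in the TSB.
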

From: Andrey Petrov <petrov@netbsd.org>
To: Manuel Bouyer <bouyer@antioche.eu.org>
Cc: gnats-bugs@gnats.netbsd.org, mrg@netbsd.org,
  port-sparc64-maintainer@netbsd.org, martin@netbsd.org
Subject: Re: port-sparc64/13654
Date: Fri, 2 Apr 2004 12:52:43 -0800

 On Fri, Apr 02, 2004 at 10:20:57PM +0200, Manuel Bouyer wrote:
 > On Fri, Apr 02, 2004 at 05:08:14PM -0000, mrg@netbsd.org wrote:
 > > Synopsis: problems with iommu_dvmamap_load_raw()
 > > 
 > > State-Changed-From-To: open->closed
 > > State-Changed-By: mrg
 > > State-Changed-When: Fri Apr 2 17:07:22 UTC 2004
 > > State-Changed-Why: 
 > > after discussions with martin & andrey, it seems this problem is
 > > already fixed.  i have been unable to reproduce problems on my U5/300.
 > 
 > With what interface did you check? You have to use something other than
 > an hme, because the hme driver uses a static, contiguous buffer for transmit
 > and receive, so you won't see the problem.

 I tested on tlp and wm. One of those resulted in changes I brought in.

 	Andrey

From: matthew green <mrg@eterna.com.au>
To: Manuel Bouyer <bouyer@antioche.eu.org>
Cc: port-sparc64-maintainer@netbsd.org, martin@netbsd.org, petrov@netbsd.org,
  gnats-bugs@gnats.netbsd.org
Subject: re: port-sparc64/13654 
Date: Sat, 03 Apr 2004 17:06:26 +1000

    On Fri, Apr 02, 2004 at 05:08:14PM -0000, mrg@netbsd.org wrote:
    > Synopsis: problems with iommu_dvmamap_load_raw()
    > 
    > State-Changed-From-To: open->closed
    > State-Changed-By: mrg
    > State-Changed-When: Fri Apr 2 17:07:22 UTC 2004
    > State-Changed-Why: 
    > after discussions with martin & andrey, it seems this problem is
    > already fixed.  i have been unable to reproduce problems on my U5/300.

    With what interface did you check? You have to use something other than
    an hme, because the hme driver uses a static, contiguous buffer for transmit
    and receive, so you won't see the problem.

 i pounded an fxp for many hours without it failing...

    Reading the sources, I think that if we load a mapping with 2 distinct
    segments in the same page, the same page will be mapped twice in the IOMMU.
    But this may not be a problem.

    I'll check this with a tl on sparc64 monday, at work.

 please re-open if it's still a problem!  and if it *is* a problem,
 could you try the latest patch i sent to the PR?  it's an updated
 version of the code jar ported from openbsd.


 .mrg.

From: Manuel Bouyer <bouyer@antioche.eu.org>
To: matthew green <mrg@eterna.com.au>
Cc: port-sparc64-maintainer@netbsd.org, martin@netbsd.org, petrov@netbsd.org,
  gnats-bugs@gnats.netbsd.org
Subject: Re: port-sparc64/13654
Date: Tue, 6 Apr 2004 00:36:37 +0200

 On Sat, Apr 03, 2004 at 05:06:26PM +1000, matthew green wrote:
 >    With what interface did you check? You have to use something other than
 >    an hme, because the hme driver uses a static, contiguous buffer for transmit
 >    and receive, so you won't see the problem.
 > 
 > i pounded an fxp for many hours without it failing...
 >    
 >    Reading the sources, I think that if we load a mapping with 2 distinct
 >    segments in the same page, the same page will be mapped twice in the IOMMU.
 >    But this may not be a problem.
 >    
 >    I'll check this with a tl on sparc64 monday, at work.
 > 
 > please re-open if it's still a problem!

 It still seems to be a problem for me; on a tl(4) I can't write to UDP mounts
 with large wsize (8k works, 9k fails).
 A 2.0_BETA i386 has no problems with wsize=32768.
 If you still have your setup available, could you try it?
 mount_nfs -r 32768 -w 32768 ....
 and then:
 dd if=/dev/zero of=toto bs=1m count=100

 From what I've seen with tcpdump, the server receives all fragments in order,
 but doesn't ack the RPC, just as if a checksum was bad (tcpdump doesn't
 compute checksums for fragmented packets).
 I have the output of tcpdump -w of this.

 > and if it *is* a problem,
 > could you try the latest patch i sent to the PR?  it's an updated
 > version of the code jar ported from openbsd.

 With this patch, no problems. I can use UDP mounts with wsize up to 32k.
 So I think the current code still has a problem, but it's fixed with this
 patch.

 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 ans d'experience feront toujours la difference
 --
State-Changed-From-To: closed->open 
State-Changed-By: bouyer 
State-Changed-When: Mon Apr 5 22:47:17 UTC 2004 
State-Changed-Why:  
This is still a problem for me, see my last mail on the topic. 
However, it seems the last patch posted by mrg fixes the problem. 

From: matthew green <mrg@eterna.com.au>
To: gnats-bugs@netbsd.org
Cc:  
Subject: re: port-sparc64/13654
Date: Mon, 21 Jun 2004 10:39:51 +1000

 so, this patch made my U2 go boom attaching csaudio :-(


 .mrg.

From: Andrey Petrov <petrov@netbsd.org>
To: Manuel Bouyer <bouyer@antioche.eu.org>
Cc: matthew green <mrg@eterna.com.au>, port-sparc64-maintainer@netbsd.org,
  martin@netbsd.org, gnats-bugs@gnats.netbsd.org
Subject: Re: port-sparc64/13654
Date: Fri, 25 Jun 2004 15:49:32 -0700

 On Tue, Apr 06, 2004 at 12:36:37AM +0200, Manuel Bouyer wrote:
 > 
 > It still seems to be a problem for me; on a tl(4) I can't write to UDP mounts
 > with large wsize (8k works, 9k fails).
 > A 2.0_BETA i386 has no problems with wsize=32768.
 > If you still have your setup available, could you try it?
 > mount_nfs -r 32768 -w 32768 ....
 > and then:
 > dd if=/dev/zero of=toto bs=1m count=100
 > 
 > From what I've seen with tcpdump, the server receives all fragments in order,
 > but doesn't ack the RPC, just as if a checksum was bad (tcpdump doesn't
 > compute checksums for fragmented packets).
 > I have the output of tcpdump -w of this.
 > 

 Indeed it's a bad checksum: a test run immediately shows an increase in the
 number of bad-sum packets (netstat -p udp).

 So packets are corrupted.

 	Andrey


From: Andrey Petrov <petrov@netbsd.org>
To: gnats-bugs@gnats.netbsd.org
Cc: matthew green <mrg@eterna.com.au>, Manuel Bouyer <bouyer@antioche.eu.org>
Subject: Re: port-sparc64/13654
Date: Fri, 25 Jun 2004 16:16:08 -0700

 I don't see problems in the mapping calculations. I haven't checked
 everything, but it all seems to be handled properly.
 But a large file copy over NFS creates a burst of mbufs sent to the wire,
 and those mbufs are co-located in the same pages, which in turn creates
 multiple dvma->'same pa' mappings in the IOMMU TSB. I don't know whether
 that is legal or not, and I also don't see how it would affect transmission.

 	Andrey

 Look at pa 0x4b40000, well, others actually too.

 1638: iommu_enter: va 0xfe42c000 pa 0x4b40000 TSB[216]@0x79850b0=8000200004b40012
 1637: iommu_enter: va 0xfe42a000 pa 0x4b8c000 TSB[215]@0x79850a8=8000100004b8c012
 1636: iommu_dvmamap_load_raw: map 0x22b3000 addr 0xfe42a000 size 4000
 1635: iommu_dvmamap_load_raw: seg#1: 0x4b41eb8 148 [0x4b40000 0x4b40000]
 1634: iommu_dvmamap_load_raw: seg#0: 0x4b8d97a 22 [0x4b8c000 0x4b8c000]
 1633: iommu_enter: va 0xfe428000 pa 0x4b40000 TSB[214]@0x79850a0=8000200004b40012
 1632: iommu_enter: va 0xfe426000 pa 0x4b8c000 TSB[213]@0x7985098=8000100004b8c012
 1631: iommu_dvmamap_load_raw: map 0x22b2800 addr 0xfe426000 size 4000
 1630: iommu_dvmamap_load_raw: seg#1: 0x4b418f0 5c8 [0x4b40000 0x4b40000]
 1629: iommu_dvmamap_load_raw: seg#0: 0x4b8d77a 22 [0x4b8c000 0x4b8c000]
 1628: iommu_enter: va 0xfe424000 pa 0x4b40000 TSB[212]@0x7985090=8000200004b40012
 1627: iommu_enter: va 0xfe422000 pa 0x4b8c000 TSB[211]@0x7985088=8000100004b8c012
 1626: iommu_dvmamap_load_raw: map 0x22b2000 addr 0xfe422000 size 4000
 1625: iommu_dvmamap_load_raw: seg#1: 0x4b41328 5c8 [0x4b40000 0x4b40000]
 1624: iommu_dvmamap_load_raw: seg#0: 0x4b8d57a 22 [0x4b8c000 0x4b8c000]
 1623: iommu_enter: va 0xfe420000 pa 0x4b40000 TSB[210]@0x7985080=8000200004b40012
 1622: iommu_enter: va 0xfe41e000 pa 0x4b8c000 TSB[20f]@0x7985078=8000100004b8c012
 1621: iommu_dvmamap_load_raw: map 0x22b5800 addr 0xfe41e000 size 4000
 1620: iommu_dvmamap_load_raw: seg#1: 0x4b40d60 5c8 [0x4b40000 0x4b40000]
 1619: iommu_dvmamap_load_raw: seg#0: 0x4b8d37a 22 [0x4b8c000 0x4b8c000]
 1618: iommu_enter: va 0xfe41c000 pa 0x4b40000 TSB[20e]@0x7985070=8000200004b40012
 1617: iommu_enter: va 0xfe41a000 pa 0x4b8c000 TSB[20d]@0x7985068=8000100004b8c012
 1616: iommu_dvmamap_load_raw: map 0x22b5000 addr 0xfe41a000 size 4000
 1615: iommu_dvmamap_load_raw: seg#1: 0x4b40798 5c8 [0x4b40000 0x4b40000]
 1614: iommu_dvmamap_load_raw: seg#0: 0x4b8d17a 22 [0x4b8c000 0x4b8c000]
 1613: iommu_enter: va 0xfe418000 pa 0x4b40000 TSB[20c]@0x7985060=8000200004b40012
 1612: iommu_enter: va 0xfe416000 pa 0x4b8c000 TSB[20b]@0x7985058=8000100004b8c012
 1611: iommu_dvmamap_load_raw: map 0x22b4800 addr 0xfe416000 size 4000
 1610: iommu_dvmamap_load_raw: seg#1: 0x4b401d0 5c8 [0x4b40000 0x4b40000]
 1609: iommu_dvmamap_load_raw: seg#0: 0x4b8cf7a 22 [0x4b8c000 0x4b8c000]
 1608: iommu_enter: va 0xfe416000 pa 0x4b40000 TSB[20b]@0x7985058=8000300004b40012
 1607: iommu_enter: va 0xfe414000 pa 0x4b3e000 TSB[20a]@0x7985050=8000200004b3e012
 1606: iommu_enter: va 0xfe412000 pa 0x4b8c000 TSB[209]@0x7985048=8000100004b8c012
 1605: iommu_dvmamap_load_raw: map 0x22b4000 addr 0xfe412000 size 4000
 1604: iommu_dvmamap_load_raw: seg#1: 0x4b3fc08 5c8 [0x4b3e000 0x4b40000]
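
 One reading of the tail of this trace, consistent with the numbers above:
 at entry 1605, map 0x22b4000 is given the DVMA range 0xfe412000 + 0x4000,
 i.e. two 8K pages, yet entries 1606-1608 install three TSB entries for it,
 because seg#1 (entry 1604: pa 0x4b3fc08, len 0x5c8) crosses from page
 0x4b3e000 into 0x4b40000.  The third entry lands at va 0xfe416000, just
 past the reserved range; the next map (entry 1611) is then allocated
 starting at 0xfe416000, and entry 1612 rewrites that same slot (TSB[20b])
 with pa 0x4b8c000, clobbering the earlier map's last page.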

From: Andrey Petrov <petrov@netbsd.org>
To: gnats-bugs@netbsd.org
Cc:  
Subject: Re: port-sparc64/13654
Date: Mon, 28 Jun 2004 23:02:24 -0700

 Hmm, I experimented more with multiple dvma->pa IOMMU mappings, and
 they don't seem to be an issue if they are entered by iommu_dvmamem_load.
 Someone, probably esiop, uses a lot of 6-byte DMA transfers, all from
 one pa, and the IOMMU creates tens of aliases without problems.

 Timing is important, though: if I slow down iommu_enter (unintentionally,
 by adding a scan loop over the IOMMU TSB), then load_mbuf also works, but
 no multiple dvma->pa aliases are created.

 --
 	Andrey

From: Andrey Petrov <petrov@netbsd.org>
To: gnats-bugs@netbsd.org
Cc:  
Subject: Re: port-sparc64/13654
Date: Wed, 30 Jun 2004 12:23:08 -0700

 As Manuel suggested, we indeed failed to allocate the needed dvma region.
 Having an extra dvma page fixes the problem.
 We have to calculate the required dvma size properly, accounting for the
 segments' locations.

 I'm going to commit the patch in the meantime.

 Index: iommu.c
 ===================================================================
 RCS file: /cvsroot/src/sys/arch/sparc64/dev/iommu.c,v
 retrieving revision 1.74
 diff -u -p -r1.74 iommu.c
 --- iommu.c	3 Jun 2004 06:17:05 -0000	1.74
 +++ iommu.c	30 Jun 2004 19:05:24 -0000
 @@ -684,7 +684,7 @@ iommu_dvmamap_load_raw(t, sb, map, segs,
  		left -= segs[i].ds_len;
  		pa = segs[i].ds_addr + segs[i].ds_len;
  	}
 -	sgsize = round_page(sgsize);
 +	sgsize = round_page(sgsize) + PAGE_SIZE; /* XXX */

  	s = splhigh();
  	/*
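
 The proper calculation alluded to above would charge each segment for the
 full page span it actually touches, offset included.  A user-space sketch
 of that per-segment accounting (illustrative only, not the committed fix),
 fed with seg#1 from trace entry 1604:

	#include <stdio.h>

	#define PAGE_SIZE	0x2000UL	/* 8K IOMMU page */
	#define PAGE_MASK	(PAGE_SIZE - 1)
	#define trunc_page(x)	((x) & ~PAGE_MASK)
	#define round_page(x)	(((x) + PAGE_MASK) & ~PAGE_MASK)

	struct seg { unsigned long pa, len; };

	/* pages spanned: start of first page to end of last page */
	static unsigned long
	seg_pages(const struct seg *s)
	{
		return ((round_page(s->pa + s->len) - trunc_page(s->pa)) /
		    PAGE_SIZE);
	}

	int
	main(void)
	{
		struct seg s = { 0x4b3fc08UL, 0x5c8 };

		/* 0x4b3fc08 + 0x5c8 = 0x4b401d0 crosses into the next page */
		printf("segment needs %lu page(s)\n", seg_pages(&s));	/* 2 */
		return (0);
	}

 Rounding up the summed lengths alone would charge this segment a single
 page; spanning it from trunc_page(pa) charges the two pages it really
 occupies, which is what the extent allocation actually needs.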

From: Andrey Petrov <petrov@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:  
Subject: pr/13654 CVS commit: src/sys/arch/sparc64/dev
Date: Thu,  1 Jul 2004 06:40:36 +0000 (UTC)

 Module Name:	src
 Committed By:	petrov
 Date:		Thu Jul  1 06:40:36 UTC 2004

 Modified Files:
 	src/sys/arch/sparc64/dev: iommu.c

 Log Message:
 iommu_dvma_load_raw: reserve extra dvma page. Fixes PR #13654.


 To generate a diff of this commit:
 cvs rdiff -r1.74 -r1.75 src/sys/arch/sparc64/dev/iommu.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: open->closed 
State-Changed-By: petrov 
State-Changed-When: Thu Jul 1 00:11:11 PDT 2004 
State-Changed-Why:  
Patch applied. 

From: Havard Eidnes <he@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:  
Subject: pr/13654 CVS commit: [netbsd-2-0] src/sys/arch/sparc64/dev
Date: Fri,  2 Jul 2004 18:00:17 +0000 (UTC)

 Module Name:	src
 Committed By:	he
 Date:		Fri Jul  2 18:00:17 UTC 2004

 Modified Files:
 	src/sys/arch/sparc64/dev [netbsd-2-0]: iommu.c

 Log Message:
 Pull up revision 1.75 (requested by petrov in ticket #576):
   Reserve extra dvma page in iommu_dvma_load_raw().  Fixes
   PR#13654.


 To generate a diff of this commit:
 cvs rdiff -r1.73 -r1.73.2.1 src/sys/arch/sparc64/dev/iommu.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

>Unformatted:
