NetBSD Problem Report #53935

From www@NetBSD.org  Sun Feb  3 04:02:54 2019
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 77A397A1BB
	for <gnats-bugs@gnats.NetBSD.org>; Sun,  3 Feb 2019 04:02:54 +0000 (UTC)
Message-Id: <20190203040251.BCA667A1C7@mollari.NetBSD.org>
Date: Sun,  3 Feb 2019 04:02:51 +0000 (UTC)
From: davshao@gmail.com
Reply-To: davshao@gmail.com
To: gnats-bugs@NetBSD.org
Subject: graphics/MesaLib NetBSD current r600 and radeonsi options
X-Send-Pr-Version: www-1.0

>Number:         53935
>Category:       pkg
>Synopsis:       graphics/MesaLib NetBSD current r600 and radeonsi options
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    pkg-manager
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Feb 03 04:05:00 +0000 2019
>Last-Modified:  Sun Jan 12 09:00:02 +0000 2020
>Originator:     David Shao
>Release:        pkgsrc current from around January 26
>Organization:
>Environment:
NetBSD xxx.xxx 8.99.31 NetBSD 8.99.31 (GENERIC) #11: Thu Jan 24 12:51:22 PST 2019  xxx@xxx.xxx:/usr/obj/sys/arch/amd64/compile/GENERIC amd64
>Description:
Using the equivalent of graphics/MesaLib18 on two systems, one with a Radeon r600 graphics card and one with a Radeon radeonsi graphics card, leaves xfce4 unusable: running vim-gtk2 or changing the xfce4 configuration causes crashes.  firefox52 and mpv also segfault.
>How-To-Repeat:
The two Radeon graphics cards tested are a Sapphire Radeon HD6450 (CAICOS, r600 driver) and a Gigabyte Radeon R7 240 (Chipset: "OLAND", ChipID = 0x6613, radeonsi driver).
>Fix:
With three new options, both the r600 and radeonsi cards run MesaLib 18.3.3 without crashes. With
LD_PRELOAD=/usr/pkg/lib/libGL.so
set, WebGL demos run in firefox52 and mpv plays an mp4 file.

The first option restores the following earlier pkgsrc patch, now guarded by NO_CS_QUEUE:

$NetBSD: patch-src_gallium_winsys_radeon_drm_radeon__drm__winsys.c,v 1.1 2015/04/25 11:19:18 tnn Exp $

Don't create pipe thread on NetBSD. It triggers some kernel bug.
kern/49838.

--- src/gallium/winsys/radeon/drm/radeon_drm_winsys.c.orig	2018-07-06 23:20:10.000000000 +0000
+++ src/gallium/winsys/radeon/drm/radeon_drm_winsys.c
@@ -906,8 +906,10 @@ radeon_drm_winsys_create(int fd, const s
     /* TTM aligns the BO size to the CPU page size */
     ws->info.gart_page_size = sysconf(_SC_PAGESIZE);

+#if !defined(NO_CS_QUEUE)
     if (ws->num_cpus > 1 && debug_get_option_thread())
         util_queue_init(&ws->cs_queue, "rcs", 8, 1, 0);
+#endif

     /* Create the screen at the end. The winsys must be initialized
      * completely.
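
As a rough illustration of how the first option behaves, here is a minimal sketch of the same compile-time guard outside of Mesa. The struct demo_winsys, its fields, and demo_winsys_init are hypothetical names made up for this example; only the NO_CS_QUEUE macro and the shape of the condition come from the patch above.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the winsys state; not Mesa's real struct. */
struct demo_winsys {
    int num_cpus;
    bool queue_started;
};

/* Same shape as the guarded block in the patch: when the file is built
 * with -DNO_CS_QUEUE, the submission queue (and its worker thread) is
 * never created, whatever the CPU count or the thread option says. */
static void demo_winsys_init(struct demo_winsys *ws, bool thread_option)
{
    ws->queue_started = false;
#if !defined(NO_CS_QUEUE)
    if (ws->num_cpus > 1 && thread_option)
        ws->queue_started = true; /* the real code calls util_queue_init() here */
#else
    (void)thread_option; /* unused when the queue is compiled out */
#endif
}
```

Built without -DNO_CS_QUEUE, a 4-CPU system with the thread option enabled reports queue_started as true; built with it, the flag stays false on every system.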


The second and third options revert the following commits in mesa3d:

$NetBSD$

Option to revert:

2017-05-15 radeonsi: enable threaded_context
Commit:	1c8f7d3be6ffb3567041f1e11a037fa7e75e4c28

https://cgit.freedesktop.org/mesa/mesa/commit/?id=1c8f7d3be6ffb3567041f1e11a037fa7e75e4c28

2018-10-16 radeonsi: use compute shaders for clear_buffer & copy_buffer
Commit: 9b331e462e5021d994859756d46cd2519d9c9c6e

https://cgit.freedesktop.org/mesa/mesa/commit/?id=9b331e462e5021d994859756d46cd2519d9c9c6e

--- src/gallium/drivers/radeonsi/si_pipe.c.orig	2018-11-01 17:49:16.000000000 +0000
+++ src/gallium/drivers/radeonsi/si_pipe.c
@@ -195,10 +195,12 @@ static void si_destroy_context(struct pi
 		sctx->b.delete_vs_state(&sctx->b, sctx->vs_blit_color_layered);
 	if (sctx->vs_blit_texcoord)
 		sctx->b.delete_vs_state(&sctx->b, sctx->vs_blit_texcoord);
+#if !defined(REVERT_COPY_CLEAR)
 	if (sctx->cs_clear_buffer)
 		sctx->b.delete_compute_state(&sctx->b, sctx->cs_clear_buffer);
 	if (sctx->cs_copy_buffer)
 		sctx->b.delete_compute_state(&sctx->b, sctx->cs_copy_buffer);
+#endif

 	if (sctx->blitter)
 		util_blitter_destroy(sctx->blitter);
@@ -367,7 +369,11 @@ static void si_set_context_param(struct 
 }

 static struct pipe_context *si_create_context(struct pipe_screen *screen,
+#if defined(REVERT_THREADED_CONTEXT)
+                                              void *priv, unsigned flags)
+#else
                                               unsigned flags)
+#endif
 {
 	struct si_context *sctx = CALLOC_STRUCT(si_context);
 	struct si_screen* sscreen = (struct si_screen *)screen;
@@ -381,7 +387,11 @@ static struct pipe_context *si_create_co
 		sscreen->record_llvm_ir = true; /* racy but not critical */

 	sctx->b.screen = screen; /* this must be set first */
+#if defined(REVERT_THREADED_CONTEXT)
+	sctx->b.priv = priv;
+#else
 	sctx->b.priv = NULL;
+#endif
 	sctx->b.destroy = si_destroy_context;
 	sctx->b.emit_string_marker = si_emit_string_marker;
 	sctx->b.set_debug_callback = si_set_debug_callback;
@@ -508,6 +518,7 @@ static struct pipe_context *si_create_co
 	if (sscreen->debug_flags & DBG(FORCE_DMA))
 		sctx->b.resource_copy_region = sctx->dma_copy;

+#if !defined(REVERT_COPY_CLEAR)
 	bool dst_stream_policy = SI_COMPUTE_DST_CACHE_POLICY != L2_LRU;
 	sctx->cs_clear_buffer = si_create_dma_compute_shader(&sctx->b,
 					     SI_COMPUTE_CLEAR_DW_PER_THREAD,
@@ -515,6 +526,7 @@ static struct pipe_context *si_create_co
 	sctx->cs_copy_buffer = si_create_dma_compute_shader(&sctx->b,
 					     SI_COMPUTE_COPY_DW_PER_THREAD,
 					     dst_stream_policy, true);
+#endif

 	sctx->blitter = util_blitter_create(&sctx->b);
 	if (sctx->blitter == NULL)
@@ -631,6 +643,7 @@ fail:
 	return NULL;
 }

+#if !defined(REVERT_THREADED_CONTEXT)
 static struct pipe_context *si_pipe_create_context(struct pipe_screen *screen,
 						   void *priv, unsigned flags)
 {
@@ -661,6 +674,7 @@ static struct pipe_context *si_pipe_crea
 				       sscreen->info.drm_major >= 3 ? si_create_fence : NULL,
 				       &((struct si_context*)ctx)->tc);
 }
+#endif

 /*
  * pipe_screen
@@ -855,7 +869,11 @@ struct pipe_screen *radeonsi_screen_crea
 							debug_options, 0);

 	/* Set functions first. */
+#if defined(REVERT_THREADED_CONTEXT)
+	sscreen->b.context_create = si_create_context;
+#else
 	sscreen->b.context_create = si_pipe_create_context;
+#endif
 	sscreen->b.destroy = si_destroy_screen;

 	si_init_screen_get_functions(sscreen);
@@ -1118,7 +1136,11 @@ struct pipe_screen *radeonsi_screen_crea
 		si_init_compiler(sscreen, &sscreen->compiler_lowp[i]);

 	/* Create the auxiliary context. This must be done last. */
+#if defined(REVERT_THREADED_CONTEXT)
+	sscreen->aux_context = sscreen->b.context_create(&sscreen->b, NULL, 0);
+#else
 	sscreen->aux_context = si_create_context(&sscreen->b, 0);
+#endif

 	if (sscreen->debug_flags & DBG(TEST_DMA))
 		si_test_dma(sscreen);
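
The effect of REVERT_THREADED_CONTEXT in the patch above is to point the screen's context_create hook at the direct creation function instead of the wrapping one. A minimal sketch of that selection follows, assuming hypothetical demo_* types and functions that only imitate the shape of the Mesa code; the macro name is the only thing taken from the patch.

```c
#include <stddef.h>

struct demo_screen;

/* Hypothetical context type; it only records how it was created. */
struct demo_context {
    void *priv;
    int threaded; /* 1 when created through the wrapping entry point */
};

typedef struct demo_context (*demo_create_fn)(struct demo_screen *screen,
                                              void *priv, unsigned flags);

struct demo_screen {
    demo_create_fn context_create;
};

/* Direct creation: what the screen hook points at when the revert is on. */
static struct demo_context demo_create_direct(struct demo_screen *screen,
                                              void *priv, unsigned flags)
{
    struct demo_context ctx = { .priv = priv, .threaded = 0 };
    (void)screen;
    (void)flags;
    return ctx;
}

#if !defined(REVERT_THREADED_CONTEXT)
/* Wrapping creation: stands in for the threaded_context path. */
static struct demo_context demo_create_wrapped(struct demo_screen *screen,
                                               void *priv, unsigned flags)
{
    struct demo_context ctx = demo_create_direct(screen, priv, flags);
    ctx.threaded = 1;
    return ctx;
}
#endif

/* Pick the hook at compile time, as the patch does for sscreen->b. */
static void demo_screen_init(struct demo_screen *screen)
{
#if defined(REVERT_THREADED_CONTEXT)
    screen->context_create = demo_create_direct;
#else
    screen->context_create = demo_create_wrapped;
#endif
}
```

Callers always go through screen->context_create, so they need no changes when the option is toggled; that is also why the patch creates the auxiliary context through the hook when the revert is enabled.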

$NetBSD$

2018-10-16 radeonsi: use compute shaders for clear_buffer & copy_buffer
Commit: 9b331e462e5021d994859756d46cd2519d9c9c6e

https://cgit.freedesktop.org/mesa/mesa/commit/?id=9b331e462e5021d994859756d46cd2519d9c9c6e

--- src/gallium/drivers/radeonsi/si_compute_blit.c.orig	2018-11-01 17:49:16.000000000 +0000
+++ src/gallium/drivers/radeonsi/si_compute_blit.c
@@ -32,10 +32,17 @@ static enum si_cache_policy get_cache_po
 					     enum si_coherency coher,
 					     uint64_t size)
 {
+#if defined(REVERT_COPY_CLEAR)
+	if ((sctx->chip_class >= GFX9 && coher == SI_COHERENCY_CB_META) ||
+	    (sctx->chip_class >= CIK && coher == SI_COHERENCY_SHADER))
+		return L2_LRU;
+		
+#else
 	if ((sctx->chip_class >= GFX9 && (coher == SI_COHERENCY_CB_META ||
 					  coher == SI_COHERENCY_CP)) ||
 	    (sctx->chip_class >= CIK && coher == SI_COHERENCY_SHADER))
 		return size <= 256 * 1024 ? L2_LRU : L2_STREAM;
+#endif

 	return L2_BYPASS;
 }
@@ -149,6 +156,51 @@ void si_clear_buffer(struct si_context *
 		     uint64_t offset, uint64_t size, uint32_t *clear_value,
 		     uint32_t clear_value_size, enum si_coherency coher)
 {
+#if defined(REVERT_COPY_CLEAR)
+
+/* Recommended maximum sizes for optimal performance.
+ * Fall back to compute or SDMA if the size is greater.
+ */
+#define CP_DMA_COPY_PERF_THRESHOLD	(64 * 1024) /* copied from Vulkan */
+#define CP_DMA_CLEAR_PERF_THRESHOLD	(32 * 1024) /* guess (clear is much slower) */
+
+	struct radeon_winsys *ws = sctx->ws;
+	struct r600_resource *rdst = r600_resource(dst);
+	enum si_cache_policy cache_policy = get_cache_policy(sctx, coher, size);
+
+	if (!size)
+		return;
+
+	uint64_t aligned_size = size & ~3ull;
+
+	/* dma_clear_buffer can use clear_buffer on failure. Make sure that
+	 * doesn't happen. We don't want an infinite recursion: */
+	if (sctx->dma_cs &&
+	    !(dst->flags & PIPE_RESOURCE_FLAG_SPARSE) &&
+	    (offset % 4 == 0) &&
+	    /* CP DMA is very slow. Always use SDMA for big clears. This
+	     * alone improves DeusEx:MD performance by 70%. */
+	    (size > CP_DMA_CLEAR_PERF_THRESHOLD ||
+	     /* Buffers not used by the GFX IB yet will be cleared by SDMA.
+	      * This happens to move most buffer clears to SDMA, including
+	      * DCC and CMASK clears, because pipe->clear clears them before
+	      * si_emit_framebuffer_state (in a draw call) adds them.
+	      * For example, DeusEx:MD has 21 buffer clears per frame and all
+	      * of them are moved to SDMA thanks to this. */
+	     !ws->cs_is_buffer_referenced(sctx->gfx_cs, rdst->buf,
+				          RADEON_USAGE_READWRITE))) {
+		si_sdma_clear_buffer(sctx, dst, offset, aligned_size, *clear_value);
+
+		offset += aligned_size;
+		size -= aligned_size;
+	} else if (aligned_size >= 4) {
+		si_cp_dma_clear_buffer(sctx, dst, offset, aligned_size, *clear_value,
+				       coher, get_cache_policy(sctx, coher, size));
+
+		offset += aligned_size;
+		size -= aligned_size;
+	}
+#else
 	if (!size)
 		return;

@@ -227,6 +279,7 @@ void si_clear_buffer(struct si_context *
 		offset += aligned_size;
 		size -= aligned_size;
 	}
+#endif

 	/* Handle non-dword alignment. */
 	if (size) {
@@ -244,6 +297,58 @@ static void si_pipe_clear_buffer(struct 
 				 const void *clear_value,
 				 int clear_value_size)
 {
+#if defined(REVERT_COPY_CLEAR)
+	struct si_context *sctx = (struct si_context*)ctx;
+	uint32_t dword_value;
+
+	assert(offset % clear_value_size == 0);
+	assert(size % clear_value_size == 0);
+
+	if (clear_value_size > 4) {
+		bool clear_dword_duplicated = true;
+
+		/* See if we can lower large fills to dword fills. */
+		for (unsigned i = 1; i < clear_value_size / 4; i++)
+			if (((uint32_t *)clear_value)[0] != ((uint32_t*)clear_value)[i]) {
+				clear_dword_duplicated = false;
+				break;
+			}
+
+		if (!clear_dword_duplicated) {
+			/* Use transform feedback for 64-bit, 96-bit, and
+			 * 128-bit fills.
+			 */
+			union pipe_color_union streamout_clear_value;
+
+			memcpy(&streamout_clear_value, clear_value, clear_value_size);
+			si_blitter_begin(sctx, SI_DISABLE_RENDER_COND);
+			util_blitter_clear_buffer(sctx->blitter, dst, offset,
+						  size, clear_value_size / 4,
+						  &streamout_clear_value);
+			si_blitter_end(sctx);
+			return;
+		}
+	}
+
+	/* Expand the clear value to a dword. */
+	switch (clear_value_size) {
+	case 1:
+		dword_value = *(uint8_t*)clear_value;
+		dword_value |= (dword_value << 8) |
+			       (dword_value << 16) |
+			       (dword_value << 24);
+		break;
+	case 2:
+		dword_value = *(uint16_t*)clear_value;
+		dword_value |= dword_value << 16;
+		break;
+	default:
+		dword_value = *(uint32_t*)clear_value;
+	}
+
+	si_clear_buffer(sctx, dst, offset, size, &dword_value,
+			clear_value_size, SI_COHERENCY_SHADER);
+#else
 	enum si_coherency coher;

 	if (dst->flags & SI_RESOURCE_FLAG_SO_FILLED_SIZE)
@@ -253,6 +358,7 @@ static void si_pipe_clear_buffer(struct 

 	si_clear_buffer((struct si_context*)ctx, dst, offset, size, (uint32_t*)clear_value,
 			clear_value_size, coher);
+#endif
 }

 void si_copy_buffer(struct si_context *sctx,
@@ -265,6 +371,17 @@ void si_copy_buffer(struct si_context *s
 	enum si_coherency coher = SI_COHERENCY_SHADER;
 	enum si_cache_policy cache_policy = get_cache_policy(sctx, coher, size);

+#if defined(REVERT_COPY_CLEAR)
+	si_cp_dma_copy_buffer(sctx, dst, src, dst_offset, src_offset, size,
+			      0, coher, cache_policy);
+ 
+	if (cache_policy != L2_BYPASS)
+ 		r600_resource(dst)->TC_L2_dirty = true;
+ 
+	/* If it's not a prefetch... */
+	if (dst_offset != src_offset)
+ 		sctx->num_cp_dma_calls++;
+#else
 	/* Only use compute for VRAM copies on dGPUs. */
 	if (sctx->screen->info.has_dedicated_vram &&
 	    r600_resource(dst)->domains & RADEON_DOMAIN_VRAM &&
@@ -277,6 +394,7 @@ void si_copy_buffer(struct si_context *s
 		si_cp_dma_copy_buffer(sctx, dst, src, dst_offset, src_offset, size,
 				      0, coher, cache_policy);
 	}
+#endif
 }

 void si_init_compute_blit_functions(struct si_context *sctx)
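
The reverted si_pipe_clear_buffer path above first checks whether a wide fill pattern is just one dword repeated, and otherwise expands 1-byte and 2-byte patterns to a dword. A standalone sketch of those two steps is below. The function names expand_to_dword and all_dwords_equal are hypothetical, and memcpy/memcmp are used here instead of the pointer casts in the patch, which is a choice made for this example only.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Replicate a 1-, 2- or 4-byte fill pattern into one 32-bit word. */
static uint32_t expand_to_dword(const void *value, unsigned size)
{
    uint32_t dword = 0;

    switch (size) {
    case 1: {
        uint8_t byte;
        memcpy(&byte, value, 1);
        dword = byte * 0x01010101u; /* copy the byte into all four lanes */
        break;
    }
    case 2: {
        uint16_t half;
        memcpy(&half, value, 2);
        dword = half * 0x00010001u; /* copy the halfword into both lanes */
        break;
    }
    default:
        memcpy(&dword, value, 4);
    }
    return dword;
}

/* True when every 4-byte group of the pattern equals the first one, so a
 * 64-, 96- or 128-bit fill can be lowered to a plain dword fill. */
static bool all_dwords_equal(const void *value, unsigned size)
{
    const unsigned char *bytes = value;

    for (unsigned i = 4; i + 4 <= size; i += 4)
        if (memcmp(bytes, bytes + i, 4) != 0)
            return false;
    return true;
}
```

For example, the byte 0xAB expands to 0xABABABAB and the halfword 0x1234 to 0x12341234; a 16-byte pattern of four identical dwords passes the check, while one differing dword sends the fill down the blitter path instead.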

An altered version of the above patches has been tested successfully on the radeonsi card against mesa3d development up through the commit

2019-02-09: wsi/display: add comment

$NetBSD$

Option to revert:

2017-05-15 radeonsi: enable threaded_context
Commit:	1c8f7d3be6ffb3567041f1e11a037fa7e75e4c28

https://cgit.freedesktop.org/mesa/mesa/commit/?id=1c8f7d3be6ffb3567041f1e11a037fa7e75e4c28

2018-10-16 radeonsi: use compute shaders for clear_buffer & copy_buffer
Commit: 9b331e462e5021d994859756d46cd2519d9c9c6e

https://cgit.freedesktop.org/mesa/mesa/commit/?id=9b331e462e5021d994859756d46cd2519d9c9c6e

--- src/gallium/drivers/radeonsi/si_pipe.c.orig	2019-02-02 23:08:03.000000000 +0000
+++ src/gallium/drivers/radeonsi/si_pipe.c
@@ -197,10 +197,12 @@ static void si_destroy_context(struct pi
 		sctx->b.delete_vs_state(&sctx->b, sctx->vs_blit_color_layered);
 	if (sctx->vs_blit_texcoord)
 		sctx->b.delete_vs_state(&sctx->b, sctx->vs_blit_texcoord);
+#if !defined(REVERT_COPY_CLEAR)
 	if (sctx->cs_clear_buffer)
 		sctx->b.delete_compute_state(&sctx->b, sctx->cs_clear_buffer);
 	if (sctx->cs_copy_buffer)
 		sctx->b.delete_compute_state(&sctx->b, sctx->cs_copy_buffer);
+#endif
 	if (sctx->cs_copy_image)
 		sctx->b.delete_compute_state(&sctx->b, sctx->cs_copy_image);
 	if (sctx->cs_copy_image_1d_array)
@@ -373,7 +375,11 @@ static void si_set_context_param(struct 
 }

 static struct pipe_context *si_create_context(struct pipe_screen *screen,
+#if defined(REVERT_THREADED_CONTEXT)
+                                              void *priv, unsigned flags)
+#else
                                               unsigned flags)
+#endif
 {
 	struct si_context *sctx = CALLOC_STRUCT(si_context);
 	struct si_screen* sscreen = (struct si_screen *)screen;
@@ -388,7 +394,11 @@ static struct pipe_context *si_create_co
 		sscreen->record_llvm_ir = true; /* racy but not critical */

 	sctx->b.screen = screen; /* this must be set first */
+#if defined(REVERT_THREADED_CONTEXT)
+	sctx->b.priv = priv;
+#else
 	sctx->b.priv = NULL;
+#endif
 	sctx->b.destroy = si_destroy_context;
 	sctx->b.emit_string_marker = si_emit_string_marker;
 	sctx->b.set_debug_callback = si_set_debug_callback;
@@ -622,6 +632,7 @@ fail:
 	return NULL;
 }

+#if !defined(REVERT_THREADED_CONTEXT)
 static struct pipe_context *si_pipe_create_context(struct pipe_screen *screen,
 						   void *priv, unsigned flags)
 {
@@ -652,6 +663,7 @@ static struct pipe_context *si_pipe_crea
 				       sscreen->info.drm_major >= 3 ? si_create_fence : NULL,
 				       &((struct si_context*)ctx)->tc);
 }
+#endif

 /*
  * pipe_screen
@@ -847,7 +859,11 @@ struct pipe_screen *radeonsi_screen_crea
 							debug_options, 0);

 	/* Set functions first. */
+#if defined(REVERT_THREADED_CONTEXT)
+	sscreen->b.context_create = si_create_context;
+#else
 	sscreen->b.context_create = si_pipe_create_context;
+#endif
 	sscreen->b.destroy = si_destroy_screen;

 	si_init_screen_get_functions(sscreen);
@@ -1116,7 +1132,11 @@ struct pipe_screen *radeonsi_screen_crea
 		si_init_compiler(sscreen, &sscreen->compiler_lowp[i]);

 	/* Create the auxiliary context. This must be done last. */
+#if defined(REVERT_THREADED_CONTEXT)
+	sscreen->aux_context = sscreen->b.context_create(&sscreen->b, NULL, 0);
+#else
 	sscreen->aux_context = si_create_context(&sscreen->b, 0);
+#endif

 	if (sscreen->debug_flags & DBG(TEST_DMA))
 		si_test_dma(sscreen);

$NetBSD$

2018-10-16 radeonsi: use compute shaders for clear_buffer & copy_buffer
Commit: 9b331e462e5021d994859756d46cd2519d9c9c6e

https://cgit.freedesktop.org/mesa/mesa/commit/?id=9b331e462e5021d994859756d46cd2519d9c9c6e

--- src/gallium/drivers/radeonsi/si_compute_blit.c.orig	2019-02-02 23:08:03.000000000 +0000
+++ src/gallium/drivers/radeonsi/si_compute_blit.c
@@ -33,10 +33,17 @@ static enum si_cache_policy get_cache_po
 					     enum si_coherency coher,
 					     uint64_t size)
 {
+#if defined(REVERT_COPY_CLEAR)
+	if ((sctx->chip_class >= GFX9 && coher == SI_COHERENCY_CB_META) ||
+	    (sctx->chip_class >= CIK && coher == SI_COHERENCY_SHADER))
+		return L2_LRU;
+		
+#else
 	if ((sctx->chip_class >= GFX9 && (coher == SI_COHERENCY_CB_META ||
 					  coher == SI_COHERENCY_CP)) ||
 	    (sctx->chip_class >= CIK && coher == SI_COHERENCY_SHADER))
 		return size <= 256 * 1024 ? L2_LRU : L2_STREAM;
+#endif

 	return L2_BYPASS;
 }
@@ -179,6 +186,52 @@ void si_clear_buffer(struct si_context *
 		     uint64_t offset, uint64_t size, uint32_t *clear_value,
 		     uint32_t clear_value_size, enum si_coherency coher)
 {
+#if defined(REVERT_COPY_CLEAR)
+
+/* Recommended maximum sizes for optimal performance.
+ * Fall back to compute or SDMA if the size is greater.
+ */
+#define CP_DMA_COPY_PERF_THRESHOLD	(64 * 1024) /* copied from Vulkan */
+#define CP_DMA_CLEAR_PERF_THRESHOLD	(32 * 1024) /* guess (clear is much slower) */
+
+	struct radeon_winsys *ws = sctx->ws;
+	struct si_resource *rdst = si_resource(dst);
+	enum si_cache_policy cache_policy = get_cache_policy(sctx, coher, size);
+
+	if (!size)
+		return;
+
+	uint64_t aligned_size = size & ~3ull;
+
+	/* dma_clear_buffer can use clear_buffer on failure. Make sure that
+	 * doesn't happen. We don't want an infinite recursion: */
+	if (sctx->dma_cs &&
+	    !(dst->flags & PIPE_RESOURCE_FLAG_SPARSE) &&
+	    (offset % 4 == 0) &&
+	    /* CP DMA is very slow. Always use SDMA for big clears. This
+	     * alone improves DeusEx:MD performance by 70%. */
+	    (size > CP_DMA_CLEAR_PERF_THRESHOLD ||
+	     /* Buffers not used by the GFX IB yet will be cleared by SDMA.
+	      * This happens to move most buffer clears to SDMA, including
+	      * DCC and CMASK clears, because pipe->clear clears them before
+	      * si_emit_framebuffer_state (in a draw call) adds them.
+	      * For example, DeusEx:MD has 21 buffer clears per frame and all
+	      * of them are moved to SDMA thanks to this. */
+	     !ws->cs_is_buffer_referenced(sctx->gfx_cs, rdst->buf,
+				          RADEON_USAGE_READWRITE))) {
+		si_sdma_clear_buffer(sctx, dst, offset, aligned_size, *clear_value);
+
+		offset += aligned_size;
+		size -= aligned_size;
+	} else if (aligned_size >= 4) {
+		si_cp_dma_clear_buffer(sctx, sctx->gfx_cs, dst, offset,
+				       aligned_size, *clear_value, 0, coher,
+				       get_cache_policy(sctx, coher, size));
+
+		offset += aligned_size;
+		size -= aligned_size;
+	}
+#else
 	if (!size)
 		return;

@@ -257,6 +310,7 @@ void si_clear_buffer(struct si_context *
 		offset += aligned_size;
 		size -= aligned_size;
 	}
+#endif

 	/* Handle non-dword alignment. */
 	if (size) {
@@ -274,6 +328,58 @@ static void si_pipe_clear_buffer(struct 
 				 const void *clear_value,
 				 int clear_value_size)
 {
+#if defined(REVERT_COPY_CLEAR)
+	struct si_context *sctx = (struct si_context*)ctx;
+	uint32_t dword_value;
+
+	assert(offset % clear_value_size == 0);
+	assert(size % clear_value_size == 0);
+
+	if (clear_value_size > 4) {
+		bool clear_dword_duplicated = true;
+
+		/* See if we can lower large fills to dword fills. */
+		for (unsigned i = 1; i < clear_value_size / 4; i++)
+			if (((uint32_t *)clear_value)[0] != ((uint32_t*)clear_value)[i]) {
+				clear_dword_duplicated = false;
+				break;
+			}
+
+		if (!clear_dword_duplicated) {
+			/* Use transform feedback for 64-bit, 96-bit, and
+			 * 128-bit fills.
+			 */
+			union pipe_color_union streamout_clear_value;
+
+			memcpy(&streamout_clear_value, clear_value, clear_value_size);
+			si_blitter_begin(sctx, SI_DISABLE_RENDER_COND);
+			util_blitter_clear_buffer(sctx->blitter, dst, offset,
+						  size, clear_value_size / 4,
+						  &streamout_clear_value);
+			si_blitter_end(sctx);
+			return;
+		}
+	}
+
+	/* Expand the clear value to a dword. */
+	switch (clear_value_size) {
+	case 1:
+		dword_value = *(uint8_t*)clear_value;
+		dword_value |= (dword_value << 8) |
+			       (dword_value << 16) |
+			       (dword_value << 24);
+		break;
+	case 2:
+		dword_value = *(uint16_t*)clear_value;
+		dword_value |= dword_value << 16;
+		break;
+	default:
+		dword_value = *(uint32_t*)clear_value;
+	}
+
+	si_clear_buffer(sctx, dst, offset, size, &dword_value,
+			clear_value_size, SI_COHERENCY_SHADER);
+#else
 	enum si_coherency coher;

 	if (dst->flags & SI_RESOURCE_FLAG_SO_FILLED_SIZE)
@@ -283,6 +389,7 @@ static void si_pipe_clear_buffer(struct 

 	si_clear_buffer((struct si_context*)ctx, dst, offset, size, (uint32_t*)clear_value,
 			clear_value_size, coher);
+#endif
 }

 void si_copy_buffer(struct si_context *sctx,
@@ -295,6 +402,17 @@ void si_copy_buffer(struct si_context *s
 	enum si_coherency coher = SI_COHERENCY_SHADER;
 	enum si_cache_policy cache_policy = get_cache_policy(sctx, coher, size);

+#if defined(REVERT_COPY_CLEAR)
+	si_cp_dma_copy_buffer(sctx, dst, src, dst_offset, src_offset, size,
+			      0, coher, cache_policy);
+ 
+	if (cache_policy != L2_BYPASS)
+ 		si_resource(dst)->TC_L2_dirty = true;
+ 
+	/* If it's not a prefetch... */
+	if (dst_offset != src_offset)
+		sctx->num_cp_dma_calls++;
+#else
 	/* Only use compute for VRAM copies on dGPUs. */
 	if (sctx->screen->info.has_dedicated_vram &&
 	    si_resource(dst)->domains & RADEON_DOMAIN_VRAM &&
@@ -307,6 +425,7 @@ void si_copy_buffer(struct si_context *s
 		si_cp_dma_copy_buffer(sctx, dst, src, dst_offset, src_offset, size,
 				      0, coher, cache_policy);
 	}
+#endif
 }

 void si_compute_copy_image(struct si_context *sctx,
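
Two small pieces of arithmetic in the si_compute_blit.c patches may be easier to follow in isolation: splitting a clear into a dword-aligned part plus a tail of at most 3 bytes, and the size threshold that the non-reverted get_cache_policy applies. The sketch below uses hypothetical demo names and covers only the branch where a cached policy is chosen at all; the L2_BYPASS fallthrough and the chip-class checks are left out.

```c
#include <stdint.h>

/* Result of splitting a clear size as "size & ~3ull" does in the patch. */
struct clear_split {
    uint64_t aligned; /* bytes cleared as whole dwords */
    uint64_t tail;    /* 0..3 leftover bytes, handled separately */
};

static struct clear_split split_clear_size(uint64_t size)
{
    struct clear_split split;

    split.aligned = size & ~3ull; /* round down to a multiple of 4 */
    split.tail = size - split.aligned;
    return split;
}

/* Hypothetical mirror of the two cached policies in get_cache_policy. */
enum demo_policy { DEMO_L2_LRU, DEMO_L2_STREAM };

/* With the revert, the cached case always uses LRU; without it, sizes
 * above 256 KiB switch to the streaming policy. */
static enum demo_policy demo_cached_policy(uint64_t size, int reverted)
{
    if (reverted)
        return DEMO_L2_LRU;
    return size <= 256 * 1024 ? DEMO_L2_LRU : DEMO_L2_STREAM;
}
```

A 10-byte clear therefore splits into 8 aligned bytes and a 2-byte tail, and a 3-byte clear has no aligned part at all, which is why the patch only calls the CP DMA clear when aligned_size is at least 4.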

>Audit-Trail:
From: coypu@sdf.org
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: pkg/53935
Date: Sun, 3 Feb 2019 06:23:10 +0000

 > Don't create pipe thread on NetBSD. It triggers some kernel bug.
 > kern/49838.

 I fixed kern/49838. Is it triggering a problem now?

From: Tobias Nygren <tnn@NetBSD.org>
To: gnats-bugs@NetBSD.org
Cc: coypu@sdf.org, davshao@gmail.com
Subject: Re: pkg/53935
Date: Sun, 3 Feb 2019 10:00:36 +0100

 On Sun,  3 Feb 2019 06:25:00 +0000 (UTC)
 coypu@sdf.org wrote:

 >  > Don't create pipe thread on NetBSD. It triggers some kernel bug.
 >  > kern/49838.
 >  
 >  I fixed kern/49838. Is it triggering a problem now?

 No, the issue is different; it is not a kernel panic but a userland crash.
 Root cause may well be in kernel though.
 I can reproduce the problem, both with CEDAR (r600) and BONAIRE (R7)
 Have been using CEDAR with AccelMethod "exa" for now to get a stable system.

From: coypu@sdf.org
To: Tobias Nygren <tnn@NetBSD.org>
Cc: gnats-bugs@NetBSD.org, davshao@gmail.com
Subject: Re: pkg/53935
Date: Sun, 3 Feb 2019 15:29:40 +0000

 On Sun, Feb 03, 2019 at 10:00:36AM +0100, Tobias Nygren wrote:
 > On Sun,  3 Feb 2019 06:25:00 +0000 (UTC)
 > coypu@sdf.org wrote:
 > 
 > >  > Don't create pipe thread on NetBSD. It triggers some kernel bug.
 > >  > kern/49838.
 > >  
 > >  I fixed kern/49838. Is it triggering a problem now?
 > 
 > No, the issue is different; it is not a kernel panic but a userland crash.
 > Root cause may well be in kernel though.
 > I can reproduce the problem, both with CEDAR (r600) and BONAIRE (R7)
 > Have been using CEDAR with AccelMethod "exa" for now to get a stable system.

 hmm, I assumed my hardware is faulty (it has all the reasons to be,
 given what I did to it). I wonder if I am seeing the same.

From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: pkg/53935
Date: Sun, 12 Jan 2020 09:57:40 +0100

 PR 54854 might be another instance of this (also seen on CEDAR), but:

  > I can reproduce the problem, both with CEDAR (r600) and BONAIRE (R7)
  > Have been using CEDAR with AccelMethod "exa" for now to get a stable system.

 ... that did not help in my case.

 Martin

Copyright © 1994-2020 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.