NetBSD Problem Report #54515

From reinoud@13thmonkey.org  Fri Aug 30 19:50:57 2019
Return-Path: <reinoud@13thmonkey.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id CCE3F7A153
	for <gnats-bugs@gnats.NetBSD.org>; Fri, 30 Aug 2019 19:50:57 +0000 (UTC)
Message-Id: <20190830195053.7652CC1EA85@dropje.13thmonkey.org>
Date: Fri, 30 Aug 2019 21:50:53 +0200 (CEST)
From: reinoud@13thmonkey.org
Reply-To: reinoud@13thmonkey.org
To: gnats-bugs@NetBSD.org
Subject: Atomic update failure message in i915/intel_sprite.c
X-Send-Pr-Version: 3.95

>Number:         54515
>Category:       kern
>Synopsis:       Atomic update failure message in i915/intel_sprite.c
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri Aug 30 19:55:00 +0000 2019
>Closed-Date:    
>Last-Modified:  Sat Apr 17 14:44:37 +0000 2021
>Originator:     Reinoud Zandijk
>Release:        NetBSD 9.0_BETA
>Organization:
NetBSD

>Environment:


System: NetBSD dropje 9.0_BETA NetBSD 9.0_BETA (GENERIC) #0: Wed Aug 28 10:01:57 UTC 2019 mkrepro@mkrepro.NetBSD.org:/usr/src/sys/arch/amd64/compile/GENERIC amd64
Architecture: x86_64
Machine: amd64
>Description:

This machine can crash its i965 GPU under normal 2D usage. The result is a
60-second display/mouse freeze until the GPU is reset and the machine unfreezes.
In the meantime the i915 driver has dumped its memory to dmesg. At times it
gives

kern error: [drm:(/usr/src/sys/external/bsd/drm2/dist/drm/i915/intel_sprite.c:132)intel_pipe_update_start]
*ERROR* Potential atomic update failure on pipe A: -35

The error -35 is EAGAIN in NetBSD numbering (35; Linux uses 11), negated in the
Linux style, so this is most likely an interaction between Linux and NetBSD code.
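
As a sanity check on the number, here is a minimal userland sketch. It assumes
the usual errno values (EAGAIN is 35 on NetBSD and 11 on Linux, where 35 is
EDEADLK) and hard-codes them so it behaves the same on either system. The
helper names are hypothetical and are not part of the driver.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed errno values, hard-coded so the sketch is host independent. */
#define NETBSD_EAGAIN	35	/* NetBSD <sys/errno.h> */
#define LINUX_EAGAIN	11	/* Linux asm-generic errno */

/* Hypothetical helper: is this Linux-style negative return NetBSD's EAGAIN? */
static int
is_netbsd_eagain(int ret)
{
	return ret == -NETBSD_EAGAIN;
}

/* Hypothetical helper: would the same value be EAGAIN in Linux numbering? */
static int
is_linux_eagain(int ret)
{
	return ret == -LINUX_EAGAIN;
}

/* Print how a negated error code reads under both numberings. */
static void
describe_ret(int ret)
{
	assert(ret < 0);	/* only negated error codes make sense here */
	printf("%d: EAGAIN in NetBSD numbering: %s, in Linux numbering: %s\n",
	    ret,
	    is_netbsd_eagain(ret) ? "yes" : "no",
	    is_linux_eagain(ret) ? "yes" : "no");
}
```

With these assumptions, is_netbsd_eagain(-35) is true and is_linux_eagain(-35)
is false, which fits the value having been produced with NetBSD's errno
numbering but the Linux sign convention.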

Relevant parts from Xorg.log:
[    50.726] (II) intel(0): [DRI2] Setup complete
[    50.726] (II) intel(0): [DRI2]   DRI driver: i965
[    50.726] (II) intel(0): [DRI2]   VDPAU driver: va_gl

Already running with
        Option     "AccelMethod"                "UXA"
in xorg.conf


>How-To-Repeat:
Boot NetBSD on an amd64 machine with an i965 GPU and work in X. Using gvim or
pidgin can crash the GPU easily due to their cursor/sprite updates.


>Fix:


phone@NetBSD.org suggested it might have something to do with 
external/bsd/common/include/linux/err.h rev 1.3

Possible diagnostic path provided by phone@ (untested) :
https://www.netbsd.org/~mrg/syscall.diff :

---------
Index: sys/arch/x86/x86/syscall.c
===================================================================
RCS file: /cvsroot/src/sys/arch/x86/x86/syscall.c,v
retrieving revision 1.18
diff -p -u -r1.18 syscall.c
--- sys/arch/x86/x86/syscall.c	6 Apr 2019 11:54:21 -0000	1.18
+++ sys/arch/x86/x86/syscall.c	30 Aug 2019 19:32:00 -0000
@@ -47,6 +47,10 @@ __KERNEL_RCSID(0, "$NetBSD: syscall.c,v 
 #include <machine/psl.h>
 #include <machine/userret.h>

+// XXXMRG
+#include <machine/db_machdep.h>
+#include <ddb/db_interface.h>
+
 #include "opt_dtrace.h"

 #ifndef __x86_64__
@@ -143,6 +147,12 @@ syscall(struct trapframe *frame)
 		X86_TF_RFLAGS(frame) &= ~PSL_C;	/* carry bit */
 	} else {
 		switch (error) {
+#if 1 /* COMPAT_DRM */
+		case ELAST+1: /* linux-y ERESTARTSYS */
+			uprintf("%s: got linux ERESTARTSYS\n", __func__);
+			db_stacktrace();
+#endif
+		/* FALLTHROUGH */
 		case ERESTART:
 			/*
 			 * The offset to adjust the PC by depends on whether we
---------
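
The idea behind the diagnostic patch above can be sketched in userland, under
stated assumptions: SKETCH_ELAST is a stand-in for the native ELAST (the real
value depends on the release), and the Linux-style ERESTARTSYS is taken to be
ELAST+1 as in the patch comment. The enum and function names are hypothetical.

```c
#include <assert.h>

/* Assumption: placeholder for the native ELAST from <sys/errno.h>. */
#define SKETCH_ELAST		96
/* As in the patch comment: the Linux-style ERESTARTSYS sits at ELAST+1. */
#define SKETCH_ERESTARTSYS	(SKETCH_ELAST + 1)

/* The compat value must not collide with any native errno. */
static_assert(SKETCH_ERESTARTSYS > SKETCH_ELAST,
    "ERESTARTSYS must lie outside the native errno range");

enum err_class {
	ERR_NATIVE,		/* 1..ELAST: an ordinary errno */
	ERR_LEAKED_RESTARTSYS,	/* a Linux code that was never converted */
	ERR_OTHER		/* 0, kernel-internal negatives, anything else */
};

/* Classify an error value as the syscall return path would see it. */
static enum err_class
classify_syscall_error(int error)
{
	if (error == SKETCH_ERESTARTSYS)
		return ERR_LEAKED_RESTARTSYS;
	if (error >= 1 && error <= SKETCH_ELAST)
		return ERR_NATIVE;
	return ERR_OTHER;
}
```

In the patch itself, the leaked case prints a message and a stack trace before
falling through to the ERESTART handling. The sketch only shows the
classification step that decides when to do so.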

>Release-Note:

>Audit-Trail:
From: matthew green <mrg@eterna.com.au>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
    netbsd-bugs@netbsd.org
Subject: re: kern/54515: Atomic update failure message in i915/intel_sprite.c
Date: Sat, 31 Aug 2019 07:00:04 +1000

 > phone@NetBSD.org suggested it might have something to do with 
 > external/bsd/common/include/linux/err.h rev 1.3

 i think you misunderstood me.

 i'm saying, use the ideas present in that change as a way to
 diagnose this problem, which _may_ be a similar problem (but
 not likely the same problem.)

 the other patch is a way to find missing conversions similar
 to the change above.  i'm still tempted to commit it to get
 better diags in this case, but i'd like it to be less #ifdefy.
 (it also has a matching change in netbsd32_syscall.c, and
 needs one for i386.)

 thanks.


 .mrg.

From: Reinoud Zandijk <reinoud@NetBSD.org>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/54515: Atomic update failure message in i915/intel_sprite.c
Date: Tue, 3 Sep 2019 16:37:59 +0200

 It might be related to:
 kern error:
 [drm:(/usr/src/sys/external/bsd/drm2/dist/drm/i915/i915_irq.c:3093)i915_hangcheck_elapsed]
 *ERROR* Hangcheck timer elapsed... blitter ring idle

From: "Maya Rashish" <maya@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/54515 CVS commit: src/sys/external/bsd/drm2/dist/drm/i915
Date: Sat, 31 Oct 2020 04:05:42 +0000

 Module Name:	src
 Committed By:	maya
 Date:		Sat Oct 31 04:05:42 UTC 2020

 Modified Files:
 	src/sys/external/bsd/drm2/dist/drm/i915: intel_sprite.c

 Log Message:
 Match linux here and wait without interrupts.

 From David H. Gutteridge in PR port-amd64/55555
 There's a second part to the patch, but "make our code behave the way
 the upstream code does" is very welcome.
 Also PR kern/54515 and possibly others.


 To generate a diff of this commit:
 cvs rdiff -u -r1.10 -r1.11 \
     src/sys/external/bsd/drm2/dist/drm/i915/intel_sprite.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/54515 CVS commit: [netbsd-9] src/sys/external/bsd/drm2/dist/drm/i915
Date: Sun, 29 Nov 2020 11:34:04 +0000

 Module Name:	src
 Committed By:	martin
 Date:		Sun Nov 29 11:34:04 UTC 2020

 Modified Files:
 	src/sys/external/bsd/drm2/dist/drm/i915 [netbsd-9]: intel_sprite.c

 Log Message:
 Pull up following revision(s) (requested by maya in ticket #1136):

 	sys/external/bsd/drm2/dist/drm/i915/intel_sprite.c: revision 1.11

 Match linux here and wait without interrupts.

 From David H. Gutteridge in PR port-amd64/55555
 There's a second part to the patch, but "make our code behave the way
 the upstream code does" is very welcome.

 Also PR kern/54515 and possibly others.


 To generate a diff of this commit:
 cvs rdiff -u -r1.9 -r1.9.4.1 \
     src/sys/external/bsd/drm2/dist/drm/i915/intel_sprite.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: open->feedback
State-Changed-By: maya@NetBSD.org
State-Changed-When: Sat, 17 Apr 2021 08:02:10 +0000
State-Changed-Why:
Is this still an issue?


From: Reinoud Zandijk <reinoud@NetBSD.org>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/54515 (Atomic update failure message in i915/intel_sprite.c)
Date: Sat, 17 Apr 2021 13:22:55 +0200

 On Sat, Apr 17, 2021 at 08:02:11AM +0000, maya@NetBSD.org wrote:
 > Synopsis: Atomic update failure message in i915/intel_sprite.c
 > 
 > State-Changed-From-To: open->feedback
 > State-Changed-By: maya@NetBSD.org
 > State-Changed-When: Sat, 17 Apr 2021 08:02:10 +0000
 > State-Changed-Why:
 > Is this still an issue?

 It is still showing up a lot in NetBSD 9.99.81 (GENERIC) #0: Sat Mar 27
 14:24:25 CET 2021.

 Running
 	zcat /var/log/messages.* | grep atomic | wc
 gives 297 lines, and I haven't been using the desktop that often either, so
 yes, it's still there.

State-Changed-From-To: feedback->open
State-Changed-By: maya@NetBSD.org
State-Changed-When: Sat, 17 Apr 2021 14:44:37 +0000
State-Changed-Why:


>Unformatted:
