NetBSD Problem Report #49838

From www@NetBSD.org  Mon Apr 13 10:17:14 2015
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id A71E8A6555
	for <gnats-bugs@gnats.NetBSD.org>; Mon, 13 Apr 2015 10:17:14 +0000 (UTC)
Message-Id: <20150413101713.5350BA65BE@mollari.NetBSD.org>
Date: Mon, 13 Apr 2015 10:17:13 +0000 (UTC)
From: tnn@nygren.pp.se
Reply-To: tnn@nygren.pp.se
To: gnats-bugs@NetBSD.org
Subject: radeon drm2 kernel panic
X-Send-Pr-Version: www-1.0

>Number:         49838
>Category:       kern
>Synopsis:       radeon drm2 kernel panic
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    riastradh
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Apr 13 10:20:00 +0000 2015
>Closed-Date:    Tue Dec 20 05:23:21 +0000 2016
>Last-Modified:  Tue Mar 21 01:10:00 +0000 2017
>Originator:     Tobias Nygren
>Release:        current
>Organization:
>Environment:
7.99.9 amd64 Sun Apr 12 21:37:21 CEST 2015
>Description:
This happened while I was messing with the "teapot" demo from MesaDemos.
The teapot demo itself crashes after a few seconds, so this panic is possibly related to the client receiving an untimely SIGSEGV.

rb_tree_iterate() at netbsd:rb_tree_iterate+0x1a6
drm_vma_offset_remove() at netbsd:drm_vma_offset_remove+0x2d
ttm_bo_unref() at netbsd:ttm_bo_unref+0x79
radeon_bo_unref() at netbsd:radeon_bo_unref+0x3d
radeon_gem_object_free() at netbsd:radeon_gem_object_free+0x1c
drm_gem_object_handle_unreference_unlocked() at netbsd:drm_gem_object_handle_unreference_unlocked+0xf5
drm_gem_handle_delete() at netbsd:drm_gem_handle_delete+0x86
drm_ioctl() at netbsd:drm_ioctl+0x14f
sys_ioctl() at netbsd:sys_ioctl+0x2c5
syscall() at netbsd:syscall+0xb2

>How-To-Repeat:
Using the wip versions of MesaLib and xorg-server, run ./teapot from graphics/MesaDemos repeatedly on a Radeon HD 5450.

teapot crashes after a few seconds with the following backtrace, and sometimes takes the kernel down with it:

#0  0x00007f7ff1528a2f in memcpy () from /usr/lib/libc.so.12
#1  0x00007f7ff077b1a4 in u_upload_data () from /usr/pkg/lib/dri/r600_dri.so
#2  0x00007f7ff0992a87 in r600_set_constant_buffer ()
   from /usr/pkg/lib/dri/r600_dri.so
#3  0x00007f7ff070a5d2 in cso_set_constant_buffer ()
   from /usr/pkg/lib/dri/r600_dri.so
#4  0x00007f7ff05f5aaa in st_upload_constants ()
   from /usr/pkg/lib/dri/r600_dri.so
#5  0x00007f7ff05f588f in st_validate_state ()
   from /usr/pkg/lib/dri/r600_dri.so
#6  0x00007f7ff060a735 in st_draw_vbo () from /usr/pkg/lib/dri/r600_dri.so
#7  0x00007f7ff05e1d10 in vbo_exec_vtx_flush ()
   from /usr/pkg/lib/dri/r600_dri.so
#8  0x00007f7ff05c8aaa in vbo_exec_FlushVertices ()
   from /usr/pkg/lib/dri/r600_dri.so
#9  0x00007f7ff04c81ec in _mesa_set_enable () from /usr/pkg/lib/dri/r600_dri.so
#10 0x0000000000403939 in draw ()
#11 0x00007f7ff741fb4c in processWindowWorkList ()
   from /usr/pkg/lib/libglut.so.3
#12 0x00007f7ff741e6b3 in glutMainLoop () from /usr/pkg/lib/libglut.so.3
#13 0x000000000040327b in main ()

>Fix:
unknown

>Release-Note:

>Audit-Trail:
From: Tobias Nygren <tnn@NetBSD.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/49838: radeon drm2 kernel panic
Date: Mon, 13 Apr 2015 18:28:20 +0200

 > teapot crashes after a few seconds with the following backtrace, and sometimes takes the kernel down with it:

 Found a workaround for the teapot segfault. The r600 driver in MesaLib
 10.x runs a separate worker thread for DRM_IOCTL_RADEON_CS if it
 detects multiprocessor hardware. It gets a segfault when the worker
 thread runs its ioctl while the other thread loads things into the
 graphics pipeline. Not sure whether this is a Mesa bug or a kernel bug.

 Removing this code from Mesa's
 radeon_drm_winsys.c:radeon_drm_winsys_create() makes it not crash any
 more.

 -   if (ws->num_cpus > 1 && debug_get_option_thread())
 -       ws->thread = pipe_thread_create(radeon_drm_cs_emit_ioctl, ws);
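
 The gating condition in the two removed lines can be sketched in
 isolation as below. This is a minimal C sketch, not Mesa's code:
 struct winsys_sketch, cs_emit_worker(), should_spawn_cs_thread(),
 winsys_sketch_init() and winsys_sketch_fini() are hypothetical names,
 the CPU count is assumed to come from sysconf(_SC_NPROCESSORS_ONLN),
 and the worker returns immediately instead of looping over queued
 command streams the way radeon_drm_cs_emit_ioctl would.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical stand-in for the winsys state ("ws") in the report. */
struct winsys_sketch {
    long num_cpus;      /* online CPUs, as reported by sysconf() */
    bool has_thread;    /* true once the submission thread exists */
    pthread_t thread;
};

/* Hypothetical worker.  The real radeon_drm_cs_emit_ioctl would loop,
 * issuing DRM_IOCTL_RADEON_CS for queued command streams; this sketch
 * simply returns so that it can be joined. */
static void *cs_emit_worker(void *arg)
{
    (void)arg;
    return NULL;
}

/* Same shape as the condition removed by the workaround:
 * ws->num_cpus > 1 && debug_get_option_thread() */
static bool should_spawn_cs_thread(long num_cpus, bool thread_option)
{
    return num_cpus > 1 && thread_option;
}

/* Returns 0 on success, -1 if pthread_create() fails. */
static int winsys_sketch_init(struct winsys_sketch *ws, bool thread_option)
{
    assert(ws != NULL);
    ws->num_cpus = sysconf(_SC_NPROCESSORS_ONLN);  /* -1 on error */
    ws->has_thread = false;
    if (should_spawn_cs_thread(ws->num_cpus, thread_option)) {
        if (pthread_create(&ws->thread, NULL, cs_emit_worker, ws) != 0)
            return -1;
        ws->has_thread = true;
    }
    return 0;
}

/* Joins the worker if one was started. */
static void winsys_sketch_fini(struct winsys_sketch *ws)
{
    if (ws->has_thread) {
        pthread_join(ws->thread, NULL);
        ws->has_thread = false;
    }
}
```

 With thread_option false, no second thread is ever created, so every
 submission ioctl is issued from the drawing thread itself; that is the
 single-threaded path the workaround forces.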

Responsible-Changed-From-To: kern-bug-people->riastradh
Responsible-Changed-By: riastradh@NetBSD.org
Responsible-Changed-When: Fri, 25 Mar 2016 18:57:06 +0000
Responsible-Changed-Why:
I'm irresponsible for this one!


From: Rhialto <rhialto@falu.nl>
To: gnats-bugs@NetBSD.org
Cc: rhialto@falu.nl
Subject: Re: kern/49838
Date: Sat, 5 Nov 2016 19:32:17 +0100

 This patch/hack also helps me with the VICE emulator, using the GTK GUI
 with GL for hardware scaling of the screen display.

State-Changed-From-To: open->feedback
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Sat, 26 Nov 2016 07:52:13 +0000
State-Changed-Why:
The trace looks a lot like PR 50349, which we think has now finally been
found and fixed; you might want to give that a try.


From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/49838: radeon drm2 kernel panic
Date: Sat, 26 Nov 2016 10:04:48 +0000

 On Mon, Apr 13, 2015 at 04:30:00PM +0000, Tobias Nygren wrote:
  >  Found a workaround for the teapot segfault. The r600 driver in MesaLib
  >  10.x runs a separate worker thread for DRM_IOCTL_RADEON_CS if it
  >  detects multiprocessor hardware. It gets a segfault when the worker
  >  thread runs its ioctl while the other thread loads things into the
  >  graphics pipeline. Not sure whether this is a Mesa bug or a kernel bug.
  >  
  >  Removing this code from Mesa's
  >  radeon_drm_winsys.c:radeon_drm_winsys_create() makes it not crash any
  >  more.
  >  
  >  -   if (ws->num_cpus > 1 && debug_get_option_thread())
  >  -       ws->thread = pipe_thread_create(radeon_drm_cs_emit_ioctl, ws);

 Also it turns out that one can test this hack a lot more easily than
 by removing/commenting out these lines and recompiling:

    % setenv RADEON_THREAD FALSE
    % ./teapot

 This does not make teapot run for me, but at the moment it does make
 the difference between glxgears running or not.

 (...this is after patching gallium to use pthreads, without which it
 doesn't even load. Go figure.)
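
 The RADEON_THREAD switch works because the option is read from the
 environment at startup rather than compiled in. A minimal sketch of
 that kind of lookup follows; env_bool_option() is a hypothetical
 helper, and the accepted spellings are an assumption, not a copy of
 the parsing behind Mesa's debug_get_option_thread().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <strings.h>

/* Hypothetical helper: interpret an environment variable as a boolean.
 * Unset or unrecognised values fall back to dflt.  The spellings below
 * are an assumption chosen for illustration. */
static bool env_bool_option(const char *name, bool dflt)
{
    static const char *const falses[] = { "false", "no", "n", "f", "0" };
    static const char *const trues[]  = { "true", "yes", "y", "t", "1" };
    const char *v;
    size_t i;

    assert(name != NULL);
    v = getenv(name);
    if (v == NULL)
        return dflt;
    /* Case-insensitive match, so "FALSE" and "false" both disable. */
    for (i = 0; i < sizeof falses / sizeof falses[0]; i++)
        if (strcasecmp(v, falses[i]) == 0)
            return false;
    for (i = 0; i < sizeof trues / sizeof trues[0]; i++)
        if (strcasecmp(v, trues[i]) == 0)
            return true;
    return dflt;
}
```

 Called as env_bool_option("RADEON_THREAD", true), this returns false
 after "setenv RADEON_THREAD FALSE" in csh (or RADEON_THREAD=FALSE in
 sh) and true when the variable is unset, so threading stays on by
 default on multiprocessor machines.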

 -- 
 David A. Holland
 dholland@netbsd.org

From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/49838: radeon drm2 kernel panic
Date: Tue, 29 Nov 2016 23:44:04 +0000

 On Sat, Nov 26, 2016 at 10:05:01AM +0000, David Holland wrote:
  >  (...this after patching gallium to use pthreads, without which it
  >  doesn't even load. go figure.)

 (For the record, this patch has now been committed.)

 -- 
 David A. Holland
 dholland@netbsd.org

State-Changed-From-To: feedback->closed
State-Changed-By: tnn@NetBSD.org
State-Changed-When: Tue, 20 Dec 2016 05:23:21 +0000
State-Changed-Why:
Couldn't reproduce the original problem in 10 tries, so marking as fixed.


From: matthew green <mrg@eterna.com.au>
To: gnats-bugs@NetBSD.org
Cc: riastradh@NetBSD.org, netbsd-bugs@netbsd.org, gnats-admin@netbsd.org,
    tnn@NetBSD.org
Subject: re: kern/49838 (radeon drm2 kernel panic)
Date: Tue, 20 Dec 2016 18:16:24 +1100

 tnn@NetBSD.org writes:
 > Synopsis: radeon drm2 kernel panic
 > 
 > State-Changed-From-To: feedback->closed
 > State-Changed-By: tnn@NetBSD.org
 > State-Changed-When: Tue, 20 Dec 2016 05:23:21 +0000
 > State-Changed-Why:
 > Couldn't reproduce the original problem in 10 tries, so marking as fixed.

 This is most likely the rbtree/lock problem Maya found.
 from the original report:

 > rb_tree_iterate() at netbsd:rb_tree_iterate+0x1a6
 > drm_vma_offset_remove() at netbsd:drm_vma_offset_remove+0x2d
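
 The report does not describe the rbtree/lock fix itself. As a generic
 illustration only, assuming the problem was a tree walk racing with an
 unlocked modification, the usual rule is that lookup and removal share
 one critical section with every other writer. The sketch below uses a
 small array and a pthread mutex instead of the kernel's rb_tree and
 locking primitives; struct offset_mgr, mgr_init(), mgr_add() and
 mgr_remove() are hypothetical names, not the drm_vma_offset_* API.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical offset table standing in for the kernel's tree of mmap
 * offsets; a fixed array keeps the sketch self-contained. */
#define MAX_NODES 64

struct offset_mgr {
    pthread_mutex_t lock;            /* protects count and start[] */
    size_t count;
    unsigned long start[MAX_NODES];
};

static void mgr_init(struct offset_mgr *m)
{
    assert(m != NULL);
    pthread_mutex_init(&m->lock, NULL);
    m->count = 0;
}

static void mgr_destroy(struct offset_mgr *m)
{
    pthread_mutex_destroy(&m->lock);
}

/* Insertion takes the same lock as removal. */
static bool mgr_add(struct offset_mgr *m, unsigned long off)
{
    bool ok = false;

    pthread_mutex_lock(&m->lock);
    if (m->count < MAX_NODES) {
        m->start[m->count++] = off;
        ok = true;
    }
    pthread_mutex_unlock(&m->lock);
    return ok;
}

/* Lookup and removal form one critical section: no other thread can
 * modify the table between finding the entry and unlinking it. */
static bool mgr_remove(struct offset_mgr *m, unsigned long off)
{
    bool found = false;
    size_t i;

    pthread_mutex_lock(&m->lock);
    for (i = 0; i < m->count; i++) {
        if (m->start[i] == off) {
            m->start[i] = m->start[--m->count];  /* swap in the last entry */
            found = true;
            break;
        }
    }
    pthread_mutex_unlock(&m->lock);
    return found;
}
```

 Because mgr_remove() never drops the lock between the search and the
 unlink, two threads deleting handles at the same time cannot observe a
 half-updated structure.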


 .mrg.

From: Paul Goyette <paul@whooppee.com>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/49838
Date: Tue, 21 Mar 2017 09:08:33 +0800 (PHT)

 Just another data point...

 Even with the fix for this PR committed, I'm seeing strange behavior
 from the x11/xlockmore package's xlock image.  In particular, on my
 Radeon HD 3450 video card, every attempt to run "xlock -mode maze" (or
 any other mode!) results in a call to abort(3), several frames deep in
 library calls (starting with libGL).  Running the command under ktrace
 shows that only one lwp/thread exists; however, using the work-around
 mentioned in the PR (setenv RADEON_THREAD FALSE) allows xlock to
 function normally!



 +------------------+--------------------------+------------------------+
 | Paul Goyette     | PGP Key fingerprint:     | E-mail addresses:      |
 | (Retired)        | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com   |
 | Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd.org |
 +------------------+--------------------------+------------------------+

>Unformatted:
