NetBSD Problem Report #50349

From dholland@macaran.eecs.harvard.edu  Wed Oct 21 01:27:12 2015
Return-Path: <dholland@macaran.eecs.harvard.edu>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (verified OK))
	by mollari.NetBSD.org (Postfix) with ESMTPS id BD242A6531
	for <gnats-bugs@gnats.NetBSD.org>; Wed, 21 Oct 2015 01:27:12 +0000 (UTC)
Message-Id: <20151021000734.0C22C6E246@macaran.eecs.harvard.edu>
Date: Tue, 20 Oct 2015 20:07:33 -0400 (EDT)
From: dholland@eecs.harvard.edu
Reply-To: dholland@netbsd.org
To: gnats-bugs@NetBSD.org
Subject: radeondrmkms vt-switching crash
X-Send-Pr-Version: 3.95

>Number:         50349
>Category:       kern
>Synopsis:       radeondrmkms vt-switching crash
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    riastradh
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Oct 21 01:30:00 +0000 2015
>Closed-Date:    Sat Dec 03 12:51:19 +0000 2016
>Last-Modified:  Sat Dec 03 12:51:19 +0000 2016
>Originator:     David A. Holland
>Release:        NetBSD 7.99.20 (20150727)
>Organization:
>Environment:
System: NetBSD macaran 7.99.20 NetBSD 7.99.20 (MACARAN) #30: Mon Jul 27 20:25:15 EDT 2015 dholland@macaran:/usr/src/sys/arch/amd64/compile/MACARAN amd64
Architecture: x86_64
Machine: amd64
>Description:

Switching VTs a lot (especially between two X servers) seems to
eventually crash the kernel. With three X servers running it seems to
happen sooner, but that may just be because I'm switching more.

The crash is this assertion from common/lib/libc/gen/rb.c:405:
                KASSERT(RB_BLACK_P(grandpa));

(gdb) p grandpa
$1 = (struct rb_node *) 0xfffffe809c7499e8
(gdb) p/x *grandpa
$3 = {rb_nodes = {0xfffffe800efd05e8, 0xfffffe80be2719e8}, 
  rb_info = 0xfffffe818b4521ea}

Curiously, grandpa->rb_info & RB_FLAG_RED is 0... Oh, I bet it's an
earlier value of grandpa, and the value assigned on the previous line
hasn't shown through to gdb yet:
                grandpa = RB_FATHER(father);

(gdb) p father
$4 = (struct rb_node *) 0xfffffe807aa289e8
(gdb) p *father
$5 = {rb_nodes = {0x0, 0xfffffe80be2719e8}, rb_info = 3}

That means grandpa would be NULL, since father's rb_info (3) holds only
flag bits and no parent pointer.
However,
  #define RB_BLACK_P(rb) \
	(RB_SENTINEL_P(rb) || ((rb)->rb_info & RB_FLAG_RED) == 0)
and
  #define RB_SENTINEL_P(rb)	((rb) == NULL)

So in theory it should still pass the assertion... I don't know what's
going on.
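The two macros quoted above can be exercised in isolation. This is a minimal sketch, assuming RB_FLAG_RED is 0x1 and a struct rb_node shaped like the gdb output (the real definitions live in sys/rbtree.h and may differ); rb_black_p is a hypothetical wrapper added only so the macro can be called from a test. It shows that a NULL grandpa is treated as black, so the KASSERT can only fire when grandpa is non-NULL with its red bit set, which fits the theory that gdb is showing a stale grandpa.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal stand-in for struct rb_node, matching the fields gdb printed. */
struct rb_node {
	struct rb_node *rb_nodes[2];
	uintptr_t rb_info;	/* parent pointer | flag bits */
};

/* Assumed flag value; the two predicate macros are quoted from the report. */
#define RB_FLAG_RED		((uintptr_t)0x1)
#define RB_SENTINEL_P(rb)	((rb) == NULL)
#define RB_BLACK_P(rb) \
	(RB_SENTINEL_P(rb) || ((rb)->rb_info & RB_FLAG_RED) == 0)

/* Hypothetical wrapper: NULL short-circuits to "black" before any dereference. */
static int rb_black_p(const struct rb_node *rb)
{
	return RB_BLACK_P(rb);
}
```

For example, rb_black_p(NULL) returns 1, and a node whose rb_info is 3 is reported as red.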

Other locals:

(gdb) p self
$6 = (struct rb_node *) 0xfffffe80be2719e8
(gdb) p/x *self
$8 = {rb_nodes = {0x0, 0x0}, rb_info = 0xfffffe809c7499eb}

(gdb) p *rbt
$13 = {rbt_root = 0xfffffe8124977de8, 
  rbt_ops = 0xffffffff805888c0 <drm_vma_node_rb_ops>, rbt_minmax = {
    0xfffffe81b658e1e8, 0xfffffe817235cde8}}
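The rb_info values above pack a parent pointer and flag bits into one word. A sketch of decoding them, assuming the layout modeled on sys/rbtree.h (bit 0 = RB_FLAG_RED, bit 1 = RB_FLAG_POSITION meaning "right child of parent", remaining bits = parent address); rb_info_decode and struct rb_info_fields are hypothetical helpers, not part of rb.c. Under that assumption, father's rb_info of 3 is a red node with no parent, and self's recorded parent is 0xfffffe809c7499e8, the same address gdb printed for grandpa, whose rb_nodes[1] is self.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed flag layout: low two bits are flags, the rest is the parent pointer. */
#define RB_FLAG_RED		((uint64_t)0x1)
#define RB_FLAG_POSITION	((uint64_t)0x2)
#define RB_FLAG_MASK		(RB_FLAG_RED | RB_FLAG_POSITION)

/* Hypothetical decoded view of one rb_info word. */
struct rb_info_fields {
	uint64_t father;	/* parent node address, 0 if none */
	int red;		/* 1 if the node is red */
	int position;		/* 1 if the node is its parent's right child */
};

static struct rb_info_fields rb_info_decode(uint64_t info)
{
	struct rb_info_fields f;

	f.father = info & ~RB_FLAG_MASK;
	f.red = (info & RB_FLAG_RED) != 0;
	f.position = (info & RB_FLAG_POSITION) != 0;
	return f;
}
```

For example, rb_info_decode(3) yields father 0, red 1, position 1.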

Alas,
(gdb) p uncle
$9 = <optimized out>
(gdb) p which
$10 = <optimized out>
(gdb) p other
$11 = <optimized out>

Stack trace:

drm_ioctl -> radeon_gem_create_ioctl -> radeon_gem_object_create ->
  radeon_bo_create -> ttm_bo_init -> drm_vma_offset_add ->
  rb_tree_insert_node -> rb_tree_insert_rebalance -> kern_assert

#2  0xffffffff8054e1b3 in kern_assert (
    fmt=fmt@entry=0xffffffff805dc380 "kernel %sassertion \"%s\" failed: file \"%s\", line %d ") at ../../../../../../lib/libkern/kern_assert.c:51
#3  0xffffffff8054c4bb in rb_tree_insert_rebalance (rbt=0xffff80000aebbad0, 
    self=0xfffffe80be2719e8)
    at ../../../../../../lib/libkern/../../../common/lib/libc/gen/rb.c:405
#4  rb_tree_insert_node (rbt=rbt@entry=0xffff80000aebbad0, 
    object=object@entry=0xfffffe80be2719b0)
    at ../../../../../../lib/libkern/../../../common/lib/libc/gen/rb.c:301
#5  0xffffffff801bcdfe in drm_vma_offset_add (
    mgr=mgr@entry=0xffff80000aebbac8, node=node@entry=0xfffffe80be2719b0, 
    npages=2) at ../../../../external/bsd/drm2/drm/drm_vma_manager.c:180
#6  0xffffffff8047caf9 in ttm_bo_init (bdev=bdev@entry=0xffff80000aebb760, 
    bo=bo@entry=0xfffffe80be271858, size=size@entry=8192, 
    type=type@entry=ttm_bo_type_device, 
    placement=placement@entry=0xfffffe80be271830, 
    page_alignment=page_alignment@entry=1, 
    interruptible=interruptible@entry=true, 
    persistent_swap_storage=persistent_swap_storage@entry=0x0, 
    acc_size=acc_size@entry=9344, sg=sg@entry=0x0, 
    destroy=destroy@entry=0xffffffff803afc91 <radeon_ttm_bo_destroy>)
    at ../../../../external/bsd/drm2/dist/drm/ttm/ttm_bo.c:1207
#7  0xffffffff803affde in radeon_bo_create (
    rdev=rdev@entry=0xffff80000aebb000, size=size@entry=8192, 
    byte_align=byte_align@entry=4096, kernel=kernel@entry=false, 
    domain=domain@entry=2, sg=sg@entry=0x0, 
    bo_ptr=bo_ptr@entry=0xfffffe80095a7cf0)
    at ../../../../external/bsd/drm2/dist/drm/radeon/radeon_object.c:193
#8  0xffffffff803a59bf in radeon_gem_object_create (
    rdev=rdev@entry=0xffff80000aebb000, size=8192, alignment=4096, 
    initial_domain=2, discardable=discardable@entry=false, 
    kernel=kernel@entry=false, obj=obj@entry=0xfffffe80095a7d50)
    at ../../../../external/bsd/drm2/dist/drm/radeon/radeon_gem.c:69
#9  0xffffffff803a5ead in radeon_gem_create_ioctl (dev=<optimized out>, 
    data=0xfffffe80095a7df8, filp=0xfffffe8181d89048)
    at ../../../../external/bsd/drm2/dist/drm/radeon/radeon_gem.c:258
#10 0xffffffff801ac77a in drm_ioctl (fp=<optimized out>, cmd=<optimized out>, 
    data=0xfffffe80095a7df8) at ../../../../external/bsd/drm2/drm/drm_drv.c:673
  :


radeon0 at pci1 dev 0 function 0: vendor 1002 product 9498 (rev. 0x00)
  :
drm: initializing kernel modesetting (RV730 0x1002:0x9498 0x1787:0x2009).
drm: register mmio base: 0xfbef0000
drm: register mmio size: 65536
drm kern info: ATOM BIOS: RV730PRO
radeon0: info: VRAM: 1024M 0x0000000000000000 - 0x000000003FFFFFFF (1024M used)
radeon0: info: GTT: 1024M 0x0000000040000000 - 0x000000007FFFFFFF
drm: Detected VRAM RAM=400M, BAR=256M
drm: RAM width 128bits DDR
Zone  kernel: Available graphics memory: 2157688 kiB
Zone   dma32: Available graphics memory: 2097152 kiB
drm: radeon: 1024M of VRAM memory ready
drm: radeon: 1024M of GTT memory ready.
drm: Loading RV730 Microcode
drm: Internal thermal controller without fan control
drm: radeon: dpm initialized
drm: GART: num cpu pages 262144, num gpu pages 262144
drm: PCIE GART of 1024M enabled (table at 0x000000000025D000).
radeon0: info: WB enabled
radeon0: info: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0x0xffff80006cd3cc00
radeon0: info: fence driver on ring 3 use gpu addr 0x0000000040000c0c and cpu addr 0x0xffff80006cd3cc0c
radeon0: info: fence driver on ring 5 use gpu addr 0x000000000005c598 and cpu addr 0x0xffff80006c93a598
drm: Supports vblank timestamp caching Rev 2 (21.10.2013).
drm: Driver supports precise vblank timestamp query.
radeon0: interrupting at ioapic0 pin 16 (radeon)
drm: radeon: irq initialized.
drm: ring test on 0 succeeded in 1 usecs
drm: ring test on 3 succeeded in 1 usecs
drm: ring test on 5 succeeded in 1 usecs
drm: UVD initialized successfully.
drm: ib test on ring 0 succeeded in 0 usecs
drm: ib test on ring 3 succeeded in 0 usecs
drm: ib test on ring 5 succeeded
drm: Radeon Display Connectors
drm: Connector 0:
drm:   HDMI-A-1
drm:   HPD2
drm:   DDC: 0x7f10 0x7f10 0x7f14 0x7f14 0x7f18 0x7f18 0x7f1c 0x7f1c
drm:   Encoders:
drm:     DFP2: INTERNAL_UNIPHY1
drm: Connector 1:
drm:   VGA-1
drm:   DDC: 0x7e60 0x7e60 0x7e64 0x7e64 0x7e68 0x7e68 0x7e6c 0x7e6c
drm:   Encoders:
drm:     CRT2: INTERNAL_KLDSCP_DAC2
drm: Connector 2:
drm:   DVI-I-1
drm:   HPD1
drm:   DDC: 0x7e20 0x7e20 0x7e24 0x7e24 0x7e28 0x7e28 0x7e2c 0x7e2c
drm:   Encoders:
drm:     CRT1: INTERNAL_KLDSCP_DAC1
drm:     DFP1: INTERNAL_UNIPHY
radeondrmkmsfb0 at radeon0
radeon0: info: registered panic notifier
radeondrmkmsfb0: framebuffer at 0xffff80006cf5e000, size 1600x1200, depth 32, stride 6400
WARNING: splash_render: not initialized
wsdisplay0 at radeondrmkmsfb0 kbdmux 1: console (default, vt100 emulation), using wskbd0
wsmux1: connecting to wsdisplay0

>How-To-Repeat:

As above: switch VTs repeatedly, especially between two or more X servers.

>Fix:

Unknown. I have a crash dump, so let me know if there's anything else
I should extract.

>Release-Note:

>Audit-Trail:

Responsible-Changed-From-To: kern-bug-people->riastradh
Responsible-Changed-By: riastradh@NetBSD.org
Responsible-Changed-When: Wed, 21 Oct 2015 14:17:05 +0000
Responsible-Changed-Why:
mine

Sounds like unrelated kernel memory corruption from drmkms.
Not the first time we've had a weird problem arising from the
drm_vma_manager business due to a stupid mistake I made in
reimplementing it!


From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/50349 (radeondrmkms vt-switching crash)
Date: Sat, 14 May 2016 17:58:27 +0000

 On Wed, Oct 21, 2015 at 02:17:05PM +0000, riastradh@NetBSD.org wrote:
  > Sounds like unrelated kernel memory corruption from drmkms.
  > Not the first time we've had a weird problem arising from the
  > drm_vma_manager business due to a stupid mistake I made in
  > reimplementing it!

 So after not coming up for a long time, this struck again on
 Thursday... twice. This was with a kernel and userland from March 4.

 On the plus side, given what triggered it the second time I may be
 able to trigger it on purpose, so let me know if there's anything you
 want me to try.

 Otherwise I'll update the kernel and hope :-)

 -- 
 David A. Holland
 dholland@netbsd.org

From: coypu@SDF.ORG
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/50349: radeondrmkms vt-switching crash
Date: Sat, 19 Nov 2016 14:41:35 +0000

 Hi,

 In sys/external/bsd/drm2/drm/drm_vma_manager.c:drm_vma_offset_add, we
 use rw_enter(&node->von_lock, RW_WRITER) to prevent races.

 In sys/external/bsd/drm2/dist/drm/drm_vma_manager.c:drm_vma_offset_add,
 Linux uses write_lock(&mgr->vm_lock);

 Isn't this the wrong lock? Should we lock mgr?

 Thanks.

From: "Maya Rashish" <maya@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/50349 CVS commit: src/sys/external/bsd/drm2/drm
Date: Sat, 19 Nov 2016 17:19:59 +0000

 Module Name:	src
 Committed By:	maya
 Date:		Sat Nov 19 17:19:59 UTC 2016

 Modified Files:
 	src/sys/external/bsd/drm2/drm: drm_vma_manager.c

 Log Message:
 Lock the manager and not just the node for inserting/removing nodes

 should fix/help PR kern/50349: radeondrmkms vt-switching crash

 ok riastradh


 To generate a diff of this commit:
 cvs rdiff -u -r1.4 -r1.5 src/sys/external/bsd/drm2/drm/drm_vma_manager.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: open->pending-pullups
State-Changed-By: maya@NetBSD.org
State-Changed-When: Wed, 23 Nov 2016 22:16:38 +0000
State-Changed-Why:
#1277


From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/50349 CVS commit: [netbsd-7-0] src/sys/external/bsd/drm2/drm
Date: Sat, 3 Dec 2016 12:23:57 +0000

 Module Name:	src
 Committed By:	martin
 Date:		Sat Dec  3 12:23:57 UTC 2016

 Modified Files:
 	src/sys/external/bsd/drm2/drm [netbsd-7-0]: drm_vma_manager.c

 Log Message:
 Pull up following revision(s) (requested by maya in ticket #1277):
 	sys/external/bsd/drm2/drm/drm_vma_manager.c: revision 1.5
 Lock the manager and not just the node for inserting/removing nodes
 should fix/help PR kern/50349: radeondrmkms vt-switching crash
 ok riastradh


 To generate a diff of this commit:
 cvs rdiff -u -r1.1.4.2 -r1.1.4.2.2.1 \
     src/sys/external/bsd/drm2/drm/drm_vma_manager.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/50349 CVS commit: [netbsd-7] src/sys/external/bsd/drm2/drm
Date: Sat, 3 Dec 2016 12:24:50 +0000

 Module Name:	src
 Committed By:	martin
 Date:		Sat Dec  3 12:24:50 UTC 2016

 Modified Files:
 	src/sys/external/bsd/drm2/drm [netbsd-7]: drm_vma_manager.c

 Log Message:
 Pull up following revision(s) (requested by maya in ticket #1277):
 	sys/external/bsd/drm2/drm/drm_vma_manager.c: revision 1.5
 Lock the manager and not just the node for inserting/removing nodes
 should fix/help PR kern/50349: radeondrmkms vt-switching crash
 ok riastradh


 To generate a diff of this commit:
 cvs rdiff -u -r1.1.4.2 -r1.1.4.3 \
     src/sys/external/bsd/drm2/drm/drm_vma_manager.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: pending-pullups->closed
State-Changed-By: maya@NetBSD.org
State-Changed-When: Sat, 03 Dec 2016 12:51:19 +0000
State-Changed-Why:
Pullup complete; we can reopen if I misunderstood and the problem is not resolved.


>Unformatted:
