NetBSD Problem Report #50349
From dholland@macaran.eecs.harvard.edu Wed Oct 21 01:27:12 2015
Return-Path: <dholland@macaran.eecs.harvard.edu>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (verified OK))
by mollari.NetBSD.org (Postfix) with ESMTPS id BD242A6531
for <gnats-bugs@gnats.NetBSD.org>; Wed, 21 Oct 2015 01:27:12 +0000 (UTC)
Message-Id: <20151021000734.0C22C6E246@macaran.eecs.harvard.edu>
Date: Tue, 20 Oct 2015 20:07:33 -0400 (EDT)
From: dholland@eecs.harvard.edu
Reply-To: dholland@netbsd.org
To: gnats-bugs@NetBSD.org
Subject: radeondrmkms vt-switching crash
X-Send-Pr-Version: 3.95
>Number: 50349
>Category: kern
>Synopsis: radeondrmkms vt-switching crash
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: riastradh
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Wed Oct 21 01:30:00 +0000 2015
>Closed-Date: Sat Dec 03 12:51:19 +0000 2016
>Last-Modified: Sat Dec 03 12:51:19 +0000 2016
>Originator: David A. Holland
>Release: NetBSD 7.99.20 (20150727)
>Organization:
>Environment:
System: NetBSD macaran 7.99.20 NetBSD 7.99.20 (MACARAN) #30: Mon Jul 27 20:25:15 EDT 2015 dholland@macaran:/usr/src/sys/arch/amd64/compile/MACARAN amd64
Architecture: x86_64
Machine: amd64
>Description:
Switching VTs a lot (especially between two X servers) seems to
eventually crash. With three X servers going it feels like it happens
faster but that may just be from switching more.
The crash is this assertion from common/lib/libc/gen/rb.c:405:
KASSERT(RB_BLACK_P(grandpa));
(gdb) p grandpa
$1 = (struct rb_node *) 0xfffffe809c7499e8
(gdb) p/x *grandpa
$3 = {rb_nodes = {0xfffffe800efd05e8, 0xfffffe80be2719e8},
rb_info = 0xfffffe818b4521ea}
Curiously, grandpa->rb_info & RB_FLAG_RED is 0. I bet gdb is showing an
earlier value of grandpa, and the value assigned on the previous line
hasn't become visible to gdb yet:
grandpa = RB_FATHER(father);
(gdb) p father
$4 = (struct rb_node *) 0xfffffe807aa289e8
(gdb) p *father
$5 = {rb_nodes = {0x0, 0xfffffe80be2719e8}, rb_info = 3}
meaning that grandpa would be NULL.
however,
#define RB_BLACK_P(rb) \
(RB_SENTINEL_P(rb) || ((rb)->rb_info & RB_FLAG_RED) == 0)
and
#define RB_SENTINEL_P(rb) ((rb) == NULL)
so theoretically it should still pass the assertion... I dunno what's
going on.
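The flag arithmetic above can be checked mechanically. What follows is a
minimal sketch, not the code in rb.c. It assumes RB_FLAG_RED is 0x1 and
RB_FLAG_POSITION is 0x2, and that the parent pointer is rb_info with those
two bits masked off; none of that is quoted in this report. The names
father_addr, is_red and check_dump are hypothetical helpers, and the struct
is cut down to the one field the quoted macros read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMED flag values; the report does not quote them. */
#define RB_FLAG_POSITION UINT64_C(0x2)
#define RB_FLAG_RED      UINT64_C(0x1)
#define RB_FLAG_MASK     (RB_FLAG_POSITION | RB_FLAG_RED)

/* Simplified node: only the field the quoted macros look at. */
struct rb_node {
	uint64_t rb_info;
};

/* The two macros as quoted in the report. */
#define RB_SENTINEL_P(rb) ((rb) == NULL)
#define RB_BLACK_P(rb) \
	(RB_SENTINEL_P(rb) || ((rb)->rb_info & RB_FLAG_RED) == 0)

/* Hypothetical helper: parent address packed into rb_info (ASSUMED layout). */
static uint64_t
father_addr(uint64_t rb_info)
{
	return rb_info & ~RB_FLAG_MASK;
}

/* Hypothetical helper: nonzero if the red bit is set. */
static int
is_red(uint64_t rb_info)
{
	return (rb_info & RB_FLAG_RED) != 0;
}

/* Re-checks the values printed by gdb; returns 1 if every check holds. */
static int
check_dump(void)
{
	const struct rb_node *null_grandpa = NULL;
	struct rb_node dumped_grandpa = { UINT64_C(0xfffffe818b4521ea) };

	/* father: rb_info = 3 is red and decodes to a NULL parent. */
	assert(is_red(3) && father_addr(3) == 0);

	/* A NULL grandpa satisfies RB_BLACK_P via RB_SENTINEL_P. */
	assert(RB_BLACK_P(null_grandpa));

	/* The grandpa gdb printed has the red bit clear, so it passes too. */
	assert(RB_BLACK_P(&dumped_grandpa));

	/* self: red, with a parent of 0xfffffe809c7499e8. */
	assert(is_red(UINT64_C(0xfffffe809c7499eb)));
	assert(father_addr(UINT64_C(0xfffffe809c7499eb)) ==
	    UINT64_C(0xfffffe809c7499e8));
	return 1;
}
```

Under those assumptions, both candidate values of grandpa (NULL and the
node gdb printed) satisfy RB_BLACK_P, which matches the puzzlement above.
The parent address packed into self's rb_info is 0xfffffe809c7499e8, the
address gdb printed for grandpa rather than the one it printed for father.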
Other locals:
(gdb) p self
$6 = (struct rb_node *) 0xfffffe80be2719e8
(gdb) p/x *self
$8 = {rb_nodes = {0x0, 0x0}, rb_info = 0xfffffe809c7499eb}
(gdb) p *rbt
$13 = {rbt_root = 0xfffffe8124977de8,
rbt_ops = 0xffffffff805888c0 <drm_vma_node_rb_ops>, rbt_minmax = {
0xfffffe81b658e1e8, 0xfffffe817235cde8}}
alas,
(gdb) p uncle
$9 = <optimized out>
(gdb) p which
$10 = <optimized out>
(gdb) p other
$11 = <optimized out>
stack trace:
drm_ioctl -> radeon_gem_create_ioctl -> radeon_gem_object_create ->
radeon_bo_create -> ttm_bo_init -> drm_vma_offset_add ->
rb_tree_insert_node -> rb_tree_insert_rebalance -> kern_assert
#2 0xffffffff8054e1b3 in kern_assert (
fmt=fmt@entry=0xffffffff805dc380 "kernel %sassertion \"%s\" failed: file \"%s\", line %d ") at ../../../../../../lib/libkern/kern_assert.c:51
#3 0xffffffff8054c4bb in rb_tree_insert_rebalance (rbt=0xffff80000aebbad0,
self=0xfffffe80be2719e8)
at ../../../../../../lib/libkern/../../../common/lib/libc/gen/rb.c:405
#4 rb_tree_insert_node (rbt=rbt@entry=0xffff80000aebbad0,
object=object@entry=0xfffffe80be2719b0)
at ../../../../../../lib/libkern/../../../common/lib/libc/gen/rb.c:301
#5 0xffffffff801bcdfe in drm_vma_offset_add (
mgr=mgr@entry=0xffff80000aebbac8, node=node@entry=0xfffffe80be2719b0,
npages=2) at ../../../../external/bsd/drm2/drm/drm_vma_manager.c:180
#6 0xffffffff8047caf9 in ttm_bo_init (bdev=bdev@entry=0xffff80000aebb760,
bo=bo@entry=0xfffffe80be271858, size=size@entry=8192,
type=type@entry=ttm_bo_type_device,
placement=placement@entry=0xfffffe80be271830,
page_alignment=page_alignment@entry=1,
interruptible=interruptible@entry=true,
persistent_swap_storage=persistent_swap_storage@entry=0x0,
acc_size=acc_size@entry=9344, sg=sg@entry=0x0,
destroy=destroy@entry=0xffffffff803afc91 <radeon_ttm_bo_destroy>)
at ../../../../external/bsd/drm2/dist/drm/ttm/ttm_bo.c:1207
#7 0xffffffff803affde in radeon_bo_create (
rdev=rdev@entry=0xffff80000aebb000, size=size@entry=8192,
byte_align=byte_align@entry=4096, kernel=kernel@entry=false,
domain=domain@entry=2, sg=sg@entry=0x0,
bo_ptr=bo_ptr@entry=0xfffffe80095a7cf0)
at ../../../../external/bsd/drm2/dist/drm/radeon/radeon_object.c:193
#8 0xffffffff803a59bf in radeon_gem_object_create (
rdev=rdev@entry=0xffff80000aebb000, size=8192, alignment=4096,
initial_domain=2, discardable=discardable@entry=false,
kernel=kernel@entry=false, obj=obj@entry=0xfffffe80095a7d50)
at ../../../../external/bsd/drm2/dist/drm/radeon/radeon_gem.c:69
#9 0xffffffff803a5ead in radeon_gem_create_ioctl (dev=<optimized out>,
data=0xfffffe80095a7df8, filp=0xfffffe8181d89048)
at ../../../../external/bsd/drm2/dist/drm/radeon/radeon_gem.c:258
#10 0xffffffff801ac77a in drm_ioctl (fp=<optimized out>, cmd=<optimized out>,
data=0xfffffe80095a7df8) at ../../../../external/bsd/drm2/drm/drm_drv.c:673
:
radeon0 at pci1 dev 0 function 0: vendor 1002 product 9498 (rev. 0x00)
:
drm: initializing kernel modesetting (RV730 0x1002:0x9498 0x1787:0x2009).
drm: register mmio base: 0xfbef0000
drm: register mmio size: 65536
drm kern info: ATOM BIOS: RV730PRO
radeon0: info: VRAM: 1024M 0x0000000000000000 - 0x000000003FFFFFFF (1024M used)
radeon0: info: GTT: 1024M 0x0000000040000000 - 0x000000007FFFFFFF
drm: Detected VRAM RAM=400M, BAR=256M
drm: RAM width 128bits DDR
Zone kernel: Available graphics memory: 2157688 kiB
Zone dma32: Available graphics memory: 2097152 kiB
drm: radeon: 1024M of VRAM memory ready
drm: radeon: 1024M of GTT memory ready.
drm: Loading RV730 Microcode
drm: Internal thermal controller without fan control
drm: radeon: dpm initialized
drm: GART: num cpu pages 262144, num gpu pages 262144
drm: PCIE GART of 1024M enabled (table at 0x000000000025D000).
radeon0: info: WB enabled
radeon0: info: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0x0xffff80006cd3cc00
radeon0: info: fence driver on ring 3 use gpu addr 0x0000000040000c0c and cpu addr 0x0xffff80006cd3cc0c
radeon0: info: fence driver on ring 5 use gpu addr 0x000000000005c598 and cpu addr 0x0xffff80006c93a598
drm: Supports vblank timestamp caching Rev 2 (21.10.2013).
drm: Driver supports precise vblank timestamp query.
radeon0: interrupting at ioapic0 pin 16 (radeon)
drm: radeon: irq initialized.
drm: ring test on 0 succeeded in 1 usecs
drm: ring test on 3 succeeded in 1 usecs
drm: ring test on 5 succeeded in 1 usecs
drm: UVD initialized successfully.
drm: ib test on ring 0 succeeded in 0 usecs
drm: ib test on ring 3 succeeded in 0 usecs
drm: ib test on ring 5 succeeded
drm: Radeon Display Connectors
drm: Connector 0:
drm: HDMI-A-1
drm: HPD2
drm: DDC: 0x7f10 0x7f10 0x7f14 0x7f14 0x7f18 0x7f18 0x7f1c 0x7f1c
drm: Encoders:
drm: DFP2: INTERNAL_UNIPHY1
drm: Connector 1:
drm: VGA-1
drm: DDC: 0x7e60 0x7e60 0x7e64 0x7e64 0x7e68 0x7e68 0x7e6c 0x7e6c
drm: Encoders:
drm: CRT2: INTERNAL_KLDSCP_DAC2
drm: Connector 2:
drm: DVI-I-1
drm: HPD1
drm: DDC: 0x7e20 0x7e20 0x7e24 0x7e24 0x7e28 0x7e28 0x7e2c 0x7e2c
drm: Encoders:
drm: CRT1: INTERNAL_KLDSCP_DAC1
drm: DFP1: INTERNAL_UNIPHY
radeondrmkmsfb0 at radeon0
radeon0: info: registered panic notifier
radeondrmkmsfb0: framebuffer at 0xffff80006cf5e000, size 1600x1200, depth 32, stride 6400
WARNING: splash_render: not initialized
wsdisplay0 at radeondrmkmsfb0 kbdmux 1: console (default, vt100 emulation), using wskbd0
wsmux1: connecting to wsdisplay0
>How-To-Repeat:
as above
>Fix:
Dunno. I have a crash dump so let me know if there's anything else I
should extract.
>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: kern-bug-people->riastradh
Responsible-Changed-By: riastradh@NetBSD.org
Responsible-Changed-When: Wed, 21 Oct 2015 14:17:05 +0000
Responsible-Changed-Why:
mine
Sounds like unrelated kernel memory corruption from drmkms.
Not the first time we've had a weird problem arising from the
drm_vma_manager business due to a stupid mistake I made in
reimplementing it!
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/50349 (radeondrmkms vt-switching crash)
Date: Sat, 14 May 2016 17:58:27 +0000
On Wed, Oct 21, 2015 at 02:17:05PM +0000, riastradh@NetBSD.org wrote:
> Sounds like unrelated kernel memory corruption from drmkms.
> Not the first time we've had a weird problem arising from the
> drm_vma_manager business due to a stupid mistake I made in
> reimplementing it!
So after not coming up for a long time, this struck again on
Thursday... twice. This was with a kernel and userland from March 4.
On the plus side, given what triggered it the second time I may be
able to trigger it on purpose, so let me know if there's anything you
want me to try.
Otherwise I'll update the kernel and hope :-)
--
David A. Holland
dholland@netbsd.org
From: coypu@SDF.ORG
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/50349: radeondrmkms vt-switching crash
Date: Sat, 19 Nov 2016 14:41:35 +0000
Hi,
In sys/external/bsd/drm2/drm/drm_vma_manager.c:drm_vma_offset_add, we
use rw_enter(&node->von_lock, RW_WRITER); to prevent races.
In sys/external/bsd/drm2/dist/drm/drm_vma_manager.c:drm_vma_offset_add,
Linux uses write_lock(&mgr->vm_lock);
Isn't this the wrong lock? Should we lock mgr instead?
Thanks.
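To illustrate the concern, here is a toy, single-threaded model of why a
per-node lock cannot serialize updates to a tree shared by every node. It
is only a sketch under stated assumptions: toy_lock, toy_manager, toy_node
and all the function names are hypothetical, the "lock" is a plain flag
rather than a krwlock, and nothing here is taken from drm_vma_manager.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy exclusive lock: records only whether somebody holds it. */
struct toy_lock {
	bool held;
};

/* Try to take the lock; returns false if it is already held. */
static bool
toy_tryenter(struct toy_lock *l)
{
	if (l->held)
		return false;
	l->held = true;
	return true;
}

static void
toy_exit(struct toy_lock *l)
{
	l->held = false;
}

/* Hypothetical stand-ins for the manager and its nodes. */
struct toy_manager {
	struct toy_lock mgr_lock;	/* one lock for the shared tree */
	int writers_in_tree;		/* writers currently mid-update */
};

struct toy_node {
	struct toy_lock node_lock;	/* one lock per node */
};

/* Begin a tree update guarded only by the node's own lock. */
static bool
begin_update_node_locked(struct toy_manager *m, struct toy_node *n)
{
	if (!toy_tryenter(&n->node_lock))
		return false;
	m->writers_in_tree++;
	return true;
}

/* Begin a tree update guarded by the manager-wide lock. */
static bool
begin_update_mgr_locked(struct toy_manager *m)
{
	if (!toy_tryenter(&m->mgr_lock))
		return false;
	m->writers_in_tree++;
	return true;
}

/* Writers inside the tree at once when two different nodes are added. */
static int
overlap_with_node_locks(void)
{
	struct toy_manager m = { { false }, 0 };
	struct toy_node a = { { false } }, b = { { false } };
	int peak;

	(void)begin_update_node_locked(&m, &a);
	(void)begin_update_node_locked(&m, &b);	/* different lock: succeeds */
	peak = m.writers_in_tree;
	toy_exit(&a.node_lock);
	toy_exit(&b.node_lock);
	return peak;
}

/* Same scenario, but both updates must take the manager lock. */
static int
overlap_with_mgr_lock(void)
{
	struct toy_manager m = { { false }, 0 };
	int peak;

	(void)begin_update_mgr_locked(&m);
	(void)begin_update_mgr_locked(&m);	/* same lock: refused */
	peak = m.writers_in_tree;
	toy_exit(&m.mgr_lock);
	return peak;
}
```

In this model overlap_with_node_locks() lets two writers into the shared
tree at once, because each holds a different lock, while
overlap_with_mgr_lock() admits only one. A real kernel lock would block
the second writer instead of refusing it, but the exclusion property is
the same.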
From: "Maya Rashish" <maya@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/50349 CVS commit: src/sys/external/bsd/drm2/drm
Date: Sat, 19 Nov 2016 17:19:59 +0000
Module Name: src
Committed By: maya
Date: Sat Nov 19 17:19:59 UTC 2016
Modified Files:
src/sys/external/bsd/drm2/drm: drm_vma_manager.c
Log Message:
Lock the manager and not just the node for inserting/removing nodes
should fix/help PR kern/50349: radeondrmkms vt-switching crash
ok riastradh
To generate a diff of this commit:
cvs rdiff -u -r1.4 -r1.5 src/sys/external/bsd/drm2/drm/drm_vma_manager.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: open->pending-pullups
State-Changed-By: maya@NetBSD.org
State-Changed-When: Wed, 23 Nov 2016 22:16:38 +0000
State-Changed-Why:
#1277
From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/50349 CVS commit: [netbsd-7-0] src/sys/external/bsd/drm2/drm
Date: Sat, 3 Dec 2016 12:23:57 +0000
Module Name: src
Committed By: martin
Date: Sat Dec 3 12:23:57 UTC 2016
Modified Files:
src/sys/external/bsd/drm2/drm [netbsd-7-0]: drm_vma_manager.c
Log Message:
Pull up following revision(s) (requested by maya in ticket #1277):
sys/external/bsd/drm2/drm/drm_vma_manager.c: revision 1.5
Lock the manager and not just the node for inserting/removing nodes
should fix/help PR kern/50349: radeondrmkms vt-switching crash
ok riastradh
To generate a diff of this commit:
cvs rdiff -u -r1.1.4.2 -r1.1.4.2.2.1 \
src/sys/external/bsd/drm2/drm/drm_vma_manager.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/50349 CVS commit: [netbsd-7] src/sys/external/bsd/drm2/drm
Date: Sat, 3 Dec 2016 12:24:50 +0000
Module Name: src
Committed By: martin
Date: Sat Dec 3 12:24:50 UTC 2016
Modified Files:
src/sys/external/bsd/drm2/drm [netbsd-7]: drm_vma_manager.c
Log Message:
Pull up following revision(s) (requested by maya in ticket #1277):
sys/external/bsd/drm2/drm/drm_vma_manager.c: revision 1.5
Lock the manager and not just the node for inserting/removing nodes
should fix/help PR kern/50349: radeondrmkms vt-switching crash
ok riastradh
To generate a diff of this commit:
cvs rdiff -u -r1.1.4.2 -r1.1.4.3 \
src/sys/external/bsd/drm2/drm/drm_vma_manager.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: pending-pullups->closed
State-Changed-By: maya@NetBSD.org
State-Changed-When: Sat, 03 Dec 2016 12:51:19 +0000
State-Changed-Why:
pullup complete - we can reopen if I misunderstood and the problem is not resolved.
>Unformatted:
Copyright © 1994-2014
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.