NetBSD Problem Report #49553
From www@NetBSD.org Sat Jan 10 11:18:15 2015
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (verified OK))
by mollari.NetBSD.org (Postfix) with ESMTPS id 3405BA582D
for <gnats-bugs@gnats.NetBSD.org>; Sat, 10 Jan 2015 11:18:15 +0000 (UTC)
Message-Id: <20150110111813.2829BA654D@mollari.NetBSD.org>
Date: Sat, 10 Jan 2015 11:18:13 +0000 (UTC)
From: prlw1@cam.ac.uk
Reply-To: prlw1@cam.ac.uk
To: gnats-bugs@NetBSD.org
Subject: Xorg hits 100% CPU on shutdown
X-Send-Pr-Version: www-1.0
>Number: 49553
>Category: xsrc
>Synopsis: Xorg hits 100% CPU on shutdown
>Confidential: no
>Severity: critical
>Priority: medium
>Responsible: xsrc-manager
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sat Jan 10 11:20:00 +0000 2015
>Closed-Date: Tue Jun 07 14:45:57 +0000 2016
>Last-Modified: Tue Jun 07 14:45:57 +0000 2016
>Originator: Patrick Welche
>Release: NetBSD-7.99.4/amd64 2015-01-05
>Organization:
>Environment:
Seen on Ivy Bridge, Sandy Bridge and Pineview systems.
>Description:
On shutdown, Xorg loops ad infinitum at line 70 in sna_threads.c
xsrc/external/mit/xf86-video-intel/dist/src/sna/sna_threads.c
68      while (1) {
69              while (t->func == NULL)
70                      pthread_cond_wait(&t->cond, &t->mutex);
71              pthread_mutex_unlock(&t->mutex);
72
73              assert(t->func);
74              t->func(t->arg);
75
76              pthread_mutex_lock(&t->mutex);
77              t->arg = NULL;
78              t->func = NULL;
79              pthread_cond_signal(&t->cond);
80      }
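The handoff in the quoted loop can be followed more easily in a minimal, self-contained sketch of the same one-slot protocol. This is not the driver's code: struct worker, run_on_worker() and demo_handoff() are hypothetical names, and the dispatcher side is an assumption about how such a loop is typically driven. Once the last job has completed, the worker is left blocked at the equivalent of line 70 with nothing to wake it, and nothing in this protocol ever asks it to exit.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for the per-thread state used by
 * the quoted loop: one mutex, one condition variable, one work slot. */
struct worker {
	pthread_t thread;
	pthread_mutex_t mutex;
	pthread_cond_t cond;
	void (*func)(void *);
	void *arg;
};

/* Worker side: same shape as quoted lines 68-80. */
static void *worker_main(void *data)
{
	struct worker *t = data;

	pthread_mutex_lock(&t->mutex);
	for (;;) {
		/* Sleep until the dispatcher posts a job (cf. line 70). */
		while (t->func == NULL)
			pthread_cond_wait(&t->cond, &t->mutex);
		pthread_mutex_unlock(&t->mutex);

		assert(t->func);
		t->func(t->arg);

		/* Empty the slot and wake the waiting dispatcher. */
		pthread_mutex_lock(&t->mutex);
		t->arg = NULL;
		t->func = NULL;
		pthread_cond_signal(&t->cond);
	}
	return NULL; /* not reached */
}

/* Dispatcher side (assumed): post one job, block until it is done. */
static void run_on_worker(struct worker *t, void (*func)(void *), void *arg)
{
	pthread_mutex_lock(&t->mutex);
	t->arg = arg;
	t->func = func;
	pthread_cond_signal(&t->cond);
	while (t->func != NULL)
		pthread_cond_wait(&t->cond, &t->mutex);
	pthread_mutex_unlock(&t->mutex);
}

static void add_one(void *p)
{
	++*(int *)p;
}

/* Runs n jobs through one fresh worker; returns the counter, or -1. */
static int demo_handoff(int n)
{
	struct worker *t = calloc(1, sizeof(*t));
	int counter = 0;

	if (t == NULL)
		return -1;
	pthread_mutex_init(&t->mutex, NULL);
	pthread_cond_init(&t->cond, NULL);
	if (pthread_create(&t->thread, NULL, worker_main, t) != 0)
		return -1;
	for (int i = 0; i < n; i++)
		run_on_worker(t, add_one, &counter);
	/* The worker is now parked in pthread_cond_wait() for good; nothing
	 * here asks it to exit, so t is deliberately leaked and only process
	 * exit tears the thread down. */
	return counter;
}
```

For example, demo_handoff(5) runs five jobs through one worker thread and returns 5, leaving that thread parked.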
In http://mail-index.netbsd.org/current-users/2015/01/09/msg026453.html
Martin reports similar behaviour for radeon r600.
My vague impression is that I have been seeing this since
revision 1.9
date: 2015-01-01 01:15:43 +0000; author: mrg; state: Exp; lines: +3 -3; commitid: 1Hjcx4xU3q0p6g4y;
due to hangs seen by several folks, for now revert:
http://mail-index.netbsd.org/source-changes/2014/11/04/msg060120.html
Log Message:
This code should be MP-safe. Use IPL_SCHED in place of IPL_DRM/IPL_VM and set
D_MPSAFE flag in cdevsw.
but I am loath to revert to before that, as the running system has been far more stable since!
>How-To-Repeat:
Run X, then e.g. shutdown -r now
(It seems to happen less often when running xsrc compiled with DBG=-g -O0)
>Fix:
>Release-Note:
>Audit-Trail:
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: xsrc/49553: Xorg hits 100% CPU on shutdown
Date: Mon, 12 Jan 2015 09:02:51 +0100
My case may be a bit different, and it also happens on non-DRMKMS kernels.
It all started with the latest Xorg updates.
My Xorg server sits in "a loop" here:
Attaching to process 1656
Reading symbols from /usr/X11R7/bin/Xorg...Reading symbols from /usr/libdata/debug//usr/X11R7/bin/Xorg.debug...done.
[..]
Loaded symbols for /usr/libexec/ld.elf_so
arena_dalloc (ptr=0x7f7ff7b0e140, chunk=0x7f7ff7b00000, arena=0x7f7ff7fdd000)
at /usr/src/lib/libc/stdlib/jemalloc.c:2542
2542 size = bin->reg_size;
(gdb) bt
#0 arena_dalloc (ptr=0x7f7ff7b0e140, chunk=0x7f7ff7b00000,
arena=0x7f7ff7fdd000) at /usr/src/lib/libc/stdlib/jemalloc.c:2542
#1 idalloc (ptr=ptr@entry=0x7f7ff7b0e140)
at /usr/src/lib/libc/stdlib/jemalloc.c:3219
#2 0x00007f7ff42b4153 in free (ptr=0x7f7ff7b0e140)
at /usr/src/lib/libc/stdlib/jemalloc.c:3901
#3 0x000000000055321b in _XSERVTransClose ()
#4 0x000000000054dcb1 in CloseWellKnownConnections ()
#5 0x0000000000548649 in AbortServer ()
#6 0x00000000005489fb in FatalError ()
#7 0x000000000054ba38 in ?? ()
#8 0x00007f7ff429fbe0 in _opendir (name=<optimized out>)
at /usr/src/lib/libc/gen/opendir.c:72
#9 0x000000010000000b in ?? ()
#10 0x0000000000000000 in ?? ()
(gdb) p *bin
Cannot access memory at address 0x1f7064001f6063e
(gdb) p bin
$1 = (arena_bin_t *) 0x1f7064001f6063e
A broken signal handler? Unfortunately, this hides the original
issue.
Martin
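In the backtrace above, free() is reached from FatalError() via AbortServer(), apparently while the server is handling a signal. POSIX does not list malloc() or free() among the async-signal-safe functions, so if the signal interrupted the allocator, re-entering it from the handler could walk inconsistent heap state. As a general illustration only (not the X server's actual handler), here is a minimal sketch of the usual defensive pattern for an asynchronous signal such as SIGTERM: the handler only records that the signal arrived, and heap cleanup happens later in normal context. on_term(), install_term_handler() and poll_shutdown() are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>

/* Written by the handler, read by the main loop; volatile sig_atomic_t
 * is the classic type C guarantees a handler may safely write. */
static volatile sig_atomic_t shutdown_requested;

static void on_term(int signo)
{
	(void)signo;
	/* No malloc/free/stdio here: only record that the signal arrived. */
	shutdown_requested = 1;
}

static int install_term_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_term;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGTERM, &sa, NULL);
}

/* Hypothetical main-loop step: heap cleanup happens here, in normal
 * context, never inside on_term(). Returns 1 once it has shut down. */
static int poll_shutdown(char **buffer)
{
	if (!shutdown_requested)
		return 0;
	free(*buffer);
	*buffer = NULL;
	return 1;
}
```

This deferral does not carry over to a synchronous fault such as SIGSEGV: returning from the handler after a real fault is undefined behaviour (in practice the faulting instruction usually runs again), so a handler for that case would typically restrict itself to calls such as write() and _exit().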
From: Michael van Elst <mlelstv@serpens.de>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: xsrc/49553 Xorg hits 100% CPU on shutdown
Date: Sun, 8 Feb 2015 01:13:40 +0100
I see a very similar backtrace here, but it is not the first failure.
When you run X in a debugger, or when you disable its SIGSEGV handler
with Option "NoTrapSignals" "true", you can see that the first failure
is a segfault in atexit(), where it tries to call a registered handler
at an unmapped address.
But while X is running, the address is mapped to the shared object
gallium_dri.so.0.
It looks like the gallium driver is unloaded without unregistering
(or calling?) the atexit handler.
Greetings,
--
Michael van Elst
Internet: mlelstv@serpens.de
"A potential Snark may lurk in every tree."
From: Patrick Welche <prlw1@cam.ac.uk>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: xsrc/49553 Xorg hits 100% CPU on shutdown
Date: Thu, 19 Feb 2015 09:54:18 +0000
In view of Michael's observations, I have been running X on a Sandy Bridge
with
Section "ServerFlags"
Option "NoTrapSignals" "true"
EndSection
for the last ten days, but I haven't seen the gallium issue, so perhaps
there are two different issues?
(Total subjectivity: I have a "feeling" that I see the issue less often
on the Sandy Bridge and more often on the Ivy Bridge.)
Suggestion from Chris Wilson in
http://lists.freedesktop.org/archives/intel-gfx/2015-January/059088.html:
The thread should be destroyed along with the parent process on
termination. I guess your pthreads implementation prevents that? If so,
and that is desirable, you will need to call pthread_kill() in the
Xorg driver destructor - only there is not a suitable callback, so you
would need to hack something into the resource system or add a counter.
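Chris Wilson's suggestion is to tear the worker thread down explicitly from a driver destructor, using pthread_kill(). The minimal sketch below swaps in a cooperative alternative instead of pthread_kill(): a quit flag checked under the same mutex as the work slot, a broadcast to wake the worker, and pthread_join() so that teardown returns only after the thread has exited. struct sworker, sworker_stop() and demo_stop() are hypothetical names, and where such a call would be hooked into the Xorg driver is exactly the missing callback the quote refers to.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical worker state with an explicit shutdown request. */
struct sworker {
	pthread_t thread;
	pthread_mutex_t mutex;
	pthread_cond_t cond;
	void (*func)(void *);
	void *arg;
	bool quit;
};

static void *sworker_main(void *data)
{
	struct sworker *t = data;

	pthread_mutex_lock(&t->mutex);
	for (;;) {
		/* Wake for new work or for a shutdown request. */
		while (t->func == NULL && !t->quit)
			pthread_cond_wait(&t->cond, &t->mutex);
		if (t->func == NULL)
			break;	/* quit requested and no job pending */
		pthread_mutex_unlock(&t->mutex);

		t->func(t->arg);

		pthread_mutex_lock(&t->mutex);
		t->arg = NULL;
		t->func = NULL;
		pthread_cond_broadcast(&t->cond);
	}
	pthread_mutex_unlock(&t->mutex);
	return NULL;
}

/* What a (hypothetical) driver teardown hook would call. */
static void sworker_stop(struct sworker *t)
{
	pthread_mutex_lock(&t->mutex);
	t->quit = true;
	pthread_cond_broadcast(&t->cond);
	pthread_mutex_unlock(&t->mutex);
	pthread_join(t->thread, NULL);	/* returns once the thread has exited */
	pthread_cond_destroy(&t->cond);
	pthread_mutex_destroy(&t->mutex);
}

static void bump(void *p)
{
	++*(int *)p;
}

/* Runs n jobs, then stops the worker; returns the counter, or -1. */
static int demo_stop(int n)
{
	struct sworker t = { .quit = false };
	int counter = 0;

	pthread_mutex_init(&t.mutex, NULL);
	pthread_cond_init(&t.cond, NULL);
	if (pthread_create(&t.thread, NULL, sworker_main, &t) != 0)
		return -1;
	for (int i = 0; i < n; i++) {
		pthread_mutex_lock(&t.mutex);
		t.arg = &counter;
		t.func = bump;
		pthread_cond_broadcast(&t.cond);
		while (t.func != NULL)
			pthread_cond_wait(&t.cond, &t.mutex);
		pthread_mutex_unlock(&t.mutex);
	}
	sworker_stop(&t);
	return counter;
}
```

Because the worker breaks out only when t->func is NULL, a job that was already posted still runs before the thread exits, and nothing is left parked in pthread_cond_wait() at process exit.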
State-Changed-From-To: open->feedback
State-Changed-By: snj@NetBSD.org
State-Changed-When: Tue, 02 Jun 2015 20:58:43 +0000
State-Changed-Why:
I can no longer reproduce this. Is it still broken for you?
From: "John D. Baker" <jdbaker@mylinuxisp.com>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: xsrc/49553 (Xorg hits 100% CPU on shutdown)
Date: Sun, 7 Jun 2015 23:23:43 -0500 (CDT)
Not a party to original PR, but interested.
I still see this occasionally on machines which lack any DRM support
at all and on machines which still use (and can use) UMS DRM support.
If there is no X session, I try to stop 'xdm' first. If that hangs
waiting for Xorg to exit, I hit Ctrl-C, kill Xorg explicitly (it needs
-9), and can then shut down or reboot the machine.
Usually on such machines, adding:
pkill -KILL -f "X :0"
to the 'TakeConsole' script is also needed to get a session to terminate
when the user's session-manager program terminates.
On the two machines I've seen successfully use KMS DRM, such
hangs no longer occur.
--
|/"\ John D. Baker, KN5UKS NetBSD Darwin/MacOS X
|\ / jdbaker[snail]mylinuxisp[flyspeck]com OpenBSD FreeBSD
| X No HTML/proprietary data in email. BSD just sits there and works!
|/ \ GPGkeyID: D703 4A7E 479F 63F8 D3F4 BD99 9572 8F23 E4AD 1645
From: Patrick Welche <prlw1@cam.ac.uk>
To: gnats-bugs@NetBSD.org
Cc: xsrc-manager@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: xsrc/49553 (Xorg hits 100% CPU on shutdown)
Date: Sun, 14 Jun 2015 08:23:38 +0100
I still see the 100% CPU usage on shutdown on the two computers
I just tested -current/amd64 on: a sandy bridge laptop, and a Dell
Optiplex 620 (Intel 82945G).
Option "NoTrapSignals" "true"
is a good workaround.
State-Changed-From-To: feedback->open
State-Changed-By: prlw1@NetBSD.org
State-Changed-When: Sat, 20 Jun 2015 10:12:11 +0000
State-Changed-Why:
feedback provided
State-Changed-From-To: open->closed
State-Changed-By: prlw1@NetBSD.org
State-Changed-When: Tue, 07 Jun 2016 14:45:57 +0000
State-Changed-Why:
I can no longer reproduce this.
>Unformatted:
Copyright © 1994-2014
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.