NetBSD Problem Report #49553

From www@NetBSD.org  Sat Jan 10 11:18:15 2015
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (verified OK))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 3405BA582D
	for <gnats-bugs@gnats.NetBSD.org>; Sat, 10 Jan 2015 11:18:15 +0000 (UTC)
Message-Id: <20150110111813.2829BA654D@mollari.NetBSD.org>
Date: Sat, 10 Jan 2015 11:18:13 +0000 (UTC)
From: prlw1@cam.ac.uk
Reply-To: prlw1@cam.ac.uk
To: gnats-bugs@NetBSD.org
Subject: Xorg hits 100% CPU on shutdown
X-Send-Pr-Version: www-1.0

>Number:         49553
>Category:       xsrc
>Synopsis:       Xorg hits 100% CPU on shutdown
>Confidential:   no
>Severity:       critical
>Priority:       medium
>Responsible:    xsrc-manager
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Jan 10 11:20:00 +0000 2015
>Closed-Date:    Tue Jun 07 14:45:57 +0000 2016
>Last-Modified:  Tue Jun 07 14:45:57 +0000 2016
>Originator:     Patrick Welche
>Release:        NetBSD-7.99.4/amd64 2015-01-05
>Organization:
>Environment:
Seen on ivy bridge, sandy bridge and pineview systems.
>Description:
On shutdown, Xorg loops ad infinitum at line 70 in sna_threads.c

xsrc/external/mit/xf86-video-intel/dist/src/sna/sna_threads.c

    68          while (1) {
    69                  while (t->func == NULL)
    70                          pthread_cond_wait(&t->cond, &t->mutex);
    71                  pthread_mutex_unlock(&t->mutex);
    72  
    73                  assert(t->func);
    74                  t->func(t->arg);
    75  
    76                  pthread_mutex_lock(&t->mutex);
    77                  t->arg = NULL;
    78                  t->func = NULL;
    79                  pthread_cond_signal(&t->cond);
    80          }

In http://mail-index.netbsd.org/current-users/2015/01/09/msg026453.html
Martin reports similar behaviour for radeon r600.


My woolly impression is that I have been seeing this since

  revision 1.9
  date: 2015-01-01 01:15:43 +0000;  author: mrg;  state: Exp;  lines: +3 -3;  commitid: 1Hjcx4xU3q0p6g4y;
  due to hangs seen by several folks, for now revert:
  http://mail-index.netbsd.org/source-changes/2014/11/04/msg060120.html

  Log Message:
  This code should be MP-safe. Use IPL_SCHED in place of IPL_DRM/IPL_VM and set
  D_MPSAFE flag in cdevsw.

but I am loath to revert to before that, as the running system has been far more stable since!

>How-To-Repeat:
Run X, and e.g. shutdown -r now
(It seems to happen less often when running xsrc compiled with DBG=-g -O0)
>Fix:

>Release-Note:

>Audit-Trail:
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: xsrc/49553: Xorg hits 100% CPU on shutdown
Date: Mon, 12 Jan 2015 09:02:51 +0100

 My case may be a bit different, and it also happens on non-DRMKMS kernels.
 It all started with the latest Xorg updates.

 My Xorg server sits in "a loop" here:

 Attaching to process 1656
 Reading symbols from /usr/X11R7/bin/Xorg...Reading symbols from /usr/libdata/debug//usr/X11R7/bin/Xorg.debug...done.
 [..]
 Loaded symbols for /usr/libexec/ld.elf_so
 arena_dalloc (ptr=0x7f7ff7b0e140, chunk=0x7f7ff7b00000, arena=0x7f7ff7fdd000)
     at /usr/src/lib/libc/stdlib/jemalloc.c:2542
 2542			size = bin->reg_size;
 (gdb) bt
 #0  arena_dalloc (ptr=0x7f7ff7b0e140, chunk=0x7f7ff7b00000, 
     arena=0x7f7ff7fdd000) at /usr/src/lib/libc/stdlib/jemalloc.c:2542
 #1  idalloc (ptr=ptr@entry=0x7f7ff7b0e140)
     at /usr/src/lib/libc/stdlib/jemalloc.c:3219
 #2  0x00007f7ff42b4153 in free (ptr=0x7f7ff7b0e140)
     at /usr/src/lib/libc/stdlib/jemalloc.c:3901
 #3  0x000000000055321b in _XSERVTransClose ()
 #4  0x000000000054dcb1 in CloseWellKnownConnections ()
 #5  0x0000000000548649 in AbortServer ()
 #6  0x00000000005489fb in FatalError ()
 #7  0x000000000054ba38 in ?? ()
 #8  0x00007f7ff429fbe0 in _opendir (name=<optimized out>)
     at /usr/src/lib/libc/gen/opendir.c:72
 #9  0x000000010000000b in ?? ()
 #10 0x0000000000000000 in ?? ()
 (gdb) p *bin
 Cannot access memory at address 0x1f7064001f6063e
 (gdb) p bin
 $1 = (arena_bin_t *) 0x1f7064001f6063e


 A broken signal handler? Unfortunately the original issue is hidden this
 way.

 Martin

From: Michael van Elst <mlelstv@serpens.de>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: xsrc/49553 Xorg hits 100% CPU on shutdown
Date: Sun, 8 Feb 2015 01:13:40 +0100

 I see a very similar backtrace here, but this is not the first issue.

 When you run X in a debugger, or when you disable its SIGSEGV handler
 with Option "NoTrapSignals" "true", then you see that the first thing
 is a segfault in atexit() where it tries to call a registered handler
 at an unmapped address.

 But while X is running, the address is mapped to the shared object
 gallium_dri.so.0.

 Looks like the gallium driver is unloaded without unregistering
 (or calling?) the atexit handler.


 Greetings,
 -- 
                                 Michael van Elst
 Internet: mlelstv@serpens.de
                                 "A potential Snark may lurk in every tree."

From: Patrick Welche <prlw1@cam.ac.uk>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: xsrc/49553 Xorg hits 100% CPU on shutdown
Date: Thu, 19 Feb 2015 09:54:18 +0000

 In view of Michael's observations, I have been running X on a sandy bridge
 with

   Section "ServerFlags"
           Option          "NoTrapSignals" "true"
   EndSection

 for the last ten days, but haven't seen the gallium issue, so there seem
 to be two different issues?

 (Total subjectivity: I have a "feeling" that I see the issue less often
 on the sandy bridge and more often on the ivy bridge)

 Suggestion from Chris Wilson in
 http://lists.freedesktop.org/archives/intel-gfx/2015-January/059088.html

   The thread should be destroyed along with the parent process on
   termination. I guess your pthreads implementation prevents that? If so,
   and that is desirable, you will need to call pthread_kill() in the
   Xorg driver destructor - only there is not a suitable callback, so you
   would need to hack something into the resource system or add a counter.
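
As an alternative to the pthread_kill() suggestion above, a worker loop of the kind quoted in the description can be given an explicit stop request so that it can be joined before the process exits. The following is a minimal sketch under that assumption, not the xf86-video-intel code: struct worker, its stop field, and the worker_start, worker_submit, worker_stop and increment_int names are all hypothetical.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical worker state; the "stop" flag is the addition in this sketch. */
struct worker {
	pthread_t thread;
	pthread_mutex_t mutex;
	pthread_cond_t cond;
	void (*func)(void *);
	void *arg;
	bool stop;
};

static void *worker_main(void *data)
{
	struct worker *w = data;

	pthread_mutex_lock(&w->mutex);
	for (;;) {
		void (*func)(void *);
		void *arg;

		/* Sleep until there is work or a stop request. */
		while (w->func == NULL && !w->stop)
			pthread_cond_wait(&w->cond, &w->mutex);
		if (w->func == NULL)	/* stop requested, nothing pending */
			break;
		func = w->func;
		arg = w->arg;
		pthread_mutex_unlock(&w->mutex);

		func(arg);

		pthread_mutex_lock(&w->mutex);
		w->func = NULL;
		w->arg = NULL;
		pthread_cond_broadcast(&w->cond);
	}
	pthread_mutex_unlock(&w->mutex);
	return NULL;
}

int worker_start(struct worker *w)
{
	w->func = NULL;
	w->arg = NULL;
	w->stop = false;
	pthread_mutex_init(&w->mutex, NULL);
	pthread_cond_init(&w->cond, NULL);
	return pthread_create(&w->thread, NULL, worker_main, w);
}

/* Hand one job to the worker, waiting for any previous job to finish. */
void worker_submit(struct worker *w, void (*func)(void *), void *arg)
{
	pthread_mutex_lock(&w->mutex);
	while (w->func != NULL)
		pthread_cond_wait(&w->cond, &w->mutex);
	w->func = func;
	w->arg = arg;
	pthread_cond_broadcast(&w->cond);
	pthread_mutex_unlock(&w->mutex);
}

/* Request a stop, wake the worker, and wait for it to exit. */
int worker_stop(struct worker *w)
{
	int rc;

	pthread_mutex_lock(&w->mutex);
	w->stop = true;
	pthread_cond_broadcast(&w->cond);
	pthread_mutex_unlock(&w->mutex);
	rc = pthread_join(w->thread, NULL);
	pthread_mutex_destroy(&w->mutex);
	pthread_cond_destroy(&w->cond);
	return rc;
}

/* Example job used to exercise the sketch. */
void increment_int(void *arg)
{
	++*(int *)arg;
}
```

With this shape a teardown path would call worker_stop() once per thread. A job already handed over still runs before the thread exits, and the wait loop can no longer be the last thing left running at shutdown. Where such a call would hook into the server is the open question raised in the quoted suggestion.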

State-Changed-From-To: open->feedback
State-Changed-By: snj@NetBSD.org
State-Changed-When: Tue, 02 Jun 2015 20:58:43 +0000
State-Changed-Why:
I can no longer reproduce this.  Is it still broken for you?


From: "John D. Baker" <jdbaker@mylinuxisp.com>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: xsrc/49553 (Xorg hits 100% CPU on shutdown)
Date: Sun, 7 Jun 2015 23:23:43 -0500 (CDT)

 Not a party to the original PR, but interested.

 I still see this occasionally on machines which lack any DRM support
 at all and on machines which still use (and can use) UMS DRM support.

 If there is no X session, I try to stop 'xdm' first and if that hangs
 waiting on Xorg to exit, I can hit Ctrl-C and kill Xorg explicitly (need
 -9) and then I can shutdown/reboot the machine.

 Usually on such machines, adding:

   pkill -KILL -f "X :0"

 to the 'TakeConsole' script is also needed to get a session to terminate
 when the user's session-manager program terminates.

 On the few machines (2) I've observed to successfully use KMS DRM, such
 hangs no longer occur.
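
The manual escalation described above (a plain kill is not enough and -9 is needed) can be sketched in C for the case where the stuck process is a child of the caller. This is only a sketch under stated assumptions: terminate_child and its grace_ms parameter are hypothetical names, the 10 ms polling step is arbitrary, and unlike pkill it does not handle unrelated processes.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>

/*
 * Ask a child to exit with SIGTERM, poll for up to grace_ms milliseconds,
 * then fall back to SIGKILL.  Returns the signal that ended the child,
 * 0 if it exited on its own, or -1 on error.
 */
int terminate_child(pid_t pid, int grace_ms)
{
	const struct timespec step = { 0, 10 * 1000 * 1000 };	/* 10 ms */
	int status = 0;
	int waited;
	pid_t r;

	if (kill(pid, SIGTERM) == -1)
		return -1;

	for (waited = 0; waited < grace_ms; waited += 10) {
		r = waitpid(pid, &status, WNOHANG);
		if (r == pid)
			goto done;
		if (r == -1 && errno != EINTR)
			return -1;
		nanosleep(&step, NULL);
	}

	/* The polite request was ignored; SIGKILL cannot be caught or ignored. */
	if (kill(pid, SIGKILL) == -1)
		return -1;
	do {
		r = waitpid(pid, &status, 0);
	} while (r == -1 && errno == EINTR);
	if (r != pid)
		return -1;
done:
	return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

A return value of SIGKILL tells the caller that the process did not respond to SIGTERM within the grace period, which is the situation reported here for the hung Xorg.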

 -- 
 |/"\ John D. Baker, KN5UKS               NetBSD     Darwin/MacOS X
 |\ / jdbaker[snail]mylinuxisp[flyspeck]com    OpenBSD            FreeBSD
 | X  No HTML/proprietary data in email.   BSD just sits there and works!
 |/ \ GPGkeyID:  D703 4A7E 479F 63F8 D3F4  BD99 9572 8F23 E4AD 1645

From: Patrick Welche <prlw1@cam.ac.uk>
To: gnats-bugs@NetBSD.org
Cc: xsrc-manager@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: xsrc/49553 (Xorg hits 100% CPU on shutdown)
Date: Sun, 14 Jun 2015 08:23:38 +0100

 I still see the 100% CPU usage on shutdown on the two computers
 I just tested -current/amd64 on: a sandy bridge laptop, and a Dell
 Optiplex 620 (Intel 82945G).

         Option          "NoTrapSignals" "true"

 is a good workaround.

State-Changed-From-To: feedback->open
State-Changed-By: prlw1@NetBSD.org
State-Changed-When: Sat, 20 Jun 2015 10:12:11 +0000
State-Changed-Why:
feedback provided


State-Changed-From-To: open->closed
State-Changed-By: prlw1@NetBSD.org
State-Changed-When: Tue, 07 Jun 2016 14:45:57 +0000
State-Changed-Why:
I can no longer reproduce this.


>Unformatted:
Copyright © 1994-2014 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.