NetBSD Problem Report #44958

From hauke@Espresso.Rhein-Neckar.DE  Thu May 12 19:57:09 2011
Return-Path: <hauke@Espresso.Rhein-Neckar.DE>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
	by www.NetBSD.org (Postfix) with ESMTP id 0090E63B95D
	for <gnats-bugs@gnats.NetBSD.org>; Thu, 12 May 2011 19:57:08 +0000 (UTC)
Message-Id: <201105121936.p4CJaqOu005047@pizza.causeuse.org>
Date: Thu, 12 May 2011 21:36:52 +0200 (CEST)
From: Hauke Fath <hauke@Espresso.Rhein-Neckar.DE>
Reply-To: Hauke Fath <hauke@Espresso.Rhein-Neckar.DE>
To: gnats-bugs@gnats.NetBSD.org
Cc: Hauke Fath <hauke@Espresso.Rhein-Neckar.DE>
Subject: dig(9) busy-loops
X-Send-Pr-Version: 3.95

>Number:         44958
>Category:       bin
>Synopsis:       dig(8) busy-loops
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    bin-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu May 12 20:00:00 +0000 2011
>Last-Modified:  Mon May 16 15:00:03 +0000 2011
>Originator:     Hauke Fath
>Release:        NetBSD 5.99.51
>Organization:
Falling Raindrops
>Environment:


System: NetBSD pizza.causeuse.org 5.99.51 NetBSD 5.99.51 (PIZZA_PF) #0: Thu May 12 18:43:40 CEST 2011 hf@Hochstuhl:/var/obj/netbsd-builds/developer/sparc/sys/arch/sparc/compile/PIZZA_PF sparc
Architecture: sparc
Machine: sparc
>Description:

	On a -current NetBSD/sparc mp installation, dig(8) busy-loops
	at 100% cpu, and has to be 'kill -9'ed. This happens
	frequently when dig is called from an ifwatchd(8) script, and
	less so when called interactively.

	I've categorized this 'bin', although it might actually be
	sparc-mp related.

% dig pizza.causeuse.org. @130.83.197.9

; <<>> DiG 9.8.0rc1 <<>> pizza.causeuse.org. @130.83.197.9
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 29592
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 2, ADDITIONAL: 2
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;pizza.causeuse.org.            IN      A

;; ANSWER SECTION:
pizza.causeuse.org.     3600    IN      A       195.4.78.18

;; AUTHORITY SECTION:
causeuse.org.           3600    IN      NS      ns.causeuse.org.
causeuse.org.           3600    IN      NS      bounce.nt.e-technik.tu-darmstadt.de.

;; ADDITIONAL SECTION:
ns.causeuse.org.        3600    IN      A       130.83.197.9
bounce.nt.e-technik.tu-darmstadt.de. 22770 IN A 130.83.197.1

;; Query time: 108 msec
;; SERVER: 130.83.197.9#53(130.83.197.9)
;; WHEN: Thu May 12 21:03:02 2011
;; MSG SIZE  rcvd: 150

load: 0.24  cmd: dig 2655 [iowait 0x4004bd4c/0] 3.94u 9.60s 47% 4380k
load: 0.30  cmd: dig 2655 [iowait 0x4004bd4c/0] 4.98u 11.64s 54% 4380k
load: 0.30  cmd: dig 2655 [iowait 0x4004bd4c/0] 5.46u 13.18s 58% 4380k
^C^C^C^C^CKilled
%
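
	The three "load:" lines above are ^T (SIGINFO) status lines; the
	fields ending in "u" and "s" are the accumulated user and system
	CPU seconds. A minimal sketch for pulling those two numbers out
	of such lines, to see how fast they grow between keypresses. The
	function name cpu_times is an illustration only and is not part
	of the report or of any NetBSD tool.

```sh
#!/bin/sh
# Hypothetical helper (not from the report): read ^T status lines on
# stdin and print "user system" CPU seconds for each "load:" line.
cpu_times() {
    awk '/^ *load:/ {
        u = ""; s = ""
        for (i = 1; i <= NF; i++) {
            # Fields such as 3.94u and 9.60s carry the CPU times.
            if ($i ~ /^[0-9.]+u$/) u = substr($i, 1, length($i) - 1)
            if ($i ~ /^[0-9.]+s$/) s = substr($i, 1, length($i) - 1)
        }
        if (u != "" && s != "") print u, s
    }'
}
```

	Fed the first status line above, this prints "3.94 9.60".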

	Building external/bsd/bind with NAMED_DEBUG=1 does not work
	currently because of a missing library, so I cannot provide
	debugger information. I'll try to come up with a ktrace.

>How-To-Repeat:

	Run something like dig pizza.causeuse.org. @130.83.197.9 on -current.
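
	Since the hang is intermittent, a loop with a watchdog may help
	to count how often it occurs. The following is only a sketch:
	the function names run_with_watchdog and count_hangs, the run
	count and the time limit are assumptions made for illustration,
	and it treats a wait status of 137 (128 + SIGKILL) as "had to be
	killed", which holds for sh and bash but not for every shell.

```sh
#!/bin/sh
# Hypothetical helpers (not from the report).

# run_with_watchdog SECONDS COMMAND [ARGS...]
# Returns 0 if COMMAND exits by itself, 1 if the watchdog had to
# kill -9 it after SECONDS.
run_with_watchdog() {
    limit=$1
    shift
    "$@" >/dev/null 2>&1 &
    cmd_pid=$!
    # Watchdog: kill the command if it is still around after $limit.
    ( sleep "$limit"; kill -9 "$cmd_pid" 2>/dev/null ) >/dev/null 2>&1 &
    dog_pid=$!
    status=0
    wait "$cmd_pid" 2>/dev/null || status=$?
    # The command is done one way or the other; retire the watchdog.
    kill "$dog_pid" 2>/dev/null || true
    wait "$dog_pid" 2>/dev/null || true
    if [ "$status" -eq 137 ]; then
        return 1
    fi
    return 0
}

# count_hangs RUNS SECONDS COMMAND [ARGS...]
# Runs COMMAND RUNS times and prints how many runs had to be killed.
count_hangs() {
    runs=$1
    limit=$2
    shift 2
    hangs=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        run_with_watchdog "$limit" "$@" || hangs=$((hangs + 1))
        i=$((i + 1))
    done
    echo "$hangs"
}

# Example call (not executed here):
#   count_hangs 20 30 dig pizza.causeuse.org. @130.83.197.9
```

	A non-zero result means at least one run did not finish within
	the limit; it does not by itself show that the process was
	spinning rather than blocked.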

>Fix:
	No idea. I'll try building with NAMED_USE_PTHREADS=no as a
	workaround.

>Audit-Trail:
From: Hauke Fath <hf@spg.tu-darmstadt.de>
To: gnats-bugs@NetBSD.org
Cc: gnats-admin@NetBSD.org
Subject: Re: bin/44958: dig(9) busy-loops
Date: Thu, 12 May 2011 23:55:04 +0200

 At 20:00 +0000 on 12.05.2011, Hauke Fath wrote:
 >	Building external/bsd/bind with NAMED_DEBUG=1 does not work
 >	currently because of a missing library, so I cannot provide
 >	debugger information. I'll try to come up with a ktrace.

 The ktruss output produced by the following script

 <snip>
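 #!/bin/sh
 #
 # ktruss dig(8)

 _trussdir="/var/tmp/dig.ktruss"

 # Stop if the output directory cannot be created or entered.
 mkdir -p "${_trussdir}" || exit 1
 cd "${_trussdir}" || exit 1

 # The trace goes to ktruss.$$, dig's own output to ktruss.out.$$.
 ktruss -di -o "${_trussdir}/ktruss.$$" \
     /usr/bin/dig pizza.causeuse.org. @130.83.197.9 \
     > "${_trussdir}/ktruss.out.$$"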
 </snip>

 can be found here (15 K):
 <http://bounce.nt.e-technik.tu-darmstadt.de/~hf/netbsd/pr44958/pr44958-dig-ktruss.out.gz>

 	hauke

 -- 
      The ASCII Ribbon Campaign                    Hauke Fath
 ()     No HTML/RTF in email            Institut für Nachrichtentechnik
 /\     No Word docs in email                     TU Darmstadt
      Respect for open standards              Ruf +49-6151-16-3281

From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: bin/44958: dig(9) busy-loops
Date: Sat, 14 May 2011 06:42:37 +0000

 On Thu, May 12, 2011 at 08:00:01PM +0000, Hauke Fath wrote:
  > 	On a -current NetBSD/sparc mp installation, dig(8) busy-loops
  > 	at 100% cpu, and has to be 'kill -9'ed. This happens
  > 	frequently when dig is called from an ifwatchd(8) script, and
  > 	less so when called interactively.
  > 
  > 	I've categorized this 'bin', although it might actually be
  > 	sparc-mp related.

 I think it's pthreads-related; dig and the other pieces of bind are
 multithreaded for some reason and I vaguely recall that there is or
 recently has been a known issue with sparc and threads.

 The same thing was happening on amd64 at one point last year but it's
 been fixed since.

 -- 
 David A. Holland
 dholland@netbsd.org

From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc: Hauke Fath <hauke@Espresso.Rhein-Neckar.DE>
Subject: Re: bin/44958: dig(9) busy-loops
Date: Sat, 14 May 2011 22:49:04 +0200

 Can you ^T while dig loops and note the %pc values it displays?
 Can you attach gdb while it loops?

 Martin

From: Hauke Fath <hf@spg.tu-darmstadt.de>
To: gnats-bugs@NetBSD.org
Cc: gnats-admin@NetBSD.org
Subject: Re: bin/44958: dig(9) busy-loops
Date: Mon, 16 May 2011 16:51:58 +0200

 At 20:50 +0000 on 14.05.2011, Martin Husemann wrote:
 > Can you ^T while dig loops and note the %pc values it displays?
 > Can you attach gdb while it loops?

 I'll try that as soon as I can...

 Unfortunately, (1) having switched providers I will be off the phone grid
 till June (don't ask...), so nothing to dig(1) from the home ss20, and (2)
 bind does not build with NAMED_DEBUG on -current because of a missing
 library (see
 <http://mail-index.netbsd.org/current-users/2011/05/10/msg016662.html>), so
 no symbols.

 	hauke

 -- 
      The ASCII Ribbon Campaign                    Hauke Fath
 ()     No HTML/RTF in email            Institut für Nachrichtentechnik
 /\     No Word docs in email                     TU Darmstadt
      Respect for open standards              Ruf +49-6151-16-3281

From: Hauke Fath <hf@spg.tu-darmstadt.de>
To: gnats-bugs@NetBSD.org
Cc: gnats-admin@NetBSD.org
Subject: Re: bin/44958: dig(9) busy-loops
Date: Mon, 16 May 2011 16:59:06 +0200

 At 20:50 +0000 on 14.05.2011, Martin Husemann wrote:
 >Can you ^T while dig loops and note the %pc values it displays?

 ... you mean, like this?

 <snip>
 load: 0.24  cmd: dig 2655 [iowait 0x4004bd4c/0] 3.94u 9.60s 47% 4380k
 load: 0.30  cmd: dig 2655 [iowait 0x4004bd4c/0] 4.98u 11.64s 54% 4380k
 load: 0.30  cmd: dig 2655 [iowait 0x4004bd4c/0] 5.46u 13.18s 58% 4380k
 </snip>

 (from the original PR; the console didn't echo the ^T).

 	hauke

 -- 
      The ASCII Ribbon Campaign                    Hauke Fath
 ()     No HTML/RTF in email            Institut für Nachrichtentechnik
 /\     No Word docs in email                     TU Darmstadt
      Respect for open standards              Ruf +49-6151-16-3281

>Unformatted:
