NetBSD Problem Report #44958
From hauke@Espresso.Rhein-Neckar.DE Thu May 12 19:57:09 2011
Return-Path: <hauke@Espresso.Rhein-Neckar.DE>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
by www.NetBSD.org (Postfix) with ESMTP id 0090E63B95D
for <gnats-bugs@gnats.NetBSD.org>; Thu, 12 May 2011 19:57:08 +0000 (UTC)
Message-Id: <201105121936.p4CJaqOu005047@pizza.causeuse.org>
Date: Thu, 12 May 2011 21:36:52 +0200 (CEST)
From: Hauke Fath <hauke@Espresso.Rhein-Neckar.DE>
Reply-To: Hauke Fath <hauke@Espresso.Rhein-Neckar.DE>
To: gnats-bugs@gnats.NetBSD.org
Cc: Hauke Fath <hauke@Espresso.Rhein-Neckar.DE>
Subject: dig(9) busy-loops
X-Send-Pr-Version: 3.95
>Number: 44958
>Category: bin
>Synopsis: dig(8) busy-loops
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: bin-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Thu May 12 20:00:00 +0000 2011
>Last-Modified: Mon May 16 15:00:03 +0000 2011
>Originator: Hauke Fath
>Release: NetBSD 5.99.51
>Organization:
Falling Raindrops
>Environment:
System: NetBSD pizza.causeuse.org 5.99.51 NetBSD 5.99.51 (PIZZA_PF) #0: Thu May 12 18:43:40 CEST 2011 hf@Hochstuhl:/var/obj/netbsd-builds/developer/sparc/sys/arch/sparc/compile/PIZZA_PF sparc
Architecture: sparc
Machine: sparc
>Description:
On a -current NetBSD/sparc mp installation, dig(8) busy-loops
at 100% cpu, and has to be 'kill -9'ed. This happens
frequently when dig is called from an ifwatchd(8) script, and
less so when called interactively.
I've categorized this 'bin', although it might actually be
sparc-mp related.
% dig pizza.causeuse.org. @130.83.197.9
; <<>> DiG 9.8.0rc1 <<>> pizza.causeuse.org. @130.83.197.9
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 29592
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 2, ADDITIONAL: 2
;; WARNING: recursion requested but not available
;; QUESTION SECTION:
;pizza.causeuse.org. IN A
;; ANSWER SECTION:
pizza.causeuse.org. 3600 IN A 195.4.78.18
;; AUTHORITY SECTION:
causeuse.org. 3600 IN NS ns.causeuse.org.
causeuse.org. 3600 IN NS bounce.nt.e-technik.tu-darmstadt.de.
;; ADDITIONAL SECTION:
ns.causeuse.org. 3600 IN A 130.83.197.9
bounce.nt.e-technik.tu-darmstadt.de. 22770 IN A 130.83.197.1
;; Query time: 108 msec
;; SERVER: 130.83.197.9#53(130.83.197.9)
;; WHEN: Thu May 12 21:03:02 2011
;; MSG SIZE rcvd: 150
load: 0.24 cmd: dig 2655 [iowait 0x4004bd4c/0] 3.94u 9.60s 47% 4380k
load: 0.30 cmd: dig 2655 [iowait 0x4004bd4c/0] 4.98u 11.64s 54% 4380k
load: 0.30 cmd: dig 2655 [iowait 0x4004bd4c/0] 5.46u 13.18s 58% 4380k
^C^C^C^C^CKilled
%
Building external/bsd/bind with NAMED_DEBUG=1 does not work
currently because of a missing library, so I cannot provide
debugger information. I'll try to come up with a ktrace.
>How-To-Repeat:
Run something like dig pizza.causeuse.org. @130.83.197.9 on -current.
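Because the hang is intermittent, a small wrapper can make it easier to repeat the query until one run hangs. The following is a minimal sketch and not part of the original report: run_with_timeout is a hypothetical helper, and any time limit passed to it is an assumed value. It starts a command in the background, polls it once per second, and sends SIGKILL if the command is still running after the limit.

```shell
#!/bin/sh
# Hypothetical helper (not from the original report).
# Usage: run_with_timeout SECONDS COMMAND [ARGS...]
# Returns 0 if COMMAND exited on its own, 1 if it had to be killed.
run_with_timeout() {
    _limit=$1
    shift
    # Start the command in the background and remember its PID.
    "$@" &
    _pid=$!
    _waited=0
    # kill -0 sends no signal; it only tests whether the process exists.
    while kill -0 "$_pid" 2>/dev/null; do
        if [ "$_waited" -ge "$_limit" ]; then
            # Still running after the limit: treat it as hung.
            kill -9 "$_pid" 2>/dev/null
            wait "$_pid" 2>/dev/null
            return 1
        fi
        sleep 1
        _waited=$((_waited + 1))
    done
    # Collect the exit status of the finished command.
    wait "$_pid" 2>/dev/null
    return 0
}
```

Called in a loop as, for example, run_with_timeout 30 dig pizza.causeuse.org. @130.83.197.9 || echo hung, this would report each run that had to be killed.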
>Fix:
No idea. I'll try building with NAMED_USE_PTHREADS=no as a
workaround.
>Audit-Trail:
From: Hauke Fath <hf@spg.tu-darmstadt.de>
To: gnats-bugs@NetBSD.org
Cc: gnats-admin@NetBSD.org
Subject: Re: bin/44958: dig(9) busy-loops
Date: Thu, 12 May 2011 23:55:04 +0200
At 20:00 +0000 on 12.05.2011, Hauke Fath wrote:
> Building external/bsd/bind with NAMED_DEBUG=1 does not work
> currently because of a missing library, so I cannot provide
> debugger information. I'll try to come up with a ktrace.
The ktruss output produced by the following script
<snip>
#!/bin/sh
#
# ktruss dig(8)
_trussdir="/var/tmp/dig.ktruss"
mkdir -p ${_trussdir}
cd ${_trussdir}
ktruss -di -o ${_trussdir}/ktruss.$$ \
    /usr/bin/dig pizza.causeuse.org. @130.83.197.9 \
    > ${_trussdir}/ktruss.out.$$
</snip>
can be found here (15 K):
<http://bounce.nt.e-technik.tu-darmstadt.de/~hf/netbsd/pr44958/pr44958-dig-ktruss.out.gz>
hauke
--
The ASCII Ribbon Campaign Hauke Fath
() No HTML/RTF in email Institut für Nachrichtentechnik
/\ No Word docs in email TU Darmstadt
Respect for open standards Ruf +49-6151-16-3281
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: bin/44958: dig(9) busy-loops
Date: Sat, 14 May 2011 06:42:37 +0000
On Thu, May 12, 2011 at 08:00:01PM +0000, Hauke Fath wrote:
> On a -current NetBSD/sparc mp installation, dig(8) busy-loops
> at 100% cpu, and has to be 'kill -9'ed. This happens
> frequently when dig is called from an ifwatchd(8) script, and
> less so when called interactively.
>
> I've categorized this 'bin', although it might actually be
> sparc-mp related.
I think it's pthreads-related; dig and the other pieces of bind are
multithreaded for some reason and I vaguely recall that there is or
recently has been a known issue with sparc and threads.
The same thing was happening on amd64 at one point last year but it's
been fixed since.
--
David A. Holland
dholland@netbsd.org
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc: Hauke Fath <hauke@Espresso.Rhein-Neckar.DE>
Subject: Re: bin/44958: dig(9) busy-loops
Date: Sat, 14 May 2011 22:49:04 +0200
Can you ^T while dig loops and note the %pc values it displays?
Can you attach gdb while it loops?
Martin
From: Hauke Fath <hf@spg.tu-darmstadt.de>
To: gnats-bugs@NetBSD.org
Cc: gnats-admin@NetBSD.org
Subject: Re: bin/44958: dig(9) busy-loops
Date: Mon, 16 May 2011 16:51:58 +0200
At 20:50 +0000 on 14.05.2011, Martin Husemann wrote:
> Can you ^T while dig loops and note the %pc values it displays?
> Can you attach gdb while it loops?
I'll try that as soon as I can...
Unfortunately, (1) having switched providers I will be off the phone grid
till June (don't ask...), so nothing to dig(1) from the home ss20, and (2)
bind does not build with NAMED_DEBUG on -current because of a missing
library (see
<http://mail-index.netbsd.org/current-users/2011/05/10/msg016662.html>), so
no symbols.
hauke
From: Hauke Fath <hf@spg.tu-darmstadt.de>
To: gnats-bugs@NetBSD.org
Cc: gnats-admin@NetBSD.org
Subject: Re: bin/44958: dig(9) busy-loops
Date: Mon, 16 May 2011 16:59:06 +0200
At 20:50 +0000 on 14.05.2011, Martin Husemann wrote:
>Can you ^T while dig loops and note the %pc values it displays?
... you mean, like this?
<snip>
load: 0.24 cmd: dig 2655 [iowait 0x4004bd4c/0] 3.94u 9.60s 47% 4380k
load: 0.30 cmd: dig 2655 [iowait 0x4004bd4c/0] 4.98u 11.64s 54% 4380k
load: 0.30 cmd: dig 2655 [iowait 0x4004bd4c/0] 5.46u 13.18s 58% 4380k
</snip>
(from the original PR; the console didn't echo the ^T).
hauke
>Unformatted: