NetBSD Problem Report #34200

From root@polaris.garbled.net  Mon Aug 14 19:01:41 2006
Return-Path: <root@polaris.garbled.net>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
	by narn.NetBSD.org (Postfix) with ESMTP id 4D54963BA82
	for <gnats-bugs@gnats.NetBSD.org>; Mon, 14 Aug 2006 19:01:41 +0000 (UTC)
Message-Id: <200608141901.k7EJ1dgS014149@polaris.garbled.net>
Date: Mon, 14 Aug 2006 12:01:39 -0700 (MST)
From: Tim Rightnour <root@polaris.garbled.net>
Reply-To: root@polaris.garbled.net
To: gnats-bugs@NetBSD.org
Subject: timed occasionally goes into infinite loop
X-Send-Pr-Version: 3.95

>Number:         34200
>Category:       bin
>Synopsis:       timed occasionally goes into infinite loop
>Confidential:   no
>Severity:       serious
>Priority:       low
>Responsible:    bin-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Aug 14 19:05:00 +0000 2006
>Originator:     Tim Rightnour
>Release:        NetBSD 3.0
>Organization:

>Environment:


System: NetBSD polaris.garbled.net 3.0 NetBSD 3.0 (GENERIC) #0: Mon Dec 19 01:04:02 UTC 2005 builds@works.netbsd.org:/home/builds/ab/netbsd-3-0-RELEASE/i386/200512182024Z-obj/home/builds/ab/netbsd-3-0-RELEASE/src/sys/arch/i386/compile/GENERIC i386
Architecture: i386
Machine: i386
>Description:
Every few weeks I find that the CPU is pegged on my timed master server.
Investigation usually shows that timed is eating all the CPU on the box.  I
attempted to debug it a little, and this is what I turned up:

0x0804b4e7 in median ()
(gdb) where
#0  0x0804b4e7 in median ()
#1  0x0804b3b8 in networkdelta ()
#2  0x0804a7f8 in synch ()
#3  0x0804a358 in master ()
#4  0x0804dc86 in main ()
#5  0x080490f6 in ___start ()

It appears to be stuck looping endlessly somewhere in that function.  If I
attach ktrace to it, ktrace never produces any output, so the process is not
making any system calls while it spins.  I suspect it has gotten ahold of
some odd values and is trying endlessly to average them.


>How-To-Repeat:
No idea.  I've been running timed for years on this system and never seen this.
Maybe one of my more recently added client machines is triggering it, or it
has to do with the number of boxes on the network?

>Fix:
Not sure; however, I suspect that the following loop in
networkdelta.c:median() is where we are going wrong:

        for (pass = 1; ; pass++) {      /* loop over the data */

I suppose that for certain input values the loop's exit condition might never
be met.  I'm unsure what those values might be.  Maybe we need to put a
maximum cap on the number of passes?
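
As a rough illustration of what a cap on passes could look like, here is a
minimal sketch.  It is not the code from networkdelta.c and does not use the
same algorithm: it narrows an integer interval around the median, which is an
assumption made only to keep the example short.  The names bounded_median,
tol and max_passes are hypothetical.  The point is only the shape of the
loop: it either converges or gives up after max_passes and tells the caller.

```c
#include <stddef.h>

/*
 * Hypothetical sketch, not timed's median(): estimate the median of
 * x[0..n-1] by repeatedly halving the interval [lo, hi] that contains it.
 *
 * Returns 0 and stores the result in *out once the interval is no wider
 * than tol.  Returns -1 if n == 0 or if max_passes passes were used up
 * without converging, so the caller can fall back to something else
 * instead of spinning forever.
 *
 * Assumes hi - lo fits in a long (true for small clock deltas).
 */
int
bounded_median(const long *x, size_t n, long tol, int max_passes, long *out)
{
	long lo, hi, guess;
	size_t i, below;
	int pass;

	if (n == 0)
		return -1;

	/* The median lies between the smallest and largest sample. */
	lo = hi = x[0];
	for (i = 1; i < n; i++) {
		if (x[i] < lo)
			lo = x[i];
		if (x[i] > hi)
			hi = x[i];
	}

	for (pass = 1; pass <= max_passes; pass++) {	/* capped loop */
		if (hi - lo <= tol) {
			*out = lo + (hi - lo) / 2;
			return 0;
		}
		guess = lo + (hi - lo) / 2;

		/* Count the samples at or below the current guess. */
		below = 0;
		for (i = 0; i < n; i++)
			if (x[i] <= guess)
				below++;

		if (2 * below >= n)
			hi = guess;	/* at least half are <= guess */
		else
			lo = guess + 1;	/* median is above guess */
	}
	return -1;	/* cap reached without converging */
}
```

A caller would check the return value and, on -1, skip the adjustment for
that round or use some simpler estimate rather than hang.  What the right
cap is for timed, and what it should do when the cap is hit, I don't know.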



>Unformatted:
