NetBSD Problem Report #51720

From gson@gson.org  Thu Dec 15 17:30:21 2016
Return-Path: <gson@gson.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (verified OK))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 6970B7A2D6
	for <gnats-bugs@gnats.NetBSD.org>; Thu, 15 Dec 2016 17:30:21 +0000 (UTC)
Message-Id: <20161215173014.D40EC7449D4@guava.gson.org>
Date: Thu, 15 Dec 2016 19:30:14 +0200 (EET)
From: gson@gson.org (Andreas Gustafsson)
Reply-To: gson@gson.org (Andreas Gustafsson)
To: gnats-bugs@NetBSD.org
Subject: tcpdump uses 100% CPU
X-Send-Pr-Version: 3.95

>Number:         51720
>Category:       bin
>Synopsis:       tcpdump uses 100% CPU
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    bin-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Dec 15 17:35:00 +0000 2016
>Last-Modified:  Thu Dec 15 21:55:01 +0000 2016
>Originator:     Andreas Gustafsson
>Release:        NetBSD 7.0
>Organization:

>Environment:
System: NetBSD
Architecture: x86_64
Machine: amd64
>Description:

While debugging #51531, I started the following simple tcpdump command
at the server end of the dedicated network used for the test:

  tcpdump -n -i re0

The following day, I noticed that the tcpdump process (29583) was
consuming close to 100% CPU.  Running "sudo ktruss -p 29583" showed an
endless high-speed stream of

 29583      1 tcpdump  read(0x3, 0x7f7ff7b16000, 0x80000) Err#35 EAGAIN
 29583      1 tcpdump  read(0x3, 0x7f7ff7b16000, 0x80000) Err#35 EAGAIN
 29583      1 tcpdump  read(0x3, 0x7f7ff7b16000, 0x80000) Err#35 EAGAIN
 29583      1 tcpdump  read(0x3, 0x7f7ff7b16000, 0x80000) Err#35 EAGAIN
 29583      1 tcpdump  read(0x3, 0x7f7ff7b16000, 0x80000) Err#35 EAGAIN
 29583      1 tcpdump  read(0x3, 0x7f7ff7b16000, 0x80000) Err#35 EAGAIN
 29583      1 tcpdump  read(0x3, 0x7f7ff7b16000, 0x80000) Err#35 EAGAIN
 29583      1 tcpdump  read(0x3, 0x7f7ff7b16000, 0x80000) Err#35 EAGAIN

The last stdout output from the tcpdump process itself was from about
five hours earlier, presumably because there had actually been no
traffic since then on the network in case.

After killing and restarting tcpdump, running ktruss -p against the
new tcpdump process, the output showed a much more reasonable once-per
second polling:

 18798      1 tcpdump  read(0x3, 0x7f7ff7b16000, 0x80000) = 0
       ""
 18798      1 tcpdump  read(0x3, 0x7f7ff7b16000, 0x80000) = 0
       ""

So something had caused tcpdump to go into a state of continuous
polling, but I don't know what.  Maybe the passage of time, maybe
the passage of a long time without traffic, maybe something else
entirely.

>How-To-Repeat:

Don't know - it has only happened once so far.

>Fix:

>Audit-Trail:
From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@netbsd.org
Cc: gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: bin/51720: tcpdump uses 100% CPU
Date: Thu, 15 Dec 2016 22:52:59 +0100

 On Thu, Dec 15, 2016 at 05:35:00PM +0000, Andreas Gustafsson wrote:
 > >Environment:
 > System: NetBSD
 > Architecture: x86_64
 > Machine: amd64
 > >Description:
 > 
 > While debugging #51531, I started the following simple tcpdump command
 > at the server end of the dedicated network used for the test:
 > 
 >   tcpdump -n -i re0
 > 
 > The following day, I noticed that the tcpdump process (29583) was
 > consuming close to 100% CPU.  Running "sudo ktruss -p 29583" showed an
 > endless high-speed stream of
 > 
 >  29583      1 tcpdump  read(0x3, 0x7f7ff7b16000, 0x80000) Err#35 EAGAIN
 >  29583      1 tcpdump  read(0x3, 0x7f7ff7b16000, 0x80000) Err#35 EAGAIN
 >  29583      1 tcpdump  read(0x3, 0x7f7ff7b16000, 0x80000) Err#35 EAGAIN
 >  29583      1 tcpdump  read(0x3, 0x7f7ff7b16000, 0x80000) Err#35 EAGAIN
 >  29583      1 tcpdump  read(0x3, 0x7f7ff7b16000, 0x80000) Err#35 EAGAIN
 >  29583      1 tcpdump  read(0x3, 0x7f7ff7b16000, 0x80000) Err#35 EAGAIN
 >  29583      1 tcpdump  read(0x3, 0x7f7ff7b16000, 0x80000) Err#35 EAGAIN
 >  29583      1 tcpdump  read(0x3, 0x7f7ff7b16000, 0x80000) Err#35 EAGAIN
 >  
 > The last stdout output from the tcpdump process itself was from about
 > five hours earlier, presumably because there had actually been no
 > traffic since then on the network in case.
 > 
 > After killing and restarting tcpdump, running ktruss -p against the
 > new tcpdump process, the output showed a much more reasonable once-per
 > second polling:
 > 
 >  18798      1 tcpdump  read(0x3, 0x7f7ff7b16000, 0x80000) = 0
 >        ""
 >  18798      1 tcpdump  read(0x3, 0x7f7ff7b16000, 0x80000) = 0
 >        ""
 > 
 > So something had caused tcpdump to go into a state of continuous
 > polling, but I don't know what.  Maybe the passage of time, maybe
 > the passage of a long time without traffic, maybe something else
 > entirely.

 I'm seeing this with arpwatch and ndpwatch too. They also use bpf.
 Usually either they use 100% CPU right from the start, or they
 run fine for days. If they start at 100% CPU I kill and restart them
 (this can need a few tries to get them in a good state).

 This is with netbsd-7/i386, I've never seen this with netbsd-6

 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 ans d'experience feront toujours la difference
 --

NetBSD Home
NetBSD PR Database Search

(Contact us) $NetBSD: query-full-pr,v 1.39 2013/11/01 18:47:49 spz Exp $
$NetBSD: gnats_config.sh,v 1.8 2006/05/07 09:23:38 tsutsui Exp $
Copyright © 1994-2014 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.