NetBSD Problem Report #54525

From gson@gson.org  Thu Sep  5 13:31:33 2019
Return-Path: <gson@gson.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id B90187A179
	for <gnats-bugs@gnats.NetBSD.org>; Thu,  5 Sep 2019 13:31:33 +0000 (UTC)
Message-Id: <20190905133127.EF9FC98964E@guava.gson.org>
Date: Thu,  5 Sep 2019 16:31:27 +0300 (EEST)
From: gson@gson.org (Andreas Gustafsson)
Reply-To: gson@gson.org (Andreas Gustafsson)
To: gnats-bugs@NetBSD.org
Subject: net/arp/t_arp:arp_rtm still fails
X-Send-Pr-Version: 3.95

>Number:         54525
>Category:       misc
>Synopsis:       net/arp/t_arp:arp_rtm still fails
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    ozaki-r
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Sep 05 13:35:00 +0000 2019
>Closed-Date:    Mon Apr 05 05:13:26 +0000 2021
>Last-Modified:  Mon Apr 05 16:15:01 +0000 2021
>Originator:     Andreas Gustafsson
>Release:        NetBSD-current, source date >= 2019.09.03.19.07.50
>Organization:

>Environment:
System: NetBSD
Architecture: i386
Machine: i386
>Description:

The failing net/arp/t_arp:arp_rtm test case was not fixed by
roy's commit of src/tests/net/arp/t_arp.sh 1.39, but is now
failing with a different error:

2019.09.01.19.26.21/test.log.gz:    arp_rtm: [33.751094s] Passed.
2019.09.01.19.54.04/test.log.gz:    arp_rtm: [38.892978s] Passed.
2019.09.01.21.55.13/test.log.gz:    arp_rtm: [38.221017s] Passed.
2019.09.01.22.09.02/test.log.gz:    arp_rtm: [20.831786s] Failed: atf-check failed; see the output of the test for details
2019.09.01.22.10.13/test.log.gz:    arp_rtm: [21.180874s] Failed: atf-check failed; see the output of the test for details
2019.09.01.23.01.01/test.log.gz:    arp_rtm: [21.318267s] Failed: atf-check failed; see the output of the test for details
2019.09.02.00.51.48/test.log.gz:    arp_rtm: [19.295217s] Failed: atf-check failed; see the output of the test for details
2019.09.02.01.28.41/test.log.gz:    arp_rtm: [21.197473s] Failed: atf-check failed; see the output of the test for details
2019.09.02.08.26.00/test.log.gz:    arp_rtm: [23.159463s] Failed: atf-check failed; see the output of the test for details
2019.09.02.10.35.15/test.log.gz:    arp_rtm: [16.001648s] Failed: atf-check failed; see the output of the test for details
2019.09.02.12.48.52/test.log.gz:    arp_rtm: [15.718583s] Failed: atf-check failed; see the output of the test for details
2019.09.02.20.09.30/test.log.gz:    arp_rtm: [21.493131s] Failed: atf-check failed; see the output of the test for details
2019.09.03.03.04.31/test.log.gz:    arp_rtm: [21.739860s] Failed: atf-check failed; see the output of the test for details
2019.09.03.10.36.17/test.log.gz:    arp_rtm: [20.835321s] Failed: atf-check failed; see the output of the test for details
2019.09.03.21.34.03/test.log.gz:    arp_rtm: [616.803112s] Failed: Test case timed out after 300 seconds
2019.09.04.03.15.20/test.log.gz:    arp_rtm: [619.731232s] Failed: Test case timed out after 300 seconds
2019.09.04.05.37.06/test.log.gz:    arp_rtm: [625.671949s] Failed: Test case timed out after 300 seconds
2019.09.04.08.13.43/test.log.gz:    arp_rtm: [619.558218s] Failed: Test case timed out after 300 seconds
2019.09.04.10.34.04/test.log.gz:    arp_rtm: [664.090334s] Failed: Test case timed out after 300 seconds
2019.09.04.12.14.09/test.log.gz:    arp_rtm: [620.453415s] Failed: Test case timed out after 300 seconds
2019.09.04.13.30.52/test.log.gz:    arp_rtm: [618.790491s] Failed: Test case timed out after 300 seconds

More log data at:

  http://releng.netbsd.org/b5reports/i386/commits-2019.09.html#2019.09.03.19.07.50

>How-To-Repeat:

Run the ATF tests.

>Fix:

>Release-Note:

>Audit-Trail:

Responsible-Changed-From-To: misc-bug-people->roy
Responsible-Changed-By: gson@NetBSD.org
Responsible-Changed-When: Thu, 05 Sep 2019 15:21:49 +0000
Responsible-Changed-Why:
Over to committer.


From: Roy Marples <roy@marples.name>
To: gnats-bugs@netbsd.org, roy@netbsd.org, misc-bug-people@netbsd.org,
 netbsd-bugs@netbsd.org, gnats-admin@netbsd.org, gson@NetBSD.org,
 Andreas Gustafsson <gson@gson.org>
Cc: 
Subject: Re: misc/54525 (net/arp/t_arp:arp_rtm still fails)
Date: Thu, 5 Sep 2019 16:48:44 +0100

 On 05/09/2019 16:21, gson@NetBSD.org wrote:
 > Synopsis: net/arp/t_arp:arp_rtm still fails

 Termination reason

 FAILED: Test case timed out after 300 seconds
 Standard output stream

 Executing command [ rump_server -lrumpnet -lrumpnet_net 
 -lrumpnet_netinet -lrumpnet_shmif unix://commsock1 ]
 Executing command [ rump_server -lrumpnet -lrumpnet_net 
 -lrumpnet_netinet -lrumpnet_shmif unix://commsock2 ]
 Executing command [ rump.ifconfig shmif0 create ]
 Executing command [ rump.ifconfig shmif0 linkstr bus1 ]
 Executing command [ rump.ifconfig shmif0 inet 10.0.1.2/24 ]
 Executing command [ rump.ifconfig shmif0 up ]
 Executing command [ rump.ifconfig -w 10 ]
 Executing command [ rump.ifconfig shmif0 create ]
 Executing command [ rump.ifconfig shmif0 linkstr bus1 ]
 Executing command [ rump.ifconfig shmif0 inet 10.0.1.1/24 ]
 Executing command [ rump.ifconfig shmif0 up ]
 Executing command [ rump.ifconfig -w 10 ]
 Executing command [ rump.arp -n 10.0.1.1 ]
 Executing command [ rump.arp -n 10.0.1.2 ]
 Executing command [ rump.ping -n -w 1 -c 1 10.0.1.2 ]
 Executing command [ cat ./tmp ]
 Executing command [ rump.ping -n -w 6 -c 6 10.0.1.99 ]

 Looks like ping didn't obey the -w6 parameter if it takes 300 seconds!!!
 The test case works fine for me locally.

 Rump or ping issue maybe?

 Roy

From: Andreas Gustafsson <gson@NetBSD.org>
To: Roy Marples <roy@marples.name>
Cc: gnats-bugs@netbsd.org
Subject: Re: misc/54525 (net/arp/t_arp:arp_rtm still fails)
Date: Fri, 6 Sep 2019 18:19:04 +0300

 Roy Marples wrote:
 > Looks like ping didn't obey the -w6 parameter if it takes 300 seconds!!!

 Ping is not taking 300 seconds.  If I run "./t_arp arp_rtm" and
 suspend it after a minute or so, ps shows no ping process, but
 there is a "rump.route -n monitor -c 1" process, so presumably 
 the test is hanging in the "wait $pid" waiting for that rump.route
 process.
 -- 
 Andreas Gustafsson, gson@NetBSD.org

From: Roy Marples <roy@marples.name>
To: gnats-bugs@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
 Andreas Gustafsson <gson@gson.org>
Cc: 
Subject: Re: misc/54525 (net/arp/t_arp:arp_rtm still fails)
Date: Fri, 6 Sep 2019 18:50:40 +0100

 On 06/09/2019 16:20, Andreas Gustafsson wrote:
 >   Ping is not taking 300 seconds.  If I run "./t_arp arp_rtm" and
 >   suspend it after a minute or so, ps shows no ping process, but
 >   there is a "rump.route -n monitor -c 1" process, so presumably
 >   the test is hanging in the "wait $pid" waiting for that rump.route
 >   process.

 What's the IP range and subnets configured on the test machine please?
 Is it possible that 10.0.99.99 could be resolved?

 Roy

From: Andreas Gustafsson <gson@gson.org>
To: Roy Marples <roy@marples.name>
Cc: gnats-bugs@netbsd.org
Subject: Re: misc/54525 (net/arp/t_arp:arp_rtm still fails)
Date: Fri, 6 Sep 2019 23:03:46 +0300

 Roy Marples wrote:
 > What's the IP range and subnets configured on the test machine please?
 > Is it possible that 10.0.99.99 could be resolved?

 The failing tests on bablylon5.netbsd.org run in a QEMU VM
 where only the loopback interface is configured with an
 IPv4 address:

 # ifconfig -a
 wm0: flags=0x8802<BROADCAST,SIMPLEX,MULTICAST> mtu 1500
         capabilities=2bf80<TSO4,IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx>
         capabilities=2bf80<TCP4CSUM_Tx,UDP4CSUM_Rx,UDP4CSUM_Tx,TCP6CSUM_Tx>
         capabilities=2bf80<UDP6CSUM_Tx>
         enabled=0
         ec_capabilities=7<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU>
         ec_enabled=2<VLAN_HWTAGGING>
         address: 52:54:00:12:34:56
         media: Ethernet autoselect (none)
 lo0: flags=0x8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 33176
         inet 127.0.0.1/8 flags 0x0
         inet6 ::1/128 flags 0x20<NODAD>
         inet6 fe80::1%lo0/64 flags 0x0 scopeid 0x2
 #
 -- 
 Andreas Gustafsson, gson@gson.org

From: "Roy Marples" <roy@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/54525 CVS commit: src/tests/net/arp
Date: Mon, 9 Sep 2019 10:29:04 +0000

 Module Name:	src
 Committed By:	roy
 Date:		Mon Sep  9 10:29:04 UTC 2019

 Modified Files:
 	src/tests/net/arp: t_arp.sh

 Log Message:
 t_arp: Wait for 10 seconds for RTM_MISS

 Let's try increasing the ping timeout to try and fix PR misc/54525.


 To generate a diff of this commit:
 cvs rdiff -u -r1.39 -r1.40 src/tests/net/arp/t_arp.sh

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: Andreas Gustafsson <gson@gson.org>
To: "Roy Marples" <roy@netbsd.org>,  gnats-bugs@netbsd.org
Cc: 
Subject: Re: PR/54525 CVS commit: src/tests/net/arp
Date: Thu, 12 Sep 2019 09:39:18 +0300

 Roy Marples wrote:
 >  Log Message:
 >  t_arp: Wait for 10 seconds for RTM_MISS
 >  
 >  Let's try increasing the ping timeout to try and fix PR misc/54525.

 On b5, the test went from failing reliably to failing randomly:

   $ babylon5.netbsd.org /bracket/i386/results/2019/$ zgrep 'arp_rtm: ' */test.log.gz
   [...]
   2019.09.09.07.46.49/test.log.gz:    arp_rtm: [614.779632s] Failed: Test case timed out after 300 seconds
   2019.09.09.10.29.04/test.log.gz:    arp_rtm: [612.285284s] Failed: Test case timed out after 300 seconds
   2019.09.09.13.57.13/test.log.gz:    arp_rtm: [614.186856s] Failed: Test case timed out after 300 seconds
   2019.09.09.14.40.40/test.log.gz:    arp_rtm: [603.590935s] Failed: Test case timed out after 300 seconds
   2019.09.09.18.11.20/test.log.gz:    arp_rtm: [49.829339s] Passed.
   2019.09.09.22.01.23/test.log.gz:    arp_rtm: [58.025601s] Passed.
   2019.09.10.09.32.05/test.log.gz:    arp_rtm: [615.053427s] Failed: Test case timed out after 300 seconds
   2019.09.10.23.19.34/test.log.gz:    arp_rtm: [58.627688s] Passed.

 So far, I have reproduced the failures on two different testbeds where
 the system under test runs under qemu hosted on NetBSD, but have not
 been able to reproduce it on real hardware, nor on qemu hosted on
 Linux.  This suggests the issue may be yet another manifestation of
 PR kern/43997.
 -- 
 Andreas Gustafsson, gson@gson.org

From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@netbsd.org
Cc: roy@netbsd.org
Subject: Re: PR/54525 CVS commit: src/tests/net/arp
Date: Thu, 12 Sep 2019 08:45:25 +0200

 On Thu, Sep 12, 2019 at 06:40:01AM +0000, Andreas Gustafsson wrote:
 >  So far, I have reproduced the failures on two different testbeds where
 >  the system under test runs under qemu hosted on NetBSD, but have not
 >  been able to reproduce it on real hardware, nor on qemu hosted on
 >  Linux.  This suggests the issue may be yet another manifestation of
 >  PR kern/43997.

 Maybe we could make the -w argument depend on qemu? Like: use -w5 everywhere
 but -w30 on qemu?

 Martin

Responsible-Changed-From-To: roy->blymn
Responsible-Changed-By: roy@NetBSD.org
Responsible-Changed-When: Wed, 20 Nov 2019 08:51:26 +0000
Responsible-Changed-Why:
I'm not having much success with fixing this :/


Responsible-Changed-From-To: blymn->ozaki-r
Responsible-Changed-By: roy@NetBSD.org
Responsible-Changed-When: Wed, 20 Nov 2019 08:54:35 +0000
Responsible-Changed-Why:
Right person - lol


State-Changed-From-To: open->closed
State-Changed-By: ozaki-r@NetBSD.org
State-Changed-When: Mon, 05 Apr 2021 05:13:26 +0000
State-Changed-Why:
It seems the test doesn't fail these days.


From: Andreas Gustafsson <gson@gson.org>
To: gnats-bugs@netbsd.org
Cc: ozaki-r@NetBSD.org, roy@NetBSD.org
Subject: Re: misc/54525 (net/arp/t_arp:arp_rtm still fails)
Date: Mon, 5 Apr 2021 09:11:00 +0300

 ozaki-r@NetBSD.org wrote:
 > Synopsis: net/arp/t_arp:arp_rtm still fails
 > 
 > State-Changed-From-To: open->closed
 > State-Changed-By: ozaki-r@NetBSD.org
 > State-Changed-When: Mon, 05 Apr 2021 05:13:26 +0000
 > State-Changed-Why:
 > It seems the test doesn't fail these days.

 Confirmed.  Of the stretch of timeout failures around the time of the
 PR, the last one recorded on b5 was for source date 2020.03.09.15.40.50:

   babylon5.netbsd.org /bracket/i386/results$ zgrep 'arp_rtm: .*imed out' 202?/*/test.log.gz
   [...]
   2020/2020.03.08.06.44.26/test.log.gz:    arp_rtm: [611.922834s] Failed: Test case timed out after 300 seconds
   2020/2020.03.08.15.07.44/test.log.gz:    arp_rtm: [616.481079s] Failed: Test case timed out after 300 seconds
   2020/2020.03.09.05.10.19/test.log.gz:    arp_rtm: [605.977085s] Failed: Test case timed out after 300 seconds
   2020/2020.03.09.11.21.54/test.log.gz:    arp_rtm: [607.563452s] Failed: Test case timed out after 300 seconds
   2020/2020.03.09.15.40.50/test.log.gz:    arp_rtm: [606.187664s] Failed: Test case timed out after 300 seconds

 which makes roy's commit of if_arp.c 1.293 a likely candidate for
 the fix.

 There has also been two subsequent streteches of timeout failures,
 and they have also been fixed:

   2020/2020.05.21.08.43.57/test.log.gz:    arp_rtm: [5.126713s] Failed: Test case timed out after 300 seconds
   2020/2020.05.21.09.11.33/test.log.gz:    arp_rtm: [5.717317s] Failed: Test case timed out after 300 seconds
   2020/2020.05.21.09.24.17/test.log.gz:    arp_rtm: [4.769616s] Failed: Test case timed out after 300 seconds
   2020/2020.05.21.13.23.38/test.log.gz:    arp_rtm: [4.791273s] Failed: Test case timed out after 300 seconds
   2020/2020.05.24.19.26.40/test.log.gz:    arp_rtm: [5.163584s] Failed: Test case timed out after 300 seconds
   2020/2020.05.26.19.55.43/test.log.gz:    arp_rtm: [5.487326s] Failed: Test case timed out after 300 seconds

   2020/2020.09.15.10.58.02/test.log.gz:    arp_rtm: [597.145709s] Failed: Test case timed out after 300 seconds
   2020/2020.09.15.11.19.10/test.log.gz:    arp_rtm: [595.695683s] Failed: Test case timed out after 300 seconds
   2020/2020.09.15.16.22.04/test.log.gz:    arp_rtm: [598.129546s] Failed: Test case timed out after 300 seconds
   2020/2020.09.15.17.21.39/test.log.gz:    arp_rtm: [605.461260s] Failed: Test case timed out after 300 seconds

 -- 
 Andreas Gustafsson, gson@gson.org

From: Ryota Ozaki <ozaki-r@netbsd.org>
To: Andreas Gustafsson <gson@gson.org>
Cc: "gnats-bugs@NetBSD.org" <gnats-bugs@netbsd.org>, Roy Marples <roy@netbsd.org>
Subject: Re: misc/54525 (net/arp/t_arp:arp_rtm still fails)
Date: Tue, 6 Apr 2021 01:10:49 +0900

 On Mon, Apr 5, 2021 at 3:11 PM Andreas Gustafsson <gson@gson.org> wrote:
 >
 > ozaki-r@NetBSD.org wrote:
 > > Synopsis: net/arp/t_arp:arp_rtm still fails
 > >
 > > State-Changed-From-To: open->closed
 > > State-Changed-By: ozaki-r@NetBSD.org
 > > State-Changed-When: Mon, 05 Apr 2021 05:13:26 +0000
 > > State-Changed-Why:
 > > It seems the test doesn't fail these days.
 >
 > Confirmed.  Of the stretch of timeout failures around the time of the
 > PR, the last one recorded on b5 was for source date 2020.03.09.15.40.50:
 >
 >   babylon5.netbsd.org /bracket/i386/results$ zgrep 'arp_rtm: .*imed out' 202?/*/test.log.gz
 >   [...]
 >   2020/2020.03.08.06.44.26/test.log.gz:    arp_rtm: [611.922834s] Failed: Test case timed out after 300 seconds
 >   2020/2020.03.08.15.07.44/test.log.gz:    arp_rtm: [616.481079s] Failed: Test case timed out after 300 seconds
 >   2020/2020.03.09.05.10.19/test.log.gz:    arp_rtm: [605.977085s] Failed: Test case timed out after 300 seconds
 >   2020/2020.03.09.11.21.54/test.log.gz:    arp_rtm: [607.563452s] Failed: Test case timed out after 300 seconds
 >   2020/2020.03.09.15.40.50/test.log.gz:    arp_rtm: [606.187664s] Failed: Test case timed out after 300 seconds
 >
 > which makes roy's commit of if_arp.c 1.293 a likely candidate for
 > the fix.
 >
 > There has also been two subsequent streteches of timeout failures,
 > and they have also been fixed:
 >
 >   2020/2020.05.21.08.43.57/test.log.gz:    arp_rtm: [5.126713s] Failed: Test case timed out after 300 seconds
 >   2020/2020.05.21.09.11.33/test.log.gz:    arp_rtm: [5.717317s] Failed: Test case timed out after 300 seconds
 >   2020/2020.05.21.09.24.17/test.log.gz:    arp_rtm: [4.769616s] Failed: Test case timed out after 300 seconds
 >   2020/2020.05.21.13.23.38/test.log.gz:    arp_rtm: [4.791273s] Failed: Test case timed out after 300 seconds
 >   2020/2020.05.24.19.26.40/test.log.gz:    arp_rtm: [5.163584s] Failed: Test case timed out after 300 seconds
 >   2020/2020.05.26.19.55.43/test.log.gz:    arp_rtm: [5.487326s] Failed: Test case timed out after 300 seconds
 >
 >   2020/2020.09.15.10.58.02/test.log.gz:    arp_rtm: [597.145709s] Failed: Test case timed out after 300 seconds
 >   2020/2020.09.15.11.19.10/test.log.gz:    arp_rtm: [595.695683s] Failed: Test case timed out after 300 seconds
 >   2020/2020.09.15.16.22.04/test.log.gz:    arp_rtm: [598.129546s] Failed: Test case timed out after 300 seconds
 >   2020/2020.09.15.17.21.39/test.log.gz:    arp_rtm: [605.461260s] Failed: Test case timed out after 300 seconds
 >
 > --
 > Andreas Gustafsson, gson@gson.org

 Thank you for the confirmation!
   ozaki-r

>Unformatted:

NetBSD Home
NetBSD PR Database Search

(Contact us) $NetBSD: query-full-pr,v 1.46 2020/01/03 16:35:01 leot Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2020 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.