NetBSD Problem Report #54525
From gson@gson.org Thu Sep 5 13:31:33 2019
Return-Path: <gson@gson.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id B90187A179
for <gnats-bugs@gnats.NetBSD.org>; Thu, 5 Sep 2019 13:31:33 +0000 (UTC)
Message-Id: <20190905133127.EF9FC98964E@guava.gson.org>
Date: Thu, 5 Sep 2019 16:31:27 +0300 (EEST)
From: gson@gson.org (Andreas Gustafsson)
Reply-To: gson@gson.org (Andreas Gustafsson)
To: gnats-bugs@NetBSD.org
Subject: net/arp/t_arp:arp_rtm still fails
X-Send-Pr-Version: 3.95
>Number: 54525
>Category: misc
>Synopsis: net/arp/t_arp:arp_rtm still fails
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: ozaki-r
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Thu Sep 05 13:35:00 +0000 2019
>Closed-Date: Mon Apr 05 05:13:26 +0000 2021
>Last-Modified: Mon Apr 05 16:15:01 +0000 2021
>Originator: Andreas Gustafsson
>Release: NetBSD-current, source date >= 2019.09.03.19.07.50
>Organization:
>Environment:
System: NetBSD
Architecture: i386
Machine: i386
>Description:
The failing net/arp/t_arp:arp_rtm test case was not fixed by
roy's commit of src/tests/net/arp/t_arp.sh 1.39, but is now
failing with a different error:
2019.09.01.19.26.21/test.log.gz: arp_rtm: [33.751094s] Passed.
2019.09.01.19.54.04/test.log.gz: arp_rtm: [38.892978s] Passed.
2019.09.01.21.55.13/test.log.gz: arp_rtm: [38.221017s] Passed.
2019.09.01.22.09.02/test.log.gz: arp_rtm: [20.831786s] Failed: atf-check failed; see the output of the test for details
2019.09.01.22.10.13/test.log.gz: arp_rtm: [21.180874s] Failed: atf-check failed; see the output of the test for details
2019.09.01.23.01.01/test.log.gz: arp_rtm: [21.318267s] Failed: atf-check failed; see the output of the test for details
2019.09.02.00.51.48/test.log.gz: arp_rtm: [19.295217s] Failed: atf-check failed; see the output of the test for details
2019.09.02.01.28.41/test.log.gz: arp_rtm: [21.197473s] Failed: atf-check failed; see the output of the test for details
2019.09.02.08.26.00/test.log.gz: arp_rtm: [23.159463s] Failed: atf-check failed; see the output of the test for details
2019.09.02.10.35.15/test.log.gz: arp_rtm: [16.001648s] Failed: atf-check failed; see the output of the test for details
2019.09.02.12.48.52/test.log.gz: arp_rtm: [15.718583s] Failed: atf-check failed; see the output of the test for details
2019.09.02.20.09.30/test.log.gz: arp_rtm: [21.493131s] Failed: atf-check failed; see the output of the test for details
2019.09.03.03.04.31/test.log.gz: arp_rtm: [21.739860s] Failed: atf-check failed; see the output of the test for details
2019.09.03.10.36.17/test.log.gz: arp_rtm: [20.835321s] Failed: atf-check failed; see the output of the test for details
2019.09.03.21.34.03/test.log.gz: arp_rtm: [616.803112s] Failed: Test case timed out after 300 seconds
2019.09.04.03.15.20/test.log.gz: arp_rtm: [619.731232s] Failed: Test case timed out after 300 seconds
2019.09.04.05.37.06/test.log.gz: arp_rtm: [625.671949s] Failed: Test case timed out after 300 seconds
2019.09.04.08.13.43/test.log.gz: arp_rtm: [619.558218s] Failed: Test case timed out after 300 seconds
2019.09.04.10.34.04/test.log.gz: arp_rtm: [664.090334s] Failed: Test case timed out after 300 seconds
2019.09.04.12.14.09/test.log.gz: arp_rtm: [620.453415s] Failed: Test case timed out after 300 seconds
2019.09.04.13.30.52/test.log.gz: arp_rtm: [618.790491s] Failed: Test case timed out after 300 seconds
More log data at:
http://releng.netbsd.org/b5reports/i386/commits-2019.09.html#2019.09.03.19.07.50
>How-To-Repeat:
Run the ATF tests.
>Fix:
>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: misc-bug-people->roy
Responsible-Changed-By: gson@NetBSD.org
Responsible-Changed-When: Thu, 05 Sep 2019 15:21:49 +0000
Responsible-Changed-Why:
Over to committer.
From: Roy Marples <roy@marples.name>
To: gnats-bugs@netbsd.org, roy@netbsd.org, misc-bug-people@netbsd.org,
netbsd-bugs@netbsd.org, gnats-admin@netbsd.org, gson@NetBSD.org,
Andreas Gustafsson <gson@gson.org>
Cc:
Subject: Re: misc/54525 (net/arp/t_arp:arp_rtm still fails)
Date: Thu, 5 Sep 2019 16:48:44 +0100
On 05/09/2019 16:21, gson@NetBSD.org wrote:
> Synopsis: net/arp/t_arp:arp_rtm still fails
Termination reason
FAILED: Test case timed out after 300 seconds
Standard output stream
Executing command [ rump_server -lrumpnet -lrumpnet_net
-lrumpnet_netinet -lrumpnet_shmif unix://commsock1 ]
Executing command [ rump_server -lrumpnet -lrumpnet_net
-lrumpnet_netinet -lrumpnet_shmif unix://commsock2 ]
Executing command [ rump.ifconfig shmif0 create ]
Executing command [ rump.ifconfig shmif0 linkstr bus1 ]
Executing command [ rump.ifconfig shmif0 inet 10.0.1.2/24 ]
Executing command [ rump.ifconfig shmif0 up ]
Executing command [ rump.ifconfig -w 10 ]
Executing command [ rump.ifconfig shmif0 create ]
Executing command [ rump.ifconfig shmif0 linkstr bus1 ]
Executing command [ rump.ifconfig shmif0 inet 10.0.1.1/24 ]
Executing command [ rump.ifconfig shmif0 up ]
Executing command [ rump.ifconfig -w 10 ]
Executing command [ rump.arp -n 10.0.1.1 ]
Executing command [ rump.arp -n 10.0.1.2 ]
Executing command [ rump.ping -n -w 1 -c 1 10.0.1.2 ]
Executing command [ cat ./tmp ]
Executing command [ rump.ping -n -w 6 -c 6 10.0.1.99 ]
Looks like ping didn't obey the -w6 parameter if it takes 300 seconds!!!
The test case works fine for me locally.
Rump or ping issue maybe?
Roy
From: Andreas Gustafsson <gson@NetBSD.org>
To: Roy Marples <roy@marples.name>
Cc: gnats-bugs@netbsd.org
Subject: Re: misc/54525 (net/arp/t_arp:arp_rtm still fails)
Date: Fri, 6 Sep 2019 18:19:04 +0300
Roy Marples wrote:
> Looks like ping didn't obey the -w6 parameter if it takes 300 seconds!!!
Ping is not taking 300 seconds. If I run "./t_arp arp_rtm" and
suspend it after a minute or so, ps shows no ping process, but
there is a "rump.route -n monitor -c 1" process, so presumably
the test is hanging in the "wait $pid" waiting for that rump.route
process.
--
Andreas Gustafsson, gson@NetBSD.org
From: Roy Marples <roy@marples.name>
To: gnats-bugs@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
Andreas Gustafsson <gson@gson.org>
Cc:
Subject: Re: misc/54525 (net/arp/t_arp:arp_rtm still fails)
Date: Fri, 6 Sep 2019 18:50:40 +0100
On 06/09/2019 16:20, Andreas Gustafsson wrote:
> Ping is not taking 300 seconds. If I run "./t_arp arp_rtm" and
> suspend it after a minute or so, ps shows no ping process, but
> there is a "rump.route -n monitor -c 1" process, so presumably
> the test is hanging in the "wait $pid" waiting for that rump.route
> process.
What's the IP range and subnets configured on the test machine please?
Is it possible that 10.0.99.99 could be resolved?
Roy
From: Andreas Gustafsson <gson@gson.org>
To: Roy Marples <roy@marples.name>
Cc: gnats-bugs@netbsd.org
Subject: Re: misc/54525 (net/arp/t_arp:arp_rtm still fails)
Date: Fri, 6 Sep 2019 23:03:46 +0300
Roy Marples wrote:
> What's the IP range and subnets configured on the test machine please?
> Is it possible that 10.0.99.99 could be resolved?
The failing tests on bablylon5.netbsd.org run in a QEMU VM
where only the loopback interface is configured with an
IPv4 address:
# ifconfig -a
wm0: flags=0x8802<BROADCAST,SIMPLEX,MULTICAST> mtu 1500
capabilities=2bf80<TSO4,IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx>
capabilities=2bf80<TCP4CSUM_Tx,UDP4CSUM_Rx,UDP4CSUM_Tx,TCP6CSUM_Tx>
capabilities=2bf80<UDP6CSUM_Tx>
enabled=0
ec_capabilities=7<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU>
ec_enabled=2<VLAN_HWTAGGING>
address: 52:54:00:12:34:56
media: Ethernet autoselect (none)
lo0: flags=0x8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 33176
inet 127.0.0.1/8 flags 0x0
inet6 ::1/128 flags 0x20<NODAD>
inet6 fe80::1%lo0/64 flags 0x0 scopeid 0x2
#
--
Andreas Gustafsson, gson@gson.org
From: "Roy Marples" <roy@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/54525 CVS commit: src/tests/net/arp
Date: Mon, 9 Sep 2019 10:29:04 +0000
Module Name: src
Committed By: roy
Date: Mon Sep 9 10:29:04 UTC 2019
Modified Files:
src/tests/net/arp: t_arp.sh
Log Message:
t_arp: Wait for 10 seconds for RTM_MISS
Let's try increasing the ping timeout to try and fix PR misc/54525.
To generate a diff of this commit:
cvs rdiff -u -r1.39 -r1.40 src/tests/net/arp/t_arp.sh
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: Andreas Gustafsson <gson@gson.org>
To: "Roy Marples" <roy@netbsd.org>, gnats-bugs@netbsd.org
Cc:
Subject: Re: PR/54525 CVS commit: src/tests/net/arp
Date: Thu, 12 Sep 2019 09:39:18 +0300
Roy Marples wrote:
> Log Message:
> t_arp: Wait for 10 seconds for RTM_MISS
>
> Let's try increasing the ping timeout to try and fix PR misc/54525.
On b5, the test went from failing reliably to failing randomly:
$ babylon5.netbsd.org /bracket/i386/results/2019/$ zgrep 'arp_rtm: ' */test.log.gz
[...]
2019.09.09.07.46.49/test.log.gz: arp_rtm: [614.779632s] Failed: Test case timed out after 300 seconds
2019.09.09.10.29.04/test.log.gz: arp_rtm: [612.285284s] Failed: Test case timed out after 300 seconds
2019.09.09.13.57.13/test.log.gz: arp_rtm: [614.186856s] Failed: Test case timed out after 300 seconds
2019.09.09.14.40.40/test.log.gz: arp_rtm: [603.590935s] Failed: Test case timed out after 300 seconds
2019.09.09.18.11.20/test.log.gz: arp_rtm: [49.829339s] Passed.
2019.09.09.22.01.23/test.log.gz: arp_rtm: [58.025601s] Passed.
2019.09.10.09.32.05/test.log.gz: arp_rtm: [615.053427s] Failed: Test case timed out after 300 seconds
2019.09.10.23.19.34/test.log.gz: arp_rtm: [58.627688s] Passed.
So far, I have reproduced the failures on two different testbeds where
the system under test runs under qemu hosted on NetBSD, but have not
been able to reproduce it on real hardware, nor on qemu hosted on
Linux. This suggests the issue may be yet another manifestation of
PR kern/43997.
--
Andreas Gustafsson, gson@gson.org
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@netbsd.org
Cc: roy@netbsd.org
Subject: Re: PR/54525 CVS commit: src/tests/net/arp
Date: Thu, 12 Sep 2019 08:45:25 +0200
On Thu, Sep 12, 2019 at 06:40:01AM +0000, Andreas Gustafsson wrote:
> So far, I have reproduced the failures on two different testbeds where
> the system under test runs under qemu hosted on NetBSD, but have not
> been able to reproduce it on real hardware, nor on qemu hosted on
> Linux. This suggests the issue may be yet another manifestation of
> PR kern/43997.
Maybe we could make the -w argument depend on qemu? Like: use -w5 everywhere
but -w30 on qemu?
Martin
Responsible-Changed-From-To: roy->blymn
Responsible-Changed-By: roy@NetBSD.org
Responsible-Changed-When: Wed, 20 Nov 2019 08:51:26 +0000
Responsible-Changed-Why:
I'm not having much success with fixing this :/
Responsible-Changed-From-To: blymn->ozaki-r
Responsible-Changed-By: roy@NetBSD.org
Responsible-Changed-When: Wed, 20 Nov 2019 08:54:35 +0000
Responsible-Changed-Why:
Right person - lol
State-Changed-From-To: open->closed
State-Changed-By: ozaki-r@NetBSD.org
State-Changed-When: Mon, 05 Apr 2021 05:13:26 +0000
State-Changed-Why:
It seems the test doesn't fail these days.
From: Andreas Gustafsson <gson@gson.org>
To: gnats-bugs@netbsd.org
Cc: ozaki-r@NetBSD.org, roy@NetBSD.org
Subject: Re: misc/54525 (net/arp/t_arp:arp_rtm still fails)
Date: Mon, 5 Apr 2021 09:11:00 +0300
ozaki-r@NetBSD.org wrote:
> Synopsis: net/arp/t_arp:arp_rtm still fails
>
> State-Changed-From-To: open->closed
> State-Changed-By: ozaki-r@NetBSD.org
> State-Changed-When: Mon, 05 Apr 2021 05:13:26 +0000
> State-Changed-Why:
> It seems the test doesn't fail these days.
Confirmed. Of the stretch of timeout failures around the time of the
PR, the last one recorded on b5 was for source date 2020.03.09.15.40.50:
babylon5.netbsd.org /bracket/i386/results$ zgrep 'arp_rtm: .*imed out' 202?/*/test.log.gz
[...]
2020/2020.03.08.06.44.26/test.log.gz: arp_rtm: [611.922834s] Failed: Test case timed out after 300 seconds
2020/2020.03.08.15.07.44/test.log.gz: arp_rtm: [616.481079s] Failed: Test case timed out after 300 seconds
2020/2020.03.09.05.10.19/test.log.gz: arp_rtm: [605.977085s] Failed: Test case timed out after 300 seconds
2020/2020.03.09.11.21.54/test.log.gz: arp_rtm: [607.563452s] Failed: Test case timed out after 300 seconds
2020/2020.03.09.15.40.50/test.log.gz: arp_rtm: [606.187664s] Failed: Test case timed out after 300 seconds
which makes roy's commit of if_arp.c 1.293 a likely candidate for
the fix.
There has also been two subsequent streteches of timeout failures,
and they have also been fixed:
2020/2020.05.21.08.43.57/test.log.gz: arp_rtm: [5.126713s] Failed: Test case timed out after 300 seconds
2020/2020.05.21.09.11.33/test.log.gz: arp_rtm: [5.717317s] Failed: Test case timed out after 300 seconds
2020/2020.05.21.09.24.17/test.log.gz: arp_rtm: [4.769616s] Failed: Test case timed out after 300 seconds
2020/2020.05.21.13.23.38/test.log.gz: arp_rtm: [4.791273s] Failed: Test case timed out after 300 seconds
2020/2020.05.24.19.26.40/test.log.gz: arp_rtm: [5.163584s] Failed: Test case timed out after 300 seconds
2020/2020.05.26.19.55.43/test.log.gz: arp_rtm: [5.487326s] Failed: Test case timed out after 300 seconds
2020/2020.09.15.10.58.02/test.log.gz: arp_rtm: [597.145709s] Failed: Test case timed out after 300 seconds
2020/2020.09.15.11.19.10/test.log.gz: arp_rtm: [595.695683s] Failed: Test case timed out after 300 seconds
2020/2020.09.15.16.22.04/test.log.gz: arp_rtm: [598.129546s] Failed: Test case timed out after 300 seconds
2020/2020.09.15.17.21.39/test.log.gz: arp_rtm: [605.461260s] Failed: Test case timed out after 300 seconds
--
Andreas Gustafsson, gson@gson.org
From: Ryota Ozaki <ozaki-r@netbsd.org>
To: Andreas Gustafsson <gson@gson.org>
Cc: "gnats-bugs@NetBSD.org" <gnats-bugs@netbsd.org>, Roy Marples <roy@netbsd.org>
Subject: Re: misc/54525 (net/arp/t_arp:arp_rtm still fails)
Date: Tue, 6 Apr 2021 01:10:49 +0900
On Mon, Apr 5, 2021 at 3:11 PM Andreas Gustafsson <gson@gson.org> wrote:
>
> ozaki-r@NetBSD.org wrote:
> > Synopsis: net/arp/t_arp:arp_rtm still fails
> >
> > State-Changed-From-To: open->closed
> > State-Changed-By: ozaki-r@NetBSD.org
> > State-Changed-When: Mon, 05 Apr 2021 05:13:26 +0000
> > State-Changed-Why:
> > It seems the test doesn't fail these days.
>
> Confirmed. Of the stretch of timeout failures around the time of the
> PR, the last one recorded on b5 was for source date 2020.03.09.15.40.50:
>
> babylon5.netbsd.org /bracket/i386/results$ zgrep 'arp_rtm: .*imed out' 202?/*/test.log.gz
> [...]
> 2020/2020.03.08.06.44.26/test.log.gz: arp_rtm: [611.922834s] Failed: Test case timed out after 300 seconds
> 2020/2020.03.08.15.07.44/test.log.gz: arp_rtm: [616.481079s] Failed: Test case timed out after 300 seconds
> 2020/2020.03.09.05.10.19/test.log.gz: arp_rtm: [605.977085s] Failed: Test case timed out after 300 seconds
> 2020/2020.03.09.11.21.54/test.log.gz: arp_rtm: [607.563452s] Failed: Test case timed out after 300 seconds
> 2020/2020.03.09.15.40.50/test.log.gz: arp_rtm: [606.187664s] Failed: Test case timed out after 300 seconds
>
> which makes roy's commit of if_arp.c 1.293 a likely candidate for
> the fix.
>
> There has also been two subsequent streteches of timeout failures,
> and they have also been fixed:
>
> 2020/2020.05.21.08.43.57/test.log.gz: arp_rtm: [5.126713s] Failed: Test case timed out after 300 seconds
> 2020/2020.05.21.09.11.33/test.log.gz: arp_rtm: [5.717317s] Failed: Test case timed out after 300 seconds
> 2020/2020.05.21.09.24.17/test.log.gz: arp_rtm: [4.769616s] Failed: Test case timed out after 300 seconds
> 2020/2020.05.21.13.23.38/test.log.gz: arp_rtm: [4.791273s] Failed: Test case timed out after 300 seconds
> 2020/2020.05.24.19.26.40/test.log.gz: arp_rtm: [5.163584s] Failed: Test case timed out after 300 seconds
> 2020/2020.05.26.19.55.43/test.log.gz: arp_rtm: [5.487326s] Failed: Test case timed out after 300 seconds
>
> 2020/2020.09.15.10.58.02/test.log.gz: arp_rtm: [597.145709s] Failed: Test case timed out after 300 seconds
> 2020/2020.09.15.11.19.10/test.log.gz: arp_rtm: [595.695683s] Failed: Test case timed out after 300 seconds
> 2020/2020.09.15.16.22.04/test.log.gz: arp_rtm: [598.129546s] Failed: Test case timed out after 300 seconds
> 2020/2020.09.15.17.21.39/test.log.gz: arp_rtm: [605.461260s] Failed: Test case timed out after 300 seconds
>
> --
> Andreas Gustafsson, gson@gson.org
Thank you for the confirmation!
ozaki-r
>Unformatted:
(Contact us)
$NetBSD: query-full-pr,v 1.46 2020/01/03 16:35:01 leot Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2020
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.