NetBSD Problem Report #54923
From gson@gson.org Sun Feb 2 14:45:24 2020
Return-Path: <gson@gson.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id DD1E71A9213
for <gnats-bugs@gnats.NetBSD.org>; Sun, 2 Feb 2020 14:45:24 +0000 (UTC)
Message-Id: <20200202144518.E10D7253FC2@guava.gson.org>
Date: Sun, 2 Feb 2020 16:45:18 +0200 (EET)
From: gson@gson.org (Andreas Gustafsson)
Reply-To: gson@gson.org (Andreas Gustafsson)
To: gnats-bugs@NetBSD.org
Subject: pmax test runs fail to complete since Jan 15
X-Send-Pr-Version: 3.95
>Number: 54923
>Category: port-pmax
>Synopsis: pmax test runs fail to complete since Jan 15
>Confidential: no
>Severity: serious
>Priority: high
>Responsible: ad
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sun Feb 02 16:38:24 +0000 2020
>Last-Modified: Wed May 20 21:50:01 +0000 2020
>Originator: Andreas Gustafsson
>Release: NetBSD-current, source date >= 2020.01.15.18.47.23
>Organization:
>Environment:
System: NetBSD
Architecture: mips
Machine: pmax
>Description:
The pmax tests on the TNF testbed have not completed since these commits:
2020.01.15.17.55.43 ad src/external/cddl/osnet/dist/uts/common/fs/zfs/zfs_vnops.c 1.55
2020.01.15.17.55.43 ad src/sys/external/bsd/drm2/dist/drm/drm_gem.c 1.11
2020.01.15.17.55.43 ad src/sys/external/bsd/drm2/dist/drm/i915/i915_gem.c 1.55
2020.01.15.17.55.43 ad src/sys/external/bsd/drm2/dist/drm/i915/i915_gem_fence.c 1.6
2020.01.15.17.55.44 ad src/sys/external/bsd/drm2/include/linux/mm.h 1.10
2020.01.15.17.55.44 ad src/sys/miscfs/genfs/genfs_io.c 1.84
2020.01.15.17.55.44 ad src/sys/miscfs/genfs/genfs_node.h 1.23
2020.01.15.17.55.44 ad src/sys/nfs/nfs_bio.c 1.193
2020.01.15.17.55.44 ad src/sys/rump/librump/rumpkern/Makefile.rumpkern 1.182
2020.01.15.17.55.44 ad src/sys/rump/librump/rumpkern/vm.c 1.183
2020.01.15.17.55.44 ad src/sys/rump/librump/rumpvfs/vm_vfs.c 1.36
2020.01.15.17.55.44 ad src/sys/sys/cpu_data.h 1.49
2020.01.15.17.55.44 ad src/sys/ufs/lfs/lfs_pages.c 1.20
2020.01.15.17.55.44 ad src/sys/ufs/lfs/lfs_segment.c 1.281
2020.01.15.17.55.44 ad src/sys/ufs/lfs/lfs_vfsops.c 1.368
2020.01.15.17.55.44 ad src/sys/ufs/lfs/ulfs_inode.c 1.24
2020.01.15.17.55.44 ad src/sys/ufs/ufs/ufs_inode.c 1.108
2020.01.15.17.55.45 ad src/sys/uvm/files.uvm 1.33
2020.01.15.17.55.45 ad src/sys/uvm/uvm_anon.c 1.71
2020.01.15.17.55.45 ad src/sys/uvm/uvm_aobj.c 1.134
2020.01.15.17.55.45 ad src/sys/uvm/uvm_bio.c 1.103
2020.01.15.17.55.45 ad src/sys/uvm/uvm_extern.h 1.219
2020.01.15.17.55.45 ad src/sys/uvm/uvm_fault.c 1.215
2020.01.15.17.55.45 ad src/sys/uvm/uvm_loan.c 1.94
2020.01.15.17.55.45 ad src/sys/uvm/uvm_meter.c 1.74
2020.01.15.17.55.45 ad src/sys/uvm/uvm_object.c 1.20
2020.01.15.17.55.45 ad src/sys/uvm/uvm_object.h 1.36
2020.01.15.17.55.45 ad src/sys/uvm/uvm_page.c 1.224
2020.01.15.17.55.45 ad src/sys/uvm/uvm_page.h 1.96
2020.01.15.17.55.45 ad src/sys/uvm/uvm_page_array.c 1.3
2020.01.15.17.55.45 ad src/sys/uvm/uvm_page_status.c 1.2
2020.01.15.17.55.45 ad src/sys/uvm/uvm_pager.c 1.120
2020.01.15.17.55.45 ad src/sys/uvm/uvm_pdaemon.c 1.123
2020.01.15.17.55.45 ad src/sys/uvm/uvm_vnode.c 1.105
2020.01.15.17.56.46 ad src/usr.bin/vmstat/vmstat.c 1.235
2020.01.15.18.45.57 ad src/sys/sys/param.h 1.641
2020.01.15.18.47.23 ad src/sys/arch/amd64/amd64/locore.S 1.200
2020.01.15.18.47.23 ad src/sys/arch/i386/i386/locore.S 1.177
The logs I have looked at all ended like this:
sbin/resize_ffs/t_shrink (534/838): 4 test cases
shrink_24M_16M_v0_32768: [ warning: LOW reference: vaddr=0x00000000, exception TLBL, pc=0x8022725c <(no symbol)> ]
[ warning: LOW reference: vaddr=0x00000000, exception TLBL, pc=0x8000001c <(no symbol)> ]
[ warning: LOW reference: vaddr=0x00000000, exception TLBL, pc=0x8000001c <(no symbol)> ]
[ warning: LOW reference: vaddr=0x00000000, exception TLBL, pc=0x8000001c <(no symbol)> ]
[ warning: LOW reference: vaddr=0x00000000, exception TLBL, pc=0x8000001c <(no symbol)> ]
[ warning: LOW reference: vaddr=0x00000000, exception TLBL, pc=0x8000001c <(no symbol)> ]
[ warning: LOW reference: vaddr=0x00000000, exception TLBL, pc=0x8000001c <(no symbol)> ]
[ warning: LOW reference: vaddr=0x00000000, exception TLBL, pc=0x8000001c <(no symbol)> ]
[ warning: LOW reference: vaddr=0x00000000, exception TLBL, pc=0x8000001c <(no symbol)> ]
[ warning: LOW reference: vaddr=0x00000000, exception TLBL, pc=0x8000001c <(no symbol)> ]
[ warning: LOW reference: vaddr=0x00000000, exception TLBL, pc=0x8000001c <(no symbol)> ]
[ warning: LOW reference: vaddr=0x00000000, exception TLBL, pc=0x8000001c <(no symbol)> ]
(repeat until timeout)
>How-To-Repeat:
>Fix:
>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: port-pmax-maintainer->ad
Responsible-Changed-By: ad@NetBSD.org
Responsible-Changed-When: Wed, 05 Feb 2020 00:03:39 +0000
Responsible-Changed-Why:
Will take a look.
From: Andreas Gustafsson <gson@gson.org>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: port-pmax/54923 (pmax test runs fail to complete since Jan 15)
Date: Wed, 5 Feb 2020 07:48:23 +0200
ad@NetBSD.org wrote:
> Will take a look.
FWIW, the last three test runs have completed, but I'm having a hard
time figuring out which commit could have fixed it. The first
completed run is:
http://releng.netbsd.org/b5reports/pmax/commits-2020.02.html#2020.02.02.09.19.48
I have started a few more builds to narrow it down.
--
Andreas Gustafsson, gson@gson.org
From: Andreas Gustafsson <gson@gson.org>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: port-pmax/54923 (pmax test runs fail to complete since Jan 15)
Date: Sat, 15 Feb 2020 13:30:30 +0200
On Feb 5, I wrote:
> FWIW, the last three test runs have completed, but I'm having a hard
> time figuring out which commit could have fixed it. The first
> completed run is:
>
> http://releng.netbsd.org/b5reports/pmax/commits-2020.02.html#2020.02.02.09.19.48
>
> I have started a few more builds to narrow it down.
After many additional test runs, I'm still confused as to what is
going on. My own testbed agrees with the TNF one that the problem
started with ad's commits of Jan 15:
http://releng.netbsd.org/b5reports/pmax/commits-2020.01.html#2020.01.15.17.55.43
http://www.gson.org/netbsd/bugs/build/pmax/commits-2020.01.html#2020.01.15.17.55.43
On both testbeds, the tests then consistently failed to complete until
around Feb 2, but since then, the behavior has been random and
different between the testbeds:
http://releng.netbsd.org/b5reports/pmax/commits-2020.02.html#2020.02.02.06.41.27
http://www.gson.org/netbsd/bugs/build/pmax/commits-2020.02.html#2020.02.02.03.41.12
The pmax tests have been printing the message "warning: LOW reference"
since ~forever, but what has changed is that they now end up printing
them in a tight loop (or not, as the case may be).
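The distinction above, between occasional "LOW reference" warnings and a
run that ends in a tight loop of them, could be checked mechanically.
What follows is only a minimal sketch, not part of either testbed: the
function names, the regular expression, and the threshold of 10 repeats
are all assumptions chosen for illustration, based solely on the log
excerpt quoted in this PR.

```python
import re

# Pattern derived from the warning lines quoted in this PR (an assumption:
# other emulator versions may format the message differently).
LOW_REF = re.compile(
    r"\[ warning: LOW reference: vaddr=0x[0-9a-fA-F]+, "
    r"exception \w+, pc=(0x[0-9a-fA-F]+)"
)


def trailing_low_ref_run(lines):
    """Return (count, pc) for the run of LOW reference warnings that share
    the same pc at the very end of the log; (0, None) if there is none."""
    count = 0
    pc = None
    for line in reversed(lines):
        if not line.strip():
            continue  # ignore blank lines at the end of the log
        m = LOW_REF.search(line)
        if m is None:
            break  # reached ordinary test output
        if pc is None:
            pc = m.group(1)
        elif m.group(1) != pc:
            break  # a warning from a different pc ends the run
        count += 1
    return count, pc


def looks_like_tight_loop(lines, threshold=10):
    """Heuristic: treat the log as ending in a tight loop when at least
    `threshold` identical-pc warnings close it (threshold is arbitrary)."""
    count, _ = trailing_low_ref_run(lines)
    return count >= threshold
```

A caller would pass the lines of a test log, for example
looks_like_tight_loop(open(path).read().splitlines()), where path is
whatever log file is being examined.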
The hpcmips tests have failed to complete in every run since Jan 15,
but it's hard to tell if the problem started with the same commit as
the pmax one because they were already randomly failing to complete
before that:
http://releng.netbsd.org/b5reports/hpcmips/commits-2020.01.html#2020.01.15.15.30.46
--
Andreas Gustafsson, gson@gson.org
From: Andrew Doran <ad@netbsd.org>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: port-pmax/54923: pmax test runs fail to complete since Jan 15
Date: Wed, 20 May 2020 21:46:31 +0000
It's still shagged.
>Unformatted:
Copyright © 1994-2020
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.