NetBSD Problem Report #54923

From gson@gson.org  Sun Feb  2 14:45:24 2020
Return-Path: <gson@gson.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id DD1E71A9213
	for <gnats-bugs@gnats.NetBSD.org>; Sun,  2 Feb 2020 14:45:24 +0000 (UTC)
Message-Id: <20200202144518.E10D7253FC2@guava.gson.org>
Date: Sun,  2 Feb 2020 16:45:18 +0200 (EET)
From: gson@gson.org (Andreas Gustafsson)
Reply-To: gson@gson.org (Andreas Gustafsson)
To: gnats-bugs@NetBSD.org
Subject: pmax test runs fail to complete since Jan 15
X-Send-Pr-Version: 3.95

>Number:         54923
>Category:       port-pmax
>Synopsis:       pmax test runs fail to complete since Jan 15
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    ad
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Feb 02 16:38:24 +0000 2020
>Last-Modified:  Wed May 20 21:50:01 +0000 2020
>Originator:     Andreas Gustafsson
>Release:        NetBSD-current, source date >= 2020.01.15.18.47.23
>Organization:

>Environment:
System: NetBSD
Architecture: mips
Machine: pmax
>Description:

The pmax tests on the TNF testbed have not completed since these commits:

   2020.01.15.17.55.43 ad src/external/cddl/osnet/dist/uts/common/fs/zfs/zfs_vnops.c 1.55
   2020.01.15.17.55.43 ad src/sys/external/bsd/drm2/dist/drm/drm_gem.c 1.11
   2020.01.15.17.55.43 ad src/sys/external/bsd/drm2/dist/drm/i915/i915_gem.c 1.55
   2020.01.15.17.55.43 ad src/sys/external/bsd/drm2/dist/drm/i915/i915_gem_fence.c 1.6
   2020.01.15.17.55.44 ad src/sys/external/bsd/drm2/include/linux/mm.h 1.10
   2020.01.15.17.55.44 ad src/sys/miscfs/genfs/genfs_io.c 1.84
   2020.01.15.17.55.44 ad src/sys/miscfs/genfs/genfs_node.h 1.23
   2020.01.15.17.55.44 ad src/sys/nfs/nfs_bio.c 1.193
   2020.01.15.17.55.44 ad src/sys/rump/librump/rumpkern/Makefile.rumpkern 1.182
   2020.01.15.17.55.44 ad src/sys/rump/librump/rumpkern/vm.c 1.183
   2020.01.15.17.55.44 ad src/sys/rump/librump/rumpvfs/vm_vfs.c 1.36
   2020.01.15.17.55.44 ad src/sys/sys/cpu_data.h 1.49
   2020.01.15.17.55.44 ad src/sys/ufs/lfs/lfs_pages.c 1.20
   2020.01.15.17.55.44 ad src/sys/ufs/lfs/lfs_segment.c 1.281
   2020.01.15.17.55.44 ad src/sys/ufs/lfs/lfs_vfsops.c 1.368
   2020.01.15.17.55.44 ad src/sys/ufs/lfs/ulfs_inode.c 1.24
   2020.01.15.17.55.44 ad src/sys/ufs/ufs/ufs_inode.c 1.108
   2020.01.15.17.55.45 ad src/sys/uvm/files.uvm 1.33
   2020.01.15.17.55.45 ad src/sys/uvm/uvm_anon.c 1.71
   2020.01.15.17.55.45 ad src/sys/uvm/uvm_aobj.c 1.134
   2020.01.15.17.55.45 ad src/sys/uvm/uvm_bio.c 1.103
   2020.01.15.17.55.45 ad src/sys/uvm/uvm_extern.h 1.219
   2020.01.15.17.55.45 ad src/sys/uvm/uvm_fault.c 1.215
   2020.01.15.17.55.45 ad src/sys/uvm/uvm_loan.c 1.94
   2020.01.15.17.55.45 ad src/sys/uvm/uvm_meter.c 1.74
   2020.01.15.17.55.45 ad src/sys/uvm/uvm_object.c 1.20
   2020.01.15.17.55.45 ad src/sys/uvm/uvm_object.h 1.36
   2020.01.15.17.55.45 ad src/sys/uvm/uvm_page.c 1.224
   2020.01.15.17.55.45 ad src/sys/uvm/uvm_page.h 1.96
   2020.01.15.17.55.45 ad src/sys/uvm/uvm_page_array.c 1.3
   2020.01.15.17.55.45 ad src/sys/uvm/uvm_page_status.c 1.2
   2020.01.15.17.55.45 ad src/sys/uvm/uvm_pager.c 1.120
   2020.01.15.17.55.45 ad src/sys/uvm/uvm_pdaemon.c 1.123
   2020.01.15.17.55.45 ad src/sys/uvm/uvm_vnode.c 1.105
   2020.01.15.17.56.46 ad src/usr.bin/vmstat/vmstat.c 1.235
   2020.01.15.18.45.57 ad src/sys/sys/param.h 1.641
   2020.01.15.18.47.23 ad src/sys/arch/amd64/amd64/locore.S 1.200
   2020.01.15.18.47.23 ad src/sys/arch/i386/i386/locore.S 1.177

The logs I have looked at all ended like this:

  sbin/resize_ffs/t_shrink (534/838): 4 test cases
      shrink_24M_16M_v0_32768: [ warning: LOW reference: vaddr=0x00000000, exception TLBL, pc=0x8022725c <(no symbol)> ]
  [ warning: LOW reference: vaddr=0x00000000, exception TLBL, pc=0x8000001c <(no symbol)> ]
  [ warning: LOW reference: vaddr=0x00000000, exception TLBL, pc=0x8000001c <(no symbol)> ]
  [ warning: LOW reference: vaddr=0x00000000, exception TLBL, pc=0x8000001c <(no symbol)> ]
  [ warning: LOW reference: vaddr=0x00000000, exception TLBL, pc=0x8000001c <(no symbol)> ]
  [ warning: LOW reference: vaddr=0x00000000, exception TLBL, pc=0x8000001c <(no symbol)> ]
  [ warning: LOW reference: vaddr=0x00000000, exception TLBL, pc=0x8000001c <(no symbol)> ]
  [ warning: LOW reference: vaddr=0x00000000, exception TLBL, pc=0x8000001c <(no symbol)> ]
  [ warning: LOW reference: vaddr=0x00000000, exception TLBL, pc=0x8000001c <(no symbol)> ]
  [ warning: LOW reference: vaddr=0x00000000, exception TLBL, pc=0x8000001c <(no symbol)> ]
  [ warning: LOW reference: vaddr=0x00000000, exception TLBL, pc=0x8000001c <(no symbol)> ]
  [ warning: LOW reference: vaddr=0x00000000, exception TLBL, pc=0x8000001c <(no symbol)> ]
  (repeat until timeout)
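
To tell a log that merely contains some of these warnings from one that ends in a tight loop of them, a check along the following lines may help. This is only a sketch: the regular expression is derived from the excerpt above, while the names `LOW_REF`, `trailing_repeat` and `ends_in_tight_loop` and the default threshold of 10 are assumptions made for illustration, not part of any testbed tooling.

```python
import re

# Matches the warning lines as they appear in the excerpt above.
LOW_REF = re.compile(
    r"\[ warning: LOW reference: vaddr=(?P<vaddr>0x[0-9a-fA-F]+), "
    r"exception (?P<exc>\w+), pc=(?P<pc>0x[0-9a-fA-F]+)"
)


def trailing_repeat(lines):
    """Return (pc, count) for the run of same-pc warnings that ends the log.

    Blank lines are skipped. The run stops at the first line that is not
    a LOW reference warning or that reports a different pc.
    """
    pc = None
    count = 0
    for line in reversed(lines):
        if not line.strip():
            continue
        m = LOW_REF.search(line)
        if m is None:
            break
        if pc is None:
            pc = m.group("pc")
        elif m.group("pc") != pc:
            break
        count += 1
    return pc, count


def ends_in_tight_loop(lines, threshold=10):
    """Guess whether the log ended in a warning loop (threshold is an assumption)."""
    _, count = trailing_repeat(lines)
    return count >= threshold
```

Calling `ends_in_tight_loop(open(path).read().splitlines())` on a saved console log would flag runs that look like the one above, while a log with a few scattered warnings followed by normal test output would not be flagged.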

>How-To-Repeat:

>Fix:

>Release-Note:

>Audit-Trail:

Responsible-Changed-From-To: port-pmax-maintainer->ad
Responsible-Changed-By: ad@NetBSD.org
Responsible-Changed-When: Wed, 05 Feb 2020 00:03:39 +0000
Responsible-Changed-Why:
Will take a look.


From: Andreas Gustafsson <gson@gson.org>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: port-pmax/54923 (pmax test runs fail to complete since Jan 15)
Date: Wed, 5 Feb 2020 07:48:23 +0200

 ad@NetBSD.org wrote:
 > Will take a look.

 FWIW, the last three test runs have completed, but I'm having a hard
 time figuring out which commit could have fixed it.  The first
 completed run is:

   http://releng.netbsd.org/b5reports/pmax/commits-2020.02.html#2020.02.02.09.19.48

 I have started a few more builds to narrow it down.
 -- 
 Andreas Gustafsson, gson@gson.org

From: Andreas Gustafsson <gson@gson.org>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: port-pmax/54923 (pmax test runs fail to complete since Jan 15)
Date: Sat, 15 Feb 2020 13:30:30 +0200

 On Feb 5, I wrote:
 > FWIW, the last three test runs have completed, but I'm having a hard
 > time figuring out which commit could have fixed it.  The first
 > completed run is:
 > 
 >   http://releng.netbsd.org/b5reports/pmax/commits-2020.02.html#2020.02.02.09.19.48
 > 
 > I have started a few more builds to narrow it down.

 After many additional test runs, I'm still confused as to what is
 going on.  My own testbed agrees with the TNF one that the problem
 started with ad's commits of Jan 15:

   http://releng.netbsd.org/b5reports/pmax/commits-2020.01.html#2020.01.15.17.55.43
   http://www.gson.org/netbsd/bugs/build/pmax/commits-2020.01.html#2020.01.15.17.55.43

 On both testbeds, the tests then consistently failed to complete until
 around Feb 2, but since then, the behavior has been random and
 different between the testbeds:

   http://releng.netbsd.org/b5reports/pmax/commits-2020.02.html#2020.02.02.06.41.27
   http://www.gson.org/netbsd/bugs/build/pmax/commits-2020.02.html#2020.02.02.03.41.12

 The pmax tests have been printing the message "warning: LOW reference"
 since ~forever, but what has changed is that they now end up printing
 them in a tight loop (or not, as the case may be).

 The hpcmips tests have failed to complete in every run since Jan 15,
 but it's hard to tell if the problem started with the same commit as
 the pmax one because they were already randomly failing to complete
 before that:

   http://releng.netbsd.org/b5reports/hpcmips/commits-2020.01.html#2020.01.15.15.30.46

 -- 
 Andreas Gustafsson, gson@gson.org

From: Andrew Doran <ad@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: port-pmax/54923: pmax test runs fail to complete since Jan 15
Date: Wed, 20 May 2020 21:46:31 +0000

 It's still shagged.

>Unformatted:
