NetBSD Problem Report #55356

From gson@gson.org  Sun Jun  7 12:26:06 2020
Return-Path: <gson@gson.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 7611B1A9219
	for <gnats-bugs@gnats.NetBSD.org>; Sun,  7 Jun 2020 12:26:06 +0000 (UTC)
Message-Id: <20200607122559.03F01253EE0@guava.gson.org>
Date: Sun,  7 Jun 2020 15:25:58 +0300 (EEST)
From: gson@gson.org (Andreas Gustafsson)
Reply-To: gson@gson.org (Andreas Gustafsson)
To: gnats-bugs@NetBSD.org
Subject: stress_killer test case now fails on real hardware
X-Send-Pr-Version: 3.95

>Number:         55356
>Category:       kern
>Synopsis:       stress_killer test case now fails on real hardware
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    ad
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Jun 07 12:30:00 +0000 2020
>Last-Modified:  Fri Aug 28 19:15:52 +0000 2020
>Originator:     Andreas Gustafsson
>Release:        NetBSD-current, source date >= 2020.05.30.19.23.25
>Organization:
>Environment:
System: NetBSD
Architecture: x86_64
Machine: amd64
>Description:

The stress_killer test case of the rump/rumpkern/t_sp test first
started taking about 60x longer to run, and then started to fail in
most runs, on my bare metal amd64 testbed:

  guido /bracket/amd64-baremetal/results/2020 $ zgrep -a 'stress_killer: ' */test.log.gz
  [...]
  2020.05.21.05.58.00/test.log.gz:    stress_killer: [5.552385s] Passed.
  2020.05.21.15.28.35/test.log.gz:    stress_killer: [5.544183s] Passed.
  2020.05.21.20.43.23/test.log.gz:    stress_killer: [5.530431s] Passed.
  2020.05.22.20.27.16/test.log.gz:    stress_killer: [5.537363s] Passed.
  2020.05.23.10.48.43/test.log.gz:    stress_killer: [5.542154s] Passed.
  2020.05.23.14.10.47/test.log.gz:    stress_killer: [5.547290s] Passed.
  2020.05.23.17.28.27/test.log.gz:    stress_killer: [5.540176s] Passed.
  2020.05.23.19.52.12/test.log.gz:    stress_killer: [5.522058s] Passed.
  2020.05.23.20.22.42/test.log.gz:    stress_killer: [5.531893s] Passed.
  2020.05.23.20.45.11/test.log.gz:    stress_killer: [305.500845s] Passed.
  2020.05.24.10.42.28/test.log.gz:    stress_killer: [305.509691s] Passed.
  2020.05.24.18.44.46/test.log.gz:    stress_killer: [305.514671s] Passed.
  2020.05.24.19.28.10/test.log.gz:    stress_killer: [305.506673s] Passed.
  2020.05.24.21.09.49/test.log.gz:    stress_killer: [305.506917s] Passed.
  2020.05.25.21.05.01/test.log.gz:    stress_killer: [305.526726s] Passed.
  2020.05.25.22.04.51/test.log.gz:    stress_killer: [305.517912s] Passed.
  2020.05.25.23.48.26/test.log.gz:    stress_killer: [305.515757s] Passed.
  2020.05.26.00.50.54/test.log.gz:    stress_killer: [305.519245s] Passed.
  2020.05.26.05.25.21/test.log.gz:    stress_killer: [305.512454s] Passed.
  2020.05.26.19.55.43/test.log.gz:    stress_killer: [306.517213s] Passed.
  2020.05.27.20.49.14/test.log.gz:    stress_killer: [305.516698s] Passed.
  2020.05.28.20.29.21/test.log.gz:    stress_killer: [305.526518s] Passed.
  2020.05.29.10.01.23/test.log.gz:    stress_killer: [305.510955s] Passed.
  2020.05.29.12.30.41/test.log.gz:    stress_killer: [305.517822s] Passed.
  2020.05.29.20.56.27/test.log.gz:    stress_killer: [305.515205s] Passed.
  2020.05.29.22.40.15/test.log.gz:    stress_killer: [305.518754s] Passed.
  2020.05.30.08.50.31/test.log.gz:    stress_killer: [305.516808s] Passed.
  2020.05.30.14.16.57/test.log.gz:    stress_killer: [305.514601s] Passed.
  2020.05.30.19.23.25/test.log.gz:    stress_killer: [1.503034s] Failed: atf-check failed; see the output of the test for details
  2020.05.30.19.51.32/test.log.gz:    stress_killer: [0.821367s] Failed: atf-check failed; see the output of the test for details
  2020.05.30.20.23.25/test.log.gz:    stress_killer: [0.516637s] Failed: atf-check failed; see the output of the test for details
  2020.05.30.20.47.59/test.log.gz:    stress_killer: [1.594727s] Failed: atf-check failed; see the output of the test for details
  2020.05.30.21.00.31/test.log.gz:    stress_killer: [0.398490s] Failed: atf-check failed; see the output of the test for details
  2020.05.30.23.52.09/test.log.gz:    stress_killer: [305.487468s] Passed.
  2020.05.31.01.39.33/test.log.gz:    stress_killer: [0.289542s] Failed: atf-check failed; see the output of the test for details
  2020.05.31.06.27.06/test.log.gz:    stress_killer: [0.615813s] Failed: atf-check failed; see the output of the test for details
  2020.05.31.08.05.30/test.log.gz:    stress_killer: [0.812253s] Failed: atf-check failed; see the output of the test for details
  2020.05.31.09.59.37/test.log.gz:    stress_killer: [305.476629s] Passed.
  2020.05.31.16.32.07/test.log.gz:    stress_killer: [0.386591s] Failed: atf-check failed; see the output of the test for details
  2020.05.31.18.33.08/test.log.gz:    stress_killer: [2.135669s] Failed: atf-check failed; see the output of the test for details
  2020.06.01.00.00.37/test.log.gz:    stress_killer: [3.784872s] Failed: atf-check failed; see the output of the test for details
  2020.06.01.18.55.37/test.log.gz:    stress_killer: [0.397609s] Failed: atf-check failed; see the output of the test for details
  [...]

The increase in run time coincided with this commit:

  2020.05.23.20.45.10 ad src/sys/kern/kern_lwp.c 1.238
  2020.05.23.20.45.10 ad src/sys/kern/kern_proc.c 1.252
  2020.05.23.20.45.10 ad src/sys/kern/kern_turnstile.c 1.40
  2020.05.23.20.45.10 ad src/sys/kern/sys_lwp.c 1.81
  2020.05.23.20.45.11 ad src/sys/rump/librump/rumpkern/lwproc.c 1.49
  2020.05.23.20.45.11 ad src/sys/sys/lwp.h 1.210
  2020.05.23.20.45.11 ad src/sys/sys/proc.h 1.366
  2020.05.23.20.45.11 ad src/sys/sys/sleepq.h 1.31

and the first failure happened after the period of build breakage
during which the following commits were made:

  2020.05.30.15.55.47 jdolecek src/sys/dev/ic/bwfm.c 1.25
  2020.05.30.15.55.47 jdolecek src/sys/dev/ic/bwfmvar.h 1.10
  2020.05.30.15.55.47 jdolecek src/sys/dev/pci/if_bwfm_pci.c 1.9
  2020.05.30.15.55.47 jdolecek src/sys/dev/sdmmc/if_bwfm_sdio.c 1.18
  2020.05.30.16.03.58 roy src/lib/libterminfo/curterm.c 1.14
  2020.05.30.16.12.56 thorpej src/sys/kern/core_elf32.c 1.64
  2020.05.30.16.35.02 thorpej src/sys/dev/pci/if_gem_pci.c 1.51
  2020.05.30.17.12.01 jdolecek src/share/man/man9/malloc.9 1.56
  2020.05.30.17.18.22 jdolecek src/share/man/man9/malloc.9 1.57
  2020.05.30.17.19.45 jakllsch src/sys/dev/usb/aubtfwl.c 1.10
  2020.05.30.17.50.39 jmcneill src/sys/arch/aarch64/aarch64/cpu.c 1.46
  2020.05.30.18.06.17 ad src/sys/kern/vfs_cache.c 1.145
  2020.05.30.18.40.28 riastradh src/common/lib/libc/hash/sha3/sha3.c 1.2
  2020.05.30.19.16.53 ad src/sys/rump/librump/rumpkern/lwproc.c 1.51
  2020.05.30.19.23.25 jdolecek src/sys/dev/usb/if_bwfm_usb.c 1.12

Logs:

  http://www.gson.org/netbsd/bugs/build/amd64-baremetal/commits-2020.05.html#2020.05.23.20.45.11
  http://www.gson.org/netbsd/bugs/build/amd64-baremetal/commits-2020.05.html#2020.05.30.19.23.25

>How-To-Repeat:

>Fix:

>Release-Note:

>Audit-Trail:

Responsible-Changed-From-To: kern-bug-people->ad
Responsible-Changed-By: ad@NetBSD.org
Responsible-Changed-When: Sun, 07 Jun 2020 22:24:30 +0000
Responsible-Changed-Why:
At a guess it's another timing bug in rump.


>Unformatted:

NetBSD Home
NetBSD PR Database Search

(Contact us) $NetBSD: query-full-pr,v 1.46 2020/01/03 16:35:01 leot Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2020 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.