NetBSD Problem Report #55356
From gson@gson.org Sun Jun 7 12:26:06 2020
Return-Path: <gson@gson.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id 7611B1A9219
for <gnats-bugs@gnats.NetBSD.org>; Sun, 7 Jun 2020 12:26:06 +0000 (UTC)
Message-Id: <20200607122559.03F01253EE0@guava.gson.org>
Date: Sun, 7 Jun 2020 15:25:58 +0300 (EEST)
From: gson@gson.org (Andreas Gustafsson)
Reply-To: gson@gson.org (Andreas Gustafsson)
To: gnats-bugs@NetBSD.org
Subject: stress_killer test case now fails on real hardware
X-Send-Pr-Version: 3.95
>Number: 55356
>Category: kern
>Synopsis: stress_killer test case now fails on real hardware
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: ad
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sun Jun 07 12:30:00 +0000 2020
>Last-Modified: Fri Aug 28 19:15:52 +0000 2020
>Originator: Andreas Gustafsson
>Release: NetBSD-current, source date >= 2020.05.30.19.23.25
>Organization:
>Environment:
System: NetBSD
Architecture: x86_64
Machine: amd64
>Description:
The stress_killer test case of the rump/rumpkern/t_sp test first
started taking about 60x longer to run, and then started to fail in
most runs, on my bare metal amd64 testbed:
guido /bracket/amd64-baremetal/results/2020 $ zgrep -a 'stress_killer: ' */test.log.gz
[...]
2020.05.21.05.58.00/test.log.gz: stress_killer: [5.552385s] Passed.
2020.05.21.15.28.35/test.log.gz: stress_killer: [5.544183s] Passed.
2020.05.21.20.43.23/test.log.gz: stress_killer: [5.530431s] Passed.
2020.05.22.20.27.16/test.log.gz: stress_killer: [5.537363s] Passed.
2020.05.23.10.48.43/test.log.gz: stress_killer: [5.542154s] Passed.
2020.05.23.14.10.47/test.log.gz: stress_killer: [5.547290s] Passed.
2020.05.23.17.28.27/test.log.gz: stress_killer: [5.540176s] Passed.
2020.05.23.19.52.12/test.log.gz: stress_killer: [5.522058s] Passed.
2020.05.23.20.22.42/test.log.gz: stress_killer: [5.531893s] Passed.
2020.05.23.20.45.11/test.log.gz: stress_killer: [305.500845s] Passed.
2020.05.24.10.42.28/test.log.gz: stress_killer: [305.509691s] Passed.
2020.05.24.18.44.46/test.log.gz: stress_killer: [305.514671s] Passed.
2020.05.24.19.28.10/test.log.gz: stress_killer: [305.506673s] Passed.
2020.05.24.21.09.49/test.log.gz: stress_killer: [305.506917s] Passed.
2020.05.25.21.05.01/test.log.gz: stress_killer: [305.526726s] Passed.
2020.05.25.22.04.51/test.log.gz: stress_killer: [305.517912s] Passed.
2020.05.25.23.48.26/test.log.gz: stress_killer: [305.515757s] Passed.
2020.05.26.00.50.54/test.log.gz: stress_killer: [305.519245s] Passed.
2020.05.26.05.25.21/test.log.gz: stress_killer: [305.512454s] Passed.
2020.05.26.19.55.43/test.log.gz: stress_killer: [306.517213s] Passed.
2020.05.27.20.49.14/test.log.gz: stress_killer: [305.516698s] Passed.
2020.05.28.20.29.21/test.log.gz: stress_killer: [305.526518s] Passed.
2020.05.29.10.01.23/test.log.gz: stress_killer: [305.510955s] Passed.
2020.05.29.12.30.41/test.log.gz: stress_killer: [305.517822s] Passed.
2020.05.29.20.56.27/test.log.gz: stress_killer: [305.515205s] Passed.
2020.05.29.22.40.15/test.log.gz: stress_killer: [305.518754s] Passed.
2020.05.30.08.50.31/test.log.gz: stress_killer: [305.516808s] Passed.
2020.05.30.14.16.57/test.log.gz: stress_killer: [305.514601s] Passed.
2020.05.30.19.23.25/test.log.gz: stress_killer: [1.503034s] Failed: atf-check failed; see the output of the test for details
2020.05.30.19.51.32/test.log.gz: stress_killer: [0.821367s] Failed: atf-check failed; see the output of the test for details
2020.05.30.20.23.25/test.log.gz: stress_killer: [0.516637s] Failed: atf-check failed; see the output of the test for details
2020.05.30.20.47.59/test.log.gz: stress_killer: [1.594727s] Failed: atf-check failed; see the output of the test for details
2020.05.30.21.00.31/test.log.gz: stress_killer: [0.398490s] Failed: atf-check failed; see the output of the test for details
2020.05.30.23.52.09/test.log.gz: stress_killer: [305.487468s] Passed.
2020.05.31.01.39.33/test.log.gz: stress_killer: [0.289542s] Failed: atf-check failed; see the output of the test for details
2020.05.31.06.27.06/test.log.gz: stress_killer: [0.615813s] Failed: atf-check failed; see the output of the test for details
2020.05.31.08.05.30/test.log.gz: stress_killer: [0.812253s] Failed: atf-check failed; see the output of the test for details
2020.05.31.09.59.37/test.log.gz: stress_killer: [305.476629s] Passed.
2020.05.31.16.32.07/test.log.gz: stress_killer: [0.386591s] Failed: atf-check failed; see the output of the test for details
2020.05.31.18.33.08/test.log.gz: stress_killer: [2.135669s] Failed: atf-check failed; see the output of the test for details
2020.06.01.00.00.37/test.log.gz: stress_killer: [3.784872s] Failed: atf-check failed; see the output of the test for details
2020.06.01.18.55.37/test.log.gz: stress_killer: [0.397609s] Failed: atf-check failed; see the output of the test for details
[...]
The increase in run time coincided with this commit:
2020.05.23.20.45.10 ad src/sys/kern/kern_lwp.c 1.238
2020.05.23.20.45.10 ad src/sys/kern/kern_proc.c 1.252
2020.05.23.20.45.10 ad src/sys/kern/kern_turnstile.c 1.40
2020.05.23.20.45.10 ad src/sys/kern/sys_lwp.c 1.81
2020.05.23.20.45.11 ad src/sys/rump/librump/rumpkern/lwproc.c 1.49
2020.05.23.20.45.11 ad src/sys/sys/lwp.h 1.210
2020.05.23.20.45.11 ad src/sys/sys/proc.h 1.366
2020.05.23.20.45.11 ad src/sys/sys/sleepq.h 1.31
and the first failure happened after the period of build breakage
during which the following commits were made:
2020.05.30.15.55.47 jdolecek src/sys/dev/ic/bwfm.c 1.25
2020.05.30.15.55.47 jdolecek src/sys/dev/ic/bwfmvar.h 1.10
2020.05.30.15.55.47 jdolecek src/sys/dev/pci/if_bwfm_pci.c 1.9
2020.05.30.15.55.47 jdolecek src/sys/dev/sdmmc/if_bwfm_sdio.c 1.18
2020.05.30.16.03.58 roy src/lib/libterminfo/curterm.c 1.14
2020.05.30.16.12.56 thorpej src/sys/kern/core_elf32.c 1.64
2020.05.30.16.35.02 thorpej src/sys/dev/pci/if_gem_pci.c 1.51
2020.05.30.17.12.01 jdolecek src/share/man/man9/malloc.9 1.56
2020.05.30.17.18.22 jdolecek src/share/man/man9/malloc.9 1.57
2020.05.30.17.19.45 jakllsch src/sys/dev/usb/aubtfwl.c 1.10
2020.05.30.17.50.39 jmcneill src/sys/arch/aarch64/aarch64/cpu.c 1.46
2020.05.30.18.06.17 ad src/sys/kern/vfs_cache.c 1.145
2020.05.30.18.40.28 riastradh src/common/lib/libc/hash/sha3/sha3.c 1.2
2020.05.30.19.16.53 ad src/sys/rump/librump/rumpkern/lwproc.c 1.51
2020.05.30.19.23.25 jdolecek src/sys/dev/usb/if_bwfm_usb.c 1.12
Logs:
http://www.gson.org/netbsd/bugs/build/amd64-baremetal/commits-2020.05.html#2020.05.23.20.45.11
http://www.gson.org/netbsd/bugs/build/amd64-baremetal/commits-2020.05.html#2020.05.30.19.23.25
>How-To-Repeat:
>Fix:
>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: kern-bug-people->ad
Responsible-Changed-By: ad@NetBSD.org
Responsible-Changed-When: Sun, 07 Jun 2020 22:24:30 +0000
Responsible-Changed-Why:
At a guess it's another timing bug in rump.
>Unformatted:
(Contact us)
$NetBSD: query-full-pr,v 1.46 2020/01/03 16:35:01 leot Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2020
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.