NetBSD Problem Report #56442
From gson@gson.org Wed Oct 6 06:44:38 2021
Return-Path: <gson@gson.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id 548F11A921F
for <gnats-bugs@gnats.NetBSD.org>; Wed, 6 Oct 2021 06:44:38 +0000 (UTC)
Message-Id: <20211006064431.11E4B25417E@guava.gson.org>
Date: Wed, 6 Oct 2021 09:44:31 +0300 (EEST)
From: gson@gson.org (Andreas Gustafsson)
Reply-To: gson@gson.org (Andreas Gustafsson)
To: gnats-bugs@NetBSD.org
Subject: Tests hang under NVMM with options DEBUG
X-Send-Pr-Version: 3.95
>Number: 56442
>Category: kern
>Synopsis: Tests hang under NVMM with options DEBUG
>Confidential: no
>Severity: serious
>Priority: high
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Wed Oct 06 06:45:00 +0000 2021
>Originator: Andreas Gustafsson
>Release: NetBSD-current, source date >= 2019.12.01.13.20.42
>Organization:
>Environment:
System: NetBSD
Architecture: x86_64
Machine: amd64
>Description:
If I enable "options DEBUG" in the NetBSD-current/amd64 GENERIC kernel
configuration, build a release, boot it in "qemu -accel nvmm", and run
the kernel/t_trapsignal test, the test hangs during the fpe_ignore
test case and is not killed by the ATF timeout mechanism, nor can it
be killed manually by hitting control-c or control-\. Console
keystrokes are still echoed and I am able to enter ddb by sending a
serial break:
# cd /usr/tests/kernel
# atf-run t_trapsignal | atf-report
Tests root: /usr/tests/kernel
t_trapsignal (1/1): 20 test cases
bus_handle: [0.213148s] Passed.
bus_handle_recurse: [0.209931s] Passed.
bus_ignore: [0.199009s] Passed.
bus_mask: [0.200927s] Passed.
bus_simple: [0.200246s] Passed.
fpe_handle: [0.219451s] Passed.
fpe_handle_recurse: [598.294510s] Failed: Test case timed out after 300 seconds
fpe_ignore:^Z^C^C
^C^C^C^C
^\^\
^C^C[ 115781.8123080] fatal breakpoint trap in supervisor mode
[ 115781.8123080] trap type 1 code 0 rip 0xffffffff8021dd8d cs 0x8 rflags 0x202 cr2 0x72bf48be8fe0 ilevel 0x8 rsp 0xffffcb002a623e88
[ 115781.8123080] curlwp 0xffffeb5ed06676c0 pid 881.1 lowest kstack 0xffffcb002a6202c0
Stopped in pid 881.1 (h_segv) at netbsd:breakpoint+0x5: leave
db{0}> bt
breakpoint() at netbsd:breakpoint+0x5
comintr() at netbsd:comintr+0x8e5
db{0}>
A bisection identified the following commit as the point where the
problem started:
2019.12.01.13.20.42 ad src/sys/kern/kern_runq.c 1.51
2019.12.01.13.20.42 ad src/sys/kern/sched_4bsd.c 1.39
2019.12.01.13.20.42 ad src/sys/kern/sched_m2.c 1.35
The problem does not occur:
- without "options DEBUG"
- in qemu without "-accel nvmm"
- in qemu with "-accel nvmm -smp 2"
- on real amd64 multiprocessor hardware
I also tried to test on real amd64 uniprocessor hardware, but was
thwarted by the unrelated problem reported in PR 51531.
>How-To-Repeat:
See above.
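For convenience, the steps from the description can be sketched as the
shell script below. This is only a sketch under assumptions, not the exact
commands used for this report: the build.sh flags, the memory size, the
image name (netbsd-debug.img) and the way "options DEBUG" is appended to
GENERIC are all illustrative, and installing the built release into the
disk image is not shown. By default the script only prints the commands;
set DRY_RUN=0 to actually run them from the top of a NetBSD source tree.

```shell
#!/bin/sh
# Reproduction sketch for the hang described above.
# Assumptions (not from the report): build.sh flags, memory size,
# and the disk image name are illustrative only.

IMAGE=${IMAGE:-netbsd-debug.img}   # hypothetical image with the release installed

# Print the command when DRY_RUN=1 (the default), otherwise run it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "+ $*"
    else
        "$@"
    fi
}

repro() {
    # 1. Enable "options DEBUG" in the amd64 GENERIC kernel configuration.
    run sh -c 'echo "options DEBUG" >> sys/arch/amd64/conf/GENERIC'

    # 2. Build a release (unprivileged build for amd64).
    run ./build.sh -U -m amd64 release

    # 3. Boot the installed system under NVMM with a single virtual CPU.
    #    Do not add "-smp 2": the hang was not seen with two CPUs.
    run qemu-system-x86_64 -accel nvmm -m 1024 -nographic \
        -drive "file=$IMAGE,format=raw"

    # 4. Inside the guest, run the test as shown in the description:
    #      cd /usr/tests/kernel
    #      atf-run t_trapsignal | atf-report
}

repro
```

The dry-run default is deliberate, since step 1 modifies the source tree
and step 3 needs a host with NVMM available.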
>Fix: