NetBSD Problem Report #55272
From martin@duskware.de Mon May 18 15:53:49 2020
Return-Path: <martin@duskware.de>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id 374D81A9217
for <gnats-bugs@gnats.NetBSD.org>; Mon, 18 May 2020 15:53:49 +0000 (UTC)
From: martin@NetBSD.org
Reply-To: martin@NetBSD.org
To: gnats-bugs@NetBSD.org
Subject: userland watchdog process may be outstalled
X-Send-Pr-Version: 3.95
>Number: 55272
>Category: kern
>Synopsis: userland watchdog process may be outstalled
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Mon May 18 15:55:00 +0000 2020
>Last-Modified: Fri Feb 19 18:40:01 +0000 2021
>Originator: Martin Husemann
>Release: NetBSD 9.99.63
>Organization:
The NetBSD Foundation, Inc.
>Environment:
System: NetBSD unpluged.duskware.de 9.99.63 NetBSD 9.99.63 (UNPLUGED) #322: Mon May 18 14:20:23 CEST 2020 martin@seven-days-to-the-wolves.aprisoft.de:/work/src/sys/arch/evbarm/compile/UNPLUGED evbarm
Architecture: earm
Machine: evbarm
>Description:
For some weeks now, a userland watchdog configured in /etc/rc.conf with something like:
wdogctl=YES wdogctl_flags="-u mvsoctmr0"
can be starved ("outstalled") by busy tests such as the libarchive
test program. The timer has a 21-second period.
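The failure mode can be pictured with a small model. In the userland-tickle mode selected by `-u`, the timer is only reset when the watchdog process actually gets to run, so any scheduling stall longer than the timer period ends in a panic. The sketch below is a hypothetical Python model, not NetBSD code; `expiry_time` and its parameters are illustrative names, and all times are in seconds.

```python
def expiry_time(period, tickles, end):
    """Return the time at which a watchdog with the given period would fire,
    or None if every gap between resets stays within the period.

    Hypothetical model: the timer is armed at t = 0, each entry in
    `tickles` is a time at which the userland process managed to reset
    it, and observation stops at `end`.
    """
    last = 0.0  # arming the timer counts as the first reset
    for t in sorted(tickles):
        if t - last > period:
            return last + period  # the timer ran out before this tickle
        last = t
    if end - last > period:
        return last + period  # starved after the final tickle
    return None


# A 21-second timer tickled every 10 seconds never fires...
print(expiry_time(21, [10, 20, 30, 40], 45))  # None
# ...but a 25-second stall between t=20 and t=45 trips it at t=41.
print(expiry_time(21, [10, 20, 45], 50))  # 41
```

The model makes the key point explicit: how busy the machine is does not matter by itself; only the longest gap between two successful tickles has to stay below the period.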
>How-To-Repeat:
See above; simply run the ATF tests (cd /usr/tests && atf-run).
>Fix:
n/a
>Audit-Trail:
From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/55272 CVS commit: src/tests/lib/libarchive
Date: Tue, 16 Jun 2020 07:59:07 +0000
Module Name: src
Committed By: martin
Date: Tue Jun 16 07:59:07 UTC 2020
Modified Files:
src/tests/lib/libarchive: t_libarchive.sh
Log Message:
PR kern/55272: skip this test on uniprocessor machines, it is too dangerous
and can kill the host kernel if a userland watchdog is running
To generate a diff of this commit:
cvs rdiff -u -r1.4 -r1.5 src/tests/lib/libarchive/t_libarchive.sh
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/55272: userland watchdog process may be outstalled
Date: Sat, 20 Jun 2020 13:39:09 +0200
I have now seen this on other machines as well, even SMP ones.
I have a macppc machine (dual 800 MHz G4, 1.5 GB RAM) that also uses a
software watchdog.
With -current I can reproducibly kill it by doing a full ATF test run. It
dies here:
fs/msdosfs/t_snapshot (687/846): 2 test cases
snapshot: [1.077320s] Passed.
snapshotstress: [4.312312s] Passed.
[5.391946s]
fs/nfs/t_mountd (688/846): 1 test cases
mountdhup: [1.079431s] Expected failure: PR kern/5844: op failed with EACCES
[1.085206s]
fs/nfs/t_rquotad (689/846): 6 test cases
get_nfs_be_1_both:
[ 9641.0182210] swwdog: 60 second timer expired
[ 9641.0182210] panic: watchdog timer expired
[ 9641.0182210] cpu0: Begin traceback...
[ 9641.0182210] 0x1000fdf0: at vpanic+0x12c
[ 9641.0482453] 0x1000fe20: at panic+0x50
[ 9641.0582418] 0x1000fe60: at swwdog_panic+0x90
[ 9641.0682466] 0x1000fe70: at callout_softclock+0x418
[ 9641.0782523] 0x1000feb0: at softint_dispatch+0x140
[ 9641.0882560] 0x1000ff20: at softint_fast_dispatch+0xdc
[ 9641.0882560] saved LR(0xfb3ffb79) is invalid.cpu0: End traceback...
[ 9641.0882560] halting CPU 1
[ 9641.2083150] dumpsys: TBD
[ 9641.2083150] rebooting
However, the test itself succeeds when run in isolation:
# cd /usr/tests/fs/nfs && atf-run t_rquotad|atf-report
Tests root: /usr/tests/fs/nfs
t_rquotad (1/1): 6 test cases
get_nfs_be_1_both: [3.183507s] Passed.
get_nfs_be_1_group: [2.943784s] Passed.
get_nfs_be_1_user: [2.711919s] Passed.
get_nfs_le_1_both: [3.143645s] Passed.
get_nfs_le_1_group: [2.876600s] Passed.
get_nfs_le_1_user: [2.667976s] Passed.
[17.542653s]
Summary for 1 test programs:
6 passed test cases.
0 failed test cases.
0 expected failed test cases.
0 skipped test cases.
The lockup (and the resulting starvation of the userland watchdog tickle)
only happens when leftover or locked-up rump_server processes from previous
tests are still present.
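Because the hang only appears when rump_server processes from earlier tests are still around, one way to check for that condition before or during a test run is to scan a process listing for them. The helper below is a hypothetical sketch: it parses text in the two-column `PID COMMAND` layout that a command such as `ps -ax -o pid,comm` prints, and the name `leftover_rump_servers` is illustrative only.

```python
def leftover_rump_servers(ps_output):
    """Return the PIDs of rump_server processes found in `ps_output`.

    Assumes a header line followed by one process per line in the form
    "PID COMMAND" (hypothetical input format; adjust to your ps invocation).
    """
    pids = []
    for line in ps_output.splitlines()[1:]:  # skip the header line
        fields = line.split(None, 1)  # PID, then the command name
        if len(fields) == 2 and fields[1].strip() == "rump_server":
            pids.append(int(fields[0]))
    return pids


sample = """  PID COMMAND
    1 init
 4711 rump_server
 4712 atf-run
 4800 rump_server
"""
print(leftover_rump_servers(sample))  # [4711, 4800]
```

A non-empty result means later tests are running alongside stale servers, which is the situation in which the watchdog starvation was observed.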
This is a showstopper for the netbsd-10 branch.
Martin
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/55272: userland watchdog process may be outstalled
Date: Fri, 19 Feb 2021 19:33:15 +0100
This still happens in regular test runs for me, and it seems to be a regression
from netbsd-9. I consider it a netbsd-10 showstopper.
Martin
From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/55272 CVS commit: src/tests/lib/libarchive
Date: Fri, 19 Feb 2021 18:36:50 +0000
Module Name: src
Committed By: martin
Date: Fri Feb 19 18:36:50 UTC 2021
Modified Files:
src/tests/lib/libarchive: t_libarchive.sh
Log Message:
PR kern/55272: do not skip this test on single cpu machines - it is not
the only test causing the watchdog starvation and we better find and
fix the real issue.
To generate a diff of this commit:
cvs rdiff -u -r1.7 -r1.8 src/tests/lib/libarchive/t_libarchive.sh
Copyright © 1994-2020
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.