NetBSD Problem Report #59546
From www@netbsd.org Wed Jul 23 12:46:29 2025
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
key-exchange X25519 server-signature RSA-PSS (2048 bits)
client-signature RSA-PSS (2048 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id 50A171A923C
for <gnats-bugs@gnats.NetBSD.org>; Wed, 23 Jul 2025 12:46:29 +0000 (UTC)
Message-Id: <20250723124628.32D841A923E@mollari.NetBSD.org>
Date: Wed, 23 Jul 2025 12:46:28 +0000 (UTC)
From: campbell+netbsd@mumble.net
Reply-To: campbell+netbsd@mumble.net
To: gnats-bugs@NetBSD.org
Subject: kernel wedged until NMI and then happily proceeds
X-Send-Pr-Version: www-1.0
>Number: 59546
>Category: port-xen
>Synopsis: kernel wedged until NMI and then happily proceeds
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: port-xen-maintainer
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Wed Jul 23 12:50:00 +0000 2025
>Originator: Taylor R Campbell
>Release: netbsd-10
>Organization:
The NapBSD Foundation
>Environment:
NetBSD mollari.NetBSD.org 10.1_STABLE NetBSD 10.1_STABLE (amd64-DOMU_SERVER) #3: Sat Jun 21 13:32:10 UTC 2025 spz@franklin.NetBSD.org:/home/netbsd/10/amd64/obj/sys/arch/amd64/compile/amd64-DOMU_SERVER amd64
Xen kernel: 4.18 (20231116)
>Description:
Twice in the past 24h, a Xen domU has become unresponsive over both the network and the console. It does not even respond to the hw.cnmagic key sequence on the console, which should drop into ddb.
The system does respond to `xl trigger <domid> nmi' and drops into ddb, where stack traces on all CPUs look reasonable (one in a userland process, three in the idle loop). After continuing from ddb, it's fine -- except that it thinks no time has passed after hours of being wedged, which tells me that the hardclock timer interrupt has not been firing.
Similarly, because the console was unresponsive, I infer that the xencons interrupt was not firing either: the xencons interrupt handler evaluates hw.cnmagic synchronously.
So I suspect something is awry with splfoo/splx, or x86_read_psl/x86_disable_intr/x86_write_psl, or something about interrupt delivery in the Xen kernel.
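
The reasoning above can be illustrated with a small toy model. This is only a sketch under loud assumptions: ToyVcpu, critical_section, wedged_scenario and MAGIC are hypothetical names invented for illustration, the clock is assumed to advance only inside the timer handler, and none of this is NetBSD or Xen code. It shows how a save/disable/restore sequence that never restores would leave the timer and console handlers starved while an NMI still gets through.

```python
# Toy model only. Class, method, and variable names are illustrative
# assumptions; this is not NetBSD or Xen code.

MAGIC = "<cnmagic>"  # placeholder for the hw.cnmagic key sequence


class ToyVcpu:
    def __init__(self):
        self.masked = False         # stands in for "interrupts disabled"
        self.timer_pending = False  # timer events coalesce into one flag
        self.console_queue = []     # console input waiting for its handler
        self.ticks = 0              # clock advances only in the timer handler
        self.magic_seen = False     # set when the console handler sees MAGIC
        self.nmi_seen = False

    # Save / disable / restore, loosely in the spirit of
    # x86_read_psl / x86_disable_intr / x86_write_psl.
    def read_mask(self):
        return self.masked

    def disable(self):
        self.masked = True

    def write_mask(self, saved):
        self.masked = saved
        self._deliver()

    def timer_fires(self):
        self.timer_pending = True
        self._deliver()

    def console_input(self, text):
        self.console_queue.append(text)
        self._deliver()

    def nmi(self):
        # Non-maskable: handled even while self.masked is True.
        self.nmi_seen = True

    def _deliver(self):
        if self.masked:
            return  # everything maskable stays pending
        if self.timer_pending:
            self.timer_pending = False
            self.ticks += 1
        while self.console_queue:
            if self.console_queue.pop(0) == MAGIC:
                self.magic_seen = True


def critical_section(cpu, restore=True):
    saved = cpu.read_mask()
    cpu.disable()
    # ... work that must not be interrupted would go here ...
    if restore:  # restore=False simulates a lost restore
        cpu.write_mask(saved)


def wedged_scenario():
    cpu = ToyVcpu()
    cpu.timer_fires()                     # healthy: one tick delivered
    critical_section(cpu, restore=False)  # hypothetical bug: mask stays set
    for _ in range(1000):                 # "hours" of timer events
        cpu.timer_fires()
    cpu.console_input(MAGIC)              # console magic is not handled
    cpu.nmi()                             # but the NMI still gets through
    return cpu


if __name__ == "__main__":
    cpu = wedged_scenario()
    print(cpu.ticks, cpu.magic_seen, cpu.nmi_seen)
```

In this model the tick count stays at 1 and the magic sequence is never seen while the mask is stuck, yet the NMI is handled. If the mask is later cleared with cpu.write_mask(False), only one coalesced tick is delivered, so the toy clock still looks as if almost no time had passed. The model says nothing about what would set or clear the mask on the real system.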
>How-To-Repeat:
no idea
>Fix:
Yes, please!