NetBSD Problem Report #59546

From www@netbsd.org  Wed Jul 23 12:46:29 2025
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	 key-exchange X25519 server-signature RSA-PSS (2048 bits)
	 client-signature RSA-PSS (2048 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 50A171A923C
	for <gnats-bugs@gnats.NetBSD.org>; Wed, 23 Jul 2025 12:46:29 +0000 (UTC)
Message-Id: <20250723124628.32D841A923E@mollari.NetBSD.org>
Date: Wed, 23 Jul 2025 12:46:28 +0000 (UTC)
From: campbell+netbsd@mumble.net
Reply-To: campbell+netbsd@mumble.net
To: gnats-bugs@NetBSD.org
Subject: kernel wedged until NMI and then happily proceeds
X-Send-Pr-Version: www-1.0

>Number:         59546
>Category:       port-xen
>Synopsis:       kernel wedged until NMI and then happily proceeds
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    port-xen-maintainer
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Jul 23 12:50:00 +0000 2025
>Originator:     Taylor R Campbell
>Release:        netbsd-10
>Organization:
The NapBSD Foundation
>Environment:
NetBSD mollari.NetBSD.org 10.1_STABLE NetBSD 10.1_STABLE (amd64-DOMU_SERVER) #3: Sat Jun 21 13:32:10 UTC 2025  spz@franklin.NetBSD.org:/home/netbsd/10/amd64/obj/sys/arch/amd64/compile/amd64-DOMU_SERVER amd64

Xen kernel: 4.18 (20231116)
>Description:
Twice in the past 24h, a Xen domU has become unresponsive over both the network and the console.  It does not even respond to the hw.cnmagic key sequence on the console, which should drop into ddb.

The system does respond to `xl trigger <domid> nmi' by dropping into ddb, where the stack traces on all CPUs look reasonable (one in a userland process, three in the idle loop).  After continuing from ddb, it's fine -- except that it thinks no time has passed after hours of being wedged, which tells me that the hardclock timer interrupt has not been firing.

Similarly, since the console was unresponsive, I infer that the xencons interrupt was not firing either, because the xencons interrupt handler evaluates hw.cnmagic synchronously.

So I suspect something is awry with splfoo/splx, or x86_read_psl/x86_disable_intr/x86_write_psl, or something about interrupt delivery in the Xen kernel.
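
To illustrate the kind of failure the first two hypotheses describe, here is a minimal userland sketch of the save/disable/restore idiom.  It is not kernel code and makes no claim about where the actual bug is: every model_* name, MODEL_PSL_I, and the critical_section_* functions are hypothetical stand-ins, and a single flag stands in for the (v)CPU's interrupt-enable state.  It only shows that a code path which skips the restore leaves maskable interrupts pending indefinitely, while a non-maskable event still gets through.

```c
#include <stdbool.h>

/* Userland model only; all names here are hypothetical stand-ins. */
typedef unsigned long psl_t;
#define MODEL_PSL_I 0x200UL            /* stand-in interrupt-enable bit */

static psl_t model_psl = MODEL_PSL_I;  /* interrupts start out enabled */
static unsigned pending_ticks;         /* timer events raised while masked */
static unsigned delivered_ticks;       /* timer events actually handled */
static unsigned nmi_count;             /* NMIs handled */

/* Hand every pending timer event to the (modelled) handler. */
void deliver_pending(void)
{
	delivered_ticks += pending_ticks;
	pending_ticks = 0;
}

psl_t model_read_psl(void) { return model_psl; }

void model_disable_intr(void) { model_psl &= ~MODEL_PSL_I; }

/* Restore a saved state; re-enabling flushes whatever piled up. */
void model_write_psl(psl_t psl)
{
	model_psl = psl;
	if (psl & MODEL_PSL_I)
		deliver_pending();
}

/* A maskable timer event: handled only if interrupts are enabled. */
void model_timer_fires(void)
{
	pending_ticks++;
	if (model_psl & MODEL_PSL_I)
		deliver_pending();
}

/* A non-maskable event: handled regardless of the enable bit. */
void model_nmi(void) { nmi_count++; }

/* Correct idiom: save, disable, work, restore. */
void critical_section_ok(void)
{
	psl_t psl = model_read_psl();

	model_disable_intr();
	/* ... short critical section ... */
	model_write_psl(psl);
}

/* Buggy variant: an early return skips the restore. */
void critical_section_leaky(bool bail_out)
{
	psl_t psl = model_read_psl();

	model_disable_intr();
	if (bail_out)
		return;		/* BUG: interrupts stay masked */
	model_write_psl(psl);
}
```

In this model, after critical_section_leaky(true) every call to model_timer_fires() only increments pending_ticks, model_nmi() is still handled, and nothing is delivered until some later model_write_psl(MODEL_PSL_I).  Whether anything like this is what happens on the real system is exactly what is unknown.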
>How-To-Repeat:
no idea
>Fix:
Yes, please!
