NetBSD Problem Report #58660
From www@netbsd.org Mon Sep 2 16:52:16 2024
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
key-exchange X25519 server-signature RSA-PSS (2048 bits)
client-signature RSA-PSS (2048 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id 4B0AE1A9241
for <gnats-bugs@gnats.NetBSD.org>; Mon, 2 Sep 2024 16:52:16 +0000 (UTC)
Message-Id: <20240902165215.04CE91A9242@mollari.NetBSD.org>
Date: Mon, 2 Sep 2024 16:52:14 +0000 (UTC)
From: campbell+netbsd@mumble.net
Reply-To: campbell+netbsd@mumble.net
To: gnats-bugs@NetBSD.org
Subject: ddb(4) uses IPI instead of NMI to save other CPU registers
X-Send-Pr-Version: www-1.0
>Number: 58660
>Category: port-amd64
>Synopsis: ddb(4) uses IPI instead of NMI to save other CPU registers
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: port-amd64-maintainer
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Mon Sep 02 16:55:00 +0000 2024
>Originator: Taylor R Campbell
>Release: current, 10, 9, ...
>Organization:
The NetBSD Fo
>Environment:
>Description:
When the kernel enters ddb on one CPU, it asks all the other CPUs to save a trap frame for diagnostics and suspend themselves.
231 #ifdef MULTIPROCESSOR
232 if (!db_suspend_others()) {
233 ddb_suspend(regs);
234 } else {
235 curcpu()->ci_ddb_regs = &dbreg;
236 ddb_regp = &dbreg;
237 #endif
https://nxr.netbsd.org/xref/src/sys/arch/amd64/amd64/db_interface.c?r=1.42#231
(Side note: That indentation is bad and should be fixed.)
286 void
287 ddb_ipi(struct trapframe frame)
288 {
289
290 ddb_suspend(&frame);
291 }
292
293 static void
294 ddb_suspend(struct trapframe *frame)
295 {
296 volatile struct cpu_info *ci = curcpu();
297 db_regs_t regs;
298
299 regs = *frame;
300
301 ci->ci_ddb_regs = ®s;
302
303 atomic_or_32(&ci->ci_flags, CPUF_PAUSE);
304
305 while (ci->ci_flags & CPUF_PAUSE)
306 x86_pause();
307 ci->ci_ddb_regs = 0;
308 tlbflushg();
309 }
https://nxr.netbsd.org/xref/src/sys/arch/amd64/amd64/db_interface.c?r=1.42#286
Currently, it does this by sending an IPI (interprocessor interrupt), either via i82489 or via x2apic, to the other CPUs:
136 static int
137 db_suspend_others(void)
138 {
139 int cpu_me = cpu_number();
140 int win;
141
142 #ifndef XENPV
143 if (ddb_vec == 0)
144 return 1;
145 #endif /* XENPV */
146
147 __cpu_simple_lock(&db_lock);
148 if (ddb_cpu == NOCPU)
149 ddb_cpu = cpu_me;
150 win = (ddb_cpu == cpu_me);
151 __cpu_simple_unlock(&db_lock);
152 if (win) {
153 #ifdef XENPV
154 xen_broadcast_ipi(XEN_IPI_DDB);
155 #else
156 #if NLAPIC > 0
157 if (ddb_vec != 0)
158 x86_ipi(ddb_vec, LAPIC_DEST_ALLEXCL,
159 LAPIC_DLMODE_FIXED);
https://nxr.netbsd.org/xref/src/sys/arch/amd64/amd64/db_interface.c?r=1.42#136
124 static int i82489_ipi(int vec, int target, int dl);
125 static int x2apic_ipi(int vec, int target, int dl);
126 int (*x86_ipi)(int, int, int) = i82489_ipi;
...
331 if (x2apic_mode) {
332 x86_ipi = x2apic_ipi;
https://nxr.netbsd.org/xref/src/sys/arch/x86/x86/lapic.c?r=1.90#124
However, ordinary IPIs can be blocked at any time, whereas NMIs are only blocked when one has been delivered before the next IRET instruction. We should change, or allow, ddb to use NMIs to trigger the other-CPU save/suspend logic.
>How-To-Repeat:
Get a CPU stuck in a loop with hardware interrupts masked, wait for a heartbeat panic, and try to examine another CPU's trap frame, only to find there's nothing there.
(See also PR bin/58010: crash(8) doesn't support `mach cpu N' to examine registers/stack of other CPUs, https://gnats.NetBSD.org/58010)
>Fix:
Yes, please!
(Contact us)
$NetBSD: query-full-pr,v 1.47 2022/09/11 19:34:41 kim Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2024
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.