NetBSD Problem Report #59424
From www@netbsd.org Fri May 16 01:40:07 2025
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256
client-signature RSA-PSS (2048 bits) client-digest SHA256)
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id EADE01A923E
for <gnats-bugs@gnats.NetBSD.org>; Fri, 16 May 2025 01:40:06 +0000 (UTC)
Message-Id: <20250516014005.6A6811A9241@mollari.NetBSD.org>
Date: Fri, 16 May 2025 01:40:05 +0000 (UTC)
From: campbell+netbsd@mumble.net
Reply-To: campbell+netbsd@mumble.net
To: gnats-bugs@NetBSD.org
Subject: hardclock ticks run at breakneck pace under qemu
X-Send-Pr-Version: www-1.0
>Number: 59424
>Category: port-amd64
>Synopsis: hardclock ticks run at breakneck pace under qemu
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: imil
>State: feedback
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Fri May 16 01:45:00 +0000 2025
>Closed-Date:
>Last-Modified: Fri May 16 04:53:58 +0000 2025
>Originator: Taylor R Campbell
>Release: current
>Organization:
The Clock of the Lapic Now Foundation
>Environment:
>Description:
Recent current kernels running under qemu -- with nvmm on a
netbsd-10 host, unchanged in a while -- have been running with
dramatically accelerated hardclock timers, even when they are
built with HZ=100 as confirmed by sysctl kern.clockrate:
# sysctl kern.clockrate
kern.clockrate: tick = 10000, tickadj = 40, hz = 100, profhz = 100, stathz = 100
The timecounter (hpet) seems to run normally, so `date' shows
the right progression of time `sleep 10' sleeps for about 10sec
of real time. But sampling the hardclock_ticks with crash
shows it growing at 500 ticks per second, rather than 100 ticks
per second as expected, and:
# vmstat -i | grep timer
cpu0 timer 542109 507
So the hardclock timer is running at about 500 Hz rather than
100 Hz. And while `date' and `sleep 10' work, callouts are
accelerated so that it's hard to type without tripping wskbd
repeat.
I compared an older kernel that works (from February or so) and
a newer kernel that is broken (from this past week) and this
part stood out -- the `good' kernel had:
[ 1.000004] cpu0: [re]calibrating local timer
[ 1.000004] cpu0: apic clock running at 24 MHz
...
[ 1.028453] cpu0: [re]calibrating local timer
[ 1.028453] cpu0: apic clock running at 1000 MHz
But the `bad' kernel only had the first message, not the
second. And printing lapic_per_second with crash(8) confirmed
that it is 24 * 10^6, not 1000 * 10^6.
I went looking for that message and lapic_per_second
initialization, and found the following recent change:
changeset: 1192122:0ff4d825447c
branch: trunk
user: imil <imil@NetBSD.org>
date: Fri May 02 07:08:10 2025 +0000
files: sys/arch/x86/include/apicvar.h sys/arch/x86/x86/cpu.c sys/arch/x86/x86/identcpu_subr.c sys/arch/x86/x86/lapic.c
description:
Add support for CPUID leaf 0x40000010 to detect TSC and LAPIC frequency on
hypervisors implementing the VMware-defined interface
This change enables virtual machines to obtain TSC and LAPIC frequency
information directly from the hypervisor via CPUID leaf 0x40000010, avoiding
the need for runtime calibration, thus reducing boot speed in supported
environments.
Tested on GENERIC and MICROVM kernels, QEMU/KVM and QEMU/NVMM (current and
10.1), Intel and AMD CPUs, NetBSD/amd64 and i386.
...
diff -r 82b0bdcd458e -r 0ff4d825447c sys/arch/x86/x86/identcpu_subr.c
--- a/sys/arch/x86/x86/identcpu_subr.c Fri May 02 03:26:26 2025 +0000
+++ b/sys/arch/x86/x86/identcpu_subr.c Fri May 02 07:08:10 2025 +0000
@@ -1,4 +1,4 @@
-/* $NetBSD: identcpu_subr.c,v 1.13 2025/03/06 15:35:05 imil Exp $ */
+/* $NetBSD: identcpu_subr.c,v 1.14 2025/05/02 07:08:11 imil Exp $ */
/*-
* Copyright (c) 2020 The NetBSD Foundation, Inc.
...
@@ -133,12 +166,28 @@ cpu_tsc_freq_cpuid(struct cpu_info *ci)
#if defined(_KERNEL) && NLAPIC > 0
if ((khz != 0) && (lapic_per_second == 0)) {
lapic_per_second = khz * 1000;
+ lapic_from_cpuid = true;
aprint_debug_dev(ci->ci_dev,
"lapic_per_second set to %" PRIu32 "\n",
lapic_per_second);
}
#endif
This appears to skip the second lapic calibration step on some
machines where the _physical_ processor base frequency is
determined by CPUID. On this machine, in the host and passed
through to the guest in qemu (nvmm does nothing to munge these
CPUID leaves):
CPUID[15h]
denom@eax=2 num@ebx=176 freq@ecx=0 zero@edx=0
CPUID[16h]
basefreq@eax=4200MHz maxfreq@ebx=2100MHz busfreq@ecx=100MHz zero@edx=0
tsc_freq_cpuid determines the frequency from this.
https://nxr.netbsd.org/xref/src/sys/arch/x86/x86/identcpu_subr.c?r=1.14#108
But it winds up wrong, according to the second calibration with
respect to the TSC. My guess is that qemu chooses its own
lapic frequency (1000 MHz) rather than taking the same
frequency as the physical hardware (24 MHz), but since nvmm
passes CPUID leaves 15h and 16h verbatim, the guest is
confused.
Some relevant parameters from different contexts:
(physical host)
crash> x/d lapic_per_second
lapic_per_second: 24000000
crash> x/d lapic_tval
lapic_tval: 24000
crash> x/d hz
hz: 1000
(good kernel, nvmm, hardclock runs roughly at expected frequency)
crash> x/d lapic_per_second
lapic_per_second: 1000007000
crash> x/d lapic_tval
lapic_tval: 10000070
crash> x/d hz
hz: 100
(bad kernel, nvmm, hardclock runs fast)
crash> x/d lapic_per_second
lapic_per_second: 24000000
crash> x/d lapic_tval
lapic_tval: 240000
crash> x/d hz
hz: 100
(bad kernel, no nvmm, hardclock runs roughly at expected frequency)
crash> x/d lapic_per_second
lapic_per_second: 1000572000
crash> x/d lapic_tval
lapic_tval: 10005720
crash> x/d hz
hz: 100
It's not yet clear to me why the hardclock timer is running at
roughly 5x the frequency it should in the affected guests: the
reported frequency of 24 MHz and the measured frequency of 1000
MHz are off by a factor of about 41. But maybe qemu is trying
to attain a higher frequency and hitting the limit on relative
sleeps on a host with a 1000 Hz hardclock timer, which is two
ticks per sleep at least, or 500 Hz.
>How-To-Repeat:
Boot a current kernel under qemu with nvmm.
>Fix:
1. On the guest side: The change to take the hypervisor TSC
frequency from CPUID[4000_0010h] included another change
which is to skip lapic calibration even if we _don't_ get
the TSC frequency from that hypervisor leaf. Maybe this
change is the right thing, generally, but I think it is
likely not necessary for fast-boot MICROVM (which goes via
the CPUID[4000_0010h] path instead anyway) and I think we
should revert this change for now.
2. On the host side: nvmm should maybe not pass through the
lapic frequency verbatim, and should maybe munge it to
reflect the frequency that qemu emulates. Perhaps other
hypervisors have some precedent here for reporting the lapic
frequency to the guest -- I haven't investigated.
>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: port-amd64-maintainer->imil
Responsible-Changed-By: riastradh@NetBSD.org
Responsible-Changed-When: Fri, 16 May 2025 01:50:12 +0000
Responsible-Changed-Why:
Can you take a look at this? It appears to have been broken by your
change -- and reverting just that one line I quoted fixes the symptoms
of an excessive-frequency hardclock.
From: "Emile iMil Heitor" <imil@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/59424 CVS commit: src/sys/arch/x86/x86
Date: Fri, 16 May 2025 04:35:54 +0000
Module Name: src
Committed By: imil
Date: Fri May 16 04:35:54 UTC 2025
Modified Files:
src/sys/arch/x86/x86: identcpu_subr.c
Log Message:
PR port-amd64/59424: remove useless lapic_from_cpuid = true
To generate a diff of this commit:
cvs rdiff -u -r1.14 -r1.15 src/sys/arch/x86/x86/identcpu_subr.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: open->feedback
State-Changed-By: imil@NetBSD.org
State-Changed-When: Fri, 16 May 2025 04:53:58 +0000
State-Changed-Why:
Fix implemented: removed faulty lapic_from_cpuid assignation
>Unformatted:
(Contact us)
$NetBSD: query-full-pr,v 1.47 2022/09/11 19:34:41 kim Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2025
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.