NetBSD Problem Report #59424

From www@netbsd.org  Fri May 16 01:40:07 2025
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	 key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256
	 client-signature RSA-PSS (2048 bits) client-digest SHA256)
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id EADE01A923E
	for <gnats-bugs@gnats.NetBSD.org>; Fri, 16 May 2025 01:40:06 +0000 (UTC)
Message-Id: <20250516014005.6A6811A9241@mollari.NetBSD.org>
Date: Fri, 16 May 2025 01:40:05 +0000 (UTC)
From: campbell+netbsd@mumble.net
Reply-To: campbell+netbsd@mumble.net
To: gnats-bugs@NetBSD.org
Subject: hardclock ticks run at breakneck pace under qemu
X-Send-Pr-Version: www-1.0

>Number:         59424
>Category:       port-amd64
>Synopsis:       hardclock ticks run at breakneck pace under qemu
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    imil
>State:          feedback
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri May 16 01:45:00 +0000 2025
>Closed-Date:    
>Last-Modified:  Fri May 16 04:53:58 +0000 2025
>Originator:     Taylor R Campbell
>Release:        current
>Organization:
The Clock of the Lapic Now Foundation
>Environment:
>Description:

	Recent current kernels running under qemu -- with nvmm on a
	netbsd-10 host, unchanged in a while -- have been running with
	dramatically accelerated hardclock timers, even when they are
	built with HZ=100 as confirmed by sysctl kern.clockrate:

# sysctl kern.clockrate
kern.clockrate: tick = 10000, tickadj = 40, hz = 100, profhz = 100, stathz = 100

	The timecounter (hpet) seems to run normally, so `date' shows
	the right progression of time `sleep 10' sleeps for about 10sec
	of real time.  But sampling the hardclock_ticks with crash
	shows it growing at 500 ticks per second, rather than 100 ticks
	per second as expected, and:

# vmstat -i | grep timer
cpu0 timer     542109  507

	So the hardclock timer is running at about 500 Hz rather than
	100 Hz.  And while `date' and `sleep 10' work, callouts are
	accelerated so that it's hard to type without tripping wskbd
	repeat.

	I compared an older kernel that works (from February or so) and
	a newer kernel that is broken (from this past week) and this
	part stood out -- the `good' kernel had:

[     1.000004] cpu0: [re]calibrating local timer
[     1.000004] cpu0: apic clock running at 24 MHz
...
[     1.028453] cpu0: [re]calibrating local timer
[     1.028453] cpu0: apic clock running at 1000 MHz

	But the `bad' kernel only had the first message, not the
	second.  And printing lapic_per_second with crash(8) confirmed
	that it is 24 * 10^6, not 1000 * 10^6.

	I went looking for that message and lapic_per_second
	initialization, and found the following recent change:

changeset:   1192122:0ff4d825447c
branch:      trunk
user:        imil <imil@NetBSD.org>
date:        Fri May 02 07:08:10 2025 +0000
files:       sys/arch/x86/include/apicvar.h sys/arch/x86/x86/cpu.c sys/arch/x86/x86/identcpu_subr.c sys/arch/x86/x86/lapic.c
description:
Add support for CPUID leaf 0x40000010 to detect TSC and LAPIC frequency on
hypervisors implementing the VMware-defined interface

This change enables virtual machines to obtain TSC and LAPIC frequency
information directly from the hypervisor via CPUID leaf 0x40000010, avoiding
the need for runtime calibration, thus reducing boot speed in supported
environments.

Tested on GENERIC and MICROVM kernels, QEMU/KVM and QEMU/NVMM (current and
10.1), Intel and AMD CPUs, NetBSD/amd64 and i386.
...
diff -r 82b0bdcd458e -r 0ff4d825447c sys/arch/x86/x86/identcpu_subr.c
--- a/sys/arch/x86/x86/identcpu_subr.c	Fri May 02 03:26:26 2025 +0000
+++ b/sys/arch/x86/x86/identcpu_subr.c	Fri May 02 07:08:10 2025 +0000
@@ -1,4 +1,4 @@
-/* $NetBSD: identcpu_subr.c,v 1.13 2025/03/06 15:35:05 imil Exp $ */
+/* $NetBSD: identcpu_subr.c,v 1.14 2025/05/02 07:08:11 imil Exp $ */

 /*-
  * Copyright (c) 2020 The NetBSD Foundation, Inc.
...
@@ -133,12 +166,28 @@ cpu_tsc_freq_cpuid(struct cpu_info *ci)
 #if defined(_KERNEL) &&  NLAPIC > 0
 		if ((khz != 0) && (lapic_per_second == 0)) {
 			lapic_per_second = khz * 1000;
+			lapic_from_cpuid = true;
 			aprint_debug_dev(ci->ci_dev,
 			    "lapic_per_second set to %" PRIu32 "\n",
 			    lapic_per_second);
 		}
 #endif

	This appears to skip the second lapic calibration step on some
	machines where the _physical_ processor base frequency is
	determined by CPUID.  On this machine, in the host and passed
	through to the guest in qemu (nvmm does nothing to munge these
	CPUID leaves):

CPUID[15h]
	denom@eax=2 num@ebx=176 freq@ecx=0 zero@edx=0
CPUID[16h]
	basefreq@eax=4200MHz maxfreq@ebx=2100MHz busfreq@ecx=100MHz zero@edx=0

	tsc_freq_cpuid determines the frequency from this.

https://nxr.netbsd.org/xref/src/sys/arch/x86/x86/identcpu_subr.c?r=1.14#108

	But it winds up wrong, according to the second calibration with
	respect to the TSC.  My guess is that qemu chooses its own
	lapic frequency (1000 MHz) rather than taking the same
	frequency as the physical hardware (24 MHz), but since nvmm
	passes CPUID leaves 15h and 16h verbatim, the guest is
	confused.

	Some relevant parameters from different contexts:

	(physical host)
crash> x/d lapic_per_second
lapic_per_second:       24000000
crash> x/d lapic_tval
lapic_tval:     24000
crash> x/d hz
hz:     1000

	(good kernel, nvmm, hardclock runs roughly at expected frequency)
crash> x/d lapic_per_second
lapic_per_second:       1000007000
crash> x/d lapic_tval
lapic_tval:     10000070
crash> x/d hz
hz:     100

	(bad kernel, nvmm, hardclock runs fast)
crash> x/d lapic_per_second
lapic_per_second:       24000000
crash> x/d lapic_tval
lapic_tval:     240000
crash> x/d hz
hz:     100

	(bad kernel, no nvmm, hardclock runs roughly at expected frequency)
crash> x/d lapic_per_second
lapic_per_second:       1000572000
crash> x/d lapic_tval
lapic_tval:     10005720
crash> x/d hz
hz:     100

	It's not yet clear to me why the hardclock timer is running at
	roughly 5x the frequency it should in the affected guests: the
	reported frequency of 24 MHz and the measured frequency of 1000
	MHz are off by a factor of about 41.  But maybe qemu is trying
	to attain a higher frequency and hitting the limit on relative
	sleeps on a host with a 1000 Hz hardclock timer, which is two
	ticks per sleep at least, or 500 Hz. 

>How-To-Repeat:

	Boot a current kernel under qemu with nvmm.

>Fix:

	1. On the guest side: The change to take the hypervisor TSC
	   frequency from CPUID[4000_0010h] included another change
	   which is to skip lapic calibration even if we _don't_ get
	   the TSC frequency from that hypervisor leaf.  Maybe this
	   change is the right thing, generally, but I think it is
	   likely not necessary for fast-boot MICROVM (which goes via
	   the CPUID[4000_0010h] path instead anyway) and I think we
	   should revert this change for now.

	2. On the host side: nvmm should maybe not pass through the
	   lapic frequency verbatim, and should maybe munge it to
	   reflect the frequency that qemu emulates.  Perhaps other
	   hypervisors have some precedent here for reporting the lapic
	   frequency to the guest -- I haven't investigated.

>Release-Note:

>Audit-Trail:

Responsible-Changed-From-To: port-amd64-maintainer->imil
Responsible-Changed-By: riastradh@NetBSD.org
Responsible-Changed-When: Fri, 16 May 2025 01:50:12 +0000
Responsible-Changed-Why:
Can you take a look at this?  It appears to have been broken by your
change -- and reverting just that one line I quoted fixes the symptoms
of an excessive-frequency hardclock.


From: "Emile iMil Heitor" <imil@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/59424 CVS commit: src/sys/arch/x86/x86
Date: Fri, 16 May 2025 04:35:54 +0000

 Module Name:	src
 Committed By:	imil
 Date:		Fri May 16 04:35:54 UTC 2025

 Modified Files:
 	src/sys/arch/x86/x86: identcpu_subr.c

 Log Message:
 PR port-amd64/59424: remove useless lapic_from_cpuid = true


 To generate a diff of this commit:
 cvs rdiff -u -r1.14 -r1.15 src/sys/arch/x86/x86/identcpu_subr.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: open->feedback
State-Changed-By: imil@NetBSD.org
State-Changed-When: Fri, 16 May 2025 04:53:58 +0000
State-Changed-Why:
Fix implemented: removed faulty lapic_from_cpuid assignation


>Unformatted:

NetBSD Home
NetBSD PR Database Search

(Contact us) $NetBSD: query-full-pr,v 1.47 2022/09/11 19:34:41 kim Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2025 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.