NetBSD Problem Report #57920

From www@netbsd.org  Sat Feb 10 19:55:30 2024
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 5AC441A9239
	for <gnats-bugs@gnats.NetBSD.org>; Sat, 10 Feb 2024 19:55:30 +0000 (UTC)
Message-Id: <20240210195528.8497C1A923A@mollari.NetBSD.org>
Date: Sat, 10 Feb 2024 19:55:28 +0000 (UTC)
From: campbell+netbsd@mumble.net
Reply-To: campbell+netbsd@mumble.net
To: gnats-bugs@NetBSD.org
Subject: hardclock(9) contract is unclear about missed ticks
X-Send-Pr-Version: www-1.0

>Number:         57920
>Category:       kern
>Synopsis:       hardclock(9) contract is unclear about missed ticks
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Feb 10 20:00:00 +0000 2024
>Last-Modified:  Sun Apr 07 23:00:02 +0000 2024
>Originator:     Taylor R Campbell
>Release:        current
>Organization:
The NetBSD Hardclock
>Environment:
>Description:
Quoth the hardclock(9) man page:

     The hardclock() function is called hz(9) times per second.  It implements
     the real-time system clock.  The argument frame is an opaque, machine-
     dependent structure that encapsulates the previous machine state.

What should happen if the machine-dependent periodic timer interrupt is delayed, or some timer interrupts are missed, but the underlying timer hardware can tell how long the delay was or how many interrupts were missed?

Reasons for this include entering and exiting ddb, suspending and resuming hardware, scheduling delays on virtual hardware, and flaky hardware.

Here are some options if n > 1 periods have elapsed since the last hardclock tick:

1. Call hardclock once, i.e., pretend nothing happened and let the timecounter sort out clock jumps.
2. Call hardclock n times, i.e., try to catch up as fast as we can even if that means hardclock is called much more often than hz times per second.
3. Call hardclock MIN(n, k) times for some limit k, i.e., try to catch up but by at most k/hz seconds.
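
The three options above can be sketched as a function that returns how many hardclock calls a driver would make for n elapsed periods.  This is only an illustration: the enum, the function name hardclock_calls, and the parameter k are hypothetical and do not exist in NetBSD.

```c
#include <assert.h>

/* Hypothetical policy names; not part of any NetBSD interface. */
enum catchup_policy {
	CATCHUP_ONCE,		/* option 1: one call, ignore the gap */
	CATCHUP_ALL,		/* option 2: one call per elapsed period */
	CATCHUP_BOUNDED,	/* option 3: at most k calls */
};

/*
 * Return how many times a clock driver would call hardclock() when
 * n >= 1 tick periods have elapsed since the last hardclock tick.
 * k is the cap and is consulted only for CATCHUP_BOUNDED.
 */
static unsigned
hardclock_calls(enum catchup_policy policy, unsigned n, unsigned k)
{
	assert(n >= 1);

	switch (policy) {
	case CATCHUP_ONCE:
		return 1;
	case CATCHUP_ALL:
		return n;
	case CATCHUP_BOUNDED:
		assert(k >= 1);
		return n < k ? n : k;	/* MIN(n, k) */
	}
	return 1;	/* not reached for a valid policy */
}
```

For example, with hz = 100 and 15 seconds spent in ddb, n is 1500: option 1 makes one call, option 2 makes 1500 calls back to back, and option 3 with k = 100 makes 100.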

Some drivers, like the i8254 driver in arch/x86/isa/clock.c and the Intel local APIC driver in arch/x86/x86/lapic.c, do (1); some drivers, like the PowerPC e500 clock driver in arch/powerpc/booke/e500_timer.c, do (2); other drivers, like the Xen clock driver in arch/xen/xen/xen_clock.c, do (3).  Which should it be?
>How-To-Repeat:
code inspection, diagnosing heartbeat issues with ddb on riscv, writing a new clock driver and wondering what to do in this case
>Fix:
Yes, please!

Perhaps hardclock(9) should be extended with an argument saying how many ticks the MD clock driver thinks have elapsed; if >1, it missed some.  We can have the policy about what to do in this case -- dtrace probe, event counter, printf, callout scheduling, whatever -- in MI code, and leave only the mechanism for detecting missed ticks in MD code.
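
A rough sketch of that split, written as standalone C rather than kernel code.  The names hardclock_elapsed and struct tick_stats are made up for illustration, and the policy shown (process one tick, count the rest as missed) is just one of the possibilities listed above, not a proposal for the final interface.

```c
#include <assert.h>

/* Hypothetical MI-side bookkeeping; names are illustrative only. */
struct tick_stats {
	unsigned long ticks;		/* hardclock ticks processed */
	unsigned long missed;		/* ticks the MD driver reported as missed */
	unsigned long missed_events;	/* calls that reported at least one miss */
};

/*
 * Hypothetical extended entry point.  The MD clock driver only reports
 * how many tick periods it thinks have elapsed (nticks >= 1); the MI
 * code owns the policy.  Here the policy is to process a single tick
 * and record the remainder in counters.
 */
static void
hardclock_elapsed(struct tick_stats *st, unsigned nticks)
{
	assert(nticks >= 1);

	st->ticks++;
	if (nticks > 1) {
		st->missed += nticks - 1;
		st->missed_events++;
	}
}
```

An MD driver would then call hardclock_elapsed(&stats, n) from its interrupt handler with whatever n its hardware lets it compute, and a different MI policy (a dtrace probe, a printf, bounded catchup) could replace the counter updates without touching any driver.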

>Audit-Trail:
From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/57920 CVS commit: src/sys/arch/riscv/riscv
Date: Sun, 7 Apr 2024 22:59:13 +0000

 Module Name:	src
 Committed By:	riastradh
 Date:		Sun Apr  7 22:59:13 UTC 2024

 Modified Files:
 	src/sys/arch/riscv/riscv: clock_machdep.c

 Log Message:
 riscv: Schedule next hardclock tick in the future, not the past.

 If we have missed hardclock ticks, schedule up to one tick interval
 in the future anyway; don't try to play hardclock catchup by
 scheduling for when the next hardclock tick _should_ have been, in
 the past, leading to ticking as fast as possible until we've caught
 up.

 Playing hardclock catchup triggers heartbeat panics when continuing
 from ddb, if you've been in ddb for >15sec.  Other hardclock drivers
 like x86 lapic don't play hardclock catchup either.

 PR kern/57920


 To generate a diff of this commit:
 cvs rdiff -u -r1.7 -r1.8 src/sys/arch/riscv/riscv/clock_machdep.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.
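
The idea in the log message above, scheduling the next tick in the future rather than in the past, might look roughly like the following.  This is not the code from clock_machdep.c; the function name, the argument names, and the choice to keep deadlines on multiples of the interval are assumptions made for the sketch, and counter wraparound is ignored.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Given the deadline of the tick that just fired (last), the current
 * timer counter value (now), and the tick interval in counter units,
 * return the next deadline to program into the timer.
 *
 * If no ticks were missed, this is simply last + interval.  If that
 * deadline has already passed, skip ahead by whole intervals so the
 * result is strictly after now and at most one interval away, instead
 * of scheduling in the past and ticking as fast as possible.
 */
static uint64_t
next_tick_deadline(uint64_t last, uint64_t now, uint64_t interval)
{
	uint64_t next = last + interval;

	assert(interval > 0);

	if (next <= now) {
		/* Missed ticks: skip the periods that have already passed. */
		uint64_t skipped = (now - next) / interval + 1;
		next += skipped * interval;
	}
	return next;
}
```

With last = 1000 and interval = 100, a handler running at now = 1050 schedules 1100 as usual, while one running late at now = 1350 schedules 1400 rather than 1100.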

Copyright © 1994-2024 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.