NetBSD Problem Report #58925
From www@netbsd.org Sat Dec 21 15:36:58 2024
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
key-exchange X25519 server-signature RSA-PSS (2048 bits)
client-signature RSA-PSS (2048 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id C54C81A923A
for <gnats-bugs@gnats.NetBSD.org>; Sat, 21 Dec 2024 15:36:58 +0000 (UTC)
Message-Id: <20241221153657.6CA0F1A923B@mollari.NetBSD.org>
Date: Sat, 21 Dec 2024 15:36:57 +0000 (UTC)
From: campbell+netbsd@mumble.net
Reply-To: campbell+netbsd@mumble.net
To: gnats-bugs@NetBSD.org
Subject: itimer(9) responds erratically to clock wound back
X-Send-Pr-Version: www-1.0
>Number: 58925
>Category: kern
>Synopsis: itimer(9) responds erratically to clock wound back
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sat Dec 21 15:40:01 +0000 2024
>Originator: Taylor R Campbell
>Release: current, 10, 9, ...
>Organization:
The IntervalBSD Lostation
>Environment:
>Description:
Suppose you set an interval timer to fire every minute starting at 03:00:00 on the real-time clock.
Suppose the clock is wound back to 02:00:00 (because of initial ntp adjustment, not because of misguided summer time configuration, say).
Suppose when the underlying callout fires, the clock reads 02:00:01.
What clock time does the timer get rescheduled for?
(a) 02:01:00
(b) 02:01:36
(c) 02:01:52
(d) 02:02:00
(e) 03:01:00
I think the correct answer is (a) and all the other ones are wrong. Unfortunately, what we currently implement -- assuming it is actually the wee hours of the morning of January 1st, 1970, in Reykjavik -- is (b). And before we switched from microsecond-based calculations to nanosecond-based calculations, for a few months in 2008 what we implemented was (c).
Why did this happen? It happens in the logic that tries to compute the end of the interval [it_value - k*it_interval, it_value - (k - 1)*it_interval] that the current time lies in (now_ns is the current time in nanoseconds since the epoch, last_val is it_time in nanoseconds since the epoch, and interval is it_interval in nanoseconds):
839 uint64_t last_val, next_val, interval, now_ns;
...
876 next_val = now_ns +
877 (now_ns - last_val + interval - 1) % interval;
878
879 if (backwards)
880 next_val += interval;
https://nxr.netbsd.org/xref/src/sys/kern/kern_time.c?r=1.223#836
Here, now_ns = 7201*10^9, last_val = 10800*10^9, and interval = 60*10^9. So now_ns - last_val = -3599*10^9, and now_ns - last_val + interval - 1 -3539*10^9 - 1. So reducing modulo interval (with truncating division, as C uses) should give -59*10^9 - 1, right?
Except this is unsigned 64-bit arithmetic! So the number we're reducing modulo interval is not -3539*10^9 - 1 but rather 2^64 - 3539*10^9 - 1 = 18446747612709551617, so the reduction gives 34709551615 ~ 34.7sec. Add that to now_ns and interval and you get 7295709551615 or about 02:01:36.
Now there's another problem which is that this logic isn't quite right anyway. Even if we switch it to signed arithmetic, we end up scheduling it at 7202*10^9 - 1 ~ 02:00:02, a nanosecond before than one second after it previously fired! And there's a stray nanosecond fencepost.
Note that C division is truncating (round quotient toward zero), which means, e.g.:
-3599 % 60 = -59
rather than 1 as you would get with flooring division (round quotient toward negative infinity) or euclidean division (remainder is always nonnegative). So the sign of the remainder, in the clock-wound-backwards case, is always negative -- it's what you have to add to the end of the current interval to get back to now. Or, it's what you have to _subtract_ from now to get to the end of the current interval.
So I think what this should really be is not:
now_ns += interval + ((now_ns - last_ns + interval - 1) % interval);
but rather (with an adjustment for the nanosecond of fencepost error to account for the just-less-than-interval addend being divided so it is not discarded altogether by reduction):
now_ns -= ((now_ns - last_ns + interval - 1) % interval) + 1;
Of course, that's only for the clock-wound-backward case; for the clock-wound-forward case the sign has to be reversed. So we end up with:
/*
* now [backwards] overruns now [forwards]
* | v v |
* |--+----+-*--x----+----+----|----+----+----+-*--x----+-->
* \__/ \__/
* remainder remainder
* (nonpositive) (nonnegative)
*/
remainder = ((now_ns - last_ns + interval - 1) % interval) + 1;
if (backwards) {
now_ns -= remainder;
} else {
now_ns += remainder;
it->it_overruns += (now_ns - last_ns) % interval; /* XXX int overflow */
}
>How-To-Repeat:
1. set a periodic interval timer (setitimer, timer_settime, timerfd_settime)
2. wind the clock back (settimeofday, clock_settime, ntp_adjtime, ...)
>Fix:
As above -- but let's get some automatic tests for this arithmetic written down first!
(Contact us)
$NetBSD: query-full-pr,v 1.47 2022/09/11 19:34:41 kim Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2024
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.