NetBSD Problem Report #56322
From dholland@netbsd.org Thu Jul 22 00:13:47 2021
Return-Path: <dholland@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id 009361A921F
for <gnats-bugs@gnats.NetBSD.org>; Thu, 22 Jul 2021 00:13:46 +0000 (UTC)
Message-Id: <20210722001346.C1F2E84E8A@mail.netbsd.org>
Date: Thu, 22 Jul 2021 00:13:46 +0000 (UTC)
From: dholland@NetBSD.org
Reply-To: dholland@NetBSD.org
To: gnats-bugs@NetBSD.org
Subject: Excessive clock drift
X-Send-Pr-Version: 3.95
>Number: 56322
>Category: kern
>Synopsis: Excessive clock drift
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Thu Jul 22 00:15:00 +0000 2021
>Last-Modified: Wed Oct 02 18:25:01 +0000 2024
>Originator: David A. Holland
>Release: NetBSD 9.99.85 (20210623)
>Organization:
>Environment:
System: NetBSD valkyrie 9.99.85 NetBSD 9.99.85 (VALKYRIE) #7: Wed Jun 23 18:32:25 EDT 2021 dholland@valkyrie:/usr/src/sys/arch/amd64/compile/VALKYRIE amd64
Architecture: x86_64
Machine: amd64
>Description:
Four days ago (and a couple hours) I noticed the clock was badly
behind, and synced it with ntpdate.
It is now on the order of 1:30 (that's one and a half minutes) behind,
by comparison to another machine that runs ntpd. This machine does
not, for no particular reason.
This is a regression - prior to updating to 9.99.85 the clock lagged
but nowhere near so much.
This is the relevant material I see in dmesg, which isn't much use.
[ 1.000000] timecounter: Timecounters tick every 10.000 msec
[ 1.000000] timecounter: Timecounter "i8254" frequency 1193182 Hz quality 100
[ 1.000003] timecounter: Timecounter "clockinterrupt" frequency 100 Hz quality 0
[ 1.051023] timecounter: Timecounter "TSC" frequency 3517182870 Hz quality 3000
Before this update I used to get messages of the form
autoconfiguration error: ERROR: 2607 cycle TSC drift observed
These have gone away, but I suspect the underlying problem is that the
TSC is bad and this is no longer being detected.
Machine is a 6-way AMD Family 15h.
I don't want to be fixing the time regularly because I'm sure it will
regularly remind me of PR 56097. I could turn on ntpd, but that's at
best a bandaid, and with this much drift it might just lose sync
anyway.
>How-To-Repeat:
>Fix:
>Audit-Trail:
From: Frank Kardel <kardel@kardel.name>
To: gnats-bugs@netbsd.org, kern-bug-people@netbsd.org,
gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Cc:
Subject: Re: kern/56322: Excessive clock drift
Date: Thu, 22 Jul 2021 07:47:33 +0200
Can you post
sysctl kern.timecounter
?
I assume that with the
ERROR: 2607 cycle TSC drift observed
TSC was not picked as timecounter on your system.
Can you try
sysctl -w kern.timecounter.hardware=i8254
and see whether that improves the situation.
I agree that ntpd is unlikely to curb that large drift.
ntpd will only correct a drift of up to 43 seconds per day (500 ppm).
Frank
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org, dholland@NetBSD.org
Subject: Re: kern/56322: Excessive clock drift
Date: Thu, 22 Jul 2021 06:55:47 +0000
On Thu, Jul 22, 2021 at 05:50:01AM +0000, Frank Kardel wrote:
> Can you post
>
> sysctl kern.timecounter
> ?
>
> I assume that with the
>
> ERROR: 2607 cycle TSC drift observed
>
> TSC was not picked as timecounter on your system.
It was not then (afaik) but it is now. So,
> Can you try
>
> sysctl -w kern.timecounter.hardware=i8254
>
> and see whether that improves the situation.
Yes.
As of ~now when I did that it had dropped to two full minutes behind,
in something like seven hours, so tomorrow morning should indicate
whether the situation's better.
--
David A. Holland
dholland@netbsd.org
From: Frank Kardel <kardel@kardel.name>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/56322: Excessive clock drift
Date: Thu, 22 Jul 2021 10:10:51 +0200
On 07/22/21 09:00, David Holland wrote:
>
> > Can you try
> >
> > sysctl -w kern.timecounter.hardware=i8254
> >
> > and see whether that improves the situation.
>
> Yes.
>
> As of ~now when I did that it had dropped to two full minutes behind,
> in something like seven hours, so tomorrow morning should indicate
> whether the situation's better.
>
120 s / 25200 s ~ 0.0048, i.e. roughly 4800 ppm
Something must be really wrong with TSC frequency estimation or
stability on that system.
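As a quick check of the arithmetic above, here is a minimal Python sketch. The function names are illustrative only and do not come from any NetBSD tool; the inputs are the rough figures quoted in this thread (about 120 s lost over about seven hours, and the 500 ppm ntpd limit).

```python
SECONDS_PER_DAY = 86_400


def drift_ppm(error_s: float, elapsed_s: float) -> float:
    """Clock error expressed in parts per million of the elapsed time."""
    return error_s / elapsed_s * 1e6


def seconds_per_day(ppm: float) -> float:
    """Clock error accumulated over one day at a constant drift rate."""
    return ppm * 1e-6 * SECONDS_PER_DAY


# Rough figures from this thread: about 120 s lost in about 7 hours.
observed = drift_ppm(120, 7 * 3600)
print(f"observed drift: {observed:.0f} ppm")
print(f"500 ppm over one day: {seconds_per_day(500):.1f} s")
```

Under these assumptions the observed rate comes out close to ten times the 500 ppm that ntpd is said to tolerate, which is consistent with the concern that ntpd alone would not hold sync.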
Also I would have expected a timecounter like hpet. Could that be
disabled in the BIOS?
> --
> David A. Holland
> dholland@netbsd.org
>
Frank
From: Joerg Sonnenberger <joerg@bec.de>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org, dholland@NetBSD.org
Subject: Re: kern/56322: Excessive clock drift
Date: Thu, 22 Jul 2021 12:26:11 +0200
On Thu, Jul 22, 2021 at 05:50:01AM +0000, Frank Kardel wrote:
> Can you try
>
> sysctl -w kern.timecounter.hardware=i8254
On x86, you normally have much better options available. HPET or the PCI
hostbridge (which the ACPI time counter also uses) are much faster and
have a higher resolution.
Joerg
From: Frank Kardel <kardel@netbsd.org>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/56322: Excessive clock drift
Date: Thu, 22 Jul 2021 17:02:47 +0200
True, but what dholland@ listed from the dmesg output was:
This is the relevant material I see in dmesg, which isn't much use.
[ 1.000000] timecounter: Timecounters tick every 10.000 msec
[ 1.000000] timecounter: Timecounter "i8254" frequency 1193182 Hz quality 100
[ 1.000003] timecounter: Timecounter "clockinterrupt" frequency 100 Hz quality 0
[ 1.051023] timecounter: Timecounter "TSC" frequency 3517182870 Hz quality 3000
So I had to propose the next known workable alternative instead of timecounters that
haven't been detected :-). HPET is sometimes disabled in BIOS.
So far we do not know the output of "sysctl kern.timecounter".
Frank
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/56322: Excessive clock drift
Date: Thu, 22 Jul 2021 23:29:25 +0000
On Thu, Jul 22, 2021 at 07:00:03AM +0000, David Holland wrote:
> > Can you try
> >
> > sysctl -w kern.timecounter.hardware=i8254
> >
> > and see whether that improves the situation.
>
> Yes.
>
> As of ~now when I did that it had dropped to two full minutes behind,
> in something like seven hours, so tomorrow morning should indicate
> whether the situation's better.
Drift since then has been ~0, so the issue reduces to: why are we no
longer detecting that the TSC is bad?
--
David A. Holland
dholland@netbsd.org
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/56322: Excessive clock drift
Date: Thu, 22 Jul 2021 23:30:32 +0000
On Thu, Jul 22, 2021 at 03:05:01PM +0000, Frank Kardel wrote:
> So I had to propose the next known workable alternative instead of
> timecounters that haven't been detected :-). HPET is sometimes
> disabled in BIOS.
Like I said, I'll check that next time I reboot this machine, which
isn't ordinarily very often.
> So far we do not know the output of "sysctl kern.timecounter".
Like I already said, it was using the bad HPET.
--
David A. Holland
dholland@netbsd.org
From: Frank Kardel <kardel@kardel.name>
To: gnats-bugs@netbsd.org, kern-bug-people@netbsd.org,
gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, dholland@NetBSD.org
Cc:
Subject: Re: kern/56322: Excessive clock drift
Date: Fri, 23 Jul 2021 09:22:16 +0200
Did you mean to say HPET? I thought it was using TSC?
I am interested in the value of kern.timecounter.choice like
in the output below
EXAMPLE:
kern.timecounter.choice = TSC(q=3000, f=2200001000 Hz) clockinterrupt(q=0, f=100 Hz) lapic(q=-100, f=25000000 Hz) hpet0(q=2000, f=24000000 Hz) ACPI-Fast(q=1000, f=3579545 Hz) i8254(q=100, f=1193182 Hz) dummy(q=-1000000, f=1000000 Hz)
kern.timecounter.hardware = TSC
kern.timecounter.timestepwarnings = 1
Frank
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org, dholland@NetBSD.org
Subject: Re: kern/56322: Excessive clock drift
Date: Sat, 24 Jul 2021 03:12:24 +0000
On Fri, Jul 23, 2021 at 07:25:01AM +0000, Frank Kardel wrote:
> Did you mean to say HPET? I thought it was using TSC?
Er, yeah.
> I am interested in the value of kern.timecounter.choice like
>
> in the output below
>
> EXAMPLE:
>
> kern.timecounter.choice = TSC(q=3000, f=2200001000 Hz)
> clockinterrupt(q=0, f=100 Hz) lapic(q=-100, f=25000000 Hz) hpet0(q=2000,
> f=24000000 Hz) ACPI-Fast(q=1000, f=3579545 Hz) i8254(q=100, f=1193182
> Hz) dummy(q=-1000000, f=1000000 Hz)
> kern.timecounter.hardware = TSC
> kern.timecounter.timestepwarnings = 1
kern.timecounter.choice = TSC(q=3000, f=3517182870 Hz) clockinterrupt(q=0, f=100 Hz) lapic(q=-100, f=200982000 Hz) i8254(q=100, f=1193182 Hz) dummy(q=-1000000, f=1000000 Hz)
kern.timecounter.hardware = i8254
kern.timecounter.timestepwarnings = 0
Other than the lapic entry, which doesn't appear to be particularly
interesting, no information that wasn't in the dmesg output.
(The hardware is now i8254 because I changed it; it started as TSC.)
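To make the selection visible, the following Python sketch parses the kern.timecounter.choice value reported above and picks a counter. The selection rule (highest quality wins, frequency breaks ties, negative quality is never picked automatically) is my understanding of the kernel's behaviour and should be read as an assumption, not as a copy of the kernel code. The helper names are hypothetical.

```python
import re

# kern.timecounter.choice value as reported in this message.
CHOICE = (
    "TSC(q=3000, f=3517182870 Hz) clockinterrupt(q=0, f=100 Hz) "
    "lapic(q=-100, f=200982000 Hz) i8254(q=100, f=1193182 Hz) "
    "dummy(q=-1000000, f=1000000 Hz)"
)

# One entry looks like: name(q=<quality>, f=<frequency> Hz)
ENTRY = re.compile(r"([\w-]+)\(q=(-?\d+),\s*f=(\d+) Hz\)")


def parse_choice(text):
    """Return a list of (name, quality, frequency_hz) tuples."""
    return [(name, int(q), int(f)) for name, q, f in ENTRY.findall(text)]


def pick_best(counters):
    """Assumed rule: best non-negative quality, then highest frequency."""
    usable = [c for c in counters if c[1] >= 0]
    return max(usable, key=lambda c: (c[1], c[2]), default=None)


counters = parse_choice(CHOICE)
print(pick_best(counters)[0])

# The same list with the TSC demoted to a negative quality.
demoted = [(n, -100 if n == "TSC" else q, f) for n, q, f in counters]
print(pick_best(demoted)[0])
```

With the reported values the first call selects TSC, which matches what the machine booted with. With the TSC given a negative quality, the sketch falls back to i8254, the counter that showed roughly zero drift here.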
--
David A. Holland
dholland@netbsd.org
From: Michael van Elst <mlelstv@serpens.de>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/56322: Excessive clock drift
Date: Thu, 4 Aug 2022 08:26:21 +0200
As for the unstable TSC on AMD APU family 15h:
https://patchwork.kernel.org/project/kvm/patch/1406800033-13404-2-git-send-email-imammedo@redhat.com/
--
Michael van Elst
Internet: mlelstv@serpens.de
"A potential Snark may lurk in every tree."
From: "matthew green" <mrg@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/56322 CVS commit: src/sys/arch/x86/x86
Date: Mon, 19 Feb 2024 09:22:32 +0000
Module Name: src
Committed By: mrg
Date: Mon Feb 19 09:22:31 UTC 2024
Modified Files:
src/sys/arch/x86/x86: tsc.c
Log Message:
make TSC get a quality of -100 on AMD Family 15h and 16h
this should "fix" PR#56322 and is known as AMD errata
"778: Processor Core Time Stamp Counters May Experience Drift"
To generate a diff of this commit:
cvs rdiff -u -r1.58 -r1.59 src/sys/arch/x86/x86/tsc.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
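For readers unfamiliar with how "Family 15h and 16h" is recognised, here is a hedged Python sketch of the idea. It is not the code from tsc.c: the function names, the vendor string check, and the example CPUID signature values are assumptions chosen for illustration. The family decoding follows the usual x86 convention for CPUID leaf 1 EAX, and the default quality of 3000 is the value shown in the dmesg output earlier in this PR.

```python
def display_family(cpuid_eax: int) -> int:
    """Decode the CPU family from CPUID leaf 1 EAX (x86 convention)."""
    base = (cpuid_eax >> 8) & 0xF        # bits 11:8
    extended = (cpuid_eax >> 20) & 0xFF  # bits 27:20
    # The extended family is only added when the base family is 0xF.
    return base + extended if base == 0xF else base


def tsc_quality(vendor: str, cpuid_eax: int, default: int = 3000) -> int:
    """Hypothetical helper: demote the TSC on AMD families 15h and 16h."""
    if vendor == "AuthenticAMD" and display_family(cpuid_eax) in (0x15, 0x16):
        return -100
    return default


# Example signatures, assumed for illustration only.
print(tsc_quality("AuthenticAMD", 0x00600F20))  # decodes to family 15h
print(tsc_quality("AuthenticAMD", 0x00800F11))  # decodes to family 17h
```

The point of a negative quality, rather than removing the counter, is that the TSC stays available for manual selection through kern.timecounter.hardware while no longer being picked by default.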
From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/56322 CVS commit: [netbsd-10] src/sys/arch/x86/x86
Date: Wed, 2 Oct 2024 18:24:35 +0000
Module Name: src
Committed By: martin
Date: Wed Oct 2 18:24:35 UTC 2024
Modified Files:
src/sys/arch/x86/x86 [netbsd-10]: tsc.c
Log Message:
Pull up following revision(s) (requested by rin in ticket #915):
sys/arch/x86/x86/tsc.c: revision 1.59
sys/arch/x86/x86/tsc.c: revision 1.60
make TSC get a quality of -100 on AMD Family 15h and 16h
this should "fix" PR#56322 and is known as AMD errata
"778: Processor Core Time Stamp Counters May Experience Drift"
remove unintended printf() in previous. (thx dh)
To generate a diff of this commit:
cvs rdiff -u -r1.57 -r1.57.4.1 src/sys/arch/x86/x86/tsc.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
Copyright © 1994-2024
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.