NetBSD Problem Report #56322

From dholland@netbsd.org  Thu Jul 22 00:13:47 2021
Return-Path: <dholland@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 009361A921F
	for <gnats-bugs@gnats.NetBSD.org>; Thu, 22 Jul 2021 00:13:46 +0000 (UTC)
Message-Id: <20210722001346.C1F2E84E8A@mail.netbsd.org>
Date: Thu, 22 Jul 2021 00:13:46 +0000 (UTC)
From: dholland@NetBSD.org
Reply-To: dholland@NetBSD.org
To: gnats-bugs@NetBSD.org
Subject: Excessive clock drift
X-Send-Pr-Version: 3.95

>Number:         56322
>Category:       kern
>Synopsis:       Excessive clock drift
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Jul 22 00:15:00 +0000 2021
>Last-Modified:  Mon Feb 19 09:25:01 +0000 2024
>Originator:     David A. Holland
>Release:        NetBSD 9.99.85 (20210623)
>Organization:
>Environment:
System: NetBSD valkyrie 9.99.85 NetBSD 9.99.85 (VALKYRIE) #7: Wed Jun 23 18:32:25 EDT 2021  dholland@valkyrie:/usr/src/sys/arch/amd64/compile/VALKYRIE amd64
Architecture: x86_64
Machine: amd64
>Description:

Four days ago (and a couple hours) I noticed the clock was badly
behind, and synced it with ntpdate.

It is now on the order of 1:30 (that's one and a half minutes) behind,
by comparison to another machine that runs ntpd. This machine does
not, for no particular reason.

This is a regression - prior to updating to 9.99.85 the clock lagged
but nowhere near so much.

This is the relevant material I see in dmesg, which isn't much use.

[     1.000000] timecounter: Timecounters tick every 10.000 msec
[     1.000000] timecounter: Timecounter "i8254" frequency 1193182 Hz quality 100
[     1.000003] timecounter: Timecounter "clockinterrupt" frequency 100 Hz quality 0
[     1.051023] timecounter: Timecounter "TSC" frequency 3517182870 Hz quality 3000

Before this update I used to get messages of the form
   autoconfiguration error: ERROR: 2607 cycle TSC drift observed

These have gone away, but I suspect the underlying problem is that the
TSC is bad and this is no longer being detected.

Machine is a 6-way AMD Family 15h.

I don't want to be fixing the time regularly because I'm sure it will
regularly remind me of PR 56097. I could turn on ntpd, but that's at
best a bandaid, and with this much drift it might just lose sync
anyway.

>How-To-Repeat:

>Fix:

>Audit-Trail:
From: Frank Kardel <kardel@kardel.name>
To: gnats-bugs@netbsd.org, kern-bug-people@netbsd.org,
 gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Cc: 
Subject: Re: kern/56322: Excessive clock drift
Date: Thu, 22 Jul 2021 07:47:33 +0200

 Can you post

 sysctl kern.timecounter
 ?

 I assume that with the

 ERROR: 2607 cycle TSC drift observed

 TSC was not picked as timecounter on your system.

 Can you try

 sysctl -w kern.timecounter.hardware=i8254

 and see whether that improves the situation.

 I agree that ntpd is unlikely to curb that large drift.

 ntpd will only correct a drift of up to 43 seconds per day (500ppm).
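
 The 43 seconds per day figure follows from the 500ppm limit quoted
 above. A minimal Python sketch of the conversion (the function name is
 only illustrative and is not part of ntpd):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def ppm_to_seconds_per_day(ppm):
    """Convert a frequency error in parts per million to seconds of drift per day."""
    return ppm * 1e-6 * SECONDS_PER_DAY


# 500 ppm is the correction limit mentioned in the message above.
max_slew = ppm_to_seconds_per_day(500)
print(f"{max_slew:.1f} s/day")  # 43.2 s/day
```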

 Frank

From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
	netbsd-bugs@netbsd.org, dholland@NetBSD.org
Subject: Re: kern/56322: Excessive clock drift
Date: Thu, 22 Jul 2021 06:55:47 +0000

 On Thu, Jul 22, 2021 at 05:50:01AM +0000, Frank Kardel wrote:
  >  Can you post
  >  
  >  sysctl kern.timecounter
  >  ?
  >  
  >  I assume that with the
  >  
  >  ERROR: 2607 cycle TSC drift observed
  >  
  >  TSC was not picked as timecounter on your system.

 It was not then (afaik) but it is now. So,

  >  Can you try
  >  
  >  sysctl -w kern.timecounter.hardware=i8254
  >  
  >  and see whether that improves the situation.

 Yes.

 As of ~now when I did that it had dropped to two full minutes behind,
 in something like seven hours, so tomorrow morning should indicate
 whether the situation's better.

 -- 
 David A. Holland
 dholland@netbsd.org

From: Frank Kardel <kardel@kardel.name>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/56322: Excessive clock drift
Date: Thu, 22 Jul 2021 10:10:51 +0200

 On 07/22/21 09:00, David Holland wrote:

 >   
 >    >  Can you try
 >    >
 >    >  sysctl -w kern.timecounter.hardware=i8254
 >    >
 >    >  and see whether that improves the situation.
 >   
 >   Yes.
 >   
 >   As of ~now when I did that it had dropped to two full minutes behind,
 >   in something like seven hours, so tomorrow morning should indicate
 >   whether the situation's better.
 >   
 120s / 25200 sec ~ 0.0048 = 4800ppm

 Something must be really wrong with TSC frequency estimation or 
 stability on that system.
 Also I would have expected a timecounter like hpet. Could that be 
 disabled in the BIOS?
 >   --
 >   David A. Holland
 >   dholland@netbsd.org
 >   
 Frank
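
 The estimate above can be reproduced with a short Python sketch. It
 takes the same reading as the message (120 seconds lost over roughly
 seven hours); the function name is hypothetical, and 500ppm is the ntpd
 limit mentioned earlier in this PR:

```python
def drift_ppm(offset_s, elapsed_s):
    """Clock drift in parts per million: offset accumulated over an elapsed interval."""
    return offset_s / elapsed_s * 1e6


elapsed = 7 * 3600                  # "something like seven hours" = 25200 s
observed = drift_ppm(120, elapsed)  # roughly 4760 ppm, rounded to 4800 ppm above
ntpd_limit = 500                    # ppm, per the earlier message

print(f"observed drift: {observed:.0f} ppm")
print(f"ratio to ntpd limit: {observed / ntpd_limit:.1f}x")
```

 On these numbers the drift is almost ten times what ntpd could correct.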

From: Joerg Sonnenberger <joerg@bec.de>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
	netbsd-bugs@netbsd.org, dholland@NetBSD.org
Subject: Re: kern/56322: Excessive clock drift
Date: Thu, 22 Jul 2021 12:26:11 +0200

 On Thu, Jul 22, 2021 at 05:50:01AM +0000, Frank Kardel wrote:
 >  Can you try
 >  
 >  sysctl -w kern.timecounter.hardware=i8254

 On x86, you normally have much better options available. HPET or the PCI
 hostbridge (which the ACPI time counter also uses) are much faster and
 have a higher resolution.

 Joerg

From: Frank Kardel <kardel@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/56322: Excessive clock drift
Date: Thu, 22 Jul 2021 17:02:47 +0200

 True, but what was listed in dmesg output by dholland@ was

 This is the relevant material I see in dmesg, which isn't much use.

 [     1.000000] timecounter: Timecounters tick every 10.000 msec
 [     1.000000] timecounter: Timecounter "i8254" frequency 1193182 Hz quality 100
 [     1.000003] timecounter: Timecounter "clockinterrupt" frequency 100 Hz quality 0
 [     1.051023] timecounter: Timecounter "TSC" frequency 3517182870 Hz quality 3000

 So I had to propose the next known workable alternative instead of timecounters that
 haven't been detected :-). HPET is sometimes disabled in BIOS.
 So far we do not know the output of "sysctl kern.timecounter".

 Frank

From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/56322: Excessive clock drift
Date: Thu, 22 Jul 2021 23:29:25 +0000

 On Thu, Jul 22, 2021 at 07:00:03AM +0000, David Holland wrote:
  >   >  Can you try
  >   >  
  >   >  sysctl -w kern.timecounter.hardware=i8254
  >   >  
  >   >  and see whether that improves the situation.
  >  
  >  Yes.
  >  
  >  As of ~now when I did that it had dropped to two full minutes behind,
  >  in something like seven hours, so tomorrow morning should indicate
  >  whether the situation's better.

 Drift since then has been ~0, so the issue reduces to: why are we no
 longer detecting that the TSC is bad?

 -- 
 David A. Holland
 dholland@netbsd.org

From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/56322: Excessive clock drift
Date: Thu, 22 Jul 2021 23:30:32 +0000

 On Thu, Jul 22, 2021 at 03:05:01PM +0000, Frank Kardel wrote:
  > So I had to propose the next known workable alternative instead of
  > timecounters that haven't been detected :-). HPET is sometimes
  > disabled in BIOS.

 Like I said, I'll check that next time I reboot this machine, which
 isn't ordinarily very often.

  >  So far we do not know the output of "sysctl kern.timecounter".

 Like I already said, it was using the bad HPET.

 -- 
 David A. Holland
 dholland@netbsd.org

From: Frank Kardel <kardel@kardel.name>
To: gnats-bugs@netbsd.org, kern-bug-people@netbsd.org,
 gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, dholland@NetBSD.org
Cc: 
Subject: Re: kern/56322: Excessive clock drift
Date: Fri, 23 Jul 2021 09:22:16 +0200

 Did you mean to say HPET? I thought it was using TSC?

 I am interested in the value of kern.timecounter.choice, as in the
 example output below.

 EXAMPLE:

 kern.timecounter.choice = TSC(q=3000, f=2200001000 Hz) clockinterrupt(q=0, f=100 Hz) lapic(q=-100, f=25000000 Hz) hpet0(q=2000, f=24000000 Hz) ACPI-Fast(q=1000, f=3579545 Hz) i8254(q=100, f=1193182 Hz) dummy(q=-1000000, f=1000000 Hz)
 kern.timecounter.hardware = TSC
 kern.timecounter.timestepwarnings = 1

 Frank


From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
	netbsd-bugs@netbsd.org, dholland@NetBSD.org
Subject: Re: kern/56322: Excessive clock drift
Date: Sat, 24 Jul 2021 03:12:24 +0000

 On Fri, Jul 23, 2021 at 07:25:01AM +0000, Frank Kardel wrote:
  >  Did you mean to say HPET? I thought it was using TSC?

 Er, yeah.

  >  I am interested in the value of kern.timecounter.choice like
  >  
  >  in the output below
  >  
  >  EXAMPLE:
  >  
  >  kern.timecounter.choice = TSC(q=3000, f=2200001000 Hz) clockinterrupt(q=0, f=100 Hz) lapic(q=-100, f=25000000 Hz) hpet0(q=2000, f=24000000 Hz) ACPI-Fast(q=1000, f=3579545 Hz) i8254(q=100, f=1193182 Hz) dummy(q=-1000000, f=1000000 Hz)
  >  kern.timecounter.hardware = TSC
  >  kern.timecounter.timestepwarnings = 1

 kern.timecounter.choice = TSC(q=3000, f=3517182870 Hz) clockinterrupt(q=0, f=100 Hz) lapic(q=-100, f=200982000 Hz) i8254(q=100, f=1193182 Hz) dummy(q=-1000000, f=1000000 Hz)
 kern.timecounter.hardware = i8254
 kern.timecounter.timestepwarnings = 0

 Other than the lapic entry, which doesn't appear to be particularly
 interesting, no information that wasn't in the dmesg output.

 (The hardware is now i8254 because I changed it; it started as TSC.)

 -- 
 David A. Holland
 dholland@netbsd.org

From: Michael van Elst <mlelstv@serpens.de>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/56322: Excessive clock drift
Date: Thu, 4 Aug 2022 08:26:21 +0200

 As for the unstable TSC on AMD APU family 15h:

 https://patchwork.kernel.org/project/kvm/patch/1406800033-13404-2-git-send-email-imammedo@redhat.com/

 -- 
                                 Michael van Elst
 Internet: mlelstv@serpens.de
                                 "A potential Snark may lurk in every tree."

From: "matthew green" <mrg@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/56322 CVS commit: src/sys/arch/x86/x86
Date: Mon, 19 Feb 2024 09:22:32 +0000

 Module Name:	src
 Committed By:	mrg
 Date:		Mon Feb 19 09:22:31 UTC 2024

 Modified Files:
 	src/sys/arch/x86/x86: tsc.c

 Log Message:
 make TSC get a quality of -100 on AMD Family 15h and 16h

 this should "fix" PR#56322 and is known as AMD errata
 "778: Processor Core Time Stamp Counters May Experience Drift"


 To generate a diff of this commit:
 cvs rdiff -u -r1.58 -r1.59 src/sys/arch/x86/x86/tsc.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.
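
 To illustrate why a quality of -100 takes the TSC out of the running,
 here is a simplified Python model of timecounter selection. It assumes
 the kernel only auto-selects counters with non-negative quality and
 prefers the highest quality (higher frequency on a tie); it is a sketch
 under that assumption, not the code in tsc.c or kern_tc.c, and the
 helper names are hypothetical. The input is the kern.timecounter.choice
 value reported earlier in this PR.

```python
import re

# Matches entries such as "TSC(q=3000, f=3517182870 Hz)".
ENTRY = re.compile(r"([\w-]+)\(q=(-?\d+),\s*f=(\d+)\s*Hz\)")


def parse_choices(value):
    """Parse a kern.timecounter.choice value into {name: (quality, freq_hz)}."""
    return {m.group(1): (int(m.group(2)), int(m.group(3)))
            for m in ENTRY.finditer(value)}


def pick(choices):
    """Simplified model: best non-negative quality wins, frequency breaks ties."""
    usable = {name: qf for name, qf in choices.items() if qf[0] >= 0}
    return max(usable, key=lambda name: usable[name])


reported = ("TSC(q=3000, f=3517182870 Hz) clockinterrupt(q=0, f=100 Hz) "
            "lapic(q=-100, f=200982000 Hz) i8254(q=100, f=1193182 Hz) "
            "dummy(q=-1000000, f=1000000 Hz)")

choices = parse_choices(reported)
before = pick(choices)                      # TSC at quality 3000

# Model the commit: TSC quality becomes -100 on the affected families.
choices["TSC"] = (-100, choices["TSC"][1])
after = pick(choices)                       # next best usable counter

print(before, "->", after)
```

 Under this model the reporter's machine would fall back to i8254, the
 same counter that was selected by hand as the workaround.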

Copyright © 1994-2024 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.