NetBSD Problem Report #56002
From mlelstv@arnold.localdomain Sun Feb 21 09:51:21 2021
Return-Path: <mlelstv@arnold.localdomain>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id 8246D1A921F
for <gnats-bugs@gnats.NetBSD.org>; Sun, 21 Feb 2021 09:51:21 +0000 (UTC)
Message-Id: <20210221095119.30C7FBC13A@arnold.localdomain>
Date: Sun, 21 Feb 2021 09:51:19 +0000 (UTC)
From: mlelstv@serpens.de
Reply-To: mlelstv@serpens.de
To: gnats-bugs@NetBSD.org
Subject: lockstat shows huge times on aarch64
X-Send-Pr-Version: 3.95
>Number: 56002
>Category: kern
>Synopsis: lockstat shows huge times on aarch64
>Confidential: no
>Severity: serious
>Priority: low
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sun Feb 21 09:55:00 +0000 2021
>Last-Modified: Mon Feb 22 09:30:02 +0000 2021
>Originator: mlelstv@serpens.de
>Release: NetBSD 9.99.80
>Organization:
>Environment:
System: NetBSD arnold 9.99.80 NetBSD 9.99.80 (EGGHED64) #178: Sun Feb 21 07:34:53 UTC 2021 mlelstv@slowpoke:/home/netbsd-current/obj.evbarm64-el/scratch/netbsd-current/src/sys/arch/evbarm/compile/EGGHED64 evbarm
Architecture: aarch64
Machine: evbarm
>Description:
lockstat may report bogus lock durations like:
Total% Count Time/ms Lock Caller
------ ------- --------- ---------------------- ------------------------------
100.00 21134 36893495941.20 ffff0001ffac8100 <all>
0.00 11095 5162.87 ffff0001ffac8100 sopoll+28
The Time/ms value is calculated from a time value of ULONG_MAX, but
which is only the result of a large uint64_t value that is multiplied by
a double and exceeds the uint64_t range. The value is clamped to ULONG_MAX.
The large value comes directly from the lockstat driver. It summarizes e.g.
lock spin times with
LOCKSTART_START_TIMER(flag, var)
spin
LOCKSTART_STOP_TIMER(flag, var)
At start the cpu_counter() value is substracted from var.
At stop the cpu_counter() value is added to var.
This works fine if variable and cpu_counter() use the full integer range,
also when the integer type overflows.
For aarch64 however we use:
#define cpu_counter() cpu_counter32()
but still compute with (zero-extended) uint64_t values.
This causes a discontinuity when the 32bit value overflows and lockstat
accumulates then negative values.
>How-To-Repeat:
Use lockstat on aarch64.
>Fix:
Find a 64bit cpu_counter.
Emulate a 64bit cpu_counter by tracking overflows.
Multiply the cpu_counter() (and cpu_frequency()) values by 2^32
>Audit-Trail:
From: "Ryo Shimizu" <ryo@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/56002 CVS commit: src/sys/arch/aarch64/include
Date: Mon, 22 Feb 2021 09:29:39 +0000
Module Name: src
Committed By: ryo
Date: Mon Feb 22 09:29:38 UTC 2021
Modified Files:
src/sys/arch/aarch64/include: cpu_counter.h
Log Message:
PR/56002: aarch64 has a true 64bit CPU cycle counter, we will use it.
This fix solves PR/56002 on aarch64, but this problems can occur on
all other architectures where cpu_counter() is 32bit.
To generate a diff of this commit:
cvs rdiff -u -r1.1 -r1.2 src/sys/arch/aarch64/include/cpu_counter.h
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
>Unformatted:
(Contact us)
$NetBSD: query-full-pr,v 1.46 2020/01/03 16:35:01 leot Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2020
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.