NetBSD Problem Report #56002

From mlelstv@arnold.localdomain  Sun Feb 21 09:51:21 2021
Return-Path: <mlelstv@arnold.localdomain>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 8246D1A921F
	for <gnats-bugs@gnats.NetBSD.org>; Sun, 21 Feb 2021 09:51:21 +0000 (UTC)
Message-Id: <20210221095119.30C7FBC13A@arnold.localdomain>
Date: Sun, 21 Feb 2021 09:51:19 +0000 (UTC)
From: mlelstv@serpens.de
Reply-To: mlelstv@serpens.de
To: gnats-bugs@NetBSD.org
Subject: lockstat shows huge times on aarch64
X-Send-Pr-Version: 3.95

>Number:         56002
>Category:       kern
>Synopsis:       lockstat shows huge times on aarch64
>Confidential:   no
>Severity:       serious
>Priority:       low
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Feb 21 09:55:00 +0000 2021
>Last-Modified:  Mon Feb 22 09:30:02 +0000 2021
>Originator:     mlelstv@serpens.de
>Release:        NetBSD 9.99.80
>Organization:

>Environment:


System: NetBSD arnold 9.99.80 NetBSD 9.99.80 (EGGHED64) #178: Sun Feb 21 07:34:53 UTC 2021 mlelstv@slowpoke:/home/netbsd-current/obj.evbarm64-el/scratch/netbsd-current/src/sys/arch/evbarm/compile/EGGHED64 evbarm
Architecture: aarch64
Machine: evbarm
>Description:

lockstat may report bogus lock durations like:

Total%  Count   Time/ms          Lock                       Caller
------ ------- --------- ---------------------- ------------------------------
100.00   21134 36893495941.20 ffff0001ffac8100       <all>
  0.00   11095   5162.87 ffff0001ffac8100       sopoll+28

The Time/ms value is calculated from a time value of ULONG_MAX, but
which is only the result of a large uint64_t value that is multiplied by
a double and exceeds the uint64_t range. The value is clamped to ULONG_MAX.

The large value comes directly from the lockstat driver. It summarizes e.g.
lock spin times with

LOCKSTART_START_TIMER(flag, var)
spin
LOCKSTART_STOP_TIMER(flag, var)

At start the cpu_counter() value is substracted from var.
At stop the cpu_counter() value is added to var.

This works fine if variable and cpu_counter() use the full integer range,
also when the integer type overflows.

For aarch64 however we use:

#define cpu_counter() cpu_counter32()

but still compute with (zero-extended) uint64_t values.

This causes a discontinuity when the 32bit value overflows and lockstat
accumulates then negative values.

>How-To-Repeat:

Use lockstat on aarch64.

>Fix:

Find a 64bit cpu_counter.
Emulate a 64bit cpu_counter by tracking overflows.
Multiply the cpu_counter() (and cpu_frequency()) values by 2^32

>Audit-Trail:
From: "Ryo Shimizu" <ryo@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/56002 CVS commit: src/sys/arch/aarch64/include
Date: Mon, 22 Feb 2021 09:29:39 +0000

 Module Name:	src
 Committed By:	ryo
 Date:		Mon Feb 22 09:29:38 UTC 2021

 Modified Files:
 	src/sys/arch/aarch64/include: cpu_counter.h

 Log Message:
 PR/56002: aarch64 has a true 64bit CPU cycle counter, we will use it.

 This fix solves PR/56002 on aarch64, but this problems can occur on
 all other architectures where cpu_counter() is 32bit.


 To generate a diff of this commit:
 cvs rdiff -u -r1.1 -r1.2 src/sys/arch/aarch64/include/cpu_counter.h

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

>Unformatted:

NetBSD Home
NetBSD PR Database Search

(Contact us) $NetBSD: query-full-pr,v 1.46 2020/01/03 16:35:01 leot Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2020 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.