NetBSD Problem Report #59691
From www@netbsd.org Sun Oct 5 12:23:56 2025
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256
client-signature RSA-PSS (2048 bits) client-digest SHA256)
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id C85401A923C
for <gnats-bugs@gnats.NetBSD.org>; Sun, 5 Oct 2025 12:23:55 +0000 (UTC)
Message-Id: <20251005122354.76F691A923E@mollari.NetBSD.org>
Date: Sun, 5 Oct 2025 12:23:54 +0000 (UTC)
From: campbell+netbsd@mumble.net
Reply-To: campbell+netbsd@mumble.net
To: gnats-bugs@NetBSD.org
Subject: tstohz(9) fails to round up on some inputs
X-Send-Pr-Version: www-1.0
>Number: 59691
>Notify-List: bsiegert@NetBSD.org
>Category: kern
>Synopsis: tstohz(9) fails to round up on some inputs
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: riastradh
>State: pending-pullups
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sun Oct 05 12:25:01 +0000 2025
>Closed-Date:
>Last-Modified: Sat Oct 18 22:50:34 +0000 2025
>Originator: Taylor R Campbell
>Release: current, 11, 10, 9, ...
>Organization:
The NetTSD Roundation
>Environment:
>Description:
tstohz(9) is supposed to return the number of ticks to sleep so
that at least a given duration elapses, like tvtohz(9) but for
struct timespec rather than struct timeval.
However, it works by first rounding the struct timespec
(nanosecond precision) _down_ to a struct timeval (microsecond
precision), and then calling tvtohz(9). So, for example, while
tvtohz(0.000001 sec) gives 2 ticks (one full tick, plus
whatever time is left between now and the next hardclock tick,
which may be time epsilon away), tstohz(0.000000001 sec) gives
0 ticks (tvtohz returns 0 ticks for 0 time).
This is a likely cause of one source of infinite loops in
timers leading to heartbeat panics:
PR kern/59339: heartbeat watchdog fires since 10.99.14
https://gnats.NetBSD.org/59339
PR kern/59465: Recurring kernel panic with -current (10.99.14):
"heart stopped beating"
https://gnats.NetBSD.org/59465
PR kern/59679: Multiple "heart stopped beating" / "softints
stuck for 16 seconds" panics
https://gnats.NetBSD.org/59679
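The timespec-to-timeval step described above can be sketched in
userland C as follows. This is only an illustration of the two rounding
directions, not the kernel source; the helper names
ts_to_tv_round_down and ts_to_tv_round_up are hypothetical.

```c
#include <time.h>	/* struct timespec */
#include <sys/time.h>	/* struct timeval */

/*
 * Hypothetical helper: truncating conversion, matching the behaviour
 * the report describes.  Any remainder below 1000 ns is dropped.
 */
static struct timeval
ts_to_tv_round_down(const struct timespec *ts)
{
	struct timeval tv;

	tv.tv_sec = ts->tv_sec;
	tv.tv_usec = ts->tv_nsec / 1000;
	return tv;
}

/*
 * Hypothetical helper: round up to the next whole microsecond, so a
 * nonzero timespec never becomes a zero timeval.  Assumes tv_nsec is
 * in [0, 999999999].
 */
static struct timeval
ts_to_tv_round_up(const struct timespec *ts)
{
	struct timeval tv;

	tv.tv_sec = ts->tv_sec;
	tv.tv_usec = (ts->tv_nsec + 999) / 1000;
	if (tv.tv_usec >= 1000000) {
		/* Carry a full second out of the microsecond field. */
		tv.tv_sec++;
		tv.tv_usec -= 1000000;
	}
	return tv;
}
```

For a 1 ns input, the truncating helper yields a zero timeval (for
which tvtohz returns 0 ticks), while the rounding-up helper yields
1 us.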
>How-To-Repeat:
tstohz(&(struct timespec){.tv_sec = 0, .tv_nsec = 1})
>Fix:
in progress
>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: kern-bug-people->riastradh
Responsible-Changed-By: riastradh@NetBSD.org
Responsible-Changed-When: Sun, 05 Oct 2025 14:07:45 +0000
Responsible-Changed-Why:
mine, patch in progress (part of PR 59339 patches)
From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/59691 CVS commit: src/tests/kernel
Date: Sun, 5 Oct 2025 18:46:26 +0000
Module Name: src
Committed By: riastradh
Date: Sun Oct 5 18:46:26 UTC 2025
Modified Files:
src/tests/kernel: t_time_arith.c
Log Message:
tvtohz(9): Add some automatic tests.
Preparation for:
PR kern/59691: tstohz(9) fails to round up on some inputs
To generate a diff of this commit:
cvs rdiff -u -r1.3 -r1.4 src/tests/kernel/t_time_arith.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/59691 CVS commit: src
Date: Sun, 5 Oct 2025 18:51:50 +0000
Module Name: src
Committed By: riastradh
Date: Sun Oct 5 18:51:50 UTC 2025
Modified Files:
src/sys/kern: subr_time.c subr_time_arith.c
src/tests/kernel: t_time_arith.c
Log Message:
tstohz(9): Add some automatic tests.
Move this from subr_time.c to subr_time_arith.c to facilitate them.
PR kern/59691: tstohz(9) fails to round up on some inputs
To generate a diff of this commit:
cvs rdiff -u -r1.41 -r1.42 src/sys/kern/subr_time.c
cvs rdiff -u -r1.3 -r1.4 src/sys/kern/subr_time_arith.c
cvs rdiff -u -r1.4 -r1.5 src/tests/kernel/t_time_arith.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: open->needs-pullups
State-Changed-By: riastradh@NetBSD.org
State-Changed-When: Sun, 05 Oct 2025 20:03:14 +0000
State-Changed-Why:
Plus this commit, which didn't get appended here, possibly because it
references too many PRs:
https://mail-index.netbsd.org/source-changes/2025/10/05/msg158450.html
Module Name: src
Committed By: riastradh
Date: Sun Oct 5 18:54:02 UTC 2025
Modified Files:
src/sys/kern: subr_time_arith.c
src/tests/kernel: t_time_arith.c
Log Message:
tstohz(9): Round up, not down.
This is used for timeouts, and it is bad if it returns a timeout that
is too short, particularly if it rounds `wait a little' to `don't
wait at all'.
This still has some substantial rounding errors, e.g. at hz=8191 with
a period of just over 122 085 ns, tstohz(122 084 ns) returns 3
when it should return 2 because it rounds up to tvtohz(123 us) for
which 3 is the correct answer -- but at least it's still rounding up.
I'll leave those as xfail for now.
PR kern/59691: tstohz(9) fails to round up on some inputs
This was likely the underlying cause of various heartbeat panics
users have been seeing -- I hypothesize that for short timeouts that
reschedule themselves, the itimer callout would call itself in a loop
and never return from callout_softclock because of this rounding
down:
PR kern/59339: heartbeat watchdog fires since 10.99.14
PR kern/59465: Recurring kernel panic with -current (10.99.14):
"heart stopped beating"
PR kern/59679: Multiple "heart stopped beating" / "softints stuck for
16 seconds" panics
To generate a diff of this commit:
cvs rdiff -u -r1.4 -r1.5 src/sys/kern/subr_time_arith.c
cvs rdiff -u -r1.5 -r1.6 src/tests/kernel/t_time_arith.c
Note: some large rounding errors remain in tstohz, but at least now the
direction of rounding is correct.
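The hz=8191 case from the log message can be worked through with a
simplified model. The model below is an assumption made for
illustration (whole ticks needed to cover the duration, plus one for
the tick already in progress, and zero ticks for zero time); it is not
the kernel's tvtohz code, and the names model_ticks and
model_ticks_via_usec are hypothetical.

```c
#include <stdint.h>

/*
 * Simplified model, NOT the kernel implementation: number of whole
 * ticks needed to cover `ns' nanoseconds at `hz' ticks per second,
 * plus one tick for the partial tick already in progress.  A zero
 * duration maps to zero ticks.  Overflow of ns * hz is ignored, so
 * this is only valid for short durations.
 */
static uint64_t
model_ticks(uint64_t ns, uint64_t hz)
{
	const uint64_t ns_per_sec = 1000000000;

	if (ns == 0)
		return 0;
	return (ns * hz + ns_per_sec - 1) / ns_per_sec + 1;
}

/*
 * Same model, but first rounding the duration up to whole
 * microseconds, as a timespec-to-timeval conversion would.
 */
static uint64_t
model_ticks_via_usec(uint64_t ns, uint64_t hz)
{
	uint64_t us = (ns + 999) / 1000;

	return model_ticks(us * 1000, hz);
}
```

Under this model, 122 084 ns at hz=8191 fits inside one tick period,
so model_ticks gives 2, whereas rounding up to 123 us first spills
into a second period and model_ticks_via_usec gives 3, matching the
numbers in the log message.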
From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/59691 CVS commit: src/tests/kernel
Date: Mon, 6 Oct 2025 12:05:04 +0000
Module Name: src
Committed By: riastradh
Date: Mon Oct 6 12:05:04 UTC 2025
Modified Files:
src/tests/kernel: t_time_arith.c
Log Message:
tstohz(9): Fix missing digit in three test cases.
Tripped on i386 testbed but not on amd64, and generally on LP32 but
not on LP64, because tvtohz chooses branches differently depending on
LONG_MAX.
PR kern/59691: tstohz(9) fails to round up on some inputs
To generate a diff of this commit:
cvs rdiff -u -r1.6 -r1.7 src/tests/kernel/t_time_arith.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: David Brownlee <abs@absd.org>
To: gnats-bugs@netbsd.org
Cc: riastradh@netbsd.org, netbsd-bugs@netbsd.org, gnats-admin@netbsd.org,
campbell+netbsd@mumble.net
Subject: Re: kern/59691 (tstohz(9) fails to round up on some inputs)
Date: Tue, 7 Oct 2025 18:35:12 +0100
Am running netbsd-11 with the following applied. No recurrence of
kern/59679 yet (All expected, but on the assumption that additional
testing is not unwelcome)
cvs rdiff -kk -u -r1.41 -r1.42 src/sys/kern/subr_time.c
cvs rdiff -kk -u -r1.3 -r1.4 src/sys/kern/subr_time_arith.c
cvs rdiff -kk -u -r1.4 -r1.5 src/sys/kern/subr_time_arith.c
Thanks
State-Changed-From-To: needs-pullups->pending-pullups
State-Changed-By: riastradh@NetBSD.org
State-Changed-When: Sat, 18 Oct 2025 22:50:34 +0000
State-Changed-Why:
pullup-11 #57
TBD: pullup-10 and pullup-9, if applicable
>Unformatted:
Copyright © 1994-2025
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.