NetBSD Problem Report #59339
From www@netbsd.org Mon Apr 21 19:39:08 2025
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256
client-signature RSA-PSS (2048 bits) client-digest SHA256)
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id C5C861A9239
for <gnats-bugs@gnats.NetBSD.org>; Mon, 21 Apr 2025 19:39:08 +0000 (UTC)
Message-Id: <20250421193907.9F5101A923D@mollari.NetBSD.org>
Date: Mon, 21 Apr 2025 19:39:07 +0000 (UTC)
From: prlw1@cam.ac.uk
Reply-To: prlw1@cam.ac.uk
To: gnats-bugs@NetBSD.org
Subject: heartbeat watchdog fires since 10.99.14
X-Send-Pr-Version: www-1.0
>Number: 59339
>Notify-List: riastradh@NetBSD.org, wiz@NetBSD.org, bsiegert@NetBSD.org
>Category: kern
>Synopsis: heartbeat watchdog fires since 10.99.14
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: kern-bug-people
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Mon Apr 21 19:40:00 +0000 2025
>Closed-Date: Mon Oct 06 13:21:41 +0000 2025
>Last-Modified: Mon Oct 06 16:15:02 +0000 2025
>Originator: Patrick Welche
>Release: NetBSD-10.99.14/amd64
>Organization:
>Environment:
src of 8:05 UTC 17 April 2025
>Description:
Possibly related to other hangs in the thread
https://mail-index.netbsd.org/current-users/2025/04/17/msg046436.html
A formerly stable laptop has had two heartbeat panics
System panicked: cpu0: softints stuck for 16 seconds
crash> bt
end() at 0
kern_reboot() at kern_reboot+0x93
vpanic() at vpanic+0x16b
panic() at vprintf
heartbeat() at heartbeat+0x1f2
hardclock() at hardclock+0x9c
Xresume_lapic_ltimer() at Xresume_lapic_ltimer+0x1e
--- interrupt ---
mutex_spin_exit() at mutex_spin_exit+0x5a
callout_softclock() at callout_softclock+0xad
softint_dispatch() at softint_dispatch+0x8f
crash> ps
PID LID S CPU FLAGS STRUCT LWP * NAME WAIT
2917 > 2917 7 0 8060000 ffff8052e4a14000 tar
0 > 5 7 0 200 ffff8055abee1c00 softclk/0
System panicked: cpu1[27 softclk/1]: heart stopped beating
crash> bt
end() at 0
kern_reboot() at kern_reboot+0x93
vpanic() at vpanic+0x16b
panic() at vprintf
defibrillator() at defibrillator+0x6b
ipi_msg_cpu_handler() at ipi_msg_cpu_handler+0x50
ipi_cpu_handler() at ipi_cpu_handler+0x73
x86_ipi_handler() at x86_ipi_handler+0x6b
Xresume_lapic_ipi() at Xresume_lapic_ipi+0x18
--- interrupt ---
getnanotime() at getnanotime+0x53
tshzto() at tshzto+0x1b
itimer_callout() at itimer_callout+0xb1
callout_softclock() at callout_softclock+0xad
softint_dispatch() at softint_dispatch+0x8f
crash: _kvm_kvatop(ffff9704a471b0b8)
crash: kvm_read(0xffff9704a471b0b8, 8): invalid translation (invalid PTE)
crash> ps
PID LID S CPU FLAGS STRUCT LWP * NAME WAIT
1266 > 1266 7 0 8020100 ffff81f02e1a7800 cron
992 > 992 7 1 8020000 ffff81f026788c00 X
857 > 857 7 2 8020100 ffff81f027510400 xdm
0 > 27 7 1 200 ffff81f020f56400 softclk/1
I can't think of any similarities between the two situations.
>How-To-Repeat:
Not obviously reproducible, e.g., the untar during which the first
hang happened succeeded on reboot. firefox was running the first, not
the second, in fact not much was going on in the second.
>Fix:
>Release-Note:
>Audit-Trail:
From: matthew green <mrg@eterna23.net>
To: gnats-bugs@netbsd.org, prlw1@cam.ac.uk
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org
Subject: re: kern/59339: heartbeat watchdog fires since 10.99.14
Date: Tue, 22 Apr 2025 07:54:55 +1000
> System panicked: cpu0: softints stuck for 16 seconds
this means cpu0 is locked up, and some other cpu detected it and
crashed. the stack below is not the interesting cpu, but you
found the relevant LWPs to inspect:
> crash> bt
> end() at 0
> kern_reboot() at kern_reboot+0x93
> vpanic() at vpanic+0x16b
> panic() at vprintf
> heartbeat() at heartbeat+0x1f2
> hardclock() at hardclock+0x9c
> Xresume_lapic_ltimer() at Xresume_lapic_ltimer+0x1e
> --- interrupt ---
> mutex_spin_exit() at mutex_spin_exit+0x5a
> callout_softclock() at callout_softclock+0xad
> softint_dispatch() at softint_dispatch+0x8f
> crash> ps
> PID LID S CPU FLAGS STRUCT LWP * NAME WAIT
> 2917 > 2917 7 0 8060000 ffff8052e4a14000 tar
> 0 > 5 7 0 200 ffff8055abee1c00 softclk/0
can you do "bt/a ffff8052e4a14000" and "bt/a ffff8055abee1c00"?
or with the other crash, any process on the cpu reported (always
cpu0, i think?) with the ">" state like above (ie, running.)
i expect the above will show that softclk/0 has fast switched
the tar process (ie, softclk/0 bt may end up being the same as
tar with some additional frames.) normally, there should only
be one active LWP per cpu, but fast softints do.
thanks.
.mrg.
From: Patrick Welche <prlw1@welche.eu>
To: gnats-bugs@netbsd.org, matthew green <mrg@eterna23.net>
Cc:
Subject: Re: kern/59339: heartbeat watchdog fires since 10.99.14
Date: Tue, 22 Apr 2025 10:29:26 +0100
On Mon, Apr 21, 2025 at 10:00:02PM +0000, matthew green via gnats wrote:
> The following reply was made to PR kern/59339; it has been noted by GNATS.
>
> From: matthew green <mrg@eterna23.net>
> To: gnats-bugs@netbsd.org, prlw1@cam.ac.uk
> Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
> netbsd-bugs@netbsd.org
> Subject: re: kern/59339: heartbeat watchdog fires since 10.99.14
> Date: Tue, 22 Apr 2025 07:54:55 +1000
>
> > System panicked: cpu0: softints stuck for 16 seconds
>
> this means cpu0 is locked up, and some other cpu detected it and
> crashed. the stack below is not the interesting cpu, but you
> found the relevant LWPs to inspect:
>
> > crash> bt
> > end() at 0
> > kern_reboot() at kern_reboot+0x93
> > vpanic() at vpanic+0x16b
> > panic() at vprintf
> > heartbeat() at heartbeat+0x1f2
> > hardclock() at hardclock+0x9c
> > Xresume_lapic_ltimer() at Xresume_lapic_ltimer+0x1e
> > --- interrupt ---
> > mutex_spin_exit() at mutex_spin_exit+0x5a
> > callout_softclock() at callout_softclock+0xad
> > softint_dispatch() at softint_dispatch+0x8f
> > crash> ps
> > PID LID S CPU FLAGS STRUCT LWP * NAME WAIT
> > 2917 > 2917 7 0 8060000 ffff8052e4a14000 tar
> > 0 > 5 7 0 200 ffff8055abee1c00 softclk/0
>
> can you do "bt/a ffff8052e4a14000" and "bt/a ffff8055abee1c00"?
tar:
crash> bt/a ffff8052e4a14000
crash: _kvm_kvatop(7f7fffffe068)
crash: kvm_read(0x7f7fffffe068, 8): invalid translation (invalid level 4 PDE)
trace: pid 2917 lid 2917
softclk:
crash> bt/a ffff8055abee1c00
trace: pid 0 lid 5 at 0xffffc404a4608f90
callout_softclock() at callout_softclock+0xad
softint_dispatch() at softint_dispatch+0x8f
> or with the other crash, any process on the cpu reported (always
> cpu0, i think?) with the ">" state like above (ie, running.)
cron:
crash> bt/a ffff81f02e1a7800
trace: pid 1266 lid 1266 at 0xffff9704b6f54e90
sys___nanosleep50() at sys___nanosleep50+0x46
syscall() at syscall+0x95
--- syscall (number 430) ---
syscall+0x95:
X:
crash> bt/a ffff81f026788c00
trace: pid 992 lid 992 at 0xffff9704b65847a0
uvm_pglistalloc() at uvm_pglistalloc+0x9de
_bus_dmamem_alloc_range.constprop.0() at _bus_dmamem_alloc_range.constprop.0+0x6
4
bus_dmamem_alloc() at bus_dmamem_alloc+0x78
i915_gem_object_get_pages_internal() at i915_gem_object_get_pages_internal+0xad
____i915_gem_object_get_pages() at ____i915_gem_object_get_pages+0x1f
__i915_gem_object_get_pages() at __i915_gem_object_get_pages+0x45
pool_active() at pool_active+0x68
i915_active_acquire() at i915_active_acquire+0x7d
intel_engine_get_pool() at intel_engine_get_pool+0x223
i915_gem_do_execbuffer() at i915_gem_do_execbuffer+0xc71
i915_gem_execbuffer2_ioctl() at i915_gem_execbuffer2_ioctl+0x1be
drm_ioctl() at drm_ioctl+0x241
drm_ioctl_shim() at drm_ioctl_shim+0x30
sys_ioctl() at sys_ioctl+0x4ae
syscall() at syscall+0x95
--- syscall (number 54) ---
syscall+0x95:
xdm:
crash> bt/a ffff81f027510400
trace: pid 857 lid 857 at 0x8
_KERNEL_OPT_DDB_HISTORY_SIZE() at _KERNEL_OPT_DDB_HISTORY_SIZE+0x46
crash: _kvm_kvatop(10)
crash: kvm_read(0x10, 8): invalid translation (invalid level 4 PDE)
softclk:
crash> bt/a ffff81f020f56400
trace: pid 0 lid 27 at 0x680692b5
__LARGE_PAGE_SIZE() at 6aa46bf
crash: _kvm_kvatop(680692bd)
crash: kvm_read(0x680692bd, 8): invalid translation (invalid level 4 PDE)
From: Patrick Welche <prlw1@welche.eu>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/59339: heartbeat watchdog fires since 10.99.14
Date: Thu, 8 May 2025 07:40:56 +0100
See another trace from wiz in
https://mail-index.netbsd.org/current-users/2025/05/06/msg046492.html
I just had another one with 10.99.14 source of May 6 16:43 UTC
essentially while booting. Hadn't reached the xdm prompt yet.
crash> bt
end() at 0
kern_reboot() at kern_reboot+0x93
vpanic() at vpanic+0x16b
panic() at vprintf
defibrillator() at defibrillator+0x6b
ipi_msg_cpu_handler() at ipi_msg_cpu_handler+0x50
ipi_cpu_handler() at ipi_cpu_handler+0x73
x86_ipi_handler() at x86_ipi_handler+0x6b
Xresume_lapic_ipi() at Xresume_lapic_ipi+0x18
--- interrupt ---
itimer_transition() at itimer_transition+0x1f8
itimer_callout() at itimer_callout+0x5e
callout_softclock() at callout_softclock+0xad
softint_dispatch() at softint_dispatch+0x8f
crash> ps | grep '>' | grep -v idle/
828 > 828 7 2 8020000 ffff86e2fa154400 X
0 > 98 7 2 200 ffff86e2f47d0c00 softclk/2
crash> show proc /a ffff86e2fa154400
lwp_t ffff86e2fa154400
X: pid 828 proc ffff86e2fb22e100 vmspace/map ffff86e2fb206800 flags 4200
> lwp 828 ffff86e2fa154400 pcb ffff9b84b6328000
stat 7 flags 8020000 cpu 2 pri 43 ref 0
From: Taylor R Campbell <riastradh@NetBSD.org>
To: Patrick Welche <prlw1@welche.eu>
Cc: gnats-bugs@NetBSD.org, netbsd-bugs@NetBSD.org, wiz@NetBSD.org
Subject: Re: kern/59339: heartbeat watchdog fires since 10.99.14
Date: Fri, 9 May 2025 00:03:52 +0000
This is a multi-part message in MIME format.
--=_3KgWU08dsAmnyfg6ET7fg3RAcRLpOHQE
If you catch a core dump with a stack trace in itimer_transition, can
you print all the locals and parameters in gdb?
I believe this all started happening shortly after
https://mail-index.NetBSD.org/source-changes/2025/04/01/msg156191.html
and all the stack traces have been in callout processing, including
one in the itimer_transition subroutine which I rewrote in that
commit, so it's likely to be related.
My guess is that there's some itimer parameter, or bad timing, which
causes itimer_transition to do something foolish -- that or the
conversion from the next timeout as a timespec to a duration in ticks,
in tshztoup/tshzto, is going wrong.
Can you please try the attached patch to check for some possible types
of foolishness earlier? If this confirms my hypothesis, it will turn
a silent lockup into a panic.
--=_3KgWU08dsAmnyfg6ET7fg3RAcRLpOHQE
Content-Type: text/plain; charset="ISO-8859-1"; name="pr59339-itimercalloutassert"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment; filename="pr59339-itimercalloutassert.patch"
# HG changeset patch
# User Taylor R Campbell <riastradh@NetBSD.org>
# Date 1746748216 0
# Thu May 08 23:50:16 2025 +0000
# Branch trunk
# Node ID a18d30166c2594b955c89dab60a2e459a49d39c0
# Parent 5a05c6d6e6c147293aba92413649d75270baf356
# EXP-Topic riastradh-pr59339-heartbeat
WIP: Assert that itimer is scheduled for future.
PR kern/59339: heartbeat watchdog fires since 10.99.14
diff -r 5a05c6d6e6c1 -r a18d30166c25 sys/kern/kern_time.c
--- a/sys/kern/kern_time.c Tue May 06 23:18:37 2025 +0000
+++ b/sys/kern/kern_time.c Thu May 08 23:50:16 2025 +0000
@@ -821,6 +821,7 @@ itimer_decr(struct itimer *it, int nsec)
static void
itimer_arm_real(struct itimer * const it)
{
+ int ticks;
=20
KASSERT(!it->it_dying);
KASSERT(!CLOCK_VIRTUAL_P(it->it_clockid));
@@ -830,10 +831,15 @@ itimer_arm_real(struct itimer * const it
* Don't need to check tshzto() return value, here.
* callout_schedule() does it for us.
*/
- callout_schedule(&it->it_ch,
- (it->it_clockid =3D=3D CLOCK_MONOTONIC
- ? tshztoup(&it->it_time.it_value)
- : tshzto(&it->it_time.it_value)));
+ ticks =3D (it->it_clockid =3D=3D CLOCK_MONOTONIC
+ ? tshztoup(&it->it_time.it_value)
+ : tshzto(&it->it_time.it_value));
+ KASSERTMSG(ticks > 0, "[%u] it->it_time.it_value=3D%lld.%09ld ticks=3D%d",
+ it->it_clockid,
+ (long long)it->it_time.it_value.tv_sec,
+ (long)it->it_time.it_value.tv_nsec,
+ ticks);
+ callout_schedule(&it->it_ch, ticks);
}
=20
/*
@@ -872,6 +878,18 @@ itimer_callout(void *arg)
* now, compute the next itimer value and count overruns.
*/
itimer_transition(&it->it_time, &now, &next, &overruns);
+ KASSERTMSG(timespeccmp(&now, &next, <),
+ "[clock %u]"
+ " it->it_time.it_value=3D%lld.%09ld"
+ " it->it_time.it_interval=3D%lld.%09ld"
+ " now=3D%lld.%09ld next=3D%lld.%09ld",
+ it->it_clockid,
+ (long long)it->it_time.it_value.tv_sec,
+ (long)it->it_time.it_value.tv_nsec,
+ (long long)it->it_time.it_interval.tv_sec,
+ (long)it->it_time.it_interval.tv_nsec,
+ (long long)now.tv_sec, (long)now.tv_nsec,
+ (long long)next.tv_sec, (long)next.tv_nsec);
it->it_time.it_value =3D next;
it->it_overruns +=3D overruns;
=20
--=_3KgWU08dsAmnyfg6ET7fg3RAcRLpOHQE--
State-Changed-From-To: open->feedback
State-Changed-By: riastradh@NetBSD.org
State-Changed-When: Fri, 09 May 2025 00:06:02 +0000
State-Changed-Why:
feedback requested
From: Thomas Klausner <wiz@NetBSD.org>
To: Taylor R Campbell <riastradh@NetBSD.org>
Cc: Patrick Welche <prlw1@welche.eu>, gnats-bugs@NetBSD.org
Subject: Re: kern/59339: heartbeat watchdog fires since 10.99.14
Date: Fri, 9 May 2025 10:55:03 +0200
On Fri, May 09, 2025 at 12:03:52AM +0000, Taylor R Campbell wrote:
> If you catch a core dump with a stack trace in itimer_transition, can
> you print all the locals and parameters in gdb?
>
> I believe this all started happening shortly after
> https://mail-index.NetBSD.org/source-changes/2025/04/01/msg156191.html
> and all the stack traces have been in callout processing, including
> one in the itimer_transition subroutine which I rewrote in that
> commit, so it's likely to be related.
>
> My guess is that there's some itimer parameter, or bad timing, which
> causes itimer_transition to do something foolish -- that or the
> conversion from the next timeout as a timespec to a duration in ticks,
> in tshztoup/tshzto, is going wrong.
>
> Can you please try the attached patch to check for some possible types
> of foolishness earlier? If this confirms my hypothesis, it will turn
> a silent lockup into a panic.
A kernel with this patch paniced twice in 10 minutes. I didn't even
have time to start mutt and send an email after the first panic.
crash says
System panicked: kernel diagnostic assertion "ticks > 0" failed: file "/usr/src/sys/kern/kern_time.c", line 837 [0] it->it_time.it_value=1746779983.758601200 ticks=0
and
System panicked: kernel diagnostic assertion "ticks > 0" failed: file "/usr/src/sys/kern/kern_time.c", line 837 [0] it->it_time.it_value=1746780327.140798025 ticks=0
Let me know what further information you need, or if you want me to
upload the crash dumps & debug kernels somewhere.
Thomas
From: Taylor R Campbell <riastradh@NetBSD.org>
To: Thomas Klausner <wiz@NetBSD.org>
Cc: Patrick Welche <prlw1@welche.eu>, gnats-bugs@NetBSD.org
Subject: Re: kern/59339: heartbeat watchdog fires since 10.99.14
Date: Fri, 9 May 2025 11:55:27 +0000
> Date: Fri, 9 May 2025 10:55:03 +0200
> From: Thomas Klausner <wiz@NetBSD.org>
>=20
> A kernel with this patch paniced twice in 10 minutes. I didn't even
> have time to start mutt and send an email after the first panic.
>=20
> crash says
>=20
> System panicked: kernel diagnostic assertion "ticks > 0" failed: file "/u=
sr/src/sys/kern/kern_time.c", line 837 [0] it->it_time.it_value=3D174677998=
3.758601200 ticks=3D0
>=20
> and
>=20
> System panicked: kernel diagnostic assertion "ticks > 0" failed: file "/u=
sr/src/sys/kern/kern_time.c", line 837 [0] it->it_time.it_value=3D174678032=
7.140798025 ticks=3D0
>=20
> Let me know what further information you need, or if you want me to
> upload the crash dumps & debug kernels somewhere.
Can you get a stack trace from those? Meanwhile I will get an updated
patch either for more diagnostics or a candidate fix.
From: Thomas Klausner <wiz@NetBSD.org>
To: Taylor R Campbell <riastradh@NetBSD.org>
Cc: NetBSD bugtracking <gnats-bugs@NetBSD.org>
Subject: Re: kern/59339: heartbeat watchdog fires since 10.99.14
Date: Fri, 9 May 2025 14:59:02 +0200
On Fri, May 09, 2025 at 11:55:27AM +0000, Taylor R Campbell wrote:
> > Date: Fri, 9 May 2025 10:55:03 +0200
> > From: Thomas Klausner <wiz@NetBSD.org>
> >
> > A kernel with this patch paniced twice in 10 minutes. I didn't even
> > have time to start mutt and send an email after the first panic.
> >
> > crash says
> >
> > System panicked: kernel diagnostic assertion "ticks > 0" failed: file "/usr/src/sys/kern/kern_time.c", line 837 [0] it->it_time.it_value=1746779983.758601200 ticks=0
> >
> > and
> >
> > System panicked: kernel diagnostic assertion "ticks > 0" failed: file "/usr/src/sys/kern/kern_time.c", line 837 [0] it->it_time.it_value=1746780327.140798025 ticks=0
> >
> > Let me know what further information you need, or if you want me to
> > upload the crash dumps & debug kernels somewhere.
>
> Can you get a stack trace from those? Meanwhile I will get an updated
> patch either for more diagnostics or a candidate fix.
Seems to have been the same problem both times:
(gdb) bt
#0 0xffffffff80239b85 in cpu_reboot (howto=howto@entry=260, bootstr=bootstr@entry=0x0) at /usr/src/sys/arch/amd64/amd64/machdep.c:709
#1 0xffffffff80de3ab5 in kern_reboot (howto=howto@entry=260, bootstr=bootstr@entry=0x0) at /usr/src/sys/kern/kern_reboot.c:91
#2 0xffffffff80e29873 in vpanic (fmt=0xffffffff814f1ea0 "kernel %sassertion \"%s\" failed: file \"%s\", line %d [%u] it->it_time.it_value=%lld.%09ld ticks=%d", ap=ap@entry=0xffff8ba48411de08)
at /usr/src/sys/kern/subr_prf.c:288
#3 0xffffffff80ff803c in kern_assert (fmt=fmt@entry=0xffffffff814f1ea0 "kernel %sassertion \"%s\" failed: file \"%s\", line %d [%u] it->it_time.it_value=%lld.%09ld ticks=%d")
at /usr/src/sys/lib/libkern/kern_assert.c:51
#4 0xffffffff80dff166 in itimer_arm_real (it=0xffff89c5007f1e80) at /usr/src/sys/kern/kern_time.c:837
#5 0xffffffff80e007eb in itimer_settime (it=it@entry=0xffff89c5007f1e80) at /usr/src/sys/kern/kern_time.c:946
#6 0xffffffff80e019de in dotimer_settime (timerid=4, value=value@entry=0xffff8ba48411df40, ovalue=ovalue@entry=0x0, flags=1, p=<optimized out>) at /usr/src/sys/kern/kern_time.c:1463
#7 0xffffffff80e01be0 in sys___timer_settime50 (l=0xffff89c4872ebc00, uap=0xffff8ba48411e000, retval=<optimized out>) at /usr/src/sys/kern/kern_time.c:1391
#8 0xffffffff805b7cc1 in sy_call (rval=0xffff8ba48411dfb0, uap=0xffff8ba48411e000, l=0xffff89c4872ebc00, sy=0xffffffff8188a550 <sysent+10704>) at /usr/src/sys/sys/syscallvar.h:65
#9 sy_invoke (code=446, rval=0xffff8ba48411dfb0, uap=0xffff8ba48411e000, l=0xffff89c4872ebc00, sy=0xffffffff8188a550 <sysent+10704>) at /usr/src/sys/sys/syscallvar.h:94
#10 syscall (frame=0xffff8ba48411e000) at /usr/src/sys/arch/x86/x86/syscall.c:137
#11 0xffffffff8021025d in handle_syscall ()
(gdb) fr 4
#4 0xffffffff80dff166 in itimer_arm_real (it=0xffffbb4807e73bc0) at /disk/storage-202410/archive/foreign/src/sys/kern/kern_time.c:837
837 KASSERTMSG(ticks > 0, "[%u] it->it_time.it_value=%lld.%09ld ticks=%d",
(gdb) p *it
$1 = {{it_real = {it_ch = {_c_store = {0xffffbb485b69c1c0, 0xffffffff819837e0 <cache_stat_callout>, 0xffffffff80dff1ff <itimer_callout>, 0xffffbb4807e73bc0, 0xffffffff818b7840 <callout_cpu0>, 0x10000007530, 0x11deeba1,
0x0, 0x0, 0x0}}, it_rtchgq = {le_next = 0x0, le_prev = 0x0}}, it_virtual = {it_vlist = 0xffffbb485b69c1c0, it_list = {le_next = 0xffffffff819837e0 <cache_stat_callout>,
le_prev = 0xffffffff80dff1ff <itimer_callout>}, it_active = 192}}, it_ops = 0xffffffff8138a470 <ptimer_itimer_ops>, it_time = {it_interval = {tv_sec = 0, tv_nsec = 0}, it_value = {tv_sec = 1746780327,
tv_nsec = 140798025}}, it_clockid = 0, it_overruns = 0, it_dying = false}
(gdb) p ticks
$2 = 0
(gdb) p it->it_clockid
$3 = 0
# so not CLOCK_MONOTONIC, which is 3
(gdb) p it->it_time.it_value
$4 = {tv_sec = 1746780327, tv_nsec = 140798025}
Anything else I can provide?
Thomas
From: Taylor R Campbell <riastradh@NetBSD.org>
To: Thomas Klausner <wiz@NetBSD.org>
Cc: gnats-bugs@NetBSD.org, netbsd-bugs@NetBSD.org,
Patrick Welche <prlw1@welche.eu>
Subject: Re: kern/59339: heartbeat watchdog fires since 10.99.14
Date: Fri, 9 May 2025 13:32:04 +0000
This is a multi-part message in MIME format.
--=_LIipI4/8TCpaNLGJjkojmsUX7yA77xBy
Content-Transfer-Encoding: quoted-printable
> Date: Fri, 9 May 2025 14:59:02 +0200
> From: Thomas Klausner <wiz@NetBSD.org>
>=20
> Seems to have been the same problem both times:
>=20
> (gdb) bt
> #4 0xffffffff80dff166 in itimer_arm_real (it=3D0xffff89c5007f1e80) at /u=
sr/src/sys/kern/kern_time.c:837
> #5 0xffffffff80e007eb in itimer_settime (it=3Dit@entry=3D0xffff89c5007f1=
e80) at /usr/src/sys/kern/kern_time.c:946
OK, that patch was a dud -- the assertion may not pass in that path,
but it should pass via itimer_callout. Try the attached one instead?
--=_LIipI4/8TCpaNLGJjkojmsUX7yA77xBy
Content-Type: text/plain; charset="ISO-8859-1"; name="pr59339-itimercalloutassert-v2"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment; filename="pr59339-itimercalloutassert-v2.patch"
# HG changeset patch
# User Taylor R Campbell <riastradh@NetBSD.org>
# Date 1746796090 0
# Fri May 09 13:08:10 2025 +0000
# Branch trunk
# Node ID 05a98ecf5eda0704c8b94356f22e97870c808378
# Parent 55a6ded9e19ed0908fbb94d13fa69cda6d0b4571
# EXP-Topic riastradh-pr59339-heartbeat
WIP: Rearming a timer should never go backwards. Assert it.
Handle overflow case gracefully while here (not likely relevant to
the symptoms we've seen, though).
PR kern/59339: heartbeat watchdog fires since 10.99.14
diff -r 55a6ded9e19e -r 05a98ecf5eda sys/kern/kern_time.c
--- a/sys/kern/kern_time.c Thu May 08 20:51:40 2025 +0000
+++ b/sys/kern/kern_time.c Fri May 09 13:08:10 2025 +0000
@@ -837,6 +837,55 @@ itimer_arm_real(struct itimer * const it
}
=20
/*
+ * itimer_rearm_real:
+ *
+ * Re-arm a non-virtual timer, given the current clock readout.
+ */
+static void
+itimer_rearm_real(struct itimer * const it, const struct timespec * const =
now)
+{
+ const struct timespec * const interval =3D &it->it_time.it_interval;
+ const struct timespec * const next =3D &it->it_time.it_value;
+ struct timespec delta;
+ int ticks;
+
+ KASSERT(!it->it_dying);
+ KASSERT(!CLOCK_VIRTUAL_P(it->it_clockid));
+ KASSERT(!callout_pending(&it->it_ch));
+
+ if (__predict_true(timespecisset(next))) {
+ KASSERTMSG(timespeccmp(now, next, <),
+ "[clock %u]"
+ " now=3D%lld.%09ld interval=3D%lld.%09ld then=3D%lld.%09ld",
+ it->it_clockid,
+ (long long)now->tv_sec, (long)now->tv_nsec,
+ (long long)interval->tv_sec, (long)interval->tv_nsec,
+ (long long)next->tv_sec, (long)next->tv_nsec);
+ timespecsub(next, now, &delta);
+ } else {
+ /*
+ * If the arithmetic overflowed, just take the interval
+ * as the time to wait. This is unlikely to happen
+ * unless you are messing with your clock to set it
+ * ahead by hundreds of years.
+ */
+ delta =3D it->it_time.it_interval; /* overflow */
+ }
+ ticks =3D tstohz(&delta);
+ KASSERTMSG(ticks > 0,
+ "[clock %u]"
+ " now=3D%lld.%09ld interval=3D%lld.%09ld then=3D%lld.%09ld"
+ " delta=3D%lld.%09ld ticks=3D%d",
+ it->it_clockid,
+ (long long)now->tv_sec, (long)now->tv_nsec,
+ (long long)interval->tv_sec, (long)interval->tv_nsec,
+ (long long)next->tv_sec, (long)next->tv_nsec,
+ (long long)delta.tv_sec, (long)delta.tv_nsec,
+ ticks);
+ callout_schedule(&it->it_ch, ticks);
+}
+
+/*
* itimer_callout:
*
* Callout to expire a non-virtual timer. Queue it up for processing,
@@ -872,6 +921,18 @@ itimer_callout(void *arg)
* now, compute the next itimer value and count overruns.
*/
itimer_transition(&it->it_time, &now, &next, &overruns);
+ KASSERTMSG(timespeccmp(&now, &next, <),
+ "[clock %u]"
+ " it->it_time.it_value=3D%lld.%09ld"
+ " it->it_time.it_interval=3D%lld.%09ld"
+ " now=3D%lld.%09ld next=3D%lld.%09ld",
+ it->it_clockid,
+ (long long)it->it_time.it_value.tv_sec,
+ (long)it->it_time.it_value.tv_nsec,
+ (long long)it->it_time.it_interval.tv_sec,
+ (long)it->it_time.it_interval.tv_nsec,
+ (long long)now.tv_sec, (long)now.tv_nsec,
+ (long long)next.tv_sec, (long)next.tv_nsec);
it->it_time.it_value =3D next;
it->it_overruns +=3D overruns;
=20
@@ -879,7 +940,7 @@ itimer_callout(void *arg)
* Reset the callout, if it's not going away.
*/
if (!it->it_dying)
- itimer_arm_real(it);
+ itimer_rearm_real(it, &now);
itimer_unlock();
}
=20
--=_LIipI4/8TCpaNLGJjkojmsUX7yA77xBy--
From: Taylor R Campbell <riastradh@NetBSD.org>
To: Thomas Klausner <wiz@NetBSD.org>, Patrick Welche <prlw1@welche.eu>
Cc: gnats-bugs@NetBSD.org, netbsd-bugs@NetBSD.org
Subject: Re: kern/59339: heartbeat watchdog fires since 10.99.14
Date: Sat, 10 May 2025 01:03:40 +0000
This is a multi-part message in MIME format.
--=_8ZQlRQ4u9eg1J0DCFMQoE3LK0lMgXmEF
Better yet, try the attached patch. It will print a message to the
console if the circumstances that I hypothesize _could_ cause the hang
occur, but avoids those circumstances at the same time.
In particular, in current, itimer_callout will:
1. sample the clock giving now,
2. decide at what timespec t to next schedule the itimer for based on
the clock sample in (1),
3. sample the clock again giving now1,
4. decide how many ticks to sleep from now1 to t.
If the duration from (1) to (3) is long enough, even though the
decision in (2) is designed to guarantee now < t (except in case of
arithmetic overflow, not likely relevant unless someone is attacking
your ntp client on the network), it may turn out that t < now1, which
may lead the itimer to fire immediately again, possibly leading into
some kind of infinite loop that triggers the heartbeat panic.
However, that's the point at which my hypothesis breaks down: even if
the itimer callout reschedules itself with callout_schedule(ch, 0),
that shouldn't prevent the heartbeat callout from running, because (if
I'm reading kern_timeout.c correctly) callouts at the same time run in
FIFO order. So I'm kinda grasping at straws here.
The patch skips step (3) -- except to print as a diagnostic -- and
uses the sample now from (1) in the decision at (4) of how many ticks
to sleep until t.
--=_8ZQlRQ4u9eg1J0DCFMQoE3LK0lMgXmEF
Content-Type: text/plain; charset="ISO-8859-1"; name="pr59339-itimercalloutassert-v3"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment; filename="pr59339-itimercalloutassert-v3.patch"
diff -r 55a6ded9e19e -r 2808a674f18f sys/kern/kern_time.c
--- a/sys/kern/kern_time.c Thu May 08 20:51:40 2025 +0000
+++ b/sys/kern/kern_time.c Sat May 10 00:37:46 2025 +0000
@@ -837,6 +837,82 @@ itimer_arm_real(struct itimer * const it
}
=20
/*
+ * itimer_rearm_real:
+ *
+ * Re-arm a non-virtual timer, given the current clock readout.
+ */
+static void
+itimer_rearm_real(struct itimer * const it, const struct timespec * const =
now)
+{
+ const struct timespec * const interval =3D &it->it_time.it_interval;
+ const struct timespec * const next =3D &it->it_time.it_value;
+ struct timespec delta;
+ int ticks;
+
+ KASSERT(!it->it_dying);
+ KASSERT(!CLOCK_VIRTUAL_P(it->it_clockid));
+ KASSERT(!callout_pending(&it->it_ch));
+
+ if (__predict_true(timespecisset(next))) {
+ struct timespec now1;
+
+ if (it->it_clockid =3D=3D CLOCK_MONOTONIC) {
+ getnanouptime(&now1);
+ } else {
+ getnanotime(&now1);
+ }
+ if (timespeccmp(next, &now1, <=3D)) {
+ printf("PR kern/59339 (heartbeat watchdog fires"
+ " since 10.99.14) would fire:"
+ " [clock %u]"
+ " [now=3D%lld.%09ld -> %lld.%09ld]"
+ " interval=3D%lld.%09ld next=3D%lld.%09ld\n",
+ it->it_clockid,
+ (long long)now->tv_sec, (long)now->tv_nsec,
+ (long long)now1.tv_sec, (long)now1.tv_nsec,
+ (long long)interval->tv_sec,
+ (long)interval->tv_nsec,
+ (long long)next->tv_sec, (long)next->tv_nsec);
+ }
+
+ KASSERTMSG(timespeccmp(now, next, <),
+ "[clock %u]"
+ " now=3D%lld.%09ld interval=3D%lld.%09ld next=3D%lld.%09ld",
+ it->it_clockid,
+ (long long)now->tv_sec, (long)now->tv_nsec,
+ (long long)interval->tv_sec, (long)interval->tv_nsec,
+ (long long)next->tv_sec, (long)next->tv_nsec);
+ timespecsub(next, now, &delta);
+ } else {
+ /*
+ * If the arithmetic overflowed, just take the interval
+ * as the time to wait. This is unlikely to happen
+ * unless you are messing with your clock to set it
+ * ahead by hundreds of years.
+ */
+ printf("[clock %u]"
+ " itimer arithmetic overflowed:"
+ " now=3D%lld.%09ld interval=3D%lld.%09ld\n",
+ it->it_clockid,
+ (long long)now->tv_sec, (long)now->tv_nsec,
+ (long long)interval->tv_sec, (long)interval->tv_nsec);
+ delta =3D it->it_time.it_interval; /* overflow */
+ }
+ ticks =3D tstohz(&delta);
+ KASSERTMSG(ticks > 0,
+ "[clock %u]"
+ " now=3D%lld.%09ld interval=3D%lld.%09ld next=3D%lld.%09ld"
+ " delta=3D%lld.%09ld ticks=3D%d",
+ it->it_clockid,
+ (long long)now->tv_sec, (long)now->tv_nsec,
+ (long long)interval->tv_sec, (long)interval->tv_nsec,
+ (long long)next->tv_sec, (long)next->tv_nsec,
+ (long long)delta.tv_sec, (long)delta.tv_nsec,
+ ticks);
+ callout_schedule(&it->it_ch, ticks);
+}
+
+/*
* itimer_callout:
*
* Callout to expire a non-virtual timer. Queue it up for processing,
@@ -872,6 +948,18 @@ itimer_callout(void *arg)
* now, compute the next itimer value and count overruns.
*/
itimer_transition(&it->it_time, &now, &next, &overruns);
+ KASSERTMSG(timespeccmp(&now, &next, <),
+ "[clock %u]"
+ " it->it_time.it_value=3D%lld.%09ld"
+ " it->it_time.it_interval=3D%lld.%09ld"
+ " now=3D%lld.%09ld next=3D%lld.%09ld",
+ it->it_clockid,
+ (long long)it->it_time.it_value.tv_sec,
+ (long)it->it_time.it_value.tv_nsec,
+ (long long)it->it_time.it_interval.tv_sec,
+ (long)it->it_time.it_interval.tv_nsec,
+ (long long)now.tv_sec, (long)now.tv_nsec,
+ (long long)next.tv_sec, (long)next.tv_nsec);
it->it_time.it_value =3D next;
it->it_overruns +=3D overruns;
=20
@@ -879,7 +967,7 @@ itimer_callout(void *arg)
* Reset the callout, if it's not going away.
*/
if (!it->it_dying)
- itimer_arm_real(it);
+ itimer_rearm_real(it, &now);
itimer_unlock();
}
=20
--=_8ZQlRQ4u9eg1J0DCFMQoE3LK0lMgXmEF--
From: Patrick Welche <prlw1@welche.eu>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/59339: heartbeat watchdog fires since 10.99.14
Date: Sat, 10 May 2025 08:27:50 +0100
Did see the same as wiz eventually:
kernel diagnostic assertion "ticks > 0" failed: file "../../../../kern/kern_time.c", line 875 [clock 0] now=1746822031.135167992 interval=0.010000000 then=1746822031.135168147 delta=0.000000155 ticks=0
Now running with most recent patch
From: Thomas Klausner <wiz@NetBSD.org>
To: Taylor R Campbell <riastradh@NetBSD.org>
Cc: NetBSD bugtracking <gnats-bugs@NetBSD.org>
Subject: Re: kern/59339: heartbeat watchdog fires since 10.99.14
Date: Sat, 10 May 2025 10:10:30 +0200
On Sat, May 10, 2025 at 01:03:40AM +0000, Taylor R Campbell wrote:
> Better yet, try the attached patch. It will print a message to the
> console if the circumstances that I hypothesize _could_ cause the hang
> occur, but avoids those circumstances at the same time.
With a kernel from ~3 hours ago and this patch (v3) applied, I saw:
panic: kernel diagnostic assertion "ticks > 0" failed: file "/disk/storage-202410/archive/foreign/src/sys/kern/kern_time.c", line 902 [clock 0] now=1746864040.736221265 interval=0.010000000 next=1746864040.736222027 delta=0.000000762 ticks=0
gdb says:
(gdb) bt
#0 0xffffffff80239b85 in cpu_reboot (howto=howto@entry=260, bootstr=bootstr@entry=0x0) at /usr/src/sys/arch/amd64/amd64/machdep.c:709
#1 0xffffffff80de3b15 in kern_reboot (howto=howto@entry=260, bootstr=bootstr@entry=0x0) at /usr/src/sys/kern/kern_reboot.c:91
#2 0xffffffff80e29b33 in vpanic (fmt=0xffffffff814f20c0 "kernel %sassertion \"%s\" failed: file \"%s\", line %d [clock %u] now=%lld.%09ld interval=%lld.%09ld next=%lld.%09ld delta=%lld.%09ld ticks=%d",
ap=ap@entry=0xffffce246a3f8ec8) at /usr/src/sys/kern/subr_prf.c:288
#3 0xffffffff80ff837c in kern_assert (fmt=fmt@entry=0xffffffff814f20c0 "kernel %sassertion \"%s\" failed: file \"%s\", line %d [clock %u] now=%lld.%09ld interval=%lld.%09ld next=%lld.%09ld delta=%lld.%09ld ticks=%d")
at /usr/src/sys/lib/libkern/kern_assert.c:51
#4 0xffffffff80dff441 in itimer_rearm_real (now=0xffffce246a3f8f80, it=0xffff84a9f2e4a5c0) at /usr/src/sys/kern/kern_time.c:902
#5 itimer_callout (arg=0xffff84a9f2e4a5c0) at /usr/src/sys/kern/kern_time.c:970
#6 0xffffffff80e02519 in callout_softclock (v=<optimized out>) at /usr/src/sys/kern/kern_timeout.c:872
#7 0xffffffff80df201e in softint_execute (s=2, l=0xffff84b0570db000) at /usr/src/sys/kern/kern_softint.c:599
#8 softint_dispatch (pinned=<optimized out>, s=2) at /usr/src/sys/kern/kern_softint.c:848
#9 0xffffffff8023480c in Xsoftintr ()
(gdb) fr 4
#4 0xffffffff80dff441 in itimer_rearm_real (now=0xffffce246a3f8f80, it=0xffff84a9f2e4a5c0) at /usr/src/sys/kern/kern_time.c:902
warning: 902 /usr/src/sys/kern/kern_time.c: No such file or directory
(gdb) p *now
$1 = {tv_sec = 1746864040, tv_nsec = 736221265}
(gdb) p *it
$2 = {{it_real = {it_ch = {_c_store = {0xffffce010b0be098, 0xffffce010b0be098, 0xffffffff80dff106 <itimer_callout>, 0xffff84a9f2e4a5c0, 0xffffce010b0be000, 0x10c000b4ee4, 0x11deeba1, 0x0, 0x0, 0x0}}, it_rtchgq = {
le_next = 0x0, le_prev = 0x0}}, it_virtual = {it_vlist = 0xffffce010b0be098, it_list = {le_next = 0xffffce010b0be098, le_prev = 0xffffffff80dff106 <itimer_callout>}, it_active = 192}},
it_ops = 0xffffffff8138a470 <ptimer_itimer_ops>, it_time = {it_interval = {tv_sec = 0, tv_nsec = 10000000}, it_value = {tv_sec = 1746864040, tv_nsec = 736222027}}, it_clockid = 0, it_overruns = 0, it_dying = false}
Let me know what further information you need!
Thomas
From: Taylor R Campbell <riastradh@NetBSD.org>
To: Thomas Klausner <wiz@NetBSD.org>, Patrick Welche <prlw1@welche.eu>
Cc: gnats-bugs@NetBSD.org, netbsd-bugs@NetBSD.org
Subject: Re: kern/59339: heartbeat watchdog fires since 10.99.14
Date: Sat, 10 May 2025 08:41:06 +0000
This is a multi-part message in MIME format.
--=_IbGdYHmSjJAiKZ8zRyLyvEB11GuaQdmi
New patch! More diagnostics, more potential problems avoided.
--=_IbGdYHmSjJAiKZ8zRyLyvEB11GuaQdmi
Content-Type: text/plain; charset="ISO-8859-1"; name="pr59339-itimercalloutassert-v4"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment; filename="pr59339-itimercalloutassert-v4.patch"
diff -r 55a6ded9e19e -r a27cadc62a5e sys/kern/kern_time.c
--- a/sys/kern/kern_time.c Thu May 08 20:51:40 2025 +0000
+++ b/sys/kern/kern_time.c Sat May 10 08:36:45 2025 +0000
@@ -837,6 +837,105 @@ itimer_arm_real(struct itimer * const it
}
=20
/*
+ * itimer_rearm_real:
+ *
+ * Re-arm a non-virtual timer, given the current clock readout.
+ */
+static void
+itimer_rearm_real(struct itimer * const it, const struct timespec * const =
now,
+ const struct timespec * const last)
+{
+ const struct timespec * const interval =3D &it->it_time.it_interval;
+ const struct timespec * const next =3D &it->it_time.it_value;
+ struct timespec delta;
+ int ticks;
+
+ KASSERT(!it->it_dying);
+ KASSERT(!CLOCK_VIRTUAL_P(it->it_clockid));
+ KASSERT(!callout_pending(&it->it_ch));
+
+ if (__predict_true(timespecisset(next))) {
+ struct timespec now1;
+
+ if (it->it_clockid =3D=3D CLOCK_MONOTONIC) {
+ getnanouptime(&now1);
+ } else {
+ getnanotime(&now1);
+ }
+ if (timespeccmp(next, &now1, <=3D)) {
+ printf("PR kern/59339 (heartbeat watchdog fires"
+ " since 10.99.14) would fire:"
+ " [clock %u]"
+ " [now=3D%lld.%09ld -> %lld.%09ld],"
+ " last=3D%lld.%09ld + interval=3D%lld.%09ld"
+ " -> next=3D%lld.%09ld\n",
+ it->it_clockid,
+ (long long)now->tv_sec, (long)now->tv_nsec,
+ (long long)now1.tv_sec, (long)now1.tv_nsec,
+ (long long)last->tv_sec, (long)last->tv_nsec,
+ (long long)interval->tv_sec,
+ (long)interval->tv_nsec,
+ (long long)next->tv_sec, (long)next->tv_nsec);
+ }
+
+ KASSERTMSG(timespeccmp(now, next, <),
+ "[clock %u]"
+ " now=3D%lld.%09ld,"
+ " last=3D%lld.%09ld + interval=3D%lld.%09ld"
+ " -> next=3D%lld.%09ld",
+ it->it_clockid,
+ (long long)now->tv_sec, (long)now->tv_nsec,
+ (long long)last->tv_sec, (long)last->tv_nsec,
+ (long long)interval->tv_sec, (long)interval->tv_nsec,
+ (long long)next->tv_sec, (long)next->tv_nsec);
+ timespecsub(next, now, &delta);
+ } else {
+ /*
+ * If the arithmetic overflowed, just take the interval
+ * as the time to wait. This is unlikely to happen
+ * unless you are messing with your clock to set it
+ * ahead by hundreds of years.
+ */
+ printf("[clock %u]"
+ " itimer arithmetic overflowed:"
+ " now=3D%lld.%09ld, last=3D%lld.%09ld + interval=3D%lld.%09ld\n",
+ it->it_clockid,
+ (long long)now->tv_sec, (long)now->tv_nsec,
+ (long long)last->tv_sec, (long)last->tv_nsec,
+ (long long)interval->tv_sec, (long)interval->tv_nsec);
+ delta =3D it->it_time.it_interval; /* overflow */
+ }
+ ticks =3D tstohz(&delta);
+ if (delta.tv_sec =3D=3D 0 && delta.tv_nsec < 1000) {
+ printf("[clock %u]"
+ " now=3D%lld.%09ld,"
+ " last=3D%lld.%09ld + interval=3D%lld.%09ld"
+ " -> next=3D%lld.%09ld;"
+ " delta=3D%lld.%09ld ticks rounded to %d",
+ it->it_clockid,
+ (long long)now->tv_sec, (long)now->tv_nsec,
+ (long long)last->tv_sec, (long)last->tv_nsec,
+ (long long)interval->tv_sec, (long)interval->tv_nsec,
+ (long long)next->tv_sec, (long)next->tv_nsec,
+ (long long)delta.tv_sec, (long)delta.tv_nsec,
+ ticks);
+ }
+ KASSERTMSG(ticks > 0,
+ "[clock %u]"
+ " now=3D%lld.%09ld,"
+ " last=3D%lld.%09ld + interval=3D%lld.%09ld -> next=3D%lld.%09ld;"
+ " delta=3D%lld.%09ld ticks=3D%d",
+ it->it_clockid,
+ (long long)now->tv_sec, (long)now->tv_nsec,
+ (long long)last->tv_sec, (long)last->tv_nsec,
+ (long long)interval->tv_sec, (long)interval->tv_nsec,
+ (long long)next->tv_sec, (long)next->tv_nsec,
+ (long long)delta.tv_sec, (long)delta.tv_nsec,
+ ticks);
+ callout_schedule(&it->it_ch, ticks);
+}
+
+/*
* itimer_callout:
*
* Callout to expire a non-virtual timer. Queue it up for processing,
@@ -848,7 +947,7 @@ itimer_arm_real(struct itimer * const it
static void
itimer_callout(void *arg)
{
- struct timespec now, next;
+ struct timespec last, now, next;
struct itimer * const it =3D arg;
int overruns;
=20
@@ -871,7 +970,20 @@ itimer_callout(void *arg)
* Given the current itimer value and interval and the time
* now, compute the next itimer value and count overruns.
*/
+ last =3D it->it_time.it_value;
itimer_transition(&it->it_time, &now, &next, &overruns);
+ KASSERTMSG(timespeccmp(&now, &next, <),
+ "[clock %u]"
+ " it->it_time.it_value=3D%lld.%09ld"
+ " it->it_time.it_interval=3D%lld.%09ld"
+ " now=3D%lld.%09ld next=3D%lld.%09ld",
+ it->it_clockid,
+ (long long)it->it_time.it_value.tv_sec,
+ (long)it->it_time.it_value.tv_nsec,
+ (long long)it->it_time.it_interval.tv_sec,
+ (long)it->it_time.it_interval.tv_nsec,
+ (long long)now.tv_sec, (long)now.tv_nsec,
+ (long long)next.tv_sec, (long)next.tv_nsec);
it->it_time.it_value =3D next;
it->it_overruns +=3D overruns;
=20
@@ -879,7 +991,7 @@ itimer_callout(void *arg)
* Reset the callout, if it's not going away.
*/
if (!it->it_dying)
- itimer_arm_real(it);
+ itimer_rearm_real(it, &now, &last);
itimer_unlock();
}
=20
diff -r 55a6ded9e19e -r a27cadc62a5e sys/kern/subr_time.c
--- a/sys/kern/subr_time.c Thu May 08 20:51:40 2025 +0000
+++ b/sys/kern/subr_time.c Sat May 10 08:36:45 2025 +0000
@@ -86,17 +86,28 @@ tshztoup(const struct timespec *tsp)
=20
/*
* Compute number of ticks in the specified amount of time.
+ *
+ * Round up. Return 0 iff ts <=3D 0.
*/
int
tstohz(const struct timespec *ts)
{
struct timeval tv;
=20
+ KASSERT(ts->tv_sec >=3D 0);
+ KASSERT(ts->tv_nsec >=3D 0);
+ KASSERT(ts->tv_nsec < 1000000000);
+
/*
* usec has great enough resolution for hz, so convert to a
* timeval and use tvtohz() above.
*/
- TIMESPEC_TO_TIMEVAL(&tv, ts);
+ tv.tv_sec =3D ts->tv_sec;
+ tv.tv_usec =3D (ts->tv_nsec + 999)/1000;
+ if (tv.tv_usec >=3D 1000000) {
+ tv.tv_sec++;
+ tv.tv_usec -=3D 1000000;
+ }
return tvtohz(&tv);
}
=20
--=_IbGdYHmSjJAiKZ8zRyLyvEB11GuaQdmi--
From: Patrick Welche <prlw1@welche.eu>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/59339: heartbeat watchdog fires since 10.99.14
Date: Sat, 10 May 2025 10:27:44 +0100
Just catching up with wiz. For me:
(gdb) print *now
$1 = {tv_sec = 1746868783, tv_nsec = 165198707}
(gdb) print *it
$2 = {{it_real = {it_ch = {_c_store = {0xffffa3002131c098, 0xffffa3002131c098,
0xffffffff808bcb46 <itimer_callout>, 0xffffa2a587774b00,
0xffffa3002131c000, 0x10c0004239c, 0x11deeba1, 0x0, 0x0, 0x0}},
it_rtchgq = {le_next = 0x0, le_prev = 0x0}}, it_virtual = {
it_vlist = 0xffffa3002131c098, it_list = {le_next = 0xffffa3002131c098,
le_prev = 0xffffffff808bcb46 <itimer_callout>}, it_active = false}},
it_ops = 0xffffffff80d7d950 <ptimer_itimer_ops>, it_time = {it_interval = {
tv_sec = 0, tv_nsec = 10000000}, it_value = {tv_sec = 1746868783,
tv_nsec = 165199491}}, it_clockid = 0, it_overruns = 0, it_dying = false}
and already a new patch to try!
From: Thomas Klausner <wiz@NetBSD.org>
To: Taylor R Campbell <riastradh@NetBSD.org>
Cc: NetBSD bugtracking <gnats-bugs@NetBSD.org>
Subject: Re: kern/59339: heartbeat watchdog fires since 10.99.14
Date: Sat, 10 May 2025 11:29:01 +0200
On Sat, May 10, 2025 at 08:41:06AM +0000, Taylor R Campbell wrote:
> New patch! More diagnostics, more potential problems avoided.
Much worse, two panics in less than 5 minutes each.
May 10 11:17:20 exadelic savecore: reboot after panic: [ 270.1245676] panic: kernel diagnostic assertion "ts->tv_sec >= 0" failed: file "/disk/storage-202410/archive/foreign/src/sys/kern/subr_time.c", line 97
May 10 11:21:41 exadelic savecore: reboot after panic: [ 221.8767863] panic: kernel diagnostic assertion "ts->tv_sec >= 0" failed: file "/disk/storage-202410/archive/foreign/src/sys/kern/subr_time.c", line 97
(gdb) target kvm netbsd.14.core
0xffffffff80239b85 in cpu_reboot (howto=howto@entry=260, bootstr=bootstr@entry=0x0) at /usr/src/sys/arch/amd64/amd64/machdep.c:709
warning: 709 /usr/src/sys/arch/amd64/amd64/machdep.c: No such file or directory
(gdb) bt
#0 0xffffffff80239b85 in cpu_reboot (howto=howto@entry=260, bootstr=bootstr@entry=0x0) at /usr/src/sys/arch/amd64/amd64/machdep.c:709
#1 0xffffffff80de3b15 in kern_reboot (howto=howto@entry=260, bootstr=bootstr@entry=0x0) at /usr/src/sys/kern/kern_reboot.c:91
#2 0xffffffff80e29be3 in vpanic (fmt=0xffffffff81427440 "kernel %sassertion \"%s\" failed: file \"%s\", line %d ", ap=ap@entry=0xffff98a47e05bde8) at /usr/src/sys/kern/subr_prf.c:288
#3 0xffffffff80ff84cc in kern_assert (fmt=fmt@entry=0xffffffff81427440 "kernel %sassertion \"%s\" failed: file \"%s\", line %d ") at /usr/src/sys/lib/libkern/kern_assert.c:51
#4 0xffffffff80e2ba6e in tstohz (ts=ts@entry=0xffff98a47e05be80) at /usr/src/sys/kern/subr_time.c:97
#5 0xffffffff80e2bb3b in tshzto (tsp=tsp@entry=0xffff876be5c93cb8) at /usr/src/sys/kern/subr_time.c:73
#6 0xffffffff80e00a98 in itimer_arm_real (it=<optimized out>) at /usr/src/sys/kern/kern_time.c:836
#7 itimer_settime (it=it@entry=0xffff876be5c93c40) at /usr/src/sys/kern/kern_time.c:1040
#8 0xffffffff80e01105 in dosetitimer (p=p@entry=0xffff876c28b45c00, which=which@entry=0, itvp=itvp@entry=0xffff98a47e05bf20) at /usr/src/sys/kern/kern_time.c:1836
#9 0xffffffff80e02379 in dosetitimer (itvp=0xffff98a47e05bf20, which=0, p=0xffff876c28b45c00) at /usr/src/sys/kern/kern_time.c:1748
#10 sys___setitimer50 (l=0xffff876c23737400, uap=0xffff98a47e05c000, retval=0xffff98a47e05bfb0) at /usr/src/sys/kern/kern_time.c:1733
#11 0xffffffff805b7cc1 in sy_call (rval=0xffff98a47e05bfb0, uap=0xffff98a47e05c000, l=0xffff876c23737400, sy=0xffffffff8188a358 <sysent+10200>) at /usr/src/sys/sys/syscallvar.h:65
#12 sy_invoke (code=425, rval=0xffff98a47e05bfb0, uap=0xffff98a47e05c000, l=0xffff876c23737400, sy=0xffffffff8188a358 <sysent+10200>) at /usr/src/sys/sys/syscallvar.h:94
#13 syscall (frame=0xffff98a47e05c000) at /usr/src/sys/arch/x86/x86/syscall.c:137
#14 0xffffffff8021025d in handle_syscall ()
(gdb) fr 6
#6 0xffffffff80e00a98 in itimer_arm_real (it=<optimized out>) at /usr/src/sys/kern/kern_time.c:836
warning: 836 /usr/src/sys/kern/kern_time.c: No such file or directory
(gdb) p *it
value has been optimized out
(gdb) fr 7
#7 itimer_settime (it=it@entry=0xffff876be5c93c40) at /usr/src/sys/kern/kern_time.c:1040
1040 in /usr/src/sys/kern/kern_time.c
(gdb) p *it
$1 = {{it_real = {it_ch = {_c_store = {0xffff98810ad5f098, 0xffff98810ad5f098, 0xffffffff80dff106 <itimer_callout>, 0xffff876be5c93c40, 0xffff98810ad5f000, 0x10000006902, 0x11deeba1, 0x0, 0x0, 0x0}}, it_rtchgq = {
le_next = 0x0, le_prev = 0x0}}, it_virtual = {it_vlist = 0xffff98810ad5f098, it_list = {le_next = 0xffff98810ad5f098, le_prev = 0xffffffff80dff106 <itimer_callout>}, it_active = 64}},
it_ops = 0xffffffff8138a470 <ptimer_itimer_ops>, it_time = {it_interval = {tv_sec = 0, tv_nsec = 10000000}, it_value = {tv_sec = 1746868565, tv_nsec = 444217852}}, it_clockid = 0, it_overruns = 0, it_dying = false}
(gdb)
(gdb) target kvm netbsd.15.core
0xffffffff80239b85 in cpu_reboot (howto=howto@entry=260, bootstr=bootstr@entry=0x0) at /usr/src/sys/arch/amd64/amd64/machdep.c:709
warning: 709 /usr/src/sys/arch/amd64/amd64/machdep.c: No such file or directory
(gdb) bt
#0 0xffffffff80239b85 in cpu_reboot (howto=howto@entry=260, bootstr=bootstr@entry=0x0) at /usr/src/sys/arch/amd64/amd64/machdep.c:709
#1 0xffffffff80de3b15 in kern_reboot (howto=howto@entry=260, bootstr=bootstr@entry=0x0) at /usr/src/sys/kern/kern_reboot.c:91
#2 0xffffffff80e29be3 in vpanic (fmt=0xffffffff81427440 "kernel %sassertion \"%s\" failed: file \"%s\", line %d ", ap=ap@entry=0xffffc0247bf84de8) at /usr/src/sys/kern/subr_prf.c:288
#3 0xffffffff80ff84cc in kern_assert (fmt=fmt@entry=0xffffffff81427440 "kernel %sassertion \"%s\" failed: file \"%s\", line %d ") at /usr/src/sys/lib/libkern/kern_assert.c:51
#4 0xffffffff80e2ba6e in tstohz (ts=ts@entry=0xffffc0247bf84e80) at /usr/src/sys/kern/subr_time.c:97
#5 0xffffffff80e2bb3b in tshzto (tsp=tsp@entry=0xffff888c44bf9f38) at /usr/src/sys/kern/subr_time.c:73
#6 0xffffffff80e00a98 in itimer_arm_real (it=<optimized out>) at /usr/src/sys/kern/kern_time.c:836
#7 itimer_settime (it=it@entry=0xffff888c44bf9ec0) at /usr/src/sys/kern/kern_time.c:1040
#8 0xffffffff80e01d4a in dotimer_settime (timerid=4, value=value@entry=0xffffc0247bf84f40, ovalue=ovalue@entry=0x0, flags=1, p=<optimized out>) at /usr/src/sys/kern/kern_time.c:1557
#9 0xffffffff80e01f44 in sys___timer_settime50 (l=0xffff888a9d1c7c00, uap=0xffffc0247bf85000, retval=<optimized out>) at /usr/src/sys/kern/kern_time.c:1485
#10 0xffffffff805b7cc1 in sy_call (rval=0xffffc0247bf84fb0, uap=0xffffc0247bf85000, l=0xffff888a9d1c7c00, sy=0xffffffff8188a550 <sysent+10704>) at /usr/src/sys/sys/syscallvar.h:65
#11 sy_invoke (code=446, rval=0xffffc0247bf84fb0, uap=0xffffc0247bf85000, l=0xffff888a9d1c7c00, sy=0xffffffff8188a550 <sysent+10704>) at /usr/src/sys/sys/syscallvar.h:94
#12 syscall (frame=0xffffc0247bf85000) at /usr/src/sys/arch/x86/x86/syscall.c:137
#13 0xffffffff8021025d in handle_syscall ()
(gdb) p *it
$1 = {{it_real = {it_ch = {_c_store = {0xffff888acc47b4c0, 0xffffc0010ad87328, 0xffffffff80dff106 <itimer_callout>, 0xffff888c44bf9ec0, 0xffffc0010ad87000, 0x10000005628, 0x11deeba1, 0x0, 0x0, 0x0}}, it_rtchgq = {
le_next = 0x0, le_prev = 0x0}}, it_virtual = {it_vlist = 0xffff888acc47b4c0, it_list = {le_next = 0xffffc0010ad87328, le_prev = 0xffffffff80dff106 <itimer_callout>}, it_active = 192}},
it_ops = 0xffffffff8138a470 <ptimer_itimer_ops>, it_time = {it_interval = {tv_sec = 0, tv_nsec = 0}, it_value = {tv_sec = 1746868825, tv_nsec = 142185780}}, it_clockid = 0, it_overruns = 0, it_dying = false}
(gdb)
From: Taylor R Campbell <riastradh@NetBSD.org>
To: Thomas Klausner <wiz@NetBSD.org>, Patrick Welche <prlw1@welche.eu>
Cc: gnats-bugs@NetBSD.org, netbsd-bugs@NetBSD.org
Subject: Re: kern/59339: heartbeat watchdog fires since 10.99.14
Date: Sat, 10 May 2025 13:03:37 +0000
This is a multi-part message in MIME format.
--=_rQ5/ZdmDD9MmXcsf79EVgs7gGkvXK6eU
> Date: Sat, 10 May 2025 11:29:01 +0200
> From: Thomas Klausner <wiz@NetBSD.org>
>
> On Sat, May 10, 2025 at 08:41:06AM +0000, Taylor R Campbell wrote:
> > New patch! More diagnostics, more potential problems avoided.
>
> Much worse, two panics in less than 5 minutes each.
Sorry -- that assertion I added to tstohz is wrong, nix it.
Replacement patch attached.
--=_rQ5/ZdmDD9MmXcsf79EVgs7gGkvXK6eU
Content-Type: text/plain; charset="ISO-8859-1"; name="pr59339-itimercalloutassert-v5"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment; filename="pr59339-itimercalloutassert-v5.patch"
diff -r 55a6ded9e19e -r f01484399631 sys/kern/kern_time.c
--- a/sys/kern/kern_time.c Thu May 08 20:51:40 2025 +0000
+++ b/sys/kern/kern_time.c Sat May 10 08:36:45 2025 +0000
@@ -837,6 +837,105 @@ itimer_arm_real(struct itimer * const it
}
=20
/*
+ * itimer_rearm_real:
+ *
+ * Re-arm a non-virtual timer, given the current clock readout.
+ */
+static void
+itimer_rearm_real(struct itimer * const it, const struct timespec * const =
now,
+ const struct timespec * const last)
+{
+ const struct timespec * const interval =3D &it->it_time.it_interval;
+ const struct timespec * const next =3D &it->it_time.it_value;
+ struct timespec delta;
+ int ticks;
+
+ KASSERT(!it->it_dying);
+ KASSERT(!CLOCK_VIRTUAL_P(it->it_clockid));
+ KASSERT(!callout_pending(&it->it_ch));
+
+ if (__predict_true(timespecisset(next))) {
+ struct timespec now1;
+
+ if (it->it_clockid =3D=3D CLOCK_MONOTONIC) {
+ getnanouptime(&now1);
+ } else {
+ getnanotime(&now1);
+ }
+ if (timespeccmp(next, &now1, <=3D)) {
+ printf("PR kern/59339 (heartbeat watchdog fires"
+ " since 10.99.14) would fire:"
+ " [clock %u]"
+ " [now=3D%lld.%09ld -> %lld.%09ld],"
+ " last=3D%lld.%09ld + interval=3D%lld.%09ld"
+ " -> next=3D%lld.%09ld\n",
+ it->it_clockid,
+ (long long)now->tv_sec, (long)now->tv_nsec,
+ (long long)now1.tv_sec, (long)now1.tv_nsec,
+ (long long)last->tv_sec, (long)last->tv_nsec,
+ (long long)interval->tv_sec,
+ (long)interval->tv_nsec,
+ (long long)next->tv_sec, (long)next->tv_nsec);
+ }
+
+ KASSERTMSG(timespeccmp(now, next, <),
+ "[clock %u]"
+ " now=3D%lld.%09ld,"
+ " last=3D%lld.%09ld + interval=3D%lld.%09ld"
+ " -> next=3D%lld.%09ld",
+ it->it_clockid,
+ (long long)now->tv_sec, (long)now->tv_nsec,
+ (long long)last->tv_sec, (long)last->tv_nsec,
+ (long long)interval->tv_sec, (long)interval->tv_nsec,
+ (long long)next->tv_sec, (long)next->tv_nsec);
+ timespecsub(next, now, &delta);
+ } else {
+ /*
+ * If the arithmetic overflowed, just take the interval
+ * as the time to wait. This is unlikely to happen
+ * unless you are messing with your clock to set it
+ * ahead by hundreds of years.
+ */
+ printf("[clock %u]"
+ " itimer arithmetic overflowed:"
+ " now=3D%lld.%09ld, last=3D%lld.%09ld + interval=3D%lld.%09ld\n",
+ it->it_clockid,
+ (long long)now->tv_sec, (long)now->tv_nsec,
+ (long long)last->tv_sec, (long)last->tv_nsec,
+ (long long)interval->tv_sec, (long)interval->tv_nsec);
+ delta =3D it->it_time.it_interval; /* overflow */
+ }
+ ticks =3D tstohz(&delta);
+ if (delta.tv_sec =3D=3D 0 && delta.tv_nsec < 1000) {
+ printf("[clock %u]"
+ " now=3D%lld.%09ld,"
+ " last=3D%lld.%09ld + interval=3D%lld.%09ld"
+ " -> next=3D%lld.%09ld;"
+ " delta=3D%lld.%09ld ticks rounded to %d",
+ it->it_clockid,
+ (long long)now->tv_sec, (long)now->tv_nsec,
+ (long long)last->tv_sec, (long)last->tv_nsec,
+ (long long)interval->tv_sec, (long)interval->tv_nsec,
+ (long long)next->tv_sec, (long)next->tv_nsec,
+ (long long)delta.tv_sec, (long)delta.tv_nsec,
+ ticks);
+ }
+ KASSERTMSG(ticks > 0,
+ "[clock %u]"
+ " now=3D%lld.%09ld,"
+ " last=3D%lld.%09ld + interval=3D%lld.%09ld -> next=3D%lld.%09ld;"
+ " delta=3D%lld.%09ld ticks=3D%d",
+ it->it_clockid,
+ (long long)now->tv_sec, (long)now->tv_nsec,
+ (long long)last->tv_sec, (long)last->tv_nsec,
+ (long long)interval->tv_sec, (long)interval->tv_nsec,
+ (long long)next->tv_sec, (long)next->tv_nsec,
+ (long long)delta.tv_sec, (long)delta.tv_nsec,
+ ticks);
+ callout_schedule(&it->it_ch, ticks);
+}
+
+/*
* itimer_callout:
*
* Callout to expire a non-virtual timer. Queue it up for processing,
@@ -848,7 +947,7 @@ itimer_arm_real(struct itimer * const it
static void
itimer_callout(void *arg)
{
- struct timespec now, next;
+ struct timespec last, now, next;
struct itimer * const it =3D arg;
int overruns;
=20
@@ -871,7 +970,20 @@ itimer_callout(void *arg)
* Given the current itimer value and interval and the time
* now, compute the next itimer value and count overruns.
*/
+ last =3D it->it_time.it_value;
itimer_transition(&it->it_time, &now, &next, &overruns);
+ KASSERTMSG(timespeccmp(&now, &next, <),
+ "[clock %u]"
+ " it->it_time.it_value=3D%lld.%09ld"
+ " it->it_time.it_interval=3D%lld.%09ld"
+ " now=3D%lld.%09ld next=3D%lld.%09ld",
+ it->it_clockid,
+ (long long)it->it_time.it_value.tv_sec,
+ (long)it->it_time.it_value.tv_nsec,
+ (long long)it->it_time.it_interval.tv_sec,
+ (long)it->it_time.it_interval.tv_nsec,
+ (long long)now.tv_sec, (long)now.tv_nsec,
+ (long long)next.tv_sec, (long)next.tv_nsec);
it->it_time.it_value =3D next;
it->it_overruns +=3D overruns;
=20
@@ -879,7 +991,7 @@ itimer_callout(void *arg)
* Reset the callout, if it's not going away.
*/
if (!it->it_dying)
- itimer_arm_real(it);
+ itimer_rearm_real(it, &now, &last);
itimer_unlock();
}
=20
diff -r 55a6ded9e19e -r f01484399631 sys/kern/subr_time.c
--- a/sys/kern/subr_time.c Thu May 08 20:51:40 2025 +0000
+++ b/sys/kern/subr_time.c Sat May 10 08:36:45 2025 +0000
@@ -86,17 +86,27 @@ tshztoup(const struct timespec *tsp)
=20
/*
* Compute number of ticks in the specified amount of time.
+ *
+ * Round up. Return 0 iff ts <=3D 0.
*/
int
tstohz(const struct timespec *ts)
{
struct timeval tv;
=20
+ KASSERT(ts->tv_nsec >=3D 0);
+ KASSERT(ts->tv_nsec < 1000000000);
+
/*
* usec has great enough resolution for hz, so convert to a
* timeval and use tvtohz() above.
*/
- TIMESPEC_TO_TIMEVAL(&tv, ts);
+ tv.tv_sec =3D ts->tv_sec;
+ tv.tv_usec =3D (ts->tv_nsec + 999)/1000;
+ if (tv.tv_usec >=3D 1000000) {
+ tv.tv_sec++;
+ tv.tv_usec -=3D 1000000;
+ }
return tvtohz(&tv);
}
=20
--=_rQ5/ZdmDD9MmXcsf79EVgs7gGkvXK6eU--
From: Patrick Welche <prlw1@welche.eu>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/59339: heartbeat watchdog fires since 10.99.14
Date: Sun, 11 May 2025 13:53:47 +0100
Running for 1.75 hours with the latest patch, I see:
[clock 0] now=3D1746963705.059402942, last=3D1746963705.049403167 + interva=
l=3D0.010000000 -> next=3D1746963705.059403167; delta=3D0.000000225 ticks r=
ounded to 2[clock 0] now=3D1746963706.089410468, last=3D1746963706.08941125=
9 + interval=3D0.010000000 -> next=3D1746963706.089411259; delta=3D0.000000=
791 ticks rounded to 2[clock 0] now=3D1746963706.399408981, last=3D17469637=
06.389409000 + interval=3D0.010000000 -> next=3D1746963706.399409000; delta=
=3D0.000000019 ticks rounded to 2[clock 0] now=3D1746963736.249409664, last=
=3D1746963736.239410031 + interval=3D0.010000000 -> next=3D1746963736.24941=
0031; delta=3D0.000000367 ticks rounded to 2[clock 0] now=3D1746963826.5394=
08864, last=3D1746963826.539409266 + interval=3D0.010000000 -> next=3D17469=
63826.539409266; delta=3D0.000000402 ticks rounded to 2{drm:netbsd:intel_pi=
pe_update_start+0x322} *ERROR* Potential atomic update failure on pipe A: -=
35
[clock 0] now=3D1746965392.899407876, last=3D1746965392.889407902 + interva=
l=3D0.010000000 -> next=3D1746965392.899407902; delta=3D0.000000026 ticks r=
ounded to 2[clock 0] now=3D1746965528.879407030, last=3D1746965528.86940750=
4 + interval=3D0.010000000 -> next=3D1746965528.879407504; delta=3D0.000000=
474 ticks rounded to 2[clock 0] now=3D1746965534.629408024, last=3D17469655=
34.629408212 + interval=3D0.010000000 -> next=3D1746965534.629408212; delta=
=3D0.000000188 ticks rounded to 2[clock 0] now=3D1746966319.279407861, last=
=3D1746966319.279408058 + interval=3D0.010000000 -> next=3D1746966319.27940=
8058; delta=3D0.000000197 ticks rounded to 2[clock 0] now=3D1746966721.1594=
07670, last=3D1746966721.149408231 + interval=3D0.010000000 -> next=3D17469=
66721.159408231; delta=3D0.000000561 ticks rounded to 2[clock 0] now=3D1746=
966723.459408738, last=3D1746966723.449409530 + interval=3D0.010000000 -> n=
ext=3D1746966723.459409530; delta=3D0.000000792 ticks rounded to 2[clock 0]=
now=3D1746966723.489407439, last=3D1746966723.479408121 + interval=3D0.010=
000000 -> next=3D1746966723.489408121; delta=3D0.000000682 ticks rounded to=
2[clock 0] now=3D1746966998.029407270, last=3D1746966998.029407316 + inter=
val=3D0.010000000 -> next=3D1746966998.029407316; delta=3D0.000000046 ticks=
rounded to 2
in dmesg
From: Thomas Klausner <wiz@NetBSD.org>
To: gnats-bugs@netbsd.org
Cc: Taylor R Campbell <riastradh@NetBSD.org>
Subject: Re: kern/59339: heartbeat watchdog fires since 10.99.14
Date: Sun, 11 May 2025 22:38:22 +0200
So I rebooted into a kernel with pr59339-itimercalloutassert-v5.patch
yesterday, and it paniced this afternooon shortly after I started a
bulk build. Looking at var/log/messages I found one weird line from
this morning:
May 11 08:55:47 exadelic /netbsd: cks rounded to 2[clock 0] now=3D174691038=
5.902206688, last=3D1746910385.882207038 + interval=3D0.010000000 -> next=
=3D1746910385.902207038; delta=3D0.000000350 ticks rounded to 2[clock 0] no=
w=3D1746910385.922206613, last=3D1746910385.902207038 + interval=3D0.010000=
000 -> next=3D1746910385.922207038; delta=3D0.000000425 ticks rounded to 2[=
clock 0] now=3D1746910385.942206438, last=3D1746910385.922207038 + interval=
=3D0.010000000 -> next=3D1746910385.942207038; delta=3D0.000000600 ticks ro=
unded to 2[clock 0] now=3D1746910385.962206343, last=3D1746910385.942207038=
+ interval=3D0.010000000 -> next=3D1746910385.962207038; delta=3D0.0000006=
95 ticks rounded to 2[clock 0] now=3D1746910386.472204408, last=3D174691038=
6.462204553 + interval=3D0.010000000 -> next=3D1746910386.472204553; delta=
=3D0.000000145 ticks rounded to 2[clock 0] now=3D1746910386.502204526, last=
=3D1746910386.472204553 + interval=3D0.010000000 -> next=3D1746910386.50220=
4553; delta=3D0.000000027 ticks rounded to 2[clock 0] now=3D1746910386.5222=
04241, last=3D1746910386.502204553 + interval=3D0.010000000 -> next=3D17469=
10386.522204553; delta=3D0.000000312 ticks rounded to 2[clock 0] now=3D1746=
910386.542203735, last=3D1746910386.522204553 + interval=3D0.010000000 -> n=
ext=3D1746910386.542204553; delta=3D0.000000818 ticks rounded to 2[clock 0]=
now=3D1746910386.562204231, last=3D1746910386.542204553 + interval=3D0.010=
000000 -> next=3D1746910386.562204553; delta=3D0.000000322 ticks rounded to=
2[clock 0] now=3D1746910386.582204146, last=3D1746910386.562204553 + inter=
val=3D0.010000000 -> next=3D1746910386.582204553; delta=3D0.000000407 ticks=
rounded to 2[clock 0] now=3D1746910386.602204010, last=3D1746910386.582204=
553 + interval=3D0.010000000 -> next=3D1746910386.602204553; delta=3D0.0000=
00543 ticks rounded to 2[clock 0] now=3D1746910386.622204025, last=3D174691=
0386.602204553 + interval=3D0.010000000 -> next=3D1746910386.622204553; del=
ta=3D0.000000528 ticks rounded to 2[clock 0] now=3D1746910387.052202191, la=
st=3D1746910387.042202326 + interval=3D0.010000000 -> next=3D1746910387.052=
202326; delta=3D0.000000135 ticks rounded to 2[clock 0] now=3D1746910387.07=
2202116, last=3D1746910387.052202326 + interval=3D0.010000000 -> next=3D174=
6910387.072202326; delta=3D0.000000210 ticks rounded to 2[clock 0] now=3D17=
46910387.092202121, last=3D1746910387.072202326 + interval=3D0.010000000 ->=
next=3D1746910387.092202326; delta=3D0.000000205 ticks rounded to 2[clock =
0] now=3D1746910387.112202096, last=3D1746910387.092202326 + interval=3D0.0=
10000000 -> next=3D1746910387.112202326; delta=3D0.000000230 ticks rounded =
to 2[clock 0] now=3D1746910387.132201981, last=3D1746910387.112202326 + int=
erval=3D0.010000000 -> next=3D1746910387.132202326; delta=3D0.000000345 tic=
ks rounded to 2[clock 0] now=3D1746910387.152201905, last=3D1746910387.1322=
02326 + interval=3D0.010000000 -> next=3D1746910387.152202326; delta=3D0.00=
0000421 ticks rounded to 2[clock 0] now=3D1746910387.172201840, last=3D1746=
910387.152202326 + interval=3D0.010000000 -> next=3D1746910387.172202326; d=
elta=3D0.000000486 ticks rounded to 2[clock 0] now=3D1746910387.192201635, =
last=3D1746910387.172202326 + interval=3D0.010000000 -> next=3D1746910387.1=
92202326; delta=3D0.000000691 ticks rounded to 2[clock 0] now=3D1746910387.=
212201680, last=3D1746910387.192202326 + interval=3D0.010000000 -> next=3D1=
746910387.212202326; delta=3D0.000000646 ticks rounded to 2[clock 0] now=3D=
1746910387.232201515, last=3D1746910387.212202326 + interval=3D0.010000000 =
-> next=3D1746910387.232202326; delta=3D0.000000811 ticks rounded to 2[cloc=
k 0] now=3D1746910388.292197515, last=3D1746910388.282197580 + interval=3D0=
=2E010000000 -> next=3D1746910388.292197580; delta=3D0.000000065 ticks roun=
ded to 2[clock 0] now=3D1746910388.312197550, last=3D1746910388.292197580 +=
interval=3D0.010000000 -> next=3D1746910388.312197580; delta=3D0.000000030=
ticks rounded to 2[clock 0] now=3D1746910388.332197394, last=3D1746910388.=
312197580 + interval=3D0.010000000 -> next=3D1746910388.332197580; delta=3D=
0.000000186 ticks rounded to 2[clock 0] now=3D1746910388.352197480, last=3D=
1746910388.332197580 + interval=3D0.010000000 -> next=3D1746910388.35219758=
0; delta=3D0.000000100 ticks rounded to 2[clock 0] now=3D1746910388.3721973=
14, last=3D1746910388.352197580 + interval=3D0.010000000 -> next=3D17469103=
88.372197580; delta=3D0.000000266 ticks rounded to 2[clock 0] now=3D1746910=
388.392196879, last=3D1746910388.372197580 + interval=3D0.010000000 -> next=
=3D1746910388.392197580; delta=3D0.000000701 ticks rounded to 2[clock 0] no=
w=3D1746910388.412197174, last=3D1746910388.392197580 + interval=3D0.010000=
000 -> next=3D1746910388.412197580; delta=3D0.000000406 ticks rounded to 2[=
clock 0] now=3D1746910388.432197169, last=3D1746910388.412197580 + interval=
=3D0.010000000 -> next=3D1746910388.432197580; delta=3D0.000000411 ticks ro=
unded to 2[clock 0] now=3D1746910388.452197054, last=3D1746910388.432197580=
+ interval=3D0.010000000 -> next=3D1746910388.452197580; delta=3D0.0000005=
26 ticks rounded to 2[clock 0] now=3D1746910388.472196939, last=3D174691038=
8.452197580 + interval=3D0.010000000 -> next=3D1746910388.472197580; delta=
=3D0.000000641 ticks rounded to 2[clock 0] now=3D1746910388.492196974, last=
=3D1746910388.472197580 + interval=3D0.010000000 -> next=3D1746910388.49219=
7580; delta=3D0.000000606 ticks rounded to 2[clock 0] now=3D1746910388.5321=
96764, last=3D1746910388.512197580 + interval=3D0.010000000 -> next=3D17469=
10388.532197580; delta=3D0.000000816 ticks rounded to 2[clock 0] now=3D1746=
910388.552196728, last=3D1746910388.532197580 + interval=3D0.010000000 -> n=
ext=3D1746910388.552197580; delta=3D0.000000852 ticks rounded to 2[clock 0]=
now=3D1746910389.252194109, last=3D1746910389.252194147 + interval=3D0.010=
000000 -> next=3D1746910389.252194147; delta=3D0.000000038 ticks rounded to=
2[clock 0] now=3D1746910389.272193854, last=3D1746910389.252194147 + inter=
val=3D0.010000000 -> next=3D1746910389.272194147; delta=3D0.000000293 ticks=
rounded to 2[clock 0] now=3D1746910389.292193849, last=3D1746910389.272194=
147 + interval=3D0.010000000 -> next=3D1746910389.292194147; delta=3D0.0000=
00298 ticks rounded to 2[clock 0] now=3D1746910389.312193694, last=3D174691=
0389.292194147 + interval=3D0.010000000 -> next=3D1746910389.312194147; del=
ta=3D0.000000453 ticks rounded to 2[clock 0] now=3D1746910389.332193819, la=
st=3D1746910389.312194147 + interval=3D0.010000000 -> next=3D1746910389.332=
194147; delta=3D0.000000328 ticks rounded to 2[clock 0] now=3D1746910389.35=
2193734, last=3D1746910389.332194147 + interval=3D0.010000000 -> next=3D174=
6910389.352194147; delta=3D0.000000413 ticks rounded to 2[clock 0] now=3D17=
46910389.392193473, last=3D1746910389.372194147 + interval=3D0.010000000 ->=
next=3D1746910389.392194147; delta=3D0.000000674 ticks rounded to 2[clock =
0] now=3D1746910389.412193418, last=3D1746910389.392194147 + interval=3D0.0=
10000000 -> next=3D1746910389.412194147; delta=3D0.000000729 ticks rounded =
to 2[clock 0] now=3D1746910389.432193323, last=3D1746910389.412194147 + int=
erval=3D0.010000000 -> next=3D1746910389.432194147; delta=3D0.000000824 tic=
ks rounded to 2[clock 0] now=3D1746910389.452193368, last=3D1746910389.4321=
94147 + interval=3D0.010000000 -> next=3D1746910389.452194147; delta=3D0.00=
0000779 ticks rounded to 2[clock 0] now=3D1746910389.472193323, last=3D1746=
910389.452194147 + interval=3D0.010000000 -> next=3D1746910389.472194147; d=
elta=3D0.000000824 ticks rounded to 2[clock 0] now=3D1746910391.552185369, =
last=3D1746910391.542185429 + interval=3D0.010000000 -> next=3D1746910391.5=
52185429; delta=3D0.000000060 ticks rounded to 2[clock 0] now=3D1746910391.=
572185394, last=3D1746910391.552185429 + interval=3D0.010000000 -> next=3D1=
746910391.572185429; delta=3D0.000000035 ticks rounded to 2[clock 0] now=3D=
1746910391.592185269, last=3D1746910391.572185429 + interval=3D0.010000000 =
-> next=3D1746910391.592185429; delta=3D0.000000160 ticks rounded to 2[cloc=
k 0] now=3D1746910391.612185154, last=3D1746910391.592185429 + interval=3D0=
=2E010000000 -> next=3D1746910391.612185429; delta=3D0.000000275 ticks roun=
ded to 2[clock 0] now=3D1746910391.632185199, last=3D1746910391.612185429 +=
interval=3D0.010000000 -> next=3D1746910391.632185429; delta=3D0.000000230=
ticks rounded to 2[clock 0] now=3D1746910391.652185074, last=3D1746910391.=
632185429 + interval=3D0.010000000 -> next=3D1746910391.652185429; delta=3D=
0.000000355 ticks rounded to 2[clock 0] now=3D1746910391.672185189, last=3D=
1746910391.652185429 + interval=3D0.010000000 -> next=3D1746910391.67218542=
9; delta=3D0.000000240 ticks rounded to 2[clock 0] now=3D1746910391.6921849=
53, last=3D1746910391.672185429 + interval=3D0.010000000 -> next=3D17469103=
91.692185429; delta=3D0.000000476 ticks rounded to 2[clock 0] now=3D1746910=
391.712184718, last=3D1746910391.692185429 + interval=3D0.010000000 -> next=
=3D1746910391.712185429; delta=3D0.000000711 ticks rounded to 2[clock 0] no=
w=3D1746910392.492181847, last=3D1746910392.482181852 + interval=3D0.010000=
000 -> next=3D1746910392.492181852; delta=3D0.000000005 ticks rounded to 2[=
clock 0] now=3D1746910392.512181802, last=3D1746910392.492181852 + interval=
=3D0.010000000 -> next=3D1746910392.512181852; delta=3D0.000000050 ticks ro=
unded to 2[clock 0] now=3D1746910392.532181707, last=3D1746910392.512181852=
+ interval=3D0.010000000 -> next=3D1746910392.532181852; delta=3D0.0000001=
45 ticks rounded to 2[clock 0] now=3D1746910392.552181762, last=3D174691039=
2.532181852 + interval=3D0.010000000 -> next=3D1746910392.552181852; delta=
=3D0.000000090 ticks rounded to 2[clock 0] now=3D1746910392.572181556, last=
=3D1746910392.552181852 + interval=3D0.010000000 -> next=3D1746910392.57218=
1852; delta=3D0.000000296 ticks rounded to 2[clock 0] now=3D1746910392.5921=
81431, last=3D1746910392.572181852 + interval=3D0.010000000 -> next=3D17469=
10392.592181852; delta=3D0.000000421 ticks rounded to 2[clock 0] now=3D1746=
910392.612181386, last=3D1746910392.592181852 + interval=3D0.010000000 -> n=
ext=3D1746910392.612181852; delta=3D0.000000466 ticks rounded to 2[clock 0]=
now=3D1746910392.632181381, last=3D1746910392.612181852 + interval=3D0.010=
000000 -> next=3D1746910392.632181852; delta=3D0.000000471 ticks rounded to=
2[clock 0] now=3D1746910392.652181196, last=3D1746910392.632181852 + inter=
val=3D0.010000000 -> next=3D1746910392.652181852; delta=3D0.000000656 ticks=
rounded to 2[clock 0] now=3D1746910393.282178998, last=3D1746910393.272179=
026 + interval=3D0.010000000 -> next=3D1746910393.282179026; delta=3D0.0000=
00028 ticks rounded to 2[clock 0] now=3D1746910393.302178773, last=3D174691=
0393.282179026 + interval=3D0.010000000 -> next=3D1746910393.302179026; del=
ta=3D0.000000253 ticks rounded to 2[clock 0] now=3D1746910393.322178928, la=
st=3D1746910393.302179026 + interval=3D0.010000000 -> next=3D1746910393.322=
179026; delta=3D0.000000098 ticks rounded to 2[clock 0] now=3D1746910393.34=
2178893, last=3D1746910393.322179026 + interval=3D0.010000000 -> next=3D174=
6910393.342179026; delta=3D0.000000133 ticks rounded to 2[clock 0] now=3D17=
46910393.362178577, last=3D1746910393.342179026 + interval=3D0.010000000 ->=
next=3D1746910393.362179026; delta=3D0.000000449 ticks rounded to 2[clock =
0] now=3D1746910393.382178482, last=3D1746910393.362179026 + interval=3D0.0=
10000000 -> next=3D1746910393.382179026; delta=3D0.000000544 ticks rounded =
to 2[clock 0] now=3D1746910394.102175736, last=3D1746910394.102176034 + int=
erval=3D0.010000000 -> next=3D1746910394.102176034; delta=3D0.000000298 tic=
ks rounded to 2[clock 0] now=3D1746910394.122175781, last=3D1746910394.1021=
76034 + interval=3D0.010000000 -> next=3D1746910394.122176034; delta=3D0.00=
0000253 ticks rounded to 2[clock 0] now=3D1746910394.142175936, last=3D1746=
910394.122176034 + interval=3D0.010000000 -> next=3D1746910394.142176034; d=
elta=3D0.000000098 ticks rounded to 2[clock 0] now=3D1746910394.162175580, =
last=3D1746910394.142176034 + interval=3D0.010000000 -> next=3D1746910394.1=
62176034; delta=3D0.000000454 ticks rounded to 2[clock 0] now=3D1746910394.=
182175495, last=3D1746910394.162176034 + interval=3D0.010000000 -> next=3D1=
746910394.182176034; delta=3D0.000000539 ticks rounded to 2[clock 0] now=3D=
1746910394.202175420, last=3D1746910394.182176034 + interval=3D0.010000000 =
-> next=3D1746910394.202176034; delta=3D0.000000614 ticks rounded to 2[cloc=
k 0] now=3D1746910394.222175365, last=3D1746910394.202176034 + interval=3D0=
=2E010000000 -> next=3D1746910394.222176034; delta=3D0.000000669 ticks roun=
ded to 2[clock 0] now=3D1746910394.242175240, last=3D1746910394.222176034 +=
interval=3D0.010000000 -> next=3D1746910394.242176034; delta=3D0.000000794=
ticks rounded to 2[clock 0] now=3D1746910394.262175185, last=3D1746910394.=
242176034 + interval=3D0.010000000 -> next=3D1746910394.262176034; delta=3D=
0.000000849 ticks rounded to 2[clock 0] now=3D1746910394.282175150, last=3D=
1746910394.262176034 + interval=3D0.010000000 -> next=3D1746910394.28217603=
4; delta=3D0.000000884 ticks rounded to 2[clock 0] now=3D1746910394.3621748=
39, last=3D1746910394.352175014 + interval=3D0.010000000 -> next=3D17469103=
94.362175014; delta=3D0.000000175 ticks rounded to 2[clock 0] now=3D1746910=
394.382174774, last=3D1746910394.372174839 + interval=3D0.010000000 -> next=
=3D1746910394.382174839; delta=3D0.000000065 ticks rounded to 2[clock 0] no=
w=3D1746910395.172171864, last=3D1746910395.162171999 + interval=3D0.010000=
000 -> next=3D1746910395.172171999; delta=3D0.000000135 ticks rounded to 2[=
clock 0] now=3D1746910395.192171668, last=3D1746910395.172171999 + interval=
=3D0.010000000 -> next=3D1746910395.192171999; delta=3D0.000000331 ticks ro=
unded to 2[clock 0] now=3D1746910395.212171693, last=3D1746910395.192171999=
+ interval=3D0.010000000 -> next=3D1746910395.212171999; delta=3D0.0000003=
06 ticks rounded to 2[clock 0] now=3D1746910395.232171348, last=3D174691039=
5.212171999 + interval=3D0.010000000 -> next=3D1746910395.232171999; delta=
=3D0.000000651 ticks rounded to 2[clock 0] now=3D1746910395.252171142, last=
=3D1746910395.232171999 + interval=3D0.010000000 -> next=3D1746910395.25217=
1999; delta=3D0.000000857 ticks rounded to 2[clock 0] now=3D1746910395.2721=
71388, last=3D1746910395.252171999 + interval=3D0.010000000 -> next=3D17469=
10395.272171999; delta=3D0.000000611 ticks rounded to 2[clock 0] now=3D1746=
910395.292171403, last=3D1746910395.272171999 + interval=3D0.010000000 -> n=
ext=3D1746910395.292171999; delta=3D0.000000596 ticks rounded to 2[clock 0]=
now=3D1746910395.312171337, last=3D1746910395.292171999 + interval=3D0.010=
000000 -> next=3D1746910395.312171999; delta=3D0.000000662 ticks rounded to=
2[clock 0] now=3D1746910395.332171202, last=3D1746910395.312171999 + inter=
val=3D0.010000000 -> next=3D1746910395.332171999; delta=3D0.000000797 ticks=
rounded to 2[clock 0] now=3D1746910395.352171107, last=3D1746910395.332171=
999 + interval=3D0.010000000 -> next=3D1746910395.352171999; delta=3D0.0000=
00892 ticks rounded to 2[clock 0] now=3D1746910395.372171012, last=3D174691=
0395.352171999 + interval=3D0.010000000 -> next=3D1746910395.372171999; del=
ta=3D0.000000987 ticks rounded to 2[clock 0] now=3D1746910395.412170951, la=
st=3D1746910395.402170957 + interval=3D0.010000000 -> next=3D1746910395.412=
170957; delta=3D0.000000006 ticks rounded to 2[clock 0] now=3D1746910395.45=
2170290, last=3D1746910395.442170957 + interval=3D0.010000000 -> next=3D174=
6910395.452170957; delta=3D0.000000667 ticks rounded to 2[clock 0] now=3D17=
46910395.472170626, last=3D1746910395.452170957 + interval=3D0.010000000 ->=
next=3D1746910395.472170957; delta=3D0.000000331 ticks rounded to 2[clock =
0] now=3D1746910395.492170550, last=3D1746910395.472170957 + interval=3D0.0=
10000000 -> next=3D1746910395.492170957; delta=3D0.000000407 ticks rounded =
to 2[clock 0] now=3D1746910395.512170626, last=3D1746910395.492170957 + int=
erval=3D0.010000000 -> next=3D1746910395.512170957; delta=3D0.000000331 tic=
ks rounded to 2[clock 0] now=3D1746910395.532169969, last=3D1746910395.5121=
70957 + interval=3D0.010000000 -> next=3D1746910395.532170957; delta=3D0.00=
0000988 ticks rounded to 2[clock 0] now=3D1746910395.552170435, last=3D1746=
910395.532170957 + interval=3D0.010000000 -> next=3D1746910395.552170957; d=
elta=3D0.000000522 ticks rounded to 2[clock 0] now=3D1746910395.572170270, =
last=3D1746910395.552170957 + interval=3D0.010000000 -> next=3D1746910395.5=
72170957; delta=3D0.000000687 ticks rounded to 2[clock 0] now=3D1746910395.=
592170114, last=3D1746910395.572170957 + interval=3D0.010000000 -> next=3D1=
746910395.592170957; delta=3D0.000000843 ticks rounded to 2[clock 0] now=3D=
1746910395.822169400, last=3D1746910395.812169500 + interval=3D0.010000000 =
-> next=3D1746910395.822169500; delta=3D0.000000100 ticks rounded to 2[cloc=
k 0] now=3D1746910395.842169355, last=3D1746910395.822169500 + interval=3D0=
=2E010000000 -> next=3D1746910395.842169500; delta=3D0.000000145 ticks roun=
ded to 2[clock 0] now=3D1746910395.862169259, last=3D1746910395.842169500 +=
interval=3D0.010000000 -> next=3D1746910395.862169500; delta=3D0.000000241=
ticks rounded to 2[clock 0] now=3D1746910395.882169094, last=3D1746910395.=
862169500 + interval=3D0.010000000 -> next=3D1746910395.882169500; delta=3D=
0.000000406 ticks rounded to 2[clock 0] now=3D1746910395.902169099, last=3D=
1746910395.882169500 + interval=3D0.010000000 -> next=3D1746910395.90216950=
0; delta=3D0.000000401 ticks rounded to 2[clock 0] now=3D1746910395.9221691=
84, last=3D1746910395.902169500 + interval=3D0.010000000 -> next=3D17469103=
95.922169500; delta=3D0.000000316 ticks rounded to 2[clock 0] now=3D1746910=
395.9
and then later the panic line for this afternoon:
May 11 14:21:05 exadelic savecore: reboot after panic: panic: kernel diagno=
stic assertion "*timo > 0" failed: file "/usr/src/sys/kern/subr_time.c", li=
ne 261=20
(gdb) target kvm netbsd.16.core
0xffffffff80239b85 in cpu_reboot (howto=3Dhowto@entry=3D260, bootstr=3Dboot=
str@entry=3D0x0) at /usr/src/sys/arch/amd64/amd64/machdep.c:709
warning: 709 /usr/src/sys/arch/amd64/amd64/machdep.c: No such file or di=
rectory
(gdb) bt
#0 0xffffffff80239b85 in cpu_reboot (howto=3Dhowto@entry=3D260, bootstr=3D=
bootstr@entry=3D0x0) at /usr/src/sys/arch/amd64/amd64/machdep.c:709
#1 0xffffffff80de3b15 in kern_reboot (howto=3Dhowto@entry=3D260, bootstr=
=3Dbootstr@entry=3D0x0) at /usr/src/sys/kern/kern_reboot.c:91
#2 0xffffffff80e29be3 in vpanic (fmt=3D0xffffffff81427440 "kernel %sassert=
ion \"%s\" failed: file \"%s\", line %d ", ap=3Dap@entry=3D0xffff85a4b2730e=
48) at /usr/src/sys/kern/subr_prf.c:288
#3 0xffffffff80ff848c in kern_assert (fmt=3Dfmt@entry=3D0xffffffff81427440=
"kernel %sassertion \"%s\" failed: file \"%s\", line %d ") at /usr/src/sys=
/lib/libkern/kern_assert.c:51
#4 0xffffffff80e2c0fd in ts2timo (clock_id=3Dclock_id@entry=3D3, flags=3Df=
lags@entry=3D0, ts=3Dts@entry=3D0xffff85a4b2730f70, timo=3Dtimo@entry=3D0xf=
fff85a4b2730eec, start=3Dstart@entry=3D0xffff85a4b2730ef0)
at /usr/src/sys/kern/subr_time.c:261
#5 0xffffffff80dffb54 in nanosleep1 (l=3Dl@entry=3D0xffff80851713a800, clo=
ck_id=3Dclock_id@entry=3D3, flags=3Dflags@entry=3D0, rqt=3Drqt@entry=3D0xff=
ff85a4b2730f70, rmt=3D0xffff85a4b2730f60)
at /usr/src/sys/kern/kern_time.c:420
#6 0xffffffff80dffdbb in sys___nanosleep50 (l=3D0xffff80851713a800, uap=3D=
0xffff85a4b2731000, retval=3D<optimized out>) at /usr/src/sys/kern/kern_tim=
e.c:373
#7 0xffffffff805b7cc1 in sy_call (rval=3D0xffff85a4b2730fb0, uap=3D0xffff8=
5a4b2731000, l=3D0xffff80851713a800, sy=3D0xffffffff8188a3d0 <sysent+10320>=
) at /usr/src/sys/sys/syscallvar.h:65
#8 sy_invoke (code=3D430, rval=3D0xffff85a4b2730fb0, uap=3D0xffff85a4b2731=
000, l=3D0xffff80851713a800, sy=3D0xffffffff8188a3d0 <sysent+10320>) at /us=
r/src/sys/sys/syscallvar.h:94
#9 syscall (frame=3D0xffff85a4b2731000) at /usr/src/sys/arch/x86/x86/sysca=
ll.c:137
#10 0xffffffff8021025d in handle_syscall ()
(gdb) fr 4
#4 0xffffffff80e2c0fd in ts2timo (clock_id=3Dclock_id@entry=3D3, flags=3Df=
lags@entry=3D0, ts=3Dts@entry=3D0xffff85a4b2730f70, timo=3Dtimo@entry=3D0xf=
fff85a4b2730eec, start=3Dstart@entry=3D0xffff85a4b2730ef0)
at /usr/src/sys/kern/subr_time.c:261
warning: 261 /usr/src/sys/kern/subr_time.c: No such file or directory
(gdb) p *ts
$1 =3D {tv_sec =3D 9223372036854775807, tv_nsec =3D 999999999}
(gdb) p *timo
$2 =3D 0
(gdb) p *start
$3 =3D {tv_sec =3D 81605, tv_nsec =3D 753968635}
(gdb)
Thomas
From: Taylor R Campbell <riastradh@NetBSD.org>
To: Thomas Klausner <wiz@NetBSD.org>, Patrick Welche <prlw1@welche.eu>
Cc: gnats-bugs@NetBSD.org, netbsd-bugs@NetBSD.org
Subject: Re: kern/59339: heartbeat watchdog fires since 10.99.14
Date: Mon, 12 May 2025 02:26:47 +0000
Sixth time's the charm, right? (This time I checked to make sure all
the printfs end with \n! And I have removed the noisy one which is
evidently no longer valuable.)
From: Taylor R Campbell <riastradh@NetBSD.org>
To: Thomas Klausner <wiz@NetBSD.org>, Patrick Welche <prlw1@welche.eu>
Cc: gnats-bugs@NetBSD.org, netbsd-bugs@NetBSD.org
Subject: Re: kern/59339: heartbeat watchdog fires since 10.99.14
Date: Mon, 12 May 2025 02:27:35 +0000
This is a multi-part message in MIME format.
--=_9mocicosntHKdkPBqLK6zsx2at0gzuB8
Sixth time's the charm, right? (This time I checked to make sure all
the printfs end with \n! And I have removed the noisy one which is
evidently no longer valuable.)
--=_9mocicosntHKdkPBqLK6zsx2at0gzuB8
Content-Type: text/plain; charset="ISO-8859-1"; name="pr59339-itimercalloutassert-v6"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment; filename="pr59339-itimercalloutassert-v6.patch"
diff -r 55a6ded9e19e -r 3a9986775d9a sys/kern/kern_time.c
--- a/sys/kern/kern_time.c Thu May 08 20:51:40 2025 +0000
+++ b/sys/kern/kern_time.c Sun May 11 23:38:46 2025 +0000
@@ -837,6 +837,91 @@ itimer_arm_real(struct itimer * const it
}
=20
/*
+ * itimer_rearm_real:
+ *
+ * Re-arm a non-virtual timer, given the current clock readout.
+ */
+static void
+itimer_rearm_real(struct itimer * const it, const struct timespec * const =
now,
+ const struct timespec * const last)
+{
+ const struct timespec * const interval =3D &it->it_time.it_interval;
+ const struct timespec * const next =3D &it->it_time.it_value;
+ struct timespec delta;
+ int ticks;
+
+ KASSERT(!it->it_dying);
+ KASSERT(!CLOCK_VIRTUAL_P(it->it_clockid));
+ KASSERT(!callout_pending(&it->it_ch));
+
+ if (__predict_true(timespecisset(next))) {
+ struct timespec now1;
+
+ if (it->it_clockid =3D=3D CLOCK_MONOTONIC) {
+ getnanouptime(&now1);
+ } else {
+ getnanotime(&now1);
+ }
+ if (timespeccmp(next, &now1, <=3D)) {
+ printf("PR kern/59339 (heartbeat watchdog fires"
+ " since 10.99.14) would fire:"
+ " [clock %u]"
+ " [now=3D%lld.%09ld -> %lld.%09ld],"
+ " last=3D%lld.%09ld + interval=3D%lld.%09ld"
+ " -> next=3D%lld.%09ld\n",
+ it->it_clockid,
+ (long long)now->tv_sec, (long)now->tv_nsec,
+ (long long)now1.tv_sec, (long)now1.tv_nsec,
+ (long long)last->tv_sec, (long)last->tv_nsec,
+ (long long)interval->tv_sec,
+ (long)interval->tv_nsec,
+ (long long)next->tv_sec, (long)next->tv_nsec);
+ }
+
+ KASSERTMSG(timespeccmp(now, next, <),
+ "[clock %u]"
+ " now=3D%lld.%09ld,"
+ " last=3D%lld.%09ld + interval=3D%lld.%09ld"
+ " -> next=3D%lld.%09ld",
+ it->it_clockid,
+ (long long)now->tv_sec, (long)now->tv_nsec,
+ (long long)last->tv_sec, (long)last->tv_nsec,
+ (long long)interval->tv_sec, (long)interval->tv_nsec,
+ (long long)next->tv_sec, (long)next->tv_nsec);
+ timespecsub(next, now, &delta);
+ } else {
+ /*
+ * If the arithmetic overflowed, just take the interval
+ * as the time to wait. This is unlikely to happen
+ * unless you are messing with your clock to set it
+ * ahead by hundreds of years.
+ */
+ printf("[clock %u]"
+ " itimer arithmetic overflowed:"
+ " now=3D%lld.%09ld, last=3D%lld.%09ld + interval=3D%lld.%09ld\n",
+ it->it_clockid,
+ (long long)now->tv_sec, (long)now->tv_nsec,
+ (long long)last->tv_sec, (long)last->tv_nsec,
+ (long long)interval->tv_sec, (long)interval->tv_nsec);
+ delta =3D it->it_time.it_interval; /* overflow */
+ }
+ ticks =3D tstohz(&delta);
+ KASSERTMSG(ticks > 0,
+ "[clock %u]"
+ " now=3D%lld.%09ld,"
+ " last=3D%lld.%09ld + interval=3D%lld.%09ld -> next=3D%lld.%09ld;"
+ " delta=3D%lld.%09ld ticks=3D%d",
+ it->it_clockid,
+ (long long)now->tv_sec, (long)now->tv_nsec,
+ (long long)last->tv_sec, (long)last->tv_nsec,
+ (long long)interval->tv_sec, (long)interval->tv_nsec,
+ (long long)next->tv_sec, (long)next->tv_nsec,
+ (long long)delta.tv_sec, (long)delta.tv_nsec,
+ ticks);
+ callout_schedule(&it->it_ch, ticks);
+}
+
+/*
* itimer_callout:
*
* Callout to expire a non-virtual timer. Queue it up for processing,
@@ -848,7 +933,7 @@ itimer_arm_real(struct itimer * const it
static void
itimer_callout(void *arg)
{
- struct timespec now, next;
+ struct timespec last, now, next;
struct itimer * const it =3D arg;
int overruns;
=20
@@ -871,7 +956,20 @@ itimer_callout(void *arg)
* Given the current itimer value and interval and the time
* now, compute the next itimer value and count overruns.
*/
+ last =3D it->it_time.it_value;
itimer_transition(&it->it_time, &now, &next, &overruns);
+ KASSERTMSG(timespeccmp(&now, &next, <),
+ "[clock %u]"
+ " it->it_time.it_value=3D%lld.%09ld"
+ " it->it_time.it_interval=3D%lld.%09ld"
+ " now=3D%lld.%09ld next=3D%lld.%09ld",
+ it->it_clockid,
+ (long long)it->it_time.it_value.tv_sec,
+ (long)it->it_time.it_value.tv_nsec,
+ (long long)it->it_time.it_interval.tv_sec,
+ (long)it->it_time.it_interval.tv_nsec,
+ (long long)now.tv_sec, (long)now.tv_nsec,
+ (long long)next.tv_sec, (long)next.tv_nsec);
it->it_time.it_value =3D next;
it->it_overruns +=3D overruns;
=20
@@ -879,7 +977,7 @@ itimer_callout(void *arg)
* Reset the callout, if it's not going away.
*/
if (!it->it_dying)
- itimer_arm_real(it);
+ itimer_rearm_real(it, &now, &last);
itimer_unlock();
}
=20
diff -r 55a6ded9e19e -r 3a9986775d9a sys/kern/subr_time.c
--- a/sys/kern/subr_time.c Thu May 08 20:51:40 2025 +0000
+++ b/sys/kern/subr_time.c Sun May 11 23:38:46 2025 +0000
@@ -86,17 +86,29 @@ tshztoup(const struct timespec *tsp)
=20
/*
* Compute number of ticks in the specified amount of time.
+ *
+ * Round up, clamped to INT_MAX. Return 0 iff ts <=3D 0.
*/
int
tstohz(const struct timespec *ts)
{
struct timeval tv;
=20
+ KASSERT(ts->tv_nsec >=3D 0);
+ KASSERT(ts->tv_nsec < 1000000000);
+
/*
* usec has great enough resolution for hz, so convert to a
* timeval and use tvtohz() above.
*/
- TIMESPEC_TO_TIMEVAL(&tv, ts);
+ tv.tv_sec =3D ts->tv_sec;
+ tv.tv_usec =3D (ts->tv_nsec + 999)/1000;
+ if (tv.tv_usec >=3D 1000000) {
+ if (__predict_false(tv.tv_sec =3D=3D __type_max(time_t)))
+ return INT_MAX;
+ tv.tv_sec++;
+ tv.tv_usec -=3D 1000000;
+ }
return tvtohz(&tv);
}
=20
@@ -248,7 +260,8 @@ ts2timo(clockid_t clock_id, int flags, s
return ETIMEDOUT;
=20
*timo =3D tstohz(ts);
- KASSERT(*timo > 0);
+ KASSERTMSG(*timo > 0, "ts=3D%lld.%09ld *timo=3D%d",
+ (long long)ts->tv_sec, (long)ts->tv_nsec, *timo);
=20
return 0;
}
--=_9mocicosntHKdkPBqLK6zsx2at0gzuB8--
From: Patrick Welche <prlw1@welche.eu>
To: gnats-bugs@netbsd.org
Cc: prlw1@cam.ac.uk, riastradh@NetBSD.org, wiz@NetBSD.org
Subject: Re: kern/59339: heartbeat watchdog fires since 10.99.14
Date: Sat, 7 Jun 2025 10:33:41 +0100
On Mon, May 12, 2025 at 02:30:04AM +0000, Taylor R Campbell via gnats wrote:
> Sixth time's the charm, right?
Apparently so - no hang-ups in nearly a month!
From: Thomas Klausner <wiz@NetBSD.org>
To: NetBSD bugtracking <gnats-bugs@NetBSD.org>
Cc:
Subject: Re: kern/59339: heartbeat watchdog fires since 10.99.14
Date: Sat, 7 Jun 2025 11:36:57 +0200
On Sat, Jun 07, 2025 at 10:33:41AM +0100, Patrick Welche wrote:
> On Mon, May 12, 2025 at 02:30:04AM +0000, Taylor R Campbell via gnats wrote:
> > Sixth time's the charm, right?
>
> Apparently so - no hang-ups in nearly a month!
Not for me either.
Thomas
State-Changed-From-To: feedback->open
State-Changed-By: prlw1@NetBSD.org
State-Changed-When: Sat, 21 Jun 2025 07:41:57 +0000
State-Changed-Why:
positive feedback provided. OK to commit patch?
From: "David H. Gutteridge" <david@gutteridge.ca>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/59339: heartbeat watchdog fires since 10.99.14
Date: Fri, 29 Aug 2025 20:30:11 -0400
I'd opened PR kern/59465 with what seemed to be specific reproducers
(for me). Later I came across this PR and applied the last version of
the patches there. I haven't been able to reproduce the issue with those
patches in place, so I'm guessing my PR is a duplicate of this.
Thanks,
Dave
From: Taylor R Campbell <riastradh@NetBSD.org>
To: prlw1@NetBSD.org, prlw1@cam.ac.uk, wiz@NetBSD.org
Cc: gnats-bugs@NetBSD.org, netbsd-bugs@NetBSD.org
Subject: Re: kern/59339 (heartbeat watchdog fires since 10.99.14)
Date: Sun, 5 Oct 2025 14:03:46 +0000
> Date: Sat, 21 Jun 2025 07:41:57 +0000 (UTC)
> From: prlw1@NetBSD.org
>=20
> positive feedback provided. OK to commit patch?
Those of you who have been running with the latest patch (v6): have
you seen any of the prints in dmesg?
They would look something like this (with different numbers):
PR kern/59339 (heartbeat watchdog fires since 10.99.14) would fire: [clock =
3] [now=3D123.123450000 -> 123.123460000], last=3D123.123445000 + interval=
=3D0.000010000 -> next=3D123.123455000
Or (extremely unlikely unless you're fiddling with your clock to set
it forward by centuries):
[clock 0] itimer arithmetic overflowed: now=3D9223372036854775798.000000000=
, last=3D123.456789123 + interval=3D20.000000000
I think the part of the change that avoids these conditions is worth
keeping anyway but I'm curious to see if it is happening in practice.
From: "David H. Gutteridge" <david@gutteridge.ca>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/59339 (heartbeat watchdog fires since 10.99.14)
Date: Sun, 05 Oct 2025 16:04:49 -0400
On Sun, 5 Oct 2025 at 14:03:46 +0000, Taylor R Campbell wrote:
> Those of you who have been running with the latest patch (v6): have
> you seen any of the prints in dmesg?
>=20
> They would look something like this (with different numbers):
>=20
> PR kern/59339 (heartbeat watchdog fires since 10.99.14) would fire:
> [clock 3] [now=3D123.123450000 -> 123.123460000], last=3D123.123445000 +
> interval=3D0.000010000 -> next=3D123.123455000
>=20
> Or (extremely unlikely unless you're fiddling with your clock to set
> it forward by centuries):
>=20
> [clock 0] itimer arithmetic overflowed:
> now=3D9223372036854775798.000000000, last=3D123.456789123 +
> interval=3D20.000000000
>=20
> I think the part of the change that avoids these conditions is worth
> keeping anyway but I'm curious to see if it is happening in practice.
I haven't seen either of these. Presently running 11.99.3 with the v6
patches applied. (In my case current uptime about two weeks, a mix of
idling and intense activity building pkgsrc packages.)
Thanks,
Dave
State-Changed-From-To: open->closed
State-Changed-By: riastradh@NetBSD.org
State-Changed-When: Mon, 06 Oct 2025 13:21:41 +0000
State-Changed-Why:
fixed in HEAD, individual issues for pullups tracked in separate PRs,
mainly:
PR kern/59692: tstohz(9) fails to round up on some inputs
https://gnats.NetBSD.org/59691
From: Patrick Welche <prlw1@welche.eu>
To: gnats-bugs@netbsd.org
Cc: prlw1@cam.ac.uk, riastradh@NetBSD.org, wiz@NetBSD.org
Subject: Re: kern/59339 (heartbeat watchdog fires since 10.99.14)
Date: Mon, 6 Oct 2025 17:14:06 +0100
On Sun, Oct 05, 2025 at 02:05:01PM +0000, Taylor R Campbell via gnats wrote:
> Those of you who have been running with the latest patch (v6): have
> you seen any of the prints in dmesg?
>
> They would look something like this (with different numbers):
>
> PR kern/59339 (heartbeat watchdog fires since 10.99.14) would fire: [clock =
> 3] [now=3D123.123450000 -> 123.123460000], last=3D123.123445000 + interval=
> =3D0.000010000 -> next=3D123.123455000
>
> Or (extremely unlikely unless you're fiddling with your clock to set
> it forward by centuries):
>
> [clock 0] itimer arithmetic overflowed: now=3D9223372036854775798.000000000=
> , last=3D123.456789123 + interval=3D20.000000000
>
> I think the part of the change that avoids these conditions is worth
> keeping anyway but I'm curious to see if it is happening in practice.
No - all quiet in all these months, and for sure back to 14 September:
$ zgrep heartbeat /var/log/messages*z
$
>Unformatted:
(Contact us)
$NetBSD: query-full-pr,v 1.47 2022/09/11 19:34:41 kim Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2025
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.