NetBSD Problem Report #58011

From www@netbsd.org  Fri Mar  8 23:18:51 2024
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id C7CA91A923B
	for <gnats-bugs@gnats.NetBSD.org>; Fri,  8 Mar 2024 23:18:50 +0000 (UTC)
Message-Id: <20240308231849.778B41A923C@mollari.NetBSD.org>
Date: Fri,  8 Mar 2024 23:18:49 +0000 (UTC)
From: campbell+netbsd@mumble.net
Reply-To: campbell+netbsd@mumble.net
To: gnats-bugs@NetBSD.org
Subject: delay in panic reboot path can trigger heartbeat(9) messages in a loop
X-Send-Pr-Version: www-1.0

>Number:         58011
>Category:       kern
>Synopsis:       delay in panic reboot path can trigger heartbeat(9) messages in a loop
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri Mar 08 23:20:00 +0000 2024
>Closed-Date:    Sun Mar 24 16:49:08 +0000 2024
>Last-Modified:  Sun Mar 24 16:49:08 +0000 2024
>Originator:     Taylor R Campbell
>Release:        current
>Organization:
The HeartBSD Foundation
>Environment:
>Description:
If cpu0 panics but takes a while to reboot, hardclock ticks may continue to fire on (say) cpu1, leading heartbeat(9) to notice that cpu0 appears not to be making progress.

Since panicstr is already set, cpu1's IPI to cpu0 doesn't trigger another heartbeat panic.  But it does print a useless message to the console on every tick -- at hz=100, a hundred times per second -- which quickly floods the dmesg history.

It seems to me that the hardclock timer should stop running on all CPUs at this point.  However, short of that, there's no need for heartbeat(9) checks to happen at all after panicstr is already set -- they probably won't help diagnose anything.
>How-To-Repeat:
Possibly by triggering a panic on a machine where crash dumps are configured but take longer than 15 seconds to complete.
>Fix:
return early from heartbeat() if panicstr != NULL
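
A minimal user-space sketch of the idea, not the actual kern_heartbeat.c code.  It models one hardclock tick on a healthy CPU while another CPU appears stalled, and counts the console messages that tick would print.  The names heartbeat_tick and count_messages, the early_return flag, and the one-message-per-tick behaviour are assumptions made for illustration; only panicstr and hz come from the report.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model of one heartbeat check on a healthy CPU while
 * another CPU appears stalled.  Returns the number of console
 * messages the check would print.  `panicstr` mirrors the kernel
 * variable of the same name: non-NULL once a panic is in progress.
 */
static unsigned
heartbeat_tick(const char *panicstr, bool early_return)
{
	/* Proposed fix: skip the check entirely once a panic has begun. */
	if (early_return && panicstr != NULL)
		return 0;

	/*
	 * Assumed pre-fix behaviour: the stalled CPU is noticed and a
	 * message is printed on every tick, even though no second
	 * panic is raised while panicstr is set.
	 */
	return 1;
}

/* Count the messages printed over `seconds` of ticks at rate `hz`. */
static unsigned
count_messages(unsigned hz, unsigned seconds, const char *panicstr,
    bool early_return)
{
	unsigned total = 0;

	for (unsigned i = 0; i < hz * seconds; i++)
		total += heartbeat_tick(panicstr, early_return);
	return total;
}
```

Under these assumptions, ten seconds of ticks at hz=100 with panicstr set yields 1000 messages without the early return and none with it.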

>Release-Note:

>Audit-Trail:
From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/58011 CVS commit: src/sys/kern
Date: Fri, 8 Mar 2024 23:34:03 +0000

 Module Name:	src
 Committed By:	riastradh
 Date:		Fri Mar  8 23:34:03 UTC 2024

 Modified Files:
 	src/sys/kern: kern_heartbeat.c

 Log Message:
 heartbeat(9): Return early if panicstr is set.

 This way we avoid doing unnecessary work -- and print unnecessary
 messages -- to _not_ trigger another panic anyway.

 PR kern/58011


 To generate a diff of this commit:
 cvs rdiff -u -r1.12 -r1.13 src/sys/kern/kern_heartbeat.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: open->closed
State-Changed-By: maya@NetBSD.org
State-Changed-When: Sun, 24 Mar 2024 16:49:08 +0000
State-Changed-Why:
"Should be closed" - Riastradh


>Unformatted:


Copyright © 1994-2024 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.