NetBSD Problem Report #34101
From jld@panix.com Fri Jul 28 03:14:00 2006
Return-Path: <jld@panix.com>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
by narn.NetBSD.org (Postfix) with ESMTP id 1BC6063B850
for <gnats-bugs@gnats.NetBSD.org>; Fri, 28 Jul 2006 03:14:00 +0000 (UTC)
Message-Id: <200607280313.k6S3Dvp26825@byzantium.nyc.access.net>
Date: Thu, 27 Jul 2006 23:13:57 -0400 (EDT)
From: jld@panix.com
Reply-To: jld@panix.com
To: gnats-bugs@NetBSD.org
Subject: ltsleep during panic hangs system
X-Send-Pr-Version: 3.95
>Number: 34101
>Category: kern
>Synopsis: ltsleep during panic hangs system
>Confidential: no
>Severity: serious
>Priority: high
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Fri Jul 28 03:15:00 +0000 2006
>Originator: Jed Davis
>Release: NetBSD 3.0
>Organization:
PANIX Public Access Internet and UNIX, NYC
>Environment:
System: NetBSD panix3.panix.com 3.0 NetBSD 3.0 (PANIX-FIVE) #0: Fri Apr 14 21:05:29 EDT 2006 root@juggler.panix.com:/devel/netbsd/3.0/src/sys/arch/i386/compile/PANIX-FIVE i386
Architecture: i386
Machine: i386
>Description:
The top of ltsleep() contains this:
	/*
	 * XXXSMP
	 * This is probably bogus.  Figure out what the right
	 * thing to do here really is.
	 * Note that not sleeping if ltsleep is called with curlwp == NULL
	 * in the shutdown case is disgusting but partly necessary given
	 * how shutdown (barely) works.
	 */
	if (cold || (doing_shutdown && (panicstr || (l == NULL)))) {
		/*
		 * After a panic, or during autoconfiguration,
		 * just give interrupts a chance, then just return;
		 * don't run any other procs or panic below,
		 * in case this is the idle process and already asleep.
		 */
The problem is that, if the system is panicking and trying to reboot
(which may include an attempt to sync disks), and a kernel thread that
loops calling ltsleep() to wait for work (e.g., aiodoned, or i386's MD
apm_thread) gets woken up, every subsequent ltsleep() call returns
immediately instead of sleeping. The thread then spins in its loop
forever and the system never succeeds in rebooting.
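
As a rough illustration of this failure mode, here is a small userspace
model; it is not kernel code. The sim_* variables are hypothetical
stand-ins for the kernel's cold, doing_shutdown and panicstr, and
sim_ltsleep() models only the early-return check quoted above. The
worker loop is capped by an iteration limit so that the model
terminates, which the real loop would not.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel globals tested by ltsleep(). */
int sim_cold = 0;
int sim_doing_shutdown = 0;
const char *sim_panicstr = NULL;

/*
 * Model of the check at the top of ltsleep().
 * Returns 1 if the caller would really go to sleep,
 * 0 if the call would return immediately without sleeping.
 */
int
sim_ltsleep(int have_lwp)
{
	if (sim_cold ||
	    (sim_doing_shutdown && (sim_panicstr != NULL || !have_lwp)))
		return 0;
	return 1;
}

/*
 * Model of a worker thread that loops waiting for work that never
 * arrives.  Returns the number of loop iterations executed before the
 * thread either blocked in sim_ltsleep() or reached the limit.
 */
unsigned long
sim_worker(unsigned long limit)
{
	unsigned long iterations = 0;

	while (iterations < limit) {
		iterations++;
		if (sim_ltsleep(1))
			break;	/* blocked: the thread gives up the CPU */
		/* Did not block: go around again without ever yielding. */
	}
	return iterations;
}
```

In normal operation sim_worker() blocks on its first pass. Once
sim_doing_shutdown and sim_panicstr are both set, it runs through its
whole iteration budget without blocking, which corresponds to the
spinning thread described above.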
However, the check appears to be there for a reason, so the correct
solution is presumably not to just yank it out and try to sleep normally.
PR port-i386/33353 was opened for the specific instance of this problem
with apm_thread; in that special case it might be reasonable to have
the affected thread just exit if it's woken during a panic -- but that
somehow doesn't seem like the right solution (even if it would work).
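
For concreteness, the "just exit if woken during a panic" idea could
look something like the following userspace sketch. Everything named
wk_* or WK_* is hypothetical and exists only for this illustration;
wk_panicstr stands in for the kernel's panicstr, and wk_sleep() models
a sleep call that returns immediately once a panic is in progress.
This is not the actual apm_thread code.

```c
#include <stddef.h>

/* Hypothetical stand-in for panicstr; non-NULL once a panic started. */
const char *wk_panicstr = NULL;

enum wk_outcome {
	WK_BLOCKED,	/* the thread went to sleep as usual */
	WK_EXITED,	/* the thread noticed the panic and gave up */
	WK_SPUN		/* the thread used up its whole iteration budget */
};

/* Model of the sleep call: 1 if it blocks, 0 if it returns at once. */
int
wk_sleep(void)
{
	return wk_panicstr == NULL;
}

/*
 * Worker loop with the workaround applied.  If exit_on_panic is set
 * and the sleep came back without blocking during a panic, leave the
 * loop instead of going around again.  The limit only exists so that
 * the model terminates when the workaround is disabled.
 */
enum wk_outcome
wk_worker(int exit_on_panic, unsigned long limit)
{
	unsigned long i;

	for (i = 0; i < limit; i++) {
		if (wk_sleep())
			return WK_BLOCKED;
		if (exit_on_panic && wk_panicstr != NULL)
			return WK_EXITED; /* a real thread would terminate here */
	}
	return WK_SPUN;
}
```

With the check enabled, the loop ends on its first pass after a panic
instead of spinning. The cost is that every looping kernel thread
would need the same special case, which may be part of why it does not
feel like the right fix.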
>How-To-Repeat:
This happens most of the time when a host at Panix panics; often
enough that we've had to locally modify swwdog(4) to pass RB_NOSYNC and
use it as a workaround.
>Fix:
That's what I'm filing this PR to find out. A somewhat distasteful
workaround is noted above.