NetBSD Problem Report #48651
From www@NetBSD.org Mon Mar 10 20:45:33 2014
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
(using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits))
(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (verified OK))
by mollari.NetBSD.org (Postfix) with ESMTPS id B3AC4A57E9
for <gnats-bugs@gnats.NetBSD.org>; Mon, 10 Mar 2014 20:45:33 +0000 (UTC)
Message-Id: <20140310204532.7A997A5828@mollari.NetBSD.org>
Date: Mon, 10 Mar 2014 20:45:32 +0000 (UTC)
From: mbowie@rocket-space.com
Reply-To: mbowie@rocket-space.com
To: gnats-bugs@NetBSD.org
Subject: Lock up or panic under what appears to be heavy kevent load
X-Send-Pr-Version: www-1.0
>Number: 48651
>Category: kern
>Synopsis: Lock up or panic under what appears to be heavy kevent load
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: kern-bug-people
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Mon Mar 10 20:50:00 +0000 2014
>Closed-Date: Thu Jan 07 08:52:48 +0000 2021
>Last-Modified: Thu Jan 07 08:52:48 +0000 2021
>Originator: Mike Bowie
>Release: 6.1.3
>Organization:
RocketSpace, Inc.
>Environment:
NetBSD sfo3-nms01.rocketstre.am 6.1.3 NetBSD 6.1.3 (GENERIC) amd64
>Description:
Generic kernel with a polling Java (openjdk7-1.7.51) process (OpenNMS) running either panics or locks up after a varying amount of time. (It may take up to five days, or less than five hours.)
>How-To-Repeat:
Not exactly sure how to make this reliably reproducible, but we just cold-boot the box, start the monitoring process, and then wait for it to drop off the network.
In our case the device is headless in a remote DC, so connecting back to the DRAC console reveals either the db{0}> prompt or an unresponsive console (with no scroll-back).
db{0}> bt
filt_sowrite() at netbsd:filt_sowrite+0x22
kevent1() at netbsd:kevent1+0x61f
sys___kevent50() at netbsd:sys___kevent50+0x33
syscall() at netbsd:syscall+0xc4
db{0}> cont
uvm_fault(0xfffffe842a143328, 0x0, 1) -> e
fatal page fault in supervisor mode
trap type 6 code 0 rip ffffffff807901d8 cs 8 rflags 10287 cr2 0 cpl 0 rsp fffffe811dfe3990
kernel: page fault trap, code=0
Stopped in pid 2618.141268 (java) at netbsd:filt_sowrite+0x22: movq 0(%rbx),%r14
db{0}>
>Fix:
>Release-Note:
>Audit-Trail:
From: Mike Bowie <mbowie@rocket-space.com>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: port-amd64/48651: Lock up or panic under what appears to be heavy
kevent load
Date: Mon, 14 Jul 2014 13:23:04 -0700
Reproducing the case under a kernel with DEBUG, DIAGNOSTIC and LOCKDEBUG
reveals the following after reboot:
Checking for core dump...
savecore: kvm_read: invalid translation (invalid level 2 PDE)
Jul 14 12:51:58 sfo3-nms01 savecore: reboot after panic: kernel diagnostic assertion "kq->kq_count == 0" failed: file "/usr/src/sys/kern/kern_event.c", line 1453
savecore: reboot after panic: kernel diagnostic assertion "kq->kq_count == 0" failed: file "/usr/src/sys/kern/kern_event.c", line 1453
savecore: system went down at Sun Jul 13 18:55:19 2014
savecore: writing core to /usr/crash/netbsd.1.core
Responsible-Changed-From-To: port-amd64-maintainer->kern-bug-people
Responsible-Changed-By: maxv@NetBSD.org
Responsible-Changed-When: Tue, 15 Aug 2017 10:03:33 +0000
Responsible-Changed-Why:
Not specific to amd64. By the way, it seems to me that the issue got fixed
in kern_event.c::rev1.92.
State-Changed-From-To: open->closed
State-Changed-By: maya@NetBSD.org
State-Changed-When: Thu, 07 Jan 2021 08:52:48 +0000
State-Changed-Why:
Assuming fixed. kevent had a lot of bugs at the time and most of them are fixed. If you are still having issues, please let us know.
>Unformatted:
$NetBSD: query-full-pr,v 1.46 2020/01/03 16:35:01 leot Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2020
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.