NetBSD Problem Report #48651

From www@NetBSD.org  Mon Mar 10 20:45:33 2014
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
	(using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits))
	(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (verified OK))
	by mollari.NetBSD.org (Postfix) with ESMTPS id B3AC4A57E9
	for <gnats-bugs@gnats.NetBSD.org>; Mon, 10 Mar 2014 20:45:33 +0000 (UTC)
Message-Id: <20140310204532.7A997A5828@mollari.NetBSD.org>
Date: Mon, 10 Mar 2014 20:45:32 +0000 (UTC)
From: mbowie@rocket-space.com
Reply-To: mbowie@rocket-space.com
To: gnats-bugs@NetBSD.org
Subject: Lock up or panic under what appears to be heavy kevent load
X-Send-Pr-Version: www-1.0

>Number:         48651
>Category:       kern
>Synopsis:       Lock up or panic under what appears to be heavy kevent load
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Mar 10 20:50:00 +0000 2014
>Closed-Date:    Thu Jan 07 08:52:48 +0000 2021
>Last-Modified:  Thu Jan 07 08:52:48 +0000 2021
>Originator:     Mike Bowie
>Release:        6.1.3
>Organization:
RocketSpace, Inc.
>Environment:
NetBSD sfo3-nms01.rocketstre.am 6.1.3 NetBSD 6.1.3 (GENERIC) amd64
>Description:
A GENERIC kernel running a polling Java process (OpenNMS on openjdk7-1.7.51) either panics or locks up after a varying amount of time. (It may take up to five days, or less than five hours.)
>How-To-Repeat:
Not exactly sure how to make this reliably reproducible, but we just cold-boot the box and start the monitoring process, then wait for it to drop off the network.

In our case the device is headless in a remote DC, so connecting back to the DRAC console reveals either the db{0}> prompt or an unresponsive console, with no scroll-back in either case.

db{0}> bt
filt_sowrite() at netbsd:filt_sowrite+0x22
kevent1() at netbsd:kevent1+0x61f
sys___kevent50() at netbsd:sys___kevent50+0x33
syscall() at netbsd:syscall+0xc4
db{0}> cont
uvm_fault(0xfffffe842a143328, 0x0, 1) -> e
fatal page fault in supervisor mode
trap type 6 code 0 rip ffffffff807901d8 cs 8 rflags 10287 cr2  0 cpl 0 rsp fffffe811dfe3990
kernel: page fault trap, code=0
Stopped in pid 2618.141268 (java) at    netbsd:filt_sowrite+0x22:       movq    0(%rbx),%r14
db{0}> 

>Fix:

>Release-Note:

>Audit-Trail:
From: Mike Bowie <mbowie@rocket-space.com>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: port-amd64/48651: Lock up or panic under what appears to be heavy
 kevent load
Date: Mon, 14 Jul 2014 13:23:04 -0700

 Reproducing the case under a kernel with DEBUG, DIAGNOSTIC and LOCKDEBUG 
 reveals the following after reboot:

 Checking for core dump...
 savecore: kvm_read: invalid translation (invalid level 2 PDE)
 Jul 14 12:51:58 sfo3-nms01 savecore: reboot after panic: kernel 
 diagnostic assertion "kq->kq_count == 0" failed: file 
 "/usr/src/sys/kern/kern_event.c", line 1453
 savecore: reboot after panic: kernel diagnostic assertion "kq->kq_count 
 == 0" failed: file "/usr/src/sys/kern/kern_event.c", line 1453
 savecore: system went down at Sun Jul 13 18:55:19 2014

 savecore: writing core to /usr/crash/netbsd.1.core

Responsible-Changed-From-To: port-amd64-maintainer->kern-bug-people
Responsible-Changed-By: maxv@NetBSD.org
Responsible-Changed-When: Tue, 15 Aug 2017 10:03:33 +0000
Responsible-Changed-Why:
Not specific to amd64. By the way, it seems to me that the issue got fixed
in kern_event.c::rev1.92


State-Changed-From-To: open->closed
State-Changed-By: maya@NetBSD.org
State-Changed-When: Thu, 07 Jan 2021 08:52:48 +0000
State-Changed-Why:
Assuming fixed. kevent had a lot of bugs at the time and most of them are fixed. If you are still having issues, please let us know.


>Unformatted:
