NetBSD Problem Report #45479
From kimura-h@work02.gnavi.co.jp Mon Oct 17 08:26:23 2011
Return-Path: <kimura-h@work02.gnavi.co.jp>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
by www.NetBSD.org (Postfix) with ESMTP id 3158163D4A2
for <gnats-bugs@gnats.NetBSD.org>; Mon, 17 Oct 2011 08:26:23 +0000 (UTC)
Message-Id: <20111017070755.99C0F7083F@work02.gnavi.co.jp>
Date: Mon, 17 Oct 2011 16:07:55 +0900 (JST)
From: KOGULE Ryo <aqua_dabbler@me.com>
Reply-To: KOGULE Ryo <aqua_dabbler@me.com>
To: gnats-bugs@gnats.NetBSD.org
Subject: Lock error panic during RabbitMQ running
X-Send-Pr-Version: 3.95
>Number: 45479
>Category: kern
>Synopsis: Lock error panic during RabbitMQ running
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: kern-bug-people
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Mon Oct 17 08:30:00 +0000 2011
>Closed-Date: Thu Nov 17 22:46:38 +0000 2011
>Last-Modified: Sat Nov 19 22:25:02 +0000 2011
>Originator: KOGULE Ryo
>Release: NetBSD 5.99.56, Oct 13 15:13:53 JST 2011
>Organization:
>Environment:
System: NetBSD mq01.gnavi.co.jp 5.99.56 NetBSD 5.99.56 (GNAVI) #0: Thu Oct 13 16:18:45 JST 2011 nbsddev@work02.gnavi.co.jp:/home/nbsddev/distrib/obj/sys/arch/amd64/compile/GNAVI amd64
Architecture: x86_64
Machine: amd64
Kernel configuration differences from GENERIC:
$ diff -ud sys/arch/amd64/conf/GENERIC sys/arch/amd64/conf/GNAVI
--- sys/arch/amd64/conf/GENERIC 2011-10-11 10:28:03.000000000 +0900
+++ sys/arch/amd64/conf/GNAVI 2011-10-16 15:50:51.000000000 +0900
@@ -190,18 +190,18 @@
#options IPFILTER_DEFAULT_BLOCK # block all packets by default
#options TCP_DEBUG # Record last TCP_NDEBUG packets with SO_DEBUG
-#options ALTQ # Manipulate network interfaces' output queues
-#options ALTQ_BLUE # Stochastic Fair Blue
-#options ALTQ_CBQ # Class-Based Queueing
-#options ALTQ_CDNR # Diffserv Traffic Conditioner
-#options ALTQ_FIFOQ # First-In First-Out Queue
-#options ALTQ_FLOWVALVE # RED/flow-valve (red-penalty-box)
-#options ALTQ_HFSC # Hierarchical Fair Service Curve
-#options ALTQ_LOCALQ # Local queueing discipline
-#options ALTQ_PRIQ # Priority Queueing
-#options ALTQ_RED # Random Early Detection
-#options ALTQ_RIO # RED with IN/OUT
-#options ALTQ_WFQ # Weighted Fair Queueing
+options ALTQ # Manipulate network interfaces' output queues
+options ALTQ_BLUE # Stochastic Fair Blue
+options ALTQ_CBQ # Class-Based Queueing
+options ALTQ_CDNR # Diffserv Traffic Conditioner
+options ALTQ_FIFOQ # First-In First-Out Queue
+options ALTQ_FLOWVALVE # RED/flow-valve (red-penalty-box)
+options ALTQ_HFSC # Hierarchical Fair Service Curve
+options ALTQ_LOCALQ # Local queueing discipline
+options ALTQ_PRIQ # Priority Queueing
+options ALTQ_RED # Random Early Detection
+options ALTQ_RIO # RED with IN/OUT
+options ALTQ_WFQ # Weighted Fair Queueing
# These options enable verbose messages for several subsystems.
# Warning, these may compile large string tables into the kernel!
@@ -1179,8 +1179,8 @@
#options RND_COM # use "com" randomness as well (BROKEN)
pseudo-device clockctl # user control of clock subsystem
pseudo-device ksyms # /dev/ksyms
-#pseudo-device pf # PF packet filter
-#pseudo-device pflog # PF log if
+pseudo-device pf # PF packet filter
+pseudo-device pflog # PF log if
pseudo-device lockstat # lock profiling
pseudo-device bcsp # BlueCore Serial Protocol
pseudo-device btuart # Bluetooth HCI UART (H4)
RabbitMQ status:
# rabbitmqctl status
Status of node rabbit@mq01 ...
[{pid,2130},
{running_applications,
[{rabbitmq_management,"RabbitMQ Management Console","2.6.1"},
{webmachine,"webmachine","1.7.0-rmq2.6.1-hg0c4b60a"},
{rabbitmq_management_agent,"RabbitMQ Management Agent","2.6.1"},
{amqp_client,"RabbitMQ AMQP Client","2.6.1"},
{rabbit,"RabbitMQ","2.6.1"},
{os_mon,"CPO CXC 138 46","2.2.6"},
{sasl,"SASL CXC 138 11","2.1.9.4"},
{rabbitmq_mochiweb,"RabbitMQ Mochiweb Embedding","2.6.1"},
{mochiweb,"MochiMedia Web Server","1.3-rmq2.6.1-git9a53dbd"},
{inets,"INETS CXC 138 49","5.6"},
{mnesia,"MNESIA CXC 138 12","4.4.19"},
{stdlib,"ERTS CXC 138 10","1.17.4"},
{kernel,"ERTS CXC 138 10","2.14.4"}]},
{os,{unix,netbsd}},
{erlang_version,
"Erlang R14B03 (erts-5.8.4) [source] [64-bit] [smp:8:8] [rq:8] [async-threads:30] [hipe] [kernel-poll:true]\n"},
{memory,
[{total,85407792},
{processes,53008480},
{processes_used,52998160},
{system,32399312},
{atom,1336577},
{atom_used,1312058},
{binary,5295544},
{code,14412198},
{ets,6762784}]}]
...done.
Shared libraries linked by the Erlang runtime:
# ldd /usr/pkg/lib/erlang/erts-5.8.4/bin/beam.smp
/usr/pkg/lib/erlang/erts-5.8.4/bin/beam.smp:
-lutil.7 => /usr/lib/libutil.so.7
-lgcc_s.1 => /lib/libgcc_s.so.1
-lc.12 => /usr/lib/libc.so.12
-lm.0 => /usr/lib/libm.so.0
-lcurses.7 => /usr/lib/libcurses.so.7
-lterminfo.1 => /usr/lib/libterminfo.so.1
-lpthread.1 => /usr/lib/libpthread.so.1
# ldd /usr/pkg/lib/erlang/erts-5.8.4/bin/inet_gethost
/usr/pkg/lib/erlang/erts-5.8.4/bin/inet_gethost:
-lutil.7 => /usr/lib/libutil.so.7
-lgcc_s.1 => /lib/libgcc_s.so.1
-lc.12 => /usr/lib/libc.so.12
-lm.0 => /usr/lib/libm.so.0
>Description:
NetBSD/amd64 running RabbitMQ <http://www.rabbitmq.com/> reboots
periodically, anywhere from once to several times a day. The machine
appears to have sufficient resources (memory, CPU time, etc.). Most of the
time the system reboots silently, but once we were lucky enough to capture
the panic messages in /var/log/messages. They are:
Oct 15 14:34:08 mq01 /netbsd: panic: lock error
Oct 15 14:34:08 mq01 /netbsd: cpu4: Begin traceback...
Oct 15 14:34:08 mq01 /netbsd: printf_nolog() at netbsd:printf_nolog
Oct 15 14:34:08 mq01 /netbsd: lockdebug_abort() at netbsd:lockdebug_abort+0x3a
Oct 15 14:34:08 mq01 /netbsd: mutex_vector_enter() at netbsd:mutex_vector_enter+0x438
Oct 15 14:34:08 mq01 /netbsd: fd_close() at netbsd:fd_close+0x8f
Oct 15 14:34:08 mq01 /netbsd: fd_getfile() at netbsd:fd_getfile+0xb4
Oct 15 14:34:08 mq01 /netbsd: kqueue_register() at netbsd:kqueue_register+0x247
Oct 15 14:34:08 mq01 /netbsd: kevent1() at netbsd:kevent1+0x157
Oct 15 14:34:08 mq01 /netbsd: sys___kevent50() at netbsd:sys___kevent50+0x33
Oct 15 14:34:08 mq01 /netbsd: syscall() at netbsd:syscall+0xac
Oct 15 14:34:08 mq01 /netbsd: cpu4: End traceback...
We can send the core dump (over 100 MB) if necessary.
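The traceback is consistent with one thread trying to acquire a mutex it already owns: kqueue_register() calls fd_getfile(), which ends up in fd_close(), and the abort is raised from mutex_vector_enter() via lockdebug_abort(). Below is a minimal user-space sketch of that pattern, assuming a POSIX error-checking mutex as a stand-in for the kernel mutex: where the kernel panics with "lock error", the pthread mutex returns EDEADLK instead. All demo_* names are hypothetical; this is an analogue, not the NetBSD kernel code.

```c
#define _XOPEN_SOURCE 700	/* expose PTHREAD_MUTEX_ERRORCHECK */

#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for filedesc_t::fd_lock. */
static pthread_mutex_t demo_fd_lock;

/* Error-checking mutex: a relock by the owning thread returns EDEADLK instead of hanging. */
int
demo_init(void)
{
	pthread_mutexattr_t attr;
	int error = pthread_mutexattr_init(&attr);

	if (error != 0)
		return error;
	error = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	if (error == 0)
		error = pthread_mutex_init(&demo_fd_lock, &attr);
	pthread_mutexattr_destroy(&attr);
	return error;
}

/* Hypothetical fd_close() analogue: tearing a descriptor down needs fd_lock. */
static int
demo_fd_close(void)
{
	int error = pthread_mutex_lock(&demo_fd_lock);

	if (error != 0)
		return error;		/* EDEADLK: the caller already owns the lock */
	/* ... descriptor teardown would happen here ... */
	error = pthread_mutex_unlock(&demo_fd_lock);
	assert(error == 0);
	return EBADF;			/* the descriptor is gone */
}

/* Hypothetical fd_getfile() analogue: a descriptor being closed falls through to demo_fd_close(). */
static int
demo_fd_getfile(int closing)
{
	return closing ? demo_fd_close() : 0;
}

/* Problematic ordering: the descriptor lookup runs while fd_lock is held. */
int
demo_kqueue_register_buggy(int closing)
{
	int error = pthread_mutex_lock(&demo_fd_lock);

	if (error != 0)
		return error;
	error = demo_fd_getfile(closing);
	pthread_mutex_unlock(&demo_fd_lock);
	return error;
}
```

With closing set, demo_kqueue_register_buggy() gets EDEADLK back from the nested lock attempt; with closing clear, the lookup never touches the lock and the call returns 0.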
>How-To-Repeat:
>Fix:
>Release-Note:
>Audit-Trail:
From: Mindaugas Rasiukevicius <rmind@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: KOGULE Ryo <aqua_dabbler@me.com>, kern-bug-people@netbsd.org,
gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/45479: Lock error panic during RabbitMQ running
Date: Wed, 19 Oct 2011 23:41:21 +0100
KOGULE Ryo <aqua_dabbler@me.com> wrote:
> Oct 15 14:34:08 mq01 /netbsd: panic: lock error
> Oct 15 14:34:08 mq01 /netbsd: cpu4: Begin traceback...
> Oct 15 14:34:08 mq01 /netbsd: printf_nolog() at netbsd:printf_nolog
> Oct 15 14:34:08 mq01 /netbsd: lockdebug_abort() at netbsd:lockdebug_abort+0x3a
> Oct 15 14:34:08 mq01 /netbsd: mutex_vector_enter() at netbsd:mutex_vector_enter+0x438
> Oct 15 14:34:08 mq01 /netbsd: fd_close() at netbsd:fd_close+0x8f
> Oct 15 14:34:08 mq01 /netbsd: fd_getfile() at netbsd:fd_getfile+0xb4
> Oct 15 14:34:08 mq01 /netbsd: kqueue_register() at netbsd:kqueue_register+0x247
> Oct 15 14:34:08 mq01 /netbsd: kevent1() at netbsd:kevent1+0x157
> Oct 15 14:34:08 mq01 /netbsd: sys___kevent50() at netbsd:sys___kevent50+0x33
> Oct 15 14:34:08 mq01 /netbsd: syscall() at netbsd:syscall+0xac
> Oct 15 14:34:08 mq01 /netbsd: cpu4: End traceback...
It locks against oneself.
http://www.netbsd.org/~rmind/kqueue_register_fix.diff
--
Mindaugas
From: Kogule Ryo <aqua_dabbler@me.com>
To: gnats-bugs@NetBSD.org, Mindaugas Rasiukevicius <rmind@netbsd.org>
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/45479: Lock error panic during RabbitMQ running
Date: Mon, 24 Oct 2011 12:04:52 +0900
On Oct 20, 2011, at 7:45, Mindaugas Rasiukevicius wrote:
> It locks against oneself.
>
> http://www.netbsd.org/~rmind/kqueue_register_fix.diff
Thank you for your patch. I will run the patched kernel in our
production environment for a while, a week or so. We will keep an eye
on it and report back.
From: Kogule Ryo <aqua_dabbler@me.com>
To: gnats-bugs@NetBSD.org, Mindaugas Rasiukevicius <rmind@netbsd.org>
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/45479: Lock error panic during RabbitMQ running
Date: Tue, 01 Nov 2011 10:10:06 +0900
The patched kernel has not hung once during the past week. It is very stable.
I appreciate your support.
From: "Mindaugas Rasiukevicius" <rmind@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/45479 CVS commit: src/sys/kern
Date: Thu, 17 Nov 2011 22:41:55 +0000
Module Name: src
Committed By: rmind
Date: Thu Nov 17 22:41:55 UTC 2011
Modified Files:
src/sys/kern: kern_event.c
Log Message:
kqueue_register: avoid calling fd_getfile() with filedesc_t::fd_lock held.
Fixes PR/45479 by KOGULE Ryo.
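The log message states the rule behind the fix: do not call fd_getfile() while fd_lock is held. One way to honour it, sketched below as a user-space analogue, is to look the descriptor up first and take the lock only afterwards. Assumptions: an error-checking pthread mutex stands in for filedesc_t::fd_lock, the fix_* names are hypothetical, and this is not the actual contents of revision 1.74.

```c
#define _XOPEN_SOURCE 700	/* expose PTHREAD_MUTEX_ERRORCHECK */

#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for filedesc_t::fd_lock (error-checking, so a relock reports EDEADLK). */
static pthread_mutex_t fix_fd_lock;

int
fix_init(void)
{
	pthread_mutexattr_t attr;
	int error = pthread_mutexattr_init(&attr);

	if (error != 0)
		return error;
	error = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	if (error == 0)
		error = pthread_mutex_init(&fix_fd_lock, &attr);
	pthread_mutexattr_destroy(&attr);
	return error;
}

/* Hypothetical fd_getfile() analogue: a descriptor being closed goes through a path that takes fd_lock. */
static int
fix_fd_getfile(int closing)
{
	int error;

	if (!closing)
		return 0;
	error = pthread_mutex_lock(&fix_fd_lock);
	if (error != 0)
		return error;
	/* ... fd_close()-style teardown ... */
	error = pthread_mutex_unlock(&fix_fd_lock);
	assert(error == 0);
	return EBADF;		/* the descriptor is gone */
}

/* Fixed ordering: look the descriptor up first, then take fd_lock for the knote work. */
int
fix_kqueue_register(int closing)
{
	int error = fix_fd_getfile(closing);

	if (error != 0)
		return error;
	error = pthread_mutex_lock(&fix_fd_lock);
	if (error != 0)
		return error;
	/* ... search for or attach the knote while fd_lock is held ... */
	return pthread_mutex_unlock(&fix_fd_lock);
}
```

Because the lookup now happens before the lock is taken, the closing case returns EBADF from the teardown path rather than attempting a recursive acquisition.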
To generate a diff of this commit:
cvs rdiff -u -r1.73 -r1.74 src/sys/kern/kern_event.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: open->closed
State-Changed-By: rmind@NetBSD.org
State-Changed-When: Thu, 17 Nov 2011 22:46:38 +0000
State-Changed-Why:
Fixed, pull-up requested for netbsd-5.
Thanks for the report.
From: "Stephen Borrill" <sborrill@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/45479 CVS commit: [netbsd-5] src/sys/kern
Date: Sat, 19 Nov 2011 21:57:13 +0000
Module Name: src
Committed By: sborrill
Date: Sat Nov 19 21:57:13 UTC 2011
Modified Files:
src/sys/kern [netbsd-5]: kern_event.c
Log Message:
Pull up the following revision(s) (requested by rmind in ticket #1695):
sys/kern/kern_event.c: revision 1.74
kqueue_register: avoid calling fd_getfile() with filedesc_t::fd_lock held.
Fixes PR/45479 by KOGULE Ryo.
To generate a diff of this commit:
cvs rdiff -u -r1.60.6.3 -r1.60.6.4 src/sys/kern/kern_event.c
From: "Stephen Borrill" <sborrill@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/45479 CVS commit: [netbsd-5-1] src/sys/kern
Date: Sat, 19 Nov 2011 22:22:56 +0000
Module Name: src
Committed By: sborrill
Date: Sat Nov 19 22:22:56 UTC 2011
Modified Files:
src/sys/kern [netbsd-5-1]: kern_event.c
Log Message:
Pull up the following revision(s) (requested by rmind in ticket #1695):
sys/kern/kern_event.c: revision 1.74
kqueue_register: avoid calling fd_getfile() with filedesc_t::fd_lock held.
Fixes PR/45479 by KOGULE Ryo.
To generate a diff of this commit:
cvs rdiff -u -r1.60.6.2 -r1.60.6.2.2.1 src/sys/kern/kern_event.c
From: "Stephen Borrill" <sborrill@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/45479 CVS commit: [netbsd-5-0] src/sys/kern
Date: Sat, 19 Nov 2011 22:24:13 +0000
Module Name: src
Committed By: sborrill
Date: Sat Nov 19 22:24:12 UTC 2011
Modified Files:
src/sys/kern [netbsd-5-0]: kern_event.c
Log Message:
Pull up the following revision(s) (requested by rmind in ticket #1695):
sys/kern/kern_event.c: revision 1.74
kqueue_register: avoid calling fd_getfile() with filedesc_t::fd_lock held.
Fixes PR/45479 by KOGULE Ryo.
To generate a diff of this commit:
cvs rdiff -u -r1.60.6.1.2.1 -r1.60.6.1.2.2 src/sys/kern/kern_event.c
>Unformatted:
Copyright © 1994-2007
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.