NetBSD Problem Report #45479

From kimura-h@work02.gnavi.co.jp  Mon Oct 17 08:26:23 2011
Return-Path: <kimura-h@work02.gnavi.co.jp>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
	by www.NetBSD.org (Postfix) with ESMTP id 3158163D4A2
	for <gnats-bugs@gnats.NetBSD.org>; Mon, 17 Oct 2011 08:26:23 +0000 (UTC)
Message-Id: <20111017070755.99C0F7083F@work02.gnavi.co.jp>
Date: Mon, 17 Oct 2011 16:07:55 +0900 (JST)
From: KOGULE Ryo <aqua_dabbler@me.com>
Reply-To: KOGULE Ryo <aqua_dabbler@me.com>
To: gnats-bugs@gnats.NetBSD.org
Subject: Lock error panic during RabbitMQ running
X-Send-Pr-Version: 3.95

>Number:         45479
>Category:       kern
>Synopsis:       Lock error panic during RabbitMQ running
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Oct 17 08:30:00 +0000 2011
>Closed-Date:    Thu Nov 17 22:46:38 +0000 2011
>Last-Modified:  Sat Nov 19 22:25:02 +0000 2011
>Originator:     KOGULE Ryo
>Release:        NetBSD 5.99.56, Oct 13 15:13:53 JST 2011
>Organization:
>Environment:
System: NetBSD mq01.gnavi.co.jp 5.99.56 NetBSD 5.99.56 (GNAVI) #0: Thu Oct 13 16:18:45 JST 2011 nbsddev@work02.gnavi.co.jp:/home/nbsddev/distrib/obj/sys/arch/amd64/compile/GNAVI amd64
Architecture: x86_64
Machine: amd64
Kernel configuration differences from GENERIC:
$ diff -ud sys/arch/amd64/conf/GENERIC sys/arch/amd64/conf/GNAVI
--- sys/arch/amd64/conf/GENERIC 2011-10-11 10:28:03.000000000 +0900
+++ sys/arch/amd64/conf/GNAVI   2011-10-16 15:50:51.000000000 +0900
@@ -190,18 +190,18 @@
 #options       IPFILTER_DEFAULT_BLOCK  # block all packets by default
 #options       TCP_DEBUG       # Record last TCP_NDEBUG packets with SO_DEBUG

-#options       ALTQ            # Manipulate network interfaces' output queues
-#options       ALTQ_BLUE       # Stochastic Fair Blue
-#options       ALTQ_CBQ        # Class-Based Queueing
-#options       ALTQ_CDNR       # Diffserv Traffic Conditioner
-#options       ALTQ_FIFOQ      # First-In First-Out Queue
-#options       ALTQ_FLOWVALVE  # RED/flow-valve (red-penalty-box)
-#options       ALTQ_HFSC       # Hierarchical Fair Service Curve
-#options       ALTQ_LOCALQ     # Local queueing discipline
-#options       ALTQ_PRIQ       # Priority Queueing
-#options       ALTQ_RED        # Random Early Detection
-#options       ALTQ_RIO        # RED with IN/OUT
-#options       ALTQ_WFQ        # Weighted Fair Queueing
+options        ALTQ            # Manipulate network interfaces' output queues
+options        ALTQ_BLUE       # Stochastic Fair Blue
+options        ALTQ_CBQ        # Class-Based Queueing
+options        ALTQ_CDNR       # Diffserv Traffic Conditioner
+options        ALTQ_FIFOQ      # First-In First-Out Queue
+options        ALTQ_FLOWVALVE  # RED/flow-valve (red-penalty-box)
+options        ALTQ_HFSC       # Hierarchical Fair Service Curve
+options        ALTQ_LOCALQ     # Local queueing discipline
+options        ALTQ_PRIQ       # Priority Queueing
+options        ALTQ_RED        # Random Early Detection
+options        ALTQ_RIO        # RED with IN/OUT
+options        ALTQ_WFQ        # Weighted Fair Queueing

 # These options enable verbose messages for several subsystems.
 # Warning, these may compile large string tables into the kernel!
@@ -1179,8 +1179,8 @@
 #options       RND_COM                 # use "com" randomness as well (BROKEN)
 pseudo-device  clockctl                # user control of clock subsystem
 pseudo-device  ksyms                   # /dev/ksyms
-#pseudo-device pf                      # PF packet filter
-#pseudo-device pflog                   # PF log if
+pseudo-device  pf                      # PF packet filter
+pseudo-device  pflog                   # PF log if
 pseudo-device  lockstat                # lock profiling
 pseudo-device  bcsp                    # BlueCore Serial Protocol
 pseudo-device  btuart                  # Bluetooth HCI UART (H4)
RabbitMQ status:
# rabbitmqctl status
Status of node rabbit@mq01 ...
[{pid,2130},
 {running_applications,
     [{rabbitmq_management,"RabbitMQ Management Console","2.6.1"},
      {webmachine,"webmachine","1.7.0-rmq2.6.1-hg0c4b60a"},
      {rabbitmq_management_agent,"RabbitMQ Management Agent","2.6.1"},
      {amqp_client,"RabbitMQ AMQP Client","2.6.1"},
      {rabbit,"RabbitMQ","2.6.1"},
      {os_mon,"CPO  CXC 138 46","2.2.6"},
      {sasl,"SASL  CXC 138 11","2.1.9.4"},
      {rabbitmq_mochiweb,"RabbitMQ Mochiweb Embedding","2.6.1"},
      {mochiweb,"MochiMedia Web Server","1.3-rmq2.6.1-git9a53dbd"},
      {inets,"INETS  CXC 138 49","5.6"},
      {mnesia,"MNESIA  CXC 138 12","4.4.19"},
      {stdlib,"ERTS  CXC 138 10","1.17.4"},
      {kernel,"ERTS  CXC 138 10","2.14.4"}]},
 {os,{unix,netbsd}},
 {erlang_version,
     "Erlang R14B03 (erts-5.8.4) [source] [64-bit] [smp:8:8] [rq:8] [async-threads:30] [hipe] [kernel-poll:true]\n"},
 {memory,
     [{total,85407792},
      {processes,53008480},
      {processes_used,52998160},
      {system,32399312},
      {atom,1336577},
      {atom_used,1312058},
      {binary,5295544},
      {code,14412198},
      {ets,6762784}]}]
...done.
Erlang links:
# ldd /usr/pkg/lib/erlang/erts-5.8.4/bin/beam.smp
/usr/pkg/lib/erlang/erts-5.8.4/bin/beam.smp:
        -lutil.7 => /usr/lib/libutil.so.7
        -lgcc_s.1 => /lib/libgcc_s.so.1
        -lc.12 => /usr/lib/libc.so.12
        -lm.0 => /usr/lib/libm.so.0
        -lcurses.7 => /usr/lib/libcurses.so.7
        -lterminfo.1 => /usr/lib/libterminfo.so.1
        -lpthread.1 => /usr/lib/libpthread.so.1
# ldd /usr/pkg/lib/erlang/erts-5.8.4/bin/inet_gethost
/usr/pkg/lib/erlang/erts-5.8.4/bin/inet_gethost:
        -lutil.7 => /usr/lib/libutil.so.7
        -lgcc_s.1 => /lib/libgcc_s.so.1
        -lc.12 => /usr/lib/libc.so.12
        -lm.0 => /usr/lib/libm.so.0
>Description:
A NetBSD/amd64 machine running RabbitMQ <http://www.rabbitmq.com/> reboots
periodically, from once to several times a day.  It appears to have
enough resources (memory, CPU time, and so on).  The operating system
usually reboots silently, but on one occasion we were lucky enough to
capture the panic messages in /var/log/messages:

Oct 15 14:34:08 mq01 /netbsd: panic: lock error
Oct 15 14:34:08 mq01 /netbsd: cpu4: Begin traceback...
Oct 15 14:34:08 mq01 /netbsd: printf_nolog() at netbsd:printf_nolog
Oct 15 14:34:08 mq01 /netbsd: lockdebug_abort() at netbsd:lockdebug_abort+0x3a
Oct 15 14:34:08 mq01 /netbsd: mutex_vector_enter() at netbsd:mutex_vector_enter+0x438
Oct 15 14:34:08 mq01 /netbsd: fd_close() at netbsd:fd_close+0x8f
Oct 15 14:34:08 mq01 /netbsd: fd_getfile() at netbsd:fd_getfile+0xb4
Oct 15 14:34:08 mq01 /netbsd: kqueue_register() at netbsd:kqueue_register+0x247
Oct 15 14:34:08 mq01 /netbsd: kevent1() at netbsd:kevent1+0x157
Oct 15 14:34:08 mq01 /netbsd: sys___kevent50() at netbsd:sys___kevent50+0x33
Oct 15 14:34:08 mq01 /netbsd: syscall() at netbsd:syscall+0xac
Oct 15 14:34:08 mq01 /netbsd: cpu4: End traceback...

We can send the core dump (over 100 MB) if necessary.
>How-To-Repeat:
>Fix:

>Release-Note:

>Audit-Trail:

From: Mindaugas Rasiukevicius <rmind@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: KOGULE Ryo <aqua_dabbler@me.com>, kern-bug-people@netbsd.org,
 gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/45479: Lock error panic during RabbitMQ running
Date: Wed, 19 Oct 2011 23:41:21 +0100

 KOGULE Ryo <aqua_dabbler@me.com> wrote:
 > Oct 15 14:34:08 mq01 /netbsd: panic: lock error
 > Oct 15 14:34:08 mq01 /netbsd: cpu4: Begin traceback...
 > Oct 15 14:34:08 mq01 /netbsd: printf_nolog() at netbsd:printf_nolog
 > Oct 15 14:34:08 mq01 /netbsd: lockdebug_abort() at netbsd:lockdebug_abort+0x3a
 > Oct 15 14:34:08 mq01 /netbsd: mutex_vector_enter() at netbsd:mutex_vector_enter+0x438
 > Oct 15 14:34:08 mq01 /netbsd: fd_close() at netbsd:fd_close+0x8f
 > Oct 15 14:34:08 mq01 /netbsd: fd_getfile() at netbsd:fd_getfile+0xb4
 > Oct 15 14:34:08 mq01 /netbsd: kqueue_register() at netbsd:kqueue_register+0x247
 > Oct 15 14:34:08 mq01 /netbsd: kevent1() at netbsd:kevent1+0x157
 > Oct 15 14:34:08 mq01 /netbsd: sys___kevent50() at netbsd:sys___kevent50+0x33
 > Oct 15 14:34:08 mq01 /netbsd: syscall() at netbsd:syscall+0xac
 > Oct 15 14:34:08 mq01 /netbsd: cpu4: End traceback...

 It locks against itself.

 http://www.netbsd.org/~rmind/kqueue_register_fix.diff

 -- 
 Mindaugas

From: Kogule Ryo <aqua_dabbler@me.com>
To: gnats-bugs@NetBSD.org, Mindaugas Rasiukevicius <rmind@netbsd.org>
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/45479: Lock error panic during RabbitMQ running
Date: Mon, 24 Oct 2011 12:04:52 +0900

 On Oct 20, 2011, at 7:45, Mindaugas Rasiukevicius wrote:
 > It locks against itself.
 >
 > http://www.netbsd.org/~rmind/kqueue_register_fix.diff

 Thank you for your patch.  I will run the patched kernel in our
 production environment for about a week.  We will keep an eye on it
 and report again.


From: Kogule Ryo <aqua_dabbler@me.com>
To: gnats-bugs@NetBSD.org, Mindaugas Rasiukevicius <rmind@netbsd.org>
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/45479: Lock error panic during RabbitMQ running
Date: Tue, 01 Nov 2011 10:10:06 +0900

 The patched kernel has not hung once during the past week.  It is very stable.

 I appreciate your support.

From: "Mindaugas Rasiukevicius" <rmind@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/45479 CVS commit: src/sys/kern
Date: Thu, 17 Nov 2011 22:41:55 +0000

 Module Name:	src
 Committed By:	rmind
 Date:		Thu Nov 17 22:41:55 UTC 2011

 Modified Files:
 	src/sys/kern: kern_event.c

 Log Message:
 kqueue_register: avoid calling fd_getfile() with filedesc_t::fd_lock held.

 Fixes PR/45479 by KOGULE Ryo.


 To generate a diff of this commit:
 cvs rdiff -u -r1.73 -r1.74 src/sys/kern/kern_event.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: open->closed
State-Changed-By: rmind@NetBSD.org
State-Changed-When: Thu, 17 Nov 2011 22:46:38 +0000
State-Changed-Why:
Fixed, pull-up requested for netbsd-5.
Thanks for the report.


From: "Stephen Borrill" <sborrill@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/45479 CVS commit: [netbsd-5] src/sys/kern
Date: Sat, 19 Nov 2011 21:57:13 +0000

 Module Name:	src
 Committed By:	sborrill
 Date:		Sat Nov 19 21:57:13 UTC 2011

 Modified Files:
 	src/sys/kern [netbsd-5]: kern_event.c

 Log Message:
 Pull up the following revision(s) (requested by rmind in ticket #1695):
 	sys/kern/kern_event.c:	revision 1.74

 kqueue_register: avoid calling fd_getfile() with filedesc_t::fd_lock held.
 Fixes PR/45479 by KOGULE Ryo.


 To generate a diff of this commit:
 cvs rdiff -u -r1.60.6.3 -r1.60.6.4 src/sys/kern/kern_event.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Stephen Borrill" <sborrill@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/45479 CVS commit: [netbsd-5-1] src/sys/kern
Date: Sat, 19 Nov 2011 22:22:56 +0000

 Module Name:	src
 Committed By:	sborrill
 Date:		Sat Nov 19 22:22:56 UTC 2011

 Modified Files:
 	src/sys/kern [netbsd-5-1]: kern_event.c

 Log Message:
 Pull up the following revision(s) (requested by rmind in ticket #1695):
 	sys/kern/kern_event.c:	revision 1.74

 kqueue_register: avoid calling fd_getfile() with filedesc_t::fd_lock held.
 Fixes PR/45479 by KOGULE Ryo.


 To generate a diff of this commit:
 cvs rdiff -u -r1.60.6.2 -r1.60.6.2.2.1 src/sys/kern/kern_event.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Stephen Borrill" <sborrill@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/45479 CVS commit: [netbsd-5-0] src/sys/kern
Date: Sat, 19 Nov 2011 22:24:13 +0000

 Module Name:	src
 Committed By:	sborrill
 Date:		Sat Nov 19 22:24:12 UTC 2011

 Modified Files:
 	src/sys/kern [netbsd-5-0]: kern_event.c

 Log Message:
 Pull up the following revision(s) (requested by rmind in ticket #1695):
 	sys/kern/kern_event.c:	revision 1.74

 kqueue_register: avoid calling fd_getfile() with filedesc_t::fd_lock held.
 Fixes PR/45479 by KOGULE Ryo.


 To generate a diff of this commit:
 cvs rdiff -u -r1.60.6.1.2.1 -r1.60.6.1.2.2 src/sys/kern/kern_event.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

>Unformatted:
