NetBSD Problem Report #51056
From martin@duskware.de Sat Apr 9 07:55:45 2016
Return-Path: <martin@duskware.de>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (verified OK))
by mollari.NetBSD.org (Postfix) with ESMTPS id ADE777A218
for <gnats-bugs@gnats.NetBSD.org>; Sat, 9 Apr 2016 07:55:45 +0000 (UTC)
Date: Sat, 09 Apr 2016 09:55:41 CEST
From: martin@NetBSD.org
Reply-To: martin@NetBSD.org
To: gnats-bugs@NetBSD.org
Subject: reader-writer lock error
X-Send-Pr-Version: 3.95
>Number: 51056
>Category: kern
>Synopsis: reader-writer lock error
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: ozaki-r
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sat Apr 09 08:00:00 +0000 2016
>Closed-Date: Wed May 25 06:45:34 +0000 2016
>Last-Modified: Wed May 25 06:45:34 +0000 2016
>Originator: Martin Husemann
>Release: NetBSD 7.99.27
>Organization:
The NetBSD Foundation, Inc.
>Environment:
System: NetBSD night-owl.duskware.de 7.99.27 NetBSD 7.99.27 (NIGHT-OWL) #402: Tue Apr 5 16:22:09 CEST 2016 martin@night-owl.duskware.de:/usr/src/sys/arch/amd64/compile/NIGHT-OWL amd64
Architecture: x86_64
Machine: amd64
>Description:
Seen this twice since updating to 7.99.27:
#9 0xffffffff806ef0c3 in vpanic (
fmt=fmt@entry=0xffffffff80cc8940 "lock error: %s: %s: %s: lock %p cpu %d lwp %p", ap=ap@entry=0xfffffe80bf7a6c58) at ../../../../kern/subr_prf.c:340
#10 0xffffffff806ef180 in panic (
fmt=fmt@entry=0xffffffff80cc8940 "lock error: %s: %s: %s: lock %p cpu %d lwp %p") at ../../../../kern/subr_prf.c:258
#11 0xffffffff806e7b26 in lockdebug_abort (lock=0xfffffe8135c46888,
ops=ops@entry=0xffffffff80faff30 <rwlock_lockops>,
func=func@entry=0xffffffff80bc5bc0 <__func__.6032> "rw_vector_enter",
msg=msg@entry=0xffffffff80cc321d "locking against myself")
at ../../../../kern/subr_lockdebug.c:867
#12 0xffffffff806c1398 in rw_abort (rw=rw@entry=0xfffffe8135c46888,
func=func@entry=0xffffffff80bc5bc0 <__func__.6032> "rw_vector_enter",
msg=msg@entry=0xffffffff80cc321d "locking against myself")
at ../../../../kern/kern_rwlock.c:192
#13 0xffffffff806c186b in rw_vector_enter (rw=0xfffffe8135c46888, op=RW_READER)
at ../../../../kern/kern_rwlock.c:341
#14 0xffffffff804eecfd in in6_lltable_lookup (llt=<optimized out>,
flags=<optimized out>, l3addr=<optimized out>)
at ../../../../netinet6/in6.c:2487
#15 0xffffffff80505c08 in lla_lookup (l3addr=0xfffffe80bf7a6dbc, flags=0,
llt=<optimized out>) at ../../../../net/if_llatbl.h:295
#16 nd6_lookup (addr6=<optimized out>, ifp=0xffff800007006d30,
wlock=wlock@entry=false) at ../../../../netinet6/nd6.c:864
#17 0xffffffff8050ad26 in nd6_is_llinfo_probreach (dr=<optimized out>)
at ../../../../netinet6/nd6_rtr.c:113
#18 find_pfxlist_reachable_router (pr=0xfffffe8107c0a158)
at ../../../../netinet6/nd6_rtr.c:1384
#19 0xffffffff8050b8c6 in pfxlist_onlink_check ()
at ../../../../netinet6/nd6_rtr.c:1417
#20 0xffffffff805052a9 in nd6_free (ln=0xfffffe8135c46788, gc=0)
at ../../../../netinet6/nd6.c:1177
#21 0xffffffff8050590d in nd6_llinfo_timer (arg=0xfffffe8135c46788)
at ../../../../netinet6/nd6.c:490
#22 0xffffffff806d29a1 in callout_softclock (v=<optimized out>)
at ../../../../kern/kern_timeout.c:743
#23 0xffffffff806c73b4 in softint_execute (l=<optimized out>, s=2,
si=0xffff800045e710c0) at ../../../../kern/kern_softint.c:589
#24 softint_dispatch (pinned=<optimized out>, s=2)
at ../../../../kern/kern_softint.c:871
#25 0xffffffff80113f7f in Xsoftintr ()
netbsd.gdb and crash dump available on request.
>How-To-Repeat:
Just happens randomly; nothing special seems to trigger it.
>Fix:
n/a
>Release-Note:
>Audit-Trail:
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/51056: reader-writer lock error
Date: Sat, 9 Apr 2016 16:17:51 +0200
I had the same crash again with a very recentish kernel, now running the
same with LOCKDEBUG.
This is on a notebook with two interfaces, bge0 (currently not connected)
and athn0 (used, obviously, with both IPv4 and IPv6).
I disable DAD via sysctl.conf:
net.inet6.ip6.dad_count=0
since that was the only way I could make the wifi work properly with
IPv6. I think there was some discussion about this some time ago, and
a better way would be to ignore duplicates coming in via WLAN interfaces
(but I digress).
Just noting for completeness.
Martin
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/51056: reader-writer lock error
Date: Sat, 9 Apr 2016 22:08:07 +0200
On Sat, Apr 09, 2016 at 04:17:51PM +0200, Martin Husemann wrote:
> I had the same crash again with a very recentish kernel, now running the
> same with LOCKDEBUG.
And it says (if I deciphered the photo correctly) the lock in question
was locked here:
nd6_llinfo_timer(void *arg)
{
	struct llentry *ln = arg;
	struct ifnet *ifp;
	struct nd_ifinfo *ndi = NULL;
	bool send_ns = false;
	const struct in6_addr *daddr6 = NULL;

	mutex_enter(softnet_lock);
	KERNEL_LOCK(1, NULL);
	LLE_WLOCK(ln);
i.e.:
#13 0xffffffff8050595d in nd6_llinfo_timer (arg=0xfffffe8135c863c8)
at ../../../../netinet6/nd6.c:490
where
(gdb) p arg
$3 = (void *) 0xfffffe8135c863c8
(gdb) p *(struct llentry *)arg
$4 = {lle_next = {le_next = 0x0, le_prev = 0xfffffe810744fa10}, r_l3addr = {
addr4 = {s_addr = 33587454}, addr6 = {__u6_addr = {
__u6_addr8 = "\376\000\002\000\000\000\000\236Ç\246\377\376\233\267r", __u6_addr16 = {33022, 512, 0, 0, 51102, 65446, 39934, 29367}, __u6_addr32 = {
33587454, 0, 4289120158, 1924635646}}}}, ll_addr = {
mac_aligned = 126132915980188, mac16 = {51100, 39846, 29367},
mac8 = "\234Ç\246\233\267r", '\000' <repeats 13 times>}, spare0 = 0,
spare1 = 0, lle_tbl = 0xfffffe81071997d0, lle_head = 0xfffffe810744fa10,
lle_free = 0xffffffff804ee628 <in6_lltable_destroy_lle>, lle_ll_free = 0x0,
la_hold = 0x0, la_numheld = 0, la_expire = 0, la_flags = 8264, la_asked = 3,
la_preempt = 0, ln_byhint = 0, ln_state = 0, ln_router = 1, ln_ntick = 0,
lle_refcnt = 412, lle_chain = {le_next = 0x0, le_prev = 0x0}, lle_timer = {
_c_store = {0xffffffff815025e8 <callout_cpu0+168>,
0xffffffff815025e8 <callout_cpu0+168>,
0xffffffff8050557c <nd6_llinfo_timer>, 0xfffffe8135c863c8,
0xffffffff81502540 <callout_cpu0>, 0x108000e39cf, 0x11deeba1, 0x0, 0x0,
0x0}}, lle_lock = {rw_owner = 18446742429671606372}, la_opaque = 0x0}
and then in
#6 0xffffffff804eed4d in in6_lltable_lookup (llt=<optimized out>,
flags=<optimized out>, l3addr=<optimized out>)
at ../../../../netinet6/in6.c:2487
	if (flags & LLE_EXCLUSIVE)
		LLE_WLOCK(lle);
	else
		LLE_RLOCK(lle);
	return lle;
}
we try to lock it again.
(gdb) p lle
$5 = (struct llentry *) 0xfffffe8135c863c8
(gdb) p *lle
$6 = {lle_next = {le_next = 0x0, le_prev = 0xfffffe810744fa10}, r_l3addr = {
addr4 = {s_addr = 33587454}, addr6 = {__u6_addr = {
__u6_addr8 = "\376\000\002\000\000\000\000\236Ç\246\377\376\233\267r", __u6_addr16 = {33022, 512, 0, 0, 51102, 65446, 39934, 29367}, __u6_addr32 = {
33587454, 0, 4289120158, 1924635646}}}}, ll_addr = {
mac_aligned = 126132915980188, mac16 = {51100, 39846, 29367},
mac8 = "\234Ç\246\233\267r", '\000' <repeats 13 times>}, spare0 = 0,
spare1 = 0, lle_tbl = 0xfffffe81071997d0, lle_head = 0xfffffe810744fa10,
lle_free = 0xffffffff804ee628 <in6_lltable_destroy_lle>, lle_ll_free = 0x0,
la_hold = 0x0, la_numheld = 0, la_expire = 0, la_flags = 8264, la_asked = 3,
la_preempt = 0, ln_byhint = 0, ln_state = 0, ln_router = 1, ln_ntick = 0,
lle_refcnt = 412, lle_chain = {le_next = 0x0, le_prev = 0x0}, lle_timer = {
_c_store = {0xffffffff815025e8 <callout_cpu0+168>,
0xffffffff815025e8 <callout_cpu0+168>,
0xffffffff8050557c <nd6_llinfo_timer>, 0xfffffe8135c863c8,
0xffffffff81502540 <callout_cpu0>, 0x108000e39cf, 0x11deeba1, 0x0, 0x0,
0x0}}, lle_lock = {rw_owner = 18446742429671606372}, la_opaque = 0x0}
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/51056: reader-writer lock error
Date: Sat, 9 Apr 2016 22:26:07 +0200
Here is my current v6 routing information:
Neighbor                                Linklayer Address Netif Expire    S Flags
fe80::9ec7:a6ff:fe9b:b772%athn0         9c:c7:a6:9b:b7:72 athn0 36s       R Rp
2001:4dd0:ff00:9258:c617:feff:fece:3734 c4:17:fe:ce:37:34 athn0 permanent R p
fe80::c617:feff:fece:3734%athn0         c4:17:fe:ce:37:34 athn0 permanent R p
fe80::226:2dff:fe90:46d1%bge0           00:26:2d:90:46:d1 bge0  permanent R p
and:
Internet6:
Destination                             Gateway                   Flags Refs Use  Mtu   Interface
::/104                                  ::1                       UGRS  -    -    33648 lo0
::/96                                   ::1                       UGRS  -    -    33648 lo0
default                                 fe80::9ec7:a6ff:fe9b:b772 UG    -    -    -     athn0
::1                                     ::1                       UH    -    -    33648 lo0
::127.0.0.0/104                         ::1                       UGRS  -    -    33648 lo0
::224.0.0.0/100                         ::1                       UGRS  -    -    33648 lo0
::255.0.0.0/104                         ::1                       UGRS  -    -    33648 lo0
::ffff:0.0.0.0/96                       ::1                       UGRS  -    -    33648 lo0
2001:db8::/32                           ::1                       UGRS  -    -    33648 lo0
2001:4dd0:ff00:9258::/64                link#2                    UC    -    -    -     athn0
2001:4dd0:ff00:9258:c617:feff:fece:3734 link#2                    UHl   -    -    -     lo0
2002::/24                               ::1                       UGRS  -    -    33648 lo0
2002:7f00::/24                          ::1                       UGRS  -    -    33648 lo0
2002:e000::/20                          ::1                       UGRS  -    -    33648 lo0
2002:ff00::/24                          ::1                       UGRS  -    -    33648 lo0
fe80::/10                               ::1                       UGRS  -    -    33648 lo0
fe80::%bge0/64                          link#1                    UC    -    -    -     bge0
fe80::226:2dff:fe90:46d1                link#1                    UHl   -    -    -     lo0
fe80::%athn0/64                         link#2                    UC    -    -    -     athn0
fe80::c617:feff:fece:3734               link#2                    UHl   -    -    -     lo0
fe80::%lo0/64                           fe80::1                   U     -    -    -     lo0
fe80::1                                 lo0                       UHl   -    -    -     lo0
ff01:1::/32                             link#1                    UC    -    -    -     bge0
ff01:2::/32                             link#2                    UC    -    -    -     athn0
ff01:3::/32                             ::1                       UC    -    -    33648 lo0
ff02::%bge0/32                          link#1                    UC    -    -    -     bge0
ff02::%athn0/32                         link#2                    UC    -    -    -     athn0
ff02::%lo0/32                           ::1                       UC    -    -    33648 lo0
Responsible-Changed-From-To: kern-bug-people->ozaki-r
Responsible-Changed-By: ozaki-r@NetBSD.org
Responsible-Changed-When: Sun, 10 Apr 2016 07:32:16 +0000
Responsible-Changed-Why:
mine
From: "Ryota Ozaki" <ozaki-r@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/51056 CVS commit: src/sys/netinet6
Date: Sun, 10 Apr 2016 08:15:52 +0000
Module Name: src
Committed By: ozaki-r
Date: Sun Apr 10 08:15:52 UTC 2016
Modified Files:
src/sys/netinet6: nd6.c
Log Message:
Don't call pfxlist_onlink_check with holding llentry lock
Sync nd6_free with FreeBSD (as of 2016-04-10).
Should fix PR kern/51056.
To generate a diff of this commit:
cvs rdiff -u -r1.189 -r1.190 src/sys/netinet6/nd6.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: open->feedback
State-Changed-By: ozaki-r@NetBSD.org
State-Changed-When: Sun, 10 Apr 2016 08:22:13 +0000
State-Changed-Why:
A possible fix has been committed. Let me know if the failure still happens.
State-Changed-From-To: feedback->closed
State-Changed-By: ozaki-r@NetBSD.org
State-Changed-When: Wed, 25 May 2016 06:45:34 +0000
State-Changed-Why:
No recurrence for over a month, so it should be OK to close.
>Unformatted:
Copyright © 1994-2014
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.