NetBSD Problem Report #51056

From martin@duskware.de  Sat Apr  9 07:55:45 2016
Return-Path: <martin@duskware.de>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (verified OK))
	by mollari.NetBSD.org (Postfix) with ESMTPS id ADE777A218
	for <gnats-bugs@gnats.NetBSD.org>; Sat,  9 Apr 2016 07:55:45 +0000 (UTC)
Date: Sat, 09 Apr 2016 09:55:41 CEST
From: martin@NetBSD.org
Reply-To: martin@NetBSD.org
To: gnats-bugs@NetBSD.org
Subject: reader-writer lock error
X-Send-Pr-Version: 3.95

>Number:         51056
>Category:       kern
>Synopsis:       reader-writer lock error
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    ozaki-r
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Apr 09 08:00:00 +0000 2016
>Closed-Date:    Wed May 25 06:45:34 +0000 2016
>Last-Modified:  Wed May 25 06:45:34 +0000 2016
>Originator:     Martin Husemann
>Release:        NetBSD 7.99.27
>Organization:
The NetBSD Foundation, Inc.
>Environment:
System: NetBSD night-owl.duskware.de 7.99.27 NetBSD 7.99.27 (NIGHT-OWL) #402: Tue Apr 5 16:22:09 CEST 2016 martin@night-owl.duskware.de:/usr/src/sys/arch/amd64/compile/NIGHT-OWL amd64
Architecture: x86_64
Machine: amd64
>Description:

Seen this twice since updating to 7.99.27:

#9  0xffffffff806ef0c3 in vpanic (
    fmt=fmt@entry=0xffffffff80cc8940 "lock error: %s: %s: %s: lock %p cpu %d lwp %p", ap=ap@entry=0xfffffe80bf7a6c58) at ../../../../kern/subr_prf.c:340
#10 0xffffffff806ef180 in panic (
    fmt=fmt@entry=0xffffffff80cc8940 "lock error: %s: %s: %s: lock %p cpu %d lwp %p") at ../../../../kern/subr_prf.c:258
#11 0xffffffff806e7b26 in lockdebug_abort (lock=0xfffffe8135c46888, 
    ops=ops@entry=0xffffffff80faff30 <rwlock_lockops>, 
    func=func@entry=0xffffffff80bc5bc0 <__func__.6032> "rw_vector_enter", 
    msg=msg@entry=0xffffffff80cc321d "locking against myself")
    at ../../../../kern/subr_lockdebug.c:867
#12 0xffffffff806c1398 in rw_abort (rw=rw@entry=0xfffffe8135c46888, 
    func=func@entry=0xffffffff80bc5bc0 <__func__.6032> "rw_vector_enter", 
    msg=msg@entry=0xffffffff80cc321d "locking against myself")
    at ../../../../kern/kern_rwlock.c:192
#13 0xffffffff806c186b in rw_vector_enter (rw=0xfffffe8135c46888, op=RW_READER)
    at ../../../../kern/kern_rwlock.c:341
#14 0xffffffff804eecfd in in6_lltable_lookup (llt=<optimized out>, 
    flags=<optimized out>, l3addr=<optimized out>)
    at ../../../../netinet6/in6.c:2487
#15 0xffffffff80505c08 in lla_lookup (l3addr=0xfffffe80bf7a6dbc, flags=0, 
    llt=<optimized out>) at ../../../../net/if_llatbl.h:295
#16 nd6_lookup (addr6=<optimized out>, ifp=0xffff800007006d30, 
    wlock=wlock@entry=false) at ../../../../netinet6/nd6.c:864
#17 0xffffffff8050ad26 in nd6_is_llinfo_probreach (dr=<optimized out>)
    at ../../../../netinet6/nd6_rtr.c:113
#18 find_pfxlist_reachable_router (pr=0xfffffe8107c0a158)
    at ../../../../netinet6/nd6_rtr.c:1384
#19 0xffffffff8050b8c6 in pfxlist_onlink_check ()
    at ../../../../netinet6/nd6_rtr.c:1417
#20 0xffffffff805052a9 in nd6_free (ln=0xfffffe8135c46788, gc=0)
    at ../../../../netinet6/nd6.c:1177
#21 0xffffffff8050590d in nd6_llinfo_timer (arg=0xfffffe8135c46788)
    at ../../../../netinet6/nd6.c:490
#22 0xffffffff806d29a1 in callout_softclock (v=<optimized out>)
    at ../../../../kern/kern_timeout.c:743
#23 0xffffffff806c73b4 in softint_execute (l=<optimized out>, s=2, 
    si=0xffff800045e710c0) at ../../../../kern/kern_softint.c:589
#24 softint_dispatch (pinned=<optimized out>, s=2)
    at ../../../../kern/kern_softint.c:871
#25 0xffffffff80113f7f in Xsoftintr ()

netbsd.gdb and crash dump available on request.


>How-To-Repeat:
just happens randomly, nothing special

>Fix:
n/a

>Release-Note:

>Audit-Trail:
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/51056: reader-writer lock error
Date: Sat, 9 Apr 2016 16:17:51 +0200

 I had the same crash again with a very recentish kernel, now running the
 same with LOCKDEBUG.

 This is on a notebook with two interfaces, bge0 (currently not connected)
 and athn0 (used, obviously, with both IPv4 and IPv6).

 I do disable DAD via sysctl.conf:

 net.inet6.ip6.dad_count=0


 since that was the only way I could make the wifi work properly with
 IPv6. I think there was some discussion about this some time ago, and
 a better way would be to ignore duplicates coming in via wlan
 interfaces (but I get distracted).

 Just noting for completeness.

 Martin

From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/51056: reader-writer lock error
Date: Sat, 9 Apr 2016 22:08:07 +0200

 On Sat, Apr 09, 2016 at 04:17:51PM +0200, Martin Husemann wrote:
 > I had the same crash again with a very recentish kernel, now running the
 > same with LOCKDEBUG.

 And it says (if I deciphered the photo correctly) the lock in question
 was locked here:

 nd6_llinfo_timer(void *arg)
 {
         struct llentry *ln = arg;
         struct ifnet *ifp;
         struct nd_ifinfo *ndi = NULL;
         bool send_ns = false;
         const struct in6_addr *daddr6 = NULL;

         mutex_enter(softnet_lock);
         KERNEL_LOCK(1, NULL);

         LLE_WLOCK(ln);


 i.e.:
 #13 0xffffffff8050595d in nd6_llinfo_timer (arg=0xfffffe8135c863c8)
     at ../../../../netinet6/nd6.c:490

 where

 (gdb) p arg
 $3 = (void *) 0xfffffe8135c863c8

 (gdb) p *(struct llentry *)arg
 $4 = {lle_next = {le_next = 0x0, le_prev = 0xfffffe810744fa10}, r_l3addr = {
     addr4 = {s_addr = 33587454}, addr6 = {__u6_addr = {
         __u6_addr8 = "\376\000\002\000\000\000\000\236Ç\246\377\376\233\267r", __u6_addr16 = {33022, 512, 0, 0, 51102, 65446, 39934, 29367}, __u6_addr32 = {
           33587454, 0, 4289120158, 1924635646}}}}, ll_addr = {
     mac_aligned = 126132915980188, mac16 = {51100, 39846, 29367}, 
     mac8 = "\234Ç\246\233\267r", '\000' <repeats 13 times>}, spare0 = 0, 
   spare1 = 0, lle_tbl = 0xfffffe81071997d0, lle_head = 0xfffffe810744fa10, 
   lle_free = 0xffffffff804ee628 <in6_lltable_destroy_lle>, lle_ll_free = 0x0, 
   la_hold = 0x0, la_numheld = 0, la_expire = 0, la_flags = 8264, la_asked = 3, 
   la_preempt = 0, ln_byhint = 0, ln_state = 0, ln_router = 1, ln_ntick = 0, 
   lle_refcnt = 412, lle_chain = {le_next = 0x0, le_prev = 0x0}, lle_timer = {
     _c_store = {0xffffffff815025e8 <callout_cpu0+168>, 
       0xffffffff815025e8 <callout_cpu0+168>, 
       0xffffffff8050557c <nd6_llinfo_timer>, 0xfffffe8135c863c8, 
       0xffffffff81502540 <callout_cpu0>, 0x108000e39cf, 0x11deeba1, 0x0, 0x0, 
       0x0}}, lle_lock = {rw_owner = 18446742429671606372}, la_opaque = 0x0}


 and then in 

 #6  0xffffffff804eed4d in in6_lltable_lookup (llt=<optimized out>, 
     flags=<optimized out>, l3addr=<optimized out>)
     at ../../../../netinet6/in6.c:2487

         if (flags & LLE_EXCLUSIVE)
                 LLE_WLOCK(lle);
         else
                 LLE_RLOCK(lle);
         return lle;
 }

 we try to lock it again.

 (gdb) p lle
 $5 = (struct llentry *) 0xfffffe8135c863c8
 (gdb) p *lle
 $6 = {lle_next = {le_next = 0x0, le_prev = 0xfffffe810744fa10}, r_l3addr = {
     addr4 = {s_addr = 33587454}, addr6 = {__u6_addr = {
         __u6_addr8 = "\376\000\002\000\000\000\000\236Ç\246\377\376\233\267r", __u6_addr16 = {33022, 512, 0, 0, 51102, 65446, 39934, 29367}, __u6_addr32 = {
           33587454, 0, 4289120158, 1924635646}}}}, ll_addr = {
     mac_aligned = 126132915980188, mac16 = {51100, 39846, 29367}, 
     mac8 = "\234Ç\246\233\267r", '\000' <repeats 13 times>}, spare0 = 0, 
   spare1 = 0, lle_tbl = 0xfffffe81071997d0, lle_head = 0xfffffe810744fa10, 
   lle_free = 0xffffffff804ee628 <in6_lltable_destroy_lle>, lle_ll_free = 0x0, 
   la_hold = 0x0, la_numheld = 0, la_expire = 0, la_flags = 8264, la_asked = 3, 
   la_preempt = 0, ln_byhint = 0, ln_state = 0, ln_router = 1, ln_ntick = 0, 
   lle_refcnt = 412, lle_chain = {le_next = 0x0, le_prev = 0x0}, lle_timer = {
     _c_store = {0xffffffff815025e8 <callout_cpu0+168>, 
       0xffffffff815025e8 <callout_cpu0+168>, 
       0xffffffff8050557c <nd6_llinfo_timer>, 0xfffffe8135c863c8, 
       0xffffffff81502540 <callout_cpu0>, 0x108000e39cf, 0x11deeba1, 0x0, 0x0, 
       0x0}}, lle_lock = {rw_owner = 18446742429671606372}, la_opaque = 0x0}
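
The sequence described above (write-lock the entry in the timer, then try to read-lock the same entry from the lookup on the same LWP) can be modelled in userland. This is only a sketch under stated assumptions: `toy_rwlock`, `toy_rw_enter`, `toy_rw_exit` and `toy_timer_path` are hypothetical names invented for illustration and are not the NetBSD `krwlock_t` implementation or its API. The model is single-threaded and returns an error code where the real kernel panics with "locking against myself".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a kernel rwlock; NOT the NetBSD krwlock_t. */
struct toy_rwlock {
	const void *writer;	/* holder of the write lock, or NULL */
	unsigned readers;	/* number of read holds */
};

enum toy_result { TOY_OK, TOY_BUSY, TOY_SELF_DEADLOCK };

/* Try to take the lock; report the case the kernel treats as fatal. */
static enum toy_result
toy_rw_enter(struct toy_rwlock *rw, const void *self, bool want_write)
{
	if (rw->writer == self)
		return TOY_SELF_DEADLOCK;	/* "locking against myself" */
	if (rw->writer != NULL)
		return TOY_BUSY;		/* another holder has it */
	if (want_write) {
		if (rw->readers != 0)
			return TOY_BUSY;
		rw->writer = self;
	} else {
		rw->readers++;
	}
	return TOY_OK;
}

/* Release a write hold if we own one, otherwise one read hold. */
static void
toy_rw_exit(struct toy_rwlock *rw, const void *self)
{
	if (rw->writer == self) {
		rw->writer = NULL;
	} else {
		assert(rw->readers > 0);
		rw->readers--;
	}
}

/*
 * Shape of the reported path: the timer takes the entry lock as a
 * writer, then a nested lookup asks for the same lock as a reader.
 */
static enum toy_result
toy_timer_path(struct toy_rwlock *entry_lock, const void *self)
{
	enum toy_result r;

	r = toy_rw_enter(entry_lock, self, true);	/* timer: write lock */
	if (r != TOY_OK)
		return r;
	r = toy_rw_enter(entry_lock, self, false);	/* lookup: read lock */
	toy_rw_exit(entry_lock, self);			/* drop the write hold */
	return r;
}
```

Calling `toy_timer_path()` on a zero-initialized `struct toy_rwlock` with any non-NULL owner token returns `TOY_SELF_DEADLOCK`, which corresponds to the condition LOCKDEBUG reported here.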


From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/51056: reader-writer lock error
Date: Sat, 9 Apr 2016 22:26:07 +0200

 Here is my current v6 routing information:

 Neighbor                             Linklayer Address  Netif Expire    S Flags
 fe80::9ec7:a6ff:fe9b:b772%athn0      9c:c7:a6:9b:b7:72  athn0 36s       R Rp
 2001:4dd0:ff00:9258:c617:feff:fece:3734 c4:17:fe:ce:37:34 athn0 permanent R p
 fe80::c617:feff:fece:3734%athn0      c4:17:fe:ce:37:34  athn0 permanent R p
 fe80::226:2dff:fe90:46d1%bge0        00:26:2d:90:46:d1   bge0 permanent R p

 and:

 Internet6:
 Destination                             Gateway                        Flags    Refs      Use    Mtu Interface
 ::/104                                  ::1                            UGRS        -        -  33648  lo0
 ::/96                                   ::1                            UGRS        -        -  33648  lo0
 default                                 fe80::9ec7:a6ff:fe9b:b772      UG          -        -      -  athn0
 ::1                                     ::1                            UH          -        -  33648  lo0
 ::127.0.0.0/104                         ::1                            UGRS        -        -  33648  lo0
 ::224.0.0.0/100                         ::1                            UGRS        -        -  33648  lo0
 ::255.0.0.0/104                         ::1                            UGRS        -        -  33648  lo0
 ::ffff:0.0.0.0/96                       ::1                            UGRS        -        -  33648  lo0
 2001:db8::/32                           ::1                            UGRS        -        -  33648  lo0
 2001:4dd0:ff00:9258::/64                link#2                         UC          -        -      -  athn0
 2001:4dd0:ff00:9258:c617:feff:fece:3734 link#2                         UHl         -        -      -  lo0
 2002::/24                               ::1                            UGRS        -        -  33648  lo0
 2002:7f00::/24                          ::1                            UGRS        -        -  33648  lo0
 2002:e000::/20                          ::1                            UGRS        -        -  33648  lo0
 2002:ff00::/24                          ::1                            UGRS        -        -  33648  lo0
 fe80::/10                               ::1                            UGRS        -        -  33648  lo0
 fe80::%bge0/64                          link#1                         UC          -        -      -  bge0
 fe80::226:2dff:fe90:46d1                link#1                         UHl         -        -      -  lo0
 fe80::%athn0/64                         link#2                         UC          -        -      -  athn0
 fe80::c617:feff:fece:3734               link#2                         UHl         -        -      -  lo0
 fe80::%lo0/64                           fe80::1                        U           -        -      -  lo0
 fe80::1                                 lo0                            UHl         -        -      -  lo0
 ff01:1::/32                             link#1                         UC          -        -      -  bge0
 ff01:2::/32                             link#2                         UC          -        -      -  athn0
 ff01:3::/32                             ::1                            UC          -        -  33648  lo0
 ff02::%bge0/32                          link#1                         UC          -        -      -  bge0
 ff02::%athn0/32                         link#2                         UC          -        -      -  athn0
 ff02::%lo0/32                           ::1                            UC          -        -  33648  lo0

Responsible-Changed-From-To: kern-bug-people->ozaki-r
Responsible-Changed-By: ozaki-r@NetBSD.org
Responsible-Changed-When: Sun, 10 Apr 2016 07:32:16 +0000
Responsible-Changed-Why:
mine


From: "Ryota Ozaki" <ozaki-r@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/51056 CVS commit: src/sys/netinet6
Date: Sun, 10 Apr 2016 08:15:52 +0000

 Module Name:	src
 Committed By:	ozaki-r
 Date:		Sun Apr 10 08:15:52 UTC 2016

 Modified Files:
 	src/sys/netinet6: nd6.c

 Log Message:
 Don't call pfxlist_onlink_check with holding llentry lock

 Sync nd6_free with FreeBSD (as of 2016-04-10).

 Should fix PR kern/51056.


 To generate a diff of this commit:
 cvs rdiff -u -r1.189 -r1.190 src/sys/netinet6/nd6.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.
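
The log message describes the general pattern of not calling into code that may look up (and lock) an entry while that entry's lock is still held. The following userland sketch illustrates that pattern only; it is not the actual diff between nd6.c revisions 1.189 and 1.190. All names (`toy_entry`, `toy_lock`, `toy_unlock`, `toy_onlink_check`, `toy_free_locked_call`, `toy_free_unlocked_call`) are hypothetical, and the use of a reference count to keep the entry alive while unlocked is an assumption made for the model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an llentry with a non-recursive lock. */
struct toy_entry {
	const void *owner;	/* current lock holder, or NULL */
	int refcnt;		/* assumed to keep the entry alive while unlocked */
};

/* Returns false where the kernel would panic ("locking against myself"). */
static bool
toy_lock(struct toy_entry *e, const void *self)
{
	if (e->owner == self)
		return false;
	assert(e->owner == NULL);	/* single-threaded model: no contention */
	e->owner = self;
	return true;
}

static void
toy_unlock(struct toy_entry *e, const void *self)
{
	assert(e->owner == self);
	e->owner = NULL;
}

/* Stand-in for a check that looks the entry up again and locks it. */
static bool
toy_onlink_check(struct toy_entry *e, const void *self)
{
	if (!toy_lock(e, self))
		return false;
	toy_unlock(e, self);
	return true;
}

/* Shape of the failing path: the check runs with the entry lock held. */
static bool
toy_free_locked_call(struct toy_entry *e, const void *self)
{
	bool ok;

	if (!toy_lock(e, self))
		return false;
	ok = toy_onlink_check(e, self);	/* nested lock attempt fails */
	toy_unlock(e, self);
	return ok;
}

/* Shape of the fix: take a reference, drop the lock around the check. */
static bool
toy_free_unlocked_call(struct toy_entry *e, const void *self)
{
	bool ok;

	if (!toy_lock(e, self))
		return false;
	e->refcnt++;			/* entry must not go away while unlocked */
	toy_unlock(e, self);
	ok = toy_onlink_check(e, self);	/* lock is free, so this succeeds */
	if (!toy_lock(e, self))
		return false;
	e->refcnt--;
	toy_unlock(e, self);
	return ok;
}
```

In this model `toy_free_locked_call()` returns false because the nested lock attempt hits the owner check, while `toy_free_unlocked_call()` returns true and leaves the entry unlocked with its reference count unchanged.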

State-Changed-From-To: open->feedback
State-Changed-By: ozaki-r@NetBSD.org
State-Changed-When: Sun, 10 Apr 2016 08:22:13 +0000
State-Changed-Why:
A possible fix has been committed. Let me know if the failure still happens.


State-Changed-From-To: feedback->closed
State-Changed-By: ozaki-r@NetBSD.org
State-Changed-When: Wed, 25 May 2016 06:45:34 +0000
State-Changed-Why:
No recurrence over one month should be ok to close.


>Unformatted:

Copyright © 1994-2014 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.