NetBSD Problem Report #50186

From www@NetBSD.org  Tue Sep  1 03:36:13 2015
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (verified OK))
	by mollari.NetBSD.org (Postfix) with ESMTPS id F2741A65BA
	for <gnats-bugs@gnats.NetBSD.org>; Tue,  1 Sep 2015 03:36:12 +0000 (UTC)
Message-Id: <20150901033611.B83C5A65BB@mollari.NetBSD.org>
Date: Tue,  1 Sep 2015 03:36:11 +0000 (UTC)
From: jdbaker@mylinuxisp.com
Reply-To: jdbaker@mylinuxisp.com
To: gnats-bugs@NetBSD.org
Subject: sparc memfault panic after 7.99.21 ARP changes
X-Send-Pr-Version: www-1.0

>Number:         50186
>Category:       kern
>Synopsis:       sparc memfault panic after 7.99.21 ARP changes
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue Sep 01 03:40:00 +0000 2015
>Closed-Date:    Tue Sep 15 08:49:04 +0000 2015
>Last-Modified:  Tue Sep 15 08:49:04 +0000 2015
>Originator:     John D. Baker
>Release:        NetBSD/sparc-7.99.21
>Organization:
>Environment:
NetBSD jean.technoskunk.fur 7.99.21 NetBSD 7.99.21 (JEAN) #0: Mon Aug 31 20:21:50 CDT 2015  sysop@skuld.technoskunk.fur:/d0/build/current/obj/sparc/sys/arch/sparc/compile/JEAN sparc

NetBSD jean.technoskunk.fur 7.99.21 NetBSD 7.99.21 (GENERIC) #19: Mon Aug 31 20:03:50 CDT 2015  sysop@skuld.technoskunk.fur:/d0/build/current/obj/sparc/sys/arch/sparc/compile/GENERIC sparc

>Description:
Following the changes to ARP cache handling beginning with the
following commit:

  http://mail-index.netbsd.org/source-changes/2015/08/31/msg068612.html

the sparc platform panics after an indeterminate time (probably when
the kernel is about to expire an ARP entry) as follows:

From custom kernel JEAN:

cpu0: data fault: pc=0xf008350c addr=0x10 sfsr=0x326<PERR=0x0,LVL=0x3,AT=0x1,FT=0x1,FAV,OW>
panic: kernel fault
Stopped in pid 0.5 (system) at  netbsd:cpu_Debugger+0x4:        or              %o7, %g0, %g1
db> bt
cpu_Debugger(0xf03a4758, 0xf99efd20, 0xf0432400, 0xf04331a8, 0xf0433000, 0x104) at netbsd:panic+0x20
panic(0xf03a4758, 0x0, 0xf008350c, 0x10, 0xf99efd40, 0xf040cc00) at netbsd:mem_access_fault4m+0x5a4
mem_access_fault4m(0x9, 0x326, 0x10, 0xf99efde0, 0xf0409ff0, 0xf0a0d540) at netbsd:memfault_sun4m+0xe8
memfault_sun4m(0xf0b366ac, 0x1, 0x0, 0xf041e318, 0xf0a0d544, 0xf0a0d544) at netbsd:arptimer+0x6c
arptimer(0xf0b36600, 0xf0a0d540, 0xf0b39008, 0x0, 0xf0b366ac, 0xf0437800) at netbsd:callout_softclock+0x154
callout_softclock(0xf041e31c, 0x1000000, 0x10000, 0xf041e318, 0xf0b36600, 0xf0083478) at netbsd:softint_thread+0x94
softint_thread(0xf0a0d540, 0x3000, 0x2000, 0x0, 0x0, 0xf99e8218) at netbsd:lwp_trampoline+0x8
db>


From GENERIC:

cpu0: data fault: pc=0xf00a626c addr=0x10 sfsr=0x326<PERR=0x0,LVL=0x3,AT=0x1,FT=0x1,FAV,OW>
panic: kernel fault
Stopped in pid 0.5 (system) at  netbsd:cpu_Debugger+0x4:        or              %o7, %g0, %g1
db> bt
cpu_Debugger(0xf03efb58, 0xf9ac7d20, 0xf0482c00, 0xf0483a58, 0xf0483800, 0x104) at netbsd:panic+0x20
panic(0xf03efb58, 0x0, 0xf00a626c, 0x10, 0xf9ac7d40, 0xf045c800) at netbsd:mem_access_fault4m+0x5b0
mem_access_fault4m(0x9, 0x326, 0x10, 0xf9ac7de0, 0xf0459b20, 0xf0a60540) at netbsd:memfault_sun4m+0xe8
memfault_sun4m(0xf0b8852c, 0x1, 0x0, 0xf04712a0, 0xf0a60544, 0xf0a60544) at netbsd:arptimer+0x6c
arptimer(0xf0b88480, 0xf0a60540, 0xf0b8c808, 0x0, 0xf0b8852c, 0xf0488800) at netbsd:callout_softclock+0x154
callout_softclock(0xf04712a4, 0x1000000, 0x10000, 0xf04712a0, 0xf0b88480, 0xf00a61d8) at netbsd:softint_thread+0x94
softint_thread(0xf0a60540, 0x3000, 0x2000, 0x0, 0x0, 0xf9ac0218) at netbsd:lwp_trampoline+0x8
db>

Machine is SPARCstation 5, 110MHz, 256MB RAM.  Operating diskless.
(NetBSD-7.0_RC3 on local disk)

I hope to confirm this observation on another system, but it is
engaged in another task at this time.
>How-To-Repeat:
Build sparc release from 201509010100 or later and boot GENERIC.
>Fix:

>Release-Note:

>Audit-Trail:
From: Ryota Ozaki <ozaki-r@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/50186: sparc memfault panic after 7.99.21 ARP changes
Date: Tue, 1 Sep 2015 15:52:45 +0900

 Hi,

 On Tue, Sep 1, 2015 at 12:40 PM,  <jdbaker@mylinuxisp.com> wrote:
 >>Number:         50186
 >>Category:       kern
 >>Synopsis:       sparc memfault panic after 7.99.21 ARP changes
 >>Confidential:   no
 >>Severity:       critical
 >>Priority:       high
 >>Responsible:    kern-bug-people
 >>State:          open
 >>Class:          sw-bug
 >>Submitter-Id:   net
 >>Arrival-Date:   Tue Sep 01 03:40:00 +0000 2015
 >>Originator:     John D. Baker
 >>Release:        NetBSD/sparc-7.99.21
 >>Organization:
 >>Environment:
 > NetBSD jean.technoskunk.fur 7.99.21 NetBSD 7.99.21 (JEAN) #0: Mon Aug 31 20:21:50 CDT 2015  sysop@skuld.technoskunk.fur:/d0/build/current/obj/sparc/sys/arch/sparc/compile/JEAN sparc
 >
 > NetBSD jean.technoskunk.fur 7.99.21 NetBSD 7.99.21 (GENERIC) #19: Mon Aug 31 20:03:50 CDT 2015  sysop@skuld.technoskunk.fur:/d0/build/current/obj/sparc/sys/arch/sparc/compile/GENERIC sparc
 >
 >>Description:
 > Following the changes to ARP cache handling beginning with the
 > following commit:
 >
 >   http://mail-index.netbsd.org/source-changes/2015/08/31/msg068612.html
 >
 > sparc platform will panic after an indeterminate time (probably when
 > about to expire an ARP entry) as follows:
 >
 > From custom kernel JEAN:
 >
 > cpu0: data fault: pc=0xf008350c addr=0x10 sfsr=0x326<PERR=0x0,LVL=0x3,AT=0x1,FT=0x1,FAV,OW>
 > panic: kernel fault
 > Stopped in pid 0.5 (system) at  netbsd:cpu_Debugger+0x4:        or              %o7, %g0, %g1
 > db> bt
 > cpu_Debugger(0xf03a4758, 0xf99efd20, 0xf0432400, 0xf04331a8, 0xf0433000, 0x104) at netbsd:panic+0x20
 > panic(0xf03a4758, 0x0, 0xf008350c, 0x10, 0xf99efd40, 0xf040cc00) at netbsd:mem_access_fault4m+0x5a4
 > mem_access_fault4m(0x9, 0x326, 0x10, 0xf99efde0, 0xf0409ff0, 0xf0a0d540) at netbsd:memfault_sun4m+0xe8
 > memfault_sun4m(0xf0b366ac, 0x1, 0x0, 0xf041e318, 0xf0a0d544, 0xf0a0d544) at netbsd:arptimer+0x6c
 > arptimer(0xf0b36600, 0xf0a0d540, 0xf0b39008, 0x0, 0xf0b366ac, 0xf0437800) at netbsd:callout_softclock+0x154
 > callout_softclock(0xf041e31c, 0x1000000, 0x10000, 0xf041e318, 0xf0b36600, 0xf0083478) at netbsd:softint_thread+0x94
 > softint_thread(0xf0a0d540, 0x3000, 0x2000, 0x0, 0x0, 0xf99e8218) at netbsd:lwp_trampoline+0x8
 > db>
 >
 >
 > From GENERIC:
 >
 > cpu0: data fault: pc=0xf00a626c addr=0x10 sfsr=0x326<PERR=0x0,LVL=0x3,AT=0x1,FT=0x1,FAV,OW>
 > panic: kernel fault
 > Stopped in pid 0.5 (system) at  netbsd:cpu_Debugger+0x4:        or              %o7, %g0, %g1
 > db> bt
 > cpu_Debugger(0xf03efb58, 0xf9ac7d20, 0xf0482c00, 0xf0483a58, 0xf0483800, 0x104) at netbsd:panic+0x20
 > panic(0xf03efb58, 0x0, 0xf00a626c, 0x10, 0xf9ac7d40, 0xf045c800) at netbsd:mem_access_fault4m+0x5b0
 > mem_access_fault4m(0x9, 0x326, 0x10, 0xf9ac7de0, 0xf0459b20, 0xf0a60540) at netbsd:memfault_sun4m+0xe8
 > memfault_sun4m(0xf0b8852c, 0x1, 0x0, 0xf04712a0, 0xf0a60544, 0xf0a60544) at netbsd:arptimer+0x6c
 > arptimer(0xf0b88480, 0xf0a60540, 0xf0b8c808, 0x0, 0xf0b8852c, 0xf0488800) at netbsd:callout_softclock+0x154
 > callout_softclock(0xf04712a4, 0x1000000, 0x10000, 0xf04712a0, 0xf0b88480, 0xf00a61d8) at netbsd:softint_thread+0x94
 > softint_thread(0xf0a60540, 0x3000, 0x2000, 0x0, 0x0, 0xf9ac0218) at netbsd:lwp_trampoline+0x8
 > db>
 >
 > Machine is SPARCstation 5, 110MHz, 256MB RAM.  Operating diskless.
 > (NetBSD-7.0_RC3 on local disk)
 >
 > I hope to confirm this observation on another system, but it is
 > engaged in another task at this time.
 >>How-To-Repeat:
 > Build sparc release from 201509010100 or later and boot GENERIC.
 >>Fix:
 >

 I investigated where it happens:

 ----
 $ ~/git/netbsd-src/work.tools/sparc--netbsdelf/bin/nm -n work.sparc/sys/arch/sparc/compile/GENERIC/netbsd |grep arptimer
 f00a61d8 t arptimer
 $ ruby -e 'puts (0xf00a61d8 + 0x6c).to_s(16)'
 f00a6244
 $ ~/git/netbsd-src/work.tools/sparc--netbsdelf/bin/objdump -d -S work.sparc/sys/arch/sparc/compile/GENERIC/netbsd.gdb |grep -10 f00a6244
         ifp = lle->lle_tbl->llt_ifp;
 f00a6234:       c2 06 20 40     ld  [ %i0 + 0x40 ], %g1

         callout_stop(&lle->la_timer);
 f00a6238:       90 10 00 1b     mov  %i3, %o0
 f00a623c:       40 03 34 68     call  f01733dc <callout_stop>
 f00a6240:       f4 00 60 10     ld  [ %g1 + 0x10 ], %i2

         /* XXX: LOR avoidance. We still have ref on lle. */
         LLE_WUNLOCK(lle);
 f00a6244:       40 02 f7 63     call  f0163fd0 <rw_exit>
 f00a6248:       90 10 00 1c     mov  %i4, %o0
 /*
  * Free an arp entry.
  */
 static void arptfree(struct llentry *la)
 {
         struct rtentry *rt = la->la_rt;
 f00a624c:       f6 06 20 b0     ld  [ %i0 + 0xb0 ], %i3

         KASSERT(rt != NULL);
 ----

 Hmm, is the faulting place the call to rw_exit, or just before or
 after it? I'm not familiar with sparc, so my investigation may be
 wrong.
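
 The address arithmetic above (the symbol address from nm plus the
 offset printed in the ddb backtrace) can also be sketched in C. This
 is only an illustration: frame_address is a hypothetical helper name,
 and the constants in the usage note are the ones quoted above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: turn a "symbol+offset" pair from a ddb
 * backtrace into an absolute address, given the symbol's start
 * address as reported by nm.  32-bit addresses, as on sparc.
 */
static uint32_t
frame_address(uint32_t symbol_addr, uint32_t offset)
{
	/* Refuse offsets that would wrap around the address space. */
	assert(offset <= UINT32_MAX - symbol_addr);
	return symbol_addr + offset;
}
```

 With the values above, frame_address(0xf00a61d8, 0x6c) yields
 0xf00a6244, the address passed to grep.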

 Thanks,
   ozaki-r

From: matthew green <mrg@eterna.com.au>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
    netbsd-bugs@netbsd.org
Subject: re: kern/50186: sparc memfault panic after 7.99.21 ARP changes
Date: Tue, 01 Sep 2015 17:32:57 +1000

 > From GENERIC:
 >
 > cpu0: data fault: pc=0xf00a626c addr=0x10 sfsr=0x326<PERR=0x0,LVL=0x3,AT=0x1,FT=0x1,FAV,OW>
 > panic: kernel fault
 > Stopped in pid 0.5 (system) at  netbsd:cpu_Debugger+0x4:        or              %o7, %g0, %g1
 > db> bt
 > cpu_Debugger(0xf03efb58, 0xf9ac7d20, 0xf0482c00, 0xf0483a58, 0xf0483800, 0x104) at netbsd:panic+0x20
 > panic(0xf03efb58, 0x0, 0xf00a626c, 0x10, 0xf9ac7d40, 0xf045c800) at netbsd:mem_access_fault4m+0x5b0
 > mem_access_fault4m(0x9, 0x326, 0x10, 0xf9ac7de0, 0xf0459b20, 0xf0a60540) at netbsd:memfault_sun4m+0xe8
 > memfault_sun4m(0xf0b8852c, 0x1, 0x0, 0xf04712a0, 0xf0a60544, 0xf0a60544) at netbsd:arptimer+0x6c
 > arptimer(0xf0b88480, 0xf0a60540, 0xf0b8c808, 0x0, 0xf0b8852c, 0xf0488800) at netbsd:callout_softclock+0x154
 > callout_softclock(0xf04712a4, 0x1000000, 0x10000, 0xf04712a0, 0xf0b88480, 0xf00a61d8) at netbsd:softint_thread+0x94
 > softint_thread(0xf0a60540, 0x3000, 0x2000, 0x0, 0x0, 0xf9ac0218) at netbsd:lwp_trampoline+0x8
 > db>

 OK, so i built my own GENERIC.  i get this:

 (gdb) l *(arptimer+0x6c)
 0xf00a6244 is in arptimer (/usr/src4/sys/netinet/if_arp.c:352).
 347             ifp = lle->lle_tbl->llt_ifp;
 348
 349             callout_stop(&lle->la_timer);
 350
 351             /* XXX: LOR avoidance. We still have ref on lle. */
 352             LLE_WUNLOCK(lle);
 353
 354             /* We have to call this w/o lock */
 355             arptfree(lle);

 disass/m arptimer gives:

 351             /* XXX: LOR avoidance. We still have ref on lle. */
 352             LLE_WUNLOCK(lle);
    0xf00a6244 <+108>:   call  0xf0163fd0 <rw_vector_exit>          <--- [a]
    0xf00a6248 <+112>:   mov  %i4, %o0

 but my addresses don't match yours entirely.  [a] is the instruction that
 appears to me to be faulting... which makes little sense.

 John, can you try the above gdb commands for yourself?  thanks.


 .mrg.

From: "John D. Baker" <jdbaker@mylinuxisp.com>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: re: kern/50186: sparc memfault panic after 7.99.21 ARP changes
Date: Tue, 1 Sep 2015 04:19:07 -0500 (CDT)

 On Tue, 1 Sep 2015, matthew green wrote:

 >  John, can you try the above gdb commands for yourself?  thanks.

 I need to build a DEBUG-enabled GENERIC first.  That is next on the
 list once my build host finishes its current task.

 -- 
 |/"\ John D. Baker, KN5UKS               NetBSD     Darwin/MacOS X
 |\ / jdbaker[snail]mylinuxisp[flyspeck]com    OpenBSD            FreeBSD
 | X  No HTML/proprietary data in email.   BSD just sits there and works!
 |/ \ GPGkeyID:  D703 4A7E 479F 63F8 D3F4  BD99 9572 8F23 E4AD 1645

From: "John D. Baker" <jdbaker@mylinuxisp.com>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: re: kern/50186: sparc memfault panic after 7.99.21 ARP changes
Date: Tue, 1 Sep 2015 10:35:04 -0500 (CDT)

 On Tue, 1 Sep 2015, matthew green wrote:

 >  John, can you try the above gdb commands for yourself?  thanks.

 My freshly-built DEBUG-enabled GENERIC behaves the same.  The panic:

 cpu0: data fault: pc=0xf00a626c addr=0x10 sfsr=0x326<PERR=0x0,LVL=0x3,AT=0x1,FT=0x1,FAV,OW>
 panic: kernel fault
 Stopped in pid 0.5 (system) at  netbsd:cpu_Debugger+0x4:        or              %o7, %g0, %g1
 db> bt
 cpu_Debugger(0xf03efba0, 0xf9ac3d20, 0xf0482c00, 0xf0483a98, 0xf0483800, 0x104) at netbsd:panic+0x20
 panic(0xf03efba0, 0x0, 0xf00a626c, 0x10, 0xf9ac3d40, 0xf045c800) at netbsd:mem_access_fault4m+0x5b0
 mem_access_fault4m(0x9, 0x326, 0x10, 0xf9ac3de0, 0xf0459b60, 0xf0a5c540) at netbsd:memfault_sun4m+0xe8
 memfault_sun4m(0xf0b8452c, 0x1, 0x0, 0xf04712e0, 0xf0a5c544, 0xf0a5c544) at netbsd:arptimer+0x6c
 arptimer(0xf0b84480, 0xf0a5c540, 0xf0b88808, 0x0, 0xf0b8452c, 0xf0488800) at netbsd:callout_softclock+0x154
 callout_softclock(0xf04712e4, 0x1000000, 0x10000, 0xf04712e0, 0xf0b84480, 0xf00a61d8) at netbsd:softint_thread+0x94
 softint_thread(0xf0a5c540, 0x3000, 0x2000, 0x0, 0x0, 0xf9a3b218) at netbsd:lwp_trampoline+0x8
 db> 

 Loading into 'gdb' gives the same as you observed:

 Reading symbols from netbsd.gdb...done.
 (gdb) l *(arptimer+0x6c)
 0xf00a6244 is in arptimer (/x/current/src/sys/netinet/if_arp.c:352).
 347             ifp = lle->lle_tbl->llt_ifp;
 348     
 349             callout_stop(&lle->la_timer);
 350     
 351             /* XXX: LOR avoidance. We still have ref on lle. */
 352             LLE_WUNLOCK(lle);
 353     
 354             /* We have to call this w/o lock */
 355             arptfree(lle);
 356     

 (gdb) disass/m arptimer
 Dump of assembler code for function arptimer:
 [...]
 350     
 351             /* XXX: LOR avoidance. We still have ref on lle. */
 352             LLE_WUNLOCK(lle);
    0xf00a6244 <+108>:   call  0xf0163fd0 <rw_vector_exit>
    0xf00a6248 <+112>:   mov  %i4, %o0

 The program counter reported in the initial fault message:

 0xf00a626c

 gives:

 (gdb) l *0xf00a626c
 0xf00a626c is in arptimer (/x/current/src/sys/netinet/if_arp.c:1438).
 1433            if (la->la_rt != NULL) {
 1434                    rtfree(la->la_rt);
 1435                    la->la_rt = NULL;
 1436            }
 1437    
 1438            rtrequest(RTM_DELETE, rt_getkey(rt), NULL, rt_mask(rt), 0, NULL);
 1439    }
 1440    
 1441    /*
 1442     * Lookup or enter a new address in arptab.

 and disassembling there gives:

 1437    
 1438            rtrequest(RTM_DELETE, rt_getkey(rt), NULL, rt_mask(rt), 0, NULL);
    0xf00a6268 <+144>:   clr  %o2
    0xf00a626c <+148>:   ld  [ %i3 + 0x10 ], %o3
    0xf00a6270 <+152>:   clr  %o4
    0xf00a6274 <+156>:   clr  %o5
    0xf00a6278 <+160>:   ld  [ %i3 + 0xb4 ], %o1
    0xf00a627c <+164>:   call  0xf025a39c <rtrequest>
    0xf00a6280 <+168>:   mov  2, %o0

 I don't know SPARC assembly or the register usage conventions, but it
 looks to me like there is an expected load at offset 0x10 from an address
 in "i3", but since the address reported in the fault message is "0x10",
 it would seem that "i3" contains 0 (zero).
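
 The reasoning above (fault address = base register + displacement, so
 a fault at 0x10 from a load at offset 0x10 implies a zero base) can be
 written out as a small sketch. It assumes a hypothetical structure,
 struct sample, whose only relevant property is a field 0x10 bytes from
 its start; it is not the real layout of any kernel structure, and
 base_register_value is an illustrative name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the structure being dereferenced.  Only
 * the layout matters: some field lives 0x10 bytes from the start.
 */
struct sample {
	unsigned char pad[0x10];
	uint32_t field_at_0x10;
};

/*
 * Given the faulting address and the displacement encoded in the load
 * instruction, recover the value the base register must have held.
 */
static uintptr_t
base_register_value(uintptr_t fault_addr, size_t displacement)
{
	assert(fault_addr >= displacement);
	return fault_addr - displacement;
}
```

 Passing the reported fault address 0x10 and
 offsetof(struct sample, field_at_0x10) gives 0, i.e. a NULL base
 pointer.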


From: "John D. Baker" <jdbaker@mylinuxisp.com>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: re: kern/50186: sparc memfault panic after 7.99.21 ARP changes
Date: Tue, 1 Sep 2015 10:52:15 -0500 (CDT)

 Actually, looking back earlier in the disassembly:

 1424    /*
 1425     * Free an arp entry.
 1426     */
 1427    static void arptfree(struct llentry *la)
 ---Type <return> to continue, or q <return> to quit---
 1428    {
 1429            struct rtentry *rt = la->la_rt;
    0xf00a624c <+116>:   ld  [ %i0 + 0xb0 ], %i3

 1430    
 1431            KASSERT(rt != NULL);
 1432    
 1433            if (la->la_rt != NULL) {
    0xf00a6250 <+120>:   cmp  %i3, 0
    0xf00a6254 <+124>:   be  0xf00a626c <arptimer+148>
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    0xf00a6258 <+128>:   clr  %o2

 1434                    rtfree(la->la_rt);
    0xf00a625c <+132>:   call  0xf025a784 <rtfree>
    0xf00a6260 <+136>:   mov  %i3, %o0

 1435                    la->la_rt = NULL;
    0xf00a6264 <+140>:   clr  [ %i0 + 0xb0 ]

 1436            }

 The fault appears to be a KASSERT in disguise?  Register "i3" is compared
 with zero and, if equal (i.e., zero), the code branches to the address
 reported in the fault message.  This would indicate that the route
 pointer (la_rt) of the arp entry being freed is NULL?
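
 Reading the branch above together with the source, %i3 holds
 la->la_rt (loaded at +116), so the NULL case skips rtfree() and lands
 on the load at +148. A minimal user-space sketch of guarding such a
 pointer before it is used, assuming simplified stand-in types (struct
 route, struct cache_entry and entry_release_route are hypothetical,
 and this is not the fix that was committed):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the kernel's structures. */
struct route {
	int refcnt;
};

struct cache_entry {
	struct route *rt;	/* may legitimately be NULL */
};

/*
 * Drop the entry's route reference, if it has one.
 * Returns 1 when a route was released, 0 when there was nothing to do.
 */
static int
entry_release_route(struct cache_entry *e)
{
	struct route *rt;

	assert(e != NULL);
	rt = e->rt;
	if (rt == NULL)
		return 0;	/* nothing to dereference; bail out early */

	e->rt = NULL;
	rt->refcnt--;
	return 1;
}
```

 The early return is the point of the sketch: every later use of rt is
 reached only when the pointer is known to be non-NULL.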


From: "John D. Baker" <jdbaker@mylinuxisp.com>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: re: kern/50186: sparc memfault panic after 7.99.21 ARP changes
Date: Wed, 2 Sep 2015 06:44:47 -0500 (CDT)

 I've repeated my observations on a few other machines:

 SPARCstation 5, 85MHz, 256MB
 SPARCstation 10, 40MHz (SM41 SuperSPARC), 128MB

 and most recently:

 NetBSD dpe2950 7.99.21 NetBSD 7.99.21 (GENERIC) #63: Mon Aug 31 19:52:42 CDT 2015  sysop@yggdrasil.technoskunk.fur:/r0/build/current/obj/amd64/sys/arch/amd64/compile/GENERIC amd64

 The panic:

 panic: kernel diagnostic assertion "rt != NULL" failed: file "/r0/nbsd/current/src/sys/netinet/if_arp.c", line 1431 
 fatal breakpoint trap in supervisor mode
 trap type 1 code 0 rip ffffffff8028a395 cs 8 rflags 246 cr2 ffff8000ddb92800 ilevel 2 rsp fffffe8115ef0e80
 curlwp 0xfffffe8115f1ca40 pid 0.50 lowest kstack 0xfffffe8115eed2c0
 Stopped in pid 0.50 (system) at netbsd:breakpoint+0x5:  leave
 db{6}> bt
 breakpoint() at netbsd:breakpoint+0x5
 vpanic() at netbsd:vpanic+0x13c
 kern_assert() at netbsd:kern_assert+0x4f
 arptimer() at netbsd:arptimer+0x1d0
 callout_softclock() at netbsd:callout_softclock+0x1d0
 softint_dispatch() at netbsd:softint_dispatch+0xd3
 DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xfffffe8115ef0ff0
 Xsoftintr() at netbsd:Xsoftintr+0x4f
 --- interrupt ---
 0:
 db{6}>


 Excerpt from 'dmesg':

 NetBSD 7.99.21 (GENERIC) #63: Mon Aug 31 19:52:42 CDT 2015
 	sysop@yggdrasil.technoskunk.fur:/r0/build/current/obj/amd64/sys/arch/amd64/compile/GENERIC
 total memory = 12287 MB
 avail memory = 11911 MB
 [...]
 ACPI: RSDP 0x00000000000F2620 000024 (v02 DELL  )
 ACPI: XSDT 0x00000000000F26A0 00004C (v01 DELL   PE_SC3   00000001 DELL 00000001)
 ACPI: FACP 0x00000000000F27A8 0000F4 (v03 DELL   PE_SC3   00000001 DELL 00000001)
 ACPI: DSDT 0x00000000CFFA8000 003C53 (v01 DELL   PE_SC3   00000001 MSFT 0100000E)
 ACPI: FACS 0x00000000CFFB7C00 000040
 ACPI: FACS 0x00000000CFFB7C00 000040
 ACPI: APIC 0x00000000000F289C 0000C8 (v01 DELL   PE_SC3   00000001 DELL 00000001)
 ACPI: SPCR 0x00000000000F297D 000050 (v01 DELL   PE_SC3   00000001 DELL 00000001)
 ACPI: HPET 0x00000000000F29CD 000038 (v01 DELL   PE_SC3   00000001 DELL 00000001)
 ACPI: MCFG 0x00000000000F2A05 00003C (v01 DELL   PE_SC3   00000001 DELL 00000001)
 ACPI: All ACPI Tables successfully acquired
 ioapic0 at mainbus0 apid 8
 ioapic1 at mainbus0 apid 9
 cpu0 at mainbus0 apid 0
 cpu0: Intel(R) Xeon(R) CPU           E5310  @ 1.60GHz, id 0x6f7
 cpu1 at mainbus0 apid 4
 cpu1: Intel(R) Xeon(R) CPU           E5310  @ 1.60GHz, id 0x6f7
 cpu2 at mainbus0 apid 1
 cpu2: Intel(R) Xeon(R) CPU           E5310  @ 1.60GHz, id 0x6f7
 cpu3 at mainbus0 apid 5
 cpu3: Intel(R) Xeon(R) CPU           E5310  @ 1.60GHz, id 0x6f7
 cpu4 at mainbus0 apid 2
 cpu4: Intel(R) Xeon(R) CPU           E5310  @ 1.60GHz, id 0x6f7
 cpu5 at mainbus0 apid 6
 cpu5: Intel(R) Xeon(R) CPU           E5310  @ 1.60GHz, id 0x6f7
 cpu6 at mainbus0 apid 3
 cpu6: Intel(R) Xeon(R) CPU           E5310  @ 1.60GHz, id 0x6f7
 cpu7 at mainbus0 apid 7
 cpu7: Intel(R) Xeon(R) CPU           E5310  @ 1.60GHz, id 0x6f7


 So, this is not just a SPARC issue.

 I see that KASSERTs in "if_arp.c" were recently disabled.


From: Ryota Ozaki <ozaki-r@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, 
	jdbaker@mylinuxisp.com
Subject: Re: kern/50186: sparc memfault panic after 7.99.21 ARP changes
Date: Thu, 3 Sep 2015 19:35:39 +0900

 Hi,

 Thank you for your investigation.

 On Wed, Sep 2, 2015 at 8:50 PM, John D. Baker <jdbaker@mylinuxisp.com> wrote:
 > The following reply was made to PR kern/50186; it has been noted by GNATS.
 >
 > From: "John D. Baker" <jdbaker@mylinuxisp.com>
 > To: gnats-bugs@NetBSD.org
 > Cc:
 > Subject: re: kern/50186: sparc memfault panic after 7.99.21 ARP changes
 > Date: Wed, 2 Sep 2015 06:44:47 -0500 (CDT)
 >
 >  I've repeated my observations on a few other machines:
 >
 >  SPARCstation 5, 85MHz, 256MB
 >  SPARCstation 10, 40MHz (SM41 SuperSPARC), 128MB
 >
 >  and most recently:
 >
 >  NetBSD dpe2950 7.99.21 NetBSD 7.99.21 (GENERIC) #63: Mon Aug 31 19:52:42 CDT 2015  sysop@yggdrasil.technoskunk.fur:/r0/build/current/obj/amd64/sys/arch/amd64/compile/GENERIC amd64
 >
 >  The panic:
 >
 >  panic: kernel diagnostic assertion "rt != NULL" failed: file "/r0/nbsd/current/src/sys/netinet/if_arp.c", line 1431
 >  fatal breakpoint trap in supervisor mode
 >  trap type 1 code 0 rip ffffffff8028a395 cs 8 rflags 246 cr2 ffff8000ddb92800 ilevel 2 rsp fffffe8115ef0e80
 >  curlwp 0xfffffe8115f1ca40 pid 0.50 lowest kstack 0xfffffe8115eed2c0
 >  Stopped in pid 0.50 (system) at netbsd:breakpoint+0x5:  leave
 >  db{6}> bt
 >  breakpoint() at netbsd:breakpoint+0x5
 >  vpanic() at netbsd:vpanic+0x13c
 >  kern_assert() at netbsd:kern_assert+0x4f
 >  arptimer() at netbsd:arptimer+0x1d0
 >  callout_softclock() at netbsd:callout_softclock+0x1d0
 >  softint_dispatch() at netbsd:softint_dispatch+0xd3
 >  DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xfffffe8115ef0ff0
 >  Xsoftintr() at netbsd:Xsoftintr+0x4f
 >  --- interrupt ---
 >  0:
 >  db{6}>

 How do you reproduce the panic? I haven't been able to reproduce it on my amd64 machines yet.

 >
 >
 >  Excerpt from 'dmesg':
 >
 >  NetBSD 7.99.21 (GENERIC) #63: Mon Aug 31 19:52:42 CDT 2015
 >         sysop@yggdrasil.technoskunk.fur:/r0/build/current/obj/amd64/sys/arch/amd64/compile/GENERIC
 >  total memory = 12287 MB
 >  avail memory = 11911 MB
 >  [...]
 >  ACPI: RSDP 0x00000000000F2620 000024 (v02 DELL  )
 >  ACPI: XSDT 0x00000000000F26A0 00004C (v01 DELL   PE_SC3   00000001 DELL 00000001)
 >  ACPI: FACP 0x00000000000F27A8 0000F4 (v03 DELL   PE_SC3   00000001 DELL 00000001)
 >  ACPI: DSDT 0x00000000CFFA8000 003C53 (v01 DELL   PE_SC3   00000001 MSFT 0100000E)
 >  ACPI: FACS 0x00000000CFFB7C00 000040
 >  ACPI: FACS 0x00000000CFFB7C00 000040
 >  ACPI: APIC 0x00000000000F289C 0000C8 (v01 DELL   PE_SC3   00000001 DELL 00000001)
 >  ACPI: SPCR 0x00000000000F297D 000050 (v01 DELL   PE_SC3   00000001 DELL 00000001)
 >  ACPI: HPET 0x00000000000F29CD 000038 (v01 DELL   PE_SC3   00000001 DELL 00000001)
 >  ACPI: MCFG 0x00000000000F2A05 00003C (v01 DELL   PE_SC3   00000001 DELL 00000001)
 >  ACPI: All ACPI Tables successfully acquired
 >  ioapic0 at mainbus0 apid 8
 >  ioapic1 at mainbus0 apid 9
 >  cpu0 at mainbus0 apid 0
 >  cpu0: Intel(R) Xeon(R) CPU           E5310  @ 1.60GHz, id 0x6f7
 >  cpu1 at mainbus0 apid 4
 >  cpu1: Intel(R) Xeon(R) CPU           E5310  @ 1.60GHz, id 0x6f7
 >  cpu2 at mainbus0 apid 1
 >  cpu2: Intel(R) Xeon(R) CPU           E5310  @ 1.60GHz, id 0x6f7
 >  cpu3 at mainbus0 apid 5
 >  cpu3: Intel(R) Xeon(R) CPU           E5310  @ 1.60GHz, id 0x6f7
 >  cpu4 at mainbus0 apid 2
 >  cpu4: Intel(R) Xeon(R) CPU           E5310  @ 1.60GHz, id 0x6f7
 >  cpu5 at mainbus0 apid 6
 >  cpu5: Intel(R) Xeon(R) CPU           E5310  @ 1.60GHz, id 0x6f7
 >  cpu6 at mainbus0 apid 3
 >  cpu6: Intel(R) Xeon(R) CPU           E5310  @ 1.60GHz, id 0x6f7
 >  cpu7 at mainbus0 apid 7
 >  cpu7: Intel(R) Xeon(R) CPU           E5310  @ 1.60GHz, id 0x6f7
 >
 >
 >  So, this is not just a SPARC issue.
 >
 >  I see that KASSERTs in "if_arp.c" were recently disabled.

 Does the fix suppress the panic on SPARC machines?

 Thanks,
   ozaki-r

From: Christos Zoulas <christos@zoulas.com>
To: "gnats-bugs@NetBSD.org" <gnats-bugs@NetBSD.org>
Cc: "kern-bug-people@netbsd.org" <kern-bug-people@netbsd.org>,
 "gnats-admin@netbsd.org" <gnats-admin@netbsd.org>,
 "netbsd-bugs@netbsd.org" <netbsd-bugs@netbsd.org>,
 "jdbaker@mylinuxisp.com" <jdbaker@mylinuxisp.com>
Subject: Re: kern/50186: sparc memfault panic after 7.99.21 ARP changes
Date: Thu, 3 Sep 2015 14:38:39 +0300

 I just crashed in arptimer() so there are more locking problems in the code.  Can you document the locking discipline for la_rt and changing the lists?

From: Ryota Ozaki <ozaki-r@netbsd.org>
To: Christos Zoulas <christos@zoulas.com>
Cc: "gnats-bugs@NetBSD.org" <gnats-bugs@netbsd.org>, 
	"kern-bug-people@netbsd.org" <kern-bug-people@netbsd.org>, 
	"gnats-admin@netbsd.org" <gnats-admin@netbsd.org>, "netbsd-bugs@netbsd.org" <netbsd-bugs@netbsd.org>, 
	"jdbaker@mylinuxisp.com" <jdbaker@mylinuxisp.com>
Subject: Re: kern/50186: sparc memfault panic after 7.99.21 ARP changes
Date: Fri, 4 Sep 2015 13:50:52 +0900

 On Thu, Sep 3, 2015 at 8:38 PM, Christos Zoulas <christos@zoulas.com> wrote:
 > I just crashed in arptimer() so there are more locking problems in the code. Can you document the locking discipline for la_rt and changing the lists?
 >

 I'm sorry for the defect.

 An interface's ARP cache list is protected by a rwlock
 (IF_AFDATA_*LOCK), and each ARP cache entry is protected by a rwlock
 and reference counting (LLE_*LOCK). However, la_rt still needs
 softnet_lock; accessing or modifying la_rt without softnet_lock
 is a bug. And I found one :( lltable_free accesses la_rt but is
 called without softnet_lock.
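
 The rule described above (la_rt may only be touched with softnet_lock
 held) can be modelled in user space. This is a minimal
 single-threaded sketch, assuming a toy lock that only records whether
 it is held; toy_lock, toy_entry and toy_clear_rt are hypothetical
 names, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy lock that only records whether it is held (single-threaded model). */
struct toy_lock {
	bool held;
};

static void
toy_lock_enter(struct toy_lock *l)
{
	assert(!l->held);
	l->held = true;
}

static void
toy_lock_exit(struct toy_lock *l)
{
	assert(l->held);
	l->held = false;
}

struct toy_entry {
	void *rt;	/* stands in for la_rt */
};

/* Rule being modelled: rt may only be touched with the global lock held. */
static bool
toy_clear_rt(struct toy_lock *global, struct toy_entry *e)
{
	if (!global->held)
		return false;	/* discipline violated; refuse to touch rt */
	e->rt = NULL;
	return true;
}
```

 In a kernel the equivalent check would more likely be an assertion on
 lock ownership than a return value; the sketch returns false so that
 a violation is observable in a test.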

 Here is a patch:
 http://www.netbsd.org/~ozaki-r/lltable_free-softnet_lock.diff
 Could you try it?

 Thanks,
   ozaki-r

From: christos@zoulas.com (Christos Zoulas)
To: Ryota Ozaki <ozaki-r@netbsd.org>
Cc: "gnats-bugs@NetBSD.org" <gnats-bugs@netbsd.org>, 
	"kern-bug-people@netbsd.org" <kern-bug-people@netbsd.org>, 
	"gnats-admin@netbsd.org" <gnats-admin@netbsd.org>, 
	"netbsd-bugs@netbsd.org" <netbsd-bugs@netbsd.org>, 
	"jdbaker@mylinuxisp.com" <jdbaker@mylinuxisp.com>
Subject: Re: kern/50186: sparc memfault panic after 7.99.21 ARP changes
Date: Fri, 4 Sep 2015 04:40:42 -0400

 On Sep 4,  1:50pm, ozaki-r@netbsd.org (Ryota Ozaki) wrote:
 -- Subject: Re: kern/50186: sparc memfault panic after 7.99.21 ARP changes

 | On Thu, Sep 3, 2015 at 8:38 PM, Christos Zoulas <christos@zoulas.com> wrote:
 | > I just crashed in arptimer() so there are more locking problems in the code. Can you document the locking discipline for la_rt and changing the lists?
 | >
 | 
 | I'm sorry for the defect.
 | 
 | An ARP cache list of an interface is protected by a rwlock
 | (IF_AFDATA_*LOCK) and each ARP cache is protected by a rwlock
 | and reference counting (LLE_*LOCK). However, la_rt still needs
 | softnet_lock; if la_rt is accessed or modified without
 | softnet_lock, it's a bug. And I found a bug :( lltable_free
 | accesses la_rt but it's called without softnet_lock.
 | 
 | Here is a patch:
 | http://www.netbsd.org/~ozaki-r/lltable_free-softnet_lock.diff
 | Could you try it?
 | 

 Thanks, I am running with it now. Should we revert the KASSERT change
 too?

 christos

From: Ryota Ozaki <ozaki-r@netbsd.org>
To: Christos Zoulas <christos@zoulas.com>
Cc: "gnats-bugs@NetBSD.org" <gnats-bugs@netbsd.org>, 
	"kern-bug-people@netbsd.org" <kern-bug-people@netbsd.org>, 
	"gnats-admin@netbsd.org" <gnats-admin@netbsd.org>, "netbsd-bugs@netbsd.org" <netbsd-bugs@netbsd.org>, 
	"jdbaker@mylinuxisp.com" <jdbaker@mylinuxisp.com>
Subject: Re: kern/50186: sparc memfault panic after 7.99.21 ARP changes
Date: Fri, 4 Sep 2015 18:10:39 +0900

 On Fri, Sep 4, 2015 at 5:40 PM, Christos Zoulas <christos@zoulas.com> wrote:
 > On Sep 4,  1:50pm, ozaki-r@netbsd.org (Ryota Ozaki) wrote:
 > -- Subject: Re: kern/50186: sparc memfault panic after 7.99.21 ARP changes
 >
 > | On Thu, Sep 3, 2015 at 8:38 PM, Christos Zoulas <christos@zoulas.com> wrote:
 > | > I just crashed in arptimer() so there are more locking problems in the code. Can you document the locking discipline for la_rt and changing the lists?
 > | >
 > |
 > | I'm sorry for the defect.
 > |
 > | An ARP cache list of an interface is protected by a rwlock
 > | (IF_AFDATA_*LOCK) and each ARP cache is protected by a rwlock
 > | and reference counting (LLE_*LOCK). However, la_rt still needs
 > | softnet_lock; if la_rt is accessed or modified without
 > | softnet_lock, it's a bug. And I found a bug :( lltable_free
 > | accesses la_rt but it's called without softnet_lock.
 > |
 > | Here is a patch:
 > | http://www.netbsd.org/~ozaki-r/lltable_free-softnet_lock.diff
 > | Could you try it?
 > |
 >
 > Thanks, I am running with it now. Should we revert the KASSERT change
 > too?

 Well, yes and no. The KASSERT was actually wrong; according to my
 investigation for PR 50184, la_rt can be NULL at that point. So we
 have to get rid of it anyway.

 I made a patch for the bug:
 http://www.netbsd.org/~ozaki-r/fix-PR50184.take2.diff
 which was for PR 50184. So reverting your commit and applying the patch
 instead might be easier for me. Of course, rebasing my patch on HEAD
 would give the same result.

  ozaki-r

From: christos@zoulas.com (Christos Zoulas)
To: Ryota Ozaki <ozaki-r@netbsd.org>
Cc: "gnats-bugs@NetBSD.org" <gnats-bugs@netbsd.org>, 
	"kern-bug-people@netbsd.org" <kern-bug-people@netbsd.org>, 
	"gnats-admin@netbsd.org" <gnats-admin@netbsd.org>, 
	"netbsd-bugs@netbsd.org" <netbsd-bugs@netbsd.org>, 
	"jdbaker@mylinuxisp.com" <jdbaker@mylinuxisp.com>
Subject: Re: kern/50186: sparc memfault panic after 7.99.21 ARP changes
Date: Fri, 4 Sep 2015 05:23:26 -0400

 On Sep 4,  6:10pm, ozaki-r@netbsd.org (Ryota Ozaki) wrote:
 -- Subject: Re: kern/50186: sparc memfault panic after 7.99.21 ARP changes

 | On Fri, Sep 4, 2015 at 5:40 PM, Christos Zoulas <christos@zoulas.com> wrote:
 | > On Sep 4,  1:50pm, ozaki-r@netbsd.org (Ryota Ozaki) wrote:
 | > -- Subject: Re: kern/50186: sparc memfault panic after 7.99.21 ARP changes
 | >
 | > | On Thu, Sep 3, 2015 at 8:38 PM, Christos Zoulas <christos@zoulas.com> wrote:
 | > | > I just crashed in arptimer() so there are more locking problems in the code. Can you document the locking discipline for la_rt and changing the lists?
 | > | >
 | > |
 | > | I'm sorry for the defect.
 | > |
 | > | An ARP cache list of an interface is protected by a rwlock
 | > | (IF_AFDATA_*LOCK) and each ARP cache is protected by a rwlock
 | > | and reference counting (LLE_*LOCK). However, la_rt still needs
 | > | softnet_lock; if la_rt is accessed or modified without
 | > | softnet_lock, it's a bug. And I found a bug :( lltable_free
 | > | accesses la_rt but it's called without softnet_lock.
 | > |
 | > | Here is a patch:
 | > | http://www.netbsd.org/~ozaki-r/lltable_free-softnet_lock.diff
 | > | Could you try it?
 | > |
 | >
 | > Thanks, I am running with it now. Should we revert the KASSERT change
 | > too?
 | 
 | Well, yes and no. Because KASSERT was actually wrong; la_rt can be NULL
 | at the point according to my investigation for PR 50184. So anyway we
 | have to get rid of it.
 | 
 | I made a patch for the bug:
 | http://www.netbsd.org/~ozaki-r/fix-PR50184.take2.diff
 | which was for PR 50184. So reverting your commit and applying the patch
 | instead might be easy for me. Of course rebasing my patch on the HEAD
 | makes no difference though.

 Why don't you commit both of them?

 christos
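
 The locking rule quoted above for la_rt can be illustrated with a small
 userland sketch. This is only an illustration under stated assumptions,
 not the kernel code: the names rt_sketch, lle_sketch, softnet_lock_sketch
 and lle_detach_rt_sketch are hypothetical, a pthread mutex stands in for
 softnet_lock, and the interface list lock (IF_AFDATA_*LOCK) and the
 per-entry lock (LLE_*LOCK) are left out.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's rtentry and llentry. */
struct rt_sketch {
	int refcnt;
};

struct lle_sketch {
	struct rt_sketch *la_rt;	/* rule: touch only with softnet lock held */
};

/* A userland mutex standing in for the kernel's softnet_lock. */
static pthread_mutex_t softnet_lock_sketch = PTHREAD_MUTEX_INITIALIZER;

/*
 * Detach the route pointer from an entry.  Both the read and the write
 * of la_rt happen inside the softnet lock, so two callers can never
 * both obtain the same non-NULL pointer.
 */
static struct rt_sketch *
lle_detach_rt_sketch(struct lle_sketch *lle)
{
	struct rt_sketch *rt;

	pthread_mutex_lock(&softnet_lock_sketch);
	rt = lle->la_rt;
	lle->la_rt = NULL;
	pthread_mutex_unlock(&softnet_lock_sketch);

	return rt;
}
```

 A caller that gets NULL back has nothing to release; only the caller that
 receives the non-NULL pointer goes on to drop the route reference.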

From: Ryota Ozaki <ozaki-r@netbsd.org>
To: Christos Zoulas <christos@zoulas.com>
Cc: "gnats-bugs@NetBSD.org" <gnats-bugs@netbsd.org>, 
	"kern-bug-people@netbsd.org" <kern-bug-people@netbsd.org>, 
	"gnats-admin@netbsd.org" <gnats-admin@netbsd.org>, "netbsd-bugs@netbsd.org" <netbsd-bugs@netbsd.org>, 
	"jdbaker@mylinuxisp.com" <jdbaker@mylinuxisp.com>
Subject: Re: kern/50186: sparc memfault panic after 7.99.21 ARP changes
Date: Fri, 4 Sep 2015 18:30:40 +0900

 On Fri, Sep 4, 2015 at 6:23 PM, Christos Zoulas <christos@zoulas.com> wrote:
 > On Sep 4,  6:10pm, ozaki-r@netbsd.org (Ryota Ozaki) wrote:
 > -- Subject: Re: kern/50186: sparc memfault panic after 7.99.21 ARP changes
 >
 > | On Fri, Sep 4, 2015 at 5:40 PM, Christos Zoulas <christos@zoulas.com> wrote:
 > | > On Sep 4,  1:50pm, ozaki-r@netbsd.org (Ryota Ozaki) wrote:
 > | > -- Subject: Re: kern/50186: sparc memfault panic after 7.99.21 ARP changes
 > | >
 > | > | On Thu, Sep 3, 2015 at 8:38 PM, Christos Zoulas <christos@zoulas.com> wrote:
 > | > | > I just crashed in arptimer() so there are more locking problems in the code. Can you document the locking discipline for la_rt and changing the lists?
 > | > | >
 > | > |
 > | > | I'm sorry for the defect.
 > | > |
 > | > | An ARP cache list of an interface is protected by a rwlock
 > | > | (IF_AFDATA_*LOCK) and each ARP cache is protected by a rwlock
 > | > | and reference counting (LLE_*LOCK). However, la_rt still needs
 > | > | softnet_lock; if la_rt is accessed or modified without
 > | > | softnet_lock, it's a bug. And I found a bug :( lltable_free
 > | > | accesses la_rt but it's called without softnet_lock.
 > | > |
 > | > | Here is a patch:
 > | > | http://www.netbsd.org/~ozaki-r/lltable_free-softnet_lock.diff
 > | > | Could you try it?
 > | > |
 > | >
 > | > Thanks, I am running with it now. Should we revert the KASSERT change
 > | > too?
 > |
 > | Well, yes and no. Because KASSERT was actually wrong; la_rt can be NULL
 > | at the point according to my investigation for PR 50184. So anyway we
 > | have to get rid of it.
 > |
 > | I made a patch for the bug:
 > | http://www.netbsd.org/~ozaki-r/fix-PR50184.take2.diff
 > | which was for PR 50184. So reverting your commit and applying the patch
 > | instead might be easy for me. Of course rebasing my patch on the HEAD
 > | makes no difference though.
 >
 > Why don't you commit both of them?

 I want to clarify they really fix the bug(s). I fail to reproduce the panic
 on my machines. Does my softnet_lock patch fix your issue?

   ozaki-r

From: christos@zoulas.com (Christos Zoulas)
To: Ryota Ozaki <ozaki-r@netbsd.org>
Cc: "gnats-bugs@NetBSD.org" <gnats-bugs@netbsd.org>, 
	"kern-bug-people@netbsd.org" <kern-bug-people@netbsd.org>, 
	"gnats-admin@netbsd.org" <gnats-admin@netbsd.org>, 
	"netbsd-bugs@netbsd.org" <netbsd-bugs@netbsd.org>, 
	"jdbaker@mylinuxisp.com" <jdbaker@mylinuxisp.com>
Subject: Re: kern/50186: sparc memfault panic after 7.99.21 ARP changes
Date: Fri, 4 Sep 2015 05:43:57 -0400

 On Sep 4,  6:30pm, ozaki-r@netbsd.org (Ryota Ozaki) wrote:
 -- Subject: Re: kern/50186: sparc memfault panic after 7.99.21 ARP changes

 | > Why don't you commit both of them?
 | 
 | I want to clarify they really fix the bug(s). I fail to reproduce the panic
 | on my machines. Does my softnet_lock patch fix your issue?

 There are races, so they are not 100% reproducible... If it does not
 crash in the next couple of days...

 christos

From: "John D. Baker" <jdbaker@mylinuxisp.com>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/50186: sparc memfault panic after 7.99.21 ARP changes
Date: Fri, 4 Sep 2015 08:40:44 -0500 (CDT)

 I meant to send this to gnats-bugs@, but forgot to reply-all and edit
 the To: list:

 On Thu, 3 Sep 2015, John D. Baker wrote:

 > On Thu, 3 Sep 2015, Ryota Ozaki wrote:
 > 
 > > Thank you for your investigation.
 > > [snip]
 > > How do you produce the panic? I don't reproduce on my amd64 machines yet.
 > 
 > I boot the system, collect a couple of transient ARP entries (ping some
 > other hosts) then just wait.
 > 
 > > Does the fix suppress the panic on SPARC machines?
 > 
 > My last build attempts failed due to ongoing changes to 'config' and
 > related infrastructure.  I am away from the systems until 8 September.
 > I will try again then.

 -- 
 |/"\ John D. Baker, KN5UKS               NetBSD     Darwin/MacOS X
 |\ / jdbaker[snail]mylinuxisp[flyspeck]com    OpenBSD            FreeBSD
 | X  No HTML/proprietary data in email.   BSD just sits there and works!
 |/ \ GPGkeyID:  D703 4A7E 479F 63F8 D3F4  BD99 9572 8F23 E4AD 1645

From: Ryota Ozaki <ozaki-r@netbsd.org>
To: Christos Zoulas <christos@zoulas.com>
Cc: "gnats-bugs@NetBSD.org" <gnats-bugs@netbsd.org>, 
	"kern-bug-people@netbsd.org" <kern-bug-people@netbsd.org>, 
	"gnats-admin@netbsd.org" <gnats-admin@netbsd.org>, "netbsd-bugs@netbsd.org" <netbsd-bugs@netbsd.org>, 
	"jdbaker@mylinuxisp.com" <jdbaker@mylinuxisp.com>
Subject: Re: kern/50186: sparc memfault panic after 7.99.21 ARP changes
Date: Tue, 8 Sep 2015 17:49:21 +0900

 On Fri, Sep 4, 2015 at 6:43 PM, Christos Zoulas <christos@zoulas.com> wrote:
 > On Sep 4,  6:30pm, ozaki-r@netbsd.org (Ryota Ozaki) wrote:
 > -- Subject: Re: kern/50186: sparc memfault panic after 7.99.21 ARP changes
 >
 > | > Why don't you commit both of them?
 > |
 > | I want to clarify they really fix the bug(s). I fail to reproduce the panic
 > | on my machines. Does my softnet_lock patch fix your issue?
 >
 > There are races, so they are not 100% reproducible... If it does not
 > crash in the next couple of days...

 Sure... So how are things going now?

   ozaki-r

From: christos@zoulas.com (Christos Zoulas)
To: Ryota Ozaki <ozaki-r@netbsd.org>
Cc: "gnats-bugs@NetBSD.org" <gnats-bugs@netbsd.org>, 
	"kern-bug-people@netbsd.org" <kern-bug-people@netbsd.org>, 
	"gnats-admin@netbsd.org" <gnats-admin@netbsd.org>, 
	"netbsd-bugs@netbsd.org" <netbsd-bugs@netbsd.org>, 
	"jdbaker@mylinuxisp.com" <jdbaker@mylinuxisp.com>
Subject: Re: kern/50186: sparc memfault panic after 7.99.21 ARP changes
Date: Tue, 8 Sep 2015 07:27:32 -0400

 On Sep 8,  5:49pm, ozaki-r@netbsd.org (Ryota Ozaki) wrote:
 -- Subject: Re: kern/50186: sparc memfault panic after 7.99.21 ARP changes

 | On Fri, Sep 4, 2015 at 6:43 PM, Christos Zoulas <christos@zoulas.com> wrote:
 | > On Sep 4,  6:30pm, ozaki-r@netbsd.org (Ryota Ozaki) wrote:
 | > -- Subject: Re: kern/50186: sparc memfault panic after 7.99.21 ARP changes
 | >
 | > | > Why don't you commit both of them?
 | > |
 | > | I want to clarify they really fix the bug(s). I fail to reproduce the panic
 | > | on my machines. Does my softnet_lock patch fix your issue?
 | >
 | > There are races, so they are not 100% reproducible... If it does not
 | > crash in the next couple of days...
 | 
 | Sure... So how are things going now?

 It has not crashed... I guess commit everything... I think I will try to write a
 synthetic test to cause the problem.

 Best,

 christos

From: "Ryota Ozaki" <ozaki-r@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/50186 CVS commit: src/sys/netinet
Date: Wed, 9 Sep 2015 01:24:01 +0000

 Module Name:	src
 Committed By:	ozaki-r
 Date:		Wed Sep  9 01:24:01 UTC 2015

 Modified Files:
 	src/sys/netinet: if_arp.c

 Log Message:
 Remove wrong KASSERT in arptfree

 la_rt can be NULL because arptimer that calls arptfree doesn't always
 free llentry so llentry can remain with la_rt == NULL. So we instead
 check whether la_rt is NULL or not and do arptfree if not.

 This fixes PR kern/50184 (confirmed by martin@) and
 PR kern/50186 (maybe).


 To generate a diff of this commit:
 cvs rdiff -u -r1.179 -r1.180 src/sys/netinet/if_arp.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.
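
 The idea in the log message above (test la_rt for NULL instead of
 asserting that it is non-NULL) can be sketched in userland C as follows.
 This is a minimal illustration, not the committed diff: rt_sketch,
 lle_sketch, rt_release_sketch and lle_timeout_sketch are hypothetical
 names, and locking is omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's rtentry and llentry. */
struct rt_sketch {
	int refcnt;
};

struct lle_sketch {
	struct rt_sketch *la_rt;
};

/* Stand-in for the release step; it must never be handed NULL. */
static void
rt_release_sketch(struct rt_sketch *rt)
{
	assert(rt != NULL);
	rt->refcnt--;
}

/*
 * Timer path.  An entry can outlive an earlier timeout with la_rt
 * already cleared, so a NULL la_rt is a legal state here and is
 * tested rather than asserted.
 * Returns 1 if a route was released, 0 if there was nothing to do.
 */
static int
lle_timeout_sketch(struct lle_sketch *lle)
{
	struct rt_sketch *rt = lle->la_rt;

	if (rt == NULL)
		return 0;

	lle->la_rt = NULL;
	rt_release_sketch(rt);
	return 1;
}
```

 Calling lle_timeout_sketch() twice on the same entry releases the route
 only once; the second call sees la_rt == NULL and returns without
 touching it.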

From: "John D. Baker" <jdbaker@mylinuxisp.com>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: PR/50186 CVS commit: src/sys/netinet
Date: Wed, 9 Sep 2015 07:54:08 -0500 (CDT)

 On Wed, 9 Sep 2015, Ryota Ozaki wrote:

 >  Remove wrong KASSERT in arptfree
 >  
 >  la_rt can be NULL because arptimer that calls arptfree doesn't always
 >  free llentry so llentry can remain with la_rt == NULL. So we instead
 >  check whether la_rt is NULL or not and do arptfree if not.
 >  
 >  This fixes PR kern/50184 (confirmed by martin@) and
 >  PR kern/50186 (maybe).

 I've updated and built a new release with these changes.  So far, no panic
 (testing on sparc right now) but instead the following message on the
 console:

   arptfree: llentry without rt

 I will check with amd64 soon.

 -- 
 |/"\ John D. Baker, KN5UKS               NetBSD     Darwin/MacOS X
 |\ / jdbaker[snail]mylinuxisp[flyspeck]com    OpenBSD            FreeBSD
 | X  No HTML/proprietary data in email.   BSD just sits there and works!
 |/ \ GPGkeyID:  D703 4A7E 479F 63F8 D3F4  BD99 9572 8F23 E4AD 1645

From: "John D. Baker" <jdbaker@mylinuxisp.com>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: PR/50186 CVS commit: src/sys/netinet
Date: Wed, 9 Sep 2015 15:01:27 -0500 (CDT)

 On Wed, 9 Sep 2015, John D. Baker wrote:

 > I've updated and built a new release with these changes.  So far, no panic
 > (testing on sparc right now) but instead the following message on the
 > console:
 > 
 >   arptfree: llentry without rt
 > 
 > I will check with amd64 soon.

 I've been running the amd64 system that previously panicked with the
 KASSERT and so far (several hours uptime now) it has not panicked, nor
 emitted the message shown above.

 The sparc system has also been up several hours and the message above
 has not been repeated.

 -- 
 |/"\ John D. Baker, KN5UKS               NetBSD     Darwin/MacOS X
 |\ / jdbaker[snail]mylinuxisp[flyspeck]com    OpenBSD            FreeBSD
 | X  No HTML/proprietary data in email.   BSD just sits there and works!
 |/ \ GPGkeyID:  D703 4A7E 479F 63F8 D3F4  BD99 9572 8F23 E4AD 1645

From: Ryota Ozaki <ozaki-r@netbsd.org>
To: "gnats-bugs@NetBSD.org" <gnats-bugs@netbsd.org>
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, 
	"John D. Baker" <jdbaker@mylinuxisp.com>
Subject: Re: PR/50186 CVS commit: src/sys/netinet
Date: Tue, 15 Sep 2015 17:42:47 +0900

 Hi,

 On Thu, Sep 10, 2015 at 5:05 AM, John D. Baker <jdbaker@mylinuxisp.com> wrote:
 > The following reply was made to PR kern/50186; it has been noted by GNATS.
 >
 > From: "John D. Baker" <jdbaker@mylinuxisp.com>
 > To: gnats-bugs@NetBSD.org
 > Cc:
 > Subject: Re: PR/50186 CVS commit: src/sys/netinet
 > Date: Wed, 9 Sep 2015 15:01:27 -0500 (CDT)
 >
 >  On Wed, 9 Sep 2015, John D. Baker wrote:
 >
 >  > I've updated and built a new release with these changes.  So far, no panic
 >  > (testing on sparc right now) but instead the following message on the
 >  > console:
 >  >
 >  >   arptfree: llentry without rt
 >  >
 >  > I will check with amd64 soon.
 >
 >  I've been running the amd64 system that previously panicked with the
 >  KASSERT and so far (several hours uptime now) it has not panicked, nor
 >  emitted the message shown above.
 >
 >  The sparc system has also been up several hours and the message above
 >  has not been repeated.

 Thank you for testing! I'll close the ticket.

   ozaki-r

State-Changed-From-To: open->closed
State-Changed-By: ozaki-r@NetBSD.org
State-Changed-When: Tue, 15 Sep 2015 08:49:04 +0000
State-Changed-Why:
Fix confirmed by the reporter.


>Unformatted:
