NetBSD Problem Report #52036

From www@NetBSD.org  Mon Mar  6 14:19:40 2017
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (verified OK))
	by mollari.NetBSD.org (Postfix) with ESMTPS id B8EAB7A237
	for <gnats-bugs@gnats.NetBSD.org>; Mon,  6 Mar 2017 14:19:40 +0000 (UTC)
Message-Id: <20170306141939.AEFDC7A292@mollari.NetBSD.org>
Date: Mon,  6 Mar 2017 14:19:39 +0000 (UTC)
From: 6bone@6bone.informatik.uni-leipzig.de
Reply-To: 6bone@6bone.informatik.uni-leipzig.de
To: gnats-bugs@NetBSD.org
Subject: netbsd-7 kernel crash at high network load
X-Send-Pr-Version: www-1.0

>Number:         52036
>Category:       kern
>Synopsis:       netbsd-7 kernel crash at high network load
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Mar 06 14:20:00 +0000 2017
>Last-Modified:  Wed Mar 15 09:20:01 +0000 2017
>Originator:     Uwe Toenjes
>Release:        NetBSD 7.0
>Organization:
University of Leipzig
>Environment:
NetBSD gate.ipv6.uni-leipzig.de 7.0_STABLE NetBSD 7.0_STABLE (MYCONF7.gdb) #0: Sat Nov 19 11:38:21 CET 2016  root@gate.ipv6.uni-leipzig.de:/usr/obj/sys/arch/amd64/compile/MYCONF7.gdb amd64

>Description:
The NetBSD server crashes under high network load.

Debug output (kernel backtrace):


#0  0xffffffff8068beff in cpu_reboot (howto=howto@entry=260,
    bootstr=bootstr@entry=0x0) at /usr/src/sys/arch/amd64/amd64/machdep.c:671
#1  0xffffffff808b9734 in vpanic (
    fmt=0xffffffff80d03098 "kernel %sassertion \"%s\" failed: file \"%s\", line %d ", ap=ap@entry=0xfffffe813a5f7b70) at /usr/src/sys/kern/subr_prf.c:340
#2  0xffffffff80a562e3 in kern_assert (
    fmt=fmt@entry=0xffffffff80d03098 "kernel %sassertion \"%s\" failed: file \"%s\", line %d ") at /usr/src/sys/lib/libkern/kern_assert.c:51
#3  0xffffffff8096a67b in m_freem (m=0xfffffe82b75c4400)
    at /usr/src/sys/kern/uipc_mbuf.c:652
#4  0xffffffff806d6e7e in nd6_output (ifp=0xfffffe819823d010,
    origifp=origifp@entry=0xfffffe819823d010, m0=0xfffffe8745be7c00,
    dst=dst@entry=0xfffffe88264b0948, rt0=<optimized out>)
    at /usr/src/sys/netinet6/nd6.c:2305
#5  0xffffffff8056293b in ip6_output (m0=m0@entry=0xfffffe813bd1ec00,
    opt=opt@entry=0x0, ro=<optimized out>, ro@entry=0x0, flags=flags@entry=4,
    im6o=im6o@entry=0x0, so=so@entry=0x0, ifpp=ifpp@entry=0xfffffe813a5f7e30)
    at /usr/src/sys/netinet6/ip6_output.c:778
#6  0xffffffff803c1b89 in icmp6_reflect (m=m@entry=0xfffffe813bd1ec00,
    off=off@entry=40) at /usr/src/sys/netinet6/icmp6.c:2100
#7  0xffffffff803c2191 in icmp6_error (m=0xfffffe813bd1ec00,
    type=type@entry=1, code=code@entry=3, param=param@entry=0)
    at /usr/src/sys/netinet6/icmp6.c:431
#8  0xffffffff803c234f in icmp6_error2 (m=<optimized out>, type=type@entry=1,
    code=code@entry=3, param=param@entry=0, ifp=<optimized out>)
    at /usr/src/sys/netinet6/icmp6.c:287
#9  0xffffffff806d551d in nd6_llinfo_timer (arg=0xfffffe8816383550)
    at /usr/src/sys/netinet6/nd6.c:484
#10 0xffffffff80627a10 in callout_softclock (v=<optimized out>)
    at /usr/src/sys/kern/kern_timeout.c:739
#11 0xffffffff8061c758 in softint_execute (l=<optimized out>, s=2,
    si=0xffff80023b4e70c0) at /usr/src/sys/kern/kern_softint.c:589
#12 softint_dispatch (pinned=<optimized out>, s=2)
    at /usr/src/sys/kern/kern_softint.c:871
#13 0xffffffff8011412f in Xsoftintr ()


>How-To-Repeat:
The crash occurs randomly. At the time of the crash, the network load was always very high.
>Fix:

>Audit-Trail:
From: 6bone@6bone.informatik.uni-leipzig.de
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/52036: netbsd-7 kernel crash at high network load
Date: Wed, 15 Mar 2017 10:18:35 +0100 (CET)

 During the most recent crash, I was able to collect some information.

 The switch was sending almost one million packets per second to the router:

    Load-Interval #1: 30 seconds
      30 seconds input rate 17345872 bits/sec, 21620 packets/sec
      30 seconds output rate 668750232 bits/sec, 972039 packets/sec
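
 As a rough cross-check of these counters, the average packet size can be
 estimated by dividing the byte rate by the packet rate. The sketch below
 uses only the numbers shown above; the helper name avg_packet_bytes is
 hypothetical, and whether the switch counters include Ethernet framing
 overhead is not known, so the result is only an approximation.

```python
def avg_packet_bytes(bits_per_sec, packets_per_sec):
    """Average packet size in bytes implied by a bit rate and a packet rate."""
    if packets_per_sec <= 0:
        raise ValueError("packet rate must be positive")
    return bits_per_sec / 8 / packets_per_sec


# Counters copied from the 30-second load interval reported by the switch.
out_avg = avg_packet_bytes(668750232, 972039)  # switch -> router
in_avg = avg_packet_bytes(17345872, 21620)     # router -> switch

print(f"output: {out_avg:.1f} bytes/packet")
print(f"input:  {in_avg:.1f} bytes/packet")
```

 The output direction works out to roughly 86 bytes per packet, which
 suggests the load consisted mostly of very small packets.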

 CPU0 on the router spent 100% of its time handling interrupts:

 load averages:  2.37,  1.92,  1.23;               up 0+00:12:32  16:19:21
 24 processes: 22 sleeping, 2 on CPU
 CPU00 states:  0.0% user,  0.0% nice,  0.0% system,  100% interrupt,  0.0% idle
 CPU01 states:  0.2% user,  0.0% nice, 12.8% system,  0.0% interrupt, 87.0% idle
 CPU02 states:  0.0% user,  0.0% nice,  5.0% system,  0.0% interrupt, 95.0% idle
 CPU03 states:  1.6% user,  0.0% nice, 12.8% system,  0.0% interrupt, 85.6% idle
 CPU04 states:  0.0% user,  0.0% nice,  3.0% system,  0.0% interrupt, 97.0% idle
 CPU05 states:  0.0% user,  0.0% nice,  6.0% system,  0.0% interrupt, 94.0% idle
 CPU06 states:  0.0% user,  0.0% nice,  4.6% system,  0.0% interrupt, 95.4% idle
 CPU07 states:  0.2% user,  0.0% nice,  9.4% system,  0.0% interrupt, 90.4% idle
 CPU08 states:  0.4% user,  0.0% nice,  6.8% system,  0.0% interrupt, 92.8% idle
 CPU09 states:  0.0% user,  0.0% nice,  6.4% system,  0.0% interrupt, 93.6% idle
 CPU10 states:  0.0% user,  0.0% nice,  7.8% system,  0.0% interrupt, 92.2% idle
 CPU11 states:  0.6% user,  0.0% nice,  8.2% system,  0.0% interrupt, 91.2% idle
 CPU12 states:  0.2% user,  0.0% nice,  4.6% system,  0.0% interrupt, 95.2% idle
 CPU13 states:  0.0% user,  0.0% nice, 10.6% system,  0.0% interrupt, 89.4% idle
 CPU14 states:  1.6% user,  0.0% nice, 30.7% system,  0.0% interrupt, 67.7% idle
 CPU15 states:  0.4% user,  0.0% nice,  9.4% system,  0.0% interrupt, 90.2% idle
 Memory: 1313M Act, 608K Inact, 6464K Wired, 19M Exec, 1181M File, 22G Free
 Swap: 49G Total, 49G Free

 During the periods of high packet rates, dmesg showed the following output:

 ...
 nd6_storelladdr: sdl_alen == 0, dst=2001:638:902:2015:211:aff:fef2:xxxx, if=vlan964
 nd6_storelladdr: sdl_alen == 0, dst=2001:638:902:2008:e844:bb94:5f6a:xxxx, if=vlan55
 nd6_storelladdr: sdl_alen == 0, dst=2001:638:902:2003:14e3:ff32:3f05:xxxx, if=vlan57
 nd6_storelladdr: sdl_alen == 0, dst=2001:638:902:2003:7c1c:7bb9:7972:xxxx, if=vlan57
 nd6_storelladdr: sdl_alen == 0, dst=2001:638:902:2003:14e3:ff32:3f05:xxxx, if=vlan57
 nd6_storelladdr: sdl_alen == 0, dst=2001:638:902:2003:14e3:ff32:3f05:xxxx, if=vlan57
 nd6_storelladdr: sdl_alen == 0, dst=2001:638:902:2003:14e3:ff32:3f05:xxxx, if=vlan57
 nd6_storelladdr: sdl_alen == 0, dst=2001:638:902:2003:7c1c:7bb9:7972:xxxx, if=vlan57
 nd6_storelladdr: sdl_alen == 0, dst=2001:638:902:2002:f01b:f17:a865:xxxx, if=vlan8
 nd6_storelladdr: sdl_alen == 0, dst=2001:638:902:2003:14e3:ff32:3f05:xxxx, if=vlan57
 ...

 (The last group of each address has been replaced with xxxx for privacy reasons.)
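
 To see which interfaces and destinations produce most of these messages,
 the dmesg output can be tallied with a small script. This is only a sketch:
 the regular expression is derived from the message format in the excerpt
 above, the function name tally_storelladdr is hypothetical, and the sample
 input is limited to four of the lines quoted above.

```python
import re
from collections import Counter

# Pattern for the kernel message format seen in the dmesg excerpt.
# The character class allows "x" because the addresses were redacted.
LINE_RE = re.compile(
    r"nd6_storelladdr: sdl_alen == 0, "
    r"dst=(?P<dst>[0-9a-fx:]+), if=(?P<ifname>\S+)"
)


def tally_storelladdr(lines):
    """Count matching messages per interface and per destination."""
    by_if = Counter()
    by_dst = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue  # skip unrelated lines such as "..."
        by_if[m.group("ifname")] += 1
        by_dst[m.group("dst")] += 1
    return by_if, by_dst


# Four lines copied from the excerpt; in practice, feed the full dmesg output.
SAMPLE = """\
...
nd6_storelladdr: sdl_alen == 0, dst=2001:638:902:2015:211:aff:fef2:xxxx, if=vlan964
nd6_storelladdr: sdl_alen == 0, dst=2001:638:902:2008:e844:bb94:5f6a:xxxx, if=vlan55
nd6_storelladdr: sdl_alen == 0, dst=2001:638:902:2003:14e3:ff32:3f05:xxxx, if=vlan57
nd6_storelladdr: sdl_alen == 0, dst=2001:638:902:2003:7c1c:7bb9:7972:xxxx, if=vlan57
"""

by_if, by_dst = tally_storelladdr(SAMPLE.splitlines())
for name, count in by_if.most_common():
    print(name, count)
```

 Running it against the complete log would show whether the messages
 cluster on a single VLAN or a few destinations.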

 There may be two problems. I am using an Intel 10GE card with the ixg
 driver. All packets arrive at the router via interface ixg0. Shouldn't
 interrupt throttling protect the CPU? Could it be that the interrupt
 throttling of the ixg driver does not work properly?

 The second problem may be ipfilter. With ipfilter switched off, I could not
 observe a crash, so the cause of the crash may lie in ipfilter.

 A NetBSD router with this configuration could possibly be attacked over
 the network in this way.


 Regards
 Uwe

