NetBSD Problem Report #52036
From www@NetBSD.org Mon Mar 6 14:19:40 2017
Message-Id: <20170306141939.AEFDC7A292@mollari.NetBSD.org>
Date: Mon, 6 Mar 2017 14:19:39 +0000 (UTC)
From: 6bone@6bone.informatik.uni-leipzig.de
Reply-To: 6bone@6bone.informatik.uni-leipzig.de
To: gnats-bugs@NetBSD.org
Subject: netbsd-7 kernel crash at high network load
X-Send-Pr-Version: www-1.0
>Number: 52036
>Category: kern
>Synopsis: netbsd-7 kernel crash at high network load
>Confidential: no
>Severity: serious
>Priority: high
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Mon Mar 06 14:20:00 +0000 2017
>Last-Modified: Wed Mar 15 09:20:01 +0000 2017
>Originator: Uwe Toenjes
>Release: NetBSD 7.0
>Organization:
University of Leipzig
>Environment:
NetBSD gate.ipv6.uni-leipzig.de 7.0_STABLE NetBSD 7.0_STABLE (MYCONF7.gdb) #0: Sat Nov 19 11:38:21 CET 2016 root@gate.ipv6.uni-leipzig.de:/usr/obj/sys/arch/amd64/compile/MYCONF7.gdb amd64
>Description:
The NetBSD server crashes under high network load.
Debug output (backtrace):
#0 0xffffffff8068beff in cpu_reboot (howto=howto@entry=260,
bootstr=bootstr@entry=0x0) at /usr/src/sys/arch/amd64/amd64/machdep.c:671
#1 0xffffffff808b9734 in vpanic (
fmt=0xffffffff80d03098 "kernel %sassertion \"%s\" failed: file \"%s\", line %d ", ap=ap@entry=0xfffffe813a5f7b70) at /usr/src/sys/kern/subr_prf.c:340
#2 0xffffffff80a562e3 in kern_assert (
fmt=fmt@entry=0xffffffff80d03098 "kernel %sassertion \"%s\" failed: file \"%s\", line %d ") at /usr/src/sys/lib/libkern/kern_assert.c:51
#3 0xffffffff8096a67b in m_freem (m=0xfffffe82b75c4400)
at /usr/src/sys/kern/uipc_mbuf.c:652
#4 0xffffffff806d6e7e in nd6_output (ifp=0xfffffe819823d010,
origifp=origifp@entry=0xfffffe819823d010, m0=0xfffffe8745be7c00,
dst=dst@entry=0xfffffe88264b0948, rt0=<optimized out>)
at /usr/src/sys/netinet6/nd6.c:2305
#5 0xffffffff8056293b in ip6_output (m0=m0@entry=0xfffffe813bd1ec00,
opt=opt@entry=0x0, ro=<optimized out>, ro@entry=0x0, flags=flags@entry=4,
im6o=im6o@entry=0x0, so=so@entry=0x0, ifpp=ifpp@entry=0xfffffe813a5f7e30)
at /usr/src/sys/netinet6/ip6_output.c:778
#6 0xffffffff803c1b89 in icmp6_reflect (m=m@entry=0xfffffe813bd1ec00,
off=off@entry=40) at /usr/src/sys/netinet6/icmp6.c:2100
#7 0xffffffff803c2191 in icmp6_error (m=0xfffffe813bd1ec00,
type=type@entry=1, code=code@entry=3, param=param@entry=0)
at /usr/src/sys/netinet6/icmp6.c:431
#8 0xffffffff803c234f in icmp6_error2 (m=<optimized out>, type=type@entry=1,
code=code@entry=3, param=param@entry=0, ifp=<optimized out>)
at /usr/src/sys/netinet6/icmp6.c:287
#9 0xffffffff806d551d in nd6_llinfo_timer (arg=0xfffffe8816383550)
at /usr/src/sys/netinet6/nd6.c:484
#10 0xffffffff80627a10 in callout_softclock (v=<optimized out>)
at /usr/src/sys/kern/kern_timeout.c:739
#11 0xffffffff8061c758 in softint_execute (l=<optimized out>, s=2,
si=0xffff80023b4e70c0) at /usr/src/sys/kern/kern_softint.c:589
#12 softint_dispatch (pinned=<optimized out>, s=2)
at /usr/src/sys/kern/kern_softint.c:871
#13 0xffffffff8011412f in Xsoftintr ()
>How-To-Repeat:
The crash occurs randomly. At the time of the crash, the network load was always very high.
>Fix:
>Audit-Trail:
From: 6bone@6bone.informatik.uni-leipzig.de
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/52036: netbsd-7 kernel crash at high network load
Date: Wed, 15 Mar 2017 10:18:35 +0100 (CET)
During the most recent crash, I was able to collect some information.
The switch was sending almost one million packets per second to the router:
Load-Interval #1: 30 seconds
30 seconds input rate 17345872 bits/sec, 21620 packets/sec
30 seconds output rate 668750232 bits/sec, 972039 packets/sec
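As a rough cross-check of the switch counters above, the average packet size can be derived from the two rates. The following is only a minimal sketch: the helper name `avg_packet_bytes` is hypothetical, and it assumes the switch reports plain bits and packets per second without stating whether framing overhead is included.

```python
def avg_packet_bytes(bits_per_sec: int, packets_per_sec: int) -> float:
    """Average packet size in bytes implied by a bit rate and a packet rate."""
    if packets_per_sec <= 0:
        raise ValueError("packet rate must be positive")
    return bits_per_sec / packets_per_sec / 8


# Figures copied from the 30-second load interval reported by the switch.
output_avg = avg_packet_bytes(668750232, 972039)  # switch -> router
input_avg = avg_packet_bytes(17345872, 21620)     # router -> switch

print(f"output: {output_avg:.1f} bytes/packet")
print(f"input:  {input_avg:.1f} bytes/packet")
```

On these numbers the stream towards the router averages roughly 86 bytes per packet, so it appears to consist almost entirely of small packets.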
CPU0 of the router spent 100% of its time handling interrupts:
load averages: 2.37, 1.92, 1.23; up 0+00:12:32 16:19:21
24 processes: 22 sleeping, 2 on CPU
CPU00 states: 0.0% user, 0.0% nice, 0.0% system, 100% interrupt, 0.0% idle
CPU01 states: 0.2% user, 0.0% nice, 12.8% system, 0.0% interrupt, 87.0% idle
CPU02 states: 0.0% user, 0.0% nice, 5.0% system, 0.0% interrupt, 95.0% idle
CPU03 states: 1.6% user, 0.0% nice, 12.8% system, 0.0% interrupt, 85.6% idle
CPU04 states: 0.0% user, 0.0% nice, 3.0% system, 0.0% interrupt, 97.0% idle
CPU05 states: 0.0% user, 0.0% nice, 6.0% system, 0.0% interrupt, 94.0% idle
CPU06 states: 0.0% user, 0.0% nice, 4.6% system, 0.0% interrupt, 95.4% idle
CPU07 states: 0.2% user, 0.0% nice, 9.4% system, 0.0% interrupt, 90.4% idle
CPU08 states: 0.4% user, 0.0% nice, 6.8% system, 0.0% interrupt, 92.8% idle
CPU09 states: 0.0% user, 0.0% nice, 6.4% system, 0.0% interrupt, 93.6% idle
CPU10 states: 0.0% user, 0.0% nice, 7.8% system, 0.0% interrupt, 92.2% idle
CPU11 states: 0.6% user, 0.0% nice, 8.2% system, 0.0% interrupt, 91.2% idle
CPU12 states: 0.2% user, 0.0% nice, 4.6% system, 0.0% interrupt, 95.2% idle
CPU13 states: 0.0% user, 0.0% nice, 10.6% system, 0.0% interrupt, 89.4% idle
CPU14 states: 1.6% user, 0.0% nice, 30.7% system, 0.0% interrupt, 67.7% idle
CPU15 states: 0.4% user, 0.0% nice, 9.4% system, 0.0% interrupt, 90.2% idle
Memory: 1313M Act, 608K Inact, 6464K Wired, 19M Exec, 1181M File, 22G Free
Swap: 49G Total, 49G Free
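To make the imbalance in the listing above explicit, the per-CPU lines can be parsed and filtered for CPUs that spend nearly all their time in interrupt context. This is a sketch only: the function names and the 90% threshold are assumptions chosen for illustration, and the pattern is written against the exact line layout quoted above.

```python
import re

# Matches one per-CPU line of the top output quoted above.
CPU_RE = re.compile(
    r"CPU(?P<cpu>\d+) states:\s*"
    r"(?P<user>[\d.]+)% user,\s*"
    r"(?P<nice>[\d.]+)% nice,\s*"
    r"(?P<system>[\d.]+)% system,\s*"
    r"(?P<interrupt>[\d.]+)% interrupt,\s*"
    r"(?P<idle>[\d.]+)% idle"
)


def parse_cpu_states(text):
    """Return {cpu_number: {"user": ..., "interrupt": ..., ...}} as floats."""
    states = {}
    for m in CPU_RE.finditer(text):
        fields = m.groupdict()
        cpu = int(fields.pop("cpu"))
        states[cpu] = {name: float(value) for name, value in fields.items()}
    return states


def interrupt_bound(states, threshold=90.0):
    """CPUs whose interrupt share is at or above the (assumed) threshold."""
    return sorted(cpu for cpu, s in states.items() if s["interrupt"] >= threshold)


# Three lines copied from the listing above.
SAMPLE = """\
CPU00 states: 0.0% user, 0.0% nice, 0.0% system, 100% interrupt, 0.0% idle
CPU01 states: 0.2% user, 0.0% nice, 12.8% system, 0.0% interrupt, 87.0% idle
CPU14 states: 1.6% user, 0.0% nice, 30.7% system, 0.0% interrupt, 67.7% idle
"""

states = parse_cpu_states(SAMPLE)
print(interrupt_bound(states))
```

For the sample lines this reports only CPU 0, which matches the observation that a single CPU out of sixteen handles all interrupt work.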
During the periods of high packet rates, dmesg showed the following output:
...
nd6_storelladdr: sdl_alen == 0, dst=2001:638:902:2015:211:aff:fef2:xxxx, if=vlan964
nd6_storelladdr: sdl_alen == 0, dst=2001:638:902:2008:e844:bb94:5f6a:xxxx, if=vlan55
nd6_storelladdr: sdl_alen == 0, dst=2001:638:902:2003:14e3:ff32:3f05:xxxx, if=vlan57
nd6_storelladdr: sdl_alen == 0, dst=2001:638:902:2003:7c1c:7bb9:7972:xxxx, if=vlan57
nd6_storelladdr: sdl_alen == 0, dst=2001:638:902:2003:14e3:ff32:3f05:xxxx, if=vlan57
nd6_storelladdr: sdl_alen == 0, dst=2001:638:902:2003:14e3:ff32:3f05:xxxx, if=vlan57
nd6_storelladdr: sdl_alen == 0, dst=2001:638:902:2003:14e3:ff32:3f05:xxxx, if=vlan57
nd6_storelladdr: sdl_alen == 0, dst=2001:638:902:2003:7c1c:7bb9:7972:xxxx, if=vlan57
nd6_storelladdr: sdl_alen == 0, dst=2001:638:902:2002:f01b:f17:a865:xxxx, if=vlan8
nd6_storelladdr: sdl_alen == 0, dst=2001:638:902:2003:14e3:ff32:3f05:xxxx, if=vlan57
...
(The last group of each address has been replaced with xxxx for privacy reasons.)
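To see which interfaces and destinations dominate these messages, the dmesg lines can be tallied. The following is a minimal sketch, not an existing tool: the function name `tally` is hypothetical, and the pattern only covers the message format shown in the excerpt above. Because the last address group is masked, distinct hosts that share the first seven groups would be counted as one.

```python
import re
from collections import Counter

# Pattern for the kernel message in the excerpt; only dst and if are extracted.
LINE_RE = re.compile(
    r"nd6_storelladdr: sdl_alen == 0, dst=(?P<dst>[^,\s]+), if=(?P<ifname>\S+)"
)


def tally(lines):
    """Count matching messages per interface and per destination address."""
    by_if = Counter()
    by_dst = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue  # ignore unrelated dmesg lines
        by_if[m.group("ifname")] += 1
        by_dst[m.group("dst")] += 1
    return by_if, by_dst


# The ten lines from the excerpt above.
EXCERPT = """\
nd6_storelladdr: sdl_alen == 0, dst=2001:638:902:2015:211:aff:fef2:xxxx, if=vlan964
nd6_storelladdr: sdl_alen == 0, dst=2001:638:902:2008:e844:bb94:5f6a:xxxx, if=vlan55
nd6_storelladdr: sdl_alen == 0, dst=2001:638:902:2003:14e3:ff32:3f05:xxxx, if=vlan57
nd6_storelladdr: sdl_alen == 0, dst=2001:638:902:2003:7c1c:7bb9:7972:xxxx, if=vlan57
nd6_storelladdr: sdl_alen == 0, dst=2001:638:902:2003:14e3:ff32:3f05:xxxx, if=vlan57
nd6_storelladdr: sdl_alen == 0, dst=2001:638:902:2003:14e3:ff32:3f05:xxxx, if=vlan57
nd6_storelladdr: sdl_alen == 0, dst=2001:638:902:2003:14e3:ff32:3f05:xxxx, if=vlan57
nd6_storelladdr: sdl_alen == 0, dst=2001:638:902:2003:7c1c:7bb9:7972:xxxx, if=vlan57
nd6_storelladdr: sdl_alen == 0, dst=2001:638:902:2002:f01b:f17:a865:xxxx, if=vlan8
nd6_storelladdr: sdl_alen == 0, dst=2001:638:902:2003:14e3:ff32:3f05:xxxx, if=vlan57
""".splitlines()

by_if, by_dst = tally(EXCERPT)
for ifname, count in by_if.most_common():
    print(f"{ifname}: {count}")
```

In this short excerpt, seven of the ten messages refer to vlan57, five of them to the same masked destination. The excerpt is far too small to draw conclusions from, but the same tally over the full dmesg buffer might show whether a few destinations account for most of the messages.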
There may be two problems. I am using an Intel 10GE card with the ixg
driver. All packets arrive at the router via interface ixg0. Shouldn't
interrupt throttling protect the CPU? Could it be that the interrupt
throttling of the ixg driver does not work properly?
The second problem may be ipfilter. With ipfilter switched off, I could not
observe a crash, so the cause of the crash may lie in ipfilter. It may be
possible to attack a NetBSD router with this configuration over the
network.
Regards
Uwe