NetBSD Problem Report #48208
From kardel@gateway.kardel.name Thu Sep 12 07:59:28 2013
Return-Path: <kardel@gateway.kardel.name>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
(using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "Postmaster NetBSD.org" (verified OK))
by mollari.NetBSD.org (Postfix) with ESMTPS id 0F86772098
for <gnats-bugs@gnats.NetBSD.org>; Thu, 12 Sep 2013 07:59:28 +0000 (UTC)
Message-Id: <20130912075916.C0319570E9F@gateway.kardel.name>
Date: Thu, 12 Sep 2013 07:59:16 +0000 (UTC)
From: kardel@netbsd.org
Reply-To: kardel@netbsd.org
To: gnats-bugs@NetBSD.org
Subject: panic: double fault
X-Send-Pr-Version: 3.95
>Number: 48208
>Category: kern
>Synopsis: fatal double fault in supervisor mode
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: kern-bug-people
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Thu Sep 12 08:00:00 +0000 2013
>Closed-Date: Wed Mar 29 10:22:08 +0000 2017
>Last-Modified: Wed Mar 29 10:22:08 +0000 2017
>Originator: kardel@netbsd.org
>Release: NetBSD 6.1.1
>Organization:
>Environment:
Soekris 6501:
cpu0: Intel Pentium Pro, II or III (686-class), 1600.06 MHz, id 0x20661
cpu0: features 0xbfe9fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR>
cpu0: features 0xbfe9fbff<PGE,MCA,CMOV,PAT,CFLUSH,DS,ACPI,MMX,FXSR,SSE>
cpu0: features 0xbfe9fbff<SSE2,SS,HTT,TM,SBF>
cpu0: features2 0x40e3bd<SSE3,DTES64,MONITOR,DS-CPL,VMX,EST,TM2,SSSE3,CX16>
cpu0: features2 0x40e3bd<xTPR,PDCM,MOVBE>
cpu0: features3 0x20100800<SYSCALL/SYSRET,XD,EM64T>
cpu0: features4 0x1<LAHF>
cpu0: "Genuine Intel(R) CPU @ 1.60GHz"
cpu0: I-cache 32KB 64B/line 8-way
cpu0: Initial APIC ID 0
cpu0: Cluster/Package ID 0
cpu0: SMT ID 0
cpu0: family 06 model 06 extfamily 00 extmodel 02 stepping 01
cpu0: UCode version: ?
System: NetBSD gator.pruy.de 6.1.1 NetBSD 6.1.1 (PYGW) #1: Wed Sep 4 20:40:15 CEST 2013 pruy@gator.pruy.de:/usr/obj/sys/arch/amd64/compile/PYGW amd64
Architecture: x86_64
Machine: amd64
>Description:
On a Soekris 6501 gateway we experience many double faults. Both IPv4 and IPv6 are in use.
Coredumps are not available (the machine hangs). The following debug output is available:
ohci5: 1 scheduling overruns
fatal page faultfatal double fault in supervisor mode
in supervisor mode
trap type 13 code 0 rip ffffffff801142da cs 8 rflags 10207 cr2 ffff80002519eda4 cpl 8 rsp caa184b8cf5c33a8
tkrerapne tl:yp edo u6b cleo dfe au0l tri ptr affp,ff cffofdfe8=007
9Stopped in pid 0.20 (system) at netbsd:Xsoftintr+0x4a: call netbsd:softint_dispatch
db{1}> trace
Xsoftintr() at netbsd:Xsoftintr+0x4a
--- interrupt ---
0:
db{1}> show registers
ds 0
es 0
fs 0
gs 0
rdi fffffe807f4518e0
rsi 2
rbp fffffe8004baebe0
rbx 0
rdx fffffe8004bbad80
rcx fffffe8004baed80
rax fffffe807f45ce48
r8 0
r9 0
r10 0
r11 0
r12 fffffe807f453040
r13 ffffffff801144a0 Xdoreti+0x10
r14 0
r15 fffffe807f4518e0
rip ffffffff801142da Xsoftintr+0x4a
cs 8
rflags 10207
rsp caa184b8cf5c33a8
ss 10
netbsd:Xsoftintr+0x4a: call netbsd:softint_dispatch
db{1}> callout
hardclock_ticks now: 8542028
ticks wheel arg func
-1 -1/-256 0 rnd_timeout
0 -1/-256 0 pffasttimo
0 -1/-256 fffffe80061566a0 sleepq_timeout
6 0/82 0 rnd_skew
15 0/91 ffff800003b27000 wm_tick
15 0/91 ffff800003b25000 wm_tick
15 0/91 0 pfslowtimo
15 0/91 0 sched_balance
20 0/96 fffffe807e5101a0 sleepq_timeout
5 0/102 fffffe8008ea22c0 sleepq_timeout
12 0/109 ffffffff80ed0340 sleepq_timeout
36 0/112 ffff800003b2b000 wm_tick
36 0/112 0 if_slowtimo
36 0/112 0 nd6_timer
36 0/112 0 rt_timer_timer
36 0/112 0 key_timehandler
45 0/121 fffffe807fd09040 sleepq_timeout
33 0/130 fffffe807fb038a0 sleepq_timeout
42 0/139 fffffe807e5105c0 sleepq_timeout
67 0/143 fffffe8006feda50 realtimerexpire
77 0/174 fffffe807fd09460 sleepq_timeout
397 1/344 fffffe8006fed528 realtimerexpire
437 1/345 fffffe80053a7a60 sleepq_timeout
900 1/346 fffffe8005e64a80 sleepq_timeout
981 1/347 0 vmem_rehash_all_kick
1265 1/348 fffffe8008ea2b00 sleepq_timeout
1808 1/350 fffffe80057e1300 sleepq_timeout
1816 1/350 fffffe8006fede70 realtimerexpire
2067 1/351 fffffe807efa6960 sleepq_timeout
2084 1/351 fffffe807efa6540 sleepq_timeout
1982 1/351 fffffe807f43a208 sme_events_check
2053 1/351 fffffe807ef87940 sleepq_timeout
2061 1/351 fffffe807ef3a5a0 sleepq_timeout
2061 1/351 fffffe807ef87100 sleepq_timeout
2065 1/351 fffffe807f3d50e0 sleepq_timeout
2076 1/351 fffffe807ef87520 sleepq_timeout
2079 1/351 fffffe807efa6120 sleepq_timeout
2950 1/354 fffffe800c4bb420 sleepq_timeout
3073 1/355 fffffe800cb7e860 sleepq_timeout
5808 1/366 fffffe80052b1a40 sleepq_timeout
5901 1/366 fffffe8006fedd68 realtimerexpire
8460 1/376 0 arptimer
10405 1/384 fffffe800c4bb000 sleepq_timeout
13705 1/396 fffffe8006fed738 realtimerexpire
15496 1/403 fffffe8005c38260 sleepq_timeout
19695 1/420 fffffe8006fedc60 realtimerexpire
29108 1/457 fffffe8006fed948 realtimerexpire
8253982 2/512 fffffe800fdaa850 nd6_llinfo_timer
8579187 2/517 fffffe80060a1710 nd6_llinfo_timer
8632073 2/518 fffffe80060a1050 nd6_llinfo_timer
8639214 2/518 fffffe80060a17d0 nd6_llinfo_timer
91082 2/643 0 in6_tmpaddrtimer
105820 2/643 fffffe8009959018 tcp_timer_keep
97982 2/643 0 nd6_slowtimo
719265 2/643 fffffe80082bc008 tcp_timer_keep
86988 2/643 fffffe8005d216c0 sleepq_timeout
141896 2/644 fffffe8005e64660 sleepq_timeout
719952 2/651 fffffe8009959648 tcp_timer_keep
595808 2/651 fffffe8006fed840 realtimerexpire
6822272 2/746 fffffe800fdaacd0 nd6_llinfo_timer
7571367 2/757 fffffe800fdaa310 nd6_llinfo_timer
db{1}> show mbuf
MBUF 0xffffffff801142da
data=0x41000001e8253c87, len=1142886246, type=40, flags=0xff650000<LINK4,LINK6,EXT_CLUSTER,EXT_PAGES,EXT_ROMAP,EXT_RW>
owner=0xe5ff4128c483485f, next=0x8b49fa00403e5be8, nextpkt=0x4c6520618b48304f
leadingspace=1746139509, trailingspace=1405941997, readonly=0
[.....reboot triggered........]
after reboot:
Adding interface aliases:.
Starting dhcpcd.
dhcpcd[143]: version 5.6.2 starting
dhcpcd[143]: wm1: carrier acquired
dhcpcd[143]: wm1: carrier lost
dhcpcd[143]: wm1: waiting for carrier
dhcpcd[143]: wm1: carrier acquired
dhcpcd[143]: wm1: sending IPv6 Router Solicitation
dhcpcd[143]: wm1: rebinding lease of 95.222.200.208
dhcpcd[143]: wm1: acknowledged 95.222.200.208 from 10.145.0.1
dhcpcd[143]: wm1: checking for 95.222.200.208
dhcpcd[143]: wm1: sending IPv6 Router Solicitation
dhcpcd[143]: wm1: leased 95.222.200.208 for 2623 seconds
dhcpcd[143]: forked to background, child pid 414
Enabling pf firewall.
Starting route6d.
Starting routed.
Building databases: dev, utmp, utmpx.
Starting syslogd.
Starting named.
panic: kernel diagnostic assertion "(!cpu_intr_p() && !cpu_softintr_p())" failed: file "/usr/src/sys/kern/subr_kmem.c", line 306 kmem(9) should not be used from the interrupt context
fatal breakpoint trap in supervisor mode
trap type 1 code 0 rip ffffffff80269a95 cs 8 rflags 246 cr2 7f7ff7b0a000 cpl 8 rsp fffffe8004b765c0
Stopped in pid 0.3 (system) at netbsd:breakpoint+0x5: leave
db{0}> bt
breakpoint() at netbsd:breakpoint+0x5
vpanic() at netbsd:vpanic+0x1f2
kern_assert() at netbsd:kern_assert+0x48
kmem_alloc() at netbsd:kmem_alloc+0x3f
cprng_strong() at netbsd:cprng_strong+0x29e
tcp_rndiss_init() at netbsd:tcp_rndiss_init+0x1e
tcp_rndiss_next() at netbsd:tcp_rndiss_next+0x2c
pf_test_rule() at netbsd:pf_test_rule+0x1f1a
pf_test() at netbsd:pf_test+0x8bf
pfil4_wrapper() at netbsd:pfil4_wrapper+0x47
pfil_run_hooks() at netbsd:pfil_run_hooks+0x9d
ip_output() at netbsd:ip_output+0x435
ip_forward() at netbsd:ip_forward+0x130
ip_input() at netbsd:ip_input+0x83d
ipintr() at netbsd:ipintr+0x107
softint_dispatch() at netbsd:softint_dispatch+0xd9
DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xfffffe8004b76d70
Xsoftintr() at netbsd:Xsoftintr+0x4f
--- interrupt ---
0:
db{0}>
>How-To-Repeat:
Run NetBSD 6.1.1 on a Soekris 6501 with IPv4, IPv6 and pf as gateway
to a provider.
Watch NetBSD 6.1.1 crash 1-2 times per day.
>Fix:
not known
>Release-Note:
>Audit-Trail:
From: Frank Kardel <kardel@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/48208: panic: double fault
Date: Thu, 21 Nov 2013 08:27:46 +0100
Further analysis shows that the problem is caused by unfortunate RNG
initialization triggered by the pf configuration
... modulate state
RNG setup needs to be revisited for NetBSD 6 (or pulled up from -current).
Tests show that:
6.1.1 with modulate state -> crashes
6.1.1 without modulate state -> survives
-current 6.99.26 with modulate state -> survives
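For illustration only, the difference between the crashing and the surviving
6.1.1 setup might look like the pf.conf fragment below. The actual rule is
elided above, so the macro name ext_if, the choice of wm1 as the external
interface, and the shape of the rule are all assumptions and not the
reporter's configuration.

```
# Hypothetical pf.conf fragment; ext_if, wm1 and the rule shape are assumptions.
ext_if = "wm1"

# Variant matching the crashing test case on 6.1.1. With "modulate state" pf
# rewrites TCP initial sequence numbers, which presumably is what leads into
# tcp_rndiss_next() in the backtrace.
# pass out on $ext_if proto tcp all modulate state

# Variant matching the surviving test case on 6.1.1: plain stateful filtering.
pass out on $ext_if proto tcp all keep state
```

Replacing modulate state with keep state gives up pf's initial sequence number
randomization for the affected connections, so it is a workaround and not a fix.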
State-Changed-From-To: open->closed
State-Changed-By: kardel@NetBSD.org
State-Changed-When: Wed, 29 Mar 2017 10:22:08 +0000
State-Changed-Why:
workaround known
closed by submitter (me)
>Unformatted: