NetBSD Problem Report #48208

From kardel@gateway.kardel.name  Thu Sep 12 07:59:28 2013
Return-Path: <kardel@gateway.kardel.name>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
	(using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "Postmaster NetBSD.org" (verified OK))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 0F86772098
	for <gnats-bugs@gnats.NetBSD.org>; Thu, 12 Sep 2013 07:59:28 +0000 (UTC)
Message-Id: <20130912075916.C0319570E9F@gateway.kardel.name>
Date: Thu, 12 Sep 2013 07:59:16 +0000 (UTC)
From: kardel@netbsd.org
Reply-To: kardel@netbsd.org
To: gnats-bugs@NetBSD.org
Subject: panic: double fault
X-Send-Pr-Version: 3.95

>Number:         48208
>Category:       kern
>Synopsis:       fatal double fault in supervisor mode
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Sep 12 08:00:00 +0000 2013
>Closed-Date:    Wed Mar 29 10:22:08 +0000 2017
>Last-Modified:  Wed Mar 29 10:22:08 +0000 2017
>Originator:     kardel@netbsd.org
>Release:        NetBSD 6.1.1
>Organization:

>Environment:
Soekris 6501:
cpu0: Intel Pentium Pro, II or III (686-class), 1600.06 MHz, id 0x20661
cpu0: features 0xbfe9fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR>
cpu0: features 0xbfe9fbff<PGE,MCA,CMOV,PAT,CFLUSH,DS,ACPI,MMX,FXSR,SSE>
cpu0: features  0xbfe9fbff<SSE2,SS,HTT,TM,SBF>
cpu0: features2 0x40e3bd<SSE3,DTES64,MONITOR,DS-CPL,VMX,EST,TM2,SSSE3,CX16>
cpu0: features2 0x40e3bd<xTPR,PDCM,MOVBE>
cpu0: features3 0x20100800<SYSCALL/SYSRET,XD,EM64T>
cpu0: features4 0x1<LAHF>
cpu0: "Genuine Intel(R) CPU        @ 1.60GHz"
cpu0: I-cache 32KB 64B/line 8-way
cpu0: Initial APIC ID 0
cpu0: Cluster/Package ID 0
cpu0: SMT ID 0
cpu0: family 06 model 06 extfamily 00 extmodel 02 stepping 01
cpu0: UCode version: ?
System: NetBSD gator.pruy.de 6.1.1 NetBSD 6.1.1 (PYGW) #1: Wed Sep  4 20:40:15 CEST 2013 pruy@gator.pruy.de:/usr/obj/sys/arch/amd64/compile/PYGW amd64
Architecture: x86_64
Machine: amd64
>Description:
	On a Soekris 6501 gateway we experience many double faults. Both IPv4 and IPv6 are in use.
	Coredumps are not available (hangs). The following debug output is available:

ohci5: 1 scheduling overruns
fatal page faultfatal double fault in supervisor mode
 in supervisor mode
trap type 13 code 0 rip ffffffff801142da cs 8 rflags 10207 cr2  ffff80002519eda4 cpl 8 rsp caa184b8cf5c33a8
tkrerapne tl:yp edo u6b cleo dfe au0l tri ptr affp,ff cffofdfe8=007
9Stopped in pid 0.20 (system) at netbsd:Xsoftintr+0x4a:  call    netbsd:softint_dispatch
db{1}> trace
Xsoftintr() at netbsd:Xsoftintr+0x4a
--- interrupt ---
0:
db{1}> show registers
ds          0
es          0
fs          0
gs          0
rdi         fffffe807f4518e0
rsi         2
rbp         fffffe8004baebe0
rbx         0
rdx         fffffe8004bbad80
rcx         fffffe8004baed80
rax         fffffe807f45ce48
r8          0
r9          0
r10         0
r11         0
r12         fffffe807f453040
r13         ffffffff801144a0    Xdoreti+0x10
r14         0
r15         fffffe807f4518e0
rip         ffffffff801142da    Xsoftintr+0x4a
cs          8
rflags      10207
rsp         caa184b8cf5c33a8
ss          10
netbsd:Xsoftintr+0x4a:  call    netbsd:softint_dispatch
db{1}> callout
hardclock_ticks now: 8542028
    ticks  wheel               arg  func
       -1 -1/-256                0  rnd_timeout
        0 -1/-256                0  pffasttimo
        0 -1/-256 fffffe80061566a0  sleepq_timeout
        6  0/82                  0  rnd_skew
       15  0/91   ffff800003b27000  wm_tick
       15  0/91   ffff800003b25000  wm_tick
       15  0/91                  0  pfslowtimo
       15  0/91                  0  sched_balance
       20  0/96   fffffe807e5101a0  sleepq_timeout
        5  0/102  fffffe8008ea22c0  sleepq_timeout
       12  0/109  ffffffff80ed0340  sleepq_timeout
       36  0/112  ffff800003b2b000  wm_tick
       36  0/112                 0  if_slowtimo
       36  0/112                 0  nd6_timer
       36  0/112                 0  rt_timer_timer
       36  0/112                 0  key_timehandler
       45  0/121  fffffe807fd09040  sleepq_timeout
       33  0/130  fffffe807fb038a0  sleepq_timeout
       42  0/139  fffffe807e5105c0  sleepq_timeout
       67  0/143  fffffe8006feda50  realtimerexpire
       77  0/174  fffffe807fd09460  sleepq_timeout
      397  1/344  fffffe8006fed528  realtimerexpire
      437  1/345  fffffe80053a7a60  sleepq_timeout
      900  1/346  fffffe8005e64a80  sleepq_timeout
      981  1/347                 0  vmem_rehash_all_kick
     1265  1/348  fffffe8008ea2b00  sleepq_timeout
     1808  1/350  fffffe80057e1300  sleepq_timeout
     1816  1/350  fffffe8006fede70  realtimerexpire
     2067  1/351  fffffe807efa6960  sleepq_timeout
     2084  1/351  fffffe807efa6540  sleepq_timeout
     1982  1/351  fffffe807f43a208  sme_events_check
     2053  1/351  fffffe807ef87940  sleepq_timeout
     2061  1/351  fffffe807ef3a5a0  sleepq_timeout
     2061  1/351  fffffe807ef87100  sleepq_timeout
     2065  1/351  fffffe807f3d50e0  sleepq_timeout
     2076  1/351  fffffe807ef87520  sleepq_timeout
     2079  1/351  fffffe807efa6120  sleepq_timeout
     2950  1/354  fffffe800c4bb420  sleepq_timeout
     3073  1/355  fffffe800cb7e860  sleepq_timeout
     5808  1/366  fffffe80052b1a40  sleepq_timeout
     5901  1/366  fffffe8006fedd68  realtimerexpire
     8460  1/376                 0  arptimer
    10405  1/384  fffffe800c4bb000  sleepq_timeout
    13705  1/396  fffffe8006fed738  realtimerexpire
    15496  1/403  fffffe8005c38260  sleepq_timeout
    19695  1/420  fffffe8006fedc60  realtimerexpire
    29108  1/457  fffffe8006fed948  realtimerexpire
  8253982  2/512  fffffe800fdaa850  nd6_llinfo_timer
  8579187  2/517  fffffe80060a1710  nd6_llinfo_timer
  8632073  2/518  fffffe80060a1050  nd6_llinfo_timer
  8639214  2/518  fffffe80060a17d0  nd6_llinfo_timer
    91082  2/643                 0  in6_tmpaddrtimer
   105820  2/643  fffffe8009959018  tcp_timer_keep
    97982  2/643                 0  nd6_slowtimo
   719265  2/643  fffffe80082bc008  tcp_timer_keep
    86988  2/643  fffffe8005d216c0  sleepq_timeout
   141896  2/644  fffffe8005e64660  sleepq_timeout
   719952  2/651  fffffe8009959648  tcp_timer_keep
   595808  2/651  fffffe8006fed840  realtimerexpire
  6822272  2/746  fffffe800fdaacd0  nd6_llinfo_timer
  7571367  2/757  fffffe800fdaa310  nd6_llinfo_timer
db{1}> show mbuf
MBUF 0xffffffff801142da
  data=0x41000001e8253c87, len=1142886246, type=40, flags=0xff650000<LINK4,LINK6,EXT_CLUSTER,EXT_PAGES,EXT_ROMAP,EXT_RW>
  owner=0xe5ff4128c483485f, next=0x8b49fa00403e5be8, nextpkt=0x4c6520618b48304f
  leadingspace=1746139509, trailingspace=1405941997, readonly=0

[.....reboot triggered........]

after reboot:

Adding interface aliases:.
Starting dhcpcd.
dhcpcd[143]: version 5.6.2 starting
dhcpcd[143]: wm1: carrier acquired
dhcpcd[143]: wm1: carrier lost
dhcpcd[143]: wm1: waiting for carrier
dhcpcd[143]: wm1: carrier acquired
dhcpcd[143]: wm1: sending IPv6 Router Solicitation
dhcpcd[143]: wm1: rebinding lease of 95.222.200.208
dhcpcd[143]: wm1: acknowledged 95.222.200.208 from 10.145.0.1
dhcpcd[143]: wm1: checking for 95.222.200.208
dhcpcd[143]: wm1: sending IPv6 Router Solicitation
dhcpcd[143]: wm1: leased 95.222.200.208 for 2623 seconds
dhcpcd[143]: forked to background, child pid 414
Enabling pf firewall.
Starting route6d.
Starting routed.
Building databases: dev, utmp, utmpx.
Starting syslogd.
Starting named.
panic: kernel diagnostic assertion "(!cpu_intr_p() && !cpu_softintr_p())" failed: file "/usr/src/sys/kern/subr_kmem.c", line 306 kmem(9) should not be used from the interrupt context
fatal breakpoint trap in supervisor mode
trap type 1 code 0 rip ffffffff80269a95 cs 8 rflags 246 cr2  7f7ff7b0a000 cpl 8 rsp fffffe8004b765c0
Stopped in pid 0.3 (system) at  netbsd:breakpoint+0x5:  leave
db{0}> bt
breakpoint() at netbsd:breakpoint+0x5
vpanic() at netbsd:vpanic+0x1f2
kern_assert() at netbsd:kern_assert+0x48
kmem_alloc() at netbsd:kmem_alloc+0x3f
cprng_strong() at netbsd:cprng_strong+0x29e
tcp_rndiss_init() at netbsd:tcp_rndiss_init+0x1e
tcp_rndiss_next() at netbsd:tcp_rndiss_next+0x2c
pf_test_rule() at netbsd:pf_test_rule+0x1f1a
pf_test() at netbsd:pf_test+0x8bf
pfil4_wrapper() at netbsd:pfil4_wrapper+0x47
pfil_run_hooks() at netbsd:pfil_run_hooks+0x9d
ip_output() at netbsd:ip_output+0x435
ip_forward() at netbsd:ip_forward+0x130
ip_input() at netbsd:ip_input+0x83d
ipintr() at netbsd:ipintr+0x107
softint_dispatch() at netbsd:softint_dispatch+0xd9
DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xfffffe8004b76d70
Xsoftintr() at netbsd:Xsoftintr+0x4f
--- interrupt ---
0:
db{0}> 

>How-To-Repeat:
	Run NetBSD 6.1.1 on a Soekris 6501 with IPv4, IPv6 and pf as gateway
	to a provider.
	Watch NetBSD 6.1.1 crash 1-2 times per day.
>Fix:
	not known

>Release-Note:

>Audit-Trail:
From: Frank Kardel <kardel@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/48208: panic: double fault
Date: Thu, 21 Nov 2013 08:27:46 +0100

 Further analysis shows that the crash is caused by unfortunate RNG
 initialization triggered by the pf configuration
      ... modulate state

 RNG setup needs to be revisited for NetBSD 6 (or pulled up from -current)

 Tests show that:
 6.1.1 with modulate state -> crashes
 6.1.1 without modulate state -> survives
 -current 6.99.26 with modulate state -> survives
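
 The second backtrace above (ipintr -> pf_test_rule -> tcp_rndiss_next ->
 tcp_rndiss_init -> cprng_strong -> kmem_alloc) suggests the pattern behind
 the assertion: state is set up lazily on first use, and that first use
 happens in soft interrupt context, where kmem(9) must not be called.
 The following is a minimal userspace sketch of that pattern in C, not
 kernel code. The names in_softint, model_kmem_alloc, iss_state,
 iss_next_lazy, iss_init_eager and softint_process are hypothetical and
 exist only for illustration. It also shows one possible way around the
 problem (doing the allocation up front from thread context), which is an
 assumption and not necessarily how -current avoids the crash.

```c
/*
 * Userspace model of "lazy initialization hits an allocator that is
 * forbidden in interrupt context". All identifiers are hypothetical
 * stand-ins and are NOT NetBSD kernel APIs.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Models cpu_softintr_p(): true while "soft interrupt" code runs. */
static bool in_softint;

/*
 * Models kmem_alloc(9). A DIAGNOSTIC kernel panics when this is called
 * from interrupt context; the model returns NULL instead of aborting.
 */
static void *model_kmem_alloc(size_t size)
{
    if (in_softint)
        return NULL;
    return malloc(size);
}

/* Models the state behind the random initial sequence numbers. */
struct iss_state {
    void *rng;    /* NULL until the RNG state has been allocated */
};

/*
 * Lazy variant: the first caller triggers the allocation. If the first
 * caller is packet processing in a soft interrupt, the allocation fails.
 */
static bool iss_next_lazy(struct iss_state *st)
{
    if (st->rng == NULL)
        st->rng = model_kmem_alloc(64);
    return st->rng != NULL;
}

/*
 * Eager variant (assumed workaround): allocate once from thread
 * context, before any packet can reach iss_next_lazy().
 */
static bool iss_init_eager(struct iss_state *st)
{
    assert(!in_softint);
    st->rng = model_kmem_alloc(64);
    return st->rng != NULL;
}

/* Models softint_dispatch() running the packet path. */
static bool softint_process(struct iss_state *st)
{
    bool ok;

    in_softint = true;
    ok = iss_next_lazy(st);
    in_softint = false;
    return ok;
}
```

 With a freshly zeroed struct iss_state, softint_process() fails in the
 lazy case. Once iss_init_eager() has run in thread context, the same
 call succeeds because no allocation is needed in the interrupt path.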


State-Changed-From-To: open->closed
State-Changed-By: kardel@NetBSD.org
State-Changed-When: Wed, 29 Mar 2017 10:22:08 +0000
State-Changed-Why:
workaround known
closed by submitter (me)


>Unformatted:
