NetBSD Problem Report #49488
From www@NetBSD.org Fri Dec 19 14:09:00 2014
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (verified OK))
by mollari.NetBSD.org (Postfix) with ESMTPS id 4530BA65B6
for <gnats-bugs@gnats.NetBSD.org>; Fri, 19 Dec 2014 14:09:00 +0000 (UTC)
Message-Id: <20141219140858.D8F2BA65BB@mollari.NetBSD.org>
Date: Fri, 19 Dec 2014 14:08:58 +0000 (UTC)
From: tnn-netbsd@nygren.pp.se
Reply-To: tnn-netbsd@nygren.pp.se
To: gnats-bugs@NetBSD.org
Subject: npf: assertion "ret == con" failed at npf_conn.c:747
X-Send-Pr-Version: www-1.0
>Number: 49488
>Category: kern
>Synopsis: npf: assertion "ret == con" failed at npf_conn.c:747
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: rmind
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Fri Dec 19 14:10:01 +0000 2014
>Closed-Date: Thu Feb 05 22:07:05 +0000 2015
>Last-Modified: Thu Feb 05 22:07:05 +0000 2015
>Originator: Tobias Nygren
>Release: 7.99.3
>Organization:
>Environment:
>Description:
After updating from -current dated ~ Jun 1024 to -current from today my SOHO NAT box panics several times per day with the following message:
kernel diagnostic assertion "ret == con" failed at npf_conn.c:747
>How-To-Repeat:
Not sure. But seems to be NAT related. (Also it seems stateful firewalling on the internal interface doesn't work anymore when using NAT rules on the external interface, this may or may not be related).
Here is the ruleset:
$ext_if = "vlan1"
$ext_v4 = inet4(vlan1)
$ext_addrs = {inet4(vlan1)}
$stf_addrs = {inet6(stf0)}
$int_if = "vlan0"
table <rfc5735> type tree file "/etc/npf.rfc5735"
map $ext_if dynamic 172.18.129.0/24 -> $ext_v4
map $ext_if dynamic 172.18.129.10 port 51813 <- $ext_v4 port 51813
map $ext_if dynamic 172.18.129.12 port 25 <- $ext_v4 port 20025
map $ext_if dynamic 172.18.129.12 port 80 <- $ext_v4 port 80
map $ext_if dynamic 172.18.129.12 port 80 <- $ext_v4 port 20080
map $ext_if dynamic 172.18.129.12 port 443 <- $ext_v4 port 20443
map $ext_if dynamic 172.18.129.12 port 993 <- $ext_v4 port 20993
map $ext_if dynamic 172.18.129.25 port 5060 <- $ext_v4 port 5060
map $ext_if dynamic 172.18.129.25 port 5061 <- $ext_v4 port 5061
map $ext_if dynamic 172.18.129.25 port 5062 <- $ext_v4 port 5062
map $ext_if dynamic 172.18.129.25 port 5063 <- $ext_v4 port 5063
map $ext_if dynamic 172.18.129.25 port 5064 <- $ext_v4 port 5064
map $ext_if dynamic 172.18.129.25 port 5065 <- $ext_v4 port 5065
map $ext_if dynamic 172.18.129.25 port 5066 <- $ext_v4 port 5066
map $ext_if dynamic 172.18.129.25 port 5067 <- $ext_v4 port 5067
map $ext_if dynamic 172.18.129.25 port 5068 <- $ext_v4 port 5068
map $ext_if dynamic 172.18.129.25 port 5069 <- $ext_v4 port 5069
map $ext_if dynamic 127.0.0.1 port 22 <- $ext_v4 port 23
map $ext_if dynamic 127.0.0.1 port 22 <- $ext_v4 port 53
procedure "log" {
log: npflog0
}
group "external" on $ext_if {
# pass sixtofour
pass in final family inet4 proto ipv6 from any to $ext_v4
pass out final family inet4 proto ipv6 from $ext_v4 to any
# block rfc5735, exceptions must be listed above, note that sixtofour is one.
block in final family inet4 from <rfc5735> apply "log"
block out final family inet4 to <rfc5735> apply "log"
# pass locally generated traffic
pass stateful out final family inet4 from $ext_v4 to any
# pass NAT from inside
pass stateful out final family inet4 from 172.18.129.0/24 to any
# pass services
pass stateful in final family inet4 proto icmp from any to $ext_v4
pass stateful in final family inet4 from any to $ext_v4 port 51813
pass stateful in final family inet4 from any to $ext_v4 port 23
pass stateful in final family inet4 from any to $ext_v4 port 53
pass stateful in final family inet4 from any to $ext_v4 port 80
pass stateful in final family inet4 from any to $ext_v4 port 5060
pass stateful in final family inet4 from any to $ext_v4 port 5061
pass stateful in final family inet4 from any to $ext_v4 port 5062
pass stateful in final family inet4 from any to $ext_v4 port 5063
pass stateful in final family inet4 from any to $ext_v4 port 5064
pass stateful in final family inet4 from any to $ext_v4 port 5065
pass stateful in final family inet4 from any to $ext_v4 port 5066
pass stateful in final family inet4 from any to $ext_v4 port 5067
pass stateful in final family inet4 from any to $ext_v4 port 5068
pass stateful in final family inet4 from any to $ext_v4 port 5069
pass stateful in final family inet4 from 46.21.107.176 to $ext_v4 port 20025
pass stateful in final family inet4 from 109.74.13.132 to $ext_v4 port 20025
pass stateful in final family inet4 from 46.21.107.176 to $ext_v4 port 20080
pass stateful in final family inet4 from 46.21.107.176 to $ext_v4 port 20443
pass stateful in final family inet4 from 46.21.107.176 to $ext_v4 port 20993
}
group "internal" on $int_if {
# pass v4 routing
# XXX for some reason stateful does not play nice here anymore.
# pass stateful in final proto tcp from any to any
# pass stateful in final proto icmp from 172.18.129.0/24 to any
# pass stateful in final proto udp from 172.18.129.0/24 to any
pass in final pcap-filter "tcp and src net 172.18.129.0/24 and not dst net 172.18.129.0/24"
pass in final pcap-filter "udp and src net 172.18.129.0/24 and not dst net 172.18.129.0/24"
pass in final pcap-filter "icmp and src net 172.18.129.0/24 and not dst net 172.18.129.0/24"
# pass internal ssh, NS and NTP
pass stateful in final proto tcp from 172.18.129.0/24 to 172.18.129.1/32 port 22
pass stateful in final proto icmp from 172.18.129.0/24 to 172.18.129.1/32
pass stateful in final proto udp from 172.18.129.0/24 to 172.18.129.1/32 port 53
pass stateful in final proto udp from 172.18.129.0/24 to 172.18.129.1/32 port 69
pass stateful in final proto udp from 172.18.129.0/24 to 172.18.129.1/32 port 123
# pass any outgoing
pass stateful out final all
}
group "loopback" on lo0 {
pass final all
}
group "sixtofour" on stf0 {
# pass outbound traffic
pass stateful out final family inet6 proto tcp from $stf_addrs to any
pass stateful out final family inet6 proto udp from $stf_addrs to any
pass stateful out final family inet6 proto ipv6-icmp from $stf_addrs to any
# pass inbound traffic
pass stateful in final family inet6 proto ipv6-icmp from any to $stf_addrs
}
group default {
block all # apply "log"
}
>Fix:
n/a
>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: kern-bug-people->rmind
Responsible-Changed-By: rmind@NetBSD.org
Responsible-Changed-When: Sat, 20 Dec 2014 16:12:57 +0000
Responsible-Changed-Why:
Take.
State-Changed-From-To: open->feedback
State-Changed-By: rmind@NetBSD.org
State-Changed-When: Sat, 20 Dec 2014 16:20:26 +0000
State-Changed-Why:
Can you please try with the latest changes?
From: "Mindaugas Rasiukevicius" <rmind@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/49488 CVS commit: src/sys/net/npf
Date: Sat, 20 Dec 2014 16:19:43 +0000
Module Name: src
Committed By: rmind
Date: Sat Dec 20 16:19:43 UTC 2014
Modified Files:
src/sys/net/npf: npf_conn.c npf_conn.h npf_nat.c
Log Message:
NPF: set the connection flags atomically in the post-creation logic and
fix a tiny race condition window. Might fix PR/49488.
To generate a diff of this commit:
cvs rdiff -u -r1.13 -r1.14 src/sys/net/npf/npf_conn.c
cvs rdiff -u -r1.7 -r1.8 src/sys/net/npf/npf_conn.h
cvs rdiff -u -r1.37 -r1.38 src/sys/net/npf/npf_nat.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "SAITOH Masanobu" <msaitoh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/49488 CVS commit: [netbsd-7] src/sys/net/npf
Date: Mon, 22 Dec 2014 02:10:30 +0000
Module Name: src
Committed By: msaitoh
Date: Mon Dec 22 02:10:30 UTC 2014
Modified Files:
src/sys/net/npf [netbsd-7]: npf_conn.c npf_conn.h npf_nat.c
Log Message:
Pull up following revision(s) (requested by rmind in ticket #347):
sys/net/npf/npf_nat.c: revision 1.38
sys/net/npf/npf_conn.h: revision 1.8
sys/net/npf/npf_conn.c: revision 1.14
NPF: set the connection flags atomically in the post-creation logic and
fix a tiny race condition window. Might fix PR/49488.
To generate a diff of this commit:
cvs rdiff -u -r1.10.2.2 -r1.10.2.3 src/sys/net/npf/npf_conn.c
cvs rdiff -u -r1.6.2.1 -r1.6.2.2 src/sys/net/npf/npf_conn.h
cvs rdiff -u -r1.32.2.3 -r1.32.2.4 src/sys/net/npf/npf_nat.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: feedback->closed
State-Changed-By: rmind@NetBSD.org
State-Changed-When: Mon, 29 Dec 2014 13:18:43 +0000
State-Changed-Why:
kardel@ had the same problem and it seems to be fixed with the patch.
Assume fixed. If it reappears - please shout.
From: Tobias Nygren <tnn-netbsd@nygren.pp.se>
To: rmind@NetBSD.org
Cc: gnats-bugs@NetBSD.org
Subject: Re: kern/49488 (npf: assertion "ret == con" failed at
npf_conn.c:747)
Date: Mon, 29 Dec 2014 16:22:55 +0100
Hi, unfortunately the problem is still present. Although I did get a
slightly different panic today on a DIAG+DEBUG+LOCKDEBUG kernel:
panic: kernel diagnostic assertion "RB_BLACK_P(self)" failed: file
"/usr/src/sys/lib/libkern/../../../common/lib/libc/gen/rb.c", line 780
kern_assert()
rb_tree_remove_node()
npf_conndb_remove()
npf_conn_gc()
npf_worker()
I am now running a kernel with KTHREAD_MPSAFE commented out on the
npf worker thread to see if it makes any difference. (This will prevent
network stuff from preempting the worker thread right?)
Anything else worth trying?
From: Tobias Nygren <tnn-netbsd@nygren.pp.se>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/49488 (npf: assertion "ret == con" failed at
npf_conn.c:747)
Date: Sun, 4 Jan 2015 22:16:07 +0100
Hi, just wanted to remind that the PR should be reopened as
the problem is not fixed. I meanwhile added many extra KASSERTs to try
to track down the bug. It looks like an npf_connkey_t ends up inserted
into two different hash buckets (with disastrous results). Have
yet to find the root cause but the code path where I've seen this
happen is:
npf_conndb_insert()
npf_conn_setnat()
npf_do_nat()
npf_packet_handler()
pfil_run_hooks()
ipintr()
softintr_dispatch()
State-Changed-From-To: closed->open
State-Changed-By: wiz@NetBSD.org
State-Changed-When: Sun, 04 Jan 2015 21:45:53 +0000
State-Changed-Why:
reopened on request by submitter
From: Tobias Nygren <tnn-netbsd@nygren.pp.se>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/49488 (npf: assertion "ret == con" failed at
npf_conn.c:747)
Date: Mon, 5 Jan 2015 12:53:54 +0100
Possible fix, no guarantee about correctness:
Index: npf_conn.c
===================================================================
RCS file: /cvsroot/src/sys/net/npf/npf_conn.c,v
retrieving revision 1.14
diff -u -p -r1.14 npf_conn.c
--- npf_conn.c 20 Dec 2014 16:19:43 -0000 1.14
+++ npf_conn.c 5 Jan 2015 11:50:06 -0000
@@ -563,6 +563,7 @@ npf_conn_setnat(const npf_cache_t *npc,
npf_addr_t *taddr;
in_port_t tport;
u_int tidx;
+ bool forw;
KASSERT(con->c_refcnt > 0);
@@ -589,6 +590,16 @@ npf_conn_setnat(const npf_cache_t *npc,
return EISCONN;
}
+ /* kern/49488: check for conflicting connection in opposite direction(?) */
+ ret = npf_conndb_lookup(conn_db, &key, &forw);
+ KASSERT(ret == con);
+ atomic_dec_uint(&con->c_refcnt); /* lookup did increase refcnt */
+ if (__predict_false(forw == true)) {
+ printf("npf_conn_setnat: expected backwards entry but got forwards one\n");
+ mutex_exit(&con->c_lock);
+ return EBUSY; /* XXX */
+ }
+
/* Remove the "backwards" entry. */
ret = npf_conndb_remove(conn_db, &key);
KASSERT(ret == con);
From: "Mindaugas Rasiukevicius" <rmind@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/49488 CVS commit: src/sys/net/npf
Date: Sun, 1 Feb 2015 22:41:22 +0000
Module Name: src
Committed By: rmind
Date: Sun Feb 1 22:41:22 UTC 2015
Modified Files:
src/sys/net/npf: npf_conn.c
Log Message:
- npf_conn_establish: remove a rare race condition when we might destroy a
connection when it is still referenced by another thread.
- npf_conn_destroy: remove the backwards entry using the saved key, PR/49488.
- Sprinkle some asserts.
To generate a diff of this commit:
cvs rdiff -u -r1.14 -r1.15 src/sys/net/npf/npf_conn.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Soren Jacobsen" <snj@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/49488 CVS commit: [netbsd-7] src
Date: Wed, 4 Feb 2015 07:13:04 +0000
Module Name: src
Committed By: snj
Date: Wed Feb 4 07:13:04 UTC 2015
Modified Files:
src/lib/libnpf [netbsd-7]: npf.c npf.h
src/sys/net/npf [netbsd-7]: npf_conn.c npf_impl.h npf_ruleset.c
src/usr.sbin/npf/npfctl [netbsd-7]: npf.conf.5 npf_parse.y npf_show.c
npfctl.c
Log Message:
Pull up following revision(s) (requested by rmind in ticket #479):
lib/libnpf/npf.c: revision 1.35
lib/libnpf/npf.h: revision 1.28
sys/net/npf/npf_conn.c: revision 1.15
sys/net/npf/npf_impl.h: revision 1.61
sys/net/npf/npf_ruleset.c: revision 1.41
usr.sbin/npf/npfctl/npf.conf.5: revision 1.44
usr.sbin/npf/npfctl/npf_parse.y: revision 1.37
usr.sbin/npf/npfctl/npf_show.c: revisions 1.16, 1.17
usr.sbin/npf/npfctl/npfctl.c: revision 1.46
load the config file before bpfjit so that we can disable the warning.
--
Don't depend on yacc to include stdlib.h or string.h.
--
- npf_conn_establish: remove a rare race condition when we might destroy a
connection when it is still referenced by another thread.
- npf_conn_destroy: remove the backwards entry using the saved key, PR/49488.
- Sprinkle some asserts.
--
npf.conf(5): mention alg, include in the example, minor fix.
--
npfctl(8): report dynamic rule ID in a comment, print the case when libpcap
is used correctly. Also, add npf_ruleset_dump() helper in the kernel.
--
libnpf: add npf_rule_getid() and npf_rule_getcode().
Missed in the previous commit.
--
npfctl_print_rule: print the ID in hex, not decimal.
To generate a diff of this commit:
cvs rdiff -u -r1.32.2.1 -r1.32.2.2 src/lib/libnpf/npf.c
cvs rdiff -u -r1.27 -r1.27.2.1 src/lib/libnpf/npf.h
cvs rdiff -u -r1.10.2.3 -r1.10.2.4 src/sys/net/npf/npf_conn.c
cvs rdiff -u -r1.58.2.2 -r1.58.2.3 src/sys/net/npf/npf_impl.h
cvs rdiff -u -r1.37.2.2 -r1.37.2.3 src/sys/net/npf/npf_ruleset.c
cvs rdiff -u -r1.42.2.1 -r1.42.2.2 src/usr.sbin/npf/npfctl/npf.conf.5
cvs rdiff -u -r1.35.4.1 -r1.35.4.2 src/usr.sbin/npf/npfctl/npf_parse.y
cvs rdiff -u -r1.15 -r1.15.2.1 src/usr.sbin/npf/npfctl/npf_show.c
cvs rdiff -u -r1.42.2.2 -r1.42.2.3 src/usr.sbin/npf/npfctl/npfctl.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: open->closed
State-Changed-By: rmind@NetBSD.org
State-Changed-When: Thu, 05 Feb 2015 22:07:05 +0000
State-Changed-Why:
Finally fixed.
>Unformatted:
(Contact us)
$NetBSD: query-full-pr,v 1.39 2013/11/01 18:47:49 spz Exp $
$NetBSD: gnats_config.sh,v 1.8 2006/05/07 09:23:38 tsutsui Exp $
Copyright © 1994-2014
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.