NetBSD Problem Report #49488

From www@NetBSD.org  Fri Dec 19 14:09:00 2014
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (verified OK))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 4530BA65B6
	for <gnats-bugs@gnats.NetBSD.org>; Fri, 19 Dec 2014 14:09:00 +0000 (UTC)
Message-Id: <20141219140858.D8F2BA65BB@mollari.NetBSD.org>
Date: Fri, 19 Dec 2014 14:08:58 +0000 (UTC)
From: tnn-netbsd@nygren.pp.se
Reply-To: tnn-netbsd@nygren.pp.se
To: gnats-bugs@NetBSD.org
Subject: npf: assertion "ret == con" failed at npf_conn.c:747
X-Send-Pr-Version: www-1.0

>Number:         49488
>Category:       kern
>Synopsis:       npf: assertion "ret == con" failed at npf_conn.c:747
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    rmind
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri Dec 19 14:10:01 +0000 2014
>Closed-Date:    Thu Feb 05 22:07:05 +0000 2015
>Last-Modified:  Thu Feb 05 22:07:05 +0000 2015
>Originator:     Tobias Nygren
>Release:        7.99.3
>Organization:
>Environment:
>Description:
After updating from -current dated ~ Jun 1024 to -current from today my SOHO NAT box panics several times per day with the following message:
kernel diagnostic assertion "ret == con" failed at npf_conn.c:747



>How-To-Repeat:
Not sure. But seems to be NAT related. (Also it seems stateful firewalling on the internal interface doesn't work anymore when using NAT rules on the external interface, this may or may not be related).

Here is the ruleset:

$ext_if = "vlan1"
$ext_v4 = inet4(vlan1)
$ext_addrs = {inet4(vlan1)}
$stf_addrs = {inet6(stf0)}

$int_if = "vlan0"

table <rfc5735> type tree file "/etc/npf.rfc5735"

map $ext_if dynamic 172.18.129.0/24 -> $ext_v4
map $ext_if dynamic 172.18.129.10 port 51813 <- $ext_v4 port 51813
map $ext_if dynamic 172.18.129.12 port 25 <- $ext_v4 port 20025
map $ext_if dynamic 172.18.129.12 port 80 <- $ext_v4 port 80
map $ext_if dynamic 172.18.129.12 port 80 <- $ext_v4 port 20080
map $ext_if dynamic 172.18.129.12 port 443 <- $ext_v4 port 20443
map $ext_if dynamic 172.18.129.12 port 993 <- $ext_v4 port 20993
map $ext_if dynamic 172.18.129.25 port 5060 <- $ext_v4 port 5060
map $ext_if dynamic 172.18.129.25 port 5061 <- $ext_v4 port 5061
map $ext_if dynamic 172.18.129.25 port 5062 <- $ext_v4 port 5062
map $ext_if dynamic 172.18.129.25 port 5063 <- $ext_v4 port 5063
map $ext_if dynamic 172.18.129.25 port 5064 <- $ext_v4 port 5064
map $ext_if dynamic 172.18.129.25 port 5065 <- $ext_v4 port 5065
map $ext_if dynamic 172.18.129.25 port 5066 <- $ext_v4 port 5066
map $ext_if dynamic 172.18.129.25 port 5067 <- $ext_v4 port 5067
map $ext_if dynamic 172.18.129.25 port 5068 <- $ext_v4 port 5068
map $ext_if dynamic 172.18.129.25 port 5069 <- $ext_v4 port 5069
map $ext_if dynamic 127.0.0.1 port 22 <- $ext_v4 port 23
map $ext_if dynamic 127.0.0.1 port 22 <- $ext_v4 port 53

procedure "log" {
	log: npflog0
}

group "external" on $ext_if {
# pass sixtofour
	pass in final family inet4 proto ipv6 from any to $ext_v4
	pass out final family inet4 proto ipv6 from $ext_v4 to any
# block rfc5735, exceptions must be listed above, note that sixtofour is one.
	block in final family inet4 from <rfc5735> apply "log"
	block out final family inet4 to <rfc5735> apply "log"
# pass locally generated traffic
	pass stateful out final family inet4 from $ext_v4 to any
# pass NAT from inside
	pass stateful out final family inet4 from 172.18.129.0/24 to any
# pass services
	pass stateful in final family inet4 proto icmp from any to $ext_v4
	pass stateful in final family inet4 from any to $ext_v4 port 51813
	pass stateful in final family inet4 from any to $ext_v4 port 23
	pass stateful in final family inet4 from any to $ext_v4 port 53
	pass stateful in final family inet4 from any to $ext_v4 port 80
	pass stateful in final family inet4 from any to $ext_v4 port 5060
	pass stateful in final family inet4 from any to $ext_v4 port 5061
	pass stateful in final family inet4 from any to $ext_v4 port 5062
	pass stateful in final family inet4 from any to $ext_v4 port 5063
	pass stateful in final family inet4 from any to $ext_v4 port 5064
	pass stateful in final family inet4 from any to $ext_v4 port 5065
	pass stateful in final family inet4 from any to $ext_v4 port 5066
	pass stateful in final family inet4 from any to $ext_v4 port 5067
	pass stateful in final family inet4 from any to $ext_v4 port 5068
	pass stateful in final family inet4 from any to $ext_v4 port 5069
	pass stateful in final family inet4 from 46.21.107.176 to $ext_v4 port 20025
	pass stateful in final family inet4 from 109.74.13.132 to $ext_v4 port 20025
	pass stateful in final family inet4 from 46.21.107.176 to $ext_v4 port 20080
	pass stateful in final family inet4 from 46.21.107.176 to $ext_v4 port 20443
	pass stateful in final family inet4 from 46.21.107.176 to $ext_v4 port 20993
}

group "internal" on $int_if {
# pass v4 routing
# XXX for some reason stateful does not play nice here anymore.
#	pass stateful in final proto tcp from any to any
#	pass stateful in final proto icmp from 172.18.129.0/24 to any
#	pass stateful in final proto udp from 172.18.129.0/24 to any
	pass in final pcap-filter "tcp and src net 172.18.129.0/24 and not dst net 172.18.129.0/24"
	pass in final pcap-filter "udp and src net 172.18.129.0/24 and not dst net 172.18.129.0/24"
	pass in final pcap-filter "icmp and src net 172.18.129.0/24 and not dst net 172.18.129.0/24"
# pass internal ssh, NS and NTP
	pass stateful in final proto tcp from 172.18.129.0/24 to 172.18.129.1/32 port 22
	pass stateful in final proto icmp from 172.18.129.0/24 to 172.18.129.1/32
	pass stateful in final proto udp from 172.18.129.0/24 to 172.18.129.1/32 port 53
	pass stateful in final proto udp from 172.18.129.0/24 to 172.18.129.1/32 port 69
	pass stateful in final proto udp from 172.18.129.0/24 to 172.18.129.1/32 port 123
# pass any outgoing
	pass stateful out final all
}

group "loopback" on lo0 {
	pass final all
}

group "sixtofour" on stf0 {
# pass outbound traffic
	pass stateful out final family inet6 proto tcp from $stf_addrs to any
	pass stateful out final family inet6 proto udp from $stf_addrs to any
	pass stateful out final family inet6 proto ipv6-icmp from $stf_addrs to any
# pass inbound traffic
	pass stateful in final family inet6 proto ipv6-icmp from any to $stf_addrs
}

group default {
	block all # apply "log"
}

>Fix:
n/a

>Release-Note:

>Audit-Trail:

Responsible-Changed-From-To: kern-bug-people->rmind
Responsible-Changed-By: rmind@NetBSD.org
Responsible-Changed-When: Sat, 20 Dec 2014 16:12:57 +0000
Responsible-Changed-Why:
Take.


State-Changed-From-To: open->feedback
State-Changed-By: rmind@NetBSD.org
State-Changed-When: Sat, 20 Dec 2014 16:20:26 +0000
State-Changed-Why:
Can you please try with the latest changes?


From: "Mindaugas Rasiukevicius" <rmind@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/49488 CVS commit: src/sys/net/npf
Date: Sat, 20 Dec 2014 16:19:43 +0000

 Module Name:	src
 Committed By:	rmind
 Date:		Sat Dec 20 16:19:43 UTC 2014

 Modified Files:
 	src/sys/net/npf: npf_conn.c npf_conn.h npf_nat.c

 Log Message:
 NPF: set the connection flags atomically in the post-creation logic and
 fix a tiny race condition window.  Might fix PR/49488.


 To generate a diff of this commit:
 cvs rdiff -u -r1.13 -r1.14 src/sys/net/npf/npf_conn.c
 cvs rdiff -u -r1.7 -r1.8 src/sys/net/npf/npf_conn.h
 cvs rdiff -u -r1.37 -r1.38 src/sys/net/npf/npf_nat.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "SAITOH Masanobu" <msaitoh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/49488 CVS commit: [netbsd-7] src/sys/net/npf
Date: Mon, 22 Dec 2014 02:10:30 +0000

 Module Name:	src
 Committed By:	msaitoh
 Date:		Mon Dec 22 02:10:30 UTC 2014

 Modified Files:
 	src/sys/net/npf [netbsd-7]: npf_conn.c npf_conn.h npf_nat.c

 Log Message:
 Pull up following revision(s) (requested by rmind in ticket #347):
 	sys/net/npf/npf_nat.c: revision 1.38
 	sys/net/npf/npf_conn.h: revision 1.8
 	sys/net/npf/npf_conn.c: revision 1.14
 NPF: set the connection flags atomically in the post-creation logic and
 fix a tiny race condition window.  Might fix PR/49488.


 To generate a diff of this commit:
 cvs rdiff -u -r1.10.2.2 -r1.10.2.3 src/sys/net/npf/npf_conn.c
 cvs rdiff -u -r1.6.2.1 -r1.6.2.2 src/sys/net/npf/npf_conn.h
 cvs rdiff -u -r1.32.2.3 -r1.32.2.4 src/sys/net/npf/npf_nat.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: feedback->closed
State-Changed-By: rmind@NetBSD.org
State-Changed-When: Mon, 29 Dec 2014 13:18:43 +0000
State-Changed-Why:
kardel@ had the same problem and it seems to be fixed with the patch.
Assume fixed.  If it reappears - please shout.


From: Tobias Nygren <tnn-netbsd@nygren.pp.se>
To: rmind@NetBSD.org
Cc: gnats-bugs@NetBSD.org
Subject: Re: kern/49488 (npf: assertion "ret == con" failed at
 npf_conn.c:747)
Date: Mon, 29 Dec 2014 16:22:55 +0100

 Hi, unfortunately the problem is still present. Although I did get a
 slightly different panic today on a DIAG+DEBUG+LOCKDEBUG kernel:

 panic: kernel diagnostic assertion "RB_BLACK_P(self)" failed: file
 "/usr/src/sys/lib/libkern/../../../common/lib/libc/gen/rb.c", line 780
 kern_assert()
 rb_tree_remove_node()
 npf_conndb_remove()
 npf_conn_gc()
 npf_worker()

 I am now running a kernel with KTHREAD_MPSAFE commented out on the
 npf worker thread to see if it makes any difference. (This will prevent
 network stuff from preempting the worker thread right?)
 Anything else worth trying?




From: Tobias Nygren <tnn-netbsd@nygren.pp.se>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/49488 (npf: assertion "ret == con" failed at
 npf_conn.c:747)
Date: Sun, 4 Jan 2015 22:16:07 +0100

 Hi, just wanted to remind that the PR should be reopened as
 the problem is not fixed. I meanwhile added many extra KASSERTs to try
 to track down the bug. It looks like an npf_connkey_t ends up inserted
 into two different hash buckets (with disastrous results). Have
 yet to find the root cause but the code path where I've seen this
 happen is:

 npf_conndb_insert()
 npf_conn_setnat()
 npf_do_nat()
 npf_packet_handler()
 pfil_run_hooks()
 ipintr()
 softintr_dispatch()

State-Changed-From-To: closed->open
State-Changed-By: wiz@NetBSD.org
State-Changed-When: Sun, 04 Jan 2015 21:45:53 +0000
State-Changed-Why:
reopened on request by submitter


From: Tobias Nygren <tnn-netbsd@nygren.pp.se>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/49488 (npf: assertion "ret == con" failed at
 npf_conn.c:747)
Date: Mon, 5 Jan 2015 12:53:54 +0100

 Possible fix, no guarantee about correctness:

 Index: npf_conn.c
 ===================================================================
 RCS file: /cvsroot/src/sys/net/npf/npf_conn.c,v
 retrieving revision 1.14
 diff -u -p -r1.14 npf_conn.c
 --- npf_conn.c	20 Dec 2014 16:19:43 -0000	1.14
 +++ npf_conn.c	5 Jan 2015 11:50:06 -0000
 @@ -563,6 +563,7 @@ npf_conn_setnat(const npf_cache_t *npc, 
  	npf_addr_t *taddr;
  	in_port_t tport;
  	u_int tidx;
 +	bool forw;

  	KASSERT(con->c_refcnt > 0);

 @@ -589,6 +590,16 @@ npf_conn_setnat(const npf_cache_t *npc, 
  		return EISCONN;
  	}

 +	/* kern/49488: check for conflicting connection in opposite direction(?) */
 +	ret = npf_conndb_lookup(conn_db, &key, &forw);
 +	KASSERT(ret == con);
 +	atomic_dec_uint(&con->c_refcnt); /* lookup did increase refcnt */
 +	if (__predict_false(forw == true)) {
 +		printf("npf_conn_setnat: expected backwards entry but got forwards one\n");
 +		mutex_exit(&con->c_lock);
 +		return EBUSY; /* XXX */
 +	}
 +
  	/* Remove the "backwards" entry. */
  	ret = npf_conndb_remove(conn_db, &key);
  	KASSERT(ret == con);




From: "Mindaugas Rasiukevicius" <rmind@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/49488 CVS commit: src/sys/net/npf
Date: Sun, 1 Feb 2015 22:41:22 +0000

 Module Name:	src
 Committed By:	rmind
 Date:		Sun Feb  1 22:41:22 UTC 2015

 Modified Files:
 	src/sys/net/npf: npf_conn.c

 Log Message:
 - npf_conn_establish: remove a rare race condition when we might destroy a
   connection when it is still referenced by another thread.
 - npf_conn_destroy: remove the backwards entry using the saved key, PR/49488.
 - Sprinkle some asserts.


 To generate a diff of this commit:
 cvs rdiff -u -r1.14 -r1.15 src/sys/net/npf/npf_conn.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Soren Jacobsen" <snj@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/49488 CVS commit: [netbsd-7] src
Date: Wed, 4 Feb 2015 07:13:04 +0000

 Module Name:	src
 Committed By:	snj
 Date:		Wed Feb  4 07:13:04 UTC 2015

 Modified Files:
 	src/lib/libnpf [netbsd-7]: npf.c npf.h
 	src/sys/net/npf [netbsd-7]: npf_conn.c npf_impl.h npf_ruleset.c
 	src/usr.sbin/npf/npfctl [netbsd-7]: npf.conf.5 npf_parse.y npf_show.c
 	    npfctl.c

 Log Message:
 Pull up following revision(s) (requested by rmind in ticket #479):
 	lib/libnpf/npf.c: revision 1.35
 	lib/libnpf/npf.h: revision 1.28
 	sys/net/npf/npf_conn.c: revision 1.15
 	sys/net/npf/npf_impl.h: revision 1.61
 	sys/net/npf/npf_ruleset.c: revision 1.41
 	usr.sbin/npf/npfctl/npf.conf.5: revision 1.44
 	usr.sbin/npf/npfctl/npf_parse.y: revision 1.37
 	usr.sbin/npf/npfctl/npf_show.c: revisions 1.16, 1.17
 	usr.sbin/npf/npfctl/npfctl.c: revision 1.46
 load the config file before bpfjit so that we can disable the warning.
 --
 Don't depend on yacc to include stdlib.h or string.h.
 --
 - npf_conn_establish: remove a rare race condition when we might destroy a
   connection when it is still referenced by another thread.
 - npf_conn_destroy: remove the backwards entry using the saved key, PR/49488.
 - Sprinkle some asserts.
 --
 npf.conf(5): mention alg, include in the example, minor fix.
 --
 npfctl(8): report dynamic rule ID in a comment, print the case when libpcap
 is used correctly.  Also, add npf_ruleset_dump() helper in the kernel.
 --
 libnpf: add npf_rule_getid() and npf_rule_getcode().
 Missed in the previous commit.
 --
 npfctl_print_rule: print the ID in hex, not decimal.


 To generate a diff of this commit:
 cvs rdiff -u -r1.32.2.1 -r1.32.2.2 src/lib/libnpf/npf.c
 cvs rdiff -u -r1.27 -r1.27.2.1 src/lib/libnpf/npf.h
 cvs rdiff -u -r1.10.2.3 -r1.10.2.4 src/sys/net/npf/npf_conn.c
 cvs rdiff -u -r1.58.2.2 -r1.58.2.3 src/sys/net/npf/npf_impl.h
 cvs rdiff -u -r1.37.2.2 -r1.37.2.3 src/sys/net/npf/npf_ruleset.c
 cvs rdiff -u -r1.42.2.1 -r1.42.2.2 src/usr.sbin/npf/npfctl/npf.conf.5
 cvs rdiff -u -r1.35.4.1 -r1.35.4.2 src/usr.sbin/npf/npfctl/npf_parse.y
 cvs rdiff -u -r1.15 -r1.15.2.1 src/usr.sbin/npf/npfctl/npf_show.c
 cvs rdiff -u -r1.42.2.2 -r1.42.2.3 src/usr.sbin/npf/npfctl/npfctl.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: open->closed
State-Changed-By: rmind@NetBSD.org
State-Changed-When: Thu, 05 Feb 2015 22:07:05 +0000
State-Changed-Why:
Finally fixed.


>Unformatted:

NetBSD Home
NetBSD PR Database Search

(Contact us) $NetBSD: query-full-pr,v 1.39 2013/11/01 18:47:49 spz Exp $
$NetBSD: gnats_config.sh,v 1.8 2006/05/07 09:23:38 tsutsui Exp $
Copyright © 1994-2014 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.