NetBSD Problem Report #53962

From www@NetBSD.org  Sat Feb  9 00:32:06 2019
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id C55397A16A
	for <gnats-bugs@gnats.NetBSD.org>; Sat,  9 Feb 2019 00:32:06 +0000 (UTC)
Message-Id: <20190209003205.7A0717A1DA@mollari.NetBSD.org>
Date: Sat,  9 Feb 2019 00:32:05 +0000 (UTC)
From: fstd.lkml@gmail.com
Reply-To: fstd.lkml@gmail.com
To: gnats-bugs@NetBSD.org
Subject: npf: weird 'stateful' behavior
X-Send-Pr-Version: www-1.0

>Number:         53962
>Category:       kern
>Synopsis:       npf: weird 'stateful' behavior
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    rmind
>State:          feedback
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Feb 09 00:35:00 +0000 2019
>Closed-Date:    
>Last-Modified:  Sun May 31 17:25:00 +0000 2020
>Originator:     Timo Buhrmester
>Release:        8.0_STABLE as of 2019-01-26
>Organization:
>Environment:
NetBSD lemon.pr0.tips 8.0_STABLE NetBSD 8.0_STABLE (LEMONKERND) #0: Sat Jan 26 15:22:56 CET 2019  build@kiwi.pr0.tips:/stor/netbsd/foreign/lemon-apu/obj/sys/arch/amd64/compile/LEMONKERND amd64
>Description:
I've been attempting to migrate from ipf to npf and I'm having trouble understanding how 'stateful' is supposed to work.


In a test-case sort of scenario, I'm having a NetBSD router connected to two networks:
wm1: 192.168.1.1/24
wm2: 192.168.2.1/24

I want to allow the network behind wm1 to SSH to the network behind wm2.


This is what I initially tried; it's how things worked in ipf:

npf.conf:
| group "net1" on wm1 {
|         pass stateful in final proto tcp from 192.168.1.0/24 to 192.168.2.0/24 port 22 apply "log"
|         block all apply "log"
| }
| 
| group "net2" on wm2 {
|         block all apply "log"
| }
| 
| group default {
|         pass final on lo0 all
|         block all apply "log"
| }

$ nc 192.168.2.13 22  #run on 192.168.1.14

npflog0:
| 00:36:03.085493 rule 5.rules.0/0(match): pass in on wm1: 192.168.1.14.42714 > 192.168.2.13.22: Flags [S], seq 453874606, win 29200, options [mss 1460,sackOK,TS val 561118807 ecr 0,nop,wscale 7], length 0
| 00:36:03.085524 rule 8.rules.0/0(match): block out on wm2: 192.168.1.14.42714 > 192.168.2.13.22: Flags [S], seq 453874606, win 29200, options [mss 1460,sackOK,TS val 561118807 ecr 0,nop,wscale 7], length 0

My SYN doesn't make it out of wm2, so apparently I need a rule for that as well.

Therefore my next attempt:

npf.conf:
| procedure "log" {
|         log: npflog0
| }
| 
| group "net1" on wm1 {
|         pass stateful in final proto tcp from 192.168.1.0/24 to 192.168.2.0/24 port 22 apply "log"
|         block all apply "log"
| }
| 
| group "net2" on wm2 {
|         pass stateful out final proto tcp from 192.168.1.0/24 to 192.168.2.13 port 22 apply "log"
|         block all apply "log"
| }
| 
| group default {
|         pass final on lo0 all
|         block all apply "log"
| }

$ nc 192.168.2.13 22  #run on 192.168.1.14

npflog0:
| 00:36:52.006564 rule 5.rules.0/0(match): pass in on wm1: 192.168.1.14.42718 > 192.168.2.13.22: Flags [S], seq 2274723816, win 29200, options [mss 1460,sackOK,TS val 561131037 ecr 0,nop,wscale 7], length 0
| 00:36:52.006639 rule 8.rules.0/0(match): pass out on wm2: 192.168.1.14.42718 > 192.168.2.13.22: Flags [S], seq 2274723816, win 29200, options [mss 1460,sackOK,TS val 561131037 ecr 0,nop,wscale 7], length 0
| 00:36:52.007299 rule 9.rules.0/0(match): block in on wm2: 192.168.2.13.22 > 192.168.1.14.42718: Flags [S.], seq 283128975, ack 2274723817, win 28960, options [mss 1460,sackOK,TS val 4155176782 ecr 561131037,nop,wscale 6], length 0

Now the router is blocking the SYN/ACK -- why?  The rule was 'stateful' and therefore state should've been kept, no?

Anyway, let's see what happens when the wm1 rule is no longer stateful:

npf.conf:
| procedure "log" {
|         log: npflog0
| }
| 
| group "net1" on wm1 {
|         pass in final proto tcp from 192.168.1.0/24 to 192.168.2.0/24 port 22 apply "log"
|         block all apply "log"
| }
| 
| group "net2" on wm2 {
|         pass stateful out final proto tcp from 192.168.1.0/24 to 192.168.2.13 port 22 apply "log"
|         block all apply "log"
| }
| 
| group default {
|         pass final on lo0 all
|         block all apply "log"
| }

$ nc 192.168.2.13 22  #run on 192.168.1.14

npflog0:
| 00:39:00.969658 rule 5.rules.0/0(match): pass in on wm1: 192.168.1.14.42732 > 192.168.2.13.22: Flags [S], seq 1423577834, win 29200, options [mss 1460,sackOK,TS val 561163278 ecr 0,nop,wscale 7], length 0
| 00:39:00.969758 rule 8.rules.0/0(match): pass out on wm2: 192.168.1.14.42732 > 192.168.2.13.22: Flags [S], seq 1423577834, win 29200, options [mss 1460,sackOK,TS val 561163278 ecr 0,nop,wscale 7], length 0
| 00:39:00.970346 rule 8.rules.0/0(match): pass out on wm2: 192.168.2.13.22 > 192.168.1.14.42732: Flags [S.], seq 1701442932, ack 1423577835, win 28960, options [mss 1460,sackOK,TS val 4155305749 ecr 561163278,nop,wscale 6], length 0
| 00:39:00.970379 rule 6.rules.0/0(match): block out on wm1: 192.168.2.13.22 > 192.168.1.14.42732: Flags [S.], seq 1701442932, ack 1423577835, win 28960, options [mss 1460,sackOK,TS val 4155305749 ecr 561163278,nop,wscale 6], length 0

Now the SYN/ACK is passed OUT on wm2?  This seems fishy, that's the same direction we sent the SYN.  It totally should've been "in" there.  Is this a bug in npf's logging?

(mlelstv explained on IRC: "I am pretty sure that the direction in the log is not about the packet but the rule so "pass out on wm2" is actually the stateful handling of the "pass out" rule for the incoming packet."  so I guess this concern can be disregarded)

Anyway assuming 'out' was actually 'in' on the 3rd line, apparently now state is kept for the (sole) 'stateful' rule.  But I have no state on the wm1 rule anymore, so this still doesn't get through.


Out of curiosity, the last case: wm1 rule is 'stateful', wm2 rule isn't:

npf.conf:
| procedure "log" {
|         log: npflog0
| }
| 
| group "net1" on wm1 {
|         pass stateful in final proto tcp from 192.168.1.0/24 to 192.168.2.0/24 port 22 apply "log"
|         block all apply "log"
| }
| 
| group "net2" on wm2 {
|         pass out final proto tcp from 192.168.1.0/24 to 192.168.2.13 port 22 apply "log"
|         block all apply "log"
| }
| 
| group default {
|         pass final on lo0 all
|         block all apply "log"
| }

$ nc 192.168.2.13 22  #run on 192.168.1.14

npflog0:
| 00:40:43.323638 rule 5.rules.0/0(match): pass in on wm1: 192.168.1.14.42738 > 192.168.2.13.22: Flags [S], seq 1102658105, win 29200, options [mss 1460,sackOK,TS val 561188866 ecr 0,nop,wscale 7], length 0
| 00:40:43.323669 rule 8.rules.0/0(match): pass out on wm2: 192.168.1.14.42738 > 192.168.2.13.22: Flags [S], seq 1102658105, win 29200, options [mss 1460,sackOK,TS val 561188866 ecr 0,nop,wscale 7], length 0
| 00:40:43.324247 rule 9.rules.0/0(match): block in on wm2: 192.168.2.13.22 > 192.168.1.14.42738: Flags [S.], seq 1712581557, ack 1102658106, win 28960, options [mss 1460,sackOK,TS val 4155408105 ecr 561188866,nop,wscale 6], length 0

I never learn whether state was kept on the wm1 rule, since the SYN/ACK is blocked on wm2 ingress (as it should be, as far as I understand what 'stateful' is supposed to do).


So what's the wizardry here?

>How-To-Repeat:

>Fix:

>Release-Note:

>Audit-Trail:
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/53962: npf: weird 'stateful' behavior
Date: Mon, 18 Feb 2019 05:50:33 +0000

 not sent to gnats
 (you usually need to change To: explicitly to gnats-bugs if replying
 to your own posting)

    ------

 From: fstd.lkml@gmail.com
 To: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
 Subject: Re: kern/53962: npf: weird 'stateful' behavior
 Date: Sat, 9 Feb 2019 12:54:01 +0100

 I have discovered that this 5th example does what I need:

 npf.conf:
 | procedure "log" {
 | 	log: npflog0
 | }
 | 
 | group "net1" on wm1 {
 | 	pass in final proto tcp flags S/SARF from 192.168.1.0/24 to 192.168.2.0/24 port 22 apply "log"
 | 	block all apply "log"
 | }
 | 
 | group "net2" on wm2 {
 | 	pass stateful-ends out final proto tcp flags S/SARF from 192.168.1.0/24 to 192.168.2.13 port 22 apply "log"
 | 	block all apply "log"
 | }
 | 
 | group default {
 | 	pass final on lo0 all
 | 	block all apply "log"
 | }


 Since the packet will first ingress on wm1, originally I thought 'stateful-ends' on the wm1 rule would be what to go for, but the state kept by it would not make it egress on wm2.  Having both rules 'stateful-ends' doesn't do the trick either.

 But if I, as shown above, stateLESSly let the SYN ingress and then keep state(ful-ends) on the wm2 egress rule -- THEN state is kept that, in fact, also applies to future related packets in- or egressing on wm1.

 I'd love to understand what's going on here.


From: fstd.lkml@gmail.com
To: gnats-bugs@netbsd.org
Cc: tech-net@netbsd.org, rmind@netbsd.org
Subject: Re: kern/53962: npf: weird 'stateful' behavior
Date: Sun, 10 Feb 2019 03:03:14 +0100

 > npf.conf:
 > | procedure "log" {
 > |         log: npflog0
 > | }
 > | 
 > | group "net1" on wm1 {
 > |         pass stateful in final proto tcp from 192.168.1.0/24 to 192.168.2.0/24 port 22 apply "log"
 > |         block all apply "log"
 > | }
 > | 
 > | group "net2" on wm2 {
 > |         pass stateful out final proto tcp from 192.168.1.0/24 to 192.168.2.13 port 22 apply "log"
 > |         block all apply "log"
 > | }
 > | 
 > | group default {
 > |         pass final on lo0 all
 > |         block all apply "log"
 > | }
 > 
 > $ nc 192.168.2.13 22  #run on 192.168.1.14
 > 
 > npflog0:
 > | 00:36:52.006564 rule 5.rules.0/0(match): pass in on wm1: 192.168.1.14.42718 > 192.168.2.13.22: Flags [S], seq 2274723816, win 29200, options [mss 1460,sackOK,TS val 561131037 ecr 0,nop,wscale 7], length 0
 > | 00:36:52.006639 rule 8.rules.0/0(match): pass out on wm2: 192.168.1.14.42718 > 192.168.2.13.22: Flags [S], seq 2274723816, win 29200, options [mss 1460,sackOK,TS val 561131037 ecr 0,nop,wscale 7], length 0
 > | 00:36:52.007299 rule 9.rules.0/0(match): block in on wm2: 192.168.2.13.22 > 192.168.1.14.42718: Flags [S.], seq 283128975, ack 2274723817, win 28960, options [mss 1460,sackOK,TS val 4155176782 ecr 561131037,nop,wscale 6], length 0
 > 
 > Now the router is blocking the SYN/ACK -- why?  The rule was 'stateful' and therefore state should've been kept, no?

 Ok what seems to be going on is twofold:

 1)

 Whenever a packet arrives, npf tries to retrieve, from the 'connection db', a connection that relates to that packet.  The key that this lookup is done with, i.e. what identifies this connection, is derived (in the case of TCP) from (src/dst port, src/dst addr, protocol).  (Note that 'interface' is absent from that list.)  The result of this lookup is a 'connection' object which keeps the connection's state.  (cf. npf_conn_conkey() and connkey_setkey() in sys/net/npf/npf_conn.c)

 It follows that it's impossible to keep state on two connections that only differ by interface.  The connection objects can represent it, but the keys they'd be stored with in the connection db would collide.  (The semantics of the connection db are to refuse to insert a key that already exists, rather than overwriting it.)

 That means (looking at the above npf.conf) that an ingressing packet on wm1 will create a connection to keep state on; then upon egress of wm2 there's technically another connection to keep state on (same parameters, different interface), but the latter connection fails to be inserted into the connection db because of a key collision with the former.
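
To illustrate the collision described in (1), here is a minimal, self-contained sketch.  It is not the NPF code: `struct flow`, `make_key()` and `keys_collide()` are hypothetical names made up for this illustration, and the key layout is simplified to IPv4 only.  It merely shows that two flows which differ by interface alone produce the same key unless the interface identifier is part of it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified view of one IPv4 flow as seen on one interface. */
struct flow {
	uint16_t proto;		/* e.g. 6 for TCP */
	uint32_t saddr, daddr;	/* IPv4 addresses */
	uint16_t sport, dport;
	uint32_t ifid;		/* interface the packet was seen on */
};

#define KEY_WORDS 5

/*
 * Build a lookup key.  With use_ifid == false the key covers only
 * (protocol, ports, addresses), as described above; with use_ifid == true
 * the interface identifier is added as an extra word.
 */
static void
make_key(uint32_t k[KEY_WORDS], const struct flow *f, bool use_ifid)
{
	assert(k != NULL && f != NULL);
	k[0] = ((uint32_t)f->proto << 16) | (uint32_t)sizeof(uint32_t);
	k[1] = ((uint32_t)f->sport << 16) | f->dport;
	k[2] = f->saddr;
	k[3] = f->daddr;
	k[4] = use_ifid ? f->ifid : 0;
}

/* Return true if both flows would map to the same key in the table. */
static bool
keys_collide(const struct flow *a, const struct flow *b, bool use_ifid)
{
	uint32_t ka[KEY_WORDS], kb[KEY_WORDS];

	make_key(ka, a, use_ifid);
	make_key(kb, b, use_ifid);
	return memcmp(ka, kb, sizeof(ka)) == 0;
}
```

For the SYN from the log above (192.168.1.14.42718 > 192.168.2.13.22, seen first on wm1 and then on wm2), `keys_collide(&a, &b, false)` is true and `keys_collide(&a, &b, true)` is false, assuming the two interfaces carry different identifiers.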


 2)

 npf considers a connection to have a "direction" (i.e. the direction of the initial SYN), and essentially assumes that a "forwards" packet will only ever INgress on an interface, and a "backwards" packet will only ever Egress from an interface (or the other way around, depending on whether the SYN in- or egressed).  This assumption is obviously not true on, say, a router, where one and the same packet may ingress on one interface, and egress out on another.  The piece of code that does this is in npf_conn_ok() in sys/net/npf/npf_conn.c:

 | /*
 |  * npf_conn_ok: check if the connection is active and has the right direction.
 |  */
 | static bool
 | npf_conn_ok(const npf_conn_t *con, const int di, bool forw) //di=2 forw=1
 | {
 | 	const uint32_t flags = con->c_flags;
 | 
 | 	/* Check if connection is active and not expired. */
 | 	bool ok = (flags & (CONN_ACTIVE | CONN_EXPIRE)) == CONN_ACTIVE;
 | 	if (__predict_false(!ok)) {
 | 		return false;
 | 	}
 | 
 all good until here, but now...: ('di' is the packet direction; both it and 'flags & PFIL_ALL' are either 1 (ingress) or 2 (egress); PFIL_ALL is 3)
 | 	/* Check if the direction is consistent */
 | 	bool pforw = (flags & PFIL_ALL) == (unsigned)di;
 | 	if (__predict_false(forw != pforw)) {
 | 		return false;
 | 	}
 | 	return true;
 | }

 When commenting out that last check, the connection is later still discarded for having the wrong interface (as it should).  But since the connection is tied to a particular interface, that interface should've been part of the key to the connection db in the first place.  However I realize that if that were the case, it'd be difficult to implement interface-agnostic state ("stateful-ends").
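
The effect of the direction check in (2) on a forwarding router can be worked through with a small stand-alone sketch.  It reduces the quoted test to its direction logic only; `direction_ok()` and the `DIR_*` constants are hypothetical names for this illustration (the values 1, 2 and 3 follow the note above), not NPF identifiers.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative direction values: 1 = ingress, 2 = egress, mask = 3. */
enum { DIR_IN = 1, DIR_OUT = 2, DIR_MASK = 3 };

/*
 * Reduced form of the direction test: a packet only counts as "forwards"
 * if its interface-level direction equals the direction recorded when the
 * state was created.  'forw' says which key (forwards or backwards) the
 * packet matched.
 */
static bool
direction_ok(unsigned conn_dir, unsigned pkt_dir, bool forw)
{
	assert((conn_dir & DIR_MASK) == DIR_IN ||
	    (conn_dir & DIR_MASK) == DIR_OUT);

	const bool pforw = (conn_dir & DIR_MASK) == pkt_dir;
	return forw == pforw;
}
```

With state created by the SYN ingressing on wm1 (`conn_dir == DIR_IN`), the same SYN egressing on wm2 gives `direction_ok(DIR_IN, DIR_OUT, true) == false`, and the SYN/ACK ingressing on wm2 gives `direction_ok(DIR_IN, DIR_IN, false) == false`.  Only the original ingress and the reply egressing on wm1 pass the check.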

 Speaking of stateful-ends, when the direction-check is commented out, a single 'stateful-ends' ingress rule gives me exactly the good old ipf "keep state" behavior (if the packet is accepted into the filter, it's implicitly permitted out of the filter, on whatever interface).  So that's a workaround I can live with for now, although of course I'm not entirely sure of the purpose of this direction check, or the consequences of removing it.

 Any insights?

From: Timo Buhrmester <fstd.lkml@gmail.com>
To: gnats-bugs@netbsd.org
Cc: rmind@netbsd.org
Subject: Re: kern/53962: npf: weird 'stateful' behavior
Date: Thu, 14 Feb 2019 02:28:27 +0100

 > So that's a workaround I can live with for now
 Turns out that workaround doesn't play nice with NAT.

 I've "fixed" things for now by making the interface identifier part
 of the connection key so as to avoid identical rules on different
 interfaces colliding.

 Cheers
 Timo

 diff --git a/sys/net/npf/npf_conn.c b/sys/net/npf/npf_conn.c
 index 3557132b1f1a..023f845964f0 100644
 --- a/sys/net/npf/npf_conn.c
 +++ b/sys/net/npf/npf_conn.c
 @@ -238,7 +238,7 @@ npf_conn_trackable_p(const npf_cache_t *npc)

  static uint32_t
  connkey_setkey(npf_connkey_t *key, uint16_t proto, const void *ipv,
 -    const uint16_t *id, unsigned alen, bool forw)
 +    const uint16_t *id, unsigned alen, bool forw, uint32_t ifid)
  {
  	uint32_t isrc, idst, *k = key->ck_key;
  	const npf_addr_t * const *ips = ipv;
 @@ -263,22 +263,23 @@ connkey_setkey(npf_connkey_t *key, uint16_t proto, const void *ipv,

  	k[0] = ((uint32_t)proto << 16) | (alen & 0xffff);
  	k[1] = ((uint32_t)id[isrc] << 16) | id[idst];
 +	k[2] = ifid;

  	if (__predict_true(alen == sizeof(in_addr_t))) {
 -		k[2] = ips[isrc]->word32[0];
 -		k[3] = ips[idst]->word32[0];
 +		k[3] = ips[isrc]->word32[0];
 +		k[4] = ips[idst]->word32[0];
  		return 4 * sizeof(uint32_t);
  	} else {
  		const u_int nwords = alen >> 2;
 -		memcpy(&k[2], ips[isrc], alen);
 -		memcpy(&k[2 + nwords], ips[idst], alen);
 +		memcpy(&k[3], ips[isrc], alen);
 +		memcpy(&k[3 + nwords], ips[idst], alen);
  		return (2 + (nwords * 2)) * sizeof(uint32_t);
  	}
  }

  static void
  connkey_getkey(const npf_connkey_t *key, uint16_t *proto, npf_addr_t *ips,
 -    uint16_t *id, uint16_t *alen)
 +    uint16_t *id, uint16_t *alen, uint32_t *ifid)
  {
  	const uint32_t *k = key->ck_key;

 @@ -286,12 +287,14 @@ connkey_getkey(const npf_connkey_t *key, uint16_t *proto, npf_addr_t *ips,
  	*alen = k[0] & 0xffff;
  	id[NPF_SRC] = k[1] >> 16;
  	id[NPF_DST] = k[1] & 0xffff;
 +	if (ifid)
 +		*ifid = k[2];

  	switch (*alen) {
  	case sizeof(struct in6_addr):
  	case sizeof(struct in_addr):
 -		memcpy(&ips[NPF_SRC], &k[2], *alen);
 -		memcpy(&ips[NPF_DST], &k[2 + ((unsigned)*alen >> 2)], *alen);
 +		memcpy(&ips[NPF_SRC], &k[3], *alen);
 +		memcpy(&ips[NPF_DST], &k[3 + ((unsigned)*alen >> 2)], *alen);
  		return;
  	default:
  		KASSERT(0);
 @@ -345,14 +348,14 @@ npf_conn_conkey(const npf_cache_t *npc, npf_connkey_t *key, const bool forw)
  		/* Unsupported protocol. */
  		return 0;
  	}
 -	return connkey_setkey(key, proto, npc->npc_ips, id, alen, forw);
 +	return connkey_setkey(key, proto, npc->npc_ips, id, alen, forw, npc->npc_nbuf->nb_ifid);
  }

  static __inline void
  connkey_set_addr(npf_connkey_t *key, const npf_addr_t *naddr, const int di)
  {
  	const u_int alen = key->ck_key[0] & 0xffff;
 -	uint32_t *addr = &key->ck_key[2 + ((alen >> 2) * di)];
 +	uint32_t *addr = &key->ck_key[3 + ((alen >> 2) * di)];

  	KASSERT(alen > 0);
  	memcpy(addr, naddr, alen);
 @@ -945,18 +948,21 @@ static prop_dictionary_t
  npf_connkey_export(const npf_connkey_t *key)
  {
  	uint16_t id[2], alen, proto;
 +	uint32_t ifid;
  	prop_dictionary_t kdict;
  	npf_addr_t ips[2];
  	prop_data_t d;

  	kdict = prop_dictionary_create();
 -	connkey_getkey(key, &proto, ips, id, &alen);
 +	connkey_getkey(key, &proto, ips, id, &alen, &ifid);

  	prop_dictionary_set_uint16(kdict, "proto", proto);

  	prop_dictionary_set_uint16(kdict, "sport", id[NPF_SRC]);
  	prop_dictionary_set_uint16(kdict, "dport", id[NPF_DST]);

 +	prop_dictionary_set_uint32(kdict, "ifid", ifid);
 +
  	d = prop_data_create_data(&ips[NPF_SRC], alen);
  	prop_dictionary_set_and_rel(kdict, "saddr", d);

 @@ -1007,6 +1013,7 @@ npf_connkey_import(prop_dictionary_t kdict, npf_connkey_t *key)
  	prop_object_t sobj, dobj;
  	npf_addr_t const * ips[2];
  	uint16_t alen, proto, id[2];
 +	uint32_t ifid;

  	if (!prop_dictionary_get_uint16(kdict, "proto", &proto))
  		return 0;
 @@ -1017,6 +1024,9 @@ npf_connkey_import(prop_dictionary_t kdict, npf_connkey_t *key)
  	if (!prop_dictionary_get_uint16(kdict, "dport", &id[NPF_DST]))
  		return 0;

 +	if (!prop_dictionary_get_uint32(kdict, "ifid", &ifid))
 +		return 0;
 +
  	sobj = prop_dictionary_get(kdict, "saddr");
  	if ((ips[NPF_SRC] = prop_data_data_nocopy(sobj)) == NULL)
  		return 0;
 @@ -1029,7 +1039,7 @@ npf_connkey_import(prop_dictionary_t kdict, npf_connkey_t *key)
  	if (alen != prop_data_size(dobj))
  		return 0;

 -	return connkey_setkey(key, proto, ips, id, alen, true);
 +	return connkey_setkey(key, proto, ips, id, alen, true, ifid);
  }

  /*
 diff --git a/sys/net/npf/npf_conn.h b/sys/net/npf/npf_conn.h
 index debb27e22ee6..2023ec787ec4 100644
 --- a/sys/net/npf/npf_conn.h
 +++ b/sys/net/npf/npf_conn.h
 @@ -47,9 +47,9 @@ typedef struct npf_connkey npf_connkey_t;
  /*
   * See npf_conn_conkey() function for the key layout description.
   */
 -#define	NPF_CONN_NKEYWORDS	(2 + ((sizeof(npf_addr_t) * 2) >> 2))
 +#define	NPF_CONN_NKEYWORDS	(3 + ((sizeof(npf_addr_t) * 2) >> 2))
  #define	NPF_CONN_GETALEN(key)	((key)->ck_key[0] & 0xffff)
 -#define	NPF_CONN_KEYLEN(key)	(8 + (2 * NPF_CONN_GETALEN(key)))
 +#define	NPF_CONN_KEYLEN(key)	(12 + (2 * NPF_CONN_GETALEN(key)))

  struct npf_connkey {
  	/* Entry node and back-pointer to the actual connection. */

Responsible-Changed-From-To: kern-bug-people->rmind
Responsible-Changed-By: rmind@NetBSD.org
Responsible-Changed-When: Sun, 17 Feb 2019 01:05:06 +0000
Responsible-Changed-Why:
Take.


From: Mindaugas Rasiukevicius <rmind@netbsd.org>
To: fstd.lkml@gmail.com
Cc: gnats-bugs@netbsd.org, tech-net@netbsd.org
Subject: Re: kern/53962: npf: weird 'stateful' behavior
Date: Sun, 17 Feb 2019 01:13:13 +0000

 fstd.lkml@gmail.com wrote:
 > Ok what seems to be going on is twofold:
 > 
 > <...>

 > Speaking of stateful-ends, when the direction-check is commented out, a
 > single 'stateful-ends' ingress rule gives me exactly the good old ipf
 > "keep state" behavior (if the packet is accepted into the filter, it's
 > implicitly permitted out of the filter, on whatever interface).  So
 > that's a workaround I can live with for now, although of course I'm not
 > entirely sure of the purpose of this direction check, or the consequences
 > of removing it.
 > 
 > Any insights?

 Thanks for a thorough summary.  Your (1) and (2) observations are correct.
 Basically, there are two points here:

 - NPF connection state is generally per-interface, but see below.  Bypassing
 the ruleset on other interfaces can have security implications, e.g. a packet
 with a spoofed IP address might bypass ingress filtering.  Hence the design
 decision to default to such behaviour (so you control what's happening on
 other interfaces with a ruleset there).

 - There are two keys for a connection (so that the reverse lookup on the
 returning packets would succeed).  It is necessary to establish the packet
 direction (with respect to the connection direction) for the full TCP state
 tracking.

 The "stateful-ends" mechanism is for having a global state (which could be
 picked up on other interfaces).  I think it should be fixed to assume that
 the packets on an interface other than the one where the state was created should
 match the reverse key (for the "backwards stream"), without checking that
 it has the opposite interface-level direction.

 I'll have a look at this.

 -- 
 Mindaugas

From: Timo Buhrmester <fstd.lkml@gmail.com>
To: Mindaugas Rasiukevicius <rmind@netbsd.org>
Cc: gnats-bugs@netbsd.org, tech-net@netbsd.org
Subject: Re: kern/53962: npf: weird 'stateful' behavior
Date: Sun, 17 Feb 2019 19:36:40 +0100

 > - NPF connection state is generally per-interface, but see below.  Bypassing
 > the ruleset on other interfaces can have security implications, e.g. a packet
 > with a spoofed IP address might bypass ingress filtering.  Hence the design
 > decision to default to such behaviour (so you control what's happening on
 > other interfaces with a ruleset there).
 I actually like the per-interface state for various reasons including the one
 you mentioned.  However it does come with the downside of rule multiplication.

 Since with my last patch (including ifid in connkey) I have something that
 works the way I intend and it's "in production" now, here's a bit of syntactic
 inspiration as to how the rule multiplication could be countered:

 Basically when writing my npf.conf I pretend 'egress <interface list>' is a
 valid construct so my rules look like this:

 | pass stateful in on wm1 egress pppoe0,wm2 final proto tcp from $foo to $bar

 and a perl script will generate from that:

 | pass stateful in on wm1 final proto tcp from $foo to $bar
 | pass stateful out on pppoe0 final proto tcp from $foo to $bar
 | pass stateful out on wm2 final proto tcp from $foo to $bar

 (and sort it in the right groups).  
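
As a rough sketch of that expansion step (written in C here rather than Perl, and not the actual script): `expand_egress()` is a hypothetical helper that assumes a rule of the exact shape shown above, i.e. "in on <interface>" followed by "egress <comma-separated list>".  Sorting the generated rules into groups is left out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Expand one rule of the pretend form
 *   "<head> in on <ifname> egress <if1>,<if2>,... <tail>"
 * into one ingress rule plus one egress rule per listed interface,
 * written to 'out' as newline-terminated lines.
 * Returns the number of rules written, or -1 on a parse error or if
 * 'out' is too small.
 */
static int
expand_egress(const char *rule, char *out, size_t outsz)
{
	static const char in_kw[] = " in on ";
	static const char eg_kw[] = " egress ";

	assert(rule != NULL && out != NULL && outsz > 0);

	const char *in = strstr(rule, in_kw);
	const char *eg = strstr(rule, eg_kw);
	if (in == NULL || eg == NULL || eg < in)
		return -1;

	const char *list = eg + strlen(eg_kw);
	const char *tail = strchr(list, ' ');	/* rest of rule, with leading space */
	if (tail == NULL || tail == list)
		return -1;

	const int headlen = (int)(in - rule);
	size_t used = 0;
	int nrules = 0;

	/* Ingress rule: the original text with "egress <list>" cut out. */
	int n = snprintf(out, outsz, "%.*s%s\n", (int)(eg - rule), rule, tail);
	if (n < 0 || (size_t)n >= outsz)
		return -1;
	used += (size_t)n;
	nrules++;

	/* One egress rule per comma-separated interface name. */
	while (list < tail) {
		const char *end = memchr(list, ',', (size_t)(tail - list));
		if (end == NULL)
			end = tail;
		if (end == list)
			return -1;	/* empty interface name */
		n = snprintf(out + used, outsz - used, "%.*s out on %.*s%s\n",
		    headlen, rule, (int)(end - list), list, tail);
		if (n < 0 || (size_t)n >= outsz - used)
			return -1;
		used += (size_t)n;
		nrules++;
		list = (end < tail) ? end + 1 : tail;
	}
	return nrules;
}
```

Called on the example rule above with a sufficiently large buffer, it returns 3 and fills the buffer with the three generated rules, one per line.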


State-Changed-From-To: open->feedback
State-Changed-By: rmind@NetBSD.org
State-Changed-When: Thu, 08 Aug 2019 21:41:56 +0000
State-Changed-Why:
Does the "stateful-all" keyword (in -current/netbsd-9) satisfy your use case?


From: Timo Buhrmester <fstd.lkml@gmail.com>
To: gnats-bugs@netbsd.org
Cc: tech-net@netbsd.org, rmind@netbsd.org
Subject: Re: kern/53962 (npf: weird 'stateful' behavior)
Date: Sun, 5 Apr 2020 04:32:32 +0200

 Hi, sorry for the late reply, I haven't had time to upgrade to netbsd-9.

 > Does the "stateful-all" keyword (in -current/netbsd-9) satisfy your use case?
 The short answer is no, or rather I don't know; something with the NAT seems broken.


 I've built a test setup with yesterday's -current with three machines involved:

 Machine "client" has one interface (eth0, 192.168.3.2/24).
 It will try to connect to 5.9.82.75:25/tcp via "npfbox"

 Machine "npfbox" has two interfaces (vr1, 192.168.3.1/24 and vr0, 192.168.1.200/24)
 It will perform NAT 192.168.3.0/24 to 192.168.1.200 and forward to 192.168.1.1

 Machine "gateway" (192.168.1.1) is my internet gateway.

 Here's the npf.conf on "npfbox"
 | alg "icmp"
 | 
 | map vr0 dynamic 192.168.3.0/24 -> 192.168.1.200
 | 
 | procedure "logb" { log: npflog0 } #blocked
 | procedure "logp" { log: npflog1 } #passed
 | 
 | group "lo0" on lo0 {
 |         pass in final all apply "logp"
 |         pass out final all apply "logp"
 | }
 | 
 | group "internalnet" on vr1 {
 |         pass stateful-all in final family inet4 proto tcp from 192.168.3.0/24 to 5.9.82.75 port 25 apply "logp"
 | }
 | 
 | group default {
 |         block in final all apply "logb"
 |         block out final all apply "logb"
 | }

 Here's what currently happens, tcpdumping on all of npfbox's interfaces:
 (note that npflog1 logs *passed* packets, also note no traffic at all on npflog0)

 vr1:     04:00:03.913162 IP 192.168.3.2.53200 > 5.9.82.75.25: Flags [S], seq 4038765496, win 64240, options [mss 1460,sackOK,TS val 1371479013 ecr 0,nop,wscale 7], length 0
 npflog1: 04:00:03.913232 IP 192.168.3.2.53200 > 5.9.82.75.25: Flags [S], seq 4038765496, win 64240, options [mss 1460,sackOK,TS val 1371479013 ecr 0,nop,wscale 7], length 0
 npflog1: 04:00:03.913323 IP 192.168.1.200.1046 > 5.9.82.75.25: Flags [S], seq 4038765496, win 64240, options [mss 1460,sackOK,TS val 1371479013 ecr 0,nop,wscale 7], length 0
 vr0:     04:00:03.913353 IP 192.168.1.200.1046 > 5.9.82.75.25: Flags [S], seq 4038765496, win 64240, options [mss 1460,sackOK,TS val 1371479013 ecr 0,nop,wscale 7], length 0
 vr0:     04:00:03.936635 IP 5.9.82.75.25 > 192.168.1.200.1046: Flags [S.], seq 698708591, ack 4038765497, win 65535, options [mss 1432,nop,wscale 4,sackOK,TS val 1 ecr 1371479013], length 0
 npflog1: 04:00:03.936683 IP 192.168.1.200.1046 > 192.168.1.200.1046: Flags [S.], seq 698708591, ack 4038765497, win 65535, options [mss 1432,nop,wscale 4,sackOK,TS val 1 ecr 1371479013], length 0
 npflog1: 04:00:03.936756 IP 192.168.1.200.1046 > 192.168.1.200.1046: Flags [R], seq 4038765497, win 0, length 0
 lo0:     04:00:03.936770 IP 192.168.1.200.1046 > 192.168.1.200.1046: Flags [R], seq 4038765497, win 0, length 0

 So it seems that the "de-NATting" on the reverse path is broken.
 I don't understand why the SYN/ACK doesn't show up on lo0, but I guess it doesn't matter much

 Am I doing something wrong?

 Timo

From: Timo Buhrmester <fstd.lkml@gmail.com>
To: gnats-bugs@netbsd.org
Cc: tech-net@netbsd.org, rmind@netbsd.org
Subject: Re: kern/53962 (npf: weird 'stateful' behavior)
Date: Tue, 14 Apr 2020 04:07:50 +0200

 > > Does the "stateful-all" keyword (in -current/netbsd-9) satisfy your use case?
 > The short answer is no, or rather I don't know; something with the NAT seems broken.

 After some digging it seems that npf ties packet direction (in/out) to
 stream direction (forwards/backwards), which naturally fails when
 multiple interfaces are involved.  Maybe I'm misunderstanding things,
 but it fits the fact that the wrong address is being rewritten
 (in the mentioned testcase, rewriting 5.9.82.75 > 192.168.1.200
 to 192.168.1.200 > 192.168.1.200 rather than to 5.9.82.75 > 192.168.3.2).

 Unrelatedly, I noticed that the order of groups in npf.conf matters.
 That is, if the "default" group is the first group in the file,
 the rules in the "default" group will apply to all packets regardless
 of more specific groups below.  This can be trivially worked around
 by putting the default group last, of course, but the documentation
 doesn't read as if this was intended behavior.

From: Mindaugas Rasiukevicius <rmind@netbsd.org>
To: Timo Buhrmester <fstd.lkml@gmail.com>
Cc: gnats-bugs@netbsd.org, tech-net@netbsd.org
Subject: Re: kern/53962 (npf: weird 'stateful' behavior)
Date: Fri, 15 May 2020 20:30:10 +0100

 Timo Buhrmester <fstd.lkml@gmail.com> wrote:
 > <...>
 > 
 > Here's the npf.conf on "npfbox"
 > | alg "icmp"
 > | 
 > | map vr0 dynamic 192.168.3.0/24 -> 192.168.1.200
 > | 
 > | procedure "logb" { log: npflog0 } #blocked
 > | procedure "logp" { log: npflog1 } #passed
 > | 
 > | group "lo0" on lo0 {
 > |         pass in final all apply "logp"
 > |         pass out final all apply "logp"
 > | }
 > | 
 > | group "internalnet" on vr1 {
 > |         pass stateful-all in final family inet4 proto tcp from
 > | 192.168.3.0/24 to 5.9.82.75 port 25 apply "logp" }
 > | 
 > | group default {
 > |         block in final all apply "logb"
 > |         block out final all apply "logb"
 > | }
 > 
 > Here's what currently happens, tcpdumping on all of npfbox's interfaces:
 > (note that npflog1 logs *passed* packets, also note no traffic at all on
 > npflog0)
 > 
 > <...>
 > 
 > So it seems that the "de-NATting" on the reverse path is broken.
 > I don't understand why the SYN/ACK doesn't show up on lo0, but I guess it
 > doesn't matter much

 Just a general update: I have various NPF fixes and improvements which
 will soon be merged to NetBSD.

 On the 'stateful-all' problem: while the state will be picked up on the
 other interface (vr0), the NAT policy will operate using the *initial*
 connection direction which was established on vr1 as inbound.  So, the
 NAT mechanism doesn't recognize the SYN-ACK packet as returning/reverse.
 Such behaviour is unhelpful and, instead, NPF should probably capture the
 connection direction at the point of the NAT entry creation and perform
 the translation based on that (rather than using the original connection
 direction at the point of state creation).

 There are more implications here.. I am going to add configuration-wide
 parameters to give users more flexibility on connection state behaviour.

 -- 
 Mindaugas

From: Mindaugas Rasiukevicius <rmind@netbsd.org>
To: gnats-bugs@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
 fstd.lkml@gmail.com
Cc: 
Subject: Re: kern/53962 (npf: weird 'stateful' behavior)
Date: Sun, 31 May 2020 18:20:03 +0100

 Mindaugas Rasiukevicius <rmind@netbsd.org> wrote:
 >
 > There are more implications here.. I am going to add configuration-wide
 > parameters to give users more flexibility on connection state behaviour.
 >

 The changes are committed.

 1. You can try your original stateful rules with strictly per-interface
 state (the default).

 2. Alternatively, you can try 'stateful-all' with the following parameters:

     set state.key.interface 0
     set state.key.direction 0

 Note that if you mix it with dynamic NAT, like in your last example, the
 translation will happen on the interface where the NAT policy is applied.
 The state will then use a translated address, meaning that the state (for
 the reverse flow) will not be picked up on the initial interface, so you
 would still need a rule to pass it.

 We could add an option to mark the packet to bypass the ruleset if the state
 was picked on some interface and the packet is forwarded.

 -- 
 Mindaugas

>Unformatted:
