NetBSD Problem Report #53199

From www@NetBSD.org  Fri Apr 20 08:44:31 2018
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id E56747A1D3
	for <gnats-bugs@gnats.NetBSD.org>; Fri, 20 Apr 2018 08:44:30 +0000 (UTC)
Message-Id: <20180420084429.D93127A220@mollari.NetBSD.org>
Date: Fri, 20 Apr 2018 08:44:29 +0000 (UTC)
From: prlw1@cam.ac.uk
Reply-To: prlw1@cam.ac.uk
To: gnats-bugs@NetBSD.org
Subject: stateful npf
X-Send-Pr-Version: www-1.0

>Number:         53199
>Category:       kern
>Synopsis:       stateful npf
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri Apr 20 08:45:00 +0000 2018
>Closed-Date:    Sat Jun 06 15:38:59 +0000 2020
>Last-Modified:  Sat Jun 06 15:38:59 +0000 2020
>Originator:     Patrick Welche
>Release:        NetBSD-8.99.14/amd64
>Organization:
>Environment:
NetBSD-8.99.14/amd64
>Description:
First suspicion that stateful npf doesn't work as expected (if not sw-bug, then doc-bug):

http://mail-index.netbsd.org/netbsd-users/2018/03/28/msg020565.html

The more specific subsequent test (also related in the thread) is:
>How-To-Repeat:
ext iwn0: 10.168.5.65
int wm0:  192.168.2.62

Toy ipf setup works as expected: 

# cat /etc/ipnat.conf
map iwn0 192.168.2.0/24 -> 10.168.5.65 portmap tcp/udp 40000:6000 
map iwn0 192.168.2.0/24 -> 10.168.5.65
# cat /etc/ipf.conf
block in on wm0 all
pass in proto tcp from any to 10.168.5.4 port = 80 flags S/SA keep state


I hope this is the equivalent in npf:

# cat /etc/npf.conf
map iwn0 dynamic 192.168.2.0/24 -> 10.168.5.65

group "ext" on wm0 {
  block in all
  pass stateful in proto tcp flags S/SA from any to 10.168.5.4 port 80
}

group default {
  pass all 
}


test: plug NetBSD-running rpi into wm0 as 192.168.2.26 and grab web page
from another NetBSD/amd64 webserver, 10.168.5.4. Webpage arrives with ipf,
but not with npf.

>Fix:

>Release-Note:

>Audit-Trail:
From: Patrick Welche <prlw1@cam.ac.uk>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/53199: stateful npf
Date: Fri, 4 May 2018 15:05:04 +0100

 The attached rump based script works. Trying the same on a NetBSD-8.99.15/amd64
 webserver with two wm(4), and morden.no as the client, doesn't.

 Running out of ideas...

 [attachment web.sh, quoted-printable encoded; a plain-text copy follows in a later message]

From: Patrick Welche <prlw1@cam.ac.uk>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/53199: stateful npf
Date: Fri, 4 May 2018 15:11:50 +0100

 Also odd: morden long since gave up with

 $ ftp -o out http://siskin.bpi.cam.ac.uk/webserver.html
 Requesting http://siskin.bpi.cam.ac.uk/webserver.html

 ftp: HTTP fetch timeout.

 but tcpdump running on siskin still sees repeated:

 15:10:17.461742 IP (tos 0x0, ttl 53, id 59642, offset 0, flags [none], proto TCP (6), length 177)
     199.233.217.201.58574 > 131.111.65.65.80: Flags [FP.], cksum 0x1b0f (correct), seq 0:125, ack 1, win 16402, options [nop,nop,TS val 767 ecr 91], length 125: HTTP, length: 125
         GET /webserver.html HTTP/1.1
         Host: siskin.bpi.cam.ac.uk
         Accept: */*
         Connection: close
         User-Agent: NetBSD-ftp/20150912

From: Patrick Welche <prlw1@cam.ac.uk>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/53199: stateful npf
Date: Fri, 4 May 2018 15:13:39 +0100

 On Fri, May 04, 2018 at 02:10:01PM +0000, Patrick Welche wrote:
 >  The attached rump based script works. Trying the same on a NetBSD-8.99.15/amd64
 >  webserver with two wm(4), and morden.no as the client, doesn't.

 Surprised that pkgsrc mutt munged the attachment like that... Here using :r


 # 
 # webserver 192.168.0.1 ----- client 192.168.0.2
 # 

 sock_webserver=unix:///tmp/sockwebserver
 sock_client=unix:///tmp/sockclient

 wire1=/tmp/netbus1

 #inetserver="rump_server -lrumpnet -lrumpnet_net -lrumpnet_netinet -lrumpnet_shmif"
 inetserver=rump_allserver

 ${inetserver} ${sock_webserver}
 export RUMP_SERVER=${sock_webserver}
 rump.ifconfig shmif0 create
 rump.ifconfig shmif0 linkstr $wire1
 rump.ifconfig shmif0 inet 192.168.0.1 netmask 0xffffff00

 cat > /tmp/npf.conf << EOF
 set bpf.jit off

 group "ext" on shmif0 {
   block in all
   pass stateful in proto tcp flags S/SA from any to 192.168.0.1 port 80
 }

 group default {
   pass all 
 }
 EOF

 export 'RUMPHIJACK=path=/rump,blanket=/dev/npf,socket=all:nolocal,sysctl=yes'
 env LD_PRELOAD=/usr/lib/librumphijack.so \
   /sbin/npfctl validate /tmp/npf.conf
 env LD_PRELOAD=/usr/lib/librumphijack.so \
   /sbin/npfctl reload /tmp/npf.conf
 env LD_PRELOAD=/usr/lib/librumphijack.so \
   /sbin/npfctl start

 cat > /tmp/webserver.html << EOF
 <html>
 <head>
 <title>webserver</title>
 </head>
 <body>
 Hello from webserver!
 </body>
 </html>
 EOF

 env LD_PRELOAD=/usr/lib/librumphijack.so \
   /usr/libexec/httpd -d -P httpd.pid -i 192.168.0.1 -f -b -s /tmp &

 ${inetserver} ${sock_client}
 export RUMP_SERVER=$sock_client
 rump.ifconfig shmif0 create
 rump.ifconfig shmif0 linkstr $wire1
 rump.ifconfig shmif0 inet 192.168.0.2 netmask 0xffffff00
 env LD_PRELOAD=/usr/lib/librumphijack.so \
   /usr/bin/ftp -4 -n -d -o out 'http://192.168.0.1/webserver.html'

 kill `cat httpd.pid`

 for box in ${sock_webserver} ${sock_client}; do
 	export RUMP_SERVER=${box}
 	rump.halt
 done
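
The script above leaves the fetched page in `out` but does not check it. One way to turn a run into a pass/fail test, as a minimal sketch: compare `out` with the page the script wrote to /tmp/webserver.html. The function name `check_fetch` is an illustrative assumption and is not part of the original script.

```shell
#!/bin/sh
# Hypothetical helper, not part of the original script: succeed only if the
# fetched file exists, is non-empty, and is byte-identical to the served page.
check_fetch() {
	served=$1
	fetched=$2
	if [ -s "$fetched" ] && cmp -s "$served" "$fetched"; then
		echo "PASS: $fetched matches $served"
	else
		echo "FAIL: $fetched missing, empty, or different from $served"
		return 1
	fi
}

# Intended use, right after the ftp(1) call and before rump.halt:
#   check_fetch /tmp/webserver.html out
```

A non-zero return from `check_fetch` would then mark the npf run as failed without having to read the ftp debug output.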

From: Patrick Welche <prlw1@cam.ac.uk>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/53199: stateful npf
Date: Fri, 4 May 2018 16:32:07 +0100

 # npfctl show
 # filtering:    active
 # config:       loaded

 procedure "log"

 group "ext" on wm0 # id="1" 
         block in all apply "log" # id="2" 
         pass stateful in family inet4 proto tcp flags S/SA to 192.168.0.1 port 80 apply "log" # id="3" 

 group # id="4" 
         pass all apply "log" # id="5" 


 Experiment with

 NetBSD-8.99.14/evbarm rpi, usmsc0: 192.168.0.2 as ftp client

 NetBSD-8.99.15/amd64          wm0: 192.168.0.1 as webserver
                              iwn0: external interface

 as per rump script. This works, and the first few packets are:

 16:15:34.581155 rule 3.rules.0/0(match): pass in on ???: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
     192.168.0.2.65534 > 192.168.0.1.80: Flags [S], cksum 0x2ef0 (correct), seq 196676535, win 32768, options [mss 1460,nop,wscale 3,sackOK,TS val 1 ecr 0], length 0
 16:15:34.581176 rule 3.rules.0/0(match): pass out on ???: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60, bad cksum 0 (->b968)!)
     192.168.0.1.80 > 192.168.0.2.65534: Flags [S.], cksum 0x50c6 (correct), seq 38525900, ack 196676536, win 32768, options [mss 1460,nop,wscale 3,sackOK,TS val 1 ecr 1], length 0
 16:15:34.581952 rule 3.rules.0/0(match): pass in on ???: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 52)
     192.168.0.2.65534 > 192.168.0.1.80: Flags [.], cksum 0xef29 (correct), ack 1, win 4197, options [nop,nop,TS val 1 ecr 1], length 0


 # filtering:    active
 # config:       loaded

 procedure "log"

 group "ext" on wm1 # id="1" 
         block in all apply "log" # id="2" 
         pass stateful in family inet4 proto tcp flags S/SA to 131.111.65.65 port 80 apply "log" # id="3" 

 group # id="4" 
         pass all apply "log" # id="5" 

 Experiment with

 linux box as lynx client
 NetBSD-8.99.15/amd64  wm0: internal interface
                       wm1: 131.111.65.65

 fails, first few packets are

 15:43:20.478154 rule 3.rules.0/0(match): pass in on ???: (tos 0x0, ttl 62, id 17986, offset 0, flags [DF], proto TCP (6), length 60)
     131.111.62.210.60810 > 131.111.65.65.80: Flags [S], cksum 0x6b3f (correct), seq 198627856, win 29200, options [mss 1460,sackOK,TS val 2208994386 ecr 0,nop,wscale 7], length 0
 15:43:20.478166 rule 5.rules.0/0(match): pass out on ???: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60, bad cksum 0 (->b3ca)!)
     131.111.65.65.80 > 131.111.62.210.60810: Flags [S.], cksum 0x1bee (correct), seq 1414262023, ack 198627857, win 32768, options [mss 1460,nop,wscale 3,sackOK,TS val 1 ecr 2208994386], length 0
 15:43:20.479687 rule 2.rules.0/0(match): block in on ???: (tos 0x0, ttl 62, id 17987, offset 0, flags [DF], proto TCP (6), length 52)
     131.111.62.210.60810 > 131.111.65.65.80: Flags [.], cksum 0xc9d0 (correct), ack 1, win 229, options [nop,nop,TS val 2208994387 ecr 1], length 0


 Note that now the 2nd packet goes out via the "pass all" rule, rather than the
 stateful rule.

 Why the difference in behaviour?
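
The rule each packet matched can be tallied directly from the log text quoted in this message. A minimal sketch, assuming the capture was saved as text in the `rule N.rules...(match): ACTION DIRECTION on ...` format shown here; the function name `rule_summary` is made up for illustration, and the sample lines are the headers of the failing capture with the trailing IP details abbreviated.

```shell
#!/bin/sh
# Hypothetical helper: print "rule ID ACTION DIRECTION" for every log header
# line on stdin. Indented continuation lines (the TCP details) are skipped
# because their second field is not the word "rule".
rule_summary() {
	awk '$2 == "rule" && $3 ~ /^[0-9]+\./ {
		split($3, id, ".")   # "3.rules.0/0(match):" -> id[1] is "3"
		printf "rule %s %s %s\n", id[1], $4, $5
	}'
}

# Header lines of the failing capture, abbreviated after the first paren.
rule_summary << 'EOF'
15:43:20.478154 rule 3.rules.0/0(match): pass in on ???: (tos 0x0, ttl 62, ...)
15:43:20.478166 rule 5.rules.0/0(match): pass out on ???: (tos 0x0, ttl 64, ...)
15:43:20.479687 rule 2.rules.0/0(match): block in on ???: (tos 0x0, ttl 62, ...)
EOF
```

On the failing capture this prints `rule 3 pass in`, `rule 5 pass out` and `rule 2 block in`, which makes the SYN-ACK leaving via rule 5 easy to spot next to the working capture, where all three packets match rule 3.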

From: Patrick Welche <prlw1@cam.ac.uk>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/53199: stateful npf
Date: Wed, 9 May 2018 10:46:58 +0100

 Updated working and broken to 8.99.16.
 Checked both run same byte code according to npfctl debug.
 Working still replies via the stateful rule. Broken still replies via the
 pass all rule.

From: Patrick Welche <prlw1@cam.ac.uk>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/53199: stateful npf
Date: Wed, 9 May 2018 11:54:20 +0100

 Changed "broken" to

 # modstat | grep npf
 if_npflog               driver   builtin  -        0       - -
 npf                     misc     builtin  -        4       - bpf
 npf_alg_icmp            misc     builtin  -        0       - npf
 npf_ext_log             misc     builtin  -        0       - npf
 npf_ext_normalize       misc     builtin  -        0       - npf
 npf_ext_rndblock        misc     builtin  -        0       - npf

 and it still is broken (second packet should match rule 3):

 procedure "log"

 group "ext" on wm1 # id="1" 
         block in all apply "log" # id="2" 
         pass stateful in final family inet4 proto tcp flags S/SA to 131.111.65.65 port 80 apply "log" # id="3" 

 group # id="4" 
         pass all apply "log" # id="5" 

 11:50:42.035452 rule 3.rules.0/0(match): pass in on ???: (tos 0x0, ttl 62, id 56144, offset 0, flags [DF], proto TCP (6), length 60)
     131.111.62.210.44044 > 131.111.65.65.80: Flags [S], cksum 0x9ca2 (correct), seq 2696319833, win 29200, options [mss 1460,sackOK,TS val 2313504777 ecr 0,nop,wscale 7], length 0
 11:50:42.035469 rule 5.rules.0/0(match): pass out on ???: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60, bad cksum 0 (->b3ca)!)

From: Patrick Welche <prlw1@cam.ac.uk>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/53199: stateful npf
Date: Wed, 9 May 2018 10:55:15 +0100

 Difference spotted:

 Working:
 $ modstat | grep npf
 if_npflog               driver   builtin  -        0       - -
 npf                     misc     builtin  -        4       - bpf
 npf_alg_icmp            misc     builtin  -        0       - npf
 npf_ext_log             misc     builtin  -        0       - npf
 npf_ext_normalize       misc     builtin  -        0       - npf
 npf_ext_rndblock        misc     builtin  -        0       - npf

 Broken:
 # modstat | grep npf
 if_npflog               driver   filesys  a        0     516 -
 npf                     driver   filesys  a        1   40955 bpf
 npf_ext_log             misc     filesys  a        0     643 npf

 Clutching at straws...

From: Patrick Welche <prlw1@cam.ac.uk>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/53199: stateful npf
Date: Wed, 9 May 2018 16:38:50 +0100

 [message disappeared into the ether - trying again]

 Made "broken" built-in too, and no change => nothing to do with modules.

From: Patrick Welche <prlw1@cam.ac.uk>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/53199: stateful npf
Date: Wed, 9 May 2018 17:04:28 +0100

 npfctl list on "broken" correctly lists the connection.

From: Patrick Welche <prlw1@cam.ac.uk>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/53199: stateful npf
Date: Thu, 10 May 2018 10:08:44 +0100

 Plugged a USB aue0 in. Swapped external from wm1 to aue0. It now works.

 Somehow, stateful rules don't work with

 wm1 at pci4 dev 5 function 0: Intel i82541GI 1000BASE-T Ethernet (rev. 0x05)
 wm1: interrupting at ioapic0 pin 17
 wm1: 32-bit 33MHz PCI bus
 wm1: 64 words (8 address bits) SPI EEPROM
 wm1: Ethernet address 00:15:17:21:7b:ca
 wm1: 0x220442<LOCK_EECD,SPI,IOH_VALID,ASF_FIRM,WOL>
 igphy0 at wm1 phy 1: Intel IGP01E1000 Gigabit PHY, rev. 0

 Trying with and without hardware checksumming didn't change anything.

From: Patrick Welche <prlw1@cam.ac.uk>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/53199: stateful npf
Date: Thu, 10 May 2018 10:44:00 +0100

 On Thu, May 10, 2018 at 09:10:01AM +0000, Patrick Welche wrote:
 >  Plugged a USB aue0 in. Swapped external from wm1 to aue0. It now works.

 But I made a mistake and didn't change the "on wm1" in the rules => the
 default "pass all" rule applied.

 Now it looks like a routing issue: "working" were all on a local network.
 I probably didn't think that through properly.

From: Patrick Welche <prlw1@cam.ac.uk>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/53199: stateful npf
Date: Thu, 10 May 2018 11:31:44 +0100

 On Thu, May 10, 2018 at 09:45:00AM +0000, Patrick Welche wrote:
 >  Now it looks like a routing issue: "working" were all on a local network.
 >  I probably didn't think that through properly.

 Yup - looks like pf's route-to would do the trick.

From: Patrick Welche <prlw1@cam.ac.uk>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/53199: stateful npf
Date: Fri, 11 May 2018 16:28:25 +0100

 On Thu, May 10, 2018 at 10:35:01AM +0000, Patrick Welche wrote:
 >  >  Now it looks like a routing issue: "working" were all on a local network.
 >  >  I probably didn't think that through properly.

 I have the default route pointing to the internal interface. With npf,
 the webserver's reply gets the default route applied to it, so doesn't
 go through the external interface's rule, which contains the keep state
 rule.

 I just checked with ipf, and the reply from the webserver DOES go out
 of the external interface despite the default route pointing to the
 internal interface, so everything works as expected.

 Is this difference in behaviour intended?

From: Patrick Welche <prlw1@cam.ac.uk>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/53199: stateful npf
Date: Fri, 11 May 2018 16:30:46 +0100

 For reference, the ipf rules were just

 block in all
 pass in on wm1 proto tcp from any to wm1/network port = 80 \
    flags S keep state
 pass in on lo0 all
 pass out on lo0 all
 pass in on wm0 all
 pass out on wm0 all

State-Changed-From-To: open->feedback
State-Changed-By: rmind@NetBSD.org
State-Changed-When: Thu, 17 Jan 2019 01:10:31 +0000
State-Changed-Why:
Feedback:

- There is no problem description.  It is unclear what you are trying to
  achieve.  The NAT rule is for one interface, but the stateful rule is for
  another.  No description of interfaces, routing and the desired setup.
- Further in the emails you mention an unusual routing setup and that it
  might be a routing problem.  Also, pf's "route-to" functionality.

Is this a feature request?  Please provide a clear description of the
problem.  If the synopsis is no longer right, then perhaps just open a
new GNATS ticket.


State-Changed-From-To: feedback->closed
State-Changed-By: rmind@NetBSD.org
State-Changed-When: Sat, 02 Feb 2019 18:50:23 +0000
State-Changed-Why:
Feedback timeout.


State-Changed-From-To: closed->feedback
State-Changed-By: prlw1@NetBSD.org
State-Changed-When: Sat, 02 Feb 2019 21:23:22 +0000
State-Changed-Why:
The first response to this PR came 9 months after it was opened. On that
sort of timescale, a feedback timeout cannot possibly be after
13 days.


From: Patrick Welche <prlw1@cam.ac.uk>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/53199 (stateful npf)
Date: Thu, 14 Feb 2019 15:20:23 +0000

 I am surprised that this PR is unclear given the level of detail. 
 I even submitted rump scripts for reproduction which admittedly 
 gnats munged.

 I hope this is a simpler, more understandable explanation:

 I have a computer with 2 network interfaces, wm0 as "internal" and
 wm1 as "external". The default route points to a router connected to
 "internal". There is a web server listening on port 80 of "external".

 The system is running ipf with the following configuration file:

 block in all
 pass in on wm1 proto tcp from any to wm1/network port = 80 \
    flags S keep state
 pass in on lo0 all
 pass out on lo0 all
 pass in on wm0 all
 pass out on wm0 all

 It works for users logged in on the box, and it successfully hands
 out webpages to anyone who cares to retrieve one.

 If it is obvious to you how to achieve this with npf, please update
 the documentation so that it is obvious to others. If it is not
 currently possible to do this with npf, please consider this a
 change request and reconsider the removal of ipf.

 It seems someone else is suffering the same pain in PR kern/53962.  

State-Changed-From-To: feedback->open
State-Changed-By: prlw1@NetBSD.org
State-Changed-When: Wed, 20 Feb 2019 10:18:28 +0000
State-Changed-Why:
Feedback given


State-Changed-From-To: open->feedback
State-Changed-By: rmind@NetBSD.org
State-Changed-When: Thu, 08 Aug 2019 21:41:04 +0000
State-Changed-Why:
Does the "stateful-all" keyword (in -current/netbsd-9) satisfy your use case?
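
For readers following along, a sketch of what such a rule might look like for the two-interface setup described earlier in this PR. It assumes the `stateful-all` syntax documented in npf.conf(5) for netbsd-9, reuses the interface name and address from the earlier messages, and has not been tested against that setup; the file path is arbitrary.

```shell
#!/bin/sh
# Sketch only: write a candidate npf.conf and validate it if npfctl exists.
# Assumption: "stateful-all" lets the state match on any interface, so the
# reply may leave via wm0 even though the SYN arrived on wm1.
conf=${TMPDIR:-/tmp}/npf-stateful-all.conf

cat > "$conf" << 'EOF'
group "ext" on wm1 {
  block in all
  pass stateful-all in final proto tcp flags S/SA from any to 131.111.65.65 port 80
}

group default {
  pass all
}
EOF

# npfctl is only present on NetBSD; skip validation elsewhere.
if command -v npfctl >/dev/null 2>&1; then
	npfctl validate "$conf"
fi
```

The only change from the ruleset shown in the earlier messages is `stateful-all` in place of `stateful`; whether that is enough when the default route points at the internal interface is exactly the question asked here.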


State-Changed-From-To: feedback->closed
State-Changed-By: rmind@NetBSD.org
State-Changed-When: Sat, 06 Jun 2020 15:38:59 +0000
State-Changed-Why:
I think we can close this ticket:

- The latest code changes and documentation improvements in -current should
  generally address the state issues.

- This PR is generally a duplicate of PR/53962 (which has a clearer problem
  description) and I will keep the latter open for a little bit longer.


>Unformatted:

Copyright © 1994-2020 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.