NetBSD Problem Report #53199
From www@NetBSD.org Fri Apr 20 08:44:31 2018
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id E56747A1D3
for <gnats-bugs@gnats.NetBSD.org>; Fri, 20 Apr 2018 08:44:30 +0000 (UTC)
Message-Id: <20180420084429.D93127A220@mollari.NetBSD.org>
Date: Fri, 20 Apr 2018 08:44:29 +0000 (UTC)
From: prlw1@cam.ac.uk
Reply-To: prlw1@cam.ac.uk
To: gnats-bugs@NetBSD.org
Subject: stateful npf
X-Send-Pr-Version: www-1.0
>Number: 53199
>Category: kern
>Synopsis: stateful npf
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: kern-bug-people
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Fri Apr 20 08:45:00 +0000 2018
>Closed-Date: Sat Jun 06 15:38:59 +0000 2020
>Last-Modified: Sat Jun 06 15:38:59 +0000 2020
>Originator: Patrick Welche
>Release: NetBSD-8.99.14/amd64
>Organization:
>Environment:
NetBSD-8.99.14/amd64
>Description:
First suspicion that stateful npf doesn't work as expected (if not sw-bug, then doc-bug):
http://mail-index.netbsd.org/netbsd-users/2018/03/28/msg020565.html
The more specific subsequent test (also related in the thread) is:
>How-To-Repeat:
ext iwn0: 10.168.5.65
int wm0: 192.168.2.62
Toy ipf setup works as expected:
# cat /etc/ipnat.conf
map iwn0 192.168.2.0/24 -> 10.168.5.65 portmap tcp/udp 40000:6000
map iwn0 192.168.2.0/24 -> 10.168.5.65
# cat /etc/ipf.conf
block in on wm0 all
pass in proto tcp from any to 10.168.5.4 port = 80 flags S/SA keep state
I hope this is the equivalent in npf:
# cat /etc/npf.conf
map iwn0 dynamic 192.168.2.0/24 -> 10.168.5.65
group "ext" on wm0 {
    block in all
    pass stateful in proto tcp flags S/SA from any to 10.168.5.4 port 80
}
group default {
    pass all
}
test: plug NetBSD-running rpi into wm0 as 192.168.2.26 and grab web page
from another NetBSD/amd64 webserver, 10.168.5.4. Webpage arrives with ipf,
but not with npf.
>Fix:
>Release-Note:
>Audit-Trail:
From: Patrick Welche <prlw1@cam.ac.uk>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/53199: stateful npf
Date: Fri, 4 May 2018 15:05:04 +0100
The attached rump based script works. Trying the same on a NetBSD-8.99.15/amd64
webserver with two wm(4), and morden.no as the client, doesn't.
Running out of ideas...
[Attachment: web.sh (application/x-sh, quoted-printable encoded). The same
script is given in plain text in the follow-up message dated
Fri, 4 May 2018 15:13:39 +0100.]
From: Patrick Welche <prlw1@cam.ac.uk>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/53199: stateful npf
Date: Fri, 4 May 2018 15:11:50 +0100
Also odd: morden gave up long ago with
$ ftp -o out http://siskin.bpi.cam.ac.uk/webserver.html
Requesting http://siskin.bpi.cam.ac.uk/webserver.html
ftp: HTTP fetch timeout.
but tcpdump running on siskin still sees repeated:
15:10:17.461742 IP (tos 0x0, ttl 53, id 59642, offset 0, flags [none], proto TCP (6), length 177)
199.233.217.201.58574 > 131.111.65.65.80: Flags [FP.], cksum 0x1b0f (correct), seq 0:125, ack 1, win 16402, options [nop,nop,TS val 767 ecr 91], length 125: HTTP, length: 125
GET /webserver.html HTTP/1.1
Host: siskin.bpi.cam.ac.uk
Accept: */*
Connection: close
User-Agent: NetBSD-ftp/20150912
From: Patrick Welche <prlw1@cam.ac.uk>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/53199: stateful npf
Date: Fri, 4 May 2018 15:13:39 +0100
On Fri, May 04, 2018 at 02:10:01PM +0000, Patrick Welche wrote:
> The attached rump based script works. Trying the same on a NetBSD-8.99.15/amd64
> webserver with two wm(4), and morden.no as the client, doesn't.
Surprised that pkgsrc mutt munged the attachment like that... Here it is again, using :r
#
# webserver 192.168.0.1 ----- client 192.168.0.2
#

sock_webserver=unix:///tmp/sockwebserver
sock_client=unix:///tmp/sockclient

wire1=/tmp/netbus1

#inetserver="rump_server -lrumpnet -lrumpnet_net -lrumpnet_netinet -lrumpnet_shmif"
inetserver=rump_allserver

# Start the webserver's rump kernel and configure its shmif interface
${inetserver} ${sock_webserver}
export RUMP_SERVER=${sock_webserver}
rump.ifconfig shmif0 create
rump.ifconfig shmif0 linkstr $wire1
rump.ifconfig shmif0 inet 192.168.0.1 netmask 0xffffff00

# npf configuration for the webserver side
cat > /tmp/npf.conf << EOF
set bpf.jit off

group "ext" on shmif0 {
    block in all
    pass stateful in proto tcp flags S/SA from any to 192.168.0.1 port 80
}

group default {
    pass all
}
EOF

# Hijack /dev/npf, non-local sockets and sysctl into the rump kernel
export 'RUMPHIJACK=path=/rump,blanket=/dev/npf,socket=all:nolocal,sysctl=yes'
env LD_PRELOAD=/usr/lib/librumphijack.so \
    /sbin/npfctl validate /tmp/npf.conf
env LD_PRELOAD=/usr/lib/librumphijack.so \
    /sbin/npfctl reload /tmp/npf.conf
env LD_PRELOAD=/usr/lib/librumphijack.so \
    /sbin/npfctl start

# Page to serve
cat > /tmp/webserver.html << EOF
<html>
<head>
<title>webserver</title>
</head>
<body>
Hello from webserver!
</body>
</html>
EOF

# Run httpd inside the webserver rump kernel
env LD_PRELOAD=/usr/lib/librumphijack.so \
    /usr/libexec/httpd -d -P httpd.pid -i 192.168.0.1 -f -b -s /tmp &

# Start the client's rump kernel and fetch the page
${inetserver} ${sock_client}
export RUMP_SERVER=$sock_client
rump.ifconfig shmif0 create
rump.ifconfig shmif0 linkstr $wire1
rump.ifconfig shmif0 inet 192.168.0.2 netmask 0xffffff00
env LD_PRELOAD=/usr/lib/librumphijack.so \
    /usr/bin/ftp -4 -n -d -o out 'http://192.168.0.1/webserver.html'

# Clean up: stop httpd and halt both rump kernels
kill `cat httpd.pid`

for box in ${sock_webserver} ${sock_client}; do
    export RUMP_SERVER=${box}
    rump.halt
done
From: Patrick Welche <prlw1@cam.ac.uk>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/53199: stateful npf
Date: Fri, 4 May 2018 16:32:07 +0100
# npfctl show
# filtering: active
# config: loaded
procedure "log"
group "ext" on wm0 # id="1"
block in all apply "log" # id="2"
pass stateful in family inet4 proto tcp flags S/SA to 192.168.0.1 port 80 apply "log" # id="3"
group # id="4"
pass all apply "log" # id="5"
Experiment with
NetBSD-8.99.14/evbarm rpi, usmsc0: 192.168.0.2 as ftp client
NetBSD-8.99.15/amd64 wm0: 192.168.0.1 as webserver
iwn0: external interface
as per rump script. This works, and the first few packets are:
16:15:34.581155 rule 3.rules.0/0(match): pass in on ???: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
192.168.0.2.65534 > 192.168.0.1.80: Flags [S], cksum 0x2ef0 (correct), seq 196676535, win 32768, options [mss 1460,nop,wscale 3,sackOK,TS val 1 ecr 0], length 0
16:15:34.581176 rule 3.rules.0/0(match): pass out on ???: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60, bad cksum 0 (->b968)!)
192.168.0.1.80 > 192.168.0.2.65534: Flags [S.], cksum 0x50c6 (correct), seq 38525900, ack 196676536, win 32768, options [mss 1460,nop,wscale 3,sackOK,TS val 1 ecr 1], length 0
16:15:34.581952 rule 3.rules.0/0(match): pass in on ???: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 52)
192.168.0.2.65534 > 192.168.0.1.80: Flags [.], cksum 0xef29 (correct), ack 1, win 4197, options [nop,nop,TS val 1 ecr 1], length 0
# filtering: active
# config: loaded
procedure "log"
group "ext" on wm1 # id="1"
block in all apply "log" # id="2"
pass stateful in family inet4 proto tcp flags S/SA to 131.111.65.65 port 80 apply "log" # id="3"
group # id="4"
pass all apply "log" # id="5"
Experiment with
linux box as lynx client
NetBSD-8.99.15/amd64 wm0: internal interface
wm1: 131.111.65.65
fails, first few packets are
15:43:20.478154 rule 3.rules.0/0(match): pass in on ???: (tos 0x0, ttl 62, id 17986, offset 0, flags [DF], proto TCP (6), length 60)
131.111.62.210.60810 > 131.111.65.65.80: Flags [S], cksum 0x6b3f (correct), seq 198627856, win 29200, options [mss 1460,sackOK,TS val 2208994386 ecr 0,nop,wscale 7], length 0
15:43:20.478166 rule 5.rules.0/0(match): pass out on ???: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60, bad cksum 0 (->b3ca)!)
131.111.65.65.80 > 131.111.62.210.60810: Flags [S.], cksum 0x1bee (correct), seq 1414262023, ack 198627857, win 32768, options [mss 1460,nop,wscale 3,sackOK,TS val 1 ecr 2208994386], length 0
15:43:20.479687 rule 2.rules.0/0(match): block in on ???: (tos 0x0, ttl 62, id 17987, offset 0, flags [DF], proto TCP (6), length 52)
131.111.62.210.60810 > 131.111.65.65.80: Flags [.], cksum 0xc9d0 (correct), ack 1, win 229, options [nop,nop,TS val 2208994387 ecr 1], length 0
Note that now the 2nd packet goes out via the "pass all" rule, rather than the
stateful rule.
Why the difference in behaviour?
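When comparing the working and failing traces, it can help to reduce each logged packet to the rule that matched it. The following is only a minimal sketch, assuming the log lines have the "rule N.rules.0/0(match): action direction on ..." form shown above. The function name summarize_rules is made up for illustration and is not part of npf or tcpdump.

```shell
#!/bin/sh
# summarize_rules (hypothetical helper): read tcpdump-style npf log
# lines on stdin and print "<rule id> <action> <direction>" for every
# packet header line. Lines without a "rule N." prefix (the TCP detail
# lines) are skipped.
summarize_rules() {
    sed -n 's/^.* rule \([0-9][0-9]*\)\..*(match): \([a-z]*\) \([a-z]*\) on .*$/\1 \2 \3/p'
}

# Example use, assuming the trace was saved to a file called trace.txt:
#   summarize_rules < trace.txt
```

Applied to the failing trace above, this would list rule 3 for the incoming SYN, rule 5 for the outgoing SYN/ACK and rule 2 for the blocked ACK, which makes the difference from the working case easy to spot.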
From: Patrick Welche <prlw1@cam.ac.uk>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/53199: stateful npf
Date: Wed, 9 May 2018 10:46:58 +0100
Updated working and broken to 8.99.16.
Checked both run same byte code according to npfctl debug.
Working still replies via the stateful rule. Broken still replies via the
pass all rule.
From: Patrick Welche <prlw1@cam.ac.uk>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/53199: stateful npf
Date: Wed, 9 May 2018 11:54:20 +0100
Changed "broken" to
# modstat | grep npf
if_npflog driver builtin - 0 - -
npf misc builtin - 4 - bpf
npf_alg_icmp misc builtin - 0 - npf
npf_ext_log misc builtin - 0 - npf
npf_ext_normalize misc builtin - 0 - npf
npf_ext_rndblock misc builtin - 0 - npf
and it still is broken (second packet should match rule 3):
procedure "log"
group "ext" on wm1 # id="1"
block in all apply "log" # id="2"
pass stateful in final family inet4 proto tcp flags S/SA to 131.111.65.65 port 80 apply "log" # id="3"
group # id="4"
pass all apply "log" # id="5"
11:50:42.035452 rule 3.rules.0/0(match): pass in on ???: (tos 0x0, ttl 62, id 56144, offset 0, flags [DF], proto TCP (6), length 60)
131.111.62.210.44044 > 131.111.65.65.80: Flags [S], cksum 0x9ca2 (correct), seq 2696319833, win 29200, options [mss 1460,sackOK,TS val 2313504777 ecr 0,nop,wscale 7], length 0
11:50:42.035469 rule 5.rules.0/0(match): pass out on ???: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60, bad cksum 0 (->b3ca)!)
From: Patrick Welche <prlw1@cam.ac.uk>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/53199: stateful npf
Date: Wed, 9 May 2018 10:55:15 +0100
Difference spotted:
Working:
$ modstat | grep npf
if_npflog driver builtin - 0 - -
npf misc builtin - 4 - bpf
npf_alg_icmp misc builtin - 0 - npf
npf_ext_log misc builtin - 0 - npf
npf_ext_normalize misc builtin - 0 - npf
npf_ext_rndblock misc builtin - 0 - npf
Broken:
# modstat | grep npf
if_npflog driver filesys a 0 516 -
npf driver filesys a 1 40955 bpf
npf_ext_log misc filesys a 0 643 npf
clutching straw...
From: Patrick Welche <prlw1@cam.ac.uk>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/53199: stateful npf
Date: Wed, 9 May 2018 16:38:50 +0100
[message disappeared into the ether - trying again]
Made "broken" built-in too, and no change => nothing to do with modules.
From: Patrick Welche <prlw1@cam.ac.uk>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/53199: stateful npf
Date: Wed, 9 May 2018 17:04:28 +0100
npfctl list on "broken" correctly lists the connection.
From: Patrick Welche <prlw1@cam.ac.uk>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/53199: stateful npf
Date: Thu, 10 May 2018 10:08:44 +0100
Plugged a USB aue0 in. Swapped external from wm1 to aue0. It now works.
Somehow, stateful rules don't work with
wm1 at pci4 dev 5 function 0: Intel i82541GI 1000BASE-T Ethernet (rev. 0x05)
wm1: interrupting at ioapic0 pin 17
wm1: 32-bit 33MHz PCI bus
wm1: 64 words (8 address bits) SPI EEPROM
wm1: Ethernet address 00:15:17:21:7b:ca
wm1: 0x220442<LOCK_EECD,SPI,IOH_VALID,ASF_FIRM,WOL>
igphy0 at wm1 phy 1: Intel IGP01E1000 Gigabit PHY, rev. 0
Trying with and without hardware checksumming didn't change anything.
From: Patrick Welche <prlw1@cam.ac.uk>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/53199: stateful npf
Date: Thu, 10 May 2018 10:44:00 +0100
On Thu, May 10, 2018 at 09:10:01AM +0000, Patrick Welche wrote:
> Plugged a USB aue0 in. Swapped external from wm1 to aue0. It now works.
But I made a mistake and didn't change the "on wm1" in the rules => the
default "pass all" rule applied.
Now it looks like a routing issue: "working" were all on a local network.
I probably didn't think that through properly.
From: Patrick Welche <prlw1@cam.ac.uk>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/53199: stateful npf
Date: Thu, 10 May 2018 11:31:44 +0100
On Thu, May 10, 2018 at 09:45:00AM +0000, Patrick Welche wrote:
> Now it looks like a routing issue: "working" were all on a local network.
> I probably didn't think that through properly.
Yup - looks like pf's route-to would do the trick.
From: Patrick Welche <prlw1@cam.ac.uk>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/53199: stateful npf
Date: Fri, 11 May 2018 16:28:25 +0100
On Thu, May 10, 2018 at 10:35:01AM +0000, Patrick Welche wrote:
> > Now it looks like a routing issue: "working" were all on a local network.
> > I probably didn't think that through properly.
I have the default route pointing to the internal interface. With npf,
the webserver's reply follows the default route and leaves via the
internal interface, so it never passes through the external interface's
group, which contains the stateful rule.
I just checked with ipf, and the reply from the webserver DOES go out
of the external interface despite the default route pointing to the
internal interface, so everything works as expected.
Is this difference in behaviour intended?
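To check which interface a reply to a given client would take, one can ask the routing table directly. This is a small sketch, assuming the BSD route(8) "get" output, which contains a line of the form "interface: wm0". The helper name egress_if is hypothetical.

```shell
#!/bin/sh
# egress_if (hypothetical helper): read the output of "route -n get"
# on stdin and print the name from its "interface:" line.
egress_if() {
    awk '$1 == "interface:" { print $2; exit }'
}

# Example use on the webserver, with the client address from the
# failing trace:
#   route -n get 131.111.62.210 | egress_if
```

If this prints the internal interface while the stateful rule sits in the group for the external one, the reply is routed around the group that holds the stateful rule.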
From: Patrick Welche <prlw1@cam.ac.uk>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/53199: stateful npf
Date: Fri, 11 May 2018 16:30:46 +0100
For reference, the ipf rules were just
block in all
pass in on wm1 proto tcp from any to wm1/network port = 80 \
flags S keep state
pass in on lo0 all
pass out on lo0 all
pass in on wm0 all
pass out on wm0 all
State-Changed-From-To: open->feedback
State-Changed-By: rmind@NetBSD.org
State-Changed-When: Thu, 17 Jan 2019 01:10:31 +0000
State-Changed-Why:
Feedback:
- There is no problem description. It is unclear what you are trying to
achieve. The NAT rule is for one interface, but the stateful rule is for
another. There is no description of the interfaces, routing or the
desired setup.
- Later in the emails you mention an unusual routing setup and that it
might be a routing problem, as well as pf's "route-to" functionality.
Is this a feature request? Please provide a clear description of the
problem. If the synopsis is no longer right, then perhaps just open a
new GNATS ticket.
State-Changed-From-To: feedback->closed
State-Changed-By: rmind@NetBSD.org
State-Changed-When: Sat, 02 Feb 2019 18:50:23 +0000
State-Changed-Why:
Feedback timeout.
State-Changed-From-To: closed->feedback
State-Changed-By: prlw1@NetBSD.org
State-Changed-When: Sat, 02 Feb 2019 21:23:22 +0000
State-Changed-Why:
The first response to the PR came 9 months after the PR was opened. On
that sort of timescale, the feedback timeout cannot possibly be
13 days.
From: Patrick Welche <prlw1@cam.ac.uk>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/53199 (stateful npf)
Date: Thu, 14 Feb 2019 15:20:23 +0000
I am surprised that this PR is unclear given the level of detail.
I even submitted rump scripts for reproduction which admittedly
gnats munged.
I hope this is a simpler, more understandable explanation:
I have a computer with 2 network interfaces, wm0 as "internal" and
wm1 as "external". The default route points to a router connected to
"internal". There is a web server listening on port 80 of "external".
The system is running ipf with the following configuration file:
block in all
pass in on wm1 proto tcp from any to wm1/network port = 80 \
flags S keep state
pass in on lo0 all
pass out on lo0 all
pass in on wm0 all
pass out on wm0 all
It works for users logged in on the box, and it successfully hands
out webpages to anyone who cares to retrieve one.
If it is obvious to you how to achieve this with npf, please update
the documentation so that it is obvious to others. If it is not
currently possible to do this with npf, please consider this a
change request and reconsider the removal of ipf.
It seems someone else is suffering the same pain in PR kern/53962.
State-Changed-From-To: feedback->open
State-Changed-By: prlw1@NetBSD.org
State-Changed-When: Wed, 20 Feb 2019 10:18:28 +0000
State-Changed-Why:
Feedback given
State-Changed-From-To: open->feedback
State-Changed-By: rmind@NetBSD.org
State-Changed-When: Thu, 08 Aug 2019 21:41:04 +0000
State-Changed-Why:
Does the "stateful-all" keyword (in -current/netbsd-9) satisfy your use case?
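A minimal, untested sketch of how the rule set from the failing box might look with that keyword. It assumes the netbsd-9 npf.conf grammar, in which "stateful-all" takes the place of "stateful". As I read npf.conf(5), it leaves the interface out of the state key, so a reply leaving on a different interface can still match the state. The thread does not confirm that this resolves the report.

```
# Assumption: same rules as shown by "npfctl show" on the failing box,
# with "stateful" changed to "stateful-all".
group "ext" on wm1 {
    block in all
    pass stateful-all in final family inet4 proto tcp flags S/SA to 131.111.65.65 port 80
}
group default {
    pass all
}
```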
State-Changed-From-To: feedback->closed
State-Changed-By: rmind@NetBSD.org
State-Changed-When: Sat, 06 Jun 2020 15:38:59 +0000
State-Changed-Why:
I think we can close this ticket:
- The latest code changes and documentation improvements in -current should
generally address the state issues.
- This PR is generally a duplicate of PR/53962 (which has a clearer problem
description) and I will keep the latter open for a little bit longer.
>Unformatted:
Copyright © 1994-2020
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.