NetBSD Problem Report #52643
From www@NetBSD.org Mon Oct 23 07:40:36 2017
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id 9F2E47A1CE
for <gnats-bugs@gnats.NetBSD.org>; Mon, 23 Oct 2017 07:40:36 +0000 (UTC)
Message-Id: <20171023074034.6B6177A1CE@mollari.NetBSD.org>
Date: Mon, 23 Oct 2017 07:40:34 +0000 (UTC)
From: ozaki-r@netbsd.org
Reply-To: ozaki-r@netbsd.org
To: gnats-bugs@NetBSD.org
Subject: npfctl reload on rump kernels fails randomly
X-Send-Pr-Version: www-1.0
>Number: 52643
>Category: kern
>Synopsis: npfctl reload on rump kernels fails randomly
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: kern-bug-people
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Mon Oct 23 07:45:00 +0000 2017
>Closed-Date: Fri Aug 31 14:40:45 +0000 2018
>Last-Modified: Fri Aug 31 14:40:45 +0000 2018
>Originator: Ryota Ozaki
>Release: -current (and maybe earlier releases)
>Organization:
>Environment:
NetBSD rangeley 8.99.3 NetBSD 8.99.3 (RANGELEY) #77: Wed Oct 4 10:23:38 JST 2017 ozaki-r@rangeley:(hidden) amd64
>Description:
npfctl reload fails sometimes on rump kernels with EIO like this:
tc-so:Executing command [ env LD_PRELOAD=/usr/lib/librumphijack.so RUMPHIJACK=path=/rump,socket=all:nolocal,sysctl=yes,blanket=/dev/npf npfctl reload ./npf.conf ]
tc-se:Fail: incorrect exit status: 1, expected: 0
tc-se:stdout:
tc-se:
tc-se:stderr:
tc-se:npfctl: source line 333501697
tc-se:npfctl: object: 535
tc-se:npfctl: npfctl_config_send: Input/output error
tc-se:
The error message is generated by libnpf that uses libprop.
npfctl uses npf_config_submit of libnpf to call ioctl to the kernel.
libnpf uses prop_dictionary_sendrecv_ioctl to call ioctl and
get an errno of the ioctl by prop_dictionary_get_int32.
On npfctl reload on rump kernels, prop_dictionary_sendrecv_ioctl succeeds
but prop_dictionary_get_int32 fails because errno isn't set correctly.
It seems that npfctl_load that is a kernel function of npf and called
via the prop_dictionary_sendrecv_ioctl doesn't set the errno to the return
value of libprop: https://nxr.netbsd.org/xref/src/sys/net/npf/npf_ctl.c#635
/* Error report. */
#if !defined(_NPF_TESTING) && !defined(_NPF_STANDALONE)
prop_dictionary_set_int32(errdict, "errno", error);
prop_dictionary_copyout_ioctl(pref, cmd, errdict);
prop_object_release(errdict);
error = 0;
#endif
return error;
The code setting an errno is trimmed on rump kernels since by default
rump kernels set _NPF_TESTING. If we enable the code, npfctl reload
works expectedly.
Note that npfctl reload sometimes succeeds, I guess, because npf_config_submit
tries to read an errno from a uninitialized memory and it sometimes works luckily.
>How-To-Repeat:
# From an ATF test script that is under development
rump_server -lrumpnet -lrumpnet_net -lrumpnet_netinet -lrumpnet_shmif -lrumpdev -lrumpvfs -lrumpdev_bpf -lrumpnet_bpfjit -lrumpnet_npf -lrumpnet_netinet6 unix://npf_nat_local
export RUMP_SERVER=unix://npf_nat_local
rump.ifconfig shmif0 create
rump.ifconfig shmif0 linkstr ./bus_npf_local
rump.ifconfig shmif1 create
rump.ifconfig shmif1 linkstr ./bus_npf_nat
rump.sysctl -q -w net.inet.ip.dad_count=0
rump.ifconfig shmif0 10.0.1.1/24
rump.ifconfig shmif1 20.0.0.1/24
rump.sysctl -q -w net.inet.ip.forwarding=1
rump.route -n add -net 10.0.2.0 20.0.0.2
cat > ./npf.conf <<-EOF
set bpf.jit off
\$int_if = inet4(shmif0)
\$ext_if = inet4(shmif1)
\$localnet = { 10.0.1.0/24 }
map \$ext_if dynamic \$localnet -> \$ext_if
group "external" on \$ext_if {
pass stateful out final all
}
group "internal" on \$int_if {
block in all
pass in final from \$localnet
pass out final all
}
group default {
pass final on lo0 all
block all
}
EOF
env LD_PRELOAD=/usr/lib/librumphijack.so RUMPHIJACK=path=/rump,socket=all:nolocal,sysctl=yes,blanket=/dev/npf npfctl reload ./npf.conf
>Fix:
As described above, enabling the code fixes the issue, but I'm not sure
if the fix is appropriate.
diff --git a/sys/net/npf/npf_ctl.c b/sys/net/npf/npf_ctl.c
index b1af2d347ac..43105db9c85 100644
--- a/sys/net/npf/npf_ctl.c
+++ b/sys/net/npf/npf_ctl.c
@@ -633,12 +633,12 @@ fail:
prop_object_release(npf_dict);
/* Error report. */
-#if !defined(_NPF_TESTING) && !defined(_NPF_STANDALONE)
+//#if !defined(_NPF_TESTING) && !defined(_NPF_STANDALONE)
prop_dictionary_set_int32(errdict, "errno", error);
prop_dictionary_copyout_ioctl(pref, cmd, errdict);
prop_object_release(errdict);
error = 0;
-#endif
+//#endif
return error;
}
>Release-Note:
>Audit-Trail:
From: Ryota Ozaki <ozaki-r@netbsd.org>
To: "gnats-bugs@NetBSD.org" <gnats-bugs@netbsd.org>
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/52643: npfctl reload on rump kernels fails randomly
Date: Fri, 27 Oct 2017 12:41:33 +0900
On Mon, Oct 23, 2017 at 4:45 PM, <ozaki-r@netbsd.org> wrote:
>>Number: 52643
>>Category: kern
>>Synopsis: npfctl reload on rump kernels fails randomly
>>Confidential: no
>>Severity: serious
>>Priority: medium
>>Responsible: kern-bug-people
>>State: open
>>Class: sw-bug
>>Submitter-Id: net
>>Arrival-Date: Mon Oct 23 07:45:00 +0000 2017
>>Originator: Ryota Ozaki
>>Release: -current (and maybe earlier releases)
>>Organization:
>>Environment:
> NetBSD rangeley 8.99.3 NetBSD 8.99.3 (RANGELEY) #77: Wed Oct 4 10:23:38 JST 2017 ozaki-r@rangeley:(hidden) amd64
>>Description:
> npfctl reload fails sometimes on rump kernels with EIO like this:
>
> tc-so:Executing command [ env LD_PRELOAD=/usr/lib/librumphijack.so RUMPHIJACK=path=/rump,socket=all:nolocal,sysctl=yes,blanket=/dev/npf npfctl reload ./npf.conf ]
> tc-se:Fail: incorrect exit status: 1, expected: 0
> tc-se:stdout:
> tc-se:
> tc-se:stderr:
> tc-se:npfctl: source line 333501697
> tc-se:npfctl: object: 535
> tc-se:npfctl: npfctl_config_send: Input/output error
> tc-se:
>
> The error message is generated by libnpf that uses libprop.
>
> npfctl uses npf_config_submit of libnpf to call ioctl to the kernel.
> libnpf uses prop_dictionary_sendrecv_ioctl to call ioctl and
> get an errno of the ioctl by prop_dictionary_get_int32.
> On npfctl reload on rump kernels, prop_dictionary_sendrecv_ioctl succeeds
> but prop_dictionary_get_int32 fails because errno isn't set correctly.
>
> It seems that npfctl_load that is a kernel function of npf and called
> via the prop_dictionary_sendrecv_ioctl doesn't set the errno to the return
> value of libprop: https://nxr.netbsd.org/xref/src/sys/net/npf/npf_ctl.c#635
>
> /* Error report. */
> #if !defined(_NPF_TESTING) && !defined(_NPF_STANDALONE)
> prop_dictionary_set_int32(errdict, "errno", error);
> prop_dictionary_copyout_ioctl(pref, cmd, errdict);
> prop_object_release(errdict);
> error = 0;
> #endif
> return error;
>
> The code setting an errno is trimmed on rump kernels since by default
> rump kernels set _NPF_TESTING. If we enable the code, npfctl reload
> works expectedly.
>
> Note that npfctl reload sometimes succeeds, I guess, because npf_config_submit
> tries to read an errno from a uninitialized memory and it sometimes works luckily.
>>How-To-Repeat:
> # From an ATF test script that is under development
>
> rump_server -lrumpnet -lrumpnet_net -lrumpnet_netinet -lrumpnet_shmif -lrumpdev -lrumpvfs -lrumpdev_bpf -lrumpnet_bpfjit -lrumpnet_npf -lrumpnet_netinet6 unix://npf_nat_local
>
> export RUMP_SERVER=unix://npf_nat_local
> rump.ifconfig shmif0 create
> rump.ifconfig shmif0 linkstr ./bus_npf_local
> rump.ifconfig shmif1 create
> rump.ifconfig shmif1 linkstr ./bus_npf_nat
>
> rump.sysctl -q -w net.inet.ip.dad_count=0
> rump.ifconfig shmif0 10.0.1.1/24
> rump.ifconfig shmif1 20.0.0.1/24
> rump.sysctl -q -w net.inet.ip.forwarding=1
> rump.route -n add -net 10.0.2.0 20.0.0.2
>
> cat > ./npf.conf <<-EOF
> set bpf.jit off
> \$int_if = inet4(shmif0)
> \$ext_if = inet4(shmif1)
> \$localnet = { 10.0.1.0/24 }
> map \$ext_if dynamic \$localnet -> \$ext_if
> group "external" on \$ext_if {
> pass stateful out final all
> }
> group "internal" on \$int_if {
> block in all
> pass in final from \$localnet
> pass out final all
> }
> group default {
> pass final on lo0 all
> block all
> }
> EOF
>
> env LD_PRELOAD=/usr/lib/librumphijack.so RUMPHIJACK=path=/rump,socket=all:nolocal,sysctl=yes,blanket=/dev/npf npfctl reload ./npf.conf
>>Fix:
> As described above, enabling the code fixes the issue, but I'm not sure
> if the fix is appropriate.
>
> diff --git a/sys/net/npf/npf_ctl.c b/sys/net/npf/npf_ctl.c
> index b1af2d347ac..43105db9c85 100644
> --- a/sys/net/npf/npf_ctl.c
> +++ b/sys/net/npf/npf_ctl.c
> @@ -633,12 +633,12 @@ fail:
> prop_object_release(npf_dict);
>
> /* Error report. */
> -#if !defined(_NPF_TESTING) && !defined(_NPF_STANDALONE)
> +//#if !defined(_NPF_TESTING) && !defined(_NPF_STANDALONE)
> prop_dictionary_set_int32(errdict, "errno", error);
> prop_dictionary_copyout_ioctl(pref, cmd, errdict);
> prop_object_release(errdict);
> error = 0;
> -#endif
> +//#endif
> return error;
> }
>
Further investigations found the following requirements:
- prop_object_release(errdict) must be always called
because errdict = prop_dictionary_create() is always called
- prop_object_release(npf_dict) needs to be called only if
npf_dict is allocated by prop_dictionary_copyin_ioctl_size
- _NPF_STANDALONE doesn't require to set prop
- For _NPF_TESTING, if npfctl_testing, setting prop isn't
needed, otherwise it's needed
So I think a fix looks like the following patch. Comments?
ozaki-r
--
diff --git a/sys/net/npf/npf_ctl.c b/sys/net/npf/npf_ctl.c
index b1af2d347ac..3e82db6bf73 100644
--- a/sys/net/npf/npf_ctl.c
+++ b/sys/net/npf/npf_ctl.c
@@ -630,15 +630,25 @@ fail:
if (tblset) {
npf_tableset_destroy(tblset);
}
- prop_object_release(npf_dict);
+#if defined(_NPF_TESTING) || defined(_NPF_STANDALONE)
+ if (!npfctl_testing)
+#endif
+ prop_object_release(npf_dict);
- /* Error report. */
-#if !defined(_NPF_TESTING) && !defined(_NPF_STANDALONE)
- prop_dictionary_set_int32(errdict, "errno", error);
- prop_dictionary_copyout_ioctl(pref, cmd, errdict);
+#ifndef _NPF_STANDALONE
+#ifdef _NPF_TESTING
+ if (!npfctl_testing) {
+#endif
+ /* Error report. */
+ prop_dictionary_set_int32(errdict, "errno", error);
+ prop_dictionary_copyout_ioctl(pref, cmd, errdict);
+ error = 0;
+#ifdef _NPF_TESTING
+ }
+#endif
+#endif /* _NPF_STANDALONE */
prop_object_release(errdict);
- error = 0;
-#endif
+
return error;
}
From: christos@zoulas.com (Christos Zoulas)
To: Ryota Ozaki <ozaki-r@netbsd.org>,
"gnats-bugs@NetBSD.org" <gnats-bugs@netbsd.org>
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org
Subject: Re: kern/52643: npfctl reload on rump kernels fails randomly
Date: Fri, 27 Oct 2017 07:28:20 -0400
On Oct 27, 12:41pm, ozaki-r@netbsd.org (Ryota Ozaki) wrote:
-- Subject: Re: kern/52643: npfctl reload on rump kernels fails randomly
| Further investigations found the following requirements:
| - prop_object_release(errdict) must be always called
| because errdict = prop_dictionary_create() is always called
| - prop_object_release(npf_dict) needs to be called only if
| npf_dict is allocated by prop_dictionary_copyin_ioctl_size
| - _NPF_STANDALONE doesn't require to set prop
| - For _NPF_TESTING, if npfctl_testing, setting prop isn't
| needed, otherwise it's needed
|
| So I think a fix looks like the following patch. Comments?
Go for it, add the big fat comment above to the patch though :-)
christos
From: "Ryota Ozaki" <ozaki-r@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/52643 CVS commit: src/sys/net/npf
Date: Mon, 30 Oct 2017 03:02:35 +0000
Module Name: src
Committed By: ozaki-r
Date: Mon Oct 30 03:02:35 UTC 2017
Modified Files:
src/sys/net/npf: npf_ctl.c
Log Message:
Fix npfclt reload on rump kernels
It fails because npfctl cannot get an errno when it calls ioctl to the (rump)
kernel; npfctl (libnpf) expects that an errno is returned via proplib,
however, the rump library of npf doesn't so. It happens because of mishandlings
of complicate npf kernel options.
PR kern/52643
To generate a diff of this commit:
cvs rdiff -u -r1.48 -r1.49 src/sys/net/npf/npf_ctl.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Soren Jacobsen" <snj@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/52643 CVS commit: [netbsd-8] src
Date: Fri, 17 Nov 2017 20:43:11 +0000
Module Name: src
Committed By: snj
Date: Fri Nov 17 20:43:11 UTC 2017
Modified Files:
src/distrib/sets/lists/debug [netbsd-8]: mi
src/distrib/sets/lists/tests [netbsd-8]: mi
src/etc/mtree [netbsd-8]: NetBSD.dist.tests
src/sys/net/npf [netbsd-8]: npf_ctl.c
src/tests/net [netbsd-8]: net_common.sh
src/tests/net/ipsec [netbsd-8]: Makefile algorithms.sh
src/usr.sbin/npf/npfctl [netbsd-8]: npfctl.c
Added Files:
src/tests/net/ipsec [netbsd-8]: natt_terminator.c t_ipsec_natt.sh
Log Message:
Pull up following revision(s) (requested by ozaki-r in ticket #357):
distrib/sets/lists/debug/mi: 1.228
distrib/sets/lists/tests/mi: 1.765-1.766
etc/mtree/NetBSD.dist.tests: 1.149
sys/net/npf/npf_ctl.c: 1.49
tests/net/ipsec/Makefile: 1.10
tests/net/ipsec/algorithms.sh: 1.6
tests/net/ipsec/natt_terminator.c: 1.1
tests/net/ipsec/t_ipsec_natt.sh: 1.1
tests/net/net_common.sh: 1.23-1.24
usr.sbin/npf/npfctl/npfctl.c: 1.54
Handle esp-udp for NAT-T
--
Fix npfclt reload on rump kernels
It fails because npfctl cannot get an errno when it calls ioctl to the (rump)
kernel; npfctl (libnpf) expects that an errno is returned via proplib,
however, the rump library of npf doesn't so. It happens because of mishandlings
of complicate npf kernel options.
PR kern/52643
--
Fix showing translated port (ntohs-ed twice wrongly)
--
Add test cases of NAT-T (transport mode)
A small C program is added to make a special socket (UDP_ENCAP_ESPINUDP)
and keep it to handle UDP-encapsulated ESP packets.
--
Add net/ipsec debug lib directory
--
Add ./usr/libdata/debug/usr/tests/net/ipsec
--
Stop using bpfjit
Because most architectures don't support it and npf still works without it.
To generate a diff of this commit:
cvs rdiff -u -r1.216.2.4 -r1.216.2.5 src/distrib/sets/lists/debug/mi
cvs rdiff -u -r1.752.2.5 -r1.752.2.6 src/distrib/sets/lists/tests/mi
cvs rdiff -u -r1.147.2.1 -r1.147.2.2 src/etc/mtree/NetBSD.dist.tests
cvs rdiff -u -r1.48 -r1.48.2.1 src/sys/net/npf/npf_ctl.c
cvs rdiff -u -r1.18.2.2 -r1.18.2.3 src/tests/net/net_common.sh
cvs rdiff -u -r1.6.2.1 -r1.6.2.2 src/tests/net/ipsec/Makefile
cvs rdiff -u -r1.4.2.1 -r1.4.2.2 src/tests/net/ipsec/algorithms.sh
cvs rdiff -u -r0 -r1.1.2.2 src/tests/net/ipsec/natt_terminator.c \
src/tests/net/ipsec/t_ipsec_natt.sh
cvs rdiff -u -r1.53 -r1.53.6.1 src/usr.sbin/npf/npfctl/npfctl.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: open->closed
State-Changed-By: maxv@NetBSD.org
State-Changed-When: Fri, 31 Aug 2018 14:40:45 +0000
State-Changed-Why:
Fixed and pulled up, close this PR.
>Unformatted:
(Contact us)
$NetBSD: query-full-pr,v 1.43 2018/01/16 07:36:43 maya Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2017
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.