NetBSD Problem Report #53562
From www@NetBSD.org Thu Aug 30 09:24:58 2018
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id 73E997A202
for <gnats-bugs@gnats.NetBSD.org>; Thu, 30 Aug 2018 09:24:58 +0000 (UTC)
Message-Id: <20180830092457.3CA8D7A261@mollari.NetBSD.org>
Date: Thu, 30 Aug 2018 09:24:57 +0000 (UTC)
From: rokuyama@rk.phys.keio.ac.jp
Reply-To: rokuyama@rk.phys.keio.ac.jp
To: gnats-bugs@NetBSD.org
Subject: bridge(4) breaks segmentation / TX checksum offloading
X-Send-Pr-Version: www-1.0
>Number: 53562
>Category: kern
>Synopsis: bridge(4) breaks segmentation / TX checksum offloading
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Thu Aug 30 09:25:00 +0000 2018
>Last-Modified: Sat Apr 06 05:25:01 +0000 2019
>Originator: Rin Okuyama
>Release: 8.99.24
>Organization:
School of Science and Technology, Meiji University
>Environment:
NetBSD rpi3b 8.99.24 NetBSD 8.99.24 (RPI3-64) #24: Thu Aug 30 17:39:24 JST 2018 rin@latipes:/var/build/src/sys/arch/evbarm/compile/RPI3-64 evbarm aarch64
>Description:
If a network interface is added to bridge(4), segmentation or TX
checksum offloading do not work. This is because csum_flags are
cleared in bridge_enqueue():
https://nxr.netbsd.org/xref/src/sys/net/if_bridge.c#1401
1391 void
1392 bridge_enqueue(struct bridge_softc *sc, struct ifnet *dst_ifp, struct mbuf *m,
1393 int runfilt)
1394 {
....
1398 /*
1399 * Clear any in-bound checksum flags for this packet.
1400 */
1401 m->m_pkthdr.csum_flags = 0;
>How-To-Repeat:
Enable segmentation or TX checksum offloading for a network interface,
and add it to bridge(4). Then, data transmission starts to fail.
>Fix:
Not known.
>Audit-Trail:
From: Masanobu SAITOH <msaitoh@execsw.org>
To: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org,
gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Cc: msaitoh@execsw.org
Subject: Re: kern/53562: bridge(4) breaks segmentation / TX checksum
offloading
Date: Mon, 3 Sep 2018 17:52:52 +0900
On 2018/08/30 18:25, rokuyama@rk.phys.keio.ac.jp wrote:
>> Number: 53562
>> Category: kern
>> Synopsis: bridge(4) breaks segmentation / TX checksum offloading
>> Confidential: no
>> Severity: serious
>> Priority: medium
>> Responsible: kern-bug-people
>> State: open
>> Class: sw-bug
>> Submitter-Id: net
>> Arrival-Date: Thu Aug 30 09:25:00 +0000 2018
>> Originator: Rin Okuyama
>> Release: 8.99.24
>> Organization:
> School of Science and Technology, Meiji University
>> Environment:
> NetBSD rpi3b 8.99.24 NetBSD 8.99.24 (RPI3-64) #24: Thu Aug 30 17:39:24 JST 2018 rin@latipes:/var/build/src/sys/arch/evbarm/compile/RPI3-64 evbarm aarch64
>> Description:
> If a network interface is added to bridge(4), segmentation or TX
> checksum offloading do not work. This is because csum_flags are
> cleared in bridge_enqueue():
>
> https://nxr.netbsd.org/xref/src/sys/net/if_bridge.c#1401
>
> 1391 void
> 1392 bridge_enqueue(struct bridge_softc *sc, struct ifnet *dst_ifp, struct mbuf *m,
> 1393 int runfilt)
> 1394 {
> ....
> 1398 /*
> 1399 * Clear any in-bound checksum flags for this packet.
> 1400 */
> 1401 m->m_pkthdr.csum_flags = 0;
>> How-To-Repeat:
> Enable segmentation or TX checksum offloading for a network interface,
> and add it to bridge(4). Then, data transmission starts to fail.
>> Fix:
> Not known.
>
Is the following change correct?
Index: if_bridge.c
===================================================================
RCS file: /cvsroot/src/sys/net/if_bridge.c,v
retrieving revision 1.156
diff -u -p -r1.156 if_bridge.c
--- if_bridge.c 25 May 2018 04:40:27 -0000 1.156
+++ if_bridge.c 3 Sep 2018 08:51:23 -0000
@@ -1395,11 +1395,6 @@ bridge_enqueue(struct bridge_softc *sc,
int len, error;
short mflags;
- /*
- * Clear any in-bound checksum flags for this packet.
- */
- m->m_pkthdr.csum_flags = 0;
-
if (runfilt) {
if (pfil_run_hooks(sc->sc_if.if_pfil, &m,
dst_ifp, PFIL_OUT) != 0) {
@@ -1644,6 +1639,11 @@ bridge_forward(struct bridge_softc *sc,
if ((sc->sc_if.if_flags & IFF_RUNNING) == 0)
return;
+ /*
+ * Clear any in-bound checksum flags for this packet.
+ */
+ m->m_pkthdr.csum_flags = 0;
+
src_if = m_get_rcvif_psref(m, &psref_src);
if (src_if == NULL) {
/* Interface is being destroyed? */
--
-----------------------------------------------
SAITOH Masanobu (msaitoh@execsw.org
msaitoh@netbsd.org)
From: Ryota Ozaki <ozaki-r@netbsd.org>
To: Masanobu SAITOH <msaitoh@execsw.org>
Cc: "gnats-bugs@NetBSD.org" <gnats-bugs@netbsd.org>, kern-bug-people@netbsd.org,
gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/53562: bridge(4) breaks segmentation / TX checksum offloading
Date: Mon, 3 Sep 2018 18:18:41 +0900
On Mon, Sep 3, 2018 at 5:53 PM Masanobu SAITOH <msaitoh@execsw.org> wrote:
>
> On 2018/08/30 18:25, rokuyama@rk.phys.keio.ac.jp wrote:
> >> Number: 53562
> >> Category: kern
> >> Synopsis: bridge(4) breaks segmentation / TX checksum offloading
> >> Confidential: no
> >> Severity: serious
> >> Priority: medium
> >> Responsible: kern-bug-people
> >> State: open
> >> Class: sw-bug
> >> Submitter-Id: net
> >> Arrival-Date: Thu Aug 30 09:25:00 +0000 2018
> >> Originator: Rin Okuyama
> >> Release: 8.99.24
> >> Organization:
> > School of Science and Technology, Meiji University
> >> Environment:
> > NetBSD rpi3b 8.99.24 NetBSD 8.99.24 (RPI3-64) #24: Thu Aug 30 17:39:24 JST 2018 rin@latipes:/var/build/src/sys/arch/evbarm/compile/RPI3-64 evbarm aarch64
> >> Description:
> > If a network interface is added to bridge(4), segmentation or TX
> > checksum offloading do not work. This is because csum_flags are
> > cleared in bridge_enqueue():
> >
> > https://nxr.netbsd.org/xref/src/sys/net/if_bridge.c#1401
> >
> > 1391 void
> > 1392 bridge_enqueue(struct bridge_softc *sc, struct ifnet *dst_ifp, struct mbuf *m,
> > 1393 int runfilt)
> > 1394 {
> > ....
> > 1398 /*
> > 1399 * Clear any in-bound checksum flags for this packet.
> > 1400 */
> > 1401 m->m_pkthdr.csum_flags = 0;
> >> How-To-Repeat:
> > Enable segmentation or TX checksum offloading for a network interface,
> > and add it to bridge(4). Then, data transmission starts to fail.
> >> Fix:
> > Not known.
> >
>
> Is the following change correct?
>
> Index: if_bridge.c
> ===================================================================
> RCS file: /cvsroot/src/sys/net/if_bridge.c,v
> retrieving revision 1.156
> diff -u -p -r1.156 if_bridge.c
> --- if_bridge.c 25 May 2018 04:40:27 -0000 1.156
> +++ if_bridge.c 3 Sep 2018 08:51:23 -0000
> @@ -1395,11 +1395,6 @@ bridge_enqueue(struct bridge_softc *sc,
> int len, error;
> short mflags;
>
> - /*
> - * Clear any in-bound checksum flags for this packet.
> - */
> - m->m_pkthdr.csum_flags = 0;
> -
> if (runfilt) {
> if (pfil_run_hooks(sc->sc_if.if_pfil, &m,
> dst_ifp, PFIL_OUT) != 0) {
> @@ -1644,6 +1639,11 @@ bridge_forward(struct bridge_softc *sc,
> if ((sc->sc_if.if_flags & IFF_RUNNING) == 0)
> return;
>
> + /*
> + * Clear any in-bound checksum flags for this packet.
> + */
> + m->m_pkthdr.csum_flags = 0;
> +
> src_if = m_get_rcvif_psref(m, &psref_src);
> if (src_if == NULL) {
> /* Interface is being destroyed? */
>
ether_input can be called via ether_broadcast that is called from
bridge_forward.
So I guess clearing csum_flags should be done just before calling bridge_enqueue
like the following patch?
ozaki-r
diff --git a/sys/net/if_bridge.c b/sys/net/if_bridge.c
index e8c73f8a199..f92675e3c81 100644
--- a/sys/net/if_bridge.c
+++ b/sys/net/if_bridge.c
@@ -1395,11 +1395,6 @@ bridge_enqueue(struct bridge_softc *sc, struct
ifnet *dst_ifp, struct mbuf *m,
int len, error;
short mflags;
- /*
- * Clear any in-bound checksum flags for this packet.
- */
- m->m_pkthdr.csum_flags = 0;
-
if (runfilt) {
if (pfil_run_hooks(sc->sc_if.if_pfil, &m,
dst_ifp, PFIL_OUT) != 0) {
@@ -1768,6 +1763,11 @@ bridge_forward(struct bridge_softc *sc, struct mbuf *m)
bridge_release_member(sc, bif, &psref);
+ /*
+ * Clear any in-bound checksum flags for this packet.
+ */
+ m->m_pkthdr.csum_flags = 0;
+
ACQUIRE_GLOBAL_LOCKS();
bridge_enqueue(sc, dst_if, m, 1);
RELEASE_GLOBAL_LOCKS();
@@ -1978,6 +1978,12 @@ bridge_broadcast(struct bridge_softc *sc,
struct ifnet *src_if,
sc->sc_if.if_oerrors++;
goto next;
}
+
+ /*
+ * Clear any in-bound checksum flags for this packet.
+ */
+ mc->m_pkthdr.csum_flags = 0;
+
ACQUIRE_GLOBAL_LOCKS();
bridge_enqueue(sc, dst_if, mc, 1);
From: Rin Okuyama <rokuyama@rk.phys.keio.ac.jp>
To: Ryota Ozaki <ozaki-r@netbsd.org>, Masanobu SAITOH <msaitoh@execsw.org>
Cc: "gnats-bugs@NetBSD.org" <gnats-bugs@netbsd.org>,
kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/53562: bridge(4) breaks segmentation / TX checksum
offloading
Date: Wed, 5 Sep 2018 21:04:16 +0900
Thank you for your responses!
With both patches, TX works fine through an interface added to bridge,
even if TSO is enabled. However, RX from an host which shares the
interface via bridge does not work for larger packets (because of TSO?).
RX of smaller packets and TX work for that host.
I'm not sure if my setup should work in principle...
From: Christoph Badura <bad@bsd.de>
To: Rin Okuyama <rokuyama@rk.phys.keio.ac.jp>
Cc: Ryota Ozaki <ozaki-r@netbsd.org>, Masanobu SAITOH <msaitoh@execsw.org>,
"gnats-bugs@NetBSD.org" <gnats-bugs@netbsd.org>,
kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org
Subject: Re: kern/53562: bridge(4) breaks segmentation / TX checksum
offloading
Date: Wed, 5 Sep 2018 23:51:37 +0200
On Wed, Sep 05, 2018 at 09:04:16PM +0900, Rin Okuyama wrote:
> With both patches, TX works fine through an interface added to bridge,
> even if TSO is enabled. However, RX from an host which shares the
> interface via bridge does not work for larger packets (because of TSO?).
> RX of smaller packets and TX work for that host.
I would not expect any TSO flags to be set on a packet that is received
from another host. The packet should have been segmented on other host
already.
> I'm not sure if my setup should work in principle...
Maybe you could describe your setup in more detail. Looking at the PR I
don't understand what exactly your setup is.
It might be worth going over the original PR kern/27007 again. Also note
the second part of PR kern/21831 by yamt@.
Note that in kern/27007 the damage was actually done only when ipf was
enabled on the bridge interface. It might well be that my change was
not correct because it was to broad. Maybe ipf has changed since, so that
packets do not get corrupted anymore in this case, too.
Looking back at the code I think the comment could have been worded
better. If you move it out of bridge_enqueue() it would be nice if the
comment could be expanded. Like, what exactly the circumstances are that
the flags have been set and why they need clearing.
Maybe we need ATF tests. That certainly would be nice. Also we now have
more test cases to consider: npf and pf active on the bridge.
--chris
From: Masanobu SAITOH <msaitoh@execsw.org>
To: Christoph Badura <bad@bsd.de>, Rin Okuyama <rokuyama@rk.phys.keio.ac.jp>
Cc: msaitoh@execsw.org, Ryota Ozaki <ozaki-r@netbsd.org>,
"gnats-bugs@NetBSD.org" <gnats-bugs@netbsd.org>, kern-bug-people@netbsd.org,
gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/53562: bridge(4) breaks segmentation / TX checksum
offloading
Date: Fri, 7 Sep 2018 17:50:58 +0900
On 2018/09/06 6:51, Christoph Badura wrote:
> On Wed, Sep 05, 2018 at 09:04:16PM +0900, Rin Okuyama wrote:
>> With both patches, TX works fine through an interface added to bridge,
>> even if TSO is enabled. However, RX from an host which shares the
>> interface via bridge does not work for larger packets (because of TSO?).
>> RX of smaller packets and TX work for that host.
>
> I would not expect any TSO flags to be set on a packet that is received
> from another host. The packet should have been segmented on other host
> already.
>
>> I'm not sure if my setup should work in principle...
>
> Maybe you could describe your setup in more detail. Looking at the PR I
> don't understand what exactly your setup is.
I think so, too. I read if_bridge.c and I think ozaki-r's change should be
correct. There might be another bug. The detailed configuration is required
to reproduce the problem.
> It might be worth going over the original PR kern/27007 again. Also note
> the second part of PR kern/21831 by yamt@.
>
> Note that in kern/27007 the damage was actually done only when ipf was
> enabled on the bridge interface. It might well be that my change was
> not correct because it was to broad. Maybe ipf has changed since, so that
> packets do not get corrupted anymore in this case, too.
>
> Looking back at the code I think the comment could have been worded
> better. If you move it out of bridge_enqueue() it would be nice if the
> comment could be expanded. Like, what exactly the circumstances are that
> the flags have been set and why they need clearing.
New patch:
Index: if_bridge.c
===================================================================
RCS file: /cvsroot/src/sys/net/if_bridge.c,v
retrieving revision 1.156
diff -u -p -r1.156 if_bridge.c
--- if_bridge.c 25 May 2018 04:40:27 -0000 1.156
+++ if_bridge.c 7 Sep 2018 08:47:47 -0000
@@ -1395,11 +1395,6 @@ bridge_enqueue(struct bridge_softc *sc,
int len, error;
short mflags;
- /*
- * Clear any in-bound checksum flags for this packet.
- */
- m->m_pkthdr.csum_flags = 0;
-
if (runfilt) {
if (pfil_run_hooks(sc->sc_if.if_pfil, &m,
dst_ifp, PFIL_OUT) != 0) {
@@ -1768,6 +1763,13 @@ bridge_forward(struct bridge_softc *sc,
bridge_release_member(sc, bif, &psref);
+ /*
+ * Before enqueueing this packet to the destination interface,
+ * clear any in-bound checksum flags to prevent them from being
+ * misused as out-bound flags.
+ */
+ m->m_pkthdr.csum_flags = 0;
+
ACQUIRE_GLOBAL_LOCKS();
bridge_enqueue(sc, dst_if, m, 1);
RELEASE_GLOBAL_LOCKS();
@@ -1978,6 +1980,13 @@ bridge_broadcast(struct bridge_softc *sc
sc->sc_if.if_oerrors++;
goto next;
}
+ /*
+ * Before enqueueing this packet to the destination
+ * interface, clear any in-bound checksum flags to
+ * prevent them from being misused as out-bound flags.
+ */
+ m->m_pkthdr.csum_flags = 0;
+
ACQUIRE_GLOBAL_LOCKS();
bridge_enqueue(sc, dst_if, mc, 1);
RELEASE_GLOBAL_LOCKS();
> Maybe we need ATF tests. That certainly would be nice. Also we now have
> more test cases to consider: npf and pf active on the bridge.
I think adding offload functions into shmif(4) is a good idea to test offload
stuff.
>
> --chris
>
--
-----------------------------------------------
SAITOH Masanobu (msaitoh@execsw.org
msaitoh@netbsd.org)
From: Rin Okuyama <rokuyama@rk.phys.keio.ac.jp>
To: Masanobu SAITOH <msaitoh@execsw.org>, Christoph Badura <bad@bsd.de>
Cc: Ryota Ozaki <ozaki-r@netbsd.org>,
"gnats-bugs@NetBSD.org" <gnats-bugs@netbsd.org>, kern-bug-people@netbsd.org,
gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/53562: bridge(4) breaks segmentation / TX checksum
offloading
Date: Sat, 8 Sep 2018 08:48:04 +0900
On 2018/09/07 17:50, Masanobu SAITOH wrote:
> On 2018/09/06 6:51, Christoph Badura wrote:
>> On Wed, Sep 05, 2018 at 09:04:16PM +0900, Rin Okuyama wrote:
>>> With both patches, TX works fine through an interface added to bridge,
>>> even if TSO is enabled. However, RX from an host which shares the
>>> interface via bridge does not work for larger packets (because of TSO?).
>>> RX of smaller packets and TX work for that host.
>>
>> I would not expect any TSO flags to be set on a packet that is received
>> from another host. The packet should have been segmented on other host
>> already.
>>
>>> I'm not sure if my setup should work in principle...
>>
>> Maybe you could describe your setup in more detail. Looking at the PR I
>> don't understand what exactly your setup is.
>
> I think so, too. I read if_bridge.c and I think ozaki-r's change should be
> correct. There might be another bug. The detailed configuration is required
> to reproduce the problem.
Yeah, sorry for the poor explanation. I'm working on an userland
application on Raspberry Pi, which emulates an ethernet adapter for
a client connected via GPIO. Then, it sends and receives packets for
the client using tap device bridged to a NIC. When TSO is enabled for
that NIC, the client cannot retrieve file via ftp, for example.
Well, my setup is hard to reproduce. The following would be a simplest
example where your patch does not work:
- NetBSD/amd64-currnet with wm0 connected to LAN
- qemu-3.0.0nb2 installed from pkgsrc
- setup tap and bridge as follows:
# ifconfig tap0 create up
# ifconfig bridge0 create
# brconfig bridge0 add wm0 add tap0 up
- run NetBSD/amd64 8.0 on QEMU with tap enabled:
# qemu-system-x86_64 -m 128m -cdrom NetBSD-8.0-amd64.iso -boot d \
-display curses -net tap,fd=3 -net nic 3<>/dev/tap0
Then, the virtual host can send and receive small packets, e.g.,
ping works well regardless of whether TSO is enabled or not for wm0.
However, it cannot receive larger packets when TSO is enabled, e.g.,
file cannot be retrieved through ftp. If TSO is turned off, everything
works fine.
>> It might be worth going over the original PR kern/27007 again. Also note
>> the second part of PR kern/21831 by yamt@.
The second part of kern/21831 still affects at least for tap?
>> Maybe we need ATF tests. That certainly would be nice. Also we now have
>> more test cases to consider: npf and pf active on the bridge.
>
> I think adding offload functions into shmif(4) is a good idea to test offload
> stuff.
Well, I'm working on offload engine for shmif(4). The application
mentioned above emulates TSO and TX/RX checksum offload by software.
The algorithm, or at least ideas, can be easily applied to shmif(4).
Thanks,
rin
From: Masanobu SAITOH <msaitoh@execsw.org>
To: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org,
gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, rokuyama@rk.phys.keio.ac.jp
Cc: msaitoh@execsw.org
Subject: Re: kern/53562: bridge(4) breaks segmentation / TX checksum
offloading
Date: Mon, 10 Sep 2018 16:34:38 +0900
Hi.
On 2018/09/08 8:50, Rin Okuyama wrote:
> The following reply was made to PR kern/53562; it has been noted by GNATS.
>
> From: Rin Okuyama <rokuyama@rk.phys.keio.ac.jp>
> To: Masanobu SAITOH <msaitoh@execsw.org>, Christoph Badura <bad@bsd.de>
> Cc: Ryota Ozaki <ozaki-r@netbsd.org>,
> "gnats-bugs@NetBSD.org" <gnats-bugs@netbsd.org>, kern-bug-people@netbsd.org,
> gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
> Subject: Re: kern/53562: bridge(4) breaks segmentation / TX checksum
> offloading
> Date: Sat, 8 Sep 2018 08:48:04 +0900
>
> On 2018/09/07 17:50, Masanobu SAITOH wrote:
> > On 2018/09/06 6:51, Christoph Badura wrote:
> >> On Wed, Sep 05, 2018 at 09:04:16PM +0900, Rin Okuyama wrote:
> >>> With both patches, TX works fine through an interface added to bridge,
> >>> even if TSO is enabled. However, RX from an host which shares the
> >>> interface via bridge does not work for larger packets (because of TSO?).
> >>> RX of smaller packets and TX work for that host.
> >>
> >> I would not expect any TSO flags to be set on a packet that is received
> >> from another host. The packet should have been segmented on other host
> >> already.
> >>
> >>> I'm not sure if my setup should work in principle...
> >>
> >> Maybe you could describe your setup in more detail. Looking at the PR I
> >> don't understand what exactly your setup is.
> >
> > I think so, too. I read if_bridge.c and I think ozaki-r's change should be
> > correct. There might be another bug. The detailed configuration is required
> > to reproduce the problem.
>
> Yeah, sorry for the poor explanation. I'm working on an userland
> application on Raspberry Pi, which emulates an ethernet adapter for
> a client connected via GPIO. Then, it sends and receives packets for
> the client using tap device bridged to a NIC. When TSO is enabled for
> that NIC, the client cannot retrieve file via ftp, for example.
>
> Well, my setup is hard to reproduce. The following would be a simplest
> example where your patch does not work:
>
> - NetBSD/amd64-currnet with wm0 connected to LAN
>
> - qemu-3.0.0nb2 installed from pkgsrc
>
> - setup tap and bridge as follows:
>
> # ifconfig tap0 create up
> # ifconfig bridge0 create
> # brconfig bridge0 add wm0 add tap0 up
>
> - run NetBSD/amd64 8.0 on QEMU with tap enabled:
>
> # qemu-system-x86_64 -m 128m -cdrom NetBSD-8.0-amd64.iso -boot d \
> -display curses -net tap,fd=3 -net nic 3<>/dev/tap0
>
> Then, the virtual host can send and receive small packets, e.g.,
> ping works well regardless of whether TSO is enabled or not for wm0.
0)
Is this "wm0" on the guest or the host?
If my understanding is correct, writing packet via tap on host is not related
to TSO...
1)
If you're using wm(4) on the guest, please try vioif(4) and see if the problem
occurs or not.
2)
For checksum oofloading, the following kernel options will help for debugging:
INET_CSUM_COUNTERS
TCP_CSUM_COUNTERS
UDP_CSUM_COUNTERS
These options add some event counters:
> inet swcsum 0 0 misc
> inet hwcsum ok 419667 69 misc
> inet hwcsum bad 0 0 misc
> tcp6 swcsum 0 0 misc
> tcp6 hwcsum data 0 0 misc
> tcp6 hwcsum ok 0 0 misc
> tcp6 hwcsum bad 0 0 misc
> tcp swcsum 0 0 misc
> tcp hwcsum data 0 0 misc
> tcp hwcsum ok 419430 69 misc
> tcp hwcsum bad 0 0 misc
> udp swcsum 0 0 misc
> udp hwcsum data 0 0 misc
> udp hwcsum ok 237 0 misc
> udp hwcsum bad 0 0 misc
> udp6 swcsum 0 0 misc
> udp6 hwcsum data 0 0 misc
> udp6 hwcsum ok 0 0 misc
> udp6 hwcsum bad 0 0 misc
3)
wm(4) has a lot of event counters, some of them are useful for debugging.
To use them add "options WM_EVENT_COUNTERS" to your kernel config file.
4)
For TSO, current wm(4) is not perfect because it doesn't use m_defrag().
I have not-yet-committed patch:
http://www.netbsd.org/~msaitoh/wm-defrag-20180906-0.dif
> % vmstat -ev | grep toomany
> wm0 txq00txtoomanyseg 0 0 misc
If "wmX txqYYtxtoomanyseg" is increasing, the above diff would fix the
problem.
> However, it cannot receive larger packets when TSO is enabled, e.g.,
> file cannot be retrieved through ftp. If TSO is turned off, everything
> works fine.
>
> >> It might be worth going over the original PR kern/27007 again. Also note
> >> the second part of PR kern/21831 by yamt@.
>
> The second part of kern/21831 still affects at least for tap?
>
> >> Maybe we need ATF tests. That certainly would be nice. Also we now have
> >> more test cases to consider: npf and pf active on the bridge.
> >
> > I think adding offload functions into shmif(4) is a good idea to test offload
> > stuff.
>
> Well, I'm working on offload engine for shmif(4). The application
> mentioned above emulates TSO and TX/RX checksum offload by software.
> The algorithm, or at least ideas, can be easily applied to shmif(4).
>
> Thanks,
> rin
>
>
--
-----------------------------------------------
SAITOH Masanobu (msaitoh@execsw.org
msaitoh@netbsd.org)
From: Rin Okuyama <rokuyama@rk.phys.keio.ac.jp>
To: Masanobu SAITOH <msaitoh@execsw.org>, gnats-bugs@NetBSD.org,
kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Cc:
Subject: Re: kern/53562: bridge(4) breaks segmentation / TX checksum
offloading
Date: Wed, 12 Sep 2018 21:33:20 +0900
Hi, sorry for the delay in the response.
On 2018/09/10 16:34, Masanobu SAITOH wrote:
...
> On 2018/09/08 8:50, Rin Okuyama wrote:
...
>> The following would be a simplest
>> example where your patch does not work:
>> - NetBSD/amd64-currnet with wm0 connected to LAN
>> - qemu-3.0.0nb2 installed from pkgsrc
>> - setup tap and bridge as follows:
>> # ifconfig tap0 create up
>> # ifconfig bridge0 create
>> # brconfig bridge0 add wm0 add tap0 up
>> - run NetBSD/amd64 8.0 on QEMU with tap enabled:
>> # qemu-system-x86_64 -m 128m -cdrom NetBSD-8.0-amd64.iso -boot d \
>> -display curses -net tap,fd=3 -net nic 3<>/dev/tap0
>> Then, the virtual host can send and receive small packets, e.g.,
>> ping works well regardless of whether TSO is enabled or not for wm0.
[quotation added by RO from the previous message of mine]
>> However, it cannot receive larger packets when TSO is enabled, e.g.,
>> file cannot be retrieved through ftp. If TSO is turned off, everything
>> works fine.
[quotation done]
>
> 0)
> Is this "wm0" on the guest or the host?
> If my understanding is correct, writing packet via tap on host is not related
> to TSO...
Sorry for bothering you many times. This "wm0" is for the host. Also,
in this example, the guest fails to receive data from the ftp server
working on the host. I forgot to test servers other than the host. It
worked fine; only connecting from the guest to host fails.
> 1)
> If you're using wm(4) on the guest, please try vioif(4) and see if the problem
> occurs or not.
I tried vioif(4) on the guest but the situation does not change.
> 2)
> For checksum oofloading, the following kernel options will help for debugging:
> INET_CSUM_COUNTERS
> TCP_CSUM_COUNTERS
> UDP_CSUM_COUNTERS
>
> These options add some event counters:
...
>
> 3)
> wm(4) has a lot of event counters, some of them are useful for debugging.
> To use them add "options WM_EVENT_COUNTERS" to your kernel config file.
Thanks! I enabled these options.
> 4)
> For TSO, current wm(4) is not perfect because it doesn't use m_defrag().
> I have not-yet-committed patch:
>
> http://www.netbsd.org/~msaitoh/wm-defrag-20180906-0.dif
>
>> % vmstat -ev | grep toomany
>> wm0 txq00txtoomanyseg 0 0 misc
>
> If "wmX txqYYtxtoomanyseg" is increasing, the above diff would fix the
> problem.
I tried if_wm.c r1.587, which already contains this patch. Counter
"wmX txqYYtxtoomanyseg" was not increased.
I examined how packets flow in the example above, from the ftp server
on host (wm0 on host) to the ftp client on guest (tap0 on host ->
vioif0 on guest):
tcp_output()
-> ip_output()
-> ip_if_output()
-> if_output_lock()
-> ifp->if_output = ether_output()
-> bridge_output()
-> bridge_enqueue()
When TSO4 is enabled for wm0 on the host, tcp_output() attempts to send
packets larger than MTU. If the destination interface is wm0, i.e.,
dst_ifp == ifp in bridge_output(), there's no problem with your patch
for if_bridge.c; packets are segmented automatically by the HW. However,
in this case, the destination is tap0, and bridge_enqueue() send packets
larger than MTU for tap0. This is, of course, illegal, since there is no
HW by which packets are segmented. The guest on QEMU, therefore, cannot
receive packets. This is the scenario how the failure occurs in my
example of QEMU above.
The similar problems occur when one of other TX offload options, TSO6,
IP4CSUM-TX, TCPnCSUM-TX, or UDPnCSUM-TX, are enabled. For example,
consider the case of IP4CSUM-TX. When this option is enabled for wm0
on the host, checksumming is omitted in ip_output(). Then, when the
destination is tap0, packets with wrong checksum are sent.
This patch disable TX offload when the interface is added to bridge:
http://www.netbsd.org/~rin/disable_tx_offload_when_bridged_20180912.patch
With this patch, the guest on QEMU can retrieve files from the host
without problem, even if TX offload options are enabled, which supports
the discussion above. No need to modify if_bridge.c from the original
(r1.156) in this case.
I don't know this patch is preferable; there should be better ways to
handle the problem. Packets need to be segmented or checksummed only
if the destination in bridge_output() is different from the original
interface. I will examine further if needed.
Thanks,
rin
From: Masanobu SAITOH <msaitoh@execsw.org>
To: Rin Okuyama <rokuyama@rk.phys.keio.ac.jp>, gnats-bugs@NetBSD.org,
kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Cc: msaitoh@execsw.org
Subject: Re: kern/53562: bridge(4) breaks segmentation / TX checksum
offloading
Date: Fri, 14 Sep 2018 20:17:15 +0900
On 2018/09/12 21:33, Rin Okuyama wrote:
> Hi, sorry for the delay in the response.
>
> On 2018/09/10 16:34, Masanobu SAITOH wrote:
> ...
>> On 2018/09/08 8:50, Rin Okuyama wrote:
> ...
>>> The following would be a simplest
>>> example where your patch does not work:
>>> - NetBSD/amd64-currnet with wm0 connected to LAN
>>> - qemu-3.0.0nb2 installed from pkgsrc
>>> - setup tap and bridge as follows:
>>> # ifconfig tap0 create up
>>> # ifconfig bridge0 create
>>> # brconfig bridge0 add wm0 add tap0 up
>>> - run NetBSD/amd64 8.0 on QEMU with tap enabled:
>>> # qemu-system-x86_64 -m 128m -cdrom NetBSD-8.0-amd64.iso -boot d \
>>> -display curses -net tap,fd=3 -net nic 3<>/dev/tap0
>>> Then, the virtual host can send and receive small packets, e.g.,
>>> ping works well regardless of whether TSO is enabled or not for wm0.
> [quotation added by RO from the previous message of mine]
>>> However, it cannot receive larger packets when TSO is enabled, e.g.,
>>> file cannot be retrieved through ftp. If TSO is turned off, everything
>>> works fine.
> [quotation done]
>>
>> 0)
>> Is this "wm0" on the guest or the host?
>> If my understanding is correct, writing packet via tap on host is not related
>> to TSO...
>
I realized that this problem is not only for TSO but for all outgoing offload flags.
> Sorry for bothering you many times. This "wm0" is for the host. Also,
> in this example, the guest fails to receive data from the ftp server
> working on the host. I forgot to test servers other than the host. It
> worked fine; only connecting from the guest to host fails.
I see. I could reproduce the same problem with Xen dom0 & domU by adding only ip4csum
flag to wm0.
>> 1)
>> If you're using wm(4) on the guest, please try vioif(4) and see if the problem
>> occurs or not.
>
> I tried vioif(4) on the guest but the situation does not change.
>
>> 2)
>> For checksum oofloading, the following kernel options will help for debugging:
>> INET_CSUM_COUNTERS
>> TCP_CSUM_COUNTERS
>> UDP_CSUM_COUNTERS
>>
>> These options add some event counters:
> ...
>>
>> 3)
>> wm(4) has a lot of event counters, some of them are useful for debugging.
>> To use them add "options WM_EVENT_COUNTERS" to your kernel config file.
>
> Thanks! I enabled these options.
>
>> 4)
>> For TSO, current wm(4) is not perfect because it doesn't use m_defrag().
>> I have not-yet-committed patch:
>>
>> http://www.netbsd.org/~msaitoh/wm-defrag-20180906-0.dif
>>
>>> % vmstat -ev | grep toomany
>>> wm0 txq00txtoomanyseg 0 0 misc
>>
>> If "wmX txqYYtxtoomanyseg" is increasing, the above diff would fix the
>> problem.
>
> I tried if_wm.c r1.587, which already contains this patch. Counter
> "wmX txqYYtxtoomanyseg" was not increased.
>
> I examined how packets flow in the example above, from the ftp server
> on host (wm0 on host) to the ftp client on guest (tap0 on host ->
> vioif0 on guest):
>
> tcp_output()
> -> ip_output()
> -> ip_if_output()
> -> if_output_lock()
> -> ifp->if_output = ether_output()
> -> bridge_output()
> -> bridge_enqueue()
>
> When TSO4 is enabled for wm0 on the host, tcp_output() attempts to send
> packets larger than MTU. If the destination interface is wm0, i.e.,
> dst_ifp == ifp in bridge_output(), there's no problem with your patch
> for if_bridge.c; packets are segmented automatically by the HW. However,
> in this case, the destination is tap0, and bridge_enqueue() send packets
> larger than MTU for tap0. This is, of course, illegal, since there is no
> HW by which packets are segmented. The guest on QEMU, therefore, cannot
> receive packets. This is the scenario how the failure occurs in my
> example of QEMU above.
>
> The similar problems occur when one of other TX offload options, TSO6,
> IP4CSUM-TX, TCPnCSUM-TX, or UDPnCSUM-TX, are enabled. For example,
> consider the case of IP4CSUM-TX. When this option is enabled for wm0
> on the host, checksumming is omitted in ip_output(). Then, when the
> destination is tap0, packets with wrong checksum are sent.
>
> This patch disable TX offload when the interface is added to bridge:
>
> http://www.netbsd.org/~rin/disable_tx_offload_when_bridged_20180912.patch
I don't know if it's good to doing in L3...
It seems that both FreeBSD and DragonFly clear all bridge member's outgoing
offload flags when adding a member to a bridge. I also don't know if
this is good solution or not...
My diff have been committed as if_bridge.c rev. 1.156 now.
> Module Name: src
> Committed By: msaitoh
> Date: Fri Sep 14 11:05:09 UTC 2018
>
> Modified Files:
> src/sys/net: if_bridge.c
>
> Log Message:
> Fix a bug that bridge_enqueue() incorrectly cleared outgoing packet's offload
> flags. bridge_enqueue() is called from bridge_output() when a packet is
> spontaneous. Clear csum_flags before calling brige_enqueue() in
> bridge_forward() or bridge_broadcast() instead of in the beginning of
> bridge_enqueue().
>
> Note that this change doesn't fix a problem on the following configuration:
>
> A bridge has two or more interfaces.
>
> An address is assigned to an bridge member interface and
> some offload flags are set.
>
> Another interface has no address and has no any offload flag.
>
> XXX pullup-[78]
>
>
> To generate a diff of this commit:
> cvs rdiff -u -r1.156 -r1.157 src/sys/net/if_bridge.c
> With this patch, the guest on QEMU can retrieve files from the host
> without problem, even if TX offload options are enabled, which supports
> the discussion above. No need to modify if_bridge.c from the original
> (r1.156) in this case.
>
> I don't know this patch is preferable; there should be better ways to
> handle the problem. Packets need to be segmented or checksummed only
> if the destination in bridge_output() is different from the original
> interface. I will examine further if needed.
>
> Thanks,
> rin
--
-----------------------------------------------
SAITOH Masanobu (msaitoh@execsw.org
msaitoh@netbsd.org)
From: Masanobu SAITOH <msaitoh@execsw.org>
To: Rin Okuyama <rokuyama@rk.phys.keio.ac.jp>, gnats-bugs@NetBSD.org,
kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Cc: msaitoh@execsw.org
Subject: Re: kern/53562: bridge(4) breaks segmentation / TX checksum
offloading
Date: Fri, 14 Sep 2018 20:26:20 +0900
On 2018/09/14 20:17, Masanobu SAITOH wrote:
> On 2018/09/12 21:33, Rin Okuyama wrote:
>> Hi, sorry for the delay in the response.
>>
>> On 2018/09/10 16:34, Masanobu SAITOH wrote:
>> ...
>>> On 2018/09/08 8:50, Rin Okuyama wrote:
>> ...
>>>> The following would be a simplest
>>>> example where your patch does not work:
>>>> - NetBSD/amd64-currnet with wm0 connected to LAN
>>>> - qemu-3.0.0nb2 installed from pkgsrc
>>>> - setup tap and bridge as follows:
>>>> # ifconfig tap0 create up
>>>> # ifconfig bridge0 create
>>>> # brconfig bridge0 add wm0 add tap0 up
>>>> - run NetBSD/amd64 8.0 on QEMU with tap enabled:
>>>> # qemu-system-x86_64 -m 128m -cdrom NetBSD-8.0-amd64.iso -boot d \
>>>> -display curses -net tap,fd=3 -net nic 3<>/dev/tap0
>>>> Then, the virtual host can send and receive small packets, e.g.,
>>>> ping works well regardless of whether TSO is enabled or not for wm0.
>> [quotation added by RO from the previous message of mine]
>>>> However, it cannot receive larger packets when TSO is enabled, e.g.,
>>>> file cannot be retrieved through ftp. If TSO is turned off, everything
>>>> works fine.
>> [quotation done]
>>>
>>> 0)
>>> Is this "wm0" on the guest or the host?
>>> If my understanding is correct, writing packet via tap on host is not related
>>> to TSO...
>>
>
> I realized that this problem is not only for TSO but for all outgoing offload flags.
>
>
>> Sorry for bothering you many times. This "wm0" is for the host. Also,
>> in this example, the guest fails to receive data from the ftp server
>> working on the host. I forgot to test servers other than the host. It
>> worked fine; only connecting from the guest to host fails.
>
> I see. I could reproduce the same problem with Xen dom0 & domU by adding only ip4csum
> flag to wm0.
>
>>> 1)
>>> If you're using wm(4) on the guest, please try vioif(4) and see if the problem
>>> occurs or not.
>>
>> I tried vioif(4) on the guest but the situation does not change.
>>
>>> 2)
>>> For checksum oofloading, the following kernel options will help for debugging:
>>> INET_CSUM_COUNTERS
>>> TCP_CSUM_COUNTERS
>>> UDP_CSUM_COUNTERS
>>>
>>> These options add some event counters:
>> ...
>>>
>>> 3)
>>> wm(4) has a lot of event counters, some of them are useful for debugging.
>>> To use them add "options WM_EVENT_COUNTERS" to your kernel config file.
>>
>> Thanks! I enabled these options.
>>
>>> 4)
>>> For TSO, current wm(4) is not perfect because it doesn't use m_defrag().
>>> I have not-yet-committed patch:
>>>
>>> http://www.netbsd.org/~msaitoh/wm-defrag-20180906-0.dif
>>>
>>>> % vmstat -ev | grep toomany
>>>> wm0 txq00txtoomanyseg 0 0 misc
>>>
>>> If "wmX txqYYtxtoomanyseg" is increasing, the above diff would fix the
>>> problem.
>>
>> I tried if_wm.c r1.587, which already contains this patch. Counter
>> "wmX txqYYtxtoomanyseg" was not increased.
>>
>> I examined how packets flow in the example above, from the ftp server
>> on host (wm0 on host) to the ftp client on guest (tap0 on host ->
>> vioif0 on guest):
>>
>> tcp_output()
>> -> ip_output()
>> -> ip_if_output()
>> -> if_output_lock()
>> -> ifp->if_output = ether_output()
>> -> bridge_output()
>> -> bridge_enqueue()
>>
>> When TSO4 is enabled for wm0 on the host, tcp_output() attempts to send
>> packets larger than MTU. If the destination interface is wm0, i.e.,
>> dst_ifp == ifp in bridge_output(), there's no problem with your patch
>> for if_bridge.c; packets are segmented automatically by the HW. However,
>> in this case, the destination is tap0, and bridge_enqueue() send packets
>> larger than MTU for tap0. This is, of course, illegal, since there is no
>> HW by which packets are segmented. The guest on QEMU, therefore, cannot
>> receive packets. This is the scenario how the failure occurs in my
>> example of QEMU above.
>>
>> The similar problems occur when one of other TX offload options, TSO6,
>> IP4CSUM-TX, TCPnCSUM-TX, or UDPnCSUM-TX, are enabled. For example,
>> consider the case of IP4CSUM-TX. When this option is enabled for wm0
>> on the host, checksumming is omitted in ip_output(). Then, when the
>> destination is tap0, packets with wrong checksum are sent.
>>
>> This patch disable TX offload when the interface is added to bridge:
>>
>> http://www.netbsd.org/~rin/disable_tx_offload_when_bridged_20180912.patch
>
> I don't know if it's good to doing in L3...
>
> It seems that both FreeBSD and DragonFly clear all bridge member's outgoing
> offload flags when adding a member to a bridge. I also don't know if
> this is good solution or not...
>
> My diff have been committed as if_bridge.c rev. 1.156 now.
>
>> Module Name: src
>> Committed By: msaitoh
>> Date: Fri Sep 14 11:05:09 UTC 2018
>>
>> Modified Files:
>> src/sys/net: if_bridge.c
>>
>> Log Message:
>> Fix a bug that bridge_enqueue() incorrectly cleared outgoing packet's offload
>> flags. bridge_enqueue() is called from bridge_output() when a packet is
>> spontaneous. Clear csum_flags before calling brige_enqueue() in
>> bridge_forward() or bridge_broadcast() instead of in the beginning of
>> bridge_enqueue().
>>
>> Note that this change doesn't fix a problem on the following configuration:
>>
>> A bridge has two or more interfaces.
>>
>> An address is assigned to an bridge member interface and
>> some offload flags are set.
>>
>> Another interface has no address and has no any offload flag.
>>
>> XXX pullup-[78]
>>
>>
>> To generate a diff of this commit:
>> cvs rdiff -u -r1.156 -r1.157 src/sys/net/if_bridge.c
>
>
>
>> With this patch, the guest on QEMU can retrieve files from the host
>> without problem, even if TX offload options are enabled, which supports
>> the discussion above. No need to modify if_bridge.c from the original
>> (r1.156) in this case.
>>
>> I don't know this patch is preferable; there should be better ways to
>> handle the problem. Packets need to be segmented or checksummed only
>> if the destination in bridge_output() is different from the original
>> interface. I will examine further if needed.
>>
>> Thanks,
>> rin
>
>
FYI:
http://www.netbsd.org/~msaitoh/packetflow017.pdf
--
-----------------------------------------------
SAITOH Masanobu (msaitoh@execsw.org
msaitoh@netbsd.org)
From: Rin Okuyama <rokuyama@rk.phys.keio.ac.jp>
To: Masanobu SAITOH <msaitoh@execsw.org>, gnats-bugs@NetBSD.org,
kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Cc:
Subject: Re: kern/53562: bridge(4) breaks segmentation / TX checksum
offloading
Date: Sun, 16 Sep 2018 22:49:24 +0900
On 2018/09/14 20:17, Masanobu SAITOH wrote:
> On 2018/09/12 21:33, Rin Okuyama wrote:
...
>> This patch disable TX offload when the interface is added to bridge:
>>
>> http://www.netbsd.org/~rin/disable_tx_offload_when_bridged_20180912.patch
>
> I don't know if it's good to doing in L3...
>
> It seems that both FreeBSD and DragonFly clear all bridge member's outgoing
> offload flags when adding a member to a bridge. I also don't know if
> this is good solution or not...
If I understand correctly, since TX offload is for L3 (and L4), we
inevitably need something to do in L3...
I implemented software TX/RX offload engines for ethernet:
http://www.netbsd.org/~rin/ether_offload_20180916.patch
Using these routines, I wrote a patch where TX offloading is canceled in
bridge_output() when the destination interface is different from the
source interface:
http://www.netbsd.org/~rin/bridge_tx_offload_20180916.patch
With this patch, the guest successfully communicates with the host even
if any TX offloading is enabled on the host side.
Also, I added TX/RX offloading support to shmif(4):
http://www.netbsd.org/~rin/shmif_offload_20180916.patch
Since I'm not familiar with rump, this patch has not been tested
systematically yet. But it seems to work to some extent...
Thanks,
rin
From: Rin Okuyama <rokuyama@rk.phys.keio.ac.jp>
To: Masanobu SAITOH <msaitoh@execsw.org>, gnats-bugs@NetBSD.org,
kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Cc:
Subject: Re: kern/53562: bridge(4) breaks segmentation / TX checksum
offloading
Date: Mon, 17 Sep 2018 19:36:47 +0900
On 2018/09/16 22:49, Rin Okuyama wrote:
> I implemented software TX/RX offload engines for ethernet:
>
> http://www.netbsd.org/~rin/ether_offload_20180916.patch
>
> Using these routines, I wrote a patch where TX offloading is canceled in
> bridge_output() when the destination interface is different from the
> source interface:
>
> http://www.netbsd.org/~rin/bridge_tx_offload_20180916.patch
>
> With this patch, the guest successfully communicates with the host even
> if any TX offloading is enabled on the host side.
>
> Also, I added TX/RX offloading support to shmif(4):
>
> http://www.netbsd.org/~rin/shmif_offload_20180916.patch
>
> Since I'm not familiar with rump, this patch has not been tested
> systematically yet. But it seems to work to some extent...
I fixed a bug in the patch for ethernet:
http://www.netbsd.org/~rin/ether_offload_20180917.patch
Other two patches are not updated:
http://www.netbsd.org/~rin/bridge_tx_offload_20180916.patch
http://www.netbsd.org/~rin/shmif_offload_20180916.patch
Thanks,
rin
From: Rin Okuyama <rokuyama@rk.phys.keio.ac.jp>
To: "gnats-bugs@NetBSD.org" <gnats-bugs@NetBSD.org>
Cc:
Subject: Re: kern/53562: bridge(4) breaks segmentation / TX checksum
offloading
Date: Tue, 4 Dec 2018 09:32:45 +0900
First, let me summarize the current status of the problem. Most problems
have been resolved by msaito. The remaining problem is that transmission
from interface if0 to interface if1 fails when any TX offload option is
enabled for if0 but disabled for if1.
We need handle TX offload in software in bridge_output() when some TX
offload option is enabled for a packet and it is not available for the
destination interface. The attached patches deal with this as follows.
For unicast:
(1) When the destination interface is the same as source interface, we
do not need to handle offload options; send a packet as is.
(2) When the destination is different from the source, check whether TX
offload options specified in a packet is supported by the
destination. If so, we can send it as is. Otherwise, handle these
options in software.
For multicast/broadcast:
(3) Send a packet as is if all the members of that bridge support given
TX offload options. Otherwise, handle these options in software.
For (3) we need calculate logical AND b/w capabilities of TX offload
options in member interface (ifp->if_csum_flags_tx). I've added
sc_csum_flags_tx flag to bridge softc for this purpose. This flag is
updated when a member is (i) added to or (ii) removed from a bridge, or
(iii) if_csum_flags_tx flag of a member interface is manipulated via
ifconfig(8).
Also, currently, if_csum_flags_tx do not contains M_CSUM_TSOv[46]. I've
added these flags when TSOv[46] is enabled for that interface. As far
as I can see, no one seems to depend on the old behavior.
I've separated the patch into four parts for readability.
(a) http://www.netbsd.org/~rin/bridge_20181204/ether_sw_offload_20181204.patch
This patch provides routines to handle TX/RX offload options in software
for ethernet. Since this breaks separation between L2 and L3/L4, I've
added new files (ether_sw_offload.[ch]) rather than adding the routines
to existing files.
(b) http://www.netbsd.org/~rin/bridge_20181204/bridge_sw_offload_20181204.patch
This is main patch to deal with the problem described above by using (a).
(c) http://www.netbsd.org/~rin/bridge_20181204/if_shmem_sw_offload_20181204.patch
This patch adds TX/RX offload support for shmemif(4) using (a). This may
be useful for testing offload related codes in the ATF framework. This
patch also contains a test code written by msaito; if_capabilities for
shmemif(4) can be specified by environmental variable
RUMP_SHMIF_CAPENABLE:
setenv RUMP_SHMIF_CAPENABLE 0x7ff80 (all offload)
setenv RUMP_SHMIF_CAPENABLE 0x6aa80 (all TX)
setenv RUMP_SHMIF_CAPENABLE 0x15500 (all RX)
for definition of flags, see sys/net/if.h:
https://nxr.netbsd.org/xref/src/sys/net/if.h#591
I've tested all tests in /usr/tests passed with all offload options
enabled (RUMP_SHMIF_CAPENABLE=0x7ff80).
(d) http://www.netbsd.org/~rin/bridge_20181204/simplify_ip_output_20181204.patch
This patch simplify logics in ip{,6}_output() by using M_CSUM_TSOv[46]
bits in ifp->if_csum_flags_tx.
Any comments and suggestions are welcomed. I thank msaito for fruitful
discussion!
rin
From: Rin Okuyama <rokuyama@rk.phys.keio.ac.jp>
To: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org,
gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Cc:
Subject: Re: kern/53562: bridge(4) breaks segmentation / TX checksum
offloading
Date: Tue, 11 Dec 2018 17:38:46 +0900
On 2018/12/04 9:35, Rin Okuyama wrote:
> First, let me summarize the current status of the problem. Most problems
> have been resolved by msaito. The remaining problem is that transmission
> from interface if0 to interface if1 fails when any TX offload option is
> enabled for if0 but disabled for if1.
>
> We need handle TX offload in software in bridge_output() when some TX
> offload option is enabled for a packet and it is not available for the
> destination interface. The attached patches deal with this as follows.
>
> For unicast:
>
> (1) When the destination interface is the same as source interface, we
> do not need to handle offload options; send a packet as is.
>
> (2) When the destination is different from the source, check whether TX
> offload options specified in a packet is supported by the
> destination. If so, we can send it as is. Otherwise, handle these
> options in software.
>
> For multicast/broadcast:
>
> (3) Send a packet as is if all the members of that bridge support given
> TX offload options. Otherwise, handle these options in software.
>
> For (3) we need calculate logical AND b/w capabilities of TX offload
> options in member interface (ifp->if_csum_flags_tx). I've added
> sc_csum_flags_tx flag to bridge softc for this purpose. This flag is
> updated when a member is (i) added to or (ii) removed from a bridge, or
> (iii) if_csum_flags_tx flag of a member interface is manipulated via
> ifconfig(8).
>
> Also, currently, if_csum_flags_tx do not contains M_CSUM_TSOv[46]. I've
> added these flags when TSOv[46] is enabled for that interface. As far
> as I can see, no one seems to depend on the old behavior.
>
> I've separated the patch into four parts for readability.
>
> (a) http://www.netbsd.org/~rin/bridge_20181204/ether_sw_offload_20181204.patch
>
> This patch provides routines to handle TX/RX offload options in software
> for ethernet. Since this breaks separation between L2 and L3/L4, I've
> added new files (ether_sw_offload.[ch]) rather than adding the routines
> to existing files.
>
> (b) http://www.netbsd.org/~rin/bridge_20181204/bridge_sw_offload_20181204.patch
>
> This is main patch to deal with the problem described above by using (a).
>
> (c) http://www.netbsd.org/~rin/bridge_20181204/if_shmem_sw_offload_20181204.patch
>
> This patch adds TX/RX offload support for shmemif(4) using (a). This may
> be useful for testing offload related codes in the ATF framework. This
> patch also contains a test code written by msaito; if_capabilities for
> shmemif(4) can be specified by environmental variable
> RUMP_SHMIF_CAPENABLE:
>
> setenv RUMP_SHMIF_CAPENABLE 0x7ff80 (all offload)
> setenv RUMP_SHMIF_CAPENABLE 0x6aa80 (all TX)
> setenv RUMP_SHMIF_CAPENABLE 0x15500 (all RX)
>
> for definition of flags, see sys/net/if.h:
>
> https://nxr.netbsd.org/xref/src/sys/net/if.h#591
>
> I've tested all tests in /usr/tests passed with all offload options
> enabled (RUMP_SHMIF_CAPENABLE=0x7ff80).
>
> (d) http://www.netbsd.org/~rin/bridge_20181204/simplify_ip_output_20181204.patch
>
> This patch simplify logics in ip{,6}_output() by using M_CSUM_TSOv[46]
> bits in ifp->if_csum_flags_tx.
>
> Any comments and suggestions are welcomed. I thank msaito for fruitful
> discussion!
>
> rin
I will commit this soon if there's no objection (OKed by msaitoh).
The patch (a) above adds new public functions, tcp[46]_segment(), that
are used from bridge(4) via ether_sw_offload_tx(). Should I bump
__NetBSD_Version__ although bridge(4) does not have its module?
Thanks,
rin
From: Jason Thorpe <thorpej@me.com>
To: "gnats-bugs@netbsd.org" <gnats-bugs@NetBSD.org>
Cc: kern-bug-people@netbsd.org,
gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org,
rokuyama@rk.phys.keio.ac.jp
Subject: Re: kern/53562: bridge(4) breaks segmentation / TX checksum
offloading
Date: Tue, 11 Dec 2018 05:55:31 -0800
> On Dec 11, 2018, at 12:40 AM, Rin Okuyama =
<rokuyama@rk.phys.keio.ac.jp> wrote:
>=20
> I will commit this soon if there's no objection (OKed by msaitoh).
No objection. Thanks for doing this work!
> The patch (a) above adds new public functions, tcp[46]_segment(), that
> are used from bridge(4) via ether_sw_offload_tx(). Should I bump
> __NetBSD_Version__ although bridge(4) does not have its module?
I don't think it's needed.
-- thorpej
From: Rin Okuyama <rokuyama@rk.phys.keio.ac.jp>
To: Jason Thorpe <thorpej@me.com>,
"gnats-bugs@netbsd.org" <gnats-bugs@NetBSD.org>
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/53562: bridge(4) breaks segmentation / TX checksum
offloading
Date: Wed, 12 Dec 2018 10:27:07 +0900
On 2018/12/11 22:55, Jason Thorpe wrote:
>
>
>> On Dec 11, 2018, at 12:40 AM, Rin Okuyama <rokuyama@rk.phys.keio.ac.jp> wrote:
>>
>> I will commit this soon if there's no objection (OKed by msaitoh).
>
> No objection. Thanks for doing this work!
>
>> The patch (a) above adds new public functions, tcp[46]_segment(), that
>> are used from bridge(4) via ether_sw_offload_tx(). Should I bump
>> __NetBSD_Version__ although bridge(4) does not have its module?
>
> I don't think it's needed.
>
> -- thorpej
>
>
Thank you! I'll commit it!
rin
From: "Rin Okuyama" <rin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/53562 CVS commit: src/sys
Date: Wed, 12 Dec 2018 01:40:21 +0000
Module Name: src
Committed By: rin
Date: Wed Dec 12 01:40:21 UTC 2018
Modified Files:
src/sys/net: files.net
src/sys/netinet: in_offload.c in_offload.h
src/sys/netinet6: in6_offload.c in6_offload.h
src/sys/rump/net/lib/libnet: Makefile
Added Files:
src/sys/net: ether_sw_offload.c ether_sw_offload.h
Log Message:
PR kern/53562
Add ether_sw_offload_[tr]x: handle TX/RX offload options in software.
Since this violates separation b/w L2 and L3/L4, new files are added
rather than having the routines in sys/net/if_ethersubr.c.
OK msaitoh thorpej
To generate a diff of this commit:
cvs rdiff -u -r0 -r1.1 src/sys/net/ether_sw_offload.c \
src/sys/net/ether_sw_offload.h
cvs rdiff -u -r1.19 -r1.20 src/sys/net/files.net
cvs rdiff -u -r1.12 -r1.13 src/sys/netinet/in_offload.c
cvs rdiff -u -r1.11 -r1.12 src/sys/netinet/in_offload.h
cvs rdiff -u -r1.11 -r1.12 src/sys/netinet6/in6_offload.c
cvs rdiff -u -r1.9 -r1.10 src/sys/netinet6/in6_offload.h
cvs rdiff -u -r1.30 -r1.31 src/sys/rump/net/lib/libnet/Makefile
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Rin Okuyama" <rin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/53562 CVS commit: src/sys
Date: Wed, 12 Dec 2018 01:46:47 +0000
Module Name: src
Committed By: rin
Date: Wed Dec 12 01:46:47 UTC 2018
Modified Files:
src/sys/net: if.c if_bridge.c if_bridgevar.h
src/sys/rump/librump/rumpnet: net_stub.c
Log Message:
PR kern/53562
Handle TX offload in software when a packet is sent via
bridge_output(). We can send it as is in the following
exceptional cases:
For unicast:
(1) When the destination interface is the same as source.
(2) When the destination supports all TX offload options
specified in a packet.
For multicast/broadcast:
(3) When all the members of the bridge support the specified
TX offload options.
For (3), add sc_csum_flags_tx flag to bridge softc, which is
logical AND b/w capabilities of TX offload options in member
interface (ifp->if_csum_flags_tx). The flag is updated when a
member is (i) added to or (ii) removed from a bridge, or (iii)
if_csum_flags_tx flag of a member interface is manipulated via
ifconfig(8).
Turn on M_CSUM_TSOv[46] bit in ifp->if_csum_flags_tx flag when
TSO[46] is enabled for that interface.
OK msaitoh thorpej
To generate a diff of this commit:
cvs rdiff -u -r1.441 -r1.442 src/sys/net/if.c
cvs rdiff -u -r1.160 -r1.161 src/sys/net/if_bridge.c
cvs rdiff -u -r1.32 -r1.33 src/sys/net/if_bridgevar.h
cvs rdiff -u -r1.36 -r1.37 src/sys/rump/librump/rumpnet/net_stub.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Rin Okuyama" <rin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/53562 CVS commit: src/sys/rump/net/lib/libshmif
Date: Wed, 12 Dec 2018 01:51:32 +0000
Module Name: src
Committed By: rin
Date: Wed Dec 12 01:51:32 UTC 2018
Modified Files:
src/sys/rump/net/lib/libshmif: if_shmem.c
Log Message:
Add TX/RX offload capabilities to shmif(4). They are emulated in
software by ether_sw_offload_[tr]x().
For rump kernels, if_capabilities for shmemif(4) can be specified
by environmental variable RUMP_SHMIF_CAPENABLE:
setenv RUMP_SHMIF_CAPENABLE 0x7ff80 (all offload)
setenv RUMP_SHMIF_CAPENABLE 0x6aa80 (all TX)
setenv RUMP_SHMIF_CAPENABLE 0x15500 (all RX)
part of PR kern/53562
OK msaitoh
To generate a diff of this commit:
cvs rdiff -u -r1.75 -r1.76 src/sys/rump/net/lib/libshmif/if_shmem.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Rin Okuyama" <rin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/53562 CVS commit: src/sys
Date: Wed, 12 Dec 2018 01:53:52 +0000
Module Name: src
Committed By: rin
Date: Wed Dec 12 01:53:52 UTC 2018
Modified Files:
src/sys/netinet: ip_output.c
src/sys/netinet6: ip6_output.c
Log Message:
Simplify logic in ip{,6}_output().
Now, we have M_CSUM_TSOv[46] bit in ifp->if_csum_flags_tx when
TSO[46] is enabled for the interface. So we can simply check
whether TSO[46] is required in a packet but missing in the
interface by (sw_csum & M_CSUM_TSOv[46]).
Note that this is a very rare case where TSO[46] is suddenly
turned off during a packet passing b/w TCP and IP.
part of PR kern/53562
OK msaitoh
To generate a diff of this commit:
cvs rdiff -u -r1.307 -r1.308 src/sys/netinet/ip_output.c
cvs rdiff -u -r1.213 -r1.214 src/sys/netinet6/ip6_output.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Rin Okuyama" <rin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/53562 CVS commit: src/share/man/man4
Date: Wed, 12 Dec 2018 02:26:40 +0000
Module Name: src
Committed By: rin
Date: Wed Dec 12 02:26:40 UTC 2018
Modified Files:
src/share/man/man4: shmif.4
Log Message:
Document capability of TX/RX offload and environment variable
RUMP_SHMIF_CAPENABLE. Bump date.
part of PR kern/53562
To generate a diff of this commit:
cvs rdiff -u -r1.4 -r1.5 src/share/man/man4/shmif.4
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: Christoph Badura <bad@bsd.de>
To: gnats-bugs@netbsd.org
Cc: Rin Okuyama <rokuyama@rk.phys.keio.ac.jp>,
Masanobu SAITOH <msaitoh@execsw.org>,
Ryota Ozaki <ozaki-r@netbsd.org>
Subject: Re: kern/53562: bridge(4) breaks segmentation / TX checksum
offloading
Date: Wed, 3 Apr 2019 00:58:41 +0200
There is one more problem:
We use the same bits in csum_flags to indicate both that the hardware
*needs to do* offloading and that the hardware *has done* offloading.
IIRC, this wasn't a problem when this interface was introduced, because we
didn't have bridge(4) yet and packets would either travel down the stack
from kernel to harware or up the stack from hardware to kernel only.
Adding bridge(4) changed that, hence the problems.
To finaly resolve that for good we need separate flags to indicate that
the hardware did offloading and what the status of that is.
--chris
From: Masanobu SAITOH <msaitoh@execsw.org>
To: Christoph Badura <bad@bsd.de>, gnats-bugs@netbsd.org
Cc: msaitoh@execsw.org, Rin Okuyama <rokuyama@rk.phys.keio.ac.jp>,
Ryota Ozaki <ozaki-r@netbsd.org>
Subject: Re: kern/53562: bridge(4) breaks segmentation / TX checksum
offloading
Date: Sat, 6 Apr 2019 14:24:20 +0900
On 2019/04/03 7:58, Christoph Badura wrote:
> There is one more problem:
>
> We use the same bits in csum_flags to indicate both that the hardware
> *needs to do* offloading and that the hardware *has done* offloading.
>
> IIRC, this wasn't a problem when this interface was introduced, because we
> didn't have bridge(4) yet and packets would either travel down the stack
> from kernel to harware or up the stack from hardware to kernel only.
>
> Adding bridge(4) changed that, hence the problems.
>
> To finaly resolve that for good we need separate flags to indicate that
> the hardware did offloading and what the status of that is.
In current implementation, csum_flags field is cleared when a packet
is forwarded to an outgoing (interface). The IPv{4,6} forwarding path
and bridge do so. (If a filter function does policy routing, it's also
be required.) If there is a unknown path which doesn't clear csum_flags,
it should be fixed.
Do you know any real use case which get this problem?
To separate flags into input and output, it solves the problem,
but I don't know it's the good solution or not.
It does not directly related to the separating, I think the correct
way is to clear flags when those are evaluated. When a packet is
a tunneling (encapsulated) packet, the flags should be cleared when
the outer header is checked and passed to the next input to not to
misunderstand the input flags.
> --chris
>
--
-----------------------------------------------
SAITOH Masanobu (msaitoh@execsw.org
msaitoh@netbsd.org)
(Contact us)
$NetBSD: query-full-pr,v 1.43 2018/01/16 07:36:43 maya Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2017
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.