NetBSD Problem Report #53562

From www@NetBSD.org  Thu Aug 30 09:24:58 2018
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 73E997A202
	for <gnats-bugs@gnats.NetBSD.org>; Thu, 30 Aug 2018 09:24:58 +0000 (UTC)
Message-Id: <20180830092457.3CA8D7A261@mollari.NetBSD.org>
Date: Thu, 30 Aug 2018 09:24:57 +0000 (UTC)
From: rokuyama@rk.phys.keio.ac.jp
Reply-To: rokuyama@rk.phys.keio.ac.jp
To: gnats-bugs@NetBSD.org
Subject: bridge(4) breaks segmentation / TX checksum offloading
X-Send-Pr-Version: www-1.0

>Number:         53562
>Category:       kern
>Synopsis:       bridge(4) breaks segmentation / TX checksum offloading
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Aug 30 09:25:00 +0000 2018
>Last-Modified:  Sat Apr 06 05:25:01 +0000 2019
>Originator:     Rin Okuyama
>Release:        8.99.24
>Organization:
School of Science and Technology, Meiji University
>Environment:
NetBSD rpi3b 8.99.24 NetBSD 8.99.24 (RPI3-64) #24: Thu Aug 30 17:39:24 JST 2018  rin@latipes:/var/build/src/sys/arch/evbarm/compile/RPI3-64 evbarm aarch64
>Description:
If a network interface is added to bridge(4), segmentation or TX
checksum offloading do not work. This is because csum_flags are
cleared in bridge_enqueue():

https://nxr.netbsd.org/xref/src/sys/net/if_bridge.c#1401

  1391  void
  1392  bridge_enqueue(struct bridge_softc *sc, struct ifnet *dst_ifp, struct mbuf *m,
  1393      int runfilt)
  1394  {
  ....
  1398          /*
  1399           * Clear any in-bound checksum flags for this packet.
  1400           */
  1401          m->m_pkthdr.csum_flags = 0;
>How-To-Repeat:
Enable segmentation or TX checksum offloading for a network interface,
and add it to bridge(4). Then, data transmission starts to fail.
>Fix:
Not known.

>Audit-Trail:
From: Masanobu SAITOH <msaitoh@execsw.org>
To: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org,
 gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Cc: msaitoh@execsw.org
Subject: Re: kern/53562: bridge(4) breaks segmentation / TX checksum
 offloading
Date: Mon, 3 Sep 2018 17:52:52 +0900

 On 2018/08/30 18:25, rokuyama@rk.phys.keio.ac.jp wrote:
 >> Number:         53562
 >> Category:       kern
 >> Synopsis:       bridge(4) breaks segmentation / TX checksum offloading
 >> Confidential:   no
 >> Severity:       serious
 >> Priority:       medium
 >> Responsible:    kern-bug-people
 >> State:          open
 >> Class:          sw-bug
 >> Submitter-Id:   net
 >> Arrival-Date:   Thu Aug 30 09:25:00 +0000 2018
 >> Originator:     Rin Okuyama
 >> Release:        8.99.24
 >> Organization:
 > School of Science and Technology, Meiji University
 >> Environment:
 > NetBSD rpi3b 8.99.24 NetBSD 8.99.24 (RPI3-64) #24: Thu Aug 30 17:39:24 JST 2018  rin@latipes:/var/build/src/sys/arch/evbarm/compile/RPI3-64 evbarm aarch64
 >> Description:
 > If a network interface is added to bridge(4), segmentation or TX
 > checksum offloading do not work. This is because csum_flags are
 > cleared in bridge_enqueue():
 > 
 > https://nxr.netbsd.org/xref/src/sys/net/if_bridge.c#1401
 > 
 >    1391  void
 >    1392  bridge_enqueue(struct bridge_softc *sc, struct ifnet *dst_ifp, struct mbuf *m,
 >    1393      int runfilt)
 >    1394  {
 >    ....
 >    1398          /*
 >    1399           * Clear any in-bound checksum flags for this packet.
 >    1400           */
 >    1401          m->m_pkthdr.csum_flags = 0;
 >> How-To-Repeat:
 > Enable segmentation or TX checksum offloading for a network interface,
 > and add it to bridge(4). Then, data transmission starts to fail.
 >> Fix:
 > Not known.
 > 

 Is the following change correct?

 Index: if_bridge.c
 ===================================================================
 RCS file: /cvsroot/src/sys/net/if_bridge.c,v
 retrieving revision 1.156
 diff -u -p -r1.156 if_bridge.c
 --- if_bridge.c	25 May 2018 04:40:27 -0000	1.156
 +++ if_bridge.c	3 Sep 2018 08:51:23 -0000
 @@ -1395,11 +1395,6 @@ bridge_enqueue(struct bridge_softc *sc,
   	int len, error;
   	short mflags;

 -	/*
 -	 * Clear any in-bound checksum flags for this packet.
 -	 */
 -	m->m_pkthdr.csum_flags = 0;
 -
   	if (runfilt) {
   		if (pfil_run_hooks(sc->sc_if.if_pfil, &m,
   		    dst_ifp, PFIL_OUT) != 0) {
 @@ -1644,6 +1639,11 @@ bridge_forward(struct bridge_softc *sc,
   	if ((sc->sc_if.if_flags & IFF_RUNNING) == 0)
   		return;

 +	/*
 +	 * Clear any in-bound checksum flags for this packet.
 +	 */
 +	m->m_pkthdr.csum_flags = 0;
 +
   	src_if = m_get_rcvif_psref(m, &psref_src);
   	if (src_if == NULL) {
   		/* Interface is being destroyed? */


 -- 
 -----------------------------------------------
                  SAITOH Masanobu (msaitoh@execsw.org
                                   msaitoh@netbsd.org)

From: Ryota Ozaki <ozaki-r@netbsd.org>
To: Masanobu SAITOH <msaitoh@execsw.org>
Cc: "gnats-bugs@NetBSD.org" <gnats-bugs@netbsd.org>, kern-bug-people@netbsd.org, 
	gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/53562: bridge(4) breaks segmentation / TX checksum offloading
Date: Mon, 3 Sep 2018 18:18:41 +0900

 On Mon, Sep 3, 2018 at 5:53 PM Masanobu SAITOH <msaitoh@execsw.org> wrote:
 >
 > On 2018/08/30 18:25, rokuyama@rk.phys.keio.ac.jp wrote:
 > >> Number:         53562
 > >> Category:       kern
 > >> Synopsis:       bridge(4) breaks segmentation / TX checksum offloading
 > >> Confidential:   no
 > >> Severity:       serious
 > >> Priority:       medium
 > >> Responsible:    kern-bug-people
 > >> State:          open
 > >> Class:          sw-bug
 > >> Submitter-Id:   net
 > >> Arrival-Date:   Thu Aug 30 09:25:00 +0000 2018
 > >> Originator:     Rin Okuyama
 > >> Release:        8.99.24
 > >> Organization:
 > > School of Science and Technology, Meiji University
 > >> Environment:
 > > NetBSD rpi3b 8.99.24 NetBSD 8.99.24 (RPI3-64) #24: Thu Aug 30 17:39:24 JST 2018  rin@latipes:/var/build/src/sys/arch/evbarm/compile/RPI3-64 evbarm aarch64
 > >> Description:
 > > If a network interface is added to bridge(4), segmentation or TX
 > > checksum offloading do not work. This is because csum_flags are
 > > cleared in bridge_enqueue():
 > >
 > > https://nxr.netbsd.org/xref/src/sys/net/if_bridge.c#1401
 > >
 > >    1391  void
 > >    1392  bridge_enqueue(struct bridge_softc *sc, struct ifnet *dst_ifp, struct mbuf *m,
 > >    1393      int runfilt)
 > >    1394  {
 > >    ....
 > >    1398          /*
 > >    1399           * Clear any in-bound checksum flags for this packet.
 > >    1400           */
 > >    1401          m->m_pkthdr.csum_flags = 0;
 > >> How-To-Repeat:
 > > Enable segmentation or TX checksum offloading for a network interface,
 > > and add it to bridge(4). Then, data transmission starts to fail.
 > >> Fix:
 > > Not known.
 > >
 >
 > Is the following change correct?
 >
 > Index: if_bridge.c
 > ===================================================================
 > RCS file: /cvsroot/src/sys/net/if_bridge.c,v
 > retrieving revision 1.156
 > diff -u -p -r1.156 if_bridge.c
 > --- if_bridge.c 25 May 2018 04:40:27 -0000      1.156
 > +++ if_bridge.c 3 Sep 2018 08:51:23 -0000
 > @@ -1395,11 +1395,6 @@ bridge_enqueue(struct bridge_softc *sc,
 >         int len, error;
 >         short mflags;
 >
 > -       /*
 > -        * Clear any in-bound checksum flags for this packet.
 > -        */
 > -       m->m_pkthdr.csum_flags = 0;
 > -
 >         if (runfilt) {
 >                 if (pfil_run_hooks(sc->sc_if.if_pfil, &m,
 >                     dst_ifp, PFIL_OUT) != 0) {
 > @@ -1644,6 +1639,11 @@ bridge_forward(struct bridge_softc *sc,
 >         if ((sc->sc_if.if_flags & IFF_RUNNING) == 0)
 >                 return;
 >
 > +       /*
 > +        * Clear any in-bound checksum flags for this packet.
 > +        */
 > +       m->m_pkthdr.csum_flags = 0;
 > +
 >         src_if = m_get_rcvif_psref(m, &psref_src);
 >         if (src_if == NULL) {
 >                 /* Interface is being destroyed? */
 >

 ether_input can be called via ether_broadcast that is called from
 bridge_forward.
 So I guess clearing csum_flags should be done just before calling bridge_enqueue
 like the following patch?

   ozaki-r


 diff --git a/sys/net/if_bridge.c b/sys/net/if_bridge.c
 index e8c73f8a199..f92675e3c81 100644
 --- a/sys/net/if_bridge.c
 +++ b/sys/net/if_bridge.c
 @@ -1395,11 +1395,6 @@ bridge_enqueue(struct bridge_softc *sc, struct
 ifnet *dst_ifp, struct mbuf *m,
         int len, error;
         short mflags;

 -       /*
 -        * Clear any in-bound checksum flags for this packet.
 -        */
 -       m->m_pkthdr.csum_flags = 0;
 -
         if (runfilt) {
                 if (pfil_run_hooks(sc->sc_if.if_pfil, &m,
                     dst_ifp, PFIL_OUT) != 0) {
 @@ -1768,6 +1763,11 @@ bridge_forward(struct bridge_softc *sc, struct mbuf *m)

         bridge_release_member(sc, bif, &psref);

 +       /*
 +        * Clear any in-bound checksum flags for this packet.
 +        */
 +       m->m_pkthdr.csum_flags = 0;
 +
         ACQUIRE_GLOBAL_LOCKS();
         bridge_enqueue(sc, dst_if, m, 1);
         RELEASE_GLOBAL_LOCKS();
 @@ -1978,6 +1978,12 @@ bridge_broadcast(struct bridge_softc *sc,
 struct ifnet *src_if,
                                 sc->sc_if.if_oerrors++;
                                 goto next;
                         }
 +
 +                       /*
 +                        * Clear any in-bound checksum flags for this packet.
 +                        */
 +                       mc->m_pkthdr.csum_flags = 0;
 +
                         ACQUIRE_GLOBAL_LOCKS();
                         bridge_enqueue(sc, dst_if, mc, 1);

From: Rin Okuyama <rokuyama@rk.phys.keio.ac.jp>
To: Ryota Ozaki <ozaki-r@netbsd.org>, Masanobu SAITOH <msaitoh@execsw.org>
Cc: "gnats-bugs@NetBSD.org" <gnats-bugs@netbsd.org>,
 kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/53562: bridge(4) breaks segmentation / TX checksum
 offloading
Date: Wed, 5 Sep 2018 21:04:16 +0900

 Thank you for your responses!

 With both patches, TX works fine through an interface added to bridge,
 even if TSO is enabled. However, RX from an host which shares the
 interface via bridge does not work for larger packets (because of TSO?).
 RX of smaller packets and TX work for that host.

 I'm not sure if my setup should work in principle...

From: Christoph Badura <bad@bsd.de>
To: Rin Okuyama <rokuyama@rk.phys.keio.ac.jp>
Cc: Ryota Ozaki <ozaki-r@netbsd.org>, Masanobu SAITOH <msaitoh@execsw.org>,
	"gnats-bugs@NetBSD.org" <gnats-bugs@netbsd.org>,
	kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
	netbsd-bugs@netbsd.org
Subject: Re: kern/53562: bridge(4) breaks segmentation / TX checksum
 offloading
Date: Wed, 5 Sep 2018 23:51:37 +0200

 On Wed, Sep 05, 2018 at 09:04:16PM +0900, Rin Okuyama wrote:
 > With both patches, TX works fine through an interface added to bridge,
 > even if TSO is enabled. However, RX from an host which shares the
 > interface via bridge does not work for larger packets (because of TSO?).
 > RX of smaller packets and TX work for that host.

 I would not expect any TSO flags to be set on a packet that is received
 from another host.  The packet should have been segmented on other host
 already.

 > I'm not sure if my setup should work in principle...

 Maybe you could describe your setup in more detail.  Looking at the PR I
 don't understand what exactly your setup is.

 It might be worth going over the original PR kern/27007 again.  Also note
 the second part of PR kern/21831 by yamt@.

 Note that in kern/27007 the damage was actually done only when ipf was
 enabled on the bridge interface.  It might well be that my change was
 not correct because it was to broad.  Maybe ipf has changed since, so that
 packets do not get corrupted anymore in this case, too.

 Looking back at the code I think the comment could have been worded
 better.  If you move it out of bridge_enqueue() it would be nice if the
 comment could be expanded.  Like, what exactly the circumstances are that
 the flags have been set and why they need clearing.

 Maybe we need ATF tests.  That certainly would be nice.  Also we now have
 more test cases to consider: npf and pf active on the bridge.

 --chris

From: Masanobu SAITOH <msaitoh@execsw.org>
To: Christoph Badura <bad@bsd.de>, Rin Okuyama <rokuyama@rk.phys.keio.ac.jp>
Cc: msaitoh@execsw.org, Ryota Ozaki <ozaki-r@netbsd.org>,
 "gnats-bugs@NetBSD.org" <gnats-bugs@netbsd.org>, kern-bug-people@netbsd.org,
 gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/53562: bridge(4) breaks segmentation / TX checksum
 offloading
Date: Fri, 7 Sep 2018 17:50:58 +0900

 On 2018/09/06 6:51, Christoph Badura wrote:
 > On Wed, Sep 05, 2018 at 09:04:16PM +0900, Rin Okuyama wrote:
 >> With both patches, TX works fine through an interface added to bridge,
 >> even if TSO is enabled. However, RX from an host which shares the
 >> interface via bridge does not work for larger packets (because of TSO?).
 >> RX of smaller packets and TX work for that host.
 > 
 > I would not expect any TSO flags to be set on a packet that is received
 > from another host.  The packet should have been segmented on other host
 > already.
 > 
 >> I'm not sure if my setup should work in principle...
 > 
 > Maybe you could describe your setup in more detail.  Looking at the PR I
 > don't understand what exactly your setup is.

 I think so, too. I read if_bridge.c and I think ozaki-r's change should be
 correct. There might be another bug. The detailed configuration is required
 to reproduce the problem.


 > It might be worth going over the original PR kern/27007 again.  Also note
 > the second part of PR kern/21831 by yamt@.
 > 
 > Note that in kern/27007 the damage was actually done only when ipf was
 > enabled on the bridge interface.  It might well be that my change was
 > not correct because it was to broad.  Maybe ipf has changed since, so that
 > packets do not get corrupted anymore in this case, too.
 > 
 > Looking back at the code I think the comment could have been worded
 > better.  If you move it out of bridge_enqueue() it would be nice if the
 > comment could be expanded.  Like, what exactly the circumstances are that
 > the flags have been set and why they need clearing.

 New patch:
 Index: if_bridge.c
 ===================================================================
 RCS file: /cvsroot/src/sys/net/if_bridge.c,v
 retrieving revision 1.156
 diff -u -p -r1.156 if_bridge.c
 --- if_bridge.c	25 May 2018 04:40:27 -0000	1.156
 +++ if_bridge.c	7 Sep 2018 08:47:47 -0000
 @@ -1395,11 +1395,6 @@ bridge_enqueue(struct bridge_softc *sc,
   	int len, error;
   	short mflags;

 -	/*
 -	 * Clear any in-bound checksum flags for this packet.
 -	 */
 -	m->m_pkthdr.csum_flags = 0;
 -
   	if (runfilt) {
   		if (pfil_run_hooks(sc->sc_if.if_pfil, &m,
   		    dst_ifp, PFIL_OUT) != 0) {
 @@ -1768,6 +1763,13 @@ bridge_forward(struct bridge_softc *sc,

   	bridge_release_member(sc, bif, &psref);

 +	/*
 +	 * Before enqueueing this packet to the destination interface,
 +	 * clear any in-bound checksum flags to prevent them from being
 +	 * misused as out-bound flags.
 +	 */
 +	m->m_pkthdr.csum_flags = 0;
 +
   	ACQUIRE_GLOBAL_LOCKS();
   	bridge_enqueue(sc, dst_if, m, 1);
   	RELEASE_GLOBAL_LOCKS();
 @@ -1978,6 +1980,13 @@ bridge_broadcast(struct bridge_softc *sc
   				sc->sc_if.if_oerrors++;
   				goto next;
   			}
 +			/*
 +			 * Before enqueueing this packet to the destination
 +			 * interface, clear any in-bound checksum flags to
 +			 * prevent them from being misused as out-bound flags.
 +			 */
 +			m->m_pkthdr.csum_flags = 0;
 +
   			ACQUIRE_GLOBAL_LOCKS();
   			bridge_enqueue(sc, dst_if, mc, 1);
   			RELEASE_GLOBAL_LOCKS();


 > Maybe we need ATF tests.  That certainly would be nice.  Also we now have
 > more test cases to consider: npf and pf active on the bridge.

 I think adding offload functions into shmif(4) is a good idea to test offload
 stuff.

 > 
 > --chris
 > 


 -- 
 -----------------------------------------------
                  SAITOH Masanobu (msaitoh@execsw.org
                                   msaitoh@netbsd.org)

From: Rin Okuyama <rokuyama@rk.phys.keio.ac.jp>
To: Masanobu SAITOH <msaitoh@execsw.org>, Christoph Badura <bad@bsd.de>
Cc: Ryota Ozaki <ozaki-r@netbsd.org>,
 "gnats-bugs@NetBSD.org" <gnats-bugs@netbsd.org>, kern-bug-people@netbsd.org,
 gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/53562: bridge(4) breaks segmentation / TX checksum
 offloading
Date: Sat, 8 Sep 2018 08:48:04 +0900

 On 2018/09/07 17:50, Masanobu SAITOH wrote:
 > On 2018/09/06 6:51, Christoph Badura wrote:
 >> On Wed, Sep 05, 2018 at 09:04:16PM +0900, Rin Okuyama wrote:
 >>> With both patches, TX works fine through an interface added to bridge,
 >>> even if TSO is enabled. However, RX from an host which shares the
 >>> interface via bridge does not work for larger packets (because of TSO?).
 >>> RX of smaller packets and TX work for that host.
 >>
 >> I would not expect any TSO flags to be set on a packet that is received
 >> from another host.  The packet should have been segmented on other host
 >> already.
 >>
 >>> I'm not sure if my setup should work in principle...
 >>
 >> Maybe you could describe your setup in more detail.  Looking at the PR I
 >> don't understand what exactly your setup is.
 > 
 > I think so, too. I read if_bridge.c and I think ozaki-r's change should be
 > correct. There might be another bug. The detailed configuration is required
 > to reproduce the problem.

 Yeah, sorry for the poor explanation. I'm working on an userland
 application on Raspberry Pi, which emulates an ethernet adapter for
 a client connected via GPIO. Then, it sends and receives packets for
 the client using tap device bridged to a NIC. When TSO is enabled for
 that NIC, the client cannot retrieve file via ftp, for example.

 Well, my setup is hard to reproduce. The following would be a simplest
 example where your patch does not work:

 - NetBSD/amd64-currnet with wm0 connected to LAN

 - qemu-3.0.0nb2 installed from pkgsrc

 - setup tap and bridge as follows:

      # ifconfig tap0 create up
      # ifconfig bridge0 create
      # brconfig bridge0 add wm0 add tap0 up

 - run NetBSD/amd64 8.0 on QEMU with tap enabled:

      # qemu-system-x86_64 -m 128m -cdrom NetBSD-8.0-amd64.iso -boot d \
        -display curses -net tap,fd=3 -net nic 3<>/dev/tap0

 Then, the virtual host can send and receive small packets, e.g.,
 ping works well regardless of whether TSO is enabled or not for wm0.
 However, it cannot receive larger packets when TSO is enabled, e.g.,
 file cannot be retrieved through ftp. If TSO is turned off, everything
 works fine.

 >> It might be worth going over the original PR kern/27007 again.  Also note
 >> the second part of PR kern/21831 by yamt@.

 The second part of kern/21831 still affects at least for tap?

 >> Maybe we need ATF tests.  That certainly would be nice.  Also we now have
 >> more test cases to consider: npf and pf active on the bridge.
 > 
 > I think adding offload functions into shmif(4) is a good idea to test offload
 > stuff.

 Well, I'm working on offload engine for shmif(4). The application
 mentioned above emulates TSO and TX/RX checksum offload by software.
 The algorithm, or at least ideas, can be easily applied to shmif(4).

 Thanks,
 rin

From: Masanobu SAITOH <msaitoh@execsw.org>
To: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org,
 gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, rokuyama@rk.phys.keio.ac.jp
Cc: msaitoh@execsw.org
Subject: Re: kern/53562: bridge(4) breaks segmentation / TX checksum
 offloading
Date: Mon, 10 Sep 2018 16:34:38 +0900

 Hi.

 On 2018/09/08 8:50, Rin Okuyama wrote:
 > The following reply was made to PR kern/53562; it has been noted by GNATS.
 > 
 > From: Rin Okuyama <rokuyama@rk.phys.keio.ac.jp>
 > To: Masanobu SAITOH <msaitoh@execsw.org>, Christoph Badura <bad@bsd.de>
 > Cc: Ryota Ozaki <ozaki-r@netbsd.org>,
 >   "gnats-bugs@NetBSD.org" <gnats-bugs@netbsd.org>, kern-bug-people@netbsd.org,
 >   gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
 > Subject: Re: kern/53562: bridge(4) breaks segmentation / TX checksum
 >   offloading
 > Date: Sat, 8 Sep 2018 08:48:04 +0900
 > 
 >   On 2018/09/07 17:50, Masanobu SAITOH wrote:
 >   > On 2018/09/06 6:51, Christoph Badura wrote:
 >   >> On Wed, Sep 05, 2018 at 09:04:16PM +0900, Rin Okuyama wrote:
 >   >>> With both patches, TX works fine through an interface added to bridge,
 >   >>> even if TSO is enabled. However, RX from an host which shares the
 >   >>> interface via bridge does not work for larger packets (because of TSO?).
 >   >>> RX of smaller packets and TX work for that host.
 >   >>
 >   >> I would not expect any TSO flags to be set on a packet that is received
 >   >> from another host.  The packet should have been segmented on other host
 >   >> already.
 >   >>
 >   >>> I'm not sure if my setup should work in principle...
 >   >>
 >   >> Maybe you could describe your setup in more detail.  Looking at the PR I
 >   >> don't understand what exactly your setup is.
 >   >
 >   > I think so, too. I read if_bridge.c and I think ozaki-r's change should be
 >   > correct. There might be another bug. The detailed configuration is required
 >   > to reproduce the problem.
 >   
 >   Yeah, sorry for the poor explanation. I'm working on an userland
 >   application on Raspberry Pi, which emulates an ethernet adapter for
 >   a client connected via GPIO. Then, it sends and receives packets for
 >   the client using tap device bridged to a NIC. When TSO is enabled for
 >   that NIC, the client cannot retrieve file via ftp, for example.
 >   
 >   Well, my setup is hard to reproduce. The following would be a simplest
 >   example where your patch does not work:
 >   
 >   - NetBSD/amd64-currnet with wm0 connected to LAN
 >   
 >   - qemu-3.0.0nb2 installed from pkgsrc
 >   
 >   - setup tap and bridge as follows:
 >   
 >        # ifconfig tap0 create up
 >        # ifconfig bridge0 create
 >        # brconfig bridge0 add wm0 add tap0 up
 >   
 >   - run NetBSD/amd64 8.0 on QEMU with tap enabled:
 >   
 >        # qemu-system-x86_64 -m 128m -cdrom NetBSD-8.0-amd64.iso -boot d \
 >          -display curses -net tap,fd=3 -net nic 3<>/dev/tap0
 >   
 >   Then, the virtual host can send and receive small packets, e.g.,
 >   ping works well regardless of whether TSO is enabled or not for wm0.

   0)
   Is this "wm0" on the guest or the host?
   If my understanding is correct, writing packet via tap on host is not related
   to TSO...

   1)
   If you're using wm(4) on the guest, please try vioif(4) and see if the problem
 occurs or not.

   2)
   For checksum oofloading, the following kernel options will help for debugging:
 	INET_CSUM_COUNTERS
 	TCP_CSUM_COUNTERS
 	UDP_CSUM_COUNTERS

   These options add some event counters:
 > inet swcsum                                                  0    0 misc
 > inet hwcsum ok                                          419667   69 misc
 > inet hwcsum bad                                              0    0 misc
 > tcp6 swcsum                                                  0    0 misc
 > tcp6 hwcsum data                                             0    0 misc
 > tcp6 hwcsum ok                                               0    0 misc
 > tcp6 hwcsum bad                                              0    0 misc
 > tcp swcsum                                                   0    0 misc
 > tcp hwcsum data                                              0    0 misc
 > tcp hwcsum ok                                           419430   69 misc
 > tcp hwcsum bad                                               0    0 misc
 > udp swcsum                                                   0    0 misc
 > udp hwcsum data                                              0    0 misc
 > udp hwcsum ok                                              237    0 misc
 > udp hwcsum bad                                               0    0 misc
 > udp6 swcsum                                                  0    0 misc
 > udp6 hwcsum data                                             0    0 misc
 > udp6 hwcsum ok                                               0    0 misc
 > udp6 hwcsum bad                                              0    0 misc

   3)
   wm(4) has a lot of event counters, some of them are useful for debugging.
   To use them add "options WM_EVENT_COUNTERS" to your kernel config file.

   4)
   For TSO, current wm(4) is not perfect because it doesn't use m_defrag().
   I have not-yet-committed patch:

 	http://www.netbsd.org/~msaitoh/wm-defrag-20180906-0.dif

 > % vmstat -ev | grep toomany
 > wm0 txq00txtoomanyseg                                      0    0 misc

   If "wmX txqYYtxtoomanyseg" is increasing, the above diff would fix the
 problem.

 >   However, it cannot receive larger packets when TSO is enabled, e.g.,
 >   file cannot be retrieved through ftp. If TSO is turned off, everything
 >   works fine.
 >   
 >   >> It might be worth going over the original PR kern/27007 again.  Also note
 >   >> the second part of PR kern/21831 by yamt@.
 >   
 >   The second part of kern/21831 still affects at least for tap?
 >   
 >   >> Maybe we need ATF tests.  That certainly would be nice.  Also we now have
 >   >> more test cases to consider: npf and pf active on the bridge.
 >   >
 >   > I think adding offload functions into shmif(4) is a good idea to test offload
 >   > stuff.
 >   
 >   Well, I'm working on offload engine for shmif(4). The application
 >   mentioned above emulates TSO and TX/RX checksum offload by software.
 >   The algorithm, or at least ideas, can be easily applied to shmif(4).
 >   
 >   Thanks,
 >   rin
 >   
 > 


 -- 
 -----------------------------------------------
                  SAITOH Masanobu (msaitoh@execsw.org
                                   msaitoh@netbsd.org)

From: Rin Okuyama <rokuyama@rk.phys.keio.ac.jp>
To: Masanobu SAITOH <msaitoh@execsw.org>, gnats-bugs@NetBSD.org,
 kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Cc: 
Subject: Re: kern/53562: bridge(4) breaks segmentation / TX checksum
 offloading
Date: Wed, 12 Sep 2018 21:33:20 +0900

 Hi, sorry for the delay in the response.

 On 2018/09/10 16:34, Masanobu SAITOH wrote:
 ...
 > On 2018/09/08 8:50, Rin Okuyama wrote:
 ...
 >>   The following would be a simplest
 >>   example where your patch does not work:
 >>   - NetBSD/amd64-currnet with wm0 connected to LAN
 >>   - qemu-3.0.0nb2 installed from pkgsrc
 >>   - setup tap and bridge as follows:
 >>        # ifconfig tap0 create up
 >>        # ifconfig bridge0 create
 >>        # brconfig bridge0 add wm0 add tap0 up
 >>   - run NetBSD/amd64 8.0 on QEMU with tap enabled:
 >>        # qemu-system-x86_64 -m 128m -cdrom NetBSD-8.0-amd64.iso -boot d \
 >>          -display curses -net tap,fd=3 -net nic 3<>/dev/tap0
 >>   Then, the virtual host can send and receive small packets, e.g.,
 >>   ping works well regardless of whether TSO is enabled or not for wm0.
 [quotation added by RO from the previous message of mine]
 >>   However, it cannot receive larger packets when TSO is enabled, e.g.,
 >>   file cannot be retrieved through ftp. If TSO is turned off, everything
 >>   works fine. 
 [quotation done]
 > 
 >   0)
 >   Is this "wm0" on the guest or the host?
 >   If my understanding is correct, writing packet via tap on host is not related
 >   to TSO...

 Sorry for bothering you many times. This "wm0" is for the host. Also,
 in this example, the guest fails to receive data from the ftp server
 working on the host. I forgot to test servers other than the host. It
 worked fine; only connecting from the guest to host fails.

 >   1)
 >   If you're using wm(4) on the guest, please try vioif(4) and see if the problem
 > occurs or not.

 I tried vioif(4) on the guest but the situation does not change.

 >   2)
 >   For checksum oofloading, the following kernel options will help for debugging:
 >      INET_CSUM_COUNTERS
 >      TCP_CSUM_COUNTERS
 >      UDP_CSUM_COUNTERS
 > 
 >   These options add some event counters:
 ...
 > 
 >   3)
 >   wm(4) has a lot of event counters, some of them are useful for debugging.
 >   To use them add "options WM_EVENT_COUNTERS" to your kernel config file.

 Thanks! I enabled these options.

 >   4)
 >   For TSO, current wm(4) is not perfect because it doesn't use m_defrag().
 >   I have not-yet-committed patch:
 > 
 >      http://www.netbsd.org/~msaitoh/wm-defrag-20180906-0.dif
 > 
 >> % vmstat -ev | grep toomany
 >> wm0 txq00txtoomanyseg                                      0    0 misc
 > 
 >   If "wmX txqYYtxtoomanyseg" is increasing, the above diff would fix the
 > problem.

 I tried if_wm.c r1.587, which already contains this patch. Counter
 "wmX txqYYtxtoomanyseg" was not increased.

 I examined how packets flow in the example above, from the ftp server
 on host (wm0 on host) to the ftp client on guest (tap0 on host ->
 vioif0 on guest):

 tcp_output()
 -> ip_output()
 -> ip_if_output()
 -> if_output_lock()
 -> ifp->if_output = ether_output()
 -> bridge_output()
 -> bridge_enqueue()

 When TSO4 is enabled for wm0 on the host, tcp_output() attempts to send
 packets larger than MTU. If the destination interface is wm0, i.e.,
 dst_ifp == ifp in bridge_output(), there's no problem with your patch
 for if_bridge.c; packets are segmented automatically by the HW. However,
 in this case, the destination is tap0, and bridge_enqueue() send packets
 larger than MTU for tap0. This is, of course, illegal, since there is no
 HW by which packets are segmented. The guest on QEMU, therefore, cannot
 receive packets. This is the scenario how the failure occurs in my
 example of QEMU above.

 The similar problems occur when one of other TX offload options, TSO6,
 IP4CSUM-TX, TCPnCSUM-TX, or UDPnCSUM-TX, are enabled. For example,
 consider the case of IP4CSUM-TX. When this option is enabled for wm0
 on the host, checksumming is omitted in ip_output(). Then, when the
 destination is tap0, packets with wrong checksum are sent.

 This patch disable TX offload when the interface is added to bridge:

 http://www.netbsd.org/~rin/disable_tx_offload_when_bridged_20180912.patch

 With this patch, the guest on QEMU can retrieve files from the host
 without problem, even if TX offload options are enabled, which supports
 the discussion above. No need to modify if_bridge.c from the original
 (r1.156) in this case.

 I don't know this patch is preferable; there should be better ways to
 handle the problem. Packets need to be segmented or checksummed only
 if the destination in bridge_output() is different from the original
 interface. I will examine further if needed.

 Thanks,
 rin

From: Masanobu SAITOH <msaitoh@execsw.org>
To: Rin Okuyama <rokuyama@rk.phys.keio.ac.jp>, gnats-bugs@NetBSD.org,
 kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Cc: msaitoh@execsw.org
Subject: Re: kern/53562: bridge(4) breaks segmentation / TX checksum
 offloading
Date: Fri, 14 Sep 2018 20:17:15 +0900

 On 2018/09/12 21:33, Rin Okuyama wrote:
 > Hi, sorry for the delay in the response.
 > 
 > On 2018/09/10 16:34, Masanobu SAITOH wrote:
 > ...
 >> On 2018/09/08 8:50, Rin Okuyama wrote:
 > ...
 >>>   The following would be a simplest
 >>>   example where your patch does not work:
 >>>   - NetBSD/amd64-currnet with wm0 connected to LAN
 >>>   - qemu-3.0.0nb2 installed from pkgsrc
 >>>   - setup tap and bridge as follows:
 >>>        # ifconfig tap0 create up
 >>>        # ifconfig bridge0 create
 >>>        # brconfig bridge0 add wm0 add tap0 up
 >>>   - run NetBSD/amd64 8.0 on QEMU with tap enabled:
 >>>        # qemu-system-x86_64 -m 128m -cdrom NetBSD-8.0-amd64.iso -boot d \
 >>>          -display curses -net tap,fd=3 -net nic 3<>/dev/tap0
 >>>   Then, the virtual host can send and receive small packets, e.g.,
 >>>   ping works well regardless of whether TSO is enabled or not for wm0.
 > [quotation added by RO from the previous message of mine]
 >>>   However, it cannot receive larger packets when TSO is enabled, e.g.,
 >>>   file cannot be retrieved through ftp. If TSO is turned off, everything
 >>>   works fine.
 > [quotation done]
 >>
 >>   0)
 >>   Is this "wm0" on the guest or the host?
 >>   If my understanding is correct, writing packet via tap on host is not related
 >>   to TSO...
 > 

 I realized that this problem is not only for TSO but for all outgoing offload flags.


 > Sorry for bothering you many times. This "wm0" is for the host. Also,
 > in this example, the guest fails to receive data from the ftp server
 > working on the host. I forgot to test servers other than the host. It
 > worked fine; only connecting from the guest to host fails.

 I see. I could reproduce the same problem with Xen dom0 & domU by adding only ip4csum
 flag to wm0.

 >>   1)
 >>   If you're using wm(4) on the guest, please try vioif(4) and see if the problem
 >> occurs or not.
 > 
 > I tried vioif(4) on the guest but the situation does not change.
 > 
 >>   2)
 >>   For checksum oofloading, the following kernel options will help for debugging:
 >>      INET_CSUM_COUNTERS
 >>      TCP_CSUM_COUNTERS
 >>      UDP_CSUM_COUNTERS
 >>
 >>   These options add some event counters:
 > ...
 >>
 >>   3)
 >>   wm(4) has a lot of event counters, some of them are useful for debugging.
 >>   To use them add "options WM_EVENT_COUNTERS" to your kernel config file.
 > 
 > Thanks! I enabled these options.
 > 
 >>   4)
 >>   For TSO, current wm(4) is not perfect because it doesn't use m_defrag().
 >>   I have not-yet-committed patch:
 >>
 >>      http://www.netbsd.org/~msaitoh/wm-defrag-20180906-0.dif
 >>
 >>> % vmstat -ev | grep toomany
 >>> wm0 txq00txtoomanyseg                                      0    0 misc
 >>
 >>   If "wmX txqYYtxtoomanyseg" is increasing, the above diff would fix the
 >> problem.
 > 
 > I tried if_wm.c r1.587, which already contains this patch. Counter
 > "wmX txqYYtxtoomanyseg" was not increased.
 > 
 > I examined how packets flow in the example above, from the ftp server
 > on host (wm0 on host) to the ftp client on guest (tap0 on host ->
 > vioif0 on guest):
 > 
 > tcp_output()
 > -> ip_output()
 > -> ip_if_output()
 > -> if_output_lock()
 > -> ifp->if_output = ether_output()
 > -> bridge_output()
 > -> bridge_enqueue()
 > 
 > When TSO4 is enabled for wm0 on the host, tcp_output() attempts to send
 > packets larger than MTU. If the destination interface is wm0, i.e.,
 > dst_ifp == ifp in bridge_output(), there's no problem with your patch
 > for if_bridge.c; packets are segmented automatically by the HW. However,
 > in this case, the destination is tap0, and bridge_enqueue() send packets
 > larger than MTU for tap0. This is, of course, illegal, since there is no
 > HW by which packets are segmented. The guest on QEMU, therefore, cannot
 > receive packets. This is the scenario how the failure occurs in my
 > example of QEMU above.
 > 
 > The similar problems occur when one of other TX offload options, TSO6,
 > IP4CSUM-TX, TCPnCSUM-TX, or UDPnCSUM-TX, are enabled. For example,
 > consider the case of IP4CSUM-TX. When this option is enabled for wm0
 > on the host, checksumming is omitted in ip_output(). Then, when the
 > destination is tap0, packets with wrong checksum are sent.
 > 
 > This patch disable TX offload when the interface is added to bridge:
 > 
 > http://www.netbsd.org/~rin/disable_tx_offload_when_bridged_20180912.patch

 I don't know if it's good to doing in L3...

 It seems that both FreeBSD and DragonFly clear all bridge member's outgoing
 offload flags when adding a member to a bridge. I also don't know if
 this is good solution or not...

   My diff have been committed as if_bridge.c rev. 1.156 now.

 > Module Name:	src
 > Committed By:	msaitoh
 > Date:		Fri Sep 14 11:05:09 UTC 2018
 > 
 > Modified Files:
 > 	src/sys/net: if_bridge.c
 > 
 > Log Message:
 >  Fix a bug that bridge_enqueue() incorrectly cleared outgoing packet's offload
 > flags. bridge_enqueue() is called from bridge_output() when a packet is
 > spontaneous. Clear csum_flags before calling brige_enqueue() in
 > bridge_forward() or bridge_broadcast() instead of in the beginning of
 > bridge_enqueue().
 > 
 > Note that this change doesn't fix a problem on the following configuration:
 > 
 > 	A bridge has two or more interfaces.
 > 
 > 	An address is assigned to an bridge member interface and
 > 	some offload flags are set.
 > 
 > 	Another interface has no address and has no any offload flag.
 > 
 > XXX pullup-[78]
 > 
 > 
 > To generate a diff of this commit:
 > cvs rdiff -u -r1.156 -r1.157 src/sys/net/if_bridge.c



 > With this patch, the guest on QEMU can retrieve files from the host
 > without problem, even if TX offload options are enabled, which supports
 > the discussion above. No need to modify if_bridge.c from the original
 > (r1.156) in this case.
 > 
 > I don't know this patch is preferable; there should be better ways to
 > handle the problem. Packets need to be segmented or checksummed only
 > if the destination in bridge_output() is different from the original
 > interface. I will examine further if needed.
 > 
 > Thanks,
 > rin


 -- 
 -----------------------------------------------
                  SAITOH Masanobu (msaitoh@execsw.org
                                   msaitoh@netbsd.org)

From: Masanobu SAITOH <msaitoh@execsw.org>
To: Rin Okuyama <rokuyama@rk.phys.keio.ac.jp>, gnats-bugs@NetBSD.org,
 kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Cc: msaitoh@execsw.org
Subject: Re: kern/53562: bridge(4) breaks segmentation / TX checksum
 offloading
Date: Fri, 14 Sep 2018 20:26:20 +0900

 On 2018/09/14 20:17, Masanobu SAITOH wrote:
 > On 2018/09/12 21:33, Rin Okuyama wrote:
 >> Hi, sorry for the delay in the response.
 >>
 >> On 2018/09/10 16:34, Masanobu SAITOH wrote:
 >> ...
 >>> On 2018/09/08 8:50, Rin Okuyama wrote:
 >> ...
 >>>>   The following would be a simplest
 >>>>   example where your patch does not work:
 >>>>   - NetBSD/amd64-currnet with wm0 connected to LAN
 >>>>   - qemu-3.0.0nb2 installed from pkgsrc
 >>>>   - setup tap and bridge as follows:
 >>>>        # ifconfig tap0 create up
 >>>>        # ifconfig bridge0 create
 >>>>        # brconfig bridge0 add wm0 add tap0 up
 >>>>   - run NetBSD/amd64 8.0 on QEMU with tap enabled:
 >>>>        # qemu-system-x86_64 -m 128m -cdrom NetBSD-8.0-amd64.iso -boot d \
 >>>>          -display curses -net tap,fd=3 -net nic 3<>/dev/tap0
 >>>>   Then, the virtual host can send and receive small packets, e.g.,
 >>>>   ping works well regardless of whether TSO is enabled or not for wm0.
 >> [quotation added by RO from the previous message of mine]
 >>>>   However, it cannot receive larger packets when TSO is enabled, e.g.,
 >>>>   file cannot be retrieved through ftp. If TSO is turned off, everything
 >>>>   works fine.
 >> [quotation done]
 >>>
 >>>   0)
 >>>   Is this "wm0" on the guest or the host?
 >>>   If my understanding is correct, writing packet via tap on host is not related
 >>>   to TSO...
 >>
 > 
 > I realized that this problem is not only for TSO but for all outgoing offload flags.
 > 
 > 
 >> Sorry for bothering you many times. This "wm0" is for the host. Also,
 >> in this example, the guest fails to receive data from the ftp server
 >> working on the host. I forgot to test servers other than the host. It
 >> worked fine; only connecting from the guest to host fails.
 > 
 > I see. I could reproduce the same problem with Xen dom0 & domU by adding only ip4csum
 > flag to wm0.
 > 
 >>>   1)
 >>>   If you're using wm(4) on the guest, please try vioif(4) and see if the problem
 >>> occurs or not.
 >>
 >> I tried vioif(4) on the guest but the situation does not change.
 >>
 >>>   2)
 >>>   For checksum oofloading, the following kernel options will help for debugging:
 >>>      INET_CSUM_COUNTERS
 >>>      TCP_CSUM_COUNTERS
 >>>      UDP_CSUM_COUNTERS
 >>>
 >>>   These options add some event counters:
 >> ...
 >>>
 >>>   3)
 >>>   wm(4) has a lot of event counters, some of them are useful for debugging.
 >>>   To use them add "options WM_EVENT_COUNTERS" to your kernel config file.
 >>
 >> Thanks! I enabled these options.
 >>
 >>>   4)
 >>>   For TSO, current wm(4) is not perfect because it doesn't use m_defrag().
 >>>   I have not-yet-committed patch:
 >>>
 >>>      http://www.netbsd.org/~msaitoh/wm-defrag-20180906-0.dif
 >>>
 >>>> % vmstat -ev | grep toomany
 >>>> wm0 txq00txtoomanyseg                                      0    0 misc
 >>>
 >>>   If "wmX txqYYtxtoomanyseg" is increasing, the above diff would fix the
 >>> problem.
 >>
 >> I tried if_wm.c r1.587, which already contains this patch. Counter
 >> "wmX txqYYtxtoomanyseg" was not increased.
 >>
 >> I examined how packets flow in the example above, from the ftp server
 >> on host (wm0 on host) to the ftp client on guest (tap0 on host ->
 >> vioif0 on guest):
 >>
 >> tcp_output()
 >> -> ip_output()
 >> -> ip_if_output()
 >> -> if_output_lock()
 >> -> ifp->if_output = ether_output()
 >> -> bridge_output()
 >> -> bridge_enqueue()
 >>
 >> When TSO4 is enabled for wm0 on the host, tcp_output() attempts to send
 >> packets larger than MTU. If the destination interface is wm0, i.e.,
 >> dst_ifp == ifp in bridge_output(), there's no problem with your patch
 >> for if_bridge.c; packets are segmented automatically by the HW. However,
 >> in this case, the destination is tap0, and bridge_enqueue() send packets
 >> larger than MTU for tap0. This is, of course, illegal, since there is no
 >> HW by which packets are segmented. The guest on QEMU, therefore, cannot
 >> receive packets. This is the scenario how the failure occurs in my
 >> example of QEMU above.
 >>
 >> The similar problems occur when one of other TX offload options, TSO6,
 >> IP4CSUM-TX, TCPnCSUM-TX, or UDPnCSUM-TX, are enabled. For example,
 >> consider the case of IP4CSUM-TX. When this option is enabled for wm0
 >> on the host, checksumming is omitted in ip_output(). Then, when the
 >> destination is tap0, packets with wrong checksum are sent.
 >>
 >> This patch disable TX offload when the interface is added to bridge:
 >>
 >> http://www.netbsd.org/~rin/disable_tx_offload_when_bridged_20180912.patch
 > 
 > I don't know if it's good to doing in L3...
 > 
 > It seems that both FreeBSD and DragonFly clear all bridge member's outgoing
 > offload flags when adding a member to a bridge. I also don't know if
 > this is good solution or not...
 > 
 >   My diff have been committed as if_bridge.c rev. 1.156 now.
 > 
 >> Module Name:    src
 >> Committed By:    msaitoh
 >> Date:        Fri Sep 14 11:05:09 UTC 2018
 >>
 >> Modified Files:
 >>     src/sys/net: if_bridge.c
 >>
 >> Log Message:
 >>  Fix a bug that bridge_enqueue() incorrectly cleared outgoing packet's offload
 >> flags. bridge_enqueue() is called from bridge_output() when a packet is
 >> spontaneous. Clear csum_flags before calling brige_enqueue() in
 >> bridge_forward() or bridge_broadcast() instead of in the beginning of
 >> bridge_enqueue().
 >>
 >> Note that this change doesn't fix a problem on the following configuration:
 >>
 >>     A bridge has two or more interfaces.
 >>
 >>     An address is assigned to an bridge member interface and
 >>     some offload flags are set.
 >>
 >>     Another interface has no address and has no any offload flag.
 >>
 >> XXX pullup-[78]
 >>
 >>
 >> To generate a diff of this commit:
 >> cvs rdiff -u -r1.156 -r1.157 src/sys/net/if_bridge.c
 > 
 > 
 > 
 >> With this patch, the guest on QEMU can retrieve files from the host
 >> without problem, even if TX offload options are enabled, which supports
 >> the discussion above. No need to modify if_bridge.c from the original
 >> (r1.156) in this case.
 >>
 >> I don't know this patch is preferable; there should be better ways to
 >> handle the problem. Packets need to be segmented or checksummed only
 >> if the destination in bridge_output() is different from the original
 >> interface. I will examine further if needed.
 >>
 >> Thanks,
 >> rin
 > 
 > 

 FYI:
 	http://www.netbsd.org/~msaitoh/packetflow017.pdf

 -- 
 -----------------------------------------------
                  SAITOH Masanobu (msaitoh@execsw.org
                                   msaitoh@netbsd.org)

From: Rin Okuyama <rokuyama@rk.phys.keio.ac.jp>
To: Masanobu SAITOH <msaitoh@execsw.org>, gnats-bugs@NetBSD.org,
 kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Cc: 
Subject: Re: kern/53562: bridge(4) breaks segmentation / TX checksum
 offloading
Date: Sun, 16 Sep 2018 22:49:24 +0900

 On 2018/09/14 20:17, Masanobu SAITOH wrote:
 > On 2018/09/12 21:33, Rin Okuyama wrote:
 ...
 >> This patch disable TX offload when the interface is added to bridge:
 >>
 >> http://www.netbsd.org/~rin/disable_tx_offload_when_bridged_20180912.patch
 > 
 > I don't know if it's good to doing in L3...
 > 
 > It seems that both FreeBSD and DragonFly clear all bridge member's outgoing
 > offload flags when adding a member to a bridge. I also don't know if
 > this is good solution or not...

 If I understand correctly, since TX offload is for L3 (and L4), we
 inevitably need something to do in L3...

 I implemented software TX/RX offload engines for ethernet:

    http://www.netbsd.org/~rin/ether_offload_20180916.patch

 Using these routines, I wrote a patch where TX offloading is canceled in
 bridge_output() when the destination interface is different from the
 source interface:

    http://www.netbsd.org/~rin/bridge_tx_offload_20180916.patch

 With this patch, the guest successfully communicates with the host even
 if any TX offloading is enabled on the host side.

 Also, I added TX/RX offloading support to shmif(4):

    http://www.netbsd.org/~rin/shmif_offload_20180916.patch

 Since I'm not familiar with rump, this patch has not been tested
 systematically yet. But it seems to work to some extent...

 Thanks,
 rin

From: Rin Okuyama <rokuyama@rk.phys.keio.ac.jp>
To: Masanobu SAITOH <msaitoh@execsw.org>, gnats-bugs@NetBSD.org,
 kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Cc: 
Subject: Re: kern/53562: bridge(4) breaks segmentation / TX checksum
 offloading
Date: Mon, 17 Sep 2018 19:36:47 +0900

 On 2018/09/16 22:49, Rin Okuyama wrote:
 > I implemented software TX/RX offload engines for ethernet:
 > 
 >    http://www.netbsd.org/~rin/ether_offload_20180916.patch
 > 
 > Using these routines, I wrote a patch where TX offloading is canceled in
 > bridge_output() when the destination interface is different from the
 > source interface:
 > 
 >    http://www.netbsd.org/~rin/bridge_tx_offload_20180916.patch
 > 
 > With this patch, the guest successfully communicates with the host even
 > if any TX offloading is enabled on the host side.
 > 
 > Also, I added TX/RX offloading support to shmif(4):
 > 
 >    http://www.netbsd.org/~rin/shmif_offload_20180916.patch
 > 
 > Since I'm not familiar with rump, this patch has not been tested
 > systematically yet. But it seems to work to some extent...

 I fixed a bug in the patch for ethernet:

      http://www.netbsd.org/~rin/ether_offload_20180917.patch

 Other two patches are not updated:

      http://www.netbsd.org/~rin/bridge_tx_offload_20180916.patch
      http://www.netbsd.org/~rin/shmif_offload_20180916.patch

 Thanks,
 rin

From: Rin Okuyama <rokuyama@rk.phys.keio.ac.jp>
To: "gnats-bugs@NetBSD.org" <gnats-bugs@NetBSD.org>
Cc: 
Subject: Re: kern/53562: bridge(4) breaks segmentation / TX checksum
 offloading
Date: Tue, 4 Dec 2018 09:32:45 +0900

 First, let me summarize the current status of the problem. Most problems
 have been resolved by msaito. The remaining problem is that transmission
 from interface if0 to interface if1 fails when any TX offload option is
 enabled for if0 but disabled for if1.

 We need handle TX offload in software in bridge_output() when some TX
 offload option is enabled for a packet and it is not available for the
 destination interface. The attached patches deal with this as follows.

 For unicast:

 (1) When the destination interface is the same as source interface, we
      do not need to handle offload options; send a packet as is.

 (2) When the destination is different from the source, check whether TX
      offload options specified in a packet is supported by the
      destination. If so, we can send it as is. Otherwise, handle these
      options in software.

 For multicast/broadcast:

 (3) Send a packet as is if all the members of that bridge support given
      TX offload options. Otherwise, handle these options in software.

 For (3) we need calculate logical AND b/w capabilities of TX offload
 options in member interface (ifp->if_csum_flags_tx). I've added
 sc_csum_flags_tx flag to bridge softc for this purpose. This flag is
 updated when a member is (i) added to or (ii) removed from a bridge, or
 (iii) if_csum_flags_tx flag of a member interface is manipulated via
 ifconfig(8).

 Also, currently, if_csum_flags_tx do not contains M_CSUM_TSOv[46]. I've
 added these flags when TSOv[46] is enabled for that interface. As far
 as I can see, no one seems to depend on the old behavior.

 I've separated the patch into four parts for readability.

 (a) http://www.netbsd.org/~rin/bridge_20181204/ether_sw_offload_20181204.patch

 This patch provides routines to handle TX/RX offload options in software
 for ethernet. Since this breaks separation between L2 and L3/L4, I've
 added new files (ether_sw_offload.[ch]) rather than adding the routines
 to existing files.

 (b) http://www.netbsd.org/~rin/bridge_20181204/bridge_sw_offload_20181204.patch

 This is main patch to deal with the problem described above by using (a).

 (c) http://www.netbsd.org/~rin/bridge_20181204/if_shmem_sw_offload_20181204.patch

 This patch adds TX/RX offload support for shmemif(4) using (a). This may
 be useful for testing offload related codes in the ATF framework. This
 patch also contains a test code written by msaito; if_capabilities for
 shmemif(4) can be specified by environmental variable
 RUMP_SHMIF_CAPENABLE:

    setenv RUMP_SHMIF_CAPENABLE 0x7ff80 (all offload)
    setenv RUMP_SHMIF_CAPENABLE 0x6aa80 (all TX)
    setenv RUMP_SHMIF_CAPENABLE 0x15500 (all RX)

 for definition of flags, see sys/net/if.h:

    https://nxr.netbsd.org/xref/src/sys/net/if.h#591

 I've tested all tests in /usr/tests passed with all offload options
 enabled (RUMP_SHMIF_CAPENABLE=0x7ff80).

 (d) http://www.netbsd.org/~rin/bridge_20181204/simplify_ip_output_20181204.patch

 This patch simplify logics in ip{,6}_output() by using M_CSUM_TSOv[46]
 bits in ifp->if_csum_flags_tx.

 Any comments and suggestions are welcomed. I thank msaito for fruitful
 discussion!

 rin

From: Rin Okuyama <rokuyama@rk.phys.keio.ac.jp>
To: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org,
 gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Cc: 
Subject: Re: kern/53562: bridge(4) breaks segmentation / TX checksum
 offloading
Date: Tue, 11 Dec 2018 17:38:46 +0900

 On 2018/12/04 9:35, Rin Okuyama wrote:
 >   First, let me summarize the current status of the problem. Most problems
 >   have been resolved by msaito. The remaining problem is that transmission
 >   from interface if0 to interface if1 fails when any TX offload option is
 >   enabled for if0 but disabled for if1.
 >   
 >   We need handle TX offload in software in bridge_output() when some TX
 >   offload option is enabled for a packet and it is not available for the
 >   destination interface. The attached patches deal with this as follows.
 >   
 >   For unicast:
 >   
 >   (1) When the destination interface is the same as source interface, we
 >        do not need to handle offload options; send a packet as is.
 >   
 >   (2) When the destination is different from the source, check whether TX
 >        offload options specified in a packet is supported by the
 >        destination. If so, we can send it as is. Otherwise, handle these
 >        options in software.
 >   
 >   For multicast/broadcast:
 >   
 >   (3) Send a packet as is if all the members of that bridge support given
 >        TX offload options. Otherwise, handle these options in software.
 >   
 >   For (3) we need calculate logical AND b/w capabilities of TX offload
 >   options in member interface (ifp->if_csum_flags_tx). I've added
 >   sc_csum_flags_tx flag to bridge softc for this purpose. This flag is
 >   updated when a member is (i) added to or (ii) removed from a bridge, or
 >   (iii) if_csum_flags_tx flag of a member interface is manipulated via
 >   ifconfig(8).
 >   
 >   Also, currently, if_csum_flags_tx do not contains M_CSUM_TSOv[46]. I've
 >   added these flags when TSOv[46] is enabled for that interface. As far
 >   as I can see, no one seems to depend on the old behavior.
 >   
 >   I've separated the patch into four parts for readability.
 >   
 >   (a) http://www.netbsd.org/~rin/bridge_20181204/ether_sw_offload_20181204.patch
 >   
 >   This patch provides routines to handle TX/RX offload options in software
 >   for ethernet. Since this breaks separation between L2 and L3/L4, I've
 >   added new files (ether_sw_offload.[ch]) rather than adding the routines
 >   to existing files.
 >   
 >   (b) http://www.netbsd.org/~rin/bridge_20181204/bridge_sw_offload_20181204.patch
 >   
 >   This is main patch to deal with the problem described above by using (a).
 >   
 >   (c) http://www.netbsd.org/~rin/bridge_20181204/if_shmem_sw_offload_20181204.patch
 >   
 >   This patch adds TX/RX offload support for shmemif(4) using (a). This may
 >   be useful for testing offload related codes in the ATF framework. This
 >   patch also contains a test code written by msaito; if_capabilities for
 >   shmemif(4) can be specified by environmental variable
 >   RUMP_SHMIF_CAPENABLE:
 >   
 >      setenv RUMP_SHMIF_CAPENABLE 0x7ff80 (all offload)
 >      setenv RUMP_SHMIF_CAPENABLE 0x6aa80 (all TX)
 >      setenv RUMP_SHMIF_CAPENABLE 0x15500 (all RX)
 >   
 >   for definition of flags, see sys/net/if.h:
 >   
 >      https://nxr.netbsd.org/xref/src/sys/net/if.h#591
 >   
 >   I've tested all tests in /usr/tests passed with all offload options
 >   enabled (RUMP_SHMIF_CAPENABLE=0x7ff80).
 >   
 >   (d) http://www.netbsd.org/~rin/bridge_20181204/simplify_ip_output_20181204.patch
 >   
 >   This patch simplify logics in ip{,6}_output() by using M_CSUM_TSOv[46]
 >   bits in ifp->if_csum_flags_tx.
 >   
 >   Any comments and suggestions are welcomed. I thank msaito for fruitful
 >   discussion!
 >   
 >   rin

 I will commit this soon if there's no objection (OKed by msaitoh).

 The patch (a) above adds new public functions, tcp[46]_segment(), that
 are used from bridge(4) via ether_sw_offload_tx(). Should I bump
 __NetBSD_Version__ although bridge(4) does not have its module?

 Thanks,
 rin

From: Jason Thorpe <thorpej@me.com>
To: "gnats-bugs@netbsd.org" <gnats-bugs@NetBSD.org>
Cc: kern-bug-people@netbsd.org,
 gnats-admin@netbsd.org,
 netbsd-bugs@netbsd.org,
 rokuyama@rk.phys.keio.ac.jp
Subject: Re: kern/53562: bridge(4) breaks segmentation / TX checksum
 offloading
Date: Tue, 11 Dec 2018 05:55:31 -0800

 > On Dec 11, 2018, at 12:40 AM, Rin Okuyama =
 <rokuyama@rk.phys.keio.ac.jp> wrote:
 >=20
 > I will commit this soon if there's no objection (OKed by msaitoh).

 No objection.  Thanks for doing this work!

 > The patch (a) above adds new public functions, tcp[46]_segment(), that
 > are used from bridge(4) via ether_sw_offload_tx(). Should I bump
 > __NetBSD_Version__ although bridge(4) does not have its module?

 I don't think it's needed.

 -- thorpej

From: Rin Okuyama <rokuyama@rk.phys.keio.ac.jp>
To: Jason Thorpe <thorpej@me.com>,
 "gnats-bugs@netbsd.org" <gnats-bugs@NetBSD.org>
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/53562: bridge(4) breaks segmentation / TX checksum
 offloading
Date: Wed, 12 Dec 2018 10:27:07 +0900

 On 2018/12/11 22:55, Jason Thorpe wrote:
 > 
 > 
 >> On Dec 11, 2018, at 12:40 AM, Rin Okuyama <rokuyama@rk.phys.keio.ac.jp> wrote:
 >>
 >> I will commit this soon if there's no objection (OKed by msaitoh).
 > 
 > No objection.  Thanks for doing this work!
 > 
 >> The patch (a) above adds new public functions, tcp[46]_segment(), that
 >> are used from bridge(4) via ether_sw_offload_tx(). Should I bump
 >> __NetBSD_Version__ although bridge(4) does not have its module?
 > 
 > I don't think it's needed.
 > 
 > -- thorpej
 > 
 > 

 Thank you! I'll commit it!

 rin

From: "Rin Okuyama" <rin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/53562 CVS commit: src/sys
Date: Wed, 12 Dec 2018 01:40:21 +0000

 Module Name:	src
 Committed By:	rin
 Date:		Wed Dec 12 01:40:21 UTC 2018

 Modified Files:
 	src/sys/net: files.net
 	src/sys/netinet: in_offload.c in_offload.h
 	src/sys/netinet6: in6_offload.c in6_offload.h
 	src/sys/rump/net/lib/libnet: Makefile
 Added Files:
 	src/sys/net: ether_sw_offload.c ether_sw_offload.h

 Log Message:
 PR kern/53562

 Add ether_sw_offload_[tr]x: handle TX/RX offload options in software.
 Since this violates separation b/w L2 and L3/L4, new files are added
 rather than having the routines in sys/net/if_ethersubr.c.

 OK msaitoh thorpej


 To generate a diff of this commit:
 cvs rdiff -u -r0 -r1.1 src/sys/net/ether_sw_offload.c \
     src/sys/net/ether_sw_offload.h
 cvs rdiff -u -r1.19 -r1.20 src/sys/net/files.net
 cvs rdiff -u -r1.12 -r1.13 src/sys/netinet/in_offload.c
 cvs rdiff -u -r1.11 -r1.12 src/sys/netinet/in_offload.h
 cvs rdiff -u -r1.11 -r1.12 src/sys/netinet6/in6_offload.c
 cvs rdiff -u -r1.9 -r1.10 src/sys/netinet6/in6_offload.h
 cvs rdiff -u -r1.30 -r1.31 src/sys/rump/net/lib/libnet/Makefile

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Rin Okuyama" <rin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/53562 CVS commit: src/sys
Date: Wed, 12 Dec 2018 01:46:47 +0000

 Module Name:	src
 Committed By:	rin
 Date:		Wed Dec 12 01:46:47 UTC 2018

 Modified Files:
 	src/sys/net: if.c if_bridge.c if_bridgevar.h
 	src/sys/rump/librump/rumpnet: net_stub.c

 Log Message:
 PR kern/53562

 Handle TX offload in software when a packet is sent via
 bridge_output(). We can send it as is in the following
 exceptional cases:

 For unicast:

 (1) When the destination interface is the same as source.

 (2) When the destination supports all TX offload options
     specified in a packet.

 For multicast/broadcast:

 (3) When all the members of the bridge support the specified
     TX offload options.

 For (3), add sc_csum_flags_tx flag to bridge softc, which is
 logical AND b/w capabilities of TX offload options in member
 interface (ifp->if_csum_flags_tx). The flag is updated when a
 member is (i) added to or (ii) removed from a bridge, or (iii)
 if_csum_flags_tx flag of a member interface is manipulated via
 ifconfig(8).

 Turn on M_CSUM_TSOv[46] bit in ifp->if_csum_flags_tx flag when
 TSO[46] is enabled for that interface.

 OK msaitoh thorpej


 To generate a diff of this commit:
 cvs rdiff -u -r1.441 -r1.442 src/sys/net/if.c
 cvs rdiff -u -r1.160 -r1.161 src/sys/net/if_bridge.c
 cvs rdiff -u -r1.32 -r1.33 src/sys/net/if_bridgevar.h
 cvs rdiff -u -r1.36 -r1.37 src/sys/rump/librump/rumpnet/net_stub.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Rin Okuyama" <rin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/53562 CVS commit: src/sys/rump/net/lib/libshmif
Date: Wed, 12 Dec 2018 01:51:32 +0000

 Module Name:	src
 Committed By:	rin
 Date:		Wed Dec 12 01:51:32 UTC 2018

 Modified Files:
 	src/sys/rump/net/lib/libshmif: if_shmem.c

 Log Message:
 Add TX/RX offload capabilities to shmif(4). They are emulated in
 software by ether_sw_offload_[tr]x().

 For rump kernels, if_capabilities for shmemif(4) can be specified
 by environmental variable RUMP_SHMIF_CAPENABLE:

  setenv RUMP_SHMIF_CAPENABLE 0x7ff80 (all offload)
  setenv RUMP_SHMIF_CAPENABLE 0x6aa80 (all TX)
  setenv RUMP_SHMIF_CAPENABLE 0x15500 (all RX)

 part of PR kern/53562
 OK msaitoh


 To generate a diff of this commit:
 cvs rdiff -u -r1.75 -r1.76 src/sys/rump/net/lib/libshmif/if_shmem.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Rin Okuyama" <rin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/53562 CVS commit: src/sys
Date: Wed, 12 Dec 2018 01:53:52 +0000

 Module Name:	src
 Committed By:	rin
 Date:		Wed Dec 12 01:53:52 UTC 2018

 Modified Files:
 	src/sys/netinet: ip_output.c
 	src/sys/netinet6: ip6_output.c

 Log Message:
 Simplify logic in ip{,6}_output().

 Now, we have M_CSUM_TSOv[46] bit in ifp->if_csum_flags_tx when
 TSO[46] is enabled for the interface. So we can simply check
 whether TSO[46] is required in a packet but missing in the
 interface by (sw_csum & M_CSUM_TSOv[46]).

 Note that this is a very rare case where TSO[46] is suddenly
 turned off during a packet passing b/w TCP and IP.

 part of PR kern/53562
 OK msaitoh


 To generate a diff of this commit:
 cvs rdiff -u -r1.307 -r1.308 src/sys/netinet/ip_output.c
 cvs rdiff -u -r1.213 -r1.214 src/sys/netinet6/ip6_output.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Rin Okuyama" <rin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/53562 CVS commit: src/share/man/man4
Date: Wed, 12 Dec 2018 02:26:40 +0000

 Module Name:	src
 Committed By:	rin
 Date:		Wed Dec 12 02:26:40 UTC 2018

 Modified Files:
 	src/share/man/man4: shmif.4

 Log Message:
 Document capability of TX/RX offload and environment variable
 RUMP_SHMIF_CAPENABLE. Bump date.

 part of PR kern/53562


 To generate a diff of this commit:
 cvs rdiff -u -r1.4 -r1.5 src/share/man/man4/shmif.4

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: Christoph Badura <bad@bsd.de>
To: gnats-bugs@netbsd.org
Cc: Rin Okuyama <rokuyama@rk.phys.keio.ac.jp>,
	Masanobu SAITOH <msaitoh@execsw.org>,
	Ryota Ozaki <ozaki-r@netbsd.org>
Subject: Re: kern/53562: bridge(4) breaks segmentation / TX checksum
 offloading
Date: Wed, 3 Apr 2019 00:58:41 +0200

 There is one more problem:

 We use the same bits in csum_flags to indicate both that the hardware
 *needs to do* offloading and that the hardware *has done* offloading.

 IIRC, this wasn't a problem when this interface was introduced, because we
 didn't have bridge(4) yet and packets would either travel down the stack
 from kernel to harware or up the stack from hardware to kernel only.

 Adding bridge(4) changed that, hence the problems.

 To finaly resolve that for good we need separate flags to indicate that
 the hardware did offloading and what the status of that is.

 --chris

From: Masanobu SAITOH <msaitoh@execsw.org>
To: Christoph Badura <bad@bsd.de>, gnats-bugs@netbsd.org
Cc: msaitoh@execsw.org, Rin Okuyama <rokuyama@rk.phys.keio.ac.jp>,
 Ryota Ozaki <ozaki-r@netbsd.org>
Subject: Re: kern/53562: bridge(4) breaks segmentation / TX checksum
 offloading
Date: Sat, 6 Apr 2019 14:24:20 +0900

 On 2019/04/03 7:58, Christoph Badura wrote:
 > There is one more problem:
 > 
 > We use the same bits in csum_flags to indicate both that the hardware
 > *needs to do* offloading and that the hardware *has done* offloading.
 > 
 > IIRC, this wasn't a problem when this interface was introduced, because we
 > didn't have bridge(4) yet and packets would either travel down the stack
 > from kernel to harware or up the stack from hardware to kernel only.
 > 
 > Adding bridge(4) changed that, hence the problems.
 > 
 > To finaly resolve that for good we need separate flags to indicate that
 > the hardware did offloading and what the status of that is.

 In current implementation, csum_flags field is cleared when a packet
 is forwarded to an outgoing (interface). The IPv{4,6} forwarding path
 and bridge do so. (If a filter function does policy routing, it's also
 be required.) If there is a unknown path which doesn't clear csum_flags,
 it should be fixed.

  Do you know any real use case which get this problem?

 To separate flags into input and output, it solves the problem,
 but I don't know it's the good solution or not.

 It does not directly related to the separating, I think the correct
 way is to clear flags when those are evaluated. When a packet is
 a tunneling (encapsulated) packet, the flags should be cleared when
 the outer header is checked and passed to the next input to not to
 misunderstand the input flags.

 > --chris
 > 


 -- 
 -----------------------------------------------
                 SAITOH Masanobu (msaitoh@execsw.org
                                  msaitoh@netbsd.org)

NetBSD Home
NetBSD PR Database Search

(Contact us) $NetBSD: query-full-pr,v 1.43 2018/01/16 07:36:43 maya Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2017 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.