NetBSD Problem Report #52876

From clare@csel.org  Fri Dec 29 00:13:10 2017
Return-Path: <clare@csel.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 3EB807A1AE
	for <gnats-bugs@gnats.NetBSD.org>; Fri, 29 Dec 2017 00:13:10 +0000 (UTC)
Message-Id: <20171229001304.7911E146CFA@router.csel.org>
Date: Fri, 29 Dec 2017 09:13:04 +0900 (JST)
From: Shinichi Doyashiki <clare@csel.org>
Reply-To: clare@csel.org
To: gnats-bugs@NetBSD.org
Subject: The vlan(4) over wm(4) behaves something strange
X-Send-Pr-Version: 3.95

>Number:         52876
>Category:       kern
>Synopsis:       The vlan(4) over wm(4) behaves something strange
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri Dec 29 00:15:00 +0000 2017
>Last-Modified:  Tue Jan 02 00:25:00 +0000 2018
>Originator:     Shinichi Doyashiki
>Release:        NetBSD 8.99.9
>Organization:
	at home
>Environment:
System: NetBSD router.csel.org 8.99.9 NetBSD 8.99.9 (XCYMINIPC) #**: Fri Dec 29 **:**:** JST 2017 clare@mizuki.csel.org:/export/stage/hack/sys/arch/amd64/compile/XCYMINIPC amd64
Architecture: x86_64
Machine: amd64
>Description:
	vlan(4) over wm(4) behaves strangely.
	Packet forwarding itself seems to work,
	but ssh or telnet sessions to the box sometimes
	freeze (or packets are dropped) on the box.
>How-To-Repeat:
	Details are unknown.
	My router box has vlan(4) configured over wm(4).
	Another endpoint machine uses wm(4) only, and it works fine.

wm0: flags=0x8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
	capabilities=7ff80<TSO4,IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx>
	capabilities=7ff80<TCP4CSUM_Tx,UDP4CSUM_Rx,UDP4CSUM_Tx,TCP6CSUM_Rx>
	capabilities=7ff80<TCP6CSUM_Tx,UDP6CSUM_Rx,UDP6CSUM_Tx,TSO6>
	enabled=7ff80<TSO4,IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx>
	enabled=7ff80<TCP4CSUM_Tx,UDP4CSUM_Rx,UDP4CSUM_Tx,TCP6CSUM_Rx>
	enabled=7ff80<TCP6CSUM_Tx,UDP6CSUM_Rx,UDP6CSUM_Tx,TSO6>
	ec_capabilities=7<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU>
	ec_enabled=3<VLAN_MTU,VLAN_HWTAGGING>
	address: **:**:**:**:**:**
	media: Ethernet autoselect (1000baseT full-duplex)
	status: active
	inet 192.168.0.1/24 broadcast 192.168.0.255 flags 0x0
	inet 192.168.1.1/24 broadcast 192.168.1.255 flags 0x0
	inet 192.168.0.34/24 broadcast 192.168.0.255 flags 0x0
	inet 192.168.1.34/24 broadcast 192.168.1.255 flags 0x0
	inet6 fe**::e**:****:****:**d4%wm0/64 flags 0x0 scopeid 0x1
	inet6 fe**::**%wm0/64 flags 0x0 scopeid 0x1
	inet6 24**:****:****:****::**/64 flags 0x0
(snip)
vlan10: flags=0x8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
	capabilities=7ff80<TSO4,IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx>
	capabilities=7ff80<TCP4CSUM_Tx,UDP4CSUM_Rx,UDP4CSUM_Tx,TCP6CSUM_Rx>
	capabilities=7ff80<TCP6CSUM_Tx,UDP6CSUM_Rx,UDP6CSUM_Tx,TSO6>
	enabled=7ff80<TSO4,IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx>
	enabled=7ff80<TCP4CSUM_Tx,UDP4CSUM_Rx,UDP4CSUM_Tx,TCP6CSUM_Rx>
	enabled=7ff80<TCP6CSUM_Tx,UDP6CSUM_Rx,UDP6CSUM_Tx,TSO6>
	vlan: 10 parent: wm0
	address: **:**:**:**:**:**
	inet 192.168.10.1/24 broadcast 192.168.10.255 flags 0x0
	inet6 fe**::*:****:****:****%vlan10/64 flags 0x0 scopeid 0x8
	inet6 fe**::1%vlan10/64 flags 0x0 scopeid 0x8
	inet6 24**:****:****:****::1/64 flags 0x0
vlan11: flags=0x8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
	capabilities=7ff80<TSO4,IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx>
	capabilities=7ff80<TCP4CSUM_Tx,UDP4CSUM_Rx,UDP4CSUM_Tx,TCP6CSUM_Rx>
	capabilities=7ff80<TCP6CSUM_Tx,UDP6CSUM_Rx,UDP6CSUM_Tx,TSO6>
	enabled=7ff80<TSO4,IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx>
	enabled=7ff80<TCP4CSUM_Tx,UDP4CSUM_Rx,UDP4CSUM_Tx,TCP6CSUM_Rx>
	enabled=7ff80<TCP6CSUM_Tx,UDP6CSUM_Rx,UDP6CSUM_Tx,TSO6>
	vlan: 11 parent: wm0
	address: **:**:**:**:**:**
	inet 192.168.11.1/24 broadcast 192.168.11.255 flags 0x0
	inet6 fe**::*:****:****:****%vlan11/64 flags 0x0 scopeid 0x9
	inet6 fe**::1%vlan11/64 flags 0x0 scopeid 0x9
	inet6 24**:****:****:****::1/64 flags 0x0
vlan2: flags=0x8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
	capabilities=7ff80<TSO4,IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx>
	capabilities=7ff80<TCP4CSUM_Tx,UDP4CSUM_Rx,UDP4CSUM_Tx,TCP6CSUM_Rx>
	capabilities=7ff80<TCP6CSUM_Tx,UDP6CSUM_Rx,UDP6CSUM_Tx,TSO6>
	enabled=7ff80<TSO4,IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx>
	enabled=7ff80<TCP4CSUM_Tx,UDP4CSUM_Rx,UDP4CSUM_Tx,TCP6CSUM_Rx>
	enabled=7ff80<TCP6CSUM_Tx,UDP6CSUM_Rx,UDP6CSUM_Tx,TSO6>
	vlan: 2 parent: wm0
	address: **:**:**:**:**:**
	inet 192.168.2.1/24 broadcast 192.168.2.255 flags 0x0
	inet6 fe**::*:****:****:****%vlan2/64 flags 0x0 scopeid 0xa
	inet6 fe**::1%vlan2/64 flags 0x0 scopeid 0xa
	inet6 24**:****:****:****::1/64 flags 0x0
vlan29: flags=0x8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
	capabilities=7ff80<TSO4,IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx>
	capabilities=7ff80<TCP4CSUM_Tx,UDP4CSUM_Rx,UDP4CSUM_Tx,TCP6CSUM_Rx>
	capabilities=7ff80<TCP6CSUM_Tx,UDP6CSUM_Rx,UDP6CSUM_Tx,TSO6>
	enabled=7ff80<TSO4,IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx>
	enabled=7ff80<TCP4CSUM_Tx,UDP4CSUM_Rx,UDP4CSUM_Tx,TCP6CSUM_Rx>
	enabled=7ff80<TCP6CSUM_Tx,UDP6CSUM_Rx,UDP6CSUM_Tx,TSO6>
	vlan: 29 parent: wm0
	address: **:**:**:**:**:**
	inet 192.168.29.1/24 broadcast 192.168.29.255 flags 0x0
	inet6 fe**::*:****:****:****%vlan29/64 flags 0x0 scopeid 0xb
	inet6 fe**::1%vlan29/64 flags 0x0 scopeid 0xb
	inet6 24**:****:****:****::1/64 flags 0x0
vlan3: flags=0x8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
	capabilities=7ff80<TSO4,IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx>
	capabilities=7ff80<TCP4CSUM_Tx,UDP4CSUM_Rx,UDP4CSUM_Tx,TCP6CSUM_Rx>
	capabilities=7ff80<TCP6CSUM_Tx,UDP6CSUM_Rx,UDP6CSUM_Tx,TSO6>
	enabled=7ff80<TSO4,IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx>
	enabled=7ff80<TCP4CSUM_Tx,UDP4CSUM_Rx,UDP4CSUM_Tx,TCP6CSUM_Rx>
	enabled=7ff80<TCP6CSUM_Tx,UDP6CSUM_Rx,UDP6CSUM_Tx,TSO6>
	vlan: 3 parent: wm0
	address: **:**:**:**:**:**
	inet 192.168.3.1/24 broadcast 192.168.3.255 flags 0x0
	inet6 fe**::e**:****:****:****%vlan3/64 flags 0x0 scopeid 0xc
	inet6 fe**::1%vlan3/64 flags 0x0 scopeid 0xc
	inet6 24**:****:****:****::1/64 flags 0x0
vlan30: flags=0x8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
	capabilities=7ff80<TSO4,IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx>
	capabilities=7ff80<TCP4CSUM_Tx,UDP4CSUM_Rx,UDP4CSUM_Tx,TCP6CSUM_Rx>
	capabilities=7ff80<TCP6CSUM_Tx,UDP6CSUM_Rx,UDP6CSUM_Tx,TSO6>
	enabled=7ff80<TSO4,IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx>
	enabled=7ff80<TCP4CSUM_Tx,UDP4CSUM_Rx,UDP4CSUM_Tx,TCP6CSUM_Rx>
	enabled=7ff80<TCP6CSUM_Tx,UDP6CSUM_Rx,UDP6CSUM_Tx,TSO6>
	vlan: 30 parent: wm0
	address: **:**:**:**:**:**
	inet 192.168.30.1/24 broadcast 192.168.30.255 flags 0x0
	inet6 fe**::e**:****:****:****%vlan30/64 flags 0x0 scopeid 0xd
	inet6 fe**::1%vlan30/64 flags 0x0 scopeid 0xd
	inet6 80:****:****:****::1/64 flags 0x0
vlan31: flags=0x8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
	capabilities=7ff80<TSO4,IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx>
	capabilities=7ff80<TCP4CSUM_Tx,UDP4CSUM_Rx,UDP4CSUM_Tx,TCP6CSUM_Rx>
	capabilities=7ff80<TCP6CSUM_Tx,UDP6CSUM_Rx,UDP6CSUM_Tx,TSO6>
	enabled=7ff80<TSO4,IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx>
	enabled=7ff80<TCP4CSUM_Tx,UDP4CSUM_Rx,UDP4CSUM_Tx,TCP6CSUM_Rx>
	enabled=7ff80<TCP6CSUM_Tx,UDP6CSUM_Rx,UDP6CSUM_Tx,TSO6>
	vlan: 31 parent: wm0
	address: **:**:**:**:**:**
	inet6 fe80::*%vlan31/64 flags 0x0 scopeid 0xe
vlan4: flags=0x8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 9000
	capabilities=7ff80<TSO4,IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx>
	capabilities=7ff80<TCP4CSUM_Tx,UDP4CSUM_Rx,UDP4CSUM_Tx,TCP6CSUM_Rx>
	capabilities=7ff80<TCP6CSUM_Tx,UDP6CSUM_Rx,UDP6CSUM_Tx,TSO6>
	enabled=7ff80<TSO4,IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx>
	enabled=7ff80<TCP4CSUM_Tx,UDP4CSUM_Rx,UDP4CSUM_Tx,TCP6CSUM_Rx>
	enabled=7ff80<TCP6CSUM_Tx,UDP6CSUM_Rx,UDP6CSUM_Tx,TSO6>
	vlan: 4 parent: wm1
	address: **:**:**:**:**:**
	inet 192.168.4.1/24 broadcast 192.168.4.255 flags 0x0
	inet6 fe80::*%vlan4/64 flags 0x0 scopeid 0xf
	inet6 fe80::1%vlan4/64 flags 0x0 scopeid 0xf
	inet6 */64 flags 0x0
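
For reference, the capabilities=7ff80 mask in the listing above can be decoded with a small helper. This is only a sketch: the bit values are an assumption inferred from the order in which ifconfig prints the names (they do sum to exactly 0x7ff80), and cap_names and decode_caps() are hypothetical names, not part of any NetBSD API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Assumed bit layout, inferred from the order of names in the ifconfig output. */
struct cap_name {
	unsigned int bit;
	const char *name;
};

static const struct cap_name cap_names[] = {
	{ 0x00080u, "TSO4" },        { 0x00100u, "IP4CSUM_Rx" },
	{ 0x00200u, "IP4CSUM_Tx" },  { 0x00400u, "TCP4CSUM_Rx" },
	{ 0x00800u, "TCP4CSUM_Tx" }, { 0x01000u, "UDP4CSUM_Rx" },
	{ 0x02000u, "UDP4CSUM_Tx" }, { 0x04000u, "TCP6CSUM_Rx" },
	{ 0x08000u, "TCP6CSUM_Tx" }, { 0x10000u, "UDP6CSUM_Rx" },
	{ 0x20000u, "UDP6CSUM_Tx" }, { 0x40000u, "TSO6" },
};

#define NCAPS (sizeof(cap_names) / sizeof(cap_names[0]))

/*
 * Write the comma-separated names of the bits set in caps into buf and
 * return how many names were written.  Stops early if buf is too small.
 */
static int
decode_caps(unsigned int caps, char *buf, size_t buflen)
{
	size_t used = 0;
	int n = 0;

	assert(buf != NULL && buflen > 0);
	buf[0] = '\0';
	for (size_t i = 0; i < NCAPS; i++) {
		if ((caps & cap_names[i].bit) == 0)
			continue;
		int w = snprintf(buf + used, buflen - used, "%s%s",
		    n > 0 ? "," : "", cap_names[i].name);
		if (w < 0 || (size_t)w >= buflen - used) {
			buf[used] = '\0';	/* drop the partial name */
			break;
		}
		used += (size_t)w;
		n++;
	}
	return n;
}
```

With a sufficiently large buffer, decode_caps(0x7ff80, buf, sizeof(buf)) reports 12 names, starting with TSO4 and ending with TSO6, so both TSO bits are enabled on wm0 and on every vlan interface shown.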

>Fix:
	Unknown.

>Audit-Trail:
From: clare@csel.org
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/52876: The vlan(4) over wm(4) behaves something strange
Date: Fri, 29 Dec 2017 10:02:55 +0900

 Disabling TSO works around the problem.

 -- 
 Shinichi Doyashiki <clare@csel.org>

From: SAITOH Masanobu <msaitoh@execsw.org>
To: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org,
 gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, clare@csel.org
Cc: msaitoh@execsw.org
Subject: Re: kern/52876: The vlan(4) over wm(4) behaves something strange
Date: Fri, 29 Dec 2017 12:27:01 +0900

 Hi.

 On 2017/12/29 10:05, clare@csel.org wrote:
 > The following reply was made to PR kern/52876; it has been noted by GNATS.
 > 
 > From: clare@csel.org
 > To: gnats-bugs@NetBSD.org
 > Cc: 
 > Subject: Re: kern/52876: The vlan(4) over wm(4) behaves something strange
 > Date: Fri, 29 Dec 2017 10:02:55 +0900
 > 
 >  Disabling TSO works around the problem.
 >  
 >  -- 
 >  Shinichi Doyashiki <clare@csel.org>
 >  
 > 

 Have you checked /var/log/messages?
 One possibility is:

 >                 error = bus_dmamap_load_mbuf(sc->sc_dmat, dmamap, m0,
 >                     BUS_DMA_WRITE | BUS_DMA_NOWAIT);
 >                 if (error) {
 >                         if (error == EFBIG) {
 >                                 WM_Q_EVCNT_INCR(txq, txdrop);
 >                                 log(LOG_ERR, "%s: Tx packet consumes too many "
 >                                     "DMA segments, dropping...\n",
 >                                     device_xname(sc->sc_dev));
 >                                 wm_dump_mbuf_chain(sc, m0);
 >                                 m_freem(m0);
 >                                 continue;
 >                         }
 >                         /* Short on resources, just stop for now. */
 >                         DPRINTF(WM_DEBUG_TX,
 >                             ("%s: TX: dmamap load failed: %d\n",
 >                             device_xname(sc->sc_dev), error));
 >                         break;
 >                 }

 This error is reported via log(LOG_ERR), so it is not printed in dmesg
 but in /var/log/messages.
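
A minimal sketch of checking a log for that message, assuming the text appears in the log exactly as in the log(LOG_ERR, ...) format string above, after whatever prefix syslogd adds. The names wm_txdrop_msg, line_has_wm_txdrop() and count_wm_txdrop() are hypothetical helpers, not part of the driver.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Message text from the driver snippet, without the "%s: " device prefix. */
static const char wm_txdrop_msg[] =
    "Tx packet consumes too many DMA segments, dropping...";

/* Return nonzero if the log line contains the wm(4) Tx drop message. */
static int
line_has_wm_txdrop(const char *line)
{
	assert(line != NULL);
	return strstr(line, wm_txdrop_msg) != NULL;
}

/*
 * Count the lines in fp that contain the message.  Lines longer than
 * the buffer are read in pieces, so a message split across two pieces
 * would be missed; this is acceptable for a quick check.
 */
static long
count_wm_txdrop(FILE *fp)
{
	char line[1024];
	long n = 0;

	while (fgets(line, sizeof(line), fp) != NULL) {
		if (line_has_wm_txdrop(line))
			n++;
	}
	return n;
}
```

Opening /var/log/messages with fopen() and passing the stream to count_wm_txdrop() gives the number of EFBIG drops logged; a count of zero would suggest the stall is not caused by this path.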

 Also, could you test with "options WM_EVENT_COUNTERS" in your
 kernel config and show me the output of "vmstat -ev | grep wm"
 after the problem has occurred?

  Thanks in advance.

 -- 
 -----------------------------------------------
                 SAITOH Masanobu (msaitoh@execsw.org
                                  msaitoh@netbsd.org)

From: clare@csel.org
To: SAITOH Masanobu <msaitoh@execsw.org>
Cc: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org,
 gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/52876: The vlan(4) over wm(4) behaves something strange
Date: Sat, 30 Dec 2017 11:12:38 +0900

 On Fri, 29 Dec 2017 12:27:01 +0900
 SAITOH Masanobu <msaitoh@execsw.org> wrote:

 > Have you checked /var/log/messages?
 > One possibility is:
 > 
 > >                 error = bus_dmamap_load_mbuf(sc->sc_dmat, dmamap, m0,
 > >                     BUS_DMA_WRITE | BUS_DMA_NOWAIT);
 > >                 if (error) {
 > >                         if (error == EFBIG) {
 > >                                 WM_Q_EVCNT_INCR(txq, txdrop);
 > >                                 log(LOG_ERR, "%s: Tx packet consumes too many "
 > >                                     "DMA segments, dropping...\n",
 > >                                     device_xname(sc->sc_dev));
 > >                                 wm_dump_mbuf_chain(sc, m0);
 > >                                 m_freem(m0);
 > >                                 continue;
 > >                         }
 > >                         /* Short on resources, just stop for now. */
 > >                         DPRINTF(WM_DEBUG_TX,
 > >                             ("%s: TX: dmamap load failed: %d\n",
 > >                             device_xname(sc->sc_dev), error));
 > >                         break;
 > >                 }
 > 
 > This error is reported via log(LOG_ERR), so it is not printed in dmesg
 > but in /var/log/messages.

 I couldn't find any messages generated from wm in /var/log/messages.

 > Also, could you test with "options WM_EVENT_COUNTERS" in your
 > kernel config and show me the output of "vmstat -ev | grep wm"
 > after the problem has occurred?

 I placed the log at the following URL:
 https://www.csel.org/netbsd/pr/52876/vmstat-ev-wm-20171230.txt


 -- 
 Shinichi Doyashiki <clare@csel.org>

From: clare@csel.org
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/52876: The vlan(4) over wm(4) behaves something strange
Date: Sat, 30 Dec 2017 15:11:08 +0900

 The problematic device is an i82583V.
 Erratum 7 is complex; I can't understand it.

 wm0 at pci1 dev 0 function 0: Intel i82583V (rev. 0x00)
 wm0: interrupting at msi2 vec 0
 wm0: PCI-Express bus
 wm0: ASPM L0s and L1 are disabled to workaround the errata.
 wm0: 512 words (8 address bits) SPI EEPROM, version 1.10.0, Image Unique ID ffffffff
 wm0: Ethernet address 0c:e8:5c:**:**:**
 wm0: 0x2a4440<SPI,IOH_VALID,PCIE,ASF_FIRM,AMT,WOL>
 makphy0 at wm0 phy 1: Marvell 88E1149 Gigabit PHY, rev. 1


 -- 
 Shinichi Doyashiki <clare@csel.org>

From: clare@csel.org
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/52876: The vlan(4) over wm(4) behaves something strange
Date: Sat, 30 Dec 2017 17:37:55 +0900

 The interface generates strange packets while the problem
 is occurring.  What are these?

 17:31:05.780362 IP6 164.250.162.1 > 0.0.0.0: [|tcp]
 17:31:06.820008 IP6 164.250.162.1 > 0.0.0.0: [|tcp]
 17:31:08.859493 IP6 164.250.162.1 > 0.0.0.0: [|tcp]


 -- 
 Shinichi Doyashiki <clare@csel.org>

From: clare@csel.org
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/52876: The vlan(4) over wm(4) behaves something strange
Date: Mon, 1 Jan 2018 18:03:14 +0900

 Transmitter DMA seems to be halted by such packets.
 This was not detected by the wm(4) watchdog logic.

 I installed a kernel with WM_DEBUG and WM_DEBUG_TX enabled,
 took a debug log, and placed it at the following URL:

 https://www.csel.org/netbsd/pr/52876/wm-debug-tx-20180101.txt

 How to repeat:
 * Set up an Intel i82583V.
 * Configure the wm(4) driver with all hardware offload flags enabled.
 * Attach a vlan(4) interface to the wm(4) and
   enable its hardware offload flags, including TSO.
 * Connect to sshd via the vlan(4) and generate heavy traffic
   (running cat /var/log/messages is sufficient).

 Workaround:
 * Disabling TSO on vlan(4) is sufficient.
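
In terms of the enabled mask, disabling TSO means clearing the TSO4 and TSO6 bits while leaving checksum offload on. A minimal sketch of that arithmetic, assuming the bit values 0x00080 (TSO4) and 0x40000 (TSO6) inferred from the position of those names in the ifconfig output; CAP_TSO4, CAP_TSO6 and caps_without_tso() are hypothetical names, not driver code.

```c
#include <assert.h>

/* Assumed bit values, inferred from the order of names printed by ifconfig. */
#define CAP_TSO4	0x00080u
#define CAP_TSO6	0x40000u
#define CAP_TSO_MASK	(CAP_TSO4 | CAP_TSO6)

/* Return the enabled mask with both TSO bits cleared; other bits are kept. */
static unsigned int
caps_without_tso(unsigned int enabled)
{
	unsigned int result = enabled & ~CAP_TSO_MASK;

	assert((result & CAP_TSO_MASK) == 0);
	return result;
}
```

For the enabled=7ff80 reported on the vlan interfaces, caps_without_tso(0x7ff80) yields 0x3ff00. On a running system, the -tso4 and -tso6 parameters of ifconfig(8) should have the same effect on the interface.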


 -- 
 Shinichi Doyashiki <clare@csel.org>

From: clare@csel.org
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/52876: The vlan(4) over wm(4) behaves something strange
Date: Tue, 2 Jan 2018 09:24:19 +0900

 > How to repeat:
 > * Set up an Intel i82583V.
 > * Configure the wm(4) driver with all hardware offload flags enabled.
 > * Attach a vlan(4) interface to the wm(4) and
 >   enable its hardware offload flags, including TSO.
 > * Connect to sshd via the vlan(4) and generate heavy traffic
 >   (running cat /var/log/messages is sufficient).

 On FreeBSD 11.1, the problem does not reproduce.

 $ ifconfig em5
 em5: flags=8c02<BROADCAST,OACTIVE,SIMPLEX,MULTICAST> metric 0 mtu 1500
 	options=4219b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,
 			TSO4,WOL_MAGIC,VLAN_HWTSO>
         ether 0c:e8:6c:**:**:**
         hwaddr 0c:e8:6c:**:**:**
         nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
         media: Ethernet autoselect (1000baseT <full-duplex>)
         status: active

 $ ifconfig em5.10
 em5.10: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
         options=103<RXCSUM,TXCSUM,TSO4>
         ether 0c:e8:6c:**:**:**
         inet 192.168.**.** netmask 0xffffff00 broadcast 192.168.**.255
         nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
         media: Ethernet autoselect (1000baseT <full-duplex>)
         status: active
         vlan: 10 vlanpcp: 0 parent interface: em5
         groups: vlan


 -- 
 Shinichi Doyashiki <clare@csel.org>
