NetBSD Problem Report #57743
From www@netbsd.org Sat Dec 2 13:06:59 2023
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id 21FB51A9239
for <gnats-bugs@gnats.NetBSD.org>; Sat, 2 Dec 2023 13:06:59 +0000 (UTC)
Message-Id: <20231202130657.8E7461A923A@mollari.NetBSD.org>
Date: Sat, 2 Dec 2023 13:06:57 +0000 (UTC)
From: marcotte@panix.com
Reply-To: marcotte@panix.com
To: gnats-bugs@NetBSD.org
Subject: ARP lossage with xennet
X-Send-Pr-Version: www-1.0
>Number: 57743
>Category: port-xen
>Synopsis: ARP lossage with xennet
>Confidential: no
>Severity: serious
>Priority: high
>Responsible: jdolecek
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sat Dec 02 13:10:00 +0000 2023
>Closed-Date: Sun Jan 14 15:46:36 +0000 2024
>Last-Modified: Sun Jan 14 15:46:36 +0000 2024
>Originator: Brian Marcotte
>Release: 10.0 or -current
>Organization:
Public Access Networks, Corp
>Environment:
NetBSD xxx 10.0_RC1 NetBSD 10.0_RC1 (XEN3_DOMU) #1: Fri Dec 1 19:02:12 EST 2023 root@xxx:/misc/bug/src/sys/arch/amd64/compile/XEN3_DOMU amd64
>Description:
I've found a problem with ARP and the xennet driver.
This is happening with NetBSD-10 (or -current) domUs ONLY when they're
running on the same Linux dom0 host in the same subnet. If the NetBSD-10
domUs are running on different hosts or on different subnets, there is
no problem.
What I'm seeing is that ARP replies between NetBSD-10 domUs are not
reaching each other. Tcpdump in the NetBSD domUs shows the replies going
out, but tcpdump on the Linux dom0 doesn't see them at all.
I also found that if each domU initiates contact with the other at the
same time, this causes an "ARP storm" that bogs down the whole machine.
Both send out ARP requests very rapidly and never receive replies.
Thanks. I've been running NetBSD-10 for months. Sorry for only noticing
this now.
>How-To-Repeat:
My environment:
dom0: Xen 4.17, Linux 5.10, bridging
domUs: NetBSD-10 or -current; PV, PVH, or PVHVM
same subnet
Have one domU ping the other so they will ARP.
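For reference, the frame that each domU broadcasts when it pings the other can be sketched as below. This is only an illustration of the standard Ethernet/ARP request layout (RFC 826), not code taken from the xennet driver; the function name build_arp_request and the constant names are hypothetical. Note that an ARP frame carries no IP, TCP, or UDP checksum of its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ARP_OP_REQUEST 1
#define ARP_FRAME_LEN  42	/* 14-byte Ethernet header + 28-byte ARP body */

/* Store a 16-bit value in network byte order. */
static void
put_be16(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)(v >> 8);
	p[1] = (uint8_t)(v & 0xff);
}

/*
 * Fill buf (at least ARP_FRAME_LEN bytes) with a broadcast ARP request
 * asking which host owns target_ip. Returns the frame length.
 */
static size_t
build_arp_request(uint8_t *buf, const uint8_t src_mac[6],
    const uint8_t src_ip[4], const uint8_t target_ip[4])
{
	assert(buf != NULL);

	memset(buf, 0xff, 6);		/* Ethernet destination: broadcast */
	memcpy(buf + 6, src_mac, 6);	/* Ethernet source */
	put_be16(buf + 12, 0x0806);	/* EtherType: ARP */

	put_be16(buf + 14, 1);		/* hardware type: Ethernet */
	put_be16(buf + 16, 0x0800);	/* protocol type: IPv4 */
	buf[18] = 6;			/* hardware address length */
	buf[19] = 4;			/* protocol address length */
	put_be16(buf + 20, ARP_OP_REQUEST);
	memcpy(buf + 22, src_mac, 6);	/* sender hardware address */
	memcpy(buf + 28, src_ip, 4);	/* sender protocol address */
	memset(buf + 32, 0, 6);		/* target hardware address: unknown */
	memcpy(buf + 38, target_ip, 4);	/* target protocol address */

	return ARP_FRAME_LEN;
}
```

Comparing such a frame as captured by tcpdump inside the domU with what tcpdump on the dom0 bridge sees is one way to confirm where the packets disappear.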
>Fix:
Unsure, but I found that this first broke on March 23, 2020.
The xennet changes on that date, committed by Jaromir Dolecek, were
related to checksums.
>Release-Note:
>Audit-Trail:
From: Brian Marcotte <marcotte@panix.com>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: port-xen/57743: ARP lossage with xennet
Date: Mon, 4 Dec 2023 11:20:13 -0500
I have found that I can work around the problem by commenting out one line
in if_xennet_xenbus.c. It's probably not a real fix, but I offer it in
case it helps you debug the problem.
Thanks.
- Brian
--- if_xennet_xenbus.c.orig 2023-08-01 23:11:14.581294439 -0400
+++ if_xennet_xenbus.c 2023-12-04 10:47:30.735761957 -0500
@@ -1122,7 +1122,9 @@
 		if (m->m_pkthdr.csum_flags & XN_M_CSUM_SUPPORTED) {
 			txreq->flags |= NETTXF_csum_blank;
 		} else {
+/*
 			txreq->flags |= NETTXF_data_validated;
+*/
 		}
 	}
 	if (multiseg && i < lastseg)
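The effect of the workaround on the flags of the first TX request can be sketched in isolation as follows. This is a standalone illustration, not the driver code: the function first_txreq_flags and the mask XN_M_CSUM_SUPPORTED_SKETCH are hypothetical, and the NETTXF_* bit positions are assumed to match Xen's public netif.h.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to match Xen's public io/netif.h bit assignments. */
#define NETTXF_csum_blank	(1U << 0)
#define NETTXF_data_validated	(1U << 1)

/* Hypothetical stand-in for the driver's XN_M_CSUM_SUPPORTED mask. */
#define XN_M_CSUM_SUPPORTED_SKETCH	0x1U

/*
 * Compute the checksum-related flags for the first TX request of a packet.
 * With apply_workaround == 0 this mirrors the original branch; with
 * apply_workaround != 0 the data_validated assignment is skipped, as in
 * the patch above.
 */
static uint16_t
first_txreq_flags(uint32_t csum_flags, int apply_workaround)
{
	uint16_t flags = 0;

	if (csum_flags & XN_M_CSUM_SUPPORTED_SKETCH) {
		/* Checksum offload requested: backend must fill it in. */
		flags |= NETTXF_csum_blank;
	} else if (!apply_workaround) {
		/* No offload requested: original code marked data validated. */
		flags |= NETTXF_data_validated;
	}

	/* This function sets at most one of the two bits. */
	assert(!((flags & NETTXF_csum_blank) &&
	    (flags & NETTXF_data_validated)));
	return flags;
}
```

Under these assumptions, a packet with no offloaded checksum (an ARP frame, for example) goes out with NETTXF_data_validated set before the patch and with neither flag set after it.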
Responsible-Changed-From-To: port-xen-maintainer->jdolecek
Responsible-Changed-By: riastradh@NetBSD.org
Responsible-Changed-When: Sun, 07 Jan 2024 03:22:38 +0000
Responsible-Changed-Why:
appears to be related to jdolecek's changes to if_xennet_xenbus.c
State-Changed-From-To: open->analyzed
State-Changed-By: jdolecek@NetBSD.org
State-Changed-When: Mon, 08 Jan 2024 22:56:45 +0000
State-Changed-Why:
Looking into this. The proposed patch is a good stop-gap fix
for netbsd-10 at least; will go with that if no better solution is found.
From: "Jaromir Dolecek" <jdolecek@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/57743 CVS commit: src/sys/arch/xen/xen
Date: Tue, 9 Jan 2024 18:39:53 +0000
Module Name: src
Committed By: jdolecek
Date: Tue Jan 9 18:39:53 UTC 2024
Modified Files:
src/sys/arch/xen/xen: if_xennet_xenbus.c
Log Message:
disable TX checksum optimization, it's causing ARP lossage in some
configurations using Linux dom0
PR port-xen/57743 by Brian Marcotte, thanks for the patch
To generate a diff of this commit:
cvs rdiff -u -r1.129 -r1.130 src/sys/arch/xen/xen/if_xennet_xenbus.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: analyzed->pending-pullups
State-Changed-By: jdolecek@NetBSD.org
State-Changed-When: Tue, 09 Jan 2024 18:43:04 +0000
State-Changed-Why:
I've committed the proposed change. It's a good stop-gap for NetBSD 10.
I've requested a pullup to the release branch.
From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/57743 CVS commit: [netbsd-10] src/sys/arch/xen/xen
Date: Sun, 14 Jan 2024 15:25:54 +0000
Module Name: src
Committed By: martin
Date: Sun Jan 14 15:25:54 UTC 2024
Modified Files:
src/sys/arch/xen/xen [netbsd-10]: if_xennet_xenbus.c
Log Message:
Pull up following revision(s) (requested by jdolecek in ticket #543):
sys/arch/xen/xen/if_xennet_xenbus.c: revision 1.130
disable TX checksum optimization, it's causing ARP lossage in some
configurations using Linux dom0
PR port-xen/57743 by Brian Marcotte, thanks for the patch
To generate a diff of this commit:
cvs rdiff -u -r1.128.20.1 -r1.128.20.2 \
src/sys/arch/xen/xen/if_xennet_xenbus.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: pending-pullups->closed
State-Changed-By: jdolecek@NetBSD.org
State-Changed-When: Sun, 14 Jan 2024 15:46:36 +0000
State-Changed-Why:
Change pulled up to netbsd-10. Thanks for the report.
>Unformatted:
Copyright © 1994-2024
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.