NetBSD Problem Report #57830

From www@netbsd.org  Mon Jan  8 08:22:05 2024
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id F13841A9238
	for <gnats-bugs@gnats.NetBSD.org>; Mon,  8 Jan 2024 08:22:04 +0000 (UTC)
Message-Id: <20240108082203.09B651A9239@mollari.NetBSD.org>
Date: Mon,  8 Jan 2024 08:22:02 +0000 (UTC)
From: dean.anderson71@yahoo.com
Reply-To: dean.anderson71@yahoo.com
To: gnats-bugs@NetBSD.org
Subject: Xennet devices don't work properly with other NetBSD10 clients
X-Send-Pr-Version: www-1.0

>Number:         57830
>Category:       port-xen
>Synopsis:       Xennet devices don't work properly with other NetBSD10 clients
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    jdolecek
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Jan 08 08:25:00 +0000 2024
>Closed-Date:    Sun Jan 14 15:53:20 +0000 2024
>Last-Modified:  Sun Jan 14 15:53:20 +0000 2024
>Originator:     Dean Anderson
>Release:        10.0 Beta, 10.0RC1, 10.0RC2
>Organization:
na
>Environment:
NetBSD ansible1.av8.net 10.0_RC1 NetBSD 10.0_RC1 (GENERIC) #0: Sun Nov  5 18:30:08 UTC 2023  mkrepro@mkrepro.NetBSD.org:/usr/src/sys/arch/amd64/compile/GENERIC amd64

>Description:
Under Xen 4.17 running on Kali Linux, NetBSD 10.x guests use xennet0 even when running as HVM guests.  This is a change from 9.3, which would have used the emulated re0 interface.

The problem is rather strange: DHCP works, but no other IP traffic works, and since ARP times out, I'm not clear how DHCP worked between them.  It has no problem with other kinds of guests: NetBSD 9.3 and 9.2, FreeBSD, and Linux.  Kali has some quirks, as can be seen in the interfaces file below.  This setup runs in an off-network lab with a Cisco router and real servers under test.

Anyway, for an HVM guest, I would expect emulated devices to be used.  I'm not a big fan of detecting Xen and then trying to use a Xen driver.  If I wanted that behavior, I'd have used a PV configuration.

I suspect I can build a kernel with the xennet device disabled, and that will solve my problem.  But I thought you should know.
>How-To-Repeat:
Xen 4.17
Kali Linux 6.3.0

cat /etc/network/interfaces
# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

source /etc/network/interfaces.d/*

# The loopback network interface
auto lo
auto eth0
#auto br1
auto eth0.700
auto br700
auto eth0.10
auto br10
auto eth0.11
auto br11
auto dummy0

iface dummy0 inet manual
        pre-up modprobe dummy
        pre-up ip link add dummy0 type dummy
        pre-up ifconfig $IFACE up arp
        pre-up ifconfig $IFACE hw ether 40:22:aa:44:bb:04
        pre-up ifconfig $IFACE 198.3.136.2 netmask 0xffffffe0
        post-up route del -net 198.3.136.0 netmask 255.255.255.224 dev dummy0
        post-up route add -net 198.3.136.0 netmask 255.255.255.224 dev br700
        post-up brctl addif br700 dummy0
        post-down ifconfig $IFACE down
        post-down ip link delete dummy0

iface lo inet loopback

iface eth0 inet manual
        pre-up ifconfig $IFACE up
        post-down ifconfig $IFACE down

#VLANS


iface eth0.700 inet manual
        pre-up ifconfig $IFACE up
        post-down ifconfig $IFACE down

#vlan 10 hypervisors
iface eth0.10 inet manual
        pre-up ifconfig $IFACE up
        post-down ifconfig $IFACE down

#vlan 11 ilo
iface eth0.11 inet manual
        pre-up ifconfig $IFACE up
        post-down ifconfig $IFACE down


#BRIDGES
#iface br1 inet manual
        #bridge_ports eth0

iface br700 inet manual
        bridge_ports dummy0 eth0.700
        bridge_stp on
        bridge_fd 0
        bridge_maxwait 5
        post-up route del -net 198.3.136.0 netmask 255.255.255.224 dev dummy0
        post-up route add -net 198.3.136.0 netmask 255.255.255.224 dev br700
        #address 198.3.136.2
        #netmask 255.255.255.224
        #post-up route add -net 198.3.136.0 netmask 0xfffff800 gw 198.3.136.1

iface br10 inet manual
        bridge_ports eth0.10
        bridge_stp on
        bridge_fd 0
        bridge_maxwait 1
        #address 198.3.136.71
        #netmask 255.255.255.192
        #post-up route add -net 198.3.136.64 netmask 255.255.255.192 dev br10

iface br11 inet manual
        bridge_ports eth0.11
        bridge_stp on
        bridge_fd 0
        bridge_maxwait 1
        #address 198.3.136.135
        #netmask 255.255.255.192
        #post-up route add -net 198.3.136.128 netmask 255.255.255.192 dev br11
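
Note on the netmasks in the file above: the dummy0 stanza gives the mask as 0xffffffe0, while the route commands use 255.255.255.224. These are the same /27 mask. A minimal C sketch of the conversion follows; the helper names prefix_len and dotted_quad are illustrative only and do not come from any NetBSD or Linux tool.

```c
#include <stdint.h>
#include <stdio.h>

/* Count the leading one bits of an IPv4 netmask held in a 32-bit value. */
static int
prefix_len(uint32_t mask)
{
	int n = 0;

	while (mask & 0x80000000u) {
		n++;
		mask <<= 1;
	}
	return n;
}

/* Format a 32-bit netmask as dotted-quad text into buf. */
static void
dotted_quad(uint32_t mask, char *buf, size_t len)
{
	snprintf(buf, len, "%u.%u.%u.%u",
	    (unsigned)((mask >> 24) & 0xff), (unsigned)((mask >> 16) & 0xff),
	    (unsigned)((mask >> 8) & 0xff), (unsigned)(mask & 0xff));
}
```

With these helpers, 0xffffffe0 formats as 255.255.255.224 with a prefix length of 27, and the commented-out 0xfffff800 in the br700 stanza is 255.255.248.0, a /21.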



Typical xen config:

cat DDns1.cfg
name="DDns1"

disk=[
#123456789
#NETBSD93#'file:/space/isos/NetBSD-9.3-amd64.iso,hdc:cdrom,r',
#NETBSD10RC1#'file:/space/isos/NetBSD-10.0_RC1-amd64.iso,hdc:cdrom,r',
#'file:/space/isos/NetBSD-10.0_RC2-amd64.iso,hdc:cdrom,r',
'tap:aio:qcow2:/space/images/DDns1.qcow2,hda,w',
'tap:aio:qcow2:/space/images/DDns1-space.qcow2,hdb,w'
]

vif=[
'mac=00:16:3e:00:00:10,type=ioemu,model=rtl8139,bridge=br700'
]

memory=256

type='hvm'

vcpus=2

boot='c'

serial='pty'


>Fix:
Disable xennet in kernels used for HVM guests.

>Release-Note:

>Audit-Trail:
From: Taylor R Campbell <riastradh@NetBSD.org>
To: Dean Anderson <dean.anderson71@yahoo.com>
Cc: gnats-bugs@NetBSD.org, netbsd-bugs@NetBSD.org
Subject: Re: kern/57830: Xennet devices don't work properly with other NetBSD10 clients
Date: Mon, 8 Jan 2024 14:35:14 +0000

 I wonder whether this might be a duplicate of another recent PR?

    port-xen/57743: ARP lossage with xennet
    https://gnats.NetBSD.org/57743

 There is a patch in that PR that apparently has the effect of working
 around the problem (but I'm not familiar enough with xennet internals
 to understand what the problem is or whether the patch is a solution
 or just a band-aid).

From: Dean Anderson <dean.anderson71@yahoo.com>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
 netbsd-bugs@netbsd.org
Subject: Re: kern/57830: Xennet devices don't work properly with other NetBSD10 clients
Date: Mon, 8 Jan 2024 21:28:29 -0600

 I tried the suggested fix on file (renamed to xennetback_xenbus in RC2) and
 rebuilt GENERIC.

 No change in behavior.

 However, this should maybe be a separate bug:

 When you follow the instructions in man 7 module to rebuild modules, by
 setting the OBJDIR and TOOLDIR environment vars, it fails, but I don't have
 the complete source trees installed, just syssrc.  So I'm not sure if it
 should work without the full set of sources... but I'd argue it should work
 with only syssrc installed.

 Thanks,

   --Dean

From: Brian Marcotte <marcotte@panix.com>
To: dean.anderson71@yahoo.com
Cc: netbsd-bugs@NetBSD.org
Subject: Re: kern/57830: Xennet devices don't work properly with other
 NetBSD10 clients
Date: Tue, 09 Jan 2024 02:21:08 -0500

 > ... in that dhcp works, but no other IP works ... And it has no problem
 > with other kinds of guests: Netbsd93,92, Freebsd, and linux

 This sounds exactly like my problem in port-xen/57743.

 This makes me wonder if the correct file was successfully patched:

 > I tried the suggested fix on file (renamed to xennetback_xenbus in RC2)
 > and rebuilt GENERIC.

 The file in question is still if_xennet_xenbus.c, not xennetback_xenbus.c.

 If there is any doubt, you could change the attachment message to show
 that the right file was patched.

 --- if_xennet_xenbus.c.orig	2023-07-31 11:23:02.000000000 -0400
 +++ if_xennet_xenbus.c	2024-01-09 02:15:04.307107315 -0500
 @@ -273,7 +273,7 @@
  	bus_size_t maxsz;
  	int nsegs;

 -	aprint_normal(": Xen Virtual Network Interface\n");
 +	aprint_normal(": Xen Virtual Network Interface (patched)\n");
  	sc->sc_dev = self;

  	sc->sc_xbusd = xa->xa_xbusd;
 @@ -1122,7 +1122,10 @@
  			if (m->m_pkthdr.csum_flags & XN_M_CSUM_SUPPORTED) {
  				txreq->flags |= NETTXF_csum_blank;
  			} else {
 +// Workaround for PR port-xen/57743
 +/*
  				txreq->flags |= NETTXF_data_validated;
 +*/
  			}
  		}
  		if (multiseg && i < lastseg)



 Thanks.

 - Brian
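
 The effect of the second hunk above can be sketched in isolation. This is
 a simplified illustration, not the driver's code: the function name
 xennet_tx_csum_flags and its workaround parameter are hypothetical, and the
 value used for XN_M_CSUM_SUPPORTED is a placeholder (the driver defines it
 as a set of mbuf checksum flags). The NETTXF_* values are assumed to match
 Xen's public io/netif.h.

```c
#include <stdint.h>

/* Assumed to match Xen's public io/netif.h. */
#define NETTXF_csum_blank	(1U << 0)
#define NETTXF_data_validated	(1U << 1)

/* Placeholder value; the driver builds this from M_CSUM_* mbuf flags. */
#define XN_M_CSUM_SUPPORTED	0x0fU

/*
 * Hypothetical helper: compute the checksum-related flags for a tx
 * request, with or without the PR port-xen/57743 workaround applied.
 */
static uint16_t
xennet_tx_csum_flags(uint32_t csum_flags, int workaround)
{
	uint16_t flags = 0;

	if (csum_flags & XN_M_CSUM_SUPPORTED) {
		/* Checksum offload requested: the checksum field is blank. */
		flags |= NETTXF_csum_blank;
	} else if (!workaround) {
		/* Unpatched behavior: mark the data as already validated. */
		flags |= NETTXF_data_validated;
	}
	return flags;
}
```

 With the workaround enabled, a packet that does not request checksum
 offload is sent with neither flag set, which is the only behavioral change
 the patch makes.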

Responsible-Changed-From-To: kern-bug-people->jdolecek
Responsible-Changed-By: jdolecek@NetBSD.org
Responsible-Changed-When: Tue, 09 Jan 2024 19:03:52 +0000
Responsible-Changed-Why:
I'm looking into this.


State-Changed-From-To: open->feedback
State-Changed-By: jdolecek@NetBSD.org
State-Changed-When: Tue, 09 Jan 2024 19:03:52 +0000
State-Changed-Why:
This looks like the same problem as port-xen/57743.
Can you check if sys/arch/xen/xen/if_xennet_xenbus.c rev. 1.130 fixes the 
problem?


State-Changed-From-To: feedback->closed
State-Changed-By: jdolecek@NetBSD.org
State-Changed-When: Sun, 14 Jan 2024 15:53:20 +0000
State-Changed-Why:
Most likely a duplicate of port-xen/57743. Thanks for the report.


>Unformatted:

Copyright © 1994-2024 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.