NetBSD Problem Report #45728
From marcotte@panix.com Wed Dec 21 09:28:27 2011
Return-Path: <marcotte@panix.com>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
by www.NetBSD.org (Postfix) with ESMTP id 77BA563BB83
for <gnats-bugs@gnats.NetBSD.org>; Wed, 21 Dec 2011 09:28:27 +0000 (UTC)
Message-Id: <20111221092828.F31E12425C@panix5.panix.com>
Date: Wed, 21 Dec 2011 04:28:28 -0500 (EST)
From: marcotte@panix.com
Reply-To: marcotte@panix.com
To: gnats-bugs@gnats.NetBSD.org
Subject: NetBSD/xen network loss
X-Send-Pr-Version: 3.95
>Number: 45728
>Category: port-xen
>Synopsis: NetBSD/xen loses network connectivity
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: port-xen-maintainer
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Wed Dec 21 09:30:01 +0000 2011
>Closed-Date: Sat Jul 10 05:37:27 +0000 2021
>Last-Modified: Sat Jul 10 05:37:27 +0000 2021
>Originator: Brian Marcotte
>Release: NetBSD 5.1
>Organization:
Panix
>Environment:
System: NetBSD panix5.panix.com 5.1 NetBSD 5.1 (PANIX-XEN3U-USER) #3: Wed Mar 9 20:45:57 EST 2011 root@juggler.panix.com:/devel/netbsd/5.1-shellkernel/src/sys/arch/i386/compile/PANIX-XEN3U-USER i386
Architecture: i386
Machine: i386
>Description:
For some time, my machines have had very occasional network problems
which I have not been able to diagnose or reproduce. In the past I
thought it was specific to NFS, but now it looks like the NFS issues
are just a symptom of a network issue.
It happens only under Xen, or at least I can reproduce it only under
Xen. I've also tried -current and there is no change in behavior.
What happens is that the machine either goes off the net entirely (with
feature-rx-notify), or starts to experience major packet loss (without
feature-rx-notify).
>How-To-Repeat:
Two servers are required to reproduce the problem. The first is the
NetBSD system to be diagnosed. The second needs to be running
telnetd. I used another NetBSD system for this, but that doesn't seem
to matter. The problem also happens when suspended processes continue to
receive data from the network, but this telnet example is a very simple
way to reproduce the problem.
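The condition underneath both cases (a peer that keeps sending while
the local reader has stopped reading) can be sketched without telnet.
This is a minimal illustration only: the function name, the loopback
address, and the buffer sizes are assumptions, and running it on
loopback is not expected to trigger the xennet problem by itself.

```python
import socket


def stalled_reader_demo(chunk_size=4096, stall_timeout=0.5,
                        max_bytes=64 * 1024 * 1024):
    """Send to a peer that never reads; return (bytes_sent, stalled)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Loopback and an ephemeral port: for illustration only.
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)

    # The reader connects but never calls recv(), like a telnet client
    # after Control-S or a suspended process.
    reader = socket.create_connection(listener.getsockname())
    sender, _ = listener.accept()
    sender.settimeout(stall_timeout)

    chunk = b"x" * chunk_size
    sent = 0
    stalled = False
    try:
        while sent < max_bytes:
            sent += sender.send(chunk)
    except socket.timeout:
        # send() blocked for stall_timeout seconds: the reader's receive
        # queue and the sender's send queue are both full.
        stalled = True
    finally:
        sender.close()
        reader.close()
        listener.close()
    return sent, stalled
```

To approximate the report's setup, the reader half would have to run
on the NetBSD system under test and the sender half on the second
machine; that split is not shown here.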
First, you need to make sure that flow control characters are
making it to the system to be tested. I did this by ssh-ing in. It
should probably also work if you had a local xterm or console. You
should be able to enter Control-V, Control-S and see the "^S"
appear.
Telnet to the machine running telnetd and log in as some user.
Run this on the remote end: while :; do date ; sleep .1; done
Type Control-S. In my testing, this is processed on the system
running the telnet client, not the remote system. This is key to
reproducing the problem.
Wait a few minutes. Running "netstat -f inet -n" should show the
"Recv-Q" filling up on the connection. Eventually, when the receive
queue becomes full, the system should go off the network (NetBSD
defaults to using feature-rx-notify).
You may need to log in on the console and kill the telnet client to
fix things.
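Watching "Recv-Q" by hand can be replaced with a small parser. The
following is a sketch, assuming the usual "Proto Recv-Q Send-Q Local
Foreign State" column layout of "netstat -f inet -n"; the function
names are hypothetical and not part of any NetBSD tool.

```python
import subprocess


def parse_recv_queues(netstat_output):
    """Return [(recv_q, local, foreign)] for the TCP lines of netstat output."""
    rows = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed layout: Proto Recv-Q Send-Q Local Foreign [State]
        if len(fields) < 5 or not fields[0].startswith("tcp"):
            continue
        try:
            recv_q = int(fields[1])
        except ValueError:
            continue  # header or otherwise unparsable line
        rows.append((recv_q, fields[3], fields[4]))
    return rows


def busiest_connection(netstat_output):
    """Return the TCP row with the largest Recv-Q, or None if there is none."""
    return max(parse_recv_queues(netstat_output), default=None)


def read_netstat():
    """Run the netstat command quoted in the steps above and return its output."""
    result = subprocess.run(["netstat", "-f", "inet", "-n"],
                            capture_output=True, text=True, check=True)
    return result.stdout
```

Calling busiest_connection(read_netstat()) every few seconds during the
test should show the Recv-Q of the telnet connection growing until the
system goes off the network.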
The behavior when not using feature-rx-notify (by modifying
if_xennet_xenbus.c) is somewhat different. Instead of the machine
going off the network entirely, there is severe packet loss (which
makes NFS grind to a halt). I can see this by running "netstat -i" in
my Linux dom0.
>Fix:
Unknown.
>Release-Note:
>Audit-Trail:
State-Changed-From-To: open->feedback
State-Changed-By: jdolecek@NetBSD.org
State-Changed-When: Fri, 17 Jul 2020 11:02:30 +0000
State-Changed-Why:
Any chance to retry this with up-to-date -current and Xen 4.11?
From: Brian Marcotte <marcotte@panix.com>
To: jdolecek@NetBSD.org
Cc: port-xen-maintainer@netbsd.org, netbsd-bugs@netbsd.org,
gnats-admin@netbsd.org, gnats-bugs@netbsd.org
Subject: Re: port-xen/45728 (NetBSD/xen loses network connectivity)
Date: Fri, 17 Jul 2020 20:04:13 -0400
I had forgotten I once had a way to reproduce this problem.
Right now, I can't reproduce this on -current, but I can't reproduce it
on 8 or 9 either.
We do still have a problem with xennet falling off the network though.
Right now, the machines which show the problem most often are in HVM
mode as a workaround.
I can switch back to PV to debug, but I need to wait until after the
weekend. I should be able to run -current on one of them.
--
- Brian
From: Brian Marcotte <marcotte@panix.com>
To: port-xen-maintainer@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org, gnats-bugs@netbsd.org
Cc: Brian Marcotte <marcotte@panix.com>
Subject: Re: port-xen/45728 (NetBSD/xen loses network connectivity)
Date: Mon, 20 Jul 2020 23:12:48 -0400
> I can switch back to PV to debug, but I need to wait until after the
I tried 9.0 in PV mode, but it couldn't keep up with the required load.
I now have -current in PVH mode running on two of the three systems. So
far it's running well, but I'd say we need to give it at least a week to
see if there is any problem with xennet.
--
- Brian
From: Brian Marcotte <marcotte@panix.com>
To: port-xen-maintainer@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org, gnats-bugs@netbsd.org
Cc: Brian Marcotte <marcotte@panix.com>
Subject: Re: port-xen/45728 (NetBSD/xen loses network connectivity)
Date: Mon, 27 Jul 2020 18:18:21 -0400
> I now have -current in PVH mode running on two of the three systems. So
> far it's running well, but I'd say we need to give it at least a week ..
It's been running for a week without problems on the systems which
usually show problems with xennet. So, I can say that the issue we've
had with xennet is "probably" fixed.
You can close the ticket.
--
- Brian
State-Changed-From-To: feedback->closed
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Sat, 10 Jul 2021 05:37:27 +0000
State-Changed-Why:
Submitter says fixed.
(curiously the email in question is in gnats but I never got it via
netbsd-bugs...)
>Unformatted: