NetBSD Problem Report #45728

From marcotte@panix.com  Wed Dec 21 09:28:27 2011
Return-Path: <marcotte@panix.com>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
	by www.NetBSD.org (Postfix) with ESMTP id 77BA563BB83
	for <gnats-bugs@gnats.NetBSD.org>; Wed, 21 Dec 2011 09:28:27 +0000 (UTC)
Message-Id: <20111221092828.F31E12425C@panix5.panix.com>
Date: Wed, 21 Dec 2011 04:28:28 -0500 (EST)
From: marcotte@panix.com
Reply-To: marcotte@panix.com
To: gnats-bugs@gnats.NetBSD.org
Subject: NetBSD/xen network loss
X-Send-Pr-Version: 3.95

>Number:         45728
>Category:       port-xen
>Synopsis:       NetBSD/xen loses network connectivity
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    port-xen-maintainer
>State:          feedback
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Dec 21 09:30:01 +0000 2011
>Closed-Date:    
>Last-Modified:  Mon Jul 27 22:20:01 +0000 2020
>Originator:     Brian Marcotte
>Release:        NetBSD 5.1
>Organization:
	Panix
>Environment:
System: NetBSD panix5.panix.com 5.1 NetBSD 5.1 (PANIX-XEN3U-USER) #3: Wed Mar 9 20:45:57 EST 2011 root@juggler.panix.com:/devel/netbsd/5.1-shellkernel/src/sys/arch/i386/compile/PANIX-XEN3U-USER i386
Architecture: i386
Machine: i386
>Description:

For some time, my machines have had very occasional network problems
which I have not been able to diagnose or reproduce. In the past I
thought it was specific to NFS, but now it looks like the NFS issues
are just a symptom of a network issue.

It happens only under Xen, or at least I can only reproduce it under
Xen. I've also tried -current and there is no change in behavior.

What happens is that the machine either goes off the net entirely (with
feature-rx-notify), or starts to experience major packet loss (without
feature-rx-notify). 

>How-To-Repeat:

Two servers are required to reproduce the problem. The first is the
NetBSD system to be diagnosed. The second needs to be running
telnetd. I used another NetBSD system for this, but that doesn't seem
to matter. The problem also happens when suspended processes continue to
receive data from the network, but this telnet example is a very simple
way to reproduce the problem.

   First, you need to make sure that flow control characters are
   making it to the system to be tested. I did this by ssh-ing in. It
   should probably also work if you had a local xterm or console. You
   should be able to enter Control-V, Control-S and see the "^S"
   appear.

   Telnet to the machine running telnetd and log in as some user.

   Run this on the remote end: while :; do date ; sleep .1; done

   Type Control-S. In my testing, this is processed on the system
   running the telnet client, not the remote system. This is key to
   reproducing the problem.

   Wait a few minutes. Running "netstat -f inet -n" should show the
   "Recv-Q" filling up on the connection. Eventually, when the queue
   becomes full, the system should go off the network (NetBSD defaults
   to using feature-rx-notify).

   You may need to log in on the console and kill the telnet client to
   fix things.
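
The Recv-Q check in the steps above can be watched with a small filter
instead of reading the full netstat output each time. This is only a
sketch: the function name recvq_for_port and the 5-second interval are
made up for illustration, and it assumes the usual column layout
(Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State) with the
port appended to the address after a dot.

```shell
# Hypothetical helper: reads `netstat -f inet -n` output on stdin and
# prints "foreign-address recv-q" for TCP connections whose remote
# port matches the first argument.
recvq_for_port() {
    awk -v port="$1" '
        $1 ~ /^tcp/ {
            # Foreign Address is field 5; the port is its last dot-separated part.
            n = split($5, a, ".")
            if (a[n] == port) print $5, $2
        }'
}

# Example use on the telnet client (the interval is arbitrary):
#   while :; do netstat -f inet -n | recvq_for_port 23; sleep 5; done
```

A Recv-Q value that keeps growing and then stops changing would match
the behavior described above.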

The behavior when not using feature-rx-notify (by modifying
if_xennet_xenbus.c) is somewhat different. Instead of the machine
going off the network entirely, there is severe packet loss, which
makes NFS grind to a halt. I can see this by running "netstat -i" in my
Linux dom0.
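
To compare counters between runs in the dom0, the "netstat -i" output
can be reduced to one interface. This is a sketch under assumptions not
stated in the report: that the loss shows up in the RX-DRP/TX-DRP
columns, and that the guest's backend interface has a name like vif1.0.
The function name drops_for_iface is hypothetical. Column positions are
read from the header line because they differ between net-tools
versions.

```shell
# Hypothetical helper: reads Linux `netstat -i` output on stdin and
# prints the drop counters for the interface named in the first argument.
drops_for_iface() {
    awk -v ifc="$1" '
        $1 == "Iface" {
            # Remember which field number each column name occupies.
            for (i = 1; i <= NF; i++) col[$i] = i
            next
        }
        $1 == ifc && ("RX-DRP" in col) && ("TX-DRP" in col) {
            print $1, "rx_drop=" $(col["RX-DRP"]), "tx_drop=" $(col["TX-DRP"])
        }'
}

# Example use in the dom0 (the interface name is an assumption):
#   netstat -i | drops_for_iface vif1.0
```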

>Fix:
   Unknown.

>Release-Note:

>Audit-Trail:

State-Changed-From-To: open->feedback
State-Changed-By: jdolecek@NetBSD.org
State-Changed-When: Fri, 17 Jul 2020 11:02:30 +0000
State-Changed-Why:
Any chance to retry this with up-to-date -current and Xen 4.11?


From: Brian Marcotte <marcotte@panix.com>
To: jdolecek@NetBSD.org
Cc: port-xen-maintainer@netbsd.org, netbsd-bugs@netbsd.org,
	gnats-admin@netbsd.org, gnats-bugs@netbsd.org
Subject: Re: port-xen/45728 (NetBSD/xen loses network connectivity)
Date: Fri, 17 Jul 2020 20:04:13 -0400

 I had forgotten I once had a way to reproduce this problem.

 Right now, I can't reproduce this on -current, but I can't reproduce it
 on 8 or 9 either.

 We do still have a problem with xennet falling off the network though.
 Right now, the machines which show the problem most often are in HVM
 mode as a workaround.

 I can switch back to PV to debug, but I need to wait until after the
 weekend. I should be able to run -current on one of them.

 --
 - Brian

From: Brian Marcotte <marcotte@panix.com>
To: port-xen-maintainer@netbsd.org, gnats-admin@netbsd.org,
	netbsd-bugs@netbsd.org, gnats-bugs@netbsd.org
Cc: Brian Marcotte <marcotte@panix.com>
Subject: Re: port-xen/45728 (NetBSD/xen loses network connectivity)
Date: Mon, 20 Jul 2020 23:12:48 -0400

 >  I can switch back to PV to debug, but I need to wait until after the

 I tried 9.0 in PV mode, but it couldn't keep up with the required load.
 I now have -current in PVH mode running on two of the three systems. So
 far it's running well, but I'd say we need to give it at least a week to
 see if there is any problem with xennet.

 --
 - Brian

From: Brian Marcotte <marcotte@panix.com>
To: port-xen-maintainer@netbsd.org, gnats-admin@netbsd.org,
	netbsd-bugs@netbsd.org, gnats-bugs@netbsd.org
Cc: Brian Marcotte <marcotte@panix.com>
Subject: Re: port-xen/45728 (NetBSD/xen loses network connectivity)
Date: Mon, 27 Jul 2020 18:18:21 -0400

 > I now have -current in PVH mode running on two of the three systems. So
 > far it's running well, but I'd say we need to give it at least a week ..

 It's been running for a week without problems on the systems which
 usually show problems with xennet. So, I can say that the issue we've
 had with xennet is "probably" fixed.

 You can close the ticket.

 --
 - Brian

>Unformatted:
