NetBSD Problem Report #57972
From john@ziaspace.com Thu Feb 29 05:54:57 2024
Return-Path: <john@ziaspace.com>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id C0E871A9239
for <gnats-bugs@gnats.NetBSD.org>; Thu, 29 Feb 2024 05:54:56 +0000 (UTC)
Message-Id: <202402290554.41T5soY1013558@anath.zia.io>
Date: Thu, 29 Feb 2024 05:54:50 GMT
From: john@ziaspace.com
Reply-To: john@ziaspace.com
To: gnats-bugs@NetBSD.org
Subject: rge* interface stops communicating after a while
X-Send-Pr-Version: 3.95
>Number: 57972
>Category: kern
>Synopsis: rge* interface stops communicating after a while
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Thu Feb 29 05:55:00 +0000 2024
>Last-Modified: Sun Mar 17 22:25:01 +0000 2024
>Originator: John Klos
>Release: NetBSD 10.0_RC5
>Organization:
>Environment:
System: NetBSD sage.zia.io 10.0_RC5 NetBSD 10.0_RC5 (SAGE) #0: Tue Feb 27 07:17:37 UTC 2024 john@sage.zia.io:/usr/obj-amd64/sys/arch/amd64/compile/SAGE amd64
Architecture: x86_64
Machine: amd64
>Description:
Running a system with options GATEWAY as a NAT router with npf and
with rge* as the primary public interface occasionally leads to a
state where traffic on the public rge* interface stops flowing.
In case this was an issue with the specific card, I tried a completely
different card. No change.
Relevant lines from npf:
$ext_if = rge0
$ext_ip = { inet4($ext_if) }
map $ext_if dynamic $localnet_lan -> $ext_ip
group "external" on $ext_ip {
    pass stateful out final all
    pass stateful in final family inet4 proto tcp to $ext_ip port ssh apply "log"
    block in final from <blocklist>
    block final all apply "log"
}
group default {
    pass final on lo0 all
    pass in final all
    pass out final all
}
When in this state, netstat -m shows:
7360 mbufs in use:
    7302 mbufs allocated to data
    51 mbufs allocated to packet headers
    7 mbufs allocated to socket names and addresses
0 calls to protocol drain routines
Even though no traffic appears to be flowing, the mbuf counts still change slightly:
7363 mbufs in use:
    7300 mbufs allocated to data
    52 mbufs allocated to packet headers
    11 mbufs allocated to socket names and addresses
0 calls to protocol drain routines
After running "ifconfig rge0 down" and waiting a few seconds, netstat -m gives:
4109 mbufs in use:
    4102 mbufs allocated to data
    2 mbufs allocated to packet headers
    5 mbufs allocated to socket names and addresses
0 calls to protocol drain routines
Then, running "ifconfig rge0 up" gives working communications again.
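To compare snapshots like the ones above, a minimal Python sketch follows. It is not part of the original report: the helper names parse_mbufs and mbuf_delta are hypothetical, and it assumes netstat -m prints lines of the form "<count> mbufs <label>" as quoted here.

```python
import re

# Snapshots copied from this report: the stalled state, and the state a few
# seconds after "ifconfig rge0 down".
STALLED = """\
7360 mbufs in use:
7302 mbufs allocated to data
51 mbufs allocated to packet headers
7 mbufs allocated to socket names and addresses
0 calls to protocol drain routines
"""

AFTER_DOWN = """\
4109 mbufs in use:
4102 mbufs allocated to data
2 mbufs allocated to packet headers
5 mbufs allocated to socket names and addresses
0 calls to protocol drain routines
"""


def parse_mbufs(text):
    """Return {label: count} for every '<count> mbufs ...' line (hypothetical helper)."""
    counts = {}
    for line in text.splitlines():
        # Leading whitespace and a trailing colon are optional.
        m = re.match(r"\s*(\d+)\s+(mbufs .*?):?\s*$", line)
        if m:
            counts[m.group(2)] = int(m.group(1))
    return counts


def mbuf_delta(before, after):
    """Return {label: after - before} for each label present in `before`."""
    b, a = parse_mbufs(before), parse_mbufs(after)
    return {label: a.get(label, 0) - b[label] for label in b}


if __name__ == "__main__":
    for label, change in mbuf_delta(STALLED, AFTER_DOWN).items():
        print(f"{change:+6d}  {label}")
```

With the figures in this report, taking rge0 down released 3251 mbufs, 3200 of them data mbufs. That is consistent with buffers being held on behalf of the interface, but it does not by itself identify the cause.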
The kernel config file is GENERIC plus options GATEWAY. The machine also routes
public IPv6 and runs dhcpcd on the public interface.
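The recovery sequence described above (ifconfig rge0 down, wait a few seconds, ifconfig rge0 up) could be scripted as a stopgap. The Python sketch below is an assumption-laden illustration, not a fix and not something taken from the report: bounce_interface is a hypothetical helper, the 5-second default wait is a guess at "a few seconds", and the run and sleep parameters exist only so the logic can be exercised without touching a real interface.

```python
import subprocess
import time


def bounce_interface(iface, settle_seconds=5.0,
                     run=subprocess.run, sleep=time.sleep):
    """Take `iface` down, wait, and bring it back up (hypothetical helper).

    Mirrors the manual steps from the report. Needs root when used with the
    real subprocess.run; `run` and `sleep` are injectable for dry runs.
    """
    run(["ifconfig", iface, "down"], check=True)
    sleep(settle_seconds)  # the report waited "a few seconds" here
    run(["ifconfig", iface, "up"], check=True)


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    bounce_interface(
        "rge0",
        run=lambda cmd, check: print("would run:", " ".join(cmd)),
        sleep=lambda s: print(f"would wait {s} seconds"),
    )
```

Calling bounce_interface("rge0") with the defaults would execute the real commands. Deciding when to call it (for example, after a failed reachability check) is left out, since the report does not describe a reliable way to detect the stalled state.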
>How-To-Repeat:
>Fix:
>Audit-Trail:
From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/57972: rge* interface stops communicating after a while
Date: Thu, 29 Feb 2024 06:22:16 -0000 (UTC)
john@ziaspace.com writes:
>Running a system with options GATEWAY as a NAT router with npf and
>with rge* as the primary public interface occasionally leads to a
>state where traffic on the public rge* interface stops flowing.
Maybe the same as kern/57694.
From: "David H. Gutteridge" <david@gutteridge.ca>
To: Gnats Bugs <gnats-bugs@netbsd.org>
Cc:
Subject: Re: kern/57972: rge* interface stops communicating after a while
Date: Thu, 29 Feb 2024 20:51:50 -0500
On Thu, 29 Feb 2024 at 05:55:00 +0000 (UTC), John Klos wrote:
> In case this was an issue with the specific card, I tried a completely
> different card. No change.
By "completely different", I assume you mean something other than rge?
In which case, which type of card?
Dave
From: John Klos <john@klos.com>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/57972: rge* interface stops communicating after a while
Date: Fri, 1 Mar 2024 09:13:45 +0000 (UTC)
> Maybe the same as kern/57694.
It looks similar enough. I'm intentionally using interfaces on PCIe
because motherboard interfaces have issues being brought up on busy
networks, which might be related to the other issue reported in
kern/57694.
> > In case this was an issue with the specific card, I tried a completely
> > different card. No change.
>
> By "completely different", I assume you mean something other than rge?
> In which case, which type of card?
Apologies for not being clear. I was first running a dual port rge* PCIe
card (8125 rev. 0x04), and wanted to make sure it wasn't a faulty card, so
I bought a single port PCIe card with the same chipset (but rev. 0x05).
Same issue.
I've switched to Broadcom gigabit for now, but can switch back to rge* if
there's a need to test anything.
Thanks,
John
From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/57972: rge* interface stops communicating after a while
Date: Fri, 1 Mar 2024 12:36:34 -0000 (UTC)
john@klos.com (John Klos) writes:
>> Maybe the same as kern/57694.
>It looks similar enough. I'm intentionally using interfaces on PCIe
>because motherboard interfaces have issues being brought up on busy
>networks, which might be related to the other issue reported in
>kern/57694.
You could try the patch (i.e. use the -current version). While wiz
still seems to experience problems, I cannot reproduce them anymore
with the fix.
The system here is a NanoPi R6S (aarch64) that comes with two rge
interfaces on PCI.
From: John Klos <john@klos.com>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/57972: rge* interface stops communicating after a while
Date: Sun, 17 Mar 2024 22:21:18 +0000 (UTC)
> You could try the patch (i.e. use the -current version). While wiz
> still seems to experience problems, I cannot reproduce them anymore
> with the fix.
I tried with the patch, and while the issue happens less often, it still
happens.
> The system here is a NanoPi R6S (aarch64) that comes with two rge
> interfaces on PCI.
The three systems on which I'm seeing this regularly are various amd64
systems running NetBSD 10. Two have rge on the motherboard, and the third
on PCIe cards (I tried more than one because I initially thought it might
be a bad card).
>Unformatted:
Copyright © 1994-2024
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.