NetBSD Problem Report #57972

From john@ziaspace.com  Thu Feb 29 05:54:57 2024
Return-Path: <john@ziaspace.com>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id C0E871A9239
	for <gnats-bugs@gnats.NetBSD.org>; Thu, 29 Feb 2024 05:54:56 +0000 (UTC)
Message-Id: <202402290554.41T5soY1013558@anath.zia.io>
Date: Thu, 29 Feb 2024 05:54:50 GMT
From: john@ziaspace.com
Reply-To: john@ziaspace.com
To: gnats-bugs@NetBSD.org
Subject: rge* interface stops communicating after a while
X-Send-Pr-Version: 3.95

>Number:         57972
>Category:       kern
>Synopsis:       rge* interface stops communicating after a while
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Feb 29 05:55:00 +0000 2024
>Last-Modified:  Sun Mar 17 22:25:01 +0000 2024
>Originator:     John Klos
>Release:        NetBSD 10.0_RC5
>Organization:

>Environment:


System: NetBSD sage.zia.io 10.0_RC5 NetBSD 10.0_RC5 (SAGE) #0: Tue Feb 27 07:17:37 UTC 2024 john@sage.zia.io:/usr/obj-amd64/sys/arch/amd64/compile/SAGE amd64
Architecture: x86_64
Machine: amd64
>Description:

Running a system with options GATEWAY as a NAT router with npf and 
with rge* as the primary public interface occasionally leads to a 
state where traffic on the public rge* interface stops flowing.

In case this was an issue with the specific card, I tried a completely 
different card. No change.

Relevant lines from npf:

$ext_if = rge0
$ext_ip = { inet4($ext_if) }
map $ext_if dynamic $localnet_lan -> $ext_ip
group "external" on $ext_if {
	pass stateful out final all
	pass stateful in final family inet4 proto tcp to $ext_ip port ssh apply "log"
	block in final from <blocklist>
	block final all apply "log"
}
group default {
	pass final on lo0 all
	pass in final all
	pass out final all
}

When in this state, netstat -m shows:

7360 mbufs in use:
	7302 mbufs allocated to data
	51 mbufs allocated to packet headers
	7 mbufs allocated to socket names and addresses
0 calls to protocol drain routines

Even though no traffic appears to be flowing, the mbuf counts still change slightly:

7363 mbufs in use:
	7300 mbufs allocated to data
	52 mbufs allocated to packet headers
	11 mbufs allocated to socket names and addresses
0 calls to protocol drain routines

After running "ifconfig rge0 down" and waiting a few seconds, netstat -m gives:

4109 mbufs in use:
	4102 mbufs allocated to data
	2 mbufs allocated to packet headers
	5 mbufs allocated to socket names and addresses
0 calls to protocol drain routines

Then, running "ifconfig rge0 up" gives working communications again.
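
The manual recovery above can be sketched as a small shell helper. This is 
only an illustration and is not part of the original report: the function 
names mbufs_in_use and bounce_if and the 5 second pause are assumptions, 
and it works around the hang rather than fixing it.

```shell
#!/bin/sh
# Hypothetical helpers; names and the 5 second pause are assumptions,
# not something taken from the report or from NetBSD itself.

# Print the leading number of the "N mbufs in use:" line from
# `netstat -m` output supplied on stdin. Prints nothing if absent.
mbufs_in_use() {
	awk '/mbufs in use/ { print $1; exit }'
}

# Take an interface down, wait, and bring it back up, printing the
# mbuf count before and after the "down" step. Must be run as root.
bounce_if() {
	[ -n "$1" ] || { echo "usage: bounce_if <interface>" >&2; return 1; }
	echo "before:     $(netstat -m | mbufs_in_use) mbufs in use"
	ifconfig "$1" down
	sleep 5
	echo "after down: $(netstat -m | mbufs_in_use) mbufs in use"
	ifconfig "$1" up
}
```

With these functions loaded, "bounce_if rge0" would repeat the down/up 
sequence described above and show whether the mbuf count drops.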

The kernel config file is GENERIC plus options GATEWAY. The machine also 
routes public IPv6 and runs dhcpcd on the public interface.
>How-To-Repeat:

>Fix:


>Audit-Trail:
From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/57972: rge* interface stops communicating after a while
Date: Thu, 29 Feb 2024 06:22:16 -0000 (UTC)

 john@ziaspace.com writes:

 >Running a system with options GATEWAY as a NAT router with npf and 
 >with rge* as the primary public interface occasionally leads to a 
 >state where traffic on the public rge* interface stops flowing.

 Maybe the same as kern/57694.

From: "David H. Gutteridge" <david@gutteridge.ca>
To: Gnats Bugs <gnats-bugs@netbsd.org>
Cc: 
Subject: Re: kern/57972: rge* interface stops communicating after a while
Date: Thu, 29 Feb 2024 20:51:50 -0500

 On Thu, 29 Feb 2024 at 05:55:00 +0000 (UTC), John Klos wrote:
 > In case this was an issue with the specific card, I tried a completely
 > different card. No change.

 By "completely different", I assume you mean something other than rge?
 In which case, which type of card?

 Dave

From: John Klos <john@klos.com>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/57972: rge* interface stops communicating after a while
Date: Fri, 1 Mar 2024 09:13:45 +0000 (UTC)

 > Maybe the same as kern/57694.

 It looks similar enough. I'm intentionally using interfaces on PCIe 
 because motherboard interfaces have issues being brought up on busy 
 networks, which might be related to the other issue reported in 
 kern/57694.

 > > In case this was an issue with the specific card, I tried a completely
 > > different card. No change.
 >
 > By "completely different", I assume you mean something other than rge?
 > In which case, which type of card?

 Apologies for not being clear. I was first running a dual port rge* PCIe 
 card (8125 rev. 0x04), and wanted to make sure it wasn't a faulty card, so 
 I bought a single port PCIe card with the same chipset (but rev. 0x05). 
 Same issue.

 I've switched to Broadcom gigabit for now, but can switch back to rge* if 
 there's a need to test anything.

 Thanks,
 John

From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/57972: rge* interface stops communicating after a while
Date: Fri, 1 Mar 2024 12:36:34 -0000 (UTC)

 john@klos.com (John Klos) writes:

 >> Maybe the same as kern/57694.

 >It looks similar enough. I'm intentionally using interfaces on PCIe 
 >because motherboard interfaces have issues being brought up on busy 
 >networks, which might be related to the other issue reported in 
 >kern/57694.

 You could try the patch (i.e. use the -current version). While wiz
 still seems to experience problems, I cannot reproduce them anymore
 with the fix.

 The system here is a NanoPi R6S (aarch64) that comes with two rge
 interfaces on PCI.

From: John Klos <john@klos.com>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/57972: rge* interface stops communicating after a while
Date: Sun, 17 Mar 2024 22:21:18 +0000 (UTC)

 > You could try the patch (i.e. use the -current version). While wiz
 > still seems to experience problems, I cannot reproduce them anymore
 > with the fix.

 I tried with the patch, and while the issue happens less often, it still 
 happens.

 > The system here is a NanoPi R6S (aarch64) that comes with two rge
 > interfaces on PCI.

 The three systems on which I'm seeing this regularly are various amd64 
 systems running NetBSD 10. Two have rge on the motherboard, and the third 
 on PCIe cards (I tried more than one because I initially thought it might 
 be a bad card).

>Unformatted:
