NetBSD Problem Report #57694
From wiz@exadelic.gatalith.at Sun Nov 12 19:43:35 2023
Return-Path: <wiz@exadelic.gatalith.at>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id 12BB41A9238
for <gnats-bugs@gnats.NetBSD.org>; Sun, 12 Nov 2023 19:43:35 +0000 (UTC)
Message-Id: <20231112194328.DE6C92EBBAD3@exadelic.gatalith.at>
Date: Sun, 12 Nov 2023 20:43:28 +0100 (CET)
From: Thomas Klausner <wiz@NetBSD.org>
Reply-To: Thomas Klausner <wiz@NetBSD.org>
To: gnats-bugs@NetBSD.org
Subject: rge(4) hang
X-Send-Pr-Version: 3.95
>Number: 57694
>Category: kern
>Synopsis: rge(4) hang
>Confidential: no
>Severity: serious
>Priority: high
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sun Nov 12 19:45:00 +0000 2023
>Closed-Date:
>Last-Modified: Fri Aug 23 01:45:01 +0000 2024
>Originator: Thomas Klausner
>Release: NetBSD 10.99.10
>Organization:
>Environment:
Architecture: x86_64
Machine: amd64
>Description:
After the latest fixes, rge(4) is better, but it has completely hung
the network interface twice so far, with no network traffic possible
on it. Both times the hang was so hard that the BIOS had some kind of
issue on the next boot and needed 15 minutes to sort itself out
(before even showing anything on the screen).
I'm running a kernel from Oct 22.
The interface is running in 1Gb mode.
dmesg:
rge0 at pci7 dev 0 function 0: Realtek Semiconductor 8125 10/100/1G/2.5G Ethernet (rev. 0x05)
rge0: interrupting at msix1 vec 0
rge0: Ethernet address xx.xx.xx.xx.xx.xx
In /var/log/messages I see:
Nov 1 18:59:43 exadelic dhcpcd[2191]: rge0: xxxx::1 is unreachable
Nov 1 18:59:43 exadelic dhcpcd[2191]: rge0: soliciting an IPv6 router
Nov 1 18:59:45 exadelic dhcpcd[2191]: rge0: Router Advertisement from xxxx::1
Nov 1 18:59:46 exadelic dhcpcd[2191]: rge0: xxxx::1 is unreachable
Nov 1 18:59:46 exadelic dhcpcd[2191]: rge0: soliciting an IPv6 router
Nov 1 18:59:58 exadelic dhcpcd[2191]: rge0: no IPv6 Routers available
Nov 1 19:01:11 exadelic dhcpcd[2191]: rge0: xxxx::1 is reachable again
Nov 1 19:01:19 exadelic dhcpcd[2191]: rge0: xxxx::1 is unreachable
Nov 1 19:01:19 exadelic dhcpcd[2191]: rge0: soliciting an IPv6 router
Nov 1 19:01:31 exadelic dhcpcd[2191]: rge0: no IPv6 Routers available
Nov 1 19:01:57 exadelic dhcpcd[2191]: rge0: xxxx::1 is reachable again
Nov 1 19:02:05 exadelic dhcpcd[2191]: rge0: xxxx::1 is unreachable
Nov 1 19:02:05 exadelic dhcpcd[2191]: rge0: soliciting an IPv6 router
Nov 1 19:02:17 exadelic dhcpcd[2191]: rge0: no IPv6 Routers available
Nov 1 19:04:27 exadelic dhcpcd[2191]: rge0: xxxx::1 is reachable again
Nov 1 19:04:35 exadelic dhcpcd[2191]: rge0: xxxx::1 is unreachable
Nov 1 19:04:35 exadelic dhcpcd[2191]: rge0: soliciting an IPv6 router
Nov 1 19:04:47 exadelic dhcpcd[2191]: rge0: no IPv6 Routers available
Nov 1 19:06:12 exadelic dhcpcd[2191]: rge0: xxxx::1 is reachable again
Nov 1 19:06:20 exadelic dhcpcd[2191]: rge0: xxxx::1 is unreachable
Nov 1 19:06:21 exadelic dhcpcd[2191]: rge0: soliciting an IPv6 router
Nov 1 19:06:33 exadelic dhcpcd[2191]: rge0: no IPv6 Routers available
Nov 1 19:09:27 exadelic /netbsd: [ 91537.5847758] nfs server 192.168.178.19:/path: not responding
Nov 1 19:15:16 exadelic dhcpcd[2191]: rge0: xxxx::1 is reachable again
Nov 1 19:15:24 exadelic dhcpcd[2191]: rge0: xxxx::1 is unreachable
Nov 1 19:15:24 exadelic dhcpcd[2191]: rge0: soliciting an IPv6 router
Nov 1 19:15:36 exadelic dhcpcd[2191]: rge0: no IPv6 Routers available
Nov 1 19:16:51 exadelic dhcpcd[2191]: rge0: xxxx::1 is reachable again
Nov 1 19:16:52 exadelic dhcpcd[2191]: ps_root_recvmsg: No buffer space available
Nov 1 19:16:52 exadelic dhcpcd[2191]: ps_root_recvmsg: No buffer space available
Nov 1 19:16:59 exadelic dhcpcd[2191]: rge0: xxxx::1 is unreachable
Nov 1 19:16:59 exadelic dhcpcd[2191]: rge0: soliciting an IPv6 router
Nov 1 19:16:59 exadelic dhcpcd[2191]: ps_root_recvmsg: No buffer space available
Nov 1 19:17:11 exadelic syslogd[2290]: last message repeated 3 times
Nov 1 19:17:11 exadelic dhcpcd[2191]: rge0: no IPv6 Routers available
Nov 1 19:17:44 exadelic dhcpcd[2191]: ps_root_recvmsg: No buffer space available
Just in case it matters, I'm not running with the default sysctls; I have
kern.sbmax: 262144 -> 16777216
net.inet.tcp.recvbuf_max: 262144 -> 16777216
net.inet.tcp.sendbuf_max: 262144 -> 16777216
net.inet.tcp.recvspace: 32768 -> 262144
net.inet.tcp.sendspace: 32768 -> 262144
because of
https://mail-index.netbsd.org/current-users/2017/09/21/msg032369.html
>How-To-Repeat:
Make heavy use of an rge(4) device with NFS + SFTP.
>Fix:
Yes, please.
>Release-Note:
>Audit-Trail:
From: Michael van Elst <mlelstv@serpens.de>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/57694: rge(4) hang
Date: Sat, 16 Dec 2023 13:38:25 +0100
The following patch helped here:
--- sys/dev/pci/if_rge.c 19 Oct 2023 23:43:40 -0000 1.28
+++ sys/dev/pci/if_rge.c 16 Dec 2023 12:33:04 -0000
@@ -1404,24 +1404,14 @@ rge_txeof(struct rge_softc *sc)
 
 	sc->rge_ldata.rge_txq_considx = cons;
 
-#if 0
-	if (ifq_is_oactive(&ifp->if_snd))
-		ifq_restart(&ifp->if_snd);
-	else if (free == 2)
-		ifq_serialize(&ifp->if_snd, &sc->sc_task);
-	else
-		ifp->if_timer = 0;
-#else
-#if 0
+	if (free == 2)
+		rge_txstart(&sc->sc_task, sc);
+
+	CLR(ifp->if_flags, IFF_OACTIVE);
 	if (!IF_IS_EMPTY(&ifp->if_snd))
 		rge_start(ifp);
 	else
-	if (free == 2)
-		if (0) { rge_txstart(&sc->sc_task, sc); }
-	else
-#endif
 		ifp->if_timer = 0;
-#endif
 
 	return (1);
 }
--
Michael van Elst
Internet: mlelstv@serpens.de
"A potential Snark may lurk in every tree."
From: "Michael van Elst" <mlelstv@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/57694 CVS commit: src/sys/dev/pci
Date: Sat, 16 Dec 2023 16:35:49 +0000
Module Name: src
Committed By: mlelstv
Date: Sat Dec 16 16:35:49 UTC 2023
Modified Files:
src/sys/dev/pci: if_rge.c
Log Message:
- handle stuck transmitter (descriptor still owned)
- restart send queue after transmit
- count output packets
- use deferred start
Should fix PR 57694
To generate a diff of this commit:
cvs rdiff -u -r1.28 -r1.29 src/sys/dev/pci/if_rge.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
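
For readers unfamiliar with the terms in the log message above, the
following is a minimal, self-contained C model of a transmit descriptor
ring. It is not the if_rge.c code: the structure, the names (tx_ring,
DESC_OWN, tx_ring_init, tx_enqueue, nic_complete, tx_reclaim) and the
ring size are all hypothetical, chosen only to illustrate what
"descriptor still owned", restarting output after transmit, and counting
output packets can mean in a driver of this general shape.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TX_RING_SIZE 8u            /* hypothetical; real rings are larger */
#define DESC_OWN     0x80000000u   /* hypothetical "owned by the NIC" bit */

/* Simplified model: only the status word of each descriptor is kept. */
struct tx_ring {
	uint32_t desc[TX_RING_SIZE];
	unsigned prod;      /* next slot the driver fills */
	unsigned cons;      /* next slot the driver reclaims */
	unsigned queued;    /* slots handed to the NIC, not yet reclaimed */
	bool     oactive;   /* models IFF_OACTIVE: ring full, output paused */
	unsigned opackets;  /* models the output packet counter */
};

/* (Re)initialise the ring: all indices, flags and descriptors cleared. */
static void
tx_ring_init(struct tx_ring *r)
{
	memset(r, 0, sizeof(*r));
}

/* Hand one packet to the simulated NIC; returns false if the ring is full. */
static bool
tx_enqueue(struct tx_ring *r)
{
	if (r->queued == TX_RING_SIZE) {
		r->oactive = true;   /* nothing more can be queued for now */
		return false;
	}
	r->desc[r->prod] = DESC_OWN;
	r->prod = (r->prod + 1) % TX_RING_SIZE;
	r->queued++;
	return true;
}

/* Simulated hardware: mark slot i as transmitted by clearing the OWN bit. */
static void
nic_complete(struct tx_ring *r, unsigned i)
{
	r->desc[i % TX_RING_SIZE] &= ~DESC_OWN;
}

/*
 * Reclaim completed slots in order. Returns true if reclaiming stopped
 * at a descriptor the NIC still owns, i.e. work is still outstanding.
 * A real driver has to decide what to do in that case (for example,
 * poke the transmitter again); this model only reports it.
 */
static bool
tx_reclaim(struct tx_ring *r)
{
	bool still_owned = false;

	while (r->queued > 0) {
		if (r->desc[r->cons] & DESC_OWN) {
			still_owned = true;
			break;
		}
		r->cons = (r->cons + 1) % TX_RING_SIZE;
		r->queued--;
		r->opackets++;       /* count the packet as sent */
	}
	if (r->queued < TX_RING_SIZE)
		r->oactive = false;  /* room again: output may be restarted */
	return still_owned;
}
```

In this model, a completion that never arrives leaves tx_reclaim()
returning true on every call while queued stays non-zero, and calling
tx_ring_init() again clears that state. How rev 1.29 actually detects
and handles the condition is in the diff referenced above.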
State-Changed-From-To: open->closed
State-Changed-By: wiz@NetBSD.org
State-Changed-When: Sun, 31 Dec 2023 10:03:34 +0000
State-Changed-Why:
mlelstv's commit improved the situation a lot - I saw no more hangs
running the new kernel for over a week.
From: Thomas Klausner <wiz@NetBSD.org>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/57694: rge(4) hang
Date: Sun, 31 Dec 2023 11:02:24 +0100
After running with rge(4) with mlelstv's change for more than a week,
there were no problems. So this has improved stability a lot.
Thank you, Michael!
Thomas
State-Changed-From-To: closed->open
State-Changed-By: wiz@NetBSD.org
State-Changed-When: Tue, 02 Jan 2024 16:03:24 +0000
State-Changed-Why:
Not solved yet.
From: Thomas Klausner <wiz@NetBSD.org>
To: gnats-bugs@netbsd.org
Cc: Michael van Elst <mlelstv@serpens.de>
Subject: Re: kern/57694 (rge(4) hang)
Date: Tue, 2 Jan 2024 17:08:45 +0100
Today the machine's network hung again, after about 10 days of uptime.
In syslog I saw:
Jan 2 16:34:27 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is unreachable
Jan 2 16:34:27 exadelic dhcpcd[1790]: rge0: soliciting an IPv6 router
Jan 2 16:34:31 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is reachable again
Jan 2 16:34:39 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is unreachable
Jan 2 16:34:39 exadelic dhcpcd[1790]: rge0: soliciting an IPv6 router
Jan 2 16:34:42 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is reachable again
Jan 2 16:34:47 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
Jan 2 16:34:50 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is unreachable
Jan 2 16:34:51 exadelic dhcpcd[1790]: rge0: soliciting an IPv6 router
Jan 2 16:34:51 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
Jan 2 16:34:53 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is reachable again
Jan 2 16:34:55 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
Jan 2 16:34:55 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
Jan 2 16:35:01 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is unreachable
Jan 2 16:35:02 exadelic dhcpcd[1790]: rge0: soliciting an IPv6 router
Jan 2 16:35:02 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
Jan 2 16:35:05 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is reachable again
Jan 2 16:35:06 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
Jan 2 16:35:06 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
Jan 2 16:35:13 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is unreachable
Jan 2 16:35:14 exadelic dhcpcd[1790]: rge0: soliciting an IPv6 router
Jan 2 16:35:14 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
Jan 2 16:35:17 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is reachable again
Jan 2 16:35:18 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
Jan 2 16:35:18 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
Jan 2 16:35:25 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is unreachable
Jan 2 16:35:26 exadelic dhcpcd[1790]: rge0: soliciting an IPv6 router
Jan 2 16:35:26 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
Jan 2 16:35:29 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is reachable again
Jan 2 16:35:30 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
Jan 2 16:35:30 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
Jan 2 16:35:37 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is unreachable
Jan 2 16:35:37 exadelic dhcpcd[1790]: rge0: soliciting an IPv6 router
Jan 2 16:35:37 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
Jan 2 16:35:41 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is reachable again
Jan 2 16:35:41 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
Jan 2 16:35:41 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
Jan 2 16:35:49 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is unreachable
Jan 2 16:35:49 exadelic dhcpcd[1790]: rge0: soliciting an IPv6 router
Jan 2 16:35:49 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
Jan 2 16:35:49 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
Jan 2 16:35:53 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is reachable again
Jan 2 16:35:57 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
Jan 2 16:35:57 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
Jan 2 16:36:01 exadelic dhcpcd[1790]: rge0: no IPv6 Routers available
Jan 2 16:36:01 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is unreachable
Jan 2 16:36:02 exadelic dhcpcd[1790]: rge0: soliciting an IPv6 router
Jan 2 16:36:02 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
Jan 2 16:36:04 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is reachable again
Jan 2 16:36:06 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
Jan 2 16:36:06 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
Jan 2 16:36:12 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is unreachable
Jan 2 16:36:13 exadelic dhcpcd[1790]: rge0: soliciting an IPv6 router
Jan 2 16:36:13 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
Jan 2 16:36:16 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is reachable again
Jan 2 16:36:17 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
Jan 2 16:36:17 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
Jan 2 16:36:24 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is unreachable
Jan 2 16:36:24 exadelic dhcpcd[1790]: rge0: soliciting an IPv6 router
Jan 2 16:36:24 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
Jan 2 16:36:24 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
Jan 2 16:36:30 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is reachable again
Jan 2 16:36:32 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
Jan 2 16:36:32 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
Jan 2 16:36:36 exadelic dhcpcd[1790]: rge0: no IPv6 Routers available
Jan 2 16:36:38 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is unreachable
Jan 2 16:36:39 exadelic dhcpcd[1790]: rge0: soliciting an IPv6 router
Jan 2 16:36:39 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
Jan 2 16:36:51 exadelic syslogd[2503]: last message repeated 3 times
Jan 2 16:36:51 exadelic dhcpcd[1790]: rge0: no IPv6 Routers available
Jan 2 16:36:54 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is reachable again
Jan 2 16:37:02 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is unreachable
Jan 2 16:37:03 exadelic dhcpcd[1790]: rge0: soliciting an IPv6 router
Jan 2 16:37:03 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
Jan 2 16:37:05 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is reachable again
Jan 2 16:37:07 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
Jan 2 16:37:07 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
Jan 2 16:37:13 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is unreachable
Jan 2 16:37:13 exadelic dhcpcd[1790]: rge0: soliciting an IPv6 router
Jan 2 16:37:13 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
Jan 2 16:37:16 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is reachable again
Jan 2 16:37:17 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
Jan 2 16:37:17 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
Jan 2 16:37:24 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is unreachable
Jan 2 16:37:25 exadelic dhcpcd[1790]: rge0: soliciting an IPv6 router
Jan 2 16:37:25 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
Jan 2 16:37:25 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
Jan 2 16:37:30 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is reachable again
Jan 2 16:37:33 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
Jan 2 16:37:33 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
Jan 2 16:37:37 exadelic dhcpcd[1790]: rge0: no IPv6 Routers available
Jan 2 16:37:38 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is unreachable
Jan 2 16:37:39 exadelic dhcpcd[1790]: rge0: soliciting an IPv6 router
Jan 2 16:37:39 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
Jan 2 16:37:42 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is reachable again
Jan 2 16:37:43 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
Jan 2 16:37:43 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
Jan 2 16:37:50 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is unreachable
Jan 2 16:37:50 exadelic dhcpcd[1790]: rge0: soliciting an IPv6 router
Jan 2 16:37:50 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
Jan 2 16:38:02 exadelic syslogd[2503]: last message repeated 3 times
Jan 2 16:38:02 exadelic dhcpcd[1790]: rge0: no IPv6 Routers available
Jan 2 16:38:15 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is reachable again
Jan 2 16:38:15 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
Jan 2 16:38:23 exadelic syslogd[2503]: last message repeated 2 times
Jan 2 16:38:23 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is unreachable
Jan 2 16:38:23 exadelic dhcpcd[1790]: rge0: soliciting an IPv6 router
Jan 2 16:38:23 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
Jan 2 16:38:35 exadelic syslogd[2503]: last message repeated 3 times
Jan 2 16:38:35 exadelic dhcpcd[1790]: rge0: no IPv6 Routers available
Jan 2 16:38:45 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is reachable again
Jan 2 16:38:53 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is unreachable
Jan 2 16:38:53 exadelic dhcpcd[1790]: rge0: soliciting an IPv6 router
Jan 2 16:38:53 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
Jan 2 16:39:05 exadelic syslogd[2503]: last message repeated 3 times
Jan 2 16:39:05 exadelic dhcpcd[1790]: rge0: no IPv6 Routers available
Jan 2 16:39:39 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is reachable again
Jan 2 16:39:47 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is unreachable
Jan 2 16:39:47 exadelic dhcpcd[1790]: rge0: soliciting an IPv6 router
Jan 2 16:39:47 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
Jan 2 16:39:50 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is reachable again
Jan 2 16:39:51 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
Jan 2 16:39:51 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
Jan 2 16:39:58 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is unreachable
Jan 2 16:39:58 exadelic dhcpcd[1790]: rge0: soliciting an IPv6 router
Jan 2 16:39:58 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
Jan 2 16:39:58 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
Jan 2 16:40:03 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is reachable again
Jan 2 16:40:06 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
Jan 2 16:40:06 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
Jan 2 16:40:10 exadelic dhcpcd[1790]: rge0: no IPv6 Routers available
Jan 2 16:40:11 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is unreachable
Jan 2 16:40:12 exadelic dhcpcd[1790]: rge0: soliciting an IPv6 router
Jan 2 16:40:12 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
Jan 2 16:40:12 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
Jan 2 16:40:19 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is reachable again
Jan 2 16:40:20 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
Jan 2 16:40:20 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
Jan 2 16:40:24 exadelic dhcpcd[1790]: rge0: no IPv6 Routers available
Jan 2 16:40:27 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is unreachable
Jan 2 16:40:28 exadelic dhcpcd[1790]: rge0: soliciting an IPv6 router
Jan 2 16:40:28 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
This was during a high-load situation (bulk build). The machine
stopped being accessible from the network.
Thomas
From: matthew green <mrg@eterna23.net>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org
Subject: re: kern/57694: rge(4) hang
Date: Sun, 11 Aug 2024 05:32:22 +1000
i saw this problem last night, when doing fairly light NFS work (
about 2MB/s reads.)
i noticed i started trying to merge recent fixes from openbsd into
if_rge.c late last year but didn't finish. i'll look again.
.mrg.
From: matthew green <mrg@eterna23.net>
To: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, wiz@netbsd.org,
netbsd-bugs@netbsd.org, gnats-bugs@netbsd.org
Cc:
Subject: re: kern/57694: rge(4) hang
Date: Sun, 11 Aug 2024 08:13:46 +1000
matthew green writes:
> i saw this problem last night, when doing fairly light NFS work (
> about 2MB/s reads.)
FWIW, "ifconfig rge0 down up" made it come back, so this seems like
a bug in the descriptor ring handling, as that will all be reset by
the down/up sequence.
i had a look at the latest openbsd changes and i didn't spot anything
that seemed relevant for this problem.
.mrg.
From: matthew green <mrg@eterna23.net>
To: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org, gnats-bugs@netbsd.org
Cc:
Subject: re: kern/57694: rge(4) hang
Date: Sun, 11 Aug 2024 09:23:24 +1000
ah - i just noticed that my system has an older netbsd-10
kernel, that does not have revision 1.24.4.3 included, but
also that netbsd-10 itself is missing rev 1.29, which fixed
some hangs for mlelstv and made this problem less bad for wiz.
so that may be all i am seeing..
.mrg.
From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/57694 CVS commit: [netbsd-10] src/sys/dev/pci
Date: Thu, 22 Aug 2024 19:22:35 +0000
Module Name: src
Committed By: martin
Date: Thu Aug 22 19:22:35 UTC 2024
Modified Files:
src/sys/dev/pci [netbsd-10]: if_rge.c
Log Message:
Pull up following revision(s) (requested by mrg in ticket #782):
sys/dev/pci/if_rge.c: revision 1.29
- handle stuck transmitter (descriptor still owned)
- restart send queue after transmit
- count output packets
- use deferred start
Should fix PR 57694
To generate a diff of this commit:
cvs rdiff -u -r1.24.4.3 -r1.24.4.4 src/sys/dev/pci/if_rge.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: matthew green <mrg@eterna23.net>
To: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org, gnats-bugs@netbsd.org
Cc:
Subject: re: kern/57694: rge(4) hang
Date: Fri, 23 Aug 2024 11:44:15 +1000
matthew green writes:
> ah - i just noticed that my system has an older netbsd-10
> kernel, that does not have revision 1.24.4.3 included, but
> also that netbsd-10 itself is missing rev 1.29, which fixed
> some hangs for mlelstv and made this problem less bad for wiz.
>
> so that may be all i am seeing..
unfortunately, i did see one more soft-hang that was fixed
with 'ifconfig rge0 down up', but i haven't seen it again in
the 12+ days since, and my attempts to trigger the failure with
the same card in a zen2 testbox (the host i have seen hangs on
is zen1+) have not yielded any insight.
netbsd-10 has the latest fixes now at least, thanks martin.
.mrg.
>Unformatted: