NetBSD Problem Report #57694

From wiz@exadelic.gatalith.at  Sun Nov 12 19:43:35 2023
Return-Path: <wiz@exadelic.gatalith.at>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 12BB41A9238
	for <gnats-bugs@gnats.NetBSD.org>; Sun, 12 Nov 2023 19:43:35 +0000 (UTC)
Message-Id: <20231112194328.DE6C92EBBAD3@exadelic.gatalith.at>
Date: Sun, 12 Nov 2023 20:43:28 +0100 (CET)
From: Thomas Klausner <wiz@NetBSD.org>
Reply-To: Thomas Klausner <wiz@NetBSD.org>
To: gnats-bugs@NetBSD.org
Subject: rge(4) hang
X-Send-Pr-Version: 3.95

>Number:         57694
>Category:       kern
>Synopsis:       rge(4) hang
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Nov 12 19:45:00 +0000 2023
>Closed-Date:    
>Last-Modified:  Fri Aug 23 01:45:01 +0000 2024
>Originator:     Thomas Klausner
>Release:        NetBSD 10.99.10
>Organization:

>Environment:


Architecture: x86_64
Machine: amd64
>Description:
After the latest fixes, rge(4) is better, but it has completely hung
the network interface twice so far (no network traffic possible on
it), both times so hard that the BIOS had some kind of issue on the
next boot and needed 15 minutes to sort itself out (before even
showing anything on the screen).

I'm running a kernel from Oct 22.

The hangs happened with the interface in 1Gb mode.

dmesg:
rge0 at pci7 dev 0 function 0: Realtek Semiconductor 8125 10/100/1G/2.5G Ethernet (rev. 0x05)
rge0: interrupting at msix1 vec 0
rge0: Ethernet address xx.xx.xx.xx.xx.xx


In /var/log/messages I see:
Nov  1 18:59:43 exadelic dhcpcd[2191]: rge0: xxxx::1 is unreachable
Nov  1 18:59:43 exadelic dhcpcd[2191]: rge0: soliciting an IPv6 router
Nov  1 18:59:45 exadelic dhcpcd[2191]: rge0: Router Advertisement from xxxx::1
Nov  1 18:59:46 exadelic dhcpcd[2191]: rge0: xxxx::1 is unreachable
Nov  1 18:59:46 exadelic dhcpcd[2191]: rge0: soliciting an IPv6 router
Nov  1 18:59:58 exadelic dhcpcd[2191]: rge0: no IPv6 Routers available
Nov  1 19:01:11 exadelic dhcpcd[2191]: rge0: xxxx::1 is reachable again
Nov  1 19:01:19 exadelic dhcpcd[2191]: rge0: xxxx::1 is unreachable
Nov  1 19:01:19 exadelic dhcpcd[2191]: rge0: soliciting an IPv6 router
Nov  1 19:01:31 exadelic dhcpcd[2191]: rge0: no IPv6 Routers available
Nov  1 19:01:57 exadelic dhcpcd[2191]: rge0: xxxx::1 is reachable again
Nov  1 19:02:05 exadelic dhcpcd[2191]: rge0: xxxx::1 is unreachable
Nov  1 19:02:05 exadelic dhcpcd[2191]: rge0: soliciting an IPv6 router
Nov  1 19:02:17 exadelic dhcpcd[2191]: rge0: no IPv6 Routers available
Nov  1 19:04:27 exadelic dhcpcd[2191]: rge0: xxxx::1 is reachable again
Nov  1 19:04:35 exadelic dhcpcd[2191]: rge0: xxxx::1 is unreachable
Nov  1 19:04:35 exadelic dhcpcd[2191]: rge0: soliciting an IPv6 router
Nov  1 19:04:47 exadelic dhcpcd[2191]: rge0: no IPv6 Routers available
Nov  1 19:06:12 exadelic dhcpcd[2191]: rge0: xxxx::1 is reachable again
Nov  1 19:06:20 exadelic dhcpcd[2191]: rge0: xxxx::1 is unreachable
Nov  1 19:06:21 exadelic dhcpcd[2191]: rge0: soliciting an IPv6 router
Nov  1 19:06:33 exadelic dhcpcd[2191]: rge0: no IPv6 Routers available
Nov  1 19:09:27 exadelic /netbsd: [ 91537.5847758] nfs server 192.168.178.19:/path: not responding
Nov  1 19:15:16 exadelic dhcpcd[2191]: rge0: xxxx::1 is reachable again
Nov  1 19:15:24 exadelic dhcpcd[2191]: rge0: xxxx::1 is unreachable
Nov  1 19:15:24 exadelic dhcpcd[2191]: rge0: soliciting an IPv6 router
Nov  1 19:15:36 exadelic dhcpcd[2191]: rge0: no IPv6 Routers available
Nov  1 19:16:51 exadelic dhcpcd[2191]: rge0: xxxx::1 is reachable again
Nov  1 19:16:52 exadelic dhcpcd[2191]: ps_root_recvmsg: No buffer space available
Nov  1 19:16:52 exadelic dhcpcd[2191]: ps_root_recvmsg: No buffer space available
Nov  1 19:16:59 exadelic dhcpcd[2191]: rge0: xxxx::1 is unreachable
Nov  1 19:16:59 exadelic dhcpcd[2191]: rge0: soliciting an IPv6 router
Nov  1 19:16:59 exadelic dhcpcd[2191]: ps_root_recvmsg: No buffer space available
Nov  1 19:17:11 exadelic syslogd[2290]: last message repeated 3 times
Nov  1 19:17:11 exadelic dhcpcd[2191]: rge0: no IPv6 Routers available
Nov  1 19:17:44 exadelic dhcpcd[2191]: ps_root_recvmsg: No buffer space available

Just in case it matters, I'm not running with the default sysctl values; I have

kern.sbmax: 262144 -> 16777216
net.inet.tcp.recvbuf_max: 262144 -> 16777216
net.inet.tcp.sendbuf_max: 262144 -> 16777216
net.inet.tcp.recvspace: 32768 -> 262144
net.inet.tcp.sendspace: 32768 -> 262144

because of

https://mail-index.netbsd.org/current-users/2017/09/21/msg032369.html
>How-To-Repeat:
Make heavy use of an rge(4) device with NFS + SFTP.
>Fix:
Yes, please.

>Release-Note:

>Audit-Trail:
From: Michael van Elst <mlelstv@serpens.de>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/57694: rge(4) hang
Date: Sat, 16 Dec 2023 13:38:25 +0100

 The following patch helped here:

 --- sys/dev/pci/if_rge.c        19 Oct 2023 23:43:40 -0000      1.28
 +++ sys/dev/pci/if_rge.c        16 Dec 2023 12:33:04 -0000
 @@ -1404,24 +1404,14 @@ rge_txeof(struct rge_softc *sc)

         sc->rge_ldata.rge_txq_considx = cons;

 -#if 0
 -       if (ifq_is_oactive(&ifp->if_snd))
 -               ifq_restart(&ifp->if_snd);
 -       else if (free == 2)
 -               ifq_serialize(&ifp->if_snd, &sc->sc_task);
 -       else
 -               ifp->if_timer = 0;
 -#else
 -#if 0
 +       if (free == 2)
 +               rge_txstart(&sc->sc_task, sc);
 +
 +       CLR(ifp->if_flags, IFF_OACTIVE);
         if (!IF_IS_EMPTY(&ifp->if_snd))
                 rge_start(ifp);
         else
 -       if (free == 2)
 -               if (0) { rge_txstart(&sc->sc_task, sc); }
 -       else
 -#endif
                 ifp->if_timer = 0;
 -#endif

         return (1);
  }
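
For readers unfamiliar with the transmit path, the idea behind the patch can be modelled in user space. The following is a minimal sketch in plain C, not code from if_rge.c: `struct tx_ring`, `tx_enqueue()`, `nic_complete()`, `tx_eof()`, the ring size and the `DESC_OWN` bit value are all hypothetical, made up for illustration. It shows a completion handler that stops at the first descriptor still owned by the hardware, then clears the "output active" flag unconditionally and restarts the send queue if packets are waiting, which appears to be what the patch re-enables.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TX_RING_SIZE 8            /* hypothetical; a real ring is larger */
#define DESC_OWN     0x80000000u  /* hypothetical "owned by NIC" bit */

struct tx_ring {
	uint32_t cmdsts[TX_RING_SIZE]; /* one status word per descriptor */
	unsigned prod;                 /* next slot the driver fills */
	unsigned cons;                 /* oldest slot not yet reclaimed */
	bool     oactive;              /* models IFF_OACTIVE: queue stalled */
	unsigned pending;              /* packets waiting in the send queue */
	unsigned opackets;             /* completed transmissions */
};

/* Queue one packet; returns false and sets oactive when the ring is full. */
static bool
tx_enqueue(struct tx_ring *r)
{
	unsigned next = (r->prod + 1) % TX_RING_SIZE;

	if (next == r->cons) {  /* one slot stays free to tell full from empty */
		r->oactive = true;
		return false;
	}
	r->cmdsts[r->prod] = DESC_OWN;  /* hand the descriptor to the "NIC" */
	r->prod = next;
	return true;
}

/* Pretend the NIC finished up to n of the descriptors it still owns. */
static void
nic_complete(struct tx_ring *r, unsigned n)
{
	for (unsigned i = r->cons; i != r->prod && n > 0;
	    i = (i + 1) % TX_RING_SIZE) {
		if (r->cmdsts[i] & DESC_OWN) {
			r->cmdsts[i] &= ~DESC_OWN;
			n--;
		}
	}
}

/* Transmit-complete handler: reclaim finished slots, then restart the queue. */
static unsigned
tx_eof(struct tx_ring *r)
{
	unsigned reclaimed = 0;

	while (r->cons != r->prod) {
		if (r->cmdsts[r->cons] & DESC_OWN)
			break;  /* NIC still owns it: stop here, retry later */
		r->cons = (r->cons + 1) % TX_RING_SIZE;
		r->opackets++;
		reclaimed++;
	}

	/* Always clear the stall flag ... */
	r->oactive = false;
	/* ... and move waiting packets into the slots just freed. */
	while (r->pending > 0 && tx_enqueue(r))
		r->pending--;

	return reclaimed;
}
```

In this model, filling the ring sets `oactive`; once the simulated NIC completes two descriptors, `tx_eof()` reclaims them and immediately queues two of the waiting packets. Without the restart step, the waiting packets would sit in the queue until something else kicked the transmitter.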


 -- 
                                 Michael van Elst
 Internet: mlelstv@serpens.de
                                 "A potential Snark may lurk in every tree."

From: "Michael van Elst" <mlelstv@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/57694 CVS commit: src/sys/dev/pci
Date: Sat, 16 Dec 2023 16:35:49 +0000

 Module Name:	src
 Committed By:	mlelstv
 Date:		Sat Dec 16 16:35:49 UTC 2023

 Modified Files:
 	src/sys/dev/pci: if_rge.c

 Log Message:
 - handle stuck transmitter (descriptor still owned)
 - restart send queue after transmit
 - count output packets
 - use deferred start

 Should fix PR 57694


 To generate a diff of this commit:
 cvs rdiff -u -r1.28 -r1.29 src/sys/dev/pci/if_rge.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: open->closed
State-Changed-By: wiz@NetBSD.org
State-Changed-When: Sun, 31 Dec 2023 10:03:34 +0000
State-Changed-Why:
mlelstv's commit improved the situation a lot - I saw no more hangs
running the new kernel for over a week.


From: Thomas Klausner <wiz@NetBSD.org>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/57694: rge(4) hang
Date: Sun, 31 Dec 2023 11:02:24 +0100

 After running with rge(4) with mlelstv's change for more than a week,
 there were no problems. So this has improved stability a lot.

 Thank you, Michael!
  Thomas

State-Changed-From-To: closed->open
State-Changed-By: wiz@NetBSD.org
State-Changed-When: Tue, 02 Jan 2024 16:03:24 +0000
State-Changed-Why:
Not solved yet.


From: Thomas Klausner <wiz@NetBSD.org>
To: gnats-bugs@netbsd.org
Cc: Michael van Elst <mlelstv@serpens.de>
Subject: Re: kern/57694 (rge(4) hang)
Date: Tue, 2 Jan 2024 17:08:45 +0100

 Today the machine's network hung again, after about 10 days of uptime.

 In syslog I saw:

 Jan  2 16:34:27 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is unreachable
 Jan  2 16:34:27 exadelic dhcpcd[1790]: rge0: soliciting an IPv6 router
 Jan  2 16:34:31 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is reachable again
 Jan  2 16:34:39 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is unreachable
 Jan  2 16:34:39 exadelic dhcpcd[1790]: rge0: soliciting an IPv6 router
 Jan  2 16:34:42 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is reachable again
 Jan  2 16:34:47 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
 Jan  2 16:34:50 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is unreachable
 Jan  2 16:34:51 exadelic dhcpcd[1790]: rge0: soliciting an IPv6 router
 Jan  2 16:34:51 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
 Jan  2 16:34:53 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is reachable again
 Jan  2 16:34:55 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
 Jan  2 16:34:55 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
 Jan  2 16:35:01 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is unreachable
 Jan  2 16:35:02 exadelic dhcpcd[1790]: rge0: soliciting an IPv6 router
 Jan  2 16:35:02 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
 Jan  2 16:35:05 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is reachable again
 Jan  2 16:35:06 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
 Jan  2 16:35:06 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
 Jan  2 16:35:13 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is unreachable
 Jan  2 16:35:14 exadelic dhcpcd[1790]: rge0: soliciting an IPv6 router
 Jan  2 16:35:14 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
 Jan  2 16:35:17 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is reachable again
 Jan  2 16:35:18 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
 Jan  2 16:35:18 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
 Jan  2 16:35:25 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is unreachable
 Jan  2 16:35:26 exadelic dhcpcd[1790]: rge0: soliciting an IPv6 router
 Jan  2 16:35:26 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
 Jan  2 16:35:29 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is reachable again
 Jan  2 16:35:30 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
 Jan  2 16:35:30 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
 Jan  2 16:35:37 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is unreachable
 Jan  2 16:35:37 exadelic dhcpcd[1790]: rge0: soliciting an IPv6 router
 Jan  2 16:35:37 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
 Jan  2 16:35:41 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is reachable again
 Jan  2 16:35:41 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
 Jan  2 16:35:41 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
 Jan  2 16:35:49 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is unreachable
 Jan  2 16:35:49 exadelic dhcpcd[1790]: rge0: soliciting an IPv6 router
 Jan  2 16:35:49 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
 Jan  2 16:35:49 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
 Jan  2 16:35:53 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is reachable again
 Jan  2 16:35:57 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
 Jan  2 16:35:57 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
 Jan  2 16:36:01 exadelic dhcpcd[1790]: rge0: no IPv6 Routers available
 Jan  2 16:36:01 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is unreachable
 Jan  2 16:36:02 exadelic dhcpcd[1790]: rge0: soliciting an IPv6 router
 Jan  2 16:36:02 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
 Jan  2 16:36:04 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is reachable again
 Jan  2 16:36:06 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
 Jan  2 16:36:06 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
 Jan  2 16:36:12 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is unreachable
 Jan  2 16:36:13 exadelic dhcpcd[1790]: rge0: soliciting an IPv6 router
 Jan  2 16:36:13 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
 Jan  2 16:36:16 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is reachable again
 Jan  2 16:36:17 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
 Jan  2 16:36:17 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
 Jan  2 16:36:24 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is unreachable
 Jan  2 16:36:24 exadelic dhcpcd[1790]: rge0: soliciting an IPv6 router
 Jan  2 16:36:24 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
 Jan  2 16:36:24 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
 Jan  2 16:36:30 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is reachable again
 Jan  2 16:36:32 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
 Jan  2 16:36:32 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
 Jan  2 16:36:36 exadelic dhcpcd[1790]: rge0: no IPv6 Routers available
 Jan  2 16:36:38 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is unreachable
 Jan  2 16:36:39 exadelic dhcpcd[1790]: rge0: soliciting an IPv6 router
 Jan  2 16:36:39 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
 Jan  2 16:36:51 exadelic syslogd[2503]: last message repeated 3 times
 Jan  2 16:36:51 exadelic dhcpcd[1790]: rge0: no IPv6 Routers available
 Jan  2 16:36:54 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is reachable again
 Jan  2 16:37:02 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is unreachable
 Jan  2 16:37:03 exadelic dhcpcd[1790]: rge0: soliciting an IPv6 router
 Jan  2 16:37:03 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
 Jan  2 16:37:05 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is reachable again
 Jan  2 16:37:07 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
 Jan  2 16:37:07 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
 Jan  2 16:37:13 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is unreachable
 Jan  2 16:37:13 exadelic dhcpcd[1790]: rge0: soliciting an IPv6 router
 Jan  2 16:37:13 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
 Jan  2 16:37:16 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is reachable again
 Jan  2 16:37:17 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
 Jan  2 16:37:17 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
 Jan  2 16:37:24 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is unreachable
 Jan  2 16:37:25 exadelic dhcpcd[1790]: rge0: soliciting an IPv6 router
 Jan  2 16:37:25 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
 Jan  2 16:37:25 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
 Jan  2 16:37:30 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is reachable again
 Jan  2 16:37:33 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
 Jan  2 16:37:33 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
 Jan  2 16:37:37 exadelic dhcpcd[1790]: rge0: no IPv6 Routers available
 Jan  2 16:37:38 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is unreachable
 Jan  2 16:37:39 exadelic dhcpcd[1790]: rge0: soliciting an IPv6 router
 Jan  2 16:37:39 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
 Jan  2 16:37:42 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is reachable again
 Jan  2 16:37:43 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
 Jan  2 16:37:43 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
 Jan  2 16:37:50 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is unreachable
 Jan  2 16:37:50 exadelic dhcpcd[1790]: rge0: soliciting an IPv6 router
 Jan  2 16:37:50 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
 Jan  2 16:38:02 exadelic syslogd[2503]: last message repeated 3 times
 Jan  2 16:38:02 exadelic dhcpcd[1790]: rge0: no IPv6 Routers available
 Jan  2 16:38:15 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is reachable again
 Jan  2 16:38:15 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
 Jan  2 16:38:23 exadelic syslogd[2503]: last message repeated 2 times
 Jan  2 16:38:23 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is unreachable
 Jan  2 16:38:23 exadelic dhcpcd[1790]: rge0: soliciting an IPv6 router
 Jan  2 16:38:23 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
 Jan  2 16:38:35 exadelic syslogd[2503]: last message repeated 3 times
 Jan  2 16:38:35 exadelic dhcpcd[1790]: rge0: no IPv6 Routers available
 Jan  2 16:38:45 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is reachable again
 Jan  2 16:38:53 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is unreachable
 Jan  2 16:38:53 exadelic dhcpcd[1790]: rge0: soliciting an IPv6 router
 Jan  2 16:38:53 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
 Jan  2 16:39:05 exadelic syslogd[2503]: last message repeated 3 times
 Jan  2 16:39:05 exadelic dhcpcd[1790]: rge0: no IPv6 Routers available
 Jan  2 16:39:39 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is reachable again
 Jan  2 16:39:47 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is unreachable
 Jan  2 16:39:47 exadelic dhcpcd[1790]: rge0: soliciting an IPv6 router
 Jan  2 16:39:47 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
 Jan  2 16:39:50 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is reachable again
 Jan  2 16:39:51 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
 Jan  2 16:39:51 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
 Jan  2 16:39:58 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is unreachable
 Jan  2 16:39:58 exadelic dhcpcd[1790]: rge0: soliciting an IPv6 router
 Jan  2 16:39:58 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
 Jan  2 16:39:58 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
 Jan  2 16:40:03 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is reachable again
 Jan  2 16:40:06 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
 Jan  2 16:40:06 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
 Jan  2 16:40:10 exadelic dhcpcd[1790]: rge0: no IPv6 Routers available
 Jan  2 16:40:11 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is unreachable
 Jan  2 16:40:12 exadelic dhcpcd[1790]: rge0: soliciting an IPv6 router
 Jan  2 16:40:12 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
 Jan  2 16:40:12 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
 Jan  2 16:40:19 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is reachable again
 Jan  2 16:40:20 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
 Jan  2 16:40:20 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available
 Jan  2 16:40:24 exadelic dhcpcd[1790]: rge0: no IPv6 Routers available
 Jan  2 16:40:27 exadelic dhcpcd[1790]: rge0: fe80::de15:c8ff:fe36:2e98 is unreachable
 Jan  2 16:40:28 exadelic dhcpcd[1790]: rge0: soliciting an IPv6 router
 Jan  2 16:40:28 exadelic dhcpcd[1790]: ps_root_recvmsg: No buffer space available

 This was during a high load situation (bulk build). The machine
 stopped being accessible from the network.
  Thomas

From: matthew green <mrg@eterna23.net>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
    netbsd-bugs@netbsd.org
Subject: re: kern/57694: rge(4) hang
Date: Sun, 11 Aug 2024 05:32:22 +1000

 i saw this problem last night, when doing fairly light NFS work (
 about 2MB/s reads.)

 i noticed i started trying to merge recent fixes from openbsd into
 if_rge.c late last year but didn't finish.  i'll look again.


 .mrg.

From: matthew green <mrg@eterna23.net>
To: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, wiz@netbsd.org,
    netbsd-bugs@netbsd.org, gnats-bugs@netbsd.org
Cc: 
Subject: re: kern/57694: rge(4) hang
Date: Sun, 11 Aug 2024 08:13:46 +1000

 matthew green writes:
 > i saw this problem last night, when doing fairly light NFS work (
 > about 2MB/s reads.)

 FWIW, "ifconfig rge0 down up" made it come back, so this seems like
 a bug in the descriptor ring handling, as that will all be reset by
 the down/up sequence.
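
One way to picture why the down/up sequence helps: a reset puts the producer and consumer indices back into a known state, so a descriptor the hardware never released no longer blocks the ring. The sketch below is a user-space model only, assuming a per-interface countdown in the style of `if_timer`; `struct ring_state`, `ring_send()`, `ring_reclaim()`, `ring_tick()`, `RING_SIZE` and `TX_TIMEOUT` are hypothetical and are not taken from the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define RING_SIZE  8  /* hypothetical ring size */
#define TX_TIMEOUT 5  /* hypothetical watchdog period, in ticks */

struct ring_state {
	bool     owned[RING_SIZE]; /* true while the "NIC" owns the slot */
	unsigned prod, cons;
	int      timer;            /* 0 = disarmed, models an if_timer countdown */
	unsigned resets;           /* how often the ring had to be reset */
};

/* Models the reset done by a down/up cycle: all ring state is discarded. */
static void
ring_init(struct ring_state *s)
{
	memset(s->owned, 0, sizeof(s->owned));
	s->prod = s->cons = 0;
	s->timer = 0;
}

/* Hand one descriptor to the NIC and arm the watchdog. */
static bool
ring_send(struct ring_state *s)
{
	unsigned next = (s->prod + 1) % RING_SIZE;

	if (next == s->cons)
		return false;  /* ring full */
	s->owned[s->prod] = true;
	s->prod = next;
	s->timer = TX_TIMEOUT;
	return true;
}

/* Reclaim released descriptors; disarm the watchdog once the ring drains. */
static void
ring_reclaim(struct ring_state *s)
{
	while (s->cons != s->prod && !s->owned[s->cons])
		s->cons = (s->cons + 1) % RING_SIZE;
	if (s->cons == s->prod)
		s->timer = 0;
}

/* Called once per tick; returns true when it had to reset the ring. */
static bool
ring_tick(struct ring_state *s)
{
	if (s->timer == 0)
		return false;  /* nothing outstanding */
	if (--s->timer > 0)
		return false;  /* still within the grace period */
	/* Expired with descriptors outstanding: treat as stuck and reset. */
	ring_init(s);
	s->resets++;
	return true;
}
```

Here a single descriptor that is never released trips the countdown after `TX_TIMEOUT` ticks, and the reset lets the next `ring_send()` succeed. Whether the real driver should recover this way or have its ring handling fixed is a separate question.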

 i had a look at the latest openbsd changes and i didn't spot anything
 that seemed relevant for this problem.


 .mrg.

From: matthew green <mrg@eterna23.net>
To: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
    netbsd-bugs@netbsd.org, gnats-bugs@netbsd.org
Cc: 
Subject: re: kern/57694: rge(4) hang
Date: Sun, 11 Aug 2024 09:23:24 +1000

 ah - i just noticed that my system has an older netbsd-10
 kernel, that does not have revision 1.24.4.3 included, but
 also that netbsd-10 itself is missing rev 1.29, which fixed
 some hangs for mlelstv and made this problem less bad for wiz.

 so that may be all i am seeing..


 .mrg.

From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/57694 CVS commit: [netbsd-10] src/sys/dev/pci
Date: Thu, 22 Aug 2024 19:22:35 +0000

 Module Name:	src
 Committed By:	martin
 Date:		Thu Aug 22 19:22:35 UTC 2024

 Modified Files:
 	src/sys/dev/pci [netbsd-10]: if_rge.c

 Log Message:
 Pull up following revision(s) (requested by mrg in ticket #782):

 	sys/dev/pci/if_rge.c: revision 1.29

 - handle stuck transmitter (descriptor still owned)
 - restart send queue after transmit
 - count output packets
 - use deferred start

 Should fix PR 57694


 To generate a diff of this commit:
 cvs rdiff -u -r1.24.4.3 -r1.24.4.4 src/sys/dev/pci/if_rge.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: matthew green <mrg@eterna23.net>
To: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
    netbsd-bugs@netbsd.org, gnats-bugs@netbsd.org
Cc: 
Subject: re: kern/57694: rge(4) hang
Date: Fri, 23 Aug 2024 11:44:15 +1000

 matthew green writes:
 > ah - i just noticed that my system has an older netbsd-10
 > kernel, that does not have revision 1.24.4.3 included, but
 > also that netbsd-10 itself is missing rev 1.29, which fixed
 > some hangs for mlelstv and made this problem less bad for wiz.
 >
 > so that may be all i am seeing..

 unfortunately, i did see one more soft-hang that was fixed
 with 'ifconfig rge0 down up', but haven't seen it for over
 12 more days, and my attempts to trigger failure with the
 same card in a zen2 testbox (the host i have seen hangs on
 is zen1+) have not gained any insight.

 netbsd-10 has the latest fixes now at least, thanks martin.


 .mrg.

>Unformatted:

