NetBSD Problem Report #56850
From www@netbsd.org Mon May 23 01:55:40 2022
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id 53DEC1A9242
for <gnats-bugs@gnats.NetBSD.org>; Mon, 23 May 2022 01:55:40 +0000 (UTC)
Message-Id: <20220523015538.8FD681A9243@mollari.NetBSD.org>
Date: Mon, 23 May 2022 01:55:38 +0000 (UTC)
From: rokuyama.rk@gmail.com
Reply-To: rokuyama.rk@gmail.com
To: gnats-bugs@NetBSD.org
Subject: system locks up with NFS root & swap on mvgbe(4)
X-Send-Pr-Version: www-1.0
>Number: 56850
>Category: port-arm
>Synopsis: system locks up with NFS root for Kirkwood and Orion
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Mon May 23 02:00:01 +0000 2022
>Last-Modified: Fri Oct 13 08:50:01 +0000 2023
>Originator: Rin Okuyama
>Release: 9.99.96
>Organization:
Department of Physics, Meiji University
>Environment:
NetBSD obsa6 9.99.96 NetBSD 9.99.96 (OBSA6_BE) #23: Mon May 23 00:06:35 JST 2022 rin@latipes:/build/src/sys/arch/evbarm/compile/OBSA6_BE evbarm
>Description:
* Summary
The system eventually locks up with NFS root/swap on mvgbe(4).
This is probably due to software or hardware bugs in mvgbe(4), but
at the same time, I suspect that our NFS client may be fragile
against packet loss or other NIC problems.
* Details
The failure occurs on ARM9E-based Marvell SoCs:
- KUROBOX_PRO: https://dmesgd.nycbug.org/index.cgi?do=view&id=6594
- OPENBLOCKS_A6: https://dmesgd.nycbug.org/index.cgi?do=view&id=6595
in both little- and big-endian modes.
With NFS root/swap on mvgbe(4), the system eventually locks up under
heavy I/O while building pkgsrc packages. Once the failure occurs, the
system does not respond to anything but input from the serial console.
At that point, I observe that many processes are sleeping at "nfsrecv":
https://gist.github.com/rokuyama/228f7afe67ffa8fe8024eb10bc2f14a1
The problem seems to be significantly mitigated by using UDP, but
not eliminated; the failure occurs roughly every few hours with TCP,
and roughly once a day with UDP.
On an armv5-based machine of a similar generation, but with wm(4):
- HDL_G: https://dmesgd.nycbug.org/index.cgi?do=view&id=6139
I've never observed a similar failure.
Therefore, there are likely bugs in mvgbe(4), or hardware problems.
However, at the same time, I wonder whether we can improve the NFS or
socket layers in the kernel; even if some packets are unexpectedly lost,
NFS routines should not sleep forever.
>How-To-Repeat:
Build some pkgsrc packages with NFS root/swap on mvgbe(4).
>Fix:
N/A
>Release-Note:
>Audit-Trail:
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/56850: system locks up with NFS root & swap on mvgbe(4)
Date: Thu, 26 May 2022 02:35:28 +0000
Not sent to gnats.
------
From: Rin Okuyama <rokuyama.rk@gmail.com>
To: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/56850: system locks up with NFS root & swap on mvgbe(4)
Date: Mon, 23 May 2022 21:41:40 +0900
Similar failures were observed with axe(4) and axen(4) for OPENBLOCKS_A6.
From: Rin Okuyama <rokuyama.rk@gmail.com>
To: "gnats-bugs@NetBSD.org" <gnats-bugs@netbsd.org>, netbsd-bugs@netbsd.org
Cc:
Subject: Re: port-arm/56850 (system locks up with NFS root for Kirkwood and Orion)
Date: Thu, 12 Oct 2023 16:16:21 +0900
Category and title have been updated.
As reported earlier, this lockup also occurs with USB NICs. Therefore, the
problem is likely in the machine-dependent (MD) parts of {,evb}arm/marvell.
KURO-BOX/PRO (Orion) has a PCIe slot:
https://dmesgd.nycbug.org/index.cgi?do=view&id=6594
Even with wm(4) variants in this slot, the system locked up within about a day.
I will revisit this problem soon. LOCKDEBUG may or may not be helpful...
Thanks,
rin
From: "Jonathan A. Kollasch" <jakllsch@kollasch.net>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/56850: system locks up with NFS root & swap on mvgbe(4)
Date: Thu, 12 Oct 2023 15:05:46 -0500
I have a vague recollection this might be related or similar to what I
tried to fix in r1.14 src/sys/dev/marvell/if_mvgbe.c
From: Rin Okuyama <rokuyama.rk@gmail.com>
To: "Jonathan A. Kollasch" <jakllsch@kollasch.net>, "gnats-bugs@NetBSD.org" <gnats-bugs@netbsd.org>
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/56850: system locks up with NFS root & swap on mvgbe(4)
Date: Fri, 13 Oct 2023 17:45:28 +0900
On Fri, Oct 13, 2023 at 5:10 AM Jonathan A. Kollasch
<jakllsch@kollasch.net> wrote:
> I have a vague recollection this might be related or similar to what I
> tried to fix in r1.14 src/sys/dev/marvell/if_mvgbe.c
Thanks for the hints! I will examine documents and/or how other OSes handle DMAC.
rin
>Unformatted:
Copyright © 1994-2023
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.