NetBSD Problem Report #55326
From www@netbsd.org Sun May 31 12:15:58 2020
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id 383E61A9218
for <gnats-bugs@gnats.NetBSD.org>; Sun, 31 May 2020 12:15:58 +0000 (UTC)
Message-Id: <20200531121557.4FC161A921A@mollari.NetBSD.org>
Date: Sun, 31 May 2020 12:15:57 +0000 (UTC)
From: rokuyama.rk@gmail.com
Reply-To: rokuyama.rk@gmail.com
To: gnats-bugs@NetBSD.org
Subject: gem(4): memory corruption by RX DMA
X-Send-Pr-Version: www-1.0
>Number: 55326
>Notify-List: david@gutteridge.ca
>Category: port-macppc
>Synopsis: gem(4): memory corruption by RX DMA
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: port-macppc-maintainer
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sun May 31 12:20:00 +0000 2020
>Last-Modified: Sun May 31 19:26:52 +0000 2020
>Originator: Rin Okuyama
>Release: 9.99.64
>Organization:
Department of Physics, Meiji University
>Environment:
NetBSD macmini 9.99.64 NetBSD 9.99.64 (GENERIC) #65: Sun May 31 01:11:41 JST 2020 rin@latipes:/usr/src/sys/arch/macppc/compile/GENERIC macppc
>Description:
With DIAGNOSTIC enabled on a machine with gem(4) (a Mac mini in my case),
the kernel panics as follows:
panic: pr_phinpage_check: [mclpl] item 0x3fb0b040 not part of pool
cpu0: Begin traceback...
...: at vpanic+...
...: at panic+...
...: at pool_cache_put_paddr+...
...: at m_ext_free+...
...: at m_freem.part.7+...
...: at ether_input+...
...: at if_percpuq_softint+...
...: at softint_dispatch+...
...: at softint_fast_dispatch+...
saved LR(0x1c) is invalid.cpu0: End traceback...
I found that the ph_page field had become NULL by the time of this panic,
whereas it was correctly initialized at the time of MCLGET(9).
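To illustrate why a NULL ph_page leads to this panic, here is a minimal
user-space sketch of an in-page header check. It is not the kernel code:
struct model_page_header, model_page_base(), model_item_in_pool() and
model_demo() are hypothetical names, only the ph_page field is modeled, and
the 4096-byte page size and the header living at the start of the page are
assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define MODEL_PAGE_SIZE 4096u	/* assumption: 4 KiB pages */

/* Hypothetical stand-in for the pool page header; only ph_page is modeled. */
struct model_page_header {
	void *ph_page;		/* should point back at the start of its page */
};

/* Round an item address down to the start of the page that contains it. */
static void *
model_page_base(const void *item)
{
	return (void *)((uintptr_t)item & ~(uintptr_t)(MODEL_PAGE_SIZE - 1));
}

/* True if the header found at the page start still points at that page. */
static bool
model_item_in_pool(const void *item)
{
	void *page = model_page_base(item);
	const struct model_page_header *ph = page;

	return ph->ph_page == page;
}

/* Returns 0 if the check passes before, and fails after, the header is lost. */
static int
model_demo(void)
{
	unsigned char *page = aligned_alloc(MODEL_PAGE_SIZE, MODEL_PAGE_SIZE);
	struct model_page_header *ph;
	void *item;

	if (page == NULL)
		return -1;
	ph = (struct model_page_header *)page;
	item = page + 0x40;	/* same page offset as the item in the panic */

	ph->ph_page = page;	/* state right after allocation */
	assert(model_item_in_pool(item));

	ph->ph_page = NULL;	/* simulate the stray write into the header */
	assert(!model_item_in_pool(item));

	free(page);
	return 0;
}
```

In this model, the item address itself is still valid when it is freed; the
check fails only because the header it is compared against was overwritten.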
This dirty hack fixes the problem as far as I can see:
----
Index: sys/kern/uipc_mbuf.c
===================================================================
RCS file: /home/netbsd/src/sys/kern/uipc_mbuf.c,v
retrieving revision 1.241
diff -p -u -r1.241 uipc_mbuf.c
--- sys/kern/uipc_mbuf.c 5 May 2020 20:36:48 -0000 1.241
+++ sys/kern/uipc_mbuf.c 25 May 2020 14:08:51 -0000
@@ -188,8 +188,13 @@ mbinit(void)
NULL, IPL_VM, mb_ctor, NULL, NULL);
KASSERT(mb_cache != NULL);
+#ifdef GEM_WORKAROUND /* XXXXXXXX */
+ mcl_cache = pool_cache_init(mclbytes, PAGE_SIZE, 0, 0, "mclpl",
+ NULL, IPL_VM, NULL, NULL, NULL);
+#else
mcl_cache = pool_cache_init(mclbytes, COHERENCY_UNIT, 0, 0, "mclpl",
NULL, IPL_VM, NULL, NULL, NULL);
+#endif
KASSERT(mcl_cache != NULL);
pool_cache_set_drain_hook(mb_cache, mb_drain, NULL);
----
Therefore, I guess that RX DMA of gem(4) corrupts memory in the same page
as the DMA buffer, outside the buffer itself. However, this is not
documented in the manual[1]. (It only recommends that buffers be aligned to
a cache line (not mandatory), and this holds even when DIAGNOSTIC is
enabled; m_ext.ext_buf is aligned to COHERENCY_UNIT = 64, which is larger
than 32, the cache line size of the Mac mini.)
[1] Sun Microsystems, Gigabit Ethernet ASIC Specification
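The alignment argument can be checked against the item addresses printed in
the panic messages with a small sketch like the one below. The helper names
are hypothetical, and the 4096-byte page size is an assumption; the values
64 and 32 are the ones quoted in this report.

```c
#include <assert.h>
#include <stdint.h>

#define ASSUMED_PAGE_SIZE	4096u	/* assumption: 4 KiB pages */
#define REPORT_COHERENCY_UNIT	64u	/* alignment of m_ext.ext_buf (report) */
#define REPORT_CACHE_LINE	32u	/* Mac mini cache line (report) */

/* Start address of the page that contains addr. */
static uint32_t
addr_page_base(uint32_t addr)
{
	return addr & ~(ASSUMED_PAGE_SIZE - 1u);
}

/* Byte offset of addr within its page. */
static uint32_t
addr_page_offset(uint32_t addr)
{
	return addr & (ASSUMED_PAGE_SIZE - 1u);
}

/* Nonzero if addr is a multiple of unit. */
static int
addr_is_aligned(uint32_t addr, uint32_t unit)
{
	return (addr % unit) == 0;
}

/* Check the item address from the panic message in this report. */
static int
addr_check_reported_item(void)
{
	const uint32_t item = 0x3fb0b040u;

	assert(addr_page_base(item) == 0x3fb0b000u);
	assert(addr_page_offset(item) == 0x40u);
	assert(addr_is_aligned(item, REPORT_COHERENCY_UNIT));
	assert(addr_is_aligned(item, REPORT_CACHE_LINE));
	return 0;
}
```

Under the page-size assumption, item 0x3fb0b040 sits at offset 0x40 of its
page, i.e. it is 64-byte aligned (and therefore 32-byte aligned), so the
recommended cache-line alignment is satisfied and does not explain the
corruption.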
>How-To-Repeat:
Described above.
>Fix:
N/A. Hardware limitation? In that case, should gem(4) use its own pool for
DMA buffers?
>Release-Note:
>Audit-Trail:
From: "David H. Gutteridge" <david@gutteridge.ca>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: port-macppc/55326: gem(4): memory corruption by RX DMA
Date: Sun, 31 May 2020 14:47:53 -0400
FWIW, I reported a similar backtrace in PR port-macppc/54331:
[ 20.6900578] panic: pr_phinpage_check: [mclpl] item 0x1fb2f040 not part of pool
[ 20.6900578] cpu0: Begin traceback...
[ 20.6900578] 0x10007da0: at vpanic+0x144
[ 20.7300428] 0x10007dd0: at panic+0x50
[ 20.7300428] 0x10007e20: at pool_cache_put_paddr+0x25c
[ 20.7600388] 0x10007e50: at m_ext_free+0x130
[ 20.7600388] 0x10007e60: at m_free+0x9c
[ 20.7600388] 0x10007e70: at m_freem.part.8+0xc
[ 20.7600388] 0x10007e80: at ether_input+0x67c
[ 20.7600388] 0x10007eb0: at if_percpuq_softint+0xb4
[ 20.7600388] 0x10007ed0: at softint_dispatch+0x1d0
[ 20.7600388] 0x10007f20: at softint_fast_dispatch+0xdc
[ 20.8700561] 0x10007fe8: at 0xfffffffc
[ 20.8700561] cpu0: End traceback...
Stopped in pid 0.3 (system)
at netbsd:vpanic+0x148: or r3, r29, r29
gem0: receive error: RX overflow sc->rxptr 0, complete 4
(Since the rest of PR 54331 is addressed, and this PR contains
analysis and a patch, I'll close 54331.)
Thanks,
Dave
>Unformatted: