NetBSD Problem Report #56818
From www@netbsd.org Sat May 7 09:31:09 2022
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id 06BF21A923B
for <gnats-bugs@gnats.NetBSD.org>; Sat, 7 May 2022 09:31:09 +0000 (UTC)
Message-Id: <20220507093037.816BB1A923C@mollari.NetBSD.org>
Date: Sat, 7 May 2022 09:30:37 +0000 (UTC)
From: rokuyama.rk@gmail.com
Reply-To: rokuyama.rk@gmail.com
To: gnats-bugs@NetBSD.org
Subject: oea: system trapped in pgdaemon under high memory pressure
X-Send-Pr-Version: www-1.0
>Number: 56818
>Category: port-powerpc
>Synopsis: oea: system trapped in pgdaemon under high memory pressure
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: port-powerpc-maintainer
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sat May 07 09:35:01 +0000 2022
>Closed-Date: Mon May 09 11:46:22 +0000 2022
>Last-Modified: Thu May 12 21:00:04 +0000 2022
>Originator: Rin Okuyama
>Release: 9.99.96
>Organization:
Department of Physics, Meiji University
>Environment:
NetBSD 9.99.96 NetBSD 9.99.96 (KUROBOX) #4: Sat May 7 16:18:45 JST 2022 rin@latipes:/build/src/sys/arch/sandpoint/compile/KUROBOX sandpoint
>Description:
On a 603e-based sandpoint machine with 128MB RAM, the system freezes
under high memory pressure, e.g., simultaneous runs of
fc-cache(1) and makemandb(1), or ATF tests such as those in lib/libc/db.
It gets stuck in pgdaemon, and nothing can be done other than entering
DDB.
A typical backtrace looks like:
---
db> bt
0x00b44c20: at intr_deliver+0x98
0x00b44c60: at pic_handle_intr+0x108
0x00b44cc0: at trapstart+0x690
0x00b44d90: at 0xb44e74
0x00b44db0: at pmap_query_bit+0x90
0x00b44dd0: at uvmpdpol_selectvictim+0x1c8
0x00b44e20: at uvm_pageout+0x298
0x00b44f20: at cpu_lwp_bootstrap+0xc
saved LR(0xfffffefb) is invalid.
---
or
---
db> bt
0x00b44c60: at intr_deliver+0x98
0x00b44ca0: at pic_handle_intr+0x108
0x00b44d00: at trapstart+0x690
0x00b44dd0: at uvmpdpol_selectvictim+0x3dc
0x00b44e20: at uvm_pageout+0x298
0x00b44f20: at cpu_lwp_bootstrap+0xc
saved LR(0xfffffefb) is invalid.
---
or it is in some other function called from uvm_pageout().
A similar failure has been observed even on a Mac Mini G4 with 1GB RAM;
gem(4) reported memory shortage indefinitely. Unfortunately, I couldn't
get a backtrace for this case.
This is not observed on booke and ibm4xx as far as I can see;
even the 64MB-RAM EXPLORA451 survives the tests in lib/libc/db.
Therefore, I guess something is wrong with oea's pmap.
This should be a recent (~ 6 months or so) regression. The failure
occurs even if PV-tracking is disabled by the PMAP_PV_TRACK_ONLY_STUBS
option.
This may or may not be related to instabilities reported on
port-macppc@:
https://mail-index.netbsd.org/port-macppc/2022/03/29/msg002954.html
>How-To-Repeat:
Run ATF on a sandpoint (or other low-memory powerpc/oea) machine.
>Fix:
N/A
>Release-Note:
>Audit-Trail:
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: port-powerpc/56818: oea: system trapped in pgdaemon under high
memory pressure
Date: Sat, 7 May 2022 13:58:11 +0200
Can you "show uvmexp", continue, break again after a few seconds and repeat
a few times?
Martin
From: Rin Okuyama <rokuyama.rk@gmail.com>
To: gnats-bugs@netbsd.org, port-powerpc-maintainer@netbsd.org,
gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Cc:
Subject: Re: port-powerpc/56818: oea: system trapped in pgdaemon under high
memory pressure
Date: Sat, 7 May 2022 23:00:11 +0900
On 2022/05/07 21:00, Martin Husemann wrote:
> Can you "show uvmexp", continue, break again after a few seconds and repeat
> a few times?
Here are 4 successive "show uvmexp" outputs with ~10 sec between continue and break:
https://gist.github.com/rokuyama/d27a0fd3a442fe1a5c9ccd5eed260ac8
and this is the diff between them:
https://gist.github.com/rokuyama/9ababdc91b553f76deec22152f31e015
Thanks,
rin
From: matthew green <mrg@eterna.com.au>
To: Rin Okuyama <rokuyama.rk@gmail.com>
Cc: gnats-bugs@netbsd.org, port-powerpc-maintainer@netbsd.org,
gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: re: port-powerpc/56818: oea: system trapped in pgdaemon under high memory pressure
Date: Sun, 08 May 2022 05:35:25 +1000
Rin Okuyama writes:
> On 2022/05/07 21:00, Martin Husemann wrote:
> > Can you "show uvmexp", continue, break again after a few seconds and repeat
> > a few times?
>
> Here's 4 successive "show uvmexp" with ~10 sec between continue and break:
>
> https://gist.github.com/rokuyama/d27a0fd3a442fe1a5c9ccd5eed260ac8
>
> and this is diff between them:
>
> https://gist.github.com/rokuyama/9ababdc91b553f76deec22152f31e015
i see that file pages are ~60% of the system memory.
what are the vm.{file,anon,exec}{min,max} values? the default for
file{min,max} are {10,50}, so it should be ejecting at least 10%
of memory really easily to get under filemax. if those pages are
dirty why aren't they being paged out... (system boot, and pools
are also using about 15% of memory currently..)
.mrg.
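[Editorial note] The arithmetic behind the remark above can be sketched in C. This is only an illustration of the percentages being compared (file pages at roughly 60% of memory against vm.filemax = 50), not UVM's actual page-daemon policy code; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Share of all pages held by one page type, in whole percent (rounded down). */
static inline unsigned
pct_of_total(uint64_t type_pages, uint64_t total_pages)
{
	if (total_pages == 0)
		return 0;
	return (unsigned)(type_pages * 100 / total_pages);
}

/*
 * How many percentage points a page type sits above its "max" limit
 * (e.g. vm.filemax); 0 when it is at or below the limit.
 */
static inline unsigned
pct_over_max(unsigned usage_pct, unsigned max_pct)
{
	return usage_pct > max_pct ? usage_pct - max_pct : 0;
}
```

With a usage of 60 and a limit of 50, pct_over_max() yields 10, which is the "at least 10% of memory" that should be easy to reclaim.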
From: Rin Okuyama <rokuyama.rk@gmail.com>
To: gnats-bugs@netbsd.org, port-powerpc-maintainer@netbsd.org,
gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Cc:
Subject: Re: port-powerpc/56818: oea: system trapped in pgdaemon under high
memory pressure
Date: Sun, 8 May 2022 20:49:14 +0900
On 2022/05/08 4:40, matthew green wrote:
> i see that file pages are ~60% of the system memory.
>
> what are the vm.{file,anon,exec}{min,max} values? the default for
> file{min,max} are {10,50}, so it should be ejecting at least 10%
> of memory really easily to get under filemax. if those pages are
> dirty why aren't they being paged out... (system boot, and pools
> are also using about 15% of memory currently..)
They are kept at the defaults:
----
# sysctl vm
vm.loadavg: 0.01 0.09 0.06
vm.maxslp = 20
vm.uspace = 16384
vm.minaddress = 0
vm.maxaddress = -4096
vm.guard_size = 1048576
vm.thread_guard_size = 65536
vm.user_va0_disable = 1
vm.anonmin = 10
vm.filemin = 10
vm.execmin = 5
vm.anonmax = 80
vm.filemax = 50
vm.execmax = 30
vm.inactivepct = 33
vm.swap_encrypt = 1
vm.bufcache = 15
vm.bufmem = 6926336
vm.bufmem_lowater = 2516480
vm.bufmem_hiwater = 2013184
----
And...
Maybe I've found it; inverted logic introduced in pmap.c rev. 1.108:
----
Index: sys/arch/powerpc/oea/pmap.c
===================================================================
RCS file: /home/netbsd/src/sys/arch/powerpc/oea/pmap.c,v
retrieving revision 1.113
diff -p -u -r1.113 pmap.c
--- sys/arch/powerpc/oea/pmap.c 9 Apr 2022 23:38:32 -0000 1.113
+++ sys/arch/powerpc/oea/pmap.c 8 May 2022 11:40:13 -0000
@@ -674,7 +674,7 @@ static inline void
pmap_pp_attr_clear(struct pmap_page *pp, int ptebit)
{
- pp->pp_attrs &= ptebit;
+ pp->pp_attrs &= ~ptebit;
}
static inline void
----
Referenced/modified bits never get cleared this way...
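[Editorial note] The effect of the inverted mask in the diff above can be shown with a small standalone C sketch. The structure, the function names and the bit values here are simplified stand-ins chosen for illustration; they are assumptions, not the definitions used in sys/arch/powerpc/oea.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed attribute bits, for illustration only. */
#define ATTR_REF 0x0100u	/* "referenced" */
#define ATTR_CHG 0x0080u	/* "modified" */

/* Simplified stand-in for struct pmap_page. */
struct demo_page {
	uint32_t attrs;
};

/* Buggy form: keeps ONLY the requested bit and drops every other bit. */
static inline void
attr_clear_buggy(struct demo_page *pp, uint32_t bit)
{
	pp->attrs &= bit;
}

/* Fixed form: drops ONLY the requested bit and keeps the others. */
static inline void
attr_clear_fixed(struct demo_page *pp, uint32_t bit)
{
	pp->attrs &= ~bit;
}

/* Nonzero if the given bit is set in the cached attributes. */
static inline int
attr_test(const struct demo_page *pp, uint32_t bit)
{
	return (pp->attrs & bit) != 0;
}
```

Starting from attrs = ATTR_REF | ATTR_CHG, attr_clear_buggy(&pg, ATTR_REF) leaves ATTR_REF set (and wipes ATTR_CHG instead), so a caller that clears the referenced bit and later tests it would still see the page as referenced. attr_clear_fixed() clears ATTR_REF and preserves ATTR_CHG.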
I will commit soon if full ATF passes on sandpoint.
Thanks,
rin
From: "Rin Okuyama" <rin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/56818 CVS commit: src/sys/arch/powerpc/oea
Date: Mon, 9 May 2022 11:39:44 +0000
Module Name: src
Committed By: rin
Date: Mon May 9 11:39:44 UTC 2022
Modified Files:
src/sys/arch/powerpc/oea: pmap.c
Log Message:
PR port-powerpc/56818
Fix inverted logic introduced in rev. 1.108, by which modified/referenced
bits of pages were never cleared appropriately.
Now, full ATF runs on macppc and sandpoint, with no regression observed.
To generate a diff of this commit:
cvs rdiff -u -r1.113 -r1.114 src/sys/arch/powerpc/oea/pmap.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: open->closed
State-Changed-By: rin@NetBSD.org
State-Changed-When: Mon, 09 May 2022 11:46:22 +0000
State-Changed-Why:
Fixed. No release branches affected.
From: Havard Eidnes <he@NetBSD.org>
To: rokuyama.rk@gmail.com
Cc: gnats-bugs@netbsd.org, port-powerpc-maintainer@netbsd.org,
gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: port-powerpc/56818: oea: system trapped in pgdaemon under high
memory pressure
Date: Thu, 12 May 2022 22:57:25 +0200 (CEST)
Hi,
I have reason to believe that this problem post-dates 9.99.93
dated January 7 2022 -- I have a G4 Mac Mini running that version
successfully, but have tried 9.99.96 and 9.99.94 unsuccessfully;
the 9.99.94 from March 19 2022, apparently.
(That's the ID strings from the kernels I have lying around as
having been tested.)
Regards,
- Håvard
From: Havard Eidnes <he@NetBSD.org>
To: rokuyama.rk@gmail.com
Cc: gnats-bugs@netbsd.org, port-powerpc-maintainer@netbsd.org,
gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: port-powerpc/56818: oea: system trapped in pgdaemon under high
memory pressure
Date: Thu, 12 May 2022 22:59:14 +0200 (CEST)
Hm,
should have read the whole PR through.
Good catch!
Regards,
- Håvard
>Unformatted:
Copyright © 1994-2020
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.