NetBSD Problem Report #56818

From www@netbsd.org  Sat May  7 09:31:09 2022
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 06BF21A923B
	for <gnats-bugs@gnats.NetBSD.org>; Sat,  7 May 2022 09:31:09 +0000 (UTC)
Message-Id: <20220507093037.816BB1A923C@mollari.NetBSD.org>
Date: Sat,  7 May 2022 09:30:37 +0000 (UTC)
From: rokuyama.rk@gmail.com
Reply-To: rokuyama.rk@gmail.com
To: gnats-bugs@NetBSD.org
Subject: oea: system trapped in pgdaemon under high memory pressure
X-Send-Pr-Version: www-1.0

>Number:         56818
>Category:       port-powerpc
>Synopsis:       oea: system trapped in pgdaemon under high memory pressure
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    port-powerpc-maintainer
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat May 07 09:35:01 +0000 2022
>Closed-Date:    Mon May 09 11:46:22 +0000 2022
>Last-Modified:  Thu May 12 21:00:04 +0000 2022
>Originator:     Rin Okuyama
>Release:        9.99.96
>Organization:
Department of Physics, Meiji University
>Environment:
NetBSD  9.99.96 NetBSD 9.99.96 (KUROBOX) #4: Sat May  7 16:18:45 JST 2022  rin@latipes:/build/src/sys/arch/sandpoint/compile/KUROBOX sandpoint
>Description:
For a 128MB-RAM 603e-based sandpoint machine, the system freezes
during some high-memory-pressure operations, e.g., simultaneous runs of
fc-cache(1) and makemandb(1), or ATF tests such as those in lib/libc/db.

It gets stuck in pgdaemon, and no operation other than entering
DDB is possible.

A typical backtrace looks like:

---
db> bt
0x00b44c20: at intr_deliver+0x98
0x00b44c60: at pic_handle_intr+0x108
0x00b44cc0: at trapstart+0x690
0x00b44d90: at 0xb44e74
0x00b44db0: at pmap_query_bit+0x90
0x00b44dd0: at uvmpdpol_selectvictim+0x1c8
0x00b44e20: at uvm_pageout+0x298
0x00b44f20: at cpu_lwp_bootstrap+0xc
saved LR(0xfffffefb) is invalid.
---

or

---
db> bt
0x00b44c60: at intr_deliver+0x98
0x00b44ca0: at pic_handle_intr+0x108
0x00b44d00: at trapstart+0x690
0x00b44dd0: at uvmpdpol_selectvictim+0x3dc
0x00b44e20: at uvm_pageout+0x298
0x00b44f20: at cpu_lwp_bootstrap+0xc
saved LR(0xfffffefb) is invalid.
---

or it is in some other function called from uvm_pageout().

Even on a 1GB-RAM Mac Mini G4, a similar failure has been observed;
gem(4) kept complaining about memory shortage indefinitely. Unfortunately,
I couldn't get a backtrace for this case.

This has not been observed on booke or ibm4xx as far as I can see;
even a 64MB-RAM EXPLORA451 survives the tests in lib/libc/db.

Therefore, I suspect something is wrong with oea's pmap.

This appears to be a recent regression (within the last ~6 months).
This failure occurs even if PV-tracking is disabled with the
PMAP_PV_TRACK_ONLY_STUBS option.

This may or may not be related to the instabilities reported on
port-macppc@:

https://mail-index.netbsd.org/port-macppc/2022/03/29/msg002954.html
>How-To-Repeat:
Run ATF on a sandpoint (or other low-memory powerpc/oea) machine.
>Fix:
N/A

>Release-Note:

>Audit-Trail:
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: port-powerpc/56818: oea: system trapped in pgdaemon under high
 memory pressure
Date: Sat, 7 May 2022 13:58:11 +0200

 Can you "show uvmexp", continue, break again after a few seconds and repeat
 a few times?

 Martin

From: Rin Okuyama <rokuyama.rk@gmail.com>
To: gnats-bugs@netbsd.org, port-powerpc-maintainer@netbsd.org,
 gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Cc: 
Subject: Re: port-powerpc/56818: oea: system trapped in pgdaemon under high
 memory pressure
Date: Sat, 7 May 2022 23:00:11 +0900

 On 2022/05/07 21:00, Martin Husemann wrote:
 >   Can you "show uvmexp", continue, break again after a few seconds and repeat
 >   a few times?

 Here are 4 successive "show uvmexp" outputs, with ~10 sec between continue and break:

 https://gist.github.com/rokuyama/d27a0fd3a442fe1a5c9ccd5eed260ac8

 and here is the diff between them:

 https://gist.github.com/rokuyama/9ababdc91b553f76deec22152f31e015

 Thanks,
 rin

From: matthew green <mrg@eterna.com.au>
To: Rin Okuyama <rokuyama.rk@gmail.com>
Cc: gnats-bugs@netbsd.org, port-powerpc-maintainer@netbsd.org,
    gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: re: port-powerpc/56818: oea: system trapped in pgdaemon under high memory pressure
Date: Sun, 08 May 2022 05:35:25 +1000

 Rin Okuyama writes:
 > On 2022/05/07 21:00, Martin Husemann wrote:
 > >   Can you "show uvmexp", continue, break again after a few seconds and repeat
 > >   a few times?
 >
 > Here's 4 successive "show uvmexp" with ~10 sec between continue and break:
 >
 > https://gist.github.com/rokuyama/d27a0fd3a442fe1a5c9ccd5eed260ac8
 >
 > and this is diff between them:
 >
 > https://gist.github.com/rokuyama/9ababdc91b553f76deec22152f31e015

 i see that file pages are ~60% of the system memory.

 what are the vm.{file,anon,exec}{min,max} values?  the defaults for
 file{min,max} are {10,50}, so it should be ejecting at least 10%
 of memory really easily to get under filemax.  if those pages are
 dirty why aren't they being paged out... (system boot, and pools
 are also using about 15% of memory currently..)


 .mrg.
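
The "eject at least 10%" estimate above is simple percentage arithmetic, sketched below. This is a simplified model, not the actual UVM page-daemon code: it assumes the vm.filemin/vm.filemax percentages are applied to one page total, taken here as the full 128MB of the sandpoint machine at an assumed 4KB page size (the real kernel computes its base from the pages it is managing, which is smaller). The names demo_pct_of, struct demo_file_limits and demo_compute_file_limits are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: pct percent of total, truncated as integer math. */
static uint64_t
demo_pct_of(uint64_t total, unsigned pct)
{
	return total * pct / 100;
}

/* Hypothetical result type for the file-page limits under this model. */
struct demo_file_limits {
	uint64_t filemin_pages;	/* vm.filemin share, in pages */
	uint64_t filemax_pages;	/* vm.filemax share, in pages */
	bool file_over;		/* file pages above the filemax share? */
	uint64_t excess;	/* pages to evict to get back under filemax */
};

/* Apply the filemin/filemax percentages to total_pages (simplified model). */
static struct demo_file_limits
demo_compute_file_limits(uint64_t total_pages, uint64_t file_pages,
    unsigned filemin, unsigned filemax)
{
	struct demo_file_limits l;

	assert(filemin <= filemax && filemax <= 100);
	l.filemin_pages = demo_pct_of(total_pages, filemin);
	l.filemax_pages = demo_pct_of(total_pages, filemax);
	l.file_over = file_pages > l.filemax_pages;
	l.excess = l.file_over ? file_pages - l.filemax_pages : 0;
	return l;
}
```

Under these assumptions, 128MB gives 32768 pages; ~60% file pages is 19660 pages, while vm.filemax = 50 allows 16384, so the file class is over its limit by 3276 pages, roughly 10% of memory. A working pagedaemon should find such clean or dirty file pages easy to reclaim, which is why the observed behavior looked suspicious.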

From: Rin Okuyama <rokuyama.rk@gmail.com>
To: gnats-bugs@netbsd.org, port-powerpc-maintainer@netbsd.org,
 gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Cc: 
Subject: Re: port-powerpc/56818: oea: system trapped in pgdaemon under high
 memory pressure
Date: Sun, 8 May 2022 20:49:14 +0900

 On 2022/05/08 4:40, matthew green wrote:
 >   i see that file pages are ~60% of the system memory.
 >   
 >   what are the vm.{file,anon,exec}{min,max} values?  the default for
 >   file{min,max} are {10,50}, so it should be ejecting at least 10%
 >   of memory really easily to get under filemax.  if those pages are
 >   dirty why aren't they being paged out... (system boot, and pools
 >   are also using about 15% of memory currently..)

 They are left at their defaults:

 ----
 # sysctl vm
 vm.loadavg: 0.01 0.09 0.06
 vm.maxslp = 20
 vm.uspace = 16384
 vm.minaddress = 0
 vm.maxaddress = -4096
 vm.guard_size = 1048576
 vm.thread_guard_size = 65536
 vm.user_va0_disable = 1
 vm.anonmin = 10
 vm.filemin = 10
 vm.execmin = 5
 vm.anonmax = 80
 vm.filemax = 50
 vm.execmax = 30
 vm.inactivepct = 33
 vm.swap_encrypt = 1
 vm.bufcache = 15
 vm.bufmem = 6926336
 vm.bufmem_lowater = 2516480
 vm.bufmem_hiwater = 2013184
 ----

 And...

 I may have found it: inverted logic introduced in pmap.c rev. 1.108:

 ----
 Index: sys/arch/powerpc/oea/pmap.c
 ===================================================================
 RCS file: /home/netbsd/src/sys/arch/powerpc/oea/pmap.c,v
 retrieving revision 1.113
 diff -p -u -r1.113 pmap.c
 --- sys/arch/powerpc/oea/pmap.c	9 Apr 2022 23:38:32 -0000	1.113
 +++ sys/arch/powerpc/oea/pmap.c	8 May 2022 11:40:13 -0000
 @@ -674,7 +674,7 @@ static inline void
   pmap_pp_attr_clear(struct pmap_page *pp, int ptebit)
   {

 -	pp->pp_attrs &= ptebit;
 +	pp->pp_attrs &= ~ptebit;
   }

   static inline void
 ----

 With "&= ptebit", the bit that should be cleared is kept (and all other
 attributes are wiped instead), so referenced/modified bits never get
 cleared...

 I will commit soon if full ATF passes on sandpoint.

 Thanks,
 rin
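
The effect of the one-character fix can be shown in isolation. The sketch below is hypothetical: struct demo_pmap_page stands in for the real struct pmap_page, and the DEMO_PTE_REF/DEMO_PTE_CHG values are assumed to mirror the OEA PTE referenced/changed bits. Only the two masking expressions are taken from the diff.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed, illustrative bit values for the referenced (R) and changed (C) bits. */
#define DEMO_PTE_REF	0x100u
#define DEMO_PTE_CHG	0x080u

/* Hypothetical stand-in for struct pmap_page; only the attribute word is modeled. */
struct demo_pmap_page {
	uint32_t pp_attrs;
};

/* Buggy form: ANDing with ptebit keeps the bit to be cleared and drops the rest. */
static inline void
demo_attr_clear_buggy(struct demo_pmap_page *pp, uint32_t ptebit)
{
	pp->pp_attrs &= ptebit;
}

/* Fixed form: ANDing with the complement clears only ptebit. */
static inline void
demo_attr_clear_fixed(struct demo_pmap_page *pp, uint32_t ptebit)
{
	pp->pp_attrs &= ~ptebit;
}

/* Start with both R and C set, then ask each variant to clear only R. */
static void
demo_check(void)
{
	struct demo_pmap_page buggy = { DEMO_PTE_REF | DEMO_PTE_CHG };
	struct demo_pmap_page fixed = { DEMO_PTE_REF | DEMO_PTE_CHG };

	demo_attr_clear_buggy(&buggy, DEMO_PTE_REF);
	demo_attr_clear_fixed(&fixed, DEMO_PTE_REF);

	assert(buggy.pp_attrs == DEMO_PTE_REF);	/* R still set, C lost */
	assert(fixed.pp_attrs == DEMO_PTE_CHG);	/* R cleared, C kept */
}
```

In demo_check(), the buggy form leaves exactly the requested bit set, so a page that was ever referenced stays "referenced" forever.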

From: "Rin Okuyama" <rin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/56818 CVS commit: src/sys/arch/powerpc/oea
Date: Mon, 9 May 2022 11:39:44 +0000

 Module Name:	src
 Committed By:	rin
 Date:		Mon May  9 11:39:44 UTC 2022

 Modified Files:
 	src/sys/arch/powerpc/oea: pmap.c

 Log Message:
 PR port-powerpc/56818

 Fix inverted logic introduced in rev. 1.108, by which modified/referenced
 bits of pages were never cleared appropriately.

 Now, full ATF runs on macppc and sandpoint, with no regression observed.


 To generate a diff of this commit:
 cvs rdiff -u -r1.113 -r1.114 src/sys/arch/powerpc/oea/pmap.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.
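
Why a reference bit that never clears can wedge the pagedaemon can be illustrated with a textbook second-chance (clock) scan. This is a deliberately simplified model, not UVM's actual uvmpdpol_selectvictim(), which also moves pages between queues and enforces per-type thresholds; the demo_* names and the DEMO_PTE_REF value are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PTE_REF	0x100u	/* assumed, illustrative reference-bit value */

typedef void (*demo_clear_fn)(uint32_t *attrs, uint32_t bit);

/* Inverted logic as in rev. 1.108: keeps the bit instead of clearing it. */
static void demo_clear_buggy(uint32_t *attrs, uint32_t bit) { *attrs &= bit; }

/* Corrected logic: clears only the requested bit. */
static void demo_clear_fixed(uint32_t *attrs, uint32_t bit) { *attrs &= ~bit; }

/*
 * Simplified second-chance scan over npages pages in circular order.
 * A referenced page has its reference bit cleared and is skipped; the
 * first unreferenced page is the victim. Returns the victim's index,
 * or -1 if none is found within max_steps steps.
 */
static long
demo_select_victim(uint32_t *attrs, size_t npages, size_t max_steps,
    demo_clear_fn clear)
{
	assert(npages > 0);
	for (size_t step = 0; step < max_steps; step++) {
		size_t i = step % npages;

		if (attrs[i] & DEMO_PTE_REF) {
			clear(&attrs[i], DEMO_PTE_REF);	/* second chance */
			continue;
		}
		return (long)i;
	}
	return -1;
}
```

With every page initially referenced, the fixed clear empties the reference bits on the first sweep and picks page 0 on the second; with the buggy clear, the bit survives every "clear", so no victim is ever found no matter how long the scan runs.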

State-Changed-From-To: open->closed
State-Changed-By: rin@NetBSD.org
State-Changed-When: Mon, 09 May 2022 11:46:22 +0000
State-Changed-Why:
Fixed. No release branches affected.


From: Havard Eidnes <he@NetBSD.org>
To: rokuyama.rk@gmail.com
Cc: gnats-bugs@netbsd.org, port-powerpc-maintainer@netbsd.org,
 gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: port-powerpc/56818: oea: system trapped in pgdaemon under high
 memory pressure
Date: Thu, 12 May 2022 22:57:25 +0200 (CEST)

 Hi,

 I have reason to believe that this problem post-dates 9.99.93,
 dated January 7 2022: I have a G4 Mac Mini running that version
 successfully, but have tried 9.99.96 and 9.99.94 (the latter
 apparently from March 19 2022) unsuccessfully.

 (Those are the ID strings of the kernels I have lying around that
 have been tested.)

 Regards,

 - Håvard

From: Havard Eidnes <he@NetBSD.org>
To: rokuyama.rk@gmail.com
Cc: gnats-bugs@netbsd.org, port-powerpc-maintainer@netbsd.org,
 gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: port-powerpc/56818: oea: system trapped in pgdaemon under high
 memory pressure
Date: Thu, 12 May 2022 22:59:14 +0200 (CEST)

 Hm,

 should have read the whole PR through.

 Good catch!

 Regards,

 - Håvard

>Unformatted:
