NetBSD Problem Report #54209

From Frank.Kardel@Acrys.com  Thu May 16 14:39:27 2019
Return-Path: <Frank.Kardel@Acrys.com>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 087B97A149
	for <gnats-bugs@gnats.NetBSD.org>; Thu, 16 May 2019 14:39:27 +0000 (UTC)
Message-Id: <20190516141447.2209569E1F3@sf2.hw.abs.acrys.com>
Date: Thu, 16 May 2019 14:14:47 +0000 (UTC)
From: kardel@netbsd.org
Reply-To: kardel@netbsd.org
To: gnats-bugs@NetBSD.org
Subject: NetBSD 8 large memory system performance extremely low
X-Send-Pr-Version: 3.95

>Number:         54209
>Category:       kern
>Synopsis:       NetBSD 8 large memory performance extremely low
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    ad
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu May 16 14:40:00 +0000 2019
>Closed-Date:    Sat Apr 11 13:32:22 +0000 2020
>Last-Modified:  Sat Apr 11 13:45:01 +0000 2020
>Originator:     Frank Kardel
>Release:        NetBSD 8.0_STABLE
>Organization:

>Environment:


System: NetBSD sf2 8.0_STABLE NetBSD 8.0_STABLE (GENERIC) #4: Wed Apr 24 15:54:41 CEST 2019 kardel@pip:/src/NetBSD/n8/src/obj.amd64/sys/arch/amd64/compile/GENERIC amd64
Architecture: x86_64
Machine: amd64
>Description:
	This is an observation recited from memory, filed as a PR for further investigation.
	System is a 16 CPU, 382 GB server with 1 NVMe disk and 6 SSDs.
	With the default vm configuration the system becomes very unresponsive and performs
	almost no I/O and uses almost no user CPU under the following workload:
		5 larger (~10 GB) java processes
			These processes were performing computational and database work (postgresql 10 - DB is shared between java processes).
		2 rsyncs performing local disk/disk copies
	Very slow response time for shell input.
	We observed 100% system time, almost no I/O (for DB or the rsyncs), also free memory was stuck for long times at 380 MB.
	Inactive memory was at 7 GB instead of the expected 2/3 of main memory.
	Lots of memory starvation was happening.
	The memory starvation effects on a larger memory system seem to be a bit counterintuitive.
>How-To-Repeat:
	see description
>Fix:
	Performance improved radically, with better responsiveness, I/O performance and throughput at lower system time,
	after setting the following vm tunables to force more aggressive memory reclamation.

	    vm.anonmin=10
	    vm.filemin=10
	    vm.execmin=5
	    vm.anonmax=30
	    vm.filemax=20
	    vm.execmax=30
	    vm.inactivepct=40
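
	For scale, the sketch below converts these percentages into approximate sizes on the 382 GB machine from the description. It assumes, as a simplification, that every percentage is taken against total RAM; the kernel computes them against its own page counts, so the real thresholds differ somewhat. The helper name tunables_in_gb is made up for this illustration.

```python
# Approximate sizes of the vm tunables on a 382 GB machine.
# Assumption: each percentage is applied to total RAM (a simplification;
# the kernel uses its own page counts as the base).
TOTAL_RAM_GB = 382

TUNABLES_PCT = {
    "vm.anonmin": 10,
    "vm.filemin": 10,
    "vm.execmin": 5,
    "vm.anonmax": 30,
    "vm.filemax": 20,
    "vm.execmax": 30,
    "vm.inactivepct": 40,
}


def tunables_in_gb(total_gb, tunables_pct):
    """Return {name: size in GB} for each percentage tunable."""
    return {name: total_gb * pct / 100 for name, pct in tunables_pct.items()}


if __name__ == "__main__":
    for name, gb in tunables_in_gb(TOTAL_RAM_GB, TUNABLES_PCT).items():
        print(f"{name:<16} {gb:6.1f} GB")
```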

>Release-Note:

>Audit-Trail:
From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/54209: NetBSD 8 large memory system performance extremely low
Date: Wed, 5 Jun 2019 07:47:54 -0000 (UTC)

 kardel@netbsd.org writes:

 >	We observed 100% system time, almost no I/O (for DB or the rsyncs), also free memory was stuck for long times at 380 MB.
 >	Inactive memory was at 7 GB instead of the expected 2/3 of main memory.
 >	Lots of memory starvation was happening.
 >	The memory starvation effects on a larger memory system seem to be a bit counterintuitive.

 That's how the page daemon policy works.

 Active pages have been accessed recently, inactive pages haven't.

 When active pages of type (exec,file,anon) exceed their max value, then
 pages of the other types get reactivated, so there is some pressure
 to reclaim pages of only the type that is overused. This stops when
 all types get overused.

 That method doesn't work well if the pages of the overused type are
 mostly active, because you have to wait for one of them to become inactive.

 There are event counters (vmstat -e) to show this page reactivation.
 E.g.:

 pdpolicy reactanon                            77094        0 misc
 pdpolicy reactexec                            50778        0 misc

 When the file cache exceeded the maximum (that's pretty common),
 the anon and exec types got frequently reactivated. This effectively
 prevents programs from being paged out by the file cache.

 Lowering all 'max' values probably makes all types 'overused', then
 we don't apply this kind of pressure (as all types would need it)
 and page allocation gets fast again.
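
 The rule described above can be sketched as a small model. This is a simplified illustration, not the kernel code: it ignores swap state, works on percentages rather than page counts, and the function name reactivation_flags, the Limits class and the usage figures are all hypothetical. The "stock" limits are assumed defaults, not values taken from this report.

```python
from dataclasses import dataclass


@dataclass
class Limits:
    """Minimum and maximum share of memory for one page type, in percent."""
    min_pct: int
    max_pct: int


def reactivation_flags(usage_pct, limits):
    """Decide, per page type, whether scanned pages get reactivated.

    usage_pct: {"anon": pct, "file": pct, "exec": pct} currently in use
    limits:    {"anon": Limits, "file": Limits, "exec": Limits}
    Returns {type: True if pages of that type are put back on the active queue}.
    """
    under = {t: usage_pct[t] <= limits[t].min_pct for t in usage_pct}
    over = {t: usage_pct[t] > limits[t].max_pct for t in usage_pct}

    react = {}
    for t in usage_pct:
        others_over = any(over[o] for o in usage_pct if o != t)
        # Protect a type that is below its minimum, or that is within its
        # maximum while some other type is over its maximum.
        react[t] = under[t] or (not over[t] and others_over)

    # If every type would be protected, nothing could be reclaimed,
    # so protect none of them.
    if all(react.values()):
        react = {t: False for t in react}
    return react


if __name__ == "__main__":
    # Assumed stock limits (hypothetical for this sketch).
    stock = {"anon": Limits(10, 80), "file": Limits(10, 50), "exec": Limits(5, 30)}
    # Limits from the Fix section of this PR.
    tuned = {"anon": Limits(10, 30), "file": Limits(10, 20), "exec": Limits(5, 30)}
    # Hypothetical usage figures, chosen only to exercise the rule.
    usage = {"anon": 35, "file": 55, "exec": 6}
    print("stock:", reactivation_flags(usage, stock))
    print("tuned:", reactivation_flags(usage, tuned))
```

 With this hypothetical usage, the assumed stock limits reactivate anon and exec pages, while the tuned limits reactivate only exec pages. Once every type is over its maximum, the model reactivates nothing.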


 -- 
                                 Michael van Elst
 Internet: mlelstv@serpens.de
                                 "A potential Snark may lurk in every tree."

From: "Andrew Doran" <ad@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/54209 CVS commit: src
Date: Fri, 13 Dec 2019 20:10:23 +0000

 Module Name:	src
 Committed By:	ad
 Date:		Fri Dec 13 20:10:22 UTC 2019

 Modified Files:
 	src/external/cddl/osnet/dist/uts/common/fs/zfs: zfs_vnops.c
 	src/sys/miscfs/genfs: genfs_io.c
 	src/sys/nfs: nfs_bio.c
 	src/sys/rump/librump/rumpkern: vm.c
 	src/sys/rump/librump/rumpvfs: vm_vfs.c
 	src/sys/ufs/lfs: lfs_pages.c lfs_vfsops.c ulfs_inode.c
 	src/sys/ufs/ufs: ufs_inode.c
 	src/sys/uvm: uvm.h uvm_amap.c uvm_anon.c uvm_aobj.c uvm_bio.c
 	    uvm_fault.c uvm_init.c uvm_km.c uvm_loan.c uvm_map.c uvm_object.c
 	    uvm_page.c uvm_page.h uvm_pager.c uvm_pdaemon.c uvm_pdpolicy.h
 	    uvm_pdpolicy_clock.c uvm_pdpolicy_clockpro.c uvm_pglist.c
 	    uvm_physseg.c

 Log Message:
 Break the global uvm_pageqlock into a per-page identity lock and a private
 lock for use of the pagedaemon policy code.  Discussed on tech-kern.

 PR kern/54209: NetBSD 8 large memory performance extremely low
 PR kern/54210: NetBSD-8 processes presumably not exiting
 PR kern/54727: writing a large file causes unreasonable system behaviour


 To generate a diff of this commit:
 cvs rdiff -u -r1.53 -r1.54 \
     src/external/cddl/osnet/dist/uts/common/fs/zfs/zfs_vnops.c
 cvs rdiff -u -r1.76 -r1.77 src/sys/miscfs/genfs/genfs_io.c
 cvs rdiff -u -r1.191 -r1.192 src/sys/nfs/nfs_bio.c
 cvs rdiff -u -r1.173 -r1.174 src/sys/rump/librump/rumpkern/vm.c
 cvs rdiff -u -r1.34 -r1.35 src/sys/rump/librump/rumpvfs/vm_vfs.c
 cvs rdiff -u -r1.15 -r1.16 src/sys/ufs/lfs/lfs_pages.c
 cvs rdiff -u -r1.365 -r1.366 src/sys/ufs/lfs/lfs_vfsops.c
 cvs rdiff -u -r1.21 -r1.22 src/sys/ufs/lfs/ulfs_inode.c
 cvs rdiff -u -r1.105 -r1.106 src/sys/ufs/ufs/ufs_inode.c
 cvs rdiff -u -r1.69 -r1.70 src/sys/uvm/uvm.h
 cvs rdiff -u -r1.110 -r1.111 src/sys/uvm/uvm_amap.c
 cvs rdiff -u -r1.68 -r1.69 src/sys/uvm/uvm_anon.c
 cvs rdiff -u -r1.130 -r1.131 src/sys/uvm/uvm_aobj.c
 cvs rdiff -u -r1.100 -r1.101 src/sys/uvm/uvm_bio.c
 cvs rdiff -u -r1.211 -r1.212 src/sys/uvm/uvm_fault.c
 cvs rdiff -u -r1.50 -r1.51 src/sys/uvm/uvm_init.c
 cvs rdiff -u -r1.150 -r1.151 src/sys/uvm/uvm_km.c
 cvs rdiff -u -r1.88 -r1.89 src/sys/uvm/uvm_loan.c
 cvs rdiff -u -r1.366 -r1.367 src/sys/uvm/uvm_map.c
 cvs rdiff -u -r1.15 -r1.16 src/sys/uvm/uvm_object.c
 cvs rdiff -u -r1.200 -r1.201 src/sys/uvm/uvm_page.c
 cvs rdiff -u -r1.84 -r1.85 src/sys/uvm/uvm_page.h
 cvs rdiff -u -r1.113 -r1.114 src/sys/uvm/uvm_pager.c
 cvs rdiff -u -r1.112 -r1.113 src/sys/uvm/uvm_pdaemon.c
 cvs rdiff -u -r1.3 -r1.4 src/sys/uvm/uvm_pdpolicy.h
 cvs rdiff -u -r1.17 -r1.18 src/sys/uvm/uvm_pdpolicy_clock.c \
     src/sys/uvm/uvm_pdpolicy_clockpro.c
 cvs rdiff -u -r1.72 -r1.73 src/sys/uvm/uvm_pglist.c
 cvs rdiff -u -r1.10 -r1.11 src/sys/uvm/uvm_physseg.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Andrew Doran" <ad@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/54209 CVS commit: src/sys/arch/amd64/amd64
Date: Fri, 13 Dec 2019 20:14:25 +0000

 Module Name:	src
 Committed By:	ad
 Date:		Fri Dec 13 20:14:25 UTC 2019

 Modified Files:
 	src/sys/arch/amd64/amd64: machdep.c

 Log Message:
 Break the global uvm_pageqlock into a per-page identity lock and a private
 lock for use of the pagedaemon policy code.  Discussed on tech-kern.

 PR kern/54209: NetBSD 8 large memory performance extremely low
 PR kern/54210: NetBSD-8 processes presumably not exiting
 PR kern/54727: writing a large file causes unreasonable system behaviour


 To generate a diff of this commit:
 cvs rdiff -u -r1.343 -r1.344 src/sys/arch/amd64/amd64/machdep.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Andrew Doran" <ad@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/54209 CVS commit: src/sys/uvm
Date: Mon, 30 Dec 2019 18:08:38 +0000

 Module Name:	src
 Committed By:	ad
 Date:		Mon Dec 30 18:08:38 UTC 2019

 Modified Files:
 	src/sys/uvm: uvm_pdaemon.c uvm_pdaemon.h uvm_pdpolicy.h
 	    uvm_pdpolicy_clock.c uvm_pdpolicy_clockpro.c

 Log Message:
 pagedaemon:

 - Use marker pages to keep place in the queue when scanning, rather than
   relying on assumptions.

 - In uvmpdpol_balancequeue(), lock the object once instead of twice.

 - When draining pools, the situation is getting desperate, but try to avoid
   saturating the system with xcall, lock and interrupt activity by sleeping
   for 1 clock tick if being continually awoken and all pools have been
   cycled through at least once.

 - Pause & resume the freelist cache during pool draining.
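
 The marker technique from the first item can be illustrated with a toy model. This is a sketch under assumptions, not the uvm implementation: a Python list stands in for the page queue, and Marker, scan_with_marker and visit are hypothetical names. The point is that the scanner leaves a placeholder in the queue and resumes from it, instead of trusting a saved "next" entry that may have been removed in the meantime.

```python
class Marker:
    """Placeholder queue entry; never represents a real page."""


def scan_with_marker(queue, visit):
    """Visit each real entry of `queue`, tolerating changes made by `visit`.

    `visit(queue, page)` may remove entries, standing in for what other
    threads could do while the scanner has dropped the queue lock.
    Returns the list of entries that were actually visited.
    """
    marker = Marker()
    queue.insert(0, marker)
    visited = []
    while True:
        # Find our place again. A linked list would make this O(1);
        # list.index() is used here only to keep the sketch short.
        pos = queue.index(marker)
        if pos + 1 == len(queue):
            break  # marker reached the tail: scan complete
        page = queue[pos + 1]
        # Move the marker past the page before working on it.
        queue[pos], queue[pos + 1] = page, marker
        visit(queue, page)
        visited.append(page)
    queue.remove(marker)
    return visited


if __name__ == "__main__":
    pages = ["p0", "p1", "p2", "p3"]

    def visit(queue, page):
        # While p1 is being processed, p2 disappears from the queue.
        if page == "p1":
            queue.remove("p2")

    print(scan_with_marker(pages, visit))
    print(pages)
```

 In the demo, p2 is removed while p1 is being processed. The scan still continues with p3, because its position is held by the marker rather than by a reference to p2.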

 PR kern/54209: NetBSD 8 large memory performance extremely low
 PR kern/54210: NetBSD-8 processes presumably not exiting
 PR kern/54727: writing a large file causes unreasonable system behaviour


 To generate a diff of this commit:
 cvs rdiff -u -r1.118 -r1.119 src/sys/uvm/uvm_pdaemon.c
 cvs rdiff -u -r1.17 -r1.18 src/sys/uvm/uvm_pdaemon.h
 cvs rdiff -u -r1.4 -r1.5 src/sys/uvm/uvm_pdpolicy.h
 cvs rdiff -u -r1.23 -r1.24 src/sys/uvm/uvm_pdpolicy_clock.c
 cvs rdiff -u -r1.19 -r1.20 src/sys/uvm/uvm_pdpolicy_clockpro.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

Responsible-Changed-From-To: kern-bug-people->ad
Responsible-Changed-By: ad@NetBSD.org
Responsible-Changed-When: Wed, 26 Feb 2020 21:55:16 +0000
Responsible-Changed-Why:
solved in -current


State-Changed-From-To: open->closed
State-Changed-By: maya@NetBSD.org
State-Changed-When: Sat, 11 Apr 2020 13:32:22 +0000
State-Changed-Why:
Feel free to undo it, but I believe "fixed in current" is good enough. These changes introduced a lot of instability, so it's best not to backport them to netbsd-8 and instead let them sit in netbsd-current until a new release is made.


From: Frank Kardel <kardel@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/54209 (NetBSD 8 large memory performance extremely low)
Date: Sat, 11 Apr 2020 15:40:34 +0200

 It's fine that it made it into -current; a backport to -9 and -8 would
 be a lot of work.


 On 04/11/20 15:32, maya@NetBSD.org wrote:
 > Synopsis: NetBSD 8 large memory performance extremely low
 >
 > State-Changed-From-To: open->closed
 > State-Changed-By: maya@NetBSD.org
 > State-Changed-When: Sat, 11 Apr 2020 13:32:22 +0000
 > State-Changed-Why:
 > Feel free to undo it, but I believe "fixed in current" is good enough. These changes introduced a lot of instability, so it's best not to backport them to netbsd-8 and instead let them sit in netbsd-current until a new release is made.
 >
 >
 >

>Unformatted:
