NetBSD Problem Report #54727

From root@netbsd.org  Sun Dec  1 06:58:26 2019
Return-Path: <root@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id E616B7A187
	for <gnats-bugs@gnats.NetBSD.org>; Sun,  1 Dec 2019 06:58:26 +0000 (UTC)
Message-Id: <20191201065824.2916168B78@franklin.NetBSD.org>
Date: Sun,  1 Dec 2019 06:58:23 +0000 (UTC)
From: spz@NetBSD.org
Reply-To: spz@NetBSD.org
To: gnats-bugs@NetBSD.org
Subject: writing a large file causes unreasonable system behaviour
X-Send-Pr-Version: 3.95

>Number:         54727
>Category:       kern
>Synopsis:       writing a large file causes unreasonable system behaviour
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Dec 01 07:00:00 +0000 2019
>Last-Modified:  Wed May 20 21:38:51 +0000 2020
>Originator:     S.P.Zeidler
>Release:        NetBSD 9.0_RC1
>Organization:
	The NetBSD Foundation
>Environment:
System: NetBSD franklin.NetBSD.org 9.0_RC1 NetBSD 9.0_RC1 (FRANKLIN) #1: Sat Nov 30 15:09:30 UTC 2019 spz@franklin.NetBSD.org:/home/netbsd/9/amd64/obj/sys/arch/amd64/compile/FRANKLIN amd64
Architecture: x86_64
Machine: amd64
>Description:
	running "ssh othersystem dobackup > /storage/file" generating
        a large file while otherwise idle, causes the system to hang
        (no response in 5 minutes on serial console, ddb ps shows
        daemons memory starved or missing) or with 
        "sysctl -w kern.maxvnodes=10000", pgdaemon spinning several minutes
        before any new program can start.
        "vmstat 1" shows:
 procs    memory      page                       disks   faults      cpu
 r b      avm    fre  flt  re  pi   po   fr   sr s0 s1   in   sy  cs us sy id
 3 0 47032084  21836    2   7   0    0    0 28464  0  0   85   20  40  0  7 93
 0 3 47032084  21836    0   0   0    0    0    0  0  0  238   18  36  0  7 93
 1 2 47032168  21824    7  31   0    0    0 28966  0  0   84   76  48  0 14 86
 1 1 47032244  21836   37 240   0    0    0 28805  1  0   83  214  49  0  7 93
 0 2 47032244  21836    0   0   0    0    0    0  0  0   84  108  44  0  6 94
 0 2 47032056  21908   22 140   0    0    9 28954  7  0  137   44  64  0 12 88
 0 2 47032236  21896  101 117   0    0   35 28805  1  0  139  118  59  0  7 93
 0 2 47032236  21896    0   0   0    0    0    0  0  0   90   75  41  0  6 94
 0 2 47032300  21840   34 142   0    0    4 28495  2  0  123   60  57  0  7 93
 0 3 47032312  21872    5  65   0    0    0 29033  0  0   91  101  85  0 10 90
 0 3 47032312  21872    0   0   0    0    0    0  0  0   84   18  33  0  6 94
 0 2 47032312  21852  137 127   0    0    4 28621  0  1  139  323  55  0 15 85
 0 2 47032312  21852    0   0   0    0    0    0  0  0   83   75  37  0  6 94
 0 2 47032320  21764   11  22   0    0    0 28478  0  0   83   19  45  0 16 84
 0 2 47032320  21764    0   0   0    0    0    0  0  0   85   96  44  0  6 94
 0 2 47032380  21820   74 113   0    0    0 28837  0  0  119   87  50  0 11 89
 0 2 47032380  21820    0   0   0    0    0    0  0  0   86   18 132  0  6 94
 1 1 47032452  21828    9 127   0    0    0 28480  0  0   85  106  44  0 12 88
 0 3 47032452  21812    0   0   0    0    0    0  0  0   89   92  44  0 19 81
 0 3 47032472  21828    2  26   0    0    0 29009  1  0   88   25  42  0 13 87
 0 3 47030372  21828    0   0   0    0    0    0  0  1   91   18

top has:
load averages:  0.92,  1.73,  1.89;               up 0+15:20:54        06:38:57
291 threads: 65 idle, 210 sleeping, 16 on CPU
CPU states:  0.0% user,  0.0% nice, 11.2% system,  0.1% interrupt, 88.4% idle
Memory: 33G Act, 11G Inact, 16M Wired, 27M Exec, 45G File, 84M Free
Swap: 48G Total, 48G Free

  PID   LID USERNAME PRI STATE      TIME   WCPU    CPU NAME      COMMAND
    0   216 root     126 pgdaem/6 292:24 84.62% 84.62% pgdaemon  [system]
 1335     1 backups   84 select/6 284:58 12.55% 12.55% -         ssh
    0   217 root     124 syncer/1  59:05 11.77% 11.77% ioflush   [system]
  347     1 root      85 select/1  21:54  8.11%  8.11% -         blacklistd
    0   218 root     125 aiodon/3  13:01  8.06%  8.06% aiodoned  [system]
 3473     1 root      43 CPU/4      3:59  0.00%  0.00% -         top
 2761     1 root      85 select/5   3:41  0.00%  0.00% -         top
  363     1 xymon     85 nanosl/1   2:12  0.00%  0.00% -         xymonlaunch

when no program starts have been requested for a while, the system
calms down to:
load averages:  0.51,  0.66,  1.14;               up 0+15:32:16        06:50:19
287 threads: 66 idle, 205 sleeping, 16 on CPU
CPU states:  0.4% user,  0.0% nice,  2.2% system,  1.0% interrupt, 96.2% idle
Memory: 38G Act, 6619M Inact, 16M Wired, 27M Exec, 45G File, 85M Free
Swap: 48G Total, 48G Free

  PID   LID USERNAME PRI STATE      TIME   WCPU    CPU NAME      COMMAND
 1335     1 backups  116 tstile/9 288:23 27.20% 27.20% -         ssh
    0   216 root     126 CPU/6    295:01 20.07% 20.07% pgdaemon  [system]
    0   217 root     124 syncer/1  59:53  3.22%  3.22% ioflush   [system]
  347     1 root      85 select/8  21:58  0.00%  0.00% -         blacklistd
    0   218 root     125 aiodon/0  13:06  0.00%  0.00% aiodoned  [system]
 3473     1 root      43 CPU/13     4:00  0.00%  0.00% -         top
  363     1 xymon     85 nanosl/9   2:12  0.00%  0.00% -         xymonlaunch

>How-To-Repeat:
	dd if=/dev/zero of=file
>Fix:

>Release-Note:

>Audit-Trail:
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/54727: writing a large file causes unreasonable system
 behaviour
Date: Fri, 6 Dec 2019 03:42:06 +0000

 On Sun, Dec 01, 2019 at 07:00:00AM +0000, spz@NetBSD.org wrote:
  > 	running "ssh othersystem dobackup > /storage/file" generating
  >         a large file while otherwise idle, causes the system to hang
  >         (no response in 5 minutes on serial console, ddb ps shows
  >         daemons memory starved or missing) or with 
  >         "sysctl -w kern.maxvnodes=10000", pgdaemon spinning several minutes
  >         before any new program can start.
  >         "vmstat 1" shows:
  > [...]

 two notes:

 (1) mlelstv mentioned in chat the other day that all file pages were
 active (rather than inactive) and this seems broken;

 (2) while chasing something else related today a number of people
 found that all their vnodes were active (rather than inactive),
 probably for the same reason, and this also seems broken.


 also this problem probably should be filed as a release stopper :-(

 -- 
 David A. Holland
 dholland@netbsd.org

From: "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>
To: gnats-bugs@netbsd.org
Cc: David Holland <dholland-bugs@netbsd.org>
Subject: Re: kern/54727: writing a large file causes unreasonable system
 behaviour
Date: Mon, 9 Dec 2019 12:21:19 +0100

 > On 6. Dec 2019, at 04:45, David Holland <dholland-bugs@netbsd.org> wrote:
 > 
 > The following reply was made to PR kern/54727; it has been noted by GNATS.
 > 
 > From: David Holland <dholland-bugs@netbsd.org>
 > To: gnats-bugs@netbsd.org
 > Cc: 
 > Subject: Re: kern/54727: writing a large file causes unreasonable system
 > behaviour
 > Date: Fri, 6 Dec 2019 03:42:06 +0000
 > 
 > On Sun, Dec 01, 2019 at 07:00:00AM +0000, spz@NetBSD.org wrote:
 >> 	running "ssh othersystem dobackup > /storage/file" generating
 >>        a large file while otherwise idle, causes the system to hang
 >>        (no response in 5 minutes on serial console, ddb ps shows
 >>        daemons memory starved or missing) or with 
 >>        "sysctl -w kern.maxvnodes=10000", pgdaemon spinning several minutes
 >>        before any new program can start.
 >>        "vmstat 1" shows:
 >> [...]
 > 
 > two notes:
 > 
 > (1) mlelstv mentioned in chat the other day that all file pages were
 > active (rather than inactive) and this seems broken;

 This happens for me too.  Looks like:

 - we are low on memory.
 - nearly all pages are active and belong to ONE vnode (the large file
   we are currently creating).
 - pagedaemon ends up in uvmpdpol_balancequeue() to increase the number
   of inactive pages.
 - often the one vnode v_interlock is held by another thread
   so uvmpd_trylockowner(p) fails for nearly all active pages.

 - the pagedaemon starts busy looping until it finds this vnode
   unlocked and everything proceeds.

 > 
 > (2) while chasing something else related today a number of people
 > found that all their vnodes were active (rather than inactive),
 > probably for the same reason, and this also seems broken.

 If this is with "pstat -v" there is a small error in the kernel,
 the usecount must be decremented by one.

 > also this problem probably should be filed as a release stopper :-(

 Not for -9, I had this problem in -8 too.

 --
 J. Hannken-Illjes - hannken@eis.cs.tu-bs.de - TU Braunschweig (Germany)

From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/54727: writing a large file causes unreasonable system
 behaviour
Date: Mon, 9 Dec 2019 13:20:20 +0100

 On Mon, Dec 09, 2019 at 11:25:01AM +0000, J. Hannken-Illjes wrote:
 >  Not for -9, I had this problem in -8 too.

 It is marked as "would like this to be fixed for -9" but we won't hold
 the release for it (on the releng wiki page).

 Martin

From: Andrew Doran <ad@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
	netbsd-bugs@netbsd.org, spz@NetBSD.org
Subject: Re: kern/54727: writing a large file causes unreasonable system
 behaviour
Date: Mon, 9 Dec 2019 12:51:24 +0000

 On Mon, Dec 09, 2019 at 11:25:01AM +0000, J. Hannken-Illjes wrote:

 >  This happens for me too.  Looks like:
 >  
 >  - we are low on memory.
 >  - nearly all pages are active and belong to ONE vnode (the large file
 >    we are currently creating).
 >  - pagedaemon ends up in uvmpdpol_balancequeue() to increase the number
 >    of inactive pages.
 >  - often the one vnode v_interlock is held by another thread
 >    so uvmpd_trylockowner(p) fails for nearly all active pages.
 >  
 >  - the pagedaemon starts busy looping until it finds this vnode
 >    unlocked and everything proceeds.

 I agree with this assessment.  It looks very likely to me.  In addition to
 v_interlock being busy simply due to legitimate activity, with more than one
 CPU the lock pressure here can be immense because of the pagedaemon itself:

 	pagedaemon holds uvm_pageqlock and is pondering deeply

 		-> tries to acquire v_interlock in relation to
 		   a specific single page (reasonable enough thing
 		   to try), keeps retrying

 	busily writing process holds v_interlock

 		-> is already waiting on uvm_pageqlock for uvm_page*(),
 		   in relation to a totally different page (unreasonable)

 The UVM locking changes I have may help with this, because it may be
 possible for for the pagedaemon to drop the replacement for uvm_pageqlock
 while it is trying to acquire the uobject lock.  In other words, for the
 duration of the reverse locking dance, the pagedaemon's bothersome
 activities would be confined to a single page, which would allow the writing
 process a chance to continue its work with other pages and drop the uobject
 lock.  Those will hopefully be ready for review later today.

 (That's no use for 9.0 though.)

 Andrew

From: Andrew Doran <ad@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
	netbsd-bugs@netbsd.org, spz@NetBSD.org
Subject: Re: kern/54727: writing a large file causes unreasonable system
 behaviour
Date: Wed, 11 Dec 2019 14:57:48 +0000

 It seems likely to me that these problems Frank encountered with PostgreSQL
 are a symptom of the same underlying issue:

 http://gnats.netbsd.org/cgi-bin/query-pr-single.pl?number=54210
 http://gnats.netbsd.org/cgi-bin/query-pr-single.pl?number=54209

 Andrew

From: "Andrew Doran" <ad@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/54727 CVS commit: src
Date: Fri, 13 Dec 2019 20:10:23 +0000

 Module Name:	src
 Committed By:	ad
 Date:		Fri Dec 13 20:10:22 UTC 2019

 Modified Files:
 	src/external/cddl/osnet/dist/uts/common/fs/zfs: zfs_vnops.c
 	src/sys/miscfs/genfs: genfs_io.c
 	src/sys/nfs: nfs_bio.c
 	src/sys/rump/librump/rumpkern: vm.c
 	src/sys/rump/librump/rumpvfs: vm_vfs.c
 	src/sys/ufs/lfs: lfs_pages.c lfs_vfsops.c ulfs_inode.c
 	src/sys/ufs/ufs: ufs_inode.c
 	src/sys/uvm: uvm.h uvm_amap.c uvm_anon.c uvm_aobj.c uvm_bio.c
 	    uvm_fault.c uvm_init.c uvm_km.c uvm_loan.c uvm_map.c uvm_object.c
 	    uvm_page.c uvm_page.h uvm_pager.c uvm_pdaemon.c uvm_pdpolicy.h
 	    uvm_pdpolicy_clock.c uvm_pdpolicy_clockpro.c uvm_pglist.c
 	    uvm_physseg.c

 Log Message:
 Break the global uvm_pageqlock into a per-page identity lock and a private
 lock for use of the pagedaemon policy code.  Discussed on tech-kern.

 PR kern/54209: NetBSD 8 large memory performance extremely low
 PR kern/54210: NetBSD-8 processes presumably not exiting
 PR kern/54727: writing a large file causes unreasonable system behaviour


 To generate a diff of this commit:
 cvs rdiff -u -r1.53 -r1.54 \
     src/external/cddl/osnet/dist/uts/common/fs/zfs/zfs_vnops.c
 cvs rdiff -u -r1.76 -r1.77 src/sys/miscfs/genfs/genfs_io.c
 cvs rdiff -u -r1.191 -r1.192 src/sys/nfs/nfs_bio.c
 cvs rdiff -u -r1.173 -r1.174 src/sys/rump/librump/rumpkern/vm.c
 cvs rdiff -u -r1.34 -r1.35 src/sys/rump/librump/rumpvfs/vm_vfs.c
 cvs rdiff -u -r1.15 -r1.16 src/sys/ufs/lfs/lfs_pages.c
 cvs rdiff -u -r1.365 -r1.366 src/sys/ufs/lfs/lfs_vfsops.c
 cvs rdiff -u -r1.21 -r1.22 src/sys/ufs/lfs/ulfs_inode.c
 cvs rdiff -u -r1.105 -r1.106 src/sys/ufs/ufs/ufs_inode.c
 cvs rdiff -u -r1.69 -r1.70 src/sys/uvm/uvm.h
 cvs rdiff -u -r1.110 -r1.111 src/sys/uvm/uvm_amap.c
 cvs rdiff -u -r1.68 -r1.69 src/sys/uvm/uvm_anon.c
 cvs rdiff -u -r1.130 -r1.131 src/sys/uvm/uvm_aobj.c
 cvs rdiff -u -r1.100 -r1.101 src/sys/uvm/uvm_bio.c
 cvs rdiff -u -r1.211 -r1.212 src/sys/uvm/uvm_fault.c
 cvs rdiff -u -r1.50 -r1.51 src/sys/uvm/uvm_init.c
 cvs rdiff -u -r1.150 -r1.151 src/sys/uvm/uvm_km.c
 cvs rdiff -u -r1.88 -r1.89 src/sys/uvm/uvm_loan.c
 cvs rdiff -u -r1.366 -r1.367 src/sys/uvm/uvm_map.c
 cvs rdiff -u -r1.15 -r1.16 src/sys/uvm/uvm_object.c
 cvs rdiff -u -r1.200 -r1.201 src/sys/uvm/uvm_page.c
 cvs rdiff -u -r1.84 -r1.85 src/sys/uvm/uvm_page.h
 cvs rdiff -u -r1.113 -r1.114 src/sys/uvm/uvm_pager.c
 cvs rdiff -u -r1.112 -r1.113 src/sys/uvm/uvm_pdaemon.c
 cvs rdiff -u -r1.3 -r1.4 src/sys/uvm/uvm_pdpolicy.h
 cvs rdiff -u -r1.17 -r1.18 src/sys/uvm/uvm_pdpolicy_clock.c \
     src/sys/uvm/uvm_pdpolicy_clockpro.c
 cvs rdiff -u -r1.72 -r1.73 src/sys/uvm/uvm_pglist.c
 cvs rdiff -u -r1.10 -r1.11 src/sys/uvm/uvm_physseg.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Andrew Doran" <ad@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/54727 CVS commit: src/sys/arch/amd64/amd64
Date: Fri, 13 Dec 2019 20:14:25 +0000

 Module Name:	src
 Committed By:	ad
 Date:		Fri Dec 13 20:14:25 UTC 2019

 Modified Files:
 	src/sys/arch/amd64/amd64: machdep.c

 Log Message:
 Break the global uvm_pageqlock into a per-page identity lock and a private
 lock for use of the pagedaemon policy code.  Discussed on tech-kern.

 PR kern/54209: NetBSD 8 large memory performance extremely low
 PR kern/54210: NetBSD-8 processes presumably not exiting
 PR kern/54727: writing a large file causes unreasonable system behaviour


 To generate a diff of this commit:
 cvs rdiff -u -r1.343 -r1.344 src/sys/arch/amd64/amd64/machdep.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Andrew Doran" <ad@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/54727 CVS commit: src/sys/uvm
Date: Mon, 30 Dec 2019 18:08:38 +0000

 Module Name:	src
 Committed By:	ad
 Date:		Mon Dec 30 18:08:38 UTC 2019

 Modified Files:
 	src/sys/uvm: uvm_pdaemon.c uvm_pdaemon.h uvm_pdpolicy.h
 	    uvm_pdpolicy_clock.c uvm_pdpolicy_clockpro.c

 Log Message:
 pagedaemon:

 - Use marker pages to keep place in the queue when scanning, rather than
   relying on assumptions.

 - In uvmpdpol_balancequeue(), lock the object once instead of twice.

 - When draining pools, the situation is getting desperate, but try to avoid
   saturating the system with xcall, lock and interrupt activity by sleeping
   for 1 clock tick if being continually awoken and all pools have been
   cycled through at least once.

 - Pause & resume the freelist cache during pool draining.

 PR kern/54209: NetBSD 8 large memory performance extremely low
 PR kern/54210: NetBSD-8 processes presumably not exiting
 PR kern/54727: writing a large file causes unreasonable system behaviour


 To generate a diff of this commit:
 cvs rdiff -u -r1.118 -r1.119 src/sys/uvm/uvm_pdaemon.c
 cvs rdiff -u -r1.17 -r1.18 src/sys/uvm/uvm_pdaemon.h
 cvs rdiff -u -r1.4 -r1.5 src/sys/uvm/uvm_pdpolicy.h
 cvs rdiff -u -r1.23 -r1.24 src/sys/uvm/uvm_pdpolicy_clock.c
 cvs rdiff -u -r1.19 -r1.20 src/sys/uvm/uvm_pdpolicy_clockpro.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

Responsible-Changed-From-To: kern-bug-people->ad
Responsible-Changed-By: ad@NetBSD.org
Responsible-Changed-When: Wed, 26 Feb 2020 21:56:53 +0000
Responsible-Changed-Why:
solved in -current


From: coypu@sdf.org
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/54727: writing a large file causes unreasonable system
 behaviour
Date: Tue, 21 Apr 2020 17:13:58 +0000

 > Responsible-Changed-From-To: kern-bug-people->ad
 > Responsible-Changed-By: ad@NetBSD.org
 > Responsible-Changed-When: Wed, 26 Feb 2020 21:56:53 +0000
 > Responsible-Changed-Why:
 > solved in -current

 Any reason not to close it? thanks!

Responsible-Changed-From-To: ad->kern-bug-people
Responsible-Changed-By: ad@NetBSD.org
Responsible-Changed-When: Wed, 20 May 2020 21:38:51 +0000
Responsible-Changed-Why:
This is most definitely fixed in -current.
It has taken a ton of work to fix it.
Being realistic we're not going to be able to backport that to 9.0.
If some enterprising sort wants to deal with it in 9.0 then good on them.
But for now over to kern-bug-people.
Up to originator whether or not to close the PR.


>Unformatted:
 	20191130

NetBSD Home
NetBSD PR Database Search

(Contact us) $NetBSD: query-full-pr,v 1.46 2020/01/03 16:35:01 leot Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2020 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.