NetBSD Problem Report #55268

From gson@gson.org  Fri May 15 15:51:35 2020
Return-Path: <gson@gson.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 3564E1A921E
	for <gnats-bugs@gnats.NetBSD.org>; Fri, 15 May 2020 15:51:35 +0000 (UTC)
Message-Id: <20200515155130.CB814253F45@guava.gson.org>
Date: Fri, 15 May 2020 18:51:30 +0300 (EEST)
From: gson@gson.org (Andreas Gustafsson)
Reply-To: gson@gson.org (Andreas Gustafsson)
To: gnats-bugs@NetBSD.org
Subject: tmpfs is slow
X-Send-Pr-Version: 3.95

>Number:         55268
>Category:       kern
>Synopsis:       tmpfs is slow
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    kern-bug-people
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri May 15 15:55:00 +0000 2020
>Closed-Date:    Mon May 18 06:54:44 +0000 2020
>Last-Modified:  Mon May 18 06:54:44 +0000 2020
>Originator:     Andreas Gustafsson
>Release:        NetBSD-current, source date 2020.05.14.07.59.03
>Organization:

>Environment:
System: NetBSD
Architecture: x86_64
Machine: amd64
>Description:

Doing a "build.sh -j 24 release" on a tmpfs takes more real time than
on an ffs mounted with "-o async", and uses much more system time.

This is building NetBSD-8 on a 24-core Dell PowerEdge R630 running a
-current from 2020.05.14.07.59.03 with DIAGNOSTIC and acpicpu
disabled.  The timings are not directly comparable to those I have
posted earlier as the RAM has now been increased to 64 GB and node
interleaving has been enabled in the BIOS.

Building on ffs on a WD Green 120 GB SATA SSD:
2319.70 real      9354.58 user      4449.16 sys

Building on ffs with "-o async" on a Samsung 850 EVO 1 TB SATA SSD:
1417.57 real      9452.17 user      4606.12 sys

Building on tmpfs:
2167.63 real     10102.86 user     14655.71 sys

>How-To-Repeat:

>Fix:

>Release-Note:

>Audit-Trail:
From: Joerg Sonnenberger <joerg@bec.de>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/55268: tmpfs is slow
Date: Fri, 15 May 2020 18:29:47 +0200

 On Fri, May 15, 2020 at 03:55:00PM +0000, Andreas Gustafsson wrote:
 > Doing a "build.sh -j 24 release" on a tmpfs takes more real time than
 > on an ffs mounted with "-o async", and uses much more system time.

 Careful with DIAGNOSTIC as it increases vnode lock contention quite a
 bit. Don't have the performance numbers for the bulk build anymore, but
 it is quite visible. Otherwise, this greatly suggests some form of lock
 contention or another, so lockstat output would be the most useful thing
 (e.g. lockstat -b 999999 build.sh ...). This doesn't reflect my
 experience at the very least, but lock contention can be somewhat
 catastrophic and small changes can have dramatic impact. That said, in
 the current form the bug report is not really useful.

 Joerg

From: Alexander Nasonov <alnsn@yandex.ru>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
	netbsd-bugs@netbsd.org, Andreas Gustafsson <gson@gson.org>
Subject: Re: kern/55268: tmpfs is slow
Date: Fri, 15 May 2020 18:14:51 +0100

 Joerg Sonnenberger wrote:
 > The following reply was made to PR kern/55268; it has been noted by GNATS.
 > 
 > From: Joerg Sonnenberger <joerg@bec.de>
 > To: gnats-bugs@netbsd.org
 > Cc: 
 > Subject: Re: kern/55268: tmpfs is slow
 > Date: Fri, 15 May 2020 18:29:47 +0200
 > 
 >  On Fri, May 15, 2020 at 03:55:00PM +0000, Andreas Gustafsson wrote:
 >  > Doing a "build.sh -j 24 release" on a tmpfs takes more real time than
 >  > on an ffs mounted with "-o async", and uses much more system time.
 >  
 >  Careful with DIAGNOSTIC as it increases vnode lock contention quite a
 >  bit. Don't have the performance numbers for the bulk build anymore, but
 >  it is quite visible. Otherwise, this greatly suggests some form of lock
 >  contention or another, so lockstat output would be the most useful thing
 >  (e.g. lockstat -b 999999 build.sh ...). This doesn't reflect my
 >  experience at the very least, but lock contention can be somewhat
 >  catastrophic and small changes can have dramatic impact. That said, in
 >  the current form the bug report is not really useful.

 Andreas stated that DIAGNOSTIC and acpicpu are DISABLED.

 This is very similar to my setup but I run on AMD EPYC. When I was
 measuring build performance on tmpfs, I often ran lockstat and I
 sent some of those reports to ad@ for analysis. He may have some
 ideas what's going on there.

 -- 
 Alex

From: Andreas Gustafsson <gson@gson.org>
To: Joerg Sonnenberger <joerg@bec.de>, Alexander Nasonov <alnsn@yandex.ru>
Cc: gnats-bugs@netbsd.org
Subject: Re: kern/55268: tmpfs is slow
Date: Fri, 15 May 2020 21:28:04 +0300

 Yes, both DIAGNOSTIC and acpicpu are disabled.

 I have now also done a test build with a 32 GB mfs:

 1472.76 real      9677.81 user      4857.59 sys

 So that looks like a viable alternative to tmpfs, but with the
 drawback that you need to know how much space to allocate in advance,
 and probaly less space-efficient.

 I'll do a tmpfs lockstat run next.
 -- 
 Andreas Gustafsson, gson@gson.org

From: Andrew Doran <ad@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
	netbsd-bugs@netbsd.org
Subject: Re: kern/55268: tmpfs is slow
Date: Fri, 15 May 2020 21:29:48 +0000

 http://www.netbsd.org/~ad/2020/tmpfs-before.svg

From: "Andrew Doran" <ad@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/55268 CVS commit: src/sys/fs/tmpfs
Date: Fri, 15 May 2020 22:15:43 +0000

 Module Name:	src
 Committed By:	ad
 Date:		Fri May 15 22:15:43 UTC 2020

 Modified Files:
 	src/sys/fs/tmpfs: tmpfs_vnops.c

 Log Message:
 PR kern/55268: tmpfs is slow

 Enter dotdot into the namecache.


 To generate a diff of this commit:
 cvs rdiff -u -r1.136 -r1.137 src/sys/fs/tmpfs/tmpfs_vnops.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Andrew Doran" <ad@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/55268 CVS commit: src/sys/arch/x86/x86
Date: Fri, 15 May 2020 22:19:01 +0000

 Module Name:	src
 Committed By:	ad
 Date:		Fri May 15 22:19:01 UTC 2020

 Modified Files:
 	src/sys/arch/x86/x86: pmap.c

 Log Message:
 PR kern/55268: tmpfs is slow

 pmap_clear_attrs(): if a brand new page with no mappings just zap pp_attrs.


 To generate a diff of this commit:
 cvs rdiff -u -r1.389 -r1.390 src/sys/arch/x86/x86/pmap.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Andrew Doran" <ad@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/55268 CVS commit: src/sys/uvm
Date: Fri, 15 May 2020 22:27:04 +0000

 Module Name:	src
 Committed By:	ad
 Date:		Fri May 15 22:27:04 UTC 2020

 Modified Files:
 	src/sys/uvm: uvm_aobj.c

 Log Message:
 PR kern/55268: tmpfs is slow

 uao_get(): in the PGO_LOCKED case, we're okay to allocate a new page as long
 as the caller holds a write lock.  PGO_NOBUSY doesn't put a stop to that.


 To generate a diff of this commit:
 cvs rdiff -u -r1.139 -r1.140 src/sys/uvm/uvm_aobj.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: Andrew Doran <ad@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
	netbsd-bugs@netbsd.org
Subject: Re: kern/55268: tmpfs is slow
Date: Fri, 15 May 2020 22:53:29 +0000

 It's a little better now but there is still a major problem with lock
 contention and context switch storms during fault processing because of a
 mix of reader/writer holds on the uobject locks.

 http://www.netbsd.org/~ad/2020/tmpfs-sucks.svg

 I guess it's something to do with tmpfs pages always being marked
 UVM_PAGE_STATUS_UNKNOWN unless committed to swap and that affecting the
 fault type.

 Andrew

From: Andreas Gustafsson <gson@gson.org>
To: Andrew Doran <ad@netbsd.org>
Cc: gnats-bugs@netbsd.org
Subject: Re: kern/55268: tmpfs is slow
Date: Sat, 16 May 2020 17:23:59 +0300

 Andrew Doran wrote:
 >  It's a little better now

 Yes, a little:

   2020.05.14.07.59.03      2167.63 real     10102.86 user     14655.71 sys
   2020.05.15.22.35.05      2162.21 real     10105.98 user     14524.89 sys

 With lockstat overhead:

   2020.05.15.22.35.05:     2370.32 real     10290.34 user     16899.28 sys

 And the corresponding lockstat output:

   http://www.gson.org/netbsd/bugs/55268/2020.05.15.22.35.05.lockstat.txt

 -- 
 Andreas Gustafsson, gson@gson.org

From: Andrew Doran <ad@netbsd.org>
To: Andreas Gustafsson <gson@gson.org>
Cc: gnats-bugs@netbsd.org
Subject: Re: kern/55268: tmpfs is slow
Date: Sat, 16 May 2020 17:50:15 +0000

 On Sat, May 16, 2020 at 05:23:59PM +0300, Andreas Gustafsson wrote:

 > Yes, a little:
 > 
 >   2020.05.14.07.59.03      2167.63 real     10102.86 user     14655.71 sys
 >   2020.05.15.22.35.05      2162.21 real     10105.98 user     14524.89 sys

 That's surprising, I guess there's still something up with tmpfs lookup.

 Anyway I see the problem, tmpfs doesn't handle PGO_LOCKED.

 > With lockstat overhead:
 > 
 >   2020.05.15.22.35.05:     2370.32 real     10290.34 user     16899.28 sys
 > 
 > And the corresponding lockstat output:
 > 
 >   http://www.gson.org/netbsd/bugs/55268/2020.05.15.22.35.05.lockstat.txt

 Thanks,
 Andrew

From: Andrew Doran <ad@netbsd.org>
To: Andreas Gustafsson <gson@gson.org>
Cc: gnats-bugs@netbsd.org
Subject: Re: kern/55268: tmpfs is slow
Date: Sat, 16 May 2020 19:24:44 +0000

 With tmpfs/PGO_LOCKED (not committed yet) it's a lot better.  Not ideal by
 any means but a big difference.

 http://www.netbsd.org/~ad/2020/tmpfs-after.svg

 Andrew

From: "Andrew Doran" <ad@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/55268 CVS commit: src/sys/fs/tmpfs
Date: Sun, 17 May 2020 19:39:15 +0000

 Module Name:	src
 Committed By:	ad
 Date:		Sun May 17 19:39:15 UTC 2020

 Modified Files:
 	src/sys/fs/tmpfs: tmpfs.h tmpfs_subr.c tmpfs_vnops.c

 Log Message:
 PR kern/55268: tmpfs is slow

 tmpfs_getpages(): handle the PGO_LOCKED case and implement lazy update of
 atime/mtime.


 To generate a diff of this commit:
 cvs rdiff -u -r1.55 -r1.56 src/sys/fs/tmpfs/tmpfs.h
 cvs rdiff -u -r1.111 -r1.112 src/sys/fs/tmpfs/tmpfs_subr.c
 cvs rdiff -u -r1.138 -r1.139 src/sys/fs/tmpfs/tmpfs_vnops.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: open->feedback
State-Changed-By: ad@NetBSD.org
State-Changed-When: Sun, 17 May 2020 20:29:04 +0000
State-Changed-Why:
There's still plenty of room for improvement but this should be fixed now.


From: Andreas Gustafsson <gson@gson.org>
To: ad@NetBSD.org
Cc: gnats-bugs@netbsd.org
Subject: Re: kern/55268 (tmpfs is slow)
Date: Mon, 18 May 2020 09:45:02 +0300

 ad@NetBSD.org wrote:
 > There's still plenty of room for improvement but this should be fixed now.

 Indeed it is - thank you!  Here are timings from before and after the
 latest commit:

    2020.05.15.22.35.05      2162.21 real     10105.98 user     14524.89 sys
    2020.05.17.19.57.25      1371.08 real      9407.80 user      4240.01 sys

 -- 
 Andreas Gustafsson, gson@gson.org

State-Changed-From-To: feedback->closed
State-Changed-By: gson@NetBSD.org
State-Changed-When: Mon, 18 May 2020 06:54:44 +0000
State-Changed-Why:
Confirmed fixed.


>Unformatted:

NetBSD Home
NetBSD PR Database Search

(Contact us) $NetBSD: query-full-pr,v 1.46 2020/01/03 16:35:01 leot Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2020 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.