NetBSD Problem Report #55268
From gson@gson.org Fri May 15 15:51:35 2020
Return-Path: <gson@gson.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id 3564E1A921E
for <gnats-bugs@gnats.NetBSD.org>; Fri, 15 May 2020 15:51:35 +0000 (UTC)
Message-Id: <20200515155130.CB814253F45@guava.gson.org>
Date: Fri, 15 May 2020 18:51:30 +0300 (EEST)
From: gson@gson.org (Andreas Gustafsson)
Reply-To: gson@gson.org (Andreas Gustafsson)
To: gnats-bugs@NetBSD.org
Subject: tmpfs is slow
X-Send-Pr-Version: 3.95
>Number: 55268
>Category: kern
>Synopsis: tmpfs is slow
>Confidential: no
>Severity: non-critical
>Priority: medium
>Responsible: kern-bug-people
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Fri May 15 15:55:00 +0000 2020
>Closed-Date: Mon May 18 06:54:44 +0000 2020
>Last-Modified: Mon May 18 06:54:44 +0000 2020
>Originator: Andreas Gustafsson
>Release: NetBSD-current, source date 2020.05.14.07.59.03
>Organization:
>Environment:
System: NetBSD
Architecture: x86_64
Machine: amd64
>Description:
Doing a "build.sh -j 24 release" on a tmpfs takes more real time than
on an ffs mounted with "-o async", and uses much more system time.
This is building NetBSD-8 on a 24-core Dell PowerEdge R630 running a
-current from 2020.05.14.07.59.03 with DIAGNOSTIC and acpicpu
disabled. The timings are not directly comparable to those I have
posted earlier as the RAM has now been increased to 64 GB and node
interleaving has been enabled in the BIOS.
Building on ffs on a WD Green 120 GB SATA SSD:
2319.70 real 9354.58 user 4449.16 sys
Building on ffs with "-o async" on a Samsung 850 EVO 1 TB SATA SSD:
1417.57 real 9452.17 user 4606.12 sys
Building on tmpfs:
2167.63 real 10102.86 user 14655.71 sys
>How-To-Repeat:
>Fix:
>Release-Note:
>Audit-Trail:
From: Joerg Sonnenberger <joerg@bec.de>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/55268: tmpfs is slow
Date: Fri, 15 May 2020 18:29:47 +0200
On Fri, May 15, 2020 at 03:55:00PM +0000, Andreas Gustafsson wrote:
> Doing a "build.sh -j 24 release" on a tmpfs takes more real time than
> on an ffs mounted with "-o async", and uses much more system time.
Careful with DIAGNOSTIC as it increases vnode lock contention quite a
bit. Don't have the performance numbers for the bulk build anymore, but
it is quite visible. Otherwise, this greatly suggests some form of lock
contention or another, so lockstat output would be the most useful thing
(e.g. lockstat -b 999999 build.sh ...). This doesn't reflect my
experience at the very least, but lock contention can be somewhat
catastrophic and small changes can have dramatic impact. That said, in
the current form the bug report is not really useful.
Joerg
From: Alexander Nasonov <alnsn@yandex.ru>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org, Andreas Gustafsson <gson@gson.org>
Subject: Re: kern/55268: tmpfs is slow
Date: Fri, 15 May 2020 18:14:51 +0100
Joerg Sonnenberger wrote:
> The following reply was made to PR kern/55268; it has been noted by GNATS.
>
> From: Joerg Sonnenberger <joerg@bec.de>
> To: gnats-bugs@netbsd.org
> Cc:
> Subject: Re: kern/55268: tmpfs is slow
> Date: Fri, 15 May 2020 18:29:47 +0200
>
> On Fri, May 15, 2020 at 03:55:00PM +0000, Andreas Gustafsson wrote:
> > Doing a "build.sh -j 24 release" on a tmpfs takes more real time than
> > on an ffs mounted with "-o async", and uses much more system time.
>
> Careful with DIAGNOSTIC as it increases vnode lock contention quite a
> bit. Don't have the performance numbers for the bulk build anymore, but
> it is quite visible. Otherwise, this greatly suggests some form of lock
> contention or another, so lockstat output would be the most useful thing
> (e.g. lockstat -b 999999 build.sh ...). This doesn't reflect my
> experience at the very least, but lock contention can be somewhat
> catastrophic and small changes can have dramatic impact. That said, in
> the current form the bug report is not really useful.
Andreas stated that DIAGNOSTIC and acpicpu are DISABLED.
This is very similar to my setup but I run on AMD EPYC. When I was
measuring build performance on tmpfs, I often ran lockstat and I
sent some of those reports to ad@ for analysis. He may have some
ideas what's going on there.
--
Alex
From: Andreas Gustafsson <gson@gson.org>
To: Joerg Sonnenberger <joerg@bec.de>, Alexander Nasonov <alnsn@yandex.ru>
Cc: gnats-bugs@netbsd.org
Subject: Re: kern/55268: tmpfs is slow
Date: Fri, 15 May 2020 21:28:04 +0300
Yes, both DIAGNOSTIC and acpicpu are disabled.
I have now also done a test build with a 32 GB mfs:
1472.76 real 9677.81 user 4857.59 sys
So that looks like a viable alternative to tmpfs, but with the
drawback that you need to know how much space to allocate in advance,
and probaly less space-efficient.
I'll do a tmpfs lockstat run next.
--
Andreas Gustafsson, gson@gson.org
From: Andrew Doran <ad@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org
Subject: Re: kern/55268: tmpfs is slow
Date: Fri, 15 May 2020 21:29:48 +0000
http://www.netbsd.org/~ad/2020/tmpfs-before.svg
From: "Andrew Doran" <ad@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/55268 CVS commit: src/sys/fs/tmpfs
Date: Fri, 15 May 2020 22:15:43 +0000
Module Name: src
Committed By: ad
Date: Fri May 15 22:15:43 UTC 2020
Modified Files:
src/sys/fs/tmpfs: tmpfs_vnops.c
Log Message:
PR kern/55268: tmpfs is slow
Enter dotdot into the namecache.
To generate a diff of this commit:
cvs rdiff -u -r1.136 -r1.137 src/sys/fs/tmpfs/tmpfs_vnops.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Andrew Doran" <ad@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/55268 CVS commit: src/sys/arch/x86/x86
Date: Fri, 15 May 2020 22:19:01 +0000
Module Name: src
Committed By: ad
Date: Fri May 15 22:19:01 UTC 2020
Modified Files:
src/sys/arch/x86/x86: pmap.c
Log Message:
PR kern/55268: tmpfs is slow
pmap_clear_attrs(): if a brand new page with no mappings just zap pp_attrs.
To generate a diff of this commit:
cvs rdiff -u -r1.389 -r1.390 src/sys/arch/x86/x86/pmap.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Andrew Doran" <ad@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/55268 CVS commit: src/sys/uvm
Date: Fri, 15 May 2020 22:27:04 +0000
Module Name: src
Committed By: ad
Date: Fri May 15 22:27:04 UTC 2020
Modified Files:
src/sys/uvm: uvm_aobj.c
Log Message:
PR kern/55268: tmpfs is slow
uao_get(): in the PGO_LOCKED case, we're okay to allocate a new page as long
as the caller holds a write lock. PGO_NOBUSY doesn't put a stop to that.
To generate a diff of this commit:
cvs rdiff -u -r1.139 -r1.140 src/sys/uvm/uvm_aobj.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: Andrew Doran <ad@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org
Subject: Re: kern/55268: tmpfs is slow
Date: Fri, 15 May 2020 22:53:29 +0000
It's a little better now but there is still a major problem with lock
contention and context switch storms during fault processing because of a
mix of reader/writer holds on the uobject locks.
http://www.netbsd.org/~ad/2020/tmpfs-sucks.svg
I guess it's something to do with tmpfs pages always being marked
UVM_PAGE_STATUS_UNKNOWN unless committed to swap and that affecting the
fault type.
Andrew
From: Andreas Gustafsson <gson@gson.org>
To: Andrew Doran <ad@netbsd.org>
Cc: gnats-bugs@netbsd.org
Subject: Re: kern/55268: tmpfs is slow
Date: Sat, 16 May 2020 17:23:59 +0300
Andrew Doran wrote:
> It's a little better now
Yes, a little:
2020.05.14.07.59.03 2167.63 real 10102.86 user 14655.71 sys
2020.05.15.22.35.05 2162.21 real 10105.98 user 14524.89 sys
With lockstat overhead:
2020.05.15.22.35.05: 2370.32 real 10290.34 user 16899.28 sys
And the corresponding lockstat output:
http://www.gson.org/netbsd/bugs/55268/2020.05.15.22.35.05.lockstat.txt
--
Andreas Gustafsson, gson@gson.org
From: Andrew Doran <ad@netbsd.org>
To: Andreas Gustafsson <gson@gson.org>
Cc: gnats-bugs@netbsd.org
Subject: Re: kern/55268: tmpfs is slow
Date: Sat, 16 May 2020 17:50:15 +0000
On Sat, May 16, 2020 at 05:23:59PM +0300, Andreas Gustafsson wrote:
> Yes, a little:
>
> 2020.05.14.07.59.03 2167.63 real 10102.86 user 14655.71 sys
> 2020.05.15.22.35.05 2162.21 real 10105.98 user 14524.89 sys
That's surprising, I guess there's still something up with tmpfs lookup.
Anyway I see the problem, tmpfs doesn't handle PGO_LOCKED.
> With lockstat overhead:
>
> 2020.05.15.22.35.05: 2370.32 real 10290.34 user 16899.28 sys
>
> And the corresponding lockstat output:
>
> http://www.gson.org/netbsd/bugs/55268/2020.05.15.22.35.05.lockstat.txt
Thanks,
Andrew
From: Andrew Doran <ad@netbsd.org>
To: Andreas Gustafsson <gson@gson.org>
Cc: gnats-bugs@netbsd.org
Subject: Re: kern/55268: tmpfs is slow
Date: Sat, 16 May 2020 19:24:44 +0000
With tmpfs/PGO_LOCKED (not committed yet) it's a lot better. Not ideal by
any means but a big difference.
http://www.netbsd.org/~ad/2020/tmpfs-after.svg
Andrew
From: "Andrew Doran" <ad@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/55268 CVS commit: src/sys/fs/tmpfs
Date: Sun, 17 May 2020 19:39:15 +0000
Module Name: src
Committed By: ad
Date: Sun May 17 19:39:15 UTC 2020
Modified Files:
src/sys/fs/tmpfs: tmpfs.h tmpfs_subr.c tmpfs_vnops.c
Log Message:
PR kern/55268: tmpfs is slow
tmpfs_getpages(): handle the PGO_LOCKED case and implement lazy update of
atime/mtime.
To generate a diff of this commit:
cvs rdiff -u -r1.55 -r1.56 src/sys/fs/tmpfs/tmpfs.h
cvs rdiff -u -r1.111 -r1.112 src/sys/fs/tmpfs/tmpfs_subr.c
cvs rdiff -u -r1.138 -r1.139 src/sys/fs/tmpfs/tmpfs_vnops.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: open->feedback
State-Changed-By: ad@NetBSD.org
State-Changed-When: Sun, 17 May 2020 20:29:04 +0000
State-Changed-Why:
There's still plenty of room for improvement but this should be fixed now.
From: Andreas Gustafsson <gson@gson.org>
To: ad@NetBSD.org
Cc: gnats-bugs@netbsd.org
Subject: Re: kern/55268 (tmpfs is slow)
Date: Mon, 18 May 2020 09:45:02 +0300
ad@NetBSD.org wrote:
> There's still plenty of room for improvement but this should be fixed now.
Indeed it is - thank you! Here are timings from before and after the
latest commit:
2020.05.15.22.35.05 2162.21 real 10105.98 user 14524.89 sys
2020.05.17.19.57.25 1371.08 real 9407.80 user 4240.01 sys
--
Andreas Gustafsson, gson@gson.org
State-Changed-From-To: feedback->closed
State-Changed-By: gson@NetBSD.org
State-Changed-When: Mon, 18 May 2020 06:54:44 +0000
State-Changed-Why:
Confirmed fixed.
>Unformatted:
(Contact us)
$NetBSD: query-full-pr,v 1.46 2020/01/03 16:35:01 leot Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2020
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.