NetBSD Problem Report #52184
From gson@gson.org Fri Apr 21 09:36:52 2017
Return-Path: <gson@gson.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (verified OK))
by mollari.NetBSD.org (Postfix) with ESMTPS id B90617A1F5
for <gnats-bugs@gnats.NetBSD.org>; Fri, 21 Apr 2017 09:36:52 +0000 (UTC)
Message-Id: <20170421093646.92D3F7446DB@guava.gson.org>
Date: Fri, 21 Apr 2017 12:36:46 +0300 (EEST)
From: gson@gson.org (Andreas Gustafsson)
Reply-To: gson@gson.org (Andreas Gustafsson)
To: gnats-bugs@NetBSD.org
Subject: Recent qemu performs badly on NetBSD hosts under load
X-Send-Pr-Version: 3.95
>Number: 52184
>Category: misc
>Synopsis: Recent qemu performs badly on NetBSD hosts under load
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: misc-bug-people
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Fri Apr 21 09:40:00 +0000 2017
>Closed-Date: Mon Apr 16 14:45:17 +0000 2018
>Last-Modified: Mon Apr 16 14:45:17 +0000 2018
>Originator: Andreas Gustafsson
>Release: NetBSD 7.1
>Organization:
>Environment:
System: NetBSD 7.1
Architecture: x86_64
Machine: amd64
>Description:
The NetBSD testbed babylon5.netbsd.org runs automated full-system
tests using qemu. After the qemu on babylon5 was upgraded from
version 0.15.1 to 2.8.0 in early March, the tests have been taking
much longer to run than they did with 0.15.1, at least twice as long
on average.
The exact reason for the slowdown is still unknown, and could be a
qemu issue, a NetBSD issue, or a combination of the two; hence the
"misc" category of this PR.
I have run a number of tests to try to track down the circumstances in
which the problem occurs. As far as I can tell, they are as follows:
1. The version of qemu must include commit
05e514b1d4d5bd4209e2c8bbc76ff05c85a235f3
(found by bisection).
2. The host system that qemu is executed on must be NetBSD
rather than Linux.
3. Some minor slowdown may occur under just the two above
conditions, but for the slowdown to be dramatic, the host system
must be under some CPU load. On babylon5, this load typically
consists of a parallel build of NetBSD (started with "nice 10")
and other qemu processes. In my own tests, I have simulated it
by running multiple infinite loop processes in the background
(again with "nice 10").
A shell script to automate reproducing the problem in available at
http://www.gson.org/netbsd/bugs/qemu-slow/test.sh
This script builds two versions of qemu from git, from before and
after the above commit, downloads a NetBSD 6.0.1 image pre-configured
to run the ATF tests, starts infinite loop processes in the background
to generate CPU load, and runs the test image using each qemu version
in turn. It needs about 2.5 GB of disk and several hours to run, and
due to the high CPU load it generates, I would not recommend running
it on a production system.
When run under NetBSD 7.1/amd64 on a 12-core HP DL360 G7 server, the
script reported the following execution times for qemu git revision
21a03d17f2edb1e63f7137d97ba355cc6f19d79f:
real 3871.12
user 2934.44
sys 17.25
And for qemu git revision 05e514b1d4d5bd4209e2c8bbc76ff05c85a235f3:
real 16813.29
user 17774.38
sys 5884.99
Note that the difference is especially dramatic for the system time,
which increased by a factor of more than 300. When the script was run
on a Linux system, there was no significant difference in execution
time between the two qemu versions.
The commit message for qemu commit 05e514b1d4d5bd4209e2c8bbc76ff05c85a235f3
reads:
AioContext: optimize clearing the EventNotifier
It is pretty rare for aio_notify to actually set the EventNotifier. It
can happen with worker threads such as thread-pool.c's, but otherwise it
should never be set thanks to the ctx->notify_me optimization. The
previous patch, unfortunately, added an unconditional call to
event_notifier_test_and_clear; now add a userspace fast path that
avoids the call.
Note that it is not possible to do the same with event_notifier_set;
it would break, as proved (again) by the included formal model.
This patch survived over 3000 reboots on aarch64 KVM.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Tested-by: Richard W.M. Jones <rjones@redhat.com>
Message-id: 1437487673-23740-7-git-send-email-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reading the diff of the commit in case did not make it immediately
clear to me why this should cause a performance regression on NetBSD.
Any help in explaining this is appreciated.
>How-To-Repeat:
Run the test script.
>Fix:
>Release-Note:
>Audit-Trail:
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: misc/52184: Recent qemu performs badly on NetBSD hosts under load
Date: Fri, 21 Apr 2017 12:00:20 +0200
On Fri, Apr 21, 2017 at 09:40:00AM +0000, Andreas Gustafsson wrote:
> Reading the diff of the commit in case did not make it immediately
> clear to me why this should cause a performance regression on NetBSD.
> Any help in explaining this is appreciated.
We should probably investigate what atomic_xchg() actually expands to:
+void aio_notify_accept(AioContext *ctx)
+{
+ if (atomic_xchg(&ctx->notified, false)) {
+ event_notifier_test_and_clear(&ctx->notifier);
}
ctx->notified is a boolean, so I guess __atomic_exchange_1().
For testing, it might be interesting to align it on a cache line boundary
or make it a 32bit value.
It may also be a compiler issue, can you test a qemu compiled with a newer
gcc, or even repeat the test in NetBSD -current?
Martin
From: coypu@sdf.org
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: misc/52184: Recent qemu performs badly on NetBSD hosts under load
Date: Fri, 21 Apr 2017 10:27:10 +0000
These came up while searching for the commit ID
https://patchwork.kernel.org/patch/9497653/
https://bugs.launchpad.net/qemu/+bug/1481272
From: Andreas Gustafsson <gson@gson.org>
To: copyu@sdf.org
Cc: gnats-bugs@NetBSD.org
Subject: Re: misc/52184: Recent qemu performs badly on NetBSD hosts under load
Date: Fri, 21 Apr 2017 15:17:29 +0300
coypu@sdf.org wrote:
> These came up while searching for the commit ID
>
> https://patchwork.kernel.org/patch/9497653/
> https://bugs.launchpad.net/qemu/+bug/1481272
For what it's worth, the message "main-loop: WARNING: I/O thread spun
for 1000 iterations" reported in the latter link also appears in the
output of the NetBSD test runs where qemu includes commit
05e514b1d4d5bd4209e2c8bbc76ff05c85a235f3, whether hosted on NetBSD
on Linux.
--
Andreas Gustafsson, gson@gson.org
From: Andreas Gustafsson <gson@gson.org>
To: Martin Husemann <martin@duskware.de>
Cc: gnats-bugs@NetBSD.org
Subject: Re: misc/52184: Recent qemu performs badly on NetBSD hosts under load
Date: Mon, 24 Apr 2017 15:17:39 +0300
Martin Husemann wrote:
> We should probably investigate what atomic_xchg() actually expands to:
Just a plain xchg of an 8-bit register and memory:
$ gdb qemu-system-x86_64.05e514b1d4d5bd4209e2c8bbc76ff05c85a235f3
(gdb) x/10i aio_notify_accept
0x61afb8 <aio_notify_accept>: sub $0x18,%rsp
0x61afbc <aio_notify_accept+4>: mov 0x4eaddd(%rip),%rax # 0xb05da0 <__stack_chk_guard>
0x61afc3 <aio_notify_accept+11>: mov %rax,0x8(%rsp)
0x61afc8 <aio_notify_accept+16>: xor %eax,%eax
0x61afca <aio_notify_accept+18>: xchg %al,0x12c(%rdi)
0x61afd0 <aio_notify_accept+24>: test %al,%al
0x61afd2 <aio_notify_accept+26>: jne 0x61afe7 <aio_notify_accept+47>
> For testing, it might be interesting to align it on a cache line boundary
> or make it a 32bit value.
>
> It may also be a compiler issue, can you test a qemu compiled with a newer
> gcc, or even repeat the test in NetBSD -current?
I repeated the test in NetBSD-current (source date
2017.04.17.09.54.59) using qemu prerequisite packages built from
source using a recent pkgsrc. Since PaX mprotect is enabled by
default in -current, I had to disable it using "sysctl -w
security.pax.mprotect.enabled=0" to get qemu to run.
The results were similar to those from NetSBD 7.1:
qemu git revision 21a03d17f2edb1e63f7137d97ba355cc6f19d79f:
real 4468.86
user 2870.44
sys 25.24
qemu git revision 05e514b1d4d5bd4209e2c8bbc76ff05c85a235f3:
real 13180.34
user 10890.35
sys 6131.38
--
Andreas Gustafsson, gson@gson.org
From: Andreas Gustafsson <gson@gson.org>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: misc/52184: Recent qemu performs badly on NetBSD hosts under load
Date: Mon, 16 Apr 2018 17:24:28 +0300
I believe the performance issues reported in this PR have now mostly
been resolved. There were multiple causes:
1. The increase in system time was because of QEMU bug 1481272, which
has now been fixed. See https://bugs.launchpad.net/qemu/+bug/1481272/
for details.
2. Much of the increase in user time appears to have been due to QEMU
2 running slowly when using a small translation buffer size. By
default, QEMU sizes the translation buffer based on the amount of RAM
configured for the VM, and i386 in particular was configured with a
small amount of RAM (32 MB), which caused a translation buffer of only
8 MB to be used. This was partly addressed some time ago by
increasing the RAM configured for the i386 tests, and performance
improved further when I recently added the qemu command line option
"-tb-size 128" to the configuration on babylon5.netbsd.org to increase
the size further.
The size of the translation buffer appears to have a much smaller
effect on performance in qemu 0.15, but I don't know why.
3. There was also an issue with qemu failing to detect the presence
of posix_memalign() on NetBSD, falling back to valloc() and wasting
memory by unnecessarily rounding allocations up to full pages. This
does not appear to have been a major cause of the performance issues,
but in any case, this too is now fixed.
4. The NetBSD tests themselves have grown substantially, and now
require several times more CPU time than they did a couple of years
ago.
Testing historic versions of NetBSD using a recent qemu with -m 128
-tb-size 128 shows it running the tests faster than qemu 0.15 did
(both running on the current babylon5.netbsd.org hardware):
NetBSD source date 2016.01.01.19.14.45, qemu-0.15 -m 32
18986.10 real 15865.06 user 604.28 sys
NetBSD source date 2016.01.01.19.46.48, qemu-2.11.1 -m 128 -tb-size 128
14637.12 real 9664.66 user 142.80 sys
--
Andreas Gustafsson, gson@gson.org
State-Changed-From-To: open->closed
State-Changed-By: gson@NetBSD.org
State-Changed-When: Mon, 16 Apr 2018 14:45:17 +0000
State-Changed-Why:
After bug fixes and configuration changes, recent qemu runs faster than 0.15.
>Unformatted:
(Contact us)
$NetBSD: query-full-pr,v 1.43 2018/01/16 07:36:43 maya Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2017
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.