NetBSD Problem Report #39564
From jmcneill@black.invisible.ca Tue Sep 16 16:10:58 2008
Return-Path: <jmcneill@black.invisible.ca>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
by narn.NetBSD.org (Postfix) with ESMTP id 6BEEF63B842
for <gnats-bugs@gnats.NetBSD.org>; Tue, 16 Sep 2008 16:10:58 +0000 (UTC)
Message-Id: <20080916161048.243F17D96D4@black.invisible.ca>
Date: Tue, 16 Sep 2008 12:10:47 -0400 (EDT)
From: jmcneill@invisible.ca
Reply-To: jmcneill@invisible.ca
To: gnats-bugs@gnats.NetBSD.org
Subject: wapbl performance issues with disk cache flushing
X-Send-Pr-Version: 3.95
>Number: 39564
>Category: kern
>Synopsis: wapbl performance issues with disk cache flushing
>Confidential: no
>Severity: serious
>Priority: high
>Responsible: ad
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Tue Sep 16 16:15:01 +0000 2008
>Closed-Date: Sun Feb 22 20:12:17 +0000 2009
>Last-Modified: Tue Feb 24 04:15:02 +0000 2009
>Originator: Jared D. McNeill
>Release: NetBSD 4.99.72
>Organization:
>Environment:
System: NetBSD black.invisible.ca 4.99.72 NetBSD 4.99.72 (GENERIC) #0: Mon Sep 15 11:17:37 EDT 2008 jmcneill@black.invisible.ca:/export/home/jmcneill/branches/HEAD/src/sys/arch/amd64/compile/GENERIC amd64
Architecture: x86_64
Machine: amd64
System: NetBSD blue.invisible.ca 4.99.72 NetBSD 4.99.72 (GENERIC) #1: Tue Sep 16 11:07:56 EDT 2008 jmcneill@blue.invisible.ca:/export/home/jmcneill/branches/HEAD/src/sys/arch/i386/compile/GENERIC i386
Architecture: i386
Machine: i386
>Description:
Disk cache flushing can severely impact performance on a wapbl-enabled
filesystem.
>How-To-Repeat:
1. Grab qt3 and qt4 distfiles from pkgsrc:
-rw-r--r-- 1 jmcneill users 14360483 Mar 15 2007 qt-x11-free-3.3.8.tar.bz2
-rw-r--r-- 1 jmcneill users 98584925 May 2 08:03 qt-x11-opensource-src-4.4.0.tar.bz2
2. In one terminal, extract 4.4.0 tarball in a loop
$ while true; do tar -zxvf qt-x11-opensource-src-4.4.0.tar.bz2; done
3. In another terminal, time how long it takes to extract
the 3.3.8 distfile:
$ time tar -zxvf qt-x11-free-3.3.8.tar.bz2
4. Results with and without disk cache flush (patch from simonb):
flush_disk_cache=0: 18.85s real 10.12s user 2.55s system
flush_disk_cache=1: 662.49s real 10.35s user 3.13s system
>Fix:
>Release-Note:
>Audit-Trail:
From: "Jared D. McNeill" <jmcneill@invisible.ca>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org
Subject: Re: kern/39564: wapbl performance issues with disk cache flushing
Date: Tue, 16 Sep 2008 12:38:09 -0400
As requested offline, here are the disks I'm using:
wd0 at atabus2 drive 0: <WDC WD800JB-00CRA1> (@ viaide)
wd0 at atabus1 drive 0: <HITACHI HTS541612J9SA00> (@ ahcisata)
Responsible-Changed-From-To: kern-bug-people->simonb
Responsible-Changed-By: jmcneill@NetBSD.org
Responsible-Changed-When: Tue, 16 Sep 2008 16:44:37 +0000
Responsible-Changed-Why:
Simon volunteered.
From: Havard Eidnes <he@uninett.no>
To: jmcneill@invisible.ca
Cc: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org,
gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/39564: wapbl performance issues with disk cache flushing
Date: Sun, 12 Oct 2008 19:47:52 +0200 (CEST)
Hi,
I have some additional data points and comments related to this
issue. I'm testing by doing parallel builds on an HP ProLiant
460c G1 blade, which is equipped with 16GB memory, dual quad-core
Xeon processors at 2.83GHz, and an on-board ciss(4) controller
with a single file system (root, /usr, /var etc.).
I'm testing by doing "build.sh -j 16" on this system. The
different variants and the result of "time" from csh were as
follows:
First, a few attempts without Simon's patch that allows one to turn
off cache flushing via sysctl:
1) Plain "build.sh -j 16":
6515.491u 2108.338s 2:19:19.68 103.1% 0+0k 3+688241io 32279pf+0w
(yes, that's almost 2 hours and 20 minutes(!))
2) As above, but with a "time sh -c 'while true; do sync; sleep 1; done'"
loop running in parallel with the build:
5777.841u 2284.086s 21:59.48 610.9% 0+0k 627+44423io 32323pf+0w
while the sync loop got
0.878u 62.666s 24:11.92 4.3% 0+0k 0+48589io 0pf+0w
3) Just for comparison, a "plain" build.sh without -j specified:
5312.852u 970.429s 2:52:58.37 60.5% 0+0k 103+190425io 32351pf+0w
Some observations / comments:
o Even when taking the I/Os triggered by the sync loop in 2)
into account, the amount of I/O under 1) is massively larger
than in 2), by a factor of about 7.4(!)
o Running "top" and "systat vm" in two windows while the build
in 1) is running shows that in periods of low cpu utilization,
lots of processes appear "stuck" in "tstile/<n>" state.
o The "tstile/<n>" status is rather non-descript; it basically just
says "I'm waiting on some unspecified kernel-internal resource".
Pinpointing exactly which resource is contended cannot easily be
done with these normal system state monitoring tools.
Some other developers hinted that it would be better if the
resource was described by the wait channel string. It appears that
to do this, the locking primitives which end up calling
turnstile_block() would need to pass down some informative string
which better describes the resource locked instead of using a
constant "tstile" string as is done in the turnstile_block()
function when calling sleepq_enqueue().
So, in the next round I installed Simon's patch and turned off wapbl
disk cache flushing, and repeated tests 1) and 2) above. Contrary
to what one might believe, these actually revealed a difference(!):
4) A standard "build.sh -j 16"
5987.961u 2149.343s 28:30.74 475.6% 0+0k 11678+562510io 56375pf+0w
5) as in 4), but with the sync shell loop:
5746.054u 2302.403s 21:08.90 634.2% 0+0k 1+46619io 32331pf+0w
and for the sync loop itself:
0.774u 52.742s 21:21.80 4.1% 0+0k 0+49910io 0pf+0w
Observations / comments:
o While the difference in wall-clock time is much smaller than between
1) and 2) above, the difference in the amount of I/O is still very
significant, by a factor of around 5.8(!)
o Observing the "top" display during lulls in CPU utilization,
I still see many processes in "tstile/<n>" state.
I've repeated this experiment on a non-wapbl file system on the same
system, but there the difference went against the sync loop:
5799.240u 2232.222s 24:57.93 536.1% 0+0k 494+710112io 32612pf+0w
vs.
5881.837u 2435.640s 27:14.19 508.9% 0+0k 0+720501io 32646pf+0w
and the number of IOs is comparable.
So... It seems to me that something or other is causing the wapbl
file systems to experience massive lock contention with parallel
activity on the file system. Also, this seems to indicate that the
patch to make it possible to turn off disk cache flushing with wapbl
via sysctl may just paper over an underlying issue. However, due to
the non-descript nature of the "tstile" status, it's difficult to give
any further hints as to which resource is actually contested.
Best regards,
- Håvard
From: Havard Eidnes <he@uninett.no>
To: jmcneill@invisible.ca
Cc: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org,
gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/39564: wapbl performance issues with disk cache flushing
Date: Mon, 27 Oct 2008 00:52:59 +0100 (CET)
Hi,
here's some more data on this case. I increased the amount of
buffering in lockstat.c to be able to do a lockstat over an
entire build (increased LOCKSTAT_MAXBUFS tenfold, and bumped
LOCKSTAT_HASH_SIZE by 8), and with vfs.wapbl.flush_disk_cache=0,
I repeated these two tests:
4) A standard "build.sh -j 16":
6001.172u 2184.995s 29:04.68 469.2% 0+0k 12721+559454io 56360pf+0w
5) as in 4), but with the sync shell loop, 1 second delay:
5751.959u 2341.618s 21:22.68 630.9% 0+0k 8+45947io 32327pf+0w
The corresponding lockstat files are, respectively
ftp://quattro.urc.uninett.no/pub/NetBSD/lockstat.25
and
ftp://quattro.urc.uninett.no/pub/NetBSD/lockstat.26
I'm no expert in interpreting the lockstat output, but perhaps
someone with more experience in the kernel innards can dig out
some more useful information from this.
Best regards,
- Håvard
Responsible-Changed-From-To: simonb->joerg
Responsible-Changed-By: joerg@NetBSD.org
Responsible-Changed-When: Tue, 18 Nov 2008 23:47:57 +0000
Responsible-Changed-Why:
Related to the discussion
http://mail-index.netbsd.org/current-users/2008/10/31/msg005580.html
Responsible-Changed-From-To: joerg->ad
Responsible-Changed-By: ad@NetBSD.org
Responsible-Changed-When: Sat, 21 Feb 2009 20:02:16 +0000
Responsible-Changed-Why:
I have a fix for this.
State-Changed-From-To: open->closed
State-Changed-By: ad@NetBSD.org
State-Changed-When: Sun, 22 Feb 2009 20:12:17 +0000
State-Changed-Why:
Fixed.
We should batch transactions and allow new generations of transactions
to run in parallel with flushes, but that would be a new feature.
From: Andrew Doran <ad@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/39564 CVS commit: src/sys
Date: Sun, 22 Feb 2009 20:10:25 +0000 (UTC)
Module Name: src
Committed By: ad
Date: Sun Feb 22 20:10:25 UTC 2009
Modified Files:
src/sys/kern: vfs_wapbl.c
src/sys/miscfs/syncfs: sync_subr.c sync_vnops.c
src/sys/ufs/ffs: ffs_alloc.c ffs_vfsops.c ffs_vnops.c
Log Message:
PR kern/39564 wapbl performance issues with disk cache flushing
PR kern/40361 WAPBL locking panic in -current
PR kern/40470 WAPBL corrupts ext2fs
PR kern/40562 busy loop in ffs_sync when unmounting a file system
PR kern/40525 panic: ffs_valloc: dup alloc
- A fix for an issue that can lead to "ffs_valloc: dup" due to dirty cg
buffers being invalidated. Problem discovered and patch by dholland@.
- If the syncer fails to lazily sync a vnode due to lock contention,
retry 1 second later instead of 30 seconds later.
- Flush inode atime updates every ~10 seconds (this makes most sense with
logging). Previously they did not hit the disk for read-only files or
devices until the file system was unmounted. It would be better to trickle
the updates out but that would require more extensive changes.
- Fix issues with file system corruption, busy looping and other nasty
problems when logging and non-logging file systems are intermixed,
with one being the root file system.
- For logging, do not flush metadata on an inode-at-a-time basis if the sync
has been requested by ioflush. Previously, we could try hundreds of log
sync operations a second due to inode update activity, causing the syncer
to fall behind and metadata updates to be serialized across the entire
file system. Instead, burst out metadata and log flushes at a minimum
interval of every 10 seconds on an active file system (happens more often
if the log becomes full). Note this does not change the operation of
fsync() etc.
- With the flush issue fixed, re-enable concurrent metadata updates in
vfs_wapbl.c.
To generate a diff of this commit:
cvs rdiff -r1.22 -r1.23 src/sys/kern/vfs_wapbl.c
cvs rdiff -r1.35 -r1.36 src/sys/miscfs/syncfs/sync_subr.c
cvs rdiff -r1.25 -r1.26 src/sys/miscfs/syncfs/sync_vnops.c
cvs rdiff -r1.120 -r1.121 src/sys/ufs/ffs/ffs_alloc.c
cvs rdiff -r1.241 -r1.242 src/sys/ufs/ffs/ffs_vfsops.c
cvs rdiff -r1.109 -r1.110 src/sys/ufs/ffs/ffs_vnops.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: Soren Jacobsen <snj@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/39564 CVS commit: [netbsd-5] src/sys
Date: Tue, 24 Feb 2009 04:13:35 +0000 (UTC)
Module Name: src
Committed By: snj
Date: Tue Feb 24 04:13:35 UTC 2009
Modified Files:
src/sys/kern [netbsd-5]: vfs_wapbl.c
src/sys/miscfs/syncfs [netbsd-5]: sync_subr.c sync_vnops.c
src/sys/ufs/ffs [netbsd-5]: ffs_alloc.c ffs_vfsops.c ffs_vnops.c
Log Message:
Pull up following revision(s) (requested by ad in ticket #490):
sys/kern/vfs_wapbl.c: revision 1.23
sys/miscfs/syncfs/sync_subr.c: revision 1.36
sys/miscfs/syncfs/sync_vnops.c: revision 1.26
sys/ufs/ffs/ffs_alloc.c: revision 1.121
sys/ufs/ffs/ffs_vfsops.c: revision 1.242
sys/ufs/ffs/ffs_vnops.c: revision 1.110
PR kern/39564 wapbl performance issues with disk cache flushing
PR kern/40361 WAPBL locking panic in -current
PR kern/40470 WAPBL corrupts ext2fs
PR kern/40562 busy loop in ffs_sync when unmounting a file system
PR kern/40525 panic: ffs_valloc: dup alloc
- A fix for an issue that can lead to "ffs_valloc: dup" due to dirty cg
buffers being invalidated. Problem discovered and patch by dholland@.
- If the syncer fails to lazily sync a vnode due to lock contention,
retry 1 second later instead of 30 seconds later.
- Flush inode atime updates every ~10 seconds (this makes most sense with
logging). Previously they did not hit the disk for read-only files or
devices until the file system was unmounted. It would be better to trickle
the updates out but that would require more extensive changes.
- Fix issues with file system corruption, busy looping and other nasty
problems when logging and non-logging file systems are intermixed,
with one being the root file system.
- For logging, do not flush metadata on an inode-at-a-time basis if the sync
has been requested by ioflush. Previously, we could try hundreds of log
sync operations a second due to inode update activity, causing the syncer
to fall behind and metadata updates to be serialized across the entire
file system. Instead, burst out metadata and log flushes at a minimum
interval of every 10 seconds on an active file system (happens more often
if the log becomes full). Note this does not change the operation of
fsync() etc.
- With the flush issue fixed, re-enable concurrent metadata updates in
vfs_wapbl.c.
To generate a diff of this commit:
cvs rdiff -r1.3 -r1.3.8.1 src/sys/kern/vfs_wapbl.c
cvs rdiff -r1.34 -r1.34.20.1 src/sys/miscfs/syncfs/sync_subr.c
cvs rdiff -r1.25 -r1.25.10.1 src/sys/miscfs/syncfs/sync_vnops.c
cvs rdiff -r1.113 -r1.113.4.1 src/sys/ufs/ffs/ffs_alloc.c
cvs rdiff -r1.239 -r1.239.2.1 src/sys/ufs/ffs/ffs_vfsops.c
cvs rdiff -r1.104.4.5 -r1.104.4.6 src/sys/ufs/ffs/ffs_vnops.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
>Unformatted: