NetBSD Problem Report #35187
From www@NetBSD.org Tue Dec 5 06:24:38 2006
Return-Path: <www@NetBSD.org>
Received: by narn.NetBSD.org (Postfix, from userid 31301)
id B868E63BA4D; Tue, 5 Dec 2006 06:24:38 +0000 (UTC)
Message-Id: <20061205062438.B868E63BA4D@narn.NetBSD.org>
Date: Tue, 5 Dec 2006 06:24:38 +0000 (UTC)
From: blair.sadewitz@gmail.com
Reply-To: blair.sadewitz@gmail.com
To: gnats-bugs@NetBSD.org
Subject: Certain file operations--such as chown--take inordinate amounts of time on LFS
X-Send-Pr-Version: www-1.0
>Number: 35187
>Category: kern
>Synopsis: Certain file operations--such as chown--take inordinate amounts of time on LFS
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Tue Dec 05 06:25:00 +0000 2006
>Last-Modified: Tue Jan 02 21:05:01 +0000 2007
>Originator: Blair Sadewitz
>Release: 4.99.5 as of dec 3-4
>Organization:
Organization? Maybe next year ...
>Environment:
NetBSD amd64 4.99.5
>Description:
root 1128 98.0 0.1 172 1088 ttyE1 D 1:00AM 13:19.82 chown -R blair pkg
src
This process has been running for virtually the same amount of time as the CPU time column indicates. All I'm doing is chown -R blair:wsrc /usr/pkgsrc. On ffs, this would be of no consequence.
I find LFS advantageous for a number of reasons, but I'm hesistant to use it if things such as this, the tail end of a pax/tar job, chmod, etc. takes this much CPU time. I have a 3GHz Pentium D, and if it weren't for the other core, my machine would be virtually unusable, at least interactively.
Yay, it finally finished 20 minutes later! ;)
What is going on here? I have tried setting vfs.lfs.pagetrip as instructed, as well as different buffer queue strategies, etc, all to no avail.
Obviously, this problem is beyond my comprehension. I can't be the only one annoyed by this, especially considering that LFS shines in situations (such as large RAIDframe arrays) where one may want to do operations like this on 2TB of files.
>How-To-Repeat:
Perform filesystem metadata operations on many files in an LFS filesystem.
>Fix:
I wish I knew, and if I ever meet the person who fixes this, you can have a few beers on me. ;)
Please ignore the last incomplete PR I filed, that happened because my keyboard response was lagged due to the aforementioned chown process.
>Audit-Trail:
From: Andrew Doran <ad@netbsd.org>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/35187 (Certain file operations--such as chown--take inordinate amounts of time on LFS)
Date: Mon, 1 Jan 2007 22:37:17 +0000
I have been seeing this too. Basic setup:
Filesystem 1K-blocks Used Avail Capacity iused ifree %iused Mounted on
/dev/ld0e 14535448 3111227 9969928 23% 342102 97543026 0% /local
maxvnodes ~= 128k
I did a bit of profiling, and it seems that the problem is lfs_writevnodes()
spends an unreasonable amount of time traversing the vnode list for the
mount, and in lfs_vref() -> vget().
Output from gprof is below. The numbers for Xspllower() and kernel_lock()
are a result of the activity in lfs_writevnodes().
% cumulative self self total
time seconds seconds calls ms/call ms/call name
29.29 162.03 162.03 19092 8.49 8.49 Xspllower
24.39 296.94 134.91 1122369 0.12 0.23 _kernel_lock
23.68 427.91 130.97 3948 33.17 55.33 lfs_writevnodes
7.29 468.25 40.34 176313571 0.00 0.00 vget
4.43 492.74 24.49 175754664 0.00 0.00 lfs_vref
4.06 515.21 22.47 175750428 0.00 0.00 lfs_vunref
2.01 526.34 11.13 mpidle
index % time self children called name
550 lfs_vflush <cycle 3> [740]
3398 lfs_segwrite <cycle 3> [61]
[5] 39.5 130.97 87.48 3948 lfs_writevnodes <cycle 3> [5]
24.49 40.52 175745415/175754664 lfs_vref [10]
22.47 0.00 175743764/175750428 lfs_vunref [23]
223127 lfs_writefile <cycle 3> [267]
223093 lfs_writeinode <cycle 3> [111]
12 lfs_writeseg <cycle 3> [211]
index % time self children called name
0.00 0.00 3131/175754664 lfs_mark_vnode [436]
0.00 0.00 6118/175754664 lfs_reserve [539]
24.49 40.52 175745415/175754664 lfs_writevnodes <cycle 3> [5]
[10] 11.8 24.49 40.52 175754664 lfs_vref [10]
40.21 0.31 175761681/176313571 vget [12]
index % time self children called name
0.00 0.00 16/176313571 checkalias [279]
0.00 0.00 4145/176313571 ffs_sync <cycle 3> [583]
0.00 0.00 4177/176313571 getcwd_common [114]
0.01 0.00 46692/176313571 cache_lookup [112]
0.11 0.00 496860/176313571 ufs_ihashget [125]
40.21 0.31 175761681/176313571 lfs_vref [10]
[12] 7.3 40.34 0.31 176313571 vget [12]
0.07 0.24 505279/1809247 vn_lock [69]
To repeat this:
1. Associate a large number of vnodes with the filesystem (untarring
pkgsrc.tar.gz and checking out src/sys, for example).
2. Start a few "MAKEDEV all" in the background.
3. Start a few "chown -R ... chmod -R" in the background.
4. Number of disk xfers as a result of LFS activity slows to around
10/s, and the system becomes unusable:
CPU0 states: 0.0% user, 0.0% nice, 100% system, 0.0% interrupt, 0.0% idle
CPU1 states: 0.0% user, 0.0% nice, 100% system, 0.0% interrupt, 0.0% idle
To fix, maybe alter lfs_writevnodes() to have to scan only a subset of all
vnodes which are dirty or potentially dirty. I'm not sure.
From: "Blair Sadewitz" <blair.sadewitz@gmail.com>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org
Subject: Re: kern/35187 (Certain file operations--such as chown--take inordinate amounts of time on LFS)
Date: Tue, 2 Jan 2007 16:00:47 -0500
There is a simple way to mitigate this to some extent (seems to help
with chmod/chown, but not with the queue backup when untarring
something) though I doubt it is safe only because it is purposefully
excluded from the code:
At the beginning lfs_writevnodes() in lfs_segment.c, change the #if 0
to 1 so that it starts with the newest instead of oldest vnode.
Unfortunately I havent thought of any better way to do this yet, as
most likely doing so is beyond my ability. :(
--Blair
--
Support WFMU-FM: free-form radio for the masses!
<http://www.wfmu.org/>
91.1 FM Jersey City, NJ
90.1 FM Mt. Hope, NY
"The Reggae Schoolroom":
<http://www.wfmu.org/playlists/RS/>
(Contact us)
$NetBSD: query-full-pr,v 1.39 2013/11/01 18:47:49 spz Exp $
$NetBSD: gnats_config.sh,v 1.8 2006/05/07 09:23:38 tsutsui Exp $
Copyright © 1994-2007
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.