NetBSD Problem Report #40562

From yamt@mwd.biglobe.ne.jp  Fri Feb  6 00:06:07 2009
Return-Path: <yamt@mwd.biglobe.ne.jp>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
	by narn.NetBSD.org (Postfix) with ESMTP id B211063C07D
	for <gnats-bugs@gnats.NetBSD.org>; Fri,  6 Feb 2009 00:06:07 +0000 (UTC)
Message-Id: <20090206000602.DB7A811704@yamt.dyndns.org>
Date: Fri,  6 Feb 2009 09:06:02 +0900 (JST)
From: yamt@mwd.biglobe.ne.jp
Reply-To: yamt@mwd.biglobe.ne.jp
To: gnats-bugs@gnats.NetBSD.org
Subject: busy loop in ffs_sync when unmounting a file system
X-Send-Pr-Version: 3.95

>Number:         40562
>Category:       kern
>Synopsis:       busy loop in ffs_sync when unmounting a file system
>Confidential:   no
>Severity:       critical
>Priority:       medium
>Responsible:    kern-bug-people
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri Feb 06 00:10:00 +0000 2009
>Closed-Date:    Tue Sep 08 03:47:57 +0000 2015
>Last-Modified:  Tue Sep 08 03:47:57 +0000 2015
>Originator:     YAMAMOTO Takashi <yamt@mwd.biglobe.ne.jp>
>Release:        NetBSD 5.99.7
>Organization:

>Environment:


	amd64
>Description:
	it seems that ffs_fsync fails to write out dirty blocks
	for a VBLK vnode if the file system that the VBLK special file
	resides on is mounted with wapbl.  as a result, ffs_sync busy-loops,
	calling VOP_FSYNC on devvp over and over.

>How-To-Repeat:
	make /dev an ffs with logging, run the following,
	and see that umount takes too long.

	rm foo
	dd if=/dev/zero of=foo bs=1m count=16
	vnconfig vnd0 foo
	newfs -F /dev/rvnd0d
	mount /dev/vnd0d /mnt
	touch /mnt/0
	umount /dev/vnd0d
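
	the steps above can also be wrapped in a small script that times
	the umount step and detaches the vnd afterwards.  this is only a
	sketch: the run/repro helpers, the DRY_RUN/IMG/VND/MNT variables,
	the use of "rm -f" and the final "vnconfig -u" cleanup are
	additions for illustration, not part of the original report.
	by default it only prints the commands.

```shell
#!/bin/sh
# Hypothetical wrapper around the repro steps; not from the original report.
# DRY_RUN=1 (default) only prints each command, so it is safe to run anywhere.
# Set DRY_RUN=0 and run as root on a NetBSD machine to execute the steps.

: "${DRY_RUN:=1}"
: "${IMG:=foo}"    # backing file for the vnd device (name from the report)
: "${VND:=vnd0}"   # vnd unit (from the report)
: "${MNT:=/mnt}"   # mount point (from the report)

# Print the command in dry-run mode, otherwise execute it.
run() {
	if [ "$DRY_RUN" = 1 ]; then
		printf '%s\n' "+ $*"
	else
		"$@"
	fi
}

repro() {
	run rm -f "$IMG"    # -f so a missing file is not an error
	run dd if=/dev/zero of="$IMG" bs=1m count=16
	run vnconfig "$VND" "$IMG"
	run newfs -F "/dev/r${VND}d"
	run mount "/dev/${VND}d" "$MNT"
	run touch "$MNT/0"

	# umount is the step reported to take too long; time it in whole seconds.
	start=$(date +%s)
	run umount "/dev/${VND}d"
	end=$(date +%s)
	echo "umount took $((end - start))s"

	run vnconfig -u "$VND"    # cleanup added for this sketch
}

repro
```

	with DRY_RUN=0 on an affected kernel, the reported symptom would
	presumably show up as a large "umount took" value.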

>Fix:
	ensure that ffs_fsync does the equivalent of vflushbuf for VBLK.

	i'd suggest:

	- ensure that all VOP_FSYNC implementations call VFS_FSYNC
	  for VBLK if a file system is mounted on it.
	- make ffs VFS_FSYNC have its own function, say, ffs_vfs_fsync,
	  rather than sharing ffs_full_fsync or ffs_fsync.
	- move the softdep/wapbl VBLK handling for a mounted file system
	  from ffs_full_fsync to ffs_vfs_fsync.
	- kill FSYNC_VFS.

>Release-Note:

>Audit-Trail:

Responsible-Changed-From-To: kern-bug-people->ad
Responsible-Changed-By: ad@NetBSD.org
Responsible-Changed-When: Fri, 06 Feb 2009 09:58:23 +0000
Responsible-Changed-Why:
i'm (still) working on this


State-Changed-From-To: open->feedback
State-Changed-By: ad@NetBSD.org
State-Changed-When: Sun, 22 Feb 2009 20:13:38 +0000
State-Changed-Why:
fixed
last time i looked i didn't see missing calls to VFS_FSYNC()
did you?


From: Andrew Doran <ad@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/40562 CVS commit: src/sys
Date: Sun, 22 Feb 2009 20:10:25 +0000 (UTC)

 Module Name:	src
 Committed By:	ad
 Date:		Sun Feb 22 20:10:25 UTC 2009

 Modified Files:
 	src/sys/kern: vfs_wapbl.c
 	src/sys/miscfs/syncfs: sync_subr.c sync_vnops.c
 	src/sys/ufs/ffs: ffs_alloc.c ffs_vfsops.c ffs_vnops.c

 Log Message:
 PR kern/39564 wapbl performance issues with disk cache flushing
 PR kern/40361 WAPBL locking panic in -current
 PR kern/40470 WAPBL corrupts ext2fs
 PR kern/40562 busy loop in ffs_sync when unmounting a file system
 PR kern/40525 panic: ffs_valloc: dup alloc

 - A fix for an issue that can lead to "ffs_valloc: dup" due to dirty cg
   buffers being invalidated. Problem discovered and patch by dholland@.

 - If the syncer fails to lazily sync a vnode due to lock contention,
   retry 1 second later instead of 30 seconds later.

 - Flush inode atime updates every ~10 seconds (this makes most sense with
   logging). Previously they didn't hit the disk for read-only files or
   devices until the file system was unmounted. It would be better to trickle
   the updates out but that would require more extensive changes.

 - Fix issues with file system corruption, busy looping and other nasty
   problems when logging and non-logging file systems are intermixed,
   with one being the root file system.

 - For logging, do not flush metadata on an inode-at-a-time basis if the sync
   has been requested by ioflush. Previously, we could try hundreds of log
   sync operations a second due to inode update activity, causing the syncer
   to fall behind and metadata updates to be serialized across the entire
   file system. Instead, burst out metadata and log flushes at a minimum
   interval of every 10 seconds on an active file system (happens more often
   if the log becomes full). Note this does not change the operation of
   fsync() etc.

 - With the flush issue fixed, re-enable concurrent metadata updates in
   vfs_wapbl.c.


 To generate a diff of this commit:
 cvs rdiff -r1.22 -r1.23 src/sys/kern/vfs_wapbl.c
 cvs rdiff -r1.35 -r1.36 src/sys/miscfs/syncfs/sync_subr.c
 cvs rdiff -r1.25 -r1.26 src/sys/miscfs/syncfs/sync_vnops.c
 cvs rdiff -r1.120 -r1.121 src/sys/ufs/ffs/ffs_alloc.c
 cvs rdiff -r1.241 -r1.242 src/sys/ufs/ffs/ffs_vfsops.c
 cvs rdiff -r1.109 -r1.110 src/sys/ufs/ffs/ffs_vnops.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi)
To: gnats-bugs@NetBSD.org
Cc: ad@NetBSD.org, netbsd-bugs@netbsd.org, gnats-admin@netbsd.org,
	ad@NetBSD.org
Subject: Re: kern/40562 (busy loop in ffs_sync when unmounting a file system)
Date: Mon, 23 Feb 2009 08:54:55 +0900 (JST)

 > Synopsis: busy loop in ffs_sync when unmounting a file system
 > 
 > State-Changed-From-To: open->feedback
 > State-Changed-By: ad@NetBSD.org
 > State-Changed-When: Sun, 22 Feb 2009 20:13:38 +0000
 > State-Changed-Why:
 > fixed
 > last time i looked i didn't see missing calls to VFS_FSYNC()
 > did you?

 at least ffs is missing the call.
 i haven't checked other filesystems.

 YAMAMOTO Takashi

From: Soren Jacobsen <snj@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/40562 CVS commit: [netbsd-5] src/sys
Date: Tue, 24 Feb 2009 04:13:35 +0000 (UTC)

 Module Name:	src
 Committed By:	snj
 Date:		Tue Feb 24 04:13:35 UTC 2009

 Modified Files:
 	src/sys/kern [netbsd-5]: vfs_wapbl.c
 	src/sys/miscfs/syncfs [netbsd-5]: sync_subr.c sync_vnops.c
 	src/sys/ufs/ffs [netbsd-5]: ffs_alloc.c ffs_vfsops.c ffs_vnops.c

 Log Message:
 Pull up following revision(s) (requested by ad in ticket #490):
 	sys/kern/vfs_wapbl.c: revision 1.23
 	sys/miscfs/syncfs/sync_subr.c: revision 1.36
 	sys/miscfs/syncfs/sync_vnops.c: revision 1.26
 	sys/ufs/ffs/ffs_alloc.c: revision 1.121
 	sys/ufs/ffs/ffs_vfsops.c: revision 1.242
 	sys/ufs/ffs/ffs_vnops.c: revision 1.110
 PR kern/39564 wapbl performance issues with disk cache flushing
 PR kern/40361 WAPBL locking panic in -current
 PR kern/40470 WAPBL corrupts ext2fs
 PR kern/40562 busy loop in ffs_sync when unmounting a file system
 PR kern/40525 panic: ffs_valloc: dup alloc
 - A fix for an issue that can lead to "ffs_valloc: dup" due to dirty cg
   buffers being invalidated. Problem discovered and patch by dholland@.
 - If the syncer fails to lazily sync a vnode due to lock contention,
   retry 1 second later instead of 30 seconds later.
 - Flush inode atime updates every ~10 seconds (this makes most sense with
   logging). Previously they didn't hit the disk for read-only files or
   devices until the file system was unmounted. It would be better to trickle
   the updates out but that would require more extensive changes.
 - Fix issues with file system corruption, busy looping and other nasty
   problems when logging and non-logging file systems are intermixed,
   with one being the root file system.
 - For logging, do not flush metadata on an inode-at-a-time basis if the sync
   has been requested by ioflush. Previously, we could try hundreds of log
   sync operations a second due to inode update activity, causing the syncer
   to fall behind and metadata updates to be serialized across the entire
   file system. Instead, burst out metadata and log flushes at a minimum
   interval of every 10 seconds on an active file system (happens more often
   if the log becomes full). Note this does not change the operation of
   fsync() etc.
 - With the flush issue fixed, re-enable concurrent metadata updates in
   vfs_wapbl.c.


 To generate a diff of this commit:
 cvs rdiff -r1.3 -r1.3.8.1 src/sys/kern/vfs_wapbl.c
 cvs rdiff -r1.34 -r1.34.20.1 src/sys/miscfs/syncfs/sync_subr.c
 cvs rdiff -r1.25 -r1.25.10.1 src/sys/miscfs/syncfs/sync_vnops.c
 cvs rdiff -r1.113 -r1.113.4.1 src/sys/ufs/ffs/ffs_alloc.c
 cvs rdiff -r1.239 -r1.239.2.1 src/sys/ufs/ffs/ffs_vfsops.c
 cvs rdiff -r1.104.4.5 -r1.104.4.6 src/sys/ufs/ffs/ffs_vnops.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: feedback->open
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Sun, 15 Nov 2009 02:26:37 +0000
State-Changed-Why:
feedback was received in February


Responsible-Changed-From-To: ad->kern-bug-people
Responsible-Changed-By: dholland@NetBSD.org
Responsible-Changed-When: Mon, 09 Apr 2012 05:57:35 +0000
Responsible-Changed-Why:
ad resigned, should not own PRs any more


From: "Sergio L. Pascual" <slp@sinrega.org>
To: "gnats-bugs@netbsd.org" <gnats-bugs@netbsd.org>
Cc: 
Subject: Re: kern/40562
Date: Tue, 13 Jan 2015 18:20:15 +0100

 I can't reproduce this on -current:

 <--- cut here --->
 current64# cat testvnd.sh
 rm foo
 dd if=/dev/zero of=foo bs=1m count=16
 vnconfig vnd0 foo
 newfs -F /dev/rvnd0d
 mount /dev/vnd0d /mnt
 touch /mnt/0
 umount /dev/vnd0d
 current64# time ./testvnd.sh
 16+0 records in
 16+0 records out
 16777216 bytes transferred in 0.067 secs (250406208 bytes/sec)
 /dev/rvnd0d: 16.0MB (32768 sectors) block size 4096, fragment size 512
         using 4 cylinder groups of 4.00MB, 1024 blks, 1920 inodes.
 super-block backups (for fsck_ffs -b #) at:
 32, 8224, 16416, 24608,
         0.19 real         0.01 user         0.06 sys
 current64# mount
 /dev/wd0a on / type ffs (log, local)
 kernfs on /kern type kernfs (local)
 ptyfs on /dev/pts type ptyfs (local)
 procfs on /proc type procfs (local)
 tmpfs on /var/shm type tmpfs (local)
 /dev/wd1a on /build type ffs (log, local)
 current64#
 <--- cut here --->


State-Changed-From-To: open->closed
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Tue, 08 Sep 2015 03:47:57 +0000
State-Changed-Why:
yeah, this was fixed


>Unformatted:
