NetBSD Problem Report #47231

From jakllsch@xenotaph.kollasch.net  Wed Nov 21 17:16:10 2012
Return-Path: <jakllsch@xenotaph.kollasch.net>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
	by www.NetBSD.org (Postfix) with ESMTP id E3B2763DCB2
	for <gnats-bugs@gnats.NetBSD.org>; Wed, 21 Nov 2012 17:16:09 +0000 (UTC)
Message-Id: <20121121171607.B96951703DF@xenotaph.kollasch.net>
Date: Wed, 21 Nov 2012 17:16:07 +0000 (UTC)
From: jakllsch@kollasch.net
Reply-To: jakllsch@kollasch.net
To: gnats-bugs@gnats.NetBSD.org
Subject: WAPBL dangers
X-Send-Pr-Version: 3.95

>Number:         47231
>Category:       kern
>Synopsis:       ffs (with or without WAPBL) does not preserve referential integrity of file data and corrupts files when crashing
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    jdolecek
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Nov 21 17:20:01 +0000 2012
>Last-Modified:  Thu Jul 23 19:48:01 +0000 2020
>Originator:     Jonathan A. Kollasch
>Release:        NetBSD 6.99.15
>Organization:
>Environment:

System: NetBSD xenotaph.kollasch.net 6.99.15 NetBSD 6.99.15 (GENERIC) #81: Tue Nov 13 18:46:31 CST 2012 jakllsch@xenotaph.kollasch.net:/local/jakllsch/NetBSD/obj-amd64/sys/arch/amd64/compile/GENERIC amd64
Architecture: x86_64
Machine: amd64
>Description:
Crash or power loss during cvs update on WAPBL-enabled volume results in
near-irreparable damage to CVS working copy metadata.
>How-To-Repeat:
Stop the kernel (via debugger, crash, or power loss) while cvs update is
running on a large tree (such as pkgsrc) within a WAPBL/log-enabled file
system. When the machine comes up again (and the journal has been
replayed), try to run the cvs update again and note how confused it is.
>Fix:
Unknown, but a workaround would be to disable WAPBL.

>Release-Note:

>Audit-Trail:

Responsible-Changed-From-To: misc-bug-people->kern-bug-people
Responsible-Changed-By: dholland@NetBSD.org
Responsible-Changed-When: Wed, 28 Nov 2012 15:30:07 +0000
Responsible-Changed-Why:
kernel issue.



From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/47231: crash or power loss during cvs update on
 WAPBL-enabled volume results in near-irreparable damage to CVS metadata.
Date: Wed, 28 Nov 2012 15:36:22 +0000

 On Wed, Nov 21, 2012 at 05:20:01PM +0000, jakllsch@kollasch.net wrote:
  > Crash or power loss during cvs update on WAPBL-enabled volume results in
  > near-irreparable damage to CVS working copy metadata.

 As discussed periodically elsewhere, this is a long-standing design
 bug in ffs (the original motivation for softupdates) which WAPBL fails
 to correct. Experience so far seems to indicate that the problem is
 more severe in practice with WAPBL, probably because it runs faster
 and more data is buffered and unwritten at crash time.

 That CVS is somewhat lame about its metadata handling isn't really
 relevant.

 As far as I can tell there isn't an existing PR about this issue,
 possibly because it's older than NetBSD, so I'm going to adjust the
 synopsis of this one to serve the general case.

 I'm also setting it to critical/high.

 -- 
 David A. Holland
 dholland@netbsd.org

From: David Laight <david@l8s.co.uk>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/47231 (crash or power loss during cvs update on WAPBL-enabled volume results in near-irreparable damage to CVS metadata.)
Date: Thu, 29 Nov 2012 08:27:32 +0000

 On Wed, Nov 28, 2012 at 03:30:10PM +0000, dholland@NetBSD.org wrote:
 > Synopsis: crash or power loss during cvs update on WAPBL-enabled volume results in near-irreparable damage to CVS metadata.

 Is this the 'problem' that is caused by cvs doing the 'safe' sequence
 of writing a new CVS/Entries file and then renaming it to replace the
 old one?

 After a crash all the directory and inode information is correct,
 but the disk blocks backing the files have not been rewritten?

 The effect being that most of the CVS/Entries files contain the
 contents of the files for other directories!

 I got an fs into that mess after a cvs diff (which also seems to
 rewrite the Entries files) - but I might have had it mounted async.

 	David

 -- 
 David Laight: david@l8s.co.uk

From: "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/47231: crash or power loss during cvs update on WAPBL-enabled volume results in near-irreparable damage to CVS metadata.
Date: Thu, 29 Nov 2012 17:13:28 +0100

 On Nov 28, 2012, at 4:40 PM, David Holland <dholland-bugs@netbsd.org> wrote:

 > As discussed periodically elsewhere, this is a long-standing design
 > bug in ffs (the original motivation for softupdates) which WAPBL fails
 > to correct. Experience so far seems to indicate that the problem is
 > more severe in practice with WAPBL, probably because it runs faster
 > and more data is buffered and unwritten at crash time.

 Should we enforce some kind of fsync-on-close on WAPBL enabled fs?

 This way at least the common open-write-close-rename sequence should
 survive a crash so either the old file or the new one has valid data.
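
 As an illustration of the sequence above, here is a minimal userland
 sketch in C of a replace that does its own fsync rather than relying on
 the kernel to sync on close. replace_file() and the ".tmp" suffix are
 hypothetical names, and the sketch assumes the platform allows opening
 a directory read-only and calling fsync() on it.

```c
#include <assert.h>
#include <fcntl.h>
#include <libgen.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Replace `path` with `len` bytes from `buf` so that, after a crash, the
 * name refers to either the complete old file or the complete new one.
 * Hypothetical helper; returns 0 on success, -1 on failure.
 */
int
replace_file(const char *path, const void *buf, size_t len)
{
	char tmp[1024], dir[1024];
	const char *p = buf;
	int fd, dfd, rv;

	assert(path != NULL && buf != NULL);
	if (snprintf(tmp, sizeof(tmp), "%s.tmp", path) >= (int)sizeof(tmp) ||
	    snprintf(dir, sizeof(dir), "%s", path) >= (int)sizeof(dir))
		return -1;

	fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd == -1)
		return -1;
	while (len > 0) {
		ssize_t n = write(fd, p, len);
		if (n == -1)
			goto fail;
		p += n;
		len -= (size_t)n;
	}
	/* Push the data blocks out before the rename can be recorded. */
	if (fsync(fd) == -1)
		goto fail;
	if (close(fd) == -1) {
		fd = -1;
		goto fail;
	}
	fd = -1;
	if (rename(tmp, path) == -1)
		goto fail;

	/* Flush the containing directory so the rename itself is durable. */
	dfd = open(dirname(dir), O_RDONLY);
	if (dfd == -1)
		return -1;
	rv = fsync(dfd);
	close(dfd);
	return rv;

fail:
	if (fd != -1)
		close(fd);
	unlink(tmp);
	return -1;
}
```

 The fsync() before rename() is the step that an fsync-on-close policy
 would supply implicitly; without it, the new name can become visible
 while the blocks behind it are still unwritten.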

 --
 J. Hannken-Illjes - hannken@eis.cs.tu-bs.de - TU Braunschweig (Germany)

From: pedro martelletto <pedro@ambientworks.net>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/47231
Date: Wed, 25 Dec 2013 12:12:01 +0100

 As mentioned previously, this problem is indeed caused by metadata
 pointers making it to disk before the newly allocated data blocks that
 they point to.

 The issue is further aggravated by WAPBL, since there are situations
 where the journal is pushed to disk while regular file data is not,
 which means there is a higher probability that, upon log replay, the
 pointers in the inode will be updated to reflect an ongoing allocation
 at the time of the crash.

 One way to circumvent the problem is to asynchronously push blocks in
 FFS's write routine for the '!overwrite' case:

 Index: ufs/ufs/ufs_readwrite.c
 ===================================================================
 RCS file: /cvsroot/src/sys/ufs/ufs/ufs_readwrite.c,v
 retrieving revision 1.107
 diff -u -r1.107 ufs_readwrite.c
 --- ufs/ufs/ufs_readwrite.c	23 Jun 2013 07:28:37 -0000	1.107
 +++ ufs/ufs/ufs_readwrite.c	25 Dec 2013 10:28:28 -0000
 @@ -423,6 +423,14 @@
   		 * XXXUBC simplistic async flushing.
   		 */

 +		if (!overwrite && vp->v_mount && vp->v_mount->mnt_wapbl) {
 +			mutex_enter(vp->v_interlock);
 +			error = VOP_PUTPAGES(vp,
 +			    trunc_page(oldoff & fs->fs_bmask),
 +			    round_page(ufs_blkroundup(fs, uio->uio_offset)),
 +			    PGO_CLEANIT | PGO_JOURNALLOCKED | PGO_LAZY);
 +		}
 +
   #ifndef LFS_READWRITE
   		if (!async && oldoff >> 16 != uio->uio_offset >> 16) {
   			mutex_enter(vp->v_interlock);

 That is not an acceptable solution though, as it (unsurprisingly)
 disrupts write clustering:

 Without the patch:
 dd if=/dev/zero of=x bs=2048 count=1024k  0.13s user 3.87s system 88% cpu 4.493 total

 With the patch:
 dd if=/dev/zero of=x bs=2048 count=1024k  0.38s user 10.95s system 12% cpu 1:30.63 total

 softdep's answer to this problem is to keep track of newly allocated
 data blocks, and to zero the pointers in the inode if it happens to be
 pushed to disk before the blocks have been written. We might have to do
 something similar.
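
 To make the idea concrete, here is a toy model in C of that kind of
 pointer rollback. It is a sketch under simplifying assumptions, not
 softdep's actual data structures: every name in it (toy_inode,
 toy_dinode, toy_inode_prepare_write, toy_data_write_done) is
 hypothetical, and only direct block pointers are modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NDADDR 12	/* direct block pointers per inode in this toy model */

/* In-core inode for the toy model; all names here are hypothetical. */
struct toy_inode {
	uint32_t db[NDADDR];		/* block numbers, 0 = unallocated */
	bool	 data_written[NDADDR];	/* has the block's data reached disk? */
};

/* Image of the block pointers as it would be written to disk. */
struct toy_dinode {
	uint32_t db[NDADDR];
};

/*
 * Build the image that is safe to write right now: any pointer whose data
 * block is still only in memory is written as 0, so a crash leaves a hole
 * rather than a pointer to stale contents. Returns the number of pointers
 * held back; the inode has to be written again once those are resolved.
 */
int
toy_inode_prepare_write(const struct toy_inode *ip, struct toy_dinode *dp)
{
	int held = 0;

	for (int i = 0; i < NDADDR; i++) {
		if (ip->db[i] != 0 && !ip->data_written[i]) {
			dp->db[i] = 0;
			held++;
		} else {
			dp->db[i] = ip->db[i];
		}
	}
	return held;
}

/* Called when the write of the data block behind pointer `i` completes. */
void
toy_data_write_done(struct toy_inode *ip, int i)
{
	assert(i >= 0 && i < NDADDR);
	ip->data_written[i] = true;
}
```

 The cost of this approach is the bookkeeping: each allocation has to be
 tracked until its data write completes, and the inode may need a second
 write afterwards.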

 Alternatives that come to my mind are a) to have these data blocks added
 to the journal, since they are needed to preserve file system integrity;
 or b) ensure they are pushed to disk every time the journal is flushed.

 None of them strike me as particularly appealing, though. Any ideas?

 -p.

From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/47231
Date: Sat, 26 Apr 2014 05:51:32 +0000

 On Wed, Dec 25, 2013 at 11:15:01AM +0000, pedro martelletto wrote:
  >  As mentioned previously, this problem is indeed caused by metadata
  >  pointers making it to disk before the newly allocated data blocks that
  >  they point to.
  >  
  >  The issue is further aggravated by WAPBL, since there are situations
  >  where the journal is pushed to disk while regular file data is not,
  >  which means there is a higher probability that, upon log replay, the
  >  pointers in the inode will be updated to reflect an ongoing allocation
  >  at the time of the crash.
  >
  > [...]
  >
  >  Alternatives that come to my mind are a) to have these data blocks added
  >  to the journal, since they are needed to preserve file system integrity;
  >  or b) ensure they are pushed to disk every time the journal is flushed.
  >  
  >  None of them strike me as particularly appealing, though. Any ideas?

 What ext3 does is force these blocks to disk in advance of the journal
 entries that point to them. Doing this in wapbl has been looked at (by
 Joerg, iirc) with the conclusion that it would be hard.
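
 A toy model of that ordering in C, as a sketch only: it is neither
 ext3 nor wapbl code, and every name in it (toy_block, toy_journal,
 toy_journal_add, toy_journal_commit) is hypothetical. The point is
 just the invariant that the commit record is not written until every
 data block it references is on disk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAXPEND 64	/* arbitrary limit for the toy model */

/* One newly allocated data block referenced by a pending journal entry. */
struct toy_block {
	unsigned blkno;
	bool	 on_disk;
};

struct toy_journal {
	struct toy_block *pending[MAXPEND];
	size_t	 npending;
	unsigned commits;
};

/* Stand-in for a synchronous write of one data block. */
static void
toy_write_block(struct toy_block *b)
{
	b->on_disk = true;
}

/* Record that the next commit will publish a pointer to `b`. */
int
toy_journal_add(struct toy_journal *j, struct toy_block *b)
{
	if (j->npending == MAXPEND)
		return -1;
	j->pending[j->npending++] = b;
	return 0;
}

/* Ordered commit: flush referenced data first, then commit the metadata. */
void
toy_journal_commit(struct toy_journal *j)
{
	for (size_t i = 0; i < j->npending; i++)
		if (!j->pending[i]->on_disk)
			toy_write_block(j->pending[i]);

	/* Invariant: nothing the commit record points at is still unwritten. */
	for (size_t i = 0; i < j->npending; i++)
		assert(j->pending[i]->on_disk);

	j->commits++;		/* stand-in for writing the commit record */
	j->npending = 0;
}
```

 In a real file system the hard part is what this model hides: finding
 the dirty pages that belong to a pending transaction and writing them
 without deadlocking against the journal lock.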

 My conclusion is that in the long run we need a different file
 system, but I haven't had any time whatsoever to act on this. :-/

 If anyone wants to work on it, btw, I have a copy of the Harvard
 journaling ffs, which is fundamentally different from wapbl and
 (AFAIK) doesn't have this problem. However, it was originally written
 against BSD/OS a long time ago and will need a lot of merging and
 fixing to even compile.

 -- 
 David A. Holland
 dholland@netbsd.org

From: pedro martelletto <pedro@ambientworks.net>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
	netbsd-bugs@netbsd.org, jakllsch@kollasch.net
Subject: Re: kern/47231
Date: Sat, 26 Apr 2014 08:33:08 +0200

 > If anyone wants to work on it, btw, I have a copy of the Harvard
 > journaling ffs, which is fundamentally different from wapbl and
 > (AFAIK) doesn't have this problem. However, it was originally written
 > against BSD/OS a long time ago and will need a lot of merging and
 > fixing to even compile.

 I would definitely be interested in giving it a look.

 -p.

From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/47231
Date: Mon, 2 Jun 2014 01:26:52 +0000

 On Sat, Apr 26, 2014 at 06:35:01AM +0000, pedro martelletto wrote:
  >  > If anyone wants to work on it, btw, I have a copy of the Harvard
  >  > journaling ffs, which is fundamentally different from wapbl and
  >  > (AFAIK) doesn't have this problem. However, it was originally written
  >  > against BSD/OS a long time ago and will need a lot of merging and
  >  > fixing to even compile.
  >  
  >  I would definitely be interested in giving it a look.

 Was it you I already gave a copy to? (I think so but I lose track
 easily...)

 If so, email me off-gnats.

 (If not, email me off-gnats anyway...)

 -- 
 David A. Holland
 dholland@netbsd.org

Responsible-Changed-From-To: kern-bug-people->jdolecek
Responsible-Changed-By: jdolecek@NetBSD.org
Responsible-Changed-When: Thu, 23 Jul 2020 19:48:01 +0000
Responsible-Changed-Why:
This is one of the blockers for making WAPBL the default, and I'd like to
eventually look at it. I don't have a timeframe, so feel free to take over
if you want to work on this.


>Unformatted:

Copyright © 1994-2020 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.