NetBSD Problem Report #44568
From Manuel.Bouyer@lip6.fr Mon Feb 14 16:12:42 2011
Return-Path: <Manuel.Bouyer@lip6.fr>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
by www.NetBSD.org (Postfix) with ESMTP id ED11563B100
for <gnats-bugs@gnats.NetBSD.org>; Mon, 14 Feb 2011 16:12:41 +0000 (UTC)
Message-Id: <20110214161237.D429334C29@armandeche.soc.lip6.fr>
Date: Mon, 14 Feb 2011 17:12:37 +0100 (MET)
From: Manuel.Bouyer@lip6.fr
Reply-To: Manuel.Bouyer@lip6.fr
To: gnats-bugs@gnats.NetBSD.org
Subject: WAPBL doens't play nice with snapshots
X-Send-Pr-Version: 3.95
>Number: 44568
>Category: kern
>Synopsis: WAPBL doens't play nice with snapshots
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: hannken
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Mon Feb 14 16:15:00 +0000 2011
>Closed-Date: Tue Feb 22 08:43:49 +0000 2011
>Last-Modified: Tue Feb 22 08:43:49 +0000 2011
>Originator: Manuel Bouyer
>Release: NetBSD 5.99.45
>Organization:
>Environment:
System: NetBSD java 5.99.45 NetBSD 5.99.45 (GENERIC) #0: Thu Feb 10 05:03:13 UTC 2011 builds@b7.netbsd.org:/home/builds/ab/HEAD/amd64/201102100300Z-obj/home/builds/ab/HEAD/src/sys/arch/amd64/compile/GENERIC amd64
Architecture: x86_64
Machine: x86_64
>Description:
Taking a persistent snapshot of a 500GB WAPBL-enabled ffs filesystem
panics with:
panic: wapbl_flush: current transaction too big to flush.
I've seen different stack traces, but ffs_sync() is in all of them,
called either from the VOP_FSYNC() call in ffs_snapshot() or from
sched_fsync().
>How-To-Repeat:
Assuming /home is a 500GB ffs filesystem mounted rw,log:
fssconfig fss0 /home /home/snap
>Fix:
WAPBL transactions need to be split. The patch below makes
things better for me, but it still panics on the second snapshot.
There are other suspicious places, like snapshot_expunge(), which seems
to do a lot of work inside a single transaction (maybe
we could start/end the transaction inside the loop instead?).
I also get the same panic, from ufs_inactive(), when removing
the snapshot file.
Index: sys/ufs/ffs/ffs_snapshot.c
===================================================================
RCS file: /cvsroot/src/sys/ufs/ffs/ffs_snapshot.c,v
retrieving revision 1.102.4.2
diff -u -p -u -r1.102.4.2 ffs_snapshot.c
--- sys/ufs/ffs/ffs_snapshot.c 12 Feb 2011 21:48:09 -0000 1.102.4.2
+++ sys/ufs/ffs/ffs_snapshot.c 14 Feb 2011 16:01:27 -0000
@@ -489,6 +489,12 @@ snapshot_setup(struct mount *mp, struct
if (error)
goto out;
bawrite(nbp);
+ if ((loc % 16) == 0) {
+ UFS_WAPBL_END(mp);
+ error = UFS_WAPBL_BEGIN(mp);
+ if (error)
+ return error;
+ }
}
out:
@@ -825,6 +831,12 @@ snapshot_writefs(struct mount *mp, struc
memcpy(bp->b_data, space, fs->fs_bsize);
space = (char *)space + fs->fs_bsize;
bawrite(bp);
+ if (((loc + 1) % 16) == 0) {
+ UFS_WAPBL_END(mp);
+ error = UFS_WAPBL_BEGIN(mp);
+ if (error)
+ return error;
+ }
}
if (error)
goto out;
@@ -892,6 +904,12 @@ cgaccount(struct vnode *vp, int passno,
bawrite(nbp);
if (error)
break;
+ if (((cg + 1) % 16) == 0) {
+ UFS_WAPBL_END(vp->v_mount);
+ error = UFS_WAPBL_BEGIN(vp->v_mount);
+ if (error)
+ return error;
+ }
}
UFS_WAPBL_END(vp->v_mount);
return error;
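The approach in the patch above, ending and restarting the WAPBL transaction every 16 iterations so that no single transaction collects an unbounded number of logged blocks, can be modelled in user space. This is a hedged sketch, not kernel code: `struct journal`, `txn_begin()`, `txn_end()` and `copy_blocks()` are hypothetical stand-ins for UFS_WAPBL_BEGIN()/UFS_WAPBL_END() and the block writes, and the capacity test only imitates the condition that produces the "current transaction too big to flush" panic.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical journal model: counts the bytes logged by the open
 * transaction and records whether any commit exceeded the capacity. */
struct journal {
	size_t capacity;	/* bytes the log can hold (assumed value) */
	size_t pending;		/* bytes logged by the open transaction */
	size_t max_commit;	/* largest transaction committed so far */
	int overflowed;		/* set where the kernel would panic instead */
};

static void
txn_begin(struct journal *j)
{
	j->pending = 0;
}

static void
txn_end(struct journal *j)
{
	if (j->pending > j->capacity)
		j->overflowed = 1;
	if (j->pending > j->max_commit)
		j->max_commit = j->pending;
	j->pending = 0;
}

/*
 * Log nblocks blocks of bsize bytes each.  With batch > 0 the
 * transaction is restarted after every batch blocks, mirroring the
 * "((cg + 1) % 16) == 0" test in the patch; batch == 0 keeps
 * everything in a single transaction.
 */
static void
copy_blocks(struct journal *j, int nblocks, size_t bsize, int batch)
{
	assert(bsize > 0);
	txn_begin(j);
	for (int i = 0; i < nblocks; i++) {
		j->pending += bsize;
		if (batch > 0 && ((i + 1) % batch) == 0) {
			txn_end(j);
			txn_begin(j);
		}
	}
	txn_end(j);
}
```

As an illustration, with the 2556 cylinder groups and 16384-byte blocks reported by dumpfs later in this PR and an assumed 16 MiB capacity, `copy_blocks(&j, 2556, 16384, 0)` commits 41877504 bytes at once and sets `overflowed`, while a batch of 16 keeps every commit at or below 262144 bytes. The kernel's actual limits depend on the log size and WAPBL's internal tunables, so these numbers only show the shape of the problem.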
>Release-Note:
>Audit-Trail:
From: Juergen Hannken-Illjes <hannken@eis.cs.tu-bs.de>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/44568: WAPBL doens't play nice with snapshots
Date: Mon, 14 Feb 2011 17:32:49 +0100
Manuel,
please append the output of: dumpfs /home | awk '/^file/, /^volname/'
--
Juergen Hannken-Illjes - hannken@eis.cs.tu-bs.de - TU Braunschweig (Germany)
From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/44568: WAPBL doens't play nice with snapshots
Date: Mon, 14 Feb 2011 18:06:29 +0100
On Mon, Feb 14, 2011 at 04:35:10PM +0000, Juergen Hannken-Illjes wrote:
> please append the output of: dumpfs /home | awk '/^file/, /^volname/'
Sure, here it is.
--
Manuel Bouyer <bouyer@antioche.eu.org>
NetBSD: 26 years of experience will always make the difference
--
Attachment: dumpfs.out
file system: /dev/rwd0e
format FFSv1
endian little-endian
magic 11954 time Mon Feb 14 18:03:22 2011
superblock location 8192 id [ 4d53e7ad 7189d21c ]
cylgrp dynamic inodes 4.4BSD sblock FFSv2 fslevel 4
nbfree 29644201 ndir 8 nifree 59530263 nffree 34
ncg 2556 size 241108796 blocks 237346336
bsize 16384 shift 14 mask 0xffffc000
fsize 2048 shift 11 mask 0xfffff800
frag 8 shift 3 fsbtodb 2
bpg 11794 fpg 94352 ipg 23296
minfree 5% optim time maxcontig 4 maxbpg 4096
symlinklen 60 contigsumsize 4
maxfilesize 0x000400400402ffff
nindir 4096 inopb 128
avgfilesize 16384 avgfpdir 64
sblkno 8 cblkno 16 iblkno 24 dblkno 1480
sbsize 2048 cgsize 16384
csaddr 1480 cssize 40960
cgrotor 0 fmod 0 ronly 0 clean 0x02
wapbl version 0x1 location 2 flags 0x0
wapbl loc0 482333376 loc1 131072 loc2 512 loc3 3
flags wapbl
fsmnt /home
volname swuid 0
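Rough arithmetic on the dumpfs output above helps put the panic in perspective. Assuming the four `wapbl loc` values are, in order, the journal's start address, its length in blocks, its block size and its inode number (the UFS_WAPBL_INFS_* order in NetBSD's ufs_wapbl.h, which should be double-checked), the journal holds 131072 * 512 bytes = 64 MiB, while logging one 16384-byte (cgsize) block for each of the 2556 cylinder groups needs 41877504 bytes, close to 40 MiB. The helpers below are hypothetical and say nothing about which internal limit actually tripped.

```c
#include <stdint.h>

/* Hypothetical helper: journal size in bytes, from the block count
 * (loc1) and block size (loc2) printed by dumpfs. */
static uint64_t
journal_bytes(uint64_t count, uint64_t blksz)
{
	return count * blksz;
}

/* Hypothetical helper: bytes needed to log one cylinder-group block
 * of cgsize bytes for each of ncg cylinder groups. */
static uint64_t
cg_copy_bytes(uint64_t ncg, uint64_t cgsize)
{
	return ncg * cgsize;
}
```

With the values from this PR, `journal_bytes(131072, 512)` gives 67108864 and `cg_copy_bytes(2556, 16384)` gives 41877504.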
Responsible-Changed-From-To: kern-bug-people->hannken
Responsible-Changed-By: hannken@NetBSD.org
Responsible-Changed-When: Wed, 16 Feb 2011 10:16:31 +0000
Responsible-Changed-Why:
I'm working on a fix.
From: "Juergen Hannken-Illjes" <hannken@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/44568 CVS commit: src/sys/kern
Date: Wed, 16 Feb 2011 19:43:06 +0000
Module Name: src
Committed By: hannken
Date: Wed Feb 16 19:43:06 UTC 2011
Modified Files:
src/sys/kern: vfs_wapbl.c
Log Message:
Set the limit for deallocations in one transaction to a more realistic
(and much lower) value. When flushing the log these deallocations will
produce new blocks, and that may exceed the journal size, resulting in
a "wapbl_flush: current transaction too big to flush" panic.
Seen when removing a large snapshot.
Addresses PR #44568 (WAPBL doens't play nice with snapshots).
To generate a diff of this commit:
cvs rdiff -u -r1.40 -r1.41 src/sys/kern/vfs_wapbl.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Juergen Hannken-Illjes" <hannken@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/44568 CVS commit: src/sys/ufs/ffs
Date: Wed, 16 Feb 2011 19:43:50 +0000
Module Name: src
Committed By: hannken
Date: Wed Feb 16 19:43:50 UTC 2011
Modified Files:
src/sys/ufs/ffs: ffs_snapshot.c
Log Message:
Refine the scope of WAPBL transactions so we should no longer get
a "wapbl_flush: current transaction too big to flush" panic when
creating or removing snapshots on larger logging disks.
Addresses PR #44568 (WAPBL doens't play nice with snapshots).
To generate a diff of this commit:
cvs rdiff -u -r1.102 -r1.103 src/sys/ufs/ffs/ffs_snapshot.c
From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@NetBSD.org
Cc: hannken@NetBSD.org, kern-bug-people@NetBSD.org, netbsd-bugs@NetBSD.org,
gnats-admin@NetBSD.org
Subject: Re: kern/44568 (WAPBL doens't play nice with snapshots)
Date: Thu, 17 Feb 2011 17:49:13 +0100
On Wed, Feb 16, 2011 at 10:16:32AM +0000, hannken@NetBSD.org wrote:
> I'm working on a fix.
thanks for your fixes, I couldn't make my test system panic any more.
But there's a strange thing:
I've been running
fssconfig -u fss0; rm /home/snaps/snap0; fssconfig fss0 /home /home/snaps/snap0
in a loop while bonnie++ was running in another loop, and I see:
/home: suspended 80.452 sec, redo 1217 of 2556
/home: suspended 226.932 sec, redo 1177 of 2556
/home: suspended 350.192 sec, redo 1206 of 2556
/home: suspended 477.433 sec, redo 1205 of 2556
/home: suspended 635.357 sec, redo 1208 of 2556
/home: suspended 813.420 sec, redo 1201 of 2556
Any idea why it takes more and more time to take the snapshot?
--
Manuel Bouyer <bouyer@antioche.eu.org>
--
From: Manuel Bouyer <bouyer@antioche.eu.org>
To: hannken@NetBSD.org, netbsd-bugs@NetBSD.org, gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/44568 (WAPBL doens't play nice with snapshots)
Date: Thu, 17 Feb 2011 18:06:22 +0100
On Thu, Feb 17, 2011 at 04:50:05PM +0000, Manuel Bouyer wrote:
> thanks for your fixes, I couldn't make my test system panic any more.
Well, I could. At reboot time I got the panic "transaction too big to flush"
again:
wapbl_flush
wapbl_begin
ufs_inactive
VOP_INACTIVE
vrelel
ffs_snapshot_unmount
ffs_flushfiles
ffs_unmount
--
Manuel Bouyer <bouyer@antioche.eu.org>
--
From: Juergen Hannken-Illjes <hannken@eis.cs.tu-bs.de>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/44568 (WAPBL doens't play nice with snapshots)
Date: Thu, 17 Feb 2011 19:01:01 +0100
On Thu, Feb 17, 2011 at 05:50:05PM +0000, Manuel Bouyer wrote:
> On Thu, Feb 17, 2011 at 04:50:05PM +0000, Manuel Bouyer wrote:
> > thanks for your fixes, I couldn't make my test system panic any more.
>
> Well, I could. At reboot time I got the panic "transaction too big to flush"
> again:
> wapbl_flush
> wapbl_begin
> ufs_inactive
> VOP_INACTIVE
> vrelel
> ffs_snapshot_unmount
> ffs_flushfiles
> ffs_unmount
Please try it with a smaller wl_dealloclim in file vfs_wapbl.c line 482:
- wl->wl_dealloclim = wl->wl_bufbytes_max / mp->mnt_stat.f_bsize / 2;
+ wl->wl_dealloclim = wl->wl_bufbytes_max / mp->mnt_stat.f_bsize / 4;
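Evaluating the expression from the diff above as plain integer arithmetic shows what the suggested change does. wl_bufbytes_max is not given anywhere in this PR, so the 4 MiB (4194304 bytes) used here is a purely hypothetical value; 16384 is the block size from the dumpfs output. `dealloc_limit()` is an illustrative helper, not a kernel function.

```c
#include <stdint.h>

/*
 * Hypothetical helper mirroring
 *	wl_dealloclim = wl_bufbytes_max / f_bsize / divisor;
 * C integer division truncates at each step, as in the kernel line.
 */
static uint32_t
dealloc_limit(uint32_t bufbytes_max, uint32_t f_bsize, uint32_t divisor)
{
	return bufbytes_max / f_bsize / divisor;
}
```

With those inputs `dealloc_limit(4194304, 16384, 2)` is 128 and `dealloc_limit(4194304, 16384, 4)` is 64: going from `/ 2` to `/ 4` halves the number of block deallocations that may accumulate in one transaction and so, following the reasoning in the Feb 16 commit message, reduces the extra blocks the eventual flush has to write.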
--
Juergen Hannken-Illjes - hannken@eis.cs.tu-bs.de - TU Braunschweig (Germany)
From: "Juergen Hannken-Illjes" <hannken@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/44568 CVS commit: src/sys/ufs/ffs
Date: Fri, 18 Feb 2011 08:39:13 +0000
Module Name: src
Committed By: hannken
Date: Fri Feb 18 08:39:13 UTC 2011
Modified Files:
src/sys/ufs/ffs: ffs_snapshot.c
Log Message:
Revert rev. 1.101. Dead snapshots would hang around until unmount.
Addresses PR #44568 (WAPBL doens't play nice with snapshots).
To generate a diff of this commit:
cvs rdiff -u -r1.103 -r1.104 src/sys/ufs/ffs/ffs_snapshot.c
From: Juergen Hannken-Illjes <hannken@eis.cs.tu-bs.de>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/44568: WAPBL doens't play nice with snapshots
Date: Mon, 21 Feb 2011 12:16:43 +0100
Manuel,
is there still an open problem regarding this PR or is it ok to close it?
--
Juergen Hannken-Illjes - hannken@eis.cs.tu-bs.de - TU Braunschweig (Germany)
From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@NetBSD.org
Cc: hannken@NetBSD.org, gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org
Subject: Re: kern/44568: WAPBL doens't play nice with snapshots
Date: Mon, 21 Feb 2011 23:47:16 +0100
On Mon, Feb 21, 2011 at 11:20:06AM +0000, Juergen Hannken-Illjes wrote:
> is there still an open problem regarding this PR or is it ok to close it?
None that I know of in HEAD. Do some of these fixes need to be pulled up to
netbsd-5?
Thanks!
--
Manuel Bouyer <bouyer@antioche.eu.org>
--
State-Changed-From-To: open->closed
State-Changed-By: hannken@NetBSD.org
State-Changed-When: Tue, 22 Feb 2011 08:43:49 +0000
State-Changed-Why:
Fixed in HEAD.
May be pulled up to netbsd-5 later.
>Unformatted:
Copyright © 1994-2007
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.