NetBSD Problem Report #44568

From Manuel.Bouyer@lip6.fr  Mon Feb 14 16:12:42 2011
Return-Path: <Manuel.Bouyer@lip6.fr>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
	by www.NetBSD.org (Postfix) with ESMTP id ED11563B100
	for <gnats-bugs@gnats.NetBSD.org>; Mon, 14 Feb 2011 16:12:41 +0000 (UTC)
Message-Id: <20110214161237.D429334C29@armandeche.soc.lip6.fr>
Date: Mon, 14 Feb 2011 17:12:37 +0100 (MET)
From: Manuel.Bouyer@lip6.fr
Reply-To: Manuel.Bouyer@lip6.fr
To: gnats-bugs@gnats.NetBSD.org
Subject: WAPBL doesn't play nice with snapshots
X-Send-Pr-Version: 3.95

>Number:         44568
>Category:       kern
>Synopsis:       WAPBL doesn't play nice with snapshots
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    hannken
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Feb 14 16:15:00 +0000 2011
>Closed-Date:    Tue Feb 22 08:43:49 +0000 2011
>Last-Modified:  Tue Feb 22 08:43:49 +0000 2011
>Originator:     Manuel Bouyer
>Release:        NetBSD 5.99.45
>Organization:
>Environment:
System: NetBSD java 5.99.45 NetBSD 5.99.45 (GENERIC) #0: Thu Feb 10 05:03:13 UTC 2011  builds@b7.netbsd.org:/home/builds/ab/HEAD/amd64/201102100300Z-obj/home/builds/ab/HEAD/src/sys/arch/amd64/compile/GENERIC amd64
Architecture: x86_64
Machine: x86_64
>Description:
	Taking a persistent snapshot of a 500GB WAPBL-enabled ffs filesystem
	panics with:
panic: wapbl_flush: current transaction too big to flush.
I've seen different stack traces, but ffs_sync() is in all of them,
called either from the VOP_FSYNC() call in ffs_snapshot() or from
sched_fsync().


>How-To-Repeat:
	Assuming /home is a 500GB ffs filesystem mounted rw,log:
	fssconfig fss0 /home /home/snap
>Fix:
	WAPBL transactions need to be split. The patch below makes
	things better for me, but it still panics on the second snapshot.
	There are other suspicious places, like snapshot_expunge(), which seems
	to do a lot of work inside a single transaction (maybe
	we could start/end the transaction inside the loop instead?).
	I also get the same panic when rm'ing the snapshot file,
	from ufs_inactive().

Index: sys/ufs/ffs/ffs_snapshot.c
===================================================================
RCS file: /cvsroot/src/sys/ufs/ffs/ffs_snapshot.c,v
retrieving revision 1.102.4.2
diff -u -p -u -r1.102.4.2 ffs_snapshot.c
--- sys/ufs/ffs/ffs_snapshot.c	12 Feb 2011 21:48:09 -0000	1.102.4.2
+++ sys/ufs/ffs/ffs_snapshot.c	14 Feb 2011 16:01:27 -0000
@@ -489,6 +489,12 @@ snapshot_setup(struct mount *mp, struct 
 		if (error)
 			goto out;
 		bawrite(nbp);
+		if ((loc % 16) == 0) {
+			UFS_WAPBL_END(mp);
+			error = UFS_WAPBL_BEGIN(mp);
+			if (error)
+				return error;
+		}
 	}

 out:
@@ -825,6 +831,12 @@ snapshot_writefs(struct mount *mp, struc
 		memcpy(bp->b_data, space, fs->fs_bsize);
 		space = (char *)space + fs->fs_bsize;
 		bawrite(bp);
+		if (((loc + 1) % 16) == 0) {
+			UFS_WAPBL_END(mp);
+			error = UFS_WAPBL_BEGIN(mp);
+			if (error)
+				return error;
+		}
 	}
 	if (error)
 		goto out;
@@ -892,6 +904,12 @@ cgaccount(struct vnode *vp, int passno, 
 		bawrite(nbp);
 		if (error)
 			break;
+		if (((cg + 1) % 16) == 0) {
+			UFS_WAPBL_END(vp->v_mount);
+			error = UFS_WAPBL_BEGIN(vp->v_mount);
+			if (error)
+				return error;
+		}
 	}
 	UFS_WAPBL_END(vp->v_mount);
 	return error;

>Release-Note:

>Audit-Trail:
From: Juergen Hannken-Illjes <hannken@eis.cs.tu-bs.de>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/44568: WAPBL doesn't play nice with snapshots
Date: Mon, 14 Feb 2011 17:32:49 +0100

 Manuel,
 please append the output of: dumpfs /home | awk '/^file/, /^volname/'
 -- 
 Juergen Hannken-Illjes - hannken@eis.cs.tu-bs.de - TU Braunschweig (Germany)

From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/44568: WAPBL doesn't play nice with snapshots
Date: Mon, 14 Feb 2011 18:06:29 +0100

 --/04w6evG8XlLl3ft
 Content-Type: text/plain; charset=us-ascii
 Content-Disposition: inline

 On Mon, Feb 14, 2011 at 04:35:10PM +0000, Juergen Hannken-Illjes wrote:
 > The following reply was made to PR kern/44568; it has been noted by GNATS.
 > 
 > From: Juergen Hannken-Illjes <hannken@eis.cs.tu-bs.de>
 > To: gnats-bugs@netbsd.org
 > Cc: 
 > Subject: Re: kern/44568: WAPBL doesn't play nice with snapshots
 > Date: Mon, 14 Feb 2011 17:32:49 +0100
 > 
 >  Manuel,
 >  please append the output of: dumpfs /home | awk '/^file/, /^volname/'

 Sure, here it is

 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 ans d'experience feront toujours la difference
 --

 --/04w6evG8XlLl3ft
 Content-Type: text/plain; charset=us-ascii
 Content-Disposition: attachment; filename="dumpfs.out"

 file system: /dev/rwd0e
 format	FFSv1
 endian	little-endian
 magic	11954   	time	Mon Feb 14 18:03:22 2011
 superblock location	8192	id	[ 4d53e7ad 7189d21c ]
 cylgrp	dynamic	inodes	4.4BSD	sblock	FFSv2	fslevel 4
 nbfree	29644201	ndir	8	nifree	59530263	nffree	34
 ncg	2556	size	241108796	blocks	237346336
 bsize	16384	shift	14	mask	0xffffc000
 fsize	2048	shift	11	mask	0xfffff800
 frag	8	shift	3	fsbtodb	2
 bpg	11794	fpg	94352	ipg	23296
 minfree	5%	optim	time	maxcontig 4	maxbpg	4096
 symlinklen 60	contigsumsize 4
 maxfilesize 0x000400400402ffff
 nindir	4096	inopb	128
 avgfilesize 16384	avgfpdir 64
 sblkno	8	cblkno	16	iblkno	24	dblkno	1480
 sbsize	2048	cgsize	16384
 csaddr	1480	cssize	40960
 cgrotor	0	fmod	0	ronly	0	clean	0x02
 wapbl version 0x1	location 2	flags 0x0
 wapbl loc0 482333376	loc1 131072	loc2 512	loc3 3
 flags	wapbl 
 fsmnt	/home
 volname		swuid	0

 --/04w6evG8XlLl3ft--

Responsible-Changed-From-To: kern-bug-people->hannken
Responsible-Changed-By: hannken@NetBSD.org
Responsible-Changed-When: Wed, 16 Feb 2011 10:16:31 +0000
Responsible-Changed-Why:
I'm working on a fix.


From: "Juergen Hannken-Illjes" <hannken@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/44568 CVS commit: src/sys/kern
Date: Wed, 16 Feb 2011 19:43:06 +0000

 Module Name:	src
 Committed By:	hannken
 Date:		Wed Feb 16 19:43:06 UTC 2011

 Modified Files:
 	src/sys/kern: vfs_wapbl.c

 Log Message:
 Set the limit for deallocations in one transaction to a more realistic
 (and much lower) value.  When flushing the log these deallocations will
 produce new blocks and that may exceed the journal size resulting in
 a "wapbl_flush: current transaction too big to flush" panic.
 Seen when removing a large snapshot.

 Addresses PR #44568 (WAPBL doesn't play nice with snapshots).


 To generate a diff of this commit:
 cvs rdiff -u -r1.40 -r1.41 src/sys/kern/vfs_wapbl.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Juergen Hannken-Illjes" <hannken@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/44568 CVS commit: src/sys/ufs/ffs
Date: Wed, 16 Feb 2011 19:43:50 +0000

 Module Name:	src
 Committed By:	hannken
 Date:		Wed Feb 16 19:43:50 UTC 2011

 Modified Files:
 	src/sys/ufs/ffs: ffs_snapshot.c

 Log Message:
 Refine the scope of WAPBL transactions so we should no longer get
 a "wapbl_flush: current transaction too big to flush" panic when
 creating or removing snapshots on larger logging disks.

 Addresses PR #44568 (WAPBL doesn't play nice with snapshots).


 To generate a diff of this commit:
 cvs rdiff -u -r1.102 -r1.103 src/sys/ufs/ffs/ffs_snapshot.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@NetBSD.org
Cc: hannken@NetBSD.org, kern-bug-people@NetBSD.org, netbsd-bugs@NetBSD.org,
        gnats-admin@NetBSD.org
Subject: Re: kern/44568 (WAPBL doesn't play nice with snapshots)
Date: Thu, 17 Feb 2011 17:49:13 +0100

 On Wed, Feb 16, 2011 at 10:16:32AM +0000, hannken@NetBSD.org wrote:
 > Synopsis: WAPBL doesn't play nice with snapshots
 > 
 > Responsible-Changed-From-To: kern-bug-people->hannken
 > Responsible-Changed-By: hannken@NetBSD.org
 > Responsible-Changed-When: Wed, 16 Feb 2011 10:16:31 +0000
 > Responsible-Changed-Why:
 > I'm working on a fix.

 thanks for your fixes, I couldn't make my test system panic any more.
 But there's a strange thing:
 I've been doing a
 fssconfig -u fss0; rm /home/snaps/snap0; fssconfig fss0 /home /home/snaps/snap0
 in a loop, while a bonnie++ was running in another loop. And I see:
 /home: suspended 80.452 sec, redo 1217 of 2556
 /home: suspended 226.932 sec, redo 1177 of 2556
 /home: suspended 350.192 sec, redo 1206 of 2556
 /home: suspended 477.433 sec, redo 1205 of 2556
 /home: suspended 635.357 sec, redo 1208 of 2556
 /home: suspended 813.420 sec, redo 1201 of 2556

 Any idea why it takes more and more time to take the snapshot?

 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 ans d'experience feront toujours la difference
 --

From: Manuel Bouyer <bouyer@antioche.eu.org>
To: hannken@NetBSD.org, netbsd-bugs@NetBSD.org, gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/44568 (WAPBL doesn't play nice with snapshots)
Date: Thu, 17 Feb 2011 18:06:22 +0100

 On Thu, Feb 17, 2011 at 04:50:05PM +0000, Manuel Bouyer wrote:
 >  thanks for your fixes, I couldn't make my test system panic any more.

 Well, I could. At reboot time I got the panic "transaction too big to flush"
 again:
 wapbl_flush
 wapbl_begin
 ufs_inactive
 VOP_INACTIVE
 vrelel
 ffs_snapshot_unmount
 ffs_flushfiles
 ffs_unmount

 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 ans d'experience feront toujours la difference
 --

From: Juergen Hannken-Illjes <hannken@eis.cs.tu-bs.de>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/44568 (WAPBL doesn't play nice with snapshots)
Date: Thu, 17 Feb 2011 19:01:01 +0100

 On Thu, Feb 17, 2011 at 05:50:05PM +0000, Manuel Bouyer wrote:
 > The following reply was made to PR kern/44568; it has been noted by GNATS.
 > 
 > From: Manuel Bouyer <bouyer@antioche.eu.org>
 > To: hannken@NetBSD.org, netbsd-bugs@NetBSD.org, gnats-bugs@NetBSD.org
 > Cc: 
 > Subject: Re: kern/44568 (WAPBL doesn't play nice with snapshots)
 > Date: Thu, 17 Feb 2011 18:06:22 +0100
 > 
 >  On Thu, Feb 17, 2011 at 04:50:05PM +0000, Manuel Bouyer wrote:
 >  >  thanks for your fixes, I couldn't make my test system panic any more.
 >  
 >  Well, I could. At reboot time I got the panic "transaction too big to flush"
 >  again:
 >  wapbl_flush
 >  wapbl_begin
 >  ufs_inactive
 >  VOP_INACTIVE
 >  vrelel
 >  ffs_snapshot_unmount
 >  ffs_flushfiles
 >  ffs_unmount

 Please try it with a smaller wl_dealloclim in file vfs_wapbl.c line 482:

 -	wl->wl_dealloclim = wl->wl_bufbytes_max / mp->mnt_stat.f_bsize / 2;
 +	wl->wl_dealloclim = wl->wl_bufbytes_max / mp->mnt_stat.f_bsize / 4;

 -- 
 Juergen Hannken-Illjes - hannken@eis.cs.tu-bs.de - TU Braunschweig (Germany)

From: "Juergen Hannken-Illjes" <hannken@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/44568 CVS commit: src/sys/ufs/ffs
Date: Fri, 18 Feb 2011 08:39:13 +0000

 Module Name:	src
 Committed By:	hannken
 Date:		Fri Feb 18 08:39:13 UTC 2011

 Modified Files:
 	src/sys/ufs/ffs: ffs_snapshot.c

 Log Message:
 Revert rev. 1.101.  Dead snapshots would hang around until unmount.

 Addresses PR #44568 (WAPBL doesn't play nice with snapshots).


 To generate a diff of this commit:
 cvs rdiff -u -r1.103 -r1.104 src/sys/ufs/ffs/ffs_snapshot.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: Juergen Hannken-Illjes <hannken@eis.cs.tu-bs.de>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/44568: WAPBL doesn't play nice with snapshots
Date: Mon, 21 Feb 2011 12:16:43 +0100

 Manuel,

 is there still an open problem regarding this PR or is it ok to close it?

 -- 
 Juergen Hannken-Illjes - hannken@eis.cs.tu-bs.de - TU Braunschweig (Germany)

From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@NetBSD.org
Cc: hannken@NetBSD.org, gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org
Subject: Re: kern/44568: WAPBL doesn't play nice with snapshots
Date: Mon, 21 Feb 2011 23:47:16 +0100

 On Mon, Feb 21, 2011 at 11:20:06AM +0000, Juergen Hannken-Illjes wrote:
 > The following reply was made to PR kern/44568; it has been noted by GNATS.
 > 
 > From: Juergen Hannken-Illjes <hannken@eis.cs.tu-bs.de>
 > To: gnats-bugs@netbsd.org
 > Cc: 
 > Subject: Re: kern/44568: WAPBL doesn't play nice with snapshots
 > Date: Mon, 21 Feb 2011 12:16:43 +0100
 > 
 >  Manuel,
 >  
 >  is there still an open problem regarding this PR or is it ok to close it?

 None that I know of in HEAD. Do some of these fixes need to be pulled up to
 netbsd-5?

 thanks !

 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 ans d'experience feront toujours la difference
 --

State-Changed-From-To: open->closed
State-Changed-By: hannken@NetBSD.org
State-Changed-When: Tue, 22 Feb 2011 08:43:49 +0000
State-Changed-Why:
Fixed on head.
Maybe pullup later to netbsd-5.


>Unformatted:
