NetBSD Problem Report #54504

From kardel@pip.kardel.name  Thu Aug 29 08:57:00 2019
Return-Path: <kardel@pip.kardel.name>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 0FDFE7A18A
	for <gnats-bugs@gnats.NetBSD.org>; Thu, 29 Aug 2019 08:57:00 +0000 (UTC)
Message-Id: <20190829085650.D0E34DA0D90@pip.kardel.name>
Date: Thu, 29 Aug 2019 10:56:50 +0200 (CEST)
From: kardel@netbsd.org
Reply-To: kardel@netbsd.org
To: gnats-bugs@NetBSD.org
Subject: -9/-current WAPBL panic: current transaction too big to flush
X-Send-Pr-Version: 3.95

>Number:         54504
>Category:       kern
>Synopsis:       panic: kernel diagnostic assertion "(wapbl_transaction_len(wl) <= (wl->wl_circ_size - wl->wl_reserved_bytes))" failed: file ".../src/sys/kern/vfs_wapbl.c", line 1271 wapbl_end: current transaction too big to flush
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    jdolecek
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Aug 29 09:00:00 +0000 2019
>Last-Modified:  Sat Aug 29 17:28:28 +0000 2020
>Originator:     Frank Kardel
>Release:        NetBSD 9.99.6
>Organization:

>Environment:


System: NetBSD pip.kardel.name 9.99.6 NetBSD 9.99.6 (PIPGEN) #20: Sat Aug 10 12:14:12 CEST 2019 kardel@:.../src/obj.amd64/sys/arch/amd64/compile/PIPGEN amd64
Architecture: x86_64
Machine: amd64
>Description:
	WAPBL panics when attempting to delete big files as the transaction log size is insufficient.
	On reboot the subsequent mount will also panic when attempting the re-play the log.
>How-To-Repeat:
	Create a big file like 190000000000 bytes and rm/unlink this file in a ffs file system mounted with -o log.
        watch the panic and subsequent panics when attempting to mount the file system again with -o log.
	The mount panic stack trace looks like this:
#0  0xffffffff80222bd5 in cpu_reboot ()
#1  0xffffffff80a36b16 in vpanic ()
#2  0xffffffff80e759eb in kern_assert ()
#3  0xffffffff80aa2326 in wapbl_end ()
#4  0xffffffff80983ddd in ufs_truncate_retry ()
#5  0xffffffff8098406b in ufs_inactive ()
#6  0xffffffff80aa91a1 in VOP_INACTIVE ()
#7  0xffffffff80a9ae5e in vrelel ()
#8  0xffffffff80950ede in ffs_wapbl_replay_finish ()
#9  0xffffffff80951297 in ffs_wapbl_start ()
#10 0xffffffff8094cf71 in ffs_mountfs ()
#11 0xffffffff8094d859 in ffs_mount ()
#12 0xffffffff80a8ff43 in VFS_MOUNT ()
#13 0xffffffff80a8cc41 in mount_domount ()
#14 0xffffffff80a93009 in do_sys_mount ()
#15 0xffffffff80a93677 in sys___mount50 ()
#16 0xffffffff80252ba8 in syscall ()
#17 0xffffffff802096dd in handle_syscall ()

>Fix:
	Workround (presumably): don't use WAPBL.
	Fix: Change WAPBL logic to allow truncation/deletion of big files aka avoid running into transaction size limitations.

>Release-Note:

>Audit-Trail:

Responsible-Changed-From-To: kern-bug-people->jdolecek
Responsible-Changed-By: jdolecek@NetBSD.org
Responsible-Changed-When: Thu, 29 Aug 2019 15:21:34 +0000
Responsible-Changed-Why:
I want to fix this.


From: Jason Thorpe <thorpej@me.com>
To: NetBSD Kernel Technical Discussion List <tech-kern@netbsd.org>
Cc: netbsd-bugs@netbsd.org,
 "gnats-bugs@netbsd.org" <gnats-bugs@NetBSD.org>
Subject: Re: kern/54504: -9/-current WAPBL panic: current transaction too big
 to flush
Date: Thu, 29 Aug 2019 08:38:00 -0700

 > On Aug 29, 2019, at 2:00 AM, kardel@netbsd.org wrote:
 >=20
 > 	Workround (presumably): don't use WAPBL.
 > 	Fix: Change WAPBL logic to allow truncation/deletion of big =
 files aka avoid running into transaction size limitations.

 WAPBL is going to need some help with this -- but in all likelihood, the =
 WAPBL-using file system is going to have to participate, as well; i.e. =
 it's not just changes to WAPBL in order to solve this problem.

 The way some other file systems that I'm familiar with deal with this =
 situation is to move the file (or the file's extents / blocks) into a =
 "purgatory" (essentially, a hidden directory within the file system) -- =
 from the perspective of the original file, the operation is complete and =
 the file system consistent.  Then, a background task iterates on items =
 in the purgatory, operating in chunks that will fit within the =
 transaction size limitations.

 This sometimes means that space isn't freed immediately, but it works =
 pretty well in principle, and you can always add an escape valve that =
 boosts the priority of the purgatory task and blocks while it does its =
 work if you are desperate for space.

 -- thorpej

From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/54504: -9/-current WAPBL panic: current transaction too big
 to flush
Date: Fri, 30 Aug 2019 05:25:10 +0000

 On Thu, Aug 29, 2019 at 09:00:00AM +0000, kardel@netbsd.org wrote:
  > WAPBL panics when attempting to delete big files as the transaction
  > log size is insufficient. On reboot the subsequent mount will also
  > panic when attempting the re-play the log.

 I thought this had been fixed, or at least bodged around, some time
 ago. Has it regressed?

 -- 
 David A. Holland
 dholland@netbsd.org

From: Frank Kardel <kardel@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/54504: -9/-current WAPBL panic: current transaction too big
 to flush
Date: Fri, 30 Aug 2019 08:35:04 +0200

 I just tripped over it this week with 9.99.6 from Sat Aug 10 12:14:12 
 CEST 2019. I am not aware that this was an issue in the past, but I may 
 have missed that episode.

 Nevertheless it seems to be there.

 I can retry the test with with a more current -current if needed but I 
 do not right away recall any commits with respect to that since August 
 10th.

 Frank


 On 08/30/19 07:30, David Holland wrote:
 > The following reply was made to PR kern/54504; it has been noted by GNATS.
 >
 > From: David Holland <dholland-bugs@netbsd.org>
 > To: gnats-bugs@netbsd.org
 > Cc:
 > Subject: Re: kern/54504: -9/-current WAPBL panic: current transaction too big
 >   to flush
 > Date: Fri, 30 Aug 2019 05:25:10 +0000
 >
 >   On Thu, Aug 29, 2019 at 09:00:00AM +0000, kardel@netbsd.org wrote:
 >    > WAPBL panics when attempting to delete big files as the transaction
 >    > log size is insufficient. On reboot the subsequent mount will also
 >    > panic when attempting the re-play the log.
 >   
 >   I thought this had been fixed, or at least bodged around, some time
 >   ago. Has it regressed?
 >   
 >   --
 >   David A. Holland
 >   dholland@netbsd.org
 >   

From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@netbsd.org
Cc: kardel@netbsd.org
Subject: Re: kern/54504: -9/-current WAPBL panic: current transaction too big
 to flush
Date: Fri, 30 Aug 2019 08:50:18 +0200

 On Fri, Aug 30, 2019 at 06:40:01AM +0000, Frank Kardel wrote:
 >  I just tripped over it this week with 9.99.6 from Sat Aug 10 12:14:12 
 >  CEST 2019. I am not aware that this was an issue in the past, but I may 
 >  have missed that episode.
 >  
 >  Nevertheless it seems to be there.
 >  
 >  I can retry the test with with a more current -current if needed but I 
 >  do not right away recall any commits with respect to that since August 
 >  10th.

 There was a similar issue fixed quite some time ago, but nothing in this
 area changed since the netbsd-9 branch.

 Martin

From: =?UTF-8?B?SmFyb23DrXIgRG9sZcSNZWs=?= <jaromir.dolecek@gmail.com>
To: "gnats-bugs@NetBSD.org" <gnats-bugs@netbsd.org>
Cc: Frank Kardel <kardel@netbsd.org>
Subject: Re: kern/54504: -9/-current WAPBL panic: current transaction too big
 to flush
Date: Fri, 30 Aug 2019 08:57:56 +0200

 Le ven. 30 ao=C3=BBt 2019 =C3=A0 08:55, Martin Husemann <martin@duskware.de=
 > a =C3=A9crit :
 >
 > The following reply was made to PR kern/54504; it has been noted by GNATS=
 .
 >
 > From: Martin Husemann <martin@duskware.de>
 > To: gnats-bugs@netbsd.org
 > Cc: kardel@netbsd.org
 > Subject: Re: kern/54504: -9/-current WAPBL panic: current transaction too=
  big
 >  to flush
 > Date: Fri, 30 Aug 2019 08:50:18 +0200
 >
 >  On Fri, Aug 30, 2019 at 06:40:01AM +0000, Frank Kardel wrote:
 >  >  I just tripped over it this week with 9.99.6 from Sat Aug 10 12:14:12
 >  >  CEST 2019. I am not aware that this was an issue in the past, but I m=
 ay
 >  >  have missed that episode.
 >  >
 >  >  Nevertheless it seems to be there.
 >  >
 >  >  I can retry the test with with a more current -current if needed but =
 I
 >  >  do not right away recall any commits with respect to that since Augus=
 t
 >  >  10th.
 >
 >  There was a similar issue fixed quite some time ago, but nothing in this
 >  area changed since the netbsd-9 branch.

 Right. I've fixed what looks like exactly this problem at that time,
 need to look at why exactly that fix doesn't work any more.

 Jaromir

From: Joerg Sonnenberger <joerg@bec.de>
To: gnats-bugs@netbsd.org
Cc: jdolecek@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
	kardel@netbsd.org
Subject: Re: kern/54504: -9/-current WAPBL panic: current transaction too big
 to flush
Date: Fri, 30 Aug 2019 15:49:54 +0200

 On Fri, Aug 30, 2019 at 05:30:02AM +0000, David Holland wrote:
 > The following reply was made to PR kern/54504; it has been noted by GNATS.
 > 
 > From: David Holland <dholland-bugs@netbsd.org>
 > To: gnats-bugs@netbsd.org
 > Cc: 
 > Subject: Re: kern/54504: -9/-current WAPBL panic: current transaction too big
 >  to flush
 > Date: Fri, 30 Aug 2019 05:25:10 +0000
 > 
 >  On Thu, Aug 29, 2019 at 09:00:00AM +0000, kardel@netbsd.org wrote:
 >   > WAPBL panics when attempting to delete big files as the transaction
 >   > log size is insufficient. On reboot the subsequent mount will also
 >   > panic when attempting the re-play the log.
 >  
 >  I thought this had been fixed, or at least bodged around, some time
 >  ago. Has it regressed?

 Not exactly. The issue we fixed last time was removal of huge sparse
 files. This case is different in that the log space necessary for
 removal is linear in the size of the filesystem.

 Joerg

From: Konrad Schroder <perseant@hhhh.org>
To: NetBSD bugtracking <gnats-bugs@netbsd.org>
Cc: 
Subject: Re: kern/54504: -9/-current WAPBL panic: current transaction too big,
 to flush
Date: Mon, 3 Aug 2020 20:22:09 -0700

 This patch successfully avoids a crash on my test setup.

 I'm not sure I like polling wapbl_check.  I don't see, though, how WAPBL 
 could deal with this problem on its own without relaxing its guarantees 
 to the point that a fsck would be required at boot anyway; and I don't 
 see how FFS could estimate the number of metadata blocks to be written 
 well enough to avoid some kind of periodic checking, without in essence 
 running ufs_truncate twice.  It's perfectly likely that I'm missing 
 something, but if not, this seems the simplest and most reasonable solution.

 Thoughts?

Index: sys/kern/vfs_wapbl.c
===================================================================
RCS file: /cvsroot/src/sys/kern/vfs_wapbl.c,v
retrieving revision 1.108
diff -u -r1.108 vfs_wapbl.c
--- sys/kern/vfs_wapbl.c	12 Apr 2020 17:02:52 -0000	1.108
+++ sys/kern/vfs_wapbl.c	4 Aug 2020 03:29:59 -0000
@@ -312,9 +312,9 @@
 	.wo_wapbl_end		= wapbl_end,
 	.wo_wapbl_junlock_assert= wapbl_junlock_assert,
 	.wo_wapbl_jlock_assert	= wapbl_jlock_assert,
-
-	/* XXX: the following is only used to say "this is a wapbl buf" */
+	/* XXX: wp_wapbl_biodone is only used to say "this is a wapbl buf" */
 	.wo_wapbl_biodone	= wapbl_biodone,
+	.wo_wapbl_check		= wapbl_check,
 };

 SYSCTL_SETUP(wapbl_sysctl_init, "wapbl sysctl")
@@ -3420,6 +3420,19 @@
 	return 0;
 }

+int
+wapbl_check(struct wapbl *wl)
+{
+	int error = 0;
+
+	if (wapbl_transaction_len(wl) >
+		(wl->wl_circ_size - wl->wl_reserved_bytes) / 2) {
+		error = EAGAIN;
+	}
+
+	return error;
+}
+
 #ifdef _KERNEL

 MODULE(MODULE_CLASS_VFS, wapbl, NULL);
Index: sys/sys/mount.h
===================================================================
RCS file: /cvsroot/src/sys/sys/mount.h,v
retrieving revision 1.236
diff -u -r1.236 mount.h
--- sys/sys/mount.h	17 Jan 2020 20:08:10 -0000	1.236
+++ sys/sys/mount.h	4 Aug 2020 03:30:08 -0000
@@ -295,6 +295,7 @@
 	void (*wo_wapbl_junlock_assert)(struct wapbl *);
 	void (*wo_wapbl_jlock_assert)(struct wapbl *);
 	void (*wo_wapbl_biodone)(struct buf *);
+	int (*wo_wapbl_check)(struct wapbl *);
 };
 #define WAPBL_DISCARD(MP)						\
     (*(MP)->mnt_wapbl_op->wo_wapbl_discard)((MP)->mnt_wapbl)
Index: sys/sys/wapbl.h
===================================================================
RCS file: /cvsroot/src/sys/sys/wapbl.h,v
retrieving revision 1.21
diff -u -r1.21 wapbl.h
--- sys/sys/wapbl.h	10 Dec 2018 21:19:33 -0000	1.21
+++ sys/sys/wapbl.h	4 Aug 2020 03:30:10 -0000
@@ -139,6 +139,9 @@
 /* End a transaction or decrement the transaction recursion level */
 void	wapbl_end(struct wapbl *);

+/* Check to see whether the transaction is approaching full */
+int	wapbl_check(struct wapbl *);
+
 /*
  * Add a new buffer to the current transaction.  The buffers
  * data will be copied to the current transaction log and the
Index: sys/ufs/ffs/ffs_inode.c
===================================================================
RCS file: /cvsroot/src/sys/ufs/ffs/ffs_inode.c,v
retrieving revision 1.130
diff -u -r1.130 ffs_inode.c
--- sys/ufs/ffs/ffs_inode.c	26 Jul 2020 00:21:24 -0000	1.130
+++ sys/ufs/ffs/ffs_inode.c	4 Aug 2020 03:30:12 -0000
@@ -495,6 +495,9 @@
 		}
 		if (lastiblock[level] >= 0)
 			goto done;
+		error = UFS_WAPBL_CHECK(oip->i_ump->um_mountp);
+		if (error)
+			goto out;
 	}

 	/*
@@ -756,6 +759,10 @@

 		BAP_ASSIGN(ip, i, 0);
 		*countp += nblocks;
+
+		error = UFS_WAPBL_CHECK(ip->i_ump->um_mountp);
+		if (error)
+			goto out;
 	}

 	/*
Index: sys/ufs/ufs/ufs_wapbl.h
===================================================================
RCS file: /cvsroot/src/sys/ufs/ufs/ufs_wapbl.h,v
retrieving revision 1.19
diff -u -r1.19 ufs_wapbl.h
--- sys/ufs/ufs/ufs_wapbl.h	11 Apr 2020 17:43:54 -0000	1.19
+++ sys/ufs/ufs/ufs_wapbl.h	4 Aug 2020 03:30:12 -0000
@@ -94,6 +94,18 @@
 	return 0;
 }

+static __inline int
+ufs_wapbl_check(struct mount *mp)
+{
+	if (mp->mnt_wapbl) {
+		int error;
+		error = wapbl_check(mp->mnt_wapbl);
+		if (error)
+			return error;
+	}
+	return 0;
+}
+
 static __inline void
 ufs_wapbl_end(struct mount *mp)
 {
@@ -104,6 +116,7 @@

 #define	UFS_WAPBL_BEGIN(mp)						\
 	ufs_wapbl_begin(mp, __func__, __LINE__)
+#define	UFS_WAPBL_CHECK(mp) ufs_wapbl_check(mp)
 #define	UFS_WAPBL_END(mp) ufs_wapbl_end(mp)

 #define	UFS_WAPBL_UPDATE(vp, access, modify, flags)			\
@@ -145,6 +158,7 @@

 #else /* ! WAPBL */
 #define	UFS_WAPBL_BEGIN(mp) (__USE(mp), 0)
+#define	UFS_WAPBL_CHECK(mp) (__USE(mp), 0)
 #define	UFS_WAPBL_END(mp)	do { } while (0)
 #define	UFS_WAPBL_UPDATE(vp, access, modify, flags)	do { } while (0)
 #define	UFS_WAPBL_JLOCK_ASSERT(mp)
 -- 
 Konrad Schroder
 perseant@hhhh.org

From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@netbsd.org
Cc: jdolecek@netbsd.org, kardel@netbsd.org
Subject: Re: kern/54504: -9/-current WAPBL panic: current transaction too
 big, to flush
Date: Tue, 4 Aug 2020 05:51:08 +0200

 Your patch did no make it through unmangled - can you put it up somewhere
 and provide a URL?

 Martin

From: Konrad Schroder <perseant@hhhh.org>
To: NetBSD bugtracking <gnats-bugs@netbsd.org>
Cc: 
Subject: Re: kern/54504: -9/-current WAPBL panic: current transaction too big,
 to flush
Date: Mon, 3 Aug 2020 22:00:19 -0700

 I've updated the problem report, and also placed a copy here:

 http://www.hhhh.org/perseant/ffs-wapbl-check.diff

 Thanks,

 -Konrad

 -- 
 Konrad Schroder
 perseant@hhhh.org

From: Konrad Schroder <perseant@hhhh.org>
To: NetBSD bugtracking <gnats-bugs@netbsd.org>
Cc: jdolecek@netbsd.org
Subject: Re: kern/54504: -9/-current WAPBL panic: current transaction too big,
 to flush
Date: Sun, 23 Aug 2020 15:41:02 -0700

 An updated patch, so that "build.sh release" succeeds (the definition of 
 wapbl_check() should be within "#idfef _KERNEL"):

 http://www.hhhh.org/perseant/ffs-wapbl-check-r2.diff

 Any objections to this before I commit?

 Thanks,

 -- 
 Konrad Schroder
 perseant@hhhh.org

From: Jaromir Dolecek <jaromir.dolecek@gmail.com>
To: gnats-bugs@netbsd.org
Cc: kardel@netbsd.org
Subject: Re: kern/54504: -9/-current WAPBL panic: current transaction too big, to flush
Date: Mon, 24 Aug 2020 08:32:04 +0200

 Let me chceck the detail first. It seemms to me the check is inherently not M=
 P safe, and thus not a hood final solution.  I would prefer if it properly b=
 ailed out if unallocation registration fails, as is done elsewhere, rather t=
 hen checking size before operation.=20

 Please dont commit this for now.  I should be again available later in the w=
 eek, I am currentlt on move so cant check.=20

 Jaromir

 > Le 24 ao=C3=BBt 2020 =C3=A0 00:45, Konrad Schroder <perseant@hhhh.org> a =C3=
 =A9crit :
 >=20
 > =EF=BB=BFThe following reply was made to PR kern/54504; it has been noted b=
 y GNATS.
 >=20
 > From: Konrad Schroder <perseant@hhhh.org>
 > To: NetBSD bugtracking <gnats-bugs@netbsd.org>
 > Cc: jdolecek@netbsd.org
 > Subject: Re: kern/54504: -9/-current WAPBL panic: current transaction too b=
 ig,
 > to flush
 > Date: Sun, 23 Aug 2020 15:41:02 -0700
 >=20
 > An updated patch, so that "build.sh release" succeeds (the definition of=20=

 > wapbl_check() should be within "#idfef _KERNEL"):
 >=20
 > http://www.hhhh.org/perseant/ffs-wapbl-check-r2.diff
 >=20
 > Any objections to this before I commit?
 >=20
 > Thanks,
 >=20
 > --=20
 > Konrad Schroder
 > perseant@hhhh.org
 >=20

From: Konrad Schroder <perseant@hhhh.org>
To: Jaromir Dolecek <jaromir.dolecek@gmail.com>
Cc: gnats-bugs@netbsd.org, kardel@netbsd.org
Subject: Re: kern/54504: -9/-current WAPBL panic: current transaction too big,
 to flush
Date: Fri, 28 Aug 2020 21:12:07 -0700

 > Let me chceck the detail first. It seemms to me the check is inherently not MP safe, and thus not a hood final solution.  I would prefer if it properly bailed out if unallocation registration fails, as is done elsewhere, rather then checking size before operation.
 >
 > Please dont commit this for now.  I should be again available later in the week, I am currentlt on move so cant check.
 >
 > Jaromir

 Okay, I won't commit until you give me the green light.  I've added the same mutex lock as is present in wapbl_begin(), and the result is at

 	http://www.hhhh.org/perseant/ffs-wapbl-check-r3.diff

 It would of course be possible to try to allocate space for all of the blocks, but managing the allocations for the other metadata (cylinder groups, in particular) sounds more difficult than it's worth.  If I understand correctly, to be conservative you'd need to allocate one CG bitmap block per data block freed, plus indirect blocks.  That's why I went for the much simpler, though admittedly crude, method of checking every once in a while (after every indirect block freed), while being very conservative about how much space is needed.

 Thanks,

 -Konrad

>Unformatted:

NetBSD Home
NetBSD PR Database Search

(Contact us) $NetBSD: query-full-pr,v 1.46 2020/01/03 16:35:01 leot Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2020 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.