NetBSD Problem Report #57307
From manu@netbsd.org Wed Mar 29 07:51:03 2023
Return-Path: <manu@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id 5599B1A9239
for <gnats-bugs@gnats.NetBSD.org>; Wed, 29 Mar 2023 07:51:03 +0000 (UTC)
Message-Id: <20230329075102.B34D184E69@mail.netbsd.org>
Date: Wed, 29 Mar 2023 07:51:02 +0000 (UTC)
From: manu@netbsd.org
Reply-To: manu@netbsd.org
To: gnats-bugs@NetBSD.org
Subject: panic: ffs_blkfree: bad size
X-Send-Pr-Version: 3.95
>Number: 57307
>Category: kern
>Synopsis: panic: ffs_blkfree: bad size
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: chs
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Wed Mar 29 07:55:00 +0000 2023
>Closed-Date: Sun May 14 00:36:03 +0000 2023
>Last-Modified: Sun May 14 00:36:03 +0000 2023
>Originator: Emmanuel Dreyfus
>Release: NetBSD 9.3
>Organization:
NetBSD
>Environment:
NetBSD 9.3 / i386, FFSv2 mounted with -o log
Architecture: i386
Machine: i386
>Description:
Taking a snapshot on an FFSv2 filesystem mounted with -o log causes a reproducible panic. After reboot, the machine panics again when mounting the filesystem, until the problem is cleared using fsck.
Backtrace and fsck output are below.
Snapshot is created with
fss_flags = FSS_UNCONFIG_ON_CLOSE|unlink_on_create
Backing store is truncate()'ed to vfs.f_blocks * vfs.f_frsize, which means the size of the partition, 14 TB.
The panic is
panic: ffs_blkfree: bad size: dev = 0xa804, bno = 1 bsize = 32768, size = 32768, fs = /raid0
It happens in src/sys/ufs/ffs/ffs_alloc.c on
if ((u_int)size > fs->fs_bsize || ffs_fragoff(fs, size) != 0 ||
ffs_fragnum(fs, bno) + ffs_numfrags(fs, size) > fs->fs_frag)
Here we have three conditions:
1) size == fs->fs_bsize
2) ffs_fragoff is ((loc) & (fs)->fs_qfmask) but fs_qfmask seems only defined for FFSv1 so I expect it to be 0
3) ffs_fragnum is ((fsb) & ((fs)->fs_frag - 1))
ffs_numfrags is ((loc) >> (fs)->fs_fshift)
dumpfs says:
bsize 32768 shift 15 mask 0xffff8000
fsize 4096 shift 12 mask 0xfffff000
frag 8 shift 3 fsbtodb 3
Reading src/usr.sbin/dumpfs/dumpfs.c
fs->fs_frag = 8 hence ffs_fragnum(fs, bno) = 1 & 7 = 1
fs->fs_fshift = 12 hence ffs_numfrags(fs, size) = 32768 >> 12 = 8
The third condition turns into 1 + 8 > 8 and we panic. But I have no idea of what it means.
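The arithmetic above can be reproduced outside the kernel. The following is a minimal sketch, not the kernel code: struct fs_geom, raid0_geom and blkfree_size_ok() are hypothetical names, the three macros are modeled on the definitions quoted above, and fs_qfmask is assumed to be fsize - 1 (0xfff). With that assumption the second condition is 0 for size = 32768, so only the third condition can fail.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the few struct fs fields used by the check. */
struct fs_geom {
	int32_t fs_bsize;	/* block size in bytes (dumpfs: 32768) */
	int32_t fs_frag;	/* fragments per block (dumpfs: 8) */
	int32_t fs_fshift;	/* log2 of the fragment size (dumpfs: 12) */
	int64_t fs_qfmask;	/* ASSUMPTION: fsize - 1, i.e. 0xfff */
};

/* Modeled on the macro bodies quoted in the report. */
#define ffs_fragoff(fs, loc)	((loc) & (fs)->fs_qfmask)
#define ffs_fragnum(fs, fsb)	((fsb) & ((fs)->fs_frag - 1))
#define ffs_numfrags(fs, loc)	((loc) >> (fs)->fs_fshift)

/* Geometry reported by dumpfs for /raid0. */
static const struct fs_geom raid0_geom = { 32768, 8, 12, 0xfff };

/* Returns false where the kernel would panic with "bad size". */
static bool
blkfree_size_ok(const struct fs_geom *fs, int64_t bno, long size)
{
	if ((unsigned int)size > (unsigned int)fs->fs_bsize ||
	    ffs_fragoff(fs, size) != 0 ||
	    ffs_fragnum(fs, bno) + ffs_numfrags(fs, size) > fs->fs_frag)
		return false;
	return true;
}
```

Calling blkfree_size_ok(&raid0_geom, 1, 32768) returns false because 1 + 8 > 8, while the same size at bno = 0 or bno = 8 passes. In other words, a full block can only be freed at a fragment address that is a multiple of fs_frag.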
panic: ffs_blkfree: bad size: dev = 0xa804, bno = 1 bsize = 32768, size = 32768, fs = /raid0
cpu1: Begin traceback...
vpanic(c0573c9b,dd846c10,dd846c48,c0392968,c0573c9b,c0515df4,a804,0,1,0) at netbsd:vpanic+0x16a
snprintf(c0573c9b,c0515df4,a804,0,1,0,8000,8000,c56210d4,8000)at netbsd:snprintf
ffs_check_bad_allocation(1,0,8000,a804,0,28210501,0,c53e8df8,c5621000,c561c344)at netbsd:ffs_check_bad_allocation+0x97
ffs_blkfree(c5621000,c561c344,1,0,8000,28210501,0,100,fffe8008,c55bc940) at netbsd:ffs_blkfree+0x85
ffs_truncate(c03a309d,c6b819ac,0,0,0,ffffffff,23,c6db398c,c6b819ac,dd846e58) at netbsd:ffs_truncate+0xf8e
ufs_truncate_retry(c6b819ac,0,0,ffffffff,c55d7000,dd846e54,c6b819ac,c6b819ac,0,c74a5800) at netbsd:ufs_truncate_retry+0x42
ufs_inactive(dd846e58,20012,1020012,c55d7000,c0524714,c6b819ac,dd846e7f,c6b819ac,dd846e88,c042ae31) at netbsd:ufs_inactive+0x6e
VOP_INACTIVE(c6b819ac,dd846e7f,c879e780,5a16e0,c6b819ac,0,dd846eac,c03a5dc0,c6b819ac,c55d7000) at netbsd:VOP_INACTIVE+0x38
vrelel(c6b819ac,c55d7000,c6db398c,c6294968,c74a5800,c6b819ac,cb225000,dd846ed8,c043194b,dd846ec4) at netbsd:vrelel+0xf6
ufs_remove(dd846ec4,0,1000000,c55d7000,c052486c,c74a5800,c6b819ac,dd846f20,14,dd846f44) at netbsd:ufs_remove+0xae
VOP_REMOVE(c74a5800,c6b819ac,dd846f20,0,1,c8eb4480,0,c8eb4480,c9608000,c53674b8) at netbsd:VOP_REMOVE+0x3e
do_sys_unlinkat.isra.4(0,dd846f68,dd846f60,0,a,0,0,bfbfe248,25ac,bfbfef65) at netbsd:do_sys_unlinkat.isra.4+0xdc
fsck -fy /dev/dk4
** /dev/rdk4
** File system is journaled; replaying journal
** Last Mounted on /raid0
** Phase 1 - Check Blocks and Sizes
1 DUP I=673252609
2 DUP I=673252609
3 DUP I=673252609
4 DUP I=673252609
5 DUP I=673252609
6 DUP I=673252609
7 DUP I=673252609
8 DUP I=673252609
** Phase 1b - Rescan For More DUPS
1 DUP I=673252609
2 DUP I=673252609
3 DUP I=673252609
4 DUP I=673252609
5 DUP I=673252609
6 DUP I=673252609
7 DUP I=673252609
8 DUP I=673252609
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
BAD/DUP FILE I=673252609 OWNER=0 MODE=100600
SIZE=12000138526728 MTIME=Mar 21 02:26 2023
CLEAR? yes
** Phase 5 - Check Cyl groups
FREE BLK COUNT(S) WRONG IN SUPERBLK
SALVAGE? yes
SUMMARY INFORMATION BAD
SALVAGE? yes
BLK(S) MISSING IN BIT MAPS
SALVAGE? yes
618106 files, 760176660 used, 2124159866 free (28234 frags, 265516454
blocks, 0.0% fragmentation)
MARK FILE SYSTEM CLEAN? yes
***** FILE SYSTEM MARKED CLEAN *****
***** FILE SYSTEM WAS MODIFIED *****
>How-To-Repeat:
>Fix:
>Release-Note:
>Audit-Trail:
From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/57307: panic: ffs_blkfree: bad size
Date: Wed, 29 Mar 2023 08:49:34 -0000 (UTC)
manu@netbsd.org writes:
>dumpfs says:
>bsize 32768 shift 15 mask 0xffff8000
>fsize 4096 shift 12 mask 0xfffff000
>frag 8 shift 3 fsbtodb 3
>Reading src/usr.sbin/dumpfs/dumpfs.c
>fs->fs_frag = 8 hence ffs_fragnum(fs, bno) = 1 & 7 = 1
>fs->fs_fshift = 12 hence ffs_numfrags(fs, size) = 32768 >> 12 = 8
>The third condition turns into 1 + 8 > 8 and we panic. But I have no idea of what it means.
You have 32KB blocks split into 8 fragments of 4KB each.
Something wants to use a block starting from fragment 1 (offset 4096)
but with a size of 32KB, so that exceeds the block size.
>ffs_check_bad_allocation(1,0,8000,a804,0,28210501,0,c53e8df8,c5621000,c561c344) at netbsd:ffs_check_bad_allocation+0x97
>ffs_blkfree(c5621000,c561c344,1,0,8000,28210501,0,100,fffe8008,c55bc940) at netbsd:ffs_blkfree+0x85
>ffs_truncate(c03a309d,c6b819ac,0,0,0,ffffffff,23,c6db398c,c6b819ac,dd846e58) at netbsd:ffs_truncate+0xf8e
There are several calls to ffs_blkfree in ffs_truncate. Can you
identify which one is at ffs_truncate+0xf8e ?
My guess is line 527 that should free 'all whole direct blocks or frags'.
for (i = UFS_NDADDR - 1; i > lastblock; i--) {
bn = ffs_getdb(fs, oip, i);
bsize = ffs_blksize(fs, oip, i);
...
ffs_blkfree(fs, oip->i_devvp, bn, bsize, oip->i_number);
...
}
which means there is a bad entry for a direct block. Only the last
entry (which is handled below this loop) is allowed to reference
a fragment. Everything here must be aligned to a full block (then
ffs_fragnum(fs, bno) == 0).
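The alignment rule described here can be sketched as a standalone check. This is an illustration under stated assumptions, not code from ffs_truncate(): first_misaligned_direct() and NDADDR_SKETCH are hypothetical names, the direct block pointers are passed in as a plain array, and zero entries are treated as holes and skipped.

```c
#include <stdint.h>

/* Stand-in for UFS_NDADDR (12 direct block pointers per inode). */
#define NDADDR_SKETCH 12

/*
 * Walk the direct block pointers from the top down to lastblock + 1,
 * in the same order as the loop quoted above, and return the index of
 * the first non-zero entry that does not start on a block boundary.
 * frag is the number of fragments per block (fs_frag).
 * Returns -1 if every entry is block-aligned.
 */
static int
first_misaligned_direct(const int64_t *db, int lastblock, int frag)
{
	int i;

	for (i = NDADDR_SKETCH - 1; i > lastblock; i--) {
		if (db[i] == 0)
			continue;	/* hole, nothing to free */
		if ((db[i] & (frag - 1)) != 0)
			return i;	/* ffs_fragnum() would be non-zero */
	}
	return -1;
}
```

With frag = 8, an entry holding the value 1 is reported as misaligned, which is the situation the panic message describes (bno = 1 with a full-block size).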
From: Emmanuel Dreyfus <manu@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org, manu@netbsd.org
Subject: Re: kern/57307: panic: ffs_blkfree: bad size
Date: Wed, 29 Mar 2023 09:20:04 +0000
On Wed, Mar 29, 2023 at 08:50:02AM +0000, Michael van Elst wrote:
> There are several calls to ffs_blkfree in ffs_truncate. Can you
> identify which one is at ffs_truncate+0xf8e ?
I must have upgraded the kernel since that time, so the addresses do
not match exactly. 0xf8e = 3982, so I guess this must be the second call:
0xc0398229 <ffs_truncate+3495>: call 0xc03939a5 <ffs_blkfree>
0xc039840b <ffs_truncate+3977>: call 0xc03939a5 <ffs_blkfree>
0xc03986b7 <ffs_truncate+4661>: call 0xc03939a5 <ffs_blkfree>
That matches your guess.
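A way to sanity-check the match: a backtrace prints the return address, which is the address of the instruction following the call. Assuming the three calls listed above are direct near calls (opcode E8 with a 32-bit displacement, 5 bytes on i386), the offset to compare against +0xf8e is the call offset plus 5. The helper name return_offset() is hypothetical.

```c
#include <stdint.h>

/* ASSUMPTION: direct near call, E8 + rel32, which is 5 bytes on i386. */
#define I386_CALL_REL32_LEN 5

/* Offset (within the function) that a backtrace shows for a given call. */
static uint32_t
return_offset(uint32_t call_offset)
{
	return call_offset + I386_CALL_REL32_LEN;
}
```

For the three call sites listed above, only return_offset(3977) yields 3982 = 0xf8e, which points at the second ffs_blkfree call.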
> My guess is line 527 that should free 'all whole direct blocks or frags'.
I tried a smaller backing store. At 64 GB it does not crash. I will test
next night to find the limit.
--
Emmanuel Dreyfus
manu@netbsd.org
From: Emmanuel Dreyfus <manu@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org, manu@netbsd.org
Subject: Re: kern/57307: panic: ffs_blkfree: bad size
Date: Wed, 29 Mar 2023 10:15:14 +0000
On Wed, Mar 29, 2023 at 09:25:02AM +0000, Emmanuel Dreyfus wrote:
> I tried a smaller backing store. At 64 GB it does not crash. I will test
> next night to find the limit.
It does not crash immediately, but it still does.
--
Emmanuel Dreyfus
manu@netbsd.org
From: Chuck Silvers <chuq@chuq.com>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/57307: panic: ffs_blkfree: bad size
Date: Wed, 29 Mar 2023 07:31:06 -0700
On Wed, Mar 29, 2023 at 07:55:00AM +0000, manu@netbsd.org wrote:
> Snapshot is created with
> fss_flags = FSS_UNCONFIG_ON_CLOSE|unlink_on_create
> Backing store is truncate()'ed to vfs.f_blocks * vfs.f_frsize, which means the size of the partition, 14 TB.
could you please supply the source for the program you're using to create the snapshot?
> The panic is
> panic: ffs_blkfree: bad size: dev = 0xa804, bno = 1 bsize = 32768, size = 32768, fs = /raid0
"bno = 1" means that this is probably a UFS native snapshot inode,
which uses special bno values such as:
#define BLK_NOCOPY ((daddr_t)(1))
ffs_truncate() is supposed to call ffs_snapremove() to remove all of the special
bno values from the bmap before going into the normal loop to free real blocks,
but it looks like somehow that is not happening.
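To illustrate why a leftover marker is fatal, here is a minimal sketch of the idea of filtering special values out of a block map before freeing. It is not ffs_snapremove(), which does considerably more. The names sketch_daddr_t, is_snapshot_marker() and strip_snapshot_markers() are hypothetical; BLK_NOCOPY is taken from the definition quoted above, and BLK_SNAP with value 2 is an assumption about the companion marker.

```c
#include <stdbool.h>
#include <stdint.h>

typedef int64_t sketch_daddr_t;		/* stand-in for the kernel's daddr_t */

#define BLK_NOCOPY	((sketch_daddr_t)(1))	/* as quoted above */
#define BLK_SNAP	((sketch_daddr_t)(2))	/* ASSUMPTION: companion marker */

/* True if bn is a snapshot marker rather than a real fragment address. */
static bool
is_snapshot_marker(sketch_daddr_t bn)
{
	return bn == BLK_NOCOPY || bn == BLK_SNAP;
}

/*
 * Replace every marker in bmap[0..n-1] with 0 so that a later free loop
 * only ever sees real block addresses. Returns the number of markers
 * that were cleared.
 */
static int
strip_snapshot_markers(sketch_daddr_t *bmap, int n)
{
	int i, cleared = 0;

	for (i = 0; i < n; i++) {
		if (is_snapshot_marker(bmap[i])) {
			bmap[i] = 0;
			cleared++;
		}
	}
	return cleared;
}
```

If a step like this is skipped, or misses entries, the free loop is handed bno = 1 together with a full-block size, which is what the panic message shows.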
-Chuck
From: Emmanuel Dreyfus <manu@netbsd.org>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/57307: panic: ffs_blkfree: bad size
Date: Wed, 29 Mar 2023 15:27:04 +0000
On Wed, Mar 29, 2023 at 02:35:01PM +0000, Chuck Silvers wrote:
> could you please supply the source for the program you're using to create the snapshot?
https://ftp.espci.fr/pub/lastfss/lastfss-0.2.tgz
The relevant parts:
unlink_on_create = 0;
(...)
if (statvfs(mount, &vfs) != 0)
err(EX_OSERR, "Cannot get info on %s", mount);
(...)
memset(&fs, 0, sizeof(fs));
(void)snprintf(backend, sizeof(backend),
"%s/fssbackend-XXXXX", backend_dir);
if ((tmpfd = mkstemp(backend)) == -1)
err(EX_OSERR, "cannot create backing store in %s", backend_dir);
if (truncate(backend, vfs.f_blocks * vfs.f_frsize) != 0)
err(EX_OSERR, "cannot resize %s to %" PRId64,
backend, vfs.f_blocks * vfs.f_frsize);
fs.fss_bstore = backend;
fs.fss_mount = mount;
fs.fss_flags = FSS_UNCONFIG_ON_CLOSE|unlink_on_create;
(...)
if (ioctl(fd, FSSIOCSET, &fs) != 0) {
>
>
> > The panic is
> > panic: ffs_blkfree: bad size: dev = 0xa804, bno = 1 bsize = 32768, size = 32768, fs = /raid0
>
> "bno = 1" means that this is probably a UFS native snapshot inode,
> which uses special bno values such as:
>
> #define BLK_NOCOPY ((daddr_t)(1))
>
> ffs_truncate() is supposed to call ffs_snapremove() to remove all of the special
> bno values from the bmap before going into the normal loop to free real blocks,
> but it looks like somehow that is not happening.
>
> -Chuck
>
--
Emmanuel Dreyfus
manu@netbsd.org
From: "Chuck Silvers" <chs@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/57307 CVS commit: src/sys/ufs/ffs
Date: Thu, 11 May 2023 23:11:25 +0000
Module Name: src
Committed By: chs
Date: Thu May 11 23:11:25 UTC 2023
Modified Files:
src/sys/ufs/ffs: ffs_snapshot.c
Log Message:
ffs: apply the remaining ffs_snapshot.c part of this FreeBSD commit:
commit 364ed814e7285c8216d8a201d3ab3674eb34ce29
Author: Kirk McKusick <mckusick@FreeBSD.org>
Date: Thu Dec 9 21:24:00 2004 +0000
Fixes a bug that caused UFS2 filesystems bigger than 2TB to
prematurely report that they were full and/or to panic the kernel
with the message ``ffs_clusteralloc: allocated out of group''.
Submitted by: Henry Whincup <henry@jot.to>
MFC after: 1 week
all the other changes in that commit were applied previously by others:
- sborrill committed ffs_alloc.c rev 1.123 in 2009
- simonb committed ffs_alloc.c rev 1.110 in 2008
- the ffs_clusteralloc() part is not needed because we no longer have
that function.
fixes PR 57307
To generate a diff of this commit:
cvs rdiff -u -r1.154 -r1.155 src/sys/ufs/ffs/ffs_snapshot.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
Responsible-Changed-From-To: kern-bug-people->chs
Responsible-Changed-By: chs@NetBSD.org
Responsible-Changed-When: Thu, 11 May 2023 23:15:48 +0000
Responsible-Changed-Why:
to me
State-Changed-From-To: open->closed
State-Changed-By: chs@NetBSD.org
State-Changed-When: Thu, 11 May 2023 23:15:48 +0000
State-Changed-Why:
confirmed fixed
State-Changed-From-To: closed->needs-pullups
State-Changed-By: riastradh@NetBSD.org
State-Changed-When: Fri, 12 May 2023 11:18:31 +0000
State-Changed-Why:
unless I'm much mistaken this bug is old and needs pullups to all branches
From: Emmanuel Dreyfus <manu@netbsd.org>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/57307 (panic: ffs_blkfree: bad size)
Date: Fri, 12 May 2023 12:21:58 +0000
On Fri, May 12, 2023 at 11:18:31AM +0000, riastradh@NetBSD.org wrote:
> unless I'm much mistaken this bug is old and needs pullups to all branches
I tested it on netbsd-9 with success.
--
Emmanuel Dreyfus
manu@netbsd.org
From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/57307 CVS commit: [netbsd-10] src/sys/ufs/ffs
Date: Sat, 13 May 2023 12:20:49 +0000
Module Name: src
Committed By: martin
Date: Sat May 13 12:20:49 UTC 2023
Modified Files:
src/sys/ufs/ffs [netbsd-10]: ffs_snapshot.c
Log Message:
Pull up following revision(s) (requested by chs in ticket #165):
sys/ufs/ffs/ffs_snapshot.c: revision 1.155
ffs: apply the remaining ffs_snapshot.c part of this FreeBSD commit:
commit 364ed814e7285c8216d8a201d3ab3674eb34ce29
Author: Kirk McKusick <mckusick@FreeBSD.org>
Date: Thu Dec 9 21:24:00 2004 +0000
Fixes a bug that caused UFS2 filesystems bigger than 2TB to
prematurely report that they were full and/or to panic the kernel
with the message ``ffs_clusteralloc: allocated out of group''.
Submitted by: Henry Whincup <henry@jot.to>
MFC after: 1 week
all the other changes in that commit were applied previously by others:
- sborrill committed ffs_alloc.c rev 1.123 in 2009
- simonb committed ffs_alloc.c rev 1.110 in 2008
- the ffs_clusteralloc() part is not needed because we no longer have
that function.
fixes PR 57307
To generate a diff of this commit:
cvs rdiff -u -r1.154 -r1.154.4.1 src/sys/ufs/ffs/ffs_snapshot.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/57307 CVS commit: [netbsd-9] src/sys/ufs/ffs
Date: Sat, 13 May 2023 12:23:13 +0000
Module Name: src
Committed By: martin
Date: Sat May 13 12:23:13 UTC 2023
Modified Files:
src/sys/ufs/ffs [netbsd-9]: ffs_snapshot.c
Log Message:
Pull up following revision(s) (requested by chs in ticket #1633):
sys/ufs/ffs/ffs_snapshot.c: revision 1.155
ffs: apply the remaining ffs_snapshot.c part of this FreeBSD commit:
commit 364ed814e7285c8216d8a201d3ab3674eb34ce29
Author: Kirk McKusick <mckusick@FreeBSD.org>
Date: Thu Dec 9 21:24:00 2004 +0000
Fixes a bug that caused UFS2 filesystems bigger than 2TB to
prematurely report that they were full and/or to panic the kernel
with the message ``ffs_clusteralloc: allocated out of group''.
Submitted by: Henry Whincup <henry@jot.to>
MFC after: 1 week
all the other changes in that commit were applied previously by others:
- sborrill committed ffs_alloc.c rev 1.123 in 2009
- simonb committed ffs_alloc.c rev 1.110 in 2008
- the ffs_clusteralloc() part is not needed because we no longer have
that function.
fixes PR 57307
To generate a diff of this commit:
cvs rdiff -u -r1.149 -r1.149.14.1 src/sys/ufs/ffs/ffs_snapshot.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: needs-pullups->closed
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Sun, 14 May 2023 00:36:03 +0000
State-Changed-Why:
pulled up
>Unformatted:
Copyright © 1994-2023
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.