NetBSD Problem Report #57307

From manu@netbsd.org  Wed Mar 29 07:51:03 2023
Return-Path: <manu@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 5599B1A9239
	for <gnats-bugs@gnats.NetBSD.org>; Wed, 29 Mar 2023 07:51:03 +0000 (UTC)
Message-Id: <20230329075102.B34D184E69@mail.netbsd.org>
Date: Wed, 29 Mar 2023 07:51:02 +0000 (UTC)
From: manu@netbsd.org
Reply-To: manu@netbsd.org
To: gnats-bugs@NetBSD.org
Subject: panic: ffs_blkfree: bad size
X-Send-Pr-Version: 3.95

>Number:         57307
>Category:       kern
>Synopsis:       panic: ffs_blkfree: bad size
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    chs
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Mar 29 07:55:00 +0000 2023
>Closed-Date:    Sun May 14 00:36:03 +0000 2023
>Last-Modified:  Sun May 14 00:36:03 +0000 2023
>Originator:     Emmanuel Dreyfus
>Release:        NetBSD 9.3
>Organization:
NetBSD
>Environment:
	NetBSD 9.3 / i386, FFSv2 mounted with -o log
Architecture: i386
Machine: i386
>Description:
Taking a snapshot of an FFSv2 filesystem mounted with -o log causes a reproducible panic. After reboot, the machine panics again when mounting the filesystem, until the problem is cleared with fsck.

Backtrace and fsck output are below.

The snapshot is created with
fss_flags = FSS_UNCONFIG_ON_CLOSE|unlink_on_create
The backing store is truncate()'d to vfs.f_blocks * vfs.f_frsize, i.e. the size of the partition, 14 TB.

The panic is:
panic: ffs_blkfree: bad size: dev = 0xa804, bno = 1 bsize = 32768, size = 32768, fs = /raid0

It happens in src/sys/ufs/ffs/ffs_alloc.c, in this check:
if ((u_int)size > fs->fs_bsize || ffs_fragoff(fs, size) != 0 ||
            ffs_fragnum(fs, bno) + ffs_numfrags(fs, size) > fs->fs_frag) 

Here we have three conditions:
1) size == fs->fs_bsize
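2) ffs_fragoff is ((loc) & (fs)->fs_qfmask), but fs_qfmask seems to be defined only for FFSv1, so I expect it to be 0 (in any case, size = 32768 is a whole number of 4096-byte fragments, so this condition is false)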
3) ffs_fragnum is ((fsb) & ((fs)->fs_frag - 1))
   ffs_numfrags is ((loc) >> (fs)->fs_fshift)
dumpfs says:
bsize   32768   shift   15      mask    0xffff8000
fsize   4096    shift   12      mask    0xfffff000
frag    8       shift   3       fsbtodb 3
Reading src/usr.sbin/dumpfs/dumpfs.c to map these fields:
fs->fs_frag = 8, hence ffs_fragnum(fs, bno) = 1 & 7 = 1
fs->fs_fshift = 12, hence ffs_numfrags(fs, size) = 32768 >> 12 = 8

The third condition evaluates to 1 + 8 > 8, so we panic. But I have no idea what it means.


panic: ffs_blkfree: bad size: dev = 0xa804, bno = 1 bsize = 32768, size = 32768, fs = /raid0
cpu1: Begin traceback...
vpanic(c0573c9b,dd846c10,dd846c48,c0392968,c0573c9b,c0515df4,a804,0,1,0) at netbsd:vpanic+0x16a
snprintf(c0573c9b,c0515df4,a804,0,1,0,8000,8000,c56210d4,8000) at netbsd:snprintf
ffs_check_bad_allocation(1,0,8000,a804,0,28210501,0,c53e8df8,c5621000,c561c344) at netbsd:ffs_check_bad_allocation+0x97
ffs_blkfree(c5621000,c561c344,1,0,8000,28210501,0,100,fffe8008,c55bc940) at netbsd:ffs_blkfree+0x85
ffs_truncate(c03a309d,c6b819ac,0,0,0,ffffffff,23,c6db398c,c6b819ac,dd846e58) at netbsd:ffs_truncate+0xf8e
ufs_truncate_retry(c6b819ac,0,0,ffffffff,c55d7000,dd846e54,c6b819ac,c6b819ac,0,c74a5800) at netbsd:ufs_truncate_retry+0x42
ufs_inactive(dd846e58,20012,1020012,c55d7000,c0524714,c6b819ac,dd846e7f,c6b819ac,dd846e88,c042ae31) at netbsd:ufs_inactive+0x6e
VOP_INACTIVE(c6b819ac,dd846e7f,c879e780,5a16e0,c6b819ac,0,dd846eac,c03a5dc0,c6b819ac,c55d7000) at netbsd:VOP_INACTIVE+0x38
vrelel(c6b819ac,c55d7000,c6db398c,c6294968,c74a5800,c6b819ac,cb225000,dd846ed8,c043194b,dd846ec4) at netbsd:vrelel+0xf6
ufs_remove(dd846ec4,0,1000000,c55d7000,c052486c,c74a5800,c6b819ac,dd846f20,14,dd846f44) at netbsd:ufs_remove+0xae
VOP_REMOVE(c74a5800,c6b819ac,dd846f20,0,1,c8eb4480,0,c8eb4480,c9608000,c53674b8) at netbsd:VOP_REMOVE+0x3e
do_sys_unlinkat.isra.4(0,dd846f68,dd846f60,0,a,0,0,bfbfe248,25ac,bfbfef65) at netbsd:do_sys_unlinkat.isra.4+0xdc


fsck -fy /dev/dk4
** /dev/rdk4
** File system is journaled; replaying journal
** Last Mounted on /raid0
** Phase 1 - Check Blocks and Sizes
1 DUP I=673252609
2 DUP I=673252609
3 DUP I=673252609
4 DUP I=673252609
5 DUP I=673252609
6 DUP I=673252609
7 DUP I=673252609
8 DUP I=673252609
** Phase 1b - Rescan For More DUPS
1 DUP I=673252609
2 DUP I=673252609
3 DUP I=673252609
4 DUP I=673252609
5 DUP I=673252609
6 DUP I=673252609
7 DUP I=673252609
8 DUP I=673252609
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
BAD/DUP FILE I=673252609 OWNER=0 MODE=100600
SIZE=12000138526728 MTIME=Mar 21 02:26 2023
CLEAR? yes

** Phase 5 - Check Cyl groups
FREE BLK COUNT(S) WRONG IN SUPERBLK
SALVAGE? yes

SUMMARY INFORMATION BAD
SALVAGE? yes

BLK(S) MISSING IN BIT MAPS
SALVAGE? yes

618106 files, 760176660 used, 2124159866 free (28234 frags, 265516454 blocks, 0.0% fragmentation)

MARK FILE SYSTEM CLEAN? yes


***** FILE SYSTEM MARKED CLEAN *****

***** FILE SYSTEM WAS MODIFIED *****
>How-To-Repeat:

>Fix:

>Release-Note:

>Audit-Trail:
From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/57307: panic: ffs_blkfree: bad size
Date: Wed, 29 Mar 2023 08:49:34 -0000 (UTC)

 manu@netbsd.org writes:

 >dumpfs says:
 >bsize   32768   shift   15      mask    0xffff8000
 >fsize   4096    shift   12      mask    0xfffff000
 >frag    8       shift   3       fsbtodb 3
 >Reading src/usr.sbin/dumpfs/dumpfs.c
 >fs->fs_frag = 8 hence ffs_fragnum(fs, bno) = 1 & 7 = 1
 >fd->fs_fshift = 12 hence ffs_numfrags(fs, size) = 32768 >> 12 = 8

 >The third condition turns into 1 + 8 > 8 and we panic. But I have no idea of what it means.

 You have 32KB blocks split into 8 fragments of 4KB each.

 Something wants to use a block starting at fragment 1 (offset 4096)
 but with a size of 32KB, so it would extend past the end of the block.



 >ffs_check_bad_allocation(1,0,8000,a804,0,28210501,0,c53e8df8,c5621000,c561c344) at netbsd:ffs_check_bad_allocation+0x97
 >ffs_blkfree(c5621000,c561c344,1,0,8000,28210501,0,100,fffe8008,c55bc940) at netbsd:ffs_blkfree+0x85
 >ffs_truncate(c03a309d,c6b819ac,0,0,0,ffffffff,23,c6db398c,c6b819ac,dd846e58) at netbsd:ffs_truncate+0xf8e


 There are several calls to ffs_blkfree in ffs_truncate. Can you
 identify which one is at ffs_truncate+0xf8e?

 My guess is the call at line 527, which should free 'all whole direct blocks or frags'.

 for (i = UFS_NDADDR - 1; i > lastblock; i--) {
         bn = ffs_getdb(fs, oip, i);
         bsize = ffs_blksize(fs, oip, i);
         ...
         ffs_blkfree(fs, oip->i_devvp, bn, bsize, oip->i_number);
         ...
 }

 which means there is a bad entry for a direct block. Only the last
 entry (which is handled below this loop) is allowed to reference
 a fragment. Everything here must be aligned to a full block (i.e.
 ffs_fragnum(fs, bno) == 0).


From: Emmanuel Dreyfus <manu@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
	netbsd-bugs@netbsd.org, manu@netbsd.org
Subject: Re: kern/57307: panic: ffs_blkfree: bad size
Date: Wed, 29 Mar 2023 09:20:04 +0000

 On Wed, Mar 29, 2023 at 08:50:02AM +0000, Michael van Elst wrote:
 >  There are several calls to ffs_blkfree in ffs_truncate. Can you
 >  identify which one is at ffs_truncate+0xf8e ?

 I must have upgraded the kernel since then; the address does
 not match. 0xf8e = 3982, so I guess this must be the second call:
    0xc0398229 <ffs_truncate+3495>:      call   0xc03939a5 <ffs_blkfree>
    0xc039840b <ffs_truncate+3977>:      call   0xc03939a5 <ffs_blkfree>
    0xc03986b7 <ffs_truncate+4661>:      call   0xc03939a5 <ffs_blkfree>

 That matches your guess.

 >  My guess is line 527 that should free 'all whole direct blocks or frags'.

 I tried a smaller backing store. At 64 GB it does not crash. I will test
 tonight to find the limit.

 -- 
 Emmanuel Dreyfus
 manu@netbsd.org

From: Emmanuel Dreyfus <manu@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
	netbsd-bugs@netbsd.org, manu@netbsd.org
Subject: Re: kern/57307: panic: ffs_blkfree: bad size
Date: Wed, 29 Mar 2023 10:15:14 +0000

 On Wed, Mar 29, 2023 at 09:25:02AM +0000, Emmanuel Dreyfus wrote:
 >  I tried a smaller backing store. At 64 GB it does not crash. I will test
 >  tonight to find the limit.

 It does not crash immediately, but it still crashes eventually.

 -- 
 Emmanuel Dreyfus
 manu@netbsd.org

From: Chuck Silvers <chuq@chuq.com>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/57307: panic: ffs_blkfree: bad size
Date: Wed, 29 Mar 2023 07:31:06 -0700

 On Wed, Mar 29, 2023 at 07:55:00AM +0000, manu@netbsd.org wrote:
 > Snapshot is created with 
 > fss_flags = FSS_UNCONFIG_ON_CLOSE|unlink_on_create
 > Backing store is truncate()'ed to  vfs.f_blocks * vfs.f_frsize which means the size of the partition, 14 To.

 could you please supply the source for the program you're using to create the snapshot?


 > The panic is 
 > panic: ffs_blkfree: bad size: dev = 0xa804, bno = 1 bsize = 32768, size = 32768, fs = /raid0

 "bno = 1" means that this is probably a UFS native snapshot inode,
 which uses special bno values such as:

 #define	BLK_NOCOPY	((daddr_t)(1))

 ffs_truncate() is supposed to call ffs_snapremove() to remove all of the special
 bno values from the bmap before going into the normal loop to free real blocks,
 but it looks like somehow that is not happening.

 -Chuck

From: Emmanuel Dreyfus <manu@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/57307: panic: ffs_blkfree: bad size
Date: Wed, 29 Mar 2023 15:27:04 +0000

 On Wed, Mar 29, 2023 at 02:35:01PM +0000, Chuck Silvers wrote:
 >  could you please supply the source for the program you're using to create the snapshot?

 https://ftp.espci.fr/pub/lastfss/lastfss-0.2.tgz

 The relevant parts:
                 unlink_on_create = 0;
 (...)
         if (statvfs(mount, &vfs) != 0)
                 err(EX_OSERR, "Cannot get info on %s", mount);
 (...)
         memset(&fs, 0, sizeof(fs));

         (void)snprintf(backend, sizeof(backend),
                        "%s/fssbackend-XXXXX", backend_dir);
         if ((tmpfd = mkstemp(backend)) == -1)
                 err(EX_OSERR, "cannot create backing store in %s", backend_dir);

         if (truncate(backend, vfs.f_blocks * vfs.f_frsize) != 0)
                 err(EX_OSERR, "cannot resize %s to %" PRId64,
                               backend, vfs.f_blocks * vfs.f_frsize);

         fs.fss_bstore = backend;
         fs.fss_mount = mount;
         fs.fss_flags = FSS_UNCONFIG_ON_CLOSE|unlink_on_create;

 (...)
         if (ioctl(fd, FSSIOCSET, &fs) != 0) {

 >  
 >  
 >  > The panic is 
 >  > panic: ffs_blkfree: bad size: dev = 0xa804, bno = 1 bsize = 32768, size = 32768, fs = /raid0
 >  
 >  "bno = 1" means that this is probably a UFS native snapshot inode,
 >  which uses special bno values such as:
 >  
 >  #define	BLK_NOCOPY	((daddr_t)(1))
 >  
 >  ffs_truncate() is supposed to call ffs_snapremove() to remove all of the special
 >  bno values from the bmap before going into the normal loop to free real blocks,
 >  but it looks like somehow that is not happening.
 >  
 >  -Chuck
 >  

 -- 
 Emmanuel Dreyfus
 manu@netbsd.org

From: "Chuck Silvers" <chs@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/57307 CVS commit: src/sys/ufs/ffs
Date: Thu, 11 May 2023 23:11:25 +0000

 Module Name:	src
 Committed By:	chs
 Date:		Thu May 11 23:11:25 UTC 2023

 Modified Files:
 	src/sys/ufs/ffs: ffs_snapshot.c

 Log Message:
 ffs: apply the remaining ffs_snapshot.c part of this FreeBSD commit:

   commit 364ed814e7285c8216d8a201d3ab3674eb34ce29
   Author: Kirk McKusick <mckusick@FreeBSD.org>
   Date:   Thu Dec 9 21:24:00 2004 +0000

     Fixes a bug that caused UFS2 filesystems bigger than 2TB to
     prematurely report that they were full and/or to panic the kernel
     with the message ``ffs_clusteralloc: allocated out of group''.

     Submitted by:   Henry Whincup <henry@jot.to>
     MFC after:      1 week

 all the other changes in that commit were applied previously by others:
  - sborrill committed ffs_alloc.c rev 1.123 in 2009
  - simonb committed ffs_alloc.c rev 1.110 in 2008
  - the ffs_clusteralloc() part is not needed because we no longer have
    that function.

 fixes PR 57307


 To generate a diff of this commit:
 cvs rdiff -u -r1.154 -r1.155 src/sys/ufs/ffs/ffs_snapshot.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

Responsible-Changed-From-To: kern-bug-people->chs
Responsible-Changed-By: chs@NetBSD.org
Responsible-Changed-When: Thu, 11 May 2023 23:15:48 +0000
Responsible-Changed-Why:
to me


State-Changed-From-To: open->closed
State-Changed-By: chs@NetBSD.org
State-Changed-When: Thu, 11 May 2023 23:15:48 +0000
State-Changed-Why:
confirmed fixed


State-Changed-From-To: closed->needs-pullups
State-Changed-By: riastradh@NetBSD.org
State-Changed-When: Fri, 12 May 2023 11:18:31 +0000
State-Changed-Why:
unless I'm much mistaken this bug is old and needs pullups to all branches


From: Emmanuel Dreyfus <manu@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/57307 (panic: ffs_blkfree: bad size)
Date: Fri, 12 May 2023 12:21:58 +0000

 On Fri, May 12, 2023 at 11:18:31AM +0000, riastradh@NetBSD.org wrote:
 > unless I'm much mistaken this bug is old and needs pullups to all branches

 I tested it on netbsd-9 with success.

 -- 
 Emmanuel Dreyfus
 manu@netbsd.org

From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/57307 CVS commit: [netbsd-10] src/sys/ufs/ffs
Date: Sat, 13 May 2023 12:20:49 +0000

 Module Name:	src
 Committed By:	martin
 Date:		Sat May 13 12:20:49 UTC 2023

 Modified Files:
 	src/sys/ufs/ffs [netbsd-10]: ffs_snapshot.c

 Log Message:
 Pull up following revision(s) (requested by chs in ticket #165):

 	sys/ufs/ffs/ffs_snapshot.c: revision 1.155

 ffs: apply the remaining ffs_snapshot.c part of this FreeBSD commit:
   commit 364ed814e7285c8216d8a201d3ab3674eb34ce29
   Author: Kirk McKusick <mckusick@FreeBSD.org>
   Date:   Thu Dec 9 21:24:00 2004 +0000
     Fixes a bug that caused UFS2 filesystems bigger than 2TB to
     prematurely report that they were full and/or to panic the kernel
     with the message ``ffs_clusteralloc: allocated out of group''.
     Submitted by:   Henry Whincup <henry@jot.to>
     MFC after:      1 week

 all the other changes in that commit were applied previously by others:
  - sborrill committed ffs_alloc.c rev 1.123 in 2009
  - simonb committed ffs_alloc.c rev 1.110 in 2008
  - the ffs_clusteralloc() part is not needed because we no longer have
    that function.

 fixes PR 57307


 To generate a diff of this commit:
 cvs rdiff -u -r1.154 -r1.154.4.1 src/sys/ufs/ffs/ffs_snapshot.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/57307 CVS commit: [netbsd-9] src/sys/ufs/ffs
Date: Sat, 13 May 2023 12:23:13 +0000

 Module Name:	src
 Committed By:	martin
 Date:		Sat May 13 12:23:13 UTC 2023

 Modified Files:
 	src/sys/ufs/ffs [netbsd-9]: ffs_snapshot.c

 Log Message:
 Pull up following revision(s) (requested by chs in ticket #1633):

 	sys/ufs/ffs/ffs_snapshot.c: revision 1.155

 ffs: apply the remaining ffs_snapshot.c part of this FreeBSD commit:
   commit 364ed814e7285c8216d8a201d3ab3674eb34ce29
   Author: Kirk McKusick <mckusick@FreeBSD.org>
   Date:   Thu Dec 9 21:24:00 2004 +0000
     Fixes a bug that caused UFS2 filesystems bigger than 2TB to
     prematurely report that they were full and/or to panic the kernel
     with the message ``ffs_clusteralloc: allocated out of group''.
     Submitted by:   Henry Whincup <henry@jot.to>
     MFC after:      1 week

 all the other changes in that commit were applied previously by others:
  - sborrill committed ffs_alloc.c rev 1.123 in 2009
  - simonb committed ffs_alloc.c rev 1.110 in 2008
  - the ffs_clusteralloc() part is not needed because we no longer have
    that function.

 fixes PR 57307


 To generate a diff of this commit:
 cvs rdiff -u -r1.149 -r1.149.14.1 src/sys/ufs/ffs/ffs_snapshot.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: needs-pullups->closed
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Sun, 14 May 2023 00:36:03 +0000
State-Changed-Why:
pulled up


>Unformatted:

Copyright © 1994-2023 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.