NetBSD Problem Report #56421

From Manuel.Bouyer@lip6.fr  Wed Sep 29 12:33:43 2021
Return-Path: <Manuel.Bouyer@lip6.fr>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 97FC61A921F
	for <gnats-bugs@gnats.NetBSD.org>; Wed, 29 Sep 2021 12:33:43 +0000 (UTC)
Message-Id: <20210929123327.E19616B02@armandeche.soc.lip6.fr>
Date: Wed, 29 Sep 2021 14:33:27 +0200 (MEST)
From: Manuel.Bouyer@lip6.fr
Reply-To: Manuel.Bouyer@lip6.fr
To: gnats-bugs@NetBSD.org
Subject: panic: ffs_blkfree: bad size, fsck doesn't fix it
X-Send-Pr-Version: 3.95

>Number:         56421
>Category:       kern
>Synopsis:       panic: ffs_blkfree: bad size, fsck doesn't fix it
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Sep 29 12:35:00 +0000 2021
>Last-Modified:  Fri Oct 01 05:36:51 +0000 2021
>Originator:     Manuel Bouyer
>Release:        NetBSD 9.2_STABLE
>Organization:
>Environment:
System: NetBSD armandeche.soc.lip6.fr 9.2_STABLE NetBSD 9.2_STABLE (GENERIC) #0: Thu Sep 23 10:13:28 UTC 2021 mkrepro@mkrepro.NetBSD.org:/usr/src/sys/arch/amd64/compile/GENERIC amd64
Architecture: x86_64
Machine: amd64
>Description:
	I was running pbulk on a WAPBL-enabled filesystem. I ended up with a
	stuck, unkillable rm command: 100% CPU, doesn't react to kill or
	kill -9, ktrace -p shows no activity. I guess it was stuck in a loop
	in the kernel. This was with an older 9.0_STABLE kernel, so maybe this
	specific issue has been fixed since then.

	After a power cycle, restarting pbulk would panic the kernel with:
[   233.080282] panic: ffs_blkfree: bad size: dev = 0x14, bno = 54274495 bsize = 32768, size = 28672, fs = /local/armandeche2
[   233.080282] cpu0: Begin traceback...
[   233.080282] vpanic() at netbsd:vpanic+0x160
[   233.080282] snprintf() at netbsd:snprintf
[   233.080282] ffs_mapsearch() at netbsd:ffs_mapsearch
[   233.080282] ffs_blkfree() at netbsd:ffs_blkfree+0x82
[   233.080282] ffs_truncate() at netbsd:ffs_truncate+0xb7e
[   233.080282] ufs_rmdir() at netbsd:ufs_rmdir+0x276
[   233.080282] VOP_RMDIR() at netbsd:VOP_RMDIR+0x50
[   233.080282] do_sys_unlinkat.isra.6() at netbsd:do_sys_unlinkat.isra.6+0x1a9
[   233.090286] syscall() at netbsd:syscall+0x157
[   233.090286] --- syscall (number 137) ---

	(this one was with WAPBL disabled; the stack trace was slightly different
	with WAPBL). This is 100% reproducible. The issue here is that
	fsck doesn't find any problem with the filesystem:
fsck -fy /local/armandeche2
** /dev/rwd1e
** File system is already clean
** Last Mounted on /local/armandeche2
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
7464502 files, 77168922 used, 43969433 free (216449 frags, 5469123 blocks, 0.2% fragmentation)

	So the filesystem remains in an inconsistent state: fsck reports it
	clean, but the panic still occurs.
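
	As a rough aid for reading the numbers in the panic message, the
	sketch below redoes the arithmetic in Python. It is not the kernel
	source: the three conditions in blkfree_size_checks() are an assumed
	model of the kind of size check that produces a "bad size" panic, and
	the function names are hypothetical. The fragments-per-block value is
	derived from the fsck summary line, on the assumption that "free"
	counts fragments (free frags plus free blocks times fragments per
	block).

```python
# Values copied from the panic message and the fsck summary in this report.
BSIZE = 32768            # "bsize" from the panic message
SIZE = 28672             # "size" from the panic message
BNO = 54274495           # "bno" from the panic message (fragment address)
FREE_TOTAL = 43969433    # "free" from the fsck summary
FREE_FRAGS = 216449      # "frags" from the fsck summary
FREE_BLOCKS = 5469123    # "blocks" from the fsck summary


def frags_per_block(free_total, free_frags, free_blocks):
    """Derive fragments per block, assuming free = frags + blocks * frag."""
    quotient, remainder = divmod(free_total - free_frags, free_blocks)
    if remainder != 0:
        raise ValueError("fsck summary does not fit free = frags + blocks * frag")
    return quotient


def blkfree_size_checks(bno, size, bsize, frag):
    """Assumed model of the sanity checks; returns which ones would fail."""
    fsize = bsize // frag  # fragment size in bytes
    return {
        "size_exceeds_block": size > bsize,
        "size_not_fragment_multiple": size % fsize != 0,
        # The freed run starts at fragment (bno % frag) inside its block
        # and must not extend past the end of that block.
        "run_crosses_block_boundary": bno % frag + size // fsize > frag,
    }


frag = frags_per_block(FREE_TOTAL, FREE_FRAGS, FREE_BLOCKS)
checks = blkfree_size_checks(BNO, SIZE, BSIZE, frag)
print("fragments per block:", frag)
print("fragment size:", BSIZE // frag)
for name, failed in checks.items():
    print(name, "FAIL" if failed else "ok")
```

	Under these assumptions the size itself looks plausible (7 fragments
	of 4096 bytes), but the run would start at fragment 7 of an
	8-fragment block, so it cannot fit inside one block. That would point
	at an inconsistent block pointer or size on disk, although this
	report does not confirm it.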

>How-To-Repeat:
	Not sure, it seems to be a "bad luck" issue
>Fix:
	unknown

>Release-Note:

>Audit-Trail:

Responsible-Changed-From-To: gnats-admin->kern-bug-people
Responsible-Changed-By: spz@NetBSD.org
Responsible-Changed-When: Fri, 01 Oct 2021 05:36:51 +0000
Responsible-Changed-Why:
kern issue


>Unformatted:
