NetBSD Problem Report #51601
From martin@duskware.de Sat Nov 5 11:50:42 2016
Return-Path: <martin@duskware.de>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (verified OK))
by mollari.NetBSD.org (Postfix) with ESMTPS id 7FA657A279
for <gnats-bugs@gnats.NetBSD.org>; Sat, 5 Nov 2016 11:50:42 +0000 (UTC)
From: martin@NetBSD.org
Reply-To: martin@NetBSD.org
To: gnats-bugs@NetBSD.org
Subject: file system inconsistency and ffs_blkfree panic
X-Send-Pr-Version: 3.95
>Number: 51601
>Category: kern
>Synopsis: file system inconsistency and ffs_blkfree panic
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: jdolecek
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sat Nov 05 11:55:00 +0000 2016
>Closed-Date: Thu Nov 10 21:22:30 +0000 2016
>Last-Modified: Thu Nov 10 21:22:30 +0000 2016
>Originator: Martin Husemann
>Release: NetBSD 7.99.42
>Organization:
The NetBSD Foundation, Inc.
>Environment:
System: NetBSD gemini.duskware.de 7.99.42 NetBSD 7.99.42 (GENERIC-$Revision: 1.369 $) #29: Fri Nov 4 13:18:37 CET 2016 martin@martins.aprisoft.de:/ssd/src/sys/arch/alpha/compile/GENERIC alpha
Architecture: alpha
Machine: alpha
>Description:
While running the atf tests, my alpha panicked and went into an endless reboot
loop:
Mounting all file systems...
Clearing temporary files.
panic: ffs_blkfree: bad size: dev = 0x1000, bno = 7394611223040414225 bsize = 16384, size = 16384, fs = /
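The numbers in the panic line can be checked against the fsck summary further down in this report ("1995410 used, 2248669 free (22469 frags, 278275 blocks ...)"). The following is only a rough plausibility sketch, not the kernel's code: `looks_sane` is a hypothetical helper, the used+free total is only an approximation of the file system size, and the alignment test is an assumption about what a "bad size" check might look for.

```python
# Values copied from the panic message and the fsck summary in this report.
BNO = 7394611223040414225   # bno from the panic line
BSIZE = 16384               # bsize from the panic line
SIZE = 16384                # size from the panic line
USED_FRAGS = 1995410        # "used" in the fsck summary
FREE_FRAGS = 2248669        # "free" in the fsck summary
FREE_LOOSE_FRAGS = 22469    # "frags" in the fsck summary
FREE_FULL_BLOCKS = 278275   # "blocks" in the fsck summary


def frags_per_block() -> int:
    """Derive fragments per block from the fsck free-space breakdown."""
    quotient, remainder = divmod(FREE_FRAGS - FREE_LOOSE_FRAGS, FREE_FULL_BLOCKS)
    if remainder != 0:
        raise ValueError("free-space figures are not consistent")
    return quotient


def accounted_frags() -> int:
    """Fragments fsck accounted for (an approximation of the fs size)."""
    return USED_FRAGS + FREE_FRAGS


def looks_sane(bno: int, size: int, bsize: int, frag: int, total: int) -> bool:
    """Hypothetical plausibility check; an assumption, not ffs_blkfree itself."""
    frag_size = bsize // frag
    size_ok = size <= bsize and size % frag_size == 0
    in_range = 0 <= bno < total
    # A run of size // frag_size fragments must not cross a block boundary.
    fits_in_block = (bno % frag) + (size // frag_size) <= frag
    return size_ok and in_range and fits_in_block


if __name__ == "__main__":
    frag = frags_per_block()
    total = accounted_frags()
    print("fragments per block:", frag)
    print("fragments accounted for by fsck:", total)
    print("bno in hex:", hex(BNO))
    print("bno looks sane:", looks_sane(BNO, SIZE, BSIZE, frag, total))
```

Under these assumptions the reported bno is many orders of magnitude beyond the roughly 4.2 million fragments fsck accounts for, and it is not block-aligned even though a full block is being freed, which suggests a garbage block pointer rather than a legitimate one.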
After booting to single user and running fsck I got:
** Phase 1 - Check Blocks and Sizes
PARTIALLY TRUNCATED INODE I=994061
SALVAGE? yes
49040 DUP I=994061
49041 DUP I=994061
49042 DUP I=994061
49043 DUP I=994061
49044 DUP I=994061
49045 DUP I=994061
49046 DUP I=994061
49047 DUP I=994061
49048 DUP I=994061
49049 DUP I=994061
49050 DUP I=994061
EXCESSIVE DUP BLKS I=994061
CONTINUE? yes
INCORRECT BLOCK COUNT I=994061 (672 should be 776)
CORRECT? yes
** Phase 1b - Rescan For More DUPS
49040 DUP I=3348
49041 DUP I=3348
49042 DUP I=3348
49043 DUP I=3348
49044 DUP I=3348
49045 DUP I=3348
49046 DUP I=3348
49047 DUP I=3348
49048 DUP I=3350
49049 DUP I=3350
** Phase 2 - Check Pathnames
DUP/BAD I=994061 OWNER=0 MODE=100755
SIZE=2048000 MTIME=Nov 4 21:51 2016
FILE=/tmp/atf-run.Tekdh4/fsimage
REMOVE? yes
DUP/BAD I=3348 OWNER=0 MODE=100644
SIZE=1402755 MTIME=Apr 14 23:45 2015
FILE=/test-bed/logs/57_atf.raw
REMOVE? yes
DUP/BAD I=3350 OWNER=0 MODE=100644
SIZE=1349711 MTIME=Apr 14 23:45 2015
FILE=/test-bed/logs/57_atf.xml
REMOVE? yes
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
BAD/DUP FILE I=3348 OWNER=0 MODE=100644
SIZE=1402755 MTIME=Apr 14 23:45 2015
CLEAR? yes
BAD/DUP FILE I=3350 OWNER=0 MODE=100644
SIZE=1349711 MTIME=Apr 14 23:45 2015
CLEAR? yes
BAD/DUP FILE I=994061 OWNER=0 MODE=100755
SIZE=2048000 MTIME=Nov 4 21:51 2016
CLEAR? yes
** Phase 5 - Check Cyl groups
FREE BLK COUNT(S) WRONG IN SUPERBLK
SALVAGE? yes
SUMMARY INFORMATION BAD
SALVAGE? yes
BLK(S) MISSING IN BIT MAPS
SALVAGE? yes
61733 files, 1995410 used, 2248669 free (22469 frags, 278275 blocks, 0.5% fragmentation)
MARK FILE SYSTEM CLEAN? yes
>How-To-Repeat:
s/a
>Fix:
n/a - maybe related to the WAPBL commit J. Hannken-Illjes pointed at?
(for PR kern/47146 and kern/49175)
>Release-Note:
>Audit-Trail:
From: "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/51601: file system inconsistency and ffs_blkfree panic
Date: Sat, 5 Nov 2016 13:44:56 +0100
Looks like this commit from Oct 28, 20:38:
Module Name: src
Committed By: jdolecek
Date: Fri Oct 28 20:38:12 UTC 2016
Modified Files:
src/sys/kern: vfs_wapbl.c
src/sys/sys: wapbl.h
src/sys/ufs/ffs: ffs_alloc.c ffs_inode.c ffs_snapshot.c
src/sys/ufs/ufs: ufs_extern.h ufs_inode.c ufs_rename.c ufs_vnops.c
ufs_wapbl.h
Log Message:
reorganize ffs_truncate()/ffs_indirtrunc() to be able to partially
succeed; change wapbl_register_deallocation() to return EAGAIN
rather than panic when code hits the limit
callers changed to either loop calling ffs_truncate() using new
utility ufs_truncate_retry() if their semantics requires it, or
just ignore the failure; remove ufs_wapbl_truncate()
this fixes possible user-triggerable panic during truncate, and
resolves WAPBL performance issue with truncates of large files
PR kern/47146 and kern/49175
--
J. Hannken-Illjes - hannken@eis.cs.tu-bs.de - TU Braunschweig (Germany)
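The commit message quoted above describes a truncate that may succeed only partially, with the journal returning EAGAIN at its limit and callers looping until the work is done. A toy model of that pattern, for illustration only: `ToyJournal`, `truncate` and `truncate_retry` are hypothetical stand-ins and do not reproduce the logic of wapbl_register_deallocation(), ffs_truncate() or ufs_truncate_retry().

```python
import errno


class ToyJournal:
    """Hypothetical journal that holds a limited number of pending frees."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.pending: list[int] = []
        self.flushes = 0

    def register_deallocation(self, blkno: int) -> int:
        # Report EAGAIN at the limit instead of treating it as fatal.
        if len(self.pending) >= self.limit:
            return errno.EAGAIN
        self.pending.append(blkno)
        return 0

    def flush(self) -> None:
        # Committing the transaction makes room for more deallocations.
        self.pending.clear()
        self.flushes += 1


def truncate(blocks: list[int], new_len: int, journal: ToyJournal) -> int:
    """Free blocks from the end; may stop early and return EAGAIN."""
    while len(blocks) > new_len:
        error = journal.register_deallocation(blocks[-1])
        if error:
            return error  # partial success: remaining blocks stay owned
        blocks.pop()      # drop the pointer only after the free is journaled
    return 0


def truncate_retry(blocks: list[int], new_len: int, journal: ToyJournal) -> int:
    """Loop over truncate() until it finishes or fails for another reason."""
    while True:
        error = truncate(blocks, new_len, journal)
        if error != errno.EAGAIN:
            return error
        journal.flush()


if __name__ == "__main__":
    blocks = list(range(10))
    journal = ToyJournal(limit=4)
    print("result:", truncate_retry(blocks, 0, journal))
    print("blocks left:", blocks, "flushes:", journal.flushes)
```

The point of the sketch is the ordering inside `truncate`: a block is removed from the file only after its deallocation has been registered, so stopping partway leaves every remaining block owned exactly once.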
From: "Jaromir Dolecek" <jdolecek@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/51601 CVS commit: src/sys/ufs/ffs
Date: Mon, 7 Nov 2016 21:14:23 +0000
Module Name: src
Committed By: jdolecek
Date: Mon Nov 7 21:14:23 UTC 2016
Modified Files:
src/sys/ufs/ffs: ffs_inode.c
Log Message:
fix broken test for partial truncate, introduced in rev 1.118
PR kern/51601 kern/51602
To generate a diff of this commit:
cvs rdiff -u -r1.119 -r1.120 src/sys/ufs/ffs/ffs_inode.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
Responsible-Changed-From-To: kern-bug-people->jdolecek
Responsible-Changed-By: jdolecek@NetBSD.org
Responsible-Changed-When: Mon, 07 Nov 2016 21:22:42 +0000
Responsible-Changed-Why:
Likely related to my latest wapbl change.
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/51601 (file system inconsistency and ffs_blkfree panic)
Date: Wed, 9 Nov 2016 18:14:23 +0000
On Mon, Nov 07, 2016 at 09:22:42PM +0000, jdolecek@NetBSD.org wrote:
> Synopsis: file system inconsistency and ffs_blkfree panic
>
> Responsible-Changed-From-To: kern-bug-people->jdolecek
> Responsible-Changed-By: jdolecek@NetBSD.org
> Responsible-Changed-When: Mon, 07 Nov 2016 21:22:42 +0000
> Responsible-Changed-Why:
> Likely related to my latest wapbl change.
Is this confirmed fixed yet? Because if not the wapbl changes should
be backed out until it is.
--
David A. Holland
dholland@netbsd.org
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc: jdolecek@NetBSD.org
Subject: Re: kern/51601 (file system inconsistency and ffs_blkfree panic)
Date: Wed, 9 Nov 2016 19:34:53 +0100
On Wed, Nov 09, 2016 at 06:15:00PM +0000, David Holland wrote:
> Is this confirmed fixed yet? Because if not the wapbl changes should
> be backed out until it is.
Yes, the machine survived the next test run. Not sure if Jaromir wants
to keep it open for further changes; fine with me to just close it now.
Martin
(the other machine for #51602 is still midway in the test run, will reply
there when it is done)
State-Changed-From-To: open->feedback
State-Changed-By: jdolecek@NetBSD.org
State-Changed-When: Thu, 10 Nov 2016 21:15:57 +0000
State-Changed-Why:
Fixes committed to -current. Can you please confirm that the problem is fixed?
State-Changed-From-To: feedback->closed
State-Changed-By: martin@NetBSD.org
State-Changed-When: Thu, 10 Nov 2016 21:22:30 +0000
State-Changed-Why:
Yes, works for me
>Unformatted: