NetBSD Problem Report #51601

From martin@duskware.de  Sat Nov  5 11:50:42 2016
Return-Path: <martin@duskware.de>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (verified OK))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 7FA657A279
	for <gnats-bugs@gnats.NetBSD.org>; Sat,  5 Nov 2016 11:50:42 +0000 (UTC)
From: martin@NetBSD.org
Reply-To: martin@NetBSD.org
To: gnats-bugs@NetBSD.org
Subject: file system inconsistency and ffs_blkfree panic
X-Send-Pr-Version: 3.95

>Number:         51601
>Category:       kern
>Synopsis:       file system inconsistency and ffs_blkfree panic
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    jdolecek
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Nov 05 11:55:00 +0000 2016
>Closed-Date:    Thu Nov 10 21:22:30 +0000 2016
>Last-Modified:  Thu Nov 10 21:22:30 +0000 2016
>Originator:     Martin Husemann
>Release:        NetBSD 7.99.42
>Organization:
The NetBSD Foundation, Inc.
>Environment:
System: NetBSD gemini.duskware.de 7.99.42 NetBSD 7.99.42 (GENERIC-$Revision: 1.369 $) #29: Fri Nov 4 13:18:37 CET 2016 martin@martins.aprisoft.de:/ssd/src/sys/arch/alpha/compile/GENERIC alpha
Architecture: alpha
Machine: alpha
>Description:

While running the ATF tests, my alpha panicked and went into an endless reboot
loop:

Mounting all file systems...
Clearing temporary files.
panic: ffs_blkfree: bad size: dev = 0x1000, bno = 7394611223040414225 bsize = 16384, size = 16384, fs = /
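
Note that size equals bsize in the panic message, so the size itself is not
out of range. One plausible reading, assuming the sanity check in
ffs_blkfree() also requires the freed range to fit inside a single block, is
that the garbage block number trips it: bno is odd, so a full-block free
starting there would run past a block boundary. The sketch below is a
simplified userland model of such a check, not the kernel code. struct
fs_geom, blkfree_bad_size() and the 2048-byte fragment size used in the note
after it are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the few file system parameters used here. */
struct fs_geom {
	int32_t bsize;	/* block size in bytes */
	int32_t fsize;	/* fragment size in bytes (assumed, not in the report) */
};

/*
 * Model of a "bad size" check: returns true when freeing `size` bytes
 * starting at fragment number `bno` would be rejected.
 */
static bool
blkfree_bad_size(const struct fs_geom *fs, int64_t bno, long size)
{
	assert(fs->fsize > 0 && fs->bsize % fs->fsize == 0);

	int32_t frag = fs->bsize / fs->fsize;	/* fragments per block */

	/* Negative block numbers are rejected outright (model-only rule). */
	if (bno < 0)
		return true;
	/* The size must be positive and no larger than one block. */
	if (size <= 0 || size > fs->bsize)
		return true;
	/* The size must be a whole number of fragments. */
	if (size % fs->fsize != 0)
		return true;

	int64_t fragnum = bno % frag;		/* offset within the block */
	int64_t numfrags = size / fs->fsize;	/* fragments being freed */

	/* The freed range must not cross the end of the block. */
	return fragnum + numfrags > frag;
}
```

With bsize = 16384 and an assumed fsize = 2048, this model rejects the bno
from the panic message for a 16384-byte free, and accepts a block-aligned
fragment number such as 49040.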

After booting to single user and running fsck I got:

** Phase 1 - Check Blocks and Sizes
PARTIALLY TRUNCATED INODE I=994061
SALVAGE? yes

49040 DUP I=994061
49041 DUP I=994061
49042 DUP I=994061
49043 DUP I=994061
49044 DUP I=994061
49045 DUP I=994061
49046 DUP I=994061
49047 DUP I=994061
49048 DUP I=994061
49049 DUP I=994061
49050 DUP I=994061
EXCESSIVE DUP BLKS I=994061

CONTINUE? yes

INCORRECT BLOCK COUNT I=994061 (672 should be 776)
CORRECT? yes

** Phase 1b - Rescan For More DUPS
49040 DUP I=3348
49041 DUP I=3348
49042 DUP I=3348
49043 DUP I=3348
49044 DUP I=3348
49045 DUP I=3348
49046 DUP I=3348
49047 DUP I=3348
49048 DUP I=3350
49049 DUP I=3350
** Phase 2 - Check Pathnames
DUP/BAD  I=994061  OWNER=0 MODE=100755
SIZE=2048000 MTIME=Nov  4 21:51 2016  
FILE=/tmp/atf-run.Tekdh4/fsimage

REMOVE? yes

DUP/BAD  I=3348  OWNER=0 MODE=100644
SIZE=1402755 MTIME=Apr 14 23:45 2015  
FILE=/test-bed/logs/57_atf.raw

REMOVE? yes

DUP/BAD  I=3350  OWNER=0 MODE=100644
SIZE=1349711 MTIME=Apr 14 23:45 2015  
FILE=/test-bed/logs/57_atf.xml

REMOVE? yes

** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
BAD/DUP FILE I=3348  OWNER=0 MODE=100644
SIZE=1402755 MTIME=Apr 14 23:45 2015  
CLEAR? yes

BAD/DUP FILE I=3350  OWNER=0 MODE=100644
SIZE=1349711 MTIME=Apr 14 23:45 2015  
CLEAR? yes

BAD/DUP FILE I=994061  OWNER=0 MODE=100755
SIZE=2048000 MTIME=Nov  4 21:51 2016  
CLEAR? yes

** Phase 5 - Check Cyl groups
FREE BLK COUNT(S) WRONG IN SUPERBLK
SALVAGE? yes

SUMMARY INFORMATION BAD
SALVAGE? yes

BLK(S) MISSING IN BIT MAPS
SALVAGE? yes

61733 files, 1995410 used, 2248669 free (22469 frags, 278275 blocks, 0.5% fragmentation)

MARK FILE SYSTEM CLEAN? yes


>How-To-Repeat:
See above.

>Fix:
n/a - maybe related to the WAPBL commit J. Hannken-Illjes pointed at?
(for PR kern/47146 and kern/49175)

>Release-Note:

>Audit-Trail:
From: "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/51601: file system inconsistency and ffs_blkfree panic
Date: Sat, 5 Nov 2016 13:44:56 +0100

 Looks like this commit from Oct 28, 20:38:

 Module Name:	src
 Committed By:	jdolecek
 Date:		Fri Oct 28 20:38:12 UTC 2016

 Modified Files:
 	src/sys/kern: vfs_wapbl.c
 	src/sys/sys: wapbl.h
 	src/sys/ufs/ffs: ffs_alloc.c ffs_inode.c ffs_snapshot.c
 	src/sys/ufs/ufs: ufs_extern.h ufs_inode.c ufs_rename.c ufs_vnops.c
 	    ufs_wapbl.h

 Log Message:
 reorganize ffs_truncate()/ffs_indirtrunc() to be able to partially
 succeed; change wapbl_register_deallocation() to return EAGAIN
 rather than panic when code hits the limit

 callers changed to either loop calling ffs_truncate() using new
 utility ufs_truncate_retry() if their semantics requires it, or
 just ignore the failure; remove ufs_wapbl_truncate()

 this fixes possible user-triggerable panic during truncate, and
 resolves WAPBL performance issue with truncates of large files

 PR kern/47146 and kern/49175

 --
 J. Hannken-Illjes - hannken@eis.cs.tu-bs.de - TU Braunschweig (Germany)
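
The pattern described in the log message above (a deallocation call that
returns EAGAIN at its limit, a truncate that may succeed only partially, and
a caller that loops until it completes) can be sketched in userland as
follows. This is a toy model and not the NetBSD code: every toy_* name, the
limit of 4, and the commit step between retries are assumptions made for
illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_DEALLOC_LIMIT 4	/* hypothetical per-transaction limit */

struct toy_journal {
	size_t ndealloc;	/* deallocations in the current transaction */
};

struct toy_file {
	size_t nblocks;		/* blocks still allocated to the file */
};

/* Register one deallocation; return EAGAIN at the limit, never panic. */
static int
toy_register_dealloc(struct toy_journal *j)
{
	if (j->ndealloc >= TOY_DEALLOC_LIMIT)
		return EAGAIN;
	j->ndealloc++;
	return 0;
}

/* Start a fresh transaction (stands in for whatever frees journal space). */
static void
toy_journal_commit(struct toy_journal *j)
{
	j->ndealloc = 0;
}

/* Shrink the file to `target` blocks; may stop early with EAGAIN. */
static int
toy_truncate(struct toy_journal *j, struct toy_file *f, size_t target)
{
	while (f->nblocks > target) {
		int error = toy_register_dealloc(j);
		if (error != 0)
			return error;	/* partial success; progress is kept */
		f->nblocks--;
	}
	return 0;
}

/* Call toy_truncate() until it completes; *passes counts the attempts. */
static int
toy_truncate_retry(struct toy_journal *j, struct toy_file *f, size_t target,
    int *passes)
{
	int error;

	assert(j != NULL && f != NULL && passes != NULL);
	*passes = 0;
	do {
		error = toy_truncate(j, f, target);
		(*passes)++;
		if (error == EAGAIN)
			toy_journal_commit(j);
	} while (error == EAGAIN);
	return error;
}
```

The point of the model is that a partially successful truncate leaves the
file in an intermediate state, so the caller (and any test for "was this a
partial truncate?") has to track exactly how far the previous pass got.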

From: "Jaromir Dolecek" <jdolecek@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/51601 CVS commit: src/sys/ufs/ffs
Date: Mon, 7 Nov 2016 21:14:23 +0000

 Module Name:	src
 Committed By:	jdolecek
 Date:		Mon Nov  7 21:14:23 UTC 2016

 Modified Files:
 	src/sys/ufs/ffs: ffs_inode.c

 Log Message:
 fix broken test for partial truncate, introduced in rev 1.118

 PR kern/51601 kern/51602


 To generate a diff of this commit:
 cvs rdiff -u -r1.119 -r1.120 src/sys/ufs/ffs/ffs_inode.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

Responsible-Changed-From-To: kern-bug-people->jdolecek
Responsible-Changed-By: jdolecek@NetBSD.org
Responsible-Changed-When: Mon, 07 Nov 2016 21:22:42 +0000
Responsible-Changed-Why:
Likely related to my latest wapbl change.


From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/51601 (file system inconsistency and ffs_blkfree panic)
Date: Wed, 9 Nov 2016 18:14:23 +0000

 On Mon, Nov 07, 2016 at 09:22:42PM +0000, jdolecek@NetBSD.org wrote:
  > Synopsis: file system inconsistency and ffs_blkfree panic
  > 
  > Responsible-Changed-From-To: kern-bug-people->jdolecek
  > Responsible-Changed-By: jdolecek@NetBSD.org
  > Responsible-Changed-When: Mon, 07 Nov 2016 21:22:42 +0000
  > Responsible-Changed-Why:
  > Likely related to my latest wapbl change.

 Is this confirmed fixed yet? Because if not the wapbl changes should
 be backed out until it is.

 -- 
 David A. Holland
 dholland@netbsd.org

From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc: jdolecek@NetBSD.org
Subject: Re: kern/51601 (file system inconsistency and ffs_blkfree panic)
Date: Wed, 9 Nov 2016 19:34:53 +0100

 On Wed, Nov 09, 2016 at 06:15:00PM +0000, David Holland wrote:
 >  Is this confirmed fixed yet? Because if not the wapbl changes should
 >  be backed out until it is.

 Yes, the machine survived the next test run. Not sure if Jaromir wants
 to keep it open for further changes; fine with me to just close it now.

 Martin
 (the other machine for #51602 is still midway in the test run, will reply
 there when it is done)

State-Changed-From-To: open->feedback
State-Changed-By: jdolecek@NetBSD.org
State-Changed-When: Thu, 10 Nov 2016 21:15:57 +0000
State-Changed-Why:
Fixes committed to -current. Can you please confirm that the problem is fixed?


State-Changed-From-To: feedback->closed
State-Changed-By: martin@NetBSD.org
State-Changed-When: Thu, 10 Nov 2016 21:22:30 +0000
State-Changed-Why:
Yes, works for me


>Unformatted:

Copyright © 1994-2014 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.