NetBSD Problem Report #51602
From martin@duskware.de Sat Nov 5 12:22:56 2016
Return-Path: <martin@duskware.de>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (verified OK))
by mollari.NetBSD.org (Postfix) with ESMTPS id BD5D27A279
for <gnats-bugs@gnats.NetBSD.org>; Sat, 5 Nov 2016 12:22:56 +0000 (UTC)
From: martin@NetBSD.org
Reply-To: martin@NetBSD.org
To: gnats-bugs@NetBSD.org
Subject: file system inconsistency and panic in ffs_blkfree_common
X-Send-Pr-Version: 3.95
>Number: 51602
>Category: kern
>Synopsis: file system inconsistency and panic in ffs_blkfree_common
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: jdolecek
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sat Nov 05 12:25:00 +0000 2016
>Closed-Date: Thu Nov 10 21:22:49 +0000 2016
>Last-Modified: Thu Nov 10 21:22:49 +0000 2016
>Originator: Martin Husemann
>Release: NetBSD 7.99.42
>Organization:
The NetBSD Foundation, Inc.
>Environment:
System: NetBSD night-rest.duskware.de 7.99.42 NetBSD 7.99.42 (GENERIC) #25: Fri Nov 4 13:34:09 CET 2016 martin@martins.aprisoft.de:/ssd/src/sys/arch/shark/compile/GENERIC shark
Architecture: earmv4
Machine: shark
>Description:
While running the atf tests, my shark panicked:
panic: ffs_blkfree_common: freeing free block: dev = 0x1000, block = 24512, fs = /
0xf6119a54: netbsd:vpanic+0xc
0xf6119a6c: netbsd:snprintf
0xf6119ae4: netbsd:ffs_blkfree_common.isra.3+0x340
0xf6119b3c: netbsd:ffs_blkfree_cg+0x154
0xf6119bcc: netbsd:ffs_indirtrunc+0x4e8
0xf6119d2c: netbsd:ffs_truncate+0xda8
0xf6119d6c: netbsd:ufs_truncate_retry+0x9c
0xf6119d9c: netbsd:ufs_inactive+0x16c
0xf6119dbc: netbsd:VOP_INACTIVE+0x30
0xf6119df4: netbsd:vrelel+0x250
0xf6119e2c: netbsd:ufs_remove+0xc8
0xf6119e4c: netbsd:VOP_REMOVE+0x34
0xf6119ebc: netbsd:do_sys_unlinkat+0xe8
0xf6119ed4: netbsd:sys_unlink+0x28
0xf6119f4c: netbsd:syscall+0x9c
After rebooting to single user, fsck said:
** /dev/rwd0a
** File system is journaled; replaying journal
** Last Mounted on /
** Root file system
** Phase 1 - Check Blocks and Sizes
PARTIALLY TRUNCATED INODE I=10646308
SALVAGE? yes
405911018 BAD I=10646308
403082449 BAD I=10646308
1860989214 BAD I=10646308
3317784935 BAD I=10646308
4203596339 BAD I=10646308
4023974793 BAD I=10646308
2016968809 BAD I=10646308
242868444 BAD I=10646308
4134308919 BAD I=10646308
2506537194 BAD I=10646308
1620161144 BAD I=10646308
EXCESSIVE BAD BLKS I=10646308
CONTINUE? yes
INCORRECT BLOCK COUNT I=10646308 (672 should be 640)
CORRECT? yes
[...]
CANNOT WRITE: BLK 3513131968
CONTINUE? yes
THE FOLLOWING SECTORS COULD NOT BE WRITTEN: 3513131968, 3513131969, 3513131970, 3513131971, 3513131972, 3513131973, 3513131974, 3513131975, 3513131976, 3513131977, 3513131978, 3513131979, 3513131980, 3513131981, 3513131982, 3513131983, 3513131984, 3513131985, 3513131986, 3513131987, 3513131988, 3513131989, 3513131990, 3513131991, 3513131992, 3513131993, 3513131994, 3513131995, 3513131996, 3513131997, 3513131998, 3513131999,
[lots of more similar out of range blocks]
FREE BLK COUNT(S) WRONG IN SUPERBLK
SALVAGE? yes
SUMMARY INFORMATION BAD
SALVAGE? yes
BLK(S) MISSING IN BIT MAPS
SALVAGE? yes
79982 files, 1934491 used, 54729164 free (18796 frags, 6838796 blocks, 0.0% fragmentation)
MARK FILE SYSTEM CLEAN? yes
Disklabel is:
8 partitions:
# size offset fstype [fsize bsize cpg/sgs]
a: 230246352 0 4.2BSD 2048 16384 0 # (Cyl. 0 - 228418)
b: 4195296 230246352 swap # (Cyl. 228419 - 232580)
c: 234441648 0 unused 0 0 # (Cyl. 0 - 232580)
Dumpfs output is:
file system: /dev/rwd0c
format FFSv1
endian little-endian
magic 11954 time Sat Nov 5 13:11:26 2016
superblock location 8192 id [ 4c4ad59b 16a2beb3 ]
cylgrp dynamic inodes 4.4BSD sblock FFSv2 fslevel 4
nbfree 6838796 ndir 9392 nifree 14130576 nffree 18796
ncg 610 size 57561588 blocks 56663655
bsize 16384 shift 14 mask 0xffffc000
fsize 2048 shift 11 mask 0xfffff800
frag 8 shift 3 fsbtodb 2
bpg 11796 fpg 94368 ipg 23296
minfree 5% optim time maxcontig 4 maxbpg 4096
symlinklen 60 contigsumsize 4
maxfilesize 0x000400400402ffff
nindir 4096 inopb 128
avgfilesize 16384 avgfpdir 64
sblkno 8 cblkno 16 iblkno 24 dblkno 1480
sbsize 2048 cgsize 16384
csaddr 1480 cssize 10240
cgrotor 0 fmod 0 ronly 0 clean 0x01
wapbl version 0x1 location 2 flags 0x0
wapbl loc0 115134912 loc1 131072 loc2 512 loc3 3
flags none
fsmnt /
volname swuid 0
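The numbers above can be cross-checked with a little arithmetic. The following is a minimal sketch, assuming the values fsck printed as BAD are fragment addresses and that fsbtodb is a shift from fragment address to 512-byte sector. The constant and function names are made up for illustration; the values are copied from the fsck, disklabel and dumpfs output in this report.

```python
# Values copied from the dumpfs and disklabel output in this report.
FSIZE = 2048                 # fragment size in bytes
FSBTODB = 2                  # shift: fragment address -> 512-byte sector
FS_SIZE_FRAGS = 57561588     # dumpfs "size": file system size in fragments
PART_A_SECTORS = 230246352   # disklabel partition a, in sectors
DISK_SECTORS = 234441648     # disklabel partition c (whole disk), in sectors

# Block numbers fsck reported as BAD for inode 10646308.
BAD_BLOCKS = [
    405911018, 403082449, 1860989214, 3317784935, 4203596339, 4023974793,
    2016968809, 242868444, 4134308919, 2506537194, 1620161144,
]

def frag_in_range(frag):
    """True if a fragment address lies inside the file system."""
    return 0 <= frag < FS_SIZE_FRAGS

def sector_in_range(sector):
    """True if a sector number lies on the disk at all."""
    return 0 <= sector < DISK_SECTORS

# The file system size in sectors matches partition a exactly.
fs_sectors = FS_SIZE_FRAGS << FSBTODB
print("fs fills partition a:", fs_sectors == PART_A_SECTORS)

# None of the BAD block numbers can be a valid fragment address.
print("any BAD block in range:", any(frag_in_range(b) for b in BAD_BLOCKS))

# The first sector fsck could not write is far beyond the end of the disk.
print("sector 3513131968 on disk:", sector_in_range(3513131968))
```

Under these assumptions every reported block number is larger than the whole file system, which suggests the inode's block pointers held garbage rather than slightly stale values.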
>How-To-Repeat:
n/a
>Fix:
n/a
>Release-Note:
>Audit-Trail:
From: "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/51602: file system inconsistency and panic in ffs_blkfree_common
Date: Sat, 5 Nov 2016 13:44:55 +0100
Looks like this commit from Oct 28, 20:38:
Module Name: src
Committed By: jdolecek
Date: Fri Oct 28 20:38:12 UTC 2016
Modified Files:
src/sys/kern: vfs_wapbl.c
src/sys/sys: wapbl.h
src/sys/ufs/ffs: ffs_alloc.c ffs_inode.c ffs_snapshot.c
src/sys/ufs/ufs: ufs_extern.h ufs_inode.c ufs_rename.c ufs_vnops.c
ufs_wapbl.h
Log Message:
reorganize ffs_truncate()/ffs_indirtrunc() to be able to partially
succeed; change wapbl_register_deallocation() to return EAGAIN
rather than panic when code hits the limit
callers changed to either loop calling ffs_truncate() using new
utility ufs_truncate_retry() if their semantics requires it, or
just ignore the failure; remove ufs_wapbl_truncate()
this fixes possible user-triggerable panic during truncate, and
resolves WAPBL performance issue with truncates of large files
PR kern/47146 and kern/49175
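
The retry pattern the log message describes can be modelled in a few lines. This is a toy sketch only, not the kernel code: truncate_once(), truncate_retry() and DEALLOC_LIMIT are hypothetical stand-ins for ffs_truncate(), ufs_truncate_retry() and the deallocation limit in wapbl_register_deallocation().

```python
import errno

DEALLOC_LIMIT = 4  # hypothetical per-transaction limit on freed blocks

def truncate_once(blocks, limit=DEALLOC_LIMIT):
    """Free blocks from the end of the list, at most `limit` per call.

    Returns 0 when the list is empty, or errno.EAGAIN when the limit was
    reached and the caller should retry in a fresh transaction.
    """
    freed = 0
    while blocks:
        if freed == limit:
            return errno.EAGAIN   # partial success: some blocks remain
        blocks.pop()
        freed += 1
    return 0

def truncate_retry(blocks):
    """Call truncate_once() until it stops returning EAGAIN.

    Returns the number of passes that were needed.
    """
    passes = 0
    while True:
        passes += 1
        if truncate_once(blocks) != errno.EAGAIN:
            return passes

blocks = list(range(10))
print(truncate_retry(blocks), blocks)
```

The point of the design is that each pass must leave the file in a consistent, shorter state, so that the next pass starts from accurate block pointers.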
--
J. Hannken-Illjes - hannken@eis.cs.tu-bs.de - TU Braunschweig (Germany)
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/51602: file system inconsistency and panic in
ffs_blkfree_common
Date: Sat, 5 Nov 2016 19:58:07 +0000
On Sat, Nov 05, 2016 at 12:50:01PM +0000, J. Hannken-Illjes wrote:
> Looks like this commit from Oct 28, 20:38:
> [the wapbl fix stuff]
oh dear :(
--
David A. Holland
dholland@netbsd.org
From: "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/51602: file system inconsistency and panic in ffs_blkfree_common
Date: Sat, 5 Nov 2016 21:24:16 +0100
> On 05 Nov 2016, at 21:00, David Holland <dholland-bugs@netbsd.org> wrote:
>
> The following reply was made to PR kern/51602; it has been noted by GNATS.
>
> From: David Holland <dholland-bugs@netbsd.org>
> To: gnats-bugs@NetBSD.org
> Cc:
> Subject: Re: kern/51602: file system inconsistency and panic in
> ffs_blkfree_common
> Date: Sat, 5 Nov 2016 19:58:07 +0000
>
> On Sat, Nov 05, 2016 at 12:50:01PM +0000, J. Hannken-Illjes wrote:
>> Looks like this commit from Oct 28, 20:38:
>> [the wapbl fix stuff]
>
> oh dear :(
I'm working on some fixes ...
--
J. Hannken-Illjes - hannken@eis.cs.tu-bs.de - TU Braunschweig (Germany)
From: =?UTF-8?B?SmFyb23DrXIgRG9sZcSNZWs=?= <jaromir.dolecek@gmail.com>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
Martin Husemann <martin@netbsd.org>
Subject: Re: kern/51602: file system inconsistency and panic in ffs_blkfree_common
Date: Sat, 5 Nov 2016 21:25:37 +0100
Reviewing the code again, I see one missing BAP_ASSIGN() in the code
for the last partial block. That one would definitely explain the
failing newvnode assert and also "ffs_blkfree_common: freeing free
block". Not sure if it's also likely to explain the other problem with
garbage block number.
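
As a rough illustration of how a missed pointer update can end in "freeing free block", here is a toy model. It assumes the effect of the missing assignment is a stale block pointer that a later truncate pass sees again; that is an assumption made for the sketch, not something confirmed in this report. All names are hypothetical and none of this is the ffs code.

```python
class FreeingFreeBlock(Exception):
    """Stand-in for the 'freeing free block' panic."""

def blkfree(free_map, blkno):
    # Refuse to free a block that is already marked free.
    if free_map[blkno]:
        raise FreeingFreeBlock(f"freeing free block: block = {blkno}")
    free_map[blkno] = True

def release(pointers, free_map, clear_pointer):
    """Free every block referenced from `pointers`.

    With clear_pointer=False the freed pointer is left in place,
    modelling the assumed effect of the missing assignment.
    """
    for i, blkno in enumerate(pointers):
        if blkno == 0:
            continue              # 0 means "no block here"
        blkfree(free_map, blkno)
        if clear_pointer:
            pointers[i] = 0

def two_passes(clear_pointer):
    pointers = [5, 6]
    free_map = [False] * 8
    release(pointers, free_map, clear_pointer)
    release(pointers, free_map, clear_pointer)  # the retry pass
    return pointers

print(two_passes(clear_pointer=True))
try:
    two_passes(clear_pointer=False)
except FreeingFreeBlock as e:
    print("panic:", e)
```

With the pointer cleared, the retry pass finds nothing left to free; with the stale pointer left in place, the second pass trips the check on the first block it revisits.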
I'll check it on Sunday/Monday (traveling now).
Jaromir
2016-11-05 21:00 GMT+01:00 David Holland <dholland-bugs@netbsd.org>:
> The following reply was made to PR kern/51602; it has been noted by GNATS.
>
> From: David Holland <dholland-bugs@netbsd.org>
> To: gnats-bugs@NetBSD.org
> Cc:
> Subject: Re: kern/51602: file system inconsistency and panic in
> ffs_blkfree_common
> Date: Sat, 5 Nov 2016 19:58:07 +0000
>
> On Sat, Nov 05, 2016 at 12:50:01PM +0000, J. Hannken-Illjes wrote:
> > Looks like this commit from Oct 28, 20:38:
> > [the wapbl fix stuff]
>
> oh dear :(
>
> --
> David A. Holland
> dholland@netbsd.org
>
Responsible-Changed-From-To: kern-bug-people->jdolecek
Responsible-Changed-By: jdolecek@NetBSD.org
Responsible-Changed-When: Mon, 07 Nov 2016 21:22:13 +0000
Responsible-Changed-Why:
Looking at this, likely related to latest wapbl change. Likely same problem
as kern/51601, but keeping open for now.
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/51602: file system inconsistency and panic in ffs_blkfree_common
Date: Thu, 10 Nov 2016 08:15:16 +0100
The shark survived the next test run, so fine to close this from my POV.
Martin
State-Changed-From-To: open->feedback
State-Changed-By: jdolecek@NetBSD.org
State-Changed-When: Thu, 10 Nov 2016 21:16:19 +0000
State-Changed-Why:
Fixes committed to -current. Can you please confirm that the problem is fixed?
State-Changed-From-To: feedback->closed
State-Changed-By: martin@NetBSD.org
State-Changed-When: Thu, 10 Nov 2016 21:22:49 +0000
State-Changed-Why:
Yes, works for me.
>Unformatted: