NetBSD Problem Report #51602

From martin@duskware.de  Sat Nov  5 12:22:56 2016
Return-Path: <martin@duskware.de>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (verified OK))
	by mollari.NetBSD.org (Postfix) with ESMTPS id BD5D27A279
	for <gnats-bugs@gnats.NetBSD.org>; Sat,  5 Nov 2016 12:22:56 +0000 (UTC)
From: martin@NetBSD.org
Reply-To: martin@NetBSD.org
To: gnats-bugs@NetBSD.org
Subject: file system inconsistency and panic in ffs_blkfree_common
X-Send-Pr-Version: 3.95

>Number:         51602
>Category:       kern
>Synopsis:       file system inconsistency and panic in ffs_blkfree_common
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    jdolecek
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Nov 05 12:25:00 +0000 2016
>Closed-Date:    Thu Nov 10 21:22:49 +0000 2016
>Last-Modified:  Thu Nov 10 21:22:49 +0000 2016
>Originator:     Martin Husemann
>Release:        NetBSD 7.99.42
>Organization:
The NetBSD Foundation, Inc.
>Environment:
System: NetBSD night-rest.duskware.de 7.99.42 NetBSD 7.99.42 (GENERIC) #25: Fri Nov 4 13:34:09 CET 2016 martin@martins.aprisoft.de:/ssd/src/sys/arch/shark/compile/GENERIC shark
Architecture: earmv4
Machine: shark
>Description:

While running the atf tests, my shark panicked:


panic: ffs_blkfree_common: freeing free block: dev = 0x1000, block = 24512, fs = /
0xf6119a54: netbsd:vpanic+0xc
0xf6119a6c: netbsd:snprintf
0xf6119ae4: netbsd:ffs_blkfree_common.isra.3+0x340
0xf6119b3c: netbsd:ffs_blkfree_cg+0x154
0xf6119bcc: netbsd:ffs_indirtrunc+0x4e8
0xf6119d2c: netbsd:ffs_truncate+0xda8
0xf6119d6c: netbsd:ufs_truncate_retry+0x9c
0xf6119d9c: netbsd:ufs_inactive+0x16c
0xf6119dbc: netbsd:VOP_INACTIVE+0x30
0xf6119df4: netbsd:vrelel+0x250
0xf6119e2c: netbsd:ufs_remove+0xc8
0xf6119e4c: netbsd:VOP_REMOVE+0x34
0xf6119ebc: netbsd:do_sys_unlinkat+0xe8
0xf6119ed4: netbsd:sys_unlink+0x28
0xf6119f4c: netbsd:syscall+0x9c
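The panic is raised by a consistency check: before FFS marks a fragment free in the cylinder group's free map, it verifies that the fragment is currently allocated. Below is a minimal Python model of that check, assuming one flag per fragment. The FragMap class and its blkfree() method are hypothetical. This is not the NetBSD ffs_blkfree_common() code, which panics at the point where the sketch raises an exception.

```python
class FragMap:
    """Hypothetical model of a cylinder-group free map: one flag per
    fragment, True = free.  Not the NetBSD data structure."""

    def __init__(self, nfrags: int) -> None:
        self.free = [False] * nfrags  # every fragment starts out allocated

    def blkfree(self, bno: int, nfrags: int) -> None:
        """Mark fragments bno .. bno+nfrags-1 free, refusing a double free."""
        for f in range(bno, bno + nfrags):
            if self.free[f]:
                # The kernel panics here; the sketch raises instead.
                raise RuntimeError(f"freeing free block: block = {f}")
        for f in range(bno, bno + nfrags):
            self.free[f] = True


cg = FragMap(64)
cg.blkfree(8, 8)          # first free of an 8-fragment block succeeds
try:
    cg.blkfree(8, 8)      # freeing the same block again is detected
except RuntimeError as e:
    print(e)              # freeing free block: block = 8
```

All fragments are checked before any is marked free, so a rejected call leaves the map unchanged.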


After rebooting into single-user mode, fsck said:

** /dev/rwd0a
** File system is journaled; replaying journal
** Last Mounted on /
** Root file system
** Phase 1 - Check Blocks and Sizes
PARTIALLY TRUNCATED INODE I=10646308
SALVAGE? yes

405911018 BAD I=10646308
403082449 BAD I=10646308
1860989214 BAD I=10646308
3317784935 BAD I=10646308
4203596339 BAD I=10646308
4023974793 BAD I=10646308
2016968809 BAD I=10646308
242868444 BAD I=10646308
4134308919 BAD I=10646308
2506537194 BAD I=10646308
1620161144 BAD I=10646308
EXCESSIVE BAD BLKS I=10646308
CONTINUE? yes

INCORRECT BLOCK COUNT I=10646308 (672 should be 640)
CORRECT? yes
[...]

CANNOT WRITE: BLK 3513131968
CONTINUE? yes

THE FOLLOWING SECTORS COULD NOT BE WRITTEN: 3513131968, 3513131969, 3513131970, 3513131971, 3513131972, 3513131973, 3513131974, 3513131975, 3513131976, 3513131977, 3513131978, 3513131979, 3513131980, 3513131981, 3513131982, 3513131983, 3513131984, 3513131985, 3513131986, 3513131987, 3513131988, 3513131989, 3513131990, 3513131991, 3513131992, 3513131993, 3513131994, 3513131995, 3513131996, 3513131997, 3513131998, 3513131999,

[many more similar out-of-range blocks]

FREE BLK COUNT(S) WRONG IN SUPERBLK
SALVAGE? yes

SUMMARY INFORMATION BAD
SALVAGE? yes

BLK(S) MISSING IN BIT MAPS
SALVAGE? yes

79982 files, 1934491 used, 54729164 free (18796 frags, 6838796 blocks, 0.0% fragmentation)

MARK FILE SYSTEM CLEAN? yes
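Several numbers in the fsck transcript can be cross-checked against the dumpfs and disklabel output later in this report: fs size 57561588 fragments, 8 fragments per block, and partition a of 230246352 sectors. The sketch below makes two assumptions. First, it uses a simplified range check; fsck_ffs applies further checks that are omitted here. Second, it treats the inode block count as kept in 512-byte units, as FFS does. Under those assumptions it shows three things. Every block number flagged BAD lies past the end of the file system. The final summary line is self-consistent. The corrected block count differs by exactly one 16 KiB block.

```python
# Values copied from the fsck, dumpfs and disklabel output in this report.
FS_SIZE_FRAGS = 57561588       # dumpfs "size", in 2048-byte fragments
FRAGS_PER_BLOCK = 8            # dumpfs "frag" (16384 / 2048)
PART_A_SECTORS = 230246352     # disklabel size of partition a, 512-byte sectors

bad_blocks = [405911018, 403082449, 1860989214, 3317784935, 4203596339,
              4023974793, 2016968809, 242868444, 4134308919, 2506537194,
              1620161144]


def out_of_range(bno: int, size: int = FS_SIZE_FRAGS) -> bool:
    """Simplified range check: a fragment address is invalid if it is
    negative or at/after the end of the file system."""
    return bno < 0 or bno >= size


print(all(out_of_range(b) for b in bad_blocks))   # True
print(3513131968 >= PART_A_SECTORS)               # True: unwritable sector is off the partition

# Summary line: free = nffree + nbfree * frag; used + free = dumpfs "blocks".
nffree, nbfree, used = 18796, 6838796, 1934491
free = nffree + nbfree * FRAGS_PER_BLOCK
print(free, used + free)                          # 54729164 56663655

# "INCORRECT BLOCK COUNT (672 should be 640)", counted in 512-byte units:
print((672 - 640) * 512)                          # 16384 bytes, one full block
```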

Disklabel is:


8 partitions:
#        size    offset     fstype [fsize bsize cpg/sgs]
 a: 230246352         0     4.2BSD   2048 16384     0  # (Cyl.      0 - 228418)
 b:   4195296 230246352       swap                     # (Cyl. 228419 - 232580)
 c: 234441648         0     unused      0     0        # (Cyl.      0 - 232580)


Dumpfs output is:

file system: /dev/rwd0c
format  FFSv1
endian  little-endian
magic   11954           time    Sat Nov  5 13:11:26 2016
superblock location     8192    id      [ 4c4ad59b 16a2beb3 ]
cylgrp  dynamic inodes  4.4BSD  sblock  FFSv2   fslevel 4
nbfree  6838796 ndir    9392    nifree  14130576        nffree  18796
ncg     610     size    57561588        blocks  56663655
bsize   16384   shift   14      mask    0xffffc000
fsize   2048    shift   11      mask    0xfffff800
frag    8       shift   3       fsbtodb 2
bpg     11796   fpg     94368   ipg     23296
minfree 5%      optim   time    maxcontig 4     maxbpg  4096
symlinklen 60   contigsumsize 4
maxfilesize 0x000400400402ffff
nindir  4096    inopb   128
avgfilesize 16384       avgfpdir 64
sblkno  8       cblkno  16      iblkno  24      dblkno  1480
sbsize  2048    cgsize  16384
csaddr  1480    cssize  10240
cgrotor 0       fmod    0       ronly   0       clean   0x01
wapbl version 0x1       location 2      flags 0x0
wapbl loc0 115134912    loc1 131072     loc2 512        loc3 3
flags   none
fsmnt   /
volname         swuid   0
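The dumpfs values are internally consistent and line up with the disklabel. The worked arithmetic below assumes the standard FFSv1 constants: 12 direct block pointers per inode, 32-bit block pointers, 128-byte on-disk inodes, and 512-byte sectors for fsbtodb. It also assumes that a fragment address maps to its cylinder group by integer division by fpg, as FFS's dtog() does. Among other things, it confirms that the file system exactly fills partition a and that block 24512 from the panic message lies in cylinder group 0.

```python
# Numbers copied from the dumpfs output and disklabel in this report.
bsize, fsize = 16384, 2048
size_frags = 57561588        # fs size, in fragments
bpg, fpg = 11796, 94368      # blocks / fragments per cylinder group
DEV_BSIZE = 512              # sector size used by fsbtodb
NDADDR = 12                  # direct block pointers in a UFS inode (assumed)
PTR_SIZE = 4                 # FFSv1 block pointers are 32 bits (assumed)
DINODE1_SIZE = 128           # bytes per FFSv1 on-disk inode (assumed)

frag = bsize // fsize                                  # fragments per block
fsbtodb = (fsize // DEV_BSIZE).bit_length() - 1        # log2(2048 / 512)
fs_sectors = size_frags << fsbtodb                     # fragments -> sectors
ncg = -(-size_frags // fpg)                            # ceiling division
nindir = bsize // PTR_SIZE                             # pointers per indirect block
inopb = bsize // DINODE1_SIZE                          # inodes per block
# direct + single + double + triple indirect blocks, in bytes, minus one
maxfilesize = (NDADDR + nindir + nindir**2 + nindir**3) * bsize - 1

print(frag, fsbtodb, bpg * frag == fpg)   # 8 2 True
print(fs_sectors)                          # 230246352, the size of partition a
print(ncg, nindir, inopb)                  # 610 4096 128
print(hex(maxfilesize))                    # 0x400400402ffff
print(24512 // fpg)                        # 0: the panicking block is in cg 0
```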



>How-To-Repeat:
See above.

>Fix:
n/a

>Release-Note:

>Audit-Trail:
From: "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/51602: file system inconsistency and panic in ffs_blkfree_common
Date: Sat, 5 Nov 2016 13:44:55 +0100

 Looks like this commit from Oct 28, 20:38:

 Module Name:	src
 Committed By:	jdolecek
 Date:		Fri Oct 28 20:38:12 UTC 2016

 Modified Files:
 	src/sys/kern: vfs_wapbl.c
 	src/sys/sys: wapbl.h
 	src/sys/ufs/ffs: ffs_alloc.c ffs_inode.c ffs_snapshot.c
 	src/sys/ufs/ufs: ufs_extern.h ufs_inode.c ufs_rename.c ufs_vnops.c
 	    ufs_wapbl.h

 Log Message:
 reorganize ffs_truncate()/ffs_indirtrunc() to be able to partially
 succeed; change wapbl_register_deallocation() to return EAGAIN
 rather than panic when code hits the limit

 callers changed to either loop calling ffs_truncate() using new
 utility ufs_truncate_retry() if their semantics requires it, or
 just ignore the failure; remove ufs_wapbl_truncate()

 this fixes possible user-triggerable panic during truncate, and
 resolves WAPBL performance issue with truncates of large files

 PR kern/47146 and kern/49175

 --
 J. Hannken-Illjes - hannken@eis.cs.tu-bs.de - TU Braunschweig (Germany)
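The commit message above describes a pattern in which one truncation call may succeed only partially, because the journal refuses further block deallocations with EAGAIN. Callers that need the complete truncation repeat the call. The Python model below is a sketch under assumptions: the Journal class, its per-transaction limit, truncate_step() and truncate_retry() are hypothetical stand-ins. They are not the NetBSD ffs_truncate() or ufs_truncate_retry() code.

```python
import errno


class Journal:
    """Hypothetical journal that accepts at most `limit` block
    deallocations per transaction, then reports EAGAIN."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.pending = 0
        self.transactions = 0

    def begin(self) -> None:
        self.pending = 0
        self.transactions += 1

    def register_deallocation(self) -> int:
        if self.pending >= self.limit:
            return errno.EAGAIN
        self.pending += 1
        return 0


def truncate_step(blocks: list, newlen: int, jnl: Journal) -> int:
    """Free blocks from the end of the file until it has `newlen`
    blocks or the journal is full; may succeed only partially."""
    while len(blocks) > newlen:
        err = jnl.register_deallocation()
        if err:
            return err          # partial progress is kept
        blocks.pop()            # block freed and its pointer removed
    return 0


def truncate_retry(blocks: list, newlen: int, jnl: Journal) -> int:
    """Repeat truncate_step() in fresh transactions while it returns EAGAIN."""
    while True:
        jnl.begin()
        err = truncate_step(blocks, newlen, jnl)
        if err != errno.EAGAIN:
            return err


jnl = Journal(limit=4)
f = list(range(10))            # a 10-block file
print(truncate_retry(f, 0, jnl), len(f), jnl.transactions)   # 0 0 3
```

A caller that does not need the full truncation could instead call truncate_step() once and ignore EAGAIN, which corresponds to the second option in the commit message.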

From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/51602: file system inconsistency and panic in
 ffs_blkfree_common
Date: Sat, 5 Nov 2016 19:58:07 +0000

 On Sat, Nov 05, 2016 at 12:50:01PM +0000, J. Hannken-Illjes wrote:
  >  Looks like this commit from Oct 28, 20:38:
  >  [the wapbl fix stuff]

 oh dear :(

 -- 
 David A. Holland
 dholland@netbsd.org

From: "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/51602: file system inconsistency and panic in ffs_blkfree_common
Date: Sat, 5 Nov 2016 21:24:16 +0100

 > On 05 Nov 2016, at 21:00, David Holland <dholland-bugs@netbsd.org> wrote:
 > On Sat, Nov 05, 2016 at 12:50:01PM +0000, J. Hannken-Illjes wrote:
 >> Looks like this commit from Oct 28, 20:38:
 >> [the wapbl fix stuff]
 > 
 > oh dear :(

 I'm working on some fixes ...

 --
 J. Hannken-Illjes - hannken@eis.cs.tu-bs.de - TU Braunschweig (Germany)

From: =?UTF-8?B?SmFyb23DrXIgRG9sZcSNZWs=?= <jaromir.dolecek@gmail.com>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, 
	Martin Husemann <martin@netbsd.org>
Subject: Re: kern/51602: file system inconsistency and panic in ffs_blkfree_common
Date: Sat, 5 Nov 2016 21:25:37 +0100

 Reviewing the code again, I see one missing BAP_ASSIGN() in the code
 for the last partial block. That would definitely explain the
 failing newvnode assert and also "ffs_blkfree_common: freeing free
 block". I'm not sure whether it also explains the other problem with
 the garbage block numbers.

 I'll check it on Sunday/Monday (traveling now).

 Jaromir

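One plausible reading of the missing BAP_ASSIGN() explanation can be modelled in Python. When truncation stops partway through an indirect block, each pointer whose block was already freed must be cleared in that indirect block. If it is not, the retry finds the stale pointer and frees the same block again, which is the condition ffs_blkfree_common() reports as "freeing free block". The functions, the per-pass limit and the clear_pointers switch are hypothetical; this is not the NetBSD ffs_indirtrunc() code.

```python
def partial_indirtrunc(indir: list, freemap: set, limit: int,
                       clear_pointers: bool) -> bool:
    """Free up to `limit` non-zero block pointers from the end of an
    indirect block.  Returns True once no pointers remain to free.
    Raises on a double free, where the kernel would panic."""
    freed = 0
    for i in range(len(indir) - 1, -1, -1):
        bno = indir[i]
        if bno == 0:
            continue                        # already cleared earlier
        if freed == limit:
            return False                    # stop partway; caller retries
        if bno in freemap:
            raise RuntimeError(f"freeing free block: block = {bno}")
        freemap.add(bno)
        freed += 1
        if clear_pointers:
            indir[i] = 0                    # the step reported as missing
    return True


def truncate_all(indir: list, limit: int, clear_pointers: bool) -> set:
    """Retry partial passes until the indirect block is empty."""
    freemap = set()                         # blocks currently marked free
    while not partial_indirtrunc(indir, freemap, limit, clear_pointers):
        pass
    return freemap


try:
    truncate_all([100, 101, 102, 103, 104, 105], limit=4, clear_pointers=False)
except RuntimeError as e:
    print(e)        # freeing free block: block = 105

print(sorted(truncate_all([100, 101, 102, 103, 104, 105], limit=4,
                          clear_pointers=True)))
# [100, 101, 102, 103, 104, 105]
```

In this model the stale pointers only cause trouble when a pass stops partway, which fits the report's observation that the missing assignment was in the code for the last partial block.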

Responsible-Changed-From-To: kern-bug-people->jdolecek
Responsible-Changed-By: jdolecek@NetBSD.org
Responsible-Changed-When: Mon, 07 Nov 2016 21:22:13 +0000
Responsible-Changed-Why:
Looking at this; likely related to the latest WAPBL change and likely the same problem
as kern/51601, but keeping it open for now.


From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/51602: file system inconsistency and panic in ffs_blkfree_common
Date: Thu, 10 Nov 2016 08:15:16 +0100

 The shark survived the next test run, so fine to close this from my POV.

 Martin

State-Changed-From-To: open->feedback
State-Changed-By: jdolecek@NetBSD.org
State-Changed-When: Thu, 10 Nov 2016 21:16:19 +0000
State-Changed-Why:
Fixes committed to -current. Can you please confirm that the problem is fixed?


State-Changed-From-To: feedback->closed
State-Changed-By: martin@NetBSD.org
State-Changed-When: Thu, 10 Nov 2016 21:22:49 +0000
State-Changed-Why:
Yes, works for me.


>Unformatted:
