NetBSD Problem Report #55362

From www@netbsd.org  Wed Jun 10 13:22:34 2020
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 264791A9219
	for <gnats-bugs@gnats.NetBSD.org>; Wed, 10 Jun 2020 13:22:34 +0000 (UTC)
Message-Id: <20200610132232.D6AD61A921E@mollari.NetBSD.org>
Date: Wed, 10 Jun 2020 13:22:32 +0000 (UTC)
From: prlw1@cam.ac.uk
Reply-To: prlw1@cam.ac.uk
To: gnats-bugs@NetBSD.org
Subject: bad inodes created after dump|restore
X-Send-Pr-Version: www-1.0

>Number:         55362
>Category:       kern
>Synopsis:       bad inodes created after dump|restore
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Jun 10 13:25:00 +0000 2020
>Last-Modified:  Wed Jul 29 10:35:01 +0000 2020
>Originator:     Patrick Welche
>Release:        NetBSD-9.99.65/amd64
>Organization:
>Environment:
-current of 9 June 2020, 8:07 UTC
>Description:
In brief, dump | restore creates something which is unusable. It is better to describe this as steps to reproduce, to avoid arguing over the semantics of "filesystem" or "partition".

dk20 at cgd0: "oldbackup", 7814057728 blocks at 128, type: ffs
dk21 at cgd1: "backup", 12884901600 blocks at 40, type: ffs

Both are on raidframe; it so happens that "oldbackup" is on RAID 5, whereas "backup" is on RAID 1. Both are FFSv2.

fsck -f /dev/rdk20           all OK
mount name=oldbackup /store/raid5
newfs -O2 /dev/rdk21
mount -o async name=backup /store/backup   (same happened earlier without async)
cd /store/backup
dump -0uaXf - /store/raid5 | restore rf -

  DUMP: 61.44% done, finished in 7:52
  DUMP: 3481184598 tape blocks
  DUMP: Volume 1 completed at: Wed Jun 10 08:12:00 2020
  DUMP: Volume 1 took 20:38:21
  DUMP: Volume 1 transfer rate: 46852 KB/s
  DUMP: Date of this level 0 dump: Tue Jun  9 11:33:05 2020
  DUMP: Date this dump completed:  Wed Jun 10 08:12:00 2020
  DUMP: Average transfer rate: 46852 KB/s
  DUMP: level 0 dump on Tue Jun  9 11:33:05 2020
  DUMP: DUMP IS DONE
# cd ..
# umount /store/backup
# fsck /dev/rdk21
** /dev/rdk21
** File system is clean; not checking
quantz# fsck -f /dev/rdk21
** /dev/rdk21
** File system is already clean
** Last Mounted on /store/backup
** Phase 1 - Check Blocks and Sizes
PARTIALLY TRUNCATED INODE I=187721310
SALVAGE? [yn] n

-3806487682048676796 BAD I=187721310
-8054706583720281705 BAD I=187721310
-2116918007204608326 BAD I=187721310
4171795335284772382 BAD I=187721310
-6832354079944068152 BAD I=187721310
-5029522714663959145 BAD I=187721310
101993166284650241 BAD I=187721310
-3217679531023626255 BAD I=187721310
-1789572272864921531 BAD I=187721310
6627881258759745910 BAD I=187721310
4268130136434363919 BAD I=187721310
EXCESSIVE BAD BLKS I=187721310
CONTINUE? [yn] n


This is the second time around for me. I first mentioned it at:
http://mail-index.netbsd.org/current-users/2020/06/07/msg038814.html
I note that this looks similar:
http://mail-index.netbsd.org/netbsd-users/2020/05/28/msg024985.html
so probably not a local problem.

Not sure if relevant, but the 2 disks which form the mirror that "backup" sits on also carry a zfs pool built from 2 partitions of those disks. A similar dump | restore of 2.2T to the zfs partition leaves zfs scrub happy afterwards; I assume that is the equivalent of fsck.
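
For reference, the zfs-side check mentioned above amounts to something like the following (the pool name "backuppool" is only a placeholder):

  zpool scrub backuppool      # re-read every block and verify its checksum
  zpool status backuppool     # reports the scrub result and any errors found
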
>How-To-Repeat:

>Fix:

>Audit-Trail:
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/55362: bad inodes created after dump|restore
Date: Wed, 10 Jun 2020 16:10:58 +0200

 I have seen such fallout with dying disks + raid, where writes to one
 sector did not read back the same (e.g. when both disks in a raid1 did
 actually write different things to the same sector and bad luck makes
 the raid return one or the other).

 Dump/restore is probably no factor in this at all.

 Can you try recreating the issue on a non-raid partition?
 Do we have some disk verification tool in pkgsrc?

 Martin

From: Patrick Welche <prlw1@cam.ac.uk>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/55362: bad inodes created after dump|restore
Date: Wed, 10 Jun 2020 15:33:10 +0100

 On Wed, Jun 10, 2020 at 02:15:02PM +0000, Martin Husemann wrote:
 > The following reply was made to PR kern/55362; it has been noted by GNATS.
 > 
 > From: Martin Husemann <martin@duskware.de>
 > To: gnats-bugs@netbsd.org
 > Cc: 
 > Subject: Re: kern/55362: bad inodes created after dump|restore
 > Date: Wed, 10 Jun 2020 16:10:58 +0200
 > 
 >  I have seen such fallout with dying disks + raid, where writes to one
 >  sector did not read back the same (e.g. when both disks in a raid1 did
 >  actually write different things to the same sector and bad luck makes
 >  the raid return one or the other).
 >  
 >  Dump/restore is probably no factor in this at all.
 >  
 >  Can you try recreating the issue on a non-raid partition?
 >  Do we have some disk verification tool in pkgsrc?

 If it were the disks, could I expect
 - to have seen a similar problem on the zfs partition of those disks
 - for MLH not to have had a similar observation with a >=  9.99.64 kernel
 - for disks that are only a couple of months old and have been
   reliable to not suddenly mechanically behave strangely

 OK, tell me which hoops to jump through...

From: Robert Elz <kre@munnari.OZ.AU>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/55362: bad inodes created after dump|restore
Date: Thu, 11 Jun 2020 00:02:46 +0700

     Date:        Wed, 10 Jun 2020 14:35:01 +0000 (UTC)
     From:        Patrick Welche <prlw1@cam.ac.uk>
     Message-ID:  <20200610143501.CE8BB1A921E@mollari.NetBSD.org>

   |  If it were the disks, could I expect

 On the off chance  it is, you could simply compare the data
 portions (all except the raid header) of the two partitions that
 make up the raid1 on which the filesystem was created - assuming that
 raid1 init has finished, the two should be, and remain, identical
 (at least when not in active use, so do it with the filesystem(s)
 unmounted).   (You could just compare the entire partitions, and
 ignore differences in the first 64 blocks).

 If the two differ, then either there really is some hardware issue, or
 raidframe isn't working properly.
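
 A minimal sketch of such a comparison, discarding anything in the first
 64 * 512 = 32768 bytes (the RAIDframe component label); the component
 device names rdk2 and rdk6 are taken from later messages in this PR and
 may need adjusting:

     cmp -l /dev/rdk2 /dev/rdk6 | awk '$1 > 32768'   # offsets are 1-based bytes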

   |  - for MLH not to have had a similar observation with a >=  9.99.64 kernel

 It isn't clear that they are the same problem.   It might be, or
 that could be different - the history of that filesystem wasn't
 made clear.

 kre

From: Patrick Welche <prlw1@cam.ac.uk>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/55362: bad inodes created after dump|restore
Date: Fri, 12 Jun 2020 14:41:19 +0100

 As these are in-warranty seagate disks, I ran

 SeaTools Bootable v2.1.2
 - SMART check                         PASS
 - short drive self test               PASS
 - short generic test                  PASS
 - long generic test (over 9 hours)    PASS


 > From: Robert Elz <kre@munnari.OZ.AU>
 > Date: Thu, 11 Jun 2020 00:02:46 +0700
 >  On the off chance  it is, you could simply compare the data
 >  portions (all except the raid header) of the two partitions that
 >  make up the raid1 on which the filesystem was created - assuming that
 >  raid1 init has finished, the two should be, and remain, identical
 >  (at least when not in active use, so do it with the filesystem(s)
 >  unmounted).   (You could just compare the entire partitions, and
 >  ignore differences in the first 64 blocks).
 >  
 >  If the two differ, then either there really is some hardware issue, or
 >  raidframe isn't working properly.

 I ran

     cmp -c -l /dev/rdk2 /dev/rdk6

  16401   0   1              <- ignore as within raid header
 1570687852545   0  73
 1570687852546   0 142
 1570687852547   0 260
 1570687852548   0 302
 ...

 There is always a zero in the second column, i.e., the mismatch is
 always reading a zero from the first disk of the mirror.

 Looking for patterns:
 -> is the first byte offset in a range at which cmp finds a difference,
    i.e., where the zeros begin;
 <- is the last byte offset in the range for which cmp finds a difference.
 So, e.g., the offsets 1570687852545 to 1570687885312 inclusive are different.
 (A helper for this arithmetic is sketched after the table below.)

 Within those ranges, there are some byte locations which don't
 appear in the cmp list, but they correspond to zero bytes in rdk6.

   n             (n/512, n%512)  (n/1024,n%1024)
 ->1570687852545 (3067749712,1)  (383468714,1)
 <-1570687885312 (3067749776,0)  (383468722,0)
 ->1770406838273 (3457825856,1)  (432228232,1)
 <-1770406846464 (3457825872,0)  (432228234,0)
 ->1770406903809 (3457825984,1)  (432228248,1)
 <-1770406912000 (3457826000,0)  (432228250,0)
 ->1770406969345 (3457826112,1)  (432228264,1)
 <-1770406977536 (3457826128,0)  (432228266,0)
 ->1770407100417 (3457826368,1)  (432228296,1)
 <-1770407108608 (3457826384,0)  (432228298,0)
 ->1770407428097 (3457827008,1)  (432228376,1)
 <-1770407436288 (3457827024,0)  (432228378,0)
 ->1770407493633 (3457827136,1)  (432228392,1)
 <-1770407501824 (3457827152,0)  (432228394,0)
 ->1770407624705 (3457827392,1)  (432228424,1)
 <-1770407632896 (3457827408,0)  (432228426,0)
 ->1770407690241 (3457827520,1)  (432228440,1)
 <-1770407698432 (3457827536,0)  (432228442,0)
 ->1770407952385 (3457828032,1)  (432228504,1)
 <-1770407960576 (3457828048,0)  (432228506,0)
 ...

 and so on.
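
 A throwaway helper for the sector/fragment arithmetic above (a sketch;
 it uses the 4096-byte fragment size from the erratum further down
 rather than 1024):

     cmp -l /dev/rdk2 /dev/rdk6 | awk '{ n = $1;
         printf "%.0f (%.0f,%d) (%.0f,%d)\n",
             n, int(n / 512), n % 512, int(n / 4096), n % 4096 }'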


 Looking at the bad patch starting at 1570687852545:
                   3067749712 * 512 = 1570687852544

 # dd if=/dev/rdk2 skip=3067749712 count=1 | hexdump -C
 1+0 records in
 1+0 records out
 512 bytes transferred in 0.297 secs (1723 bytes/sec)
 00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
 *
 00000200

 so it seems there really are zeros on that disk, not just that cmp happened
 to read zero.
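
 For comparison, the same sector read from the other component (same
 skip value as above; dd counts in 512-byte blocks by default):

     dd if=/dev/rdk6 skip=3067749712 count=1 2>/dev/null | hexdump -C | head
     # non-zero output here would confirm that rdk6 carries real data
     # where rdk2 reads all zeros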

From: Patrick Welche <prlw1@cam.ac.uk>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/55362: bad inodes created after dump|restore
Date: Fri, 12 Jun 2020 15:00:24 +0100

 On Fri, Jun 12, 2020 at 01:45:02PM +0000, Patrick Welche wrote:
 >  so it seems there really are zeros on that disk, not just that cmp happened
 >  to read zero.

 Note that as this is a cgd which wasn't scrubbed first, this points to nothing
 being written, rather than zero being written.

From: Patrick Welche <prlw1@cam.ac.uk>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/55362: bad inodes created after dump|restore
Date: Fri, 12 Jun 2020 14:54:12 +0100

 Erratum:

 On Fri, Jun 12, 2020 at 01:45:02PM +0000, Patrick Welche wrote:
 >    n             (n/512, n%512)  (n/1024,n%1024)
      n             (n/512, n%512)  (n/4096,n%4096)
 >  ->1570687852545 (3067749712,1)  (383468714,1)
 >  <-1570687885312 (3067749776,0)  (383468722,0)

From: Robert Elz <kre@munnari.OZ.AU>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/55362: bad inodes created after dump|restore
Date: Fri, 12 Jun 2020 22:53:00 +0700

     Date:        Fri, 12 Jun 2020 14:55:01 +0000 (UTC)
     From:        Patrick Welche <prlw1@cam.ac.uk>
     Message-ID:  <20200612145501.DD39B1A9242@mollari.NetBSD.org>

   |  Note that as this a cgd which wasn't scrubbed first, this points to nothing
   |  being written, rather than zero being written.

 Further, since the two drives have different data, and shouldn't, this
 cannot be a filesystem or above problem, it has to be either the drive
 omitting to write sometimes, or the driver forgetting to perform the write,
 or raidframe not requesting one of the writes.

 kre

From: Patrick Welche <prlw1@cam.ac.uk>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/55362: bad inodes created after dump|restore
Date: Wed, 17 Jun 2020 20:51:01 +0100

 Just tried NetBSD-9.99.60/amd64 and after several hours

   DUMP: 14.63% done, finished in 28:54
   DUMP: 14.68% done, finished in 31:06
   DUMP: 14.75% done, finished in 33:57
   DUMP: 14.80% done, finished in 36:20
   DUMP: 14.82% done, finished in 36:48
   DUMP: 14.86% done, finished in 38:40
   DUMP: 14.90% done, finished in 40:19
   DUMP: 14.92% done, finished in 40:52
   DUMP: 14.95% done, finished in 41:45

 the box is 100% idle with restore stuck in biolock

  UID  PID PPID CPU PRI NI    VSZ    RSS WCHAN   STAT TTY       TIME COMMAND
    0 1158 1155   0  85  0  21868   2180 wait    Is   ttyp1  0:00.01 | `-- -sh 
    0 1483 1158  17  85  0 100060  47156 wait    I+   ttyp1  0:01.75 |   |-- dump -0uaXf - /raid5 
    0 1490 1483   2  85  0 100060  78932 netio   I+   ttyp1  3:51.63 |   | `-- dump -0uaXf - /raid5 
    0 1491 1490 311  85  0 100060  33880 pause   I+   ttyp1  5:56.14 |   |   |-- dump -0uaXf - /raid5 
    0 1492 1490 248  85  0 100060  33884 pause   I+   ttyp1  5:58.65 |   |   |-- dump -0uaXf - /raid5 
    0 1493 1490  11  85  0 100060  33888 pipe_wr I+   ttyp1  5:57.36 |   |   `-- dump -0uaXf - /raid5 
    0 1488 1158   0  85  0 904724 878468 biolock D+   ttyp1 10:56.05 |   `-- restore rf - 

 Not sure we learn anything from that.

From: Patrick Welche <prlw1@cam.ac.uk>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/55362: bad inodes created after dump|restore
Date: Fri, 19 Jun 2020 20:00:11 +0100

 Pondered that the disks might get warmer during 16 hours of dump|restore
 vs 9 hours of SeaTools tests, so added a fan; max sustained disk
 temperature 50C.  A 9.99.67 repeat of the dump|restore succeeded, and 2
 rounds of fsck succeeded, but cmp -l -c found differences.

 Is there a way of forcing raidframe to recheck parity?
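
 For reference, raidctl(8) offers a parity check/rewrite; a sketch, with
 the raid unit number being an assumption:

     raidctl -s raid1      # show component and parity status
     raidctl -P raid1      # check the parity, and rewrite it if it is
                           # not known to be up to date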

From: Patrick Welche <prlw1@cam.ac.uk>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/55362: bad inodes created after dump|restore
Date: Fri, 17 Jul 2020 15:01:38 +0100

 Summary (not necessarily in time order):

 - all with a kernel without even DIAGNOSTIC
   (as a run already takes ca 17 hours, and LOCKDEBUG panicked)
 - 2 rounds of SeaTools tests (including long) on all disks passed
 - dd zeros to raid device and cmp components OK
   (the fill-and-compare test is sketched after this list)
 - dd ONES  to raid device and cmp components OK
 - dump & restore to cgd on raid of zeros (as described above)
   FAILs cmp of components (see blocks of zeros on one component)
 - dump & restore to cgd on raid of ONES (as described above)
   FAILs cmp of components (see blocks of ones on one component)
 - moved dump to other computer, so dump & restore went across network
   to change timings - same results with ones & zeros
 - dump & restore only 1.5T data to raid rather than cgd - cmp OK! (chance?)
 - no difference with cgd before & after aes changes
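
 The fill-and-compare test referenced in the list above, sketched (the
 raid and component device names are assumptions):

     dd if=/dev/zero of=/dev/rraid1d bs=1m        # zero fill; the "ONES" run used
                                                  #   a stream of 0xff bytes instead
     cmp -l /dev/rdk2 /dev/rdk6 | awk '$1 > 32768'   # ignore the component labels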

 Issue with layered filesystem?

 Now trying with DIAGNOSTIC/DEBUG/LOCKDEBUG kernel and hit PR kern/55493

From: Patrick Welche <prlw1@cam.ac.uk>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/55362: bad inodes created after dump|restore
Date: Tue, 28 Jul 2020 08:52:22 +0100

 On Fri, Jul 17, 2020 at 02:05:01PM +0000, Patrick Welche wrote:
 >  
 >  Issue with layered filesystem?
 >  
 >  Now trying with DIAGNOSTIC/DEBUG/LOCKDEBUG kernel and hit PR kern/55493

 Takes days with all that - retried with just DIAGNOSTIC, restore to raid+cgd

 System panicked: /store/backup: bad dir ino 32325321 at offset 512: null entry  

From: Patrick Welche <prlw1@cam.ac.uk>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/55362: bad inodes created after dump|restore
Date: Wed, 29 Jul 2020 11:32:53 +0100

 On Tue, Jul 28, 2020 at 07:55:01AM +0000, Patrick Welche wrote:
 >  System panicked: /store/backup: bad dir ino 32325321 at offset 512: null entry  

 daily.conf's find found the bad inode.
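
 For reference, that lookup amounts to something like the following
 (mount point as in this report):

     find -x /store/backup -inum 32325321 -print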

 Simplified system:

 .../dir4/file4
 .../dir5

 ino 32325321 points at dir5, yet *cnp points at file4?

 (not the underlying issue, but I would have expected the panic
 to have been about the inode for dir4)
