NetBSD Problem Report #57081

From www@netbsd.org  Fri Nov 11 09:35:25 2022
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id BCEA61A921F
	for <gnats-bugs@gnats.NetBSD.org>; Fri, 11 Nov 2022 09:35:25 +0000 (UTC)
Message-Id: <20221111093454.C0C691A9239@mollari.NetBSD.org>
Date: Fri, 11 Nov 2022 09:34:54 +0000 (UTC)
From: roy@marples.name
Reply-To: roy@marples.name
To: gnats-bugs@NetBSD.org
Subject: ZFS pool rolled back two years, all data lost since.
X-Send-Pr-Version: www-1.0

>Number:         57081
>Category:       kern
>Synopsis:       ZFS pool rolled back two years, all data lost since.
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri Nov 11 09:40:00 +0000 2022
>Closed-Date:    Wed May 17 11:39:58 +0000 2023
>Last-Modified:  Wed May 17 11:39:58 +0000 2023
>Originator:     Roy Marples
>Release:        9.99.106
>Organization:
>Environment:
# uname -a
NetBSD cube.marples.name 9.99.106 NetBSD 9.99.106 (GENERIC) #3: Thu Nov 10 12:45:17 GMT 2022  roy@cube:/usr/obj/sys/arch/amd64/compile.amd64/GENERIC amd64
>Description:
cube# zpool history | tail -n 10
2020-10-18.14:15:39 zpool import -f -N rpool
2020-10-18.14:16:22 zpool scrub rpool
2020-10-18.15:20:25 zfs send -Rv -i rpool@snapi rpool@snap
2020-10-18.16:31:12 zpool import -f -N rpool
2020-10-18.16:49:30 zpool import -f -N rpool
2020-10-18.16:54:57 zpool import -f -N rpool
2022-11-10.14:21:38 zpool import -f -N rpool
2022-11-10.14:28:55 zpool import -f -N rpool
2022-11-11.09:07:58 zpool import -f -N rpoolv

As you can see, a couple of years of history are missing and the mounts on the rpool look as they did about two years ago - loads of things are missing.
>How-To-Repeat:
Unknown; I'm unwilling to experiment, in the faint hope that I can still recover the data.

What I did was upgrade from 9.99.93 to 9.99.106. I installed everything as usual, but when I booted into the new kernel it failed to load the solaris or zfs modules.
In single-user mode `modload solaris` fails. I had to navigate to the directory where the module was actually stored and do `modload ./solaris.kmod`.
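
The workaround amounted to something like this (the directory that actually
held the kmod files isn't recorded here, so <moduledir> is a placeholder):

# cd <moduledir>
# modload ./solaris.kmod
# modload ./zfs.kmod

Kernel module debug output from the various attempts: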

[   168.867653] DEBUG: module: Loading module from /stand/amd64/9.99.106/modules/solaris/solaris.kmod
[   168.867653] DEBUG: module: Cannot load kernel object `solaris' error=2
[   316.886565] DEBUG: module: Loading module from /stand/amd64/9.99.106/modules/solaris.kmod/solaris.kmod.kmod
[   316.886565] DEBUG: module: Cannot load kernel object `solaris.kmod' error=2
[   336.136424] DEBUG: module: Loading module from ./solaris.kmod
[   336.146460] DEBUG: module: Loading plist from ./solaris.plist
[   336.146460] DEBUG: module: plist load returned error 2 for `./solaris.kmod'
[   336.146460] DEBUG: module: module `solaris' loaded successfully
[   382.106086] DEBUG: module: Loading module from /stand/amd64/9.99.106/modules/zfs.kmod/zfs.kmod.kmod
[   382.106086] DEBUG: module: Cannot load kernel object `zfs.kmod' error=2
[   391.146019] DEBUG: module: Loading module from ./zfs
[   391.146019] DEBUG: module: Cannot load kernel object `./zfs' error=2
[   400.465951] DEBUG: module: Loading module from ./zfs.kmod
[   400.465951] DEBUG: module: Loading plist from ./zfs.plist
[   400.465951] DEBUG: module: plist load returned error 2 for `./zfs.kmod'
[   400.465951] DEBUG: module: dependent module `solaris' already loaded
[   400.505950] ZFS filesystem version: 5
[   400.505950] DEBUG: module: module `zfs' loaded successfully

No actual ZFS errors were reported, but the disks remained in this rolled back state.

Any advice on how to recover is welcome :)
>Fix:

>Release-Note:

>Audit-Trail:
From: Taylor R Campbell <riastradh@NetBSD.org>
To: roy@marples.name
Cc: gnats-bugs@NetBSD.org
Subject: Re: kern/57081: ZFS pool rolled back two years, all data lost since.
Date: Fri, 11 Nov 2022 10:00:01 +0000

 Yikes, seems bad!

 1. Can you make an archive of the vdevs, and any zpool.cache file,
    onto another medium just so that you don't lose anything more in
    the event of experimentation?

 2. Did you update your bootloader according to src/UPDATING?
    Otherwise it won't be able to find modules from 9.99.1xx.  (This is
    necessary only for current kernels, and will not be needed for 10.)

 3. How are your disk partitions and file systems arranged, and how is
    the system booted?

 4. Can you share output from `zdb -e -l <vdev>' and from `zdb -e -h
    <pool>'?

 5. Are you using a zpool.cache file?  If yes, can you try moving it
    out of the way and see if that changes anything?

From: Simon Burge <simonb@NetBSD.org>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
    netbsd-bugs@netbsd.org, roy@marples.name
Subject: Re: kern/57081: ZFS pool rolled back two years, all data lost since.
Date: Fri, 11 Nov 2022 21:12:01 +1100

 Taylor R Campbell wrote:

 > From: Taylor R Campbell <riastradh@NetBSD.org>
 > To: roy@marples.name
 > Cc: gnats-bugs@NetBSD.org
 > Subject: Re: kern/57081: ZFS pool rolled back two years, all data lost since.
 > Date: Fri, 11 Nov 2022 10:00:01 +0000
 >
 >  Yikes, seems bad!

 Yeah :/

 >  4. Can you share output from `zdb -e -l <vdev>' and from `zdb -e -h
 >     <pool>'?

 Looks like the -u option when used with the -l option will display
 the contents of the uberblocks (like ffs superblocks).  There
 should be four of these per device.
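
 For example, run it against the raw vdev device (the device name below is
 only an illustration; substitute whatever device backs the pool):

 # zdb -u -l /dev/rdk5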

 Cheers,
 Simon.

From: Roy Marples <roy@marples.name>
To: Taylor R Campbell <riastradh@NetBSD.org>
Cc: gnats-bugs@NetBSD.org
Subject: Re: kern/57081: ZFS pool rolled back two years, all data lost since.
Date: Fri, 11 Nov 2022 10:23:19 +0000

 Hi Taylor, thanks for the prompt reply

 On 11/11/2022 10:00, Taylor R Campbell wrote:
 > 1. Can you make an archive of the vdevs, and any zpool.cache file,
 >     onto another medium just so that you don't lose anything more in
 >     the event of experimentation?

 Unsure how?
 I will prep a spare disk to boot from in the meantime.

 > 
 > 2. Did you update your bootloader according to src/UPDATING?
 >     Otherwise it won't be able to find modules from 9.99.1xx.  (This is
 >     necessary only for current kernels, and will not be needed for 10.)

 Sadly no.

 > 
 > 3. How are your disk partitions and file systems arranged, and how is
 >     the system booted?

 [     1.208885] wd0: GPT GUID: 31914b98-ccf9-41af-a403-43d6891c6095
 [     1.208885] dk0 at wd0: "boot", 4194304 blocks at 2048, type: ffs
 [     1.208885] dk1 at wd0: "swp", 67004416 blocks at 4196352, type: swap
 [     1.208885] dk2 at wd0: "zroot", 905572367 blocks at 71200768, type: zfs

 # zpool status rpool
    pool: rpool
   state: ONLINE
    scan: scrub repaired 0 in 0h11m with 0 errors on Sun Oct 18 14:27:27 2020
 config:

          NAME        STATE     READ WRITE CKSUM
          rpool       ONLINE       0     0     0
            dk5       ONLINE       0     0     0

 errors: No known data errors

 dk5 is my spare disk I can reuse. No idea what's on it!


 # zfs list
 NAME                    USED  AVAIL  REFER  MOUNTPOINT
 rpool                  94.5G  3.36G    23K  /rpool
 rpool/ROOT             94.5G  3.36G   405M  legacy
 rpool/ROOT/home        67.1G  3.36G  67.1G  /home
 rpool/ROOT/usr         6.03G  3.36G   434M  legacy
 rpool/ROOT/usr/obj     46.6M  3.36G  46.6M  /usr/obj
 rpool/ROOT/usr/pkg      593M  3.36G   592M  /usr/pkg
 rpool/ROOT/usr/pkgsrc  3.10G  3.36G  3.10G  /usr/pkgsrc
 rpool/ROOT/usr/src     1.26G  3.36G  1.26G  /usr/src
 rpool/ROOT/usr/tools    221M  3.36G   197M  /usr/tools
 rpool/ROOT/usr/xsrc     296M  3.36G   296M  /usr/xsrc
 rpool/ROOT/var         20.9G  3.36G  20.6G  legacy
 rpool/ROOT/var/log     2.79M  3.36G  2.63M  legacy
 rpool/ROOT/var/pbulk    273M  3.36G   273M  /var/pbulk
 rpool/ROOT/var/spool    817K  3.36G   793K  /var/spool
 rpool/ROOT/var/www       23K  3.36G    23K  /var/www

 > 
 > 4. Can you share output from `zdb -e -l <vdev>' and from `zdb -e -h
 >     <pool>'?

 # zdb -e -l /dev/dk0
 cannot open '/dev/dk0': Device busy
 # zdb -e -h rpool
 zdb: can't open 'rpool': No such file or directory
 # zdb rpool
 zdb: can't open 'rpool': Device busy

 > 
 > 5. Are you using a zpool.cache file?  If yes, can you try moving it
 >     out of the way and see if that changes anything?

 Not using one.

 Roy

From: Taylor R Campbell <riastradh@NetBSD.org>
To: Roy Marples <roy@marples.name>
Cc: gnats-bugs@NetBSD.org
Subject: Re: kern/57081: ZFS pool rolled back two years, all data lost since.
Date: Fri, 11 Nov 2022 10:29:28 +0000

 > Date: Fri, 11 Nov 2022 10:23:19 +0000
 > From: Roy Marples <roy@marples.name>
 > 
 > On 11/11/2022 10:00, Taylor R Campbell wrote:
 > > 1. Can you make an archive of the vdevs, and any zpool.cache file,
 > >     onto another medium just so that you don't lose anything more in
 > >     the event of experimentation?
 > 
 > Unsure how?
 > I will prep a spare disk to boot from in the meantime.

 Just dd from the raw disk onto a spare disk -- in this case, it looks
 like your zpool is just on dk5, so copy the content of dk5 to
 somewhere else.
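
 Something along these lines should do; the output path is only a
 placeholder for wherever you have space (a raw spare disk or an image
 file on another file system), and the bs/conv flags are suggestions:

 # dd if=/dev/rdk5 of=/spare/rpool-backup.img bs=1m conv=sync,noerror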

 > > 4. Can you share output from `zdb -e -l <vdev>' and from `zdb -e -h
 > >     <pool>'?
 > 
 > # zdb -e -l /dev/dk0
 > cannot open '/dev/dk0': Device busy
 > # zdb -e -h rpool
 > zdb: can't open 'rpool': No such file or directory
 > # zdb rpool
 > zdb: can't open 'rpool': Device busy

 Can you do these from single-user mode, without the zpool imported?

 <vdev> here should be `/dev/rdk5', judging by your description.

From: Roy Marples <roy@marples.name>
To: Taylor R Campbell <riastradh@NetBSD.org>
Cc: gnats-bugs@NetBSD.org
Subject: Re: kern/57081: ZFS pool rolled back two years, all data lost since.
Date: Fri, 11 Nov 2022 14:11:03 +0000

 On 11/11/2022 10:29, Taylor R Campbell wrote:
 > <vdev> here should be `/dev/rdk5', judging by your description.

 That's wrong! But it's also correct.

 The problem, as usual, is between the chair and the keyboard.
 Because I didn't read src/UPDATING I didn't update the bootloader, so the
 system didn't load the ZFS modules.
 Thinking that maybe the ramdisk needed updating, I rebuilt it with the latest
 sources but forgot to rename the zpool, so it booted the wrong disk! This was
 a disk from my old dev machine which I had installed but left unused.

 Totally confused me.

 I would like to solve this so it doesn't happen again.
 I'm sure I mentioned it before, but I need the bootloader to set a kernel
 string - something like kern.bootopts - set like so:

 :rndseed /var/db/entropy-file;bootopts="POOL=mypool;ROOT=thisroot";boot

 Then the ramdisk knows which pool and root to try to import without any code 
 changes to the ramdisk.
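
 As a rough sketch of what the ramdisk init could then do (kern.bootopts
 does not exist today; the name and string format are only the proposal
 above):

 # Hypothetical: parse a proposed kern.bootopts value of the form
 # "POOL=mypool;ROOT=thisroot" and import the named pool.
 bootopts=$(sysctl -n kern.bootopts 2>/dev/null)
 case "$bootopts" in
 *POOL=*)
         pool=${bootopts##*POOL=}
         pool=${pool%%;*}
         zpool import -f -N "$pool"
         ;;
 esac
 # ROOT= would be handled the same way to pick the dataset to mount.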

 Anyway, you don't know how glad I am to have my data back :)

 Thanks!

 Roy

State-Changed-From-To: open->closed
State-Changed-By: riastradh@NetBSD.org
State-Changed-When: Wed, 17 May 2023 11:39:58 +0000
State-Changed-Why:
data found


>Unformatted:
