NetBSD Problem Report #57081
From www@netbsd.org Fri Nov 11 09:35:25 2022
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id BCEA61A921F
for <gnats-bugs@gnats.NetBSD.org>; Fri, 11 Nov 2022 09:35:25 +0000 (UTC)
Message-Id: <20221111093454.C0C691A9239@mollari.NetBSD.org>
Date: Fri, 11 Nov 2022 09:34:54 +0000 (UTC)
From: roy@marples.name
Reply-To: roy@marples.name
To: gnats-bugs@NetBSD.org
Subject: ZFS pool rolled back two years, all data lost since.
X-Send-Pr-Version: www-1.0
>Number: 57081
>Category: kern
>Synopsis: ZFS pool rolled back two years, all data lost since.
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: kern-bug-people
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Fri Nov 11 09:40:00 +0000 2022
>Closed-Date: Wed May 17 11:39:58 +0000 2023
>Last-Modified: Wed May 17 11:39:58 +0000 2023
>Originator: Roy Marples
>Release: 9.99.106
>Organization:
>Environment:
# uname -a
NetBSD cube.marples.name 9.99.106 NetBSD 9.99.106 (GENERIC) #3: Thu Nov 10 12:45:17 GMT 2022 roy@cube:/usr/obj/sys/arch/amd64/compile.amd64/GENERIC amd64
>Description:
cube# zpool history | tail -n 10
2020-10-18.14:15:39 zpool import -f -N rpool
2020-10-18.14:16:22 zpool scrub rpool
2020-10-18.15:20:25 zfs send -Rv -i rpool@snapi rpool@snap
2020-10-18.16:31:12 zpool import -f -N rpool
2020-10-18.16:49:30 zpool import -f -N rpool
2020-10-18.16:54:57 zpool import -f -N rpool
2022-11-10.14:21:38 zpool import -f -N rpool
2022-11-10.14:28:55 zpool import -f -N rpool
2022-11-11.09:07:58 zpool import -f -N rpoolv
As you can see, a few years of history are missing and the mounts on the rpool look as they did about two years ago: loads of things are missing.
>How-To-Repeat:
Unknown; I am unwilling to experiment in the slim hope that I can recover the data.
What I did was upgrade from 9.99.93 to 9.99.106. I installed everything as usual, but when I booted into the new kernel it failed to load the solaris or zfs modules.
In single user `modload solaris` fails. I had to navigate to the directory where the module is actually stored and do `modload ./solaris.kmod`:
[ 168.867653] DEBUG: module: Loading module from /stand/amd64/9.99.106/modules/solaris/solaris.kmod
[ 168.867653] DEBUG: module: Cannot load kernel object `solaris' error=2
[ 316.886565] DEBUG: module: Loading module from /stand/amd64/9.99.106/modules/solaris.kmod/solaris.kmod.kmod
[ 316.886565] DEBUG: module: Cannot load kernel object `solaris.kmod' error=2
[ 336.136424] DEBUG: module: Loading module from ./solaris.kmod
[ 336.146460] DEBUG: module: Loading plist from ./solaris.plist
[ 336.146460] DEBUG: module: plist load returned error 2 for `./solaris.kmod'
[ 336.146460] DEBUG: module: module `solaris' loaded successfully
[ 382.106086] DEBUG: module: Loading module from /stand/amd64/9.99.106/modules/zfs.kmod/zfs.kmod.kmod
[ 382.106086] DEBUG: module: Cannot load kernel object `zfs.kmod' error=2
[ 391.146019] DEBUG: module: Loading module from ./zfs
[ 391.146019] DEBUG: module: Cannot load kernel object `./zfs' error=2
[ 400.465951] DEBUG: module: Loading module from ./zfs.kmod
[ 400.465951] DEBUG: module: Loading plist from ./zfs.plist
[ 400.465951] DEBUG: module: plist load returned error 2 for `./zfs.kmod'
[ 400.465951] DEBUG: module: dependent module `solaris' already loaded
[ 400.505950] ZFS filesystem version: 5
[ 400.505950] DEBUG: module: module `zfs' loaded successfully
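For reference, the single-user workaround that produced the successful loads in the log above was roughly the following (the version directory is taken from the log; adjust it to the running kernel):

```shell
# Load the modules by explicit relative path, since the bootloader's
# module search failed (error=2 is ENOENT).
cd /stand/amd64/9.99.106/modules/solaris
modload ./solaris.kmod
cd ../zfs
modload ./zfs.kmod
```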
No actual ZFS errors were reported, but the disks remained in this rolled-back state.
Any advice on how to recover is welcome :)
>Fix:
>Release-Note:
>Audit-Trail:
From: Taylor R Campbell <riastradh@NetBSD.org>
To: roy@marples.name
Cc: gnats-bugs@NetBSD.org
Subject: Re: kern/57081: ZFS pool rolled back two years, all data lost since.
Date: Fri, 11 Nov 2022 10:00:01 +0000
Yikes, seems bad!
1. Can you make an archive of the vdevs, and any zpool.cache file,
onto another medium just so that you don't lose anything more in
the event of experimentation?
2. Did you update your bootloader according to src/UPDATING?
Otherwise it won't be able to find modules from 9.99.1xx. (This is
necessary only for current kernels, and will not be needed for 10.)
3. How are your disk partitions and file systems arranged, and how is the
system booted?
4. Can you share output from `zdb -e -l <vdev>' and from `zdb -e -h
<pool>'?
5. Are you using a zpool.cache file? If yes, can you try moving it
out of the way and see if that changes anything?
From: Simon Burge <simonb@NetBSD.org>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org, roy@marples.name
Subject: Re: kern/57081: ZFS pool rolled back two years, all data lost since.
Date: Fri, 11 Nov 2022 21:12:01 +1100
Taylor R Campbell wrote:
> From: Taylor R Campbell <riastradh@NetBSD.org>
> To: roy@marples.name
> Cc: gnats-bugs@NetBSD.org
> Subject: Re: kern/57081: ZFS pool rolled back two years, all data lost since.
> Date: Fri, 11 Nov 2022 10:00:01 +0000
>
> Yikes, seems bad!
Yeah :/
> 4. Can you share output from `zdb -e -l <vdev>' and from `zdb -e -h
> <pool>'?
It looks like the -u option, when used with the -l option, will display
the contents of the uberblocks (like ffs superblocks). There
should be four of these per device.
Cheers,
Simon.
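The suggestion above would look something like this (/dev/rdk5 is an assumption, taken from later in this thread as the raw device backing the pool):

```shell
# Dump the vdev labels together with their uberblocks.
# zdb reads the device directly, so run this with the pool not imported.
zdb -l -u /dev/rdk5
```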
From: Roy Marples <roy@marples.name>
To: Taylor R Campbell <riastradh@NetBSD.org>
Cc: gnats-bugs@NetBSD.org
Subject: Re: kern/57081: ZFS pool rolled back two years, all data lost since.
Date: Fri, 11 Nov 2022 10:23:19 +0000
Hi Taylor, thanks for the prompt reply
On 11/11/2022 10:00, Taylor R Campbell wrote:
> 1. Can you make an archive of the vdevs, and any zpool.cache file,
> onto another medium just so that you don't lose anything more in
> the event of experimentation?
Unsure how?
I will prep a spare disk to boot from in the meantime.
>
> 2. Did you update your bootloader according to src/UPDATING?
> Otherwise it won't be able to find modules from 9.99.1xx. (This is
> necessary only for current kernels, and will not be needed for 10.)
Sadly no.
>
> 3. How are your disk partitions and file systems arranged, and how is the
> system booted?
[ 1.208885] wd0: GPT GUID: 31914b98-ccf9-41af-a403-43d6891c6095
[ 1.208885] dk0 at wd0: "boot", 4194304 blocks at 2048, type: ffs
[ 1.208885] dk1 at wd0: "swp", 67004416 blocks at 4196352, type: swap
[ 1.208885] dk2 at wd0: "zroot", 905572367 blocks at 71200768, type: zfs
# zpool status rpool
pool: rpool
state: ONLINE
scan: scrub repaired 0 in 0h11m with 0 errors on Sun Oct 18 14:27:27 2020
config:
NAME STATE READ WRITE CKSUM
rpool ONLINE 0 0 0
dk5 ONLINE 0 0 0
errors: No known data errors
dk5 is my spare disk I can reuse. No idea what's on it!
# zfs list
NAME USED AVAIL REFER MOUNTPOINT
rpool 94.5G 3.36G 23K /rpool
rpool/ROOT 94.5G 3.36G 405M legacy
rpool/ROOT/home 67.1G 3.36G 67.1G /home
rpool/ROOT/usr 6.03G 3.36G 434M legacy
rpool/ROOT/usr/obj 46.6M 3.36G 46.6M /usr/obj
rpool/ROOT/usr/pkg 593M 3.36G 592M /usr/pkg
rpool/ROOT/usr/pkgsrc 3.10G 3.36G 3.10G /usr/pkgsrc
rpool/ROOT/usr/src 1.26G 3.36G 1.26G /usr/src
rpool/ROOT/usr/tools 221M 3.36G 197M /usr/tools
rpool/ROOT/usr/xsrc 296M 3.36G 296M /usr/xsrc
rpool/ROOT/var 20.9G 3.36G 20.6G legacy
rpool/ROOT/var/log 2.79M 3.36G 2.63M legacy
rpool/ROOT/var/pbulk 273M 3.36G 273M /var/pbulk
rpool/ROOT/var/spool 817K 3.36G 793K /var/spool
rpool/ROOT/var/www 23K 3.36G 23K /var/www
>
> 4. Can you share output from `zdb -e -l <vdev>' and from `zdb -e -h
> <pool>'?
# zdb -e -l /dev/dk0
cannot open '/dev/dk0': Device busy
# zdb -e -h rpool
zdb: can't open 'rpool': No such file or directory
# zdb rpool
zdb: can't open 'rpool': Device busy
>
> 5. Are you using a zpool.cache file? If yes, can you try moving it
> out of the way and see if that changes anything?
Not using one.
Roy
From: Taylor R Campbell <riastradh@NetBSD.org>
To: Roy Marples <roy@marples.name>
Cc: gnats-bugs@NetBSD.org
Subject: Re: kern/57081: ZFS pool rolled back two years, all data lost since.
Date: Fri, 11 Nov 2022 10:29:28 +0000
> Date: Fri, 11 Nov 2022 10:23:19 +0000
> From: Roy Marples <roy@marples.name>
>
> On 11/11/2022 10:00, Taylor R Campbell wrote:
> > 1. Can you make an archive of the vdevs, and any zpool.cache file,
> > onto another medium just so that you don't lose anything more in
> > the event of experimentation?
>
> Unsure how?
> I will prep a spare disk to boot from in the meantime.
Just dd from the raw disk onto a spare disk -- in this case, it looks
like your zpool is just on dk5, so copy the content of dk5 to
somewhere else.
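A sketch of that, assuming dk5 is the vdev (as above) and that a target file system with enough space is mounted at /mnt/backup (a hypothetical path):

```shell
# Image the raw vdev onto another medium before experimenting.
# conv=noerror,sync keeps going past read errors, padding with zeros.
dd if=/dev/rdk5 of=/mnt/backup/rpool-dk5.img bs=1m conv=noerror,sync
```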
> > 4. Can you share output from `zdb -e -l <vdev>' and from `zdb -e -h
> > <pool>'?
>
> # zdb -e -l /dev/dk0
> cannot open '/dev/dk0': Device busy
> # zdb -e -h rpool
> zdb: can't open 'rpool': No such file or directory
> # zdb rpool
> zdb: can't open 'rpool': Device busy
Can you do these from single-user mode, without the zpool imported?
<vdev> here should be `/dev/rdk5', judging by your description.
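From single-user mode with the pool not imported, those commands would look roughly like this (-e tells zdb the pool is exported/not currently imported):

```shell
# Dump the vdev labels from the raw device.
zdb -e -l /dev/rdk5
# Dump the pool history from the exported pool.
zdb -e -h rpool
```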
From: Roy Marples <roy@marples.name>
To: Taylor R Campbell <riastradh@NetBSD.org>
Cc: gnats-bugs@NetBSD.org
Subject: Re: kern/57081: ZFS pool rolled back two years, all data lost since.
Date: Fri, 11 Nov 2022 14:11:03 +0000
On 11/11/2022 10:29, Taylor R Campbell wrote:
> <vdev> here should be `/dev/rdk5', judging by your description.
That's wrong! But it's also correct.
The problem, as usual, is between the chair and the keyboard.
Thanks to me not reading src/UPDATING, I didn't update the bootloader, so the
system didn't load the ZFS modules.
Thinking the ramdisk might need updating, I rebuilt it with the latest sources,
but I forgot to rename the zpool, so it booted the wrong disk! This was a disk
from my old dev machine which I had installed but never used.
That totally confused me.
I would like to solve this so it doesn't happen again.
I'm sure I mentioned it before, but I need the bootloader to be able to set a
kernel string - something like kern.bootopts - set like so:
:rndseed /var/db/entropy-file;bootopts="POOL=mypool;ROOT=thisroot";boot
Then the ramdisk would know which pool and root to try to import, without any
code changes to the ramdisk.
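Written out as a hypothetical boot.cfg(5) menu entry, the proposal would look something like this (the bootopts command does not exist in the current bootloader, and POOL/ROOT are illustrative names):

```
menu=Boot ZFS root:rndseed /var/db/entropy-file;bootopts="POOL=mypool;ROOT=thisroot";boot
menu=Boot single user:boot -s
timeout=5
```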
Anyway, you don't know how glad I am to have my data back :)
Thanks!
Roy
State-Changed-From-To: open->closed
State-Changed-By: riastradh@NetBSD.org
State-Changed-When: Wed, 17 May 2023 11:39:58 +0000
State-Changed-Why:
data found
>Unformatted: