NetBSD Problem Report #54748
From www@netbsd.org Sun Dec 8 04:49:06 2019
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id 801BB7A164
for <gnats-bugs@gnats.NetBSD.org>; Sun, 8 Dec 2019 04:49:06 +0000 (UTC)
Message-Id: <20191208044905.5DE207A1A2@mollari.NetBSD.org>
Date: Sun, 8 Dec 2019 04:49:05 +0000 (UTC)
From: mp@petermann-it.de
Reply-To: mp@petermann-it.de
To: gnats-bugs@NetBSD.org
Subject: NetBSD 9_RC1 not booting from disklabel partition on RAIDframe on GPT
X-Send-Pr-Version: www-1.0
>Number: 54748
>Category: misc
>Synopsis: NetBSD 9_RC1 not booting from disklabel partition on RAIDframe on GPT
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: manu
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sun Dec 08 04:50:00 +0000 2019
>Closed-Date: Tue Apr 21 17:18:50 +0000 2020
>Last-Modified: Tue Apr 21 17:20:03 +0000 2020
>Originator: Matthias Petermann
>Release: 9.0 RC1 (amd64)
>Organization:
>Environment:
NetBSD jupiter.mpnet.local 9.0_RC1 NetBSD 9.0_RC1 (XEN3_DOM0) #0: Wed Nov 27 16:14:52 UTC 2019 mkrepro@mkrepro.NetBSD.org:/usr/src/sys/arch/xen/compile/XEN3_DOM0 amd64
>Description:
Hello,
there seems to be a deviation of the boot behavior in NetBSD 9_RC1 compared to NetBSD 8.1 under the following circumstances:
1) system has 2 disks (wd0, wd1)
2) disks are GPT-partitioned
3) GPT contains one GPT-partition (RAIDframe component) each
4) RAIDframe raid0 contains a NetBSD disklabel
After installing NetBSD 9_RC1, the system does not boot. Instead, the primary bootstrap reports:
Unexpected raidframe label version
NAME=sys0 not found
The closest I could find is this commit:
https://www.mail-archive.com/source-changes-d@netbsd.org/msg21099.html
where GPT and RAIDframe support for the boot loaders was enhanced (which is really good!).
Kind regards
Matthias
>How-To-Repeat:
Here is where I set up the system:
* Boot from USB installation media
* Select keyboard type
* Utility Menu - Run /bin/sh
# ksh
# gpt create wd0
# gpt create wd1
# gpt add -l sys0 -a 2m -s 96G -t raid wd0
# gpt add -l sys1 -a 2m -s 96G -t raid wd1
# gpt show -l wd0
# gpt show -l wd1
# dkctl wd0 listwedges
# dkctl wd1 listwedges
# gpt biosboot -A -i 1 wd0
# gpt biosboot -A -i 1 wd1
# vi /tmp/raid0.conf
START array
1 2 0
START disks
/dev/dk0
absent
START layout
128 1 1 1
START queue
fifo 100
# raidctl -C /tmp/raid0.conf raid0
# raidctl -I 201912500 raid0
# raidctl -i raid0
# raidctl -A softroot raid0
* Exit shell and continue with installer
* Install NetBSD to hard disk
* Select raid0
* Select disklabel as partitioning scheme for raid0
* Set sizes of NetBSD Partitions
-> Change input units (MB)
* root 16384
* swap 16384
* usr 32768+
* var 8192
* Bootblock: BIOS console
* Installation w/out X11
* Utility Menu - Run /bin/sh
# installboot -o timeout=30 -v -t raid /dev/rdk0 /usr/mdec/bootxx_ffsv2
* Reboot
* The error will occur while running the primary bootstrap
>Fix:
Emmanuel Dreyfus already contacted me and provided a patch. I will perform testing this weekend.
>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: misc-bug-people->manu
Responsible-Changed-By: manu@NetBSD.org
Responsible-Changed-When: Sun, 08 Dec 2019 08:34:01 +0000
Responsible-Changed-Why:
To the committer of the change that introduced the bug
From: Emmanuel Dreyfus <manu@netbsd.org>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: misc/54748 (NetBSD 9_RC1 not booting from disklabel partition on
RAIDframe on GPT)
Date: Sun, 8 Dec 2019 08:36:49 +0000
--ALfTUftag+2gvp1h
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Hello
I can reproduce this problem in qemu.
The message "Unexpected raidframe label version" is just there because
the RAIDframe structures are not fully initialized due to the second
component being absent. If you configure the RAID with both disks, it
vanishes. But it is only a warning.
The real problem is here and I will have to investigate:
NAME=sys0 not found
I have a fix for this, and for another one I encountered while
looking at it. Find the patch attached (I can provide you a binary
if that helps).
The code to select the inner RAID partition assumed it had a name,
and hence that there was a GPT inside. I just modified the code
to let the partition be a candidate for booting if it has no name.
With it I was able to run this without an error with your setup.
ls NAME=sys0:
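
For illustration, the selection logic described above can be sketched as a small standalone C function. This is only a sketch and not the code in biosdisk.c: struct part, pick_partition(), boot_offset() and the skip_nameless switch are hypothetical names, the "bootme" attribute handling is left out, and the value 64 for RF_PROTECTED_SECTORS is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FS_UNUSED 0             /* disklabel fstype: unused slot */
#define FS_SWAP   1             /* disklabel fstype: swap */
#define FS_BSDFFS 7             /* disklabel fstype: FFS */
#define RF_PROTECTED_SECTORS 64 /* assumed size of the RAIDframe reserved area */

/* Hypothetical, simplified view of a partition found inside the RAID set. */
struct part {
	const char *part_name;  /* NULL when the RAID holds a disklabel */
	int fstype;
	uint64_t offset;        /* in sectors, relative to the RAID set */
	uint64_t size;          /* in sectors */
};

/*
 * Return the index of the partition to boot from, or -1 if none.
 * An exact partition-name match wins. Otherwise, if the name of the
 * outer GPT partition holding the RAID component (parent_name) matches,
 * the first usable partition becomes the candidate.
 * skip_nameless != 0 reproduces the behaviour before the patch.
 */
static int
pick_partition(const struct part *p, int n, const char *parent_name,
    const char *name, int skip_nameless)
{
	int candidate = -1;

	assert(p != NULL && parent_name != NULL && name != NULL);
	for (int i = 0; i < n; i++) {
		if (p[i].size == 0 || p[i].fstype == FS_UNUSED)
			continue;
		if (skip_nameless && p[i].part_name == NULL)
			continue;       /* old code: nameless means ignored */
		if (p[i].part_name != NULL && strcmp(p[i].part_name, name) == 0)
			return i;       /* exact match on the inner name */
		if (strcmp(parent_name, name) == 0 && candidate == -1)
			candidate = i;  /* fall back to the first usable one */
	}
	return candidate;
}

/* Absolute start sector on the BIOS disk, mirroring the sum in the patch. */
static uint64_t
boot_offset(uint64_t raid_offset, uint64_t part_offset)
{
	return raid_offset + RF_PROTECTED_SECTORS + part_offset;
}
```

With a disklabel inside the RAID every part_name is NULL, so the old behaviour (skip_nameless set) never finds a candidate for "sys0", while the patched behaviour returns the first usable partition.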
While there, I noticed that commands like this did not work:
ls raid0a:
boot raid0a:/netbsd
This is the second bug. The fix is done in the patch with the
changes to boot/devopen.c and efiboot/devopen.c
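
To make the second bug concrete, here is a minimal sketch of why the parsed device name alone is not enough for a raidNx: specification. It assumes that parsing splits "raid0a:/netbsd" into a device name "raid", a unit and a partition letter, which is my reading of the patch rather than a copy of parsebootfile(). The names struct devspec and split_devspec() are hypothetical, and the sketch handles only the raidNx: form, not NAME=label:.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result of splitting a boot specification. */
struct devspec {
	char devname[16];   /* alphabetic prefix, e.g. "raid" */
	int unit;           /* e.g. 0 for raid0a */
	int partition;      /* 0 for 'a', 1 for 'b', ... */
};

/*
 * Split "raid0a:/netbsd" into devname "raid", unit 0, partition 0.
 * Returns 0 on success, -1 if the string does not have that shape.
 */
static int
split_devspec(const char *fname, struct devspec *out)
{
	size_t i = 0;

	assert(fname != NULL && out != NULL);

	/* Copy the alphabetic device name. */
	while (isalpha((unsigned char)fname[i]) &&
	    i < sizeof(out->devname) - 1) {
		out->devname[i] = fname[i];
		i++;
	}
	out->devname[i] = '\0';
	if (i == 0 || !isdigit((unsigned char)fname[i]))
		return -1;

	/* Decimal unit number. */
	out->unit = 0;
	while (isdigit((unsigned char)fname[i]))
		out->unit = out->unit * 10 + (fname[i++] - '0');

	/* Single partition letter followed by a colon. */
	if (fname[i] < 'a' || fname[i] > 'p')
		return -1;
	out->partition = fname[i++] - 'a';
	return fname[i] == ':' ? 0 : -1;
}
```

Under that assumption, "raid0a:/netbsd" and "raid1e:/netbsd" both yield the device name "raid", so a lookup by name that receives only devname cannot tell them apart; the full string in fname still carries the unit and the partition.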
This reminds me of some work I did, and I wonder how I could have missed
it in my testing. I suspect I just forgot it when I did my commit.
On the other hand, it has some breakage potential and should
be tested a lot. It only impacts operations on devices named
after NAME=label: or raidNd:
--
Emmanuel Dreyfus
manu@netbsd.org
--ALfTUftag+2gvp1h
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="boot2.patch"
Index: boot/devopen.c
===================================================================
RCS file: /cvsroot/src/sys/arch/i386/stand/boot/devopen.c,v
retrieving revision 1.9
diff -U4 -r1.9 devopen.c
--- boot/devopen.c 18 Aug 2019 02:18:24 -0000 1.9
+++ boot/devopen.c 8 Dec 2019 03:05:16 -0000
@@ -155,9 +155,9 @@
/* Search by raidframe name */
if (strstr(devname, "raid") == devname) {
f->f_dev = &devsw[0]; /* must be biosdisk */
- return biosdisk_open_name(f, devname);
+ return biosdisk_open_name(f, fname);
}
#endif
error = dev2bios(devname, unit, &biosdev);
Index: efiboot/devopen.c
===================================================================
RCS file: /cvsroot/src/sys/arch/i386/stand/efiboot/devopen.c,v
retrieving revision 1.8
diff -U4 -r1.8 devopen.c
--- efiboot/devopen.c 26 Sep 2019 12:21:03 -0000 1.8
+++ efiboot/devopen.c 8 Dec 2019 03:05:16 -0000
@@ -150,8 +150,9 @@
int
devopen(struct open_file *f, const char *fname, char **file)
{
char *fsname, *devname;
+ const char *xname = NULL;
int unit, partition;
int biosdev;
int i, error;
#if defined(SUPPORT_NFS) || defined(SUPPORT_TFTP)
@@ -160,9 +161,8 @@
char *filename;
size_t fsnamelen;
int n;
#endif
-
error = parsebootfile(fname, &fsname, &devname, &unit, &partition,
(const char **) file);
if (error)
return error;
@@ -171,18 +171,22 @@
sizeof(struct fs_ops) * nfsys_disk);
nfsys = nfsys_disk;
/* Search by GPT label or raidframe name */
- if ((strstr(devname, "NAME=") == devname) ||
- (strstr(devname, "raid") == devname)) {
+ if (strstr(devname, "NAME=") == devname)
+ xname = devname;
+ if (strstr(devname, "raid") == devname)
+ xname = fname;
+
+ if (xname != NULL) {
f->f_dev = &devsw[0]; /* must be biosdisk */
if (!kernel_loaded) {
strncpy(bibp.bootpath, *file, sizeof(bibp.bootpath));
BI_ADD(&bibp, BTINFO_BOOTPATH, sizeof(bibp));
}
- error = biosdisk_open_name(f, devname);
+ error = biosdisk_open_name(f, xname);
return error;
}
/*
Index: lib/biosdisk.c
===================================================================
RCS file: /cvsroot/src/sys/arch/i386/stand/lib/biosdisk.c,v
retrieving revision 1.52
diff -U4 -r1.52 biosdisk.c
--- lib/biosdisk.c 13 Sep 2019 02:19:46 -0000 1.52
+++ lib/biosdisk.c 8 Dec 2019 03:05:16 -0000
@@ -1400,19 +1400,20 @@
if (d->part[part].size == 0)
continue;
if (d->part[part].fstype == FS_UNUSED)
continue;
- if (d->part[part].part_name == NULL)
- continue;
- if (strcmp(d->part[part].part_name, name) == 0) {
+
+ if (d->part[part].part_name != NULL &&
+ strcmp(d->part[part].part_name, name) == 0) {
*biosdev = raidframe[i].biosdev;
*offset = raidframe[i].offset
+ RF_PROTECTED_SECTORS
+ d->part[part].offset;
*size = d->part[part].size;
ret = 0;
goto out;
}
+
if (strcmp(raidframe[i].parent_name, name) == 0) {
if (candidate == -1 || bootme)
candidate = part;
continue;
--ALfTUftag+2gvp1h--
From: Matthias Petermann <mp@petermann-it.de>
To: gnats-bugs@netbsd.org, manu@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org
Cc:
Subject: Re: misc/54748 (NetBSD 9_RC1 not booting from disklabel partition on
RAIDframe on GPT)
Date: Mon, 9 Dec 2019 09:41:45 +0100
Good morning Emmanuel,
after application of your patch "boot2.patch" I can confirm that the
issue is resolved. I tested it with exactly the same setup / hardware
and now it boots as expected.
I want to thank you for your quick help and great support!
Kind regards
Matthias
From: Emmanuel Dreyfus <manu@netbsd.org>
To: Matthias Petermann <mp@petermann-it.de>
Cc: gnats-bugs@netbsd.org, manu@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org
Subject: Re: misc/54748 (NetBSD 9_RC1 not booting from disklabel partition on
RAIDframe on GPT)
Date: Mon, 9 Dec 2019 08:13:21 +0000
On Mon, Dec 09, 2019 at 09:41:45AM +0100, Matthias Petermann wrote:
> after application of your patch "boot2.patch" I can confirm that the issue
> is resolved. I tested it with exactly the same setup / hardware and now it
> boots as expected.
Great. I will commit the change when I have my keys at hand, which
should be in about 20 hours.
--
Emmanuel Dreyfus
manu@netbsd.org
From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/54748 CVS commit: [netbsd-8] src/sys/arch/i386/stand
Date: Tue, 17 Dec 2019 12:19:49 +0000
Module Name: src
Committed By: martin
Date: Tue Dec 17 12:19:49 UTC 2019
Modified Files:
src/sys/arch/i386/stand/boot [netbsd-8]: devopen.c
src/sys/arch/i386/stand/efiboot [netbsd-8]: devopen.c
src/sys/arch/i386/stand/lib [netbsd-8]: biosdisk.c
Log Message:
Pull up following revision(s) (requested by manu in ticket #1473):
sys/arch/i386/stand/lib/biosdisk.c: revision 1.53
sys/arch/i386/stand/efiboot/devopen.c: revision 1.9
sys/arch/i386/stand/boot/devopen.c: revision 1.10
In-RAID partitions with no name can be candidates for booting
The code to select boot partition in RAID assumed they had a name,
which is true when there is a GPT inside the RAID, but not when there
is a disklabel inside the RAID. This caused a regression from behavior
of NetBSD 8.1.
We fix this by allowing nameless partition to be boot candidates.
This fixes PR misc/54748
While there, let raid device be used in the boot specification, like
raid0a:/netbsd.
To generate a diff of this commit:
cvs rdiff -u -r1.8.52.1 -r1.8.52.2 src/sys/arch/i386/stand/boot/devopen.c
cvs rdiff -u -r1.1.12.6 -r1.1.12.7 src/sys/arch/i386/stand/efiboot/devopen.c
cvs rdiff -u -r1.46.6.5 -r1.46.6.6 src/sys/arch/i386/stand/lib/biosdisk.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: open->closed
State-Changed-By: maya@NetBSD.org
State-Changed-When: Tue, 21 Apr 2020 17:18:50 +0000
State-Changed-Why:
oops, it was committed and pulled up! There's quite a lot of devopen.c's and I was looking at the wrong one.
Thanks for all the work!
From: maya@NetBSD.org
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: misc/54748 (NetBSD 9_RC1 not booting from disklabel partition on
RAIDframe on GPT)
Date: Tue, 21 Apr 2020 17:17:30 +0000
Looks like manu@ expected to commit a working patch but didn't.
Any reason not to? In case you forgot, this is a reminder :-)
>Unformatted:
Copyright © 1994-2020
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.