NetBSD Problem Report #59451
From www@netbsd.org Sat May 31 19:07:31 2025
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
key-exchange X25519 server-signature RSA-PSS (2048 bits)
client-signature RSA-PSS (2048 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id AD6071A923D
for <gnats-bugs@gnats.NetBSD.org>; Sat, 31 May 2025 19:07:31 +0000 (UTC)
Message-Id: <20250531190729.C97221A923E@mollari.NetBSD.org>
Date: Sat, 31 May 2025 19:07:29 +0000 (UTC)
From: frchuckz@gmail.com
Reply-To: frchuckz@gmail.com
To: gnats-bugs@NetBSD.org
Subject: XEN3_DOM0 kernel finds the wrong root device
X-Send-Pr-Version: www-1.0
>Number: 59451
>Category: port-xen
>Synopsis: XEN3_DOM0 kernel finds the wrong root device
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: port-xen-maintainer
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sat May 31 19:10:00 +0000 2025
>Last-Modified: Tue Jun 03 16:00:04 +0000 2025
>Originator: Chuck Zmudzinski
>Release: NetBSD 10.1 Release
>Organization:
Home User
>Environment:
NetBSD netbsd 10.1 NetBSD 10.1 (XEN3_DOM0) #0: Mon Dec 16 13:08:11 UTC 2024 mkrepro@mkrepro.NetBSD.org:/usr/src/sys/arch/xen/compile/XEN3_DOM0 amd64
>Description:
On a system with UEFI/GPT for booting and disk partitioning, the XEN3_DOM0 kernel can spectacularly fail in its attempt to find the correct root device even when the correct dk wedge is passed to the kernel in boot.cfg.
This problem was discussed extensively in a thread on netbsd-users, especially starting with this message: https://mail-index.netbsd.org/netbsd-users/2025/05/29/msg032694.html
I credit Greg Woods, the author of the aforementioned message on netbsd-users, for correctly identifying the problem.
The tl;dr version of the problem:
This code from sys/arch/xen/xen/xen_machdep.c is not strict enough in
its sanity checks and in some cases picks the wrong boot device and
tries to find a "booted_partition" when there is no partition to find:
if (strncmp(xcp.xcp_bootdev, devname, strlen(devname)))
continue;
if (is_disk && strlen(xcp.xcp_bootdev) > strlen(devname)) {
/* XXX check device_cfdata as in x86_autoconf.c? */
booted_partition = toupper(
xcp.xcp_bootdev[strlen(devname)]) - 'A';
DPRINTF(("%s: booted_partition: %d\n", __func__, booted_partition));
}
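The mismatch can be reproduced in user space. The following is a minimal sketch, assuming ASCII; old_match() is a hypothetical helper that mirrors the two checks above and is not kernel code:

```c
#include <ctype.h>
#include <string.h>

/*
 * User-space model of the unpatched check (hypothetical helper).
 * Returns 1 if bootdev is accepted as a match for devname.  When
 * bootdev is longer than devname, the next character is turned into
 * a partition index and stored in *part without any validation.
 */
static int
old_match(const char *bootdev, const char *devname, int *part)
{
	size_t len = strlen(devname);

	/* Prefix comparison only: "dk12" starts with "dk1". */
	if (strncmp(bootdev, devname, len) != 0)
		return 0;

	if (strlen(bootdev) > len)
		*part = toupper((unsigned char)bootdev[len]) - 'A';

	return 1;
}
```

Calling old_match("dk12", "dk1", &part) reports a match and leaves part at '2' - 'A', which is -15 in ASCII, so a device that is enumerated before dk12 and shares its prefix wins.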
>How-To-Repeat:
1. It is most likely necessary to have a UEFI/GPT system that can boot NetBSD/xen to reproduce this problem. It might be possible to reproduce it on a system that uses BIOS booting and MBR partitioning, but I would recommend only trying to reproduce this problem on a UEFI/GPT system that relies on dk wedges rather than a NetBSD disklabel to find the root partition.
2. It is also necessary to have more than 10 dk wedges on the system and to install the NetBSD root/boot device on a dk wedge with a double digit index. On my box, NetBSD was installed on dk12. If NetBSD is installed on any wedge with a single digit index, I doubt the problem is reproducible. So if NetBSD is installed on dk9, for example, the problem will not be reproducible unless you do another install of NetBSD on dk10 or dk11, for example. The important thing is to have a NetBSD/xen PV DOM0 system installed on a wedge with a double digit wedge index.
3. Write a boot.cfg file to boot the NetBSD/xen system installed on the wedge with a double digit index and try to boot it. In such a case, you will need to set bootdev=dk12 (or whatever wedge NetBSD is installed on). But remember the problem will not be reproducible if you set bootdev=dk9, dk8, etc. It must be dk10 or higher. And NetBSD/xen PV DOM0 must be installed on the wedge that bootdev points to.
4. Also, it is probably necessary to set up Xen to use a serial console so you can interact with Xen and DOM0 at the serial console during the boot process to actually observe the details of this problem.
5. You should see the problem. In my case, with bootdev=dk12, the kernel tried to boot dk1 instead which, on my box, happened to have a Linux distro installed on it. So of course the boot failed when the kernel tried to load the Linux distro's /sbin/init, which in that case was systemd. This showed up on the serial console like this:
snip ...
[ 5.1699079] boot device: dk1
[ 5.1699079] root on dk1 dumps on dk11
[ 5.1799090] Your machine does not initialize mem_clusters; sparse_dumps disabled
[ 5.1799090] root file system type: ext2fs
[ 5.1799090] kern.module.path=/stand/amd64/10.1/modules
[ 5.1826204] exec /sbin/init: error 8
[ 5.1826204] init: trying /sbin/oinit
[ 5.1826204] exec /sbin/oinit: error 2
[ 5.1826204] init: trying /sbin/init.bak
[ 5.1826204] exec /sbin/init.bak: error 2
[ 5.1826204] init: trying /rescue/init
[ 5.1826204] exec /rescue/init: error 2
[ 5.1826204] init path (default /sbin/init):
Note the error 8 in response to exec /sbin/init in the output above. That is when the NetBSD DOM0 kernel tried to execute systemd on the Linux distro that was installed on dk1. It did this even though I gave the kernel the correct parameter of bootdev=dk12 in boot.cfg!
It was impossible to recover from this situation except by rebooting without Xen, setting bootdev=<something that will cause the kernel to give me a "root device" prompt> in the Xen menu item of boot.cfg, rebooting again with Xen, and interactively entering the correct root device and dump device at the serial console when the kernel prompts for those. Then, finally, you get a successful boot.
>Fix:
I wrote a very simple 3-line proof of concept patch to fix it for the netbsd-10 branch. I don't endorse this patch as the final fix, though, because with this patch we still have garbage for booted_partition, which could trigger an assertion later on during the boot process (but it didn't in my test because the code containing the assertion is never executed):
--- sys/arch/xen/xen/xen_machdep.c.orig 2023-10-18 12:53:03.000000000 -0400
+++ sys/arch/xen/xen/xen_machdep.c 2025-05-30 20:42:39.936253878 -0400
@@ -553,7 +553,10 @@
/* XXX check device_cfdata as in x86_autoconf.c? */
booted_partition = toupper(
xcp.xcp_bootdev[strlen(devname)]) - 'A';
+ /* Check that the value of booted_partition is sane */
+ if (booted_partition & 0xfffffff0)
+ continue;
DPRINTF(("%s: booted_partition: %d\n", __func__, booted_partition));
}
booted_device = dv;
With this patch, the DOM0 kernel correctly detects dk12 as the root device when I set bootdev=dk12 in boot.cfg, and the kernel successfully loads /sbin/init from the NetBSD root partition and the boot proceeds as normal to completion:
snip ...
[ 13.924443] boot device: dk12
[ 14.114443] root on dk12 dumps on dk11
...
>Audit-Trail:
From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: port-xen/59451: XEN3_DOM0 kernel finds the wrong root device
Date: Sun, 1 Jun 2025 05:52:25 -0000 (UTC)
frchuckz@gmail.com writes:
>--- sys/arch/xen/xen/xen_machdep.c.orig 2023-10-18 12:53:03.000000000 -0400
>+++ sys/arch/xen/xen/xen_machdep.c 2025-05-30 20:42:39.936253878 -0400
>@@ -553,7 +553,10 @@
> /* XXX check device_cfdata as in x86_autoconf.c? */
> booted_partition = toupper(
> xcp.xcp_bootdev[strlen(devname)]) - 'A';
>+ /* Check that the value of booted_partition is sane */
>+ if (booted_partition & 0xfffffff0)
>+ continue;
> DPRINTF(("%s: booted_partition: %d\n", __func__, booted_partition));
> }
> booted_device = dv;
The code should check that the partition is between 0 and MAXPARTITIONS-1
instead of using some bitmask and it shouldn't touch booted_partition
before the value is validated.
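
To illustrate the difference between the two checks, here is a small sketch. mask_ok() and range_ok() are hypothetical helpers, and the limits 8 and 16 used in the usage note are illustrative assumptions, since MAXPARTITIONS is port-specific:

```c
/*
 * Bitmask check as in the proof-of-concept patch: accepts exactly
 * the values 0..15, regardless of the port's partition limit.
 */
static int
mask_ok(int part)
{
	return (part & 0xfffffff0) == 0;
}

/*
 * Range check against an explicit limit.  In the kernel the limit
 * would be MAXPARTITIONS; it is a parameter here so that different
 * values can be compared.
 */
static int
range_ok(int part, int maxparts)
{
	return part >= 0 && part < maxparts;
}
```

Both helpers reject -15. They agree when the limit is 16, but with a limit of 8 the bitmask would still accept indices 8 through 15 that the range check rejects.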
It should also parse xcp_bootdev better (this was bad before and the
x86 version of that code isn't really better).
However, nothing of that really helps to use wedges ("dkXX") as
wedge unit numbers are a bit volatile.
The better alternative is to not set "bootdev" but to pass the
"root" command line option. The value is a string and interpreted
by the MI part of the kernel as a device name (with partition
for a disk) or as NAME=wedgename (or for compatibility wedge:wedgename).
E.g.:
menu=Boot Xen Dom0:load /netbsd_xen console=pc root=NAME=my-root;multiboot /xen.gz
From: Chuck Zmudzinski <frchuckz@gmail.com>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: port-xen/59451: XEN3_DOM0 kernel finds the wrong root device
Date: Sun, 1 Jun 2025 08:02:55 -0400
On 6/1/2025 1:55 AM, Michael van Elst via gnats wrote:
> The following reply was made to PR port-xen/59451; it has been noted by GNATS.
>
> From: mlelstv@serpens.de (Michael van Elst)
> To: gnats-bugs@netbsd.org
> Cc:
> Subject: Re: port-xen/59451: XEN3_DOM0 kernel finds the wrong root device
> Date: Sun, 1 Jun 2025 05:52:25 -0000 (UTC)
>
> frchuckz@gmail.com writes:
>
> >--- sys/arch/xen/xen/xen_machdep.c.orig 2023-10-18 12:53:03.000000000 -0400
> >+++ sys/arch/xen/xen/xen_machdep.c 2025-05-30 20:42:39.936253878 -0400
> >@@ -553,7 +553,10 @@
> > /* XXX check device_cfdata as in x86_autoconf.c? */
> > booted_partition = toupper(
> > xcp.xcp_bootdev[strlen(devname)]) - 'A';
> >+ /* Check that the value of booted_partition is sane */
> >+ if (booted_partition & 0xfffffff0)
> >+ continue;
> > DPRINTF(("%s: booted_partition: %d\n", __func__, booted_partition));
> > }
> > booted_device = dv;
>
>
> The code should check that the partition is between 0 and MAXPARTITIONS-1
> instead of using some bitmask and it shouldn't touch booted_partition
> before the value is validated.
I agree the actual fix should verify booted_partition using MAXPARTITIONS.
And YES I agree that booted_partition should not be touched before it is
validated. Even the current, unpatched code has that problem!
>
> It should also parse xcp_bootdev better (this was bad before and the
> x86 version of that code isn't really better).
YES again!
>
>
> However, nothing of that really helps to use wedges ("dkXX") as
> wedge unit numbers are a bit volatile.
>
>
> The better alternative is to not set "bootdev" but to pass the
> "root" command line option. The value is a string and interpreted
> by the MI part of the kernel as a device name (with partition
> for a disk) or as NAME=wedgename (or for compatibility wedge:wedgename).
>
> E.g.:
>
> menu=Boot Xen Dom0:load /netbsd_xen console=pc root=NAME=my-root;multiboot /xen.gz
>
>
I think even in this code which is not in the MI part of the kernel the NAME= syntax
for bootdev or rootdev is supported. I did try that but could not get a successful
boot using NAME=wedgename.
Also, according to a message I received on netbsd-users from Manuel who AFAIK is
a port-xen maintainer, there is no difference in the arch/xen code between
bootdev= and rootdev=. What you say here puts that in doubt, though.
Do you want me to try NAME=wedgename again using root= instead of bootdev= ?
Kind regards,
Chuck
From: Chuck Zmudzinski <frchuckz@gmail.com>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: port-xen/59451: XEN3_DOM0 kernel finds the wrong root device
Date: Mon, 2 Jun 2025 01:46:53 -0400
On 6/1/2025 8:05 AM, Chuck Zmudzinski via gnats wrote:
> The following reply was made to PR port-xen/59451; it has been noted by GNATS.
>
> From: Chuck Zmudzinski <frchuckz@gmail.com>
> To: gnats-bugs@netbsd.org
> Cc:
> Subject: Re: port-xen/59451: XEN3_DOM0 kernel finds the wrong root device
> Date: Sun, 1 Jun 2025 08:02:55 -0400
>
> On 6/1/2025 1:55 AM, Michael van Elst via gnats wrote:
> > The following reply was made to PR port-xen/59451; it has been noted by GNATS.
> >
> > From: mlelstv@serpens.de (Michael van Elst)
> > To: gnats-bugs@netbsd.org
> > Cc:
> > Subject: Re: port-xen/59451: XEN3_DOM0 kernel finds the wrong root device
> > Date: Sun, 1 Jun 2025 05:52:25 -0000 (UTC)
> >
> > ...
> >
> > However, nothing of that really helps to use wedges ("dkXX") as
> > wedge unit numbers are a bit volatile.
> >
> >
> > The better alternative is to not set "bootdev" but to pass the
> > "root" command line option. The value is a string and interpreted
> > by the MI part of the kernel as a device name (with partition
> > for a disk) or as NAME=wedgename (or for compatibility wedge:wedgename).
> >
> > E.g.:
> >
> > menu=Boot Xen Dom0:load /netbsd_xen console=pc root=NAME=my-root;multiboot /xen.gz
> >
> >
>
> I think even in this code which is not in the MI part of the kernel the NAME= syntax
> for bootdev or rootdev is supported. I did try that but could not get a successful
> boot using NAME=wedgename.
I tested again with the root=NAME=wedgename form and it does work, even without
any kernel patches, but that is with a reduced length of the boot command I used in
boot.cfg. I verified that it did not work in my previous try because my boot.cfg
command line was too long; it included a long UUID plus long strings to specify
com port settings for Xen. When I went to a shorter command line, the NAME=wedgename
form for specifying the bootdev/root worked fine. The problem I noticed was the
last characters of the wedgename (it was a UUID) were truncated unless I reduced
the length of the command line in boot.cfg.
Still, the bootdev=dk12 form needs some fix/kernel patch for this case when
NetBSD/xen is installed on a wedge with a double digit index, so this PR should
remain open.
>
> Also, according to a message I received on netbsd-users from Manuel who AFAIK is
> a port-xen maintainer, there is no difference in the arch/xen code between
> bootdev= and rootdev=. What you say here puts that in doubt, though.
My testing verified that Manuel is correct: bootdev= and root= behave the same way,
but actually root= is better because it gives me three more characters for the
boot command line before it overflows and starts to truncate the end of the command
line.
Kind regards,
Chuck Zmudzinski
From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: port-xen/59451: XEN3_DOM0 kernel finds the wrong root device
Date: Mon, 2 Jun 2025 12:18:41 -0000 (UTC)
gnats-admin@NetBSD.org ("Chuck Zmudzinski via gnats") writes:
> Still, the bootdev=dk12 form needs some fix/kernel patch for this case when
> NetBSD/xen is installed on a wedge with a double digit index, so this PR should
> remain open.
Yes. I'm currently testing the following patch. The parsing is
still not perfect, e.g. it accepts a 'partition' for devices that
don't support partitions (like dk or dm).
Index: xen_machdep.c
===================================================================
RCS file: /cvsroot/src/sys/arch/xen/xen/xen_machdep.c,v
retrieving revision 1.29
diff -p -u -r1.29 xen_machdep.c
--- xen_machdep.c 17 Oct 2023 10:24:11 -0000 1.29
+++ xen_machdep.c 2 Jun 2025 11:56:22 -0000
@@ -62,6 +62,7 @@ __KERNEL_RCSID(0, "$NetBSD: xen_machdep.
#include <sys/boot_flag.h>
#include <sys/conf.h>
#include <sys/disk.h>
+#include <sys/disklabel.h>
#include <sys/device.h>
#include <sys/mount.h>
#include <sys/reboot.h>
@@ -546,14 +547,28 @@ xen_bootconf(void)
break;
}
- if (strncmp(xcp.xcp_bootdev, devname, strlen(devname)))
- continue;
+ if (is_disk) {
+ size_t len;
+ int part;
+
+ len = strlen(devname);
+ if (strncmp(xcp.xcp_bootdev, devname, len) != 0)
+ continue;
+
+ if (xcp.xcp_bootdev[len] != '\0') {
+ part = xcp.xcp_bootdev[len] - 'a';
+ if (part < 0 || part >= MAXPARTITIONS)
+ continue;
+
+ if (xcp.xcp_bootdev[len+1] != '\0')
+ continue;
- if (is_disk && strlen(xcp.xcp_bootdev) > strlen(devname)) {
- /* XXX check device_cfdata as in x86_autoconf.c? */
- booted_partition = toupper(
- xcp.xcp_bootdev[strlen(devname)]) - 'A';
- DPRINTF(("%s: booted_partition: %d\n", __func__, booted_partition));
+ booted_partition = part;
+ DPRINTF(("%s: booted_partition: %d\n", __func__, booted_partition));
+ }
+ } else {
+ if (strcmp(xcp.xcp_bootdev, devname) != 0)
+ continue;
}
booted_device = dv;
> > Also, according to a message I received on netbsd-users from Manuel who AFAIK is
> > a port-xen maintainer, there is no difference in the arch/xen code between
> > bootdev= and rootdev=. What you say here puts that in doubt, though.
>
> My testing verified that Manuel is correct: bootdev= and root= behave the same way,
> but actually root= is better because it gives me three more characters for the
> boot command line before it overflows and starts to truncate the end of the command
> line.
bootdev= and root= currently work the same way.
The supplied string is checked against all disk and network interface
device names.
If there is a match, then the result is a pointer "booted_device"
to the driver and an integer "booted_partition". The kernel will
later try to access the particular driver and read data from the
numbered partition. Obviously the partition is ignored for a
network interface, and it should be 0 for something like a wedge
(that doesn't have partitions).
If there is no match, then the string is passed as is, similar to
a hardcoded embedded "config root" in the kernel config file. The
result is the string pointer "bootspec" that is evaluated later
by the kernel.
In either case, the supplied string can have 144 bytes (including
the terminating NUL character), but the parser truncates the
full command line to 255 characters first, not sure why.
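
The matching rules of the patch under test can be modelled in user space as follows. This is only a sketch: new_match() and NO_MATCH are hypothetical names, the MAXPARTITIONS value of 16 is an assumption, and returning 0 when no partition letter is present stands in for the kernel leaving booted_partition at its default:

```c
#include <string.h>

#define MAXPARTITIONS	16	/* assumption: port-specific in the kernel */

enum { NO_MATCH = -1 };

/*
 * User-space model of the patched matching logic (hypothetical helper).
 * Returns NO_MATCH if bootdev does not name devname, otherwise the
 * partition index (0 when no partition letter follows the name).
 */
static int
new_match(const char *bootdev, const char *devname, int is_disk)
{
	size_t len = strlen(devname);
	int part = 0;

	/* Network interfaces must match the whole string. */
	if (!is_disk)
		return strcmp(bootdev, devname) == 0 ? 0 : NO_MATCH;

	if (strncmp(bootdev, devname, len) != 0)
		return NO_MATCH;

	if (bootdev[len] != '\0') {
		/* Validate before the value is used anywhere. */
		part = bootdev[len] - 'a';
		if (part < 0 || part >= MAXPARTITIONS)
			return NO_MATCH;

		/* Nothing may follow the partition letter. */
		if (bootdev[len + 1] != '\0')
			return NO_MATCH;
	}

	return part;
}
```

With this model, new_match("dk12", "dk1", 1) returns NO_MATCH because '2' - 'a' is negative, so the search continues until dk12 itself is compared. As noted above, new_match("dk2a", "dk2", 1) is still accepted even though wedges have no partitions.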
From: Chuck Zmudzinski <frchuckz@gmail.com>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: port-xen/59451: XEN3_DOM0 kernel finds the wrong root device
Date: Mon, 2 Jun 2025 08:19:22 -0400
On 6/1/2025 1:55 AM, Michael van Elst via gnats wrote:
> The following reply was made to PR port-xen/59451; it has been noted by GNATS.
>
> From: mlelstv@serpens.de (Michael van Elst)
> To: gnats-bugs@netbsd.org
> Cc:
> Subject: Re: port-xen/59451: XEN3_DOM0 kernel finds the wrong root device
> Date: Sun, 1 Jun 2025 05:52:25 -0000 (UTC)
>
> frchuckz@gmail.com writes:
>
> >--- sys/arch/xen/xen/xen_machdep.c.orig 2023-10-18 12:53:03.000000000 -0400
> >+++ sys/arch/xen/xen/xen_machdep.c 2025-05-30 20:42:39.936253878 -0400
> >@@ -553,7 +553,10 @@
> > /* XXX check device_cfdata as in x86_autoconf.c? */
> > booted_partition = toupper(
> > xcp.xcp_bootdev[strlen(devname)]) - 'A';
> >+ /* Check that the value of booted_partition is sane */
> >+ if (booted_partition & 0xfffffff0)
> >+ continue;
> > DPRINTF(("%s: booted_partition: %d\n", __func__, booted_partition));
> > }
> > booted_device = dv;
>
>
> The code should check that the partition is between 0 and MAXPARTITIONS-1
> instead of using some bitmask and it shouldn't touch booted_partition
> before the value is validated.
>
> It should also parse xcp_bootdev better (this was bad before and the
> x86 version of that code isn't really better).
>
>
> However, nothing of that really helps to use wedges ("dkXX") as
> wedge unit numbers are a bit volatile.
>
>
> The better alternative is to not set "bootdev" but to pass the
> "root" command line option. The value is a string and interpreted
> by the MI part of the kernel as a device name (with partition
> for a disk) or as NAME=wedgename (or for compatibility wedge:wedgename).
>
> E.g.:
>
> menu=Boot Xen Dom0:load /netbsd_xen console=pc root=NAME=my-root;multiboot /xen.gz
>
>
I can verify that using the bootdev=NAME=wedgename form (or the root=NAME=wedgename
form) in boot.cfg instead of the bootdev=dkXX form (or the root=dkXX form) is a way
to work around this problem by completely bypassing this problematic xen MD code
and use the MI code that Michael refers to above instead.
Still, we should fix this problematic xen MD code so the bootdev=dkXX form (or the
root=dkXX form) in boot.cfg will not find the wrong root device.
So I will propose a patch for -current (after testing it on my box) that
uses MAXPARTITIONS to validate booted_partition and avoids touching
booted_partition before its value is validated. I don't know if such a patch
fully addresses the problems with this code, but at least it would prevent us
from finding the wrong root device when using the bootdev=dkXX form
(or the root=dkXX form) in boot.cfg and allow us to close this PR.
I also think the patch should be pulled to netbsd-10 and netbsd-9 (if netbsd-9
is still supported). In netbsd-9, I think this problematic code is in a different
file under sys/arch/xen, I think it is sys/arch/xen/x86/autoconf.c IIRC.
From: Chuck Zmudzinski <frchuckz@gmail.com>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: port-xen/59451: XEN3_DOM0 kernel finds the wrong root device
Date: Mon, 2 Jun 2025 10:00:56 -0400
On 6/2/2025 8:20 AM, Michael van Elst via gnats wrote:
> The following reply was made to PR port-xen/59451; it has been noted by GNATS.
>
> From: mlelstv@serpens.de (Michael van Elst)
> To: gnats-bugs@netbsd.org
> Cc:
> Subject: Re: port-xen/59451: XEN3_DOM0 kernel finds the wrong root device
> Date: Mon, 2 Jun 2025 12:18:41 -0000 (UTC)
>
> gnats-admin@NetBSD.org ("Chuck Zmudzinski via gnats") writes:
>
> > Still, the bootdev=dk12 form needs some fix/kernel patch for this case when
> > NetBSD/xen is installed on a wedge with a double digit index, so this PR should
> > remain open.
>
> Yes. I'm currently testing the following patch. The parsing is
> still not perfect, e.g. it accepts a 'partition' for devices that
> don't support partitions (like dk or dm).
Yeah, the patches I have been testing also have this problem. Couldn't
we add a supports_partitions(device_t dev) function?
>
> Index: xen_machdep.c
> ===================================================================
> RCS file: /cvsroot/src/sys/arch/xen/xen/xen_machdep.c,v
> retrieving revision 1.29
> diff -p -u -r1.29 xen_machdep.c
> --- xen_machdep.c 17 Oct 2023 10:24:11 -0000 1.29
> +++ xen_machdep.c 2 Jun 2025 11:56:22 -0000
> @@ -62,6 +62,7 @@ __KERNEL_RCSID(0, "$NetBSD: xen_machdep.
> #include <sys/boot_flag.h>
> #include <sys/conf.h>
> #include <sys/disk.h>
> +#include <sys/disklabel.h>
Yes, I have this to pull in MAXPARTITIONS
> #include <sys/device.h>
> #include <sys/mount.h>
> #include <sys/reboot.h>
> @@ -546,14 +547,28 @@ xen_bootconf(void)
> break;
> }
>
> - if (strncmp(xcp.xcp_bootdev, devname, strlen(devname)))
> - continue;
> + if (is_disk) {
> + size_t len;
> + int part;
> +
> + len = strlen(devname);
> + if (strncmp(xcp.xcp_bootdev, devname, len) != 0)
> + continue;
> +
> + if (xcp.xcp_bootdev[len] != '\0') {
> + part = xcp.xcp_bootdev[len] - 'a';
> + if (part < 0 || part >= MAXPARTITIONS)
> + continue;
This is exactly what I am also doing to validate the value for booted_partition,
except I keep toupper(...) and subtract 'A' instead of 'a', presumably so the code
works if bootdev=wd0A instead of bootdev=wd0a in boot.cfg. I see the code is
not consistent about this. I don't know if this is the only place we are
currently using toupper and 'A'.
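
A small sketch of the difference, assuming ASCII; part_lower() and part_anycase() are hypothetical helpers used only for comparison:

```c
#include <ctype.h>

/* Conversion used by the patch under test: lower-case letters only. */
static int
part_lower(char c)
{
	return c - 'a';
}

/* Conversion used by the existing code: either case is folded first. */
static int
part_anycase(char c)
{
	return toupper((unsigned char)c) - 'A';
}
```

Both map 'a' to 0, but only part_anycase() maps 'A' to 0; part_lower('A') is -32 and would be rejected by the range check, so bootdev=wd0A would no longer match.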
> +
> + if (xcp.xcp_bootdev[len+1] != '\0')
> + continue;
I do not have this in the patch I am testing. This looks like another
sanity check to prevent accepting a match between bootdev=wd0a1 and
devname=wd0. I think what we have now would accept such a match, so
I suppose this is an improvement over what we have.
>
> - if (is_disk && strlen(xcp.xcp_bootdev) > strlen(devname)) {
> - /* XXX check device_cfdata as in x86_autoconf.c? */
> - booted_partition = toupper(
> - xcp.xcp_bootdev[strlen(devname)]) - 'A';
> - DPRINTF(("%s: booted_partition: %d\n", __func__, booted_partition));
> + booted_partition = part;
> + DPRINTF(("%s: booted_partition: %d\n", __func__, booted_partition));
> + }
> + } else {
> + if (strcmp(xcp.xcp_bootdev, devname) != 0)
> + continue;
The patch I am pondering avoids this if .. else logic but it looks like this
is needed the way this patch is organized to make sure we find the bootdev
when we have is_ifnet instead of is_disk.
> }
>
> booted_device = dv;
From: Chuck Zmudzinski <frchuckz@gmail.com>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: port-xen/59451: XEN3_DOM0 kernel finds the wrong root device
Date: Mon, 2 Jun 2025 11:48:18 -0400
On 6/2/2025 8:20 AM, Michael van Elst via gnats wrote:
> The following reply was made to PR port-xen/59451; it has been noted by GNATS.
>
> From: mlelstv@serpens.de (Michael van Elst)
> To: gnats-bugs@netbsd.org
> Cc:
> Subject: Re: port-xen/59451: XEN3_DOM0 kernel finds the wrong root device
> Date: Mon, 2 Jun 2025 12:18:41 -0000 (UTC)
>
> gnats-admin@NetBSD.org ("Chuck Zmudzinski via gnats") writes:
>
> > Still, the bootdev=dk12 form needs some fix/kernel patch for this case when
> > NetBSD/xen is installed on a wedge with a double digit index, so this PR should
> > remain open.
>
> Yes. I'm currently testing the following patch. The parsing is
> still not perfect, e.g. it accepts a 'partition' for devices that
> don't support partitions (like dk or dm).
This raises a question in my mind that I don't know the answer to.
If I create a wedge, say it's dk2, on a host Xen dom0 system that
is set to be the virtual disk of a Xen domU and in the guest
domU I write a disklabel on that virtual disk, will those partitions
on the virtual disk in the guest domU show up on the host Xen dom0
system as devices with names like /dev/dk2a, /dev/dk2b, etc.?
In such a case, could we say dk devices can support partitions?
From: Chuck Zmudzinski <frchuckz@gmail.com>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: port-xen/59451: XEN3_DOM0 kernel finds the wrong root device
Date: Mon, 2 Jun 2025 12:48:30 -0400
On 6/2/2025 8:20 AM, Michael van Elst via gnats wrote:
> The following reply was made to PR port-xen/59451; it has been noted by GNATS.
>
> From: mlelstv@serpens.de (Michael van Elst)
> To: gnats-bugs@netbsd.org
> Cc:
> Subject: Re: port-xen/59451: XEN3_DOM0 kernel finds the wrong root device
> Date: Mon, 2 Jun 2025 12:18:41 -0000 (UTC)
>
> gnats-admin@NetBSD.org ("Chuck Zmudzinski via gnats") writes:
> ...
> > My testing verified that Manuel is correct: bootdev= and root= behave the same way,
> > but actually root= is better because it gives me three more characters for the
> > boot command line before it overflows and starts to truncate the end of the command
> > line.
>
>
> bootdev= and root= currently work the same way.
>
> The supplied string is checked against all disk and network interface
> device names.
>
> If there is a match, then the result is a pointer "booted_device"
> to the driver and an integer "booted_partition". The kernel will
> later try to access the particular driver and read data from the
> numbered partition. Obviously the partition is ignored for a
> network interface, and it should be 0 for something like a wedge
> (that doesn't have partitions).
>
> If there is no match, then the string is passed as is, similar to
> a hardcoded embedded "config root" in the kernel config file. The
> result is the string pointer "bootspec" that is evaluated later
> by the kernel.
>
> In either case, the supplied string can have 144 bytes (including
> the terminating NUL character), but the parser truncates the
> full command line to 255 characters first, not sure why.
>
After further investigation...
I don't think the string passed by the bootdev= setting is overflowing
in the kernel. Rather, I think the length of the arguments of the
"load" command that the bootloader uses to load the DOM0 kernel image
is overflowing. I think so because I was actually able to use a
long UUID as the wedgename by reducing the number of characters in
the filename of the kernel I am booting, and reducing that helps
reduce the length of the arguments to the bootloader's "load" command.
In any case, I decided to set a label on the GPT partition like "netsbd-root"
instead of using the long UUID as the wedgename to help ensure that nothing
in my boot.cfg gets truncated.
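
Whichever layer does the truncating, the effect on a trailing wedge name is the same. The following sketch is not the bootloader's or the kernel's code; copy_truncated() is a hypothetical helper that copies a string into a fixed-size buffer and reports how many trailing characters were lost:

```c
#include <stdio.h>
#include <string.h>

/*
 * Copy src into dst, which holds cap bytes including the NUL
 * terminator.  Returns the number of trailing characters of src
 * that did not fit (0 if the whole string was copied).
 */
static size_t
copy_truncated(char *dst, size_t cap, const char *src)
{
	int want = snprintf(dst, cap, "%s", src);

	if (want < 0)
		return 0;	/* formatting error: nothing to report */

	return (size_t)want >= cap ? (size_t)want - (cap - 1) : 0;
}
```

Because the characters are dropped from the end, an argument placed last on the line, such as a long UUID used as the wedge name, is the first thing to be damaged, which matches the symptom described above.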
From: Chuck Zmudzinski <frchuckz@gmail.com>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: port-xen/59451: XEN3_DOM0 kernel finds the wrong root device
Date: Mon, 2 Jun 2025 15:33:21 -0400
On 6/2/2025 8:20 AM, Michael van Elst via gnats wrote:
> The following reply was made to PR port-xen/59451; it has been noted by GNATS.
>
> From: mlelstv@serpens.de (Michael van Elst)
> To: gnats-bugs@netbsd.org
> Cc:
> Subject: Re: port-xen/59451: XEN3_DOM0 kernel finds the wrong root device
> Date: Mon, 2 Jun 2025 12:18:41 -0000 (UTC)
>
> gnats-admin@NetBSD.org ("Chuck Zmudzinski via gnats") writes:
>
> > Still, the bootdev=dk12 form needs some fix/kernel patch for this case when
> > NetBSD/xen is installed on a wedge with a double digit index, so this PR should
> > remain open.
>
> Yes. I'm currently testing the following patch. The parsing is
> still not perfect, e.g. it accepts a 'partition' for devices that
> don't support partitions (like dk or dm).
>
> Index: xen_machdep.c
> ===================================================================
> RCS file: /cvsroot/src/sys/arch/xen/xen/xen_machdep.c,v
> retrieving revision 1.29
> diff -p -u -r1.29 xen_machdep.c
> --- xen_machdep.c 17 Oct 2023 10:24:11 -0000 1.29
> +++ xen_machdep.c 2 Jun 2025 11:56:22 -0000
> @@ -62,6 +62,7 @@ __KERNEL_RCSID(0, "$NetBSD: xen_machdep.
> #include <sys/boot_flag.h>
> #include <sys/conf.h>
> #include <sys/disk.h>
> +#include <sys/disklabel.h>
> #include <sys/device.h>
> #include <sys/mount.h>
> #include <sys/reboot.h>
> @@ -546,14 +547,28 @@ xen_bootconf(void)
> break;
> }
>
> - if (strncmp(xcp.xcp_bootdev, devname, strlen(devname)))
> - continue;
> + if (is_disk) {
> + size_t len;
> + int part;
> +
> + len = strlen(devname);
> + if (strncmp(xcp.xcp_bootdev, devname, len) != 0)
> + continue;
> +
> + if (xcp.xcp_bootdev[len] != '\0') {
> + part = xcp.xcp_bootdev[len] - 'a';
> + if (part < 0 || part >= MAXPARTITIONS)
> + continue;
> +
> + if (xcp.xcp_bootdev[len+1] != '\0')
> + continue;
>
> - if (is_disk && strlen(xcp.xcp_bootdev) > strlen(devname)) {
> - /* XXX check device_cfdata as in x86_autoconf.c? */
> - booted_partition = toupper(
> - xcp.xcp_bootdev[strlen(devname)]) - 'A';
> - DPRINTF(("%s: booted_partition: %d\n", __func__, booted_partition));
> + booted_partition = part;
> + DPRINTF(("%s: booted_partition: %d\n", __func__, booted_partition));
> + }
> + } else {
> + if (strcmp(xcp.xcp_bootdev, devname) != 0)
> + continue;
> }
>
> booted_device = dv;
>
I tested this patch on my box applied to netbsd-10 (it applies cleanly there),
and I can confirm it resolves the problem on my box when bootdev=dk12. Now it
finds the correct root device, dk12.
I cannot say if it introduces any unintended consequences since I cannot easily
test behavior with bootdev=wd1a or bootdev=rge0, etc. But it does look fine to me
and probably does a better job of parsing the bootdev string than what we
have now.
Thanks.
From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: port-xen/59451: XEN3_DOM0 kernel finds the wrong root device
Date: Mon, 2 Jun 2025 20:21:10 -0000 (UTC)
gnats-admin@NetBSD.org ("Chuck Zmudzinski via gnats") writes:
> If I create a wedge, say it's dk2, on a host Xen dom0 system that
> is set to be the virtual disk of a Xen domU and in the guest
> domU I write a disklabel on that virtual disk, will those partitions
> on the virtual disk in the guest domU show up on the host Xen dom0
> system as devices with names like /dev/dk2a, /dev/dk2b, etc.?
The partitions will not show up. The dk driver doesn't know
anything about partitions. You can't open a partition, there
are no ioctls that handle partition information. The bits of
a disklabel on the storage are just bits that you can read.
The disklabel command can be told to read the bits from a file (-F),
and that also works for a wedge device. But that would only
print the disklabel bits, otherwise it has no meaning for the
Dom0.
From: Chuck Zmudzinski <frchuckz@gmail.com>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: port-xen/59451: XEN3_DOM0 kernel finds the wrong root device
Date: Mon, 2 Jun 2025 16:49:37 -0400
On 6/2/2025 4:25 PM, Michael van Elst via gnats wrote:
> The following reply was made to PR port-xen/59451; it has been noted by GNATS.
>
> From: mlelstv@serpens.de (Michael van Elst)
> To: gnats-bugs@netbsd.org
> Cc:
> Subject: Re: port-xen/59451: XEN3_DOM0 kernel finds the wrong root device
> Date: Mon, 2 Jun 2025 20:21:10 -0000 (UTC)
>
> gnats-admin@NetBSD.org ("Chuck Zmudzinski via gnats") writes:
>
> > If I create a wedge, say it's dk2, on a host Xen dom0 system that
> > is set to be the virtual disk of a Xen domU and in the guest
> > domU I write a disklabel on that virtual disk, will those partitions
> > on the virtual disk in the guest domU show up on the host Xen dom0
> > system as devices with names like /dev/dk2a, /dev/dk2b, etc.?
>
> The partitions will not show up. The dk driver doesn't know
> anything about partitions. You can't open a partition, there
> are no ioctls that handle partition information. The bits of
> a disklabel on the storage are just bits that you can read.
>
> The disklabel command can be told to read the bits from a file (-F),
> and that also works for a wedge device. But that would only
> print the disklabel bits, otherwise it has no meaning for the
> Dom0.
>
So trying to catch the case of a user setting bootdev=dk2a in boot.cfg is
just trying to catch a case when the user made a mistake. It seems unlikely
to happen because the user will never see devices named dk2a on a dom0
system. Most likely, if a user did that, your proposed patch would strip
off the 'a' and set booted_partition to 0 and it would most likely just work
if dk2 was the correct root device.
If the user set something like dk2e, then booted_partition will be 4 I think,
but in my testing the code more or less ignores booted_partition, even if
it is a garbage value, in the case of bootdev being set as a dkXX device in
boot.cfg. In my case, it was -15: '2' - 'A'. That bad value was actually
present in a function in init_main.c where there is a KASSERT to check if
booted_partition is in the expected range (>= 0 and < MAXPARTITIONS), but the
KASSERT never got triggered in my testing.
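For illustration, the -15 can be reproduced outside the kernel. This is only a minimal user-space sketch, not the kernel code: the function name legacy_partition_index is made up, and it mirrors nothing more than the prefix comparison and the toupper(...) - 'A' arithmetic of the unpatched lines shown in the diff in this PR.

```c
#include <ctype.h>
#include <string.h>

/*
 * Hypothetical helper that mimics the unpatched matching logic.
 * Returns 1 if bootdev "matches" devname by prefix and stores the
 * partition index the old arithmetic would produce in *part.
 * *part stays 0 when bootdev has no character after the prefix
 * (a choice made for this sketch only).
 */
static int
legacy_partition_index(const char *bootdev, const char *devname, int *part)
{
	size_t len = strlen(devname);

	*part = 0;
	if (strncmp(bootdev, devname, len) != 0)
		return 0;
	if (strlen(bootdev) > len)
		*part = toupper((unsigned char)bootdev[len]) - 'A';
	return 1;
}
```

With bootdev "dk12" compared against devname "dk1", the prefix test succeeds and the index is '2' - 'A'. With "dk2e" against "dk2", the index is 'E' - 'A'.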
From: Chuck Zmudzinski <frchuckz@gmail.com>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: port-xen/59451: XEN3_DOM0 kernel finds the wrong root device
Date: Mon, 2 Jun 2025 16:51:56 -0400
On 6/2/2025 4:49 PM, Chuck Zmudzinski wrote:
> So trying to catch the case of a user setting bootdev=dk2a in boot.cfg is
> just trying to catch a case when the user made a mistake. It seems unlikely
> to happen because the user will never see devices named dk2a on a dom0
> system. Most likely, if a user did that, your proposed patch would strip
> off the 'a' and set booted_partition to 0 and it would most likely just work
> if dk2 was the correct root device.
>
> If the user set something like dk2e, then booted_partition will be 4 I think,
> but in my testing the code more or less ignores booted_partition, even if
> it is a garbage value, in the case of bootdev being set as a dkXX device in
> boot.cfg. In my case, it was -15: '2' - 'A'. That bad value was actually
> present in a function in init_main.c where there is a KASSERT to check if
> booted_partition is in the expected range (>= 0 and < MAXPARTITIONS), but the
> KASSERT never got triggered in my testing.
Oops, the KASSERT checks it is not in the unexpected range (>= 0 or < MAXPARTITIONS).
From: Chuck Zmudzinski <frchuckz@gmail.com>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: port-xen/59451: XEN3_DOM0 kernel finds the wrong root device
Date: Mon, 2 Jun 2025 16:53:21 -0400
On 6/2/2025 4:51 PM, Chuck Zmudzinski wrote:
> Oops, the KASSERT checks it is not in the unexpected range (>= 0 or < MAXPARTITIONS).
Oops again, I had it right the first time. Sorry for the noise. I'm getting old...
From: Chuck Zmudzinski <frchuckz@gmail.com>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: port-xen/59451: XEN3_DOM0 kernel finds the wrong root device
Date: Mon, 2 Jun 2025 19:41:32 -0400
On 6/2/2025 8:20 AM, Michael van Elst via gnats wrote:
> The following reply was made to PR port-xen/59451; it has been noted by GNATS.
>
> From: mlelstv@serpens.de (Michael van Elst)
> To: gnats-bugs@netbsd.org
> Cc:
> Subject: Re: port-xen/59451: XEN3_DOM0 kernel finds the wrong root device
> Date: Mon, 2 Jun 2025 12:18:41 -0000 (UTC)
>
> gnats-admin@NetBSD.org ("Chuck Zmudzinski via gnats") writes:
>
> > Still, the bootdev=dk12 form needs some fix/kernel patch for this case when
> > NetBSD/xen is installed on a wedge with a double digit index, so this PR should
> > remain open.
>
> Yes. I'm currently testing the following patch. The parsing is
> still not perfect, e.g. it accepts a 'partition' for devices that
> don't support partitions (like dk or dm).
I see your code accepts a 'partition' for dk, but I don't see why for dm.
We have this:
is_disk = is_valid_disk(dv);
and this:
static int
is_valid_disk(device_t dv)
{
	if (device_class(dv) != DV_DISK)
		return (0);
	return (device_is_a(dv, "dk") ||
	    device_is_a(dv, "sd") ||
	    device_is_a(dv, "wd") ||
	    device_is_a(dv, "ld") ||
	    device_is_a(dv, "ed") ||
	    device_is_a(dv, "xbd"));
}
So is_disk is true for "dk" but not for "dm", which makes sense because, as
I understand it, NetBSD/xen does not currently support booting with an LVM
device as root. If we have an LVM device like dm0 and set bootdev=dm0 when
booting a NetBSD/xen PV dom0, what do you think would happen with the current
code and with your proposed patch? It looks to me like there would never be
a match, so would the kernel just try dm0 and see what happens?
>
> Index: xen_machdep.c
> ===================================================================
> RCS file: /cvsroot/src/sys/arch/xen/xen/xen_machdep.c,v
> retrieving revision 1.29
> diff -p -u -r1.29 xen_machdep.c
> --- xen_machdep.c 17 Oct 2023 10:24:11 -0000 1.29
> +++ xen_machdep.c 2 Jun 2025 11:56:22 -0000
> @@ -62,6 +62,7 @@ __KERNEL_RCSID(0, "$NetBSD: xen_machdep.
> #include <sys/boot_flag.h>
> #include <sys/conf.h>
> #include <sys/disk.h>
> +#include <sys/disklabel.h>
> #include <sys/device.h>
> #include <sys/mount.h>
> #include <sys/reboot.h>
> @@ -546,14 +547,28 @@ xen_bootconf(void)
> break;
> }
>
> - if (strncmp(xcp.xcp_bootdev, devname, strlen(devname)))
> - continue;
> + if (is_disk) {
> + size_t len;
> + int part;
> +
> + len = strlen(devname);
> + if (strncmp(xcp.xcp_bootdev, devname, len) != 0)
> + continue;
> +
> + if (xcp.xcp_bootdev[len] != '\0') {
> + part = xcp.xcp_bootdev[len] - 'a';
> + if (part < 0 || part >= MAXPARTITIONS)
> + continue;
> +
> + if (xcp.xcp_bootdev[len+1] != '\0')
> + continue;
>
> - if (is_disk && strlen(xcp.xcp_bootdev) > strlen(devname)) {
> - /* XXX check device_cfdata as in x86_autoconf.c? */
> - booted_partition = toupper(
> - xcp.xcp_bootdev[strlen(devname)]) - 'A';
> - DPRINTF(("%s: booted_partition: %d\n", __func__, booted_partition));
> + booted_partition = part;
> + DPRINTF(("%s: booted_partition: %d\n", __func__, booted_partition));
> + }
> + } else {
> + if (strcmp(xcp.xcp_bootdev, devname) != 0)
> + continue;
> }
>
> booted_device = dv;
>
>
> > > Also, according to a message I received on netbsd-users from Manuel who AFAIK is
> > > a port-xen maintainer, there is no difference in the arch/xen code between
> > > bootdev= and rootdev=. What you say here puts that in doubt, though.
> >
> > My testing verified that Manuel is correct: bootdev= and root= behave the same way,
> > but actually root= is better because it gives me three more characters for the
> > boot command line before it overflows and starts to truncate the end of the command
> > line.
>
>
> bootdev= and root= currently work the same way.
>
> The supplied string is checked against all disk and network interface
> device names.
>
> If there is a match, then the result is a pointer "booted_device"
> to the driver and an integer "booted_partition". The kernel will
> later try to access the particular driver and read data from the
> numbered partition. Obviously the partition is ignored for a
> network interface, and it should be 0 for something like a wedge
> (that doesn't have partitions).
>
> If there is no match, then the string is passed as is, similar to
> a hardcoded embedded "config root" in the kernel config file. The
> result is the string pointer "bootspec" that is evaluated later
> by the kernel.
>
> In either case, the supplied string can have 144 bytes (including
> the terminating NUL character), but the parser truncates the
> full command line to 255 characters first, not sure why.
>
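The two outcomes described in the quoted text above (a match yields booted_device and booted_partition, no match yields bootspec) can be modelled in user space roughly as follows. This is a sketch under stated assumptions: struct boot_choice and resolve_bootdev are invented names, device names are plain strings, and an exact name comparison stands in for the real matching, so partition suffixes are not handled.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result of resolving a bootdev=/root= string. */
struct boot_choice {
	const char *booted_device;	/* matched device name, or NULL */
	int booted_partition;		/* only meaningful on a match */
	const char *bootspec;		/* raw string when nothing matched */
};

/*
 * Compare the supplied string against every known device name.
 * Assumption: exact match only; the partition letter is left out.
 */
static struct boot_choice
resolve_bootdev(const char *arg, const char *const *names, size_t n)
{
	struct boot_choice bc = { NULL, 0, NULL };

	for (size_t i = 0; i < n; i++) {
		if (strcmp(arg, names[i]) == 0) {
			bc.booted_device = names[i];
			return bc;
		}
	}
	bc.bootspec = arg;	/* passed on as is for later evaluation */
	return bc;
}
```

In this model, a string such as "dm0" that matches none of the listed names comes back with booted_device unset and bootspec pointing at the original string.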
From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: port-xen/59451: XEN3_DOM0 kernel finds the wrong root device
Date: Tue, 3 Jun 2025 05:33:10 -0000 (UTC)
gnats-admin@NetBSD.org ("Chuck Zmudzinski via gnats") writes:
> > Yes. I'm currently testing the following patch. The parsing is
> > still not perfect, e.g. it accepts a 'partition' for devices that
> > don't support partitions (like dk or dm).
>
> I see your code accepts a 'partition' for dk, but I don't see why for dm.
dk and dm are just examples for drivers that don't support partitions.
Currently the dm driver couldn't be used for booting as the instances are
created only by a userland program. But if someone adds code for LVM
autoconfiguration, it could.
To identify a boot device, the code needs to handle 3 different
cases:
- a disk (supports partitions via disklabel).
- some storage (basically a single partition).
- a network interface.
The first two drivers have a class DV_DISK.
The last driver has a class DV_IFNET.
The distinction between the first two cases is difficult
because the driver class is the same. A regular disk supports
the DIOCGDINFO ioctl (to return a label), but a storage
driver like 'dk' does not. The problem of distinguishing the
two cases has existed since wedges were invented, and in the
meantime other drivers that behave similarly have been added.
There is some ad-hoc code in the kernel that tries to
identify such drivers by name.
E.g. from kern/kern_subr.c, as part of the "MI" root filesystem
handling that I talked about:
/*
 * Use partition letters if it's a disk class but not a wedge or flash.
 * XXX Check for wedge/flash is kinda gross.
 */
#define DEV_USES_PARTITIONS(dv)						\
	(device_class((dv)) == DV_DISK &&				\
	 !device_is_a((dv), "dk") &&					\
	 !device_is_a((dv), "flash"))
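The three cases listed above, combined with the name-based check in DEV_USES_PARTITIONS, could be modelled in user space along these lines. This is a sketch only: enum dev_class, enum boot_kind and classify_boot_device are hypothetical names, and the device class and driver name are passed in directly instead of being read from a device_t.

```c
#include <string.h>

/* Simplified stand-ins for the kernel's device classes (assumption). */
enum dev_class { CLASS_DISK, CLASS_IFNET, CLASS_OTHER };

enum boot_kind { BOOT_DISK, BOOT_STORAGE, BOOT_NET, BOOT_NONE };

/*
 * Hypothetical classifier modelled on DEV_USES_PARTITIONS: a disk-class
 * device whose driver is "dk" or "flash" is treated as partitionless
 * storage; every other disk-class device is assumed to use a disklabel.
 */
static enum boot_kind
classify_boot_device(enum dev_class cls, const char *driver)
{
	if (cls == CLASS_IFNET)
		return BOOT_NET;
	if (cls != CLASS_DISK)
		return BOOT_NONE;
	if (strcmp(driver, "dk") == 0 || strcmp(driver, "flash") == 0)
		return BOOT_STORAGE;
	return BOOT_DISK;
}
```

Like the macro it imitates, this name list is incomplete: a disk-class driver such as "dm" would be classified as a partitioned disk here even though it is described above as not supporting partitions.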
From: Chuck Zmudzinski <frchuckz@gmail.com>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: port-xen/59451: XEN3_DOM0 kernel finds the wrong root device
Date: Tue, 3 Jun 2025 08:22:27 -0400
On 6/2/2025 8:20 AM, Michael van Elst via gnats wrote:
> The following reply was made to PR port-xen/59451; it has been noted by GNATS.
>
> From: mlelstv@serpens.de (Michael van Elst)
> To: gnats-bugs@netbsd.org
> Cc:
> Subject: Re: port-xen/59451: XEN3_DOM0 kernel finds the wrong root device
> Date: Mon, 2 Jun 2025 12:18:41 -0000 (UTC)
>
> gnats-admin@NetBSD.org ("Chuck Zmudzinski via gnats") writes:
>
> > Still, the bootdev=dk12 form needs some fix/kernel patch for this case when
> > NetBSD/xen is installed on a wedge with a double digit index, so this PR should
> > remain open.
>
> Yes. I'm currently testing the following patch. The parsing is
> still not perfect, e.g. it accepts a 'partition' for devices that
> don't support partitions (like dk or dm).
>
> Index: xen_machdep.c
> ===================================================================
> RCS file: /cvsroot/src/sys/arch/xen/xen/xen_machdep.c,v
> retrieving revision 1.29
> diff -p -u -r1.29 xen_machdep.c
> --- xen_machdep.c 17 Oct 2023 10:24:11 -0000 1.29
> +++ xen_machdep.c 2 Jun 2025 11:56:22 -0000
> @@ -62,6 +62,7 @@ __KERNEL_RCSID(0, "$NetBSD: xen_machdep.
> #include <sys/boot_flag.h>
> #include <sys/conf.h>
> #include <sys/disk.h>
> +#include <sys/disklabel.h>
> #include <sys/device.h>
> #include <sys/mount.h>
> #include <sys/reboot.h>
> @@ -546,14 +547,28 @@ xen_bootconf(void)
> break;
> }
>
> - if (strncmp(xcp.xcp_bootdev, devname, strlen(devname)))
> - continue;
> + if (is_disk) {
> + size_t len;
> + int part;
> +
> + len = strlen(devname);
> + if (strncmp(xcp.xcp_bootdev, devname, len) != 0)
> + continue;
> +
> + if (xcp.xcp_bootdev[len] != '\0') {
> + part = xcp.xcp_bootdev[len] - 'a';
> + if (part < 0 || part >= MAXPARTITIONS)
> + continue;
> +
> + if (xcp.xcp_bootdev[len+1] != '\0')
> + continue;
>
> - if (is_disk && strlen(xcp.xcp_bootdev) > strlen(devname)) {
> - /* XXX check device_cfdata as in x86_autoconf.c? */
> - booted_partition = toupper(
> - xcp.xcp_bootdev[strlen(devname)]) - 'A';
> - DPRINTF(("%s: booted_partition: %d\n", __func__, booted_partition));
> + booted_partition = part;
> + DPRINTF(("%s: booted_partition: %d\n", __func__, booted_partition));
> + }
> + } else {
> + if (strcmp(xcp.xcp_bootdev, devname) != 0)
> + continue;
> }
>
> booted_device = dv;
>
So taking into account everything I have learned, the only thing I would
add to this patch is a check that we do not have a "dk" wedge device
before we try to find a partition and set booted_partition. Probably replacing
the first insertion of the second hunk:
+ if (is_disk) {
with
+ if (is_disk && !device_is_a(dv, "dk")) {
But I have not tested this yet.
This patch that attempts to fix the parsing problems here is
probably only suitable for -current. I still plan to propose a
conservative patch more suitable for the stable branches that
will be less likely to introduce unintended consequences for
stable users.
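To make the effect of that extra check concrete, here is a rough user-space model of the stricter matching with the wedge exclusion added. It is a sketch, not the patch itself: strict_match and SKETCH_MAXPARTITIONS are invented names, the value 16 is an assumption standing in for the kernel's MAXPARTITIONS, and the caller passes uses_partitions as 0 for wedges and for non-disk devices.

```c
#include <string.h>

#define SKETCH_MAXPARTITIONS 16	/* assumption; the kernel uses MAXPARTITIONS */

/*
 * Hypothetical model of the stricter match.
 * uses_partitions is 0 for wedges ("dk") and non-disk devices, in which
 * case only an exact name match is accepted.
 * Returns 1 on a match and stores the partition index in *part.
 */
static int
strict_match(const char *bootdev, const char *devname, int uses_partitions,
    int *part)
{
	size_t len = strlen(devname);

	*part = 0;
	if (!uses_partitions)
		return strcmp(bootdev, devname) == 0;

	if (strncmp(bootdev, devname, len) != 0)
		return 0;
	if (bootdev[len] == '\0')
		return 1;		/* no partition letter given */

	int p = bootdev[len] - 'a';
	if (p < 0 || p >= SKETCH_MAXPARTITIONS || bootdev[len + 1] != '\0')
		return 0;
	*part = p;
	return 1;
}
```

Under this model "dk12" no longer matches "dk1", and "dk2a" no longer matches "dk2" either, while "wd1e" still matches "wd1" with partition index 4.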
From: Chuck Zmudzinski <frchuckz@gmail.com>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: port-xen/59451: XEN3_DOM0 kernel finds the wrong root device
Date: Tue, 3 Jun 2025 11:57:14 -0400
On 6/3/2025 8:25 AM, Chuck Zmudzinski via gnats wrote:
> So taking into account everything I have learned, the only thing I would
> add to this patch is a check that we do not have a "dk" wedge device
> before we try to find a partition and set booted_partition. Probably replacing
> the first insertion of the second hunk:
>
> + if (is_disk) {
>
> with
>
> + if (is_disk && !device_is_a(dv, "dk")) {
>
> But I have not tested this yet.
>
> This patch that attempts to fix the parsing problems here is
> probably only suitable for -current. I still plan to propose a
> conservative patch more suitable for the stable branches that
> will be less likely to introduce unintended consequences for
> stable users.
>
Here is my proposed conservative patch for the stable branches:
--- xen_machdep.c 2023-10-18 12:53:03.000000000 -0400
+++ xen_machdep.c 2025-06-03 10:22:46.222889485 -0400
@@ -62,6 +62,7 @@
 #include <sys/boot_flag.h>
 #include <sys/conf.h>
 #include <sys/disk.h>
+#include <sys/disklabel.h>
 #include <sys/device.h>
 #include <sys/mount.h>
 #include <sys/reboot.h>
@@ -90,6 +91,8 @@
 #define DPRINTF(a)
 #endif
+#define PARTITION_IS_VALID(part) \
+	((part >= 0) && (part < MAXPARTITIONS))
 bool xen_suspend_allow;
@@ -549,10 +552,16 @@
 		if (strncmp(xcp.xcp_bootdev, devname, strlen(devname)))
 			continue;
+		if (device_is_a(dv, "dk"))
+			continue;
+
 		if (is_disk && strlen(xcp.xcp_bootdev) > strlen(devname)) {
 			/* XXX check device_cfdata as in x86_autoconf.c? */
-			booted_partition = toupper(
+			int part = toupper(
 			    xcp.xcp_bootdev[strlen(devname)]) - 'A';
+			if (!PARTITION_IS_VALID(part))
+				continue;
+			booted_partition = part;
 			DPRINTF(("%s: booted_partition: %d\n", __func__, booted_partition));
 		}
I tested this on netbsd-10 and it fixes the problem. This file is the same in
-current, so it should also work for -current.
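As a rough illustration of what this conservative patch changes, the per-device decision can be modelled in user space as below. This is a sketch under assumptions, not the kernel code: conservative_match and SKETCH_MAXPARTITIONS are invented names, 16 stands in for MAXPARTITIONS, the is_dk and is_disk flags are supplied by the caller, and the macro arguments are parenthesized here, which the patch itself does not do.

```c
#include <ctype.h>
#include <string.h>

#define SKETCH_MAXPARTITIONS 16	/* assumption; the kernel uses MAXPARTITIONS */
#define PARTITION_IS_VALID(part) \
	(((part) >= 0) && ((part) < SKETCH_MAXPARTITIONS))

/*
 * Hypothetical model of one loop iteration after the conservative patch.
 * Returns 1 if this device would become booted_device, 0 if the loop
 * would "continue" to the next device.
 */
static int
conservative_match(const char *bootdev, const char *devname, int is_dk,
    int is_disk, int *part)
{
	size_t len = strlen(devname);

	*part = 0;
	if (strncmp(bootdev, devname, len) != 0)
		return 0;
	if (is_dk)
		return 0;	/* wedges are never matched here */
	if (is_disk && strlen(bootdev) > len) {
		int p = toupper((unsigned char)bootdev[len]) - 'A';

		if (!PARTITION_IS_VALID(p))
			return 0;
		*part = p;
	}
	return 1;
}
```

In this model every dk device is skipped, so a string like dk12 matches nothing in the loop; going by the description earlier in this PR, it would then be passed on unchanged for later evaluation. A string such as "wd10" is also rejected against "wd1", because '0' - 'A' is outside the valid range.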