NetBSD Problem Report #59451

From www@netbsd.org  Sat May 31 19:07:31 2025
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	 key-exchange X25519 server-signature RSA-PSS (2048 bits)
	 client-signature RSA-PSS (2048 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id AD6071A923D
	for <gnats-bugs@gnats.NetBSD.org>; Sat, 31 May 2025 19:07:31 +0000 (UTC)
Message-Id: <20250531190729.C97221A923E@mollari.NetBSD.org>
Date: Sat, 31 May 2025 19:07:29 +0000 (UTC)
From: frchuckz@gmail.com
Reply-To: frchuckz@gmail.com
To: gnats-bugs@NetBSD.org
Subject: XEN3_DOM0 kernel finds the wrong root device
X-Send-Pr-Version: www-1.0

>Number:         59451
>Category:       port-xen
>Synopsis:       XEN3_DOM0 kernel finds the wrong root device
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    port-xen-maintainer
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat May 31 19:10:00 +0000 2025
>Last-Modified:  Tue Jun 03 16:00:04 +0000 2025
>Originator:     Chuck Zmudzinski
>Release:        NetBSD 10.1 Release
>Organization:
Home User
>Environment:
NetBSD netbsd 10.1 NetBSD 10.1 (XEN3_DOM0) #0: Mon Dec 16 13:08:11 UTC 2024  mkrepro@mkrepro.NetBSD.org:/usr/src/sys/arch/xen/compile/XEN3_DOM0 amd64
>Description:
On a system with UEFI/GPT for booting and disk partitioning, the XEN3_DOM0 kernel can spectacularly fail in its attempt to find the correct root device even when the correct dk wedge is passed to the kernel in boot.cfg.

This problem was discussed extensively in a thread on netbsd-users, especially starting with this message: https://mail-index.netbsd.org/netbsd-users/2025/05/29/msg032694.html

I credit Greg Woods, the author of the aforementioned message on netbsd-users, for correctly identifying the problem.

The tl;dr version of the problem:

This code from sys/arch/xen/xen/xen_machdep.c is not strict enough in
its sanity checks. In some cases it picks the wrong boot device and
tries to derive a "booted_partition" when there is no partition to find:

		if (strncmp(xcp.xcp_bootdev, devname, strlen(devname)))
			continue;

		if (is_disk && strlen(xcp.xcp_bootdev) > strlen(devname)) {
			/* XXX check device_cfdata as in x86_autoconf.c? */
			booted_partition = toupper(
				xcp.xcp_bootdev[strlen(devname)]) - 'A';
			DPRINTF(("%s: booted_partition: %d\n", __func__, booted_partition));
		}
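
To see how bootdev=dk12 can be accepted for the device dk1, the two checks quoted above can be replayed in user space. This is only a sketch: old_match() is a hypothetical helper name, not a function in xen_machdep.c, and the cast to unsigned char is added here for portability.

```c
#include <ctype.h>
#include <string.h>

/*
 * Hypothetical user-space replay of the two checks quoted above.
 * Returns 1 if bootdev would be accepted for devname, 0 otherwise.
 * *part is written only when characters follow the device name.
 */
static int
old_match(const char *bootdev, const char *devname, int *part)
{
	size_t len = strlen(devname);

	/* Prefix comparison only: "dk12" starts with "dk1". */
	if (strncmp(bootdev, devname, len) != 0)
		return 0;

	/* The next character is taken as a partition letter, unchecked. */
	if (strlen(bootdev) > len)
		*part = toupper((unsigned char)bootdev[len]) - 'A';

	return 1;
}
```

Called as old_match("dk12", "dk1", &part), the sketch returns 1 and sets part to '2' - 'A', which is -15 in ASCII. Which device is compared first depends on the order in which the kernel walks its device list.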
>How-To-Repeat:
1. It is most likely necessary to have a UEFI/GPT system that can boot NetBSD/xen to reproduce this problem. It might be possible to reproduce it on a system that uses BIOS booting and MBR partitioning, but I would recommend only trying to reproduce this problem on a UEFI/GPT system that relies on dk wedges rather than a NetBSD disklabel to find the root partition.

2. It is also necessary to have more than 10 dk wedges on the system and to install the NetBSD root/boot device on a dk wedge with a double digit index. On my box, NetBSD was installed on dk12. If NetBSD is installed on any wedge with a single digit index, I doubt the problem is reproducible. So if NetBSD is installed on dk9, for example, the problem will not be reproducible unless you do another install of NetBSD on dk10 or dk11, for example. The important thing is to have a NetBSD/xen PV DOM0 system installed on a wedge with a double digit wedge index.

3. Write a boot.cfg file to boot the NetBSD/xen system installed on the wedge with a double digit index and try to boot it. In such a case, you will need to set bootdev=dk12 (or whatever wedge NetBSD is installed on). But remember the problem will not be reproducible if you set bootdev=dk9, dk8, etc. It must be dk10 or higher. And NetBSD/xen PV DOM0 must be installed on the wedge that bootdev points to.

4. Also, it is probably necessary to set up Xen to use a serial console so you can interact with Xen and DOM0 at the serial console during the boot process and actually observe the details of this problem.

5. You should see the problem. In my case, with bootdev=dk12, the kernel tried to boot from dk1 instead, which on my box happened to have a Linux distro installed on it. The boot therefore failed when the kernel tried to execute the Linux distro's /sbin/init, which in that case was systemd. This shows up on the serial console like this:

snip ...
[   5.1699079] boot device: dk1

[   5.1699079] root on dk1 dumps on dk11

[   5.1799090] Your machine does not initialize mem_clusters; sparse_dumps disabled

[   5.1799090] root file system type: ext2fs

[   5.1799090] kern.module.path=/stand/amd64/10.1/modules

[   5.1826204] exec /sbin/init: error 8

[   5.1826204] init: trying /sbin/oinit

[   5.1826204] exec /sbin/oinit: error 2

[   5.1826204] init: trying /sbin/init.bak

[   5.1826204] exec /sbin/init.bak: error 2

[   5.1826204] init: trying /rescue/init

[   5.1826204] exec /rescue/init: error 2

[   5.1826204] init path (default /sbin/init):

Note the error 8 in response to exec /sbin/init in the output above. That is when the NetBSD DOM0 kernel tried to execute systemd on the Linux distro that was installed on dk1. It did this even though I gave the kernel the correct parameter of bootdev=dk12 in boot.cfg!
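
The numbers in the log can be decoded if one assumes they are standard errno values, which is how they read here: 8 is ENOEXEC (the file exists but is not in an executable format the kernel accepts) and 2 is ENOENT (no such file). The helper name below is hypothetical and only illustrates the lookup.

```c
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper: return the C library's description for the
 * number printed after "error" in the log, assuming it is an errno
 * value (8 == ENOEXEC, 2 == ENOENT).
 */
static const char *
describe_exec_error(int error)
{
	return strerror(error);
}
```

Under that assumption, the first line means /sbin/init was found on dk1 but could not be executed, and the later lines mean the fallback paths do not exist on that file system.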

It was impossible to recover from this situation except by rebooting without Xen, setting bootdev=<something that will cause the kernel to give me a "root device" prompt> in the Xen menu item of boot.cfg, rebooting again with Xen, and interactively entering the correct root device and dump device at the serial console when the kernel prompts for them. Only then do you get a successful boot.
>Fix:
I wrote a very simple 3-line proof of concept patch to fix it for the netbsd-10 branch. I don't endorse this patch as the final fix, though, because with this patch we still have garbage in booted_partition, which could trigger an assertion later on during the boot process (it didn't in my test because the code containing the assertion was never executed):

--- sys/arch/xen/xen/xen_machdep.c.orig 2023-10-18 12:53:03.000000000 -0400
+++ sys/arch/xen/xen/xen_machdep.c      2025-05-30 20:42:39.936253878 -0400
@@ -553,7 +553,10 @@
                        /* XXX check device_cfdata as in x86_autoconf.c? */
                        booted_partition = toupper(
                                xcp.xcp_bootdev[strlen(devname)]) - 'A';
+                       /* Check that the value of booted_partition is sane */
+                       if (booted_partition & 0xfffffff0)
+                               continue;
                        DPRINTF(("%s: booted_partition: %d\n", __func__, booted_partition));
                }

                booted_device = dv;

With this patch, the DOM0 kernel correctly detects dk12 as the root device when I set bootdev=dk12 in boot.cfg, and the kernel successfully loads /sbin/init from the NetBSD root partition and the boot proceeds as normal to completion:

snip ...
[    13.924443] boot device: dk12
[    14.114443] root on dk12 dumps on dk11
...
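
The bitmask test in the proof-of-concept patch can be compared with an explicit range comparison. The sketch below assumes a 32-bit int and uses 16 as a stand-in for MAXPARTITIONS (believed to be the amd64 value; other ports may differ). Both function names are hypothetical.

```c
#include <stdbool.h>

/* Assumption: stand-in for MAXPARTITIONS, 16 as on amd64. */
#define SKETCH_MAXPARTITIONS 16

/* Check from the proof-of-concept patch: any bit above the low four rejects. */
static bool
mask_rejects(int part)
{
	return (part & 0xfffffff0) != 0;
}

/* Explicit range comparison against the partition limit. */
static bool
range_rejects(int part)
{
	return part < 0 || part >= SKETCH_MAXPARTITIONS;
}
```

With these assumptions the two checks agree, including for the value -15 that "dk12" produces when matched against "dk1". The mask hard-codes a limit of 16, while the range form follows whatever limit the port defines.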

>Audit-Trail:
From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: port-xen/59451: XEN3_DOM0 kernel finds the wrong root device
Date: Sun, 1 Jun 2025 05:52:25 -0000 (UTC)

 frchuckz@gmail.com writes:

 >--- sys/arch/xen/xen/xen_machdep.c.orig 2023-10-18 12:53:03.000000000 -0400
 >+++ sys/arch/xen/xen/xen_machdep.c      2025-05-30 20:42:39.936253878 -0400
 >@@ -553,7 +553,10 @@
 >                        /* XXX check device_cfdata as in x86_autoconf.c? */
 >                        booted_partition = toupper(
 >                                xcp.xcp_bootdev[strlen(devname)]) - 'A';
 >+                       /* Check that the value of booted_partition is sane */
 >+                       if (booted_partition & 0xfffffff0)
 >+                               continue;
 >                        DPRINTF(("%s: booted_partition: %d\n", __func__, booted_partition));
 >                }
 >                booted_device = dv;


 The code should check that the partition is between 0 and MAXPARTITIONS-1
 instead of using some bitmask and it shouldn't touch booted_partition
 before the value is validated.

 It should also parse xcp_bootdev better (this was bad before and the
 x86 version of that code isn't really better).


 However, nothing of that really helps to use wedges ("dkXX") as
 wedge unit numbers are a bit volatile.


 The better alternative is to not set "bootdev" but to pass the
 "root" command line option. The value is a string and interpreted
 by the MI part of the kernel as a device name (with partition
 for a disk) or as NAME=wedgename (or for compatibility wedge:wedgename).

 E.g.:

 menu=Boot Xen Dom0:load /netbsd_xen console=pc root=NAME=my-root;multiboot /xen.gz


From: Chuck Zmudzinski <frchuckz@gmail.com>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: port-xen/59451: XEN3_DOM0 kernel finds the wrong root device
Date: Sun, 1 Jun 2025 08:02:55 -0400

 On 6/1/2025 1:55 AM, Michael van Elst via gnats wrote:
 > The following reply was made to PR port-xen/59451; it has been noted by GNATS.
 > 
 > From: mlelstv@serpens.de (Michael van Elst)
 > To: gnats-bugs@netbsd.org
 > Cc: 
 > Subject: Re: port-xen/59451: XEN3_DOM0 kernel finds the wrong root device
 > Date: Sun, 1 Jun 2025 05:52:25 -0000 (UTC)
 > 
 >  frchuckz@gmail.com writes:
 >  
 >  >--- sys/arch/xen/xen/xen_machdep.c.orig 2023-10-18 12:53:03.000000000 -0400
 >  >+++ sys/arch/xen/xen/xen_machdep.c      2025-05-30 20:42:39.936253878 -0400
 >  >@@ -553,7 +553,10 @@
 >  >                        /* XXX check device_cfdata as in x86_autoconf.c? */
 >  >                        booted_partition = toupper(
 >  >                                xcp.xcp_bootdev[strlen(devname)]) - 'A';
 >  >+                       /* Check that the value of booted_partition is sane */
 >  >+                       if (booted_partition & 0xfffffff0)
 >  >+                               continue;
 >  >                        DPRINTF(("%s: booted_partition: %d\n", __func__, booted_partition));
 >  >                }
 >  >                booted_device = dv;
 >  
 >  
 >  The code should check that the partition is between 0 and MAXPARTITIONS-1
 >  instead of using some bitmask and it shouldn't touch booted_partition
 >  before the value is validated.

 I agree the actual fix should verify booted_partition using MAXPARTITIONS.
 And YES I agree that booted_partition should not be touched before it is
 validated. Even the current, unpatched code has that problem!

 >  
 >  It should also parse xcp_bootdev better (this was bad before and the
 >  x86 version of that code isn't really better).

 YES again!

 >  
 >  
 >  However, nothing of that really helps to use wedges ("dkXX") as
 >  wedge unit numbers are a bit volatile.
 >  
 >  
 >  The better alternative is to not set "bootdev" but to pass the
 >  "root" command line option. The value is a string and interpreted
 >  by the MI part of the kernel as a device name (with partition
 >  for a disk) or as NAME=wedgename (or for compatibility wedge:wedgename).
 >  
 >  E.g.:
 >  
 >  menu=Boot Xen Dom0:load /netbsd_xen console=pc root=NAME=my-root;multiboot /xen.gz
 >  
 >  

 I think even in this code which is not in the MI part of the kernel the NAME= syntax
 for bootdev or rootdev is supported. I did try that but could not get a successful
 boot using NAME=wedgename.

 Also, according to a message I received on netbsd-users from Manuel who AFAIK is
 a port-xen maintainer, there is no difference in the arch/xen code between
 bootdev= and rootdev=. What you say here puts that in doubt, though.

 Do you want me to try NAME=wedgename again using root= instead of bootdev= ?

 Kind regards,

 Chuck

From: Chuck Zmudzinski <frchuckz@gmail.com>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: port-xen/59451: XEN3_DOM0 kernel finds the wrong root device
Date: Mon, 2 Jun 2025 01:46:53 -0400

 On 6/1/2025 8:05 AM, Chuck Zmudzinski via gnats wrote:
 > The following reply was made to PR port-xen/59451; it has been noted by GNATS.
 > 
 > From: Chuck Zmudzinski <frchuckz@gmail.com>
 > To: gnats-bugs@netbsd.org
 > Cc: 
 > Subject: Re: port-xen/59451: XEN3_DOM0 kernel finds the wrong root device
 > Date: Sun, 1 Jun 2025 08:02:55 -0400
 > 
 >  On 6/1/2025 1:55 AM, Michael van Elst via gnats wrote:
 >  > The following reply was made to PR port-xen/59451; it has been noted by GNATS.
 >  > 
 >  > From: mlelstv@serpens.de (Michael van Elst)
 >  > To: gnats-bugs@netbsd.org
 >  > Cc: 
 >  > Subject: Re: port-xen/59451: XEN3_DOM0 kernel finds the wrong root device
 >  > Date: Sun, 1 Jun 2025 05:52:25 -0000 (UTC)
 >  > 
 >  > ...
 >  >  
 >  >  However, nothing of that really helps to use wedges ("dkXX") as
 >  >  wedge unit numbers are a bit volatile.
 >  >  
 >  >  
 >  >  The better alternative is to not set "bootdev" but to pass the
 >  >  "root" command line option. The value is a string and interpreted
 >  >  by the MI part of the kernel as a device name (with partition
 >  >  for a disk) or as NAME=wedgename (or for compatibility wedge:wedgename).
 >  >  
 >  >  E.g.:
 >  >  
 >  >  menu=Boot Xen Dom0:load /netbsd_xen console=pc root=NAME=my-root;multiboot /xen.gz
 >  >  
 >  >  
 >  
 >  I think even in this code which is not in the MI part of the kernel the NAME= syntax
 >  for bootdev or rootdev is supported. I did try that but could not get a successful
 >  boot using NAME=wedgename.

 I tested the root=NAME=wedgename form again and it does work, even without
 any kernel patches, but only with a shorter boot command in boot.cfg. I verified
 that my previous try failed because my boot.cfg command line was too long: it
 included a long UUID plus long strings to specify com port settings for Xen.
 With a shorter command line, the NAME=wedgename form for specifying the
 bootdev/root worked fine. The problem I noticed was that the last characters of
 the wedgename (it was a UUID) were truncated unless I reduced the length of the
 command line in boot.cfg.

 Still, the bootdev=dk12 form needs some fix/kernel patch for this case when
 NetBSD/xen is installed on a wedge with a double digit index, so this PR should
 remain open.

 >  
 >  Also, according to a message I received on netbsd-users from Manuel who AFAIK is
 >  a port-xen maintainer, there is no difference in the arch/xen code between
 >  bootdev= and rootdev=. What you say here puts that in doubt, though.

 My testing verified that Manuel is correct: bootdev= and root= behave the same way,
 but actually root= is better because it gives me three more characters for the
 boot command line before it overflows and starts to truncate the end of the command
 line.

 Kind regards,

 Chuck Zmudzinski

From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: port-xen/59451: XEN3_DOM0 kernel finds the wrong root device
Date: Mon, 2 Jun 2025 12:18:41 -0000 (UTC)

 gnats-admin@NetBSD.org ("Chuck Zmudzinski via gnats") writes:

 > Still, the bootdev=dk12 form needs some fix/kernel patch for this case when
 > NetBSD/xen is installed on a wedge with a double digit index, so this PR should
 > remain open.

 Yes. I'm currently testing the following patch. The parsing is
 still not perfect, e.g. it accepts a 'partition' for devices that
 don't support partitions (like dk or dm).

 Index: xen_machdep.c
 ===================================================================
 RCS file: /cvsroot/src/sys/arch/xen/xen/xen_machdep.c,v
 retrieving revision 1.29
 diff -p -u -r1.29 xen_machdep.c
 --- xen_machdep.c       17 Oct 2023 10:24:11 -0000      1.29
 +++ xen_machdep.c       2 Jun 2025 11:56:22 -0000
 @@ -62,6 +62,7 @@ __KERNEL_RCSID(0, "$NetBSD: xen_machdep.
  #include <sys/boot_flag.h>
  #include <sys/conf.h>
  #include <sys/disk.h>
 +#include <sys/disklabel.h>
  #include <sys/device.h>
  #include <sys/mount.h>
  #include <sys/reboot.h>
 @@ -546,14 +547,28 @@ xen_bootconf(void)
                         break;
                 }

 -               if (strncmp(xcp.xcp_bootdev, devname, strlen(devname)))
 -                       continue;
 +               if (is_disk) {
 +                       size_t len;
 +                       int part;
 +
 +                       len = strlen(devname);
 +                       if  (strncmp(xcp.xcp_bootdev, devname, len) != 0)
 +                               continue;
 +
 +                       if (xcp.xcp_bootdev[len] != '\0') {
 +                               part = xcp.xcp_bootdev[len] - 'a';
 +                               if (part < 0 || part >= MAXPARTITIONS)
 +                                       continue;
 +
 +                               if (xcp.xcp_bootdev[len+1] != '\0')
 +                                       continue;

 -               if (is_disk && strlen(xcp.xcp_bootdev) > strlen(devname)) {
 -                       /* XXX check device_cfdata as in x86_autoconf.c? */
 -                       booted_partition = toupper(
 -                               xcp.xcp_bootdev[strlen(devname)]) - 'A';
 -                       DPRINTF(("%s: booted_partition: %d\n", __func__, booted_partition));
 +                               booted_partition = part;
 +                               DPRINTF(("%s: booted_partition: %d\n", __func__, booted_partition));
 +                       }
 +               } else {
 +                       if  (strcmp(xcp.xcp_bootdev, devname) != 0)
 +                               continue;
                 }

                 booted_device = dv;
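
The matching rules in the patch above can be tried out in user space. This is a sketch under stated assumptions, not the kernel code: sketch_match() is a hypothetical name, plain strings replace xcp.xcp_bootdev and the device name, and 16 stands in for MAXPARTITIONS.

```c
#include <stdbool.h>
#include <string.h>

/* Assumption: stand-in for MAXPARTITIONS, 16 as on amd64. */
#define SKETCH_MAXPARTITIONS 16

/*
 * Hypothetical stand-alone version of the patched matching logic.
 * Returns true if bootdev names devname. *part is written only when
 * exactly one valid partition letter follows the device name.
 */
static bool
sketch_match(const char *bootdev, const char *devname, bool is_disk,
    int *part)
{
	size_t len = strlen(devname);

	/* Non-disk devices must match exactly. */
	if (!is_disk)
		return strcmp(bootdev, devname) == 0;

	if (strncmp(bootdev, devname, len) != 0)
		return false;

	if (bootdev[len] != '\0') {
		int p = bootdev[len] - 'a';

		/* Validate before storing anything. */
		if (p < 0 || p >= SKETCH_MAXPARTITIONS)
			return false;
		/* Nothing may follow the partition letter. */
		if (bootdev[len + 1] != '\0')
			return false;
		*part = p;
	}
	return true;
}
```

In this sketch "dk12" no longer matches "dk1", because '2' is not a valid partition letter, while "wd0a" still matches "wd0" with partition 0. "dk12a" is still accepted for "dk12", which is the remaining imperfection mentioned above.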


 > >  Also, according to a message I received on netbsd-users from Manuel who AFAIK is
 > >  a port-xen maintainer, there is no difference in the arch/xen code between
 > >  bootdev= and rootdev=. What you say here puts that in doubt, though.
 > 
 > My testing verified that Manuel is correct: bootdev= and root= behave the same way,
 > but actually root= is better because it gives me three more characters for the
 > boot command line before it overflows and starts to truncate the end of the command
 > line.


 bootdev= and root= currently work the same way.

 The supplied string is checked against all disk and network interface
 device names.

 If there is a match, then the result is a pointer "booted_device"
 to the driver and an integer "booted_partition". The kernel will
 later try to access the particular driver and read data from the
 numbered partition. Obviously the partition is ignored for a
 network interface, and it should be 0 for something like a wedge
 (that doesn't have partitions).

 If there is no match, then the string is passed as is, similar to
 a hardcoded embedded "config root" in the kernel config file. The
 result is the string pointer "bootspec" that is evaluated later
 by the kernel.

 In either case, the supplied string can have 144 bytes (including
 the terminating NUL character), but the parser truncates the
 full command line to 255 characters first, not sure why.
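
Taking the two limits mentioned above at face value, a candidate command line can be checked before it goes into boot.cfg. The names and constants below are hypothetical and simply restate the figures from this message (144 bytes including the NUL for the value, 255 characters for the whole command line).

```c
#include <stdbool.h>
#include <string.h>

/* Assumptions: both limits are copied from the text of this message. */
#define SKETCH_BOOTDEV_BYTES 144	/* includes the terminating NUL */
#define SKETCH_CMDLINE_CHARS 255

/* True if the value given to bootdev= or root= fits with its NUL. */
static bool
root_value_fits(const char *value)
{
	return strlen(value) + 1 <= SKETCH_BOOTDEV_BYTES;
}

/* True if the whole command line would not be cut at 255 characters. */
static bool
cmdline_survives(const char *cmdline)
{
	return strlen(cmdline) <= SKETCH_CMDLINE_CHARS;
}
```

A value such as "NAME=my-root" passes both checks easily. The checks say nothing about limits that a boot loader may impose before the kernel sees the command line.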

From: Chuck Zmudzinski <frchuckz@gmail.com>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: port-xen/59451: XEN3_DOM0 kernel finds the wrong root device
Date: Mon, 2 Jun 2025 08:19:22 -0400

 On 6/1/2025 1:55 AM, Michael van Elst via gnats wrote:
 > The following reply was made to PR port-xen/59451; it has been noted by GNATS.
 > 
 > From: mlelstv@serpens.de (Michael van Elst)
 > To: gnats-bugs@netbsd.org
 > Cc: 
 > Subject: Re: port-xen/59451: XEN3_DOM0 kernel finds the wrong root device
 > Date: Sun, 1 Jun 2025 05:52:25 -0000 (UTC)
 > 
 >  frchuckz@gmail.com writes:
 >  
 >  >--- sys/arch/xen/xen/xen_machdep.c.orig 2023-10-18 12:53:03.000000000 -0400
 >  >+++ sys/arch/xen/xen/xen_machdep.c      2025-05-30 20:42:39.936253878 -0400
 >  >@@ -553,7 +553,10 @@
 >  >                        /* XXX check device_cfdata as in x86_autoconf.c? */
 >  >                        booted_partition = toupper(
 >  >                                xcp.xcp_bootdev[strlen(devname)]) - 'A';
 >  >+                       /* Check that the value of booted_partition is sane */
 >  >+                       if (booted_partition & 0xfffffff0)
 >  >+                               continue;
 >  >                        DPRINTF(("%s: booted_partition: %d\n", __func__, booted_partition));
 >  >                }
 >  >                booted_device = dv;
 >  
 >  
 >  The code should check that the partition is between 0 and MAXPARTITIONS-1
 >  instead of using some bitmask and it shouldn't touch booted_partition
 >  before the value is validated.
 >  
 >  It should also parse xcp_bootdev better (this was bad before and the
 >  x86 version of that code isn't really better).
 >  
 >  
 >  However, nothing of that really helps to use wedges ("dkXX") as
 >  wedge unit numbers are a bit volatile.
 >  
 >  
 >  The better alternative is to not set "bootdev" but to pass the
 >  "root" command line option. The value is a string and interpreted
 >  by the MI part of the kernel as a device name (with partition
 >  for a disk) or as NAME=wedgename (or for compatibility wedge:wedgename).
 >  
 >  E.g.:
 >  
 >  menu=Boot Xen Dom0:load /netbsd_xen console=pc root=NAME=my-root;multiboot /xen.gz
 >  
 >  

 I can verify that using the bootdev=NAME=wedgename form (or the root=NAME=wedgename
 form) in boot.cfg instead of the bootdev=dkXX form (or the root=dkXX form) is a way
 to work around this problem by completely bypassing this problematic xen MD code
 and use the MI code that Michael refers to above instead.

 Still, we should fix this problematic xen MD code so the bootdev=dkXX form (or the
 root=dkXX form) in boot.cfg will not find the wrong root device.

 So I will propose a patch for -current (after testing it on my box) that
 uses MAXPARTITIONS to validate booted_partition and avoids touching
 booted_partition before its value is validated. I don't know if such a patch
 fully addresses the problems with this code, but at least it would prevent us
 from finding the wrong root device when using the bootdev=dkXX form
 (or the root=dkXX form) in boot.cfg and allow us to close this PR.

 I also think the patch should be pulled to netbsd-10 and netbsd-9 (if netbsd-9
 is still supported). In netbsd-9, I think this problematic code is in a different
 file under sys/arch/xen, I think it is sys/arch/xen/x86/autoconf.c IIRC.

From: Chuck Zmudzinski <frchuckz@gmail.com>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: port-xen/59451: XEN3_DOM0 kernel finds the wrong root device
Date: Mon, 2 Jun 2025 10:00:56 -0400

 On 6/2/2025 8:20 AM, Michael van Elst via gnats wrote:
 > The following reply was made to PR port-xen/59451; it has been noted by GNATS.
 > 
 > From: mlelstv@serpens.de (Michael van Elst)
 > To: gnats-bugs@netbsd.org
 > Cc: 
 > Subject: Re: port-xen/59451: XEN3_DOM0 kernel finds the wrong root device
 > Date: Mon, 2 Jun 2025 12:18:41 -0000 (UTC)
 > 
 >  gnats-admin@NetBSD.org ("Chuck Zmudzinski via gnats") writes:
 >  
 >  > Still, the bootdev=dk12 form needs some fix/kernel patch for this case when
 >  > NetBSD/xen is installed on a wedge with a double digit index, so this PR should
 >  > remain open.
 >  
 >  Yes. I'm currently testing the following patch. The parsing is
 >  still not perfect, e.g. it accepts a 'partition' for devices that
 >  don't support partitions (like dk or dm).

 Yeah, the patches I have been testing also have this problem. Couldn't
 we add a supports_partitions(device_t dev) function?

 >  
 >  Index: xen_machdep.c
 >  ===================================================================
 >  RCS file: /cvsroot/src/sys/arch/xen/xen/xen_machdep.c,v
 >  retrieving revision 1.29
 >  diff -p -u -r1.29 xen_machdep.c
 >  --- xen_machdep.c       17 Oct 2023 10:24:11 -0000      1.29
 >  +++ xen_machdep.c       2 Jun 2025 11:56:22 -0000
 >  @@ -62,6 +62,7 @@ __KERNEL_RCSID(0, "$NetBSD: xen_machdep.
 >   #include <sys/boot_flag.h>
 >   #include <sys/conf.h>
 >   #include <sys/disk.h>
 >  +#include <sys/disklabel.h>

 Yes, I have this to pull in MAXPARTITIONS

 >   #include <sys/device.h>
 >   #include <sys/mount.h>
 >   #include <sys/reboot.h>
 >  @@ -546,14 +547,28 @@ xen_bootconf(void)
 >                          break;
 >                  }
 >   
 >  -               if (strncmp(xcp.xcp_bootdev, devname, strlen(devname)))
 >  -                       continue;
 >  +               if (is_disk) {
 >  +                       size_t len;
 >  +                       int part;
 >  +
 >  +                       len = strlen(devname);
 >  +                       if  (strncmp(xcp.xcp_bootdev, devname, len) != 0)
 >  +                               continue;
 >  +
 >  +                       if (xcp.xcp_bootdev[len] != '\0') {
 >  +                               part = xcp.xcp_bootdev[len] - 'a';
 >  +                               if (part < 0 || part >= MAXPARTITIONS)
 >  +                                       continue;

 This is exactly what I am also doing to validate the value for booted_partition,
 except I keep toupper(...) and subtract 'A' instead of 'a', presumably so the code
 works if bootdev=wd0A instead of bootdev=wd0a in boot.cfg. I see the code is
 not consistent about this. I don't know if this is the only place we are
 currently using toupper and 'A'.
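
The difference between the two conversions can be shown in isolation. This is a sketch assuming ASCII, with hypothetical function names; neither function exists in the kernel.

```c
#include <ctype.h>

/* Case-sensitive form, as in the patch under test: only 'a', 'b', ... map to 0, 1, ... */
static int
part_from_lower(char c)
{
	return c - 'a';
}

/* Case-insensitive form, as in the existing code: 'a' and 'A' both map to 0. */
static int
part_from_any_case(char c)
{
	return toupper((unsigned char)c) - 'A';
}
```

With the first form, the 'A' in bootdev=wd0A yields -32 and would be rejected by a range check. With the second form it yields 0. A digit such as '2' is out of range with either form.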

 >  +
 >  +                               if (xcp.xcp_bootdev[len+1] != '\0')
 >  +                                       continue;

 I do not have this in the patch I am testing. This looks like another
 sanity check to prevent accepting a match between bootdev=wd0a1 and
 devname=wd0. I think what we have now would accept such a match, so
 I suppose this is an improvement over what we have.

 >   
 >  -               if (is_disk && strlen(xcp.xcp_bootdev) > strlen(devname)) {
 >  -                       /* XXX check device_cfdata as in x86_autoconf.c? */
 >  -                       booted_partition = toupper(
 >  -                               xcp.xcp_bootdev[strlen(devname)]) - 'A';
 >  -                       DPRINTF(("%s: booted_partition: %d\n", __func__, booted_partition));
 >  +                               booted_partition = part;
 >  +                               DPRINTF(("%s: booted_partition: %d\n", __func__, booted_partition));
 >  +                       }
 >  +               } else {
 >  +                       if  (strcmp(xcp.xcp_bootdev, devname) != 0)
 >  +                               continue;

 The patch I am pondering avoids this if .. else logic but it looks like this
 is needed the way this patch is organized to make sure we find the bootdev
 when we have is_ifnet instead of is_disk.

 >                  }
 >   
 >                  booted_device = dv;

From: Chuck Zmudzinski <frchuckz@gmail.com>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: port-xen/59451: XEN3_DOM0 kernel finds the wrong root device
Date: Mon, 2 Jun 2025 11:48:18 -0400

 On 6/2/2025 8:20 AM, Michael van Elst via gnats wrote:
 > The following reply was made to PR port-xen/59451; it has been noted by GNATS.
 > 
 > From: mlelstv@serpens.de (Michael van Elst)
 > To: gnats-bugs@netbsd.org
 > Cc: 
 > Subject: Re: port-xen/59451: XEN3_DOM0 kernel finds the wrong root device
 > Date: Mon, 2 Jun 2025 12:18:41 -0000 (UTC)
 > 
 >  gnats-admin@NetBSD.org ("Chuck Zmudzinski via gnats") writes:
 >  
 >  > Still, the bootdev=dk12 form needs some fix/kernel patch for this case when
 >  > NetBSD/xen is installed on a wedge with a double digit index, so this PR should
 >  > remain open.
 >  
 >  Yes. I'm currently testing the following patch. The parsing is
 >  still not perfect, e.g. it accepts a 'partition' for devices that
 >  don't support partitions (like dk or dm).

 This raises a question in my mind that I don't know the answer to.

 If I create a wedge, say it's dk2, on a host Xen dom0 system that
 is set to be the virtual disk of a Xen domU and in the guest
 domU I write a disklabel on that virtual disk, will those partitions
 on the virtual disk in the guest domU show up on the host Xen dom0
 system as devices with names like /dev/dk2a, /dev/dk2b, etc.?

 In such a case, could we say dk devices can support partitions?

From: Chuck Zmudzinski <frchuckz@gmail.com>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: port-xen/59451: XEN3_DOM0 kernel finds the wrong root device
Date: Mon, 2 Jun 2025 12:48:30 -0400

 On 6/2/2025 8:20 AM, Michael van Elst via gnats wrote:
 > The following reply was made to PR port-xen/59451; it has been noted by GNATS.
 > 
 > From: mlelstv@serpens.de (Michael van Elst)
 > To: gnats-bugs@netbsd.org
 > Cc: 
 > Subject: Re: port-xen/59451: XEN3_DOM0 kernel finds the wrong root device
 > Date: Mon, 2 Jun 2025 12:18:41 -0000 (UTC)
 > 
 >  gnats-admin@NetBSD.org ("Chuck Zmudzinski via gnats") writes:
 >  ...
 >  > My testing verified that Manuel is correct: bootdev= and root= behave the same way,
 >  > but actually root= is better because it gives me three more characters for the
 >  > boot command line before it overflows and starts to truncate the end of the command
 >  > line.
 >  
 >  
 >  bootdev= and root= currently work the same way.
 >  
 >  The supplied string is checked against all disk and network interface
 >  device names.
 >  
 >  If there is a match, then the result is a pointer "booted_device"
 >  to the driver and an integer "booted_partition". The kernel will
 >  later try to access the particular driver and read data from the
 >  numbered partition. Obviously the partition is ignored for a
 >  network interface, and it should be 0 for something like a wedge
 >  (that doesn't have partitions).
 >  
 >  If there is no match, then the string is passed as is, similar to
 >  a hardcoded embedded "config root" in the kernel config file. The
 >  result is the string pointer "bootspec" that is evaluated later
 >  by the kernel.
 >  
 >  In either case, the supplied string can have 144 bytes (including
 >  the terminating NUL character), but the parser truncates the
 >  full command line to 255 characters first, not sure why.
 >  

 After further investigation...

 I don't think the string passed by the bootdev= setting is overflowing
 in the kernel. Rather, I think the length of the arguments of the
 "load" command that the bootloader uses to load the DOM0 kernel image
 is overflowing. I think so because I was actually able to use a
 long UUID as the wedgename by reducing the number of characters in
 the filename of the kernel I am booting, and reducing that helps
 reduce the length of the arguments to the bootloader's "load" command.

 In any case, I decided to set a label on the GPT partition like "netsbd-root"
 instead of using the long UUID as the wedgename to help ensure that nothing
 in my boot.cfg gets truncated.

From: Chuck Zmudzinski <frchuckz@gmail.com>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: port-xen/59451: XEN3_DOM0 kernel finds the wrong root device
Date: Mon, 2 Jun 2025 15:33:21 -0400

 On 6/2/2025 8:20 AM, Michael van Elst via gnats wrote:
 > The following reply was made to PR port-xen/59451; it has been noted by GNATS.
 > 
 > From: mlelstv@serpens.de (Michael van Elst)
 > To: gnats-bugs@netbsd.org
 > Cc: 
 > Subject: Re: port-xen/59451: XEN3_DOM0 kernel finds the wrong root device
 > Date: Mon, 2 Jun 2025 12:18:41 -0000 (UTC)
 > 
 >  gnats-admin@NetBSD.org ("Chuck Zmudzinski via gnats") writes:
 >  
 >  > Still, the bootdev=dk12 form needs some fix/kernel patch for this case when
 >  > NetBSD/xen is installed on a wedge with a double digit index, so this PR should
 >  > remain open.
 >  
 >  Yes. I'm currently testing the following patch. The parsing is
 >  still not perfect, e.g. it accepts a 'partition' for devices that
 >  don't support partitions (like dk or dm).
 >  
 >  Index: xen_machdep.c
 >  ===================================================================
 >  RCS file: /cvsroot/src/sys/arch/xen/xen/xen_machdep.c,v
 >  retrieving revision 1.29
 >  diff -p -u -r1.29 xen_machdep.c
 >  --- xen_machdep.c       17 Oct 2023 10:24:11 -0000      1.29
 >  +++ xen_machdep.c       2 Jun 2025 11:56:22 -0000
 >  @@ -62,6 +62,7 @@ __KERNEL_RCSID(0, "$NetBSD: xen_machdep.
 >   #include <sys/boot_flag.h>
 >   #include <sys/conf.h>
 >   #include <sys/disk.h>
 >  +#include <sys/disklabel.h>
 >   #include <sys/device.h>
 >   #include <sys/mount.h>
 >   #include <sys/reboot.h>
 >  @@ -546,14 +547,28 @@ xen_bootconf(void)
 >                          break;
 >                  }
 >   
 >  -               if (strncmp(xcp.xcp_bootdev, devname, strlen(devname)))
 >  -                       continue;
 >  +               if (is_disk) {
 >  +                       size_t len;
 >  +                       int part;
 >  +
 >  +                       len = strlen(devname);
 >  +                       if  (strncmp(xcp.xcp_bootdev, devname, len) != 0)
 >  +                               continue;
 >  +
 >  +                       if (xcp.xcp_bootdev[len] != '\0') {
 >  +                               part = xcp.xcp_bootdev[len] - 'a';
 >  +                               if (part < 0 || part >= MAXPARTITIONS)
 >  +                                       continue;
 >  +
 >  +                               if (xcp.xcp_bootdev[len+1] != '\0')
 >  +                                       continue;
 >   
 >  -               if (is_disk && strlen(xcp.xcp_bootdev) > strlen(devname)) {
 >  -                       /* XXX check device_cfdata as in x86_autoconf.c? */
 >  -                       booted_partition = toupper(
 >  -                               xcp.xcp_bootdev[strlen(devname)]) - 'A';
 >  -                       DPRINTF(("%s: booted_partition: %d\n", __func__, booted_partition));
 >  +                               booted_partition = part;
 >  +                               DPRINTF(("%s: booted_partition: %d\n", __func__, booted_partition));
 >  +                       }
 >  +               } else {
 >  +                       if  (strcmp(xcp.xcp_bootdev, devname) != 0)
 >  +                               continue;
 >                  }
 >   
 >                  booted_device = dv;
 >  

 I tested this patch applied to netbsd-10 (it applies cleanly there),
 and I can confirm it resolves the problem on my box when bootdev=dk12. Now it
 finds the correct root device, dk12.

 I cannot say if it introduces any unintended consequences since I cannot easily
 test behavior with bootdev=wd1a or bootdev=rge0, etc. But it does look fine to me
 and probably does a better job of parsing the bootdev string than what we
 have now.
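
 For anyone who wants to poke at the parsing outside the kernel, here is a
 minimal userland sketch in C of the matching rules in the patch above. The
 function name patched_match and the constant SKETCH_MAXPARTITIONS (set to 16
 here as an assumption; the kernel takes MAXPARTITIONS from sys/disklabel.h)
 are mine and not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAXPARTITIONS 16	/* assumed stand-in for MAXPARTITIONS */

/*
 * Returns 1 if devname is accepted for bootdev, 0 otherwise.
 * *part is written only when a valid partition letter follows
 * the device name.
 */
static int
patched_match(const char *bootdev, const char *devname, int is_disk, int *part)
{
	size_t len;
	int p;

	assert(bootdev != NULL && devname != NULL && part != NULL);

	/* Non-disk devices (e.g. network interfaces) need an exact match. */
	if (!is_disk)
		return strcmp(bootdev, devname) == 0;

	len = strlen(devname);
	if (strncmp(bootdev, devname, len) != 0)
		return 0;

	if (bootdev[len] != '\0') {
		/* Exactly one trailing partition letter is allowed. */
		p = bootdev[len] - 'a';
		if (p < 0 || p >= SKETCH_MAXPARTITIONS)
			return 0;
		if (bootdev[len + 1] != '\0')
			return 0;
		*part = p;
	}
	return 1;
}
```

 In this sketch bootdev "dk12" is rejected for the device "dk1", because
 '2' is not a partition letter, and accepted for "dk12". "wd1a" is accepted
 for "wd1" with partition 0, and "rge0" only matches a device named exactly
 "rge0".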

 Thanks.

From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: port-xen/59451: XEN3_DOM0 kernel finds the wrong root device
Date: Mon, 2 Jun 2025 20:21:10 -0000 (UTC)

 gnats-admin@NetBSD.org ("Chuck Zmudzinski via gnats") writes:

 > If I create a wedge, say it's dk2, on a host Xen dom0 system that
 > is set to be the virtual disk of a Xen domU and in the guest
 > domU I write a disklabel on that virtual disk, will those partitions
 > on the virtual disk in the guest domU show up on the host Xen dom0
 > system as devices with names like /dev/dk2a, /dev/dk2b, etc.?

 The partitions will not show up. The dk driver doesn't know
 anything about partitions. You can't open a partition, there
 are no ioctls that handle partition information. The bits of
 a disklabel on the storage are just bits that you can read.

 The disklabel command can be told to read the bits from a file (-F),
 and that also works for a wedge device. But that would only
 print the disklabel bits, otherwise it has no meaning for the
 Dom0.

From: Chuck Zmudzinski <frchuckz@gmail.com>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: port-xen/59451: XEN3_DOM0 kernel finds the wrong root device
Date: Mon, 2 Jun 2025 16:49:37 -0400

 On 6/2/2025 4:25 PM, Michael van Elst via gnats wrote:
 > The following reply was made to PR port-xen/59451; it has been noted by GNATS.
 > 
 > From: mlelstv@serpens.de (Michael van Elst)
 > To: gnats-bugs@netbsd.org
 > Cc: 
 > Subject: Re: port-xen/59451: XEN3_DOM0 kernel finds the wrong root device
 > Date: Mon, 2 Jun 2025 20:21:10 -0000 (UTC)
 > 
 >  gnats-admin@NetBSD.org ("Chuck Zmudzinski via gnats") writes:
 >  
 >  > If I create a wedge, say it's dk2, on a host Xen dom0 system that
 >  > is set to be the virtual disk of a Xen domU and in the guest
 >  > domU I write a disklabel on that virtual disk, will those partitions
 >  > on the virtual disk in the guest domU show up on the host Xen dom0
 >  > system as devices with names like /dev/dk2a, /dev/dk2b, etc.?
 >  
 >  The partitions will not show up. The dk driver doesn't know
 >  anything about partitions. You can't open a partition, there
 >  are no ioctls that handle partition information. The bits of
 >  a disklabel on the storage are just bits that you can read.
 >  
 >  The disklabel command can be told to read the bits from a file (-F),
 >  and that also works for a wedge device. But that would only
 >  print the disklabel bits, otherwise it has no meaning for the
 >  Dom0.
 >  

 So trying to catch the case of a user setting bootdev=dk2a in boot.cfg is
 just trying to catch a case when the user made a mistake. It seems unlikely
 to happen because the user will never see devices named dk2a on a dom0
 system. Most likely, if a user did that, your proposed patch would strip
 off the a and set booted_partition to 0 and it would most likely just work
 if dk2 was the correct root device.

 If the user set something like dk2e, then booted_partition will be 4 I think,
 but in my testing the code more or less ignores booted_partition, even if
 it is a garbage value, in the case of bootdev being set as a dkXX device in
 boot.cfg. In my case, it was -15: '2' - 'A'. That bad value was actually
 present in a function in init_main.c where there is a KASSERT to check if
 booted_partition is in the expected range (>= 0 and < MAXPARTITIONS), but the
 KASSERT never got triggered in my testing.
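
 The -15 can be reproduced in userland. This is a minimal sketch in C of the
 unpatched comparison, assuming the device loop stops at the first device it
 accepts; the function name old_match is hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Userland model of the unpatched test in xen_bootconf().
 * Returns 1 if devname is accepted for bootdev. *part is written
 * only when bootdev is longer than devname on a disk device.
 */
static int
old_match(const char *bootdev, const char *devname, int is_disk, int *part)
{
	size_t len;

	assert(bootdev != NULL && devname != NULL && part != NULL);

	len = strlen(devname);

	/* Prefix comparison only: "dk1" is accepted for "dk12". */
	if (strncmp(bootdev, devname, len) != 0)
		return 0;

	/* The next character is taken as a partition letter unchecked. */
	if (is_disk && strlen(bootdev) > len)
		*part = toupper((unsigned char)bootdev[len]) - 'A';

	return 1;
}
```

 Here old_match("dk12", "dk1", 1, &part) returns 1 and leaves part at -15,
 since '2' is 50 and 'A' is 65 in ASCII. So dk1 is taken as the boot device
 whenever it is visited before dk12.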

From: Chuck Zmudzinski <frchuckz@gmail.com>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: port-xen/59451: XEN3_DOM0 kernel finds the wrong root device
Date: Mon, 2 Jun 2025 16:51:56 -0400

 On 6/2/2025 4:49 PM, Chuck Zmudzinski wrote:
 > On 6/2/2025 4:25 PM, Michael van Elst via gnats wrote:
 >> The following reply was made to PR port-xen/59451; it has been noted by GNATS.
 >> 
 >> From: mlelstv@serpens.de (Michael van Elst)
 >> To: gnats-bugs@netbsd.org
 >> Cc: 
 >> Subject: Re: port-xen/59451: XEN3_DOM0 kernel finds the wrong root device
 >> Date: Mon, 2 Jun 2025 20:21:10 -0000 (UTC)
 >> 
 >>  gnats-admin@NetBSD.org ("Chuck Zmudzinski via gnats") writes:
 >>  
 >>  > If I create a wedge, say it's dk2, on a host Xen dom0 system that
 >>  > is set to be the virtual disk of a Xen domU and in the guest
 >>  > domU I write a disklabel on that virtual disk, will those partitions
 >>  > on the virtual disk in the guest domU show up on the host Xen dom0
 >>  > system as devices with names like /dev/dk2a, /dev/dk2b, etc.?
 >>  
 >>  The partitions will not show up. The dk driver doesn't know
 >>  anything about partitions. You can't open a partition, there
 >>  are no ioctls that handle partition information. The bits of
 >>  a disklabel on the storage are just bits that you can read.
 >>  
 >>  The disklabel command can be told to read the bits from a file (-F),
 >>  and that also works for a wedge device. But that would only
 >>  print the disklabel bits, otherwise it has no meaning for the
 >>  Dom0.
 >>  
 > 
 > So trying to catch the case of a user setting bootdev=dk2a in boot.cfg is
 > just trying to catch a case when the user made a mistake. It seems unlikely
 > to happen because the user will never see devices named dk2a on a dom0
 > system. Most likely, if a user did that, your proposed patch would strip
 > off the a and set booted_partition to 0 and it would most likely just work
 > if dk2 was the correct root device.
 > 
 > If the user set something like dk2e, then booted_partition will be 4 I think,
 > but in my testing the code more or less ignores booted_partition, even if
 > it is a garbage value, in the case of bootdev being set as a dkXX device in
 > boot.cfg. In my case, it was -15: '2' - 'A'. That bad value was actually
 > present in a function in init_main.c where there is a KASSERT to check if
 > booted_partition is in the expected range (>= 0 or < MAXPARTITIONS), but the
 > KASSERT never got triggered in my testing.

 Oops, the KASSERT checks it is not in the unexpected range (>= 0 or < MAXPARTITIONS).

From: Chuck Zmudzinski <frchuckz@gmail.com>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: port-xen/59451: XEN3_DOM0 kernel finds the wrong root device
Date: Mon, 2 Jun 2025 16:53:21 -0400

 On 6/2/2025 4:51 PM, Chuck Zmudzinski wrote:
 > On 6/2/2025 4:49 PM, Chuck Zmudzinski wrote:
 >> On 6/2/2025 4:25 PM, Michael van Elst via gnats wrote:
 >>> The following reply was made to PR port-xen/59451; it has been noted by GNATS.
 >>> 
 >>> From: mlelstv@serpens.de (Michael van Elst)
 >>> To: gnats-bugs@netbsd.org
 >>> Cc: 
 >>> Subject: Re: port-xen/59451: XEN3_DOM0 kernel finds the wrong root device
 >>> Date: Mon, 2 Jun 2025 20:21:10 -0000 (UTC)
 >>> 
 >>>  gnats-admin@NetBSD.org ("Chuck Zmudzinski via gnats") writes:
 >>>  
 >>>  > If I create a wedge, say it's dk2, on a host Xen dom0 system that
 >>>  > is set to be the virtual disk of a Xen domU and in the guest
 >>>  > domU I write a disklabel on that virtual disk, will those partitions
 >>>  > on the virtual disk in the guest domU show up on the host Xen dom0
 >>>  > system as devices with names like /dev/dk2a, /dev/dk2b, etc.?
 >>>  
 >>>  The partitions will not show up. The dk driver doesn't know
 >>>  anything about partitions. You can't open a partition, there
 >>>  are no ioctls that handle partition information. The bits of
 >>>  a disklabel on the storage are just bits that you can read.
 >>>  
 >>>  The disklabel command can be told to read the bits from a file (-F),
 >>>  and that also works for a wedge device. But that would only
 >>>  print the disklabel bits, otherwise it has no meaning for the
 >>>  Dom0.
 >>>  
 >> 
 >> So trying to catch the case of a user setting bootdev=dk2a in boot.cfg is
 >> just trying to catch a case when the user made a mistake. It seems unlikely
 >> to happen because the user will never see devices named dk2a on a dom0
 >> system. Most likely, if a user did that, your proposed patch would strip
 >> off the a and set booted_partition to 0 and it would most likely just work
 >> if dk2 was the correct root device.
 >> 
 >> If the user set something like dk2e, then booted_partition will be 4 I think,
 >> but in my testing the code more or less ignores booted_partition, even if
 >> it is a garbage value, in the case of bootdev being set as a dkXX device in
 >> boot.cfg. In my case, it was -15: '2' - 'A'. That bad value was actually
 >> present in a function in init_main.c where there is a KASSERT to check if
 >> booted_partition is in the expected range (>= 0 or < MAXPARTITIONS), but the
 >> KASSERT never got triggered in my testing.
 > 
 > Oops, the KASSERT checks it is not in the unexpected range (>= 0 or < MAXPARTITIONS).

 Oops again, I had it right the first time. Sorry for the noise. I'm getting old...

From: Chuck Zmudzinski <frchuckz@gmail.com>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: port-xen/59451: XEN3_DOM0 kernel finds the wrong root device
Date: Mon, 2 Jun 2025 19:41:32 -0400

 On 6/2/2025 8:20 AM, Michael van Elst via gnats wrote:
 > The following reply was made to PR port-xen/59451; it has been noted by GNATS.
 > 
 > From: mlelstv@serpens.de (Michael van Elst)
 > To: gnats-bugs@netbsd.org
 > Cc: 
 > Subject: Re: port-xen/59451: XEN3_DOM0 kernel finds the wrong root device
 > Date: Mon, 2 Jun 2025 12:18:41 -0000 (UTC)
 > 
 >  gnats-admin@NetBSD.org ("Chuck Zmudzinski via gnats") writes:
 >  
 >  > Still, the bootdev=dk12 form needs some fix/kernel patch for this case when
 >  > NetBSD/xen is installed on a wedge with a double digit index, so this PR should
 >  > remain open.
 >  
 >  Yes. I'm currently testing the following patch. The parsing is
 >  still not perfect, e.g. it accepts a 'partition' for devices that
 >  don't support partitions (like dk or dm).

 I see your code accepts a 'partition' for dk, but I don't see why for dm.
 We have this:

 is_disk = is_valid_disk(dv);

 and this:

 static int
 is_valid_disk(device_t dv)
 {
 	if (device_class(dv) != DV_DISK)
 		return (0);

 	return (device_is_a(dv, "dk") ||
 		device_is_a(dv, "sd") ||
 		device_is_a(dv, "wd") ||
 		device_is_a(dv, "ld") ||
 		device_is_a(dv, "ed") ||
 		device_is_a(dv, "xbd"));
 }

 "dk" is there as true for is_disk, but not "dm" and that makes
 sense because my understanding is that NetBSD/xen does not currently support
 booting with an LVM device as root. If we have an LVM device like dm0 and
 set bootdev=dm0 when trying to boot NetBSD/xen PV dom0, what do you think
 would happen with our current code and with your proposed patch? It looks to
 me like you would never get a match, so would the kernel just try dm0 and
 see what happens?

 >  
 >  Index: xen_machdep.c
 >  ===================================================================
 >  RCS file: /cvsroot/src/sys/arch/xen/xen/xen_machdep.c,v
 >  retrieving revision 1.29
 >  diff -p -u -r1.29 xen_machdep.c
 >  --- xen_machdep.c       17 Oct 2023 10:24:11 -0000      1.29
 >  +++ xen_machdep.c       2 Jun 2025 11:56:22 -0000
 >  @@ -62,6 +62,7 @@ __KERNEL_RCSID(0, "$NetBSD: xen_machdep.
 >   #include <sys/boot_flag.h>
 >   #include <sys/conf.h>
 >   #include <sys/disk.h>
 >  +#include <sys/disklabel.h>
 >   #include <sys/device.h>
 >   #include <sys/mount.h>
 >   #include <sys/reboot.h>
 >  @@ -546,14 +547,28 @@ xen_bootconf(void)
 >                          break;
 >                  }
 >   
 >  -               if (strncmp(xcp.xcp_bootdev, devname, strlen(devname)))
 >  -                       continue;
 >  +               if (is_disk) {
 >  +                       size_t len;
 >  +                       int part;
 >  +
 >  +                       len = strlen(devname);
 >  +                       if  (strncmp(xcp.xcp_bootdev, devname, len) != 0)
 >  +                               continue;
 >  +
 >  +                       if (xcp.xcp_bootdev[len] != '\0') {
 >  +                               part = xcp.xcp_bootdev[len] - 'a';
 >  +                               if (part < 0 || part >= MAXPARTITIONS)
 >  +                                       continue;
 >  +
 >  +                               if (xcp.xcp_bootdev[len+1] != '\0')
 >  +                                       continue;
 >   
 >  -               if (is_disk && strlen(xcp.xcp_bootdev) > strlen(devname)) {
 >  -                       /* XXX check device_cfdata as in x86_autoconf.c? */
 >  -                       booted_partition = toupper(
 >  -                               xcp.xcp_bootdev[strlen(devname)]) - 'A';
 >  -                       DPRINTF(("%s: booted_partition: %d\n", __func__, booted_partition));
 >  +                               booted_partition = part;
 >  +                               DPRINTF(("%s: booted_partition: %d\n", __func__, booted_partition));
 >  +                       }
 >  +               } else {
 >  +                       if  (strcmp(xcp.xcp_bootdev, devname) != 0)
 >  +                               continue;
 >                  }
 >   
 >                  booted_device = dv;
 >  
 >  
 >  > >  Also, according to a message I received on netbsd-users from Manuel who AFAIK is
 >  > >  a port-xen maintainer, there is no difference in the arch/xen code between
 >  > >  bootdev= and rootdev=. What you say here puts that in doubt, though.
 >  > 
 >  > My testing verified that Manuel is correct: bootdev= and root= behave the same way,
 >  > but actually root= is better because it gives me three more characters for the
 >  > boot command line before it overflows and starts to truncate the end of the command
 >  > line.
 >  
 >  
 >  bootdev= and root= currently work the same way.
 >  
 >  The supplied string is checked against all disk and network interface
 >  device names.
 >  
 >  If there is a match, then the result is a pointer "booted_device"
 >  to the driver and an integer "booted_partition". The kernel will
 >  later try to access the particular driver and read data from the
 >  numbered partition. Obviously the partition is ignored for a
 >  network interface, and it should be 0 for something like a wedge
 >  (that doesn't have partitions).
 >  
 >  If there is no match, then the string is passed as is, similar to
 >  a hardcoded embedded "config root" in the kernel config file. The
 >  result is the string pointer "bootspec" that is evaluated later
 >  by the kernel.
 >  
 >  In either case, the supplied string can have 144 bytes (including
 >  the terminating NUL character), but the parser truncates the
 >  full command line to 255 characters first, not sure why.
 >  

From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: port-xen/59451: XEN3_DOM0 kernel finds the wrong root device
Date: Tue, 3 Jun 2025 05:33:10 -0000 (UTC)

 gnats-admin@NetBSD.org ("Chuck Zmudzinski via gnats") writes:

 > >  Yes. I'm currently testing the following patch. The parsing is
 > >  still not perfect, e.g. it accepts a 'partition' for devices that
 > >  don't support partitions (like dk or dm).
 > 
 > I see your code accepts a 'partition' for dk, but I don't see why for dm.


 dk and dm are just examples for drivers that don't support partitions.

 Currently the dm driver couldn't be used for booting as the instances are
 created only by a userland program. But if someone adds code for LVM
 autoconfiguration, it could.


 To identify a boot device, the code needs to handle 3 different
 cases:

 - a disk (supports partitions via disklabel).
 - some storage (basically a single partition).
 - a network interface.

 The first two drivers have a class DV_DISK.
 The last driver has a class DV_IFNET.


 The distinction between the first two cases is difficult
 as the driver class is the same. A regular disk supports
 the DIOCGDINFO ioctl (to return a label), but a storage
 driver like 'dk' does not. The problem of distinguishing
 the two cases has existed since wedges were invented, and in the
 meantime other drivers that behave similarly have been added.


 There is some ad-hoc code in the kernel that tries to
 identify such drivers by name.

 E.g. from kern/kern_subr.c, part of the "MI" root filesystem
 handling that I mentioned earlier:

 /*
  * Use partition letters if it's a disk class but not a wedge or flash.
  * XXX Check for wedge/flash is kinda gross.
  */
 #define DEV_USES_PARTITIONS(dv)                                         \
         (device_class((dv)) == DV_DISK &&                               \
          !device_is_a((dv), "dk") &&                                    \
          !device_is_a((dv), "flash"))
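
 The three cases and the name-based test can be modelled in userland. This
 is only a sketch in C: struct sketch_dev, the enum names and classify() are
 hypothetical stand-ins for device_t, device_class() and device_is_a().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum sketch_class { SK_DISK, SK_IFNET };	/* stand-ins for DV_DISK, DV_IFNET */

enum sketch_kind {
	KIND_PARTITIONED_DISK,	/* supports partitions via disklabel */
	KIND_SINGLE_STORAGE,	/* basically a single partition */
	KIND_NETIF		/* network interface */
};

struct sketch_dev {
	enum sketch_class dclass;
	const char *driver;	/* driver name, e.g. "wd", "dk", "rge" */
};

static enum sketch_kind
classify(const struct sketch_dev *dv)
{
	assert(dv != NULL && dv->driver != NULL);

	if (dv->dclass == SK_IFNET)
		return KIND_NETIF;

	/* Same name-based exclusion as DEV_USES_PARTITIONS. */
	if (strcmp(dv->driver, "dk") == 0 || strcmp(dv->driver, "flash") == 0)
		return KIND_SINGLE_STORAGE;

	return KIND_PARTITIONED_DISK;
}
```

 Note that a driver such as dm, which is not named in the test, is
 classified as a partitioned disk by this logic. That is the weakness of
 identifying such drivers by name.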


From: Chuck Zmudzinski <frchuckz@gmail.com>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: port-xen/59451: XEN3_DOM0 kernel finds the wrong root device
Date: Tue, 3 Jun 2025 08:22:27 -0400

 On 6/2/2025 8:20 AM, Michael van Elst via gnats wrote:
 > The following reply was made to PR port-xen/59451; it has been noted by GNATS.
 > 
 > From: mlelstv@serpens.de (Michael van Elst)
 > To: gnats-bugs@netbsd.org
 > Cc: 
 > Subject: Re: port-xen/59451: XEN3_DOM0 kernel finds the wrong root device
 > Date: Mon, 2 Jun 2025 12:18:41 -0000 (UTC)
 > 
 >  gnats-admin@NetBSD.org ("Chuck Zmudzinski via gnats") writes:
 >  
 >  > Still, the bootdev=dk12 form needs some fix/kernel patch for this case when
 >  > NetBSD/xen is installed on a wedge with a double digit index, so this PR should
 >  > remain open.
 >  
 >  Yes. I'm currently testing the following patch. The parsing is
 >  still not perfect, e.g. it accepts a 'partition' for devices that
 >  don't support partitions (like dk or dm).
 >  
 >  Index: xen_machdep.c
 >  ===================================================================
 >  RCS file: /cvsroot/src/sys/arch/xen/xen/xen_machdep.c,v
 >  retrieving revision 1.29
 >  diff -p -u -r1.29 xen_machdep.c
 >  --- xen_machdep.c       17 Oct 2023 10:24:11 -0000      1.29
 >  +++ xen_machdep.c       2 Jun 2025 11:56:22 -0000
 >  @@ -62,6 +62,7 @@ __KERNEL_RCSID(0, "$NetBSD: xen_machdep.
 >   #include <sys/boot_flag.h>
 >   #include <sys/conf.h>
 >   #include <sys/disk.h>
 >  +#include <sys/disklabel.h>
 >   #include <sys/device.h>
 >   #include <sys/mount.h>
 >   #include <sys/reboot.h>
 >  @@ -546,14 +547,28 @@ xen_bootconf(void)
 >                          break;
 >                  }
 >   
 >  -               if (strncmp(xcp.xcp_bootdev, devname, strlen(devname)))
 >  -                       continue;
 >  +               if (is_disk) {
 >  +                       size_t len;
 >  +                       int part;
 >  +
 >  +                       len = strlen(devname);
 >  +                       if  (strncmp(xcp.xcp_bootdev, devname, len) != 0)
 >  +                               continue;
 >  +
 >  +                       if (xcp.xcp_bootdev[len] != '\0') {
 >  +                               part = xcp.xcp_bootdev[len] - 'a';
 >  +                               if (part < 0 || part >= MAXPARTITIONS)
 >  +                                       continue;
 >  +
 >  +                               if (xcp.xcp_bootdev[len+1] != '\0')
 >  +                                       continue;
 >   
 >  -               if (is_disk && strlen(xcp.xcp_bootdev) > strlen(devname)) {
 >  -                       /* XXX check device_cfdata as in x86_autoconf.c? */
 >  -                       booted_partition = toupper(
 >  -                               xcp.xcp_bootdev[strlen(devname)]) - 'A';
 >  -                       DPRINTF(("%s: booted_partition: %d\n", __func__, booted_partition));
 >  +                               booted_partition = part;
 >  +                               DPRINTF(("%s: booted_partition: %d\n", __func__, booted_partition));
 >  +                       }
 >  +               } else {
 >  +                       if  (strcmp(xcp.xcp_bootdev, devname) != 0)
 >  +                               continue;
 >                  }
 >   
 >                  booted_device = dv;
 >  

 So taking into account everything I have learned, the only thing I would
 add to this patch is a check that we do not have a "dk" wedge device
 before we try to find a partition and set booted_partition. Probably replacing
 the first insertion of the second hunk:

  +               if (is_disk) {

 with

  +               if (is_disk && !device_is_a(dv, "dk")) {

 But I have not tested this yet.
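
 As a rough userland sketch in C of what that extra check would do (untested
 in the kernel, like the change itself): the is_dk flag stands in for
 device_is_a(dv, "dk"), and guarded_match and SKETCH_MAXPARTITIONS (assumed
 to be 16) are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAXPARTITIONS 16	/* assumed stand-in for MAXPARTITIONS */

/*
 * Returns 1 if devname is accepted for bootdev, 0 otherwise.
 * Wedges (is_dk) take the exact-match branch, like non-disk devices,
 * so no partition letter is ever parsed for them.
 */
static int
guarded_match(const char *bootdev, const char *devname, int is_disk,
    int is_dk, int *part)
{
	size_t len;
	int p;

	assert(bootdev != NULL && devname != NULL && part != NULL);

	if (!is_disk || is_dk)
		return strcmp(bootdev, devname) == 0;

	len = strlen(devname);
	if (strncmp(bootdev, devname, len) != 0)
		return 0;
	if (bootdev[len] == '\0')
		return 1;

	/* Exactly one valid partition letter may follow the name. */
	p = bootdev[len] - 'a';
	if (p < 0 || p >= SKETCH_MAXPARTITIONS || bootdev[len + 1] != '\0')
		return 0;

	*part = p;
	return 1;
}
```

 With the guard, bootdev "dk2a" no longer matches the wedge dk2 and
 booted_partition is left alone, while "wd1a" still matches wd1 with
 partition 0.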

 This patch that attempts to fix the parsing problems here is
 probably only suitable for -current. I still plan to propose a
 conservative patch more suitable for the stable branches that
 will be less likely to introduce unintended consequences for
 stable users.

From: Chuck Zmudzinski <frchuckz@gmail.com>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: port-xen/59451: XEN3_DOM0 kernel finds the wrong root device
Date: Tue, 3 Jun 2025 11:57:14 -0400

 On 6/3/2025 8:25 AM, Chuck Zmudzinski via gnats wrote:
 > The following reply was made to PR port-xen/59451; it has been noted by GNATS.
 > 
 > From: Chuck Zmudzinski <frchuckz@gmail.com>
 > To: gnats-bugs@netbsd.org
 > Cc: 
 > Subject: Re: port-xen/59451: XEN3_DOM0 kernel finds the wrong root device
 > Date: Tue, 3 Jun 2025 08:22:27 -0400
 > 
 >  On 6/2/2025 8:20 AM, Michael van Elst via gnats wrote:
 >  > The following reply was made to PR port-xen/59451; it has been noted by GNATS.
 >  > 
 >  > From: mlelstv@serpens.de (Michael van Elst)
 >  > To: gnats-bugs@netbsd.org
 >  > Cc: 
 >  > Subject: Re: port-xen/59451: XEN3_DOM0 kernel finds the wrong root device
 >  > Date: Mon, 2 Jun 2025 12:18:41 -0000 (UTC)
 >  > 
 >  >  gnats-admin@NetBSD.org ("Chuck Zmudzinski via gnats") writes:
 >  >  
 >  >  > Still, the bootdev=dk12 form needs some fix/kernel patch for this case when
 >  >  > NetBSD/xen is installed on a wedge with a double digit index, so this PR should
 >  >  > remain open.
 >  >  
 >  >  Yes. I'm currently testing the following patch. The parsing is
 >  >  still not perfect, e.g. it accepts a 'partition' for devices that
 >  >  don't support partitions (like dk or dm).
 >  >  
 >  >  Index: xen_machdep.c
 >  >  ===================================================================
 >  >  RCS file: /cvsroot/src/sys/arch/xen/xen/xen_machdep.c,v
 >  >  retrieving revision 1.29
 >  >  diff -p -u -r1.29 xen_machdep.c
 >  >  --- xen_machdep.c       17 Oct 2023 10:24:11 -0000      1.29
 >  >  +++ xen_machdep.c       2 Jun 2025 11:56:22 -0000
 >  >  @@ -62,6 +62,7 @@ __KERNEL_RCSID(0, "$NetBSD: xen_machdep.
 >  >   #include <sys/boot_flag.h>
 >  >   #include <sys/conf.h>
 >  >   #include <sys/disk.h>
 >  >  +#include <sys/disklabel.h>
 >  >   #include <sys/device.h>
 >  >   #include <sys/mount.h>
 >  >   #include <sys/reboot.h>
 >  >  @@ -546,14 +547,28 @@ xen_bootconf(void)
 >  >                          break;
 >  >                  }
 >  >   
 >  >  -               if (strncmp(xcp.xcp_bootdev, devname, strlen(devname)))
 >  >  -                       continue;
 >  >  +               if (is_disk) {
 >  >  +                       size_t len;
 >  >  +                       int part;
 >  >  +
 >  >  +                       len = strlen(devname);
 >  >  +                       if  (strncmp(xcp.xcp_bootdev, devname, len) != 0)
 >  >  +                               continue;
 >  >  +
 >  >  +                       if (xcp.xcp_bootdev[len] != '\0') {
 >  >  +                               part = xcp.xcp_bootdev[len] - 'a';
 >  >  +                               if (part < 0 || part >= MAXPARTITIONS)
 >  >  +                                       continue;
 >  >  +
 >  >  +                               if (xcp.xcp_bootdev[len+1] != '\0')
 >  >  +                                       continue;
 >  >   
 >  >  -               if (is_disk && strlen(xcp.xcp_bootdev) > strlen(devname)) {
 >  >  -                       /* XXX check device_cfdata as in x86_autoconf.c? */
 >  >  -                       booted_partition = toupper(
 >  >  -                               xcp.xcp_bootdev[strlen(devname)]) - 'A';
 >  >  -                       DPRINTF(("%s: booted_partition: %d\n", __func__, booted_partition));
 >  >  +                               booted_partition = part;
 >  >  +                               DPRINTF(("%s: booted_partition: %d\n", __func__, booted_partition));
 >  >  +                       }
 >  >  +               } else {
 >  >  +                       if  (strcmp(xcp.xcp_bootdev, devname) != 0)
 >  >  +                               continue;
 >  >                  }
 >  >   
 >  >                  booted_device = dv;
 >  >  
 >  
 >  So taking into account everything I have learned, the only thing I would
 >  add to this patch is a check that we do not have a "dk" wedge device
 >  before we try to find a partition and set booted_partition. Probably replacing
 >  the first insertion of the second hunk:
 >  
 >   +               if (is_disk) {
 >  
 >  with
 >  
 >   +               if (is_disk && !device_is_a(dv, "dk")) {
 >  
 >  But I have not tested this yet.
 >  
 >  This patch that attempts to fix the parsing problems here is
 >  probably only suitable for -current. I still plan to propose a
 >  conservative patch more suitable for the stable branches that
 >  will be less likely to introduce unintended consequences for
 >  stable users.
 >  

 Here is my proposed conservative patch for the stable branches:

 --- xen_machdep.c	2023-10-18 12:53:03.000000000 -0400
 +++ xen_machdep.c	2025-06-03 10:22:46.222889485 -0400
 @@ -62,6 +62,7 @@
  #include <sys/boot_flag.h>
  #include <sys/conf.h>
  #include <sys/disk.h>
 +#include <sys/disklabel.h>
  #include <sys/device.h>
  #include <sys/mount.h>
  #include <sys/reboot.h>
 @@ -90,6 +91,8 @@
  #define DPRINTF(a)
  #endif

 +#define PARTITION_IS_VALID(part) \
 +	(((part) >= 0) && ((part) < MAXPARTITIONS))

  bool xen_suspend_allow;

 @@ -549,10 +552,16 @@
  		if (strncmp(xcp.xcp_bootdev, devname, strlen(devname)))
  			continue;

 +		if (device_is_a(dv, "dk"))
 +			continue;
 +
  		if (is_disk && strlen(xcp.xcp_bootdev) > strlen(devname)) {
  			/* XXX check device_cfdata as in x86_autoconf.c? */
 -			booted_partition = toupper(
 +			int part = toupper(
  				xcp.xcp_bootdev[strlen(devname)]) - 'A';
 +			if (!PARTITION_IS_VALID(part))
 +				continue;
 +			booted_partition = part;
  			DPRINTF(("%s: booted_partition: %d\n", __func__, booted_partition));
  		}


 I tested this on netbsd-10 and it fixes the problem. This file is the same in
 -current, so it should also work for -current.
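
 To make the effect of the conservative patch easier to follow, here is a
 minimal userland sketch in C of the device loop with the patch applied.
 struct sketch_dev, conservative_pick and SKETCH_MAXPARTITIONS (assumed to be
 16) are hypothetical names. A return value of -1 models the "no match" case,
 in which the string is passed on as is for the kernel to evaluate later.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAXPARTITIONS 16	/* assumed stand-in for MAXPARTITIONS */

struct sketch_dev {
	const char *xname;	/* device name, e.g. "wd0", "dk12" */
	int is_disk;		/* models is_valid_disk(dv) */
	int is_dk;		/* models device_is_a(dv, "dk") */
};

/* Returns the index of the accepted device, or -1 if none matched. */
static int
conservative_pick(const struct sketch_dev *devs, size_t ndevs,
    const char *bootdev, int *part)
{
	size_t i, len;
	int p;

	assert(devs != NULL && bootdev != NULL && part != NULL);

	for (i = 0; i < ndevs; i++) {
		len = strlen(devs[i].xname);
		if (strncmp(bootdev, devs[i].xname, len) != 0)
			continue;

		/* Wedges are never matched here. */
		if (devs[i].is_dk)
			continue;

		if (devs[i].is_disk && strlen(bootdev) > len) {
			p = toupper((unsigned char)bootdev[len]) - 'A';
			if (p < 0 || p >= SKETCH_MAXPARTITIONS)
				continue;
			*part = p;
		}
		return (int)i;
	}
	return -1;
}
```

 With devices wd0, dk1 and dk12, bootdev "dk12" yields -1 in this sketch, so
 neither dk1 nor dk12 is picked in the loop and the name is resolved later
 from the string. "wd0a" still selects wd0 with partition 0.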

Copyright © 1994-2025 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.