NetBSD Problem Report #55958
From www@netbsd.org Tue Jan 26 09:24:23 2021
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id 8F20F1A923B
for <gnats-bugs@gnats.NetBSD.org>; Tue, 26 Jan 2021 09:24:23 +0000 (UTC)
Message-Id: <20210126092421.E22E71A923C@mollari.NetBSD.org>
Date: Tue, 26 Jan 2021 09:24:21 +0000 (UTC)
From: aza.sea.agenda@gmail.com
Reply-To: aza.sea.agenda@gmail.com
To: gnats-bugs@NetBSD.org
Subject: pciback.hide parsing error
X-Send-Pr-Version: www-1.0
>Number: 55958
>Category: kern
>Synopsis: pciback.hide parsing error
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Tue Jan 26 09:25:00 +0000 2021
>Last-Modified: Fri Jan 29 11:05:01 +0000 2021
>Originator: Aleksey Arens
>Release: (current) 9.99.78
>Organization:
>Environment:
NetBSD HOSTNAME_REDACTED 9.99.78 NetBSD 9.99.78 (SYS0.STAGING.02) #2: Tue Jan 26 09:03:55 UTC 2021 root@HOSTNAME_REDACTED:/usr/obj/sys/arch/amd64/compile/SYS0 amd64
>Description:
Booting kernel with the following option produces an error message on the console, and does not attach the respective pci device (at 8:0.0) to pciback:
pciback.hide=(8:0.0)(1:0.1)(1:0.0)
The issue does not present itself whence the option list is shorter:
pciback.hide=(8:0.0)(1:0.1)
The relevant portion of dmesg output:
---8<---
[ 1.000003] pciback_pci_init: hide claim device 8:0:0
[ 1.000003] pciback_pci_init: hide claim device 1:0:1
[ 1.000003] autoconfiguration error: pciback_pci_init: syntax error at 1:
---8<---
The issue manifests regardless of the order of parenthesized items.
>How-To-Repeat:
Boot kernel with option pciback.hide=(8:0.0)(1:0.1)(1:0.0) on a machine where thus-enumerated devices are present. Only booting as Dom0 under Xen was tested.
>Fix:
I performed a very basic bit of debugging by changing the kernel source as follows:
---8<--
diff --git a/sys/arch/xen/xen/pciback.c b/sys/arch/xen/xen/pciback.c
index 57f70788d194..55dc646f927f 100644
--- a/sys/arch/xen/xen/pciback.c
+++ b/sys/arch/xen/xen/pciback.c
@@ -377,6 +377,7 @@ pciback_pci_init(void)
if (strlen(xi.xcp_pcidevs) == 0)
return;
pcidevs = xi.xcp_pcidevs;
+ aprint_verbose("xi.xcp_pcidevs=%s\n", xi.xcp_pcidevs);
for (pcidevs = xi.xcp_pcidevs; *pcidevs != '\0';) {
if (*pcidevs != '(')
goto error;
---8<---
The resulting output in dmesg has been:
---8<---
[ 1.000003] xi.xcp_pcidevs=(8:0.0)(1:0.1)(1:
[ 1.000003] pciback_pci_init: hide claim device 8:0:0
[ 1.000003] pciback_pci_init: hide claim device 1:0:1
[ 1.000003] autoconfiguration error: pciback_pci_init: syntax error at 1:
---8<---
The source of the issue is thus very much clear. I'll attempt to track it down further and post a fix thereby.
>Audit-Trail:
From: Aleksey Arens <aza.sea.agenda@gmail.com>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/55958: pciback.hide parsing error
Date: Tue, 26 Jan 2021 04:06:33 -0800
I've performed further debugging. The Xen image was adjusted to print
out the cmdline that it is getting from the module loader. The line
turned out to be truncated.
Evidently, module loading code in sys/arch/i386/stand needs to be
examined, since truncation should be occurring there.
From: Aleksey Arens <aza.sea.agenda@gmail.com>
To: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/55958: pciback.hide parsing error
Date: Tue, 26 Jan 2021 05:19:27 -0800
I tracked the problem to strncpy at
https://nxr.netbsd.org/xref/src/sys/arch/i386/stand/lib/exec.c#763
---8<---
bi = (struct bi_modulelist_entry *)(buf + off);
off += sizeof(struct bi_modulelist_entry);
strncpy(bi->path, bm->bm_path, sizeof(bi->path) - 1);
bi->base = image_end;
bi->len = len;
---8<---
The issue occurs because the static buffer size at
https://nxr.netbsd.org/xref/src/sys/arch/x86/include/bootinfo.h#180
---8<---
struct bi_modulelist_entry
{
char path[80];
int type;
int len;
uint32_t base;
};
---8<---
is too small to accommodate module arguments line longer than 80 characters.
From: Aleksey Arens <aza.sea.agenda@gmail.com>
To: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/55958: pciback.hide parsing error
Date: Tue, 26 Jan 2021 05:47:20 -0800
Adjusting the size of the static buffer path to 256 bytes had yielded
a positive outcome. Adjusted /boot program had yielded a successful
boot on a system with BIOS firmware, and arguments to pciback were
interpreted correctly by the kernel subsequently.
---8<---
[ 1.000003] pciback_pci_init: hide claim device 8:0:0
[ 1.000003] pciback_pci_init: hide claim device 1:0:1
[ 1.000003] pciback_pci_init: hide claim device 1:0:0
---8<---
The corresponding patch is:
---8<---
diff --git a/sys/arch/x86/include/bootinfo.h b/sys/arch/x86/include/bootinfo.h
index 9d0ee542d892..6b3d967e1a7e 100644
--- a/sys/arch/x86/include/bootinfo.h
+++ b/sys/arch/x86/include/bootinfo.h
@@ -178,7 +178,7 @@ struct btinfo_biosgeom {
};
struct bi_modulelist_entry {
- char path[80];
+ char path[256];
int type;
int len;
uint32_t base;
---8<---
From: Manuel Bouyer <bouyer@antioche.eu.org>
To: Aleksey Arens <aza.sea.agenda@gmail.com>
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
gnats-bugs@netbsd.org
Subject: Re: kern/55958: pciback.hide parsing error
Date: Tue, 26 Jan 2021 21:55:08 +0100
On Tue, Jan 26, 2021 at 05:47:20AM -0800, Aleksey Arens wrote:
> Adjusting the size of the static buffer path to 256 bytes had yielded
> a positive outcome. Adjusted /boot program had yielded a successful
> boot on a system with BIOS firmware, and arguments to pciback were
> interpreted correctly by the kernel subsequently.
>
> ---8<---
> [ 1.000003] pciback_pci_init: hide claim device 8:0:0
> [ 1.000003] pciback_pci_init: hide claim device 1:0:1
> [ 1.000003] pciback_pci_init: hide claim device 1:0:0
> ---8<---
>
> The corresponding patch is:
>
> ---8<---
>
> diff --git a/sys/arch/x86/include/bootinfo.h b/sys/arch/x86/include/bootinfo.h
> index 9d0ee542d892..6b3d967e1a7e 100644
> --- a/sys/arch/x86/include/bootinfo.h
> +++ b/sys/arch/x86/include/bootinfo.h
> @@ -178,7 +178,7 @@ struct btinfo_biosgeom {
> };
>
> struct bi_modulelist_entry {
> - char path[80];
> + char path[256];
> int type;
> int len;
> uint32_t base;
>
> ---8<---
i'm not sure we can do this without a compatibility issue ...
--
Manuel Bouyer <bouyer@antioche.eu.org>
NetBSD: 26 ans d'experience feront toujours la difference
--
From: Aleksey Arens <aza.sea.agenda@gmail.com>
To: Manuel Bouyer <bouyer@antioche.eu.org>
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
gnats-bugs@netbsd.org
Subject: Re: kern/55958: pciback.hide parsing error
Date: Tue, 26 Jan 2021 18:36:26 -0800
I wonder if the following solution may be acceptable. Since the
change implies a difference in ABI, would it be possible to schedule
it for a next major release (for instance, release 10)? I also
wonder, if the change could go into -current at this point; as I
understand, -current is not expected to maintain strict binary
compatibility.
I suppose that the proper approach to adding this might involve adding
#ifdef-s that checks for COMPAT_XX options. Though, I think that a
more proper implementation would be to check at runtime for the module
version, and decide upon what buffer length for parameter passing
could be used accordingly. I could work this code out, in principle;
I would appreciate a suggestion on what would be the best, most
compatible with the tradition, way to handle the hard-coded constant
in the header file.
From: Manuel Bouyer <bouyer@antioche.eu.org>
To: Aleksey Arens <aza.sea.agenda@gmail.com>
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
gnats-bugs@netbsd.org
Subject: Re: kern/55958: pciback.hide parsing error
Date: Wed, 27 Jan 2021 08:45:55 +0100
On Tue, Jan 26, 2021 at 06:36:26PM -0800, Aleksey Arens wrote:
> I wonder if the following solution may be acceptable. Since the
> change implies a difference in ABI, would it be possible to schedule
> it for a next major release (for instance, release 10)? I also
> wonder, if the change could go into -current at this point; as I
> understand, -current is not expected to maintain strict binary
> compatibility.
>
> I suppose that the proper approach to adding this might involve adding
> #ifdef-s that checks for COMPAT_XX options. Though, I think that a
> more proper implementation would be to check at runtime for the module
> version, and decide upon what buffer length for parameter passing
> could be used accordingly. I could work this code out, in principle;
> I would appreciate a suggestion on what would be the best, most
> compatible with the tradition, way to handle the hard-coded constant
> in the header file.
We must be able to boot a new kernel with an older /boot (this is a requirement
for upgrade anyway), so some compat mechanism at the boot structure
level is neeed.
And actually it's easy: leave bi_modulelist alone, and define a
btinfo_modulelist2, associated with a BTINFO_MODULELIST2
Have the boot loader fill both, and the kernel check first BTINFO_MODULELIST2
and then BTINFO_MODULELIST. This way you can boot an old kernel with
newer /boot, or the opposite.
While there, put path[] at the end of bi_modulelist_entry2, so it can
be arbitrary long.
--
Manuel Bouyer <bouyer@antioche.eu.org>
NetBSD: 26 ans d'experience feront toujours la difference
--
From: Martin Husemann <martin@duskware.de>
To: Manuel Bouyer <bouyer@antioche.eu.org>
Cc: Aleksey Arens <aza.sea.agenda@gmail.com>, kern-bug-people@netbsd.org,
gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
gnats-bugs@netbsd.org
Subject: Re: kern/55958: pciback.hide parsing error
Date: Wed, 27 Jan 2021 08:58:45 +0100
On Tue, Jan 26, 2021 at 09:55:08PM +0100, Manuel Bouyer wrote:
> > diff --git a/sys/arch/x86/include/bootinfo.h b/sys/arch/x86/include/bootinfo.h
> > index 9d0ee542d892..6b3d967e1a7e 100644
> > --- a/sys/arch/x86/include/bootinfo.h
> > +++ b/sys/arch/x86/include/bootinfo.h
> > @@ -178,7 +178,7 @@ struct btinfo_biosgeom {
> > };
> >
> > struct bi_modulelist_entry {
> > - char path[80];
> > + char path[256];
> > int type;
> > int len;
> > uint32_t base;
> >
> > ---8<---
>
> i'm not sure we can do this without a compatibility issue ...
I think we could define and provide both structures from a new
bootloader by e.g. renaming BTINFO_MODULELIST to
BTINFO_COMPAT_MODULELIST and giving BTINFO_MODULELIST a new value.
Martin
From: Aleksey Arens <aza.sea.agenda@gmail.com>
To: Manuel Bouyer <bouyer@antioche.eu.org>
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
gnats-bugs@netbsd.org
Subject: Re: kern/55958: pciback.hide parsing error
Date: Wed, 27 Jan 2021 00:10:55 -0800
This sounds like a good approach. Please allow me to extend it a
bit. Adding another field to the module in addition to an existing
one is a natural instance of a scheme involving either versioning or
capability attributes associated with the module. Now, versions and
capabilities could very well be encoded into long (just a few
kilobytes at most) strings. So, we just need an ability to introduce
variable-size string buffers that would encode all the attributes
associated with a module. The struct containing this var-size string
might also include a field for the encoding id (some GUID?). After
all, what we're doing here is introducing a new structure to the
module interface that is binary-incompatible with the extant one, so
why not indeed make it extensible in a seamless fashion? Furthermore,
all sorts of fancy module argument schemes would fall in nicely into
such an approach -- the only question would be the kernel's
willingness to parse and interpret the language within the string, but
there shall be no issue with binary compatibility.
My only concern is a question of how would a Xen kernel recognize the
newer format. I could look at the boot code more closely, and
determine if it's something that could be done by altering the
bootloader itself or not. My guess is that it doesn't, but I would
certainly appreciate additional insight. If I understand it
correctly, the bi_modulelist_entry struct does not represent any of
Xen's ABI, and Xen's boot parameters are filled out from this struct
by the boot's code.
From: Aleksey Arens <aza.sea.agenda@gmail.com>
To: Martin Husemann <martin@duskware.de>
Cc: Manuel Bouyer <bouyer@antioche.eu.org>, kern-bug-people@netbsd.org,
gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, gnats-bugs@netbsd.org
Subject: Re: kern/55958: pciback.hide parsing error
Date: Wed, 27 Jan 2021 00:21:30 -0800
On Tue, Jan 26, 2021 at 11:58 PM Martin Husemann <martin@duskware.de> wrote:
>
> [...]
> I think we could define and provide both structures from a new
> bootloader by e.g. renaming BTINFO_MODULELIST to
> BTINFO_COMPAT_MODULELIST and giving BTINFO_MODULELIST a new value.
This looks like a good strategy, with e.g. BTINFO_MODULELIST
corresponding to e.g. variable-length string attribute description,
and BTINFO_COMPAT_MODULELIST being just a reference to struct
bi_modulelist.
Now, I see that the module list is being interpreted by /netbsd's code
at sys/arch/x86/x86/x86_machdep.c:module_init_md(). I wonder, if
those structures (well, currently only struct bi_modulelist) are also
[do need to be] interpreted by a Xen's kernel?
From: Manuel Bouyer <bouyer@antioche.eu.org>
To: Aleksey Arens <aza.sea.agenda@gmail.com>
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
gnats-bugs@netbsd.org
Subject: Re: kern/55958: pciback.hide parsing error
Date: Wed, 27 Jan 2021 09:25:23 +0100
On Wed, Jan 27, 2021 at 12:07:33AM -0800, Aleksey Arens wrote:
> This sounds like a good approach. Please allow me to extend it a
> bit. Adding another field to the module in addition to an existing
> one is a natural instance of a scheme involving either versioning or
> capability attributes associated with the module. Now, versions and
> capabilities could very well be encoded into long (just a few
> kilobytes at most) strings. So, we just need an ability to introduce
> variable-size string buffers that would encode all the attributes
> associated with a module. The struct containing this var-size string
> might also include a field for the encoding id (some GUID?). After
> all, what we're doing here is introducing a new structure to the
> module interface that is binary-incompatible with the extant one, so
> why not indeed make it extensible in a seamless fashion? Furthermore,
> all sorts of fancy module argument schemes would fall in nicely into
> such an approach -- the only question would be the kernel's
> willingness to parse and interpret the language within the string, but
> there shall be no issue with binary compatibility.
This is a more radical change, this would need to be discussed on port-amd64
and port-i386 I guess.
>
> My only concern is a question of how would a Xen kernel recognize the
> newer format. I could look at the boot code more closely, and
> determine if it's something that could be done by altering the
> bootloader itself or not. My guess is that it doesn't, but I would
> certainly appreciate additional insight. If I understand it
> correctly, the bi_modulelist_entry struct does not represent any of
> Xen's ABI, and Xen's boot parameters are filled out from this struct
> by the boot's code.
Yes. But the Xen kernel probably needs to copy it at some point (even if
it doens't know what it's copying). I'm not sure if it can copy something
arbitrary long (it may be limited e.g. to a page size).
But there's something strange here: Xen is supposed to use multiboot, not
the NetBSD native boot protocol. In multiboot, the structures from
bootinfo.h are not supposed to be used.
So I guess that the problem is more that /boot abuses bi_modulelist_entry
for internal structures in the multiboot case. The problem can probably
be fixed in /boot without changing bootinfo.h at all.
--
Manuel Bouyer <bouyer@antioche.eu.org>
NetBSD: 26 ans d'experience feront toujours la difference
--
From: Aleksey Arens <aza.sea.agenda@gmail.com>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/55958: pciback.hide parsing error
Date: Wed, 27 Jan 2021 00:54:14 -0800
> Yes. But the Xen kernel probably needs to copy it at some point (even if
> it doens't know what it's copying). I'm not sure if it can copy somethin=
g
> arbitrary long (it may be limited e.g. to a page size).
>
As I understand it, Xen reads the cmdline string from an offset in the
module image passed-in to it. Cf.
https://fossies.org/linux/xen/xen/arch/x86/setup.c#l_760
> But there's something strange here: Xen is supposed to use multiboot, no=
t
> the NetBSD native boot protocol. In multiboot, the structures from
> bootinfo.h are not supposed to be used.
>
Correct. However, those arguments are not the ones being passed to
Xen, but to the module image that is being loaded by Xen.
> So I guess that the problem is more that /boot abuses bi_modulelist_entr=
y
> for internal structures in the multiboot case. The problem can probably
> be fixed in /boot without changing bootinfo.h at all.
In principle, yes. However, in that case, a different Xen-specific
format will need to be defined. Maybe a copy of bi_modulelist_entry
should be made, mentioning xen in its name? This is a piece of
bootinfo that is Xen-specific, and currently, it appears that the same
structure has merely been reused to cover both cases.
At any rate, an ability to use longer =E2=80=94 and ideally variable-lengt=
h =E2=80=94
strings for /netbsd's boot arguments won't hurt in the long run,
though *that* specific topic (i.e. re: /netbsd's arguments) could
become a matter for another ticket. Right now, setting up a special
case for Xen (with fixed buffer length of e.g. 512, being a half of
Xen's limit of 1024 for the whole command line) might as well be more
than sufficient.
From: Aleksey Arens <aza.sea.agenda@gmail.com>
To: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/55958: pciback.hide parsing error
Date: Wed, 27 Jan 2021 22:09:04 -0800
A subsequent discussion at #xendevel ensued. It was learned that as a
matter of fact, Xen does expect multiboot protocol for module loading.
Also, a reference to a respective implementation in FreeBSD was
suggested:
https://cgit.freebsd.org/src/tree/stand/i386/libi386/multiboot.c#n253
Thus, it would be fair to say that sharing data-structures between the
native module loader and code that is involved in loading Xen modules
-- which must be done not with a native bootloader, but with a
multiboot loader -- is not desirable. (I must clarify thereby. I'm
talking about the data structures that implement layout of loaded
module images, not the module chain descriptions; the latter could be
left intact.)
Since conceptually multiboot and native module loading are different
module loading schemes, I would like to suggest that a multiboot
protocol be implemented separately, with a separate set of
data-structures. If this approach is taken, then there shall be no
binary compatibility issue whatsoever with native loading, and
whatever binary compatibility might be occurring with the /netbsd's
multiboot loading case (which is AFAIK is not enabled by default in
the kernel config anyway) should be treated as kernel bugs, since a
standard-conforming multiboot kernel should always be loadable by a
standard-conforming multiboot bootloader (apart perhaps from the
initial ramdisk related caveat noted below).
I am interested in implementing the multiboot functionality -- at
least version 1, but there is also an interest on my part in
implementing version 2 -- for /boot, and accommodating Xen-specific
workarounds from FreeBSD. The latter workarounds are related to
initial ramdisk implementation differences and multiboot protocol
requirements.
I suppose we could look into the ramdisk issues, and decide whether
the multiboot implementation must be strict, or if it can be relaxed,
and if Xen's code could be adjusted. I favor the former option -- of
implementing the standard strictly in the bootloader. This option may
lead to a need to adjust /netbsd a bit w.r.t. the initrd issues with
multiboot. However, the result would be a kernel that fully conforms
to multiboot standard. Also, since MULTIBOOT is non-default for
/netbsd, the impact would be low.
From: John Nemeth <jnemeth@cue.bc.ca>
To: Aleksey Arens <aza.sea.agenda@gmail.com>, kern-bug-people@netbsd.org,
gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/55958: pciback.hide parsing error
Date: Thu, 28 Jan 2021 19:39:58 -0800
On Jan 27, 22:09, Aleksey Arens wrote:
}
} A subsequent discussion at #xendevel ensued. It was learned that as a
} matter of fact, Xen does expect multiboot protocol for module loading.
} Also, a reference to a respective implementation in FreeBSD was
} suggested:
} https://cgit.freebsd.org/src/tree/stand/i386/libi386/multiboot.c#n253
[snip]
I'm pretty sure that somebody previously mentioned that Xen
requires multiboot. I've haven't read your entire message, but
unfortunately it appears that you have spent a lot of time on this
for naught. Our native bootloader can do multiboot and has been
able to boot Xen for many years. You should have a look here:
http://wiki.netbsd.org/ports/xen/ and here:
http://wiki.netbsd.org/ports/xen/howto/ . This is from my development
box:
menu=Boot Xen with 6GB for dom0:load /netbsd.xen0 console=pc;multiboot /xen45-kernel/xen.gz dom0_mem=6GB dom0_max_vcpus=1 dom0_vcpus_pin
The "load" part is what does module loading (in this case, the
NetBSD kernel which is built as a Xen module) and the "multiboot"
part is what boots the Xen kernel. Note that since it is my
development box, I have many different variants of this line.
}-- End of excerpt from Aleksey Arens
From: Aleksey Arens <aza.sea.agenda@gmail.com>
To: jnemeth@cue.bc.ca
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
gnats-bugs@netbsd.org
Subject: Re: kern/55958: pciback.hide parsing error
Date: Thu, 28 Jan 2021 22:33:46 -0800
> I'm pretty sure that somebody previously mentioned that Xen
> requires multiboot. I've haven't read your entire message, but
> unfortunately it appears that you have spent a lot of time on this
> for naught. Our native bootloader can do multiboot and has been
> able to boot Xen for many years. You should have a look here:
> http://wiki.netbsd.org/ports/xen/ and here:
> http://wiki.netbsd.org/ports/xen/howto/ . This is from my development
> box:
>
> menu=Boot Xen with 6GB for dom0:load /netbsd.xen0 console=pc;multiboot /xen45-kernel/xen.gz dom0_mem=6GB dom0_max_vcpus=1 dom0_vcpus_pin
>
Thank you for joining the discussion. One of the essential points in
my previous message is that I suspect that there is at least a bug in
the multiboot implementation in /boot. I'm also concerned about the
fact that the same data-structures are reused in implementing the
binary module layout for the cases when a native module is loaded and
when a multiboot module is loaded. I hope that these points would
become clear to you if you invest your time and give your full
attention to the entire message.
Finally, the command passed to the module -- in the example that you
had displayed -- is a bit too short. To see the issue at hand, you'd
need to make it over 80 characters long, and then observe that it does
get truncated once it makes it to Xen. This was demonstrated in the
first three messages to this PR.
> The "load" part is what does module loading (in this case, the
> NetBSD kernel which is built as a Xen module) and the "multiboot"
> part is what boots the Xen kernel. Note that since it is my
> development box, I have many different variants of this line.
>
Could we please go into more details thereby? The Xen expects the
modules to be loaded according to multiboot protocol. What I'm seeing
is that "load" uses the same BM_TYPE_KMOD for either the case of
native or multiboot module loading scenario. Furthermore:
https://nxr.netbsd.org/xref/src/sys/arch/i386/stand/lib/exec.c#exec_multiboot
https://nxr.netbsd.org/xref/src/sys/arch/i386/stand/lib/exec.c#exec_netbsd
are both calling to the same
https://nxr.netbsd.org/xref/src/sys/arch/i386/stand/lib/exec.c#module_init
function, in which the short buffer is copied:
https://nxr.netbsd.org/xref/src/sys/arch/i386/stand/lib/exec.c#763
So, clearly, both native and multiboot modules are initialized in an
identical fashion. This must have led to this bug. I request that
this bug be fixed, and I propose that it be done by separating the
code responsible for module loading in case of multiboot and Xen.
From: John Nemeth <jnemeth@cue.bc.ca>
To: Aleksey Arens <aza.sea.agenda@gmail.com>, jnemeth@cue.bc.ca
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
gnats-bugs@netbsd.org
Subject: Re: kern/55958: pciback.hide parsing error
Date: Fri, 29 Jan 2021 00:15:55 -0800
On Jan 28, 22:33, Aleksey Arens wrote:
}
} > I'm pretty sure that somebody previously mentioned that Xen
} > requires multiboot. I've haven't read your entire message, but
} > unfortunately it appears that you have spent a lot of time on this
} > for naught. Our native bootloader can do multiboot and has been
} > able to boot Xen for many years. You should have a look here:
} > http://wiki.netbsd.org/ports/xen/ and here:
} > http://wiki.netbsd.org/ports/xen/howto/ . This is from my development
} > box:
} >
} > menu=Boot Xen with 6GB for dom0:load /netbsd.xen0 console=pc;multiboot /xen45-kernel/xen.gz dom0_mem=6GB dom0_max_vcpus=1 dom0_vcpus_pin
}
} Thank you for joining the discussion. One of the essential points in
} my previous message is that I suspect that there is at least a bug in
} the multiboot implementation in /boot. I'm also concerned about the
} fact that the same data-structures are reused in implementing the
} binary module layout for the cases when a native module is loaded and
} when a multiboot module is loaded. I hope that these points would
} become clear to you if you invest your time and give your full
} attention to the entire message.
}
} Finally, the command passed to the module -- in the example that you
} had displayed -- is a bit too short. To see the issue at hand, you'd
} need to make it over 80 characters long, and then observe that it does
} get truncated once it makes it to Xen. This was demonstrated in the
} first three messages to this PR.
}
} > The "load" part is what does module loading (in this case, the
} > NetBSD kernel which is built as a Xen module) and the "multiboot"
} > part is what boots the Xen kernel. Note that since it is my
} > development box, I have many different variants of this line.
}
} Could we please go into more details thereby? The Xen expects the
} modules to be loaded according to multiboot protocol. What I'm seeing
} is that "load" uses the same BM_TYPE_KMOD for either the case of
} native or multiboot module loading scenario. Furthermore:
} https://nxr.netbsd.org/xref/src/sys/arch/i386/stand/lib/exec.c#exec_multiboot
} https://nxr.netbsd.org/xref/src/sys/arch/i386/stand/lib/exec.c#exec_netbsd
} are both calling to the same
} https://nxr.netbsd.org/xref/src/sys/arch/i386/stand/lib/exec.c#module_init
} function, in which the short buffer is copied:
} https://nxr.netbsd.org/xref/src/sys/arch/i386/stand/lib/exec.c#763
}
} So, clearly, both native and multiboot modules are initialized in an
} identical fashion. This must have led to this bug. I request that
} this bug be fixed, and I propose that it be done by separating the
} code responsible for module loading in case of multiboot and Xen.
Keep in mind that the NetBSD kernel itself is the module.
You're not loading a module into the NetBSD kernel, you're loading
a module into the Xen kernel (Xen is the host OS). The arguments
are for the kernel itself, not a module, so you need to find where
the kernel parses its own command line. This has nothing to do
with NetBSD modules.
}-- End of excerpt from Aleksey Arens
From: Manuel Bouyer <bouyer@antioche.eu.org>
To: Aleksey Arens <aza.sea.agenda@gmail.com>
Cc: jnemeth@cue.bc.ca, kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org, gnats-bugs@netbsd.org
Subject: Re: kern/55958: pciback.hide parsing error
Date: Fri, 29 Jan 2021 11:19:52 +0100
On Thu, Jan 28, 2021 at 10:33:46PM -0800, Aleksey Arens wrote:
>
> Thank you for joining the discussion. One of the essential points in
> my previous message is that I suspect that there is at least a bug in
> the multiboot implementation in /boot. I'm also concerned about the
> fact that the same data-structures are reused in implementing the
> binary module layout for the cases when a native module is loaded and
> when a multiboot module is loaded. I hope that these points would
> become clear to you if you invest your time and give your full
> attention to the entire message.
there's something that doesn't match here. A NetBSD dom0 can be loaded
either from grub, pxelinux or NetBSD's boot loader.
When started as dom0 (or as domU, the code path is the same), the kernel
command line (which is the module command line in multiboot, when started
as dom0) is in start_info.cmd_line as is a plain text string, not a
btinfo_modulelist or bi_modulelist_entry.
See xen_parse_cmdline() in xen_machdep.c
>
> Finally, the command passed to the module -- in the example that you
> had displayed -- is a bit too short. To see the issue at hand, you'd
> need to make it over 80 characters long, and then observe that it does
> get truncated once it makes it to Xen. This was demonstrated in the
> first three messages to this PR.
>
> > The "load" part is what does module loading (in this case, the
> > NetBSD kernel which is built as a Xen module) and the "multiboot"
> > part is what boots the Xen kernel. Note that since it is my
> > development box, I have many different variants of this line.
> >
>
> Could we please go into more details thereby? The Xen expects the
> modules to be loaded according to multiboot protocol. What I'm seeing
> is that "load" uses the same BM_TYPE_KMOD for either the case of
> native or multiboot module loading scenario. Furthermore:
> https://nxr.netbsd.org/xref/src/sys/arch/i386/stand/lib/exec.c#exec_multiboot
> https://nxr.netbsd.org/xref/src/sys/arch/i386/stand/lib/exec.c#exec_netbsd
> are both calling to the same
> https://nxr.netbsd.org/xref/src/sys/arch/i386/stand/lib/exec.c#module_init
> function, in which the short buffer is copied:
> https://nxr.netbsd.org/xref/src/sys/arch/i386/stand/lib/exec.c#763
>
> So, clearly, both native and multiboot modules are initialized in an
> identical fashion. This must have led to this bug. I request that
> this bug be fixed, and I propose that it be done by separating the
> code responsible for module loading in case of multiboot and Xen.
This is not exactly how it works. Yes, the load command is parsed the
same way for both native and multiboot case, using the same structure. But
then the exec_multiboot1() function will convert the native structure
to multiboot format. mmo_string points to bim->path, not the bim
structure itself.
So to fix the problem no change to bootinfo.h is needed; only the
boot loader needs to be changed to pass the full string to exec_multiboot1().
For example, instead of reusing bi_modulelist to represent the module,
module_init() could use a private structure, which is then converted
to the appropriate interface depending on the boot mode.
--
Manuel Bouyer <bouyer@antioche.eu.org>
NetBSD: 26 ans d'experience feront toujours la difference
--
From: Aleksey Arens <aza.sea.agenda@gmail.com>
To: Manuel Bouyer <bouyer@antioche.eu.org>
Cc: jnemeth@cue.bc.ca, kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org, gnats-bugs@netbsd.org
Subject: Re: kern/55958: pciback.hide parsing error
Date: Fri, 29 Jan 2021 03:03:45 -0800
by Manuel Bouyer:
> When started as dom0 (or as domU, the code path is the same), the kernel
> command line (which is the module command line in multiboot, when started
> as dom0) is in start_info.cmd_line as is a plain text string, not a
> btinfo_modulelist or bi_modulelist_entry.
> See xen_parse_cmdline() in xen_machdep.c
>
Ok.
> This is not exactly how it works. Yes, the load command is parsed the
> same way for both native and multiboot case, using the same structure. But
> then the exec_multiboot1() function will convert the native structure
> to multiboot format. mmo_string points to bim->path, not the bim
> structure itself.
>
> So to fix the problem no change to bootinfo.h is needed; only the
> boot loader needs to be changed to pass the full string to exec_multiboot1().
>
> For example, instead of reusing bi_modulelist to represent the module,
> module_init() could use a private structure, which is then converted
> to the appropriate interface depending on the boot mode.
>
Thank you for clarifying these details to me. I have not tracked the
codepath to that point, and was wondering what those functions were
doing.
Your suggestion about handling this problem does sound good to me.
I was concerned initially, if bi_modulelist_entry struct had played a
special role in further processing. I noticed the field name has been
changed from "path" to 'kmod". Does it reflect its purpose? Is there
a detailed documentation on NetBSD's boot protocol, aside from the
source code itself?
One question remains. Since now the exec.c:module_init() would
potentially fill out an intermediate struct with a long buffer, how
should the code constructing the module image for native
(non-multiboot) bootloading handle the argument buffer length? Would
doing sizeof() on the corresponding bi_modulelist_entry struct's entry
be sufficient? Perhaps, there should be a const defined in the
bootinfo.h for this purpose?
(Contact us)
$NetBSD: query-full-pr,v 1.46 2020/01/03 16:35:01 leot Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2020
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.