NetBSD Problem Report #55958

From www@netbsd.org  Tue Jan 26 09:24:23 2021
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 8F20F1A923B
	for <gnats-bugs@gnats.NetBSD.org>; Tue, 26 Jan 2021 09:24:23 +0000 (UTC)
Message-Id: <20210126092421.E22E71A923C@mollari.NetBSD.org>
Date: Tue, 26 Jan 2021 09:24:21 +0000 (UTC)
From: aza.sea.agenda@gmail.com
Reply-To: aza.sea.agenda@gmail.com
To: gnats-bugs@NetBSD.org
Subject: pciback.hide parsing error
X-Send-Pr-Version: www-1.0

>Number:         55958
>Category:       kern
>Synopsis:       pciback.hide parsing error
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue Jan 26 09:25:00 +0000 2021
>Last-Modified:  Fri Jan 29 11:05:01 +0000 2021
>Originator:     Aleksey Arens
>Release:        (current) 9.99.78
>Organization:
>Environment:
NetBSD HOSTNAME_REDACTED 9.99.78 NetBSD 9.99.78 (SYS0.STAGING.02) #2: Tue Jan 26 09:03:55 UTC 2021  root@HOSTNAME_REDACTED:/usr/obj/sys/arch/amd64/compile/SYS0 amd64
>Description:
Booting kernel with the following option produces an error message on the console, and does not attach the respective pci device (at 8:0.0) to pciback:

  pciback.hide=(8:0.0)(1:0.1)(1:0.0)

The issue does not present itself whence the option list is shorter:

  pciback.hide=(8:0.0)(1:0.1)


The relevant portion of dmesg output:

---8<---


[     1.000003] pciback_pci_init: hide claim device 8:0:0
[     1.000003] pciback_pci_init: hide claim device 1:0:1
[     1.000003] autoconfiguration error: pciback_pci_init: syntax error at 1:


---8<---

The issue manifests regardless of the order of parenthesized items.

>How-To-Repeat:
Boot kernel with option pciback.hide=(8:0.0)(1:0.1)(1:0.0) on a machine where thus-enumerated devices are present.  Only booting as Dom0 under Xen was tested.
>Fix:
I performed a very basic bit of debugging by changing the kernel source as follows:

---8<--

diff --git a/sys/arch/xen/xen/pciback.c b/sys/arch/xen/xen/pciback.c
index 57f70788d194..55dc646f927f 100644
--- a/sys/arch/xen/xen/pciback.c
+++ b/sys/arch/xen/xen/pciback.c
@@ -377,6 +377,7 @@ pciback_pci_init(void)
        if (strlen(xi.xcp_pcidevs) == 0)
                return;
        pcidevs = xi.xcp_pcidevs;
+        aprint_verbose("xi.xcp_pcidevs=%s\n", xi.xcp_pcidevs);
        for (pcidevs = xi.xcp_pcidevs; *pcidevs != '\0';) {
                if (*pcidevs != '(')
                        goto error;

---8<---

The resulting output in dmesg has been:

---8<---

[     1.000003] xi.xcp_pcidevs=(8:0.0)(1:0.1)(1:
[     1.000003] pciback_pci_init: hide claim device 8:0:0
[     1.000003] pciback_pci_init: hide claim device 1:0:1
[     1.000003] autoconfiguration error: pciback_pci_init: syntax error at 1:

---8<---

The source of the issue is thus very much clear.  I'll attempt to track it down further and post a fix thereby.

>Audit-Trail:
From: Aleksey Arens <aza.sea.agenda@gmail.com>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/55958: pciback.hide parsing error
Date: Tue, 26 Jan 2021 04:06:33 -0800

 I've performed further debugging.  The Xen image was adjusted to print
 out the cmdline that it is getting from the module loader.  The line
 turned out to be truncated.

 Evidently, module loading code in sys/arch/i386/stand needs to be
 examined, since truncation should be occurring there.

From: Aleksey Arens <aza.sea.agenda@gmail.com>
To: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, 
	gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/55958: pciback.hide parsing error
Date: Tue, 26 Jan 2021 05:19:27 -0800

 I tracked the problem to strncpy at
   https://nxr.netbsd.org/xref/src/sys/arch/i386/stand/lib/exec.c#763
 ---8<---
 bi = (struct bi_modulelist_entry *)(buf + off);
 off += sizeof(struct bi_modulelist_entry);
 strncpy(bi->path, bm->bm_path, sizeof(bi->path) - 1);
 bi->base = image_end;
 bi->len = len;
 ---8<---
 The issue occurs because the static buffer size at
   https://nxr.netbsd.org/xref/src/sys/arch/x86/include/bootinfo.h#180
 ---8<---
 struct bi_modulelist_entry
 {
     char path[80];
     int type;
     int len;
     uint32_t base;
 };
 ---8<---
 is too small to accommodate module arguments line longer than 80 characters.

From: Aleksey Arens <aza.sea.agenda@gmail.com>
To: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, 
	gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/55958: pciback.hide parsing error
Date: Tue, 26 Jan 2021 05:47:20 -0800

 Adjusting the size of the static buffer path to 256 bytes had yielded
 a positive outcome.  Adjusted /boot program had yielded a successful
 boot on a system with BIOS firmware, and arguments to pciback were
 interpreted correctly by the kernel subsequently.

 ---8<---
 [     1.000003] pciback_pci_init: hide claim device 8:0:0
 [     1.000003] pciback_pci_init: hide claim device 1:0:1
 [     1.000003] pciback_pci_init: hide claim device 1:0:0
 ---8<---

 The corresponding patch is:

 ---8<---

 diff --git a/sys/arch/x86/include/bootinfo.h b/sys/arch/x86/include/bootinfo.h
 index 9d0ee542d892..6b3d967e1a7e 100644
 --- a/sys/arch/x86/include/bootinfo.h
 +++ b/sys/arch/x86/include/bootinfo.h
 @@ -178,7 +178,7 @@ struct btinfo_biosgeom {
  };

  struct bi_modulelist_entry {
 -       char path[80];
 +       char path[256];
         int type;
         int len;
         uint32_t base;

 ---8<---

From: Manuel Bouyer <bouyer@antioche.eu.org>
To: Aleksey Arens <aza.sea.agenda@gmail.com>
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
        gnats-bugs@netbsd.org
Subject: Re: kern/55958: pciback.hide parsing error
Date: Tue, 26 Jan 2021 21:55:08 +0100

 On Tue, Jan 26, 2021 at 05:47:20AM -0800, Aleksey Arens wrote:
 > Adjusting the size of the static buffer path to 256 bytes had yielded
 > a positive outcome.  Adjusted /boot program had yielded a successful
 > boot on a system with BIOS firmware, and arguments to pciback were
 > interpreted correctly by the kernel subsequently.
 > 
 > ---8<---
 > [     1.000003] pciback_pci_init: hide claim device 8:0:0
 > [     1.000003] pciback_pci_init: hide claim device 1:0:1
 > [     1.000003] pciback_pci_init: hide claim device 1:0:0
 > ---8<---
 > 
 > The corresponding patch is:
 > 
 > ---8<---
 > 
 > diff --git a/sys/arch/x86/include/bootinfo.h b/sys/arch/x86/include/bootinfo.h
 > index 9d0ee542d892..6b3d967e1a7e 100644
 > --- a/sys/arch/x86/include/bootinfo.h
 > +++ b/sys/arch/x86/include/bootinfo.h
 > @@ -178,7 +178,7 @@ struct btinfo_biosgeom {
 >  };
 > 
 >  struct bi_modulelist_entry {
 > -       char path[80];
 > +       char path[256];
 >         int type;
 >         int len;
 >         uint32_t base;
 > 
 > ---8<---

 i'm not sure we can do this without a compatibility issue ...

 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 ans d'experience feront toujours la difference
 --

From: Aleksey Arens <aza.sea.agenda@gmail.com>
To: Manuel Bouyer <bouyer@antioche.eu.org>
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, 
	gnats-bugs@netbsd.org
Subject: Re: kern/55958: pciback.hide parsing error
Date: Tue, 26 Jan 2021 18:36:26 -0800

 I wonder if the following solution may be acceptable.  Since the
 change implies a difference in ABI, would it be possible to schedule
 it for a next major release (for instance, release 10)?  I also
 wonder, if the change could go into -current at this point; as I
 understand, -current is not expected to maintain strict binary
 compatibility.

 I suppose that the proper approach to adding this might involve adding
 #ifdef-s that checks for COMPAT_XX options.  Though, I think that a
 more proper implementation would be to check at runtime for the module
 version, and decide upon what buffer length for parameter passing
 could be used accordingly.  I could work this code out, in principle;
 I would appreciate a suggestion on what would be the best, most
 compatible with the tradition, way to handle the hard-coded constant
 in the header file.

From: Manuel Bouyer <bouyer@antioche.eu.org>
To: Aleksey Arens <aza.sea.agenda@gmail.com>
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
        gnats-bugs@netbsd.org
Subject: Re: kern/55958: pciback.hide parsing error
Date: Wed, 27 Jan 2021 08:45:55 +0100

 On Tue, Jan 26, 2021 at 06:36:26PM -0800, Aleksey Arens wrote:
 > I wonder if the following solution may be acceptable.  Since the
 > change implies a difference in ABI, would it be possible to schedule
 > it for a next major release (for instance, release 10)?  I also
 > wonder, if the change could go into -current at this point; as I
 > understand, -current is not expected to maintain strict binary
 > compatibility.
 > 
 > I suppose that the proper approach to adding this might involve adding
 > #ifdef-s that checks for COMPAT_XX options.  Though, I think that a
 > more proper implementation would be to check at runtime for the module
 > version, and decide upon what buffer length for parameter passing
 > could be used accordingly.  I could work this code out, in principle;
 > I would appreciate a suggestion on what would be the best, most
 > compatible with the tradition, way to handle the hard-coded constant
 > in the header file.

 We must be able to boot a new kernel with an older /boot (this is a requirement
 for upgrade anyway), so some compat mechanism at the boot structure 
 level is neeed.

 And actually it's easy: leave bi_modulelist alone, and define a
 btinfo_modulelist2, associated with a BTINFO_MODULELIST2
 Have the boot loader fill both, and the kernel check first BTINFO_MODULELIST2
 and then BTINFO_MODULELIST. This way you can boot an old kernel with
 newer /boot, or the opposite.
 While there, put path[] at the end of bi_modulelist_entry2, so it can
 be arbitrary long.

 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 ans d'experience feront toujours la difference
 --

From: Martin Husemann <martin@duskware.de>
To: Manuel Bouyer <bouyer@antioche.eu.org>
Cc: Aleksey Arens <aza.sea.agenda@gmail.com>, kern-bug-people@netbsd.org,
	gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
	gnats-bugs@netbsd.org
Subject: Re: kern/55958: pciback.hide parsing error
Date: Wed, 27 Jan 2021 08:58:45 +0100

 On Tue, Jan 26, 2021 at 09:55:08PM +0100, Manuel Bouyer wrote:
 > > diff --git a/sys/arch/x86/include/bootinfo.h b/sys/arch/x86/include/bootinfo.h
 > > index 9d0ee542d892..6b3d967e1a7e 100644
 > > --- a/sys/arch/x86/include/bootinfo.h
 > > +++ b/sys/arch/x86/include/bootinfo.h
 > > @@ -178,7 +178,7 @@ struct btinfo_biosgeom {
 > >  };
 > > 
 > >  struct bi_modulelist_entry {
 > > -       char path[80];
 > > +       char path[256];
 > >         int type;
 > >         int len;
 > >         uint32_t base;
 > > 
 > > ---8<---
 > 
 > i'm not sure we can do this without a compatibility issue ...

 I think we could define and provide both structures from a new
 bootloader by e.g. renaming BTINFO_MODULELIST to
 BTINFO_COMPAT_MODULELIST and giving BTINFO_MODULELIST a new value.

 Martin

From: Aleksey Arens <aza.sea.agenda@gmail.com>
To: Manuel Bouyer <bouyer@antioche.eu.org>
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, 
	gnats-bugs@netbsd.org
Subject: Re: kern/55958: pciback.hide parsing error
Date: Wed, 27 Jan 2021 00:10:55 -0800

  This sounds like a good approach.  Please allow me to extend it a
 bit.  Adding another field to the module in addition to an existing
 one is a natural instance of a scheme involving either versioning or
 capability attributes associated with the module.  Now, versions and
 capabilities could very well be encoded into long (just a few
 kilobytes at most) strings.  So, we just need an ability to introduce
 variable-size string buffers that would encode all the attributes
 associated with a module.  The struct containing this var-size string
 might also include a field for the encoding id (some GUID?).  After
 all, what we're doing here is introducing a new structure to the
 module interface that is binary-incompatible with the extant one, so
 why not indeed make it extensible in a seamless fashion?  Furthermore,
 all sorts of fancy module argument schemes would fall in nicely into
 such an approach -- the only question would be the kernel's
 willingness to parse and interpret the language within the string, but
 there shall be no issue with binary compatibility.

 My only concern is a question of how would a Xen kernel recognize the
 newer format.  I could look at the boot code more closely, and
 determine if it's something that could be done by altering the
 bootloader itself or not.  My guess is that it doesn't, but I would
 certainly appreciate additional insight.  If I understand it
 correctly, the bi_modulelist_entry struct does not represent any of
 Xen's ABI, and Xen's boot parameters are filled out from this struct
 by the boot's code.

From: Aleksey Arens <aza.sea.agenda@gmail.com>
To: Martin Husemann <martin@duskware.de>
Cc: Manuel Bouyer <bouyer@antioche.eu.org>, kern-bug-people@netbsd.org, 
	gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, gnats-bugs@netbsd.org
Subject: Re: kern/55958: pciback.hide parsing error
Date: Wed, 27 Jan 2021 00:21:30 -0800

 On Tue, Jan 26, 2021 at 11:58 PM Martin Husemann <martin@duskware.de> wrote:
 >
 > [...]
 > I think we could define and provide both structures from a new
 > bootloader by e.g. renaming BTINFO_MODULELIST to
 > BTINFO_COMPAT_MODULELIST and giving BTINFO_MODULELIST a new value.

 This looks like a good strategy, with e.g. BTINFO_MODULELIST
 corresponding to e.g. variable-length string attribute description,
 and BTINFO_COMPAT_MODULELIST being just a reference to struct
 bi_modulelist.

 Now, I see that the module list is being interpreted by /netbsd's code
 at sys/arch/x86/x86/x86_machdep.c:module_init_md().  I wonder, if
 those structures (well, currently only struct bi_modulelist) are also
 [do need to be] interpreted by a Xen's kernel?

From: Manuel Bouyer <bouyer@antioche.eu.org>
To: Aleksey Arens <aza.sea.agenda@gmail.com>
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
        gnats-bugs@netbsd.org
Subject: Re: kern/55958: pciback.hide parsing error
Date: Wed, 27 Jan 2021 09:25:23 +0100

 On Wed, Jan 27, 2021 at 12:07:33AM -0800, Aleksey Arens wrote:
 >  This sounds like a good approach.  Please allow me to extend it a
 > bit.  Adding another field to the module in addition to an existing
 > one is a natural instance of a scheme involving either versioning or
 > capability attributes associated with the module.  Now, versions and
 > capabilities could very well be encoded into long (just a few
 > kilobytes at most) strings.  So, we just need an ability to introduce
 > variable-size string buffers that would encode all the attributes
 > associated with a module.  The struct containing this var-size string
 > might also include a field for the encoding id (some GUID?).  After
 > all, what we're doing here is introducing a new structure to the
 > module interface that is binary-incompatible with the extant one, so
 > why not indeed make it extensible in a seamless fashion?  Furthermore,
 > all sorts of fancy module argument schemes would fall in nicely into
 > such an approach -- the only question would be the kernel's
 > willingness to parse and interpret the language within the string, but
 > there shall be no issue with binary compatibility.

 This is a more radical change, this would need to be discussed on port-amd64
 and port-i386 I guess.

 > 
 > My only concern is a question of how would a Xen kernel recognize the
 > newer format.  I could look at the boot code more closely, and
 > determine if it's something that could be done by altering the
 > bootloader itself or not.  My guess is that it doesn't, but I would
 > certainly appreciate additional insight.  If I understand it
 > correctly, the bi_modulelist_entry struct does not represent any of
 > Xen's ABI, and Xen's boot parameters are filled out from this struct
 > by the boot's code.

 Yes. But the Xen kernel probably needs to copy it at some point (even if
 it doens't know what it's copying). I'm not sure if it can copy something
 arbitrary long (it may be limited e.g. to a page size).

 But there's something strange here: Xen is supposed to use multiboot, not
 the NetBSD native boot protocol. In multiboot, the structures from
 bootinfo.h are not supposed to be used.

 So I guess that the problem is more that /boot abuses bi_modulelist_entry
 for internal structures in the multiboot case. The problem can probably
 be fixed in /boot without changing bootinfo.h at all.

 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 ans d'experience feront toujours la difference
 --

From: Aleksey Arens <aza.sea.agenda@gmail.com>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/55958: pciback.hide parsing error
Date: Wed, 27 Jan 2021 00:54:14 -0800

 >  Yes. But the Xen kernel probably needs to copy it at some point (even if
 >  it doens't know what it's copying). I'm not sure if it can copy somethin=
 g
 >  arbitrary long (it may be limited e.g. to a page size).
 >

 As I understand it, Xen reads the cmdline string from an offset in the
 module image passed-in to it.  Cf.
 https://fossies.org/linux/xen/xen/arch/x86/setup.c#l_760

 >  But there's something strange here: Xen is supposed to use multiboot, no=
 t
 >  the NetBSD native boot protocol. In multiboot, the structures from
 >  bootinfo.h are not supposed to be used.
 >

 Correct.  However, those arguments are not the ones being passed to
 Xen, but to the module image that is being loaded by Xen.

 >  So I guess that the problem is more that /boot abuses bi_modulelist_entr=
 y
 >  for internal structures in the multiboot case. The problem can probably
 >  be fixed in /boot without changing bootinfo.h at all.

 In principle, yes.  However, in that case, a different Xen-specific
 format will need to be defined.  Maybe a copy of bi_modulelist_entry
 should be made, mentioning xen in its name?  This is a piece of
 bootinfo that is Xen-specific, and currently, it appears that the same
 structure has merely been reused to cover both cases.

 At any rate, an ability to use longer =E2=80=94  and ideally variable-lengt=
 h =E2=80=94
 strings for /netbsd's boot arguments won't hurt in the long run,
 though *that* specific topic (i.e. re: /netbsd's arguments) could
 become a matter for another ticket.  Right now, setting up a special
 case for Xen (with fixed buffer length of e.g. 512, being a half of
 Xen's limit of 1024 for the whole command line) might as well be more
 than sufficient.

From: Aleksey Arens <aza.sea.agenda@gmail.com>
To: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, 
	gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/55958: pciback.hide parsing error
Date: Wed, 27 Jan 2021 22:09:04 -0800

 A subsequent discussion at #xendevel ensued.  It was learned that as a
 matter of fact, Xen does expect multiboot protocol for module loading.
 Also, a reference to a respective implementation in FreeBSD was
 suggested:
   https://cgit.freebsd.org/src/tree/stand/i386/libi386/multiboot.c#n253

 Thus, it would be fair to say that sharing data-structures between the
 native module loader and code that is involved in loading Xen modules
 -- which must be done not with a native bootloader, but with a
 multiboot loader -- is not desirable.  (I must clarify thereby.  I'm
 talking about the data structures that implement layout of loaded
 module images, not the module chain descriptions; the latter could be
 left intact.)

 Since conceptually multiboot and native module loading are different
 module loading schemes, I would like to suggest that a multiboot
 protocol be implemented separately, with a separate set of
 data-structures.  If this approach is taken, then there shall be no
 binary compatibility issue whatsoever with native loading, and
 whatever binary compatibility might be occurring with the /netbsd's
 multiboot loading case (which is AFAIK is not enabled by default in
 the kernel config anyway) should be treated as kernel bugs, since a
 standard-conforming multiboot kernel should always be loadable by a
 standard-conforming multiboot bootloader (apart perhaps from the
 initial ramdisk related caveat noted below).

 I am interested in implementing the multiboot functionality -- at
 least version 1, but there is also an interest on my part in
 implementing version 2 -- for /boot, and accommodating Xen-specific
 workarounds from FreeBSD.  The latter workarounds are related to
 initial ramdisk implementation differences and multiboot protocol
 requirements.

 I suppose we could look into the ramdisk issues, and decide whether
 the multiboot implementation must be strict, or if it can be relaxed,
 and if Xen's code could be adjusted.  I favor the former option -- of
 implementing the standard strictly in the bootloader.  This option may
 lead to a need to adjust /netbsd a bit w.r.t. the initrd issues with
 multiboot.  However, the result would be a kernel that fully conforms
 to multiboot standard.  Also, since MULTIBOOT is non-default for
 /netbsd, the impact would be low.

From: John Nemeth <jnemeth@cue.bc.ca>
To: Aleksey Arens <aza.sea.agenda@gmail.com>, kern-bug-people@netbsd.org,
        gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/55958: pciback.hide parsing error
Date: Thu, 28 Jan 2021 19:39:58 -0800

 On Jan 27, 22:09, Aleksey Arens wrote:
 }
 } A subsequent discussion at #xendevel ensued.  It was learned that as a
 } matter of fact, Xen does expect multiboot protocol for module loading.
 } Also, a reference to a respective implementation in FreeBSD was
 } suggested:
 }   https://cgit.freebsd.org/src/tree/stand/i386/libi386/multiboot.c#n253

 [snip]

      I'm pretty sure that somebody previously mentioned that Xen
 requires multiboot.  I've haven't read your entire message, but
 unfortunately it appears that you have spent a lot of time on this
 for naught.  Our native bootloader can do multiboot and has been
 able to boot Xen for many years.  You should have a look here:
 http://wiki.netbsd.org/ports/xen/ and here:
 http://wiki.netbsd.org/ports/xen/howto/ .  This is from my development
 box:

 menu=Boot Xen with 6GB for dom0:load /netbsd.xen0 console=pc;multiboot /xen45-kernel/xen.gz dom0_mem=6GB dom0_max_vcpus=1 dom0_vcpus_pin

 The "load" part is what does module loading (in this case, the
 NetBSD kernel which is built as a Xen module) and the "multiboot"
 part is what boots the Xen kernel.  Note that since it is my
 development box, I have many different variants of this line.

 }-- End of excerpt from Aleksey Arens

From: Aleksey Arens <aza.sea.agenda@gmail.com>
To: jnemeth@cue.bc.ca
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, 
	gnats-bugs@netbsd.org
Subject: Re: kern/55958: pciback.hide parsing error
Date: Thu, 28 Jan 2021 22:33:46 -0800

 >      I'm pretty sure that somebody previously mentioned that Xen
 > requires multiboot.  I've haven't read your entire message, but
 > unfortunately it appears that you have spent a lot of time on this
 > for naught.  Our native bootloader can do multiboot and has been
 > able to boot Xen for many years.  You should have a look here:
 > http://wiki.netbsd.org/ports/xen/ and here:
 > http://wiki.netbsd.org/ports/xen/howto/ .  This is from my development
 > box:
 >
 > menu=Boot Xen with 6GB for dom0:load /netbsd.xen0 console=pc;multiboot /xen45-kernel/xen.gz dom0_mem=6GB dom0_max_vcpus=1 dom0_vcpus_pin
 >

 Thank you for joining the discussion.  One of the essential points in
 my previous message is that I suspect that there is at least a bug in
 the multiboot implementation in /boot.  I'm also concerned about the
 fact that the same data-structures are reused in implementing the
 binary module layout for the cases when a native module is loaded and
 when a multiboot module is loaded.  I hope that these points would
 become clear to you if you invest your time and give your full
 attention to the entire message.

 Finally, the command passed to the module -- in the example that you
 had displayed -- is a bit too short.  To see the issue at hand, you'd
 need to make it over 80 characters long, and then observe that it does
 get truncated once it makes it to Xen.  This was demonstrated in the
 first three messages to this PR.

 > The "load" part is what does module loading (in this case, the
 > NetBSD kernel which is built as a Xen module) and the "multiboot"
 > part is what boots the Xen kernel.  Note that since it is my
 > development box, I have many different variants of this line.
 >

 Could we please go into more details thereby?  The Xen expects the
 modules to be loaded according to multiboot protocol.  What I'm seeing
 is that "load" uses the same BM_TYPE_KMOD for either the case of
 native or multiboot module loading scenario.  Furthermore:
   https://nxr.netbsd.org/xref/src/sys/arch/i386/stand/lib/exec.c#exec_multiboot
   https://nxr.netbsd.org/xref/src/sys/arch/i386/stand/lib/exec.c#exec_netbsd
 are both calling to the same
   https://nxr.netbsd.org/xref/src/sys/arch/i386/stand/lib/exec.c#module_init
 function, in which the short buffer is copied:
   https://nxr.netbsd.org/xref/src/sys/arch/i386/stand/lib/exec.c#763

 So, clearly, both native and multiboot modules are initialized in an
 identical fashion.  This must have led to this bug.  I request that
 this bug be fixed, and I propose that it be done by separating the
 code responsible for module loading in case of multiboot and Xen.

From: John Nemeth <jnemeth@cue.bc.ca>
To: Aleksey Arens <aza.sea.agenda@gmail.com>, jnemeth@cue.bc.ca
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
        gnats-bugs@netbsd.org
Subject: Re: kern/55958: pciback.hide parsing error
Date: Fri, 29 Jan 2021 00:15:55 -0800

 On Jan 28, 22:33, Aleksey Arens wrote:
 }
 } >      I'm pretty sure that somebody previously mentioned that Xen
 } > requires multiboot.  I've haven't read your entire message, but
 } > unfortunately it appears that you have spent a lot of time on this
 } > for naught.  Our native bootloader can do multiboot and has been
 } > able to boot Xen for many years.  You should have a look here:
 } > http://wiki.netbsd.org/ports/xen/ and here:
 } > http://wiki.netbsd.org/ports/xen/howto/ .  This is from my development
 } > box:
 } >
 } > menu=Boot Xen with 6GB for dom0:load /netbsd.xen0 console=pc;multiboot /xen45-kernel/xen.gz dom0_mem=6GB dom0_max_vcpus=1 dom0_vcpus_pin
 } 
 } Thank you for joining the discussion.  One of the essential points in
 } my previous message is that I suspect that there is at least a bug in
 } the multiboot implementation in /boot.  I'm also concerned about the
 } fact that the same data-structures are reused in implementing the
 } binary module layout for the cases when a native module is loaded and
 } when a multiboot module is loaded.  I hope that these points would
 } become clear to you if you invest your time and give your full
 } attention to the entire message.
 } 
 } Finally, the command passed to the module -- in the example that you
 } had displayed -- is a bit too short.  To see the issue at hand, you'd
 } need to make it over 80 characters long, and then observe that it does
 } get truncated once it makes it to Xen.  This was demonstrated in the
 } first three messages to this PR.
 } 
 } > The "load" part is what does module loading (in this case, the
 } > NetBSD kernel which is built as a Xen module) and the "multiboot"
 } > part is what boots the Xen kernel.  Note that since it is my
 } > development box, I have many different variants of this line.
 } 
 } Could we please go into more details thereby?  The Xen expects the
 } modules to be loaded according to multiboot protocol.  What I'm seeing
 } is that "load" uses the same BM_TYPE_KMOD for either the case of
 } native or multiboot module loading scenario.  Furthermore:
 }   https://nxr.netbsd.org/xref/src/sys/arch/i386/stand/lib/exec.c#exec_multiboot
 }   https://nxr.netbsd.org/xref/src/sys/arch/i386/stand/lib/exec.c#exec_netbsd
 } are both calling to the same
 }   https://nxr.netbsd.org/xref/src/sys/arch/i386/stand/lib/exec.c#module_init
 } function, in which the short buffer is copied:
 }   https://nxr.netbsd.org/xref/src/sys/arch/i386/stand/lib/exec.c#763
 } 
 } So, clearly, both native and multiboot modules are initialized in an
 } identical fashion.  This must have led to this bug.  I request that
 } this bug be fixed, and I propose that it be done by separating the
 } code responsible for module loading in case of multiboot and Xen.

      Keep in mind that the NetBSD kernel itself is the module.
 You're not loading a module into the NetBSD kernel, you're loading
 a module into the Xen kernel (Xen is the host OS).  The arguments
 are for the kernel itself, not a module, so you need to find where
 the kernel parses its own command line.  This has nothing to do
 with NetBSD modules.

 }-- End of excerpt from Aleksey Arens

From: Manuel Bouyer <bouyer@antioche.eu.org>
To: Aleksey Arens <aza.sea.agenda@gmail.com>
Cc: jnemeth@cue.bc.ca, kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
        netbsd-bugs@netbsd.org, gnats-bugs@netbsd.org
Subject: Re: kern/55958: pciback.hide parsing error
Date: Fri, 29 Jan 2021 11:19:52 +0100

 On Thu, Jan 28, 2021 at 10:33:46PM -0800, Aleksey Arens wrote:
 > 
 > Thank you for joining the discussion.  One of the essential points in
 > my previous message is that I suspect that there is at least a bug in
 > the multiboot implementation in /boot.  I'm also concerned about the
 > fact that the same data-structures are reused in implementing the
 > binary module layout for the cases when a native module is loaded and
 > when a multiboot module is loaded.  I hope that these points would
 > become clear to you if you invest your time and give your full
 > attention to the entire message.

 there's something that doesn't match here. A NetBSD dom0 can be loaded
 either from grub, pxelinux or NetBSD's boot loader.
 When started as dom0 (or as domU, the code path is the same), the kernel
 command line (which is the module command line in multiboot, when started
 as dom0) is in start_info.cmd_line as is a plain text string, not a
 btinfo_modulelist or bi_modulelist_entry.
 See xen_parse_cmdline() in xen_machdep.c



 > 
 > Finally, the command passed to the module -- in the example that you
 > had displayed -- is a bit too short.  To see the issue at hand, you'd
 > need to make it over 80 characters long, and then observe that it does
 > get truncated once it makes it to Xen.  This was demonstrated in the
 > first three messages to this PR.
 > 
 > > The "load" part is what does module loading (in this case, the
 > > NetBSD kernel which is built as a Xen module) and the "multiboot"
 > > part is what boots the Xen kernel.  Note that since it is my
 > > development box, I have many different variants of this line.
 > >
 > 
 > Could we please go into more details thereby?  The Xen expects the
 > modules to be loaded according to multiboot protocol.  What I'm seeing
 > is that "load" uses the same BM_TYPE_KMOD for either the case of
 > native or multiboot module loading scenario.  Furthermore:
 >   https://nxr.netbsd.org/xref/src/sys/arch/i386/stand/lib/exec.c#exec_multiboot
 >   https://nxr.netbsd.org/xref/src/sys/arch/i386/stand/lib/exec.c#exec_netbsd
 > are both calling to the same
 >   https://nxr.netbsd.org/xref/src/sys/arch/i386/stand/lib/exec.c#module_init
 > function, in which the short buffer is copied:
 >   https://nxr.netbsd.org/xref/src/sys/arch/i386/stand/lib/exec.c#763
 > 
 > So, clearly, both native and multiboot modules are initialized in an
 > identical fashion.  This must have led to this bug.  I request that
 > this bug be fixed, and I propose that it be done by separating the
 > code responsible for module loading in case of multiboot and Xen.

 This is not exactly how it works. Yes, the load command is parsed the
 same way for both native and multiboot case, using the same structure. But
 then the exec_multiboot1() function will convert the native structure
 to multiboot format. mmo_string points to bim->path, not the bim
 structure itself.

 So to fix the problem no change to bootinfo.h is needed; only the
 boot loader needs to be changed to pass the full string to exec_multiboot1().

 For example, instead of reusing bi_modulelist to represent the module,
 module_init() could use a private structure, which is then converted
 to the appropriate interface depending on the boot mode.

 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 ans d'experience feront toujours la difference
 --

From: Aleksey Arens <aza.sea.agenda@gmail.com>
To: Manuel Bouyer <bouyer@antioche.eu.org>
Cc: jnemeth@cue.bc.ca, kern-bug-people@netbsd.org, gnats-admin@netbsd.org, 
	netbsd-bugs@netbsd.org, gnats-bugs@netbsd.org
Subject: Re: kern/55958: pciback.hide parsing error
Date: Fri, 29 Jan 2021 03:03:45 -0800

 by Manuel Bouyer:
 > When started as dom0 (or as domU, the code path is the same), the kernel
 > command line (which is the module command line in multiboot, when started
 > as dom0) is in start_info.cmd_line as is a plain text string, not a
 > btinfo_modulelist or bi_modulelist_entry.
 > See xen_parse_cmdline() in xen_machdep.c
 >

 Ok.

 > This is not exactly how it works. Yes, the load command is parsed the
 > same way for both native and multiboot case, using the same structure. But
 > then the exec_multiboot1() function will convert the native structure
 > to multiboot format. mmo_string points to bim->path, not the bim
 > structure itself.
 >
 > So to fix the problem no change to bootinfo.h is needed; only the
 > boot loader needs to be changed to pass the full string to exec_multiboot1().
 >
 > For example, instead of reusing bi_modulelist to represent the module,
 > module_init() could use a private structure, which is then converted
 > to the appropriate interface depending on the boot mode.
 >

 Thank you for clarifying these details to me.  I have not tracked the
 codepath to that point, and was wondering what those functions were
 doing.

 Your suggestion about handling this problem does sound good to me.

 I was concerned initially, if bi_modulelist_entry struct had played a
 special role in further processing.  I noticed the field name has been
 changed from "path" to 'kmod".  Does it reflect its purpose?  Is there
 a detailed documentation on NetBSD's boot protocol, aside from the
 source code itself?

 One question remains.  Since now the exec.c:module_init() would
 potentially fill out an intermediate struct with a long buffer, how
 should the code constructing the module image for native
 (non-multiboot) bootloading handle the argument buffer length?  Would
 doing sizeof() on the corresponding bi_modulelist_entry struct's entry
 be sufficient?  Perhaps, there should be a const defined in the
 bootinfo.h for this purpose?

NetBSD Home
NetBSD PR Database Search

(Contact us) $NetBSD: query-full-pr,v 1.46 2020/01/03 16:35:01 leot Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2020 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.