NetBSD Problem Report #53220

From liman@cafax.se  Sat Apr 28 06:02:43 2018
Return-Path: <liman@cafax.se>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 321027A1D0
	for <gnats-bugs@gnats.NetBSD.org>; Sat, 28 Apr 2018 06:02:43 +0000 (UTC)
Message-Id: <20180428060238.E52309313A@netbsd8.cafax.se>
Date: Sat, 28 Apr 2018 08:02:38 +0200 (MEST)
From: Lars-Johan Liman <liman@cafax.se>
Reply-To: liman@cafax.se
To: gnats-bugs@NetBSD.org
Subject: sysinst dumps core with extended partitioning.
X-Send-Pr-Version: 3.95

>Number:         53220
>Category:       install
>Synopsis:       sysinst on Xen domU dumps core on a pre-partitioned gpt image.
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    martin
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Apr 28 06:05:00 +0000 2018
>Last-Modified:  Mon May 07 04:10:00 +0000 2018
>Originator:     Lars-Johan Liman
>Release:        NetBSD 8.0_RC1
>Organization:
Cafax AB
>Environment:
System: NetBSD netbsd8.cafax.se 8.0_RC1 NetBSD 8.0_RC1 (XEN3_DOMU.201804191727Z$
Architecture: x86_64
Machine: amd64
>Description:

If you want to install an amd64 Xen domU on a disk image that is
pre-partitioned with gpt, and fire up a virtual machine with the
INSTALL kernel, and select "extended partitioning", sysinst
immediately dumps core. This is probably not the intended
behaviour ... ;-)

>How-To-Repeat:

1. Create blank disk image.

   dd if=/dev/null of=myimage.img count=...

2. Create gpt partitions on the image.

   vnconfig vndX myimage.img
   gpt destroy /dev/rvndX
   gpt add -s NNNN -t ffs /dev/rvndX
   gpt add -s NNNN -t swap /dev/rvndX
   gpt add -s NNNN -t ffs /dev/rvndX
   gpt add -s NNNN -t ffs /dev/rvndX
   gpt label -i 1 -l ROOT /dev/r$drive
   gpt label -i 2 -l SWAP /dev/r$drive
   gpt label -i 3 -l USR /dev/r$drive
   gpt label -i 4 -l VAR /dev/r$drive
   vnconfig -u vndX

3. Boot a Xen virtual machine with the 8.0_RC1
   netbsd-INSTALL_XEN3_DOMU kernel.

4. Navigate through the sysinst menus

   a: Install messages in English

   a: Install NetBSD to hard disk

   b: Yes (we shall continue!)

   b: Extended partitioning

   [1]   Segmentation fault (core dumped) sysinst

>Fix:

No known fix, but a workaround is to _not_ use sysinst, but to
shutdown the domU, and manually install NetBSD from the dom0 by
mounting the diskimage and filling it by hand.

1. vnconfig vndX myimage.img

2. newfs /dev/rvndXa (etc.)

3. mount /dev/vndXa (etc.)

4. pax -zrvpe -f amd64/binary/sets/base.tgz (etc.)

5. Fix fstab, rc.conf, localtime, passwd (etc.)

6. vnconfig -u vndX

>Release-Note:

>Audit-Trail:

Responsible-Changed-From-To: install-manager->martin
Responsible-Changed-By: martin@NetBSD.org
Responsible-Changed-When: Sat, 28 Apr 2018 06:16:30 +0000
Responsible-Changed-Why:
Take


From: Robert Elz <kre@munnari.OZ.AU>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: install/53220: sysinst dumps core with extended partitioning.
Date: Sat, 28 Apr 2018 14:17:41 +0700

 I have seen that as well, though on bare metal rather
 that a DomU (though to sysinst there should be little,
 if any, difference.)

 I used a similar workaround - just install manually.
 You could have done it from inside the DomU
 though, just tell sysinst not to install, but just give
 you a shell - after that, the same procedure
 except there's no need to vnconfig anything
 (which is fortunate as on bare metal there's
 no way to do it outside the running system.)

 I have been meaning to look into it - sometime - but it is
 not a high priority for me (system is installed now...)

 kre

From: Lars-Johan Liman <liman@cafax.se>
To: gnats-bugs@NetBSD.org
Cc: martin@NetBSD.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: install/53220: sysinst dumps core with extended partitioning.
Date: Sat, 28 Apr 2018 21:02:51 +0200

 Hi Robert!

 (I don't know if you remember me, but we've run into each other at IETFs
 in the past, in DNSOP and DNSEXT. I'm one of the 2 m tall DNS Swedes ... ;-)

 kre@munnari.OZ.AU:
 > The following reply was made to PR install/53220; it has been noted by GNATS.

 > From: Robert Elz <kre@munnari.OZ.AU>
 > To: gnats-bugs@NetBSD.org
 > Cc: 
 > Subject: Re: install/53220: sysinst dumps core with extended partitioning.
 > Date: Sat, 28 Apr 2018 14:17:41 +0700

 >  I have seen that as well, though on bare metal rather
 >  that a DomU (though to sysinst there should be little,
 >  if any, difference.)

 Aye. I suspected as much, but I had no bare metal to test it on.

 >  I used a similar workaround - just install manually.
 >  You could have done it from inside the DomU
 >  though, just tell sysinst not to install, but just give
 >  you a shell - after that, the same procedure
 >  except there's no need to vnconfig anything
 >  (which is fortunate as on bare metal there's
 >  no way to do it outside the running system.)

 (Sure, it's just that outside the VM i have access to all my
 "convenience tools" whereas inside it, the land is "barren and
 cold" ... ;-)

 >  I have been meaning to look into it - sometime - but it is
 >  not a high priority for me (system is installed now...)

 Understood. But maybe the famous $someone should take a look at it
 before 8.0-REL goes out the door.

 			Cheers,
 			  /Liman
                            (a.k.a. hostmaster ,at, i-root.servers.net,
                             or liman ,at, netnod.se (my daywork))
 #-------------------------------------------------------------------------
 # Lars-Johan Liman, M.Sc.		 ! E-mail: liman ,at, cafax.se
 # Cafax AB				 ! HTTP  : //www.cafax.se/
 # Computer Consultants, Sweden		 ! Voice : +46 8 - 564 702 30
 #-------------------------------------------------------------------------

From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: install/53220: sysinst dumps core with extended partitioning.
Date: Sun, 29 Apr 2018 11:19:05 +0200

 Sorry, but I can not reproduce it on bare metal with the 8.0 RC sysinst.
 For me it just works fine:


  This is the extended partition manager.  All disks, partitions, etc. are
  listed below.  MBR partitions, if desired, must be created before editing
  the disklabel.  To use RAID, LVM, or CGD: 1) Create BSD partitions with the
  appropriate type; 2) Create RAID/LVM VG/CGD using these partitions; 3) Save
  changes; 4) Create partitions for RAID/CGD or Logical Volumes for LVM.

 >a: vnd0                              GPT-labeled disk          UNCHANGED
  b:    dk0:                           4.2BSD                        1024M
  c:    dk1:                           swap                          1024M
  d:    dk2:                           4.2BSD                        1024M
  e:    dk3:                           4.2BSD                        1024M
  f: Create cryptographic volume (CGD)
  g: Create virtual disk image (VND)
  h: Create volume group (LVM VG)
  i: Create software RAID
  j: Update device list
  k: Save changes
  x: Finish partitioning


 Robert is correct, this is supposed to be done on the installed
 domU, as sysinst will insist on creating the partitions you install into
 yourself in the extended partitioning menu (this is a different bug which
 is going to be fixed, along with other gpt related issues, but the
 fix unlikely will be ready for 8.0, so it will likely be in 8.1 only).


 Martin

From: Robert Elz <kre@munnari.OZ.AU>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: install/53220: sysinst dumps core with extended partitioning.
Date: Sun, 29 Apr 2018 23:19:21 +0700

     Date:        Sun, 29 Apr 2018 09:20:01 +0000 (UTC)
     From:        Martin Husemann <martin@duskware.de>
     Message-ID:  <20180429092001.275F57A225@mollari.NetBSD.org>

   |  Sorry, but I can not reproduce it on bare metal with the 8.0 RC sysinst.

 I will see if I can set up the conditions that cause the problem.

 kre

From: Robert Elz <kre@munnari.OZ.AU>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: install/53220: sysinst dumps core with extended partitioning.
Date: Mon, 30 Apr 2018 02:25:27 +0700

     Date:        Sun, 29 Apr 2018 16:25:01 +0000 (UTC)
     From:        Robert Elz <kre@munnari.OZ.AU>
     Message-ID:  <20180429162501.76E3D7A261@mollari.NetBSD.org>

   |  I will see if I can set up the conditions that cause the problem.

 OK, done.   core dump from sysinst easy to make...

 total 3136                                                                      
 -r--r--r--   1 root  wheel     2646 Mar 30 14:24 .profile                       
 drwxr-xr-x   2 root  wheel      512 Mar 30 14:24 bin
 drwxr-xr-x   9 root  wheel   115968 Apr 30 02:02 dev
 drwxr-xr-x   2 root  wheel        0 Apr 30 02:02 etc
 drwxr-xr-x   2 root  wheel      512 Mar 30 14:24 kern
 drwxr-xr-x   3 root  wheel      512 Mar 30 14:24 libexec
 drwxr-xr-x   2 root  wheel      512 Mar 30 14:24 mnt
 drwxr-xr-x   2 root  wheel      512 Mar 30 14:24 mnt2
 drwxr-xr-x   2 root  wheel     1024 Mar 30 14:24 sbin                           
 -r-xr-xr-x  75 root  wheel  2485376 Mar 30 14:24 sysinst                        
 -rw-------   1 root  wheel   435904 Apr 30 02:03 sysinst.core                   
 -r--r--r--   1 root  wheel    36898 Mar 30 14:24 sysinstmsgs.de                 
 [...]


 I used an 8.0_RC1 build I made on April 21, but I think anything recentish
 would do (the previous time it happened for me was a NetBSD-current (8.99.?)
 from (very) late January.

 This time I booted a XEN DomU (as that's the easied to play in), the
 previous time was bare metal booting from an install CD (an install
 CD in a format I make it, which might not be quite the same as the
 normal one, but which has the same sysinst, ... there was no
 problem booting it - at least when the UEFI firmware was switched
 to legacy mode - it was made before the code to allow UEFI booting
 from cd was added - I have not yet tried that)

 The procedure I used just now was:

 First set up the disk (in my bare metal test this was done, despite my
 request that it not be) by the people who sold me the system...

 Boot netbsd-INSTALL kernel from
 	${RELEASE}/amd64/binary/kernel/netbsd-INSTALL_XEN3_DOMU.gz

 exit sysinst, and in the root shell that appears:

 gpt create -A xbd0
 gpt add -b 2048 -s 204800 -l EFI -t efi xbd0
 gpt add -s 1576960 -t windows -l w^HWindows xbd0 
 gpt add -b 1783808 -l windows_data -t windows xbd0 
 shutdown

 That leaves the disk looking like:

     start     size  index  contents
         0        1         PMBR (active)
         1        1         Pri GPT header
         2       32         Pri GPT table
        34     2014         Unused
      2048   204800      1  GPT part - EFI System
    206848  1576960      2  GPT part - Windows basic data
   1783808  3144467      3  GPT part - Windows basic data
   4928275       32         Sec GPT table
   4928307        1         Sec GPT header

 Any method that leaves the drive (real or virtual) looking like
 that should be OK.   Note there's no free space (well, 2014 blocks...)
 There's also (in my case) no NetBSD partitions.

 My bare metal install looked just the same - the drive (well drives)
 were much bigger, but full of windows crap filesystems (mostly
 just empty).

 Then attempt a NetBSD install:

 Boot netbsd-INSTALL kernel
 In Sysinst
         a - messages in English
         a - Install NetBSD to hard disk
         b - yes
         b - extended partitioning

 (BOOM)

 I then repeated that, using the same (untouched) drive, using a
 NetBSD-current (8.99.14) from late March (the latest one I had
 lying around without doing a rebuild.)   Same steps in sysinst,
 same result.

 If you're still unable to reproduce this, let me know, and I'll build
 a version of this that includes a sysinst that's built wwith debug
 symbols,  and see if I can see what is up in the core file.

 And from Martin's message:
   | Robert is correct, this is supposed to be done on the installed
   |  domU, as sysinst will insist 

 That's not quite what I said, I said it can be done in the installed
 DomU (as I did just above) not that it should be.   What sysinst
 insists on is irrelevant, as once it is unable to proceed, it simply
 isn't used any more - I almost never use it, when I am doing
 DomU builds (my must common type) I simply have DESTDIR be
 the filesystem that is to be the DomU filesystem, and build
 directly into it.  Then I populate /dev fix up what needs to be
 fixed in /etc, and boot it.

 If I was going to go ahead and install the system just above, I'd
 do it the same way - either in the DomU after sysinst has crashed,
 I'd gpt remove gpt remove ... then gpt add gpt add ... to set up
 the filesystems, mount them, and then fetch and untar the install
 sets (which is exactly what I did on the bare metal system,
 except for the "fetch" part as they were on the CD already waiting
 to be unpacked).

 Alternatively, when using XEN, the whole install can be done my
 mounting the (to be) DomU filesystem(s) in the Dom0 and then
 doing the set unpacks there (where the tooks are better as a
 full running NetBSD is available).

 Either way works - which is better just depends what you prefer
 (the Xen Dom0 way means all the doc is also easily available.)

 kre

From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: install/53220: sysinst dumps core with extended partitioning.
Date: Sun, 29 Apr 2018 21:36:36 +0200

 On Sun, Apr 29, 2018 at 07:30:01PM +0000, Robert Elz wrote:
 >  If you're still unable to reproduce this, let me know, and I'll build
 >  a version of this that includes a sysinst that's built wwith debug
 >  symbols,  and see if I can see what is up in the core file.

 Indeed I am still unable to reproduce it. Must be something very strange
 going on.

 Could you just make the sysinst.core file available?

 Btw: the official builds all include debug.tgz which has all the debug
 symbols needed.

 Martin

From: Robert Elz <kre@munnari.OZ.AU>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: install/53220: sysinst dumps core with extended partitioning.
Date: Mon, 30 Apr 2018 03:51:32 +0700

     Date:        Sun, 29 Apr 2018 19:40:01 +0000 (UTC)
     From:        Martin Husemann <martin@duskware.de>
     Message-ID:  <20180429194001.3C0807A261@mollari.NetBSD.org>

   |  Indeed I am still unable to reproduce it. Must be something very strange
   |  going on.

 Yes, very weird.   It is so simple to make happen...

   |  Could you just make the sysinst.core file available?

 Sure, both of them:

 	ftp://munnari.oz.au/martin/8.0_rc1-sysinst.core
 	ftp://munnari.oz.au/martin/8.99.14-sysinst.core

   |  Btw: the official builds all include debug.tgz which has all the debug
   |  symbols needed.

 Hmm - mine don't.   What's different?   I simply do

 	build.sh [-x] [-u] [-P] release

 with MK_CONF=/dev/null

 (and settings for the obj, dest, and release dirs that suit
 my peculiar needs).   It is supposed to be as similar to
 the official builds as possible (at least when I include -x,
 I often omit that as I don't generally need it when I am
 just testing (though my source sets have extsrc.tgz
 instead of gnusrc.tgz - II build the source sets with my
 own script - variant of the normal one - as well.)

 My cd (dvd really) building differs as I add all kinds of
 other stuff (source sets, incl pkgsrc, soetimes some
 distfiles, ...) in the iso image - if I'm going to burn a dvd I
 mught as well fill it up with stuff that might be useful!

 I only mentioned that, as my build script for that is
 very old, and still uses the emulated floppy method
 of booting (or however it used to be done) so that
 the kernel boots info a MFS miniroot, rather than using
 the cd image as the root filesys, which I think is the
 more modern way.   I should update it someday...

 kre


From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: install/53220: sysinst dumps core with extended partitioning.
Date: Mon, 30 Apr 2018 10:56:57 +0200

 On Sun, Apr 29, 2018 at 08:55:01PM +0000, Robert Elz wrote:
 >  Sure, both of them:
 >  
 >  	ftp://munnari.oz.au/martin/8.0_rc1-sysinst.core
 >  	ftp://munnari.oz.au/martin/8.99.14-sysinst.core

 Hmm, both cores don't work for me - they have the $pc in some unmapped
 code:

 Core was generated by `sysinst'.
 Program terminated with signal SIGSEGV, Segmentation fault.
 #0  0x00000000004b11cc in ?? ()
 (gdb) bt
 #0  0x00000000004b11cc in ?? ()
 #1  0x0000000000000000 in ?? ()
 (gdb) x/16i $pc
 => 0x4b11cc:    Cannot access memory at address 0x4b11cc

 >    |  Btw: the official builds all include debug.tgz which has all the debug
 >    |  symbols needed.
 >  
 >  Hmm - mine don't.   What's different?   I simply do
 >  
 >  	build.sh [-x] [-u] [-P] release

 Add: -V MKDEBUG=yes

 Martin

From: Robert Elz <kre@munnari.OZ.AU>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: install/53220: sysinst dumps core with extended partitioning.
Date: Mon, 30 Apr 2018 23:24:34 +0700

     Date:        Mon, 30 Apr 2018 09:00:01 +0000 (UTC)
     From:        Martin Husemann <martin@duskware.de>
     Message-ID:  <20180430090001.5CE367A25E@mollari.NetBSD.org>


   |  Hmm, both cores don't work for me - they have the $pc in some unmapped
   |  code:
   |  
   |  #1  0x0000000000000000 in ?? ()
   |  (gdb) x/16i $pc
   |  => 0x4b11cc:    Cannot access memory at address 0x4b11cc

 That would explain the SEGV which generates the core file.

 sysinst does some indirect function calls:
 	 error = (*item->func)(found, numfound);
 for example, which could result in a jump into oblivion.

 the pm_edit function in particular takes 3 function pointers
 as args, and as that's about where we're going when it all
 falls apart, that's where I'd be suspicious.

   |  Add: -V MKDEBUG=yes

 OK, I will do that.   I'd have thought that the "official builds" should
 avoid using non-default options - would it perhaps be the right thing
 to do to make that be on by default, if that is what the releases are
 to use?

 I'll do an 

From: Robert Elz <kre@munnari.OZ.AU>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: install/53220: sysinst dumps core with extended partitioning.
Date: Tue, 01 May 2018 11:26:05 +0700

 Some progress on this ...

 The sysinst code contains this (disks.c)...

 static bool
 is_gpt(const char *dev)
 {
         check_available_binaries();

         if (!have_gpt)
                 return false;           

         return !run_program(RUN_SILENT | RUN_ERROR_OK,
                 "sh -c 'gpt show %s |grep -e Pri\\ GPT\\ table'", dev);
 }

 which looks kind of OK, except there is no "grep" in the MFS miniroot
 so that is always going to fail (return false).   "sed" is installed, and can
 generally used to do what grep can do, with the notable exception of
 exiting with a status indicating whether a string (or RE) exists in the file.

 I (in a local copy only, as I don't really think this is the right way) changed
 this to

         return !run_program(RUN_SILENT | RUN_ERROR_OK,
                 "sh -c 'test -n \"$(gpt show %s | while read line; do "
                 "case \"${line}\" in (*\"Pri GPT table\"*) echo FOUND;exit 0;;" 
                 "esac; done)\"'", dev);      

 which is somewhat convoluted, but does work to determine if there
 is a GPT label on the drive (variations that attempt to be simpler
 will probably fail.)   Provided gpt (which is tested for) and sh
 (kind of mandatory) exist (and given "test" is built into sh) this
 should be OK.

 A much better solution would be to have a new sub-command for
 gpt like

 	gpt [-q] query dev

 which would (if the already existing -q option is not given)
 print something like "GPT Primary Label @%d on %s\n"
 or "No GPT on %s" as appropriate, and then (regardless of -q)
 exit(0) if there is a GPT label, and exit(1) otherwise, then
 sysinst could just do:

 	return !run_program(RUN_SILENT | RUN_ERROR_OK,
 		"sh -c 'gpt -q query %s'", dev);

 (the name "query" for the command could be anything of course).
 (It could also just forget the "sh -c" and use (for the 2nd line):
 		"gpt -q qery %s", dev);


 But this one isn't the real problem, though it probably doesn't help.

 This is also in sysinst - in partman.c ...

 #define remove_vnd_options() (void)0
                 if (!have_vnd)
                         remove_vnd_options();
                 else if (!(vnds = calloc(MAX_VND, sizeof(*vnds))))
                         have_vnd = 0;

 Combine that with:

         have_vnd = binary_available("vnconfig");

 when "vnconfig" was renamed as "vndconfig" ages ago, and you
 have a recipe for potential disaster, when someone eventually
 decides that enough time has passed, and the old vnconfig
 link to vndconfig should simply be dropped (changing the 4
 occurrences of vnconfig into vndconfig should be easy!)

 Of course, the same recipe for disaster (not potential) exists when
 there is no insalled vnconfig (or vndconfig) program at all - which is
 the case on the MFS mini-roon that's embedded in the INSTALL
 kernel.

 Fixing this would mean actually writing the function, which
 given the simplicity of the other similar functions does't look
 like it should be too hard, eg, this example...

 void
 remove_cgd_options() 
 {
         /* 
          * No CGD available, remove the following menu entries:
          */
         remove_menu_option(MENU_pmdiskentry, MSG_encrypt);
         remove_menu_option(MENU_pmpartentry, MSG_encrypt);
 }

 (and similar for gpt, raid, lvm, cgd, color -- several of which also have
 no support on the MFS embedded mini-root)

 kre

From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: install/53220: sysinst dumps core with extended partitioning.
Date: Tue, 1 May 2018 11:20:06 +0200

 On Tue, May 01, 2018 at 04:30:01AM +0000, Robert Elz wrote:
 >  which looks kind of OK, except there is no "grep" in the MFS miniroot
 >  so that is always going to fail (return false).

 OK, you are using the ram disk based installers, while I tested with the
 CD image.

 >  A much better solution would be to have a new sub-command for
 >  gpt like
 >  
 >  	gpt [-q] query dev

 I did a very lazy poor man's version of that.

 >  when "vnconfig" was renamed as "vndconfig" ages ago, and you
 >  have a recipe for potential disaster, when someone eventually
 >  decides that enough time has passed, and the old vnconfig
 >  link to vndconfig should simply be dropped (changing the 4
 >  occurrences of vnconfig into vndconfig should be easy!)

 I replaced the name.

 >  Of course, the same recipe for disaster (not potential) exists when
 >  there is no insalled vnconfig (or vndconfig) program at all - which is
 >  the case on the MFS mini-roon that's embedded in the INSTALL
 >  kernel.

 We have the "have_vnd" variable to note that.

 >  Fixing this would mean actually writing the function, which
 >  given the simplicity of the other similar functions does't look
 >  like it should be too hard, eg, this example...
 >  
 >  void
 >  remove_cgd_options() 
 >  {
 >          /* 
 >           * No CGD available, remove the following menu entries:
 >           */
 >          remove_menu_option(MENU_pmdiskentry, MSG_encrypt);
 >          remove_menu_option(MENU_pmpartentry, MSG_encrypt);
 >  }

 Indeed, but I'm not quite sure what menus exactly you want to remove
 in this case.

 Martin

From: Robert Elz <kre@munnari.OZ.AU>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: install/53220: sysinst dumps core with extended partitioning.
Date: Tue, 01 May 2018 19:57:58 +0700

     Date:        Tue,  1 May 2018 09:25:01 +0000 (UTC)
     From:        Martin Husemann <martin@duskware.de>
     Message-ID:  <20180501092501.9703B7A219@mollari.NetBSD.org>

   |  OK, you are using the ram disk based installers, while I tested with the
   |  CD image.

 Yes, the original PR text from Lars-Johan Liman says:

 3. Boot a Xen virtual machine with the 8.0_RC1
    netbsd-INSTALL_XEN3_DOMU kernel.

 And then,  when I repeated it:

  Boot netbsd-INSTALL kernel from
         ${RELEASE}/amd64/binary/kernel/netbsd-INSTALL_XEN3_DOMU.gz

 though any netbsd-INSTALL kernel does the same (my earlier bare
 metal test did not use the XEN3_DOMU kernel, obviously.)

   |  >  	gpt [-q] query dev
   |  I did a very lazy poor man's version of that.

 That should be fine.   I had in the back of my mind the potential to
 add options to "gpt query" to test other things which a script might
 want to know, but if that need ever eventuates, it can be done then,
 and the "header" command can continue exit(1) for no GPT.

   |  >  Of course, the same recipe for disaster (not potential) exists when
   |  >  there is no insalled vnconfig (or vndconfig) program at all - which is
   |  >  the case on the MFS mini-roon that's embedded in the INSTALL
   |  >  kernel.
   |  
   |  We have the "have_vnd" variable to note that.

 We do, so when have_vnd is false, the code does...

                  if (!have_vnd)
                          (void)0();

 If someone was looking for a way to manufacture a core file
 in a way that gdb would fail to be of much help, I can't think of
 many easier strategies...

   |  Indeed, but I'm not quite sure what menus exactly you want to remove
   |  in this case.

 No idea, I did not look that deeply, but if there are none, then just not
 removing the not existing menus would be better than what it is doing now.

 That is, change:

                 if (!have_vnd)
                         remove_vnd_options();
                 else if (!(vnds = calloc(MAX_VND, sizeof(*vnds))))
                         have_vnd = 0;

 into
 	if (have_vnd && !(vnds = calloc(MAX_VND, sizeof(*vnds))))
 		have_vnd = 0;


 It is worth noting that that small segment is the only use of have_vnd
 anywhere, so why it sets have_vnd = 0 when it fails I have no idea.

 The
 	 if (firstrun) { 
 		/* ... */
 		firstrun = 0;
 	}

 block makes sure it is never done again, so that "have_vnd = 0"
 is assigning to a variable that is never examined again.    The
 other have_xxx variables are (well, some of them at least, I did
 not do an exhaustive search) also used elsewhere.

 That is, it sure looks as if the remove_xxx_options() code is really
 intended to remove stuff from the menus, so that other code which
 will not work wiuthout vndconfig  can never later attempt to execute it.

 And I do see ...

         if (have_lvm) { 
                 pm_upddevlist_adv(m, arg, &i,
                     &(pm_upddevlist_adv_t) {MSG_create_cnd, PM_VND_T, &vnds_t_in
 fo, 0, NULL});
                 pm_upddevlist_adv(m, arg, &i,
                     &(pm_upddevlist_adv_t) {MSG_create_vg, PM_LVM_T, &lvms_t_inf
 o, 0,                   
                     &(pm_upddevlist_adv_t) {MSG_create_lv, PM_LVMLV_T, &lv_t_inf
 o, 0, NULL}});  
         }               

 and also:

                 case PM_VND_T:
                         return pm_edit(PMV_MENU_END, pm_vnd_edit_menufmt,
                                 pm_vnd_set_value, pm_vnd_check, pm_vnd_init,
                                 NULL, ((part_entry_t *)arg)[m->cursel].dev_ptr, 


 and in pm_menufmt()

                 case PM_VND_T:          
                         pm_vnd_menufmt(m, opt, arg);
                         break; 

 all of which might be related.

 At a minimum, if vnds are needed for lvm config (why?  I thought lvm
 used vg not vnd ?) then perhaps

         have_lvm = binary_available("lvm");
 should become
         have_lvm = have vnd && binary_available("lvm");

 But I suspect that things are just more messed up than that.

 kre

 ps: aside: I also don't understand why from msg.mi.en the
 message "Create virtual disk image (VND)" (or from msg.mi.de
 "Erstellen Sie virtuelle Disk-Image (VND)" etc) is "message create_cnd"
 instead of "message create_vnd" - looks weird - but that's just a name.

From: Robert Elz <kre@munnari.OZ.AU>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: install/53220: sysinst dumps core with extended partitioning.
Date: Wed, 02 May 2018 04:03:30 +0700

 Ignore the vnd red herring, that turned out to be relevant to nothing (though
 the change from vnconfig to vndconfig was certainly the right thing to do).

 The actual cause for the core dump is the pm_lvm_find() function,
 it assumes that lvms has been allocated to point to a suitable block
 of storage, but when we !have_lvm:

                 if (!have_lvm)
                         remove_lvm_options();
                 else if (!(lvms = calloc(MAX_LVM_VG, sizeof(*lvms))))
                         have_lvm = 0;

 the menu entries related to lvm get dleeted, so there's no way to refer to
 them, but lvms is just left uninitialized.

 Later partman() does

                 menu_num_entries = pm_upddevlist(&(menudesc){.opts = menu_entrie

 and pm_upddevlist() does

         changed = 0;
         /* Mark all devices as not found */
         SLIST_FOREACH(pm_i, &pm_head, l)
                 if (pm_i->found > 0)
                         pm_i->found = 0;
         /* Detect all present devices */
         (void)find_disks("partman");
         pm_lvm_find();
         pm_clean();

 pm_lvm_find() does

       for (i = 0; i < MAX_LVM_VG; i++) { 
                 if (! lvms[i].blocked)  
                         continue; 

 (and much more) using that uninit'd lvms.

 This doesn't cause a problem with a fully populated root filesys
 sitting under sysinst, as in that case have_lvm == 1, lvms gets calloc()'d
 and all is fine.

 To test this, I actually made pm_lvm_find() simply return 0 if (!have_lvm )
 and with that change, the partman menu appeared (and at that point I
 stopped testing...   I did not have things set up in a way that the sets were
 available, and playing with the partition editor would have meant changing
 the setup I have used over and over again while hunting this down (just
 in case there turns out to be more to hunt it is nice to keep as is)

 However I think a better fix is:

 Index: partman.c
 ===================================================================
 RCS file: /cvsroot/src/usr.sbin/sysinst/partman.c,v
 retrieving revision 1.19
 diff -u -r1.19 partman.c
 --- partman.c	1 May 2018 08:27:39 -0000	1.19
 +++ partman.c	1 May 2018 20:54:13 -0000
 @@ -2609,7 +2609,8 @@
  			pm_i->found = 0;
  	/* Detect all present devices */
  	(void)find_disks("partman");
 -	pm_lvm_find();
 +	if (have_lvm)
 +		pm_lvm_find();
  	pm_clean();

  	if (m == NULL || arg == NULL)


 In addition it might also be a good idea to turn pm_lvm_find() into a
 function returning void, rather than its current int return, which is
 always 0, which is always (the fragment above is the only use)
 ignored - though that's just futzing around.

 kre

From: "Robert Elz" <kre@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/53220 CVS commit: src/usr.sbin/sysinst
Date: Tue, 1 May 2018 21:26:41 +0000

 Module Name:	src
 Committed By:	kre
 Date:		Tue May  1 21:26:41 UTC 2018

 Modified Files:
 	src/usr.sbin/sysinst: partman.c

 Log Message:
 PR install/53220

 Don't call pm_lvm_find() unless have_lvm

 pm_lvm_find() assumes that data structs (lvms) has been allocated
 for it to use, which doesn't happen if !have_lvm

 This avoids a sysinst core dump when the lvm command is not installed
 (such as when installing from the embedded RAM root filesys in an
 INSTALL kernel.)


 To generate a diff of this commit:
 cvs rdiff -u -r1.19 -r1.20 src/usr.sbin/sysinst/partman.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: Lars-Johan Liman <liman@cafax.se>
To: gnats-bugs@NetBSD.org
Cc: martin@NetBSD.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: install/53220: sysinst dumps core with extended partitioning.
Date: Wed, 02 May 2018 09:01:58 -0400

 Martin, Robert,

 >    |  >  	gpt [-q] query dev
 >    |  I did a very lazy poor man's version of that.

 kre@munnari.OZ.AU:
 >  That should be fine.

 What does "gpt show" return when there is no gpt table? Isn't the exit
 status from that sufficient for testing?

 (I'm currently behind a firewall from h-ll, so I can't test.)

 				Cheers,
 				  /Liman

From: Robert Elz <kre@munnari.OZ.AU>
To: Lars-Johan Liman <liman@cafax.se>
Cc: gnats-bugs@NetBSD.org, martin@NetBSD.org, netbsd-bugs@netbsd.org
Subject: Re: install/53220: sysinst dumps core with extended partitioning.
Date: Thu, 03 May 2018 06:55:18 +0700

     Date:        Wed, 02 May 2018 09:01:58 -0400
     From:        Lars-Johan Liman <liman@cafax.se>
     Message-ID:  <22po2e1a9l.fsf@netnod.se>

   | What does "gpt show" return when there is no gpt table?

 Nothing too interesting.

   | Isn't the exit status from that sufficient for testing?

 No, it exits 0 for that,

 I guess it could be changed, but the show command works,
 it isn't failing - I think Martin's decision to make the header
 command exit(1) is more appropriate - dumping the header
 if there is none can be seen as an error.

 kre

From: Martin Husemann <martin@duskware.de>
To: Robert Elz <kre@munnari.OZ.AU>
Cc: Lars-Johan Liman <liman@cafax.se>, gnats-bugs@NetBSD.org
Subject: Re: install/53220: sysinst dumps core with extended partitioning.
Date: Thu, 3 May 2018 08:13:10 +0200

 On Thu, May 03, 2018 at 06:55:18AM +0700, Robert Elz wrote:
 >   | Isn't the exit status from that sufficient for testing?
 > 
 > No, it exits 0 for that,

 My initial idea was to change that, but:
 "gpt show" also works on MBR labeld disks and prints reasonable output
 there, so it seems wrong to return EXIT_FAILURE in that case.

 Martin

From: Lars-Johan Liman <liman@cafax.se>
To: Martin Husemann <martin@duskware.de>
Cc: Robert Elz <kre@munnari.OZ.AU>, gnats-bugs@NetBSD.org
Subject: Re: install/53220: sysinst dumps core with extended partitioning.
Date: Thu, 03 May 2018 10:48:17 -0400

 martin@duskware.de:
 > On Thu, May 03, 2018 at 06:55:18AM +0700, Robert Elz wrote:
 >> | Isn't the exit status from that sufficient for testing?
 >> 
 >> No, it exits 0 for that,

 > My initial idea was to change that, but:
 > "gpt show" also works on MBR labeld disks and prints reasonable output
 > there, so it seems wrong to return EXIT_FAILURE in that case.

 Ah. Right. Agreed.

From: "Soren Jacobsen" <snj@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/53220 CVS commit: [netbsd-8] src/usr.sbin/sysinst
Date: Mon, 7 May 2018 04:06:31 +0000

 Module Name:	src
 Committed By:	snj
 Date:		Mon May  7 04:06:31 UTC 2018

 Modified Files:
 	src/usr.sbin/sysinst [netbsd-8]: partman.c

 Log Message:
 Pull up following revision(s) (requested by kre in ticket #801):
 	usr.sbin/sysinst/partman.c: 1.20-1.21
 PR install/53220
 Don't call pm_lvm_find() unless have_lvm
 pm_lvm_find() assumes that data structs (lvms) has been allocated
 for it to use, which doesn't happen if !have_lvm
 This avoids a sysinst core dump when the lvm command is not installed
 (such as when installing from the embedded RAM root filesys in an
 INSTALL kernel.)
 --
 Change return type of pm_lvm_find() from int to void.
 It always returns (returned) 0 which was ignored by the one call.


 To generate a diff of this commit:
 cvs rdiff -u -r1.15.6.2 -r1.15.6.3 src/usr.sbin/sysinst/partman.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

>Unformatted:

NetBSD Home
NetBSD PR Database Search

(Contact us) $NetBSD: query-full-pr,v 1.43 2018/01/16 07:36:43 maya Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2017 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.