NetBSD Problem Report #56310

From riz@netbsd.org  Wed Jul 14 18:25:47 2021
Return-Path: <riz@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id CC1211A921F
	for <gnats-bugs@gnats.NetBSD.org>; Wed, 14 Jul 2021 18:25:47 +0000 (UTC)
Message-Id: <20210714182546.E4F0B1985D1@morden.netbsd.org>
Date: Wed, 14 Jul 2021 18:25:46 +0000 (UTC)
From: riz@netbsd.org
Reply-To: riz@netbsd.org
To: gnats-bugs@NetBSD.org
Subject: segfault when partitioning
X-Send-Pr-Version: 3.95

>Number:         56310
>Category:       install
>Synopsis:       creating RAID and ZFS partitions on GPT -> segfault
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    martin
>State:          feedback
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Jul 14 18:30:00 +0000 2021
>Closed-Date:    
>Last-Modified:  Sat Jul 17 18:20:01 +0000 2021
>Originator:     Jeff Rizzo
>Release:        NetBSD 9.99.86 (INSTALL) #0: Sun Jul 11 10:51:46 UTC 2021
>Organization:

>Environment:
System: [   1.0000000] NetBSD 9.99.86 (INSTALL) #0: Sun Jul 11 10:51:46 UTC 2021
[   1.0000000]  mkrepro@mkrepro.NetBSD.org:/usr/src/sys/arch/amd64/compile/INSTALL
[   1.0000000] total memory = 73718 MB
[   1.0000000] avail memory = 71418 MB
Architecture: x86_64
Machine: amd64
>Description:
	System has 4 new 4TB disks:
[   5.9892527] wd0: <ST4000NM0115-1YZ107>
[   6.0392521] wd0: drive supports 16-sector PIO transfers, LBA48 addressing
[   6.0392521] wd0: 3726 GB, 7752021 cyl, 16 head, 63 sec, 512 bytes/sect x 7814037168 sectors (4096 bytes/physsect; first aligned sector: 8)
[   6.4292526] wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133), WRITE DMA FUA, NCQ (32 tags)
[   6.4292526] wd0(ahcisata0:0:0): using PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133) (using DMA), NCQ (31 tags)

... I want to run a mostly-ZFS system, but root-on-zfs isn't there yet.  So,
the intent is to set up all 4 disks with a raidframe (20GB) and a ZFS (rest)
partition, and run RAID1 root with a zpool for most data, possibly pivoting
to root.  First step, partition the disks - there's where trouble starts.

Step-by-step is included in "How-To-Repeat", but basically, "choose
wd0, choose GPT, add RAID partition, add ZFS partition, say I'm done
with wd0, *boom*"

Update:  looks like the ZFS part isn't needed - just adding the RAID
partition (GPT) makes it go boom.

In case it matters, machine is PXE booted, loading the amd64 netbsd-INSTALL.gz
kernel from 20210711 over tftp.

>How-To-Repeat:

	From main sysinst menu:

Choose "a: Installation messages in English"
Choose "e: Utility menu"
Choose "d: Partition a disk"
(Obviously, this part below may vary)
Choose ">a: wd0 (3.6T)                                                  UNCHANGED"
Choose "a: Edit partitions"
Choose ">a: Guid Partition Table (GPT)"

... since this is a new drive, no partitions exist

Choose "c: Change input units (sectors/cylinders/MB/GB)"
Choose "a: Gigabytes"
Choose "b: Add a partition"
Choose "a:             type :" -> set to RAID (t)
Choose "c:             size :" -> set to 20GB
Choose "x: OK"
Choose "x: Partition sizes ok"
[1]   Segmentation fault (core dumped) sysinst

This is the final screen where it crashes:

 We now have your GPT partitions for wd0 below.  This is your last chance to
 change them.

 Flags: (I)nstall, (N)ewfs, (B)ootable.  Total size: 3726G, free: 0B

      Start (GB)     End (GB)    Size (GB)  FS type Flag Filesystem
    ------------ ------------ ------------ -------- ---- ----------------
 a:            0           19           19     RAID
    ------------ ------------ ------------ -------- ---- ----------------
 c: Add a partition
 d: Change input units (sectors/cylinders/MB/GB)
 e: Clone external partition(s)
 f: Cancel
 x: Partition sizes ok
>Fix:
none given

>Release-Note:

>Audit-Trail:

Responsible-Changed-From-To: install-manager->martin
Responsible-Changed-By: martin@NetBSD.org
Responsible-Changed-When: Wed, 14 Jul 2021 18:33:34 +0000
Responsible-Changed-Why:
take


From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: install/56310: segfault when partitioning
Date: Fri, 16 Jul 2021 20:34:07 +0200

 I can only reproduce this when the disk already has wedges defined for
 it.

 In your case:

 	dkctl wd0 listwedges

 would need to show some "dk" wedges on the device.

 If that is not the case, or after removing any wedges with

 	dkctl wd0 delwedge dkN

 before running sysinst, the partitioning works fine for me.

 The pre-existing wedges make creation of new wedges (by sysinst) fail
 and it does not properly deal with that in all cases.
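
 A minimal sketch of that cleanup, as a dry run that only prints the
 delwedge commands. It assumes that "dkctl wd0 listwedges" prints one
 line per wedge starting with "dkN:"; the function names wedge_names and
 print_delwedge_cmds are made up for this illustration.

```shell
# Print the dkN device names found in "dkctl <disk> listwedges" output
# read from stdin.
# ASSUMPTION: each wedge line starts with "dkN:"; other lines are ignored.
wedge_names() {
    sed -n 's/^\(dk[0-9][0-9]*\):.*/\1/p'
}

# Dry run: print one "dkctl <disk> delwedge dkN" command per wedge found
# on stdin. Nothing is deleted; review the output before running it.
print_delwedge_cmds() {
    disk=$1
    wedge_names | while read -r dk; do
        echo dkctl "$disk" delwedge "$dk"
    done
}

# Usage on the affected machine, before starting sysinst:
#   dkctl wd0 listwedges | print_delwedge_cmds wd0
```

 Printing the commands instead of executing them leaves the decision to
 remove a wedge with the operator, which matters when a disk holds data.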

 If that is not your failure, I am not sure what I am doing differently.

 Martin

From: riz@NetBSD.org
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: install/56310: segfault when partitioning
Date: Fri, 16 Jul 2021 11:41:21 -0700

 This is ... odd.


 The drives are now, unfortunately, in use with data - but while trying 
 various things, I got crashes even while trying to edit an existing GPT 
 on the disks (after trying adding partitions by hand);  I also saw the 
 same crash with sysinst from 9.2.

 That machine is now running -current, but I can try building sysinst and 
 duplicating that crash in multiuser mode; it'll probably be a few days 
 before I can afford to be without one of the disks long enough to try it.


From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: install/56310: segfault when partitioning
Date: Sat, 17 Jul 2021 07:59:43 +0200

 On Fri, Jul 16, 2021 at 06:45:01PM +0000, riz@NetBSD.org wrote:
 >  The drives are now, unfortunately, in use with data - but while trying 
 >  various things, I got crashes even while trying to edit an existing GPT 
 >  on the disks (after trying adding partitions by hand);  I also saw the 
 >  same crash with sysinst from 9.2.

 OK, an existing GPT with wedges already auto-configured, followed by the
 GPT being destroyed some other way [not "gpt destroy wd0" or via sysinst],
 would do it.

 I'll fix that part first, and then we can see if you can still reproduce it
 (I can help with testing in non-intrusive ways).

 Martin

From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/56310 CVS commit: src/usr.sbin/sysinst
Date: Sat, 17 Jul 2021 11:32:51 +0000

 Module Name:	src
 Committed By:	martin
 Date:		Sat Jul 17 11:32:50 UTC 2021

 Modified Files:
 	src/usr.sbin/sysinst: gpt.c partman.c

 Log Message:
 PR 56310: avoid assert() failures (or crashes) when the runtime addition
 of a wedge fails (for whatever reasons).


 To generate a diff of this commit:
 cvs rdiff -u -r1.23 -r1.24 src/usr.sbin/sysinst/gpt.c
 cvs rdiff -u -r1.51 -r1.52 src/usr.sbin/sysinst/partman.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/56310 CVS commit: src/usr.sbin/sysinst
Date: Sat, 17 Jul 2021 18:07:23 +0000

 Module Name:	src
 Committed By:	martin
 Date:		Sat Jul 17 18:07:23 UTC 2021

 Modified Files:
 	src/usr.sbin/sysinst: gpt.c

 Log Message:
 PR 56310: if we fail to create a wedge this either means there is
 a bug here (and we requested something nonsensical), or there are
 pre-existing "foreign" wedges which disturb our work.
 So remove all wedges on this disk that we do not know about and retry
 to add our new wedge.


 To generate a diff of this commit:
 cvs rdiff -u -r1.24 -r1.25 src/usr.sbin/sysinst/gpt.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: open->feedback
State-Changed-By: martin@NetBSD.org
State-Changed-When: Sat, 17 Jul 2021 18:15:10 +0000
State-Changed-Why:
I think it is fixed


From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: install/56310: segfault when partitioning
Date: Sat, 17 Jul 2021 20:14:08 +0200

 I have fixed a few things and can no longer reproduce the crashes (or
 assertion failures in non-crunched builds).

 If you don't have a spare drive to test, you can use a vnd, like:

  - grep in dmesg for the sector count of the original disk and store it
    in a shell variable, e.g. sectors=7814037168
  - cd /var/tmp && dd if=/dev/zero of=disk.img count=1 oseek=$((sectors - 1)) &&
 	vnconfig -c vnd0 disk.img
  - use sysinst to partition vnd0
  - vnconfig -u vnd0 && rm disk.img
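
 The first two steps can be sketched as below. This is only an
 illustration: sector_count and last_sector are hypothetical helper
 names, and the dmesg line is assumed to look like the wd0 geometry line
 quoted in the Description ("... 512 bytes/sect x N sectors ...").

```shell
# Extract the sector count N from a dmesg geometry line on stdin.
# ASSUMPTION: the line contains "... x N sectors ..."; only the first
# matching line is used.
sector_count() {
    sed -n 's/.* x \([0-9][0-9]*\) sectors.*/\1/p' | head -n 1
}

# Print the index of the last sector, i.e. the oseek value for dd so
# that writing one block yields a file of exactly N sectors.
last_sector() {
    echo $(( $1 - 1 ))
}

# Usage on the test machine:
#   sectors=$(dmesg | grep 'wd0:' | sector_count)
#   cd /var/tmp &&
#   dd if=/dev/zero of=disk.img count=1 oseek=$(last_sector "$sectors") &&
#   vnconfig -c vnd0 disk.img
```

 Seeking to the last sector and writing a single block gives an image of
 the same size as the real disk without writing every block in between.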

 Martin

>Unformatted:
