NetBSD Problem Report #56917

From kre@munnari.OZ.AU  Fri Jul  8 11:20:40 2022
Return-Path: <kre@munnari.OZ.AU>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 718361A921F
	for <gnats-bugs@gnats.NetBSD.org>; Fri,  8 Jul 2022 11:20:40 +0000 (UTC)
Message-Id: <202207081102.268B2mAk001056@jacaranda.noi.kre.to>
Date: Fri, 8 Jul 2022 18:02:48 +0700 (+07)
From: kre@munnari.OZ.AU
Reply-To:
To: gnats-bugs@NetBSD.org
Subject: raidctl -c configuration fails if NAME=wedge component is missing
X-Send-Pr-Version: 3.95

>Number:         56917
>Category:       bin
>Synopsis:       raidctl -c configuration fails if NAME=wedge component is missing
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    bin-bug-people
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri Jul 08 11:25:00 +0000 2022
>Closed-Date:    Thu Jul 21 09:24:56 +0000 2022
>Last-Modified:  Thu Jul 21 09:24:56 +0000 2022
>Originator:     kre@munnari.OZ.AU
>Release:        NetBSD 9.99.97
>Organization:
>Environment:
System: NetBSD jacaranda.noi.kre.to 9.99.97 NetBSD 9.99.97 (GENERIC) #2: Wed Jun 8 01:46:15 +07 2022 kre@jacaranda.noi.kre.to:/usr/obj/current/amd64/sys/arch/amd64/compile/GENERIC amd64
Architecture: x86_64
Machine: amd64
>Description:
	When not using raidframe autoconfiguration, but configuring via a
	raidctl config file (raidctl -c ...) using raidframe on wedges
	(and hence using NAME=wedge-name in the config file, rather than
	/dev/dkN as the latter is more or less meaningless) if the wedge
	is not found, the config fails, and the raid set is not configured
	at all.

	On the other hand, if using autoconfiguration, the missing component
	is simply "failed" and the system works, with the raid in degraded
	mode, just fine.

	Since the whole idea of raid is to keep systems working when some of
	the storage has failed, it seems like it would be a good idea to
	continue with that when using -c and wedges.

	When not using NAME= type config (or perhaps ROOT.x which is less
	likely, though not impossible, in the same position) the device name
	is simply passed through from the config file to the kernel raidframe,
	which fails to access it (if missing), and so degrades the raid.
	When NAME= is used, getfsspecname() fails, and there is no device
	name to send - the string that is sent to the kernel is meaningless
	to it, and the raidframe config fails entirely.
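
	The difference between the two paths can be sketched as below.
	This is only an illustration, not the raidctl source: resolve_spec()
	is a hypothetical stand-in for getfsspecname(), its lookup table is
	invented, and component_name() is a made-up helper.  The one thing
	taken from this report is the idea of sending the literal string
	"absent" when the lookup fails.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for getfsspecname(): maps "NAME=label" to a
 * device path, or returns NULL when no such wedge is present.  The
 * table of present wedges is invented for illustration only.
 */
static const char *
resolve_spec(char *buf, size_t buflen, const char *spec)
{
	static const char *const present[][2] = {
		{ "NAME=raid-a", "/dev/dk0" },
	};
	size_t i;

	if (strncmp(spec, "NAME=", 5) != 0) {
		/* Plain device names are passed through unchanged. */
		snprintf(buf, buflen, "%s", spec);
		return buf;
	}
	for (i = 0; i < sizeof(present) / sizeof(present[0]); i++) {
		if (strcmp(spec, present[i][0]) == 0) {
			snprintf(buf, buflen, "%s", present[i][1]);
			return buf;
		}
	}
	return NULL;	/* wedge not found */
}

/*
 * Pick the component name to hand to the kernel: the resolved device
 * if there is one, otherwise the literal "absent" instead of the
 * unresolved NAME= string.
 */
static const char *
component_name(char *buf, size_t buflen, const char *spec)
{
	const char *b;

	assert(buflen > 0);
	b = resolve_spec(buf, buflen, spec);
	if (b == NULL) {
		fprintf(stderr,
		    "warning: unable to get device file for %s\n", spec);
		b = "absent";
	}
	return b;
}
```

	With this shape, a missing wedge yields a component name that
	raidframe can treat as a failed component, rather than a NAME=
	string it cannot interpret.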

	Or that's what looks to be happening to me.  I have reasons for not
	using raid autoconfig at the minute.  I was using it, and things
	worked when a drive vanished (that has a tendency to happen sometimes
	on my system due to BIOS/NetBSD "issues").  I stopped so that an rc.d
	script, which checks for the drive(s) being missing and aborts the
	boot, could notify me that a drive had vanished before raidframe
	started.  With autoconfigured raid, that check ran only after the
	raidframe had been configured with the missing component, requiring
	a reconstruction once I beat the BIOS into submission and the drive
	returned.  So I disabled raid autoconf, so that raidframe does not
	start at all until after the rc.d script verifies that all the drives
	are at least present.

>How-To-Repeat:
	Build a raidframe out of wedges (a raid1 from 2 of them - on different
	drives - will do).   Make the config file use NAME=wedge-name to select
	the drives (partitions of the drives) to use.   Configure the raid,
	initialize it, partition it if you want, makefs ... (ie: use the thing).
	Do not turn on raid autoconfig for this raidset (so no root on this
	raid).   Remove one of the drives being used by the raidset (or if that
	is too extreme an action to test this, just relabel one of the relevant
	wedges, so the NAME=wedge-name for one of the wedges doesn't match a
	wedge that is present in the system).   Reboot (with raidframe=YES in
	rc.conf).   Observe that the raid set is not configured at all.
	(If you had raid autoconfig turned on, and did the same thing, the
	raid set would be configured, in degraded mode - the missing component
	marked as failed - but just altering wedge names is no use to test
	that case, the wedge really must be absent).

>Fix:
	I am not sure this will work (I'm yet to test it) but I think
	this patch might allow the raidframe to configure:

Index: rf_configure.c
===================================================================
RCS file: /cvsroot/src/sbin/raidctl/rf_configure.c,v
retrieving revision 1.36
diff -u -r1.36 rf_configure.c
--- rf_configure.c	14 Jun 2022 08:06:13 -0000	1.36
+++ rf_configure.c	8 Jul 2022 11:00:05 -0000
@@ -278,7 +278,7 @@
 			warnx("Config file error: warning: unable to "
 			    "get device file for disk at col %d: %s",
 			    c, b1);
-			b = buf;
+			b = "absent";
 		}

 		strlcpy(cfgPtr->devnames[0][c], b,

>Release-Note:

>Audit-Trail:
From: "Robert Elz" <kre@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/56917 CVS commit: src/sbin/raidctl
Date: Thu, 21 Jul 2022 09:19:54 +0000

 Module Name:	src
 Committed By:	kre
 Date:		Thu Jul 21 09:19:54 UTC 2022

 Modified Files:
 	src/sbin/raidctl: rf_configure.c

 Log Message:
 PR bin/56917

 If getfsspecname() fails that will usually mean that a NAME=wedge or
 ROOT.x partition is unavailable.   raidframe specifies unavailable
 partitions as "absent" so in this case, pass "absent" rather than the
 unaltered NAME= or ROOT.x string, which the kernel has no clue what
 to do with, and doesn't configure the raid at all.


 To generate a diff of this commit:
 cvs rdiff -u -r1.36 -r1.37 src/sbin/raidctl/rf_configure.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: open->closed
State-Changed-By: kre@NetBSD.org
State-Changed-When: Thu, 21 Jul 2022 09:24:56 +0000
State-Changed-Why:
Patch supplied applied


>Unformatted:
