NetBSD Problem Report #56917
From kre@munnari.OZ.AU Fri Jul 8 11:20:40 2022
Return-Path: <kre@munnari.OZ.AU>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id 718361A921F
for <gnats-bugs@gnats.NetBSD.org>; Fri, 8 Jul 2022 11:20:40 +0000 (UTC)
Message-Id: <202207081102.268B2mAk001056@jacaranda.noi.kre.to>
Date: Fri, 8 Jul 2022 18:02:48 +0700 (+07)
From: kre@munnari.OZ.AU
Reply-To:
To: gnats-bugs@NetBSD.org
Subject: raidctl -c configuration fails if NAME=wedge component is missing
X-Send-Pr-Version: 3.95
>Number: 56917
>Category: bin
>Synopsis: raidctl -c configuration fails if NAME=wedge component is missing
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: bin-bug-people
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Fri Jul 08 11:25:00 +0000 2022
>Closed-Date: Thu Jul 21 09:24:56 +0000 2022
>Last-Modified: Thu Jul 21 09:24:56 +0000 2022
>Originator: kre@munnari.OZ.AU
>Release: NetBSD 9.99.97
>Organization:
>Environment:
System: NetBSD jacaranda.noi.kre.to 9.99.97 NetBSD 9.99.97 (GENERIC) #2: Wed Jun 8 01:46:15 +07 2022 kre@jacaranda.noi.kre.to:/usr/obj/current/amd64/sys/arch/amd64/compile/GENERIC amd64
Architecture: x86_64
Machine: amd64
>Description:
When not using raidframe autoconfiguration, but configuring via a
raidctl config file (raidctl -c ...) using raidframe on wedges
(and hence using NAME=wedge-name in the config file, rather than
/dev/dkN as the latter is more or less meaningless) if the wedge
is not found, the config fails, and the raid set is not configured
at all.
On the other hand, if using autoconfiguration, the missing component
is simply "failed" and the system works, with the raid in degraded
mode, just fine.
Since the whole idea of raid is to keep systems working when some of
the storage has failed, it seems like it would be a good idea to
continue with that when using -c and wedges.
When not using NAME= type config (or perhaps ROOT.x which is less
likely, though not impossible, in the same position) the device name
is simply passed through from the config file to the kernel raidframe,
which fails to access it (if missing), and so degrades the raid.
When NAME= is used, getfsspecname() fails, and there is no device
name to send - the string that is sent to the kernel is meaningless
to it, and the raidframe config fails entirely.
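
The failure path described above can be sketched roughly as follows.
This is only an illustration, not the raidctl source: lookup_spec() is
a hypothetical stand-in for the getfsspecname() lookup, and the wedge
name and device paths are made up. It models the behaviour before any
fix, where a failed lookup leaves the raw config string to be sent to
the kernel.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for the wedge lookup. It knows about exactly
 * one wedge ("raid-a"); any other NAME= or ROOT. spec is treated as
 * missing. Plain device paths are returned unchanged.
 */
static const char *
lookup_spec(const char *spec)
{
	if (strcmp(spec, "NAME=raid-a") == 0)
		return "/dev/dk3";	/* made-up device for illustration */
	if (strncmp(spec, "NAME=", 5) == 0 || strncmp(spec, "ROOT.", 5) == 0)
		return NULL;		/* no such wedge on this system */
	return spec;
}

/*
 * Behaviour as described in this report: when the lookup fails, the
 * unaltered spec from the config file is what gets passed on.
 */
static void
name_for_kernel(const char *spec, char *out, size_t outlen)
{
	const char *b = lookup_spec(spec);

	if (b == NULL)
		b = spec;	/* kernel receives e.g. "NAME=raid-b" */
	snprintf(out, outlen, "%s", b);
}
```

With this sketch, "/dev/wd1e" is passed through as-is, "NAME=raid-a"
becomes "/dev/dk3", and "NAME=raid-b" stays "NAME=raid-b", which is
the string the kernel cannot interpret.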
Or that's what looks to be happening to me. I have reasons for not
using raid autoconfig at the minute. I was using it, and things worked
when a drive vanished (that has a tendency to happen sometimes on my
system due to BIOS/NetBSD "issues"). I wanted earlier notification
that the drive had vanished, before raidframe started, using an rc.d
script that checks for the drive(s) being missing and aborts the
boot. But with autoconfigured raid, that script ran only after the
raidframe was configured with the missing component, requiring a
reconstruction when I beat the BIOS into submission and the drive
returned. So I disabled raid autoconf, so that raidframe would not
start at all until after the rc.d script verifies that all the drives
are at least present.
>How-To-Repeat:
Build a raidframe out of wedges (a raid1 from 2 of them - on different
drives - will do). Make the config file use NAME=wedge-name to select
the drives (partitions of the drives) to use. Configure the raid,
initialize it, partition it if you want, makefs ... (ie: use the thing).
Do not turn on raid autoconfig for this raidset (so no root on this
raid). Remove one of the drives being used by the raidset (or if that
is too extreme an action to test this, just relabel one of the relevant
wedges, so the NAME=wedge-name for one of the wedges doesn't match a
wedge that is present in the system). Reboot (with raidframe=YES in
rc.conf). Observe that the raid set is not configured at all.
(If you had raid autoconfig turned on, and did the same thing, the
raid set would be configured, in degraded mode - the missing component
marked as failed - but just altering wedge names is no use to test
that case, the wedge really must be absent).
>Fix:
I am not sure this will work (I'm yet to test it) but I think
this patch might allow the raidframe to configure:
Index: rf_configure.c
===================================================================
RCS file: /cvsroot/src/sbin/raidctl/rf_configure.c,v
retrieving revision 1.36
diff -u -r1.36 rf_configure.c
--- rf_configure.c 14 Jun 2022 08:06:13 -0000 1.36
+++ rf_configure.c 8 Jul 2022 11:00:05 -0000
@@ -278,7 +278,7 @@
warnx("Config file error: warning: unable to "
"get device file for disk at col %d: %s",
c, b1);
- b = buf;
+ b = "absent";
}
strlcpy(cfgPtr->devnames[0][c], b,
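
To show the intended effect of the patch across a whole raid set, here
is a small standalone sketch. It is an assumption-laden model, not the
code in rf_configure.c: resolve_or_null(), fill_devnames(), MAXCOLS,
NAMELEN, and the wedge and device names are all hypothetical. The one
detail taken from the patch is that a component whose lookup fails is
named "absent" instead of keeping its NAME= string.

```c
#include <stdio.h>
#include <string.h>

#define MAXCOLS	4	/* hypothetical limit for this sketch */
#define NAMELEN	64	/* hypothetical name buffer size */

/* Hypothetical lookup: only the wedge "raid-a" exists. */
static const char *
resolve_or_null(const char *spec)
{
	if (strncmp(spec, "NAME=", 5) != 0 && strncmp(spec, "ROOT.", 5) != 0)
		return spec;		/* plain device path, pass through */
	if (strcmp(spec, "NAME=raid-a") == 0)
		return "/dev/dk3";	/* made-up device for illustration */
	return NULL;			/* wedge not found */
}

/*
 * Fill in one device name per column. A component that cannot be
 * resolved is recorded as "absent", as in the patch above.
 * Returns the number of components recorded as absent.
 */
static int
fill_devnames(const char *const specs[], int ncols,
    char devnames[][NAMELEN])
{
	int c, missing = 0;

	for (c = 0; c < ncols; c++) {
		const char *b = resolve_or_null(specs[c]);

		if (b == NULL) {
			fprintf(stderr, "warning: no device for disk "
			    "at col %d: %s\n", c, specs[c]);
			b = "absent";
			missing++;
		}
		snprintf(devnames[c], NAMELEN, "%s", b);
	}
	return missing;
}
```

For a two-component set with specs "NAME=raid-a" and "NAME=raid-b",
this fills in "/dev/dk3" and "absent" and reports one missing
component, which is the situation in which the raid should come up
degraded rather than not at all.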
>Release-Note:
>Audit-Trail:
From: "Robert Elz" <kre@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/56917 CVS commit: src/sbin/raidctl
Date: Thu, 21 Jul 2022 09:19:54 +0000
Module Name: src
Committed By: kre
Date: Thu Jul 21 09:19:54 UTC 2022
Modified Files:
src/sbin/raidctl: rf_configure.c
Log Message:
PR bin/56917
If getfsspecname() fails that will usually mean that a NAME=wedge or
ROOT.x partition is unavailable. raidframe specifies unavailable
partitions as "absent" so in this case, pass "absent" rather than the
unaltered NAME= or ROOT.x string, which the kernel has no clue what
to do with, and so doesn't configure the raid at all.
To generate a diff of this commit:
cvs rdiff -u -r1.36 -r1.37 src/sbin/raidctl/rf_configure.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: open->closed
State-Changed-By: kre@NetBSD.org
State-Changed-When: Thu, 21 Jul 2022 09:24:56 +0000
State-Changed-Why:
Patch supplied applied
>Unformatted: