NetBSD Problem Report #54858

From bernd@bor.bersie.loc  Tue Jan 14 10:56:20 2020
Return-Path: <bernd@bor.bersie.loc>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id DC76E7A187
	for <gnats-bugs@gnats.NetBSD.org>; Tue, 14 Jan 2020 10:56:19 +0000 (UTC)
Message-Id: <20200114093946.B902B315FD@bor.bersie.loc>
Date: Tue, 14 Jan 2020 10:39:46 +0100 (CET)
From: bernd.sieker@posteo.net
Reply-To: bernd.sieker@posteo.net
To: gnats-bugs@NetBSD.org
Subject: RAIDframe parity map shows all regions dirty despite being clean.
X-Send-Pr-Version: 3.95

>Number:         54858
>Category:       bin
>Synopsis:       RAIDframe parity map display in raidctl shows all regions dirty despite being clean
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    bin-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue Jan 14 11:00:00 +0000 2020
>Last-Modified:  Wed Jan 15 00:10:01 +0000 2020
>Originator:     bernd.sieker@posteo.net
>Release:        NetBSD 8.1
>Organization:
>Environment:
System: NetBSD bor 8.1 NetBSD 8.1 (BOR) #2: Mon Jan 13 10:00:26 CET 2020 bernd@bor:/usr/src/sys/arch/amd64/compile/BOR amd64

ident raidctl:
/sbin/raidctl:
     $NetBSD: crt0.S,v 1.3 2011/07/01 02:59:05 joerg Exp $
     $NetBSD: crt0-common.c,v 1.14 2016/06/07 12:07:35 joerg Exp $
     $NetBSD: crti.S,v 1.1 2010/08/07 18:01:35 joerg Exp $
     $NetBSD: crtbegin.S,v 1.2 2010/11/30 18:37:59 joerg Exp $
     $NetBSD: rf_configure.c,v 1.26.8.1 2018/09/10 17:56:00 martin Exp $
     $NetBSD: raidctl.c,v 1.65 2016/01/06 22:57:44 wiz Exp $
     $NetBSD: raidctl_hostops.c,v 1.3 2017/01/10 20:47:05 christos Exp $
     $NetBSD: crtend.S,v 1.1 2010/08/07 18:01:34 joerg Exp $
     $NetBSD: crtn.S,v 1.1 2010/08/07 18:01:35 joerg Exp $

ldd raidctl:
/sbin/raidctl:
        -lutil.7 => /lib/libutil.so.7
        -lc.12 => /lib/libc.so.12
Architecture: x86_64
Machine: amd64
>Description:
I set up a new RAID-1 with one component "absent", made a filesystem,
and copied the contents over from an old filesystem.
Then I added a second disk as a hot spare and initiated reconstruction
onto it. Afterwards I rebooted so that raidctl -s would show a "cleaner"
setup with both disks listed as regular components.
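
For reference, the sequence was roughly the following; the configuration
file, wedge names and the position of the "absent" component are
illustrative rather than copied from my shell history. The config file
(one real component, one marked "absent"):

START array
# numRow numCol numSpare
1 2 0
START disks
/dev/dk0
absent
START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level
128 1 1 1
START queue
fifo 100

Then, as root:

# raidctl -C /etc/raid0.conf raid0    (force-configure the degraded set)
# raidctl -I 2020011100 raid0         (write the initial component labels)
# raidctl -A yes raid0                (mark the set for autoconfiguration)
  ... gpt + newfs on the raid0 wedge, copy the data over ...
# raidctl -a /dev/dk1 raid0           (add the second wedge as a hot spare)
# raidctl -F component1 raid0         (fail the "absent" slot, as named by
                                       raidctl -s, and reconstruct onto the spare)
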
It all works as expected, and raidctl -s shows that all is fine:

$ raidctl -s raid0
Components:
            /dev/dk0: optimal
            /dev/dk1: optimal
No spares.
Component label for /dev/dk0:
   Row: 0, Column: 0, Num Rows: 1, Num Columns: 2
   Version: 2, Serial Number: 2020011100, Mod Counter: 390
   Clean: No, Status: 0
   sectPerSU: 128, SUsPerPU: 1, SUsPerRU: 1
   Queue size: 100, blocksize: 512, numBlocks: 7814036992
   RAID Level: 1
   Autoconfig: Yes
   Root partition: No
   Last configured as: raid0
Component label for /dev/dk1:
   Row: 0, Column: 1, Num Rows: 1, Num Columns: 2
   Version: 2, Serial Number: 2020011100, Mod Counter: 390
   Clean: No, Status: 0
   sectPerSU: 128, SUsPerPU: 1, SUsPerRU: 1
   Queue size: 100, blocksize: 512, numBlocks: 7814036992
   RAID Level: 1
   Autoconfig: Yes
   Root partition: No
   Last configured as: raid0
Parity status: clean
Reconstruction is 100% complete.
Parity Re-write is 100% complete.
Copyback is 100% complete.

raidctl -p agrees that parity is clean, too:

$ raidctl -p raid0
/dev/rraid0: Parity status: clean

However, raidctl -mv shows an all-dirty parity map:

$ raidctl -mv raid0
raid0: parity map enabled with 4096 regions of 932MB
raid0: regions marked clean after 8 intervals of 40.000s
raid0: write/sync/clean counters 15/0/0
raid0: 4096 dirty regions
    ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
    ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
    ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
    ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
    ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
    ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
    ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
    ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
    ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
    ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
    ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
    ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
    ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
    ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
    ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
    ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
raid0: parity map will remain enabled on next configure

Even after a prolonged wait with no activity, nothing changes; a freshly
configured, clean raid shows the same all-dirty map.

The disks are two 4 TB SATA disks with identical GPT:

$ gpt show wd1
       start        size  index  contents
           0           1         PMBR
           1           1         Pri GPT header
           2          32         Pri GPT table
          34          30         Unused
          64  7814037056      1  GPT part - NetBSD RAIDFrame component
  7814037120          15         Unused
  7814037135          32         Sec GPT table
  7814037167           1         Sec GPT header

A proper diskwedge is created automatically:

$ dkctl wd1 listwedges
/dev/rwd1: 1 wedge:
dk0: raidbackup01, 7814037056 blocks at 64, type: raidframe

The raid itself also has a GPT and a wedge, and both work fine.
There is no problem using the raid, and parity reconstruction after
a hard reset behaves as if the parity map were working correctly;
only the display is wrong.
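
For completeness, "works" here means that the usual parity re-write
after the unclean shutdown,

# raidctl -p raid0      (parity reported as dirty)
# raidctl -P raid0      (re-write parity)

finishes quickly and only touches the regions recorded as dirty,
rather than re-writing parity across the whole 4 TB.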

I am normally running a custom-built kernel, but the same happens
with a GENERIC kernel.

>How-To-Repeat:
I don't know. I have other raidframe setups on other computers
(RAID-1 and RAID-5) with a working parity map display.
>Fix:
Unknown.

>Audit-Trail:
From: Bernd Sieker <bernd.sieker@posteo.net>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: bin/54858: RAIDframe parity map shows all regions dirty despite
 being clean.
Date: Tue, 14 Jan 2020 23:50:38 +0100

 On 14.01.2020 12:00, gnats-admin@netbsd.org wrote:
 > Thank you very much for your problem report.
 > It has the internal identification `bin/54858'.
 > The individual assigned to look at your
 > report is: bin-bug-people. 
 > 
 >> Category:       bin
 >> Responsible:    bin-bug-people
 >> Synopsis:       RAIDframe parity map display in raidctl shows all regions dirty despite being clean
 >> Arrival-Date:   Tue Jan 14 11:00:00 +0000 2020
 > 

 I found a workaround: when I disable the parity map, unconfigure the
 raid, configure it again, re-enable the parity map, and then
 unconfigure and configure it once more, the parity map works and is
 displayed as expected.
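
 Spelled out as commands, that is roughly the following (the
 configuration file path is illustrative):

 # raidctl -M no raid0                  (disable the parity map)
 # raidctl -u raid0
 # raidctl -c /etc/raid0.conf raid0
 # raidctl -M yes raid0                 (re-enable the parity map)
 # raidctl -u raid0
 # raidctl -c /etc/raid0.conf raid0
 # raidctl -mv raid0                    (the map is now displayed correctly)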
