NetBSD Problem Report #50601
From martin@duskware.de Wed Dec 30 15:11:28 2015
Return-Path: <martin@duskware.de>
Received: from mail.netbsd.org (mail.NetBSD.org [199.233.217.200])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (verified OK))
by mollari.NetBSD.org (Postfix) with ESMTPS id 751E27ACC8
for <gnats-bugs@gnats.NetBSD.org>; Wed, 30 Dec 2015 15:11:28 +0000 (UTC)
Date: Wed, 30 Dec 2015 16:11:25 CET
From: martin@NetBSD.org
Reply-To: martin@NetBSD.org
To: gnats-bugs@NetBSD.org
Subject: sparc64 with root-on-raid0 does not reboot
X-Send-Pr-Version: 3.95
>Number: 50601
>Category: kern
>Synopsis: sparc64 with root-on-raid0 does not reboot
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: kern-bug-people
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Wed Dec 30 15:15:00 +0000 2015
>Closed-Date: Thu Jan 17 08:38:40 +0000 2019
>Last-Modified: Thu Jan 17 08:38:40 +0000 2019
>Originator: Martin Husemann
>Release: NetBSD 7.99.25
>Organization:
The NetBSD Foundation, Inc.
>Environment:
System: NetBSD night-owl.duskware.de 7.99.25 NetBSD 7.99.25 (NIGHT-OWL) #367: Mon Dec 21 16:32:08 CET 2015 martin@night-owl.duskware.de:/usr/src/sys/arch/amd64/compile/NIGHT-OWL amd64
Architecture: sparc64
Machine: sparc64
>Description:
With a monolithic kernel (no options MODULAR) and root on raid0 a sparc64
system running -current does not reboot anymore. The kernel starts to detach
devices:
syncing disks... done
cd0: detached
pcmcia1: detached
audio0: detached
and then hangs, idle waiting. I guess raid0 would have been the next to
detach.
>How-To-Repeat:
See above.
>Fix:
n/a
>Release-Note:
>Audit-Trail:
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/50601: sparc64 with root-on-raid0 does not reboot
Date: Wed, 30 Dec 2015 21:15:33 +0100
The patch below fixes the issue for me. The first call to raid_detach
fails with EBUSY, but does not unlock the raidset. The next call is
never able to acquire the lock.
With this change it reboots just fine, but interestingly I get a duplicate
detach message from raid0:
Done running shutdown hooks.
syncing disks... done
cd0: detached
pcmcia1: detached
audio0: detached
raid0: Waiting for parity re-write to exit...
raid0: Error re-writing parity (1)!
raid0: detached
raid0: detached
sd2: detached
sd1: detached
scsibus0: detached
cpu1: shutting down
cpu0: rebooting
Patch just as a proof of concept - we should probably deal with the
unlocking in the relevant error paths in raid_detach_unlocked().
Does anyone see what causes the duplicate detach messages?
Martin
Index: rf_netbsdkintf.c
===================================================================
RCS file: /cvsroot/src/sys/dev/raidframe/rf_netbsdkintf.c,v
retrieving revision 1.330
diff -u -r1.330 rf_netbsdkintf.c
--- rf_netbsdkintf.c 26 Dec 2015 21:50:43 -0000 1.330
+++ rf_netbsdkintf.c 30 Dec 2015 20:00:55 -0000
@@ -3938,6 +3938,8 @@
 		return (error);
 	error = raid_detach_unlocked(rs);
+	if (error)
+		raidunlock(rs);
 	return error;
 }
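
The lock leak described in this message can be reproduced in a small userland sketch. Everything here is an assumption made for illustration: the `sim_` names are hypothetical stand-ins, not the RAIDframe code, and `sim_lock()` returns EDEADLK at the point where the real `raidlock()` would sleep forever. The sketch also assumes, as the thread implies, that the inner detach routine releases the lock only on its success path.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the raid softc; only the lock state matters. */
struct sim_raid {
	bool locked;	/* true while the per-set lock is held */
	bool busy;	/* true while the set is still in use, e.g. as root */
};

/* Non-blocking stand-in for raidlock(): reports EDEADLK where the
 * kernel would wait forever on a lock that is never released. */
int
sim_lock(struct sim_raid *rs)
{
	if (rs->locked)
		return EDEADLK;
	rs->locked = true;
	return 0;
}

void
sim_unlock(struct sim_raid *rs)
{
	assert(rs->locked);	/* unlocking a free lock would be a bug */
	rs->locked = false;
}

/* Assumed behaviour: the lock is dropped on success only. */
int
sim_detach_unlocked(struct sim_raid *rs)
{
	if (rs->busy)
		return EBUSY;	/* error path: lock is still held */
	sim_unlock(rs);
	return 0;
}

/* Detach wrapper; with_fix applies the proof-of-concept patch. */
int
sim_detach(struct sim_raid *rs, bool with_fix)
{
	int error;

	if ((error = sim_lock(rs)) != 0)
		return error;
	error = sim_detach_unlocked(rs);
	if (error && with_fix)
		sim_unlock(rs);
	return error;
}
```

Without the fix, a first `sim_detach()` on a busy set returns EBUSY and leaves `locked` set, so the second call fails in `sim_lock()`. With the fix, the second call reaches the detach routine again.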
From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/50601: sparc64 with root-on-raid0 does not reboot
Date: Wed, 30 Dec 2015 23:58:03 +0000 (UTC)
martin@duskware.de (Martin Husemann) writes:
> Does anyone see what causes the duplicate detach messages?
The duplicate detach message comes from printing it in the
detach code and automatically printing it in the shutdown
code where config_detach() isn't called with DETACH_QUIET.
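
A minimal userland sketch of that double report, under stated assumptions: the `sim_` functions and the `SIM_DETACH_QUIET` value are hypothetical and only model the behaviour described here, they are not the autoconf implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define SIM_DETACH_QUIET 0x02	/* hypothetical flag value for this sketch */

int sim_messages;		/* number of "detached" lines printed */

void
sim_report(const char *name)
{
	assert(name != NULL);
	printf("%s: detached\n", name);
	sim_messages++;
}

/* The driver's own detach routine announces the detach itself. */
int
sim_driver_detach(const char *name)
{
	sim_report(name);
	return 0;
}

/* The generic layer announces it again unless told to be quiet. */
int
sim_config_detach(const char *name, int flags)
{
	int error = sim_driver_detach(name);

	if (error)
		return error;
	if ((flags & SIM_DETACH_QUIET) == 0)
		sim_report(name);
	return 0;
}
```

Calling `sim_config_detach("raid0", 0)` prints the line twice, while passing `SIM_DETACH_QUIET` leaves only the driver's own message.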
> Martin
>
> Index: rf_netbsdkintf.c
> ===================================================================
> RCS file: /cvsroot/src/sys/dev/raidframe/rf_netbsdkintf.c,v
> retrieving revision 1.330
> diff -u -r1.330 rf_netbsdkintf.c
> --- rf_netbsdkintf.c 26 Dec 2015 21:50:43 -0000 1.330
> +++ rf_netbsdkintf.c 30 Dec 2015 20:00:55 -0000
> @@ -3938,6 +3938,8 @@
> return (error);
>
> error = raid_detach_unlocked(rs);
> + if (error)
> + raidunlock(rs);
Moving raidunlock (and raidput) out of raid_detach_unlocked
is easier to understand. I.e.:
	if ((error = raidlock(rs)) != 0)
		return (error);
	error = raid_detach_unlocked(rs);
	raidunlock(rs);
	if (error)
		return error;
	/* Free the softc */
	raidput(rs);
	return 0;
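
The same shape can be exercised in a standalone sketch. The `demo_` names are hypothetical stand-ins for `raidlock()`, `raidunlock()`, `raid_detach_unlocked()` and `raidput()`, and `demo_lock()` returns EDEADLK where a real sleeping lock would hang. The point is only that the caller takes the lock and drops it on every path, so a failed detach can be retried.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the raid softc. */
struct demo_raid {
	bool locked;	/* per-set lock held */
	bool busy;	/* set still in use */
	bool freed;	/* softc released */
};

int
demo_lock(struct demo_raid *rs)
{
	if (rs->locked)
		return EDEADLK;	/* the kernel would sleep here instead */
	rs->locked = true;
	return 0;
}

void
demo_unlock(struct demo_raid *rs)
{
	assert(rs->locked);
	rs->locked = false;
}

/* Detach work only: never touches the lock, never frees the softc. */
int
demo_detach_unlocked(struct demo_raid *rs)
{
	return rs->busy ? EBUSY : 0;
}

void
demo_put(struct demo_raid *rs)
{
	rs->freed = true;
}

int
demo_detach(struct demo_raid *rs)
{
	int error;

	if ((error = demo_lock(rs)) != 0)
		return error;
	error = demo_detach_unlocked(rs);
	demo_unlock(rs);	/* released on success and on failure */
	if (error)
		return error;
	demo_put(rs);		/* free the softc only after a clean detach */
	return 0;
}
```

With this layout a busy set yields EBUSY on every attempt rather than blocking on the second one, and the softc is freed only once the detach succeeds.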
--
Michael van Elst
Internet: mlelstv@serpens.de
"A potential Snark may lurk in every tree."
State-Changed-From-To: open->closed
State-Changed-By: mrg@NetBSD.org
State-Changed-When: Thu, 17 Jan 2019 08:38:40 +0000
State-Changed-Why:
A better patch was committed 3 years ago.
>Unformatted: