NetBSD Problem Report #50601

From martin@duskware.de  Wed Dec 30 15:11:28 2015
Return-Path: <martin@duskware.de>
Received: from mail.netbsd.org (mail.NetBSD.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (verified OK))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 751E27ACC8
	for <gnats-bugs@gnats.NetBSD.org>; Wed, 30 Dec 2015 15:11:28 +0000 (UTC)
Date: Wed, 30 Dec 2015 16:11:25 CET
From: martin@NetBSD.org
Reply-To: martin@NetBSD.org
To: gnats-bugs@NetBSD.org
Subject: sparc64 with root-on-raid0 does not reboot
X-Send-Pr-Version: 3.95

>Number:         50601
>Category:       kern
>Synopsis:       sparc64 with root-on-raid0 does not reboot
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Dec 30 15:15:00 +0000 2015
>Closed-Date:    Thu Jan 17 08:38:40 +0000 2019
>Last-Modified:  Thu Jan 17 08:38:40 +0000 2019
>Originator:     Martin Husemann
>Release:        NetBSD 7.99.25
>Organization:
The NetBSD Foundation, Inc.
>Environment:
System: NetBSD night-owl.duskware.de 7.99.25 NetBSD 7.99.25 (NIGHT-OWL) #367: Mon Dec 21 16:32:08 CET 2015 martin@night-owl.duskware.de:/usr/src/sys/arch/amd64/compile/NIGHT-OWL amd64
Architecture: sparc64
Machine: sparc64
>Description:

With a monolithic kernel (no options MODULAR) and root on raid0, a sparc64
system running -current no longer reboots. The kernel starts to detach
devices:

syncing disks... done
cd0: detached
pcmcia1: detached
audio0: detached

and then hangs, idle waiting. I guess raid0 would have been the next to
detach.

>How-To-Repeat:
See above.

>Fix:
n/a

>Release-Note:

>Audit-Trail:
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/50601: sparc64 with root-on-raid0 does not reboot
Date: Wed, 30 Dec 2015 21:15:33 +0100

 The patch below fixes the issue for me. The first call to raid_detach()
 fails with EBUSY but does not unlock the raid set, so the next call is
 never able to acquire the lock.
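A minimal userland model of this failure, assuming a simplified softc. `model_softc`, `model_lock()`, `model_unlock()` and `buggy_detach()` are hypothetical stand-ins, not the RAIDframe code. Where the kernel lock routine would wait forever on a lock that is still held, the model returns EDEADLK so the hang becomes observable in a single thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the raid softc: only the lock state matters. */
struct model_softc {
	bool locked;   /* models the "raid set is locked" flag */
	int  openers;  /* nonzero models a raid set that is still busy */
};

/*
 * Models raidlock(). The kernel waits until the lock is free; this model
 * returns EDEADLK instead, since nobody else will ever release it.
 */
static int model_lock(struct model_softc *sc)
{
	if (sc->locked)
		return EDEADLK;
	sc->locked = true;
	return 0;
}

/* Models raidunlock(); releasing a free lock would be a caller bug. */
static void model_unlock(struct model_softc *sc)
{
	assert(sc->locked);
	sc->locked = false;
}

/* Models the buggy detach path: EBUSY is returned with the lock held. */
static int buggy_detach(struct model_softc *sc)
{
	int error;

	if ((error = model_lock(sc)) != 0)
		return error;
	if (sc->openers > 0)
		return EBUSY;          /* BUG: returns without model_unlock() */
	model_unlock(sc);
	return 0;
}
```

In the model, the first call fails with EBUSY and leaves `locked` set; once the set is no longer busy, the next call cannot even take the lock, which corresponds to the reboot hanging after "audio0: detached".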

 With this change it reboots just fine, but interestingly I get a duplicate
 detach message from raid0:

 Done running shutdown hooks.
 Desyncing disks... done
 cd0: detached
 pcmcia1: detached
 audio0: detached
 raid0: Waiting for parity re-write to exit...
 raid0: Error re-writing parity (1)!
 raid0: detached
 raid0: detached
 sd2: detached
 sd1: detached
 scsibus0: detached
 cpu1: shutting down
 cpu0: rebooting


 The patch is just a proof of concept; we should probably handle the
 unlocking in the relevant error paths of raid_detach_unlocked().

 Does anyone see what causes the duplicate detach messages?

 Martin

 Index: rf_netbsdkintf.c
 ===================================================================
 RCS file: /cvsroot/src/sys/dev/raidframe/rf_netbsdkintf.c,v
 retrieving revision 1.330
 diff -u -r1.330 rf_netbsdkintf.c
 --- rf_netbsdkintf.c	26 Dec 2015 21:50:43 -0000	1.330
 +++ rf_netbsdkintf.c	30 Dec 2015 20:00:55 -0000
 @@ -3938,6 +3938,8 @@
  		return (error);

  	error = raid_detach_unlocked(rs);
 +	if (error)
 +		raidunlock(rs);

  	return error;
  }
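A sketch of what the two added lines do, again as a hypothetical userland model rather than the kernel code. It assumes, as the report describes, that raid_detach_unlocked() drops the lock itself on success but returns EBUSY with the lock still held; the patched caller then releases the lock only on the error path.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the softc state relevant to the lock leak. */
struct model_softc {
	bool locked;   /* models the "raid set is locked" flag */
	bool busy;     /* models a raid set that is still in use */
};

/* Models raidlock(); reports EDEADLK where the kernel would wait forever. */
static int model_lock(struct model_softc *sc)
{
	if (sc->locked)
		return EDEADLK;
	sc->locked = true;
	return 0;
}

/* Models raidunlock(). */
static void model_unlock(struct model_softc *sc)
{
	assert(sc->locked);
	sc->locked = false;
}

/*
 * Models raid_detach_unlocked() as described above: it unlocks on
 * success but returns EBUSY with the lock still held.
 */
static int model_detach_unlocked(struct model_softc *sc)
{
	if (sc->busy)
		return EBUSY;
	model_unlock(sc);
	return 0;
}

/* Models raid_detach() with the proof-of-concept patch applied. */
static int patched_detach(struct model_softc *sc)
{
	int error;

	if ((error = model_lock(sc)) != 0)
		return error;
	error = model_detach_unlocked(sc);
	if (error)
		model_unlock(sc);      /* the unlock the patch adds */
	return error;
}
```

The drawback, as noted in the proof-of-concept remark, is that responsibility for the lock is split: the callee unlocks on success and the caller unlocks on failure.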

From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/50601: sparc64 with root-on-raid0 does not reboot
Date: Wed, 30 Dec 2015 23:58:03 +0000 (UTC)

 martin@duskware.de (Martin Husemann) writes:

 > Does anyone see what causes the duplicate detach messages?

 The duplicate detach message appears because it is printed once by
 the raid detach code and once more by config_detach(), which the
 shutdown code does not call with DETACH_QUIET.
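A simplified model of how the two messages arise, based only on the explanation above. `MODEL_DETACH_QUIET`, `chatty_driver_detach()` and `model_config_detach()` are hypothetical stand-ins for DETACH_QUIET, the raid detach routine and config_detach(); output is captured in a buffer instead of going to the console.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the DETACH_QUIET flag. */
#define MODEL_DETACH_QUIET 0x01

/* Captured console output, so the messages can be inspected. */
static char console[256];

/* Appends "<dev>: <msg>\n" to the captured console. */
static void console_print(const char *dev, const char *msg)
{
	size_t len = strlen(console);

	assert(len < sizeof(console));
	snprintf(console + len, sizeof(console) - len, "%s: %s\n", dev, msg);
}

/* Models a driver detach routine that announces the detach itself. */
static int chatty_driver_detach(const char *dev)
{
	console_print(dev, "detached");
	return 0;
}

/*
 * Models config_detach(): after a successful driver detach it prints its
 * own "detached" line unless MODEL_DETACH_QUIET is set.
 */
static int model_config_detach(const char *dev, int flags)
{
	int error = chatty_driver_detach(dev);

	if (error == 0 && (flags & MODEL_DETACH_QUIET) == 0)
		console_print(dev, "detached");
	return error;
}
```

Called with flags 0, as on the shutdown path, the model prints "raid0: detached" twice; dropping the driver's own message, or passing the quiet flag, leaves a single line.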

 > Martin
 > 
 > Index: rf_netbsdkintf.c
 > ===================================================================
 > RCS file: /cvsroot/src/sys/dev/raidframe/rf_netbsdkintf.c,v
 > retrieving revision 1.330
 > diff -u -r1.330 rf_netbsdkintf.c
 > --- rf_netbsdkintf.c	26 Dec 2015 21:50:43 -0000	1.330
 > +++ rf_netbsdkintf.c	30 Dec 2015 20:00:55 -0000
 > @@ -3938,6 +3938,8 @@
 >  		return (error);
 >  
 >  	error = raid_detach_unlocked(rs);
 > +	if (error)
 > +		raidunlock(rs);

 Moving raidunlock (and raidput) out of raid_detach_unlocked
 is easier to understand. I.e.:

         if ((error = raidlock(rs)) != 0)
                 return (error);

         error = raid_detach_unlocked(rs);

         raidunlock(rs);

         if (error)
                 return error;

         /* Free the softc */
         raidput(rs);

         return 0;
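The same restructuring as a self-contained userland model. All names are hypothetical stand-ins (`model_detach()` for raid_detach(), `model_detach_unlocked()` for raid_detach_unlocked(), `model_put()` for raidput()); the point it illustrates is that the caller takes the lock, always releases it, and frees the softc only after a successful detach.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of the softc state relevant to detach. */
struct model_softc {
	bool locked;   /* models the "raid set is locked" flag */
	bool busy;     /* models a raid set that is still in use */
};

/* Models raidlock(); reports EDEADLK where the kernel would wait forever. */
static int model_lock(struct model_softc *sc)
{
	if (sc->locked)
		return EDEADLK;
	sc->locked = true;
	return 0;
}

/* Models raidunlock(). */
static void model_unlock(struct model_softc *sc)
{
	assert(sc->locked);
	sc->locked = false;
}

/* The callee never touches the lock; it only checks whether detach is possible. */
static int model_detach_unlocked(struct model_softc *sc)
{
	assert(sc->locked);        /* caller must hold the lock */
	return sc->busy ? EBUSY : 0;
}

/* Models raidput(): releases the softc. */
static void model_put(struct model_softc *sc)
{
	free(sc);
}

static int model_detach(struct model_softc *sc)
{
	int error;

	if ((error = model_lock(sc)) != 0)
		return error;
	error = model_detach_unlocked(sc);
	model_unlock(sc);          /* unlock on every path */
	if (error)
		return error;
	model_put(sc);             /* free the softc only after a successful detach */
	return 0;
}
```

The ordering matters: unlocking happens before model_put(), because touching the lock of an already freed softc would be a use-after-free.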


 -- 
                                 Michael van Elst
 Internet: mlelstv@serpens.de
                                 "A potential Snark may lurk in every tree."

State-Changed-From-To: open->closed
State-Changed-By: mrg@NetBSD.org
State-Changed-When: Thu, 17 Jan 2019 08:38:40 +0000
State-Changed-Why:
A better patch was committed 3 years ago.


>Unformatted:

Copyright © 1994-2017 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.