NetBSD Problem Report #43169

From leo@marco.de  Fri Apr 16 12:25:00 2010
Return-Path: <leo@marco.de>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
	by www.NetBSD.org (Postfix) with ESMTP id 1EB8263B8BC
	for <gnats-bugs@gnats.NetBSD.org>; Fri, 16 Apr 2010 12:25:00 +0000 (UTC)
Message-Id: <20100416110540.1CCAF73DB@zork.local>
Date: Fri, 16 Apr 2010 13:05:40 +0200 (CEST)
From: leo@marco.de
Reply-To: leo@marco.de
To: gnats-bugs@gnats.NetBSD.org
Subject: sys/dev/ata/wd.c doesn't call disk_unbusy in case of errors/retries
X-Send-Pr-Version: 3.95

>Number:         43169
>Category:       kern
>Synopsis:       sys/dev/ata/wd.c doesn't call disk_unbusy in case of errors/retries
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    jdolecek
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri Apr 16 12:25:00 +0000 2010
>Closed-Date:    Sat Oct 07 20:04:55 +0000 2017
>Last-Modified:  Sat Oct 07 20:04:55 +0000 2017
>Originator:     leo@marco.de
>Release:        NetBSD 5.99.26
>Organization:
Matthias Pfaller                            Software Entwicklung
marco Systemanalyse und Entwicklung GmbH    Tel   +49 8131 5161 41
Hans-Böckler-Str. 2, D 85221 Dachau         Fax   +49 8131 5161 66
http://www.marco.de/                        Email leo@marco.de
>Environment:


System: NetBSD zork 5.99.26 NetBSD 5.99.26 (ZORK) #0: Sat Apr 10 14:30:43 CEST 2010 leo@zork:/usr/src-current/sys/arch/amd64/compile/ZORK amd64
Architecture: x86_64
Machine: amd64
>Description:
	When wd.c retries a request (e.g. after a timeout because the drive
	is powered down), it will issue a callout to wdrestart and return
	without doing an disk_unbusy. wdrestart will call __wdstart which
	will call disk_busy a second time while wddone will call disk_unbusy
	only one time.
	The same problem applies to ATACMD_TRY_AGAIN.
>How-To-Repeat:
	Provoke a disk errror and watch the disk beeing 100% busy all the time
	afterwards.
>Fix:
	Maybe it would be best to move disk_busy from __wdstart to wdstart?
	In that case it would be called only one time per request, but the
	disk still would be marked busy until the callout is serviced
	(I'm not sure if this should be considered a bug or a feature...).

>Release-Note:

>Audit-Trail:
From: Matthias Pfaller <leo@marco.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/43169: sys/dev/ata/wd.c doesn't call disk_unbusy in case
 of errors/retries
Date: Thu, 22 Apr 2010 19:04:45 +0200

 This is a multi-part message in MIME format.
 --------------000506070409080104070201
 Content-Type: text/plain; charset=ISO-8859-1; format=flowed
 Content-Transfer-Encoding: 7bit

 I'm running for a couple of days with the attached patch. After 
 readerrors the drive is no longer reported 100% busy.

 Matthias

 --------------000506070409080104070201
 Content-Type: text/plain;
  name="P.txt"
 Content-Transfer-Encoding: 7bit
 Content-Disposition: inline;
  filename="P.txt"

 Index: wd.c
 ===================================================================
 RCS file: /cvsroot/src/sys/dev/ata/wd.c,v
 retrieving revision 1.384
 diff -u -u -r1.384 wd.c
 --- wd.c	24 Feb 2010 22:37:57 -0000	1.384
 +++ wd.c	22 Apr 2010 16:55:20 -0000
 @@ -585,6 +585,8 @@
  		wd->openings--;

  		wd->retries = 0;
 +		/* Instrumentation. */
 +		disk_busy(&wd->sc_dk);
  		wdstart1(wd, bp);
  	}
  }
 @@ -664,6 +666,7 @@
  			/* No memory -- fail the iop. */
  			bp->b_error = ENOMEM;
  			bp->b_resid = bp->b_bcount;
 +			disk_unbusy(&wd->sc_dk, 0, (bp->b_flags & B_READ));
  			biodone(bp);
  			wd->openings++;
  			return;
 @@ -716,8 +719,6 @@
  		wd->sc_wdc_bio.flags |= ATA_LBA;
  	if (bp->b_flags & B_READ)
  		wd->sc_wdc_bio.flags |= ATA_READ;
 -	/* Instrumentation. */
 -	disk_busy(&wd->sc_dk);
  	switch (wd->atabus->ata_bio(wd->drvp, &wd->sc_wdc_bio)) {
  	case ATACMD_TRY_AGAIN:
  		callout_reset(&wd->sc_restart_ch, hz, wdrestart, wd);
 @@ -943,7 +944,7 @@
  	return error;
  }

 -/* 
 +/*
   * Caller must hold wd->sc_dk.dk_openlock.
   */
  static int
 @@ -1461,7 +1462,7 @@

  		return 0;
  	    }
 -	
 +
  	case DIOCSSTRATEGY:
  	    {
  		struct disk_strategy *dks = (void *)addr;
 @@ -1630,7 +1631,7 @@
  		err = 0;
  		break;
  	default:
 -		panic("wddump: unknown error type %d", err); 
 +		panic("wddump: unknown error type %d", err);
  	}
  	if (err != 0) {
  		printf("\n");

 --------------000506070409080104070201--

Responsible-Changed-From-To: kern-bug-people->jdolecek
Responsible-Changed-By: jdolecek@NetBSD.org
Responsible-Changed-When: Sun, 23 Jul 2017 13:53:02 +0000
Responsible-Changed-Why:
I've noticed this bug also while working on jdolecek-ncq branch, fixed it there.


State-Changed-From-To: open->analyzed
State-Changed-By: jdolecek@NetBSD.org
State-Changed-When: Sun, 23 Jul 2017 13:53:02 +0000
State-Changed-Why:
Fix committed on dev branch, will close once the branch is merged.


From: "Jaromir Dolecek" <jdolecek@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/43169 CVS commit: [jdolecek-ncq] src/sys/dev/ata
Date: Sun, 23 Jul 2017 13:50:43 +0000

 Module Name:	src
 Committed By:	jdolecek
 Date:		Sun Jul 23 13:50:43 UTC 2017

 Modified Files:
 	src/sys/dev/ata [jdolecek-ncq]: wd.c

 Log Message:
 for wd, only call disk_busy() on the first try, do not call it on retries,
 as unbusy is called just once when the xfer is finished

 also noticed in PR kern/43169 by Matthias Pfaller, but contrary to suggested
 fix done in way to keep the disk marked busy during the timeouts, as I think
 it's more correct


 To generate a diff of this commit:
 cvs rdiff -u -r1.428.2.28 -r1.428.2.29 src/sys/dev/ata/wd.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: analyzed->feedback
State-Changed-By: jdolecek@NetBSD.org
State-Changed-When: Sat, 07 Oct 2017 17:44:37 +0000
State-Changed-Why:
Change committed on HEAD (-current). Can you confirm the fix?


State-Changed-From-To: feedback->closed
State-Changed-By: jdolecek@NetBSD.org
State-Changed-When: Sat, 07 Oct 2017 20:04:55 +0000
State-Changed-Why:
Fix confirmed in NCQ branch, problem fixed there. Submitter no longer can
confirm.


>Unformatted:

NetBSD Home
NetBSD PR Database Search

(Contact us) $NetBSD: query-full-pr,v 1.39 2013/11/01 18:47:49 spz Exp $
$NetBSD: gnats_config.sh,v 1.8 2006/05/07 09:23:38 tsutsui Exp $
Copyright © 1994-2014 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.