NetBSD Problem Report #30816

From blymn@blymn.cust.internode.on.net  Sat Jul 23 14:20:25 2005
Return-Path: <blymn@blymn.cust.internode.on.net>
Received: from smtp3.adl2.internode.on.net (smtp3.adl2.internode.on.net [203.16.214.203])
	by narn.netbsd.org (Postfix) with ESMTP id 1F33F63B117
	for <gnats-bugs@gnats.NetBSD.org>; Sat, 23 Jul 2005 14:20:25 +0000 (UTC)
Message-Id: <200507231420.j6NEKH9x025625@blymn.cust.internode.on.net>
Date: Sat, 23 Jul 2005 23:50:17 +0930 (CST)
From: blymn@baea.com.au
Reply-To: blymn@baea.com.au
To: gnats-bugs@netbsd.org
Subject: dump(8) broken for larger values of blocking 
X-Send-Pr-Version: 3.95

>Number:         30816
>Category:       bin
>Synopsis:       large blocking factors cannot be used with dump
>Confidential:   no
>Severity:       serious
>Priority:       low
>Responsible:    bin-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Jul 23 14:21:01 +0000 2005
>Originator:     Brett Lymn (Master of the Siren)
>Release:        NetBSD 3.99.6
>Organization:
Brett Lymn
>Environment:
System: NetBSD siren 3.99.6 NetBSD 3.99.6 (SIREN.ACPI.MP) #10: Sun Jul 17 19:29:12 CST 2005 toor@siren:/usr/src/sys/arch/amd64/compile/SIREN.ACPI.MP amd64
Architecture: x86_64
Machine: amd64
>Description:
	The b option of dump(8) accepts a value between 1 and 1000,
according to dump's usage message.  If a block size above about 200
is used, dump misbehaves in various ways during pass III: it either
loops indefinitely or quits with a "master/slave protocol botched"
error.  The larger b is, the more likely the "master/slave protocol
botched" message appears.  Values near 256 result in a hang caused by
an infinite loop in tape.c:doslave(): for some reason p->count is
zero, so the first for loop in doslave() never terminates.

>How-To-Repeat:
	I was dumping a 40GB partition to a DLT40 tape drive using a block
size of 512, and dump hung during pass III.  The machine was up
multi-user, but the filesystem in question checks clean with fsck
(i.e. this problem is not caused by attempting to back up a corrupt
filesystem).

>Fix:
	The problem can be worked around by using a smaller block size, at
the expense of the tape drive not streaming; a block size of 128 appears
to work reliably.  Looking at the code, there is only one place where the
request count can be zero: tape.c:flushtape(), where it is deliberately
zeroed next to a "Sentinel" comment.  This sentinel value does not seem
to be checked anywhere in the code.

Copyright © 1994-2014 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.