NetBSD Problem Report #27179

Received: (qmail 12586 invoked by uid 605); 7 Oct 2004 08:25:45 -0000
Message-Id: <20041007082542.5423976C8@asparagus.emsi.priv.at>
Date: Thu,  7 Oct 2004 10:25:42 +0200 (CEST)
From: mjl@netbsd.org
Sender: gnats-bugs-owner@NetBSD.org
Reply-To: mjl@emsi.priv.at
To: gnats-bugs@gnats.NetBSD.org
Subject: dump(8) goes into loop, never finishing dump
X-Send-Pr-Version: 3.95

>Number:         27179
>Category:       bin
>Synopsis:       dump(8) goes into loop, never finishing dump
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    bin-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Oct 07 08:26:00 +0000 2004
>Closed-Date:    
>Last-Modified:  Mon May 03 02:50:02 +0000 2010
>Originator:     Martin J. Laubach
>Release:        NetBSD 2.0_RC2
>Organization:
>Environment:
System: NetBSD asparagus.emsi.priv.at 2.0_RC2 NetBSD 2.0_RC2 (ASPARAGUS) #0: Sat Oct 2 00:43:42 CEST 2004 mjl@asparagus.emsi.priv.at:/storage/netbsd/cvs/src/sys20/arch/i386/compile/ASPARAGUS i386
Architecture: i386
Machine: i386
>Description:

  It looks as if dump gets stuck in some kind of loop and never
finishes (that disk is usually dumped in about an hour;
I interrupted the dump after 10 hours).

|   DUMP: Found /dev/rld0h on /home in /etc/fstab
|   DUMP: Date of this level 0 dump: Wed Oct  6 03:49:10 2004
|   DUMP: Date of last level 0 dump: the epoch
|   DUMP: Dumping /dev/rld0h (/home) to standard output
|   DUMP: Label: none
|   DUMP: mapping (Pass I) [regular files]
|   DUMP: mapping (Pass II) [directories]
|   DUMP: estimated 14143153 tape blocks.
|   DUMP: Volume 1 started at: Wed Oct  6 03:49:31 2004
|   DUMP: dumping (Pass III) [directories]
|   DUMP: dumping (Pass IV) [regular files]
|   DUMP: 3.68% done, finished in 2:11
|   DUMP: 8.65% done, finished in 1:45
|   DUMP: 12.81% done, finished in 1:42
|   DUMP: 17.47% done, finished in 1:34
|   DUMP: 22.47% done, finished in 1:26
|   DUMP: 27.63% done, finished in 1:18
|   DUMP: 32.90% done, finished in 1:11
|   DUMP: 37.99% done, finished in 1:05
|   DUMP: 43.33% done, finished in 0:58
|   DUMP: 48.59% done, finished in 0:52
|   DUMP: 53.98% done, finished in 0:46
|   DUMP: 59.30% done, finished in 0:41
|   DUMP: 64.35% done, finished in 0:36
|   DUMP: 69.60% done, finished in 0:30
|   DUMP: 75.26% done, finished in 0:24
|   DUMP: 80.26% done, finished in 0:19
|   DUMP: 81.72% done, finished in 0:19
|   DUMP: 81.96% done, finished in 0:19
|   DUMP: 81.96% done, finished in 0:19
|   DUMP: 82.01% done, finished in 0:20
|   DUMP: 82.06% done, finished in 0:21
| ...
|   DUMP: 90.78% done, finished in 1:28
|   DUMP: 90.83% done, finished in 1:28
|   DUMP: 90.87% done, finished in 1:28
|   DUMP: 90.92% done, finished in 1:28
|   DUMP: 90.97% done, finished in 1:28
|   DUMP: 91.01% done, finished in 1:28
|   DUMP: 91.06% done, finished in 1:28
|   DUMP: 91.10% done, finished in 1:28
|   DUMP: 91.15% done, finished in 1:28
|   DUMP: 91.19% done, finished in 1:28
|   DUMP: 91.27% done, finished in 1:28
|   DUMP: 91.31% done, finished in 1:28
|   DUMP: 91.36% done, finished in 1:28
|   DUMP: 91.40% done, finished in 1:28
|   DUMP: 91.45% done, finished in 1:28
|   DUMP: 91.49% done, finished in 1:28
|   DUMP: 91.54% done, finished in 1:28
|   DUMP: 91.58% done, finished in 1:28
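
  The slowdown can be read directly off the progress lines above.
As a minimal sketch (not part of the original report), the following
Python snippet computes the change in percent-done between consecutive
progress lines, using lines copied from the log. The names LOG,
PROGRESS and percent_steps are made up for illustration, and the
comparison assumes that progress lines are printed at roughly regular
intervals, which the report does not state.

```python
import re

# Progress lines copied from the log above, around the point where
# the dump slows down.
LOG = """\
|   DUMP: 64.35% done, finished in 0:36
|   DUMP: 69.60% done, finished in 0:30
|   DUMP: 75.26% done, finished in 0:24
|   DUMP: 80.26% done, finished in 0:19
|   DUMP: 81.72% done, finished in 0:19
|   DUMP: 81.96% done, finished in 0:19
|   DUMP: 81.96% done, finished in 0:19
|   DUMP: 82.01% done, finished in 0:20
|   DUMP: 82.06% done, finished in 0:21
"""

# Matches the percentage in a "DUMP: NN.NN% done" progress line.
PROGRESS = re.compile(r"DUMP:\s+(\d+\.\d+)% done")


def percent_steps(text):
    """Return the increase in percent-done between consecutive progress lines."""
    done = [float(m.group(1)) for m in PROGRESS.finditer(text)]
    return [round(b - a, 2) for a, b in zip(done, done[1:])]


steps = percent_steps(LOG)
for step in steps:
    print(f"{step:5.2f}")
```

  For these lines the step falls from about 5 percentage points per
progress line to about 0.05, in line with the estimate later in the
audit trail that the dump proceeds at roughly 1/100th of the normal
rate.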

  This seems to happen only in amanda-initiated dumps;
manual dumps work fine. It also seems directly related
to the size of the file system (or perhaps to the time it
takes to dump): it never happens on small file systems, but is
pretty consistent on large ones like the one above.

  This problem was present in 1.6 too, but something in
my setup made it disappear at some point. Now it seems
to be back.

  Several people have commented that they have experienced
the same problem:

---
From: Luke Mewburn <lukem@NetBSD.org>
Subject: Re: dump(8) behaviour

[..]

I see it in 2.0G, from amanda dumps.
Manual dumps work fine.

amanda did work for a while in 2.0, but something changed and I haven't
been able to track down what it is.  It's not a PIPE_SOCKETPAIR issue.

---
From: Hauke Fath <hauke@Espresso.Rhein-Neckar.DE>

[..]

Yep. I used to see this pretty often in amanda backup logs on 1.6.x and
1.6A..Z. It has become less frequent with 2.0, but still occurs
occasionally.

        hauke
---
From: Tom Ivar Helbekkmo <tih@eunetnorge.no>

[..]

That's my experience, too.  Furthermore, it never happens on small
file systems; the chance of dump hanging increases with fs size.

-tih

>How-To-Repeat:

  Install amanda and try to dump a large file system?

>Fix:
>Release-Note:
>Audit-Trail:
State-Changed-From-To: open->feedback
State-Changed-By: joerg@narn.netbsd.org
State-Changed-When: Sat, 19 Jan 2008 14:08:09 +0000
State-Changed-Why:
Does this problem still exist?


State-Changed-From-To: feedback->open
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Mon, 03 May 2010 01:26:39 +0000
State-Changed-Why:
I see nothing in the changelogs for dump that would have been remotely
likely to correct this. All the same, it would be extremely useful to
know if this problem has been seen recently.


From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: bin/27179 (dump(8) goes into loop, never finishing dump)
Date: Mon, 3 May 2010 02:46:31 +0000

 On Mon, May 03, 2010 at 01:26:40AM +0000, dholland@NetBSD.org wrote:
  > I see nothing in the changelogs for dump that would have been remotely
  > likely to correct this. All the same, it would be extremely useful to
  > know if this problem has been seen recently.

 Looking into it some more I have the following observations:

 (1) it is probably a race condition somewhere, so there's some more or
 less fixed probability of triggering it at any given moment, which is
 why it manifests mostly on large dumps;

 (2) in the case cited, it is making progress, or thinks it is, just at
 about 1/100th the normal rate;

 (3) the scheme dump uses to coordinate its subprocesses is fragile and
 could be messed up by all manner of kernel bugs, particularly in
 signals or AF_UNIX sockets; however, I also can't so far rule out a
 corner case in the state transitions;

 (4) however, all of the likely issues I can think of (of either kind)
 would lead to it hanging completely, not proceeding at a crawl.

 (5) I also have no idea why it would be correlated with using amanda.


 So I dunno. If anyone manages to reproduce this it would be useful to
 know where the various dump processes are spending their time while
 mostly not making progress...

 -- 
 David A. Holland
 dholland@netbsd.org
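
Observation (1) above, that a fixed chance of hitting the race at any
given moment makes the problem show up mostly on large dumps, can be
illustrated with a small calculation. This is only a sketch under
stated assumptions: each interval of dumping is treated as an
independent trial, and the per-interval probability used here is a
hypothetical figure, not something measured on an affected system.
The function name is likewise made up for illustration.

```python
def p_at_least_once(p_per_interval, intervals):
    """Chance that an event with a fixed, independent per-interval
    probability occurs at least once over the given number of intervals."""
    return 1.0 - (1.0 - p_per_interval) ** intervals


# Hypothetical value, chosen only for illustration: a 1% chance of
# triggering the race in any one interval of dumping.
P_TRIGGER = 0.01

# A longer dump spans more intervals, so it gets more chances to hit the race.
for intervals in (1, 10, 100):
    chance = p_at_least_once(P_TRIGGER, intervals)
    print(f"{intervals:4d} intervals: {chance:.3f}")
```

Under these assumptions a dump that runs 100 times longer is far more
likely to hit the problem at least once, which fits the reports that
it never happens on small file systems.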

>Unformatted:
