NetBSD Problem Report #47424
From kre@munnari.OZ.AU Wed Jan 9 06:04:03 2013
Return-Path: <kre@munnari.OZ.AU>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
by www.NetBSD.org (Postfix) with ESMTP id 26F9163EBD5
for <gnats-bugs@gnats.NetBSD.org>; Wed, 9 Jan 2013 06:04:03 +0000 (UTC)
Message-Id: <201301090602.r0962gxa013730@jade.coe.psu.ac.th>
Date: Wed, 9 Jan 2013 13:02:42 +0700 (ICT)
From: kre@munnari.OZ.AU
To: gnats-bugs@gnats.NetBSD.org
Subject: pkgsrc "make fetch" fails to fetch fotoxx-13.01.tar.gz for graphics/fotoxx
X-Send-Pr-Version: 3.95
>Number: 47424
>Category: bin
>Synopsis: pkgsrc "make fetch" fails to fetch fotoxx-13.01.tar.gz for graphics/fotoxx
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: bin-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Wed Jan 09 06:05:00 +0000 2013
>Closed-Date:
>Last-Modified: Sat Jan 12 03:10:00 +0000 2013
>Originator: Robert Elz
>Release: NetBSD 5.1_STABLE (pkgsrc current 2013-01-09)
>Organization:
Prince of Songkla University
>Environment:
System: NetBSD jade.coe.psu.ac.th 5.1_STABLE NetBSD 5.1_STABLE (JADE-1.12-20120130) #27: Tue Jan 31 05:20:31 ICT 2012 kre@jade.coe.psu.ac.th:/usr/obj/5/kernels/i386/JADE i386
Architecture: i386
Machine: i386
>Description:
For some undetermined (so far) reason, a "make fetch" (or "make
checksum") in graphics/fotoxx stalls after fetching 2129920 of the
expected 2131822 bytes that are in the file.
wget fetches the file correctly (after fetching that way the size
and checksum are as expected). So does ftp, if it is left long
enough (the pkgsrc fetch times out after stalling for 121 seconds,
which is apparently not long enough for those last 1902 bytes to
arrive).
f.n.b currently has a fotoxx-13.01.tar.gz that is (exactly) 32KB
in its distfiles directory (probably caused by a transfer that
failed in a similar way). That needs to be removed and the transfer
redone.
I have no love (to put it mildly) for using the http protocol to
fetch files (a http:// url), but there must be something broken
in the ftp client (in NetBSD 5, and current) that causes the fetch
to stall (very repeatably) at that point.
I tried the same thing using NetBSD current (amd64) (well, 6.99.15
from early December, so not quite current) - it also stalled at
the same point, and also eventually recovered.
>How-To-Repeat:
Using NetBSD 5, attempt ...
ftp http://www.kornelix.com/uploads/1/3/0/3/13035936/fotoxx-13.01.tar.gz
Watch it get to 2080KiB (2129920 bytes) and stall - then just wait,
a fairly long time, and it will complete. Try again
using wget, and observe it complete correctly, and quickly.
Try using "make fetch" and observe pkgsrc detect the stalled ftp
and kill it before it has a chance to finish.
>Fix:
No idea at the minute (obvious workaround would be to add a
"FETCH_USING" or whatever it is so wget is always used, or do
something to alter the timeout) - but whatever is making ftp
behave differently than wget here really needs to be fixed.
>Release-Note:
>Audit-Trail:
From: Thomas Klausner <wiz@NetBSD.org>
To: NetBSD bugtracking <gnats-bugs@NetBSD.org>
Cc:
Subject: Re: pkg/47424: pkgsrc "make fetch" fails to fetch
fotoxx-13.01.tar.gz for graphics/fotoxx
Date: Wed, 9 Jan 2013 09:17:55 +0100
On Wed, Jan 09, 2013 at 06:05:01AM +0000, kre@munnari.OZ.AU wrote:
> For some undetermined (so far) reason, a "make fetch" (or "make
> checksum") in graphics/fotoxx stalls after fetching 2129920 of the
> expected 2131822 bytes that are in the file.
>
> wget fetches the file correctly (after fetching that way the size
> and checksum are as expected). So does ftp, if it is left long
> enough (the pkgsrc fetch times out after stalling for 121 seconds,
> which is apparently not long enough for those last 1902 bytes to
> arrive).
I see the same behaviour on 6.99.16/amd64. Perhaps a bug report for
ftp(1) is in order?
> f.n.b currently has a fotoxx-13.01.tar.gz that is (exactly) 32KB
> in its distfiles directory (probably caused by a transfer that
> failed in a similar way). That needs to be removed and the transfer
> redone.
I've replaced the file on nbftp.
Thomas
From: Robert Elz <kre@munnari.OZ.AU>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: pkg/47424: pkgsrc "make fetch" fails to fetch fotoxx-13.01.tar.gz for graphics/fotoxx
Date: Wed, 09 Jan 2013 18:58:18 +0700
Date: Wed, 9 Jan 2013 08:20:06 +0000 (UTC)
From: Thomas Klausner <wiz@NetBSD.org>
Message-ID: <20130109082006.A425663EBD5@www.NetBSD.org>
| I see the same behaviour on 6.99.16/amd64. Perhaps a bug report for
| ftp(1) is in order?
Perhaps, though ftp does (eventually) work ... I'll try to analyse
what is actually happening before calling it an ftp bug.
pkgsrc did need some assistance though (which is why the PR there).
| I've replaced the file on nbftp.
Great, thanks, in that case you can close the PR, as pkgsrc will
fail over to that one (if the ftp stalls and times out) and the
fetch from f.n.o should work fine as a backup.
kre
State-Changed-From-To: open->closed
State-Changed-By: wiz@NetBSD.org
State-Changed-When: Wed, 09 Jan 2013 18:17:26 +0000
State-Changed-Why:
Distfile on nbftp is enough for submitter; he'll investigate ftp(1) further.
Thanks for the PR!
From: Robert Elz <kre@munnari.OZ.AU>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: pkg/47424 (pkgsrc "make fetch" fails to fetch fotoxx-13.01.tar.gz for graphics/fotoxx)
Date: Fri, 11 Jan 2013 19:15:47 +0700
Date: Wed, 9 Jan 2013 18:17:28 +0000 (UTC)
From: wiz@NetBSD.org
Message-ID: <20130109181728.B809F63ED1A@www.NetBSD.org>
| Distfile on nbftp is enough for submitter; he'll investigate ftp(1) further.
I have done, and I don't think it is a bug in ftp(1), I think it is a bug
in the HTTP server that's serving the file (though I am certainly no
expert on the intricacies of HTTP file transfers, nor do I particularly
want to become one...)
The difference between using ftp and wget (or so it appears to me) is that
the http client in ftp(1) does
Connection: close
in the header of the GET request, and wget does
Connection: Keep-Alive
instead. The server (at www.kornelix.com) (which appears to be Apache, though
I have no idea which version) responds to the wget request with a header that
includes
Content-Length: 2131822
Connection: Keep-Alive
whereas with the ftp request, the first of those is present in the reply
header, but there is no "Connection:" field at all.
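
How a missing "Connection:" field is read depends on the HTTP version of
the reply, which this report does not state. The sketch below is only an
illustration of the usual default rule (persistent under HTTP/1.1 unless
"close" is present, closing under HTTP/1.0 unless "keep-alive" is present).
The function name and the header dictionaries are hypothetical, and the
Content-Length value is the file size given earlier in this report.

```python
def connection_persists(http_version, headers):
    """Return True if the sender of this message intends to keep the
    connection open. `headers` is a dict with lower-case field names."""
    raw = headers.get("connection", "")
    tokens = {t.strip().lower() for t in raw.split(",") if t.strip()}
    if "close" in tokens:
        return False
    if http_version == "HTTP/1.1":
        return True                  # persistent unless "close" is present
    return "keep-alive" in tokens    # HTTP/1.0 closes unless keep-alive

# Reply headers as described above (hypothetical reconstruction).
wget_reply = {"content-length": "2131822", "connection": "Keep-Alive"}
ftp_reply = {"content-length": "2131822"}  # no Connection field at all

print(connection_persists("HTTP/1.1", wget_reply))
print(connection_persists("HTTP/1.1", ftp_reply))
print(connection_persists("HTTP/1.0", ftp_reply))
```

If the reply to ftp was sent as HTTP/1.1, the missing field would be
consistent with a server that intends to keep the connection open.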
Either way, in both cases, the server seems to send the data in the file,
and then stop (implementing keep-alive type connections). NetBSD's ftp
client is assuming it will get a FIN to conclude the transfer, so after
all the data has arrived, it just sits and waits for that FIN. The server
on the other hand appears to be waiting for the next request.
Deadlock...
Eventually (after almost 3 minutes) I assume their server gets tired
of waiting for a new request, and closes the connection. That (finally)
provides the FIN that ftp(1) has been waiting for, it is happy, and the
connection completed properly.
But that 3 minutes is too much for pkgsrc, which tells ftp to give up
after 2 minutes. When that happens, ftp never bothers to flush the last
(partially filled) buffer, and just aborts, leaving those final 1902
bytes missing from the file (if there's any bug in ftp(1) it would be
that, it did receive the data, it could have written it to the file before
quitting, and had it done so, pkgsrc would have verified the file checksum,
I guess, if not on the same attempt as when ftp failed, then next time
it went to look and found the file already existing) and all would have been
OK. But demanding that processes clean up fully when they are failing,
is probably too much to expect.
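
The byte counts in this report can be checked against the "unflushed
partial buffer" explanation. The sketch below is only arithmetic on the
numbers quoted above. The candidate buffer sizes are assumptions for
illustration; the buffer size ftp(1) actually uses is not stated here.

```python
# Byte counts quoted in this report.
EXPECTED_SIZE = 2131822   # full size of fotoxx-13.01.tar.gz
STALLED_AT = 2129920      # bytes on disk when the transfer stalls

missing = EXPECTED_SIZE - STALLED_AT

# Hypothetical buffer sizes (assumption, not taken from the ftp source).
candidate_buffers = [4096, 8192, 16384, 32768, 65536]

# A buffer size fits the observation if the stall offset is a whole number
# of buffers, so that only the final partial buffer was left unwritten.
fits = [size for size in candidate_buffers if STALLED_AT % size == 0]

print(missing)             # 1902
print(STALLED_AT // 1024)  # 2080
print(fits)                # [4096, 8192, 16384, 32768]
```

The stall offset is a whole multiple of every power-of-two size listed up
to 32 KiB, so the numbers fit a buffered writer that lost its last partial
buffer, but they do not pin down which size was in use.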
With wget, the client TCP (ie: my system running wget) sends the first FIN,
immediately after receiving the end of the file data, which is what you'd
expect with a client doing Keep-Alive (why it bothers when it has only one
file to fetch I have no idea, but it does, I guess just to be consistent with
usages when it is fetching entire trees of files.)
The real problem here appears to be the HTTP server that seems to be
implementing keep-alive connection mode, when just the opposite was
requested by the client.
I have complete tcpdump (binary form) dumps of the two transactions,
if anyone else (someone who speaks more http than I do) would like to
take a look and confirm (or refute) my analysis. They're each about 2.3MB
big (and probably do not compress much, as most of that will be the file
data, which was a .gz file, though I haven't tried to see).
I can e-mail one or both of those, or make them available for ftp (but
not for http...) if someone would like to take a look - or this seems to
be consistently repeatable enough, that you could make your own trace
(other than the 3 minute timeout, actually about 150 secs idle) all of
this happens fairly quickly, it is not a very big file to transfer.
kre
From: David Holland <dholland-pbugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: pkg/47424 (pkgsrc "make fetch" fails to fetch
fotoxx-13.01.tar.gz for graphics/fotoxx)
Date: Fri, 11 Jan 2013 17:42:05 +0000
On Fri, Jan 11, 2013 at 12:35:07PM +0000, Robert Elz wrote:
> But that 3 minutes is too much for pkgsrc, which tells ftp to give
> up after 2 minutes. When that happens, ftp never bothers to flush
> the last (partially filled) buffer, and just aborts, leaving those
> final 1902 bytes missing from the file (if there's any bug in
> ftp(1) it would be that, it did receive the data, it could have
> written it to the file before quitting, and had it done so, pkgsrc
> would have verified the file checksum, I guess, if not on the same
> attempt as when ftp failed, then next time it went to look and
> found the file already existing) and all would have been OK. But
> demanding that processes clean up fully when they are failing, is
> probably too much to expect.
That is at least one and maybe two bugs; ftp should write out the data
it has and not throw it away... and also, it should be capable of
noticing that it's received the entire Content-Length and proceeding
accordingly rather than timing out.
--
David A. Holland
dholland@netbsd.org
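
The two behaviours suggested above (stop once the whole Content-Length has
arrived, and keep whatever was received if a timeout fires) can be sketched
as follows. This is a minimal illustration in Python, not the ftp(1) code:
the function name is hypothetical, chunked transfer encoding is not
handled, and a local socket pair stands in for a server that sends the
data and then leaves the connection open.

```python
import socket

def read_http_body(sock, timeout=2.0):
    """Read one HTTP response; return (body, complete)."""
    sock.settimeout(timeout)
    data = b""
    # Read until the blank line that ends the header block.
    while b"\r\n\r\n" not in data:
        chunk = sock.recv(4096)
        if not chunk:
            return b"", False
        data += chunk
    head, _, body = data.partition(b"\r\n\r\n")

    # Look for Content-Length among the header fields.
    length = None
    for line in head.split(b"\r\n")[1:]:
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.strip())

    saw_eof = False
    try:
        # Stop as soon as the announced number of bytes has arrived,
        # whether or not the server ever closes the connection.
        while length is None or len(body) < length:
            chunk = sock.recv(4096)
            if not chunk:          # EOF (FIN) from the server
                saw_eof = True
                break
            body += chunk
    except socket.timeout:
        pass                       # keep what was received so far
    if length is None:
        return body, saw_eof
    return body[:length], len(body) >= length

# Stand-in server: sends the reply, then keeps the connection open.
server, client = socket.socketpair()
payload = b"x" * 1902
server.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 1902\r\n\r\n" + payload)

body, complete = read_http_body(client)
print(len(body), complete)
server.close()
client.close()
```

With the length check in place the reader returns without waiting for a
FIN, so an idle keep-alive connection no longer looks like a stalled
transfer.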
From: Robert Elz <kre@munnari.OZ.AU>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: pkg/47424 (pkgsrc "make fetch" fails to fetch fotoxx-13.01.tar.gz for graphics/fotoxx)
Date: Sat, 12 Jan 2013 07:21:14 +0700
Date: Fri, 11 Jan 2013 17:45:02 +0000 (UTC)
From: David Holland <dholland-pbugs@NetBSD.org>
Message-ID: <20130111174502.DD0A063C07C@www.NetBSD.org>
| That is at least one and maybe two bugs; ftp should write out the data
| it has and not throw it away...
Yes, possibly.
| and also, it should be capable of
| noticing that it's received the entire Content-Length and proceeding
| accordingly rather than timing out.
I'll leave it up to someone more familiar with the HTTP spec (like someone
who has actually read it, rather than just reading about it) to determine
what is correct behaviour when the client requests that end of transfer be
signalled by closing the connection (original HTTP 1.0 behaviour) but the
server wants to implement connection keep-alive (so the client can either
reuse the connection, or otherwise initiate the close, so it gets TIME_WAIT
state rather than the server).
kre
Responsible-Changed-From-To: pkg-manager->bin-bug-people
Responsible-Changed-By: dholland@NetBSD.org
Responsible-Changed-When: Sat, 12 Jan 2013 03:10:00 +0000
Responsible-Changed-Why:
A problem exists in ftp(1).
State-Changed-From-To: closed->open
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Sat, 12 Jan 2013 03:10:00 +0000
State-Changed-Why:
.
>Unformatted: