NetBSD Problem Report #45187
From Manuel.Bouyer@lip6.fr Thu Jul 28 16:11:25 2011
Return-Path: <Manuel.Bouyer@lip6.fr>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
by www.NetBSD.org (Postfix) with ESMTP id 68A8263BEB2
for <gnats-bugs@gnats.NetBSD.org>; Thu, 28 Jul 2011 16:11:25 +0000 (UTC)
Message-Id: <20110728161121.22FD334C28@armandeche.soc.lip6.fr>
Date: Thu, 28 Jul 2011 18:11:21 +0200 (MEST)
From: Manuel.Bouyer@lip6.fr
Reply-To: Manuel.Bouyer@lip6.fr
To: gnats-bugs@gnats.NetBSD.org
Subject: select(2) sometimes doesn't wakeup
X-Send-Pr-Version: 3.95
>Number: 45187
>Category: kern
>Synopsis: select(2) sometimes doesn't wakeup
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: kern-bug-people
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Thu Jul 28 16:15:01 +0000 2011
>Closed-Date: Mon Jul 14 01:05:49 +0000 2014
>Last-Modified: Mon Jul 14 01:05:49 +0000 2014
>Originator: Manuel Bouyer
>Release: NetBSD 5.99.55
>Organization:
>Environment:
System: NetBSD borneo 5.99.55 NetBSD 5.99.55 (GENERIC) #2: Thu Jul 28 17:49:53 CEST 2011 bouyer@hop:/dsk/l1/misc/bouyer/tmp/amd64/obj/dsk/l1/misc/bouyer/quota2/src/sys/arch/amd64/compile/GENERIC amd64
Architecture: amd64
Machine: amd64
>Description:
While testing a fix for another issue, I ran the following csh loop:
while (1)
rsync -avH --delete --delete-excluded --delete-after --delay-updates --force --stats --partial rsync://rsync.fr.netbsd.org/NetBSD/NetBSD-release-5-0/src .
rsync -avH --delete --delete-excluded --delete-after --delay-updates --force --stats --partial rsync://rsync.fr.netbsd.org/NetBSD/NetBSD-release-4-0/src .
end
I then noticed that rsync was very slow, updating only a few
files per minute even though the test box has a 100 Mb/s connection to
rsync.fr.netbsd.org. It spends most of its time in select(2)
while the socket receive buffer is full.
ktrace shows the following:
4102 1 rsync 1311780519.324063908 CALL select(4,0x7f7fffff83b0,0x7f7fffff8390,0,0x7f7fffff83d0)
4102 1 rsync 1311780579.483436279 RET select 0
4102 1 rsync 1311780579.483440327 CALL select(4,0x7f7fffff83b0,0x7f7fffff8390,0,0x7f7fffff83d0)
4102 1 rsync 1311780579.483442445 RET select 1
4102 1 rsync 1311780579.483443326 CALL read(3,0x7f7ff7a36de2,0x21a)
4102 1 rsync 1311780579.483451341 GIO fd 3 read 538 bytes
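So select() blocks for about 60 seconds (from 1311780519.324 to
1311780579.483) and then returns 0, i.e. it times out. It may block because
there is effectively nothing to read when it is called, but instead of waking
up when data becomes ready, it only wakes up when the timeout expires. The
next select() call returns immediately with one descriptor ready, and read(2)
gets 538 bytes.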
>How-To-Repeat:
see above
>Fix:
Disabling DIRECT_SELECT in sys/kern/sys_select.c (by adding
#define NO_DIRECT_SELECT at the top of the file) makes rsync behave
as expected.
>Release-Note:
>Audit-Trail:
From: yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi)
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org
Subject: Re: kern/45187: select(2) sometimes doesn't wakeup
Date: Fri, 29 Jul 2011 06:07:16 +0000 (UTC)
>>Number: 45187
>>Category: kern
>>Synopsis: select(2) sometimes doesn't wakeup
PR/44763 is related.
YAMAMOTO Takashi
From: "Juergen Hannken-Illjes" <hannken@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/45187 CVS commit: src/sys/kern
Date: Sat, 6 Aug 2011 11:04:25 +0000
Module Name: src
Committed By: hannken
Date: Sat Aug 6 11:04:25 UTC 2011
Modified Files:
src/sys/kern: sys_select.c
Log Message:
Fix the races of direct select()/poll():
- When sel_do_scan() restarts, do a full initialization with selclear() so
we start from an empty set without registered events. Defer the
evaluation of l_selret until after selclear() and add the count of direct
events to the count of events.
- For selscan()/pollscan(), zero the output descriptors before we poll, and
for selscan(), take the sc_lock before we change them.
- Change sel_setevents() to not count events already set.
Reviewed by: YAMAMOTO Takashi <yamt@netbsd.org>
Should fix PR #44763 (select/poll direct-set optimization seems racy)
and PR #45187 (select(2) sometimes doesn't wakeup)
To generate a diff of this commit:
cvs rdiff -u -r1.33 -r1.34 src/sys/kern/sys_select.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
Responsible-Changed-From-To: kern-bug-people->hannken
Responsible-Changed-By: hannken@NetBSD.org
Responsible-Changed-When: Thu, 11 Aug 2011 06:06:41 +0000
Responsible-Changed-Why:
Take.
State-Changed-From-To: open->feedback
State-Changed-By: hannken@NetBSD.org
State-Changed-When: Thu, 11 Aug 2011 06:06:41 +0000
State-Changed-Why:
Should be fixed now - please confirm.
From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@NetBSD.org
Cc: hannken@NetBSD.org, kern-bug-people@NetBSD.org, netbsd-bugs@NetBSD.org,
gnats-admin@NetBSD.org
Subject: Re: kern/45187 (select(2) sometimes doesn't wakeup)
Date: Thu, 11 Aug 2011 22:23:26 +0200
Unfortunately the problem is still there, with the same symptoms:
socket receive queue is full, rsync is sleeping on select.
--
Manuel Bouyer <bouyer@antioche.eu.org>
NetBSD: 26 years of experience will always make the difference
--
Responsible-Changed-From-To: hannken->kern-bug-people
Responsible-Changed-By: hannken@NetBSD.org
Responsible-Changed-When: Thu, 11 Aug 2011 20:27:56 +0000
Responsible-Changed-Why:
Give back.
State-Changed-From-To: feedback->open
State-Changed-By: hannken@NetBSD.org
State-Changed-When: Thu, 11 Aug 2011 20:27:56 +0000
State-Changed-Why:
Not fixed.
State-Changed-From-To: open->feedback
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Sat, 12 Jul 2014 20:01:06 +0000
State-Changed-Why:
Is this problem still live? If so I'd like to mark it important for -7.
From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@NetBSD.org, netbsd-bugs@NetBSD.org, gnats-admin@NetBSD.org,
dholland@NetBSD.org
Subject: Re: kern/45187 (select(2) sometimes doesn't wakeup)
Date: Sun, 13 Jul 2014 08:48:58 +0200
I can't reproduce it any more with a recent netbsd-6 installation.
But I can't be sure it's fixed; too many things have changed:
hardware (on both ends), network path and the rsync binary ...
I'd say close this PR, and if the problem shows up again I'll open a new one.
State-Changed-From-To: feedback->closed
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Mon, 14 Jul 2014 01:05:49 +0000
State-Changed-Why:
We will assume/believe it's fixed until we get contrary evidence.
>Unformatted:
Copyright © 1994-2007
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.