NetBSD Problem Report #45187

From Manuel.Bouyer@lip6.fr  Thu Jul 28 16:11:25 2011
Return-Path: <Manuel.Bouyer@lip6.fr>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
	by www.NetBSD.org (Postfix) with ESMTP id 68A8263BEB2
	for <gnats-bugs@gnats.NetBSD.org>; Thu, 28 Jul 2011 16:11:25 +0000 (UTC)
Message-Id: <20110728161121.22FD334C28@armandeche.soc.lip6.fr>
Date: Thu, 28 Jul 2011 18:11:21 +0200 (MEST)
From: Manuel.Bouyer@lip6.fr
Reply-To: Manuel.Bouyer@lip6.fr
To: gnats-bugs@gnats.NetBSD.org
Subject: select(2) sometimes doesn't wakeup
X-Send-Pr-Version: 3.95

>Number:         45187
>Category:       kern
>Synopsis:       select(2) sometimes doesn't wakeup
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Jul 28 16:15:01 +0000 2011
>Closed-Date:    Mon Jul 14 01:05:49 +0000 2014
>Last-Modified:  Mon Jul 14 01:05:49 +0000 2014
>Originator:     Manuel Bouyer
>Release:        NetBSD 5.99.55
>Organization:
>Environment:
System: NetBSD borneo 5.99.55 NetBSD 5.99.55 (GENERIC) #2: Thu Jul 28 17:49:53 CEST 2011  bouyer@hop:/dsk/l1/misc/bouyer/tmp/amd64/obj/dsk/l1/misc/bouyer/quota2/src/sys/arch/amd64/compile/GENERIC amd64
Architecture: amd64
Machine: amd64
>Description:
	While testing a fix for another issue, I ran the following:
while (1)
rsync -avH --delete --delete-excluded --delete-after --delay-updates --force --stats --partial rsync://rsync.fr.netbsd.org/NetBSD/NetBSD-release-5-0/src .
rsync -avH --delete --delete-excluded --delete-after --delay-updates --force --stats --partial rsync://rsync.fr.netbsd.org/NetBSD/NetBSD-release-4-0/src .
end

	Then I noticed that rsync was very slow, updating only a few
	files per minute (the test box has a 100Mb connection to
	rsync.fr.netbsd.org). It spends most of its time in select(2),
	while the receive socket buffer is full.
	ktrace shows the following:
   4102      1 rsync    1311780519.324063908 CALL  select(4,0x7f7fffff83b0,0x7f7fffff8390,0,0x7f7fffff83d0)
   4102      1 rsync    1311780579.483436279 RET   select 0
   4102      1 rsync    1311780579.483440327 CALL  select(4,0x7f7fffff83b0,0x7f7fffff8390,0,0x7f7fffff83d0)
   4102      1 rsync    1311780579.483442445 RET   select 1
   4102      1 rsync    1311780579.483443326 CALL  read(3,0x7f7ff7a36de2,0x21a) 
   4102      1 rsync    1311780579.483451341 GIO   fd 3 read 538 bytes

So select blocks (perhaps because there really is nothing to read at that
time), but instead of waking up when data becomes ready it only wakes up
when the timeout expires (about 60 seconds later in the trace above,
returning 0). The next select call returns immediately.
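
The symptom can be measured outside rsync with a small userland check.
The following is only a sketch under stated assumptions: timed_select()
is a hypothetical helper, not part of the PR, and a local socketpair(2)
stands in for rsync's TCP socket. It does not try to trigger the race
itself; it only reports how long select(2) slept and what it returned,
so a wakeup that arrives with the data can be told apart from one that
arrives with the timeout.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper (an assumption, not code from the PR).
 * A child process writes one byte after delay_ms milliseconds; the
 * parent waits in select(2) with a timeout of timeout_s seconds.
 * Returns the seconds spent in select(), or -1.0 on setup failure.
 * *nready receives select()'s return value.
 */
static double
timed_select(int delay_ms, int timeout_s, int *nready)
{
	int sv[2];
	pid_t pid;
	fd_set rfds;
	struct timeval tv;
	struct timespec delay, t0, t1;

	assert(delay_ms >= 0 && delay_ms < 1000); /* keeps tv_nsec valid */
	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
		return -1.0;
	if ((pid = fork()) == -1) {
		close(sv[0]);
		close(sv[1]);
		return -1.0;
	}
	if (pid == 0) {
		/* Child: make the parent's descriptor readable after a delay. */
		close(sv[0]);
		delay.tv_sec = 0;
		delay.tv_nsec = (long)delay_ms * 1000000L;
		nanosleep(&delay, NULL);
		_exit(write(sv[1], "x", 1) == 1 ? 0 : 1);
	}
	close(sv[1]);

	FD_ZERO(&rfds);
	FD_SET(sv[0], &rfds);
	tv.tv_sec = timeout_s;
	tv.tv_usec = 0;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	*nready = select(sv[0] + 1, &rfds, NULL, NULL, &tv);
	clock_gettime(CLOCK_MONOTONIC, &t1);

	close(sv[0]);
	(void)waitpid(pid, NULL, 0);
	return (double)(t1.tv_sec - t0.tv_sec) +
	    (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

With a call such as timed_select(100, 5, &n), a healthy kernel should
return after roughly the write delay with n == 1. The pattern in the
ktrace output above corresponds to the call lasting the full timeout
with n == 0.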

>How-To-Repeat:
	see above
>Fix:
	Disabling DIRECT_SELECT in sys/kern/sys_select.c (with a
	#define NO_DIRECT_SELECT at the top of the file) makes rsync
	behave as expected.

>Release-Note:

>Audit-Trail:
From: yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi)
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
	netbsd-bugs@netbsd.org
Subject: Re: kern/45187: select(2) sometimes doesn't wakeup
Date: Fri, 29 Jul 2011 06:07:16 +0000 (UTC)

 >>Number:         45187
 >>Category:       kern
 >>Synopsis:       select(2) sometimes doesn't wakeup

 PR/44763 is related.

 YAMAMOTO Takashi

From: "Juergen Hannken-Illjes" <hannken@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/45187 CVS commit: src/sys/kern
Date: Sat, 6 Aug 2011 11:04:25 +0000

 Module Name:	src
 Committed By:	hannken
 Date:		Sat Aug  6 11:04:25 UTC 2011

 Modified Files:
 	src/sys/kern: sys_select.c

 Log Message:
 Fix the races of direct select()/poll():

 - When sel_do_scan() restarts do a full initialization with selclear() so
   we start from an empty set without registered events.  Defer the
   evaluation of l_selret after selclear() and add the count of direct events
   to the count of events.

 - For selscan()/pollscan() zero the output descriptors before we poll and
   for selscan() take the sc_lock before we change them.

 - Change sel_setevents() to not count events already set.

 Reviewed by: YAMAMOTO Takashi <yamt@netbsd.org>

 Should fix PR #44763 (select/poll direct-set optimization seems racy)
        and PR #45187 (select(2) sometimes doesn't wakeup)
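
 The last item of the log message ("not count events already set") can
 be illustrated in isolation. This is a minimal sketch, not the code in
 sys_select.c: set_events_once() and its bitmask arguments are
 hypothetical names chosen for the example. The idea is that a wakeup
 reporting an event that is already recorded must not increase the
 event count a second time.

```c
#include <stdint.h>

/*
 * Hypothetical illustration (an assumption, not the kernel's
 * sel_setevents()): record the bits in "incoming" into "*recorded"
 * and return how many of them were not set before.
 */
static int
set_events_once(uint32_t *recorded, uint32_t incoming)
{
	uint32_t fresh = incoming & ~*recorded;	/* bits not yet recorded */
	int n = 0;

	*recorded |= fresh;
	/* Count the newly set bits by clearing the lowest one each pass. */
	for (; fresh != 0; fresh &= fresh - 1)
		n++;
	return n;
}
```

 Reporting the same events twice then adds to the count only the first
 time, so the number returned to the caller cannot exceed the number of
 bits actually set.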


 To generate a diff of this commit:
 cvs rdiff -u -r1.33 -r1.34 src/sys/kern/sys_select.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

Responsible-Changed-From-To: kern-bug-people->hannken
Responsible-Changed-By: hannken@NetBSD.org
Responsible-Changed-When: Thu, 11 Aug 2011 06:06:41 +0000
Responsible-Changed-Why:
Take.


State-Changed-From-To: open->feedback
State-Changed-By: hannken@NetBSD.org
State-Changed-When: Thu, 11 Aug 2011 06:06:41 +0000
State-Changed-Why:
Should be fixed now - please confirm.


From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@NetBSD.org
Cc: hannken@NetBSD.org, kern-bug-people@NetBSD.org, netbsd-bugs@NetBSD.org,
        gnats-admin@NetBSD.org
Subject: Re: kern/45187 (select(2) sometimes doesn't wakeup)
Date: Thu, 11 Aug 2011 22:23:26 +0200

 On Thu, Aug 11, 2011 at 06:06:42AM +0000, hannken@NetBSD.org wrote:
 > Synopsis: select(2) sometimes doesn't wakeup
 > 
 > Responsible-Changed-From-To: kern-bug-people->hannken
 > Responsible-Changed-By: hannken@NetBSD.org
 > Responsible-Changed-When: Thu, 11 Aug 2011 06:06:41 +0000
 > Responsible-Changed-Why:
 > Take.
 > 
 > 
 > State-Changed-From-To: open->feedback
 > State-Changed-By: hannken@NetBSD.org
 > State-Changed-When: Thu, 11 Aug 2011 06:06:41 +0000
 > State-Changed-Why:
 > Should be fixed now - please confirm.

 Unfortunately the problem is still there, with the same symptoms:
 the socket receive queue is full and rsync is sleeping in select.

 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 ans d'experience feront toujours la difference
 --

Responsible-Changed-From-To: hannken->kern-bug-people
Responsible-Changed-By: hannken@NetBSD.org
Responsible-Changed-When: Thu, 11 Aug 2011 20:27:56 +0000
Responsible-Changed-Why:
Give back.


State-Changed-From-To: feedback->open
State-Changed-By: hannken@NetBSD.org
State-Changed-When: Thu, 11 Aug 2011 20:27:56 +0000
State-Changed-Why:
Not fixed.


State-Changed-From-To: open->feedback
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Sat, 12 Jul 2014 20:01:06 +0000
State-Changed-Why:
Is this problem still live? If so I'd like to mark it important for -7.


From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@NetBSD.org, netbsd-bugs@NetBSD.org, gnats-admin@NetBSD.org,
        dholland@NetBSD.org
Subject: Re: kern/45187 (select(2) sometimes doesn't wakeup)
Date: Sun, 13 Jul 2014 08:48:58 +0200

 On Sat, Jul 12, 2014 at 08:01:06PM +0000, dholland@NetBSD.org wrote:
 > Synopsis: select(2) sometimes doesn't wakeup
 > 
 > State-Changed-From-To: open->feedback
 > State-Changed-By: dholland@NetBSD.org
 > State-Changed-When: Sat, 12 Jul 2014 20:01:06 +0000
 > State-Changed-Why:
 > Is this problem still live? If so I'd like to mark it important for -7.

 I can't reproduce it any more with a recent netbsd-6 installation.
 But I can't be sure it's fixed, too many things have changed:
 hardware (on both ends), network path and the rsync binary ...
 I'd say close this PR, and if the problem shows up again I'll open a new one.

 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 ans d'experience feront toujours la difference
 --

State-Changed-From-To: feedback->closed
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Mon, 14 Jul 2014 01:05:49 +0000
State-Changed-Why:
we will assume/believe it's fixed until we get contrary evidence.


>Unformatted:
