NetBSD Problem Report #47431

From dholland@macaran.localdomain  Thu Jan 10 23:22:54 2013
Return-Path: <dholland@macaran.localdomain>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
	by www.NetBSD.org (Postfix) with ESMTP id D3A0263EA72
	for <gnats-bugs@gnats.NetBSD.org>; Thu, 10 Jan 2013 23:22:53 +0000 (UTC)
Message-Id: <20130110232344.8192A6E239@macaran.localdomain>
Date: Thu, 10 Jan 2013 18:23:44 -0500 (EST)
From: dholland@eecs.harvard.edu
Reply-To: dholland@eecs.harvard.edu
To: gnats-bugs@gnats.NetBSD.org
Subject: nanosleep is more like millisleep
X-Send-Pr-Version: 3.95

>Number:         47431
>Category:       kern
>Synopsis:       nanosleep is more like millisleep
>Confidential:   no
>Severity:       critical
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Jan 10 23:25:00 +0000 2013
>Last-Modified:  Wed Mar 26 18:10:00 +0000 2014
>Originator:     David A. Holland
>Release:        NetBSD 6.99.11 (20120906)
>Organization:
>Environment:
System: NetBSD macaran 6.99.11 NetBSD 6.99.11 (MACARAN) #15: Mon Oct 15 21:24:31 EDT 2012 dholland@macaran:/usr/src/sys/arch/amd64/compile/MACARAN amd64
Architecture: x86_64
Machine: amd64
>Description:

nanosleep(2) (and its related forms) do an exceptionally poor job of
sleeping for the requested time.

While NetBSD is not a realtime system and nothing is particularly
guaranteed, the current nanosleep behavior is near-useless and we can
and should do better.

I wrote a simple test program to measure how long nanosleep actually
sleeps, and ascertained the following (on an otherwise idle system
with plenty of spare cores):

   - nanosleep(0) apparently doesn't sleep, but nearly always takes
     several milliseconds to return; this seems excessive, even if
     we assume part of that time is actually being used calling
     clock_gettime().

   - For all time values > 0 and <= 10 ms, the resulting delay is
     almost always 19.99 ms. I assume this is two scheduler quanta,
     since I notice that HZ is still 100.

   - (I thought x86 had changed to HZ=1000 some time back, but
     apparently not.)

   - For all time values >= 20 ms, the resulting delay is nearly
     always the requested time plus 9.99 ms, or sometimes a bit more
     than that. That is, even if the requested delay is an integer
     number of scheduler quanta, we always sleep for one more.
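
The observations above are consistent with a simple model in which the
request is rounded up to whole ticks and one more tick is then added.
The sketch below is only that model, not the kernel's code; the names
MODEL_HZ, MODEL_TICK_NS and model_sleep_ns are made up for illustration,
and HZ=100 is taken from the report.

```c
#include <stdint.h>

/*
 * Hypothetical model of the delays reported above; NOT the kernel's code.
 * Assumption: the request is rounded up to whole ticks, plus one extra tick.
 */
#define MODEL_HZ 100ULL                            /* tick rate from the report */
#define MODEL_TICK_NS (1000000000ULL / MODEL_HZ)   /* 10 ms per tick */

static uint64_t model_sleep_ns(uint64_t requested_ns) {
   uint64_t ticks;

   if (requested_ns == 0) {
      return 0;   /* nanosleep(0) apparently doesn't sleep */
   }
   ticks = (requested_ns + MODEL_TICK_NS - 1) / MODEL_TICK_NS;   /* round up */
   return (ticks + 1) * MODEL_TICK_NS;                           /* one more tick */
}
```

Under this model, model_sleep_ns(5000000) gives 20000000 (20 ms) and
model_sleep_ns(20000000) gives 30000000 (30 ms), which is close to the
measured 19.99 ms and "requested plus 9.99 ms" figures.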

Ostensibly if one wants to sleep for small amounts of time, one is
supposed to busy-loop; this is fine. However, nanosleep is supposed to
do this for me, and do it in the kernel where ready access to
fine-grained timing is available. This is arguably the whole point of
nanosleep(*). Furthermore, in general only the kernel can know the
length of time at which sleeping should give way to spinning.
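
For illustration, here is a rough userland sketch of the sleep-then-spin
idea: sleep for the bulk of the interval and busy-wait on
CLOCK_MONOTONIC for the tail. The helper names (now_ns, hybrid_delay)
and the SPIN_THRESHOLD_NS cutoff are assumptions made for this example;
the cutoff of two ticks at HZ=100 is a guess based on the overshoot
measured above, and is exactly the kind of value that userland cannot
really know.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Hypothetical cutoff: spin for the last two ticks (assuming HZ=100). */
#define SPIN_THRESHOLD_NS 20000000ULL

/* Current CLOCK_MONOTONIC time in nanoseconds. */
static uint64_t now_ns(void) {
   struct timespec ts;

   clock_gettime(CLOCK_MONOTONIC, &ts);
   return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Delay for at least nsecs: sleep for the bulk, spin for the tail. */
static void hybrid_delay(uint64_t nsecs) {
   uint64_t deadline = now_ns() + nsecs;

   if (nsecs > SPIN_THRESHOLD_NS) {
      uint64_t bulk = nsecs - SPIN_THRESHOLD_NS;
      struct timespec ts;

      ts.tv_sec = (time_t)(bulk / 1000000000ULL);
      ts.tv_nsec = (long)(bulk % 1000000000ULL);
      /* may overshoot by a tick or more, or return early on a signal */
      nanosleep(&ts, NULL);
   }
   while (now_ns() < deadline) {
      /* busy-wait until the deadline passes */
   }
}
```

A call such as hybrid_delay(2000000) spins for the whole 2 ms, while
hybrid_delay(50000000) sleeps for about 30 ms and spins for the rest.
Every iteration of the spin pays for a clock_gettime() call, which is
part of why this belongs in the kernel.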

Even if for some reason nanosleep cannot be fixed to spin when needed,
the behavior where it always tacks on one extra scheduler quantum
(thus always taking two for very short sleeps) is particularly silly
and can and should be fixed.

However, being able to sleep for short periods of time is useful in a
number of contexts, and I would think we ought to make a credible
best-effort attempt to support it.

(Also, is there any reason we haven't gone to HZ=1000 for at least
x86? Other OSes did it years ago.)


(*) Or at least, it was when nanosleep was introduced, to the best of
my recollection. If this behavior has been explicitly prohibited by
standards in the meantime, please point me at C&V.

>How-To-Repeat:

Here is the test program:

   ---- nanoslap.c ----
#include <sys/types.h>
#include <sys/time.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

static void printheader(void) {
   printf("Requested       Experienced     Excess          Error\n");
}

static void printtime(char *buf, size_t max, const struct timespec *tv) {
   /* tv_nsec is a (signed) long, so print it with %ld */
   snprintf(buf, max, "%lld.%09ld", (long long) tv->tv_sec, tv->tv_nsec);
}

static unsigned long long getnsecs(const struct timespec *tv) {
   return (tv->tv_sec * (unsigned long long) 1000000000) + tv->tv_nsec;
}

static void testone(unsigned long long nsecs) {
   struct timespec requested, start, end, experienced, excess;
   unsigned long long a, b, c;
   char buf[32];

   if (nsecs >= 1000000000) {
      requested.tv_sec = nsecs / 1000000000;
      requested.tv_nsec = nsecs % 1000000000;
   }
   else {
      requested.tv_sec = 0;
      requested.tv_nsec = nsecs;
   }

   clock_gettime(CLOCK_MONOTONIC, &start);
   nanosleep(&requested, NULL);
   clock_gettime(CLOCK_MONOTONIC, &end);

   timespecsub(&end, &start, &experienced);
   timespecsub(&experienced, &requested, &excess);

   printtime(buf, sizeof(buf), &requested);
   printf("%-16s", buf);

   printtime(buf, sizeof(buf), &experienced);
   printf("%-16s", buf);

   printtime(buf, sizeof(buf), &excess);
   printf("%-16s", buf);

   a = getnsecs(&requested);
   b = getnsecs(&experienced);
   c = getnsecs(&excess);

   if (a == 0) {
      printf("---");
   }
   else if (b > 2*a) {
      printf("%g x", (double)b / (double)a);
   }
   else {
      printf("%g %%", (100.0*c) / (double)a);
   }
   printf("\n");
}

static void testall(void) {
   unsigned long x;
   unsigned k;

   testone(0);
   for (x = 1; x < 1000000000; x *= 10) {
      for (k = 1; k < 10; k++) {
         testone(x * k);
      }
   }
}

int main(int argc, char *argv[]) {
   int i;

   printheader();
   if (argc == 1) {
      testall();
   }
   else {
      /* each argument is a sleep time in seconds; start at argv[1] */
      for (i = 1; i < argc; i++) {
         testone((unsigned long long) (1000000000 * atof(argv[i])));
      }
   }

   return 0;
}
   --------

Note that this system does not have clock_nanosleep() as Christos only
added it in October, but using it shouldn't make any difference.
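
For reference, a minimal sketch of the equivalent call on a system that
does have clock_nanosleep(). The wrapper name sleep_monotonic is made up
for this example; the call itself is the POSIX interface, used here in
relative mode so that it behaves like nanosleep().

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/*
 * Sleep for *requested on CLOCK_MONOTONIC using clock_nanosleep().
 * Returns 0 on success or an error number (not -1 with errno) on failure.
 */
static int sleep_monotonic(const struct timespec *requested) {
   /* flags == 0 selects a relative sleep, like nanosleep() */
   return clock_nanosleep(CLOCK_MONOTONIC, 0, requested, NULL);
}
```

In testone(), the nanosleep(&requested, NULL) call would become
sleep_monotonic(&requested); the rest of the measurement is unchanged.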

>Fix:

dunno.

>Release-Note:

>Audit-Trail:
From: David Laight <david@l8s.co.uk>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: bin/47431: nanosleep is more like millisleep
Date: Fri, 11 Jan 2013 08:36:14 +0000

 On Thu, Jan 10, 2013 at 11:25:00PM +0000, dholland@eecs.harvard.edu wrote:
 > >Number:         47431
 > >Category:       bin
 > >Synopsis:       nanosleep is more like millisleep
 ...
 > nanosleep(2) (and its related forms) do an exceptionally poor job.
 > 
 > While NetBSD is not a realtime system and nothing is particularly
 > guaranteed, the current nanosleep behavior is near-useless and we can
 > and should better.
 ...
 >    - (I thought x86 had changed to HZ=1000 some time back, but
 >      apparently not.)

 No - no one has ever changed the default.

 >    - For all time values >= 20 ms, the resulting delay is nearly
 >      always the requested time plus 9.99 ms, or sometimes a bit more
 >      than that. That is, even if the requested delay is an integer
 >      number of scheduler quanta, we always sleep for one more.

 The standard will just say that the time sleeping must be 'at least as
 long as that requested'.
 So if a program sleeps in a loop for an interval that is a multiple
 of the timer tick, each sleep will always be one tick longer than requested.

 > Ostensibly if one wants to sleep for small amounts of time, one is
 > supposed to busy-loop; this is fine. However, nanosleep is supposed to
 > do this for me, and do it in the kernel where ready access to
 > fine-grained timing is available. This is arguably the whole point of
 > nanosleep(*).

 Don't think nanosleep can be expected to busy-wait.
 In any case, very short busy-waits don't have the desired effect
 (they are either much longer than wanted, or you are trying to separate
 bus cycles and delays in bridges (etc) mean the cycles can get moved
 together).

 	David

 -- 
 David Laight: david@l8s.co.uk

Responsible-Changed-From-To: bin-bug-people->kern-bug-people
Responsible-Changed-By: dholland@NetBSD.org
Responsible-Changed-When: Fri, 11 Jan 2013 08:42:23 +0000
Responsible-Changed-Why:
Operator error.


From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/47431: nanosleep is more like millisleep
Date: Fri, 11 Jan 2013 08:55:18 +0000

 On Fri, Jan 11, 2013 at 08:25:05AM +0000, David Laight wrote:
  >  >    - For all time values >= 20 ms, the resulting delay is nearly
  >  >      always the requested time plus 9.99 ms, or sometimes a bit more
  >  >      than that. That is, even if the requested delay is an integer
  >  >      number of scheduler quanta, we always sleep for one more.
  >  
  >  The standard will just say that the time sleeping must be 'at least as
  >  long as that requested'.
  >  So if a program does sleeps in a loop for an interval that is a multiple
  >  of the timer tick they will always be one tick longer than requested.

 That is very much not useful.

  >  > Ostensibly if one wants to sleep for small amounts of time, one is
  >  > supposed to busy-loop; this is fine. However, nanosleep is supposed to
  >  > do this for me, and do it in the kernel where ready access to
  >  > fine-grained timing is available. This is arguably the whole point of
  >  > nanosleep(*).
  >  
  >  Don't think nanosleep can be expected to busy-wait.

 Again, that's the whole point of nanosleep.

  >  In any case, very short busy-waits don't have the desired effect
  >  (they are either much longer than wanted,

 Nonsense. Delay loops delay exactly as long as you want. Because
 NetBSD's nanosleep is defective I had to roll my own today, and
 without trying very hard or really bothering to calibrate anything in
 much detail, got myself something good to a few microseconds on
 average. This was not very difficult, and it's much easier if you can
 get at a useful time reference without paying the overhead of a system
 call.

 Fixing the system so it can do timing at higher resolution will
 actually take work, but adding some timing loop logic should not be
 very involved.

  >  or you are trying to separate
  >  bus cycles and delays in bridges (etc) mean the cycles can get moved
  >  together).

 ENOPARSE?

 -- 
 David A. Holland
 dholland@netbsd.org

From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/47431: nanosleep is more like millisleep
Date: Fri, 11 Jan 2013 10:04:37 +0000 (UTC)

 dholland-bugs@NetBSD.org (David Holland) writes:

 >  >  The standard will just say that the time sleeping must be 'at least as
 >  >  long as that requested'.
 >  >  So if a program does sleeps in a loop for an interval that is a multiple
 >  >  of the timer tick they will always be one tick longer than requested.
 > 
 > That is very much not useful.

 But correct.

 >  >  Don't think nanosleep can be expected to busy-wait.
 > 
 > Again, that's the whole point of nanosleep.

 The point of nanosleep is to provide a sleep function to which you
 can specify the sleep interval with a high resolution. But you
 will always get the resolution that is provided by the system
 modulo whatever the scheduler will allow.

 nanosleep may or may not use a delay loop (it shouldn't except
 for tiny intervals). Even then you will hardly get nanosecond
 resolution, only whatever some hardware timer offers.


 > Nonsense. Delay loops delay exactly as long as you want. Because
 > NetBSD's nanosleep is defective

 The single problem is that the resolution of the NetBSD timer isn't
 fine enough for your (and many other) applications.

 To fix this you need to implement a high resolution timer (might be
 even done tickless), then nanosleep can use this instead of hardclock().
 The two remaining problems with this are that maybe not all platforms
 have a usable high resolution timer and that even then the timer might
 be too coarse. But on most systems you should get a resolution of
 less than 100 microseconds, that's more than 100 times better than now.

 -- 
                                 Michael van Elst
 Internet: mlelstv@serpens.de
                                 "A potential Snark may lurk in every tree."

From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/47431: nanosleep is more like millisleep
Date: Wed, 26 Mar 2014 17:48:30 +0000

 On Fri, Jan 11, 2013 at 10:05:06AM +0000, Michael van Elst wrote:
  >>>  The standard will just say that the time sleeping must be 'at least as
  >>>  long as that requested'.
  >>>  So if a program does sleeps in a loop for an interval that is a multiple
  >>>  of the timer tick they will always be one tick longer than requested.
  >> 
  >> That is very much not useful.
  >  
  >  But correct.

 Pedantically, yes.

  >>>  Don't think nanosleep can be expected to busy-wait.
  >> 
  >> Again, that's the whole point of nanosleep.
  >  
  >  The point of nanosleep is to provide a sleep function to which you
  >  can specify the sleep interval with a high resolution. But you
  >  will always get the resolution that is provided by the system
  >  modulo whatever the scheduler will allow.
  >  
  >  nanosleep may or may not use a delay loop (it shouldn't except
  >  for tiny intervals). Even then you will hardly get nanosecond
  >  resolution but whatever some hardware timer offers.

 Of course. I'm aware of the difference between accuracy and precision,
 thank you.

 That said, the whole intent of nanosleep when it was introduced
 (vs. various usleep calls or using select/poll) was to provide an
 interface where it made sense to use a timing loop to handle short
 sleeps with reasonable accuracy.

  >  > Nonsense. Delay loops delay exactly as long as you want. Because
  >  > NetBSD's nanosleep is defective
  >  
  >  The single problem is that the resolution of the NetBSD timer isn't
  >  fine enough for your (and many other) applications.
  >  
  >  To fix this you need to implement a high resolution timer (might be
  >  even done tickless), then nanosleep can use this instead of hardclock().
  >  The two remaining problems with this is that maybe not all platforms
  >  have a usuable high resolution timer and that even then the timer might
  >  be too coarse. But on most systems you should get a resolution of
  >  less than 100 microseconds, that's more than 100 times better than now.

 NetBSD already has high-resolution timecounters. I don't know what's
 entailed in making this work properly, but the point of this PR is
 that we ought to eventually get it done.

 -- 
 David A. Holland
 dholland@netbsd.org

From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/47431: nanosleep is more like millisleep
Date: Wed, 26 Mar 2014 18:07:05 +0000

 On Wed, Mar 26, 2014 at 05:50:01PM +0000, David Holland wrote:
  > [stuff]

 http://www.dragonflybsd.org/presentations/nanosleep/

 Can someone who's familiar with the timecounter code look at that and
 see if we can steal their fixes?

 -- 
 David A. Holland
 dholland@netbsd.org

>Unformatted: