NetBSD Problem Report #47431
From dholland@macaran.localdomain Thu Jan 10 23:22:54 2013
Return-Path: <dholland@macaran.localdomain>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
by www.NetBSD.org (Postfix) with ESMTP id D3A0263EA72
for <gnats-bugs@gnats.NetBSD.org>; Thu, 10 Jan 2013 23:22:53 +0000 (UTC)
Message-Id: <20130110232344.8192A6E239@macaran.localdomain>
Date: Thu, 10 Jan 2013 18:23:44 -0500 (EST)
From: dholland@eecs.harvard.edu
Reply-To: dholland@eecs.harvard.edu
To: gnats-bugs@gnats.NetBSD.org
Subject: nanosleep is more like millisleep
X-Send-Pr-Version: 3.95
>Number: 47431
>Category: kern
>Synopsis: nanosleep is more like millisleep
>Confidential: no
>Severity: critical
>Priority: medium
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Thu Jan 10 23:25:00 +0000 2013
>Last-Modified: Wed Mar 26 18:10:00 +0000 2014
>Originator: David A. Holland
>Release: NetBSD 6.99.11 (20120906)
>Organization:
>Environment:
System: NetBSD macaran 6.99.11 NetBSD 6.99.11 (MACARAN) #15: Mon Oct 15 21:24:31 EDT 2012 dholland@macaran:/usr/src/sys/arch/amd64/compile/MACARAN amd64
Architecture: x86_64
Machine: amd64
>Description:
nanosleep(2) (and its related forms) do an exceptionally poor job.
While NetBSD is not a realtime system and nothing is particularly
guaranteed, the current nanosleep behavior is near-useless and we can
and should do better.
I wrote a simple test program to measure how long nanosleep actually
sleeps, and ascertained the following (on an otherwise idle system
with plenty of spare cores):
- nanosleep(0) apparently doesn't sleep, but nearly always takes
several milliseconds to return; this seems excessive, even if
we assume part of that time is actually being used calling
clock_gettime().
- For all time values > 0 and <= 10 ms, the resulting delay is
almost always 19.99 ms. I assume this is two scheduler quanta,
since I notice that HZ is still 100.
- (I thought x86 had changed to HZ=1000 some time back, but
apparently not.)
- For all time values >= 20 ms, the resulting delay is nearly
always the requested time plus 9.99 ms, or sometimes a bit more
than that. That is, even if the requested delay is an integer
number of scheduler quanta, we always sleep for one more.
Ostensibly if one wants to sleep for small amounts of time, one is
supposed to busy-loop; this is fine. However, nanosleep is supposed to
do this for me, and do it in the kernel where ready access to
fine-grained timing is available. This is arguably the whole point of
nanosleep(*). Furthermore, in general only the kernel can know the
length of time at which sleeping should give way to spinning.
Even if for some reason nanosleep cannot be fixed to spin when needed,
the behavior where it always tacks on one extra scheduler quantum
(thus always taking two for very short sleeps) is particularly silly
and can and should be fixed.
However, being able to sleep for short periods of time is useful in a
number of contexts, and I would think we ought to make a credible
best-effort attempt to support it.
(Also, is there any reason we haven't gone to HZ=1000 for at least
x86? Other OSes did it years ago.)
(*) Or at least, it was when nanosleep was introduced, to the best of
my recollection. If this behavior has been explicitly prohibited by
standards in the meantime, please point me at C&V.
>How-To-Repeat:
Here is the test program:
---- nanoslap.c ----
#include <sys/types.h>
#include <sys/time.h>   /* timespecsub() on NetBSD */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>       /* nanosleep(), clock_gettime() */
#include <unistd.h>

/* Column headings; widths match the %-16s fields printed by testone(). */
static void printheader(void) {
    printf("%-16s%-16s%-16s%s\n",
        "Requested", "Experienced", "Excess", "Error");
}

/* Format a timespec as seconds.nanoseconds. */
static void printtime(char *buf, size_t max, const struct timespec *tv) {
    snprintf(buf, max, "%lld.%09ld", (long long) tv->tv_sec,
        (long) tv->tv_nsec);
}

/* Convert a timespec to a flat nanosecond count. */
static unsigned long long getnsecs(const struct timespec *tv) {
    return (tv->tv_sec * (unsigned long long) 1000000000) + tv->tv_nsec;
}

/* Sleep for nsecs nanoseconds and report how long it actually took. */
static void testone(unsigned long long nsecs) {
    struct timespec requested, start, end, experienced, excess;
    unsigned long long a, b, c;
    char buf[32];

    if (nsecs >= 1000000000) {
        requested.tv_sec = nsecs / 1000000000;
        requested.tv_nsec = nsecs % 1000000000;
    }
    else {
        requested.tv_sec = 0;
        requested.tv_nsec = nsecs;
    }

    clock_gettime(CLOCK_MONOTONIC, &start);
    nanosleep(&requested, NULL);
    clock_gettime(CLOCK_MONOTONIC, &end);

    /* experienced = end - start; excess = experienced - requested */
    timespecsub(&end, &start, &experienced);
    timespecsub(&experienced, &requested, &excess);

    printtime(buf, sizeof(buf), &requested);
    printf("%-16s", buf);
    printtime(buf, sizeof(buf), &experienced);
    printf("%-16s", buf);
    printtime(buf, sizeof(buf), &excess);
    printf("%-16s", buf);

    a = getnsecs(&requested);
    b = getnsecs(&experienced);
    c = getnsecs(&excess);

    /* Error column: a ratio for large overshoots, else a percentage. */
    if (a == 0) {
        printf("---");
    }
    else if (b > 2*a) {
        printf("%g x", (double)b / (double)a);
    }
    else {
        printf("%g %%", (100.0*c) / (double)a);
    }
    printf("\n");
}

/* Test 0, then 1..9 times each power of ten from 1 ns up to 100 ms. */
static void testall(void) {
    unsigned long x;
    unsigned k;

    testone(0);
    for (x = 1; x < 1000000000; x *= 10) {
        for (k = 1; k < 10; k++) {
            testone(x * k);
        }
    }
}

int main(int argc, char *argv[]) {
    int i;

    printheader();
    if (argc == 1) {
        testall();
    }
    else {
        /* Arguments are durations in seconds, starting at argv[1]. */
        for (i=1; i<argc; i++) {
            testone((unsigned long long) 1000000000 * atof(argv[i]));
        }
    }
    return 0;
}
--------
Note that this system does not have clock_nanosleep() as Christos only
added it in October, but using it shouldn't make any difference.
>Fix:
dunno.
>Release-Note:
>Audit-Trail:
From: David Laight <david@l8s.co.uk>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: bin/47431: nanosleep is more like millisleep
Date: Fri, 11 Jan 2013 08:36:14 +0000
On Thu, Jan 10, 2013 at 11:25:00PM +0000, dholland@eecs.harvard.edu wrote:
> >Number: 47431
> >Category: bin
> >Synopsis: nanosleep is more like millisleep
...
> nanosleep(2) (and its related forms) do an exceptionally poor job.
>
> While NetBSD is not a realtime system and nothing is particularly
> guaranteed, the current nanosleep behavior is near-useless and we can
> and should do better.
...
> - (I thought x86 had changed to HZ=1000 some time back, but
> apparently not.)
No - no one has ever changed the default.
> - For all time values >= 20 ms, the resulting delay is nearly
> always the requested time plus 9.99 ms, or sometimes a bit more
> than that. That is, even if the requested delay is an integer
> number of scheduler quanta, we always sleep for one more.
The standard will just say that the time sleeping must be 'at least as
long as that requested'.
So if a program sleeps in a loop for an interval that is a multiple
of the timer tick, each sleep will always be one tick longer than requested.
> Ostensibly if one wants to sleep for small amounts of time, one is
> supposed to busy-loop; this is fine. However, nanosleep is supposed to
> do this for me, and do it in the kernel where ready access to
> fine-grained timing is available. This is arguably the whole point of
> nanosleep(*).
Don't think nanosleep can be expected to busy-wait.
In any case, very short busy-waits don't have the desired effect
(they are either much longer than wanted, or you are trying to separate
bus cycles and delays in bridges (etc) mean the cycles can get moved
together).
David
--
David Laight: david@l8s.co.uk
Responsible-Changed-From-To: bin-bug-people->kern-bug-people
Responsible-Changed-By: dholland@NetBSD.org
Responsible-Changed-When: Fri, 11 Jan 2013 08:42:23 +0000
Responsible-Changed-Why:
Operator error.
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/47431: nanosleep is more like millisleep
Date: Fri, 11 Jan 2013 08:55:18 +0000
On Fri, Jan 11, 2013 at 08:25:05AM +0000, David Laight wrote:
> > - For all time values >= 20 ms, the resulting delay is nearly
> > always the requested time plus 9.99 ms, or sometimes a bit more
> > than that. That is, even if the requested delay is an integer
> > number of scheduler quanta, we always sleep for one more.
>
> The standard will just say that the time sleeping must be 'at least as
> long as that requested'.
> So if a program sleeps in a loop for an interval that is a multiple
> of the timer tick, each sleep will always be one tick longer than requested.
That is very much not useful.
> > Ostensibly if one wants to sleep for small amounts of time, one is
> > supposed to busy-loop; this is fine. However, nanosleep is supposed to
> > do this for me, and do it in the kernel where ready access to
> > fine-grained timing is available. This is arguably the whole point of
> > nanosleep(*).
>
> Don't think nanosleep can be expected to busy-wait.
Again, that's the whole point of nanosleep.
> In any case, very short busy-waits don't have the desired effect
> (they are either much longer than wanted,
Nonsense. Delay loops delay exactly as long as you want. Because
NetBSD's nanosleep is defective I had to roll my own today, and
without trying very hard or really bothering to calibrate anything in
much detail, got myself something good to a few microseconds on
average. This was not very difficult, and it's much easier if you can
get at a useful time reference without paying the overhead of a system
call.
Fixing the system so it can do timing at higher resolution will
actually take work, but adding some timing loop logic should not be
very involved.
> or you are trying to separate
> bus cycles and delays in bridges (etc) mean the cycles can get moved
> together).
ENOPARSE?
--
David A. Holland
dholland@netbsd.org
From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/47431: nanosleep is more like millisleep
Date: Fri, 11 Jan 2013 10:04:37 +0000 (UTC)
dholland-bugs@NetBSD.org (David Holland) writes:
> > The standard will just say that the time sleeping must be 'at least as
> > long as that requested'.
> > So if a program sleeps in a loop for an interval that is a multiple
> > of the timer tick, each sleep will always be one tick longer than requested.
>
> That is very much not useful.
But correct.
> > Don't think nanosleep can be expected to busy-wait.
>
> Again, that's the whole point of nanosleep.
The point of nanosleep is to provide a sleep function to which you
can specify the sleep interval with a high resolution. But you
will always get the resolution that is provided by the system
modulo whatever the scheduler will allow.
nanosleep may or may not use a delay loop (it shouldn't except
for tiny intervals). Even then you will hardly get nanosecond
resolution but whatever some hardware timer offers.
> Nonsense. Delay loops delay exactly as long as you want. Because
> NetBSD's nanosleep is defective
The single problem is that the resolution of the NetBSD timer isn't
fine enough for your (and many other) applications.
To fix this you need to implement a high resolution timer (it might
even be done tickless); then nanosleep can use this instead of hardclock().
The two remaining problems with this are that maybe not all platforms
have a usable high resolution timer and that even then the timer might
be too coarse. But on most systems you should get a resolution of
less than 100 microseconds, that's more than 100 times better than now.
--
--
Michael van Elst
Internet: mlelstv@serpens.de
"A potential Snark may lurk in every tree."
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/47431: nanosleep is more like millisleep
Date: Wed, 26 Mar 2014 17:48:30 +0000
On Fri, Jan 11, 2013 at 10:05:06AM +0000, Michael van Elst wrote:
>>> The standard will just say that the time sleeping must be 'at least as
>>> long as that requested'.
>>> So if a program sleeps in a loop for an interval that is a multiple
>>> of the timer tick, each sleep will always be one tick longer than requested.
>>
>> That is very much not useful.
>
> But correct.
Pedantically, yes.
>>> Don't think nanosleep can be expected to busy-wait.
>>
>> Again, that's the whole point of nanosleep.
>
> The point of nanosleep is to provide a sleep function to which you
> can specify the sleep interval with a high resolution. But you
> will always get the resolution that is provided by the system
> modulo whatever the scheduler will allow.
>
> nanosleep may or may not use a delay loop (it shouldn't except
> for tiny intervals). Even then you will hardly get nanosecond
> resolution but whatever some hardware timer offers.
Of course. I'm aware of the difference between accuracy and precision,
thank you.
That said, the whole intent of nanosleep when it was introduced
(vs. various usleep calls or using select/poll) was to provide an
interface where it made sense to use a timing loop to handle short
sleeps with reasonable accuracy.
> > Nonsense. Delay loops delay exactly as long as you want. Because
> > NetBSD's nanosleep is defective
>
> The single problem is that the resolution of the NetBSD timer isn't
> fine enough for your (and many other) applications.
>
> To fix this you need to implement a high resolution timer (it might
> even be done tickless); then nanosleep can use this instead of hardclock().
> The two remaining problems with this are that maybe not all platforms
> have a usable high resolution timer and that even then the timer might
> be too coarse. But on most systems you should get a resolution of
> less than 100 microseconds, that's more than 100 times better than now.
NetBSD already has high-resolution timecounters. I don't know what's
entailed in making this work properly, but the point of this PR is
that we ought to eventually get it done.
--
David A. Holland
dholland@netbsd.org
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/47431: nanosleep is more like millisleep
Date: Wed, 26 Mar 2014 18:07:05 +0000
On Wed, Mar 26, 2014 at 05:50:01PM +0000, David Holland wrote:
> [stuff]
http://www.dragonflybsd.org/presentations/nanosleep/
Can someone who's familiar with the timecounter code look at that and
see if we can steal their fixes?
--
David A. Holland
dholland@netbsd.org
>Unformatted: