NetBSD Problem Report #36528
From martin@duskware.de Sat Jun 23 12:46:30 2007
Return-Path: <martin@duskware.de>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
by narn.NetBSD.org (Postfix) with ESMTP id CFC0463B882
for <gnats-bugs@gnats.netbsd.org>; Sat, 23 Jun 2007 12:46:30 +0000 (UTC)
Message-Id: <20070623112846.10DD263B882@narn.NetBSD.org>
Date: Sat, 23 Jun 2007 11:28:46 +0000 (UTC)
From: ekamperi@auth.gr
Reply-To: ekamperi@auth.gr
To: netbsd-bugs-owner@NetBSD.org
Subject: strptime(3) doesn't fill in the 'tm' structure fields correctly
X-Send-Pr-Version: www-1.0
>Number: 36528
>Category: lib
>Synopsis: strptime(3) doesn't fill in the 'tm' structure fields correctly
>Confidential: no
>Severity: serious
>Priority: high
>Responsible: lib-bug-people
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sat Jun 23 12:50:00 +0000 2007
>Closed-Date: Sat May 13 18:19:54 +0000 2023
>Last-Modified: Mon May 15 04:45:01 +0000 2023
>Originator: Stathis Kamperis
>Release: NetBSD 4.99.20
>Organization:
Student of Medicine, Aristotle University of Thessaloniki
>Environment:
NetBSD netbsd 4.99.20 NetBSD 4.99.20 (MYGENERIC) #1: Wed Jun 13 00:11:49 EEST 2007 root@netbsd:/usr/obj/sys/arch/i386/compile/MYGENERIC i386
>Description:
strptime(3) doesn't seem to fill in the 'tm' structure fields correctly, specifically tm_mday, tm_mon and tm_wday.
As far as I know, OpenBSD, FreeBSD and Mac OS X behave the same way. Linux and Solaris, on the other hand, yield the expected results.
>How-To-Repeat:
Compile and run the following program.
#define _XOPEN_SOURCE 700    /* exposes the strptime() prototype on glibc */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

int main(void)
{
    struct tm tm;
    char *retp;

    /* Clear the tm structure */
    memset(&tm, 0, sizeof(tm));

    /* Call strptime()
       %j is the day number of the year [1,366]
       %Y is the year, including the century (e.g., 1996)
    */
    retp = strptime("100-2007", "%j-%Y", &tm);

    /* strptime() failed */
    if (retp == NULL) {
        fprintf(stderr, "strptime() failed\n");
        exit(EXIT_FAILURE);
    }

    /* parsing failed: trailing input was not consumed */
    if (*retp != '\0') {
        fprintf(stderr, "parsing failed\n");
        exit(EXIT_FAILURE);
    }

    /* print tm's fields */
    printf("tm_mday = %d\n", tm.tm_mday);
    printf("tm_mon = %d\n", tm.tm_mon);
    printf("tm_year = %d\n", tm.tm_year);
    printf("tm_yday = %d\n", tm.tm_yday);
    printf("tm_wday = %d\n", tm.tm_wday);
    return EXIT_SUCCESS;
}
-------------------------------------------------------------
[stathis@netbsd ~]$ ./strptime
tm_mday = 0
tm_mon = 0
tm_year = 107
tm_yday = 99
tm_wday = 0
[stathis@archlinux ~]$ ./strptime
tm_mday = 10
tm_mon = 3
tm_year = 107
tm_yday = 99
tm_wday = 2
>Fix:
>Release-Note:
>Audit-Trail:
State-Changed-From-To: open->closed
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Sat, 13 May 2023 18:19:54 +0000
State-Changed-Why:
This works now. No idea when it got fixed...
From: Robert Elz <kre@munnari.OZ.AU>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: lib/36528 (strptime(3) doesn't fill in the 'tm' structure fields correctly)
Date: Sun, 14 May 2023 14:55:06 +0700
Date: Sat, 13 May 2023 18:19:54 +0000 (UTC)
From: dholland@NetBSD.org
Message-ID: <20230513181954.B299D1A923D@mollari.NetBSD.org>
| Synopsis: strptime(3) doesn't fill in the 'tm' structure fields correctly
Note that this never was a bug. POSIX says:
It is unspecified whether multiple calls to strptime( ) using the
same tm structure will update the current contents of the structure
or overwrite all contents of the structure. Conforming applications
should make a single call to strptime( ) with a format and all data
needed to completely specify the date and time being converted.
That is, all that strptime() is required to fill in, in the struct tm, are the
fields actually specified by format characters. The implementation is
allowed to alter the other fields, if it wants to. So neither the Linux
nor the NetBSD implementation reported in the PR was incorrect, just the
assumption made by the application used to test it.
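Under that reading, a portable application that parses only %j and %Y should derive the remaining date fields itself instead of relying on strptime(). A minimal sketch, assuming the local time zone is acceptable for the call to mktime(); fill_date_fields() is a hypothetical helper, not part of any library. It relies on mktime() ignoring tm_wday and tm_yday on input and normalizing an out-of-range tm_mday:

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <string.h>
#include <time.h>

/*
 * Hypothetical helper: parse "DDD-YYYY" with strptime() and let mktime()
 * derive tm_mon, tm_mday and tm_wday from the day of the year.
 * Returns 0 on success, -1 on failure.
 */
int fill_date_fields(const char *buf, struct tm *out)
{
    struct tm tm;
    char *end;

    memset(&tm, 0, sizeof(tm));
    end = strptime(buf, "%j-%Y", &tm);
    if (end == NULL || *end != '\0')
        return -1;

    /* tm_yday (0-based) and tm_year are what %j and %Y are defined to set. */
    assert(tm.tm_yday >= 0 && tm.tm_yday <= 365);

    /*
     * Treat the date as "January (yday + 1)": mktime() normalizes the
     * out-of-range tm_mday into the real month and day, and fills in
     * tm_wday. Noon keeps midnight DST transitions out of the picture.
     */
    tm.tm_mday = tm.tm_yday + 1;
    tm.tm_mon = 0;
    tm.tm_hour = 12;
    tm.tm_min = 0;
    tm.tm_sec = 0;
    tm.tm_isdst = -1;
    if (mktime(&tm) == (time_t)-1)
        return -1;

    *out = tm;
    return 0;
}
```

With the PR's input "100-2007", this yields April 10 (tm_mon 3, tm_mday 10), a Tuesday (tm_wday 2), whatever the strptime() implementation chose to do with the other fields.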
kre
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: Robert Elz <kre@munnari.OZ.AU>
Subject: Re: lib/36528 (strptime(3) doesn't fill in the 'tm' structure fields
correctly)
Date: Sun, 14 May 2023 17:17:40 +0000
On Sun, May 14, 2023 at 08:00:03AM +0000, Robert Elz wrote:
> Note that this never was a bug. POSIX says:
>
> It is unspecified whether multiple calls to strptime( ) using the
> same tm structure will update the current contents of the structure
> or overwrite all contents of the structure. Conforming applications
> should make a single call to strptime( ) with a format and all data
> needed to completely specify the date and time being converted.
>
> That is, all that strptime() is required to fill in, in the struct tm, are the
> fields actually specified by format characters. The implementation is
> allowed to alter the other fields, if it wants to. So neither the Linux
> nor the NetBSD implementation reported in the PR was incorrect, just the
> assumption made by the application used to test it.
I don't see how that's relevant to the example in this PR. The values
passed govern all the outputs that are printed.
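To make that point concrete: once the year is known, tm_yday alone determines tm_mon and tm_mday. A minimal sketch using the Gregorian leap-year rule; yday_to_date() and is_leap() are hypothetical helpers, not library functions:

```c
#include <assert.h>

/* Gregorian leap-year rule. */
static int is_leap(int year)
{
    return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
}

/*
 * Hypothetical helper: convert a 0-based day of the year (as in tm_yday)
 * and a full year (e.g. 2007, not tm_year) into a 0-based month (as in
 * tm_mon) and a 1-based day of the month (as in tm_mday).
 */
void yday_to_date(int year, int yday, int *mon, int *mday)
{
    static const int days_in_month[12] =
        { 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 };
    int m = 0;

    assert(yday >= 0 && yday < 365 + is_leap(year));

    /* Subtract whole months until the remaining days fit in month m. */
    for (;;) {
        int len = days_in_month[m] + (m == 1 && is_leap(year));
        if (yday < len)
            break;
        yday -= len;
        m++;
    }
    *mon = m;
    *mday = yday + 1;
}
```

For the PR's input (tm_yday 99 in 2007) this gives month 3 and day 10, i.e. April 10.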
--
David A. Holland
dholland@netbsd.org
From: Robert Elz <kre@munnari.OZ.AU>
To: David Holland <dholland-bugs@netbsd.org>
Cc: gnats-bugs@netbsd.org
Subject: Re: lib/36528 (strptime(3) doesn't fill in the 'tm' structure fields correctly)
Date: Mon, 15 May 2023 01:59:50 +0700
Date: Sun, 14 May 2023 17:17:40 +0000
From: David Holland <dholland-bugs@netbsd.org>
Message-ID: <ZGEXtDniLI8JsFSD@netbsd.org>
| I don't see how that's relevant to the example in this PR. The values
| passed govern all the outputs that are printed.
You mean they could, but that's not how strptime() is defined to work.
The strptime( ) function shall convert the character string pointed
to by buf to values which are stored in the tm structure pointed to
by tm, using the format specified by format.
[...]
%a The day of the week, using the locale's weekday names;
(that's tm_wday)
%b The month, using the locale's month names;
(tm_mon)
%d The day of the month [01,31];
(tm_mday)
etc. What's telling is this (and a few others like it):
%g The last 2 digits of the week-based year (see below) as a decimal
number [...]. The effect of this year, if any, on the tm structure
pointed to by tm is unspecified.
Given that, nothing needs to be done to the tm at all, as it
has no field for that information (in strftime() that value is computed from
others; in strptime() the implementation is not required to invert that
calculation, even if it has the necessary information available).
%G, %U, %V, %W, %z and %Z all have the same qualification (though %z and %Z
probably need to be fixed now that tm_gmtoff and tm_zone have been added
to struct tm).
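The strftime() direction can be seen in a small sketch: %G and %V are computed from tm_year, tm_yday and tm_wday, although struct tm has no field holding the week-based year. iso_week_string() is a hypothetical helper; the example date, Friday 1 January 2021, belongs to ISO week 53 of 2020:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/*
 * Hypothetical helper: format the ISO 8601 week-based year and week
 * number of a date supplied only through the fields strftime() consults
 * for %G and %V.
 */
size_t iso_week_string(char *buf, size_t len, int tm_year, int tm_yday,
    int tm_wday)
{
    struct tm tm;

    assert(tm_wday >= 0 && tm_wday <= 6);
    memset(&tm, 0, sizeof(tm));
    tm.tm_year = tm_year;   /* years since 1900 */
    tm.tm_yday = tm_yday;   /* 0-based day of the year */
    tm.tm_wday = tm_wday;   /* 0 = Sunday */
    return strftime(buf, len, "%G-W%V", &tm);
}
```

Calling iso_week_string(buf, sizeof(buf), 121, 0, 5) should produce "2020-W53". Going the other way, from "2020-W53" back to a struct tm, is exactly what strptime() is not required to do.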
Note that strptime()'s format parameter isn't required to have any
conversions in it at all; it could be used simply to match strings,
in a somewhat odd, white-space-tolerant way.
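As a sketch of that use, a format with no conversions makes strptime() a plain matcher. matches_literal() is a hypothetical name; the leading space in the format is a white-space directive that skips any amount of input white space, and no field of the struct tm is set:

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <string.h>
#include <time.h>

/*
 * Hypothetical helper: return 1 if buf is optional white space followed
 * by exactly "Date:", using strptime() with a conversion-free format.
 */
int matches_literal(const char *buf)
{
    struct tm tm;   /* required by the interface, never written here */
    const char *end;

    assert(buf != NULL);
    memset(&tm, 0, sizeof(tm));
    end = strptime(buf, " Date:", &tm);
    return end != NULL && *end == '\0';
}
```

A mismatching ordinary character makes strptime() return NULL; leftover input after a successful match is detected by checking the returned pointer.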
The format is composed of zero or more directives. Each directive is
composed of one of the following: one or more white-space bytes;
an ordinary character (neither '%' nor a white-space byte); or a
conversion specification.
[...]
A conversion specification composed of white-space bytes is executed
by scanning input up to the first non-white-space byte (which remains
unscanned), or until no more characters can be scanned.
A conversion specification that is an ordinary character is executed
by scanning the next character from the buffer. If the character
scanned from the buffer differs from the one comprising the directive,
the directive fails, and the differing and subsequent characters
remain unscanned.
[%n %t processing spec omitted here, not relevant]
Any other conversion specification is executed by scanning characters
until a character matching the next directive is scanned, or until no
more characters can be scanned. These characters, except the one
matching the next directive, are then compared to the locale values
associated with the conversion specifier. If a match is found,
values for the appropriate tm structure members are set to values
corresponding to the locale information.
The plural on "tm structure members" is because some directives (e.g., %T,
which is defined as %H:%M:%S) cause multiple fields to be set.
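A minimal sketch of such a directive: one %T conversion sets three members. parse_clock() is a hypothetical helper:

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <string.h>
#include <time.h>

/*
 * Hypothetical helper: parse "HH:MM:SS" with the single %T conversion,
 * which POSIX defines as %H:%M:%S and which therefore sets tm_hour,
 * tm_min and tm_sec. Returns 0 on success, -1 if the whole string was
 * not consumed.
 */
int parse_clock(const char *buf, struct tm *tm)
{
    char *end;

    assert(buf != NULL && tm != NULL);
    memset(tm, 0, sizeof(*tm));
    end = strptime(buf, "%T", tm);
    return (end != NULL && *end == '\0') ? 0 : -1;
}
```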
That's all it says about what happens to struct tm: nothing at all about
calculating values for other fields out of what was received for the fields
that were set (so while %j, in combination with %Y (or %C and %y), might convey
enough information to allow all of the date-related fields to be set,
that isn't required to happen). And then there is the text that I quoted previously:
It is unspecified whether multiple calls to strptime( ) using the
same tm structure will update the current contents of the structure
or overwrite all contents of the structure.
That is, an implementation can, if it wants, allow you to write
p = strptime(buf, " %j", &tm);
p = strptime(p, " %Y", &tm);
p = strptime(p, " %b", &tm);
p = strptime(p, " %a", &tm);
p = strptime(p, " %d", &tm);
and, given buf containing "209 2023 Feb Wed 30" (assuming the POSIX/C locale),
it might end up setting tm such that tm_yday == 208 (%j counts from 1,
tm_yday from 0), tm_year == 123, tm_mon == 1, tm_wday == 3, and
tm_mday == 30 ... despite there not being a 30th of Feb (in any year).
As Feb 28 2023 was a Tue, the 30th, if it did exist, could not be a Wed,
and further, nothing anytime in Feb or Mar is the 209th day of anyone's year.
Applications cannot rely upon that working that way, but an implementation
is permitted to make it happen.
Also note that there is no requirement to init the tm to anything at all
before calling strptime(), it can be full of trap invoking integers in all
of its fields (and any random valid, or invalid, pointer in tm_zone).
All strptime() does (that is, is required to do) is stick values corresponding
to any conversions it encounters in the format in the matching field of
the tm struct. It cannot really do more.
What would you expect to happen if the above were instead written as
p = strptime(buf, " %j %Y %b %a %d", &tm);
with the same input? This time we have a single call and the same
input, so the resulting struct tm really must contain the values
that the implementation which allowed the multiple calls would have
stored.
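The single-call case can be tried directly. A minimal sketch, assuming the C locale; parse_odd_date() is a hypothetical helper. The values it produces for the fields named by conversions are the ones kre lists; whether a given implementation additionally rejects or "corrects" the contradictory combination is exactly what the standard leaves open, so results on other systems may differ:

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <locale.h>
#include <string.h>
#include <time.h>

/*
 * Hypothetical helper: parse the self-contradictory input from the
 * discussion with one strptime() call. Returns 0 if all of the input
 * was consumed, -1 otherwise.
 */
int parse_odd_date(struct tm *tm)
{
    const char *buf = "209 2023 Feb Wed 30";
    char *end;

    assert(tm != NULL);
    setlocale(LC_TIME, "C");    /* month and weekday names of the C locale */
    memset(tm, 0, sizeof(*tm));
    end = strptime(buf, " %j %Y %b %a %d", tm);
    return (end != NULL && *end == '\0') ? 0 : -1;
}
```

Nothing in this call checks that day 209, February, Wednesday and the 30th agree with one another.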
The other fields of the struct tm (the ones that aren't mentioned
here) can be set to whatever the implementation likes, or simply
left as they were on input.
There's nothing in the spec that says that the result must make sense.
There's definitely no mention of it calling mktime() on the result.
That would be absurd: mktime() requires some fields of the struct tm
to be filled in, and if they're not, what happens is unspecified, or
perhaps even undefined. And as above, the struct tm passed to strptime()
doesn't need to be initialized first, and the format doesn't need to
contain any conversions at all, meaning no fields in the struct need be
set to anything.
All strptime() was ever really intended to be was an inverse to strftime().
Given (approximately) the same format string that strftime() used, strptime()
is intended to fill in the fields of the struct tm that strftime() used
to format the data. That's why strptime() has the %g %G ... conversions
(which aren't defined to do anything specific at all to the struct tm -
explicitly) and POSIX strptime() (but not the C version) has %s which
also does nothing (though it doesn't say what should happen to the number)
as POSIX strftime() has a %s conversion which C does not.
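The inverse relationship can be sketched as a round trip: format a struct tm with strftime(), parse the text back with strptime() using the same format, and compare the fields the format mentions. round_trip() is a hypothetical helper:

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <string.h>
#include <time.h>

/*
 * Hypothetical helper: format *in with strftime() and parse the result
 * back into *out with strptime() using the same format.
 * Returns 0 on success, -1 on failure.
 */
int round_trip(const struct tm *in, const char *fmt, struct tm *out)
{
    char buf[128];
    char *end;

    assert(in != NULL && fmt != NULL && out != NULL);
    if (strftime(buf, sizeof(buf), fmt, in) == 0)
        return -1;
    memset(out, 0, sizeof(*out));
    end = strptime(buf, fmt, out);
    return (end != NULL && *end == '\0') ? 0 : -1;
}
```

With "%Y-%m-%d %H:%M:%S", the year, month, day, hour, minute and second survive the trip; only those fields are expected to, since the format names nothing else.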
Just like the discussion about mktime() and strftime() earlier (last year?)
this might not be what you'd like the strptime() function to do, but it
is how it is defined, which is based upon historical implementations.
kre
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: Robert Elz <kre@munnari.OZ.AU>
Subject: Re: lib/36528 (strptime(3) doesn't fill in the 'tm' structure fields
correctly)
Date: Sun, 14 May 2023 22:11:41 +0000
On Mon, May 15, 2023 at 01:59:50AM +0700, Robert Elz wrote:
> Date: Sun, 14 May 2023 17:17:40 +0000
> From: David Holland <dholland-bugs@netbsd.org>
> Message-ID: <ZGEXtDniLI8JsFSD@netbsd.org>
>
> | I don't see how that's relevant to the example in this PR. The values
> | passed govern all the outputs that are printed.
>
> You mean they could, but that's not how strptime() is defined to work.
>
> [...]
>
> If a match is found,
> values for the appropriate tm structure members are set to values
> corresponding to the locale information.
This is _all_ it says; it doesn't define what is appropriate, or
attach specific fields to specific conversions. As we know from the
previous related long wrangle, they have done that in other cases and
therefore (by the rules of standards interpretation) we can conclude
that they intended to leave it vague.
Consequently, there's a reasonably strong argument that setting
tm_wday is appropriate if enough information has been provided to
compute it. Furthermore, even if it is not _required_ it is clearly
_permitted_ and also desirable.
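For tm_wday, "enough information" means the year and the day of the year: the weekday follows from counting days from a known reference. A minimal sketch, assuming the proleptic Gregorian calendar, in which 1 January of year 1 was a Monday; weekday_from_yday() is a hypothetical helper:

```c
#include <assert.h>

/*
 * Hypothetical helper: compute tm_wday (0 = Sunday) from a full year
 * (e.g. 2007, not tm_year) and a 0-based tm_yday, by counting days since
 * 1 January of year 1 in the proleptic Gregorian calendar.
 */
int weekday_from_yday(int year, int yday)
{
    assert(year >= 1 && yday >= 0 && yday <= 365);

    /* Days in all complete years before `year`, including leap days. */
    long y = year - 1;
    long days = 365L * y + y / 4 - y / 100 + y / 400;

    days += yday;
    /* Day 0 (1 January of year 1) was a Monday, tm_wday 1: hence + 1. */
    return (int)((days + 1) % 7);
}
```

For the PR's input, weekday_from_yday(2007, 99) is 2 (Tuesday), matching the Linux output quoted in the report.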
In any event it's a moot point.
--
David A. Holland
dholland@netbsd.org
From: Robert Elz <kre@munnari.OZ.AU>
To: David Holland <dholland-bugs@netbsd.org>
Cc: gnats-bugs@netbsd.org
Subject: Re: lib/36528 (strptime(3) doesn't fill in the 'tm' structure fields correctly)
Date: Mon, 15 May 2023 11:40:28 +0700
Date: Sun, 14 May 2023 22:11:41 +0000
From: David Holland <dholland-bugs@netbsd.org>
Message-ID: <ZGFcnRodwPnvOnvG@netbsd.org>
| This is _all_ it says; it doesn't define what is appropriate,
Absolutely, you clearly didn't really pay attention to my original
message on this subject. The implementation is allowed to do almost
anything. That's why this was never a bug, not in our (then)
implementation, not in the current one, and not in the linux one (as
reported in the PR) either, but an application error.
Of course, if the implementation doesn't fill in at least the fields
that match the format specification, it would be useless (and POSIX
probably needs to be fixed to be more explicit about which ones need
to be modified, rather than just saying "appropriate").
| Furthermore, even if it is not _required_ it is clearly _permitted_
True, no-one ever said otherwise.
| and also desirable.
That's debatable - it is kind of pointless, as the application isn't
allowed (if it wants to be portable) to depend upon any of that. That's
why the original report was incorrect - the original implementation (not
that I have gone back to look at a NetBSD 4 vintage strptime() to check
what it was doing) was probably fine. What was broken was the assumption
as to what was supposed to happen (the PR even said that all the BSDs and
MacOS did it the "wrong" way, and only linux and solaris the "right" way).
| In any event it's a moot point.
Yes. Not sure why you bothered to comment on my initial message.
kre
>Unformatted:
Copyright © 1994-2023
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.