NetBSD Problem Report #36528

From martin@duskware.de  Sat Jun 23 12:46:30 2007
Return-Path: <martin@duskware.de>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
	by narn.NetBSD.org (Postfix) with ESMTP id CFC0463B882
	for <gnats-bugs@gnats.netbsd.org>; Sat, 23 Jun 2007 12:46:30 +0000 (UTC)
Message-Id: <20070623112846.10DD263B882@narn.NetBSD.org>
Date: Sat, 23 Jun 2007 11:28:46 +0000 (UTC)
From: ekamperi@auth.gr
Reply-To: ekamperi@auth.gr
To: netbsd-bugs-owner@NetBSD.org
Subject: strptime(3) doesn't fill in the 'tm' structure fields correctly
X-Send-Pr-Version: www-1.0

>Number:         36528
>Category:       lib
>Synopsis:       strptime(3) doesn't fill in the 'tm' structure fields correctly
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    lib-bug-people
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Jun 23 12:50:00 +0000 2007
>Closed-Date:    Sat May 13 18:19:54 +0000 2023
>Last-Modified:  Mon May 15 04:45:01 +0000 2023
>Originator:     Stathis Kamperis
>Release:        NetBSD 4.99.20
>Organization:
Student of Medicine, Aristotle University of Thessaloniki
>Environment:
NetBSD netbsd 4.99.20 NetBSD 4.99.20 (MYGENERIC) #1: Wed Jun 13 00:11:49 EEST 2007  root@netbsd:/usr/obj/sys/arch/i386/compile/MYGENERIC i386

>Description:
strptime(3) doesn't seem to fill in the 'tm' structure fields correctly, specifically tm_mday, tm_mon and tm_wday.

As far as I know, OpenBSD, FreeBSD and Mac OS X exhibit exactly the same behavior. Linux and Solaris, on the other hand, yield the expected results.

>How-To-Repeat:
Compile and run the following program. 

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

int main(void)
{
    struct tm tm;
    char *retp;

    /* Clear the tm structure */
    memset(&tm, 0, sizeof(tm));

    /* Call strptime()
       %j is the day number of the year [1,366]
       %Y is the year, including the century (e.g., 1996)
     */
    retp = strptime("100-2007", "%j-%Y", &tm);

    /* strptime() failed */
    if (retp == NULL) {
        fprintf(stderr, "strptime() failed\n");
        exit(EXIT_FAILURE);
    }

    /* parsing failed */
    if (*retp != '\0') {
        fprintf(stderr, "parsing failed\n");
        exit(EXIT_FAILURE);
    }

    /* print tm's fields */
    printf("tm_mday = %d\n", tm.tm_mday);
    printf("tm_mon = %d\n", tm.tm_mon);
    printf("tm_year = %d\n", tm.tm_year);
    printf("tm_yday = %d\n", tm.tm_yday);
    printf("tm_wday = %d\n", tm.tm_wday);

    return EXIT_SUCCESS;
}

-------------------------------------------------------------

[stathis@netbsd ~]$ ./strptime 
tm_mday = 0
tm_mon = 0
tm_year = 107
tm_yday = 99
tm_wday = 0

[stathis@archlinux ~]$ ./strptime
tm_mday = 1
tm_mon = 3
tm_year = 107
tm_yday = 99
tm_wday = 2


>Fix:

>Release-Note:

>Audit-Trail:

State-Changed-From-To: open->closed
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Sat, 13 May 2023 18:19:54 +0000
State-Changed-Why:
This works now. No idea when it got fixed...


From: Robert Elz <kre@munnari.OZ.AU>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: lib/36528 (strptime(3) doesn't fill in the 'tm' structure fields correctly)
Date: Sun, 14 May 2023 14:55:06 +0700

     Date:        Sat, 13 May 2023 18:19:54 +0000 (UTC)
     From:        dholland@NetBSD.org
     Message-ID:  <20230513181954.B299D1A923D@mollari.NetBSD.org>

   | Synopsis: strptime(3) doesn't fill in the 'tm' structure fields correctly

 Note that this never was a bug.  POSIX says:

 	It is unspecified whether multiple calls to strptime( ) using the
 	same tm structure will update the current contents of the structure
 	or overwrite all contents of the structure. Conforming applications
 	should make a single call to strptime( ) with a format and all data
 	needed to completely specify the date and time being converted.

 That is, all that strptime() is required to fill in within the struct tm
 are the fields actually specified by format characters.   The
 implementation is allowed to alter the other fields, if it wants to.   So
 neither the Linux nor NetBSD implementations reported in the PR were
 incorrect, just the assumption made by the application used to test it.
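
A portable application can avoid that assumption by deriving the remaining
date fields itself. The following is a minimal sketch, not taken from any
libc: fill_from_yday() and is_leap() are hypothetical helpers that compute
tm_mon, tm_mday and tm_wday from the tm_year and tm_yday values that "%j-%Y"
stores, assuming the proleptic Gregorian calendar and a year of 1 or later.

```c
#include <assert.h>
#include <time.h>

/* Nonzero if the given Gregorian year (e.g. 2007) is a leap year. */
static int is_leap(int year)
{
    return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
}

/*
 * Hypothetical helper (not part of any libc): derive tm_mon, tm_mday and
 * tm_wday from tm_year and tm_yday. Assumes tm_yday is valid for the year.
 */
static void fill_from_yday(struct tm *tm)
{
    static const int mdays[2][12] = {
        { 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 },
        { 31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 }
    };
    int year = tm->tm_year + 1900;
    int leap = is_leap(year);
    int rem = tm->tm_yday;      /* days remaining after whole months */
    int mon = 0;
    int y = year - 1;           /* whole years elapsed since year 1 */
    int jan1;

    assert(year >= 1);
    assert(rem >= 0 && rem < 365 + leap);

    /* Walk through the months until the remainder fits in one. */
    while (rem >= mdays[leap][mon]) {
        rem -= mdays[leap][mon];
        mon++;
    }
    tm->tm_mon = mon;
    tm->tm_mday = rem + 1;

    /* Weekday of January 1st (0 = Sunday), then advance by tm_yday. */
    jan1 = (1 + y + y / 4 - y / 100 + y / 400) % 7;
    tm->tm_wday = (jan1 + tm->tm_yday) % 7;
}
```

With tm_year == 107 and tm_yday == 99, as stored for "100-2007", the helper
yields tm_mon == 3, tm_mday == 10 and tm_wday == 2 (April 10 2007, a
Tuesday), whatever the C library did with those fields.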

 kre

From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: Robert Elz <kre@munnari.OZ.AU>
Subject: Re: lib/36528 (strptime(3) doesn't fill in the 'tm' structure fields
 correctly)
Date: Sun, 14 May 2023 17:17:40 +0000

 On Sun, May 14, 2023 at 08:00:03AM +0000, Robert Elz wrote:
  >  Note that this never was a bug.  POSIX says:
  >  
  >  	It is unspecified whether multiple calls to strptime( ) using the
  >  	same tm structure will update the current contents of the structure
  >  	or overwrite all contents of the structure. Conforming applications
  >  	should make a single call to strptime( ) with a format and all data
  >  	needed to completely specify the date and time being converted.
  >  
  >  That is, all that strptime() is required to fill in within the struct tm
  >  are the fields actually specified by format characters.   The
  >  implementation is allowed to alter the other fields, if it wants to.   So
  >  neither the Linux nor NetBSD implementations reported in the PR were
  >  incorrect, just the assumption made by the application used to test it.

 I don't see how that's relevant to the example in this PR. The values
 passed govern all the outputs that are printed.

 -- 
 David A. Holland
 dholland@netbsd.org

From: Robert Elz <kre@munnari.OZ.AU>
To: David Holland <dholland-bugs@netbsd.org>
Cc: gnats-bugs@netbsd.org
Subject: Re: lib/36528 (strptime(3) doesn't fill in the 'tm' structure fields correctly)
Date: Mon, 15 May 2023 01:59:50 +0700

     Date:        Sun, 14 May 2023 17:17:40 +0000
     From:        David Holland <dholland-bugs@netbsd.org>
     Message-ID:  <ZGEXtDniLI8JsFSD@netbsd.org>

   | I don't see how that's relevant to the example in this PR. The values
   | passed govern all the outputs that are printed.

 You mean they could, but that's not how strptime() is defined to work.

 	The strptime( ) function shall convert the character string pointed
 	to by buf to values which are stored in the tm structure pointed to
 	by tm, using the format specified by format.

 [...]

 	a The day of the week, using the locale's weekday names;
 (that's tm_wday)
 	b The month, using the locale's month names;
 (tm_mon)
 	d The day of the month [01,31];
 (tm_mday)

 etc.   What's telling is this (and a few others like it)

 	g The last 2 digits of the week-based year (see below) as a decimal
 	  number [...]. The effect of this year, if any, on the tm structure
           pointed to by tm is unspecified.

 That is, given that, nothing needs to be done to the tm at all, as it
 has no field for that info (in strftime() that value is computed from
 others - in strptime() the implementation is not required to invert that
 calculation, even if it has the necessary information available).
 %G %U %V %W %z and %Z all have the same qualification (though %z and %Z
 probably need to be fixed now that tm_gmtoff and tm_zone have been added
 to struct tm).

 Note that strptime()'s format parameter isn't required to have any
 conversions in it at all - it could be used to simply match strings
 in a weird, white-space-insensitive kind of way.

 	The format is composed of zero or more directives. Each directive is
 	composed of one of the following: one or more white-space bytes;
 	an ordinary character (neither '%' nor a white-space byte); or a
 	conversion specification.

 [...]

 	A conversion specification composed of white-space bytes is executed
 	by scanning input up to the first non-white-space byte (which remains
 	unscanned), or until no more characters can be scanned.

 	A conversion specification that is an ordinary character is executed
 	by scanning the next character from the buffer. If the character
 	scanned from the buffer differs from the one comprising the directive,
 	the directive fails, and the differing and subsequent characters
 	remain unscanned.
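
The matching-only use described above can be sketched as follows. This is an
illustration under assumptions, not an established idiom: matches_literal()
is a hypothetical helper, the format is assumed to contain no '%'
conversions, and the struct tm is zeroed purely as a defensive choice.
_XOPEN_SOURCE is defined because some C libraries (glibc, for one) only
declare strptime() when it is requested.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/*
 * Hypothetical helper: returns 1 if the whole of buf matches fmt, and 0
 * otherwise. fmt is assumed to hold only ordinary characters and white
 * space, so no conversion ever stores anything into the struct tm.
 */
static int matches_literal(const char *buf, const char *fmt)
{
    struct tm scratch;
    const char *end;

    assert(buf != NULL && fmt != NULL);
    memset(&scratch, 0, sizeof scratch);

    end = strptime(buf, fmt, &scratch);

    /* NULL means a directive failed; leftover input means a partial match. */
    return end != NULL && *end == '\0';
}
```

For example, matches_literal("hello   world", "hello world") is expected to
succeed, since the white space in the format consumes any run of white space
in the input, while matches_literal("hello there", "hello world") fails at
the first differing character.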

 [%n %t processing spec omitted here, not relevant]

 	Any other conversion specification is executed by scanning characters
 	until a character matching the next directive is scanned, or until no
 	more characters can be scanned. These characters, except the one
 	matching the next directive, are then compared to the locale values
 	associated with the conversion specifier. If a match is found,
 	values for the appropriate tm structure members are set to values
 	corresponding to the locale information.

 The plural on "tm structure members" is because some directives (eg: %T,
 which is defined as %H:%M:%S) cause multiple fields to be set.
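
A small sketch of the %T case, assuming an implementation that supports %T
as POSIX describes it. parse_hms() is a hypothetical helper name; the one
conversion is expected to store into tm_hour, tm_min and tm_sec.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/*
 * Hypothetical helper: parse "HH:MM:SS" with the single %T conversion.
 * Returns 0 if the whole string was consumed, -1 otherwise.
 */
static int parse_hms(const char *buf, struct tm *tm)
{
    const char *end;

    assert(buf != NULL && tm != NULL);
    memset(tm, 0, sizeof *tm);

    end = strptime(buf, "%T", tm);
    return (end != NULL && *end == '\0') ? 0 : -1;
}
```

After parse_hms("12:34:56", &tm) succeeds, tm_hour, tm_min and tm_sec hold
12, 34 and 56; nothing here says anything about the date fields.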

 That's all it says about what happens to struct tm - nothing at all about
 calculating values for other fields out of what was received for the ones
 that were generated (so while %j, in combination with %C%Y, might convey
 enough information to allow all of the date related fields to be set,
 that isn't required to happen).   And then the text that I quoted previously

 	It is unspecified whether multiple calls to strptime( ) using the
 	same tm structure will update the current contents of the structure
 	or overwrite all contents of the structure.

 That is, an implementation can, if it wants, allow you to write

 	p = strptime(buf, " %j", &tm);
 	p = strptime(p, " %Y", &tm);
 	p = strptime(p, " %b", &tm);
 	p = strptime(p, " %a", &tm);
 	p = strptime(p, " %d", &tm);

 and, given buf containing "209 2023 Feb Wed 30" (assuming the POSIX/C locale),
 might end up setting tm such that tm_yday == 208, tm_year == 123,
 tm_mon == 1, tm_wday == 3, and tm_mday == 30 ... despite there not being
 a 30th of Feb (in any year) and as Feb 28 2023 was a Tue, the 30th if it
 did exist could not be a Weds, and further nothing anytime in Feb or Mar
 is the 209'th day of anyone's year.

 Applications cannot rely upon that working, that way, but an implementation
 is permitted to make that happen.
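
Since strptime() may hand back a struct tm whose fields contradict one
another, an application that cares can check the result itself. The sketch
below is one possible approach, not something the standard prescribes:
tm_date_is_consistent() is a hypothetical helper that lets mktime() normalize
a copy of the date and then compares the fields.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/*
 * Hypothetical helper: returns 1 if the date fields of *tm describe a real
 * calendar date whose tm_wday and tm_yday agree with tm_year, tm_mon and
 * tm_mday, and 0 otherwise. All five fields are assumed to be initialized.
 */
static int tm_date_is_consistent(const struct tm *tm)
{
    struct tm norm;

    assert(tm != NULL);
    norm = *tm;

    /* Noon keeps clear of DST changes; -1 lets mktime() work out DST. */
    norm.tm_hour = 12;
    norm.tm_min = 0;
    norm.tm_sec = 0;
    norm.tm_isdst = -1;

    /* mktime() ignores tm_wday and tm_yday on input and recomputes them. */
    if (mktime(&norm) == (time_t)-1)
        return 0;

    return norm.tm_year == tm->tm_year
        && norm.tm_mon == tm->tm_mon
        && norm.tm_mday == tm->tm_mday
        && norm.tm_wday == tm->tm_wday
        && norm.tm_yday == tm->tm_yday;
}
```

A "Feb 30" result is rejected because mktime() normalizes it into March, so
the month no longer matches what was passed in.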

 Also note that there is no requirement to init the tm to anything at all
 before calling strptime(), it can be full of trap invoking integers in all
 of its fields (and any random valid, or invalid, pointer in tm_zone).
 All strptime() does (that is, is required to do) is stick values corresponding
 to any conversions it encounters in the format in the matching field of
 the tm struct.   It cannot really do more.
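
One way to see which fields a given implementation actually touches is to
preload them with a sentinel and look at what changed. This is a rough
sketch under assumptions: probe_fields(), struct touched and TM_UNSET are
hypothetical names, and the sentinel is assumed to be a value that
strptime() would never store for the input being tested.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Sentinel assumed never to be stored by strptime() for sane input. */
#define TM_UNSET (-9999)

/* Each member is 1 if strptime() changed that field, 0 if it did not. */
struct touched {
    int mday, mon, year, yday, wday;
};

/* Hypothetical helper: returns 0 on success, -1 if strptime() failed. */
static int probe_fields(const char *buf, const char *fmt, struct touched *out)
{
    struct tm tm;

    assert(buf != NULL && fmt != NULL && out != NULL);
    memset(&tm, 0, sizeof tm);
    tm.tm_mday = tm.tm_mon = tm.tm_year = TM_UNSET;
    tm.tm_yday = tm.tm_wday = TM_UNSET;

    if (strptime(buf, fmt, &tm) == NULL)
        return -1;

    out->mday = tm.tm_mday != TM_UNSET;
    out->mon = tm.tm_mon != TM_UNSET;
    out->year = tm.tm_year != TM_UNSET;
    out->yday = tm.tm_yday != TM_UNSET;
    out->wday = tm.tm_wday != TM_UNSET;
    return 0;
}
```

For "100-2007" with "%j-%Y", only the year and yday flags can be counted on;
whether mday, mon and wday are reported as touched depends on the
implementation.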

 What would you expect to happen if the above were instead written as

 	p = strptime(buf, " %j %Y %b %a %d", &tm);

 with the same input?   This time we have a single call, and the same
 input, so the resulting struct tm really must contain the values
 that the implementation which allowed the multiple calls would have
 stored.

 The other fields of the struct tm (the ones that aren't mentioned
 here) can be set to whatever the implementation likes, or simply
 left as they were on input.

 There's nothing in the spec that says that the result must make sense.
 There's definitely no mention of it calling mktime() on the result
 (that would be absurd, as mktime() requires some fields of the struct
 tm to be filled in, if they're not what happens is unspecified, or
 even perhaps undefined) and as above, the struct tm passed to strptime()
 doesn't need to be init'd first, and the format doesn't need to contain
 any conversions at all, meaning no fields in the struct must be set to
 anything.

 All strptime() was ever really intended to be was an inverse to strftime().
 Given (approximately) the same format string that strftime() used, strptime()
 is intended to fill in the fields of the struct tm that strftime() used
 to format the data.  That's why strptime() has the %g %G ... conversions
 (which aren't defined to do anything specific at all to the struct tm -
 explicitly) and POSIX strptime() (but not the C version) has %s which
 also does nothing (though it doesn't say what should happen to the number)
 as POSIX strftime() has a %s conversion which C does not.
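
The inverse relationship can be illustrated with a round trip. This is only
a sketch: round_trip() is a hypothetical helper, and it assumes a format
whose conversions both strftime() and strptime() support and whose output
fits in the local buffer.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/*
 * Hypothetical helper: format *in with fmt, parse the text back with the
 * same fmt, and store the parsed result in *out. Returns 0 on success.
 * A zero return from strftime() is treated as failure, which assumes the
 * format never legitimately produces an empty string.
 */
static int round_trip(const struct tm *in, const char *fmt, struct tm *out)
{
    char text[128];
    const char *end;

    assert(in != NULL && fmt != NULL && out != NULL);

    if (strftime(text, sizeof text, fmt, in) == 0)
        return -1;

    memset(out, 0, sizeof *out);
    end = strptime(text, fmt, out);
    return (end != NULL && *end == '\0') ? 0 : -1;
}
```

With "%Y-%m-%d %H:%M:%S", the fields that strftime() consumed (tm_year,
tm_mon, tm_mday, tm_hour, tm_min, tm_sec) come back unchanged; the sketch
makes no claim about any other field of the output.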

 Just like the discussion about mktime() and strftime() earlier (last year?)
 this might not be what you'd like the strptime() function to do, but it
 is how it is defined, which is based upon historical implementations.

 kre


From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: Robert Elz <kre@munnari.OZ.AU>
Subject: Re: lib/36528 (strptime(3) doesn't fill in the 'tm' structure fields
 correctly)
Date: Sun, 14 May 2023 22:11:41 +0000

 On Mon, May 15, 2023 at 01:59:50AM +0700, Robert Elz wrote:
  >     Date:        Sun, 14 May 2023 17:17:40 +0000
  >     From:        David Holland <dholland-bugs@netbsd.org>
  >     Message-ID:  <ZGEXtDniLI8JsFSD@netbsd.org>
  > 
  >   | I don't see how that's relevant to the example in this PR. The values
  >   | passed govern all the outputs that are printed.
  > 
  > You mean they could, but that's not how strptime() is defined to work.
  >
  > [...]
  >
  > 	If a match is found,
  > 	values for the appropriate tm structure members are set to values
  > 	corresponding to the locale information.

 This is _all_ it says; it doesn't define what is appropriate, or
 attach specific fields to specific conversions. As we know from the
 previous related long wrangle, they have done that in other cases and
 therefore (by the rules of standards interpretation) we can conclude
 that they intended to leave it vague.

 Consequently, there's a reasonably strong argument that setting
 tm_wday is appropriate if enough information has been provided to
 compute it. Furthermore, even if it is not _required_ it is clearly
 _permitted_ and also desirable.

 In any event it's a moot point.

 -- 
 David A. Holland
 dholland@netbsd.org

From: Robert Elz <kre@munnari.OZ.AU>
To: David Holland <dholland-bugs@netbsd.org>
Cc: gnats-bugs@netbsd.org
Subject: Re: lib/36528 (strptime(3) doesn't fill in the 'tm' structure fields correctly)
Date: Mon, 15 May 2023 11:40:28 +0700

     Date:        Sun, 14 May 2023 22:11:41 +0000
     From:        David Holland <dholland-bugs@netbsd.org>
     Message-ID:  <ZGFcnRodwPnvOnvG@netbsd.org>

   | This is _all_ it says; it doesn't define what is appropriate,

 Absolutely, you clearly didn't really pay attention to my original
 message on this subject.   The implementation is allowed to do almost
 anything.   That's why this was never a bug, not in our (then)
 implementation, not in the current one, and not in the linux one (as
 reported in the PR) either, but an application error.

 Of course, if the implementation doesn't fill in at least the fields
 that match the format specification, that would be useless (and POSIX
 probably needs to be fixed to be more explicit about which ones need
 to be modified, rather than just saying "appropriate").

   | Furthermore, even if it is not _required_ it is clearly _permitted_

 True, no-one ever said otherwise.

   | and also desirable.

 That's debatable - it is kind of pointless, as the application isn't
 allowed (if it wants to be portable) to depend upon any of that.   That's
 why the original report was incorrect - the original implementation (not
 that I have gone back to look at a NetBSD 4 vintage strptime() to check
 what it was doing) was probably fine.   What was broken was the assumption
 as to what was supposed to happen (the PR even said that all the BSDs and
 MacOS did it the "wrong" way, and only linux and solaris the "right" way).

   | In any event it's a moot point.

 Yes.    Not sure why you bothered to comment on my initial message.

 kre

>Unformatted:
