NetBSD Problem Report #52819

From www@NetBSD.org  Thu Dec 14 15:07:52 2017
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id BD41B7A180
	for <gnats-bugs@gnats.NetBSD.org>; Thu, 14 Dec 2017 15:07:52 +0000 (UTC)
Message-Id: <20171214150751.83C897A1CF@mollari.NetBSD.org>
Date: Thu, 14 Dec 2017 15:07:51 +0000 (UTC)
From: daniel.schemmel@comsys.rwth-aachen.de
Reply-To: daniel.schemmel@comsys.rwth-aachen.de
To: gnats-bugs@NetBSD.org
Subject: a64l should sign-extend its result when long has more than 32 bit
X-Send-Pr-Version: www-1.0

>Number:         52819
>Category:       standards
>Synopsis:       a64l should sign-extend its result when long has more than 32 bit
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    standards-manager
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Dec 14 15:10:00 +0000 2017
>Last-Modified:  Fri Dec 15 23:20:01 +0000 2017
>Originator:     Daniel Schemmel
>Release:        7.1
>Organization:
RWTH Aachen University
>Environment:
NetBSD   7.1 NetBSD 7.1 (GENERIC.201703111743Z) amd64
>Description:
As per POSIX, a64l is required to sign-extend its result if the type long, which is the type of its result, has more than 32 bits. On amd64, the type long is a signed 64-bit integer.

Note that this relies on a slightly weird aspect of the POSIX specification: The lower 32 bit of any long are used for l64a and the behavior is unspecified if the value is negative. If long has more than 32 bit, it is never negative, as long as only the lower 32 bit are used! (Note that the glibc implementation guarantees that the behavior is never unspecified by always treating the argument as unsigned.)

This allows generating the string "zzzzz1" by giving the (positive) number 2**32-1 to l64a. This in turn ensures that the behavior of a64l is not unspecified, as the string has been generated by a call to l64a. When undoing the conversion, the lower 32 bit of the result are 0xFFFFFFFF which, when sign-extended should yield -1.
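
The encoding step described above can be sketched as follows. This is only an illustration under stated assumptions: sketch_l64a and sketch_digits are hypothetical names, not the libc implementation, and the digit alphabet is the one given in the POSIX description ('.' = 0, '/' = 1, '0'-'9' = 2-11, 'A'-'Z' = 12-37, 'a'-'z' = 38-63), emitted least significant digit first.

```c
#include <stddef.h>
#include <stdint.h>

/* Radix-64 digit alphabet from the POSIX a64l/l64a description:
 * '.' = 0, '/' = 1, '0'-'9' = 2-11, 'A'-'Z' = 12-37, 'a'-'z' = 38-63. */
static const char sketch_digits[] =
    "./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

/* Hypothetical helper, not the libc l64a: encodes only the low 32 bits
 * of value, least significant digit first, into buf (7 bytes needed). */
static char *sketch_l64a(unsigned long value, char buf[7])
{
	uint32_t v = (uint32_t)(value & 0xFFFFFFFFul);
	size_t i = 0;

	/* At most six digits are needed for 32 bits (6 * 6 = 36 >= 32). */
	while (v != 0 && i < 6) {
		buf[i++] = sketch_digits[v & 0x3F];
		v >>= 6;
	}
	buf[i] = '\0';	/* a value of 0 yields the empty string */
	return buf;
}
```

With this sketch, 0xFFFFFFFF produces five 'z' digits (30 one bits) followed by '1' (the remaining two one bits, digit value 3), which is the string "zzzzz1" used in this report.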

The exact wording of the POSIX 2008 standard is available at: http://pubs.opengroup.org/onlinepubs/9699919799/functions/a64l.html

This behavior was detected using Symbolic Execution techniques developed in the course of the SYMBIOSYS research project at COMSYS, RWTH Aachen University. This research is supported by the European Research Council (ERC) under the EU's Horizon 2020 Research and Innovation Programme grant agreement n. 647295 (SYMBIOSYS).
>How-To-Repeat:
Consider the following C program, which contains a test case for a64l that fails on NetBSD 7.1 when compiled for e.g. amd64:

#include <assert.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>

int main(void) {
	char const* s = l64a(0xFFFFFFFFu);
	assert(0 == strcmp(s, "zzzzz1")); // OK
	long result = a64l(s);
	long sign_extended = (long)(int32_t)result;
	assert(sign_extended == result); // FAILS
}
>Fix:
The FreeBSD implementation of a64l (https://svnweb.freebsd.org/base/head/lib/libc/stdlib/a64l.c?view=markup) is only slightly different from the NetBSD implementation (http://cvsweb.netbsd.org/bsdweb.cgi/src/lib/libc/stdlib/a64l.c?annotate=1.10) and does not exhibit this problem.

Alternatively, a slightly cleaner approach would be to cast the result of the operation to int32_t when returning (which will explicitly show that the result is the sign-extended version of the lower 32 bits of the conversion, as required by POSIX).
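
A minimal sketch of what such a cast could look like is given below. It is not the NetBSD or FreeBSD source: sketch_a64l and sketch_digit_value are hypothetical names, the character arithmetic assumes an ASCII-compatible character set, and the final cast relies on the usual two's complement behaviour of the target when converting to int32_t.

```c
#include <stdint.h>

/* Hypothetical helper: maps one radix-64 digit to its value, or -1.
 * Assumes an ASCII-compatible character set (contiguous letters). */
static int sketch_digit_value(char c)
{
	if (c == '.') return 0;
	if (c == '/') return 1;
	if (c >= '0' && c <= '9') return c - '0' + 2;
	if (c >= 'A' && c <= 'Z') return c - 'A' + 12;
	if (c >= 'a' && c <= 'z') return c - 'a' + 38;
	return -1;
}

/* Hypothetical decoder, not the libc a64l: accumulates up to six digits,
 * least significant first, in an unsigned 32-bit value. */
static long sketch_a64l(const char *s)
{
	uint32_t value = 0;
	int shift = 0;

	for (int i = 0; i < 6 && s[i] != '\0'; i++) {
		int d = sketch_digit_value(s[i]);
		if (d < 0)
			break;
		/* Bits shifted past bit 31 are discarded (unsigned wrap). */
		value |= (uint32_t)d << shift;
		shift += 6;
	}

	/* The cast proposed in this report: going through int32_t makes the
	 * widening to long a sign extension of the low 32 bits. The
	 * conversion to int32_t is implementation-defined for values above
	 * INT32_MAX; two's complement targets yield the negative value. */
	return (long)(int32_t)value;
}
```

Under these assumptions, decoding "zzzzz1" gives -1 regardless of whether long has 32 or 64 bits.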

>Audit-Trail:
From: Robert Elz <kre@munnari.OZ.AU>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: standards/52819: a64l should sign-extend its result when long has more than 32 bit
Date: Fri, 15 Dec 2017 01:33:35 +0700

     Date:        Thu, 14 Dec 2017 15:10:00 +0000 (UTC)
     From:        daniel.schemmel@comsys.rwth-aachen.de
     Message-ID:  <20171214151000.B57907A180@mollari.NetBSD.org>


   |  When undoing the conversion, the lower 32 bit of the result are
   | 0xFFFFFFFF which, when sign-extended should yield -1.

 I am not sure I agree with that interpretation.   There is no way to represent
 a sign bit in the encoded form - it is not needed, as only positive (incl 0)
 values have a defined representation.   Hence when re-encoded, there is no
 sign bit (assuming that the upper bit of a 32 bit value is a sign bit is
 incorrect, nothing in the spec says that).  Since only positive values are
 ever ended, the (missing, or elided) sign bit is always a 0.  That 0 is
 extended to the upper bits when long has more than 32 bits - that would be
 the "sign extension" expected.

 This then makes it possible to achieve what the application usage says
 can be done ...

    If the type long contains more than 32 bits, the result of
    a64l(l64a(x)) is x in the low-order 32 bits.

 That's not normative text, but it is a strong hint as to the expected
 behaviour, if x is 0xFFFFFFFF then the result needs to be 0xFFFFFFFF in
 the low 32 bits - the same value as was passed in.

 I will ask the austin group for a ruling on this though.

 kre


From: Daniel Schemmel <daniel.schemmel@comsys.rwth-aachen.de>
To: <gnats-bugs@NetBSD.org>
Cc: 
Subject: Re: standards/52819: a64l should sign-extend its result when long has
 more than 32 bit
Date: Fri, 15 Dec 2017 01:39:05 +0100

 On 14.12.2017 19:35, Robert Elz wrote:
 > The following reply was made to PR standards/52819; it has been noted by GNATS.
 >
 > From: Robert Elz <kre@munnari.OZ.AU>
 > To: gnats-bugs@NetBSD.org
 > Cc:
 > Subject: Re: standards/52819: a64l should sign-extend its result when long has more than 32 bit
 > Date: Fri, 15 Dec 2017 01:33:35 +0700
 >
 >      Date:        Thu, 14 Dec 2017 15:10:00 +0000 (UTC)
 >      From:        daniel.schemmel@comsys.rwth-aachen.de
 >      Message-ID:  <20171214151000.B57907A180@mollari.NetBSD.org>
 >
 >
 >    |  When undoing the conversion, the lower 32 bit of the result are
 >    | 0xFFFFFFFF which, when sign-extended should yield -1.
 >
 >  I am not sure I agree with that interpretation.   There is no way to represent
 >  a sign bit in the encoded form - it is not needed, as only positive (incl 0)
 >  values have a defined representation.
 I agree that the encoded form ("zzzzz1" for this example) need not deal
 with a sign bit. The sign-extension is applied to the "resulting value"
 of a64l only.
 > Hence when re-encoded, there is no
 >  sign bit (assuming that the upper bit of a 32 bit value is a sign bit is
 >  incorrect, nothing in the spec says that).
 While generally the C standard is indeed very open about the
 bit-representation of a negative number, the ABI for the target
 architecture(s) clearly defines what a signed 32 bit number means.
 > Since only positive values are
 >  ever ended, the (missing, or elided) sign bit is always a 0.  That 0 is
 >  extended to the upper bits when long has more than 32 bits - that would be
 >  the "sign extension" expected.
 That is a rather elegant way to give the sign-extension meaning in the
 absence of an actual sign. However, in my understanding, sign-extension
 is a concept that only applies to signed numbers (i.e. the C-standard
 defines a sign bit for signed integers, but not for unsigned integers),
 especially as the POSIX standard oftentimes uses the wording "set to
 zero", but only very rarely talks about sign-extension. Another wording
 that is used at least once is "high-order bits beyond the specified
 character size are cleared" (IEEE 1003.1-2008 A11.2.2).

 In fact, all mentions of sign-extension I could find in POSIX 2008 read
 along these lines: "sign-extension of a variable of type char on
 widening to integer is implementation-defined" (see e.g. getc or
 getchar). Keeping your interpretation in mind, this can either be
 interpreted that it is implementation-defined whether sign-extension is
 done, or whether a sign-extension of a one or a zero takes place.
 >  This then makes it possible to achieve what the application usage says
 >  can be done ...
 >
 >     If the type long contains more than 32 bits, the result of
 >     a64l(l64a(x)) is x in the low-order 32 bits.
 >
 >  That's not normative text, but it is a strong hint as to the expected
 >  behaviour, if x is 0xFFFFFFFF then the result needs to be 0xFFFFFFFF in
 >  the low 32 bits - the same value as was passed in.
 The value 0xFFFFFFFFFFFFFFFF = -1L also contains the result 0xFFFFFFFF
 in the low 32 bits, so neither interpretation violates this comment.
 >  I will ask the austin group for a ruling on this though.
 >
 >  kre
 An official ruling on this would be interesting to hear, as I found
 additional implementations going in either direction (glibc returns
 0xFFFFFFFF, while musl returns 0xFFFFFFFFFFFFFFFF).

 Thank you very much for your time,
 Daniel
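
The two readings discussed in this exchange can be contrasted with a small sketch. The helper names below are hypothetical and taken from no libc; int64_t stands in for a 64-bit long so that the example does not depend on the width of long on the build host.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration; neither is taken from any libc. */

/* Zero extension: the upper 32 bits of the result are cleared. */
static int64_t sketch_zero_extend32(uint32_t v)
{
	return (int64_t)v;
}

/* Sign extension: bit 31 is copied into the upper 32 bits.
 * Written arithmetically to avoid an implementation-defined cast. */
static int64_t sketch_sign_extend32(uint32_t v)
{
	if (v & UINT32_C(0x80000000))
		return (int64_t)v - INT64_C(4294967296);
	return (int64_t)v;
}

/* Low-order 32 bits of a 64-bit value. */
static uint32_t sketch_low32(int64_t x)
{
	return (uint32_t)((uint64_t)x & UINT64_C(0xFFFFFFFF));
}
```

For the input 0xFFFFFFFF, the first helper gives 4294967295 and the second gives -1, while sketch_low32 returns 0xFFFFFFFF for both. The disagreement in this thread is therefore only about the upper 32 bits of the returned long.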

From: Robert Elz <kre@munnari.OZ.AU>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: standards/52819: a64l should sign-extend its result when long has more than 32 bit
Date: Fri, 15 Dec 2017 11:08:55 +0700

     Date:        Thu, 14 Dec 2017 18:35:01 +0000 (UTC)
     From:        Robert Elz <kre@munnari.OZ.AU>
     Message-ID:  <20171214183501.7A25F7A1EA@mollari.NetBSD.org>

   |  Since only positive values are
   |  ever ended,

 I wish my email spell checker was also a mind reader... s/ended/needed/
 (I cannot imagine how I managed to type that!)

   |  I will ask the austin group for a ruling on this though.

 I have done that - but only informally so far (an e-mail request.)
 No meaningful response yet.

 If it looks to be needed, I will submit a formal "clarification requested"
 defect report, but if that is required it might take until sometime
 in 2019 before there's any real action (that's the current delay on
 processing these issues...)

 In the meantime, it would be useful if someone (Daniel ... hint hint...)
 were to create an ATF test for this case with the interpretation from
 the PR as the expected output, and then mark it as expected to fail,
 citing this PR (and simply to be skipped on 32 bit long systems.)

 Then after the PR is dealt with, we either change libc in which case the
 test will stop failing, or we change the test to expect the alternative
 (current) libc result, in which case it will also stop failing.

 kre

From: Robert Elz <kre@munnari.OZ.AU>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: standards/52819: a64l should sign-extend its result when long has more than 32 bit
Date: Sat, 16 Dec 2017 06:17:14 +0700

     Date:        Fri, 15 Dec 2017 02:20:01 +0000 (UTC)
     From:        Daniel Schemmel <daniel.schemmel@comsys.rwth-aachen.de>
     Message-ID:  <20171215022001.995857A1F7@mollari.NetBSD.org>

   |  The value 0xFFFFFFFFFFFFFFFF = -1L also contains the result 0xFFFFFFFF
   |  in the low 32 bits, so neither interpretation violates this comment.

 That would be more relevant if that's what the (non-normative) text said, but
 it doesn't say the result contains x in the low 32 bits, it says the result
 *is* x in the low 32 bits.   But this is no more than a hint at best.

   |  An official ruling on this would be interesting to hear,

 Still nothing from a source I would consider authoritative - there may
 never be this way (I suspect that the original authors no longer participate,
 so everyone now is just guessing...)   In that case there would need to be
 an actual determination via a defect report which can result in changed text.

 However, we did discover that Solaris apparently implements it the way
 you believe is correct.

 And there was a suggestion that the functions be marked as obsolete and
 simply deleted, as being mostly useless and not well defined - if that were
 to happen they'd be marked as obsolete in the next major revision of the
 standard, and then deleted from the one after that (which will be years away).

 I'd think this would be quite likely if your interpretation is the one
 that is considered the correct one - as a pair of functions that encode
 an (in range) value, and then decode it again producing a different value,
 are not exactly useful for much.

 kre
