NetBSD Problem Report #47840

From dholland@macaran.localdomain  Mon May 20 05:25:49 2013
Return-Path: <dholland@macaran.localdomain>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
	by www.NetBSD.org (Postfix) with ESMTP id 816FD63F59B
	for <gnats-bugs@gnats.NetBSD.org>; Mon, 20 May 2013 05:25:49 +0000 (UTC)
Message-Id: <20130520040650.6C4E16E221@macaran.localdomain>
Date: Mon, 20 May 2013 00:06:50 -0400 (EDT)
From: dholland@eecs.harvard.edu
Reply-To: dholland@eecs.harvard.edu
To: gnats-bugs@NetBSD.org
Subject: awk string comparison of integer constant
X-Send-Pr-Version: 3.95

>Number:         47840
>Category:       bin
>Synopsis:       awk string comparison of integer constant
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    bin-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon May 20 05:30:01 +0000 2013
>Last-Modified:  Sat Feb 14 11:00:01 +0000 2015
>Originator:     David A. Holland
>Release:        NetBSD 6.99.17 (20130321)
>Organization:
>Environment:
System: NetBSD macaran 6.99.17 NetBSD 6.99.17 (MACARAN) #18: Thu Mar 21 13:12:01 EDT 2013 root@macaran:/usr/src/sys/arch/amd64/compile/MACARAN amd64
Architecture: x86_64
Machine: amd64
>Description:

Observe the following curious behavior:

macaran% jot 15 1 | awk '{ a[$1] = ($1 < 10); } END { for (k in a) { print k, a[k], (k < 10); }}'
2 1 0
3 1 0
4 1 0
5 1 0
6 1 0
7 1 0
8 1 0
9 1 0
10 0 0
11 0 0
12 0 0
13 0 0
14 0 0
15 0 0
1 1 1

Note that k < 10 is evaluated as a string comparison.

Is this required by some standard? gawk does the same thing, but it
definitely violates the POLA.

>How-To-Repeat:

as above

>Fix:

dunno
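
A possible script-level workaround, offered as a minimal sketch rather
than a fix to awk itself: adding 0 to the loop variable makes that
operand numeric, so the comparison is made numerically. The helper name
compare_keys and the sample input are illustrative assumptions, not part
of the original report.

```shell
# Hypothetical helper: prints each key, the plain comparison, and the
# comparison with the key forced to a number via "k + 0".
compare_keys() {
    printf '%s\n' 1 2 9 10 15 | awk '
    { seen[$1] = 1 }
    END {
        for (k in seen)
            printf "%s %d %d\n", k, (k < 10), (k + 0 < 10)
    }' | sort -n    # for-in order is unspecified, so sort for stable output
}

compare_keys
# 1 1 1
# 2 0 1
# 9 0 1
# 10 0 0
# 15 0 0
```

The second column reproduces the string comparison from the report; the
third column shows the numeric result once the key has been converted.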

>Audit-Trail:
From: Aleksey Cheusov <cheusov@tut.by>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: bin/47840: awk string comparison of integer constant
Date: Mon, 20 May 2013 15:10:54 +0300

 On Mon, May 20, 2013 at 8:30 AM, <dholland@eecs.harvard.edu> wrote:

 > >Description:
 >
 > Observe the following curious behavior:
 >
 > macaran% jot 15 1 | awk '{ a[$1] = ($1 < 10); } END { for (k in a) { print k, a[k], (k < 10); }}'
 > 2 1 0
 > 3 1 0
 > 4 1 0
 > 5 1 0
 > 6 1 0
 > 7 1 0
 > 8 1 0
 > 9 1 0
 > 10 0 0
 > 11 0 0
 > 12 0 0
 > 13 0 0
 > 14 0 0
 > 15 0 0
 > 1 1 1
 >
 > Note that k < 10 is evaluated as a string comparison.
 >
 > Is this required by some standard? gawk does the same thing, but it
 > definitely violates the POLA.
 >

 POSIX says the following:
 http://pubs.opengroup.org/onlinepubs/9699919799/utilities/awk.html

 "Comparisons (with the '<', "<=", "!=", "==", '>', and ">=" operators)
 shall be made numerically if both operands are  numeric, if one is numeric
 and the other has a string value that is a numeric string, or if one is
 numeric and the other has the uninitialized value. Otherwise, operands
 shall be converted to strings as required and a string comparison shall be
 made using the locale-specific collation sequence."

 Unless I am misreading this sentence, the second and third columns of
 your output should contain the same values: in both contexts 10
 definitely has the type "numeric", and therefore both k and $1 should
 be converted to numbers before the comparison.

 So, I think nawk violates POSIX. On the other hand mawk, gawk and Solaris'
 xpg4/awk work the same way.

From: Valery Ushakov <uwe@stderr.spb.ru>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: bin/47840: awk string comparison of integer constant
Date: Tue, 21 May 2013 03:02:14 +0400

 On Mon, May 20, 2013 at 05:30:01 +0000, dholland@eecs.harvard.edu wrote:

 > Observe the following curious behavior:
 > 
 > macaran% jot 15 1 | awk '{ a[$1] = ($1 < 10); } END { for (k in a) { print k, a[k], (k < 10); }}'
 > 2 1 0
 [...]
 >
 > Note that k < 10 is evaluated as a string comparison.
 > 
 > Is this required by some standard? gawk does the same thing, but it
 > definitely violates the POLA.

 Hmm, it does, indeed, but read the already mentioned

 http://pubs.opengroup.org/onlinepubs/9699919799/utilities/awk.html

 more closely and pay attention to the definition of "numeric string".

   Expressions in awk

   [...]

   A string value shall be considered a NUMERIC STRING if it comes from
   one of the following:

     1. Field variables
     2. Input from the getline() function
     3. FILENAME
     4. ARGV array elements
     5. ENVIRON array elements
     6. Array elements created by the split() function
     7. A command line variable assignment
     8. Variable assignment from another numeric string variable

     ...  Whether or not a string is a numeric string shall be relevant
     only in contexts where that term is used in this section.

   [...]

   Comparisons (with the '<', "<=", "!=", "==", '>', and ">="
   operators) shall be made numerically if both operands are numeric,
   if one is numeric and the other has A STRING VALUE THAT IS A NUMERIC
   STRING, or if one is numeric and the other has the uninitialized
   value.  Otherwise, operands shall be converted to strings as
   required and a string comparison shall be made using the
   locale-specific collation sequence.

 So for (k in a) gives you k that is a string, but not a numeric
 string(!), and so the comparison is done on strings.
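
To illustrate the definition quoted above, here is a minimal sketch that
compares the same text, 9, against the numeric constant 10 when it
arrives from different sources. The function name numeric_string_demo and
the variable names are assumptions chosen for the example.

```shell
# Hypothetical demo: each result is 1 for "less than 10", 0 otherwise.
numeric_string_demo() {
    echo 9 | awk '{
        from_field = ($1 < 10)        # field variable: numeric string
        from_const = ("9" < 10)       # string constant: plain string
        split("9", parts)
        from_split = (parts[1] < 10)  # element created by split()
        copy = $1
        from_copy = (copy < 10)       # assigned from a numeric string
        print from_field, from_const, from_split, from_copy
    }'
}

numeric_string_demo
# 1 0 1 1
```

Only the string constant falls outside the list, so only that comparison
is made on strings ("9" against "10").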

   RATIONALE

   [...]

     The description for comparisons in the ISO POSIX-2:1993 standard
     did not properly describe historical practice because of the way
     numeric strings are compared as numbers.  The current rules cause
     the following code:

     if (0 == "000")
         print "strange, but true"
     else
         print "not true"

     to do a numeric comparison, causing the if to succeed. It should
     be intuitively obvious that this is incorrect behavior, and
     indeed, no historical implementation of awk actually behaves this
     way.

     To fix this problem, the definition of numeric string was enhanced
     to include only those values obtained from specific circumstances
     (mostly external sources) where it is not possible to determine
     unambiguously whether the value is intended to be a string or a
     numeric.

     Variables that are assigned to a numeric string shall also be
     treated as a numeric string.  (For example, the notion of a
     numeric string can be propagated across assignments.)

 -uwe
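
The RATIONALE example quoted above can be checked directly. This is a
small sketch, with the function name rationale_demo and the output labels
as assumptions: it runs the comparison once with the string constant
"000" and once with the same text read as a field.

```shell
# Hypothetical demo of the 0 == "000" case from the RATIONALE.
rationale_demo() {
    # String constant: not a numeric string, so "0" is compared with "000".
    awk 'BEGIN {
        if (0 == "000") print "constant: strange, but true"
        else            print "constant: not true"
    }'
    # Field variable: the same text from input is a numeric string.
    echo 000 | awk '{
        if (0 == $1) print "field: strange, but true"
        else         print "field: not true"
    }'
}

rationale_demo
# constant: not true
# field: strange, but true
```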

From: Aleksey Cheusov <cheusov@tut.by>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: bin/47840: awk string comparison of integer constant
Date: Tue, 21 May 2013 17:31:17 +0300

 >
 >
 >    A string value shall be considered a NUMERIC STRING if it comes from
 >    one of the following:
 >

 Thanks! I overlooked "numeric string" definition.

From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: bin/47840: awk string comparison of integer constant
Date: Fri, 24 May 2013 07:51:55 +0000

 On Mon, May 20, 2013 at 11:05:03PM +0000, Valery Ushakov wrote:
  >  > Note that k < 10 is evaluated as a string comparison.
  >  > 
  >  > Is this required by some standard? gawk does the same thing, but it
  >  > definitely violates the POLA.
  >  
  >  Hmm, it does, indeed, but [...]

 Blah... all right then, can we at least warn if a numeric constant
 gets converted to a string value like this? It is reasonable to
 suppose that if I (or anyone else) intended a string value I would
 have written a string constant.

 The behavior I've just exhibited is just as nonsensical as that cited
 in that RATIONALE.

 -- 
 David A. Holland
 dholland@netbsd.org

From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: bin/47840: awk string comparison of integer constant
Date: Sat, 14 Feb 2015 10:59:34 +0000

 On Fri, May 24, 2013 at 07:55:00AM +0000, David Holland wrote:
  >  On Mon, May 20, 2013 at 11:05:03PM +0000, Valery Ushakov wrote:
  >   >  > Note that k < 10 is evaluated as a string comparison.
  >   >  > 
  >   >  > Is this required by some standard? gawk does the same thing, but it
  >   >  > definitely violates the POLA.
  >   >  
  >   >  Hmm, it does, indeed, but [...]
  >  
  >  Blah... all right then, can we at least warn if a numeric constant
  >  gets converted to a string value like this? It is reasonable to
  >  suppose that if I (or anyone else) intended a string value I would
  >  have written a string constant.
  >  
  >  The behavior I've just exhibited is just as nonsensical as that cited
  >  in that RATIONALE.

 Reading through this again, it seems that the problem is that in

    a[$1] = ($1 < 10)

 $1 is a numeric string because it's a field variable. But after the
 assignment the keys of a[] are not numeric strings, even though the
 value was copied from a numeric string. Maybe this is technically not
 an "assignment", but clearly the numeric string tag should be getting
 propagated here too.

 -- 
 David A. Holland
 dholland@netbsd.org
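
The difference described in the last message can be seen in a short
sketch: a value copied from a field by plain assignment still compares
numerically, while the same field used as an array subscript comes back
from for (k in ...) as an ordinary string. The function name
subscript_demo and the variable names are assumptions for illustration.

```shell
# Hypothetical demo: assignment keeps the numeric string property,
# an array subscript does not.
subscript_demo() {
    echo 9 | awk '{
        copy = $1       # plain assignment from a field
        seen[$1] = 1    # the same field used as a subscript
        for (k in seen)
            printf "%d %d\n", (copy < 10), (k < 10)
    }'
}

subscript_demo
# 1 0
```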

Copyright © 1994-2007 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.