NetBSD Problem Report #53885

From kre@munnari.OZ.AU  Thu Jan 17 01:14:46 2019
Return-Path: <kre@munnari.OZ.AU>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 19AB87A1B5
	for <gnats-bugs@gnats.NetBSD.org>; Thu, 17 Jan 2019 01:14:46 +0000 (UTC)
Message-Id: <201901170114.x0H1ELZ4003162@jinx.noi.kre.to>
Date: Thu, 17 Jan 2019 08:14:21 +0700 (+07)
From: kre@munnari.OZ.AU, Martijn Dekker <martijn@inlv.org>
To: gnats-bugs@NetBSD.org
Subject: awk ERE bug
X-Send-Pr-Version: 3.95

>Number:         53885
>Category:       bin
>Synopsis:       awk ERE bug
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    christos
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Jan 17 01:15:00 +0000 2019
>Closed-Date:    Sat Jan 19 09:15:28 +0000 2019
>Last-Modified:  Sat Jan 19 09:15:28 +0000 2019
>Originator:     Robert Elz (on behalf of Martijn Dekker)
>Release:        NetBSD 8.99.30, NetBSD 8.0
>Organization:
>Environment:
System: NetBSD jinx.noi.kre.to 8.99.30 NetBSD 8.99.30 (1.1-20190114) #9: Mon Jan 14 13:29:08 ICT 2019 kre@onyx.coe.psu.ac.th:/usr/obj/testing/kernels/amd64/JINX amd64
Architecture: x86_64
Machine: amd64
>Description:
Martijn e-mailed to me:

	Not sure where to report this
[I sent directions to send-pr and the web form.]

	-- if not you, perhaps you can forward it to the appropriate party
[this is that]

	(after checking if it's broken in -current).
[It is the same there]

	Found a bug in NetBSD 8.0 awk: bounds in EREs are broken.

	$ awk 'BEGIN { exit(!match("aaa","^a{3}$")); }'; echo $?
	1

	This should output 0.

gawk gives 0 for that.

>How-To-Repeat:
	As above.
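	A slightly broader check can be sketched as below. Only the first
	case ("aaa" against ^a{3}$) comes from this report; the probe()
	helper, the AWK variable and the remaining cases are illustrative
	assumptions added to exercise the {n}, {n,m} and {n,} forms.

```shell
#!/bin/sh
# Sketch only: probe() and every case except the first are illustrative
# and not taken from the report.
# Set AWK to the implementation under test (defaults to "awk").
AWK=${AWK:-awk}

# probe STRING ERE
# Prints "match" if STRING matches ERE according to $AWK, else "no match".
probe() {
	if "$AWK" -v s="$1" -v re="$2" 'BEGIN { exit(!match(s, re)) }'; then
		echo "match"
	else
		echo "no match"
	fi
}

# Each pair is "STRING ERE"; print both plus what $AWK reports.
for pair in 'aaa ^a{3}$' 'aa ^a{3}$' 'aaaa ^a{2,3}$' 'aaaa ^a{2,}$'; do
	set -- $pair
	printf '%-5s %-10s %s\n' "$1" "$2" "$(probe "$1" "$2")"
done
```

	An awk that implements POSIX interval expressions should report
	match, no match, no match, match, in that order. An awk that
	takes the braces literally, as described above, reports no match
	for the first case.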

>Fix:

>Release-Note:

>Audit-Trail:

Responsible-Changed-From-To: gnats-admin->bin-bug-people
Responsible-Changed-By: dholland@NetBSD.org
Responsible-Changed-When: Fri, 18 Jan 2019 07:51:10 +0000
Responsible-Changed-Why:
rescue from pending


State-Changed-From-To: open->closed
State-Changed-By: jdolecek@NetBSD.org
State-Changed-When: Fri, 18 Jan 2019 14:35:28 +0000
State-Changed-Why:
nawk doesn't support {} in regular expressions; e.g. this also fails:
echo "aaa" | nawk '/a{3}/ { print "matched" }'
As Kernighan notes in https://github.com/onetrueawk/awk/issues/26, changing
this would be very difficult.


From: Martijn Dekker <martijn@inlv.org>
To: gnats-bugs@NetBSD.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
        jdolecek@NetBSD.org, kre@munnari.OZ.AU
Cc: 
Subject: Re: bin/53885 (awk ERE bug)
Date: Fri, 18 Jan 2019 17:18:52 +0100

 On 18-01-19 at 15:35, jdolecek@NetBSD.org wrote:
 > nawk doesn't support {} in regular expressions. e.g. this also fails:
 > echo "aaa" | nawk '/a{3}/ { print "matched" }'
 > As Kernighan notes in https://github.com/onetrueawk/awk/issues/26 changing
 > this would be very difficult.

 POSIX requires that awk supports standard EREs[*1] which include bounds
 (interval expressions)[*2], and it has done for many years[*3].

 If it is true that nawk's ERE implementation is so "exceedingly
 complicated and fragile" that adding a basic and standard ERE feature
 would be too difficult, then this awk is broken by design and you should
 consider using some other awk implementation as the default awk.

 - M.

 [*1]
 http://pubs.opengroup.org/onlinepubs/9699919799/utilities/awk.html#tag_20_06_13_04

 [*2] point 5 at
 http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_04_06

 [*3] I was able to verify this back to the 2004 version, but the
 requirement may well be older. In any case 2004 is 15 years ago now.

From: Martijn Dekker <martijn@inlv.org>
To: gnats-bugs@NetBSD.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
        jdolecek@NetBSD.org, kre@munnari.OZ.AU
Cc: 
Subject: Re: bin/53885 (awk ERE bug)
Date: Fri, 18 Jan 2019 21:31:37 +0100

 On 18-01-19 at 17:18, Martijn Dekker wrote:
 > If it is true that nawk's ERE implementation is so "exceedingly
 > complicated and fragile" that adding a basic and standard ERE feature
 > would be too difficult, then this awk is broken by design and you should
 > consider using some other awk implementation as the default awk.

 It must not actually be true, though -- because Apple's /usr/bin/awk is
 of the same lineage, and it supports POSIX EREs just fine, including the
 bounds/interval expressions. See the code at:
 https://opensource.apple.com/source/awk/awk-24/src/b.c.auto.html

 ...and if you get the tarball at
 https://opensource.apple.com/tarballs/awk/awk-24.tar.gz
 (and ignore the obsolete Makefile within it) then you get another
 awk.tar.gz within that with the original sources (warning: files at top
 level, so 'mkdir' and 'cd' first), and a src/ directory with Apple's
 patched version. The code for the bounds feature is quite easy to find
 in a 'diff -ur' between them.

 So this should be quite possible to backport... I hope someone will. If
 not, maybe I'll take a shot at it when I find significant time, because
 I think it's important for POSIX awk scripts to be actually portable.

 - M.

Responsible-Changed-From-To: bin-bug-people->jdolecek
Responsible-Changed-By: jdolecek@NetBSD.org
Responsible-Changed-When: Fri, 18 Jan 2019 21:25:51 +0000
Responsible-Changed-Why:
I'll take a look at the code from the Apple repo.


State-Changed-From-To: closed->open
State-Changed-By: jdolecek@NetBSD.org
State-Changed-When: Fri, 18 Jan 2019 21:25:51 +0000
State-Changed-Why:
Checking if we can fix it reasonably easily.


From: Jaromír Doleček <jaromir.dolecek@gmail.com>
To: "gnats-bugs@NetBSD.org" <gnats-bugs@netbsd.org>
Cc: 
Subject: Re: bin/53885 (awk ERE bug)
Date: Fri, 18 Jan 2019 22:38:01 +0100

 On Fri, 18 Jan 2019 at 21:37, Martijn Dekker <martijn@inlv.org> wrote:
 >  It must not actually be true, though -- because Apple's /usr/bin/awk is
 >  of the same lineage, and it supports POSIX EREs just fine, including the
 >  bounds/interval expressions. See the code at:
 >  https://opensource.apple.com/source/awk/awk-24/src/b.c.auto.html

 I've just checked on actual Mac OS X.

 > awk --version
 awk version 20070501

 > nawk --version
 awk version 20121220


 The 'awk' supports bounds/interval.

 The 'nawk' doesn't.

 So I'm a bit sceptical. But I'll look at the archive nevertheless.

 Jaromir

From: Martijn Dekker <martijn@inlv.org>
To: gnats-bugs@NetBSD.org, jdolecek@NetBSD.org, gnats-admin@netbsd.org,
        netbsd-bugs@netbsd.org, kre@munnari.OZ.AU
Cc: 
Subject: Re: bin/53885 (awk ERE bug)
Date: Fri, 18 Jan 2019 23:03:09 +0100

 On 18-01-19 at 22:40, Jaromír Doleček wrote:
 >  I've just checked on actual Mac OS X.
 >  
 >  > awk --version
 >  awk version 20070501
 >  
 >  > nawk --version
 >  awk version 20121220
 >  
 >  
 >  The 'awk' supports bounds/interval.
 >  
 >  The 'nawk' doesn't.
 >  
 >  So I'm a bit sceptical. But I'll look at the archive nevertheless.

 I'm an actual Mac OS X 10.11.6 user. On my Mac I have no 'nawk' at all.
 I don't think it comes with the system. Someone must have added it to
 yours at some point.

 I recompiled those Apple awk sources, which are definitely original
 lineage, and it results in the same program as my /usr/bin/awk.

 - M.

From: Jaromír Doleček <jaromir.dolecek@gmail.com>
To: "gnats-bugs@NetBSD.org" <gnats-bugs@netbsd.org>
Cc: 
Subject: Re: bin/53885 (awk ERE bug)
Date: Sat, 19 Jan 2019 00:24:50 +0100

 On Fri, 18 Jan 2019 at 23:03, Martijn Dekker <martijn@inlv.org> wrote:
 > I'm an actual Mac OS X 10.11.6 user. On my Mac I have no 'nawk' at all.
 > I don't think it comes with the system. Someone must have added it to
 > yours at some point.

 Yeah, forgot I have pkg in path - nawk was from pkgsrc. The 'awk' is
 the one coming with system, and identifies itself as 20070501.

 Got the Apple sources and got the repetition pattern to work locally
 with nawk 20121220. There is indeed specific added support for this,
 conditional upon "Unix2003_compat". I'll think about how to integrate it
 to make this acceptable for upstream.

 Jaromir

From: "Christos Zoulas" <christos@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/53885 CVS commit: src/external/historical/nawk/dist
Date: Fri, 18 Jan 2019 19:37:42 -0500

 Module Name:	src
 Committed By:	christos
 Date:		Sat Jan 19 00:37:42 UTC 2019

 Modified Files:
 	src/external/historical/nawk/dist: b.c

 Log Message:
 PR/53885: Martijn Dekker: Add ERE support from
 https://opensource.apple.com/tarballs/awk/awk-24.tar.gz


 To generate a diff of this commit:
 cvs rdiff -u -r1.5 -r1.6 src/external/historical/nawk/dist/b.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: christos@zoulas.com (Christos Zoulas)
To: gnats-bugs@NetBSD.org, jdolecek@NetBSD.org, gnats-admin@netbsd.org, 
	netbsd-bugs@netbsd.org, kre@munnari.OZ.AU, 
	Martijn Dekker <martijn@inlv.org>
Cc: 
Subject: Re: bin/53885 (awk ERE bug)
Date: Fri, 18 Jan 2019 19:43:43 -0500

 On Jan 18, 11:30pm, jaromir.dolecek@gmail.com (Jaromír Doleček) wrote:
 -- Subject: Re: bin/53885 (awk ERE bug)

 |  Yeah, forgot I have pkg in path - nawk was from pkgsrc. The 'awk' is
 |  the one coming with system, and identifies itself as 20070501.
 |  
 |  Got the Apple sources and got repetition pattern locally to work with
 |  nawk 20121220. There is indeed specific added support for this,
 |  conditional upon "Unix2003_compat". I'll think how to integrate it to
 |  make this acceptable for upstream.

 I've already merged and committed the code.

 christos

From: Paul Goyette <paul@whooppee.com>
To: Christos Zoulas <christos@zoulas.com>
Cc: gnats-bugs@NetBSD.org, jdolecek@NetBSD.org, gnats-admin@netbsd.org, 
    netbsd-bugs@netbsd.org, kre@munnari.OZ.AU, 
    Martijn Dekker <martijn@inlv.org>
Subject: Re: bin/53885 (awk ERE bug)
Date: Sat, 19 Jan 2019 08:47:52 +0800 (PST)

 On Fri, 18 Jan 2019, Christos Zoulas wrote:

 > On Jan 18, 11:30pm, jaromir.dolecek@gmail.com (Jaromír Doleček) wrote:
 > -- Subject: Re: bin/53885 (awk ERE bug)
 >
 > |  Yeah, forgot I have pkg in path - nawk was from pkgsrc. The 'awk' is
 > |  the one coming with system, and identifies itself as 20070501.
 > |
 > |  Got the Apple sources and got repetition pattern locally to work with
 > |  nawk 20121220. There is indeed specific added support for this,
 > |  conditional upon "Unix2003_compat". I'll think how to integrate it to
 > |  make this acceptable for upstream.
 >
 > I've already merged and committed the code.

 I suspect someone(tm) ought to write a new atf test for this...  :)

 (I am _not_ volunteering!)


 +------------------+--------------------------+----------------------------+
 | Paul Goyette     | PGP Key fingerprint:     | E-mail addresses:          |
 | (Retired)        | FA29 0E3B 35AF E8AE 6651 | paul at whooppee dot com   |
 | Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd dot org |
 +------------------+--------------------------+----------------------------+

Responsible-Changed-From-To: jdolecek->christos
Responsible-Changed-By: jdolecek@NetBSD.org
Responsible-Changed-When: Sat, 19 Jan 2019 09:15:28 +0000
Responsible-Changed-Why:
Christos committed fix.


State-Changed-From-To: open->closed
State-Changed-By: jdolecek@NetBSD.org
State-Changed-When: Sat, 19 Jan 2019 09:15:28 +0000
State-Changed-Why:
Fixed on HEAD (-current).


>Unformatted:
