NetBSD Problem Report #54424

From www@netbsd.org  Wed Jul 31 23:00:32 2019
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 77A697A18B
	for <gnats-bugs@gnats.NetBSD.org>; Wed, 31 Jul 2019 23:00:32 +0000 (UTC)
Message-Id: <20190731230031.E25417A1A8@mollari.NetBSD.org>
Date: Wed, 31 Jul 2019 23:00:31 +0000 (UTC)
From: martijn@inlv.org
Reply-To: martijn@inlv.org
To: gnats-bugs@NetBSD.org
Subject: awk: broken character classes in UTF-8 locale: only the first matches
X-Send-Pr-Version: www-1.0

>Number:         54424
>Category:       bin
>Synopsis:       awk: broken character classes in UTF-8 locale: only the first matches
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    bin-bug-people
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Jul 31 23:05:00 +0000 2019
>Closed-Date:    Tue Apr 21 20:06:55 +0000 2020
>Last-Modified:  Tue Apr 21 20:06:55 +0000 2020
>Originator:     Martijn Dekker
>Release:        9.0_BETA
>Organization:
modernish
>Environment:
NetBSD localhost 9.0_BETA NetBSD 9.0_BETA (GENERIC) #0: Tue Jul 30 16:52:10 UTC 2019  mkrepro@mkrepro.NetBSD.org:/usr/src/sys/arch/amd64/compile/GENERIC amd64
>Description:
When a UTF-8 locale is active, /usr/bin/awk only matches the first character class in a bracket expression, even when matching simple ASCII characters.

I've confirmed this on NetBSD 8.1 as well. I've not tested earlier versions.

/usr/bin/awk on OpenBSD, FreeBSD and macOS (and the nawk variants) does not have this problem, nor does the current upstream version (20190717).

>How-To-Repeat:
$ echo x | LANG=C awk '/[[:digit:][:alpha:]]/'  # ok
x
$ echo x | LANG=en_US.UTF-8 awk '/[[:digit:][:alpha:]]/'  # WRONG
$ echo x | LANG=en_US.UTF-8 awk '/[[:alpha:][:digit:]]/'  # ok
x
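
The three commands above can be folded into one loop that tries both class orders under both locales. This is only a sketch: probe and run_probes are hypothetical helper names, LC_ALL is used instead of LANG (an assumption, so that no other locale variable in the environment interferes), and en_US.UTF-8 is assumed to be installed.

```shell
#!/bin/sh
# Sketch only: probe() and run_probes() are hypothetical helper names,
# not part of awk or of this PR.

# probe LOCALE REGEX: feed "x" to awk under LOCALE and report whether
# REGEX matched it.
probe() {
    p_loc=$1
    p_re=$2
    # LC_ALL (an assumption; the report sets LANG) overrides every other
    # locale variable for this single awk invocation.
    if [ "$(echo x | LC_ALL=$p_loc awk "/$p_re/" 2>/dev/null)" = x ]; then
        p_result=ok
    else
        p_result=FAIL
    fi
    printf '%-12s %-22s %s\n' "$p_loc" "$p_re" "$p_result"
}

# Try both class orders from the report under both locales.
run_probes() {
    for loc in C en_US.UTF-8; do
        for re in '[[:digit:][:alpha:]]' '[[:alpha:][:digit:]]'; do
            probe "$loc" "$re"
        done
    done
}

run_probes
```

On an affected awk, going by the results above, only the en_US.UTF-8 line with [:digit:] first should report FAIL.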

>Fix:

>Release-Note:

>Audit-Trail:
From: "Christos Zoulas" <christos@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/54424 CVS commit: src/external/historical/nawk/dist
Date: Thu, 1 Aug 2019 02:22:52 -0400

 Module Name:	src
 Committed By:	christos
 Date:		Thu Aug  1 06:22:52 UTC 2019

 Modified Files:
 	src/external/historical/nawk/dist: b.c

 Log Message:
 PR/54424: Martijn Dekker: awk: broken character classes in UTF-8 locale:
 only the first matches
 Pick up some of the fixes from upstream:
 	- posix paren matching
 	- print \v \a
 	- some more fatal handling
 	- init all the character range.


 To generate a diff of this commit:
 cvs rdiff -u -r1.7 -r1.8 src/external/historical/nawk/dist/b.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: Martijn Dekker <martijn@inlv.org>
To: gnats-bugs@netbsd.org
Cc: christos@netbsd.org
Subject: Re: PR/54424 CVS commit: src/external/historical/nawk/dist
Date: Thu, 1 Aug 2019 10:42:14 +0100

 The \v \a support should be added to tran.c as well.

 See:
 https://github.com/onetrueawk/awk/pull/44
 https://github.com/onetrueawk/awk/commit/5b602ca8

 - M.

 -- 
 modernish -- harness the shell
 https://github.com/modernish/modernish

From: Christos Zoulas <christos@zoulas.com>
To: gnats-bugs@netbsd.org
Cc: gnats-admin@netbsd.org,
 netbsd-bugs@netbsd.org,
 martijn@inlv.org
Subject: Re: PR/54424 CVS commit: src/external/historical/nawk/dist
Date: Thu, 1 Aug 2019 16:21:23 +0300

 You got it.

 christos

From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/54424 CVS commit: [netbsd-9] src/external/historical/nawk/dist
Date: Sun, 4 Aug 2019 19:19:31 +0000

 Module Name:	src
 Committed By:	martin
 Date:		Sun Aug  4 19:19:31 UTC 2019

 Modified Files:
 	src/external/historical/nawk/dist [netbsd-9]: FIXES awk.1 awk.h b.c
 	    lex.c lib.c main.c run.c tran.c ytab.c

 Log Message:
 Pull up following revision(s) (requested by christos in ticket #11):

 	external/historical/nawk/dist/awk.h: revision 1.3
 	external/historical/nawk/dist/run.c: revision 1.10
 	external/historical/nawk/dist/FIXES: revision 1.2
 	external/historical/nawk/dist/b.c: revision 1.7
 	external/historical/nawk/dist/b.c: revision 1.8
 	external/historical/nawk/dist/lex.c: revision 1.5
 	external/historical/nawk/dist/main.c: revision 1.9
 	external/historical/nawk/dist/proto.h: revision 1.8
 	external/historical/nawk/dist/tran.c: revision 1.10
 	external/historical/nawk/dist/proto.h: revision 1.9
 	external/historical/nawk/dist/awk.1: revision 1.2
 	external/historical/nawk/dist/lib.c: revision 1.9
 	external/historical/nawk/dist/ytab.c: revision 1.2
 	external/historical/nawk/dist/tran.c: revision 1.9

 remove trailing whitespace.

 PR/54424: Martijn Dekker: awk: broken character classes in UTF-8 locale:
 only the first matches

 Pick up some of the fixes from upstream:
         - posix paren matching
         - print \v \a
         - some more fatal handling
         - init all the character range.

 remove ### error output accidentally committed.

 Add translators for \v and \a per posix.


 To generate a diff of this commit:
 cvs rdiff -u -r1.1.1.2 -r1.1.1.2.32.1 src/external/historical/nawk/dist/FIXES
 cvs rdiff -u -r1.1.1.1 -r1.1.1.1.48.1 src/external/historical/nawk/dist/awk.1
 cvs rdiff -u -r1.2 -r1.2.48.1 src/external/historical/nawk/dist/awk.h
 cvs rdiff -u -r1.6 -r1.6.2.1 src/external/historical/nawk/dist/b.c
 cvs rdiff -u -r1.4 -r1.4.4.1 src/external/historical/nawk/dist/lex.c
 cvs rdiff -u -r1.8 -r1.8.28.1 src/external/historical/nawk/dist/lib.c
 cvs rdiff -u -r1.8 -r1.8.32.1 src/external/historical/nawk/dist/main.c \
     src/external/historical/nawk/dist/tran.c
 cvs rdiff -u -r1.9 -r1.9.18.1 src/external/historical/nawk/dist/run.c
 cvs rdiff -u -r1.1.1.1 -r1.1.1.1.36.1 \
     src/external/historical/nawk/dist/ytab.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: open->closed
State-Changed-By: maya@NetBSD.org
State-Changed-When: Tue, 21 Apr 2020 20:06:55 +0000
State-Changed-Why:
This seems to be fixed, and it looks like we've re-imported a newer upstream version. Thanks for the report!


>Unformatted:
