NetBSD Problem Report #43896

From www@NetBSD.org  Wed Sep 22 15:53:18 2010
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
	by www.NetBSD.org (Postfix) with ESMTP id 2945E63B995
	for <gnats-bugs@gnats.NetBSD.org>; Wed, 22 Sep 2010 15:53:18 +0000 (UTC)
Message-Id: <20100922155317.E825763B97A@www.NetBSD.org>
Date: Wed, 22 Sep 2010 15:53:17 +0000 (UTC)
From: peter@kerwien.homeip.net
Reply-To: peter@kerwien.homeip.net
To: gnats-bugs@NetBSD.org
Subject: grep -o match problem
X-Send-Pr-Version: www-1.0

>Number:         43896
>Category:       bin
>Synopsis:       grep -o match problem
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    bin-bug-people
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Sep 22 15:55:00 +0000 2010
>Closed-Date:    Tue Sep 28 01:28:09 +0000 2010
>Last-Modified:  Tue Sep 28 01:28:09 +0000 2010
>Originator:     Peter Kerwien
>Release:        NetBSD 5.99.39 (amd64)
>Organization:
N/A
>Environment:
NetBSD pc3 5.99.39 NetBSD 5.99.39 (GENERIC) #1: Wed Sep 22 05:57:37 UTC 2010  root@pc3:/usr/obj/sys/arch/amd64/compile/GENERIC amd64

>Description:
The following command fails to match properly:

echo VERSION=10 | grep -o '[0-9]*'

The result is empty. The correct result should be 10.

>How-To-Repeat:
See description.
>Fix:

>Release-Note:

>Audit-Trail:
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: bin/43896: grep -o match problem
Date: Mon, 27 Sep 2010 02:17:47 +0000

 On Wed, Sep 22, 2010 at 03:55:00PM +0000, peter@kerwien.homeip.net wrote:
  > The following command fails to match properly:
  > 
  > echo VERSION=10 | grep -o '[0-9]*'
  > 
  > The result is empty. The correct result should be 10.

 This result is, though perhaps not useful, correct. You can see what's
 going on if you try sed:

    % echo VERSION=10 | sed 's/[0-9]*/wibble/'
    wibbleVERSION=10
    % 

 Because [0-9]* matches the empty string, grep is matching the empty
 string at the beginning of the line and printing that.

 To get the result you're looking for, try grep -o '[0-9][0-9]*' or
 egrep -o '[0-9]+'.

 -- 
 David A. Holland
 dholland@netbsd.org
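
The explanation above can be reproduced with a few commands. This is a minimal sketch that assumes a sed and a grep supporting -o are on the PATH; the second sed line is an added illustration, not part of the original reply. It shows that the leftmost match of '[0-9]*' is the empty string at the start of the line, and that requiring at least one digit moves the match to the digits.

```shell
# '[0-9]*' can match zero digits, so the leftmost match is the empty
# string at position 0 and sed inserts the replacement before the 'V'.
echo VERSION=10 | sed 's/[0-9]*/wibble/'        # wibbleVERSION=10

# Requiring at least one digit forces the match onto the digits.
echo VERSION=10 | sed 's/[0-9][0-9]*/wibble/'   # VERSION=wibble

# The two suggested grep forms; grep -E is the current spelling of egrep.
echo VERSION=10 | grep -o '[0-9][0-9]*'         # 10
echo VERSION=10 | grep -E -o '[0-9]+'           # 10
```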

State-Changed-From-To: open->closed
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Mon, 27 Sep 2010 02:22:44 +0000
State-Changed-Why:
Submitter hit one of the pitfalls in regexp matching...


State-Changed-From-To: closed->open
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Mon, 27 Sep 2010 08:29:53 +0000
State-Changed-Why:
A bug does exist here, however. grep -o apparently prints each distinct
match for a given input line separately:

   % echo ' 1 2 3 4 ' | grep -o '[0-9]'
   1
   2
   3
   4
   % echo ' the quick brown fox ' | grep -o '[a-z][a-z]*'
   the
   quick
   brown
   fox
   %

Therefore, the original example, which can match the empty string, should
print all the possible empty strings it can match and also the nonempty
match, and not just stop with the first empty match at the beginning of the
line.

Reportedly, updating grep will fix the problem.
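
One way to tell whether a given grep is affected is a small check like the following. It is only a sketch: the function name check_grep_o is hypothetical, and it assumes that a correct grep -o skips the empty matches and prints just the nonempty one (10), which is an assumption about the intended output rather than something stated in this report.

```shell
# check_grep_o CMD...: run CMD with -o and the pattern from this PR.
# CMD (hypothetical argument) is any grep-like command reading stdin.
check_grep_o() {
    out=$(printf 'VERSION=10\n' | "$@" -o '[0-9]*')
    if [ "$out" = "10" ]; then
        # The nonempty match was found after the leading empty one.
        echo "fixed"
    else
        # Empty or unexpected output: matching stopped too early.
        echo "affected"
    fi
}

# Check whichever grep is first on the PATH.
check_grep_o grep
```

To check a specific binary instead, pass its path, for example check_grep_o /usr/bin/grep.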


From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: bin/43896 (grep -o match problem)
Date: Mon, 27 Sep 2010 08:37:00 +0000

 On Mon, Sep 27, 2010 at 08:29:54AM +0000, dholland@NetBSD.org wrote:
  > A bug does exist here, however. grep -o apparently prints each distinct
  > match for a given input line separately:
  >  
  > [...]
  > 
  > Reportedly, updating grep will fix the problem.

 As does this patch:

 Index: src/grep.c
 ===================================================================
 RCS file: /cvsroot/src/gnu/dist/grep/src/grep.c,v
 retrieving revision 1.12
 diff -u -p -r1.12 grep.c
 --- src/grep.c	28 Aug 2008 03:59:06 -0000	1.12
 +++ src/grep.c	27 Sep 2010 08:35:31 -0000
 @@ -542,7 +542,10 @@ prline (char const *beg, char const *lim
  	  if (b == lim)
  	    break;
  	  if (match_size == 0)
 -	    break;
 +	    {
 +	      beg++;
 +	      continue;
 +	    }
  	  if(color_option)
  	    printf("\33[%sm", grep_color);
  	  fwrite(b, sizeof (char), match_size, stdout);


 -- 
 David A. Holland
 dholland@netbsd.org

From: "David A. Holland" <dholland@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/43896 CVS commit: src/gnu/dist/grep/src
Date: Tue, 28 Sep 2010 00:54:05 +0000

 Module Name:	src
 Committed By:	dholland
 Date:		Tue Sep 28 00:54:04 UTC 2010

 Modified Files:
 	src/gnu/dist/grep/src: grep.c

 Log Message:
 Fix -o behavior with patterns that match the empty string, as per PR 43896.


 To generate a diff of this commit:
 cvs rdiff -u -r1.12 -r1.13 src/gnu/dist/grep/src/grep.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: open->closed
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Tue, 28 Sep 2010 01:28:09 +0000
State-Changed-Why:
Fixed (properly now), thanks.


>Unformatted:

Copyright © 1994-2007 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.