NetBSD Problem Report #28126

From andrew@untraceable.net  Tue Nov  9 02:36:13 2004
Return-Path: <andrew@untraceable.net>
Received: from noc.untraceable.net (noc.untraceable.net [166.84.189.65])
	by narn.netbsd.org (Postfix) with ESMTP id 59BAF2521DB
	for <gnats-bugs@gnats.netbsd.org>; Tue,  9 Nov 2004 02:36:13 +0000 (UTC)
Message-Id: <200411090236.iA92aAk0019567@noc.untraceable.net>
Date: Mon, 8 Nov 2004 21:36:10 -0500 (EST)
From: Andrew Brown <atatat@atatdot.net>
Reply-To: Andrew Brown <atatat@atatdot.net>
To: gnats-bugs@gnats.netbsd.org
Subject: sed fails to match empty back-reference
X-Send-Pr-Version: 3.95

>Number:         28126
>Category:       bin
>Synopsis:       sed fails to match empty back-reference
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    bin-bug-people
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue Nov 09 02:37:00 +0000 2004
>Closed-Date:    Tue Feb 23 21:07:11 +0000 2021
>Last-Modified:  Tue Feb 23 21:07:11 +0000 2021
>Originator:     TheMan
>Release:        2.99.10
>Organization:
none
>Environment:

System: NetBSD this 2.99.10 NetBSD 2.99.10 (THAT) #346: Mon Nov  8 07:58:51 EST 2004  andrew@this:/usr/src/sys/arch/i386/compile/THAT i386


>Description:

	sed fails to match an empty back-reference.  i discovered this
after i partially zorched my cvs tree and my Root and Repository files
mostly ended up containing garbage.  finding improper Repository files
should have been as simple as:

	% cd /usr/src
	% find ./bin -path \*/CVS/Repository | \
		xargs grep -H . | \
		sed -n '/^\.\(.*\)\/CVS\/Repository:src\1$/d;p'

yet the first line out is 

	./CVS/Repository:src

which is clearly fine.

>How-To-Repeat:

on netbsd current

	% echo foobar | sed -ne '/foo\(.*\)bar\1/p'
	% echo foo1bar1 | sed -ne '/foo\(.*\)bar\1/p'
	foo1bar1

compared to solaris:

	% echo foobar | sed -ne '/foo\(.*\)bar\1/p'
	foobar

linux:

	% echo foobar | sed -ne '/foo\(.*\)bar\1/p'
	foobar

netbsd 2.0_RC4:

	% echo foobar | sed -ne '/foo\(.*\)bar\1/p'
	foobar

netbsd 1.6ZL:

	% echo foobar | sed -ne '/foo\(.*\)bar\1/p'
	foobar

etc.

>Fix:

not sure.  repeatedly updating my sed tree to something older (and
older and older) didn't yield any good results, so i suppose it might
be a recent change to regex stuff in libc?
>Release-Note:
>Audit-Trail:
From: dieter roelants <dieter.NetBSD@pandora.be>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: bin/28126 (sed fails to match empty back-reference)
Date: Sat, 17 May 2008 17:00:41 +0200

 I've been looking into this. The change that caused it is:

 http://cvsweb.be.netbsd.org/cgi-bin/cvsweb.cgi/src/lib/libc/regex/engine.c.diff?r2=1.18&r1=1.17&f=H
 "Avoid infinite recursion on:

     echo "foo foo bar bar bar baz" | sed 's/\([^ ]*\)\( *\1\)*/\1/g'

 From OpenBSD."

 Wondering whether OpenBSD people had perhaps detected and fixed the
 problem, I noticed that they did, in response to this very same PR.

 The patch, adapted to NetBSD, follows. I have verified with a small C
 program (included after the patch) that it works (and returns the same
 results as PCRE).

 I should also note that the foo bar baz sed thing above returns the
 wrong result with the current libc:

 echo "foo foo bar bar bar baz" | sed 's/\([^ ]*\)\( *\1\)*/\1/g'
 foo bar baz
 echo "foo foo bar bar bar baz" | LD_PRELOAD=/usr/src/lib/libc/libc.so.12 sed 's/\([^ ]*\)\( *\1\)*/\1/g'
 foobarbaz

 Index: regex/engine.c
 ===================================================================
 RCS file: /cvsroot/src/lib/libc/regex/engine.c,v
 retrieving revision 1.21
 diff -u -r1.21 engine.c
 --- regex/engine.c	8 Feb 2007 05:44:18 -0000	1.21
 +++ regex/engine.c	17 May 2008 14:47:45 -0000
 @@ -128,10 +128,11 @@
  /* === engine.c === */
  static int matcher(struct re_guts *g, const char *string, size_t nmatch, regmatch_t pmatch[], int eflags);
  static const char *dissect(struct match *m, const char *start, const char *stop, sopno startst, sopno stopst);
 -static const char *backref(struct match *m, const char *start, const char *stop, sopno startst, sopno stopst, sopno lev);
 +static const char *backref(struct match *m, const char *start, const char *stop, sopno startst, sopno stopst, sopno lev, int rec);
  static const char *fast(struct match *m, const char *start, const char *stop, sopno startst, sopno stopst);
  static const char *slow(struct match *m, const char *start, const char *stop, sopno startst, sopno stopst);
  static states step(struct re_guts *g, sopno start, sopno stop, states bef, int ch, states aft);
 +#define MAX_RECURSION	100
  #define	BOL	(OUT+1)
  #define	EOL	(BOL+1)
  #define	BOLEOL	(BOL+2)
 @@ -279,7 +280,7 @@
  				goto done;
  			}
  			NOTE("backref dissect");
 -			dp = backref(m, m->coldp, endp, gf, gl, (sopno)0);
 +			dp = backref(m, m->coldp, endp, gf, gl, (sopno)0, 0);
  		}
  		if (dp != NULL)
  			break;
 @@ -302,7 +303,7 @@
  			}
  #endif
  			NOTE("backoff dissect");
 -			dp = backref(m, m->coldp, endp, gf, gl, (sopno)0);
 +			dp = backref(m, m->coldp, endp, gf, gl, (sopno)0, 0);
  		}
  		assert(dp == NULL || dp == endp);
  		if (dp != NULL)		/* found a shorter one */
 @@ -565,7 +566,8 @@
      const char *stop,
      sopno startst,
      sopno stopst,
 -    sopno lev)			/* PLUS nesting level */
 +    sopno lev,			/* PLUS nesting level */
 +    int rec)
  {
  	int i;
  	sopno ss;	/* start sop of current subRE */
 @@ -675,7 +677,7 @@
  			return(NULL);
  		assert(m->pmatch[i].rm_so != (regoff_t)-1);
  		len = (size_t)(m->pmatch[i].rm_eo - m->pmatch[i].rm_so);
 -		if (len == 0)
 +		if (len == 0 && rec++ > MAX_RECURSION)
  			return(NULL);
  		assert(stop - m->beginp >= len);
  		if (sp > stop - len)
 @@ -685,28 +687,28 @@
  			return(NULL);
  		while (m->g->strip[ss] != SOP(O_BACK, i))
  			ss++;
 -		return(backref(m, sp+len, stop, ss+1, stopst, lev));
 +		return(backref(m, sp+len, stop, ss+1, stopst, lev, rec));

  	case OQUEST_:		/* to null or not */
 -		dp = backref(m, sp, stop, ss+1, stopst, lev);
 +		dp = backref(m, sp, stop, ss+1, stopst, lev, rec);
  		if (dp != NULL)
  			return(dp);	/* not */
 -		return(backref(m, sp, stop, ss+OPND(s)+1, stopst, lev));
 +		return(backref(m, sp, stop, ss+OPND(s)+1, stopst, lev, rec));

  	case OPLUS_:
  		assert(m->lastpos != NULL);
  		assert(lev+1 <= m->g->nplus);
  		m->lastpos[lev+1] = sp;
 -		return(backref(m, sp, stop, ss+1, stopst, lev+1));
 +		return(backref(m, sp, stop, ss+1, stopst, lev+1, rec));

  	case O_PLUS:
  		if (sp == m->lastpos[lev])	/* last pass matched null */
 -			return(backref(m, sp, stop, ss+1, stopst, lev-1));
 +			return(backref(m, sp, stop, ss+1, stopst, lev-1, rec));
  		/* try another pass */
  		m->lastpos[lev] = sp;
 -		dp = backref(m, sp, stop, ss-OPND(s)+1, stopst, lev);
 +		dp = backref(m, sp, stop, ss-OPND(s)+1, stopst, lev, rec);
  		if (dp == NULL)
 -			dp = backref(m, sp, stop, ss+1, stopst, lev-1);
 +			dp = backref(m, sp, stop, ss+1, stopst, lev-1, rec);
  		return(dp);

  	case OCH_:		/* find the right one, if any */
 @@ -714,7 +716,7 @@
  		esub = ss + OPND(s) - 1;
  		assert(OP(m->g->strip[esub]) == OOR1);
  		for (;;) {	/* find first matching branch */
 -			dp = backref(m, sp, stop, ssub, esub, lev);
 +			dp = backref(m, sp, stop, ssub, esub, lev, rec);
  			if (dp != NULL)
  				return(dp);
  			/* that one missed, try next one */
 @@ -735,7 +737,7 @@
  		assert(0 < i && i <= m->g->nsub);
  		offsave = m->pmatch[i].rm_so;
  		m->pmatch[i].rm_so = sp - m->offp;
 -		dp = backref(m, sp, stop, ss+1, stopst, lev);
 +		dp = backref(m, sp, stop, ss+1, stopst, lev, rec);
  		if (dp != NULL)
  			return(dp);
  		m->pmatch[i].rm_so = offsave;
 @@ -746,7 +748,7 @@
  		assert(0 < i && i <= m->g->nsub);
  		offsave = m->pmatch[i].rm_eo;
  		m->pmatch[i].rm_eo = sp - m->offp;
 -		dp = backref(m, sp, stop, ss+1, stopst, lev);
 +		dp = backref(m, sp, stop, ss+1, stopst, lev, rec);
  		if (dp != NULL)
  			return(dp);
  		m->pmatch[i].rm_eo = offsave;


 ======================================
 #include <inttypes.h>
 #ifdef _USE_PCRE
 #include <pcreposix.h>
 #else
 #include <regex.h>
 #endif
 #include <stdio.h>

 void
 testreg(regex_t *r, const char *s)
 {
 	regmatch_t m[2];
 	printf("%s: %d\t", s, regexec(r, s, 2, m, 0));
 	printf("%jd - %jd\t", (intmax_t)(m[0].rm_so), (intmax_t)(m[0].rm_eo));
 	printf("%jd - %jd\n", (intmax_t)(m[1].rm_so), (intmax_t)(m[1].rm_eo));
 }

 int
 main()
 {
 	regex_t r;
 	int ret;
 	char err[1024];
 #ifdef _USE_PCRE
 #define TESTRE "foo(.*)bar\\1"
 #else
 #define TESTRE "foo\\(.*\\)bar\\1"
 #endif

 	if ((ret = regcomp(&r, TESTRE, 0)) != 0) {
 		regerror(ret, &r, err, 1024);
 		fprintf(stderr, "RE error: %s\n", err);
 		return 1;
 	}
 	testreg(&r, "foobar");
 	testreg(&r, "foo_bar_");
 	testreg(&r, "foobar_");
 	testreg(&r, "foolbar_");
 	return 0;
 }

 /*
 output:

 current libc:
 ./t_libc
 foobar: 1	0 - 0	0 - 577757282805521148
 foo_bar_: 0	0 - 8	3 - 4
 foobar_: 1	0 - 8	3 - 4
 foolbar_: 1	0 - 8	3 - 4

 patched libc:
 LD_PRELOAD=/usr/src/lib/libc/libc.so.12 ./t_libc
 foobar: 0	0 - 6	3 - 3
 foo_bar_: 0	0 - 8	3 - 4
 foobar_: 0	0 - 6	3 - 3
 foolbar_: 1	0 - 6	3 - 3

 pcre:
 LD_PRELOAD=/usr/pkg/lib/libpcreposix.so ./t_pcre
 foobar: 0	0 - 6	3 - 3
 foo_bar_: 0	0 - 8	3 - 4
 foobar_: 0	0 - 6	3 - 3
 foolbar_: 17	0 - 6	3 - 3
 */

 dieter

From: "Jukka Ruohonen" <jruoho@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/28126 CVS commit: src
Date: Sun, 18 Mar 2012 10:12:31 +0000

 Module Name:	src
 Committed By:	jruoho
 Date:		Sun Mar 18 10:12:31 UTC 2012

 Modified Files:
 	src/distrib/sets/lists/tests: mi
 	src/etc/mtree: NetBSD.dist.tests
 	src/tests/usr.bin: Makefile
 Added Files:
 	src/tests/usr.bin/sed: Makefile t_sed.sh

 Log Message:
 Add a test case for PR bin/28126. Does not fail with GNU sed.


 To generate a diff of this commit:
 cvs rdiff -u -r1.447 -r1.448 src/distrib/sets/lists/tests/mi
 cvs rdiff -u -r1.64 -r1.65 src/etc/mtree/NetBSD.dist.tests
 cvs rdiff -u -r1.9 -r1.10 src/tests/usr.bin/Makefile
 cvs rdiff -u -r0 -r1.1 src/tests/usr.bin/sed/Makefile \
     src/tests/usr.bin/sed/t_sed.sh

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: open->closed
State-Changed-By: christos@NetBSD.org
State-Changed-When: Tue, 23 Feb 2021 16:07:11 -0500
State-Changed-Why:
fixed with the latest sync of regex from FreeBSD


>Unformatted:
 >Quarter:
 >Keywords:
 >Date-Required:

NetBSD Home
NetBSD PR Database Search

(Contact us) $NetBSD: query-full-pr,v 1.46 2020/01/03 16:35:01 leot Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2020 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.