NetBSD Problem Report #51171

From jarle@singsaker.uninett.no  Fri May 27 07:39:29 2016
Return-Path: <jarle@singsaker.uninett.no>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (verified OK))
	by mollari.NetBSD.org (Postfix) with ESMTPS id E99307A46A
	for <gnats-bugs@gnats.NetBSD.org>; Fri, 27 May 2016 07:39:28 +0000 (UTC)
Message-Id: <20160527073927.3661E84CE8@singsaker.uninett.no>
Date: Fri, 27 May 2016 09:20:56 +0200 (CEST)
From: jarle@uninett.no
Reply-To: jarle@uninett.no
To: gnats-bugs@NetBSD.org
Subject: sed does not match newlines in regexps properly
X-Send-Pr-Version: 3.95

>Number:         51171
>Category:       bin
>Synopsis:       sed does not match newlines in regexps properly
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    bin-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri May 27 07:40:00 +0000 2016
>Last-Modified:  Fri May 27 10:10:00 +0000 2016
>Originator:     Jarle Greipsland
>Release:        NetBSD 7.99.26
>Organization:

>Environment:


System: NetBSD singsaker.uninett.no 7.99.26 NetBSD 7.99.26 (SINGSAKER) #0: Sat Mar 26 12:52:06 CET 2016 jarle@singsaker.uninett.no:/usr/obj/sys/arch/i386/compile/SINGSAKER i386
Architecture: i386
Machine: i386
>Description:

The behavior of sed with regard to matching embedded newlines in the
pattern space seems to have changed from NetBSD 6 to NetBSD 7.

-------- script.sed ---------
1{h;d;}
2{H;d;}
3{H
  x
# Pattern space: line1 \n line2 \n line3 (without spaces)
# Now, delete the first character of line1 and line2
  s/^[^\n]\([^\n]*\n\)[^\n]/\1/
}
-----------------------------

On NetBSD 6, the command
  (echo abc; echo def; echo ghi) | sed -f script.sed
will print:
bc
ef
ghi
which is what I would expect.

However, on NetBSD 7 and NetBSD-current, the command will print:
bc
def
hi
which is rather unexpected.

>How-To-Repeat:
See above.
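
For convenience, the steps above can be bundled into one self-contained
shell sketch that reports which of the two outputs the local sed produces.
The helper name classify_sed and the labels it prints are illustrative
assumptions, not part of any sed implementation.

```shell
#!/bin/sh
# The sed script from the Description, unchanged apart from the comments.
script='1{h;d;}
2{H;d;}
3{H
  x
  s/^[^\n]\([^\n]*\n\)[^\n]/\1/
}'

# classify_sed (hypothetical helper): run the script on the three-line
# input from the report and name the behavior that was observed.
classify_sed() {
    out=$(printf 'abc\ndef\nghi\n' | sed "$script")
    case $out in
    "bc
ef
ghi") echo "newline" ;;   # output reported for NetBSD 6
    "bc
def
hi")  echo "literal" ;;   # output reported for NetBSD 7 and -current
    *)    echo "unknown" ;;
    esac
}

classify_sed
```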

>Fix:


>Audit-Trail:
From: Robert Elz <kre@munnari.OZ.AU>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: bin/51171: sed does not match newlines in regexps properly
Date: Fri, 27 May 2016 16:16:57 +0700

     Date:        Fri, 27 May 2016 07:40:00 +0000 (UTC)
     From:        jarle@uninett.no
     Message-ID:  <20160527074000.50F8B7AABE@mollari.NetBSD.org>


   | -------- script.sed ---------
   | 1{h;d;}
   | 2{H;d;}
   | 3{H
   |   x
   | # Pattern space: line1 \n line2 \n line3 (without spaces)
   | # Now, delete the first character of line1 and line2
   |   s/^[^\n]\([^\n]*\n\)[^\n]/\1/
   | }
   | -----------------------------
   | 
   | On NetBSD 6, the command
   |   (echo abc; echo def; echo ghi) | sed -f script.sed
   | will print:
   | bc
   | ef
   | ghi
   | which is what I would expect.

 If it does, that is a bug: the expression [^\n] matches a character
 that is neither a '\' nor an 'n', and has nothing at all to do with newlines.
 No escape characters work inside [] (though there is a whole set of
 magic combinations that mean specific things).
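
 A quick way to see which reading a given sed uses is to feed it the
 single letter "n" and try to replace [\n].  This is only a sketch, and
 probe_bracket is a hypothetical helper name.

```shell
#!/bin/sh
# probe_bracket (hypothetical helper): the input line is the letter "n".
# If \n inside [] means the two characters '\' and 'n', the letter is
# replaced by X.  If it means a newline, nothing matches and "n" is
# printed unchanged.
probe_bracket() {
    printf 'n\n' | sed 's/[\n]/X/'
}

case $(probe_bracket) in
X) echo "literal: the bracket matched the letter n" ;;
n) echo "newline: the bracket did not match the letter n" ;;
esac
```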

 As best I can tell (having looked for it for ages), there is no way in
 sed to match "any character other than a newline".  I resorted to
 s/\n/X/, where X was a character I knew could not appear in the text
 (because earlier commands had removed all instances), followed by [^X]
 in the expression to do the work, followed by s/X/${nl}/ (where ${nl}
 is a literal newline).  Truly ugly, but I believe it is the only way
 possible.
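
 Applied to the script from the report, the placeholder approach might
 look like the sketch below.  It assumes that '|' (standing in for X)
 never occurs in the input, and it writes the literal newline in the
 replacement as a backslash followed by a line break rather than through
 a shell variable.  The name strip_first_chars is hypothetical.

```shell
#!/bin/sh
# Placeholder workaround: turn the embedded newlines into '|', use [^|]
# where "not a newline" is wanted, then turn '|' back into newlines.
# Assumption: the input never contains '|'.
script='1{h;d;}
2{H;d;}
3{
  H
  x
  s/\n/|/g
  s/^[^|]\([^|]*|\)[^|]/\1/
  s/|/\
/g
}'

# strip_first_chars (hypothetical helper): read three lines on stdin and
# delete the first character of the first and second line.
strip_first_chars() {
    sed "$script"
}

printf 'abc\ndef\nghi\n' | strip_first_chars
```

 With the three-line input from the report this prints bc, ef and ghi,
 and it does not depend on how \n inside brackets is interpreted.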

 The best solution I can think of is to add a new character class that
 contains just newline, say [:nl:], and then use [^[:nl:]], but no sed
 that I am aware of does anything like that.

 kre


From: Jarle Greipsland <jarle@uninett.no>
To: gnats-bugs@NetBSD.org, kre@munnari.OZ.AU
Cc: 
Subject: Re: bin/51171: sed does not match newlines in regexps properly
Date: Fri, 27 May 2016 12:05:18 +0200 (CEST)

 Robert Elz <kre@munnari.OZ.AU> writes:
 >      From:        jarle@uninett.no
 >      Message-ID:  <20160527074000.50F8B7AABE@mollari.NetBSD.org>

 >    | -------- script.sed ---------
 >    | 1{h;d;}
 >    | 2{H;d;}
 >    | 3{H
 >    |   x
 >    | # Pattern space: line1 \n line2 \n line3 (without spaces)
 >    | # Now, delete the first character of line1 and line2
 >    |   s/^[^\n]\([^\n]*\n\)[^\n]/\1/
 >    | }
 >    | -----------------------------
 >    | 
 >    | On NetBSD 6, the command
 >    |   (echo abc; echo def; echo ghi) | sed -f script.sed
 >    | will print:
 >    | bc
 >    | ef
 >    | ghi
 >    | which is what I would expect.
 >  
 >  If it does, that is a bug: the expression [^\n] matches a character
 >  that is neither a '\' nor an 'n', and has nothing at all to do with newlines.
 >  No escape characters work inside [] (though there is a whole set of
 >  magic combinations that mean specific things).
 You are right.  I shall have to adjust my expectations.  And
 someone might want to adjust sed's behavior in NetBSD 6.  And GNU
 sed also, it would seem.  Oh well.  Lesson learned: don't rely on
 the behavior of \n in brackets.

 This problem report should probably be closed.

 >  As best I can tell (having looked for it for ages), there is no way in
 >  sed to match "any character other than a newline".  I resorted to
 >  s/\n/X/, where X was a character I knew could not appear in the text
 >  (because earlier commands had removed all instances), followed by [^X]
 >  in the expression to do the work, followed by s/X/${nl}/ (where ${nl}
 >  is a literal newline).  Truly ugly, but I believe it is the only way
 >  possible.
 Or, even uglier, one could try to do dummy \n->\n substitutions
 for positions where one does not wish a \n to match, and use
 control flow to branch to the appropriate substitutions.

 >  The best solution I can think of is to add a new character class that
 >  contains just newline, say [:nl:], and then use [^[:nl:]], but no sed
 >  that I am aware of does anything like that.
 That would have been nice, yes.
 					-jarle
 --
 we all hack on a broken subroutine, a broken subroutine, a broken subroutine...
 					-- Kenneth Stailey

>Unformatted:
