NetBSD Problem Report #51171
From jarle@singsaker.uninett.no Fri May 27 07:39:29 2016
Return-Path: <jarle@singsaker.uninett.no>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (verified OK))
by mollari.NetBSD.org (Postfix) with ESMTPS id E99307A46A
for <gnats-bugs@gnats.NetBSD.org>; Fri, 27 May 2016 07:39:28 +0000 (UTC)
Message-Id: <20160527073927.3661E84CE8@singsaker.uninett.no>
Date: Fri, 27 May 2016 09:20:56 +0200 (CEST)
From: jarle@uninett.no
Reply-To: jarle@uninett.no
To: gnats-bugs@NetBSD.org
Subject: sed does not match newlines in regexps properly
X-Send-Pr-Version: 3.95
>Number: 51171
>Category: bin
>Synopsis: sed does not match newlines in regexps properly
>Confidential: no
>Severity: non-critical
>Priority: medium
>Responsible: bin-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Fri May 27 07:40:00 +0000 2016
>Last-Modified: Fri May 27 10:10:00 +0000 2016
>Originator: Jarle Greipsland
>Release: NetBSD 7.99.26
>Organization:
>Environment:
System: NetBSD singsaker.uninett.no 7.99.26 NetBSD 7.99.26 (SINGSAKER) #0: Sat Mar 26 12:52:06 CET 2016 jarle@singsaker.uninett.no:/usr/obj/sys/arch/i386/compile/SINGSAKER i386
Architecture: i386
Machine: i386
>Description:
The behavior of sed with regard to matching embedded newlines in the
pattern space seems to have changed from NetBSD 6 to NetBSD 7.
-------- script.sed ---------
1{h;d;}
2{H;d;}
3{H
x
# Pattern space: line1 \n line2 \n line3 (without spaces)
# Now, delete the first character of line1 and line2
s/^[^\n]\([^\n]*\n\)[^\n]/\1/
}
-----------------------------
On NetBSD 6, the command
(echo abc; echo def; echo ghi) | sed -f script.sed
will print:
bc
ef
ghi
which is what I would expect.
However, on NetBSD 7 and NetBSD-current, the command will print:
bc
def
hi
which is rather unexpected.
>How-To-Repeat:
See above.
>Fix:
>Audit-Trail:
From: Robert Elz <kre@munnari.OZ.AU>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: bin/51171: sed does not match newlines in regexps properly
Date: Fri, 27 May 2016 16:16:57 +0700
Date: Fri, 27 May 2016 07:40:00 +0000 (UTC)
From: jarle@uninett.no
Message-ID: <20160527074000.50F8B7AABE@mollari.NetBSD.org>
| -------- script.sed ---------
| 1{h;d;}
| 2{H;d;}
| 3{H
| x
| # Pattern space: line1 \n line2 \n \line3 (without spaces)
| # Now, delete the first character of line1 and line2
| s/^[^\n]\([^\n]*\n\)[^\n]/\1/
| }
| -----------------------------
|
| On NetBSD 6, the command
| (echo abc; echo def; echo ghi) | sed -f script.sed
| will print:
| bc
| ef
| ghi
| which is what I would expect.
If it does, it is a bug: the expression [^\n] matches a character
that is neither a '\' nor an 'n' and has nothing at all to do with newlines.
No escape characters work inside [] (though there is a whole set of
magic combinations that mean specific things).
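A quick way to see which reading of [^\n] a given sed applies is to feed it
the single letter "n". This is only a sketch: the function name
bracket_newline_semantics is made up for illustration, and the two labels it
prints are not standard terms.

```shell
# Hypothetical probe: reports how the local sed reads \n inside [].
bracket_newline_semantics() {
    # The input line is just "n". If [^\n] means "neither backslash nor n",
    # nothing matches and the line stays "n". If \n is taken as a newline,
    # the "n" matches and becomes "X".
    case $(printf 'n\n' | sed 's/[^\n]/X/') in
        n) printf '%s\n' literal ;;
        X) printf '%s\n' escape ;;
        *) printf '%s\n' unknown ;;
    esac
}

bracket_newline_semantics
```

A result of "literal" corresponds to the reading described above; "escape"
means the sed in question treats \n in brackets as a newline, so scripts that
depend on either result are not portable.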
As best I can tell (having looked for it for ages) there is no way in
sed to match "anything other than a newline". I resorted to s/\n/X/
where X was a character I knew could not appear in the text (because
earlier commands had removed all instances), followed by [^X] in the
expression to do the work, followed by s/X/${nl}/ (where ${nl} is a literal
newline). Truly ugly, but I believe the only way possible.
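Applied to the script from the report, that placeholder trick might look
roughly like the sketch below. It assumes '@' never occurs in the input (the
stand-in for X), replaces every embedded newline rather than just one, and
wraps the script in a shell function whose name, strip_first_chars, is
invented for this example.

```shell
# Hypothetical helper: joins three input lines, then deletes the first
# character of line 1 and of line 2.
# ASSUMPTION: the placeholder '@' never appears in the input text.
strip_first_chars() {
    sed '
1{h;d;}
2{H;d;}
3{
H
x
# Swap each embedded newline for the placeholder.
s/\n/@/g
# [^@] now stands in for "any character except newline".
s/^[^@]\([^@]*@\)[^@]/\1/
# Restore the newlines (backslash followed by a literal newline).
s/@/\
/g
}'
}

printf 'abc\ndef\nghi\n' | strip_first_chars
```

With the three-line input shown, this prints bc, ef and ghi on separate
lines, and it relies only on \n outside brackets, so the result should not
depend on how a particular sed reads [^\n].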
The best solution I can think of is to add a new char class that contains
just newline, say [:nl:], and then use [^[:nl:]], but no sed that I am
aware of does anything like that.
kre
From: Jarle Greipsland <jarle@uninett.no>
To: gnats-bugs@NetBSD.org, kre@munnari.OZ.AU
Cc:
Subject: Re: bin/51171: sed does not match newlines in regexps properly
Date: Fri, 27 May 2016 12:05:18 +0200 (CEST)
Robert Elz <kre@munnari.OZ.AU> writes:
> From: jarle@uninett.no
> Message-ID: <20160527074000.50F8B7AABE@mollari.NetBSD.org>
> | -------- script.sed ---------
> | 1{h;d;}
> | 2{H;d;}
> | 3{H
> | x
> | # Pattern space: line1 \n line2 \n \line3 (without spaces)
> | # Now, delete the first character of line1 and line2
> | s/^[^\n]\([^\n]*\n\)[^\n]/\1/
> | }
> | -----------------------------
> |
> | On NetBSD 6, the command
> | (echo abc; echo def; echo ghi) | sed -f script.sed
> | will print:
> | bc
> | ef
> | ghi
> | which is what I would expect.
>
> If it does it is a bug the expression [^\n] matches a character
> that is neither a '\' nor an 'n' and has nothing at all to do with newlines.
> No escape characters work inside [] (though there a whole set of
> magic combinations that mean specific things).
You are right. I shall have to adjust my expectations. And
someone might want to adjust sed's behavior in NetBSD 6. And GNU
sed also, it would seem. Oh well. Lesson learned: don't rely on
the behavior of \n in brackets.
This problem report should probably be closed.
> As best I can tell (having looked for it for ages) there is no way in
> sed to match anything other than a newline. I resorted to s/\n/X/
> where X was a character I knew could not appear in the text (because
> earlier commands had removed all instances), followed by [^X] in the
> expression to do the work, followed by s/X/${nl}/ (${nl} is a literal
> newline. Truly ugly, but I believe the only way possible.
Or even uglier, one could try to do dummy \n->\n substitutions
for positions where one does not wish a \n to match, and use
control flow to branch to the appropriate substitutions.
> The best solution I can think of is to add a new char class that contains
> just newline, say [:nl:] and then use [^[:nl:]] but no sed does anything
> like that that I am aware of.
That would have been nice, yes.
-jarle
--
we all hack on a broken subroutine, a broken subroutine, a broken subroutine...
-- Kenneth Stailey
>Unformatted:
Copyright © 1994-2014
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.