NetBSD Problem Report #59803
From elo@marmite.localnet Sat Nov 29 05:02:47 2025
Return-Path: <elo@marmite.localnet>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256
client-signature RSA-PSS (2048 bits) client-digest SHA256)
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id B0EDE1A9239
for <gnats-bugs@gnats.NetBSD.org>; Sat, 29 Nov 2025 05:02:47 +0000 (UTC)
Message-Id: <20251129045417.8F9919932@marmite.localnet>
Date: Sat, 29 Nov 2025 04:54:17 +0000 (GMT)
From: elo@sdf.org
Reply-To: elo@sdf.org
To: gnats-bugs@NetBSD.org
Subject: sed(1) conditional branch command confuses subsequent line addressing
X-Send-Pr-Version: 3.95
>Number: 59803
>Category: bin
>Synopsis: sed(1) conditional branch command confuses subsequent line addressing
>Confidential: no
>Severity: non-critical
>Priority: low
>Responsible: bin-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sat Nov 29 05:05:00 +0000 2025
>Last-Modified: Mon Dec 01 03:40:01 +0000 2025
>Originator: elo
>Release: NetBSD 11.0_BETA
>Organization:
>Environment:
System: NetBSD marmite.localnet 11.0_BETA NetBSD 11.0_BETA (BROADBEAN) #5: Mon Oct 27 07:39:36 GMT 2025 elo@marmite.localnet:/usr/obj/sys/arch/amd64/compile/BROADBEAN amd64
Architecture: x86_64
Machine: amd64
$ ident /usr/bin/sed
/usr/bin/sed:
$NetBSD: crt0.S,v 1.4 2018/11/26 17:37:46 joerg Exp $
$NetBSD: crt0-common.c,v 1.30 2025/05/02 23:04:06 riastradh Exp $
$NetBSD: crti.S,v 1.1 2010/08/07 18:01:35 joerg Exp $
$NetBSD: crtbegin.S,v 1.2 2010/11/30 18:37:59 joerg Exp $
$NetBSD: compile.c,v 1.55 2025/06/03 19:02:29 martin Exp $
$NetBSD: main.c,v 1.38 2021/03/11 15:45:55 christos Exp $
$NetBSD: misc.c,v 1.15 2014/06/26 02:14:32 christos Exp $
$NetBSD: process.c,v 1.54 2024/09/17 13:34:08 kre Exp $
$NetBSD: crtend.S,v 1.1 2010/08/07 18:01:34 joerg Exp $
$NetBSD: crtn.S,v 1.1 2010/08/07 18:01:35 joerg Exp $
>Description:
In a sed(1) script, using the 't' command to branch when a
substitution has been made causes a subsequent line of the script,
qualified with an explicit address range, to be skipped for all
remaining input if the line number of the source text in which the
substitution was made is the same as the starting address of the
explicit address range. Fortunately, it's much easier to demonstrate
than it is to describe.
>How-To-Repeat:
$ cat sedbug.sed
#!/usr/bin/sed -nf
=
s,a,substitution succeeded,p
t
3,$ p
# end of script
First, exercise the script with no substitutions:
$ echo 'b\nb\nb\nb\nb' | ./sedbug.sed
1
2
3
b
4
b
5
b
The '=' command prints the input-line number, and the input line
itself is printed for lines 3 and after.
Let's have a substitution on line 2:
$ echo 'b\na\nb\nb\nb' | ./sedbug.sed
1
2
substitution succeeded
3
b
4
b
5
b
The substitution succeeded, the script branched to the end and the
subsequent lines were read and printed, as before.
Now let's move the substitution to line 3:
$ echo 'b\nb\na\nb\nb' | ./sedbug.sed
1
2
3
substitution succeeded
4
5
By my reckoning, lines 4 and 5 ought still to have been printed,
but they were not.
Minised (http://exactcode.com/opensource/minised/) demonstrates
exactly the same perplexing behaviour as NetBSD's native sed. It
was only when I tried GNU sed and found it to act as I expected
that I realised I might have stumbled upon a genuine anomaly (in
two implementations of sed, no less--do NetBSD sed and minised
have shared ancestry?).
>Fix:
>Audit-Trail:
From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: bin/59803: sed(1) conditional branch command confuses subsequent line addressing
Date: Sat, 29 Nov 2025 08:56:40 -0000 (UTC)
elo@sdf.org writes:
> In a sed(1) script, using the 't' command to branch when a
> substitution has been made causes a subsequent line of the script,
> qualified with an explicit address range, to be skipped for all
> remaining input if the line number of the source text in which the
> substitution was made is the same as the starting address of the
> explicit address range. Fortunately, it's much easier to demonstrate
> than it is to describe.
I can confirm that behaviour.
The issue is that the address range '3,$' doesn't match line numbers
between 3 and end-of-file. Instead the match starts when seeing line 3
and ends when seeing end-of-file.
Since line 3 is skipped by the 't' command, the address range never
matches.
No idea if that is right or wrong, but it behaves exactly like an
address match that uses a regular expression instead of a line number.
If the matching line for the start condition is skipped, the range
will never apply.
> Minised (http://exactcode.com/opensource/minised/) demonstrates
> exactly the same perplexing behaviour as NetBSD's native sed.
Minised is a bit different in what it regards as "reading an input line"
(the didsub flag is the result of the last substitute command and is
not cleared by input). But it interprets ranges the same way. A line
is only considered "in range" once the start address (line number or
pattern) has been matched.
From: elo@sdf.org
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: bin/59803: sed(1) conditional branch command confuses subsequent line addressing
Date: Sun, 30 Nov 2025 06:42:38 GMT
Hello, Michael.
> I can confirm that behaviour.
>
> The issue is that the address range '3,$' doesn't match line numbers
> between 3 and end-of-file. Instead the match starts when seeing line 3
> and ends when seeing end-of-file.
>
> Since line 3 is skipped by the 't' command, the address range never
> matches.
>
> No idea if that is right or wrong, but it behaves exactly like an
> address match that uses a regular expression instead of a line number.
> If the matching line for the start condition is skipped, the range
> will never apply.
Thank you for replying to my PR so promptly. I have the uncomfortable
feeling I'm being a bit thick here. I don't know when an address
range is pre-compiled/compiled/interpreted (delete as appropriate) by
sed--whether before any input is read, or the first time the script
line qualified by the address range is encountered whilst reading
input, or every time the script line is encountered (I really should
take the trouble to look at the sed source at some point)--but in none
of those scenarios is it yet clear to me in the abstract why or how
the address range should be rendered ineffectual when the text in the
pattern space in a given cycle has not been substituted, and the 't'
command consequently has not altered the program flow.
The sed man page is seemingly unequivocal, at least in this passage:
Normally, sed cyclically copies a line of input, not including
its terminating newline character, into a pattern space, [...],
applies all of the commands with addresses that select that
pattern space, copies the pattern space to the standard output,
appending a newline, and deletes the pattern space.
Perhaps 'Normally' is bearing a heavier load than I at first realised.
In any case, if you're content that sed's behaviour is satisfactory as
it stands, that's all I need for now. If you haven't done already, I'm
happy for you to close this PR. Thanks again for taking the trouble to
explain things to me.
Cheers,
elo
From: Valery Ushakov <uwe@stderr.spb.ru>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: bin/59803: sed(1) conditional branch command confuses subsequent line addressing
Date: Sun, 30 Nov 2025 12:42:52 +0300
On Sun, Nov 30, 2025 at 06:45:02 +0000, elo@sdf.org via gnats wrote:
> of those scenarios is it yet clear to me in the abstract why or how
> the address range should be rendered ineffectual when the text in the
> pattern space in a given cycle has not been substituted, and the 't'
> command consequently has not altered the program flow.
A bit further down in the man page:
A command line with two addresses selects an inclusive range. This
range starts with the first pattern space that matches the first
address. [...]
The pattern space with the third line in it never matched the first
address of 3,$ range b/c "t" command caused that bit of the sed script
to be skipped over. That seems like a pretty straightforward
procedural semantic.
The interpretation you seem to expect (declarative in spirit, not
procedural) makes address ranges (two addresses) behave radically
differently from single addresses, if I understand correctly. AFAICT you
expect all range toggles to always get checked regardless of the
program flow, but the action part to only get executed according to
the program flow. Just thinking about that makes my head hurt :)
-uwe
From: Valery Ushakov <uwe@stderr.spb.ru>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: bin/59803: sed(1) conditional branch command confuses subsequent line addressing
Date: Sun, 30 Nov 2025 13:13:34 +0300
On Sun, Nov 30, 2025 at 12:42:52 +0300, Valery Ushakov wrote:
> A bit further down in the man page:
>
> A command line with two addresses selects an inclusive range. This
> range starts with the first pattern space that matches the first
> address. [...]
>
> The pattern space with the third line in it never matched the first
> address of 3,$ range b/c "t" command caused that bit of the sed script
> to be skipped over. That seems like a pretty straightforward
> procedural semantic.
>
> The interpretation you seem to expect (declarative in spirit, not
> procedural) makes address ranges (two addresses) behave radically
> differently from single addresses, if I understand correctly. AFAICT you
> expect all range toggles to always get checked regardless of the
> program flow, but the action part to only get executed according to
> the program flow. Just thinking about that makes my head hurt :)
Consider the "same" program in awk
$ cat t.awk
{ print NR }
/a/ { sub(/a/, "substitution succeeded"); print; next }
(NR == 3), 0 { print }
$ printf '%s\n' b a b b | gawk -f t.awk
1
2
substitution succeeded
3
b
4
b
$ printf '%s\n' b b a b b | gawk -f t.awk
1
2
3
substitution succeeded
4
5
-uwe
From: RVP <rvp@SDF.ORG>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: bin/59803: sed(1) conditional branch command confuses subsequent line addressing
Date: Sun, 30 Nov 2025 13:26:41 +0000 (UTC)
On Sat, 29 Nov 2025, elo@sdf.org wrote:
> Now let's move the substitution to line 3:
> $ echo 'b\nb\na\nb\nb' | ./sedbug.sed
> 1
> 2
> 3
> substitution succeeded
> 4
> 5
>
> By my reckoning, lines 4 and 5 ought still to have been printed,
> but they were not.
>
Yes, it seems that way to me too. This affects `b' too:
```
$ printf '%s\n' one two three four five six | sed -n $'=\n3b\n3,5p'
1
2
3
4
5
6
$
```
> Fix:
>
```
diff -urN sed.orig/process.c sed/process.c
--- sed.orig/process.c 2024-09-18 04:04:51.056653673 +0000
+++ sed/process.c 2025-11-30 12:47:21.934353983 +0000
@@ -331,7 +331,9 @@
} else
r = 1;
}
- } else if (cp->a1 && MATCH(cp->a1)) {
+ } else if (cp->a1 && (MATCH(cp->a1) ||
+ (cp->a1->type == AT_LINE &&
+ linenum >= cp->a1->u.l && linenum <= cp->a2->u.l))) {
/*
* If the second address is a number less than or
* equal to the line number first selected, only
```
All ATF tests pass except the overlapping-ranges test (which will need
adjustment because sed(1) now behaves like GNU sed).
-RVP
From: Robert Elz <kre@munnari.OZ.AU>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: bin/59803: sed(1) conditional branch command confuses subsequent line addressing
Date: Mon, 01 Dec 2025 03:50:15 +0700
Date: Sun, 30 Nov 2025 13:30:01 +0000 (UTC)
From: "RVP via gnats" <gnats-admin@NetBSD.org>
Message-ID: <20251130133001.A2FE91A923A@mollari.NetBSD.org>
| > By my reckoning, lines 4 and 5 ought still to have been printed,
| > but they were not.
| >
|
| Yes, it seems that way to me too. This affects `b' too:
No, Michael (mlelstv@) and Valery (uwe@) are right: if the first address
in a dual address command doesn't actually match the pattern space, then
the command following a dual address is not executed.
POSIX is perhaps slightly clearer than the man page uwe@ quoted.
An editing command with two addresses shall select the
inclusive range from the first pattern space that matches
the first address through the next pattern space that matches
the second.
That is fairly clear, the pattern space must actually match the
first address for the range to be started - it doesn't matter why
it doesn't match, if it doesn't actually happen, it doesn't.
It is easier to visualise this if you use actual matching patterns,
rather than line numbers, that is, if the input were
1b
2b
3a
4b
5b
and the sed script were
=
s,a,substitution succeeded,p
t
/^3/,$p
which is (aside from the lines printed having an extra leading
character) the same thing, approximately. But with this, if
sed never actually matches the leading '3' in the pattern space
how can it tell which lines that 'p' (the final one) is intended
to apply to? The (successful) 't' means that dual address
command is never executed in the line that contains the leading
'3' (as that's the line that also contains the 'a'), so there's
no way sed can know that the range has started, and if it doesn't
know that, it hasn't.
The proposed change makes numeric addresses behave differently than
patterns, but there is no justification anywhere for that distinction.
It might seem odd, but it really isn't.
kre
From: RVP <rvp@SDF.ORG>
To: gnats-bugs@netbsd.org
Cc: Robert Elz <kre@netbsd.org>
Subject: Re: bin/59803: sed(1) conditional branch command confuses subsequent line addressing
Date: Mon, 1 Dec 2025 00:10:12 +0000 (UTC)
On Sun, 30 Nov 2025, Robert Elz via gnats wrote:
> POSIX is perhaps slightly clearer than the man page uwe@ quoted.
>
> An editing command with two addresses shall select the
> inclusive range from the first pattern space that matches
> the first address through the next pattern space that matches
> the second.
>
> That is fairly clear, the pattern space must actually match the
> first address for the range to be started - it doesn't matter why
> it doesn't match, if it doesn't actually happen, it doesn't.
>
Right, ...
> It is easier to visualise this if you use actual matching patterns,
> rather than line numbers, that is, if the input were
>
> 1b
> 2b
> 3a
> 4b
> 5b
>
> and the sed script were
>
> =
> s,a,substitution succeeded,p
> t
>
> /^3/,$p
>
> which is (aside from the lines printed having an extra leading
> character) the same thing, approximately. But with this, if
> sed never actually matches the leading '3' in the pattern space
> how can it tell which lines that 'p' (the final one) is intended
> to apply to? The (successful) 't' means that dual address
> command is never executed in the line that contains the leading
> '3' (as that's the line that also contains the 'a'), so there's
> no way sed can know that the range has started, and if it doesn't
> know that, it hasn't.
>
but, there's a major difference between these two: even in a stream, sed can
_always_ retrieve the current line number (even with `sed -i '' file.txt').
Whereas of course, when the pattern is gone, it's gone and you can't determine
the applicable range. So, with line numbers, you can act on the rest of the
range.
> The proposed change makes numeric addresses behave differently than
> patterns, but there is no justification anywhere for that distinction.
>
Yes, deliberately so, for the reasons above :) Anyway, either way is fine
with me (I have a preference for the GNU behaviour, though). I tried all this
out (along with: printf '%s\n' a b c d e f | sed '1,3d; 3,5d') on a few
OSes just now:
FreeBSD-14.3, OpenBSD-7.7 and OpenIndiana (Hipster 2025.10) behave like NetBSD
(all these seem to have the same parentage). 9front 2025/10/11 (Plan 9) also
acts the same way, even though the code-base is very different.
Tribblix has GNU sed. Busybox sed follows the GNU sed behaviour.
-RVP
From: Valery Ushakov <uwe@stderr.spb.ru>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: bin/59803: sed(1) conditional branch command confuses subsequent line addressing
Date: Mon, 1 Dec 2025 04:12:37 +0300
On Mon, Dec 01, 2025 at 00:15:01 +0000, RVP via gnats wrote:
> but, there's a major difference between these two: even in a
> stream, sed can _always_ retrieve the current line number (even
> with `sed -i '' file.txt'). Whereas of course, when the pattern is
> gone, it's gone and you can't determine the applicable range. So,
> with line numbers, you can act on the rest of the range.
But why would you want to make them behave differently?! It goes _so_
against the POLA... Also see the awk example. In awk the two tests
in the toggle pattern are just expressions, line numbers are not
special in any way (they are just tests on NR). Breaking the sed/awk
symmetry seems like a bad idea.
-uwe
From: Robert Elz <kre@munnari.OZ.AU>
To: RVP <rvp@SDF.ORG>
Cc: gnats-bugs@netbsd.org
Subject: Re: bin/59803: sed(1) conditional branch command confuses subsequent line addressing
Date: Mon, 01 Dec 2025 09:17:38 +0700
Date: Mon, 1 Dec 2025 00:10:12 +0000 (UTC)
From: RVP <rvp@SDF.ORG>
Message-ID: <71936a11-411a-9ac3-c4cf-e5aef68f3393@SDF.ORG>
| but, there's a major difference between these two: even in a stream,
| sed can _always_ retrieve the current line number
Sure, but that's not the point. The point is what a dual address
command means, and it isn't the same as what you're imagining.
It is easy to be seduced when the command that is to be executed is
something simple, like 'p', 'd' or 's', but it isn't always.
Consider a script that accumulates extracts from a range of input
lines in the hold space: when the first line of the range is encountered, things
are initialised (the hold space is cleared, or whatever is needed),
and when the final line is encountered, the hold space is used in
whatever fashion is intended.
If you never actually encounter the first line, the init is going to
be skipped, and if the commands are executed on following lines, what
will result will be a mess.
The same applies to the end line of the range - if that one isn't
encountered, the commands simply keep on being applied - nothing has
caused them to stop. I suspect that your two line patch didn't handle
that case, I also suspect that handling it would be a little more complex.
But if in the OP's example the "3,$" were instead "3,5" (with the input
containing more than 5 lines), and it happened to be that line 5 was the
one where the substitute occurred, and that 't' causes the dual address
command to be skipped, then that range remains active
and will apply to lines 6 7 8 ... continuing until line 5 is actually processed
(which is unlikely in that scenario!)
It isn't generally difficult to write sed scripts that handle all this kind
of thing properly (which often means not using explicit line numbers, other
than perhaps 1 and $) provided that one understands how sed is defined to
work - and dual address commands are not defined to be "any line that happens
to be at or after the first address and at or before the second address",
they are "start (only) when the first address is found", and "stop when the
second address is found (and only then)".
Don't fall into the trap of "it just seems obvious that it should ..." and
change the behaviour of commands without a careful analysis of why they
are the way they are.
kre
From: elo@sdf.org
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: bin/59803: sed(1) conditional branch command confuses subsequent line addressing
Date: Mon, 01 Dec 2025 02:26:05 +0000
Hello, all.
Thank you very much, uwe@ and kre@, for taking the time and trouble,
as mlelstv@ did earlier, to try to help me understand why sed acts as
it does. Unfortunately, I'm no further along. I've undoubtedly made a
hash of articulating the problem, so I hope you'll forgive my having
one more, self-indulgent go.
My persistent confusion about sed's expected behaviour lies in the
implication that it keeps state (other than the contents of the
pattern and hold spaces) from one cycle to the next. Returning to my
example,
#!/usr/bin/sed -nf
=
s,a,substitution succeeded,p
t
3,$ p
when the third line of input is substituted, of course the address
range of the final 'p' command will have no effect; the entire line
is skipped by the preceding 't'. That behaviour is manifestly correct,
and is not what I meant to question. What I'm failing to grasp is
how it is that, when the fourth and subsequent lines of input are
read, and do not result in substitutions, and the branch is not taken,
the '3,$' address range is not then satisfied. How did the fact that a
substitution occurred in a previous cycle influence later cycles? The
nature of that mechanism is what continues to elude me.
Cheers,
elo
PS. My thanks also to RVP for apprehending why it is I'm finding this
so perplexing. It's much appreciated.
From: elo@sdf.org
To: gnats-bugs@netbsd.org, "Robert Elz via gnats" <gnats-admin@NetBSD.org>
Cc:
Subject: Re: bin/59803: sed(1) conditional branch command confuses subsequent line addressing
Date: Mon, 01 Dec 2025 02:42:22 +0000
Hello, Robert.
> Date: Mon, 01 Dec 2025 02:20:02 +0000
> From: Robert Elz via gnats <gnats-admin@NetBSD.org>
>
> [...] dual address commands are not defined to be "any line that happens
> to be at or after the first address and at or before the second address",
> they are "start (only) when the first address is found", and "stop when the
> second address is found (and only then)".
Ah! So there _is_ state! That is the crucial detail I've been missing.
Thank you very much indeed for stating it so clearly, and my apologies
to everyone for making such a nuisance of myself.
Cheers,
elo
From: Robert Elz <kre@munnari.OZ.AU>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: bin/59803: sed(1) conditional branch command confuses subsequent line addressing
Date: Mon, 01 Dec 2025 10:34:26 +0700
Date: Mon, 1 Dec 2025 02:30:02 +0000 (UTC)
From: "elo@sdf.org via gnats" <gnats-admin@NetBSD.org>
Message-ID: <20251201023002.138681A923A@mollari.NetBSD.org>
| My persistent confusion about sed's expected behaviour lies in the
| implication that it keeps state
It does.
| when the third line of input is substituted, of course the address
| range of the final 'p' command will have no effect; the entire line
| is skipped by the preceding 't'. That behaviour is manifestly correct,
| and is not what I meant to question.
We're aware of that.
| What I'm failing to grasp is
| how it is that, when the fourth and subsequent lines of input are
| read, and do not result in substitutions, and the branch is not taken,
| the '3,$' address range is not then satisfied.
Because address ranges don't work that way. As you suspected above,
they have state, they are turned on when the first address is encountered,
and turned off again when the second address is encountered. There is
no actual concept of "from here to there", just start when you see this,
and stop when you see that. (Then go back to looking for the start to
happen again.)
That's necessary, to make things work in the general case when the
addresses are patterns, not line numbers - line numbers are just a
degenerate case, which can be thought of as matching a counter of input
lines against the specified address: it either matches, or it does not.
| How did the fact that a
| substitution occurred in a previous cycle influence later cycles? The
| nature of that mechanism is what continues to elude me.
The "this dual address command has been enabled" switch was never turned on.
That only happens when the first (of the two) address matches (and it
only turns off when the second does.)
kre
Copyright © 1994-2025
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.