NetBSD Problem Report #57773
From jarle@festningen.kontor.norid.no Thu Dec 14 17:36:15 2023
Return-Path: <jarle@festningen.kontor.norid.no>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id 6231E1A9238
for <gnats-bugs@gnats.NetBSD.org>; Thu, 14 Dec 2023 17:36:15 +0000 (UTC)
Message-Id: <20231214173603.4EC4F17026E9@festningen.kontor.norid.no>
Date: Thu, 14 Dec 2023 18:36:03 +0100 (CET)
From: jarle.greipsland@norid.no
Reply-To: jarle.greipsland@norid.no
To: gnats-bugs@NetBSD.org
Subject: /bin/sh: Substring processing in assignments fail for quoted control characters
X-Send-Pr-Version: 3.95
>Number: 57773
>Category: bin
>Synopsis: /bin/sh: Substring processing in assignments fail for quoted control characters
>Confidential: no
>Severity: non-critical
>Priority: low
>Responsible: kre
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Thu Dec 14 17:40:00 +0000 2023
>Closed-Date: Sun Jan 14 16:47:06 +0000 2024
>Last-Modified: Sun Jan 14 16:47:06 +0000 2024
>Originator: Jarle Fredrik Greipsland
>Release: NetBSD 10.0_BETA
>Organization:
Vennlig hilsen
Jarle Fredrik Greipsland
>Environment:
System: NetBSD singsaker.uninett.no 10.0_BETA NetBSD 10.0_BETA (GENERIC) #1: Sun Sep 17 11:44:36 CEST 2023 jarle@singsaker.uninett.no:/usr/obj/sys/arch/amd64/compile/GENERIC amd64
Architecture: x86_64
Machine: amd64
>Description:
If the variable c holds a single character that has a byte value
in the range 129-139, the following parameter expansion with substring
processing produces unexpected results:
v="X'${c}'" -> (X'\202')
v=${v%"'${c}'"} -> (X'\202') where the expected result is (X)
On the other hand, the same parameter expansion work as expected if it
is not part of an assignment, i.e.
[ ${v%"'${c}'"} = X ] && echo Expected result.
will print "Expected result."
>How-To-Repeat:
Run the shell snippet below.
--------------------------------------------------
#!/bin/sh
export LC_ALL=C
for n in `jot 255 1`; do
c=$(printf "$(printf '\\%o' $n)")
v="X'$c'"
v=${v%"'$c'"}
if [ "$v" != X ]; then
printf "Unexpected result for %d: got (%s), expected (%s)\n" $n "$v" "X"
fi
done | vis -o
--------------------------------------------------
When run with /bin/sh on NetBSD, all characters outside the range
129-139 produce the expected results. When run with /bin/ksh and
/usr/pkg/bin/bash, all characters produce the expected results.
>Fix:
>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: bin-bug-people->kre
Responsible-Changed-By: kre@NetBSD.org
Responsible-Changed-When: Fri, 15 Dec 2023 00:16:01 +0000
Responsible-Changed-Why:
I am looking into this PR
State-Changed-From-To: open->analyzed
State-Changed-By: kre@NetBSD.org
State-Changed-When: Fri, 15 Dec 2023 03:34:16 +0000
State-Changed-Why:
I know what is happening, and why (that it fails with \202, and possibly
a few nearby octet values, is no coincidence (those are used internally
in strings). I even know why it makes a difference whether the expansion
is in an assignment or elsewhere. The fix seems to be very simple.
I am testing it now (this kind of seemingly trivial change to sh can have
ramifications that are hard to imagine, so lots of testing is required).
The (current) sh ATF tests all pass though.
From: Robert Elz <kre@munnari.OZ.AU>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: bin/57773 [/bin/sh: Substring processing in assignments fail for quoted control characters]
Date: Mon, 25 Dec 2023 11:43:35 +0700
OK, I am about to commit the (very trivial) fix for this issue.
But, unlike some of my previous commits, the plan this time is
not to include a 50 line explantion of what went wrong and why the
fix works in the commit message - but instead do that here, and
then simply refer to this PR there.
First, in all internal processing, sh needs to know what it is
dealing with, so all strings being temporaily used contain
markers that tell the shell what special processing to do
with the string in question - those are (currently) the
bytes 0201..0213 (129..139) incl - the ones identified in
the PR as having the problem.
In earlier versions of the shell values from 1 were used (there
are ancient PRs about problems with ^A ec - all long since fixed).
Any byte values should work for this purpose, as one of these
CTL chars (sh terminology) means "treat the next byte literally"
- just like \ in many places (incl sh input).
The current values are used as they are less likely to occur in
normal char strings for anyone using UTF-8 encoding. That is,
less escaping is typically needed.
Normally whenever sh processes a string (like reading a command
line, or expanding a variable) it inserts that CTL char (CTLESC)
in the result before any byte value which would be interpreted
as one of those CTL chars, and then removes it again when all
processing is done.
But sometimes when we "know" that there is no more processing
going to happen, and if we inserted CTLESC all over the place,
the only thing those would ever be good for is to be removed
again, we just don't bother - and leave the string un-escaped.
(Slighty simplified for this explanation).
sh variable assignments, which are all always a single sh
"word" of the form:
name=value
are one such place, as once "value" is expanded nothing is done
with the expansion, other than what posix calls "quote removal"
- removing any quote chars that were in the original command.
In sh, those are long gone, replaced by CTL chars that achieve
the same effect.
Var assigns do no field splitting, no pathname expanion,
nothing after the var expansions happen - except quote removal,
which in sh is the removal of any CTL chars (which should only
be CTLESC by that point) that remain. Hence adding extra
CTLESC chars seemed unnecessary, and generally is.
[That's why quotes aren't needed around simple var=${other} type
assignments or var=$(whatever) -- only when we have something like
var="x y|x" with the value containing (as written) spaces or sh
operator (or quote) chars are the quotes ever needed.]
Thus in the v="X'${c}'" statement, even when c had the same
value as a CTL code, there was no need for a CTLESC to be inserted
before the value of ${c} even when that looked like a CTL char,
and it all just worked.
But in the assignment containing the substring assignment,
there was a problem ... there the word after the % in the
substring expansion is expanded the same as it always is,
meaning a CTLESC is inserted before any bytes that could be
interpreted as CTL codes. Threin lies the rub... The pattern
match was looking for a value which contained a CTLESC char (to
protect the 'magic' byte value that followd) in a string which
didn't contain that value. No hope of success.
In the expansion as a regular command arg (as in the [ (test)
example given) this isn't an issue, as the initial var expand
also inserts a CTLESC so the matching simply has strings which
seem to it to be 1 byte longer (in the examples given) than was
given -- none of the CTL codes is anything but a simple byte to
pattern matching, so that all just works.
One possible fix to this would have been to make the pattern
matching code understand the special meaning of CTLESC, and
ignore it whenever seen - but that code is ugly enough as it
is, brittle enough, and would have required many updates, that
I didn't want to go there.
An alternate fix is just to treat var assigns more like anything
else when doing expansions - causing CTLESC bytes to be inserted
wherever the look needed, just as is done with other expansions.
That is what I have done ... the fix is just to add one more
flag bit to the call of argstr() when expanding the value of
a variable asignment. Hence trivial (but can add a minor extra
processing cost in these unusual cases).
I have been testing this since the day after the PR was submitted
-- all the NetBSD ATF tests pass (or rather those that failed had
nothing to do with anything sh did) , and I have used this sh to
build about 1000 packages from pkgsrc (including several of the big
ones) without issue.
I could do a system build using it - and eventually will - but
build.sh is actually very gentle as far as sh usage goes, and
is very unlikely to be affected (or prove anything at all).
Thus I think this PR is fixed. Thanks for finding the problem
and telling us about it.
I still need to look at whether a similar problem might occur when
expanding the arg (usually filename) following a redirect operator,
which is a somewhat similar case, I will get to that soon.
kre
ps: Merry Christmas to all who celebrate that.
[This copy resent without the "cc", as apparently I canot
spell netbsd ...]
From: "Robert Elz" <kre@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/57773 CVS commit: src/bin/sh
Date: Mon, 25 Dec 2023 04:52:38 +0000
Module Name: src
Committed By: kre
Date: Mon Dec 25 04:52:38 UTC 2023
Modified Files:
src/bin/sh: eval.c
Log Message:
PR bin/57773
Fix a bug reported by Jarle Fredrik Greipsland in PR bin/57773,
where a substring expansion where the substring to be removed from
a variable expansion is itself a var expansion where the value
contains one (or more) of sh's CTLxxx chars - the pattern had
CTLESC inserted, the string to be matched against did not. Fail.
We fix that by always inserting CTLESC in var assign expansions.
See the PR for all the gory details.
Thanks for the PR.
XXX pullup to everything.
To generate a diff of this commit:
cvs rdiff -u -r1.190 -r1.191 src/bin/sh/eval.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: analyzed->needs-pullups
State-Changed-By: kre@NetBSD.org
State-Changed-When: Mon, 25 Dec 2023 05:00:40 +0000
State-Changed-Why:
Pullups to -8 -9 and -10 required. Those will happen, but not
until after this has been in current for at least a week (that is,
sometime in the early new year).
From: Robert Elz <kre@munnari.OZ.AU>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: bin/57773 [/bin/sh: Substring processing in assignments fail for quoted control characters]
Date: Mon, 25 Dec 2023 23:21:25 +0700
I said earlier (and I still apparently cannot spell netbsd - so this
is again a repeat, just for gnats):
| I still need to look at whether a similar problem might occur when
| expanding the arg (usually filename) following a redirect operator,
I have done that, first by code examination and then testing, and
there is no problem there.
kre
From: Jarle Greipsland <jarle.greipsland@norid.no>
To: gnats-bugs@netbsd.org, kre@netbsd.org
Cc:
Subject: Re: PR/57773 CVS commit: src/bin/sh
Date: Wed, 27 Dec 2023 23:44:07 +0100 (CET)
"Robert Elz" <kre@netbsd.org> writes:
> Fix a bug reported by Jarle Fredrik Greipsland in PR bin/57773,
> where a substring expansion where the substring to be removed from
> a variable expansion is itself a var expansion where the value
> contains one (or more) of sh's CTLxxx chars - the pattern had
> CTLESC inserted, the string to be matched against did not. Fail.
> We fix that by always inserting CTLESC in var assign expansions.
> See the PR for all the gory details.
Excellent!
There are however still some gremlins in the code wrt. to the
escape characters. The string lengths are not always calculated
correctly.
------------------------------------------------------------
#!/bin/sh
export LC_ALL=C
f() {
printf 'f->$1 Length of %s: %d\n' "$1" ${#1}
}
{
c=$(printf '\202')
printf '$c Length of %s: %d\n' "$c" ${#c}
f "$c"
set -- "$c"
printf '$1 Length of %s: %d\n' "$1" ${#1}
} | vis -o
exit 0
----------------------------------------------------------------------
The above code will print lengths as 1, 2 and 2. And they ought
to be all one.
-jarle
--
"We do this not because it is easy, but because we thought it would be easy."
From: Robert Elz <kre@munnari.OZ.AU>
To: Jarle Greipsland <jarle.greipsland@norid.no>
Cc: gnats-bugs@netbsd.org
Subject: Re: PR/57773 CVS commit: src/bin/sh
Date: Thu, 28 Dec 2023 06:36:34 +0700
Date: Wed, 27 Dec 2023 23:44:07 +0100 (CET)
From: Jarle Greipsland <jarle.greipsland@norid.no>
Message-ID: <20231227.234407.2096023211607554905.jarle@norid.no>
| There are however still some gremlins in the code wrt. to the
| escape characters. The string lengths are not always calculated
| correctly.
OK, thanks, I will look into that one as well. It should be similarly simple
I think. It will delay a pullup of this into -10 (maybe into -9 as well,
I'll look see - but the sh in -8 isn't worth having this kind of esoteric
bug fixed, that one still has bigger problems - none of these affect any
common operation though).
kre
From: Robert Elz <kre@munnari.OZ.AU>
To: Jarle Greipsland <jarle.greipsland@norid.no>
Cc: gnats-bugs@netbsd.org
Subject: Re: PR/57773 CVS commit: src/bin/sh
Date: Thu, 28 Dec 2023 12:47:23 +0700
Date: Wed, 27 Dec 2023 23:44:07 +0100 (CET)
From: Jarle Greipsland <jarle.greipsland@norid.no>
Message-ID: <20231227.234407.2096023211607554905.jarle@norid.no>
| There are however still some gremlins in the code wrt. to the
| escape characters. The string lengths are not always calculated
| correctly.
That turned out to be the exact opposite problem of the previous
one - the earlier one was failing to insert the CTLESC when it
turned out to be needed, this one is in inserting CTLESC chars
when they aren't wanted.
The ${#var} code is kind of special - it doesn't bother expanding
the value of var, it just counts the length, and inserts that.
That's kind of the obvious thing to do really. But for the
special parameters, that doesn't work, as most of them don't
have string values that can just be counted - they need to be
generated first. So, that gets done, and then the length of
the value produced is counted.
The issue here, is that the positional parameters are treated
as special params for this purpose ... they aren't stored in the
same way that variables are, so they cannot simply be treated
as vars, but they are all strings, so lengths could just be counted
without expanding them, as is done for vars. But that's not the
way the code has ever worked - it simply expands all of those,
specials and positional, and then determines the length of the
result of the expansion.
In the cases in the examples given, the ${#X} was always being
performed in a situation where CTSESC processing is nomally needed,
so that was being requested. The code that handles ${#...} was
turning off some of that - but not all of it, and in any case, the
flags were being ignored and CTLESC chars inserted ahead of any
CTL char anyway (all cases).
So two minor fixes will eventually happen when I am finished
testing (less this time, as it is clearer what is happening,
and that the fix is OK) - one to turn off all the "escaping needed"
flags in the ${#X} case, and one to make sure at least one of those
is on, before inserting CTLESC chars, always.
kre
From: "Robert Elz" <kre@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/57773 CVS commit: src/bin/sh
Date: Fri, 29 Dec 2023 15:49:24 +0000
Module Name: src
Committed By: kre
Date: Fri Dec 29 15:49:24 UTC 2023
Modified Files:
src/bin/sh: expand.c
Log Message:
PR bin/57773
Fix another bug reported by Jarle Fredrik Greipsland and added
to PR bin/57773, which relates to calculating the length of a
positional parameter which contains CTL chars -- yes, this one
really is that specific, though it would also affect the special
param $0 if it were to contain CTL chars, and its length was
requested - that is fixed with the same change. And note: $0
is not affected because it looks like a positional param (it
isn't, ${00} would be, but is always unset, ${0} isn't) all
special parame would be affected the same way, but the only one
that can ever contain a CTL char is $0 I believe. ($@ and $*
were affected, but just because they're expanding the positional
params ... ${#@} and ${#*} are both technically unspecified
expansions - and different shells produce different results.
See the PR for the details of this one (and the previous).
Thanks for the PR.
XXX pullup to everything.
To generate a diff of this commit:
cvs rdiff -u -r1.143 -r1.144 src/bin/sh/expand.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: needs-pullups->pending-pullups
State-Changed-By: kre@NetBSD.org
State-Changed-When: Wed, 03 Jan 2024 01:52:41 +0000
State-Changed-Why:
[pullup-10 #535]
[pullup-9 #1787]
There won't be a pullup to -8, the shell there is missing other changes which
would be needed, and its future lifespan isn't long enough to bother with
significant changes for issues like this (it has others, perhaps worse) which
in practice, no-one ever encounters.
From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/57773 CVS commit: [netbsd-10] src/bin/sh
Date: Sun, 14 Jan 2024 13:15:06 +0000
Module Name: src
Committed By: martin
Date: Sun Jan 14 13:15:05 UTC 2024
Modified Files:
src/bin/sh [netbsd-10]: eval.c expand.c
Log Message:
Pull up following revision(s) (requested by kre in ticket #535):
bin/sh/eval.c: revision 1.191
bin/sh/expand.c: revision 1.144
PR bin/57773
Fix a bug reported by Jarle Fredrik Greipsland in PR bin/57773,
where a substring expansion where the substring to be removed from
a variable expansion is itself a var expansion where the value
contains one (or more) of sh's CTLxxx chars - the pattern had
CTLESC inserted, the string to be matched against did not. Fail.
We fix that by always inserting CTLESC in var assign expansions.
See the PR for all the gory details.
Thanks for the PR.
PR bin/57773
Fix another bug reported by Jarle Fredrik Greipsland and added
to PR bin/57773, which relates to calculating the length of a
positional parameter which contains CTL chars -- yes, this one
really is that specific, though it would also affect the special
param $0 if it were to contain CTL chars, and its length was
requested - that is fixed with the same change. And note: $0
is not affected because it looks like a positional param (it
isn't, ${00} would be, but is always unset, ${0} isn't) all
special parame would be affected the same way, but the only one
that can ever contain a CTL char is $0 I believe. ($@ and $*
were affected, but just because they're expanding the positional
params ... ${#@} and ${#*} are both technically unspecified
expansions - and different shells produce different results.
See the PR for the details of this one (and the previous).
Thanks for the PR.
To generate a diff of this commit:
cvs rdiff -u -r1.188 -r1.188.2.1 src/bin/sh/eval.c
cvs rdiff -u -r1.141 -r1.141.2.1 src/bin/sh/expand.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/57773 CVS commit: [netbsd-9] src/bin/sh
Date: Sun, 14 Jan 2024 13:16:51 +0000
Module Name: src
Committed By: martin
Date: Sun Jan 14 13:16:51 UTC 2024
Modified Files:
src/bin/sh [netbsd-9]: eval.c expand.c
Log Message:
Pull up following revision(s) (requested by kre in ticket #1787):
bin/sh/eval.c: revision 1.191
bin/sh/expand.c: revision 1.144
PR bin/57773
Fix a bug reported by Jarle Fredrik Greipsland in PR bin/57773,
where a substring expansion where the substring to be removed from
a variable expansion is itself a var expansion where the value
contains one (or more) of sh's CTLxxx chars - the pattern had
CTLESC inserted, the string to be matched against did not. Fail.
We fix that by always inserting CTLESC in var assign expansions.
See the PR for all the gory details.
Thanks for the PR.
PR bin/57773
Fix another bug reported by Jarle Fredrik Greipsland and added
to PR bin/57773, which relates to calculating the length of a
positional parameter which contains CTL chars -- yes, this one
really is that specific, though it would also affect the special
param $0 if it were to contain CTL chars, and its length was
requested - that is fixed with the same change. And note: $0
is not affected because it looks like a positional param (it
isn't, ${00} would be, but is always unset, ${0} isn't) all
special parame would be affected the same way, but the only one
that can ever contain a CTL char is $0 I believe. ($@ and $*
were affected, but just because they're expanding the positional
params ... ${#@} and ${#*} are both technically unspecified
expansions - and different shells produce different results.
See the PR for the details of this one (and the previous).
Thanks for the PR.
To generate a diff of this commit:
cvs rdiff -u -r1.175.2.3 -r1.175.2.4 src/bin/sh/eval.c
cvs rdiff -u -r1.132.2.2 -r1.132.2.3 src/bin/sh/expand.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: pending-pullups->closed
State-Changed-By: riastradh@NetBSD.org
State-Changed-When: Sun, 14 Jan 2024 16:47:06 +0000
State-Changed-Why:
fixed and pulled up to 10 and 9, not worth it for 8
>Unformatted:
(Contact us)
$NetBSD: query-full-pr,v 1.47 2022/09/11 19:34:41 kim Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2024
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.