NetBSD Problem Report #57773

From jarle@festningen.kontor.norid.no  Thu Dec 14 17:36:15 2023
Return-Path: <jarle@festningen.kontor.norid.no>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 6231E1A9238
	for <gnats-bugs@gnats.NetBSD.org>; Thu, 14 Dec 2023 17:36:15 +0000 (UTC)
Message-Id: <20231214173603.4EC4F17026E9@festningen.kontor.norid.no>
Date: Thu, 14 Dec 2023 18:36:03 +0100 (CET)
From: jarle.greipsland@norid.no
Reply-To: jarle.greipsland@norid.no
To: gnats-bugs@NetBSD.org
Subject: /bin/sh: Substring processing in assignments fail for quoted control characters
X-Send-Pr-Version: 3.95

>Number:         57773
>Category:       bin
>Synopsis:       /bin/sh: Substring processing in assignments fail for quoted control characters
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    kre
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Dec 14 17:40:00 +0000 2023
>Closed-Date:    Sun Jan 14 16:47:06 +0000 2024
>Last-Modified:  Sun Jan 14 16:47:06 +0000 2024
>Originator:     Jarle Fredrik Greipsland
>Release:        NetBSD 10.0_BETA
>Organization:
Vennlig hilsen
Jarle Fredrik Greipsland
>Environment:


System: NetBSD singsaker.uninett.no 10.0_BETA NetBSD 10.0_BETA (GENERIC) #1: Sun Sep 17 11:44:36 CEST 2023  jarle@singsaker.uninett.no:/usr/obj/sys/arch/amd64/compile/GENERIC amd64
Architecture: x86_64
Machine: amd64
>Description:
If the variable c holds a single character that has a byte value
in the range 129-139, the following parameter expansion with substring
processing produces unexpected results:

  v="X'${c}'"       -> (X'\202')
  v=${v%"'${c}'"}   -> (X'\202') where the expected result is (X)

On the other hand, the same parameter expansion work as expected if it
is not part of an assignment, i.e.
  [ ${v%"'${c}'"} = X ] && echo Expected result.
will print "Expected result."

>How-To-Repeat:
Run the shell snippet below.
--------------------------------------------------
#!/bin/sh

export LC_ALL=C

for n in `jot 255 1`; do
    c=$(printf "$(printf '\\%o' $n)")
    v="X'$c'"
    v=${v%"'$c'"}
    if [ "$v" != X ]; then
	printf "Unexpected result for %d: got (%s), expected (%s)\n" $n "$v" "X"
    fi
done | vis -o

--------------------------------------------------

When run with /bin/sh on NetBSD, all characters outside the range
129-139 produce the expected results.  When run with /bin/ksh and
/usr/pkg/bin/bash, all characters produce the expected results.


>Fix:


>Release-Note:

>Audit-Trail:

Responsible-Changed-From-To: bin-bug-people->kre
Responsible-Changed-By: kre@NetBSD.org
Responsible-Changed-When: Fri, 15 Dec 2023 00:16:01 +0000
Responsible-Changed-Why:
I am looking into this PR


State-Changed-From-To: open->analyzed
State-Changed-By: kre@NetBSD.org
State-Changed-When: Fri, 15 Dec 2023 03:34:16 +0000
State-Changed-Why:
I know what is happening, and why (that it fails with \202, and possibly
a few nearby octet values, is no coincidence (those are used internally
in strings).   I even know why it makes a difference whether the expansion
is in an assignment or elsewhere.   The fix seems to be very simple.
I am testing it now (this kind of seemingly trivial change to sh can have
ramifications that are hard to imagine, so lots of testing is required).
The (current) sh ATF tests all pass though.


From: Robert Elz <kre@munnari.OZ.AU>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: bin/57773 [/bin/sh: Substring processing in assignments fail for quoted control characters]
Date: Mon, 25 Dec 2023 11:43:35 +0700

 OK, I am about to commit the (very trivial) fix for this issue.

 But, unlike some of my previous commits, the plan this time is
 not to include a 50 line explantion of what went wrong and why the
 fix works in the commit message - but instead do that here, and
 then simply refer to this PR there.

 First, in all internal processing, sh needs to know what it is
 dealing with, so all strings being temporaily used contain
 markers that tell the shell what special processing to do
 with the string in question - those are (currently) the
 bytes 0201..0213 (129..139) incl - the ones identified in
 the PR as having the problem.

 In earlier versions of the shell values from 1 were used (there
 are ancient PRs about problems with ^A ec - all long since fixed).
 Any byte values should work for this purpose, as one of these
 CTL chars (sh terminology) means "treat the next byte literally"
 - just like \ in many places (incl sh input).

 The current values are used as they are less likely to occur in
 normal char strings for anyone using UTF-8 encoding.   That is,
 less escaping is typically needed.

 Normally whenever sh processes a string (like reading a command
 line, or expanding a variable) it inserts that CTL char (CTLESC)
 in the result before any byte value which would be interpreted
 as one of those CTL chars, and then removes it again when all
 processing is done.

 But sometimes when we "know" that there is no more processing
 going to happen, and if we inserted CTLESC all over the place,
 the only thing those would ever be good for is to be removed
 again, we just don't bother - and leave the string un-escaped.
 (Slighty simplified for this explanation).

 sh variable assignments, which are all always a single sh
 "word" of the form:
 	name=value
 are one such place, as once "value" is expanded nothing is done
 with the expansion, other than what posix calls "quote removal" 
 - removing any quote chars that were in the original command.
 In sh, those are long gone, replaced by CTL chars that achieve
 the same effect.

 Var assigns do no field splitting, no pathname expanion,
 nothing after the var expansions happen - except quote removal,
 which in sh is the removal of any CTL chars (which should only
 be CTLESC by that point) that remain.   Hence adding extra
 CTLESC chars seemed unnecessary, and generally is.

 [That's why quotes aren't needed around simple var=${other} type
  assignments or var=$(whatever) -- only when we have something like
  var="x y|x" with the value containing (as written) spaces or sh
  operator (or quote) chars are the quotes ever needed.]

 Thus in the v="X'${c}'" statement, even when c had the same
 value as a CTL code, there was no need for a CTLESC to be inserted
 before the value of ${c} even when that looked like a CTL char,
 and it all just worked.

 But in the assignment containing the substring assignment,
 there was a problem ... there the word after the % in the
 substring expansion is expanded the same as it always is,
 meaning a CTLESC is inserted before any bytes that could be
 interpreted as CTL codes.   Threin lies the rub...  The pattern
 match was looking for a value which contained a CTLESC char (to
 protect the 'magic' byte value that followd) in a string which
 didn't contain that value.   No hope of success.

 In the expansion as a regular command arg (as in the [ (test)
 example given) this isn't an issue, as the initial var expand
 also inserts a CTLESC so the matching simply has strings which
 seem to it to be 1 byte longer (in the examples given) than was
 given -- none of the CTL codes is anything but a simple byte to
 pattern matching, so that all just works.

 One possible fix to this would have been to make the pattern
 matching code understand the special meaning of CTLESC, and
 ignore it whenever seen - but that code is ugly enough as it
 is, brittle enough, and would have required many updates, that
 I didn't want to go there.

 An alternate fix is just to treat var assigns more like anything
 else when doing expansions - causing CTLESC bytes to be inserted
 wherever the look needed, just as is done with other expansions.
 That is what I have done ... the fix is just to add one more
 flag bit to the call of argstr() when expanding the value of
 a variable asignment.   Hence trivial (but can add a minor extra
 processing cost in these unusual cases).

 I have been testing this since the day after the PR was submitted
 -- all the NetBSD ATF tests pass (or rather those that failed had
 nothing to do with anything sh did) , and I have used this sh to
 build about 1000 packages from pkgsrc (including several of the big
 ones) without issue.

 I could do a system build using it - and eventually will - but
 build.sh is actually very gentle as far as sh usage goes, and
 is very unlikely to be affected (or prove anything at all).

 Thus I think this PR is fixed.   Thanks for finding the problem
 and telling us about it.

 I still need to look at whether a similar problem might occur when
 expanding the arg (usually filename) following a redirect operator,
 which is a somewhat similar case, I will get to that soon.

 kre

 ps: Merry Christmas to all who celebrate that.

 [This copy resent without the "cc", as apparently I canot
 spell netbsd ...]

From: "Robert Elz" <kre@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/57773 CVS commit: src/bin/sh
Date: Mon, 25 Dec 2023 04:52:38 +0000

 Module Name:	src
 Committed By:	kre
 Date:		Mon Dec 25 04:52:38 UTC 2023

 Modified Files:
 	src/bin/sh: eval.c

 Log Message:
 PR bin/57773

 Fix a bug reported by Jarle Fredrik Greipsland in PR bin/57773,
 where a substring expansion where the substring to be removed from
 a variable expansion is itself a var expansion where the value
 contains one (or more) of sh's CTLxxx chars - the pattern had
 CTLESC inserted, the string to be matched against did not.  Fail.
 We fix that by always inserting CTLESC in var assign expansions.
 See the PR for all the gory details.

 Thanks for the PR.

 XXX pullup to everything.


 To generate a diff of this commit:
 cvs rdiff -u -r1.190 -r1.191 src/bin/sh/eval.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: analyzed->needs-pullups
State-Changed-By: kre@NetBSD.org
State-Changed-When: Mon, 25 Dec 2023 05:00:40 +0000
State-Changed-Why:
Pullups to -8 -9 and -10 required.   Those will happen, but not
until after this has been in current for at least a week (that is,
sometime in the early new year).


From: Robert Elz <kre@munnari.OZ.AU>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: bin/57773 [/bin/sh: Substring processing in assignments fail for quoted control characters]
Date: Mon, 25 Dec 2023 23:21:25 +0700

 I said earlier (and I still apparently cannot spell netbsd - so this
 is again a repeat, just for gnats):

   | I still need to look at whether a similar problem might occur when
   | expanding the arg (usually filename) following a redirect operator,

 I have done that, first by code examination and then testing, and
 there is no problem there.

 kre


From: Jarle Greipsland <jarle.greipsland@norid.no>
To: gnats-bugs@netbsd.org, kre@netbsd.org
Cc: 
Subject: Re: PR/57773 CVS commit: src/bin/sh
Date: Wed, 27 Dec 2023 23:44:07 +0100 (CET)

 "Robert Elz" <kre@netbsd.org> writes:
 >  Fix a bug reported by Jarle Fredrik Greipsland in PR bin/57773,
 >  where a substring expansion where the substring to be removed from
 >  a variable expansion is itself a var expansion where the value
 >  contains one (or more) of sh's CTLxxx chars - the pattern had
 >  CTLESC inserted, the string to be matched against did not.  Fail.
 >  We fix that by always inserting CTLESC in var assign expansions.
 >  See the PR for all the gory details.
 Excellent!

 There are however still some gremlins in the code wrt. to the
 escape characters.  The string lengths are not always calculated
 correctly.

 ------------------------------------------------------------
 #!/bin/sh

 export LC_ALL=C

 f() {
     printf 'f->$1 Length of %s: %d\n' "$1" ${#1}
 }

 {
     c=$(printf '\202')
     printf '$c Length of %s: %d\n' "$c" ${#c}

     f "$c"

     set -- "$c"
     printf '$1 Length of %s: %d\n' "$1" ${#1}
 } | vis -o

 exit 0
 ----------------------------------------------------------------------

 The above code will print lengths as 1, 2 and 2.  And they ought
 to be all one.
 					-jarle
 -- 
 "We do this not because it is easy, but because we thought it would be easy."

From: Robert Elz <kre@munnari.OZ.AU>
To: Jarle Greipsland <jarle.greipsland@norid.no>
Cc: gnats-bugs@netbsd.org
Subject: Re: PR/57773 CVS commit: src/bin/sh
Date: Thu, 28 Dec 2023 06:36:34 +0700

     Date:        Wed, 27 Dec 2023 23:44:07 +0100 (CET)
     From:        Jarle Greipsland <jarle.greipsland@norid.no>
     Message-ID:  <20231227.234407.2096023211607554905.jarle@norid.no>

   | There are however still some gremlins in the code wrt. to the
   | escape characters.  The string lengths are not always calculated
   | correctly.

 OK, thanks, I will look into that one as well.   It should be similarly simple
 I think.   It will delay a pullup of this into -10 (maybe into -9 as well,
 I'll look see - but the sh in -8 isn't worth having this kind of esoteric
 bug fixed, that one still has bigger problems - none of these affect any
 common operation though).

 kre

From: Robert Elz <kre@munnari.OZ.AU>
To: Jarle Greipsland <jarle.greipsland@norid.no>
Cc: gnats-bugs@netbsd.org
Subject: Re: PR/57773 CVS commit: src/bin/sh
Date: Thu, 28 Dec 2023 12:47:23 +0700

     Date:        Wed, 27 Dec 2023 23:44:07 +0100 (CET)
     From:        Jarle Greipsland <jarle.greipsland@norid.no>
     Message-ID:  <20231227.234407.2096023211607554905.jarle@norid.no>

   | There are however still some gremlins in the code wrt. to the
   | escape characters.  The string lengths are not always calculated
   | correctly.

 That turned out to be the exact opposite problem of the previous
 one - the earlier one was failing to insert the CTLESC when it
 turned out to be needed, this one is in inserting CTLESC chars
 when they aren't wanted.

 The ${#var} code is kind of special - it doesn't bother expanding
 the value of var, it just counts the length, and inserts that.
 That's kind of the obvious thing to do really.   But for the
 special parameters, that doesn't work, as most of them don't
 have string values that can just be counted - they need to be
 generated first.   So, that gets done, and then the length of
 the value produced is counted.

 The issue here, is that the positional parameters are treated
 as special params for this purpose ... they aren't stored in the
 same way that variables are, so they cannot simply be treated
 as vars, but they are all strings, so lengths could just be counted
 without expanding them, as is done for vars.   But that's not the
 way the code has ever worked - it simply expands all of those,
 specials and positional, and then determines the length of the
 result of the expansion.

 In the cases in the examples given, the ${#X} was always being
 performed in a situation where CTSESC processing is nomally needed,
 so that was being requested.   The code that handles ${#...} was
 turning off some of that - but not all of it, and in any case, the
 flags were being ignored and CTLESC chars inserted ahead of any
 CTL char anyway (all cases).

 So two minor fixes will eventually happen when I am finished
 testing (less this time, as it is clearer what is happening,
 and that the fix is OK) - one to turn off all the "escaping needed"
 flags in the ${#X} case, and one to make sure at least one of those
 is on, before inserting CTLESC chars, always.

 kre

From: "Robert Elz" <kre@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/57773 CVS commit: src/bin/sh
Date: Fri, 29 Dec 2023 15:49:24 +0000

 Module Name:	src
 Committed By:	kre
 Date:		Fri Dec 29 15:49:24 UTC 2023

 Modified Files:
 	src/bin/sh: expand.c

 Log Message:
 PR bin/57773

 Fix another bug reported by Jarle Fredrik Greipsland and added
 to PR bin/57773, which relates to calculating the length of a
 positional parameter which contains CTL chars -- yes, this one
 really is that specific, though it would also affect the special
 param $0 if it were to contain CTL chars, and its length was
 requested - that is fixed with the same change.  And note: $0
 is not affected because it looks like a positional param (it
 isn't, ${00} would be, but is always unset, ${0} isn't) all
 special parame would be affected the same way, but the only one
 that can ever contain a CTL char is $0 I believe.  ($@ and $*
 were affected, but just because they're expanding the positional
 params ... ${#@} and ${#*} are both technically unspecified
 expansions - and different shells produce different results.

 See the PR for the details of this one (and the previous).

 Thanks for the PR.

 XXX pullup to everything.


 To generate a diff of this commit:
 cvs rdiff -u -r1.143 -r1.144 src/bin/sh/expand.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: needs-pullups->pending-pullups
State-Changed-By: kre@NetBSD.org
State-Changed-When: Wed, 03 Jan 2024 01:52:41 +0000
State-Changed-Why:
[pullup-10 #535]
[pullup-9 #1787]

There won't be a pullup to -8, the shell there is missing other changes which
would be needed, and its future lifespan isn't long enough to bother with
significant changes for issues like this (it has others, perhaps worse) which
in practice, no-one ever encounters.


From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/57773 CVS commit: [netbsd-10] src/bin/sh
Date: Sun, 14 Jan 2024 13:15:06 +0000

 Module Name:	src
 Committed By:	martin
 Date:		Sun Jan 14 13:15:05 UTC 2024

 Modified Files:
 	src/bin/sh [netbsd-10]: eval.c expand.c

 Log Message:
 Pull up following revision(s) (requested by kre in ticket #535):

 	bin/sh/eval.c: revision 1.191
 	bin/sh/expand.c: revision 1.144

 PR bin/57773

 Fix a bug reported by Jarle Fredrik Greipsland in PR bin/57773,
 where a substring expansion where the substring to be removed from
 a variable expansion is itself a var expansion where the value
 contains one (or more) of sh's CTLxxx chars - the pattern had
 CTLESC inserted, the string to be matched against did not.  Fail.

 We fix that by always inserting CTLESC in var assign expansions.
 See the PR for all the gory details.

 Thanks for the PR.

 PR bin/57773

 Fix another bug reported by Jarle Fredrik Greipsland and added
 to PR bin/57773, which relates to calculating the length of a
 positional parameter which contains CTL chars -- yes, this one
 really is that specific, though it would also affect the special
 param $0 if it were to contain CTL chars, and its length was
 requested - that is fixed with the same change.  And note: $0
 is not affected because it looks like a positional param (it
 isn't, ${00} would be, but is always unset, ${0} isn't) all
 special parame would be affected the same way, but the only one
 that can ever contain a CTL char is $0 I believe.  ($@ and $*
 were affected, but just because they're expanding the positional
 params ... ${#@} and ${#*} are both technically unspecified
 expansions - and different shells produce different results.

 See the PR for the details of this one (and the previous).

 Thanks for the PR.


 To generate a diff of this commit:
 cvs rdiff -u -r1.188 -r1.188.2.1 src/bin/sh/eval.c
 cvs rdiff -u -r1.141 -r1.141.2.1 src/bin/sh/expand.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/57773 CVS commit: [netbsd-9] src/bin/sh
Date: Sun, 14 Jan 2024 13:16:51 +0000

 Module Name:	src
 Committed By:	martin
 Date:		Sun Jan 14 13:16:51 UTC 2024

 Modified Files:
 	src/bin/sh [netbsd-9]: eval.c expand.c

 Log Message:
 Pull up following revision(s) (requested by kre in ticket #1787):

 	bin/sh/eval.c: revision 1.191
 	bin/sh/expand.c: revision 1.144

 PR bin/57773

 Fix a bug reported by Jarle Fredrik Greipsland in PR bin/57773,
 where a substring expansion where the substring to be removed from
 a variable expansion is itself a var expansion where the value
 contains one (or more) of sh's CTLxxx chars - the pattern had
 CTLESC inserted, the string to be matched against did not.  Fail.

 We fix that by always inserting CTLESC in var assign expansions.
 See the PR for all the gory details.

 Thanks for the PR.

 PR bin/57773

 Fix another bug reported by Jarle Fredrik Greipsland and added
 to PR bin/57773, which relates to calculating the length of a
 positional parameter which contains CTL chars -- yes, this one
 really is that specific, though it would also affect the special
 param $0 if it were to contain CTL chars, and its length was
 requested - that is fixed with the same change.  And note: $0
 is not affected because it looks like a positional param (it
 isn't, ${00} would be, but is always unset, ${0} isn't) all
 special parame would be affected the same way, but the only one
 that can ever contain a CTL char is $0 I believe.  ($@ and $*
 were affected, but just because they're expanding the positional
 params ... ${#@} and ${#*} are both technically unspecified
 expansions - and different shells produce different results.

 See the PR for the details of this one (and the previous).

 Thanks for the PR.


 To generate a diff of this commit:
 cvs rdiff -u -r1.175.2.3 -r1.175.2.4 src/bin/sh/eval.c
 cvs rdiff -u -r1.132.2.2 -r1.132.2.3 src/bin/sh/expand.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: pending-pullups->closed
State-Changed-By: riastradh@NetBSD.org
State-Changed-When: Sun, 14 Jan 2024 16:47:06 +0000
State-Changed-Why:
fixed and pulled up to 10 and 9, not worth it for 8


>Unformatted:

NetBSD Home
NetBSD PR Database Search

(Contact us) $NetBSD: query-full-pr,v 1.47 2022/09/11 19:34:41 kim Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2024 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.