NetBSD Problem Report #58609

From www@netbsd.org  Fri Aug 16 17:25:00 2024
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	 key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256
	 client-signature RSA-PSS (2048 bits) client-digest SHA256)
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 77B6A1A9242
	for <gnats-bugs@gnats.NetBSD.org>; Fri, 16 Aug 2024 17:25:00 +0000 (UTC)
Message-Id: <20240816172459.2F8081A9243@mollari.NetBSD.org>
Date: Fri, 16 Aug 2024 17:24:59 +0000 (UTC)
From: campbell+netbsd@mumble.net
Reply-To: campbell+netbsd@mumble.net
To: gnats-bugs@NetBSD.org
Subject: sh(1) ignores interactive locale changes
X-Send-Pr-Version: www-1.0

>Number:         58609
>Category:       bin
>Synopsis:       sh(1) ignores interactive locale changes
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kre
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri Aug 16 17:30:01 +0000 2024
>Last-Modified:  Fri Aug 16 20:35:01 +0000 2024
>Originator:     Taylor R Campbell
>Release:        current, 10, 9
>Organization:
The NétBSD Foundation
>Environment:
>Description:
If you start sh(1) with LC_CTYPE=C.UTF-8, you can enter UTF-8 input and sh(1) will consume it.

If you start sh(1) with LC_CTYPE=C, when you enter UTF-8 input, sh(1) will ignore it, as one might expect.

But if you start sh(1) with LC_CTYPE=C, and you do `export LC_CTYPE=C.UTF-8', then when you enter UTF-8 input, sh(1) will still ignore it.
>How-To-Repeat:
$ LC_CTYPE=C.UTF-8 PS1='(C.UTF-8)$ ' sh
(C.UTF-8)$ echo £
£
(C.UTF-8)$ ^D
$ LC_CTYPE=C PS1='(C)$ ' sh
(C)$ echo                       # type £ -- nothing happens (expected)
(C)$ export LC_CTYPE=C.UTF-8
(C)$ locale
LANG=""
LC_CTYPE="C.UTF-8"
LC_COLLATE="C"
LC_TIME="C"
LC_NUMERIC="C"
LC_MONETARY="C"
LC_MESSAGES="C"
LC_ALL=""
(C)$ echo                       # type £ -- still nothing happens
>Fix:
Yes, please!

>Release-Note:

>Audit-Trail:
From: Robert Elz <kre@munnari.OZ.AU>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: bin/58609: sh(1) ignores interactive locale changes
Date: Sat, 17 Aug 2024 03:06:32 +0700

     Date:        Fri, 16 Aug 2024 17:30:01 +0000 (UTC)
     From:        campbell+netbsd@mumble.net
     Message-ID:  <20240816173001.A3B3B1A9244@mollari.NetBSD.org>


   | If you start sh(1) with LC_CTYPE=C, when you enter UTF-8 input,
   | sh(1) will ignore it, as one might expect.

 That makes no sense to me at all ... sh(1) really knows close to nothing
 about locales (though it should know a little more than it does) and does
 nothing (except some pattern matching) differently at all based upon what
 the locale is set to.   Characters are simply sequences of bytes, sh doesn't
 care what they represent, if you echo one of them (however many bytes there
 are) sh's echo will simply write them out as entered.

 Please try again after turning line editing off (set +VE) - if that makes
 a difference, then it is libedit you're having an issue with.  sh should
 not be "ignoring" whatever that means, anything input, except '\0'.

 The issue with libedit not seeing changes to environment variables made
 while the shell is running I do understand, that one isn't all that easy
 to fix in general, as libedit just uses getenv() to see what they're set
 to, and sh consumes that environment when it starts, then largely simply
 ignores it - variables are set in its internal data structs.
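
The disconnect described above can be sketched in C. This is only an illustration under stated assumptions: shell_setvar(), shell_lookup() and library_view() are made-up names, and the one-slot table is a stand-in for the shell's internal variable storage, not the actual sh(1) code.

```c
#define _POSIX_C_SOURCE 200809L	/* for setenv() under strict -std= modes */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical one-slot stand-in for a shell's internal variable table. */
static char var_name[32];
static char var_value[64];

/* Record an assignment in the internal table only; environ is untouched. */
void
shell_setvar(const char *name, const char *value)
{
	snprintf(var_name, sizeof(var_name), "%s", name);
	snprintf(var_value, sizeof(var_value), "%s", value);
}

/* What the shell itself would consult: table first, then the environment. */
const char *
shell_lookup(const char *name)
{
	if (strcmp(name, var_name) == 0)
		return var_value;
	return getenv(name);
}

/* What a library that only knows about getenv() would see. */
const char *
library_view(const char *name)
{
	return getenv(name);
}
```

If the process starts with LC_CTYPE=C and shell_setvar("LC_CTYPE", "C.UTF-8") is then called, shell_lookup() reports the new value while library_view() still reports the startup value.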

 I guess I could have sh provide its own getenv() function (overriding the
 one in libc) which I assume libedit might then call - but I am not sure
 how safe that is.   Builtins are compiled in an environment where getenv()
 has been turned into a macro which calls a different sh function, so they
 don't have that issue, but that cannot be done with libedit.
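
The macro redirection mentioned for builtins might look roughly like the following. This is a sketch, not the sh(1) source: sh_lookupvar(), real_getenv() and builtin_ctype() are hypothetical names, and the hard-coded value stands in for whatever the shell's variable table would hold.

```c
#define _POSIX_C_SOURCE 200809L	/* for setenv() under strict -std= modes */
#include <stdlib.h>
#include <string.h>

/* Defined before the macro below, so this call still reaches libc. */
const char *
real_getenv(const char *name)
{
	return getenv(name);
}

/* Hypothetical internal lookup; pretends the user exported LC_CTYPE. */
const char *
sh_lookupvar(const char *name)
{
	if (strcmp(name, "LC_CTYPE") == 0)
		return "C.UTF-8";
	return real_getenv(name);
}

/* From here on, getenv() calls in this file are redirected. */
#define getenv(name) sh_lookupvar(name)

/* A "builtin" compiled after the macro sees the shell's view. */
const char *
builtin_ctype(void)
{
	return getenv("LC_CTYPE");	/* expands to sh_lookupvar("LC_CTYPE") */
}
```

The redirection only affects code compiled with the macro in scope. A separately compiled library keeps calling the libc getenv(), which is why the same trick does not carry over to libedit.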

 I will see what might be possible there.

 kre


Responsible-Changed-From-To: bin-bug-people->kre
Responsible-Changed-By: kre@NetBSD.org
Responsible-Changed-When: Fri, 16 Aug 2024 20:13:51 +0000
Responsible-Changed-Why:
I am looking into this PR


From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: bin/58609: sh(1) ignores interactive locale changes
Date: Fri, 16 Aug 2024 20:20:39 -0000 (UTC)

 gnats-admin@netbsd.org writes:

 >From: Robert Elz <kre@munnari.OZ.AU>
 >   | If you start sh(1) with LC_CTYPE=C, when you enter UTF-8 input,
 >   | sh(1) will ignore it, as one might expect.
 > 
 > That makes no sense to me at all ... sh(1) really knows close to nothing
 > about locales (though it should know a little more than it does) and does
 > nothing (except some pattern matching) differently at all based upon what
 > the locale is set to.

 sh calls setlocale(LC_ALL,"") on startup to initialize the locale
 from the environment, libedit then uses the locale set by the application
 (i.e. sh).

 It would be helpful if you could switch the locale for libedit with
 a shell command. Currently you can only change the environment and
 re-exec the shell.
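
The startup behaviour described above can be illustrated with a small C sketch. It assumes only standard setlocale(3) and setenv(3) semantics; env_change_is_ignored() is a made-up name, and nothing here is taken from the sh(1) or libedit sources.

```c
#define _POSIX_C_SOURCE 200809L	/* for setenv()/unsetenv() under strict -std= modes */
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Returns 1 if changing LC_CTYPE in the environment alone leaves the
 * process locale untouched, 0 otherwise.
 */
int
env_change_is_ignored(void)
{
	char before[64], after[64];

	unsetenv("LC_ALL");		/* LC_ALL would override LC_CTYPE */
	setenv("LC_CTYPE", "C", 1);

	/* What a program does once at startup: read the environment. */
	if (setlocale(LC_CTYPE, "") == NULL)
		return 0;
	snprintf(before, sizeof(before), "%s", setlocale(LC_CTYPE, NULL));

	/* Comparable to `export LC_CTYPE=C.UTF-8' later in the session. */
	setenv("LC_CTYPE", "C.UTF-8", 1);

	/* Query only; the locale is not re-read from the environment. */
	snprintf(after, sizeof(after), "%s", setlocale(LC_CTYPE, NULL));

	return strcmp(before, after) == 0;
}
```

For the new value to take effect, the program would have to call setlocale(LC_CTYPE, "") again, and that call succeeds only if the named locale exists on the system.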


From: Taylor R Campbell <riastradh@NetBSD.org>
To: gnats-bugs@NetBSD.org, netbsd-bugs@NetBSD.org
Cc: 
Subject: Re: bin/58609: sh(1) ignores interactive locale changes
Date: Fri, 16 Aug 2024 20:31:58 +0000

 > Date: Sat, 17 Aug 2024 03:06:32 +0700
 > From: Robert Elz <kre@munnari.OZ.AU>
 >
 >     Date:        Fri, 16 Aug 2024 17:30:01 +0000 (UTC)
 >     From:        campbell+netbsd@mumble.net
 >     Message-ID:  <20240816173001.A3B3B1A9244@mollari.NetBSD.org>
 >
 >   | If you start sh(1) with LC_CTYPE=C, when you enter UTF-8 input,
 >   | sh(1) will ignore it, as one might expect.
 >
 > That makes no sense to me at all ... sh(1) really knows close to nothing
 > about locales (though it should know a little more than it does) and does
 > nothing (except some pattern matching) differently at all based upon what
 > the locale is set to.   Characters are simply sequences of bytes, sh doesn't
 > care what they represent, if you echo one of them (however many bytes there
 > are) sh's echo will simply write them out as entered.
 >
 > Please try again after turning line editing off (set +VE) - if that makes
 > a difference, then it is libedit you're having an issue with.  sh should
 > not be "ignoring" whatever that means, anything input, except '\0'.

 I tried that and now I can type in the input.  (And then when I delete
 successive characters backward, it backs over the last character of my
 prompt, a space.  But I assume that's the pty driver or terminal
 emulator's doing, not anything to do with sh(1) or libedit.)

>Unformatted:

Copyright © 1994-2024 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.