NetBSD Problem Report #58609
From www@netbsd.org Fri Aug 16 17:25:00 2024
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256
client-signature RSA-PSS (2048 bits) client-digest SHA256)
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id 77B6A1A9242
for <gnats-bugs@gnats.NetBSD.org>; Fri, 16 Aug 2024 17:25:00 +0000 (UTC)
Message-Id: <20240816172459.2F8081A9243@mollari.NetBSD.org>
Date: Fri, 16 Aug 2024 17:24:59 +0000 (UTC)
From: campbell+netbsd@mumble.net
Reply-To: campbell+netbsd@mumble.net
To: gnats-bugs@NetBSD.org
Subject: sh(1) ignores interactive locale changes
X-Send-Pr-Version: www-1.0
>Number: 58609
>Category: bin
>Synopsis: sh(1) ignores interactive locale changes
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: kre
>State: analyzed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Fri Aug 16 17:30:01 +0000 2024
>Closed-Date:
>Last-Modified: Thu May 28 10:10:02 +0000 2026
>Originator: Taylor R Campbell
>Release: current, 10, 9
>Organization:
The NétBSD Foundation
>Environment:
>Description:
If you start sh(1) with LC_CTYPE=C.UTF-8, you can enter UTF-8 input and sh(1) will consume it.
If you start sh(1) with LC_CTYPE=C, when you enter UTF-8 input, sh(1) will ignore it, as one might expect.
But if you start sh(1) with LC_CTYPE=C, and you do `export LC_CTYPE=C.UTF-8', then when you enter UTF-8 input, sh(1) will still ignore it.
>How-To-Repeat:
$ LC_CTYPE=C.UTF-8 PS1='(C.UTF-8)$ ' sh
(C.UTF-8)$ echo £
£
(C.UTF-8)$ ^D
$ LC_CTYPE=C PS1='(C)$ ' sh
(C)$ echo # type £ -- nothing happens (expected)
(C)$ export LC_CTYPE=C.UTF-8
(C)$ locale
LANG=""
LC_CTYPE="C.UTF-8"
LC_COLLATE="C"
LC_TIME="C"
LC_NUMERIC="C"
LC_MONETARY="C"
LC_MESSAGES="C"
LC_ALL=""
(C)$ echo # type £ -- still nothing happens
>Fix:
Yes, please!
>Release-Note:
>Audit-Trail:
From: Robert Elz <kre@munnari.OZ.AU>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: bin/58609: sh(1) ignores interactive locale changes
Date: Sat, 17 Aug 2024 03:06:32 +0700
    Date:        Fri, 16 Aug 2024 17:30:01 +0000 (UTC)
    From:        campbell+netbsd@mumble.net
    Message-ID:  <20240816173001.A3B3B1A9244@mollari.NetBSD.org>
  | If you start sh(1) with LC_CTYPE=C, when you enter UTF-8 input,
  | sh(1) will ignore it, as one might expect.
That makes no sense to me at all ... sh(1) really knows close to nothing
about locales (though it should know a little more than it does) and does
nothing (except some pattern matching) differently at all based upon what
the locale is set to. Characters are simply sequences of bytes, sh doesn't
care what they represent, if you echo one of them (however many bytes there
are) sh's echo will simply write them out as entered.
Please try again after turning line editing off (set +VE) - if that makes
a difference, then it is libedit you're having an issue with. sh should
not be "ignoring" whatever that means, anything input, except '\0'.
The issue with libedit not seeing changes to environment variables made
while the shell is running I do understand, that one isn't all that easy
to fix in general, as libedit() just used getenv() to see what they're set
to, and sh consumes that environment when it starts, then largely simply
ignores it - variables are set in its internal data structs.
I guess I could have sh provide its own getenv() function (overriding the
one in libc) which I assume libedit might then call - but I am not sure
how safe that is. Builtins are compiled in an environment where getenv()
has been turned into a macro which calls a different sh function, so they
don't have that issue, but that cannot be done with libedit.
I will see what might be possible there.
kre
Responsible-Changed-From-To: bin-bug-people->kre
Responsible-Changed-By: kre@NetBSD.org
Responsible-Changed-When: Fri, 16 Aug 2024 20:13:51 +0000
Responsible-Changed-Why:
I am looking into this PR
From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: bin/58609: sh(1) ignores interactive locale changes
Date: Fri, 16 Aug 2024 20:20:39 -0000 (UTC)
gnats-admin@netbsd.org writes:
>From: Robert Elz <kre@munnari.OZ.AU>
> | If you start sh(1) with LC_CTYPE=C, when you enter UTF-8 input,
> | sh(1) will ignore it, as one might expect.
>
> That makes no sense to me at all ... sh(1) really knows close to nothing
> about locales (though it should know a little more than it does) and does
> nothing (except some pattern matching) differently at all based upon what
> the locale is set to.
sh calls setlocale(LC_ALL,"") on startup to initialize the locale
from the environment, libedit then uses the locale set by the application
(i.e. sh).
It would be helpful, if you could switch the locale for libedit with
a shell command. Currently you can only change the environment and
re-exec the shell.
From: Taylor R Campbell <riastradh@NetBSD.org>
To: gnats-bugs@NetBSD.org, netbsd-bugs@NetBSD.org
Cc:
Subject: Re: bin/58609: sh(1) ignores interactive locale changes
Date: Fri, 16 Aug 2024 20:31:58 +0000
> Date: Sat, 17 Aug 2024 03:06:32 +0700
> From: Robert Elz <kre@munnari.OZ.AU>
>
> Date: Fri, 16 Aug 2024 17:30:01 +0000 (UTC)
> From: campbell+netbsd@mumble.net
> Message-ID: <20240816173001.A3B3B1A9244@mollari.NetBSD.org>
>
> | If you start sh(1) with LC_CTYPE=C, when you enter UTF-8 input,
> | sh(1) will ignore it, as one might expect.
>
> That makes no sense to me at all ... sh(1) really knows close to nothing
> about locales (though it should know a little more than it does) and does
> nothing (except some pattern matching) differently at all based upon what
> the locale is set to. Characters are simply sequences of bytes, sh doesn't
> care what they represent, if you echo one of them (however many bytes there
> are) sh's echo will simply write them out as entered.
>
> Please try again after turning line editing off (set +VE) - if that makes
> a difference, then it is libedit you're having an issue with. sh should
> not be "ignoring" whatever that means, anything input, except '\0'.
I tried that and now I can type in the input. (And then when I delete
successive characters backward, it backs over the last character of my
prompt, a space. But I assume that's the pty driver or terminal
emulator's doing, not anything to do with sh(1) or libedit.)
From: "Robert Elz" <kre@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/58609 CVS commit: src/lib/libedit
Date: Tue, 16 Dec 2025 02:40:48 +0000
Module Name: src
Committed By: kre
Date: Tue Dec 16 02:40:48 UTC 2025
Modified Files:
src/lib/libedit: editline.3 el.c el.h histedit.h terminal.c vi.c
Log Message:
[Prereq for PR bin/58609] Add EL_GETENV to libedit
When interacting with the shell, and perhaps other applications,
editline needs to obtain the values of some environment variables.
Normally getenv(3) does that - but that doesn't work when being
used in sh(1) as getenv() simply accesses the environment as it
was when sh(1) was invoked - after that, in sh anyway, that
environment is simply abandoned (well, kind of) - but certainly
no changes made by the shell will be reflected there.
To allow editline to obtain current values of environment
variables, add a new el_set()/el_get() "op" parameter value,
which can be used to instruct editline which function to use
for the purpose. That is EL_GETENV.
This is part of a (long pending, awaiting testing) fix for
PR bin/58609 - but I'm getting tired of having it sitting uncommitted
in my source tree - and I think this part is self contained,
and simple enough, to simply commit.
To generate a diff of this commit:
cvs rdiff -u -r1.103 -r1.104 src/lib/libedit/editline.3 src/lib/libedit/el.c
cvs rdiff -u -r1.48 -r1.49 src/lib/libedit/el.h
cvs rdiff -u -r1.63 -r1.64 src/lib/libedit/histedit.h
cvs rdiff -u -r1.46 -r1.47 src/lib/libedit/terminal.c
cvs rdiff -u -r1.65 -r1.66 src/lib/libedit/vi.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: open->suspended
State-Changed-By: kre@NetBSD.org
State-Changed-When: Thu, 26 Mar 2026 02:03:46 +0000
State-Changed-Why:
There seems to be no actual interest in having this fixed.
State-Changed-From-To: suspended->analyzed
State-Changed-By: riastradh@NetBSD.org
State-Changed-When: Wed, 13 May 2026 19:16:22 +0000
State-Changed-Why:
I applied the following changes on top of a netbsd-11 tree, and then
applied the following patch extracted from the test hg repository
(tweaked to fix a trailing whitespace merge conflict around line 1208
of bin/sh/var.c), and then made dependall install in lib/libedit and
bin/sh. The result was:
1. If started with LC_CTYPE=C.UTF-8, entering £ works fine.
2. If started with LC_CTYPE=C, entering £ moves the cursor to the
beginning of the line but does not delete it:
        $ echo 
               ^
transitions, on entering £ (i.e., octets 0xc2 0xa3), to
        $ echo
          ^
where ^ indicates the position of the cursor on the line above.
Further insertion maintains `echo' on the command line:
        $ asdfecho
              ^
And if I hit return I get:
sh: asdfecho: not found
(This is a change from before, when the input was apparently just
ignored.)
3. If started with LC_CTYPE=C and then interactively setting
LC_CTYPE=C.UTF-8, entering £ works fine.
So while (2) is weird, the important ones, (1) and (3), seem to work
fine with the patch (which includes some other changes that have since
been committed on HEAD).
Dependencies:
https://mail-index.netbsd.org/source-changes/2025/12/14/msg159526.html
https://mail-index.netbsd.org/source-changes/2025/12/16/msg159550.html
Patch:
# HG changeset patch
# User Robert Elz <kre@NetBSD.org>
# Date 1762139127 -25200
# Mon Nov 03 10:05:27 2025 +0700
# Branch trunk
# Node ID d12208786ee4c8f9bde73ec33f18830ba622c450
# Parent 8cdcca3e3510391c74825e9d7b67abc5a9967efb
PR bin/58609: Handle locale changes while sh is running
Previously, sh internally used the locale settings as they were when
it started, and changes to the LC_* vars (and LANG) were made available
(if exported) to commands run by the shell, but not to the shell itself
(nor its builtin commands).
Now the shell deals with changes to those variables, and updates the
locale settings (setlocale(3)) when they occur. Whether or not all
the cases are handled ideally might be a matter for some debate, and
perhaps later changes, but at least something is being done now.
To do this, several internal changes were needed - now the callback
function called when a variable is set has a third parameter - a pointer
to the struct var involved (in addition to the new value, and setting
flags). This allows comparisons between the new and old values (and more).
In addition a new flag has been added to the variable state (used only
when a var setting callback function exists) to determine whether that
callback function is called before, or after, the variable value has been
updated (different callback functions need one or the other - many simply
don't care).
Note that these changes require libedit.so.3.2 (or later).
This also allows the "setenv("TERM", ...)" hack to be removed.
diff -r 8cdcca3e3510 -r d12208786ee4 bin/sh/exec.c
--- a/bin/sh/exec.c Mon Nov 03 09:37:25 2025 +0700
+++ b/bin/sh/exec.c Mon Nov 03 10:05:27 2025 +0700
@@ -844,15 +844,14 @@ hashcd(void)
*/
void
-changepath(char *newval, int flags __unused)
+changepath(const char *new, int flags __unused, struct var *vp __unused)
{
- const char *old, *new;
+ const char *old;
int idx;
int firstchange;
int bltin;
old = pathval();
- new = newval;
firstchange = 9999; /* assume no change */
idx = 0;
bltin = -1;
diff -r 8cdcca3e3510 -r d12208786ee4 bin/sh/exec.h
--- a/bin/sh/exec.h Mon Nov 03 09:37:25 2025 +0700
+++ b/bin/sh/exec.h Mon Nov 03 10:05:27 2025 +0700
@@ -63,13 +63,15 @@ struct cmdentry {
extern const char *pathopt; /* set by padvance */
+struct var;
+
void shellexec(char **, char **, const char *, int, int) __dead;
char *padvance(const char **, const char *, int);
void find_command(char *, struct cmdentry *, int, const char *);
int (*find_builtin(char *))(int, char **);
int (*find_splbltin(char *))(int, char **);
void hashcd(void);
-void changepath(char *, int);
+void changepath(const char *, int, struct var *);
void deletefuncs(void);
void getcmdentry(char *, struct cmdentry *);
union node;
diff -r 8cdcca3e3510 -r d12208786ee4 bin/sh/histedit.c
--- a/bin/sh/histedit.c Mon Nov 03 09:37:25 2025 +0700
+++ b/bin/sh/histedit.c Mon Nov 03 10:05:27 2025 +0700
@@ -99,6 +99,18 @@ static unsigned char sh_complete(EditLin
static FILE *Hist_File_Open(const char *);
/*
+ * a getenv(3) lookalike function for libedit to
+ * use so it can access current values of sh variables
+ * so there is no need to keep doing setenv() of anything
+ * it might want to lookup.
+ */
+static char *
+el_getenv(const char *name)
+{
+ return bltinlookup(name, 1);
+}
+
+/*
* Set history and editing status. Called whenever the status may
* have changed (figures out what to do).
*/
@@ -122,8 +134,10 @@ histedit(void)
INTON;
if (hist != NULL) {
- sethistsize(histsizeval(), histsizeflags());
- sethistfile(histfileval(), histfileflags());
+ sethistsize(histsizeval(),
+ histsizeflags(), NULL);
+ sethistfile(histfileval(),
+ histfileflags(), NULL);
} else
out2str("sh: can't initialize history\n");
}
@@ -131,8 +145,6 @@ histedit(void)
/*
* turn editing on
*/
- char *term;
-
INTOFF;
if (el_in == NULL)
el_in = fdopen(0, "r");
@@ -145,28 +157,6 @@ histedit(void)
if (tracefile)
el_err = tracefile;
#endif
- /*
- * This odd piece of code doesn't affect the shell
- * at all, the environment modified here is the
- * stuff accessed via "environ" (the incoming
- * environment to the shell) which is only ever
- * touched at sh startup time (long before we get
- * here) and ignored thereafter.
- *
- * But libedit calls getenv() to discover TERM
- * and that searches the "environ" environment,
- * not the shell's internal variable data struct,
- * so we need to make sure that TERM in there is
- * correct.
- *
- * This sequence copies TERM from the shell into
- * the old "environ" environment.
- */
- term = lookupvar("TERM");
- if (term)
- setenv("TERM", term, 1);
- else
- unsetenv("TERM");
el = el_init("sh", el_in, el_out, el_err);
VTRACE(DBG_HISTORY, ("el_init() %sed\n",
el != NULL ? "succeed" : "fail"));
@@ -174,7 +164,9 @@ histedit(void)
if (hist)
el_set(el, EL_HIST, history, hist);
- set_prompt_lit(lookupvar("PSlit"), 0);
+ set_prompt_lit(lookupvar("PSlit"), 0, NULL);
+
+ el_set(el, EL_GETENV, el_getenv);
el_set(el, EL_SIGNAL, 1);
el_set(el, EL_SAFEREAD, 1);
el_set(el, EL_ALIAS_TEXT, alias_text, NULL);
@@ -221,7 +213,7 @@ histedit(void)
}
void
-set_prompt_lit(char *lit_ch, int flags __unused)
+set_prompt_lit(const char *lit_ch, int flags, struct var *vp __unused)
{
wchar_t wc;
@@ -236,7 +228,7 @@ set_prompt_lit(char *lit_ch, int flags _
mbtowc(&wc, NULL, 1); /* state init */
INTOFF;
- if (mbtowc(&wc, lit_ch, strlen(lit_ch)) <= 0)
+ if ((flags & VUNSET) || mbtowc(&wc, lit_ch, strlen(lit_ch)) <= 0)
el_set(el, EL_PROMPT, getprompt);
else
el_set(el, EL_PROMPT_ESC, getprompt, (int)wc);
@@ -244,7 +236,7 @@ set_prompt_lit(char *lit_ch, int flags _
}
void
-set_editrc(char *fname, int flags)
+set_editrc(const char *fname, int flags, struct var *vp __unused)
{
INTOFF;
if (iflag && editing && el && !(flags & VUNSET))
@@ -253,7 +245,7 @@ set_editrc(char *fname, int flags)
}
void
-sethistsize(char *hs, int flags)
+sethistsize(const char *hs, int flags, struct var *vp __unused)
{
int histsize;
HistEvent he;
@@ -278,7 +270,7 @@ sethistsize(char *hs, int flags)
}
void
-sethistfile(char *hs, int flags)
+sethistfile(const char *hs, int flags, struct var *vp __unused)
{
const char *file;
HistEvent he;
@@ -330,14 +322,14 @@ sethistfile(char *hs, int flags)
HistFileOpen = "";
sethistappend((histappflags() & VUNSET) ? NULL : histappval(),
- ~VUNSET & 0xFFFF);
+ ~VUNSET & 0xFFFF, NULL);
INTON;
}
}
void
-sethistappend(char *s, int flags __diagused)
+sethistappend(const char *s, int flags, struct var *vp __unused)
{
CTRACE(DBG_HISTORY, ("Set HISTAPPEND=%s [%x] %s ",
(s == NULL ? "''" : s), flags, "!hist" + (hist != NULL)));
@@ -558,7 +550,7 @@ save_sh_history(void)
}
void
-setterm(char *term, int flags __unused)
+setterm(const char *term, int flags __unused, struct var *vp __unused)
{
INTOFF;
if (el != NULL && term != NULL && *term != '\0')
diff -r 8cdcca3e3510 -r d12208786ee4 bin/sh/myhistedit.h
--- a/bin/sh/myhistedit.h Mon Nov 03 09:37:25 2025 +0700
+++ b/bin/sh/myhistedit.h Mon Nov 03 10:05:27 2025 +0700
@@ -39,15 +39,17 @@ extern int displayhist;
#include <filecomplete.h>
+struct var;
+
void histedit(void);
-void sethistsize(char *, int);
-void sethistfile(char *, int);
-void sethistappend(char *, int);
+void sethistsize(const char *, int, struct var *);
+void sethistfile(const char *, int, struct var *);
+void sethistappend(const char *, int, struct var *);
void save_sh_history(void);
-void setterm(char *, int);
+void setterm(const char *, int, struct var *);
int inputrc(int, char **);
-void set_editrc(char *, int);
-void set_prompt_lit(char *, int);
+void set_editrc(const char *, int, struct var *);
+void set_prompt_lit(const char *, int, struct var *);
#include <stdio.h>
extern FILE *HistFP;
diff -r 8cdcca3e3510 -r d12208786ee4 bin/sh/options.c
--- a/bin/sh/options.c Mon Nov 03 09:37:25 2025 +0700
+++ b/bin/sh/options.c Mon Nov 03 10:05:27 2025 +0700
@@ -530,7 +530,7 @@ setcmd(int argc, char **argv)
void
-getoptsreset(char *value, int flags __unused)
+getoptsreset(const char *value, int flags __unused, struct var *vp __unused)
{
/*
* This is just to detect the case where OPTIND=1
@@ -573,7 +573,8 @@ getoptscmd(int argc, char **argv)
}
STATIC int
-getopts(char *optstr, char *optvar, char **optfirst, char ***optnext, char **optpptr)
+getopts(char *optstr, char *optvar, char **optfirst,
+ char ***optnext, char **optpptr)
{
char *p, *q;
char c = '?';
@@ -588,7 +589,7 @@ getopts(char *optstr, char *optvar, char
return 1;
p = **optnext;
if (p == NULL || *p != '-' || *++p == '\0') {
-atend:
+ atend:;
ind = *optnext - optfirst + 1;
*optnext = NULL;
p = NULL;
@@ -642,11 +643,11 @@ atend:
ind = *optnext - optfirst + 1;
goto out;
-bad:
+ bad:;
ind = 1;
*optnext = NULL;
p = NULL;
-out:
+ out:;
*optpptr = p;
fmtstr(s, sizeof(s), "%d", ind);
err |= setvarsafe("OPTIND", s, VNOFUNC);
diff -r 8cdcca3e3510 -r d12208786ee4 bin/sh/options.h
--- a/bin/sh/options.h Mon Nov 03 09:37:25 2025 +0700
+++ b/bin/sh/options.h Mon Nov 03 10:05:27 2025 +0700
@@ -64,9 +64,11 @@ extern char **argptr; /* argument list
extern char *optionarg; /* set by nextopt */
extern char *optptr; /* used by nextopt */
+struct var;
+
void procargs(int, char **);
void optschanged(void);
void setparam(char **);
void freeparam(volatile struct shparam *);
int nextopt(const char *);
-void getoptsreset(char *, int);
+void getoptsreset(const char *, int, struct var *);
diff -r 8cdcca3e3510 -r d12208786ee4 bin/sh/parser.c
--- a/bin/sh/parser.c Mon Nov 03 09:37:25 2025 +0700
+++ b/bin/sh/parser.c Mon Nov 03 10:05:27 2025 +0700
@@ -2244,7 +2244,7 @@ parsesub: {
if (c == '#') {
if ((c = pgetc_linecont()) == CLOSEBRACE)
c = '#';
- else if (is_name(c) || isdigit(c))
+ else if (is_name(c) || is_digit(c))
subtype = VSLENGTH;
else if (is_special(c)) {
/*
@@ -2703,13 +2703,13 @@ getprompt(void *unused)
* behaviour.
*/
static const char *
-expandonstack(char *ps, int cmdsub, int lineno)
+expandonstack(const char *ps, int cmdsub, int lineno)
{
union node n;
struct jmploc jmploc;
struct jmploc *const savehandler = handler;
struct parsefile *const savetopfile = getcurrentfile();
- char * const save_ps = ps;
+ const char * const save_ps = ps;
const int save_x = xflag;
const int save_e_s = errors_suppressed;
struct parse_state new_state = init_parse_state;
@@ -2767,7 +2767,7 @@ expandonstack(char *ps, int cmdsub, int
}
const char *
-expandstr(char *ps, int lineno)
+expandstr(const char *ps, int lineno)
{
const char *result = NULL;
struct stackmark smark;
@@ -2842,7 +2842,7 @@ expandstr(char *ps, int lineno)
*/
const char *
-expandvar(char *var, int flags)
+expandvar(const char *var, int flags)
{
const char *result = NULL;
struct stackmark smark;
@@ -2918,7 +2918,7 @@ expandvar(char *var, int flags)
* Simply return the result, even if it is on the stack
*/
const char *
-expandenv(char *arg)
+expandenv(const char *arg)
{
return expandonstack(arg, 0, 0);
}
diff -r 8cdcca3e3510 -r d12208786ee4 bin/sh/parser.h
--- a/bin/sh/parser.h Mon Nov 03 09:37:25 2025 +0700
+++ b/bin/sh/parser.h Mon Nov 03 10:05:27 2025 +0700
@@ -78,9 +78,9 @@ void fixredir(union node *, const char *
int goodname(const char *);
int isassignment(const char *);
const char *getprompt(void *);
-const char *expandstr(char *, int);
-const char *expandvar(char *, int);
-const char *expandenv(char *);
+const char *expandstr(const char *, int);
+const char *expandvar(const char *, int);
+const char *expandenv(const char *);
struct HereDoc;
union node;
diff -r 8cdcca3e3510 -r d12208786ee4 bin/sh/var.c
--- a/bin/sh/var.c Mon Nov 03 09:37:25 2025 +0700
+++ b/bin/sh/var.c Mon Nov 03 10:05:27 2025 +0700
@@ -51,6 +51,9 @@ static char sccsid[] = "@(#)var.c 8.3 (B
#include <pwd.h>
#include <fcntl.h>
#include <inttypes.h>
+#ifndef SMALL
+#include <locale.h>
+#endif
/*
* Shell variables.
@@ -101,6 +104,9 @@ char *get_hostname(struct var *);
char *get_seconds(struct var *);
char *get_euser(struct var *);
char *get_random(struct var *);
+
+STATIC void set_locale_var(const char *, int, struct var *);
+void init_locale_vars(void);
#endif
struct localvar *localvars;
@@ -131,6 +137,15 @@ struct var euname;
struct var random_num;
intmax_t sh_start_time;
+
+struct var lc_all;
+struct var lc_collate;
+struct var lc_ctype;
+struct var lc_messages;
+struct var lc_monetary;
+struct var lc_numeric;
+struct var lc_time;
+struct var lc_lang;
#endif
struct var line_num;
@@ -173,6 +188,22 @@ const struct varinit varinit[] = {
{ .set_func= set_editrc } },
{ &ps_lit, VSTRFIXED|VTEXTFIXED|VUNSET, "PSlit=",
{ .set_func= set_prompt_lit } },
+ { &lc_all, VSTRFIXED|VTEXTFIXED|VUNSET|VFUNCPOST, "LC_ALL=",
+ { .set_func= set_locale_var } },
+ { &lc_collate, VSTRFIXED|VTEXTFIXED|VUNSET, "LC_COLLATE=",
+ { .set_func= set_locale_var } },
+ { &lc_ctype, VSTRFIXED|VTEXTFIXED|VUNSET, "LC_CTYPE=",
+ { .set_func= set_locale_var } },
+ { &lc_messages, VSTRFIXED|VTEXTFIXED|VUNSET, "LC_MESSAGES=",
+ { .set_func= set_locale_var } },
+ { &lc_monetary, VSTRFIXED|VTEXTFIXED|VUNSET, "LC_MONETARY=",
+ { .set_func= set_locale_var } },
+ { &lc_numeric, VSTRFIXED|VTEXTFIXED|VUNSET, "LC_NUMERIC=",
+ { .set_func= set_locale_var } },
+ { &lc_time, VSTRFIXED|VTEXTFIXED|VUNSET, "LC_TIME=",
+ { .set_func= set_locale_var } },
+ { &lc_lang, VSTRFIXED|VTEXTFIXED|VUNSET|VFUNCPOST, "LANG=",
+ { .set_func= set_locale_var } },
#endif
{ &voptind, VSTRFIXED|VTEXTFIXED|VNOFUNC, "OPTIND=1",
{ .set_func= getoptsreset } },
@@ -214,6 +245,7 @@ INCLUDE "var.h"
INCLUDE "version.h"
MKINIT char **environ;
MKINIT void setvareqsafe(char *, int);
+MKINIT void init_locale_vars(void);
INIT {
char **envp;
char buf[64];
@@ -257,6 +289,8 @@ INIT {
#ifndef SMALL
snprintf(buf, sizeof(buf), "%jd", sh_start_time);
setvar("START_TIME", buf, VTEXTFIXED);
+
+ init_locale_vars();
#endif
setvar("NETBSD_SHELL", NETBSD_SHELL
@@ -505,8 +539,9 @@ setvareq(char *s, int flags)
INTOFF;
- if (vp->func && !(vp->flags & VFUNCREF) && !(flags & VNOFUNC))
- (*vp->func)(s + vp->name_len + 1, flags);
+ if (vp->func && !(vp->flags & (VFUNCREF|VFUNCPOST)) &&
+ !(flags & VNOFUNC))
+ (*vp->func)(s + vp->name_len + 1, flags, vp);
if ((vp->flags & (VTEXTFIXED|VSTACK)) == 0)
ckfree(vp->text);
@@ -528,6 +563,11 @@ setvareq(char *s, int flags)
vp->flags |= flags & ~(VNOFUNC | VDOEXPORT);
vp->text = s;
+ if (vp->func &&
+ (vp->flags & (VFUNCREF|VFUNCPOST)) == VFUNCPOST &&
+ !(flags & VNOFUNC))
+ (*vp->func)(s + vp->name_len + 1, flags, vp);
+
/*
* We could roll this to a function, to handle it as
* a regular variable function callback, but why bother?
@@ -1208,7 +1248,7 @@ poplocalvars(void)
} else {
if (lvp->func && (lvp->flags & (VNOFUNC|VFUNCREF)) == 0)
(*lvp->func)(lvp->text + vp->name_len + 1,
- lvp->flags);
+ lvp->flags, lvp->vp);
if ((vp->flags & VTEXTFIXED) == 0)
ckfree(vp->text);
vp->flags = lvp->flags;
@@ -1735,4 +1775,201 @@ specialvarcmd(int argc, char **argv)
return res;
}
+struct lc_vars {
+ const char *name;
+ int category;
+ struct var *vp;
+};
+
+const struct lc_vars lc_vars[] = {
+ { .name= "LC_ALL", .category= LC_ALL, .vp= &lc_all },
+
+ { .name= "LC_COLLATE", .category= LC_COLLATE, .vp= &lc_collate },
+ { .name= "LC_CTYPE", .category= LC_CTYPE, .vp= &lc_ctype },
+ { .name= "LC_MESSAGES", .category= LC_MESSAGES, .vp= &lc_messages },
+ { .name= "LC_MONETARY", .category= LC_MONETARY, .vp= &lc_monetary },
+ { .name= "LC_NUMERIC", .category= LC_NUMERIC, .vp= &lc_numeric },
+ { .name= "LC_TIME", .category= LC_TIME, .vp= &lc_time },
+
+ { .name= "LANG", .category= LC_ALL, .vp= &lc_lang },
+
+ { .name= NULL, .category= 0, .vp= NULL }
+};
+
+STATIC void
+set_locale_var(const char *val, int flags, struct var *vp)
+{
+ const struct lc_vars *lv;
+
+ if (flags & VUNSAFE) /* parsing the environment */
+ return; /* do nothing until later */
+
+ for (lv = lc_vars; lv->name != NULL; lv++) {
+ /* Find the var being altered */
+ if (lv->vp != vp)
+ continue;
+
+ if (flags & VUNSET) {
+ /*
+ * If we are unsetting one of these, then
+ * we need to set the locale for the category
+ * back to a default value
+ */
+
+ if (lv->category == LC_ALL) {
+ /*
+ * nb: the LC_ALL vars call this func
+ * after their value has changed, ie:
+ * since this is an unset, the var already
+ * is unset. This simplifies things.
+ */
+
+ /*
+ * LANG is just the default to use when
+ * nothing more specific is set, if it is
+ * unset, there is nothing to do.
+ */
+ if (lv->vp == &lc_lang)
+ return;
+
+ /*
+ * Otherwise LC_ALL must be being unset,
+ * in that case, simply set everything to
+ * the values obtained from the other
+ * LC_xxx vars (with LANG as default)
+ */
+ init_locale_vars();
+ return;
+ }
+
+ /*
+ * For the category vars, this func is called before
+ * the sh var is updated
+ */
+
+ /* If a category var previously was unset - easy case */
+ if (lv->vp->flags & VUNSET)
+ return;
+
+ /*
+ * One of the specific categories is being
+ * unset, in that case we fall back upon
+ * the value of LC_ALL if that is set, or
+ * otherwise the value of LANG, if that is set,
+ * or just set the "C" locale for this category
+ */
+ vp = NULL;
+ if (!(lc_all.flags & VUNSET))
+ vp = &lc_all;
+ else if (!(lc_lang.flags & VUNSET))
+ vp = &lc_lang;
+
+ if (vp != NULL)
+ val = vp->text + vp->name_len + 1;
+ else
+ val = "C";
+
+ set_locale_var(val, 0, lv->vp);
+ } else {
+ char *old;
+
+ if (lv->category == LC_ALL) {
+ /*
+ * post call, so this var's value is
+ * already established, just go use
+ * it (and the others) to set all the
+ * categories
+ */
+ init_locale_vars();
+ return;
+ }
+
+ /*
+ * Potentially changing the value of a specific
+ * category.
+ *
+ * If the value isn't actually changing, do nothing
+ */
+ old = setlocale(lv->category, NULL);
+ if (old != NULL && strcmp(old, val) != 0) {
+ /*
+ * Otherwise update the shell's locale
+ * to match the new setting
+ */
+ if (setlocale(lv->category, val) == NULL) {
+ /*XXX NetBSD! */ if (lv->category != LC_COLLATE)
+ outfmt(out2,
+ "Setting %s to '%s' failed\n",
+ lv->name, val);
+ } else if (lv->category == LC_CTYPE) {
+ /*
+ * and if that worked, and we're
+ * changing the char encodings,
+ * go re-init libedit (if in use)
+ */
+ if (iflag && (Eflag || Vflag)) {
+ int oE = Eflag, oV = Vflag;
+
+ /*
+ * re-init editing if
+ * LC_CTYPE changes
+ *
+ * easy way: disable, then
+ * enable again
+ */
+ Eflag = Vflag = 0;
+ histedit();
+ Eflag = oE, Vflag=oV;
+ histedit();
+ }
+ }
+ }
+ }
+ return;
+ }
+}
+
+void
+init_locale_vars(void)
+{
+ const struct lc_vars *lv;
+ const char *defval;
+
+ if (!(lc_all.flags & VUNSET)) {
+ /*
+ * If LC_ALL is set, just use it, ignore others
+ * Simply set all categories to the value of LC_ALL
+ *
+ * nb: strlen("LC_ALL") + 1 == 7
+ */
+ for (lv = lc_vars; lv->name != NULL; lv++) {
+ if (lv->category == LC_ALL)
+ continue;
+ set_locale_var(lc_all.text + 7, 0, lv->vp);
+ }
+ return;
+ }
+
+ /*
+ * Otherwise, for each specific category, set the value
+ * of its variable for that category, if that variable is
+ * set, otherwise we need a default for that category.
+ * If LANG is set, that gives the default, otherwise it is "C"
+ */
+ if (lc_lang.flags & VUNSET)
+ defval = "C";
+ else
+ defval = lc_lang.text + 5; /* strlen("LANG") + 1 == 5 */
+
+ for (lv = lc_vars; lv->name != NULL; lv++) {
+ /* ignore LC_ALL and LANG */
+ if (lv->category == LC_ALL)
+ continue;
+ /* otherwise set the category to the appropriate value */
+ set_locale_var( (lv->vp->flags & VUNSET ? defval :
+ lv->vp->text + lv->vp->name_len + 1),
+ 0, lv->vp);
+ }
+}
+
#endif /* SMALL */
diff -r 8cdcca3e3510 -r d12208786ee4 bin/sh/var.h
--- a/bin/sh/var.h Mon Nov 03 09:37:25 2025 +0700
+++ b/bin/sh/var.h Mon Nov 03 10:05:27 2025 +0700
@@ -53,6 +53,7 @@
#define VNOFUNC 0x0100 /* don't call the callback function */
#define VFUNCREF 0x0200 /* the function is called on ref, not set */
+#define VFUNCPOST 0x0400 /* call callback func after setting var */
#define VSPECIAL 0x1000 /* magic properties not lost when set */
#define VDOEXPORT 0x2000 /* obey VEXPORT even if VNOEXPORT */
@@ -62,7 +63,8 @@
struct var;
union var_func_union { /* function to be called when: */
- void (*set_func)(char *, int); /* variable gets set/unset */
+ /* variable set/unset */
+ void (*set_func)(const char *, int, struct var *);
char*(*ref_func)(struct var *); /* variable is referenced */
};
@@ -106,6 +108,14 @@ extern struct var ps_lit;
extern struct var euname;
extern struct var random_num;
extern intmax_t sh_start_time;
+extern struct var lc_all;
+extern struct var lc_collate;
+extern struct var lc_ctype;
+extern struct var lc_messages;
+extern struct var lc_monetary;
+extern struct var lc_numeric;
+extern struct var lc_time;
+extern struct var lc_lang;
#endif
extern int line_number;
From: Robert Elz <kre@munnari.OZ.AU>
To: gnats-bugs@netbsd.org
Cc: riastradh@NetBSD.org, netbsd-bugs@netbsd.org
Subject: Re: bin/58609 (sh(1) ignores interactive locale changes)
Date: Fri, 15 May 2026 00:35:14 +0700
    Date:        Wed, 13 May 2026 19:16:22 +0000 (UTC)
    From:        riastradh@NetBSD.org
    Message-ID:  <20260513191622.E29161A923A@mollari.NetBSD.org>
  | Synopsis: sh(1) ignores interactive locale changes
I have updated my source tree for that change (which hadn't been touched
in well over a year) to be consistent with what is in HEAD now, and
while I haven't checked line by line, the diffs I see look to be more
or less the same as the ones you posted.
If everyone is happy with me doing so, I will commit that to /bin/sh
in HEAD soon (you may have noticed the total lack of any sh.1 update,
which is clearly required, I had been deferring that until there was
some confidence that the code as is was acceptable, to avoid documenting
something that wasn't going to remain). That will take a few days
(at least), so there is time for people to investigate still.
If anyone would like the patches for HEAD, send me your @netbsd.org
e-mail addr if you have one, and anything not @gmail.com otherwise (off
list) and I will send you a copy.
  |   2. If started with LC_CTYPE=C, entering £ moves the cursor to the
  |      beginning of the line but does not delete it:
  |
  |         $ echo 
  |                ^
  |      transitions, on entering £ (i.e., octets 0xc2 0xa3), to
  |
  |         $ echo
  |           ^
  |      where ^ indicates the position of the cursor on the line above.
This is all libedit, not sh, about which I am certainly no expert.
But from that, I'd guess you're in emacs editing mode (shame! all
real unix users use vi mode, emacs is for VMS TOPS TENEX (etc) users,
where "one giant lump of goo which pretends to do everything" is SOP :-).
There 0xc2 0xa3 is likely to be interpreted as Meta-B Meta-#. As best
as I can follow emacs mode key bindings, Meta-B (and Meta-b) is "backward
word" (which corresponds with what you indicated happened), and I believe
that Meta-# isn't bound to anything at all (so would probably just be
ignored).
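To make the guess above concrete, here is a small sketch in plain sh of
the arithmetic involved: an octet with the top bit set is read as Meta
plus the character left after clearing that bit. The helper name
meta_name is made up for this illustration and is not part of sh or
libedit; it shows only the bit arithmetic, not how libedit actually
dispatches keys.

```shell
#!/bin/sh
# meta_name is a hypothetical helper, not part of sh or libedit.
# Given one octet (decimal or 0x-prefixed hex), print "Meta-X" when the
# top bit is set, where X is the character left after clearing that bit.
meta_name() {
    octet=$(( $1 ))
    if [ "$octet" -ge 128 ]; then
        low=$(( octet & 127 ))
        # Build a \0ddd octal escape so printf %b can emit the character.
        printf 'Meta-%b\n' "\\0$(printf '%o' "$low")"
    else
        printf 'not a Meta key: %d\n' "$octet"
    fi
}

meta_name 0xc2   # first octet of UTF-8 £, prints Meta-B
meta_name 0xa3   # second octet of UTF-8 £, prints Meta-#
```

With the two octets of a UTF-8 £ this prints Meta-B and Meta-#, which
matches the reading given above.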
| (This is a change from before, when the input was apparently just
| ignored.)
Might that have been vi mode? vi-mode is unlikely to have any bindings
for input with the top bit set in the C locale, so everything non-ascii
is most likely just ignored. Nothing I am aware of (including to libedit)
has changed in any way that should have altered anything like that.
But for more details on that, you need confirmation from someone who
understands libedit lots better than I do. sh doesn't get involved
until you have told libedit that you are done with the input (most often
by entering a newline). If you had libedit turned off, then it would be
the tty driver doing the editing, until you finish a line (when not using
libedit, sh reads from stdin without doing anything special, and the tty
would be left in "cooked" mode).
One other issue (since there is no doc to read yet): when LC_CTYPE
changes in sh, it tells libedit about that by disabling it, and then
re-initing again (at which point libedit sees the updated locale settings).
What that means, I believe (and again I am guessing a bit about libedit), is
that any local bindings or settings that were made since the last libedit
init would be lost at that point, and need to be done again. (The settings
config file would be re-read though, I think, so more permanent changes
would be retained.) I doubt many people make use of libedit's command
mode when entering sh input, but if you do, beware, after this change
appears.
Lastly, for anyone who cares, very little of this change will affect SMALL
shells, none of the locale manipulations will be added to one of those
(SMALL shells are typically found in limited space boot media, if you look
at the value of sh's NETBSD_SHELL variable, and see SMALL in there, then
you are in a SMALL shell) - there are some required changes to more basic
parts of the shell, so even a SMALL shell will grow a little with this
change though (I have not yet measured how much - though the answer would
also depend upon the various architectures).
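The NETBSD_SHELL check described above could be scripted roughly as
follows. This is only a sketch: is_small_shell is a hypothetical helper
name, and nothing is assumed about the format of NETBSD_SHELL beyond
the word SMALL appearing somewhere in its value.

```shell
#!/bin/sh
# is_small_shell is a hypothetical helper: it succeeds when the string
# it is given (normally the value of NETBSD_SHELL) contains SMALL.
is_small_shell() {
    case $1 in
    *SMALL*) return 0 ;;
    *)       return 1 ;;
    esac
}

# ${NETBSD_SHELL-} expands to nothing in shells that do not set it.
if is_small_shell "${NETBSD_SHELL-}"; then
    echo "SMALL shell: no locale manipulations here"
else
    echo "not a SMALL shell (or not NetBSD sh)"
fi
```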
kre
From: "Robert Elz" <kre@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/58609 CVS commit: src/bin/sh
Date: Thu, 28 May 2026 10:07:58 +0000
Module Name: src
Committed By: kre
Date: Thu May 28 10:07:58 UTC 2026
Modified Files:
src/bin/sh: exec.c exec.h histedit.c myhistedit.h options.c options.h
sh.1 var.c var.h
Log Message:
PR bin/58609 - enable locale var internal manipulation
sh now recognises the (standard) set of locale variables, and in addition
to setting up the locale environment to match those in the environment at
startup (which it has done for ages), now also causes alterations to those
variables while the shell is running to take immediate effect inside sh,
which can affect how the shell operates in some limited aspects - previously
such updates would be passed to exec'd child processes (not subshells)
if the variables are exported, and not affect the running shell at all.
See the PR, and the updated sh(1) man page, for details.
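As a rough illustration of the part of this that was already true before
the change, an exported locale variable reaches an exec'd child through
its environment. The helper name child_lc_ctype is hypothetical and
C.UTF-8 is only an example value; whether the running shell itself
reacts to the assignment depends on the sh version in use.

```shell
#!/bin/sh
# child_lc_ctype is a hypothetical helper: it runs a separate sh process
# and prints the LC_CTYPE value that child finds in its environment.
child_lc_ctype() {
    sh -c 'printf "%s\n" "${LC_CTYPE-unset}"'
}

LC_CTYPE=C.UTF-8    # example value only
export LC_CTYPE     # exporting is what makes the child see it
child_lc_ctype
```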
This is a feature enhancement, no pullups (not even to -11) are planned.
To generate a diff of this commit:
cvs rdiff -u -r1.60 -r1.61 src/bin/sh/exec.c
cvs rdiff -u -r1.28 -r1.29 src/bin/sh/exec.h src/bin/sh/options.h
cvs rdiff -u -r1.73 -r1.74 src/bin/sh/histedit.c
cvs rdiff -u -r1.16 -r1.17 src/bin/sh/myhistedit.h
cvs rdiff -u -r1.62 -r1.63 src/bin/sh/options.c
cvs rdiff -u -r1.276 -r1.277 src/bin/sh/sh.1
cvs rdiff -u -r1.90 -r1.91 src/bin/sh/var.c
cvs rdiff -u -r1.41 -r1.42 src/bin/sh/var.h
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
>Unformatted:
Copyright © 1994-2026
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.