NetBSD Problem Report #51470

From www@NetBSD.org  Sun Sep 11 17:47:15 2016
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (verified OK))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 14DA47A289
	for <gnats-bugs@gnats.NetBSD.org>; Sun, 11 Sep 2016 17:47:15 +0000 (UTC)
Message-Id: <20160911174713.A4DAC7A2AE@mollari.NetBSD.org>
Date: Sun, 11 Sep 2016 17:47:13 +0000 (UTC)
From: saab99@gmx.com
Reply-To: saab99@gmx.com
To: gnats-bugs@NetBSD.org
Subject: UTF-8 not support Russian
X-Send-Pr-Version: www-1.0

>Number:         51470
>Category:       misc
>Synopsis:       UTF-8 not support Russian
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    misc-bug-people
>State:          open
>Class:          support
>Submitter-Id:   net
>Arrival-Date:   Sun Sep 11 17:50:00 +0000 2016
>Last-Modified:  Fri Nov 11 19:30:01 +0000 2016
>Originator:     Michael
>Release:        NetBSD 7.0
>Organization:
Home
>Environment:
NetBSD  7.0.1 NetBSD 7.0.1 (GENERIC.201605221355Z) amd64
>Description:
My servers store many files with Cyrillic names. With a large user load,
it is hard to keep filenames in legacy encodings when the outside world
already uses UTF-8. Storing filenames in anything other than UTF-8 causes
some problems with Samba and fatal problems with Linux and NFS, which do
no conversion at all.

So I feel it is time to move to UTF-8 on NetBSD too. NetBSD 6 seems to
have ru_RU.UTF-8 support, but it is still not complete.

A freshly installed 6.1.4 can store files with UTF-8 names and can share
them via SMB or NFS, but I can't make them work in the shell.

As far as I can see, it supports only LC_CTYPE and LC_MESSAGES, through
locale.alias, and has no native ru_RU.UTF-8 support.

My Linux rxvt-unicode terminal (which works as expected locally),
connected over ssh to the NetBSD box, shows:

[***@gloria ~]$ locale
LANG="ru_RU.UTF-8"
LC_CTYPE="ru_RU.UTF-8"
LC_COLLATE="C"
LC_TIME="C"
LC_NUMERIC="C"
LC_MONETARY="C"
LC_MESSAGES="ru_RU.UTF-8"

As a result, Cyrillic filenames are displayed correctly, but I cannot
access them, because the shell prints backslash-escaped byte codes
(e.g. \:\262\321\320) instead of letters when I type.
Bash is 4.3.0(1), as installed. (By the way, https://wiki.netbsd.org/unicode/
says this will work out of the box.)
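
For illustration only, here is a minimal sketch of what such escaped
output stands for, assuming the shell prints each non-ASCII byte as a
backslash followed by three octal digits. The helper unescape_octal and
the sample filename are hypothetical and are not taken from this report.

```python
import re


def unescape_octal(text):
    """Turn backslash-octal escapes such as \\321\\204 back into raw bytes."""
    out = bytearray()
    pos = 0
    # Match a backslash followed by exactly three octal digits (000-377).
    for m in re.finditer(r"\\([0-3][0-7]{2})", text):
        out += text[pos:m.start()].encode("ascii")  # literal ASCII in between
        out.append(int(m.group(1), 8))              # the escaped byte itself
        pos = m.end()
    out += text[pos:].encode("ascii")
    return bytes(out)


# Hypothetical example: a Cyrillic name shown as escaped UTF-8 bytes.
escaped = r"\321\204\320\260\320\271\320\273.txt"
raw = unescape_octal(escaped)
print(raw.decode("utf-8"))
```

If the recovered bytes decode cleanly as UTF-8, the name on disk is
intact and only the input or display path is at fault.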

Two questions on that:

Am I right that aliasing ru_RU.UTF-8 to en_US.UTF-8 is what causes this?

If I am right, what should I do to complete the ru_RU.UTF-8 locale so
that I can type Cyrillic filenames without problems?

>How-To-Repeat:

>Fix:

>Audit-Trail:
From: Michael van Elst <mlelstv@serpens.de>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: misc/51470 UTF-8 not support Russian
Date: Fri, 11 Nov 2016 20:27:44 +0100

 There are deficiencies in UTF-8 support, but Russian is no special
 case.

 The system itself is largely agnostic: it treats a filename as a plain
 byte sequence in which only the values 47 (slash) and zero have special
 meaning. Interpreting that byte sequence as some codepage or as UTF-8
 is a matter of convention.
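
 A minimal sketch of that point, for illustration only: the same name
 yields different byte sequences under two conventions, and bytes read
 back under the other convention give different text. The sample name
 and the choice of KOI8-R as the legacy codepage are assumptions.

```python
name = "файл"                       # hypothetical Cyrillic filename
utf8_bytes = name.encode("utf-8")   # bytes a UTF-8 client would store
koi8_bytes = name.encode("koi8_r")  # bytes a KOI8-R client would store

# Reading the stored bytes under the other convention gives other text.
print(utf8_bytes.decode("koi8_r"))
print(koi8_bytes.decode("utf-8", errors="replace"))

# Neither encoding produces the two byte values that are special in a
# path component: 47 (slash) and zero.
for encoded in (utf8_bytes, koi8_bytes):
    assert 0x2F not in encoded and 0x00 not in encoded
```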

 The Bourne shell (/bin/sh) didn't handle input bytes with bit 7 set
 because that bit was used internally by the parser. The C shell can
 handle 8-bit filenames but didn't understand a UTF-8 environment in
 NetBSD-6; NetBSD-7 is fine. Other shells, including the current
 bash (4.3.0) from pkgsrc, don't have that problem, and the native
 /bin/sh has been fixed in NetBSD/-current.
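
 To show why a parser that reserves bit 7 cannot cope with such names,
 this sketch lists the UTF-8 bytes of a name that have bit 7 set. The
 helper high_bit_bytes and the sample names are hypothetical.

```python
def high_bit_bytes(name):
    """Return the UTF-8 bytes of name that have bit 7 (0x80) set."""
    return [b for b in name.encode("utf-8") if b & 0x80]


ascii_name = "file.txt"
cyrillic_name = "файл.txt"  # hypothetical example name

# ASCII characters never set bit 7; every byte of a Cyrillic letter
# encoded as UTF-8 does.
print(high_bit_bytes(ascii_name))
print([hex(b) for b in high_bit_bytes(cyrillic_name)])
```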

 VFAT stores long filenames in 16-bit Unicode. NetBSD used to ignore
 that and use only the lower byte of each character. This allowed
 arbitrary byte sequences in filenames but is incompatible with Windows.
 NetBSD/-current can translate between the 16-bit Unicode data on
 disk and UTF-8.
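
 The difference between the two behaviours can be sketched as follows.
 This is an illustration, not the msdosfs code; it assumes the 16-bit
 units are little-endian and uses a hypothetical filename whose
 characters all fit in a single 16-bit unit.

```python
name = "файл.txt"  # hypothetical long filename

# The 16-bit code units as they would sit in a long-name entry.
encoded = name.encode("utf-16-le")
units = [int.from_bytes(encoded[i:i + 2], "little")
         for i in range(0, len(encoded), 2)]

# Older behaviour described above: keep only the low byte of each unit.
low_bytes = bytes(u & 0xFF for u in units)

# Newer behaviour described above: translate the units to UTF-8.
as_utf8 = "".join(chr(u) for u in units).encode("utf-8")

print(low_bytes)
print(as_utf8)
```

 With only the low bytes kept, the Cyrillic letters collapse into
 unrelated ASCII characters, so the name no longer matches what
 Windows stored.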

 The vi editor gained wide character support in NetBSD/-current,
 and you can now edit UTF-8 text files with it.

 NetBSD locale support is limited, but LC_CTYPE shouldn't differ
 between the various languages when the encoding is UTF-8. Using
 ru_RU.UTF-8 is fine.
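
 One way to check this on a given machine is to ask each locale for
 its LC_CTYPE codeset. The sketch below assumes a Unix-like system
 where nl_langinfo is available; the helper ctype_codeset is
 hypothetical, and a locale that is not installed simply yields None.

```python
import locale


def ctype_codeset(locale_name):
    """Return the LC_CTYPE codeset of locale_name, or None if unavailable."""
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        locale.setlocale(locale.LC_CTYPE, locale_name)
        return locale.nl_langinfo(locale.CODESET)
    except locale.Error:
        return None                            # locale not installed
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)  # restore previous setting


for candidate in ("ru_RU.UTF-8", "en_US.UTF-8"):
    print(candidate, ctype_codeset(candidate))
```

 If both locales report the same codeset, character classification
 and multibyte decoding should behave the same under either name.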


 Greetings,
 -- 
                                 Michael van Elst
 Internet: mlelstv@serpens.de
                                 "A potential Snark may lurk in every tree."

Copyright © 1994-2014 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.