NetBSD Problem Report #48427
From www@NetBSD.org Fri Dec 6 21:11:50 2013
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
(using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "Postmaster NetBSD.org" (verified OK))
by mollari.NetBSD.org (Postfix) with ESMTPS id 46234A642D
for <gnats-bugs@gnats.NetBSD.org>; Fri, 6 Dec 2013 21:11:50 +0000 (UTC)
Message-Id: <20131206211149.12016A6451@mollari.NetBSD.org>
Date: Fri, 6 Dec 2013 21:11:48 +0000 (UTC)
From: yuri@rawbw.com
Reply-To: yuri@rawbw.com
To: gnats-bugs@NetBSD.org
Subject: libedit shouldn't require ISO 10646
X-Send-Pr-Version: www-1.0
>Number: 48427
>Category: lib
>Synopsis: libedit shouldn't require ISO 10646
>Confidential: no
>Severity: non-critical
>Priority: medium
>Responsible: lib-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Fri Dec 06 21:15:00 +0000 2013
>Last-Modified: Sun Dec 08 00:15:00 +0000 2013
>Originator: Yuri
>Release: current
>Organization:
n/a
>Environment:
>Description:
While porting lib/libedit to FreeBSD I noticed these lines in chartype.h:
#ifndef __STDC_ISO_10646__
/* In many places it is assumed that the first 127 code points are ASCII
* compatible, so ensure wchar_t indeed does ISO 10646 and not some other
* funky encoding that could break us in weird and wonderful ways. */
#error wchar_t must store ISO 10646 characters
#endif
You limit the character set to UCS (ISO 10646) to make sure that the lower 127 code points are ASCII. Many character sets satisfy this condition, and UCS is just one of them. Other practical examples are KOI8-U and KOI8-R for Cyrillic, ISO/IEC 8859-15, and others for other languages.
FreeBSD, for example, doesn't define __STDC_ISO_10646__ because the user can select any other character set through the environment.
I am not sure what the right solution is, but requiring ISO 10646 isn't right, and it breaks compilation on any system that doesn't define the macro.
>How-To-Repeat:
>Fix:
>Audit-Trail:
From: Thomas Klausner <wiz@NetBSD.org>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: lib/48427: libedit shouldn't require ISO 10646
Date: Sat, 7 Dec 2013 01:03:50 +0100
On Fri, Dec 06, 2013 at 09:15:00PM +0000, yuri@rawbw.com wrote:
> While porting lib/libedit to FreeBSD I noticed these lines in chartype.h:
>
> #ifndef __STDC_ISO_10646__
> /* In many places it is assumed that the first 127 code points are ASCII
> * compatible, so ensure wchar_t indeed does ISO 10646 and not some other
> * funky encoding that could break us in weird and wonderful ways. */
> #error wchar_t must store ISO 10646 characters
> #endif
>
> You limit the character set to UCS (ISO 10646) to make sure that the lower 127 code points are ASCII. Many character sets satisfy this condition, and UCS is just one of them. Other practical examples are KOI8-U and KOI8-R for Cyrillic, ISO/IEC 8859-15, and others for other languages.
>
> FreeBSD, for example, doesn't define __STDC_ISO_10646__ because the user can select any other character set through the environment.
>
> I am not sure what the right solution is, but requiring ISO 10646 isn't right, and it breaks compilation on any system that doesn't define the macro.
Do I understand correctly that you're saying that what the user
defines in the environment changes how wchar_t is defined?
Thomas
From: Yuri <yuri@rawbw.com>
To: gnats-bugs@NetBSD.org, lib-bug-people@NetBSD.org, gnats-admin@NetBSD.org,
netbsd-bugs@NetBSD.org
Cc:
Subject: Re: lib/48427: libedit shouldn't require ISO 10646
Date: Fri, 06 Dec 2013 16:15:04 -0800
On 12/06/2013 16:05, Thomas Klausner wrote:
> Do I understand correctly that you're saying that what the user
> defines in the environment changes how wchar_t is defined?
No, wchar_t by definition holds the numeric values of character code
points wider than 8 bits, for various character sets.
The particular character set represented by wchar_t may vary depending on
the choice of the user.
Your compile-time restriction of the character set to UCS is too narrow.
The character set should be allowed to vary at runtime, and any
character set limitations should be enforced at runtime too.
Yuri
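As an illustration of what a runtime check could look like, here is a minimal sketch. It is not taken from libedit; the helper name ascii_maps_to_itself is hypothetical. It assumes that the property libedit actually relies on is "every 7-bit ASCII byte converts to a wide character with the same numeric value in the current locale", and it tests that with the standard btowc() function.

```c
#include <wchar.h>

/*
 * Hypothetical helper, not part of libedit.
 * Returns 1 if every byte in the range 0..127 converts to a wide
 * character with the same numeric value in the current locale,
 * and 0 otherwise.
 */
static int
ascii_maps_to_itself(void)
{
	int c;

	for (c = 0; c < 128; c++) {
		/* btowc() returns WEOF if c is not a valid single-byte character */
		if (btowc(c) != (wint_t)c)
			return 0;
	}
	return 1;
}
```

A caller would typically run setlocale(LC_CTYPE, "") first and then refuse to enable wide-character editing if the helper returns 0. Whether this property alone is enough for libedit is an assumption that would need to be checked against the code.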
From: Thomas Klausner <wiz@NetBSD.org>
To: NetBSD bugtracking <gnats-bugs@NetBSD.org>
Cc:
Subject: Re: lib/48427: libedit shouldn't require ISO 10646
Date: Sat, 7 Dec 2013 10:42:31 +0100
On Fri, Dec 06, 2013 at 04:15:04PM -0800, Yuri wrote:
> On 12/06/2013 16:05, Thomas Klausner wrote:
> > Do I understand correctly that you're saying that what the user
> > defines in the environment changes how wchar_t is defined?
>
> No, wchar_t by definition holds the numeric values of character code
> points wider than 8 bits, for various character sets.
> The particular character set represented by wchar_t may vary depending
> on the choice of the user.
>
> Your compile-time restriction of the character set to UCS is too narrow.
> The character set should be allowed to vary at runtime, and any
> character set limitations should be enforced at runtime too.
I don't know much about this stuff, but I can easily imagine that
wchar_t always contains UCS and that there is a translation layer
between the user-visible encoding and the one in wchar_t.
I'll let someone who knows more about this stuff continue this
conversation.
Thomas
From: Yuri <yuri@rawbw.com>
To: gnats-bugs@NetBSD.org, lib-bug-people@NetBSD.org, gnats-admin@NetBSD.org,
netbsd-bugs@NetBSD.org
Cc:
Subject: Re: lib/48427: libedit shouldn't require ISO 10646
Date: Sat, 07 Dec 2013 01:51:57 -0800
On 12/07/2013 01:45, Thomas Klausner wrote:
> I don't know much about this stuff, but I can easily imagine that
> wchar_t always contains UCS and that there is a translation layer
> between the user-visible encoding and the one in wchar_t.
Please read the Wikipedia article:
https://en.wikipedia.org/wiki/Wide_character
It discusses the definition of wchar_t in detail. There is no direct
relationship between wchar_t and UCS.
Yuri
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: lib/48427: libedit shouldn't require ISO 10646
Date: Sat, 7 Dec 2013 18:59:55 +0100
On Fri, Dec 06, 2013 at 09:15:00PM +0000, yuri@rawbw.com wrote:
> You limit the character set to UCS (ISO 10646) to make sure
> that the lower 127 code points are ASCII. Many character sets
> satisfy this condition, and UCS is just one of them. Other
> practical examples are KOI8-U and KOI8-R for Cyrillic, ISO/IEC 8859-15,
> and others for other languages.
Just like Thomas, I don't understand your point. The encoding used for
wchar_t is a fixed implementation detail and does not depend on any user
environment settings in NetBSD.
You may use any of the encodings you list above for multibyte character
sequences, but you will always get full 32-bit Unicode for wchar_t.
At least on NetBSD.
Martin
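To make the distinction between multibyte sequences and wchar_t values concrete, the following sketch decodes the first multibyte character of a string with the standard mbrtowc() function and returns the resulting wchar_t as a number. The helper name first_wide_value is hypothetical. Which value comes back for a non-ASCII character depends on the locale and on the platform's wchar_t encoding, which is exactly the point under discussion, so no particular result is claimed here.

```c
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper, for illustration only.
 * Decodes the first multibyte character of s according to the
 * current locale and returns its wchar_t value, or -1 if the input
 * is empty, incomplete, or invalid.
 */
static long
first_wide_value(const char *s)
{
	mbstate_t st;
	wchar_t wc;
	size_t n;

	memset(&st, 0, sizeof(st));	/* start in the initial shift state */
	n = mbrtowc(&wc, s, strlen(s), &st);
	if (n == (size_t)-1 || n == (size_t)-2)
		return -1;
	return (long)wc;
}
```

Calling this on the same non-ASCII input under, say, a KOI8-R locale and a UTF-8 locale on a given system shows whether that system normalizes wchar_t to one encoding or lets it follow the locale.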
From: christos@zoulas.com (Christos Zoulas)
To: gnats-bugs@NetBSD.org, lib-bug-people@netbsd.org,
gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, yuri@rawbw.com
Cc:
Subject: Re: lib/48427: libedit shouldn't require ISO 10646
Date: Sat, 7 Dec 2013 15:26:49 -0500
On Dec 7, 12:05am, wiz@NetBSD.org (Thomas Klausner) wrote:
-- Subject: Re: lib/48427: libedit shouldn't require ISO 10646
| Do I understand correctly that you're saying that what the user
| defines in the environment changes how wchar_t is defined?
Even so, you can add __FreeBSD__ to the list of OSes that skip the
check and call it a day...
christos
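A rough sketch of that workaround follows. The existing list of exempted platforms in chartype.h is not shown in this report, so the condition below is an assumption and only covers FreeBSD; the macro name EL_SKIP_ISO10646_CHECK is hypothetical.

```c
#include <wchar.h>	/* a libc header, so implementation macros are visible */

/* Hypothetical switch: platforms that are exempt from the check. */
#if defined(__FreeBSD__)
#define EL_SKIP_ISO10646_CHECK 1
#else
#define EL_SKIP_ISO10646_CHECK 0
#endif

/* Keep the original check everywhere else. */
#if !EL_SKIP_ISO10646_CHECK && !defined(__STDC_ISO_10646__)
#error wchar_t must store ISO 10646 characters
#endif
```

This only silences the compile-time error on the listed platform. It does not verify that the ASCII assumption mentioned in the original comment actually holds there.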
From: Joerg Sonnenberger <joerg@britannica.bec.de>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: lib/48427: libedit shouldn't require ISO 10646
Date: Sun, 8 Dec 2013 01:13:34 +0100
On Sat, Dec 07, 2013 at 06:00:01PM +0000, Martin Husemann wrote:
> You may use any of the encodings you list above for multibyte character
> sequences, but you will always get full 32-bit Unicode for wchar_t.
> At least on NetBSD.
No, you won't necessarily get that. That's why we don't set the macro
either. The internal encoding of wchar_t for a given locale is exactly
that -- an encoding detail. We do provide a few basic promises, but
that's about it.
Joerg
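For anyone who wants to see what a given platform promises, a small sketch like the one below reports whether the implementation defines __STDC_ISO_10646__. The helper name iso10646_version is hypothetical. When the macro is defined, the C standard specifies that it expands to an integer constant of the form yyyymmL.

```c
#include <wchar.h>	/* pull in a libc header so the macro, if any, is visible */

/*
 * Hypothetical helper, for illustration only.
 * Returns the value of __STDC_ISO_10646__ if the implementation
 * defines it, or 0 if it makes no such promise about wchar_t.
 */
static long
iso10646_version(void)
{
#ifdef __STDC_ISO_10646__
	return (long)__STDC_ISO_10646__;
#else
	return 0L;
#endif
}
```

A return value of 0 means only that the implementation does not advertise ISO 10646 for wchar_t; it says nothing about what a particular locale stores there.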
Copyright © 1994-2007
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.