NetBSD Problem Report #47454
From fair@clock.org Tue Jan 15 21:18:42 2013
Return-Path: <fair@clock.org>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
by www.NetBSD.org (Postfix) with ESMTP id 6C8C063D7B3
for <gnats-bugs@gnats.NetBSD.org>; Tue, 15 Jan 2013 21:18:42 +0000 (UTC)
Message-Id: <20130115211839.B427E15ECB9@cesium.clock.org>
Date: Tue, 15 Jan 2013 13:18:39 -0800 (PST)
From: fair@netbsd.org
Reply-To: fair@netbsd.org
To: gnats-bugs@gnats.NetBSD.org
Subject: terminfo(5) does not have a capability for terminal/display character set
X-Send-Pr-Version: 3.95
>Number: 47454
>Category: standards
>Synopsis: terminfo(5) does not have a capability for terminal/display character set
>Confidential: no
>Severity: serious
>Priority: high
>Responsible: standards-manager
>State: open
>Class: change-request
>Submitter-Id: net
>Arrival-Date: Tue Jan 15 21:20:01 +0000 2013
>Last-Modified: Sat Jul 13 16:15:01 +0000 2013
>Originator: Erik E. Fair
>Release: NetBSD 6.0_STABLE
>Organization:
The NetBSD Project
>Environment:
System: NetBSD cesium.clock.org 6.0_STABLE NetBSD 6.0_STABLE (V240) #4: Sat Dec 1 19:39:37 PST 2012 root@rubidium.clock.org:/var/obj/sys/arch/sparc64/compile/V240 sparc64
Architecture: sparc64
Machine: sparc64
>Description:
One key piece of information from the terminal display capabilities
of terminfo(5) is missing: a given terminal's character set.
Given UNIX's origins in the USA, we've had American Standard Code for
Information Interchange (ASCII, a.k.a. US-ASCII) as the system default
assumption since its creation, but now NetBSD (Unix's successor) is
being used in many countries with multiple character sets (e.g.
ISO-8859-1, ISO-2022-JP, KOI8-R, ISO-10646 (UTF-8)), and in
multi-lingual text processing applications where international
capability in the base system is required.
The POSIX locale LANG environment variable doesn't quite do it, and
can be in conflict with what the user's terminal can actually display,
whether that be an xterm(1) (or something that emulates it like MacOS
X's "Terminal" application), or wsdisplay(4). Programs that handle
multiple labelled character sets (e.g. less(1), Mail User Agents:
Mail, pine, elm, nmh, etc.) and need to match or convert character sets
for display need this information to prevent violation of the
"Principle of Least Astonishment."
At minimum, anything that sets the TERM environment variable ought to
also set LANG as appropriate to the capabilities of the display.
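As an illustration of the conflict described above, the following is a minimal sketch (not part of the original report) of how a program might compare the codeset named by the POSIX locale variables with the character set of the display. It assumes the terminal's character set is obtained from somewhere else and passed in as a string, since terminfo(5) has no such capability; the function names `locale_codeset` and `same_charset` are hypothetical.

```python
import codecs


def locale_codeset(env):
    """Return the codeset part of the effective POSIX locale, or None."""
    # POSIX precedence: LC_ALL overrides LC_CTYPE, which overrides LANG.
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var)
        if value:
            break
    else:
        return None
    # Locale names look like language[_territory][.codeset][@modifier].
    value = value.split("@", 1)[0]
    if "." not in value:
        return None  # e.g. "C" or "POSIX": no explicit codeset
    return value.split(".", 1)[1]


def same_charset(a, b):
    """Compare two charset names after normalising aliases (utf8 vs UTF-8)."""
    try:
        return codecs.lookup(a).name == codecs.lookup(b).name
    except LookupError:
        # At least one name is unknown to Python's codec registry.
        return False
```

A caller could pass `os.environ` as `env` and warn when `same_charset()` reports a mismatch between the locale codeset and whatever the display is believed to use.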
>How-To-Repeat:
<code/input/activities to reproduce the problem (multiple lines)>
>Fix:
>Audit-Trail:
From: "Valeriy E. Ushakov" <uwe@stderr.spb.ru>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: standards/47454: terminfo(5) does not have a capability for terminal/display character set
Date: Wed, 16 Jan 2013 17:28:37 +0400
On Tue, Jan 15, 2013 at 21:20:01 +0000, fair@netbsd.org wrote:
> One key piece of information from the terminal display capabilities
> of terminfo(5) is missing: a given terminal's character set.
Except that you don't and can't know this generally. E.g. VT200
supports downloadable fonts, so what exactly is its character set?
How can the system tell whether I'm using a KOI8-R font or a latin1 font?
> The POSIX locale LANG environment variable doesn't quite do it, and
> can be in conflict with what the user's terminal can actually display,
Just like TERM can be in conflict with the user's terminal.
-uwe
From: "Erik E. Fair" <fair@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: standards-manager@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: standards/47454: terminfo(5) does not have a capability for terminal/display character set
Date: Wed, 16 Jan 2013 09:25:43 -0800
Unix has to know what your terminal can do, a priori, for those programs which
attempt to manipulate it in any way (e.g. vi, emacs, clear, less; i.e. anything
linked with terminfo(3) or curses(3), hell, any program that #includes
<termios.h> or uses the TIOC* ioctl(2) system calls) to succeed. The failure
mode caused by a mismatch between what Unix thinks your terminal is or can do
from the TERM environment variable (sometimes set from /etc/ttys or provided
by remote login programs like ssh) is old and well known/understood: "this
doesn't look right."
The same reasoning extends to character set display capability. We're lucky in that ASCII
is the base assumption of Unix, and that ASCII is also a proper subset of a
large number of character sets (e.g. ISO-8859-1, ISO-2022-JP, UTF-8). You're
really going to lose very badly if the character set your terminal uses does
not have ASCII as a subset - given how common ASCII is, *everything* has to
be converted (e.g. run through iconv(1)) before display, i.e. you very probably
can't just "cat a file" [to the tty] unless that file is in your terminal's
character set.
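The conversion step mentioned above (what iconv(1) does on the command line) can be sketched as follows. This is only an illustration under the assumption that both the file's character set and the terminal's character set are already known; the function name `convert_for_display` is hypothetical.

```python
def convert_for_display(data, file_charset, terminal_charset):
    """Re-encode bytes from the file's charset into the terminal's charset.

    Bytes that are invalid in file_charset become U+FFFD, and characters
    the terminal charset cannot represent are replaced with "?".
    """
    text = data.decode(file_charset, errors="replace")
    return text.encode(terminal_charset, errors="replace")
```

When the data is pure ASCII and both character sets contain ASCII as a subset, the bytes come out unchanged, which is why "cat a file" usually works without conversion for ASCII text.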
The implication for terminals described by terminfo which have downloadable
fonts is that there will have to be terminal names that are a tuple of what
it is and the current character set (e.g. "vt200-koi8-r"), and every time a
different character set is downloaded, the TERM environment variable must
change for programs to be able to do the right thing. You're still stuck with
this situation now: you still have to change the LANG environment variable
when the terminal character set is changed.
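The tuple naming suggested above could be taken apart by a program roughly as follows. This is a sketch only: the "name-charset" convention is the one proposed in this message (e.g. "vt200-koi8-r"), no such terminfo entries are assumed to exist, and `KNOWN_CHARSETS` and `split_term` are hypothetical names. Because charset names themselves contain hyphens, the sketch matches against a list of known suffixes instead of splitting on the last "-".

```python
# Hypothetical list of charset suffixes; the names are taken from this report.
KNOWN_CHARSETS = ("iso-8859-1", "iso-2022-jp", "koi8-r", "utf-8")


def split_term(term):
    """Split a TERM value such as "vt200-koi8-r" into (terminal, charset).

    Returns (term, None) when no known charset suffix is present.
    """
    lowered = term.lower()
    for charset in KNOWN_CHARSETS:
        suffix = "-" + charset
        # Require a non-empty terminal name in front of the suffix.
        if lowered.endswith(suffix) and len(term) > len(suffix):
            return term[:-len(suffix)], charset
    return term, None
```

Existing names that merely contain hyphens, such as "xterm-256color", are returned unchanged with no charset.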
What I'm trying to argue is that character set is a capability or
characteristic of the terminal (interface) one uses to Unix, and therefore
terminfo (or termcap) is the database in which we describe such things.
Semantically, LANG is similar but not the same, in that its intention is to
describe (in part) what language, and with the other locale variables, what
cultural assumptions you have (e.g. sort(1) ordering of characters, commas
instead of periods for denoting the end of the integer part of a number and
the beginning of the decimal fraction, ordering the components of a date).
Erik <fair@netbsd.org>
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: standards/47454: terminfo(5) does not have a capability for terminal/display character set
Date: Sat, 13 Jul 2013 16:10:11 +0000
On Wed, Jan 16, 2013 at 05:30:13PM +0000, Erik E. Fair wrote:
> [...]
wait, you're asking for i18n that works?
:-/
--
David A. Holland
dholland@netbsd.org
>Unformatted:
$NetBSD: query-full-pr,v 1.39 2013/11/01 18:47:49 spz Exp $
$NetBSD: gnats_config.sh,v 1.8 2006/05/07 09:23:38 tsutsui Exp $
Copyright © 1994-2007
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.