NetBSD Problem Report #47454

From fair@clock.org  Tue Jan 15 21:18:42 2013
Return-Path: <fair@clock.org>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
	by www.NetBSD.org (Postfix) with ESMTP id 6C8C063D7B3
	for <gnats-bugs@gnats.NetBSD.org>; Tue, 15 Jan 2013 21:18:42 +0000 (UTC)
Message-Id: <20130115211839.B427E15ECB9@cesium.clock.org>
Date: Tue, 15 Jan 2013 13:18:39 -0800 (PST)
From: fair@netbsd.org
Reply-To: fair@netbsd.org
To: gnats-bugs@gnats.NetBSD.org
Subject: terminfo(5) does not have a capability for terminal/display character set
X-Send-Pr-Version: 3.95

>Number:         47454
>Category:       standards
>Synopsis:       terminfo(5) does not have a capability for terminal/display character set
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    standards-manager
>State:          open
>Class:          change-request
>Submitter-Id:   net
>Arrival-Date:   Tue Jan 15 21:20:01 +0000 2013
>Last-Modified:  Sat Jul 13 16:15:01 +0000 2013
>Originator:     Erik E. Fair
>Release:        NetBSD 6.0_STABLE
>Organization:
	The NetBSD Project
>Environment:


System: NetBSD cesium.clock.org 6.0_STABLE NetBSD 6.0_STABLE (V240) #4: Sat Dec 1 19:39:37 PST 2012 root@rubidium.clock.org:/var/obj/sys/arch/sparc64/compile/V240 sparc64
Architecture: sparc64
Machine: sparc64
>Description:
	One key piece of information from the terminal display capabilities
	of terminfo(5) is missing: a given terminal's character set.

	Given UNIX's origins in the USA, we've had American Standard Code for
	Information Interchange (ASCII, a.k.a. US-ASCII) as the system default
	assumption since its creation, but now NetBSD (Unix's successor) is
	being used in many countries with multiple character sets (e.g.
	ISO-8859-1, ISO-2022-JP, KOI8-R, ISO-10646 (UTF-8)), and in
	multi-lingual text processing applications where international
	capability in the base system is required.

	The POSIX locale LANG environment variable doesn't quite do it, and
	can be in conflict with what the user's terminal can actually display,
	whether that be an xterm(1) (or something that emulates it like MacOS
	X's "Terminal" application), or wsdisplay(4). Programs that handle
	multiple labelled character sets (e.g. less(1) and Mail User Agents
	such as Mail, pine, elm, nmh, etc.) and need to match or convert
	character sets for display need this information to prevent violation
	of the "Principle of Least Astonishment."

	At minimum, anything that sets the TERM environment variable ought to
	also set LANG as appropriate to the capabilities of the display.
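
	As a rough illustration of the current situation, a program can only
	infer the display character set from the locale variables, while TERM
	selects the terminfo(5) entry independently. The sketch below assumes
	the usual POSIX precedence (LC_ALL, then LC_CTYPE, then LANG) and the
	language[_territory][.codeset][@modifier] locale name form; the
	function names locale_codeset and describe_session are hypothetical.

```python
def locale_codeset(env):
    """Return the codeset named by the POSIX locale variables, or None."""
    # POSIX precedence for the character-type category: LC_ALL, LC_CTYPE, LANG.
    # An empty value counts as unset.
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name, "")
        if value:
            break
    else:
        return None
    # Locale names look like language[_territory][.codeset][@modifier].
    value = value.split("@", 1)[0]
    if "." not in value:
        return None  # e.g. "C" or "en_US": no codeset stated
    return value.split(".", 1)[1]


def describe_session(env):
    """Summarise what a program can learn about the display from its environment."""
    return {
        "term": env.get("TERM"),         # selects the terminfo(5) entry
        "codeset": locale_codeset(env),  # comes from the locale, not from terminfo
    }
```

	Nothing ties the two values together: TERM=vt200 says nothing about
	which codeset the display is actually showing.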

>How-To-Repeat:
	<code/input/activities to reproduce the problem (multiple lines)>
>Fix:


>Audit-Trail:
From: "Valeriy E. Ushakov" <uwe@stderr.spb.ru>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: standards/47454: terminfo(5) does not have a capability for terminal/display character set
Date: Wed, 16 Jan 2013 17:28:37 +0400

 On Tue, Jan 15, 2013 at 21:20:01 +0000, fair@netbsd.org wrote:

 > One key piece of information from the terminal display capabilities
 > of terminfo(5) is missing: a given terminal's character set.

 Except that you don't and can't know this generally.  E.g. VT200
 supports downloadable fonts, so what exactly is its character set?
 How can the system tell whether I'm using a KOI8-R font or a latin1 font?


 > The POSIX locale LANG environment variable doesn't quite do it, and
 > can be in conflict with what the user's terminal can actually display,

 Just like TERM can be in conflict with the user's terminal.


 -uwe

From: "Erik E. Fair" <fair@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: standards-manager@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: standards/47454: terminfo(5) does not have a capability for terminal/display character set
Date: Wed, 16 Jan 2013 09:25:43 -0800

 Unix has to know what your terminal can do, a priori, for those programs which
 attempt to manipulate it in any way (e.g. vi, emacs, clear, less; i.e. anything
 linked with terminfo(3) or curses(3), hell, any program that #includes
 <termios.h> or uses the TIOC* ioctl(2) system calls) to succeed. The failure
 mode caused by a mismatch between your actual terminal and what Unix thinks it
 is or can do from the TERM environment variable (sometimes set from /etc/ttys
 or provided by remote login programs like ssh) is old and well understood:
 "this doesn't look right."

 The same applies to character set display capability. We're lucky in that ASCII
 is the base assumption of Unix, and that ASCII is also a proper subset of a
 large number of character sets (e.g. ISO-8859-1, ISO-2022-JP, UTF-8). You're
 really going to lose very badly if the character set your terminal uses does
 not have ASCII as a subset: given how common ASCII is, *everything* has to
 be converted (e.g. run through iconv(1)) before display, i.e. you very probably
 can't just "cat a file" [to the tty] unless that file is in your terminal's
 character set.
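
 The conversion step described above might look roughly like this. It is
 only a sketch using Python's built-in codecs rather than iconv(1) itself,
 and the function name to_terminal_bytes is hypothetical; it assumes both
 character set names are already known to the caller.

```python
def to_terminal_bytes(data, file_charset, terminal_charset):
    """Re-encode file bytes for a terminal that uses a different character set."""
    text = data.decode(file_charset)
    # Characters the terminal's set cannot represent become "?" instead of raising.
    return text.encode(terminal_charset, errors="replace")
```

 Pure ASCII input passes through unchanged whenever both character sets
 contain ASCII as a subset, which is why the mismatch often goes unnoticed.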

 The implication for terminals described by terminfo which have downloadable
 fonts is that there will have to be terminal names that are a tuple of what
 it is and the current character set (e.g. "vt200-koi8-r"), and every time a
 different character set is downloaded, the TERM environment variable must
 change for programs to be able to do the right thing. You're still stuck with
 this situation now: you still have to change the LANG environment variable
 when the terminal character set is changed.
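
 A sketch of how a program might take such a tuple name apart. The
 "<terminal>-<charset>" naming is only the convention suggested above, not
 anything terminfo defines today, and KNOWN_CHARSETS and split_term are
 hypothetical names. A list of known suffixes is assumed because ordinary
 terminal names already contain hyphens.

```python
# Hypothetical list of character set suffixes a tuple-style TERM might carry.
KNOWN_CHARSETS = ("koi8-r", "iso-8859-1", "iso-2022-jp", "utf-8")


def split_term(term):
    """Split a tuple-style TERM value into (terminal, charset or None)."""
    lowered = term.lower()
    for charset in KNOWN_CHARSETS:
        suffix = "-" + charset
        # Require a non-empty terminal name in front of the suffix.
        if lowered.endswith(suffix) and len(term) > len(suffix):
            return term[: -len(suffix)], charset
    return term, None
```

 Names without a recognised suffix, such as "xterm-256color", come back
 unchanged with no character set.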

 What I'm trying to argue is that character set is a capability or
 characteristic of the terminal (interface) one uses to Unix, and therefore
 terminfo (or termcap) is the database in which we describe such things.

 Semantically, LANG is similar but not the same, in that its intention is to
 describe (in part) what language, and with the other locale variables, what
 cultural assumptions you have (e.g. sort(1) ordering of characters, commas
 instead of periods for denoting the end of the integer part of a number and
 the beginning of the decimal fraction, ordering the components of a date).

 	Erik <fair@netbsd.org>

From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: standards/47454: terminfo(5) does not have a capability for
 terminal/display character set
Date: Sat, 13 Jul 2013 16:10:11 +0000

 On Wed, Jan 16, 2013 at 05:30:13PM +0000, Erik E. Fair wrote:
  > [...]

 wait, you're asking for i18n that works?

 :-/

 -- 
 David A. Holland
 dholland@netbsd.org

>Unformatted:

$NetBSD: query-full-pr,v 1.39 2013/11/01 18:47:49 spz Exp $
$NetBSD: gnats_config.sh,v 1.8 2006/05/07 09:23:38 tsutsui Exp $
Copyright © 1994-2007 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.