NetBSD Problem Report #44600

From www@NetBSD.org  Fri Feb 18 19:57:59 2011
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
	by www.NetBSD.org (Postfix) with ESMTP id 5B37A63B842
	for <gnats-bugs@gnats.NetBSD.org>; Fri, 18 Feb 2011 19:57:59 +0000 (UTC)
Message-Id: <20110218195758.56EA063B100@www.NetBSD.org>
Date: Fri, 18 Feb 2011 19:57:58 +0000 (UTC)
From: steve.vernon@citrix.com
Reply-To: steve.vernon@citrix.com
To: gnats-bugs@NetBSD.org
Subject: libedit does not properly handle UTF-8 when glyphs are multiple Unicode characters
X-Send-Pr-Version: www-1.0

>Number:         44600
>Category:       lib
>Synopsis:       libedit does not properly handle UTF-8 when glyphs are multiple Unicode characters
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    lib-bug-people
>State:          open
>Class:          change-request
>Submitter-Id:   net
>Arrival-Date:   Fri Feb 18 20:00:00 +0000 2011
>Originator:     Steven Vernon
>Release:        sources as of 2011/02/04
>Organization:
Citrix
>Environment:
>Description:
When using UTF-8, libedit assumes that one glyph (visible character) corresponds to one Unicode code point (character number) [and it reasonably assumes that one glyph takes up one column and one row]. Unfortunately, the first assumption does not always hold: some glyphs are not precomposed and are instead made up of several Unicode code points. Examples include European languages whose accented letters can be written in non-precomposed form (e.g. a French "e" with a circumflex accent written as the letter "e" followed by a separate combining circumflex, i.e. two different Unicode code points), and Indic scripts such as Devanagari, used for Hindi, where dependent vowel signs and the virama (which suppresses a consonant's inherent vowel) are separate code points that combine with the preceding consonant.

As a result, libedit neither deletes characters correctly nor updates the cursor position correctly.
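The mismatch can be illustrated with a small sketch, assuming a glyph is approximated as one base code point plus any combining code points that follow it. It decodes UTF-8 by hand and counts both code points and glyphs. The `is_combining()` table is hypothetical and deliberately incomplete (only Combining Diacritical Marks U+0300 through U+036F and the Devanagari vowel signs and virama U+093E through U+094D); it is not libedit code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, incomplete table: only two ranges of combining marks.
 * Real code would generate this from the Unicode Character Database. */
static int is_combining(uint32_t cp)
{
    return (cp >= 0x0300 && cp <= 0x036F)   /* Combining Diacritical Marks */
        || (cp >= 0x093E && cp <= 0x094D);  /* Devanagari vowel signs, virama */
}

/* Decode one UTF-8 sequence at s, store the code point in *cp and return
 * the number of bytes consumed. Assumes well-formed input. */
static size_t utf8_decode(const unsigned char *s, uint32_t *cp)
{
    if (s[0] < 0x80) {
        *cp = s[0];
        return 1;
    }
    if ((s[0] & 0xE0) == 0xC0) {
        *cp = ((uint32_t)(s[0] & 0x1F) << 6) | (s[1] & 0x3F);
        return 2;
    }
    if ((s[0] & 0xF0) == 0xE0) {
        *cp = ((uint32_t)(s[0] & 0x0F) << 12)
            | ((uint32_t)(s[1] & 0x3F) << 6) | (s[2] & 0x3F);
        return 3;
    }
    *cp = ((uint32_t)(s[0] & 0x07) << 18) | ((uint32_t)(s[1] & 0x3F) << 12)
        | ((uint32_t)(s[2] & 0x3F) << 6) | (s[3] & 0x3F);
    return 4;
}

/* Count code points and (approximate) glyphs in a NUL-terminated string. */
static void count_units(const char *str, size_t *codepoints, size_t *glyphs)
{
    const unsigned char *s = (const unsigned char *)str;
    uint32_t cp;

    assert(str != NULL && codepoints != NULL && glyphs != NULL);
    *codepoints = *glyphs = 0;
    while (*s != '\0') {
        s += utf8_decode(s, &cp);
        (*codepoints)++;
        /* A base character starts a new glyph; a combining mark joins the
         * previous one unless there is no previous glyph to join. */
        if (!is_combining(cp) || *glyphs == 0)
            (*glyphs)++;
    }
}
```

For the decomposed "e" + U+0302 this reports two code points but one glyph, which is exactly the case where moving or deleting one code point at a time lands in the middle of a glyph.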
>How-To-Repeat:
Enter text containing non-precomposed accents, Devanagari vowel signs or viramas, or similar combining characters. Then try backspacing over the text, moving the cursor left and right, deleting and inserting characters, and redisplaying the line after changes are made.

Beware that some character combinations also have precomposed versions that are assigned a single Unicode code point, such as the French "e" with circumflex accent mentioned above (U+00EA). These exist mainly for backward compatibility with older character sets such as Latin-1. When testing, make sure to enter the non-precomposed (decomposed) sequences.
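Because precomposed and decomposed input look identical on screen, it may help to generate the test strings rather than type them. The sketch below is a hypothetical helper (the `test_inputs` table and `print_test_inputs()` are illustrative names, not part of libedit); the byte sequences are the standard UTF-8 encodings of the listed code points. Its output can be pasted into any program that reads lines through libedit.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct test_case {
    const char *label;
    const char *utf8;
};

/* The first two entries render as the same glyph but are encoded
 * differently; only the decomposed one triggers the problem. */
static const struct test_case test_inputs[] = {
    /* U+00EA LATIN SMALL LETTER E WITH CIRCUMFLEX: one code point */
    { "precomposed e-circumflex", "\xC3\xAA" },
    /* U+0065 'e' + U+0302 COMBINING CIRCUMFLEX ACCENT: two code points */
    { "decomposed e-circumflex", "e\xCC\x82" },
    /* U+0915 DEVANAGARI LETTER KA + U+093F VOWEL SIGN I */
    { "Hindi ki", "\xE0\xA4\x95\xE0\xA4\xBF" },
    /* U+0915 KA + U+094D VIRAMA + U+0937 SSA (conjunct kssa) */
    { "Hindi kssa", "\xE0\xA4\x95\xE0\xA5\x8D\xE0\xA4\xB7" },
};

/* Print each test string followed by its bytes in hex. */
static void print_test_inputs(FILE *out)
{
    size_t i, j, n = sizeof(test_inputs) / sizeof(test_inputs[0]);

    assert(out != NULL);
    for (i = 0; i < n; i++) {
        const unsigned char *p = (const unsigned char *)test_inputs[i].utf8;

        fprintf(out, "%-26s %s  [", test_inputs[i].label, test_inputs[i].utf8);
        for (j = 0; p[j] != '\0'; j++)
            fprintf(out, "%s%02X", j ? " " : "", (unsigned)p[j]);
        fprintf(out, "]\n");
    }
}
```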
>Fix:
A fix will probably need to import Unicode character data that determines which characters are combining. In Unicode text, combining characters always follow the base character they apply to. See the Unicode web site.
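One possible direction is sketched below, assuming the line is held as an array of code points and that a glyph is a base character plus the combining marks after it. Cursor movement and backspace then step over whole glyphs instead of single code points. All function names are hypothetical, and `is_combining()` is a deliberately incomplete stand-in for table data generated from the Unicode Character Database; this is not how libedit is currently structured.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, incomplete stand-in for generated Unicode data. */
static int is_combining(uint32_t cp)
{
    return (cp >= 0x0300 && cp <= 0x036F)   /* Combining Diacritical Marks */
        || (cp >= 0x093E && cp <= 0x094D);  /* Devanagari vowel signs, virama */
}

/* Index where the glyph before pos starts: step back one code point, then
 * keep stepping back over combining marks, since they follow their base. */
static size_t prev_glyph_start(const uint32_t *buf, size_t pos)
{
    if (pos == 0)
        return 0;
    pos--;
    while (pos > 0 && is_combining(buf[pos]))
        pos--;
    return pos;
}

/* Index just past the glyph that starts at pos (cursor right). */
static size_t next_glyph_end(const uint32_t *buf, size_t len, size_t pos)
{
    if (pos >= len)
        return len;
    pos++;
    while (pos < len && is_combining(buf[pos]))
        pos++;
    return pos;
}

/* Backspace: remove the whole glyph before *cursor (base character plus
 * its combining marks) from buf, which holds *len code points. */
static void delete_prev_glyph(uint32_t *buf, size_t *len, size_t *cursor)
{
    size_t start;

    assert(*cursor <= *len);
    start = prev_glyph_start(buf, *cursor);
    memmove(buf + start, buf + *cursor, (*len - *cursor) * sizeof(buf[0]));
    *len -= *cursor - start;
    *cursor = start;
}
```

The redisplay code would likewise need to count columns per glyph rather than per code point. Note that this simple rule still splits a virama conjunct such as KA + VIRAMA + SSA into two units; handling those would need the fuller grapheme cluster rules of Unicode Standard Annex #29.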
