NetBSD Problem Report #44603

From www@NetBSD.org  Fri Feb 18 23:01:54 2011
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
	by www.NetBSD.org (Postfix) with ESMTP id DA3D263B11D
	for <gnats-bugs@gnats.NetBSD.org>; Fri, 18 Feb 2011 23:01:53 +0000 (UTC)
Message-Id: <20110218230025.0ADB363B11D@www.NetBSD.org>
Date: Fri, 18 Feb 2011 23:00:25 +0000 (UTC)
From: steve.vernon@citrix.com
Reply-To: steve.vernon@citrix.com
To: gnats-bugs@NetBSD.org
Subject: editline el_gets drops many UTF-8 characters
X-Send-Pr-Version: www-1.0

>Number:         44603
>Category:       lib
>Synopsis:       editline el_gets drops many UTF-8 characters
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    lib-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri Feb 18 23:05:00 +0000 2011
>Originator:     Steven Vernon
>Release:        sources as of 2011/02/04
>Organization:
Citrix
>Environment:
>Description:
When using el_gets() in editline (which the readline emulation function readline() also calls), multi-byte characters are always dropped. This is incorrect for UTF-8, where every non-ASCII character is encoded as more than one byte.
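
To illustrate why dropping multi-byte input loses every non-ASCII character, here is a minimal sketch that classifies UTF-8 lead bytes by sequence length. The helper names utf8_seq_len(), utf8_count_multibyte() and utf8_self_check() are hypothetical and not part of editline; the byte ranges follow the UTF-8 encoding rules.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Length in bytes of the UTF-8 sequence introduced by lead byte b,
 * or 0 if b cannot start a sequence (continuation or invalid byte).
 * Hypothetical helper for illustration; not an editline function.
 */
static size_t
utf8_seq_len(unsigned char b)
{
	if (b < 0x80)
		return 1;	/* ASCII: the only single-byte case */
	if (b >= 0xC2 && b <= 0xDF)
		return 2;
	if (b >= 0xE0 && b <= 0xEF)
		return 3;
	if (b >= 0xF0 && b <= 0xF4)
		return 4;
	return 0;
}

/* Count the characters in s that need more than one byte. */
static size_t
utf8_count_multibyte(const char *s)
{
	size_t n = 0;

	while (*s != '\0') {
		size_t len = utf8_seq_len((unsigned char)*s);

		if (len == 0)
			len = 1;	/* step over a stray byte */
		if (len > 1)
			n++;
		/* Advance by len bytes, but never past the terminator. */
		for (size_t i = 0; i < len && *s != '\0'; i++)
			s++;
	}
	return n;
}

/* Small self-check: "h", e-acute (0xC3 0xA9), "llo". */
static void
utf8_self_check(void)
{
	assert(utf8_seq_len('h') == 1);
	assert(utf8_seq_len(0xC3) == 2);
	assert(utf8_count_multibyte("h\xC3\xA9llo") == 1);
}
```

Any input for which utf8_count_multibyte() returns a nonzero value contains characters that el_gets() would drop under the behavior described here.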
>How-To-Repeat:
Use either el_gets() or readline() from a library built for UTF-8 (build with WIDECHAR, which is the default) and set the locale to some UTF-8 variant, such as en_US.UTF-8 (e.g. by setting the environment variable LC_ALL to this).
>Fix:
el_gets() unconditionally sets IGNORE_EXTCHARS before calling el_wgets() (and then resets it after the call). This causes read_char() to drop multi-byte characters.

There are 2 possible solutions:
1) Only set IGNORE_EXTCHARS if CHARSET_IS_UTF8 is not set (and don't unset it after the call to el_wgets()), as is done in el_getc().
2) Have read_char() not honor IGNORE_EXTCHARS if CHARSET_IS_UTF8 is set. Offhand, this seems like the better, more correct solution, but it could affect more paths through the code. If you do this, you should probably remove the code in el_getc() that conditionally sets and unsets IGNORE_EXTCHARS.
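
The difference between the two options can be sketched as a small standalone model of the flag logic. This is not the editline source: the bit values given to CHARSET_IS_UTF8 and IGNORE_EXTCHARS are placeholders, and all function names below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder bit values, assumed for this model only. */
#define CHARSET_IS_UTF8	0x10u
#define IGNORE_EXTCHARS	0x20u

/* Behavior as reported: el_gets() always sets IGNORE_EXTCHARS. */
static unsigned
flags_reported(unsigned flags)
{
	return flags | IGNORE_EXTCHARS;
}

/* Option 1: set IGNORE_EXTCHARS only when the charset is not UTF-8. */
static unsigned
flags_option1(unsigned flags)
{
	if ((flags & CHARSET_IS_UTF8) == 0)
		flags |= IGNORE_EXTCHARS;
	return flags;
}

/* Model of a reader that honors IGNORE_EXTCHARS unconditionally. */
static bool
reader_drops(unsigned flags)
{
	return (flags & IGNORE_EXTCHARS) != 0;
}

/* Option 2: the reader ignores IGNORE_EXTCHARS when the charset is UTF-8. */
static bool
reader_drops_option2(unsigned flags)
{
	return (flags & IGNORE_EXTCHARS) != 0 &&
	    (flags & CHARSET_IS_UTF8) == 0;
}

/* Self-check covering the UTF-8 case under each variant. */
static void
check_model(void)
{
	/* Reported behavior: UTF-8 input is dropped. */
	assert(reader_drops(flags_reported(CHARSET_IS_UTF8)));
	/* Option 1: the caller no longer sets the flag for UTF-8. */
	assert(!reader_drops(flags_option1(CHARSET_IS_UTF8)));
	/* Option 2: the flag is set, but the reader does not honor it. */
	assert(!reader_drops_option2(flags_reported(CHARSET_IS_UTF8)));
}
```

In this model both options keep extended characters for UTF-8 and still drop them otherwise. Option 1 changes what the caller sets, while option 2 changes what the reader honors, which is why it would reach more code paths.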

More testing on UTF-8 should be done.
