NetBSD Problem Report #45221

From dholland@macaran.localdomain  Sun Aug  7 05:38:40 2011
Return-Path: <dholland@macaran.localdomain>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
	by www.NetBSD.org (Postfix) with ESMTP id 6EB9E63CB22
	for <gnats-bugs@gnats.NetBSD.org>; Sun,  7 Aug 2011 05:38:40 +0000 (UTC)
Message-Id: <20110807042127.5AF566E1D8@macaran.localdomain>
Date: Sun,  7 Aug 2011 00:21:27 -0400 (EDT)
From: dholland@eecs.harvard.edu
Reply-To: dholland@NetBSD.org
To: gnats-bugs@gnats.NetBSD.org
Subject: xterm utf-8 mode is (partially) a one-way trip
X-Send-Pr-Version: 3.95

>Number:         45221
>Category:       xsrc
>Synopsis:       xterm utf-8 mode is (partially) a one-way trip
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    xsrc-manager
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Aug 07 05:40:00 +0000 2011
>Last-Modified:  Tue Jan 31 22:34:31 +0000 2023
>Originator:     David A. Holland
>Release:        NetBSD 5.99.49 (pkgsrc from 20110730)
>Organization:
>Environment:
System: NetBSD macaran 5.99.49 NetBSD 5.99.49 (MACARAN) #8: Mon Apr 11 19:54:18 EDT 2011 dholland@macaran:/usr/src/sys/arch/amd64/compile/MACARAN amd64
Architecture: x86_64
Machine: amd64
>Description:

xterm-259 from pkgsrc X has the following problem:

If you
  - start it in non-utf-8 mode
  - switch to utf-8 mode with the ctrl-rightmouse menu
  - do some stuff
  - switch back to normal mode with the ctrl-rightmouse menu

then only output handling and not input handling switches back away
from utf-8. That is, if you print characters 128-255 from programs,
they're displayed as the matching iso-latin-1 glyphs; but if you paste
characters 128-255 into the xterm they are converted to utf-8 on the
way in and programs read them as multibyte sequences. This conversion
is apparently enabled when utf-8 mode is switched on, but not disabled
again when it's switched off.
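
To make the mismatch concrete, here is a minimal sketch that models the two halves of the terminal as independent encode and decode steps. It is an illustration only, not xterm's code: the function names and the idea of two separate flags are assumptions introduced here, and the test character U+00D8 is inferred from the d8 / c3 98 bytes in the transcripts of this report.

```python
GLYPH = "\u00d8"  # the character whose iso-latin-1 byte is 0xd8


def paste_bytes(text: str, utf8_input: bool) -> bytes:
    """Bytes a program would read for pasted text (hypothetical model)."""
    return text.encode("utf-8" if utf8_input else "latin-1")


def displayed(data: bytes, utf8_output: bool) -> str:
    """Characters shown when a program writes data back (hypothetical model)."""
    return data.decode("utf-8" if utf8_output else "latin-1")


# Input and output handling in the same mode: the glyph round-trips.
assert displayed(paste_bytes(GLYPH, False), False) == GLYPH
assert displayed(paste_bytes(GLYPH, True), True) == GLYPH

# The state described above: input handling still converts to utf-8,
# output handling is back to iso-latin-1.
stuck = paste_bytes(GLYPH, utf8_input=True)
print(stuck.hex(" "))  # c3 98 (bytes.hex with a separator needs Python 3.8+)
print(repr(displayed(stuck, utf8_output=False)))
```

In this model the mixed state turns the one pasted character into two bytes, which the iso-latin-1 side then shows as a different glyph followed by a C1 control character.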


>How-To-Repeat:

put a character in the range 128-255 in your cut buffer (the
transcripts below use the one whose iso-latin-1 byte is 0xd8), open a
fresh xterm in non-utf-8 mode, and paste into cat as follows:

	% cat | hexdump -C
	<correct glyph>
	00000000  d8 0a                                             |..|
	00000002

switch on utf-8 mode:

	% cat | hexdump -C
	<correct glyph>
	00000000  c3 98 0a                                          |...|
	00000003

switch off utf-8 mode:

	% cat | hexdump -C
	<wrong glyph>
	00000000  c3 98 0a                                          |...|
	00000003
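
The three transcripts above can also be checked mechanically. The following is a rough sketch, assuming the `hexdump -C` line layout shown here (offset, hex bytes, then the text column between `|` characters); the helper names are made up for illustration, and the encoding guess is only a heuristic that happens to be unambiguous for these inputs.

```python
def parse_hexdump_line(line: str) -> bytes:
    """Recover the bytes from one data line of `hexdump -C` output."""
    hex_part = line.split("|", 1)[0]   # drop the |...| text column
    tokens = hex_part.split()          # tokens[0] is the offset
    return bytes.fromhex("".join(tokens[1:]))


def guess_encoding(data: bytes) -> str:
    """Heuristic: valid utf-8 with high bytes counts as utf-8."""
    if all(b < 0x80 for b in data):
        return "ascii"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "latin-1"
    return "utf-8"


# Data lines copied from the three transcripts in this report.
TRANSCRIPTS = [
    ("fresh xterm", "00000000  d8 0a      |..|"),
    ("utf-8 mode on", "00000000  c3 98 0a   |...|"),
    ("utf-8 mode off again", "00000000  c3 98 0a   |...|"),
]

for state, line in TRANSCRIPTS:
    data = parse_hexdump_line(line)
    print(f"{state}: {data.hex(' ')} looks like {guess_encoding(data)}")
```

The third case is the problem: the xterm is nominally back in non-utf-8 mode, but the bytes delivered to cat are still the utf-8 form.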

>Fix:

	dunno, probably obvious if you know where to look, which I don't.

>Release-Note:

>Audit-Trail:
From: David Holland <dholland-pbugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: pkg/45221: xterm utf-8 mode is (partially) a one-way trip
Date: Sun, 18 Jul 2021 10:02:34 +0000

 On Sun, Aug 07, 2011 at 05:40:00AM +0000, dholland@eecs.harvard.edu wrote:
  > If you
  >   - start it in non-utf-8 mode
  >   - switch to utf-8 mode with the ctrl-rightmouse menu
  >   - do some stuff
  >   - switch back to normal mode with the ctrl-rightmouse menu
  > 
  > then only output handling and not input handling switches back away
  > from utf-8.

 This still happens 10 years later with the xterm in base X, not sure
 if the one in pkgsrc is different but I doubt it.

 -- 
 David A. Holland
 dholland@netbsd.org

From: David Holland <dholland-pbugs@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: pkg/45221: xterm utf-8 mode is (partially) a one-way trip
Date: Tue, 31 Jan 2023 22:26:13 +0000

 On Sun, Jul 18, 2021 at 10:05:01AM +0000, David Holland wrote:
  >  This still happens 10 years later with the xterm in base X, not sure
  >  if the one in pkgsrc is different but I doubt it.

 And it still happens, though the behavior might have changed a little.

 Open three xterms, starting them in non-utf-8 mode.

 In xterm 1, run
 	% echo foo | awk '{ printf "%c\n", 216 }'
 In xterm 2, switch to utf-8 mode with the ctrl-rightmouse menu
 ("UTF-8 Encoding") and run
 	% echo foo | awk '{ printf "%c%c\n", 195, 152 }'

 These should print the same glyph.

 Then in xterm 3:

 	% cat | hexdump -C
 	- select the glyph from xterm 1 (non-utf-8), paste
 	- it'll echo the correct glyph
 	- and you'll get "d8 0a" (the iso-latin-1 for the glyph
 	  and a newline)
 	- hit ^D
 	% cat | hexdump -C
 	- select the glyph from xterm 2 (utf-8), paste
 	- it'll echo the correct glyph
 	- and you'll get "d8 0a" (the iso-latin-1 for the glyph
 	  and a newline)
 	- hit ^D

 	- now switch this xterm to utf-8 mode with the ctrl-rightmouse menu
 	% cat | hexdump -C
 	- select the glyph from xterm 1 (non-utf-8), paste
 	- it'll echo the correct glyph
 	- and you'll get "c3 98 0a" (the utf-8 for the glyph
 	  and a newline)
 	- hit ^D
 	% cat | hexdump -C
 	- select the glyph from xterm 2 (utf-8), paste
 	- it'll echo the correct glyph
 	- and you'll get "c3 98 0a" (the utf-8 for the glyph
 	  and a newline)
 	- hit ^D

 	- now switch this xterm back out of utf-8 mode
 	% cat | hexdump -C
 	- select the glyph from xterm 1 (non-utf-8), paste
 	- it'll echo some other glyph
 	- and you'll get "c3 0a" (the wrong iso-latin-1
 	  and a newline)
 	- hit ^D
 	% cat | hexdump -C
 	- select the glyph from xterm 2 (utf-8), paste
 	- it'll echo the correct glyph
 	- and you'll get "c3 0a" (the wrong iso-latin-1
 	  and a newline)
 	- hit ^D

 	- if you switch back to utf-8 mode it'll paste correctly again
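
The results of the steps above can be tabulated against the two encodings one would expect for the glyph. This is only a sketch for comparison: the byte strings are copied from the steps, the glyph is taken to be U+00D8 (which matches the 216 and 195, 152 values used with awk), and the names are hypothetical.

```python
GLYPH = "\u00d8"

# What cat should read for the pasted glyph plus newline in each mode.
EXPECTED = {
    "latin-1": GLYPH.encode("latin-1") + b"\n",
    "utf-8": GLYPH.encode("utf-8") + b"\n",
}

# (state of xterm 3, source of the selection, bytes reported by hexdump)
OBSERVED = [
    ("non-utf-8, fresh", "xterm 1", "d8 0a"),
    ("non-utf-8, fresh", "xterm 2", "d8 0a"),
    ("utf-8", "xterm 1", "c3 98 0a"),
    ("utf-8", "xterm 2", "c3 98 0a"),
    ("non-utf-8, after switching back", "xterm 1", "c3 0a"),
    ("non-utf-8, after switching back", "xterm 2", "c3 0a"),
]


def matches(hex_bytes: str) -> list:
    """Names of the expected encodings that the observed bytes equal."""
    data = bytes.fromhex(hex_bytes)
    return [name for name, want in EXPECTED.items() if data == want]


for state, source, hex_bytes in OBSERVED:
    print(f"{state:32} {source}: {hex_bytes:9} {matches(hex_bytes) or 'neither'}")
```

The last two rows match neither encoding. The byte that does arrive, c3, is the first byte of the utf-8 form of the glyph; the steps above do not establish why the second byte goes missing.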

 Additional weirdness happens if you try to paste from the same xterm,
 which is possibly a different bug.

 	% xterm -version
 	XTerm(370)

 I'm going to change the PR from pkgsrc to xsrc since it is happening
 there and we possibly care more that way.

 -- 
 David A. Holland
 dholland@netbsd.org

Responsible-Changed-From-To: pkg-manager->xsrc-manager
Responsible-Changed-By: dholland@NetBSD.org
Responsible-Changed-When: Tue, 31 Jan 2023 22:34:31 +0000
Responsible-Changed-Why:
problem (also?) affects xsrc


>Unformatted:
