Display bugs with emacs and neovim #4094

batbone · 2021-10-04T08:00:08Z

Describe the bug
Some presumably uncommon characters cause display bugs in emacs and neovim, and this does not happen in most other terminal emulators I have tested.

First bug

Steps to reproduce the behavior:
1. command kitty --config=/dev/null
2. curl https://files.lilf.ir/tmp/weird.txt > weird.txt
3. emacs -Q -nw weird.txt
4. Editing the text in the middle immediately shows the corruption. To be precise, move point onto the visible char e in note-taking and press C-x = to report which char we are on. Instead of getting back e, we get SPC!
5. Exit emacs with C-x C-c
6. nvim weird.txt
7. Try deleting that e and typing A. The corruption is obvious:

At first, I thought this was an emacs bug, as vim, and previous versions of emacs did not exhibit this behavior. But after extensive discussion on the emacs bug tracker, we think this is probably a terminal emulator issue. I have tested this with Terminal.app, Alacritty, and iTerm, and only iTerm also exhibits this buggy behavior.

Second bug

command kitty --config=/dev/null
curl https://files.lilf.ir/tmp/bug.txt > bug.txt
Do cat bug.txt and note the output:

emacs -Q -nw bug.txt

Note the corruption; in particular, the line #+TITLE: sharif/contact info is not displayed at all.
Exit emacs
nvim bug.txt

This bug reproduces with emacs 27, emacs 28, and nvim, on Kitty, and not on iTerm, Alacritty, or Terminal.app. vim still works correctly though:

Environment details

nvim: 0.5.1
kitty: 0.23.1
TERM: xterm-kitty
macOS: 11.2.1
emacs: 28 (built this week from the master)
In GNU Emacs 28.0.50 (build 1, x86_64-apple-darwin20.3.0, NS appkit-2022.30
Version 11.2.1 (Build 20D75))
 of 2021-09-21 built on Fereidoons-MacBook-Pro.local
System Description:  macOS 11.2.1

Configured using:
 'configure --disable-dependency-tracking --disable-silent-rules
 --enable-locallisppath=/usr/local/share/emacs/site-lisp
 --infodir=/usr/local/Cellar/emacs-plus@28/28.0.50/share/info/emacs
 --prefix=/usr/local/Cellar/emacs-plus@28/28.0.50 --with-xml2
 --with-gnutls --with-native-compilation --without-dbus
 --with-imagemagick --with-modules --with-rsvg --with-xwidgets --with-ns
 --disable-ns-self-contained 'CFLAGS=-I/usr/local/opt/gcc/include
 -I/usr/local/opt/libgccjit/include -I/usr/local/opt/gmp/include
 -I/usr/local/opt/jpeg/include' 'LDFLAGS=-L/usr/local/lib/gcc/11
 -I/usr/local/opt/gcc/include -I/usr/local/opt/libgccjit/include
 -I/usr/local/opt/gmp/include -I/usr/local/opt/jpeg/include''

Configured features:
ACL GIF GLIB GMP GNUTLS IMAGEMAGICK JPEG JSON LCMS2 LIBXML2 MODULES
NATIVE_COMP NOTIFY KQUEUE NS PDUMPER PNG RSVG THREADS TIFF
TOOLKIT_SCROLL_BARS XIM XWIDGETS ZLIB

Important settings:
  value of $LC_ALL: en_US.UTF-8
  value of $LANG: en_US.UTF-8
  locale-coding-system: utf-8-unix

I have also reproduced the bugs on Ubuntu 20 via SSH, so it happens on both macOS and Linux, at least when Kitty runs from macOS.

I have attached the reproduction files here as well:


kovidgoyal · 2021-10-04T08:25:04Z

There is no bug here. These issues will happen when the program running
in the terminal has a different idea of the width (as number of terminal
cells) of a string than the terminal emulator. Many older programs use
the wcwidth() C library function, which is fundamentally broken, because
in modern unicode, you can't tell the width of strings a character at a
time, because of variation selectors, zero width joiners, etc. Therefore
they need to use wcswidth() instead to get the correct unicode width of
a string.

kitty uses a wcswidth() implementation auto-generated from the latest
unicode standard. I don't know what emacs and vim use, but the chances
are high that they are incorrect, not kitty. If you feel differently
post the unicode codepoints that you think kitty is getting the wrong
width for. You can easily test the width that kitty thinks a string
should be with:

kitty +runpy 'from kitty.fast_data_types import *; import sys; print(wcswidth(sys.argv[-1]))' foo

Replace foo above with the string you want to test.
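To illustrate the per-character versus per-string distinction made above, here is a minimal Python sketch. It is not kitty's implementation: `cell_width`, `naive_width`, and `string_width` are hypothetical names, the width rules are a rough approximation built on the standard `unicodedata` module, and the variation-selector rule is deliberately simplified (it widens any narrow base, whereas a real implementation would only do so for emoji-capable bases).

```python
import unicodedata

VS16 = "\ufe0f"  # VARIATION SELECTOR-16, requests emoji presentation


def cell_width(ch: str) -> int:
    """Rough per-codepoint width, in the spirit of wcwidth() (an approximation)."""
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0  # combining marks and format characters do not advance
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2  # wide and fullwidth characters take two cells
    return 1


def naive_width(s: str) -> int:
    """Sum of per-codepoint widths: cannot see variation sequences."""
    return sum(cell_width(ch) for ch in s)


def string_width(s: str) -> int:
    """String-level width: VS16 after a narrow base widens it to two cells.

    Simplification (assumption): applies to any narrow base character.
    """
    total = 0
    prev = 0  # width of the most recent non-zero-width character
    for ch in s:
        if ch == VS16 and prev == 1:
            total += 1
            prev = 2
            continue
        w = cell_width(ch)
        total += w
        if w:
            prev = w
    return total


bullet = "\u25ab\ufe0f"  # WHITE SMALL SQUARE + VARIATION SELECTOR-16
print(naive_width(bullet), string_width(bullet))
```

A program that lays out text with the first function and a terminal that renders with the second will disagree by one cell for every such bullet, which is the kind of mismatch described in this comment.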

batbone · 2021-10-04T08:34:34Z

@kovidgoyal Can you keep the issue open? I have no expertise on this, I just provided the reproduction guide, but the emacs people also thought it's not 'an emacs bug,' so it seems the bug is a bit complicated, and maybe someone who knows more about both programs will come along and contribute the root of the problem. (I am not saying this is a Kitty bug, but that it being open can attract the needed attention.)

PS: Is the second bug also related to getting the wrong width? The corruption there is very extensive, so naively I thought it might be something else.

Thanks.

kovidgoyal · 2021-10-04T08:38:33Z

I'm afraid I don't keep bugs open for things I don't think are bugs, but you or anyone else is welcome to comment further on this bug and I will respond. As for the second issue, that is RTL text, which is totally broken on terminals and terminal applications in general, see #2109

batbone · 2021-10-04T09:11:45Z

@kovidgoyal The second bug is not related to the RTL text. Here is a reproduction using LTR text:

See how there are whole lines skipped, and how nvim's modeline is all wrong.

bug_ltr.txt

kovidgoyal · 2021-10-04T09:29:39Z

The fact that it renders correctly with cat already means the bug is in nvim. And it looks fine in vim for me

batbone · 2021-10-04T10:00:04Z

@kovidgoyal Exactly! But emacs has this exact same bug, and the bug only appears on Kitty, and not Terminal.app, Alacritty, or iTerm. So there is an interaction between something in nvim/emacs and kitty that is causing this bug.

BTW, I am noticing some other nondeterministic bug after playing with these files for a bit, where the whole terminal session will go faulty (any TUI command I run will behave weirdly), and doing exec zsh, reset, clear do not solve the problem. I can't reproduce this reliably though, unfortunately.

kovidgoyal · 2021-10-04T10:14:40Z

On Mon, Oct 04, 2021 at 03:00:14AM -0700, batbone wrote: @kovidgoyal Exactly! But emacs has this exact same bug, and the bug only appears on Kitty, and not Terminal.app, Alacritty, or iTerm. So there is an interaction between something in nvim/emacs and kitty that is causing this bug.

Yes, that file contains a variation selector U+FE0F. The bullets in your list are U+25AB followed by U+FE0F. This combination has width two and is correctly being rendered in kitty as width 2; you can see that by moving your cursor over it, it becomes a fat square. So as I said there is no bug in kitty. What emacs' problem is, only emacs developers can tell you.
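For anyone who wants to check which codepoints a suspicious string actually contains, a small Python sketch using only the standard `unicodedata` module follows. The helper name `describe` is hypothetical and exists only for this illustration.

```python
import unicodedata


def describe(s: str):
    """Return (codepoint, Unicode name) pairs for every character in s."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in s]


# The bullet sequence discussed above.
for codepoint, name in describe("\u25ab\ufe0f"):
    print(codepoint, name)
# U+25AB WHITE SMALL SQUARE
# U+FE0F VARIATION SELECTOR-16
```

Running the same helper over a line copied from the reproduction file would show whether a variation selector follows each bullet.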

batbone · 2021-10-04T10:30:48Z

On Mon, Oct 04, 2021 at 03:00:14AM -0700, batbone wrote: @kovidgoyal Exactly! But emacs has this exact same bug, and the bug only appears on Kitty, and not Terminal.app, Alacritty, or iTerm. So there is an interaction between something in nvim/emacs and kitty that is causing this bug.
Yes, that file contains a variation selector U+FE0F. The bullets in your list are U+25AB followed by U+FE0F this combination has width two and is correctly being rendered in kitty as width 2, you can see that by moving your cursor over it, it becomes a fat square. So as I said there is no bug in kitty. What emacs' problem is, only emacs developers can tell you.

Sorry, I should have made two different bug reports, to avoid the confusion here. (If you want, I can open a separate bug report for the second bug now?) I think your comment pertains to the bug with weird.txt?

The second bug (bug_ltr.txt) makes nvim and emacs skip whole lines, and have corrupt UIs.

Here is a screenshot of the correct vim:

Here is a screenshot of the incorrect neovim:

Oh. Trying to take the screenshot of emacs, emacs actually displays bug_ltr.txt correctly:

But it has the same bug with the old, RTL version bug.txt (Note that I am not referring to the RTL text itself being all wrong, but that the line #+TITLE: sharif/contact info is not displayed, and emacs' top menu is also hidden):

All the incorrect behavior reported in this comment is exclusive to Kitty. Here is a screenshot of bug_ltr.txt in neovim on Alacritty:

Do you think this is a problem with getting the width of characters wrong? (Considering it seems more of a height issue.)

kovidgoyal · 2021-10-04T10:43:20Z

On Mon, Oct 04, 2021 at 03:30:59AM -0700, batbone wrote: Do you think this is a problem with getting the width of characters wrong? (Considering it seems more of a height issue.)

I have no idea, I am not a vim or emacs developer. Once width calculations go wrong, everything can go wrong, since the UI is strings of text of ostensibly known width. The suspicious thing is the variation selector, and I know lots of terminals don't support variation selectors. So it's not surprising that those two bugs cancel out: the terminal not supporting it and the editor not supporting it. Indeed recently there was a whole conversation with some nutcase arguing that because many terminals currently don't support it, it should never be supported and I should drop support for it from kitty. And as I showed in my screenshot, I get no missing lines, and no UI corruption in vim (version 8.2.3441) in kitty.

batbone · 2021-10-04T11:07:02Z

On Mon, Oct 04, 2021 at 03:30:59AM -0700, batbone wrote: Do you think this is a problem with getting the width of characters wrong? (Considering it seems more of a height issue.)
I have no idea, I am not a vim or emacs developer. Once width calculations go wrong, everything can go wrong, since the UI is strings of text of ostensibly known width. The suspicious thing is the variation selector, and I know lots of terminals don't support variation selectors. So it's not surprising that those two bugs cancel out. The terminal not supporting it and the editor not supporting it. Indeed recently there was a whole conversation with some nutcase arguing that because many terminals currently dont support it it should never be supported and I should drop support for it from kitty. And as I showed in my screenshot, I get no missing lines, and no UI corruption in vim (version 8.2.3441) in kitty.

@kovidgoyal Can Kitty add an option to disable support for these variation selectors? It's great that you're modernizing terminals, and I agree that the support should be on by default, but having a compatibility mode would help us get through the transitionary period.

It can also help us pinpoint the bug, as it might not be the variation selectors after all.

kovidgoyal · 2021-10-04T11:11:17Z

On Mon, Oct 04, 2021 at 04:07:12AM -0700, batbone wrote: @kovidgoyal Can Kitty add an option to disable support for these variation selectors? It's great that you're modernizing terminals, and I agree that the support should be on by default, but having a compatibility mode would help us get through the transitionary period. It can also help us pinpoint the bug, as it might not be the variation selectors after all.

It's not worth the effort to me, sorry. As for pinpointing the bug just strip those chars from the file and see if the bug goes away.

batbone · 2021-10-04T11:49:47Z

Removing the variation selectors indeed solves the bug with bug.txt:

Hopefully, emacs and neovim maintainers can solve this issue.

PS: I used this code to remove the variation selectors:

package main

import (
	"fmt"
	"io"
	"log"
	"os"
	"unicode"
)

// Copy stdin to stdout, dropping every Unicode variation selector.
func main() {
	inBytes, err := io.ReadAll(os.Stdin)
	if err != nil {
		log.Fatalln(err)
	}

	output := make([]rune, 0, len(inBytes))
	for _, r := range string(inBytes) {
		// unicode.Variation_Selector includes U+FE00..U+FE0F, among others.
		if !unicode.In(r, unicode.Variation_Selector) {
			output = append(output, r)
		}
	}

	fmt.Print(string(output))
}

Eli-Zaretskii · 2021-10-04T12:39:16Z

Regarding the issue with character width: Emacs uses character width tables computed from the latest Unicode Standard version 14.0.0, using the data in the file EastAsianWidth.txt. In that text, the U+00AD SOFT HYPHEN character, which caused the problems in your file, has the East Asian Width property value of A, which stands for "Ambiguous". The definition of this value in the Unicode Standard Annex 11 (UAX#11) is as follows:

   East Asian Ambiguous (A): All characters that can be sometimes wide and sometimes narrow. Ambiguous characters require additional information not contained in the character code to further resolve their width. Ambiguous characters occur in East Asian legacy character sets as wide characters, but as narrow (i.e., normal-width) characters in non-East Asian usage.

And since the file you show didn't have any East Asian legacy characters, treating SOFT HYPHEN as narrow is IMO correct.

kovidgoyal · 2021-10-04T12:47:29Z

On Mon, Oct 04, 2021 at 05:39:27AM -0700, Eli-Zaretskii wrote: Regarding the issue with character width: Emacs uses character width tables computed from the latest Unicode Standard version 14.0.0, using the data in the file EastAsianWidth.txt. In that text, the U+00AD SOFT HYPHEN character, which caused the problems in your file, has the East Asian Width property value of A, which stands for "Ambiguous". The definition of this value in the Unicode Standard Annex 11 (UAX#11) is as follows: East Asian Ambiguous (A): All characters that can be sometimes wide and sometimes narrow. Ambiguous characters require additional information not contained in the character code to further resolve their width. Ambiguous characters occur in East Asian legacy character sets as wide characters, but as narrow (i.e., normal-width) characters in non-East Asian usage. And since the file you show didn't have any East Asian legacy characters, treating SOFT HYPHEN as narrow is IMO correct.

A soft hyphen is not rendered at all, unless at a line break (optionally). So the correct width value for it is zero. Otherwise you would need to take screen geometry into account when computing widths, which is undesirable for many reasons. Not to mention that editors can have margins that the terminal emulator knows nothing about. So the editor and terminal emulator may not even agree about the line break locations (think for instance of multiple panes in vim or emacs or even popup windows). Therefore, the only correct value for soft hyphen width is zero. And note that EastAsianWidth.txt alone is not sufficient for wcswidth(). It does not cover emoji, variation selectors, zero width joiners, etc.
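The two properties being argued over here can be inspected with Python's standard `unicodedata` module. This is only a lookup sketch: it shows the data each side is pointing at and does not settle which property a terminal or editor should use for width.

```python
import unicodedata

SOFT_HYPHEN = "\u00ad"

# East Asian Width, the property cited from EastAsianWidth.txt.
print(unicodedata.east_asian_width(SOFT_HYPHEN))  # A (Ambiguous)

# General category: Cf marks a format character.
print(unicodedata.category(SOFT_HYPHEN))  # Cf

print(unicodedata.name(SOFT_HYPHEN))  # SOFT HYPHEN
```

A width table driven only by the first property treats the character as narrow (one cell), while a renderer that treats format characters as non-advancing gives it zero cells. That one-cell difference per soft hyphen matches the cursor offset reported for weird.txt.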

Eli-Zaretskii · 2021-10-04T13:21:21Z

A soft hyphen is not rendered at all, unless at a line break

So this is the root cause of the problem in this case, AFAIU: Kitty assumes that the SOFT HYPHEN will not be output in the middle of a line, but Emacs does output it. It has nothing to do with the width tables.

And note that EastAsianWidth.txt alone is not sufficient for wcswidth(). It does not cover emoji, variation selectors, zero width joiners, etc.

Which is one reason why Emacs doesn't use wcwidth, it uses the Unicode data directly, and accounts for character compositions. But this all is beside the point, as long as Kitty assumes the editor will not output certain characters.

kovidgoyal · 2021-10-04T13:27:37Z

On Mon, Oct 04, 2021 at 06:21:31AM -0700, Eli-Zaretskii wrote: > A soft hyphen is not rendered at all, unless at a line break So this is the root cause of the problem in this case, AFAIU: Kitty assumes that the SOFT HYPHEN will not be output in the middle of a line, but Emacs does output it. It has nothing to do with the width tables.

No, kitty does not care about the presence of soft hyphens. It counts them as zero width and does not render them. Which is the correct behavior. The problem comes from emacs incorrectly counting them as width 1.

batbone · 2021-10-04T13:42:24Z

I think I found a workaround for the SOFT HYPHEN issue (i.e., weird.txt) in emacs:

(set-char-table-range glyphless-char-display
                        (char-from-name "SOFT HYPHEN") 'zero-width)

We can detect Kitty by checking for the env var KITTY_WINDOW_ID, and run this workaround:

(defun kitty-p ()
  (let ((kitty-window-id (getenv "KITTY_WINDOW_ID")))
    (and kitty-window-id
         (not (string= kitty-window-id "")))))

(when (kitty-p)
  (set-char-table-range glyphless-char-display
                        (char-from-name "SOFT HYPHEN") 'zero-width))

So only the issue with the unicode variation selectors remains.

Eli-Zaretskii · 2021-10-04T13:43:30Z

No, kitty does not care about the presence of soft hyphens. It counts them as zero width and does not render them.

OK, but the result is the same: Kitty assumes something that is not shared by the editor.

Which is the correct behavior. The problem comes from emacs incorrectly counting them as width 1.

I respectfully disagree. Whether to display zero-width characters is up to the "higher-level protocols", and Emacs traditionally doesn't remove anything from the display. For example, ZWNJ, if it doesn't combine with surrounding text, is displayed as a single-pixel space on GUI displays, and as a regular space on text-mode displays, such as Kitty.

So even if SOFT HYPHEN were a zero-width character (which it isn't, see the citation from the Unicode Annex), a terminal should not assume that an editor will share its ideas about text layout.

Does Kitty have a setting that can change this behavior? If so, the OP could try using it. If there's no such setting, and the Kitty developers think Kitty behaves correctly I can only conclude that Emacs and Kitty currently cannot work together, and there's no reason to continue this discussion.

Eli-Zaretskii · 2021-10-04T14:04:28Z

I think I found a workaround for the SOFT HYPHEN issue (i.e., weird.txt) in emacs:

If you don't mind not seeing that character on display and having trouble deleting it, then fine, you can use this solution for your customizations.

batbone · 2021-10-04T14:07:20Z

Turning off auto-composition-mode in emacs indeed solves the problem with bug.txt:

But it makes the bullets ugly. Terminal.app behaves correctly with auto-composition-mode enabled, and shows beautiful bullets:

I think this shows that Terminal.app also supports unicode variation characters, and the issue is specifically with a non-industry-standard assumption on Kitty's part?

Eli-Zaretskii · 2021-10-04T14:18:39Z

But it makes the bullets ugly.

Well, "ugly" is better than "messed-up", I'd say. Don't you agree?

the issue is specifically with a non-industry-standard assumption on Kitty's part?

Let's say the issue is different assumptions in Kitty and in Emacs. Specifically, Emacs by default relies on the text-mode terminal to perform the necessary character shaping required by sequences such as this one.

kovidgoyal · 2021-10-04T14:31:10Z

On Mon, Oct 04, 2021 at 06:43:41AM -0700, Eli-Zaretskii wrote: > No, kitty does not care about the presence of soft hyphens. It counts them as zero width and does not render them. OK, but the result is the same: Kitty assumes something that is not shared by the editor.

Indeed, but kitty's assumption is correct, the editor's is not.

> Which is the correct behavior. The problem comes from emacs incorrectly counting them as width 1. I respectfully disagree. Whether to display zero-width characters is up to the "higher-level protocols", and Emacs traditionally doesn't remove anything from the display. For example, ZWNJ, if it doesn't combine with surrounding text, is displayed as a single-pixel space on GUI displays, and as a regular space on text-mode displays, such as Kitty. So even if SOFT HYPHEN were a zero-width character (which it isn't, see the citation from the Unicode Annex), a terminal should not assume that an editor will share its ideas about text layout. Does Kitty have a setting that can change this behavior? If so, the OP could try using it. If there's no such setting, and the Kitty developers think Kitty behaves correctly I can only conclude that Emacs and Kitty currently cannot work together, and there's no reason to continue this discussion.

Thumbs up from me. Fix your incorrect assumptions about soft-hyphen. Until then there is nothing to be done here.

kovidgoyal · 2021-10-04T14:40:27Z

Let's say the issue is different assumptions in Kitty and in Emacs. Specifically, Emacs by default relies on the text-mode terminal to perform the necessary character shaping required by sequences such as this one.

No, emacs relies on text mode terminals not doing the necessary character shaping.

Eli-Zaretskii · 2021-10-04T14:53:44Z

No emacs relies on text mode terminals not doing the necessary character shaping.

You misunderstood.

kovidgoyal · 2021-10-04T14:55:01Z

On Mon, Oct 04, 2021 at 07:53:54AM -0700, Eli-Zaretskii wrote: > No emacs relies on text mode terminals not doing the necessary character shaping. You misunderstood.

If you say so.

batbone · 2021-10-04T15:27:44Z

Please, this is just a software issue. There is no need to get adversarial over it. There will always be broken assumptions around complex pieces of software interfacing with each other.

Mr. Goyal is a bit infamous for their, ahem, inflammatory behavior, but they are just one person maintaining and developing a lot of important, complex pieces of FOSS software. And they respond promptly and to every issue, and even help users on sites like mobileread. If they try to give every issue the consideration that, say, emacs can afford to give its filed issues, I fear it might not be sustainable for them. Being opinionated is somewhat of a necessary evil when the needed manpower is not present.

Anyhow, sorry @kovidgoyal if I sound patronizing or anything. Thank you for the work you have done and continue to do.

We have found the cause of the first bug, but I think it's still not clear what is causing the unicode variation selector bug. What is different about Kitty and Terminal.app that is tripping up emacs?

Why is this not breaking vim as well? vim has nice bullets on kitty:

After the breaking point is identified, workarounds can be discussed.

kovidgoyal · 2021-10-04T15:36:04Z

On Mon, Oct 04, 2021 at 08:27:54AM -0700, batbone wrote: Please, this is just a software issue. There is no need to get adversarial over it. There will always be broken assumptions around complex pieces of software interfacing with each other. Mr. Goyal is a bit infamous for their, ahem, inflammatory behavior, but they are just one person maintaining and developing a lot of important, complex pieces of FOSS software. And they respond promptly and to every issue, and even help users on sites like mobileread. If they try to give every issue the consideration that, say, emacs can afford to give its filed issues, I fear it might not be sustainable for them. Being opinionated is somewhat of a necessary evil when the needed manpower is not present. Anyhow, sorry @kovidgoyal if I sound patronizing or anything. Thank you for the work you have done and continue to do.

No worries.

--- We have found the cause of the first bug, but I think it's still not clear what is causing the unicode variation selector bug. What is different about Kitty and Terminal.app that is tripping up emacs?

The character pair U+25ab and U+FE0F must be rendered in two cells, as U+FE0F converts U+25ab from *text presentation* to *emoji presentation*. And emoji in terminals are rendered at width two. kitty does this, Terminal.app does not. emacs assumes it must be rendered in one cell.

Eli-Zaretskii · 2021-10-04T16:20:32Z

I think I found a workaround for the SOFT HYPHEN issue

Here's a potentially better workaround:

        (or standard-display-table
            (setq standard-display-table (make-display-table)))
        (aset standard-display-table
              #xAD (vector (make-glyph-code ?- 'escape-glyph)))

This will display SOFT HYPHEN as the ASCII dash character -, but with a special typeface that will make it stand out.

batbone · 2021-10-04T17:28:10Z

@Eli-Zaretskii Thanks, that was exactly what I was thinking would be best, but I did not know how to do it. Does emacs 27 do this? I see dashes with emacs 27 without running this code.

I think Mr. Goyal is right about emacs being in the wrong on bug.txt. Emojis indeed usually take a width of two, but emacs somehow thinks the bullet emoji is taking a width of one:

Compare with an emoji that emacs recognizes:

Pressing <right>:

Eli-Zaretskii · 2021-10-04T17:40:03Z

Emojis indeed usually take a width of two, but emacs somehow thinks the bullet emoji is taking a width of one

Most terminals produce a single-column glyph for these sequences. If Emacs would go with Kitty, it would fail to work on all the other terminals, since there's no way for it to know which terminal does what with each composable sequence of codepoints.
At least AFAIK; if someone knows how to ask the terminal about its behavior in those cases, I'm all ears.

Eli-Zaretskii · 2021-10-04T17:41:50Z

Thanks, that was exactly what I was thinking would be best, but I did not know how to do it. Does emacs 27 do this? I see dashes with emacs 27 without running this code

Sorry, I don't understand: you see dashes for what text?

batbone · 2021-10-04T21:49:10Z

Emojis indeed usually take a width of two, but emacs somehow thinks the bullet emoji is taking a width of one

Most terminals produce a single-column glyph for these sequences. If Emacs would go with Kitty, it would fail to work on all the other terminals, since there's no way for it to know which terminal does what with each composable sequence of codepoints. At least AFAIK; if someone knows how to ask the terminal about its behavior in those cases, I'm all ears.

Can't we just configure this using some elisp? Then we can reuse the hack I did with looking for the env var KITTY_WINDOW_ID.

Sorry, I don't understand: you see dashes for what text?

Emacs 27 shows SOFT HYPHEN as a dash on Kitty:

It looks similar to the result I get on emacs 28 after running your workaround.

kovidgoyal · 2021-10-05T00:58:40Z

On Mon, Oct 04, 2021 at 10:40:14AM -0700, Eli-Zaretskii wrote: > Emojis indeed usually take a width of two, but emacs somehow thinks the bullet emoji is taking a width of one Most terminals produce a single-column glyph for these sequences. If Emacs would go with Kitty, it would fail to work on all the other terminals, since there's no way for it to know which terminal does what with each composable sequence of codepoints. At least AFAIK; if someone knows how to ask the terminal about its behavior in those cases, I'm all ears.

You home the cursor, print out your characters of interest, query the terminal for its cursor position. That will tell you what width the terminal thinks the string should be. And if needed, I am happy to implement a dedicated escape code to query character widths.

My goal with kitty is to move this ecosystem forward. Supporting Unicode as well as can be done in the paradigm of fixed size cells we have, is an important part of that goal. While it is true that many legacy terminals don't support variation selectors correctly, that is not a reason to do the wrong thing forever. The fact is that in Unicode, a variation selector can change the nature of the preceding code point. All other text processing software supports this, there is no reason terminals should not.

Well technically, VS15 (U+FE0E) in particular is problematic, because it can reduce the width of the preceding codepoint, which can cause side effects if the preceding code point is at a screen boundary, but this is an issue that needs to be addressed separately by terminal developers to arrive at some standard for how to behave in this case, and is irrelevant to VS16 (U+FE0F), which is under discussion here.
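The cursor-position probe suggested at the start of this comment can be sketched in Python as follows. This is an illustration under several assumptions, not code from kitty or Emacs: it assumes a Unix tty, a terminal that answers the standard Device Status Report request `ESC [ 6 n` with `ESC [ row ; col R`, and a test string short enough not to wrap. The function names `parse_cpr` and `query_width` are hypothetical.

```python
import os
import re
import sys
import termios
import tty

# Cursor Position Report reply: ESC [ <row> ; <col> R (both 1-based).
CPR_RE = re.compile(rb"\x1b\[(\d+);(\d+)R")


def parse_cpr(reply: bytes):
    """Extract (row, col) from a Cursor Position Report reply."""
    match = CPR_RE.search(reply)
    if match is None:
        raise ValueError(f"not a cursor position report: {reply!r}")
    return int(match.group(1)), int(match.group(2))


def query_width(text: str) -> int:
    """Ask the terminal how many cells it advanced after printing text.

    Must be run with stdin and stdout attached to a terminal.
    """
    in_fd = sys.stdin.fileno()
    out_fd = sys.stdout.fileno()
    saved = termios.tcgetattr(in_fd)
    try:
        tty.setcbreak(in_fd)  # no echo, no line buffering
        # Go to column 1, print the text, then request the cursor position.
        os.write(out_fd, b"\r" + text.encode("utf-8") + b"\x1b[6n")
        reply = b""
        while not reply.endswith(b"R"):
            chunk = os.read(in_fd, 1)
            if not chunk:
                break
            reply += chunk
        os.write(out_fd, b"\r\x1b[K")  # erase the probe text
    finally:
        termios.tcsetattr(in_fd, termios.TCSADRAIN, saved)
    _, col = parse_cpr(reply)
    return col - 1  # column is 1-based, so the advance is col - 1
```

Calling `query_width("\u25ab\ufe0f")` from an interactive session would report the advance the current terminal uses for the bullet sequence. A program could run a handful of such probes once at startup and cache the answers rather than querying during redisplay.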

Eli-Zaretskii · 2021-10-05T13:10:57Z

Emacs 27 shows SOFT HYPHEN as a dash on Kitty:

Yes, we changed the behavior in Emacs 28, to avoid interfering with line-wrapping under visual-line-mode. Previously, Emacs would wrap lines on NBSP, because it was displayed as an ASCII space.

Eli-Zaretskii · 2021-10-05T13:24:45Z

You home the cursor, print out your characters of interest, query the terminal for its cursor position.

That'd significantly slow down text-mode display, especially if it goes via the network (which is a large portion of use cases where Emacs is used on text terminals). Currently, we just fwrite the encoded text to the device, we don't write it one character at a time. So I'd rather we didn't do that.

Emacs wants total control on the text layout, leaving the terminal as dumb as possible. For example, if you want correct RTL display in Emacs, you need to disable bidirectional reordering by the terminal. So what you suggest is against the design of Emacs in so many ways I cannot even begin explaining how major a change that would be, even if someone will be willing to pay the price of slower redisplay.

We could perhaps allow per-terminal customization of the character-width data, if we want to support terminals that deviate from the Unicode East-Asian width (as in the case of SOFT HYPHEN). But someone will have to provide the data, although for FOSS that just means to look in the sources. And even then, if terminals start having their own ideas of text layout, width data will not be enough...

kovidgoyal · 2021-10-05T13:43:26Z

On Tue, Oct 05, 2021 at 06:24:56AM -0700, Eli-Zaretskii wrote: > You home the cursor, print out your characters of interest, query the terminal for its cursor position. That'd significantly slow down text-mode display, especially if it goes via the network (which is a large portion of use cases where Emacs is used on text terminals). Currently, we just `fwrite` the encoded text to the device, we don't write it one character at a time. So I'd rather we didn't do that.

You do it once, at startup, along with all the other escape codes you use for detection. So it adds nothing to startup time. Of course it could be that emacs is doing no terminal feature detection at all, in which case, I encourage you to start. Pick a set of characters you think will be different over different terminals and get their widths.

Emacs wants total control on the text layout, leaving the terminal as dumb as possible. For example, if you want correct RTL display in Emacs, you need to disable bidirectional reordering by the terminal. So what you suggest is against the design of Emacs in so many ways I cannot even begin explaining how major a change that would be, even if someone will be willing to pay the price of slower redisplay.

What the width of a character should be is neither dumb nor smart, it just is. You need to decide what width to use, one value is correct, another is not. There certainly are some characters where it is not obvious what the correct answer is, VS16 and the soft hyphen are not in that set.

We could perhaps allow per-terminal customization of the character-width data, if we want to support terminals that deviate from the Unicode East-Asian width (as in the case of SOFT HYPHEN). But someone will have to provide the data, although for FOSS that just means to look in the sources. And even then, if terminals start having their own ideas of text layout, width data will not be enough...

Unicode East Asian Width does not determine the width of a soft hyphen. Soft hyphens have nothing to do with East Asian text. Again, you can choose to use either a wrong width or a correct width or a queried width. Two of those options are a lot better than the third. It's fairly insane that we are even discussing the right way to display a soft hyphen. The answer is obvious. You don't display it. It has zero width. I've already explained why. In any case I have spent enough time on this, it's your editor, you do what you like with it. I just hope you make the sensible choice. Good luck.

kovidgoyal · 2021-10-06T09:51:20Z

Oh and just for completeness: Here is a FAQ entry from the unicode consortium, that elucidates how soft hyphens must be rendered. https://www.unicode.org/faq/unsup_char.html

All default-ignorable characters should be rendered as completely invisible (and non advancing, i.e. "zero width"), if not explicitly supported in rendering. These include:

cursive joiners (U+200C ZWNJ, U+200D ZWJ)

bidirectional format controls (e.g. U+200E LEFT-TO-RIGHT MARK)

the soft hyphen (U+00AD SOFT HYPHEN)

word joiners (U+2060 WORD JOINER, also U+FEFF ZWNBSP)

the zero width space (U+200B ZERO WIDTH SPACE)

invisible math operators (e.g., U+2061 FUNCTION APPLICATION)

Jamo filler characters (e.g., U+115F HANGUL CHOSEONG FILLER)

variation selectors

More technically, all characters with the "Default Ignorable Code Point (DI)" property must be rendered as zero width, non-advancing.
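As a rough illustration of the FAQ list above, the following Python sketch checks characters against only the examples the FAQ names, plus the two main variation selector blocks. It is deliberately incomplete: the full Default_Ignorable_Code_Point property lives in the Unicode data file DerivedCoreProperties.txt and is not exposed by Python's standard library. The names `FAQ_EXAMPLES`, `in_faq_list`, and `strip_listed` are hypothetical.

```python
# Codepoints named explicitly in the FAQ excerpt above (not the full property).
FAQ_EXAMPLES = {
    0x200C,  # ZERO WIDTH NON-JOINER
    0x200D,  # ZERO WIDTH JOINER
    0x200E,  # LEFT-TO-RIGHT MARK
    0x00AD,  # SOFT HYPHEN
    0x2060,  # WORD JOINER
    0xFEFF,  # ZERO WIDTH NO-BREAK SPACE
    0x200B,  # ZERO WIDTH SPACE
    0x2061,  # FUNCTION APPLICATION
    0x115F,  # HANGUL CHOSEONG FILLER
}


def is_variation_selector(cp: int) -> bool:
    """True for VS1..VS16 and the supplementary variation selectors."""
    return 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF


def in_faq_list(ch: str) -> bool:
    """True if ch is one of the FAQ's examples or a variation selector."""
    cp = ord(ch)
    return cp in FAQ_EXAMPLES or is_variation_selector(cp)


def strip_listed(s: str) -> str:
    """Drop the listed characters, leaving what such a renderer would draw."""
    return "".join(ch for ch in s if not in_faq_list(ch))


print(strip_listed("note\u00adtaking"))  # notetaking
```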

batbone added the bug label Oct 4, 2021

batbone mentioned this issue Oct 4, 2021

[Possible regression from vim] display bugs neovim/neovim#15898

Closed

kovidgoyal closed this as completed Oct 4, 2021

Repository owner locked as resolved and limited conversation to collaborators Oct 5, 2021
