Display bugs with emacs and neovim #4094

Closed
batbone opened this issue Oct 4, 2021 · 37 comments

@batbone

batbone commented Oct 4, 2021

Describe the bug
Some presumably uncommon characters cause display bugs in emacs and neovim, and this does not happen in most other terminal emulators I have tested.

First bug

Steps to reproduce the behavior:
1. command kitty --config=/dev/null
2. curl https://files.lilf.ir/tmp/weird.txt > weird.txt
3. emacs -Q -nw weird.txt
4. Editing the text in the middle immediately shows the corruption. To be precise, move the cursor onto the visible character e in note-taking and press C-x = to report which character we are on. Instead of getting back e, we get SPC!
image

5. Exit emacs with C-x C-c
6. nvim weird.txt
7. Try deleting that e and type A. The corruption is obvious:

image

At first, I thought this was an emacs bug, as vim and previous versions of emacs did not exhibit this behavior. But after extensive discussion on the emacs bug tracker, we think this is probably a terminal emulator issue. I have tested this with Terminal.app, Alacritty, and iTerm, and of those only iTerm also exhibits this buggy behavior.

Second bug

  1. command kitty --config=/dev/null
  2. curl https://files.lilf.ir/tmp/bug.txt > bug.txt
  3. Do cat bug.txt and note the output:

image

  4. emacs -Q -nw bug.txt

image

  5. Note the corruption; in particular, the line #+TITLE: sharif/contact info is not displayed at all.
  6. Exit emacs
  7. nvim bug.txt

image

This bug reproduces with emacs 27, emacs 28, and nvim, on Kitty, and not on iTerm, Alacritty, or Terminal.app. vim still works correctly though:
image

Environment details

nvim: 0.5.1
kitty: 0.23.1
TERM: xterm-kitty
macOS: 11.2.1
emacs: 28 (built this week from the master)
In GNU Emacs 28.0.50 (build 1, x86_64-apple-darwin20.3.0, NS appkit-2022.30
Version 11.2.1 (Build 20D75))
 of 2021-09-21 built on Fereidoons-MacBook-Pro.local
System Description:  macOS 11.2.1

Configured using:
 'configure --disable-dependency-tracking --disable-silent-rules
 --enable-locallisppath=/usr/local/share/emacs/site-lisp
 --infodir=/usr/local/Cellar/emacs-plus@28/28.0.50/share/info/emacs
 --prefix=/usr/local/Cellar/emacs-plus@28/28.0.50 --with-xml2
 --with-gnutls --with-native-compilation --without-dbus
 --with-imagemagick --with-modules --with-rsvg --with-xwidgets --with-ns
 --disable-ns-self-contained 'CFLAGS=-I/usr/local/opt/gcc/include
 -I/usr/local/opt/libgccjit/include -I/usr/local/opt/gmp/include
 -I/usr/local/opt/jpeg/include' 'LDFLAGS=-L/usr/local/lib/gcc/11
 -I/usr/local/opt/gcc/include -I/usr/local/opt/libgccjit/include
 -I/usr/local/opt/gmp/include -I/usr/local/opt/jpeg/include''

Configured features:
ACL GIF GLIB GMP GNUTLS IMAGEMAGICK JPEG JSON LCMS2 LIBXML2 MODULES
NATIVE_COMP NOTIFY KQUEUE NS PDUMPER PNG RSVG THREADS TIFF
TOOLKIT_SCROLL_BARS XIM XWIDGETS ZLIB

Important settings:
  value of $LC_ALL: en_US.UTF-8
  value of $LANG: en_US.UTF-8
  locale-coding-system: utf-8-unix

I have also reproduced the bugs on Ubuntu 20 via SSH, so it happens on both macOS and Linux, at least when Kitty runs from macOS.

I have attached the reproduction files here as well:

@kovidgoyal
Owner

There is no bug here. These issues happen when the program running
in the terminal has a different idea of the width (as number of terminal
cells) of a string than the terminal emulator. Many older programs use
the wcwidth() C library function, which is fundamentally broken, because
in modern Unicode you can't tell the width of a string one character at a
time, because of variation selectors, zero width joiners, etc. Therefore
they need to use wcswidth() instead to get the correct Unicode width of
a string.

kitty uses a wcswidth() implementation auto-generated from the latest
Unicode standard. I don't know what emacs and vim use, but the chances
are high that they are incorrect, not kitty. If you feel differently,
post the Unicode codepoints that you think kitty is getting the wrong
width for. You can easily test the width that kitty thinks a string
should be with:

kitty +runpy 'from kitty.fast_data_types import *; import sys; print(wcswidth(sys.argv[-1]))' foo

Replace foo above with the string you want to test.
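To illustrate the difference between measuring one codepoint at a time and measuring a whole sequence, here is a minimal Python sketch. It is not kitty's implementation and not a full wcwidth(): `cell_width`, `per_char_width`, and `sequence_width` are hypothetical helpers, the width rules are deliberately simplified, and the only sequence rule modelled is "VARIATION SELECTOR-16 widens the preceding one-cell character to two cells", which is an assumption based on the behavior described in this thread.

```python
import unicodedata

VS16 = "\ufe0f"  # VARIATION SELECTOR-16, requests emoji presentation


def cell_width(ch: str) -> int:
    """Width of one codepoint in isolation (simplified, not a full wcwidth)."""
    if unicodedata.category(ch) in ("Mn", "Me"):
        return 0  # nonspacing/enclosing marks, which includes variation selectors
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2  # East Asian Wide / Fullwidth
    return 1


def per_char_width(text: str) -> int:
    """Sum the widths of codepoints one at a time, wcwidth()-style."""
    return sum(cell_width(ch) for ch in text)


def sequence_width(text: str) -> int:
    """Like per_char_width, but VS16 widens a preceding 1-cell codepoint to 2 cells.

    This single rule is an assumption for illustration only.
    """
    total = 0
    prev = 0  # width assigned to the previous spacing codepoint
    for ch in text:
        if ch == VS16:
            if prev == 1:
                total += 1
                prev = 2
            continue
        w = cell_width(ch)
        total += w
        if w:
            prev = w
    return total


bullet = "\u25ab\ufe0f"  # WHITE SMALL SQUARE followed by VS16
print(per_char_width(bullet), sequence_width(bullet))
```

The two functions agree on plain ASCII and disagree on the bullet sequence, which is the kind of mismatch between program and terminal that the comment above describes.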

@batbone
Author

batbone commented Oct 4, 2021

@kovidgoyal Can you keep the issue open? I have no expertise on this, I just provided the reproduction guide, but the emacs people also thought it's not 'an emacs bug,' so it seems the bug is a bit complicated, and maybe someone who knows more about both programs will come along and contribute the root of the problem. (I am not saying this is a Kitty bug, but that it being open can attract the needed attention.)

PS: Is the second bug also related to getting the wrong width? The corruption there is very extensive, so naively I thought it might be something else.

Thanks.

@kovidgoyal
Owner

I'm afraid I don't keep bugs open for things I don't think are bugs, but you or anyone else is welcome to comment further on this bug and I will respond. As for the second issue, that is RTL text, which is totally broken on terminals and terminal applications in general; see #2109

@batbone
Author

batbone commented Oct 4, 2021

@kovidgoyal The second bug is not related to the RTL text. Here is a reproduction using LTR text:

image

image

See how there are whole lines skipped, and how nvim's modeline is all wrong.

@kovidgoyal
Owner

The fact that it renders correctly with cat already means the bug is in nvim. And it looks fine in vim for me:
image

@batbone
Author

batbone commented Oct 4, 2021

@kovidgoyal Exactly! But emacs has this exact same bug, and the bug only appears on Kitty, and not Terminal.app, Alacritty, or iTerm. So there is an interaction between something in nvim/emacs and kitty that is causing this bug.

BTW, after playing with these files for a bit, I am noticing another, nondeterministic bug where the whole terminal session goes faulty (any TUI command I run behaves weirdly), and running exec zsh, reset, or clear does not solve the problem. Unfortunately, I can't reproduce this reliably.

@kovidgoyal
Owner

kovidgoyal commented Oct 4, 2021 via email

@batbone
Author

batbone commented Oct 4, 2021

On Mon, Oct 04, 2021 at 03:00:14AM -0700, batbone wrote: @kovidgoyal Exactly! But emacs has this exact same bug, and the bug only appears on Kitty, and not Terminal.app, Alacritty, or iTerm. So there is an interaction between something in nvim/emacs and kitty that is causing this bug.
Yes, that file contains a variation selector, U+FE0F. The bullets in your list are U+25AB followed by U+FE0F. This combination has width two and is correctly being rendered in kitty as width 2; you can see that by moving your cursor over it, where it becomes a fat square. So as I said, there is no bug in kitty. What emacs' problem is, only emacs developers can tell you.

Sorry, I should have made two different bug reports, to avoid the confusion here. (If you want, I can open a separate bug report for the second bug now?) I think your comment pertains to the bug with weird.txt?

The second bug (bug_ltr.txt) makes nvim and emacs skip whole lines, and have corrupt UIs.

Here is a screenshot of the correct vim:
image

Here is a screenshot of the incorrect neovim:
image

Oh. Trying to take the screenshot of emacs, emacs actually displays bug_ltr.txt correctly:
image

But it has the same bug with the old, RTL version bug.txt (Note that I am not referring to the RTL text itself being all wrong, but that the line #+TITLE: sharif/contact info is not displayed, and emacs' top menu is also hidden):
image


All the incorrect behavior reported in this comment is exclusive to Kitty. Here is a screenshot of bug_ltr.txt in neovim on Alacritty:

image


Do you think this is a problem with getting the width of characters wrong? (Considering it seems more of a height issue.)

@kovidgoyal
Owner

kovidgoyal commented Oct 4, 2021 via email

@batbone
Author

batbone commented Oct 4, 2021

On Mon, Oct 04, 2021 at 03:30:59AM -0700, batbone wrote: Do you think this is a problem with getting the width of characters wrong? (Considering it seems more of a height issue.)
I have no idea, I am not a vim or emacs developer. Once width calculations go wrong, everything can go wrong, since the UI is strings of text of ostensibly known width. The suspicious thing is the variation selector, and I know lots of terminals don't support variation selectors. So it's not surprising that those two bugs cancel out: the terminal not supporting it and the editor not supporting it. Indeed, recently there was a whole conversation with some nutcase arguing that because many terminals currently don't support it, it should never be supported and I should drop support for it from kitty. And as I showed in my screenshot, I get no missing lines and no UI corruption in vim (version 8.2.3441) in kitty.

@kovidgoyal Can Kitty add an option to disable support for these variation selectors? It's great that you're modernizing terminals, and I agree that the support should be on by default, but having a compatibility mode would help us get through the transitional period.

It can also help us pinpoint the bug, as it might not be the variation selectors after all.

@kovidgoyal
Owner

kovidgoyal commented Oct 4, 2021 via email

@batbone
Author

batbone commented Oct 4, 2021

Removing the variation selectors indeed solves the bug with bug.txt:
image

Hopefully, emacs and neovim maintainers can solve this issue.

PS: I used this code to remove the variation selectors:

package main

import (
	"fmt"
	"io"
	"log"
	"os"
	"unicode"
)

// Reads text from stdin and writes it to stdout with every Unicode
// variation selector removed.
func main() {
	// io.ReadAll replaces the deprecated ioutil.ReadAll (Go 1.16+).
	inBytes, err := io.ReadAll(os.Stdin)
	if err != nil {
		log.Fatalln(err)
	}

	output := make([]rune, 0, len(inBytes))
	for _, r := range string(inBytes) {
		// Keep every rune that is not a variation selector.
		if !unicode.In(r, unicode.Variation_Selector) {
			output = append(output, r)
		}
	}

	fmt.Print(string(output))
}
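Before stripping anything, it can help to see exactly which codepoints a suspicious string contains, for example to post them as requested earlier in this thread. The following is a small Python sketch for that; `describe` is a hypothetical helper name, and the names it prints come from the Unicode database bundled with the Python interpreter.

```python
import unicodedata


def describe(text: str) -> list[str]:
    """Return one 'U+XXXX NAME' entry per codepoint in text."""
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"
        for ch in text
    ]


# The bullet sequence discussed in this thread, then a soft hyphen.
for entry in describe("\u25ab\ufe0f\u00ad"):
    print(entry)
```

Pasting a line from bug.txt or weird.txt into the call makes variation selectors and soft hyphens visible even though they are not rendered as glyphs.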

@Eli-Zaretskii

Regarding the issue with character width: Emacs uses character width tables computed from the latest Unicode Standard version 14.0.0, using the data in the file EastAsianWidth.txt. In that text, the U+00AD SOFT HYPHEN character, which caused the problems in your file, has the East Asian Width property value of A, which stands for "Ambiguous". The definition of this value in the Unicode Standard Annex 11 (UAX#11) is as follows:

   East Asian Ambiguous (A): All characters that can be sometimes wide and sometimes narrow. Ambiguous characters require additional information not contained in the character code to further resolve their width. Ambiguous characters occur in East Asian legacy character sets as wide characters, but as narrow (i.e., normal-width) characters in non-East Asian usage.

And since the file you show didn't have any East Asian legacy characters, treating SOFT HYPHEN as narrow is IMO correct.
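The property values cited above can be checked from Python's standard library. This is only a sketch: `width_properties` is a hypothetical helper, and the values reflect whatever Unicode version the running interpreter bundles, which is not necessarily 14.0.0.

```python
import unicodedata


def width_properties(ch: str) -> tuple[str, str, str]:
    """Return (name, East Asian Width, general category) for one codepoint."""
    return (
        unicodedata.name(ch),
        unicodedata.east_asian_width(ch),
        unicodedata.category(ch),
    )


print(width_properties("\u00ad"))
```

For U+00AD this reports the East Asian Width value A (Ambiguous) together with the general category Cf (format character). The rest of the thread turns on which of those two facts should decide the number of terminal cells.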

@kovidgoyal
Owner

kovidgoyal commented Oct 4, 2021 via email

@Eli-Zaretskii

A soft hyphen is not rendered at all, unless at a line break

So this is the root cause of the problem in this case, AFAIU: Kitty assumes that the SOFT HYPHEN will not be output in the middle of a line, but Emacs does output it. It has nothing to do with the width tables.

And note that EastAsianWidth.txt alone is not sufficient for wcswidth(). It does not cover emoji, variation selectors, zero width joiners, etc.

Which is one reason why Emacs doesn't use wcwidth, it uses the Unicode data directly, and accounts for character compositions. But this all is beside the point, as long as Kitty assumes the editor will not output certain characters.

@kovidgoyal
Owner

kovidgoyal commented Oct 4, 2021 via email

@batbone
Author

batbone commented Oct 4, 2021

I think I found a workaround for the SOFT HYPHEN issue (i.e., weird.txt) in emacs:

(set-char-table-range glyphless-char-display
                      (char-from-name "SOFT HYPHEN") 'zero-width)

We can detect Kitty by checking for the env var KITTY_WINDOW_ID, and run this workaround:

(defun kitty-p ()
  (let ((kitty-window-id (getenv "KITTY_WINDOW_ID")))
    (and kitty-window-id
         (not (string= kitty-window-id "")))))

(when (kitty-p)
  (set-char-table-range glyphless-char-display
                        (char-from-name "SOFT HYPHEN") 'zero-width))

So only the issue with the Unicode variation selectors remains.

@Eli-Zaretskii

No, kitty does not care about the presence of soft hyphens. It counts them as zero width and does not render them.

OK, but the result is the same: Kitty assumes something that is not shared by the editor.

Which is the correct behavior. The problem comes from emacs incorrectly counting them as width 1.

I respectfully disagree. Whether to display zero-width characters is up to the "higher-level protocols", and Emacs traditionally doesn't remove anything from the display. For example, ZWNJ, if it doesn't combine with surrounding text, is displayed as a single-pixel space on GUI displays, and as a regular space on text-mode displays, such as Kitty.

So even if SOFT HYPHEN were a zero-width character (which it isn't, see the citation from the Unicode Annex), a terminal should not assume that an editor will share its ideas about text layout.

Does Kitty have a setting that can change this behavior? If so, the OP could try using it. If there's no such setting, and the Kitty developers think Kitty behaves correctly I can only conclude that Emacs and Kitty currently cannot work together, and there's no reason to continue this discussion.

@Eli-Zaretskii

I think I found a workaround for the SOFT HYPHEN issue (i.e., weird.txt) in emacs:

If you don't mind not seeing that character on display and having trouble deleting it, then fine, you can use this solution for your customizations.

@batbone
Author

batbone commented Oct 4, 2021

Turning off auto-composition-mode in emacs indeed solves the problem with bug.txt:

image

But it makes the bullets ugly. Terminal.app behaves correctly with auto-composition-mode enabled, and shows beautiful bullets:
image

I think this shows that Terminal.app also supports unicode variation characters, and the issue is specifically with a non-industry-standard assumption on Kitty's part?

@Eli-Zaretskii

But it makes the bullets ugly.

Well, "ugly" is better than "messed-up", I'd say. Don't you agree?

the issue is specifically with a non-industry-standard assumption on Kitty's part?

Let's say the issue is different assumptions in Kitty and in Emacs. Specifically, Emacs by default relies on the text-mode terminal to perform the necessary character shaping required by sequences such as this one.

@kovidgoyal
Owner

kovidgoyal commented Oct 4, 2021 via email

@kovidgoyal
Owner

Let's say the issue is different assumptions in Kitty and in Emacs. Specifically, Emacs by default relies on the text-mode terminal to perform the necessary character shaping required by sequences such as this one.

No, emacs relies on text mode terminals not doing the necessary character shaping.

@Eli-Zaretskii

No, emacs relies on text mode terminals not doing the necessary character shaping.

You misunderstood.

@kovidgoyal
Owner

kovidgoyal commented Oct 4, 2021 via email

@batbone
Author

batbone commented Oct 4, 2021

Please, this is just a software issue. There is no need to get adversarial over it. There will always be broken assumptions around complex pieces of software interfacing with each other.

Mr. Goyal is a bit infamous for their, ahem, inflammatory behavior, but they are just one person maintaining and developing a lot of important, complex pieces of FOSS software. And they respond promptly and to every issue, and even help users on sites like mobileread. If they try to give every issue the consideration that, say, emacs can afford to give its filed issues, I fear it might not be sustainable for them. Being opinionated is somewhat of a necessary evil when the needed manpower is not present.

Anyhow, sorry @kovidgoyal if I sound patronizing or anything. Thank you for the work you have done and continue to do.


We have found the cause of the first bug, but I think it's still not clear what is causing the unicode variation selector bug. What is different about Kitty and Terminal.app that is tripping up emacs?

Why is this not breaking vim as well? vim has nice bullets on kitty:
image

After the breaking point is identified, workarounds can be discussed.

@kovidgoyal
Owner

kovidgoyal commented Oct 4, 2021 via email

@Eli-Zaretskii

I think I found a workaround for the SOFT HYPHEN issue

Here's a potentially better workaround:

        (or standard-display-table
            (setq standard-display-table (make-display-table)))
        (aset standard-display-table
              #xAD (vector (make-glyph-code ?- 'escape-glyph)))

This will display SOFT HYPHEN as the ASCII dash character -, but with a special typeface that will make it stand out.

@batbone
Author

batbone commented Oct 4, 2021

@Eli-Zaretskii Thanks, that was exactly what I was thinking would be best, but I did not know how to do it. Does emacs 27 do this? I see dashes with emacs 27 without running this code.


I think Mr. Goyal is right about emacs being in the wrong on bug.txt. Emojis indeed usually take a width of two, but emacs somehow thinks the bullet emoji is taking a width of one:

image

Compare with an emoji that emacs recognizes:

image

Pressing <right>:
image

@Eli-Zaretskii

Emojis indeed usually take a width of two, but emacs somehow thinks the bullet emoji is taking a width of one

Most terminals produce a single-column glyph for these sequences. If Emacs would go with Kitty, it would fail to work on all the other terminals, since there's no way for it to know which terminal does what with each composable sequence of codepoints.
At least AFAIK; if someone knows how to ask the terminal about its behavior in those cases, I'm all ears.

@Eli-Zaretskii

Thanks, that was exactly what I was thinking would be best, but I did not know how to do it. Does emacs 27 do this? I see dashes with emacs 27 without running this code

Sorry, I don't understand: you see dashes for what text?

@batbone
Author

batbone commented Oct 4, 2021

Emojis indeed usually take a width of two, but emacs somehow thinks the bullet emoji is taking a width of one

Most terminals produce a single-column glyph for these sequences. If Emacs would go with Kitty, it would fail to work on all the other terminals, since there's no way for it to know which terminal does what with each composable sequence of codepoints. At least AFAIK; if someone knows how to ask the terminal about its behavior in those cases, I'm all ears.

Can't we just configure this using some elisp? Then we can reuse the hack I did with looking for the env var KITTY_WINDOW_ID.

Sorry, I don't understand: you see dashes for what text?

Emacs 27 shows SOFT HYPHEN as a dash on Kitty:
image

It looks similar to the result I get on emacs 28 after running your workaround.

@kovidgoyal
Owner

kovidgoyal commented Oct 5, 2021 via email

@Eli-Zaretskii

Emacs 27 shows SOFT HYPHEN as a dash on Kitty:

Yes, we changed the behavior in Emacs 28, to avoid interfering with line-wrapping under visual-line-mode. Previously, Emacs would wrap lines on NBSP, because it was displayed as an ASCII space.

@Eli-Zaretskii

You home the cursor, print out your characters of interest, query the terminal for its cursor position.

That'd significantly slow down text-mode display, especially if it goes via the network (which is a large portion of use cases where Emacs is used on text terminals). Currently, we just fwrite the encoded text to the device, we don't write it one character at a time. So I'd rather we didn't do that.

Emacs wants total control on the text layout, leaving the terminal as dumb as possible. For example, if you want correct RTL display in Emacs, you need to disable bidirectional reordering by the terminal. So what you suggest is against the design of Emacs in so many ways I cannot even begin explaining how major a change that would be, even if someone will be willing to pay the price of slower redisplay.

We could perhaps allow per-terminal customization of the character-width data, if we want to support terminals that deviate from the Unicode East-Asian width (as in the case of SOFT HYPHEN). But someone will have to provide the data, although for FOSS that just means to look in the sources. And even then, if terminals start having their own ideas of text layout, width data will not be enough...
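For reference, the probing technique quoted at the top of this comment (home the cursor, print the characters, query the cursor position) can be sketched as follows. This is a minimal Python illustration, not something Emacs or kitty does in this form: `probe_sequence` and `width_from_reply` are hypothetical helpers, and the sketch only builds the bytes to send and parses the standard cursor position report (`ESC [ row ; col R`) that a terminal returns for `ESC [ 6 n`. Actually reading the reply requires putting the terminal in raw mode, which is left out here, and each probe costs a round trip, which is the overhead objected to above.

```python
import re

CURSOR_HOME = "\x1b[H"  # move the cursor to row 1, column 1
CPR_QUERY = "\x1b[6n"   # Device Status Report: ask for the cursor position
CPR_REPLY = re.compile(r"\x1b\[(\d+);(\d+)R")


def probe_sequence(text: str) -> str:
    """Bytes to write: home the cursor, print text, then query the position."""
    return CURSOR_HOME + text + CPR_QUERY


def width_from_reply(reply: str):
    """Parse a cursor position report and return the columns advanced.

    Assumes the text was printed starting at column 1 and did not wrap.
    Returns None when the reply is not a cursor position report.
    """
    match = CPR_REPLY.search(reply)
    if match is None:
        return None
    column = int(match.group(2))  # columns are 1-based
    return column - 1


# A terminal that drew the probed text in two cells would answer like this:
print(width_from_reply("\x1b[1;3R"))
```

A program could cache the measured width per sequence so that the round trip happens only once per distinct string, though whether that is acceptable depends on the concerns raised in this comment.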

@kovidgoyal
Owner

kovidgoyal commented Oct 5, 2021 via email

Repository owner locked as resolved and limited conversation to collaborators Oct 5, 2021
@kovidgoyal
Owner

Oh and just for completeness: Here is a FAQ entry from the unicode consortium, that elucidates how soft hyphens must be rendered. https://www.unicode.org/faq/unsup_char.html

All default-ignorable characters should be rendered as completely invisible (and non advancing, i.e. "zero width"), if not explicitly supported in rendering. These include:

cursive joiners (U+200C ZWNJ, U+200D ZWJ)

bidirectional format controls (e.g. U+200E LEFT-TO-RIGHT MARK)

the soft hyphen (U+00AD SOFT HYPHEN)

word joiners (U+2060 WORD JOINER, also U+FEFF ZWNBSP)

the zero width space (U+200B ZERO WIDTH SPACE)

invisible math operators (e.g., U+2061 FUNCTION APPLICATION)

Jamo filler characters (e.g., U+115F HANGUL CHOSEONG FILLER)

variation selectors

More technically, all characters with the "Default Ignorable Code Point (DI)" property must be rendered as zero width, non-advancing.
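As an illustration of the rule quoted above, the following Python sketch drops the codepoints that the FAQ excerpt names explicitly. It covers only those examples plus the two Variation Selectors blocks, not the full Default_Ignorable_Code_Point property (as far as I know, Python's `unicodedata` module has no accessor for that property). `LISTED_IGNORABLES`, `is_listed_ignorable`, and `visible_part` are hypothetical names.

```python
# Codepoints named explicitly in the FAQ excerpt (examples, not the full property).
LISTED_IGNORABLES = {
    0x200C, 0x200D,  # ZWNJ, ZWJ
    0x200E,          # LEFT-TO-RIGHT MARK
    0x00AD,          # SOFT HYPHEN
    0x2060, 0xFEFF,  # WORD JOINER, ZWNBSP
    0x200B,          # ZERO WIDTH SPACE
    0x2061,          # FUNCTION APPLICATION
    0x115F,          # HANGUL CHOSEONG FILLER
}

# The two Variation Selectors blocks: VS1..VS16 and VS17..VS256.
VARIATION_SELECTOR_RANGES = [(0xFE00, 0xFE0F), (0xE0100, 0xE01EF)]


def is_listed_ignorable(ch: str) -> bool:
    """True if ch is one of the codepoints listed in the excerpt."""
    cp = ord(ch)
    if cp in LISTED_IGNORABLES:
        return True
    return any(lo <= cp <= hi for lo, hi in VARIATION_SELECTOR_RANGES)


def visible_part(text: str) -> str:
    """Drop the listed codepoints, keeping what would advance under the quoted rule."""
    return "".join(ch for ch in text if not is_listed_ignorable(ch))


print(repr(visible_part("note\u00adtaking")))
```

Under this rule the soft hyphen inside note-taking contributes nothing to the line, which matches the width kitty is described as assigning to it earlier in the thread.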


3 participants