Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cursor location seems incorrect following zero-width-joined emoji #3810

Open
dankamongmen opened this issue Jul 7, 2021 · 12 comments
Open
Labels

Comments

@dankamongmen
Copy link
Contributor

Describe the bug
Emoji joined via a ZWJ display properly, but the cursor is moved too far. An example:

[schwarzgerat](0) $ printf '\e[6n\U0001f9d1\u200d\U0001f33e\e[6n'
🧑‍🌾  [schwarzgerat](0) $ 1R5R

as you can see, the cursor report indicates that we have moved four -- which is accurate, as we have indeed moved four cursor positions forward. we only ought have moved two.

FWIW, it's nice that kitty actually implements this; it's ahead of most terminals in this regard.

To Reproduce
Steps to reproduce the behavior:

  1. Get a cursor report
  2. Print an emoji created via ZWJ (there might be some that work; the example above does not)
  3. Get a cursor report
  4. Look and see that spaces have been printed where no spaces were desired

Screenshots
2021-07-07-102248_781x659_scrot

Environment details

kitty 0.21.2 (e07ba2c53d) created by Kovid Goyal
Linux schwarzgerat 5.12.14nlb #1 SMP Sun Jul 4 17:05:42 EDT 2021 x86_64
Debian GNU/Linux 11 \n \l
Running under:X11
Loaded config files:
  /etc/xdg/kitty/kitty.conf
  /home/dank/.config/kitty/kitty.conf

Config options different from defaults:
background_opacity    0.7
enable_audio_bell     False
font_family           Hack
initial_window_height (72, 'cells')
initial_window_width  (132, 'cells')
scrollback_lines      20000
update_check_interval 0.0

This debug output has been copied to the clipboard

Additional context
Happens the same way.

@kovidgoyal
Copy link
Owner

kovidgoyal commented Jul 7, 2021

Yeah it's on my TODO list to add the ~1400 such sequences from https://unicode.org/emoji/charts/emoji-zwj-sequences.html
to gen-wcwidth.py, generate some kind of fast lookup data structure for
them and then use them in kitty's wcswidth() and text processing
functions, however its a fairly large task with very minimal payoff
since that still wont actually make them use able until some terminal
programs add support for them in their wcswidth() implementations. In
fact the current cursor movement in kitty actually matches most shells
and editors expectations, changing that is likely to actually break more
things in the short term.

If you wish to add support for them PRs are welcome.

More general and complete grapheme clustering: https://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries

@dankamongmen
Copy link
Contributor Author

yep, my assessment is similar. i'm looking at trying to implement unicode faithfully in this regard, at which point i'll start seeing which terminals match it. wcswidth() definitely breaks down; i think a big piece of the puzzle here would be a self-contained low-level, fast, minimal-memory EGC-to-width map. if kitty already has a fairly advanced one, how difficult would it be to extract and package up as an .so (by difficult i don't so much mean technical aspects, but more would you be interested in moving that external to kitty, and who would be maintaining it, etc)?

@kovidgoyal
Copy link
Owner

Yes, that's another item for my endless TODO list. It's never been worth
the trouble for me to actually do the work in the face of higher
priority items, but if you wish to collaborate on it, I am happy to be a
co-maintainer.

I would be willing to donate kitty's gen-wcwidth.py to such a project,
and to change kitty to depend on the .so produced by such a project. The
caveat of course is that gen-wcwidth.py is used for a lot more than just
wcswidth(), so the project would either have to become a bit of a
grab-bag of functions useful for terminal text processing, or else it
would become standalone and we would need to keep kitty and the projects
wcswidth() functions in sync by hand.

@latipun7
Copy link

Sorry, I don't know the technical details, but I have an issue.
When I type (copy paste) the emoji with ZWJ sequence, it does not print as a single emoji.
technologist emoji
Does my issue related to this issue? Thanks.

Step to reproduce:

  • go to https://emojipedia.org/technologist/ and copy (to the clipboard) the emoji
  • edit text file in terminal using file editor (nvim and go to insert mode)
  • paste to the terminal from clipboard (Ctrl+Shift+V)
  • the emoji shown as the image above
Debug Details
kitty 0.24.2 created by Kovid Goyal
Linux ROG 5.16.10-arch1-1 #1 SMP PREEMPT Wed, 16 Feb 2022 19:35:18 +0000 x86_64
Arch Linux 5.16.10-arch1-1 (/dev/tty)

Running under: X11
Frozen: False
Paths:
  kitty: /usr/bin/kitty
  base dir: /usr/lib/kitty
  extensions dir: /usr/lib/kitty/kitty
  system shell: /usr/bin/zsh
Loaded config files:
  /home/latipun/.config/kitty/kitty.conf

Config options different from defaults:
background_opacity            0.75
cursor_shape                  3
font_size                     13.0
scrollback_pager_history_size 19922944
window_padding_width          FloatEdges(left=7.0, top=7.0, right=7.0, bottom=7.0)
Changed shortcuts:
	ctrl+shift+delete → combine : clear_terminal active : send_text normal \x0c
Colors:
	active_border_color           #c9cbff   
	active_tab_background         #575268   
	active_tab_foreground         #f5c2e7   
	background                    #1e1e2e   
	bell_border_color             #fae3b0   
	color0                        #6e6c7e   
	color1                        #f28fad   
	color10                       #abe9b3   
	color11                       #fae3b0   
	color12                       #96cdfb   
	color13                       #f5c2e7   
	color14                       #89dceb   
	color15                       #d9e0ee   
	color2                        #abe9b3   
	color3                        #fae3b0   
	color4                        #96cdfb   
	color5                        #f5c2e7   
	color6                        #89dceb   
	color7                        #d9e0ee   
	color8                        #988ba2   
	color9                        #f28fad   
	cursor                        #f5e0dc   
	cursor_text_color             #1e1e2e   
	foreground                    #d9e0ee   
	inactive_border_color         #575268   
	inactive_tab_background       #1e1e2e   
	inactive_tab_foreground       #d9e0ee   
	mark1_background              #96cdfb   
	mark1_foreground              #1e1e2e   
	mark2_background              #f5c2e7   
	mark2_foreground              #1e1e2e   
	mark3_background              #b5e8e0   
	mark3_foreground              #1e1e2e   
	selection_background          #575268   
	selection_foreground          #d9e0ee   
	tab_bar_background            #161320   
	url_color                     #f5e0dc   

Environment variable names seen by the kitty process:
	ALTUSERXSESSION
	BROWSER
	CARGO_HOME
	DBUS_SESSION_BUS_ADDRESS
	DESKTOP_SESSION
	DISPLAY
	EDITOR
	ERRFILE
	FNM_DIR
	FORCE_COLOR
	GDMSESSION
	GH_TOKEN
	GOPATH
	GPG_TTY
	GTK2_RC_FILES
	GTK_MODULES
	GTK_RC_FILES
	HOME
	HOMEBREW_GITHUB_API_TOKEN
	LANG
	LC_COLLATE
	LC_CTYPE
	LC_MEASUREMENT
	LC_NUMERIC
	LC_TELEPHONE
	LESS
	LOGNAME
	MAIL
	MANPAGER
	MOTD_SHOWN
	NNN_ARCHIVE
	NNN_COLORS
	NNN_FCOLORS
	NNN_FIFO
	NNN_OPTS
	NPM_TOKEN
	PATH
	PWD
	QT_QPA_PLATFORMTHEME
	RUSTUP_HOME
	SHELL
	SHLVL
	USER
	USERXSESSION
	USERXSESSIONRC
	VISUAL
	VSCE_PAT
	XAUTHORITY
	XDG_CACHE_HOME
	XDG_CONFIG_HOME
	XDG_DATA_HOME
	XDG_GREETER_DATA_DIR
	XDG_RUNTIME_DIR
	XDG_SEAT
	XDG_SEAT_PATH
	XDG_SESSION_CLASS
	XDG_SESSION_DESKTOP
	XDG_SESSION_ID
	XDG_SESSION_PATH
	XDG_SESSION_TYPE
	XDG_STATE_HOME
	XDG_VTNR

@ghost
Copy link

ghost commented Mar 13, 2022

This seems related to my issue, so I'm posting here. I just switched to gnome-terminal which has inconsistent rendering of emojis. At first I was pleased to see that kitty renders them properly, but then I noticed that the cursor positioning is problematic. As it stands, this is actually less usable than gnome-terminal.

Screencast.from.13-03-22.16.07.38.mp4

@kovidgoyal
Copy link
Owner

That will be because whatever terminal program you are running is using a different width calculation from what kitty uses, which is based on the unicde standard. And unless your emoji is using zwj it is completely unrelated so post elsewhere.

@trygveaa
Copy link
Contributor

Sorry, I don't know the technical details, but I have an issue.
When I type (copy paste) the emoji with ZWJ sequence, it does not print as a single emoji.
technologist emoji
Does my issue related to this issue? Thanks.

@latipun7: No, this is caused by zsh not supporting ZWJ. If you run cat and paste it you'll see that it works fine (except for the cursor problem this issue is about).

@sharnoff
Copy link

@kovidgoyal @dankamongmen Just found this bug after working on a unicode library designed to handle this sort of thing. It winds up being fairly easy, and AFAICT my solution should be decently efficient (binary encoding of a codepoint-based Trie). The data for just emoji sequences (both zwj and otherwise) ends up at ~48kb, with a decent amount of extra data that kitty probably wouldn't need.

If there's interest, I'd be happy to provide suggestions for how it could be implemented in C

@kovidgoyal
Copy link
Owner

kovidgoyal commented Jun 13, 2022 via email

@sharnoff
Copy link

do you mean wcswidth() in general or looking up emoji combining sequences in particular

I was looking just at emoji sequences -- given that they're all suppposed to have the same width (in practice, font combinations can mess this up IIRC - e.g., defaulting to text presentation when the Emoji spec says otherwise, which is often a width of 1 column).

For adding support:

I was separating into grapheme clusters first, but the operation is mostly the same. If I were implementing this for kitty, I'd use the same sort of trie of codepoints, where each node stores (a) whether the sequence up to that point is one of the zwj sequences, or (b) how wide it would be otherwise. Changing wcswidth_step() gets a little tricky, because maybe it's implicitly expected not to decrease the width. Another tricky spot is handling cases like "this is most of a long zwj sequence, but it's missing the end so it's actually mutiple separate emoji now" -- there's a bit of extra work if those need to be separated into cells in post (and possibly re-parsed for new zwj sequence starters - I'm not sure whether any cases of this are possible).

I don't already know anything about kitty's cell datastructure, but I'd just add: if you're only using the listed zwj sequences, the longest sequence in the set is currently only 10 codepoints -- it doesn't need to be unbounded. But there are a small enough number of them (currently 1349) that packing them into existing empy space in the datatypes may be possible.

@kovidgoyal
Copy link
Owner

kovidgoyal commented Jun 14, 2022 via email

@dankamongmen
Copy link
Contributor Author

Sure, I am always happy to discuss design ideas. By this sort of thing do you mean wcswidth() in general or looking up emoji combining sequences in particular. For zwj+emoji support in kitty one needs basically: 1) Adding zwj+emoji support to wcswidth_step() which basically tells you how the width of a string changes when you add a codepoint to it. 2) Changing kitty's cell data structure to support infinite length codepoint strings. This will likely be a auxilliary hash mapping shorts to heap allocated codepoint arrays. The shorts will be stored per cell. This will reduce cell memory usage by 4 bytes at the cost of making looking up the text in a cell more expensive (which is fortunately not a frequent operation).

fwiw @sharnoff notcurses would need pretty much the exact same thing as @kovidgoyal mentions for his wcswidth_step(), except it would be in utf8_egc_len(). notcurses already handles arbitrarily long EGCs via spillover to the egcpool structure, of which there is one per ncplane.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants