Loading a large number of user emoji can break emoji rendering Emacs wide #103

matthew-piziak · 2022-04-07T15:57:43Z

I use the emacs-slack package with a Slack team that has a large number of custom emoji. I've managed to trigger a low-level REG_ESIZE error from regex-emacs.c because the compiled regular expression is too big for Emacs.

For example, here's the backtrace for (emojify-string "❄"):

Debugger entered--Lisp error: (invalid-regexp "Regular expression too big")
  search-forward-regexp("\\(?::\\(?:\\(?:\\+11\\|0\\(?:2_\\(?:b\\(?:\\(?:lin\\|ore\\)d..." 3 t)
  #f(compiled-function (regexp) #<bytecode 0xe297d60576a0c31>)("\\(?::\\(?:\\(?:\\+11\\|0\\(?:2_\\(?:b\\(?:\\(?:lin\\|ore\\)d...")
  mapc(#f(compiled-function (regexp) #<bytecode 0xe297d60576a0c31>) ("\\(?::\\(?:\\(?:\\+11\\|0\\(?:2_\\(?:b\\(?:\\(?:lin\\|ore\\)d..." ":[[:alnum:]+_-]+:" "\\(?:#⃣\\|\\*⃣\\|0⃣\\|1⃣\\|2⃣\\|3⃣\\|4⃣\\|5⃣\\|6⃣\\|7⃣\\|8⃣\\|9..." "\\(?:#\\(?:-?)\\)\\|%\\(?:-?)\\)\\|'\\(?::\\(?:-[()D]\\|[()D..."))
  seq-do(#f(compiled-function (regexp) #<bytecode 0xe297d60576a0c31>) ("\\(?::\\(?:\\(?:\\+11\\|0\\(?:2_\\(?:b\\(?:\\(?:lin\\|ore\\)d..." ":[[:alnum:]+_-]+:" "\\(?:#⃣\\|\\*⃣\\|0⃣\\|1⃣\\|2⃣\\|3⃣\\|4⃣\\|5⃣\\|6⃣\\|7⃣\\|8⃣\\|9..." "\\(?:#\\(?:-?)\\)\\|%\\(?:-?)\\)\\|'\\(?::\\(?:-[()D]\\|[()D..."))
  emojify-display-emojis-in-region(1 3 nil)
  emojify-string(" ❄")
  eval-expression((emojify-string " ❄") nil nil 127)
  funcall-interactively(eval-expression (emojify-string " ❄") nil nil 127)
  command-execute(eval-expression)

Is there any way I can limit the number of emoji, compile down the regex, or increase the allocated regex space?
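
One way to check whether a given set of emoji names hits this limit is to build the same kind of alternation with regexp-opt and catch the error. This is only a sketch: `my/regexp-too-big-p` is a hypothetical helper, not part of emojify, and the number of names that triggers the error depends on the Emacs version.

```elisp
;; Hypothetical helper, not part of emojify.
(defun my/regexp-too-big-p (names)
  "Return non-nil if the alternation of NAMES cannot be compiled.
NAMES is a list of strings such as \":party_parrot:\"."
  (condition-case nil
      (progn
        ;; Matching forces Emacs to compile the regexp.
        (string-match-p (regexp-opt names) "")
        nil)
    ;; Signaled with \"Regular expression too big\" in the backtrace above.
    (invalid-regexp t)))
```

For example, `(my/regexp-too-big-p (mapcar #'car emojify-user-emojis))` would test the configured user emoji, assuming `emojify-user-emojis` is an alist keyed by emoji name.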


matthew-piziak · 2022-04-07T16:07:40Z

I see that this package already uses regexp-opt, which is good. As a kludge, I now take only the first 2000 user emoji in emojify-set-emoji-data.
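
The kludge itself is not shown in this thread. A minimal sketch of the same idea, without patching emojify, could use advice. It assumes `emojify-user-emojis` is the list that emojify-set-emoji-data reads; the `my/` names are hypothetical.

```elisp
(require 'seq)

;; Declare the emojify variable as special so the `let' below binds it
;; dynamically even if emojify is not loaded yet.
(defvar emojify-user-emojis)

(defvar my/emojify-max-user-emojis 2000
  "Hypothetical cap on the number of user emoji handed to emojify.")

(defun my/emojify-cap-user-emojis (orig-fn &rest args)
  "Call ORIG-FN with ARGS while only the first N user emoji are visible."
  (let ((emojify-user-emojis
         (seq-take emojify-user-emojis my/emojify-max-user-emojis)))
    (apply orig-fn args)))

(advice-add 'emojify-set-emoji-data :around #'my/emojify-cap-user-emojis)
```

The obvious cost is that any emoji past the cap are not displayed at all.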

ag91 · 2024-11-25T22:56:03Z

I ran into the same issue now that I am maintaining emacs-slack. I don't understand why we use a regexp in the first place, though.
Isn't a hash table better suited, since we are displaying emojis by region with emojify-redisplay-emojis-in-region?

ag91 · 2024-11-25T23:30:09Z

Oh, I get it now: catching the ASCII and Unicode ones is a real pain unless you enumerate them.
Luckily emacs-slack uses GitHub-style ones, which should make the hash table approach much easier.
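
To illustrate why GitHub-style names make this easier: one small generic pattern (the `:[[:alnum:]+_-]+:` regex visible in the backtrace above) can find candidate tokens, and a hash table lookup decides whether each one is a known emoji. This is a standalone sketch, not emojify's actual implementation, and `my/find-known-emoji` is a hypothetical name.

```elisp
(defun my/find-known-emoji (text table)
  "Return the :name: tokens in TEXT that are keys in hash table TABLE."
  (let ((start 0)
        (found '()))
    ;; The pattern stays the same size no matter how many emoji exist.
    (while (string-match ":[[:alnum:]+_-]+:" text start)
      (let ((token (match-string 0 text)))
        (when (gethash token table)
          (push token found)))
      (setq start (match-end 0)))
    (nreverse found)))

(let ((table (make-hash-table :test #'equal)))
  (puthash ":party_parrot:" t table)
  (my/find-known-emoji "hi :party_parrot: and :unknown:" table))
;; evaluates to (":party_parrot:")
```

The regex cost no longer grows with the number of custom emoji; only the hash table does.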

ag91 · 2024-11-25T23:46:14Z

Very cool: the solution is just to set (setq emojify--user-emojis-regexp nil) when you have a large number of (GitHub-style) user emojis. That way emojify works with the default GitHub regex, which is sufficient for emacs-slack.
Pretty cool that this was possible and that the hash table approach was already implemented. My bad, I didn't see it immediately!
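
A minimal sketch of applying this automatically, assuming emojify-set-emoji-data is what rebuilds `emojify--user-emojis-regexp` from `emojify-user-emojis`. The threshold variable and the `my/` function are hypothetical, and since the regexp variable is internal to emojify it may change between versions.

```elisp
;; Declare the emojify variables so this compiles before emojify loads.
(defvar emojify-user-emojis)
(defvar emojify--user-emojis-regexp)

(defvar my/emojify-user-regexp-limit 2000
  "Hypothetical threshold above which the user emoji regexp is dropped.")

(defun my/emojify-drop-user-regexp (&rest _)
  "Clear the user emoji regexp when there are too many user emoji."
  (when (> (length emojify-user-emojis) my/emojify-user-regexp-limit)
    (setq emojify--user-emojis-regexp nil)))

;; Run after emojify has (re)computed its emoji data.
(advice-add 'emojify-set-emoji-data :after #'my/emojify-drop-user-regexp)
```

This only suits setups where every user emoji is GitHub-style (`:name:`), as in emacs-slack; user emoji in other styles would no longer be matched.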

emojify creates an OR regex with all the custom user emojis; when there are more than 2000, the regex overflows the Emacs limit. Setting the regex to nil works around iqbalansari/emacs-emojify#103
