Emoji detection #27

Closed
ivanjaros opened this issue Dec 31, 2022 · 4 comments · May be fixed by #55

Comments

ivanjaros commented Dec 31, 2022

Since there is the emojiPresentation map, could this library be extended to detect emojis? I have a use case where I want to remove emojis from text. For lack of other options, it seems I would have to use github.com/forPelevin/gomoji, which itself uses this library but also loads its entire emoji database, a 1.25MB map, into memory, which I'd rather avoid. Hence my question.

rivo (Owner) commented Dec 31, 2022

I suppose uniseg could help you do that. However, you would need to copy some code into your own project, including the graphemeCodePoints and emojiPresentation tables (although graphemeCodePoints could be greatly reduced to include only the relevant emoji code points), because I'm not planning to make these internal functions and tables public.

You can take a look at FirstGraphemeClusterInString() and runeWidth(). These functions need to detect emojis to calculate a width of 2 for them. So this is what I would do:

  1. Use uniseg to break string into grapheme clusters.
  2. For each grapheme cluster, check the returned width. If width ≠ 2, it's not an emoji.
  3. Check all runes in grapheme cluster:
    1. If a rune is the "Variation Selector-16", it's an emoji.
    2. If the first rune is a regional indicator (i.e. country flags), it's an emoji.
    3. If the first rune is an extended pictographic, it may be an emoji. Check the emojiPresentation table. If it gives you the "emoji presentation" flag, it's an emoji.

This procedure does not consider ♫ an emoji. If you want to eliminate such symbols too, the procedure is a bit different (and simpler: you wouldn't need the emojiPresentation table or the check for the "Variation Selector-16", and emojis could then have a width of 1).
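Steps 2 and 3 above could be sketched roughly as follows. This is only an illustration under stated assumptions: the caller has already split the text into grapheme clusters and obtained each cluster's width (for example from uniseg), and the `pictographic` map is a tiny hand-written, hypothetical stand-in for the internal tables mentioned above, not uniseg's actual data.

```go
package main

import "fmt"

const (
	variationSelector16 = '\uFE0F'     // U+FE0F, requests emoji presentation
	regionalIndicatorA  = '\U0001F1E6' // first regional indicator symbol
	regionalIndicatorZ  = '\U0001F1FF' // last regional indicator symbol
)

// pictographic is a hypothetical, hand-written stand-in for the internal
// tables. A key being present means the rune is an extended pictographic;
// the value says whether it has emoji presentation by default.
var pictographic = map[rune]bool{
	0x1F600: true,  // grinning face: emoji presentation by default
	0x2764:  false, // heavy black heart: text presentation unless followed by U+FE0F
}

// isEmojiCluster applies steps 2 and 3 to a single grapheme cluster, given
// its runes and the display width supplied by the caller.
func isEmojiCluster(width int, runes []rune) bool {
	// Step 2: anything that is not two cells wide is not an emoji.
	if width != 2 || len(runes) == 0 {
		return false
	}
	// Step 3.1: a Variation Selector-16 anywhere in the cluster.
	for _, r := range runes {
		if r == variationSelector16 {
			return true
		}
	}
	// Step 3.2: the cluster starts with a regional indicator (flags).
	first := runes[0]
	if first >= regionalIndicatorA && first <= regionalIndicatorZ {
		return true
	}
	// Step 3.3: extended pictographic with the emoji presentation flag.
	// Runes missing from the map yield false.
	return pictographic[first]
}

func main() {
	fmt.Println(isEmojiCluster(2, []rune{0x1F600}))          // grinning face
	fmt.Println(isEmojiCluster(2, []rune{0x2764, 0xFE0F}))   // heart + VS16
	fmt.Println(isEmojiCluster(1, []rune{0x2764}))           // heart alone, width 1
	fmt.Println(isEmojiCluster(2, []rune{0x1F1E9, 0x1F1EA})) // regional indicators D, E
}
```

A real table would need to be generated from Unicode's emoji data rather than written by hand; the two entries here exist only to exercise each branch.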

@ivanjaros
Author

thanks, i'll give it a try.

@rivo rivo closed this as completed Jul 22, 2023
@aymanbagabas

Hey @rivo, I stumbled upon this. I'm trying to detect emojis without copying any code from uniseg, using the function below. The only thing I'm missing is a check for the extended pictographic property.

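// see https://github.com/rivo/uniseg/issues/27
const (
	regionalIndicatorA  = '\U0001F1E6' // first regional indicator symbol
	regionalIndicatorZ  = '\U0001F1FF' // last regional indicator symbol
	variationSelector16 = '\uFE0F'     // Variation Selector-16
)

// isEmojiCluster reports whether a grapheme cluster of width w looks like an emoji.
func isEmojiCluster(w int, runes []rune) bool {
	if w != 2 {
		return false
	}
	if len(runes) > 0 && runes[0] >= regionalIndicatorA && runes[0] <= regionalIndicatorZ {
		return true
	}
	// Range over the values, not the indices.
	for _, r := range runes {
		if r == variationSelector16 {
			return true
		}
	}
	// TODO: detect extended pictographic property
	return false
}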

Would you be OK with adding IsEmoji(width int, b []byte) bool and IsEmojiInString(width int, str string) bool to uniseg? I can send a PR for this.
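For illustration only, the two proposed signatures could be thin wrappers around one rune-based check, roughly as sketched here. The names follow the proposal above and are not an existing uniseg API; `isEmojiRunes` is a hypothetical helper that is deliberately incomplete in the same way as the function above (no extended pictographic check).

```go
package main

import (
	"bytes"
	"fmt"
)

// isEmojiRunes is a hypothetical, incomplete placeholder check: width 2 plus
// either a leading regional indicator or a Variation Selector-16 (U+FE0F).
func isEmojiRunes(width int, runes []rune) bool {
	if width != 2 || len(runes) == 0 {
		return false
	}
	if runes[0] >= 0x1F1E6 && runes[0] <= 0x1F1FF {
		return true
	}
	for _, r := range runes {
		if r == 0xFE0F {
			return true
		}
	}
	return false
}

// IsEmoji mirrors the proposed byte-slice signature (not part of uniseg).
func IsEmoji(width int, b []byte) bool {
	return isEmojiRunes(width, bytes.Runes(b))
}

// IsEmojiInString mirrors the proposed string signature (not part of uniseg).
func IsEmojiInString(width int, str string) bool {
	return isEmojiRunes(width, []rune(str))
}

func main() {
	cluster := "\u2764\uFE0F" // heavy black heart followed by VS16
	fmt.Println(IsEmojiInString(2, cluster), IsEmoji(2, []byte(cluster)))
}
```

Keeping the logic in one place means the byte and string variants cannot drift apart; only the decoding step differs.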

aymanbagabas added a commit to aymanbagabas/uniseg that referenced this issue May 30, 2024
This adds the necessary logic to detect if a grapheme cluster is an
emoji based on @rivo's [comment](rivo#27 (comment))

Fixes: rivo#27
aymanbagabas added a commit to aymanbagabas/uniseg that referenced this issue May 31, 2024
This adds the necessary logic to detect if a grapheme cluster is an
emoji based on @rivo's [comment](rivo#27 (comment))

Fixes: rivo#27
@mikelorant

This would be a great addition, and I hope it will be considered for merging.
