scspell splits words tokens with diacritics inside words #35

robotdana · 2018-11-07T00:31:10Z

E.g. for `händler`, scspell finds the token `ändler` under Python 3.7 and the token `ndler` under Python 2.7.

The same problem occurs for words containing other diacritics.
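For illustration, here is a minimal sketch of how a tokenizer can end up with a fragment like `ndler`. It assumes an ASCII-only character class; `ASCII_WORD` and `ascii_tokens` are hypothetical names, and scspell's actual pattern is not shown in this thread.

```python
import re

# Hypothetical ASCII-only word pattern (an assumption, not scspell's real regex).
ASCII_WORD = re.compile(r"[a-zA-Z]+")


def ascii_tokens(text):
    """Return every maximal run of ASCII letters in text."""
    return ASCII_WORD.findall(text)


# "ä" is outside a-zA-Z, so it acts as a separator and the word is split.
print(ascii_tokens("händler"))  # ['h', 'ndler']
```

Any letter outside `a-zA-Z` behaves like whitespace for such a pattern, which is consistent with the split described above.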

Previously, `Händler` would be tokenized as `ndler` or `ändler`, depending on the Python version, rather than the expected `händler`.

Solution: use `regexp` rather than `re`. This gives us the ability to use Unicode character classes such as `[[:upper:]]` and `[[:lower:]]`.

Fixes myint#35

Previously, `Händler` would be tokenized as `ndler` or `ändler`, depending on the Python version, rather than the expected `händler`.

Solution: use `regexp` rather than `re`. This gives us the ability to use Unicode character classes such as `[[:upper:]]` and `[[:lower:]]`.

`unicodedata.normalize` is used because Travis behaved differently from my Mac.

Fixes myint#35
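As a rough sketch of why normalization matters here: the same word can arrive precomposed (`ä` as one code point) or decomposed (`a` followed by a combining diaeresis), and a letter-matching pattern may treat the two differently. The example below uses only the standard library, so it swaps in `[^\W\d_]+` with Python 3's `re` instead of the POSIX classes mentioned above; `UNICODE_WORD` and `tokenize` are hypothetical names, not code from the pull request.

```python
import re
import unicodedata

# Unicode-aware "letters only" pattern for Python 3 str input:
# any word character that is neither a digit nor an underscore.
UNICODE_WORD = re.compile(r"[^\W\d_]+")


def tokenize(text):
    """Normalize to NFC, then return every maximal run of letters."""
    normalized = unicodedata.normalize("NFC", text)
    return UNICODE_WORD.findall(normalized)


decomposed = "ha\u0308ndler"  # "a" + COMBINING DIAERESIS (U+0308)

# Without normalization the combining mark is not a letter and splits the word.
print(UNICODE_WORD.findall(decomposed))  # ['ha', 'ndler']

# After NFC normalization the word stays in one piece.
print(tokenize(decomposed))  # ['händler']
```

Normalizing before matching makes the result independent of which form the input happens to use.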

kkmuffme · 2020-04-16T13:15:28Z

+1

myint added the bug and help wanted labels on Nov 9, 2018

robotdana linked a pull request on Nov 30, 2018 that will close this issue

Fix tokenising when using more than just a-zA-Z #37

Open
