scspell splits word tokens at diacritics inside words #35
robotdana added a commit to robotdana/scspell that referenced this issue on Nov 30, 2018:

Previously, `Händler` would be tokenized as `ndler` or `ändler`, depending on the Python version, rather than the expected `händler`. Solution: use `regex` rather than `re`. This gives us the ability to use Unicode character classes such as `[[:upper:]]` and `[[:lower:]]`. Fixes myint#35
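A minimal sketch of the failure and the general shape of the fix, under stated assumptions. The commit switches to a third-party regular-expression module with POSIX classes; to stay within the Python 3 standard library, this sketch swaps in `re` with `[^\W\d_]` (any Unicode letter) plus the Unicode-aware `str.isupper()`/`str.islower()` tests. The ASCII pattern `[A-Z]?[a-z]+` is a hypothetical stand-in for the old tokenizer, not scspell's actual code, and `ascii_tokens`/`unicode_tokens` are illustrative names.

```python
import re

# Hypothetical stand-in for an ASCII-only tokenizer: an optional capital
# followed by lowercase ASCII letters (a camelCase-style subword).
ASCII_SUBWORD = re.compile(r"[A-Z]?[a-z]+")

# In Python 3, [^\W\d_] matches any Unicode letter
# (word characters minus digits and the underscore).
LETTER_RUN = re.compile(r"[^\W\d_]+")


def ascii_tokens(text):
    """Old-style tokenizing: non-ASCII letters act as word breaks."""
    return [m.lower() for m in ASCII_SUBWORD.findall(text)]


def unicode_tokens(text):
    """Split letter runs into camelCase subwords using Unicode case tests."""
    tokens = []
    for run in LETTER_RUN.findall(text):
        start = 0
        for i in range(1, len(run)):
            # Break before an uppercase letter that follows a lowercase one,
            # e.g. "fooBar" -> "foo", "bar"; isupper/islower are Unicode-aware.
            if run[i].isupper() and run[i - 1].islower():
                tokens.append(run[start:i].lower())
                start = i
        tokens.append(run[start:].lower())
    return tokens


print(ascii_tokens("Händler"))    # ['ndler']
print(unicode_tokens("Händler"))  # ['händler']
```

The ASCII version reproduces the `ndler` result reported for Python 2.7: `H` cannot start a match because `ä` is not in `[a-z]`, so matching resumes at `n`.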
robotdana added a commit to robotdana/scspell that referenced this issue on Nov 30, 2018:

Previously, `Händler` would be tokenized as `ndler` or `ändler`, depending on the Python version, rather than the expected `händler`. Solution: use `regex` rather than `re`. This gives us the ability to use Unicode character classes such as `[[:upper:]]` and `[[:lower:]]`. The call to `unicodedata.normalize` is needed because Travis CI behaved differently from my Mac. Fixes myint#35
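The commit does not say why Travis CI and the Mac disagreed. One plausible cause, and only an assumption here, is that the same word arrived in different Unicode normalization forms: in NFD, `ä` is stored as `a` plus U+0308 COMBINING DIAERESIS, and a combining mark is not a letter, so a letter-based pattern splits the word. The sketch below shows normalizing to NFC before tokenizing; `letter_runs` is a hypothetical helper, not scspell's API.

```python
import re
import unicodedata

# Any Unicode letter in Python 3 (word characters minus digits and "_").
LETTER_RUN = re.compile(r"[^\W\d_]+")


def letter_runs(text, normalize=True):
    """Return runs of letters, optionally composing characters first."""
    if normalize:
        # NFC composes "a" + U+0308 into the single code point U+00E4 ("ä").
        text = unicodedata.normalize("NFC", text)
    return LETTER_RUN.findall(text)


decomposed = unicodedata.normalize("NFD", "Händler")
print(len(decomposed))                           # 8 code points
print(letter_runs(decomposed, normalize=False))  # ['Ha', 'ndler']
print(letter_runs(decomposed))                   # ['Händler']
```

Normalizing once at input time keeps the tokenizer's patterns simple, since they only ever see composed characters.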
E.g., for `händler`, Python 3.7 finds the token `ändler` and Python 2.7 finds the token `ndler`. The same problem occurs with words containing other diacritics.