Fix for NGram TestUTF8FullRange() tests (See #269) #316

Merged: 2 commits, Jul 24, 2020

NightOwl888 commented Jul 22, 2020

See the known failing tests in #269.

The problem was that the code point was cast to a char before calling the IndexOf method. The cast keeps only the low 16 bits, so code points that need a surrogate pair were not handled correctly. In the rare cases where the truncated value happened to be a valid token character, the test failed.

.NET doesn't have a built-in overload of String.IndexOf() that accepts a code point; that overload is an extension method in J2N.
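For illustration only, here is a minimal Java sketch of the same failure mode. Java is used because its built-in String.indexOf(int) accepts a code point, which is the kind of overload described above. The class and method names are hypothetical and this is not the code changed in this PR.

```java
public class CodePointIndexDemo {
    // Buggy variant (hypothetical name): the narrowing cast keeps only the
    // low 16 bits of the code point, so a supplementary code point can
    // collide with an unrelated BMP character.
    public static boolean isTokenCharTruncating(String tokenChars, int codePoint) {
        return tokenChars.indexOf((char) codePoint) >= 0;
    }

    // Code-point-aware variant (hypothetical name): String.indexOf(int)
    // searches for the full code point, including surrogate pairs.
    public static boolean isTokenCharByCodePoint(String tokenChars, int codePoint) {
        return tokenChars.indexOf(codePoint) >= 0;
    }

    public static void main(String[] args) {
        String tokenChars = "ABC";
        int supplementary = 0x10041; // low 16 bits are 0x0041, i.e. 'A'

        // The truncating check wrongly reports a match.
        System.out.println(isTokenCharTruncating(tokenChars, supplementary));  // true
        // The code-point check correctly reports no match.
        System.out.println(isTokenCharByCodePoint(tokenChars, supplementary)); // false
    }
}
```

A random full-range test only hits this when the truncated value lands on a token character, which would be consistent with the failures being rare.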

…InnerClassHelper.IsTokenChar(int) that was causing surrogate pairs to fail in the TestUTF8FullRange() tests of NGramTokenizerTest and EdgeNGramTokenizerTest (see apache#269)
…Exceptions being thrown from char.ConvertToUtf32(string, int) by reverting back to CodePointAt() method in TestCharTokenizers.TestCrossPlaneNomalization().
NightOwl888 merged commit 43745db into apache:master on Jul 24, 2020
NightOwl888 added this to the 4.8.0-beta00012 milestone on Sep 7, 2020