Fix issue with cloned RuleBasedBreakIterator failing under heavy parallel load, #95 #96
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Fixes #95
This fixes an issue with RuleBasedBreakIterator failing under heavy parallel load. This originally was found via Lucene.NET's TestRandomStrings methods, which spawn multiple threads and generate random Unicode strings to run through analysis. See apache/lucenenet#269 for some examples.
This PR adds a test that isolates the problem, and can be easily reproduced if you revert the change to
DequeI
. The problem was that cloned RuleBasedBreakIterator instances did not properly clone all of their object graph's data, and DequeI modified itself rather than modifying the clone. This created invalid state and thus the modified integer array would accidentally leak across threads to multiple instances incorrectly.This surfaced reliably when
new CjkBreakEngine(korean: false)
was in the set of break engines, and would generally not be a problem when it wasn't in the list. I'm thinking this is possibly due to the large set of characters in this version of CjkBreakEngine making it more likely to "hit" and handle the random characters, rather than anything particularly problematic about CjkBreakEngine itself.