Fix issue with cloned RuleBasedBreakIterator failing under heavy parallel load, #95 #96

Merged Nov 25, 2024 (2 commits)

paulirwin (Collaborator) commented Nov 24, 2024

Fixes #95

This fixes an issue with RuleBasedBreakIterator failing under heavy parallel load. It was originally found via Lucene.NET's TestRandomStrings methods, which spawn multiple threads and generate random Unicode strings to run through analysis. See apache/lucenenet#269 for some examples.

This PR adds a test that isolates the problem; the failure is easy to reproduce by reverting the change to DequeI. The problem was that cloned RuleBasedBreakIterator instances did not properly clone all of the data in their object graph: when cloned, DequeI modified itself rather than the clone. This left instances in an invalid state, so the modified integer array could leak across threads and end up shared by multiple instances.
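
To illustrate the kind of clone bug described above, here is a minimal sketch in Java. It is not the ICU4N source (which is C#): `IntStackSketch`, `buggyCopy`, `fixedCopy`, and `backingArray` are hypothetical names, and the exact shape of the original bug is an assumption based on the description "modified itself rather than modifying the clone".

```java
import java.util.Arrays;

// Hypothetical stand-in for an int deque/stack such as DequeI.
// This is an illustrative sketch, not the ICU4N implementation.
final class IntStackSketch implements Cloneable {
    private int[] data = new int[4];
    private int size = 0;

    void push(int value) {
        if (size == data.length) {
            data = Arrays.copyOf(data, size * 2); // grow when full
        }
        data[size++] = value;
    }

    // Exposed only so the demo can compare array identity.
    int[] backingArray() {
        return data;
    }

    private IntStackSketch fieldCopy() {
        try {
            // Copies fields one by one; the copy's `data` still points
            // at the same array object as the source.
            return (IntStackSketch) super.clone();
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e);
        }
    }

    // Assumed shape of the bug: the fresh array is assigned to `this`,
    // so cloning mutates the source and the copy keeps the old array.
    IntStackSketch buggyCopy() {
        IntStackSketch copy = fieldCopy();
        this.data = this.data.clone(); // wrong target
        return copy;
    }

    // Corrected version: the copy receives its own array and the
    // source is left untouched.
    IntStackSketch fixedCopy() {
        IntStackSketch copy = fieldCopy();
        copy.data = this.data.clone(); // right target
        return copy;
    }

    public static void main(String[] args) {
        IntStackSketch original = new IntStackSketch();
        original.push(1);
        original.push(2);
        int[] before = original.backingArray();

        IntStackSketch buggy = original.buggyCopy();
        System.out.println("copy kept source's old array: "
                + (buggy.backingArray() == before));
        System.out.println("source was changed by cloning: "
                + (original.backingArray() != before));
    }
}
```

In the buggy variant, cloning writes to the source object. If two threads clone the same source at once, both can finish the field copy before either reassignment happens, leaving two copies that point at the same array. That is one plausible way for state to end up shared across threads, stated here as an assumption rather than a confirmed trace of the original failure.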

This surfaced reliably when new CjkBreakEngine(korean: false) was in the set of break engines, and was generally not a problem when it wasn't. I suspect this is because the large set of characters covered by this version of CjkBreakEngine makes it more likely to "hit" and handle the random characters, rather than anything particularly problematic about CjkBreakEngine itself.

NightOwl888 merged commit 6c3cc90 into NightOwl888:main on Nov 25, 2024
1 check passed
Labels: notes:bug-fix (Contains a fix for a bug)
Linked issue: Random failures in cloned RuleBasedBreakIterator under parallel heavy load with CJK strings