-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve performance of grapheme frequency calculation #133
Conversation
…and replacing `filter_map` with `filter` and `map` in two places.
…equency calculation by parallelizing with `rayon` and switching unicode segmentation/clustering crates.
Thanks @vcfxb! |
continue; | ||
} | ||
|
||
// Count the number of graphemes in this patch, add it to the tortal, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
*total
let mut total_graphemes: usize = 0; | ||
|
||
// Iterate over the file diffs by reference. | ||
for file_diff in &commit_diff.diff.file_diffs { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was going to ask if it be possible to parallelize over the file iterator as well, but I figured I should figure out the answer myself before suggesting anything. Have a PR up in #134 to demonstrate how this could be done. I don't have/know the benchmarking setup so I haven't tested it locally myself to see what the savings are, if any.
This PR improves the performance of grapheme frequency calculation by replacing several single threaded operations with parallel operations using
rayon
. Additionally we switch from usingunicode_segmentation
tofinl_unicode
to improve performance of grapheme cluster identification and iteration. On my machine, this all results in about a 50% total runtime reduction when checkingnpm/express
.