Improve performance of grapheme frequency calculation #133

vcfxb · 2024-06-20T16:28:55Z

This PR improves the performance of grapheme frequency calculation by replacing several single-threaded operations with parallel operations using `rayon`. Additionally, we switch from `unicode_segmentation` to `finl_unicode` to speed up grapheme cluster identification and iteration. On my machine, this results in roughly a 50% reduction in total runtime when checking npm/express.
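To illustrate the general shape of a parallel frequency count, here is a minimal sketch and not the code from this PR. It makes two loud assumptions: it uses `std::thread::scope` from the standard library instead of `rayon`, and it counts `char`s as a stand-in for grapheme clusters, since real segmentation needs an external crate. The names `count_symbols` and `parallel_symbol_counts` are hypothetical.

```rust
use std::collections::HashMap;
use std::thread;

/// Count how often each symbol occurs in one piece of text.
/// ASSUMPTION: `char`s stand in for true grapheme clusters here.
fn count_symbols(text: &str) -> HashMap<char, u64> {
    let mut counts: HashMap<char, u64> = HashMap::new();
    for symbol in text.chars() {
        *counts.entry(symbol).or_insert(0) += 1;
    }
    counts
}

/// Count symbols across many patches using up to `workers` scoped threads,
/// then merge the per-thread maps into a single frequency table.
/// ASSUMPTION: scoped threads replace the `rayon` parallel iterators
/// that the PR description mentions.
fn parallel_symbol_counts(patches: &[&str], workers: usize) -> HashMap<char, u64> {
    if patches.is_empty() {
        return HashMap::new();
    }
    let workers = workers.max(1);
    // Ceiling division so every patch lands in some chunk.
    let chunk_len = (patches.len() + workers - 1) / workers;

    let partials: Vec<HashMap<char, u64>> = thread::scope(|scope| {
        let handles: Vec<_> = patches
            .chunks(chunk_len)
            .map(|chunk| {
                scope.spawn(move || {
                    // Each worker builds its own map, so no locking is needed.
                    let mut local: HashMap<char, u64> = HashMap::new();
                    for patch in chunk {
                        for (symbol, n) in count_symbols(patch) {
                            *local.entry(symbol).or_insert(0) += n;
                        }
                    }
                    local
                })
            })
            .collect();
        handles
            .into_iter()
            .map(|handle| handle.join().expect("worker thread panicked"))
            .collect()
    });

    // Sequential merge of the partial results.
    let mut total: HashMap<char, u64> = HashMap::new();
    for partial in partials {
        for (symbol, n) in partial {
            *total.entry(symbol).or_insert(0) += n;
        }
    }
    total
}
```

Giving each worker a private map and merging at the end avoids contention on a shared table. Whether that trade-off pays off depends on the input size, so it is worth measuring rather than assuming.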

alilleybrinker · 2024-06-20T17:13:13Z

Thanks @vcfxb!

j-lanson · 2024-06-20T16:36:35Z

hipcheck/src/metric/entropy.rs

+			continue;
+		}
+
+		// Count the number of graphemes in this patch, add it to the total,


j-lanson · 2024-06-20T16:44:02Z

hipcheck/src/metric/entropy.rs

+	let mut total_graphemes: usize = 0;
+
+	// Iterate over the file diffs by reference.
+	for file_diff in &commit_diff.diff.file_diffs {


I was going to ask if it would be possible to parallelize over the file iterator as well, but I figured I should work out the answer myself before suggesting anything. I have a PR up in #134 to demonstrate how this could be done. I don't know the benchmarking setup, so I haven't tested it locally to see what the savings are, if any.
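As a rough illustration of what parallelizing over the file iterator could look like, here is a standard-library-only sketch. It is not the code from #134. The `FileDiff` struct is a hypothetical stand-in for the entries of `commit_diff.diff.file_diffs`, `count_patch_symbols` counts `char`s instead of real grapheme clusters, and scoped threads are used in place of `rayon`.

```rust
use std::thread;

/// HYPOTHETICAL stand-in for one entry of `commit_diff.diff.file_diffs`.
struct FileDiff {
    patch: String,
}

/// ASSUMPTION: counts `char`s as a stand-in for grapheme clusters.
fn count_patch_symbols(patch: &str) -> usize {
    patch.chars().count()
}

/// Sum the symbol counts of all file diffs, handling each file on its own
/// scoped thread. A thread pool such as `rayon`'s would bound the number of
/// threads; this sketch spawns one per file for simplicity.
fn total_symbols_parallel(file_diffs: &[FileDiff]) -> usize {
    thread::scope(|scope| {
        let handles: Vec<_> = file_diffs
            .iter()
            .map(|file_diff| scope.spawn(move || count_patch_symbols(&file_diff.patch)))
            .collect();
        handles
            .into_iter()
            .map(|handle| handle.join().expect("worker thread panicked"))
            .sum::<usize>()
    })
}
```

Because each file contributes an independent partial count, the running `total_graphemes` accumulator becomes a sum over per-file results. Whether this yields a measurable saving would still need benchmarking.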

vcfxb added 10 commits June 19, 2024 15:17

chore: add flamegraph file to gitignore

e578ac5

chore(deps): Add rayon and tempdir dependencies.

1edba48

feat(unstable): Add unstable command

8a4e678

chore: Refactor many uses of Rc to Arc as groundwork for parallelization

4db1963

Merge remote-tracking branch 'origin/main' into venus/performance

4a5e6a2

chore: Satisfy clippy by requiring Send + Sync in several places and replacing `filter_map` with `filter` and `map` in two places.

30922de

chore: Remove duplicate comment.

86b51e2

perf(grapheme_freq): Dramatically improve performance for grapheme frequency calculation by parallelizing with `rayon` and switching unicode segmentation/clustering crates.

8bd2461

chore: cargo fmt

83ba963

Merge branch 'main' into venus/performance

9297afa

vcfxb requested review from alilleybrinker and j-lanson June 20, 2024 16:28

vcfxb self-assigned this Jun 20, 2024

alilleybrinker approved these changes Jun 20, 2024

alilleybrinker merged commit d302a24 into main Jun 20, 2024
9 checks passed

j-lanson reviewed Jun 20, 2024
