feat: use gix for parsing git information in mitre/git plugin #663
This pull request removes the need for a custom `nom`-based parser for parsing git information in the `mitre/git` plugin by swapping over to using `gix` (https://github.com/GitoxideLabs/gitoxide) for parsing git information. As a part of this change, I collected data from the repos below. Testing on https://github.com/linux/torvalds was skipped for time reasons.

I investigated using `libgit2` first, but the performance on larger repos was not ideal, and it was difficult to integrate caching into the code. `gix` has better support for caching and is written in pure Rust, which is also nice for memory safety reasons.

To determine execution speed, the following command was run with the `release` version of `hc` on my M1 MacBook Pro with 32 GB RAM, with the `mitre/git` plugin also configured to run its `release` version:

`time ./target/release/hc check --policy config/Hipcheck.kdl <REPO>`
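If you'd rather repeat the measurement from Rust than from the shell, one minimal sketch is to wrap the run in `std::time::Instant`. `timed` and `time_hc_check` are hypothetical helper names, not part of Hipcheck, and the binary path and policy file are copied from the command above. This reports only wall-clock time (comparable to the `real` line from `time`), not user/system CPU time.

```rust
use std::time::{Duration, Instant};

/// Runs `f` once and returns its result together with the elapsed
/// wall-clock time.
fn timed<T, F: FnOnce() -> T>(f: F) -> (T, Duration) {
    let start = Instant::now();
    let value = f();
    (value, start.elapsed())
}

/// Hypothetical helper: runs `hc check` against one repository (the value
/// you would pass as `<REPO>`) and returns its exit status and wall time.
fn time_hc_check(repo: &str) -> std::io::Result<(std::process::ExitStatus, Duration)> {
    let (status, elapsed) = timed(|| {
        std::process::Command::new("./target/release/hc")
            .args(["check", "--policy", "config/Hipcheck.kdl", repo])
            .status()
    });
    // Propagate a spawn failure (e.g. the binary was not built) as an error.
    Ok((status?, elapsed))
}
```

Running each repo several times and comparing medians would smooth out filesystem-cache effects between the `main` and `gix` runs.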
For testing `main`, commit `4f205afd6ba0afd98d0ba91dc742f2038f6ddc89` was used.

Timing runs were recorded for `main` and `gix` on both `mitre/hipcheck` and `numpy/numpy`.
`git rev-list --count HEAD` results:

- `mitre/hipcheck`: 529 is returned, which matches both implementations
- `numpy/numpy`: 37617 is returned, which matches the `gix` implementation

`git log --pretty="%an %ae%n%cn %ce" | LC_ALL=C sort -u | wc -l` results:

- `mitre/hipcheck`: 20 is returned, which matches both implementations
- `numpy/numpy`: 2243 is returned, which matches the `gix` implementation

The main highlight of this pull request is that parsing of git-related information is now handled by a git library rather than our `nom`-based parser, which was missing commits and contributors for larger repos. At some point, a pass can be made over the `git` plugin's performance to find places for optimization.
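The commit-count and contributor-count cross-checks can be made repeatable with a small Rust sketch that shells out to the same `git` commands and reduces their output the way `sort -u | wc -l` does. The helper names (`git_stdout`, `parse_rev_list_count`, `count_unique_identities`) are hypothetical and not part of the `mitre/git` plugin; the sketch assumes a `git` binary on `PATH`.

```rust
use std::collections::BTreeSet;
use std::io;
use std::path::Path;
use std::process::Command;

/// Runs `git -C <repo> <args...>` and returns stdout, or an error if git fails.
fn git_stdout(repo: &Path, args: &[&str]) -> io::Result<String> {
    let out = Command::new("git").arg("-C").arg(repo).args(args).output()?;
    if !out.status.success() {
        let msg = String::from_utf8_lossy(&out.stderr).into_owned();
        return Err(io::Error::new(io::ErrorKind::Other, msg));
    }
    Ok(String::from_utf8_lossy(&out.stdout).into_owned())
}

/// Parses `git rev-list --count HEAD` output (one integer plus a newline).
fn parse_rev_list_count(output: &str) -> Option<u64> {
    output.trim().parse().ok()
}

/// Counts distinct lines, mirroring `LC_ALL=C sort -u | wc -l`: each commit
/// contributes an author line and a committer line, and byte-exact `&str`
/// equality matches the uniqueness `sort -u` applies in the C locale.
fn count_unique_identities(log_output: &str) -> usize {
    log_output.lines().collect::<BTreeSet<&str>>().len()
}

/// Reference commit count for `repo`, for comparison with the plugin.
fn reference_commit_count(repo: &Path) -> io::Result<Option<u64>> {
    let out = git_stdout(repo, &["rev-list", "--count", "HEAD"])?;
    Ok(parse_rev_list_count(&out))
}

/// Reference contributor count for `repo`; the format string is passed as a
/// single argument, exactly as the shell would after removing the quotes.
fn reference_identity_count(repo: &Path) -> io::Result<usize> {
    let out = git_stdout(repo, &["log", "--pretty=%an %ae%n%cn %ce"])?;
    Ok(count_unique_identities(&out))
}
```

Comparing these reference values against what the plugin reports for the same checkout is how a discrepancy like the one on `numpy/numpy` would surface.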