By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
Addresses: bridgecrewio#6740
Description
checkov eagerly loads all runners and checks on every CLI invocation, whether or not they are needed. This may not always have been an issue, but it currently adds considerable overhead. For instance, a simple checkov --version or checkov --help on a 4-core 8Gb-memory Gitpod instance takes just over 2 seconds. Most of this time is spent importing the hundreds of Python modules and checks.
If you agree that this would be a welcome improvement, I've done some digging into how this could be addressed and am proposing an incremental solution in this pull request. My results show reduced runtimes of:
checkov --version
checkov --framework=ansible -d .
The current changes in this pull request already pass all existing unit tests, but if this is a desired feature, I'll need to go over it more carefully to make sure it is ready for a full review.
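To make the general idea concrete, here is a minimal sketch of deferring imports until a framework is actually requested. It is only an illustration of the lazy-loading technique, not the code in this pull request: the `LAZY_RUNNERS` registry and `load_runners` helper are hypothetical names, and standard-library modules stand in for checkov's runner modules.

```python
import importlib

# Hypothetical registry mapping a framework name to (module path, attribute).
# Standard-library modules are used as stand-ins for real runner modules.
LAZY_RUNNERS = {
    "json": ("json", "JSONDecoder"),
    "csv": ("csv", "DictReader"),
}


def load_runners(frameworks):
    """Import only the modules backing the requested frameworks."""
    loaded = {}
    for name in frameworks:
        module_path, attr = LAZY_RUNNERS[name]
        # The import happens here, on demand, rather than at program start.
        module = importlib.import_module(module_path)
        loaded[name] = getattr(module, attr)
    return loaded


if __name__ == "__main__":
    # Only the "json" stand-in is imported; "csv" is left untouched.
    print(load_runners(["json"]))
```

With a registry like this, commands such as --version or --help would never need to touch the registry at all, which is where the startup savings would come from.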
Motivation
Firstly, it would be nice not to have to wait over 2 seconds for the output of checkov --help.
More seriously, when running multiple checkov checks in a CI/CD pipeline, the time checkov takes to load starts to add up, mainly when using checkov with pre-commit. Consider the following pre-commit config for example:
Benchmarking
The following benchmarks were run both on my local machine (M3 Mac) and on a 4-core/8Gb-memory Gitpod instance. The results can vary quite a bit between Gitpod workspace instances, even when requesting the same resources. For this reason, the Gitpod benchmarks below were all run on the same workspace instance.
I used hyperfine to help me gather the benchmark statistics and pandas to correlate and render them in HTML.
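As a rough sketch of the kind of post-processing involved, the snippet below compares two hyperfine JSON exports (as produced by `hyperfine --export-json`) and computes a per-command speedup. It uses only the standard library rather than pandas, the `summarize_speedup` helper is a hypothetical name, and it assumes both runs labelled their commands identically. The numbers in the usage example are made-up placeholders, not measurements from this pull request.

```python
import json


def summarize_speedup(baseline_json, patched_json):
    """Pair up commands from two hyperfine exports and compute speedups."""
    baseline = {r["command"]: r["mean"] for r in json.loads(baseline_json)["results"]}
    patched = {r["command"]: r["mean"] for r in json.loads(patched_json)["results"]}
    rows = []
    for command, before in baseline.items():
        after = patched.get(command)
        if after is None:
            continue  # command was only benchmarked in one of the two runs
        rows.append(
            {
                "command": command,
                "before_s": before,
                "after_s": after,
                "speedup": before / after,
            }
        )
    return rows


if __name__ == "__main__":
    # Placeholder values for illustration only.
    before = json.dumps({"results": [{"command": "--version", "mean": 2.0}]})
    after = json.dumps({"results": [{"command": "--version", "mean": 0.5}]})
    for row in summarize_speedup(before, after):
        print(row)
```

The resulting list of dictionaries can be passed directly to `pandas.DataFrame` if an HTML table is wanted.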
Requirements
After booting the Gitpod instance, I ran the following commands:
On my local machine (macOS) I also had to install hyperfine and pandas with:
Isolating the effect of the change
The changes proposed in this pull request should not have any impact on the actual execution of the checks and checkov Runners. The effects are only present before the runs are triggered. You can verify this yourself by running some local tests.
To properly isolate the behaviour changed in this PR and remove any extra sources of noise, I patched BaseRunner.run() to simply return an empty Report right away.
This can be achieved by creating a new entry point under checkov/main_patched.py with the following code:
Test that it works by running:
Running the benchmark
I executed the following script to generate the results displayed in the next section:
Results
M3 - macOS
--version
--list
--framework=openapi -d tests/openapi/
--framework=ansible -d tests/ansible/examples/
--framework=terraform -d tests/terraform/checks/data
-d tests/
4-core/8Gb-memory Gitpod instance
--version
--list
--framework=openapi -d tests/openapi/
--framework=ansible -d tests/ansible/examples/
--framework=terraform -d tests/terraform/checks/data
-d tests/
Checklist: