
Lazy runner registry #1

@tpvasconcelos commented on Sep 28, 2024

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.


Addresses: bridgecrewio#6740

Description

checkov always eagerly loads all runners and checks on every CLI invocation, regardless of whether they are needed. I guess this was not always an issue, but it currently adds quite a bit of overhead. For instance, a simple checkov --version or checkov --help on a 4-core/8 GB-memory Gitpod instance takes just over 2 seconds. Most of this time is spent importing the hundreds of Python modules and checks.
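For anyone who wants to reproduce the import-cost analysis, one generic way to see where interpreter start-up time goes (a standard CPython feature, not something introduced by this PR) is the -X importtime flag, which prints a per-module import-time breakdown to stderr. The stdlib json module is used as a stand-in target below so the snippet runs anywhere; substitute "import checkov" for the real measurement:

```shell
# Print a per-module import-time breakdown to stderr (CPython 3.7+).
# "import json" is a stand-in; use "import checkov" for the real measurement.
python3 -X importtime -c "import json" 2> import_times.log

# Each line looks like: "import time:  <self us> | <cumulative us> | <module>".
# Sort by the second |-separated column (cumulative time) to find the slowest imports.
sort -t'|' -k2 -rn import_times.log | head -n 10
```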

I hope this is a welcome improvement. I've done some digging into how it could be addressed and am proposing an incremental solution in this pull request. My results show reduced runtimes of:

  • around 45% for simple CLI calls like checkov --version
  • around 40% for single-framework runs (e.g. checkov --framework=ansible -d .)
  • around 6% even for invocations that trigger all checks, which shows that the lazy-loading logic adds no noticeable overhead of its own.

The current changes in this pull request already pass all existing unit tests, but if this feature is wanted, I'll go over everything more carefully to make sure it is ready for a full review.
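The general shape of the lazy-loading approach can be sketched as follows. This is a minimal illustration, not checkov's actual registry API: runner classes are registered by import-path string and only imported on first access, so unused frameworks never pay their import cost. The stdlib json module's JSONDecoder stands in for a real runner class here:

```python
# Minimal sketch of a lazy runner registry (hypothetical names, not
# checkov's actual API): runners are registered as "module:ClassName"
# specs and the module is only imported on first access.
from importlib import import_module
from typing import Any, Dict


class LazyRunnerRegistry:
    def __init__(self) -> None:
        # framework name -> "module.path:ClassName" spec
        self._specs: Dict[str, str] = {}
        self._cache: Dict[str, Any] = {}

    def register(self, framework: str, spec: str) -> None:
        # Registration only stores a string; nothing is imported yet.
        self._specs[framework] = spec

    def get(self, framework: str) -> Any:
        # The import happens here, on first use, not at CLI start-up.
        if framework not in self._cache:
            module_path, _, class_name = self._specs[framework].partition(":")
            module = import_module(module_path)
            self._cache[framework] = getattr(module, class_name)()
        return self._cache[framework]


registry = LazyRunnerRegistry()
# JSONDecoder stands in for a real runner class in this sketch.
registry.register("json", "json:JSONDecoder")
decoder = registry.get("json")  # json is resolved and imported only here
```

A CLI call like checkov --version then never touches registry.get(), so none of the runner modules are imported at all.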

Motivation

Firstly, it would be nice to not have to wait over 2 seconds for the output of checkov --help.

More seriously, when running multiple checkov checks in a CI/CD pipeline, the time checkov takes to load starts to add up, especially when using checkov with pre-commit. Consider the following pre-commit config, for example:

  - repo: https://github.com/bridgecrewio/checkov
    rev: 3.2.255
    hooks:
      - id: checkov_diff
        name: "checkov (ansible)"
        entry: checkov --framework ansible
        files: ^infra/tf/data/ansible/.+\.ya?ml$
      - id: checkov_diff
        name: "checkov (dockerfile)"
        entry: checkov --framework dockerfile
        files: (\.Dockerfile|Dockerfile)$
      - id: checkov_diff
        name: "checkov (github_actions)"
        entry: checkov --framework github_actions
        files: ^\.github/.+\.ya?ml$
      - id: checkov_diff
        name: "checkov (json, yaml)"
        entry: checkov --framework json,yaml
        files: \.(json|ya?ml)$
      - id: checkov_diff
        name: "checkov (terraform)"
        entry: checkov --framework terraform --var-file=infra/tf/terraform.tfvars
        files: ^infra/tf/.+\.(tf|tfvars)$
      - id: checkov_diff
        name: "checkov (k8s)"
        entry: checkov --framework kubernetes
    files: ^infra/k8s/.+\.ya?ml$

Benchmarking

The following benchmarks were run both on my local machine (M3 Mac) and on a 4-core/8 GB-memory Gitpod instance. The results can vary quite a bit across Gitpod workspace instances, even when requesting the same resources. For this reason, the benchmarks below were all run on the same workspace instance.

I used hyperfine to help me gather the benchmark statistics and pandas to correlate and render them in HTML.

Requirements


After booting the Gitpod instance, I ran the following commands:

# Setup Python env
pyenv install 3.8.20
pyenv global 3.8.20
python3.8 -m pip install pipenv
pipenv --python 3.8
pipenv install
# Install hyperfine
wget https://github.com/sharkdp/hyperfine/releases/download/v1.16.1/hyperfine_1.16.1_amd64.deb
sudo dpkg -i hyperfine_1.16.1_amd64.deb
# and pandas...
pipenv install pandas

On my local machine (macOS) I also had to install hyperfine and pandas with:

brew install hyperfine
pipenv install pandas

Isolating the effect of the change


The changes proposed in this pull request should not have any impact on the actual execution of the checks and checkov Runners. The effects are only present before the runs are triggered. You can verify this yourself by running some local tests.

To properly isolate the behaviour changed in this PR and remove any extra sources of noise, I patched the BaseRunner.run() to simply return an empty Report right away.

This can be achieved by creating a new entry point under checkov/main_patched.py with the following code:

from __future__ import annotations

from typing import Any, Generic, TypeVar

import checkov.common.runners.base_runner
from checkov.common.output.report import Report

_Context = TypeVar("_Context", bound="dict[Any, Any]|None")
_Definitions = TypeVar("_Definitions", bound="dict[Any, Any]|None")
_GraphManager = TypeVar("_GraphManager", bound="GraphManager[Any, Any]|None")

class PatchedBaseRunner(
    checkov.common.runners.base_runner.BaseRunner,
    Generic[_Definitions, _Context, _GraphManager],
):
    def __new__(cls, *args: Any, **kwargs: Any) -> "PatchedBaseRunner":
        # Replace run() with a no-op that returns an empty Report right away,
        # so the benchmark only measures what happens before the runs start.
        def noop_run(self: Any, *args: Any, **kwargs: Any) -> Report:
            return Report(check_type=self.check_type)

        cls.run = noop_run
        # Note: object.__new__() must not receive the extra constructor args.
        return super().__new__(cls)

# Patch the class before checkov.main imports it.
checkov.common.runners.base_runner.BaseRunner = PatchedBaseRunner

if __name__ == '__main__':
    from checkov.main import Checkov
    import sys

    ckv = Checkov()
    sys.exit(ckv.run())

Test that it works by running:

pipenv run python -m checkov.main_patched --version

Running the benchmark


I executed the following script to generate the results displayed in the next section:

#!/usr/bin/env bash
set -o errexit  # Exit on error
set -o pipefail # Use last non-zero exit code in a pipeline
set -o nounset  # Disallow expansion of unset variables

cmd='python -m checkov.main_patched'
common_checkov_flags='--quiet --compact --skip-results-upload --skip-download'

bench(){
  local branch="${1}"
  local csv_file="bench_checkov_${branch}.csv"
  git checkout "${branch}"
  echo "Benchmarking ${branch}..."
  hyperfine \
    --warmup 3 \
    --min-runs 10 \
    --max-runs 20 \
    --time-unit second \
    --shell=none \
    --export-csv "${csv_file}" \
    "${cmd} ${common_checkov_flags} --version" \
    "${cmd} ${common_checkov_flags} --list" \
    "${cmd} ${common_checkov_flags} --framework=openapi -d tests/openapi/" \
    "${cmd} ${common_checkov_flags} --framework=ansible -d tests/ansible/examples/" \
    "${cmd} ${common_checkov_flags} --framework=terraform -d tests/terraform/checks/data" \
    "${cmd} ${common_checkov_flags} -d tests/"

  sed -i "s|${cmd} ${common_checkov_flags} ||g" "${csv_file}"
  sed -i 's|command|argv|g' "${csv_file}"
}

bench "main"
bench "lazy-cherry"

python <<EOF
import pandas as pd
drop_cols = ["median", "user", "system", "min", "max"]
df1 = pd.read_csv("bench_checkov_main.csv", index_col="argv").drop(columns=drop_cols)
df2 = pd.read_csv("bench_checkov_lazy-cherry.csv", index_col="argv").drop(columns=drop_cols)
df = pd.concat({"Eager": df1, "Lazy": df2}, axis=1)
# Round means and stds to 3 decimal places
df.loc[:, (slice(None), "mean")] = df.loc[:, (slice(None), "mean")].applymap(lambda x: round(x, 3))
df.loc[:, (slice(None), "stddev")] = df.loc[:, (slice(None), "stddev")].applymap(lambda x: round(x, 3))
# Calculate percentage difference
df.loc[:, ("pct", "mean")] = (df["Lazy"]["mean"] - df["Eager"]["mean"]) / df["Eager"]["mean"] * 100
df.loc[:, ("pct", "mean")] = df["pct"]["mean"].apply(lambda x: f"{'🔻' if x < 0 else '🔺'}{round(x)}%")
# Format index
df.index = df.index.map(lambda x: f"<code>{x}</code>")
# Break up long lines
df.index = df.index.str.replace(" -d ", "</code><br><code>-d ")
print(df.to_markdown(tablefmt="github"))
df.to_html("bench_checkov.html", border=0, escape=False, justify="left")
EOF

# Fix table's text-alignment
sed -i 's|<th>|<th align="left">|g' bench_checkov.html
sed -i 's|<td>|<td align="left">|g' bench_checkov.html

echo "HTML table was saved to ./bench_checkov.html"

Results

M3 - macOS

| argv | Eager mean (s) | Eager stddev | Lazy mean (s) | Lazy stddev | pct mean |
|------|---------------:|-------------:|--------------:|------------:|---------:|
| --version | 0.999 | 0.034 | 0.552 | 0.013 | 🔻-45% |
| --list | 2.156 | 0.068 | 1.878 | 0.057 | 🔻-13% |
| --framework=openapi -d tests/openapi/ | 0.940 | 0.022 | 0.597 | 0.030 | 🔻-36% |
| --framework=ansible -d tests/ansible/examples/ | 0.941 | 0.037 | 0.593 | 0.018 | 🔻-37% |
| --framework=terraform -d tests/terraform/checks/data | 0.953 | 0.032 | 0.767 | 0.015 | 🔻-20% |
| -d tests/ | 0.989 | 0.008 | 0.929 | 0.022 | 🔻-6% |

4-core/8 GB-memory Gitpod instance

| argv | Eager mean (s) | Eager stddev | Lazy mean (s) | Lazy stddev | pct mean |
|------|---------------:|-------------:|--------------:|------------:|---------:|
| --version | 2.103 | 0.022 | 1.242 | 0.040 | 🔻-41% |
| --list | 3.643 | 0.106 | 3.342 | 0.051 | 🔻-8% |
| --framework=openapi -d tests/openapi/ | 2.177 | 0.065 | 1.273 | 0.026 | 🔻-42% |
| --framework=ansible -d tests/ansible/examples/ | 2.166 | 0.059 | 1.245 | 0.037 | 🔻-43% |
| --framework=terraform -d tests/terraform/checks/data | 2.113 | 0.060 | 1.759 | 0.037 | 🔻-17% |
| -d tests/ | 2.242 | 0.035 | 2.112 | 0.042 | 🔻-6% |

Checklist:

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added tests that prove my feature, policy, or fix is effective and works
  • New and existing tests pass locally with my changes
