
Lazy runner registry #1

@tpvasconcelos commented on Sep 28, 2024

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.


Addresses: bridgecrewio#6740

Description

checkov always eagerly loads all runners and checks on every CLI invocation, regardless of whether they are needed. I guess this was not always an issue, but it currently adds quite a bit of overhead. For instance, a simple checkov --version or checkov --help on a 4-core/8 GB-memory Gitpod instance takes just over 2 seconds. Most of this time is spent importing the hundreds of Python modules and checks.
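For anyone who wants to reproduce the import-cost analysis, one generic way to see where interpreter start-up time goes (a standard CPython feature, not something introduced by this PR) is the -X importtime flag, which prints a per-module import-time breakdown to stderr. The stdlib json module is used as a stand-in target below so the snippet runs anywhere; substitute "import checkov" for the real measurement:

```shell
# Print a per-module import-time breakdown to stderr (CPython 3.7+).
# "import json" is a stand-in; use "import checkov" for the real measurement.
python3 -X importtime -c "import json" 2> import_times.log

# Each line looks like: "import time:  <self us> | <cumulative us> | <module>".
# Sort by the second |-separated column (cumulative time) to find the slowest imports.
sort -t'|' -k2 -rn import_times.log | head -n 10
```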

I hope this is a welcome improvement. I've done some digging into how it could be addressed and am proposing an incremental solution in this pull request. My results show reduced runtimes of:

  • around 45% for simple CLI calls like checkov --version
  • around 40% for single-framework runs (e.g. checkov --framework=ansible -d .)
  • around 6% even for invocations that trigger all checks, which shows that the lazy-loading logic adds no noticeable overhead of its own.

The current changes in this pull request already pass all existing unit tests, but if this feature is wanted, I'll go over everything more carefully to make sure it is ready for a full review.
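The general shape of the lazy-loading approach can be sketched as follows. This is a minimal illustration, not checkov's actual registry API: runner classes are registered by import-path string and only imported on first access, so unused frameworks never pay their import cost. The stdlib json module's JSONDecoder stands in for a real runner class here:

```python
# Minimal sketch of a lazy runner registry (hypothetical names, not
# checkov's actual API): runners are registered as "module:ClassName"
# specs and the module is only imported on first access.
from importlib import import_module
from typing import Any, Dict


class LazyRunnerRegistry:
    def __init__(self) -> None:
        # framework name -> "module.path:ClassName" spec
        self._specs: Dict[str, str] = {}
        self._cache: Dict[str, Any] = {}

    def register(self, framework: str, spec: str) -> None:
        # Registration only stores a string; nothing is imported yet.
        self._specs[framework] = spec

    def get(self, framework: str) -> Any:
        # The import happens here, on first use, not at CLI start-up.
        if framework not in self._cache:
            module_path, _, class_name = self._specs[framework].partition(":")
            module = import_module(module_path)
            self._cache[framework] = getattr(module, class_name)()
        return self._cache[framework]


registry = LazyRunnerRegistry()
# JSONDecoder stands in for a real runner class in this sketch.
registry.register("json", "json:JSONDecoder")
decoder = registry.get("json")  # json is resolved and imported only here
```

A CLI call like checkov --version then never touches registry.get(), so none of the runner modules are imported at all.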

Motivation

Firstly, it would be nice to not have to wait over 2 seconds for the output of checkov --help.

More seriously, when running multiple checkov checks in a CI/CD pipeline, the time checkov takes to load starts to add up, especially when using checkov with pre-commit. Consider the following pre-commit config, for example:

  - repo: https://github.com/bridgecrewio/checkov
    rev: 3.2.255
    hooks:
      - id: checkov_diff
        name: "checkov (ansible)"
        entry: checkov --framework ansible
        files: ^infra/tf/data/ansible/.+\.ya?ml$
      - id: checkov_diff
        name: "checkov (dockerfile)"
        entry: checkov --framework dockerfile
        files: (\.Dockerfile|Dockerfile)$
      - id: checkov_diff
        name: "checkov (github_actions)"
        entry: checkov --framework github_actions
        files: ^\.github/.+\.ya?ml$
      - id: checkov_diff
        name: "checkov (json, yaml)"
        entry: checkov --framework json,yaml
        files: \.(json|ya?ml)$
      - id: checkov_diff
        name: "checkov (terraform)"
        entry: checkov --framework terraform --var-file=infra/tf/terraform.tfvars
        files: ^infra/tf/.+\.(tf|tfvars)$
      - id: checkov_diff
        name: "checkov (k8s)"
        entry: checkov --framework kubernetes
    files: ^infra/k8s/.+\.ya?ml$

Benchmarking

The following benchmarks were run both on my local machine (M3 Mac) and on a 4-core/8 GB-memory Gitpod instance. The results can vary quite a bit across Gitpod workspace instances, even when requesting the same resources. For this reason, the benchmarks below were all run on the same workspace instance.

I used hyperfine to help me gather the benchmark statistics and pandas to correlate and render them in HTML.

Requirements


After booting the Gitpod instance, I ran the following commands:

# Setup Python env
pyenv install 3.8.20
pyenv global 3.8.20
python3.8 -m pip install pipenv
pipenv --python 3.8
pipenv install
# Install hyperfine
wget https://github.com/sharkdp/hyperfine/releases/download/v1.16.1/hyperfine_1.16.1_amd64.deb
sudo dpkg -i hyperfine_1.16.1_amd64.deb
# and pandas...
pipenv install pandas

On my local machine (macOS) I also had to install hyperfine and pandas with:

brew install hyperfine
pipenv install pandas

Isolating the effect of the change


The changes proposed in this pull request should not have any impact on the actual execution of the checks and checkov Runners. The effects are only present before the runs are triggered. You can verify this yourself by running some local tests.

To properly isolate the behaviour changed in this PR and remove any extra sources of noise, I patched the BaseRunner.run() to simply return an empty Report right away.

This can be achieved by creating a new entry point under checkov/main_patched.py with the following code:

from __future__ import annotations

from typing import Any, Generic, TypeVar

import checkov.common.runners.base_runner
from checkov.common.output.report import Report

_Context = TypeVar("_Context", bound="dict[Any, Any]|None")
_Definitions = TypeVar("_Definitions", bound="dict[Any, Any]|None")
_GraphManager = TypeVar("_GraphManager", bound="GraphManager[Any, Any]|None")

class PatchedBaseRunner(
    checkov.common.runners.base_runner.BaseRunner,
    Generic[_Definitions, _Context, _GraphManager],
):
    def __new__(cls, *args: Any, **kwargs: Any) -> "PatchedBaseRunner":
        # Replace run() with a no-op that returns an empty Report right away,
        # so the benchmark only measures what happens before the runs start.
        def noop_run(self: Any, *args: Any, **kwargs: Any) -> Report:
            return Report(check_type=self.check_type)

        cls.run = noop_run
        # Note: object.__new__() must not receive the extra constructor args.
        return super().__new__(cls)

# Patch the class before checkov.main imports it.
checkov.common.runners.base_runner.BaseRunner = PatchedBaseRunner

if __name__ == '__main__':
    from checkov.main import Checkov
    import sys

    ckv = Checkov()
    sys.exit(ckv.run())

Test that it works by running:

pipenv run python -m checkov.main_patched --version

Running the benchmark


I executed the following script to generate the results displayed in the next section:

#!/usr/bin/env bash
set -o errexit  # Exit on error
set -o pipefail # Use last non-zero exit code in a pipeline
set -o nounset  # Disallow expansion of unset variables

cmd='python -m checkov.main_patched'
common_checkov_flags='--quiet --compact --skip-results-upload --skip-download'

bench(){
  local branch="${1}"
  local csv_file="bench_checkov_${branch}.csv"
  git checkout "${branch}"
  echo "Benchmarking ${branch}..."
  hyperfine \
    --warmup 3 \
    --min-runs 10 \
    --max-runs 20 \
    --time-unit second \
    --shell=none \
    --export-csv "${csv_file}" \
    "${cmd} ${common_checkov_flags} --version" \
    "${cmd} ${common_checkov_flags} --list" \
    "${cmd} ${common_checkov_flags} --framework=openapi -d tests/openapi/" \
    "${cmd} ${common_checkov_flags} --framework=ansible -d tests/ansible/examples/" \
    "${cmd} ${common_checkov_flags} --framework=terraform -d tests/terraform/checks/data" \
    "${cmd} ${common_checkov_flags} -d tests/"

  sed -i "s|${cmd} ${common_checkov_flags} ||g" "${csv_file}"
  sed -i 's|command|argv|g' "${csv_file}"
}

bench "main"
bench "lazy-cherry"

python <<EOF
import pandas as pd
drop_cols = ["median", "user", "system", "min", "max"]
df1 = pd.read_csv("bench_checkov_main.csv", index_col="argv").drop(columns=drop_cols)
df2 = pd.read_csv("bench_checkov_lazy-cherry.csv", index_col="argv").drop(columns=drop_cols)
df = pd.concat({"Eager": df1, "Lazy": df2}, axis=1)
# Round means and stds to 3 decimal places
df.loc[:, (slice(None), "mean")] = df.loc[:, (slice(None), "mean")].applymap(lambda x: round(x, 3))
df.loc[:, (slice(None), "stddev")] = df.loc[:, (slice(None), "stddev")].applymap(lambda x: round(x, 3))
# Calculate percentage difference
df.loc[:, ("pct", "mean")] = (df["Lazy"]["mean"] - df["Eager"]["mean"]) / df["Eager"]["mean"] * 100
df.loc[:, ("pct", "mean")] = df["pct"]["mean"].apply(lambda x: f"{'🔻' if x < 0 else '🔺'}{round(x)}%")
# Format index
df.index = df.index.map(lambda x: f"<code>{x}</code>")
# Break up long lines
df.index = df.index.str.replace(" -d ", "</code><br><code>-d ")
print(df.to_markdown(tablefmt="github"))
df.to_html("bench_checkov.html", border=0, escape=False, justify="left")
EOF

# Fix table's text-alignment
sed -i 's|<th>|<th align="left">|g' bench_checkov.html
sed -i 's|<td>|<td align="left">|g' bench_checkov.html

echo "HTML table was saved to ./bench_checkov.html"

Results

M3 - macOS

| argv | Eager mean (s) | Eager stddev | Lazy mean (s) | Lazy stddev | pct mean |
|------|---------------:|-------------:|--------------:|------------:|---------:|
| --version | 0.999 | 0.034 | 0.552 | 0.013 | 🔻-45% |
| --list | 2.156 | 0.068 | 1.878 | 0.057 | 🔻-13% |
| --framework=openapi -d tests/openapi/ | 0.940 | 0.022 | 0.597 | 0.030 | 🔻-36% |
| --framework=ansible -d tests/ansible/examples/ | 0.941 | 0.037 | 0.593 | 0.018 | 🔻-37% |
| --framework=terraform -d tests/terraform/checks/data | 0.953 | 0.032 | 0.767 | 0.015 | 🔻-20% |
| -d tests/ | 0.989 | 0.008 | 0.929 | 0.022 | 🔻-6% |

4-core/8 GB-memory Gitpod instance

| argv | Eager mean (s) | Eager stddev | Lazy mean (s) | Lazy stddev | pct mean |
|------|---------------:|-------------:|--------------:|------------:|---------:|
| --version | 2.103 | 0.022 | 1.242 | 0.040 | 🔻-41% |
| --list | 3.643 | 0.106 | 3.342 | 0.051 | 🔻-8% |
| --framework=openapi -d tests/openapi/ | 2.177 | 0.065 | 1.273 | 0.026 | 🔻-42% |
| --framework=ansible -d tests/ansible/examples/ | 2.166 | 0.059 | 1.245 | 0.037 | 🔻-43% |
| --framework=terraform -d tests/terraform/checks/data | 2.113 | 0.060 | 1.759 | 0.037 | 🔻-17% |
| -d tests/ | 2.242 | 0.035 | 2.112 | 0.042 | 🔻-6% |

Checklist:

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added tests that prove my feature, policy, or fix is effective and works
  • New and existing tests pass locally with my changes
