chore: switch to new yardstick validate (#672)

* chore: switch to new yardstick validate

  Signed-off-by: Will Murphy <[email protected]>

* fix: use expected_namespaces to filter results

  Signed-off-by: Will Murphy <[email protected]>

* pass explicit provider name and enable validations on all providers

  Signed-off-by: Will Murphy <[email protected]>

* use dedicated namespace validation test

  Signed-off-by: Will Murphy <[email protected]>

* update readme and remove gate

  Signed-off-by: Will Murphy <[email protected]>

* add nvd to alpine namespaces

  Signed-off-by: Will Murphy <[email protected]>

* explicitly pass grype-db config

  Signed-off-by: Will Murphy <[email protected]>

* chore: dont fail oracle on empty matches

  Signed-off-by: Will Murphy <[email protected]>

* chore: turn off fail_on_empty_match_set for oracle, amazon

  These providers have labeled images with no vulnerabilities for 2021.

  Signed-off-by: Will Murphy <[email protected]>

* chore: use yardstick tag, not commit sha

  Signed-off-by: Will Murphy <[email protected]>

---------

Signed-off-by: Will Murphy <[email protected]>
anchore · Sep 23, 2024 · commit 4cfe0fc · 1 parent a3548fd
Showing 9 changed files with 429 additions and 788 deletions.
diff --git a/.github/actions/quality-gate/action.yaml b/.github/actions/quality-gate/action.yaml
@@ -19,7 +19,7 @@ runs:
     - name: Validate provider results
       shell: bash
       working-directory: tests/quality
-      run: poetry run make validate
+      run: poetry run make validate provider=${{ inputs.provider }}
 
     - name: Archive the provider state (${{ inputs.provider }})
       if: ${{ failure() }}

diff --git a/poetry.lock b/poetry.lock
diff --git a/pyproject.toml b/pyproject.toml
@@ -78,7 +78,8 @@ mypy = "^1.1"
 radon = ">=5.1,<7.0"
 dunamai = "^1.15.0"
 ruff = ">=0.5.1,<0.5.7"
-yardstick = {git = "https://github.com/anchore/yardstick", rev = "v0.9.2"}
+yardstick = {git = "https://github.com/anchore/yardstick", rev = "v0.10.0"}
+# yardstick = {path = "../yardstick", develop=true }
 tabulate = "0.9.0"
 tox = "^4.11.3"
 

diff --git a/tests/quality/Makefile b/tests/quality/Makefile
@@ -26,7 +26,8 @@ all: capture validate ## Fetch or capture all data and run all quality checks
 
 .PHONY: validate
 validate:  ## Run all quality checks against already collected data
-	poetry run ./gate.py
+	poetry run ./validate-namespaces.py
+	poetry run yardstick validate --result-set $(RESULT_SET)_$(provider)
 
 
 ## Data management targets #################################
@@ -61,16 +62,16 @@ build-db:  ## Build a grype database for the given provider
 
 .PHONY: vulns
 vulns: ## Collect and store all grype results
-	poetry run yardstick -v result capture -r $(RESULT_SET)
+	poetry run yardstick -v result capture -r $(RESULT_SET)_$(provider)
 
 .PHONY: sboms
 sboms: $(YARDSTICK_RESULT_DIR) clear-results ## Collect and store all syft results (deletes all existing results)
-	bash -c "make download-sboms || (yardstick -v result capture -r $(RESULT_SET) --only-producers)"
+	bash -c "make download-sboms || (yardstick -v result capture -r $(RESULT_SET)_$(provider) --only-producers)"
 
 .PHONY: download-sboms
 download-sboms:
 	cd vulnerability-match-labels && make venv
-	bash -c "export ORAS_CACHE=$(shell pwd)/.oras-cache && . vulnerability-match-labels/venv/bin/activate && ./vulnerability-match-labels/sboms.py download -r $(RESULT_SET)"
+	bash -c "export ORAS_CACHE=$(shell pwd)/.oras-cache && . vulnerability-match-labels/venv/bin/activate && ./vulnerability-match-labels/sboms.py download -r $(RESULT_SET)_$(provider)"
 
 $(YARDSTICK_RESULT_DIR):
 	mkdir -p $(YARDSTICK_RESULT_DIR)
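Reviewer note: the Makefile targets above now address one result set per provider by appending the provider to the base name. The following is a minimal sketch of that convention, not code from the repository. The helper names are hypothetical, and the base name `pr_vs_latest_via_sbom` is an assumption taken from the result-set keys generated in `configure.py`; the Makefile's actual `RESULT_SET` value is not shown in this diff.

```python
import shlex

# Assumed base name; the Makefile's RESULT_SET value is not shown in this diff.
RESULT_SET = "pr_vs_latest_via_sbom"


def result_set_name(provider: str, base: str = RESULT_SET) -> str:
    """Mirror the Makefile expression $(RESULT_SET)_$(provider)."""
    if not provider:
        # With an empty provider the name would end in a bare underscore,
        # which would not match any per-provider key built in configure.py.
        raise ValueError("provider must be set, e.g. provider=github")
    return f"{base}_{provider}"


def validate_command(provider: str) -> list[str]:
    """Build the argv that the `validate` target runs for one provider."""
    return [
        "poetry", "run", "yardstick", "validate",
        "--result-set", result_set_name(provider),
    ]


print(shlex.join(validate_command("github")))
# poetry run yardstick validate --result-set pr_vs_latest_via_sbom_github
```

This also suggests why `make validate` needs `provider=...` after this change: without it the expanded name would have no provider suffix.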

diff --git a/tests/quality/README.md b/tests/quality/README.md
@@ -36,7 +36,7 @@ While developing it may be useful to only run one provider for rapid troubleshoo
 
 ```
 make capture provider=github
-make validate
+make validate provider=github
 ```
 
 ## What is the quality gate criteria
@@ -51,6 +51,8 @@ specifically with the following criteria:
    release
  - otherwise, pass
 
+These criteria are configured per provider in `tests/quality/config.yaml`.
+
 F1 score is the primary way that tool matching performance is characterized. F1
 score combines the TP, FP, and FN counts into a single metric between 0 and 1.
 Ideally the F1 score for an image-tool pair should be 1. F1 score is a good way
@@ -113,7 +115,7 @@ To reduce the eroding value over time we've decided to change as many moving
 targets into fixed targets as possible:
 
 - Vulnerability results beyond a particular year are ignored (the current config
-  allows for <= 2020). Though there are still retroactive CVEs created, this
+  allows for <= 2021). Though there are still retroactive CVEs created, this
   helps a lot in terms of keeping vulnerability results relatively stable.
 
 - SBOMs are used as input into grype instead of the raw container images. This
@@ -144,14 +146,18 @@ to keep in mind:
   assets that are no longer useful for comparison, but this should rarely be
   done.
 
-- Consider not changing the CVE year max-ceiling (currently set to 2020).
+- Consider not changing the CVE year max-ceiling (currently set to 2021).
   Pushing this ceiling will likely raise the number of unlabeled matches
   significantly for all images. Only bump this ceiling if all possible matches
   are labeled.
 
+- If the CVE year max-ceiling needs to be pushed, try to push it only for one
+  provider. That is, edit the max-year value on the validation for that
+  provider in `tests/quality/config.yaml`.
+
 ## Workflow
 
-One way of working is to simply run `yardstick` and `gate.py` in the `tests/quality` directory.
+One way of working is to simply run `yardstick` in the `tests/quality` directory.
 You will need to make sure the `vulnerability-match-labels` submodule has been initialized. This happens automatically
 for some `make` commands, but you can ensure this by running `git submodule update --init`. After the submodule has been
 initialized, the match data from `vulnerability-match-labels` will be available locally.
@@ -174,7 +180,7 @@ After `make capture` has finished, we should have results and can now start insp
 modifying the comparison labels.
 
 To get started, let's assume we see a quality gate failure like this (something found in CI
-or after running `./gate.py`):
+or after running `yardstick validate --result-set pr_vs_latest_via_sbom`):
 ```
 Running comparison against labels...
    Results used:
@@ -218,7 +224,7 @@ At this point you can run the quality gate using updated label data. The quality
 just one image, for example the image we first found in the failure, so run the quality gate and see
 how changes to the label data have affected the result:
 ```shell
-./gate.py --image docker.io/anchore/test_images@sha256:808f6cf3cf4473eb39ff9bb47ead639d2ed71255b75b9b140162b58c6102bcc9
+yardstick validate -r pr_vs_latest_via_sbom --image docker.io/anchore/test_images@sha256:808f6cf3cf4473eb39ff9bb47ead639d2ed71255b75b9b140162b58c6102bcc9
 ```
 
 After iterating on all the changes we need using `yardstick label explore`, we're now ready to commit changes. Since
@@ -307,7 +313,8 @@ like this:
 (venv) user@HOST quality %
 ```
 
-Now you should be able to run both `yardstick` and `./gate.py`.
+Now you should be able to run both `yardstick` to see and update labels and
+`make validate provider=<some provider>` to validate the results.
 
 ## Troubleshooting
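Reviewer note on the README's description of F1 score: it says F1 combines the TP, FP, and FN counts into a single metric between 0 and 1. A minimal sketch of the standard formula follows. The function name is hypothetical, and this is not yardstick's implementation; returning 0.0 when all counts are zero is a choice made for this sketch only.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall."""
    denominator = 2 * tp + fp + fn
    if denominator == 0:
        # No matches and no labels: F1 is undefined; this sketch returns 0.0.
        return 0.0
    return 2 * tp / denominator


print(f1_score(tp=8, fp=2, fn=2))  # 0.8
print(f1_score(tp=5, fp=0, fn=0))  # 1.0
```

A single false positive or false negative pulls the score below 1, which is why a `max-f1-regression` of `0.0` is a strict setting.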
 

diff --git a/tests/quality/config.yaml b/tests/quality/config.yaml
@@ -1,3 +1,10 @@
+x-ref:
+  default-validations: &default-validations
+    max-f1-regression: 0.0
+    max-new-false-negatives: 0
+    max-unlabeled-percent: 10
+    max_year: 2021
+    candidate_tool_label: custom-db
 yardstick:
   default_max_year: 2021
 
@@ -31,6 +38,7 @@ yardstick:
       #  - this version should ALWAYS match that of the other "grype" tool above
       version: latest
       takes: SBOM
+      label: reference
 
 grype_db:
   # values:
@@ -71,8 +79,17 @@ tests:
       - alpine:distro:alpine:3.19
       - alpine:distro:alpine:3.20
       - alpine:distro:alpine:edge
+      - nvd:cpe # alpine lists fixes to NVD entries, so NVD entries are also expected
+    validations:
+      - *default-validations
 
   - provider: amazon
+    validations:
+      - <<: *default-validations
+        # TODO: docker.io/amazonlinux:2@sha256:1301cc9f889f21dc45733df9e58034ac1c318202b4b0f0a08d88b3fdc03004de
+        # has no matches before 2022. Label more things, move max_year to 2022, and then
+        # change fail_on_empty_match_set back to true (the default).
+        fail_on_empty_match_set: false
     images:
       - docker.io/amazonlinux:2@sha256:1301cc9f889f21dc45733df9e58034ac1c318202b4b0f0a08d88b3fdc03004de
       - docker.io/anchore/test_images:vulnerabilities-amazonlinux-2-5c26ce9@sha256:cf742eca189b02902a0a7926ac3fbb423e799937bf4358b0d2acc6cc36ab82aa
@@ -92,6 +109,8 @@ tests:
       - ghcr.io/chainguard-images/scanner-test:latest@sha256:59bddc101fba0c45d5c093575c6bc5bfee7f0e46ff127e6bb4e5acaaafb525f9
     expected_namespaces:
       - chainguard:distro:chainguard:rolling
+    validations:
+      - *default-validations
 
   - provider: debian
     # ideally we would not use cache, however, in order to test if we are properly keeping the processing
@@ -144,19 +163,25 @@ tests:
       - github:language:ruby
       - github:language:rust
       - github:language:swift
+    validations:
+      - *default-validations
 
   - provider: mariner
     images:
       - mcr.microsoft.com/cbl-mariner/base/core:2.0.20220731-amd64@sha256:3c0f7e103ff3c39e81e7c9c042d2b321d833fb6d26d8636567f7d88a6bdde74a
     expected_namespaces:
       - mariner:distro:mariner:1.0
       - mariner:distro:mariner:2.0
+    validations:
+      - *default-validations
 
   - provider: nvd
     images:
       - docker.io/busybox:1.28.1@sha256:2107a35b58593c58ec5f4e8f2c4a70d195321078aebfadfbfb223a2ff4a4ed21
     expected_namespaces:
       - nvd:cpe
+    validations:
+      - *default-validations
 
   - provider: oracle
     additional_trigger_globs:
@@ -170,6 +195,13 @@ tests:
       - oracle:distro:oraclelinux:7
       - oracle:distro:oraclelinux:8
       - oracle:distro:oraclelinux:9
+    validations:
+      - <<: *default-validations
+        # TODO: docker.io/anchore/test_images:appstreams-oraclelinux-8-1a287dd@sha256:c8d664b0e728
+        # has no matches before 2022. Label more things, move max_year to 2022, and then
+        # change fail_on_empty_match_set back to true (the default).
+        max_year: 2021
+        fail_on_empty_match_set: false
 
   - provider: rhel
     # ideally we would not use cache, however, the ubuntu provider is currently very expensive to run.
@@ -185,6 +217,8 @@ tests:
       - docker.io/anchore/test_images:appstreams-centos-stream-8-1a287dd@sha256:808f6cf3cf4473eb39ff9bb47ead639d2ed71255b75b9b140162b58c6102bcc9
       - docker.io/anchore/test_images:appstreams-rhel-8-1a287dd@sha256:524ff8a75f21fd886ec7ed82387766df386671e8b77e898d05786118d5b7880b
       - docker.io/anchore/test_images:vulnerabilities-centos@sha256:746d31247006cc06434ce91ccf3523b2c230ff6c378ffed7ca1c60bbb48ea86f
+    validations:
+      - *default-validations
 
     expected_namespaces:
       - redhat:distro:redhat:5
@@ -220,6 +254,8 @@ tests:
       - sles:distro:sles:15.4
       - sles:distro:sles:15.5
       - sles:distro:sles:15.6
+    validations:
+      - *default-validations
 
   - provider: ubuntu
     # ideally we would not use cache, however, the ubuntu provider is currently very expensive to run.
@@ -256,6 +292,8 @@ tests:
       - ubuntu:distro:ubuntu:23.04
       - ubuntu:distro:ubuntu:23.10
       - ubuntu:distro:ubuntu:24.04
+    validations:
+      - *default-validations
 
   - provider: wolfi
     additional_providers:
@@ -265,3 +303,5 @@ tests:
       - cgr.dev/chainguard/wolfi-base:latest-20221001@sha256:be3834598c3c4b76ace6a866edcbbe1fa18086f9ee238b57769e4d230cd7d507
     expected_namespaces:
       - wolfi:distro:wolfi:rolling
+    validations:
+      - *default-validations
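Reviewer note: `config.yaml` now uses a YAML anchor (`&default-validations`) in two ways. Most providers reference it unchanged with `- *default-validations`, while amazon and oracle use the merge key `<<:` and then override individual keys. The sketch below models that behavior with plain Python dictionaries rather than a YAML parser; the variable names are illustrative, and the values are copied from the diff.

```python
# Values copied from the `default-validations` anchor in this diff.
default_validations = {
    "max-f1-regression": 0.0,
    "max-new-false-negatives": 0,
    "max-unlabeled-percent": 10,
    "max_year": 2021,
    "candidate_tool_label": "custom-db",
}

# `- *default-validations` reuses the mapping unchanged (alpine, nvd, ...).
alpine_validation = dict(default_validations)

# `- <<: *default-validations` merges the defaults first; keys written
# alongside the merge key then take precedence (amazon, oracle).
amazon_validation = {**default_validations, "fail_on_empty_match_set": False}

print(amazon_validation["fail_on_empty_match_set"])  # False
print(amazon_validation["max_year"])  # 2021
print("fail_on_empty_match_set" in alpine_validation)  # False
```

Providers that do not set `fail_on_empty_match_set` fall back to the tool's default, which the TODO comments in the config describe as `true`.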
diff --git a/tests/quality/configure.py b/tests/quality/configure.py
@@ -23,6 +23,7 @@
     ResultSet,
     ScanMatrix,
     Tool,
+    Validation,
 )
 from yardstick.cli.config import Application as YardstickApplication
 
@@ -58,6 +59,7 @@ class Test:
     provider: str
     use_cache: bool = False
     images: list[str] = field(default_factory=list)
+    validations: list[Validation] = field(default_factory=list)
     additional_providers: list[AdditionalProvider] = field(default_factory=list)
     additional_trigger_globs: list[str] = field(default_factory=list)
     expected_namespaces: list[str] = field(default_factory=list)
@@ -100,20 +102,30 @@ def load(cls, path: str = "") -> "Config":
         return cfg
 
     def yardstick_application_config(self, test_configurations: list[Test]) -> Application:
+        # test_configurations is the set of providers explicitly requested;
+        # each provider is associated with the set of images it needs to scan
+        # and the set of validations it needs to perform.
         images = []
         for test in test_configurations:
             images += test.images
+            for validation in test.validations:
+                if test.expected_namespaces:
+                    validation.allowed_namespaces = test.expected_namespaces
+
+        def result_set_from_test(t: Test) -> ResultSet:
+            return ResultSet(
+                description=f"latest vulnerability data vs current vunnel data with latest grype tooling (via SBOM ingestion) for {t.provider}",
+                validations=t.validations,
+                matrix=ScanMatrix(
+                    images=t.images,
+                    tools=self.yardstick.tools,
+                ),
+            )
+
+        result_sets = {f"pr_vs_latest_via_sbom_{test.provider}": result_set_from_test(test) for test in test_configurations}
         return Application(
             default_max_year=self.yardstick.default_max_year,
-            result_sets={
-                "pr_vs_latest_via_sbom": ResultSet(
-                    description="latest vulnerability data vs current vunnel data with latest grype tooling (via SBOM ingestion)",
-                    matrix=ScanMatrix(
-                        images=images,
-                        tools=self.yardstick.tools,
-                    ),
-                ),
-            },
+            result_sets=result_sets,
         )
 
     def test_configuration_by_provider(self, provider: str) -> Test | None:
@@ -284,6 +296,7 @@ def write_yardstick_config(cfg: Application, path: str = ".yardstick.yaml"):
 
 
 def write_grype_db_config(providers: set[str], path: str = ".grype-db.yaml"):
+    logging.info(f"writing grype-db config to {path!r}")
     with open(path, "w") as f:
         f.write(
             """
@@ -462,6 +475,7 @@ def configure(cfg: Config, provider_names: list[str]):
 
     providers = set(cached_providers + uncached_providers)
 
+    logging.info(f"writing grype-db config for {' '.join(providers)}")
     write_grype_db_config(providers)
     write_yardstick_config(yardstick_app_cfg)
 
@@ -601,8 +615,8 @@ def build_db(cfg: Config):
         subprocess.run(["vunnel", "-v", "run", provider], check=True)
 
     logging.info("building DB")
-    subprocess.run([GRYPE_DB, "build", "-v"], check=True)
-    subprocess.run([GRYPE_DB, "package", "-v"], check=True)
+    subprocess.run([GRYPE_DB, "build", "-v", "-c", ".grype-db.yaml"], check=True)
+    subprocess.run([GRYPE_DB, "package", "-v", "-c", ".grype-db.yaml"], check=True)
 
     archives = glob.glob(f"{build_dir}/*.tar.gz")
     if not archives:
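Reviewer note on `yardstick_application_config`: the change replaces one shared result set with one result set per provider, and copies each provider's `expected_namespaces` onto its validations. The sketch below shows that shape in isolation. The dataclasses are hypothetical, simplified stand-ins for the yardstick and `configure.py` types (`ProviderTest` stands in for `Test`), and the image names are placeholders.

```python
from dataclasses import dataclass, field


# Hypothetical, simplified stand-ins for the yardstick and configure.py types.
@dataclass
class Validation:
    max_year: int = 2021
    allowed_namespaces: list[str] = field(default_factory=list)


@dataclass
class ProviderTest:
    provider: str
    images: list[str] = field(default_factory=list)
    validations: list[Validation] = field(default_factory=list)
    expected_namespaces: list[str] = field(default_factory=list)


@dataclass
class ResultSet:
    description: str
    images: list[str]
    validations: list[Validation]


def build_result_sets(tests: list[ProviderTest]) -> dict[str, ResultSet]:
    result_sets = {}
    for t in tests:
        # Restrict each validation to the namespaces the provider should emit.
        if t.expected_namespaces:
            for validation in t.validations:
                validation.allowed_namespaces = t.expected_namespaces
        # Read everything from the loop variable `t`, so each result set
        # carries its own provider's description, images, and validations.
        result_sets[f"pr_vs_latest_via_sbom_{t.provider}"] = ResultSet(
            description=f"latest vs current vunnel data for {t.provider}",
            images=t.images,
            validations=t.validations,
        )
    return result_sets


tests = [
    ProviderTest("nvd", ["busybox"], [Validation()], ["nvd:cpe"]),
    ProviderTest("wolfi", ["wolfi-base"], [Validation()], ["wolfi:distro:wolfi:rolling"]),
]
for name, rs in build_result_sets(tests).items():
    print(name, rs.validations[0].allowed_namespaces)
```

Building the entries in a single loop keeps the provider-specific values tied to the same variable, which avoids accidentally reading a variable left over from an earlier loop.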