CLI tools for SARIF processing

Each of these tools present a high-level command-line interface to extract a specific subset of information from a SARIF file. The main tools are: sarif-extract-scans-runner,sarif-aggregate-scans,sarif-create-aggregate-report.

Each tool can print its options and description like: sarif-extract-scans-runner --help.

The tool was implemented using Python 3.9.

Sarif format information

The tool operates on sarif generated by LGTM 1.27.0 (by default) or by the CodeQL CLI (enabled with the -f flag given a value of CLI). The supported sarif is SARIF v2.1.0.

The values that the -f flag accepts are: LGTM and CLI.

The CLI versions used against development of the CLI support were: 2.6.3, 2.9.4, and 2.11.4.

Minimal tests are also run against the versions in this build script. Currently, those are 2.9.4, 2.12.7, 2.13.5, 2.14.0.

The CLI sarif MUST contain one additional property versionControlProvenance - which needs to look like:

"versionControlProvenance": [
      {
        "repositoryUri": "https://github.com/testorg/testrepo.git",
        "revisionId": "testsha"
      }
    ]

The script

  bin/sarif-insert-vcp

will add that entry to a SARIF file.

Test Setup

This repository includes some test data (in data) and uses =git lfs= for storing those test files; installation steps are at [[https://git-lfs.github.com][git-lfs]]; on a mac with homebrew, install it via #+BEGIN_SRC sh brew install git-lfs git lfs install #+END_SRC

Tool Setup

Set up the virtual environment and install the packages:

  python3.9 -m venv .venv
 . .venv/bin/activate

For development

 # Use requirementsDEV.txt 
 python -m pip install -r requirementsDEV.txt

For distribution

  # Use requirements.txt 
  python -m pip install -r requirements.txt

Then install:

pip install -e .

Tool Use

sarif-extract-scans-runner

Parses the SARIF results into a result set of 4 csvs located under a directory structure like:

├── results-log.scanlog
├── results-log.csv
├── results.sarif.scanspec
  ├── results.sarif.scantables
      ├── codeflows.csv
      ├── projects.csv
      ├── results.csv
      └── scans.csv

where codeflows.csv,projects.csv, results.csv, scans.csv are the consumable parsed output of the analysis.

results-log.scanlog contains a raw log of any errors encountered while parsing the sarif and results-log.csv contains a summary of the scanlog contents.

Sample usage 1 -- no separate timestamps file

python bin/sarif-extract-scans-runner sarif-files.txt -o <outer-level-results-directory>

where cat sarif-files.txt contains sarif files to process, each entry of the form <org>/<project> and separated by newline, like:

data/wxWidgets_wxWidgets__2021-11-21_16_06_30__export.sarif
data/torvalds_linux__2021-10-21_10_07_00__export.sarif

When called this way, sarif-pad-aggregate should be used because it will overwrite single-date timestamps with a random 1-year range.

Sample usage 2 -- with separate timestamps file

When a separate timestamps.json file is available and has the form

    timestamps = {
        "db_create_start"      : "2023-07-03T00:56:15.576222",
        "db_create_stop"       : ...,
        "scan_start_date"      : ...,
        "scan_stop_date"       : ..., 
    }

or

    {
      "db_create_start": ...,
      "db_create_stop": ...,
      "scan_start": ...
      "scan_stop": ...
    }

the runner can be called via e.g.,

sarif-extract-scans-runner --input-signature CLI --with-timestamps - <<EOF
foo.sarif,timestamps.json
EOF

When called this way, sarif-pad-aggregate should not be used because it will overwrite those timestamps.

sarif-aggregate-scans

Parses the codeflows.csv,projects.csv, results.csv, scans.csv files generated for some batch of input sarifs and creates a final set of codeflows.csv,projects.csv, results.csv, scans.csv files aggregating all of the contents across those sarif files.

sample usage:

python bin/sarif-aggregate-scans sarif-files.txt <combined-tables-output directory>

sarif-pad-aggregate

Optional Post-fills the scans.csv file with more realisitic (but still fake) values for the following columns: db_create_start,db_create_stop,scan_start_date,scan_stop_date. These values are not in the input sarif and it may be beneficial to have date values near the present. Otherwise sarif-extract-scans-runner will have populated these columns with the value 1970-01-01.

sample usage:

python bin/sarif-pad-aggregate <combined-tables-output directory> <padded-combined-tables-output directory>

sarif-create-aggregate-report

Parses the results-log.csv files generated for some batch of input sarifs and creates a final summary report in summary-report.csv (unless otherwise specified).

sample usage:

python bin/sarif-create-aggregate-report sarif-files.txt -in <outer-level-results-directory-to-summarize>

Sample Data Information

The query results in =data/= are taken from lgtm.com, which ran the : ql/$LANG/ql/src/codeql-suites/$LANG-lgtm.qls queries.

The linux kernel has both single-location results (="kind": "problem"=) and path results (="kind": "path-problem"=). It also has results for multiple source languages.

The subset of files referenced by the sarif results is in =data/linux-small/= and is taken from

  "versionControlProvenance": [
      {
          "repositoryUri": "https://github.com/torvalds/linux.git",
          "revisionId": "d9abdee5fd5abffd0e763e52fbfa3116de167822"
      }
  ]

The wxWidgets library has both single-location results (="kind": "problem"=) and path results (="kind": "path-problem"=).

The subset of files referenced by the sarif results is in =data/wxWidgets-small/= and is taken from

    "repositoryUri": "https://github.com/wxWidgets/wxWidgets.git",
    "revisionId": "7a03d5fe9bca2d2a2cd81fc0620bcbd2cbc4c7b0"

Name		Name	Last commit message	Last commit date
Latest commit History 196 Commits
bin		bin
data		data
docs		docs
non-sarif-metadata		non-sarif-metadata
notes		notes
sarif_cli		sarif_cli
scripts		scripts
.gitignore		.gitignore
.ignore		.ignore
LICENSE		LICENSE
MANIFEST.in		MANIFEST.in
README.md		README.md
build-multiple-codeql-versions.sh		build-multiple-codeql-versions.sh
requirements.txt		requirements.txt
requirementsDEV.txt		requirementsDEV.txt
setup.py		setup.py
typegraph-td.pdf		typegraph-td.pdf
typegraph-td.svg		typegraph-td.svg

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

CLI tools for SARIF processing

Sarif format information

Test Setup

Tool Setup

For development

For distribution

Tool Use

sarif-extract-scans-runner

Sample usage 1 -- no separate timestamps file

Sample usage 2 -- with separate timestamps file

sarif-aggregate-scans

sample usage:

sarif-pad-aggregate

sample usage:

sarif-create-aggregate-report

sample usage:

Sample Data Information

About

Releases

Packages

Contributors 3

Languages

License

hohn/sarif-cli

Folders and files

Latest commit

History

Repository files navigation

CLI tools for SARIF processing

Sarif format information

Test Setup

Tool Setup

For development

For distribution

Tool Use

sarif-extract-scans-runner

Sample usage 1 -- no separate timestamps file

Sample usage 2 -- with separate timestamps file

sarif-aggregate-scans

sample usage:

sarif-pad-aggregate

sample usage:

sarif-create-aggregate-report

sample usage:

Sample Data Information

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Languages

Packages