---
slug: '2021/09/22/blackduck-spdx-licenses'
title: 'Black Duck SPDX licenses'
subtitle: 'Check Black Duck licenses against external sources'
date: 2021-09-22
cover: ./cover.jpg
imageFb: ./2021-09-22-blackduck-spdx-licenses-fb.png
imageTw: ./2021-09-22-blackduck-spdx-licenses-tw.png
type: post
tags:
  - spdx
  - licenses
  - blackduck
  - sbom
authors:
  - jeroen
---

_This post explains how to check the Black Duck license database for differences with the original sources._

## Introduction

Black Duck is a tool to analyse your software components.
It can create a Software Bill of Materials (SBOM) and shows the vulnerabilities, license risks and operational risks of the components in it.

This information can also be retrieved with open-source tooling. In this blog we show a way to check the license information in Black Duck against the license information found in various public places.

These places include [ClearlyDefined](https://clearlydefined.io/), package managers, and the actual source code, which we scan with [ScanCode](https://github.com/nexB/scancode-toolkit).

## Why?

To comply with all license obligations, we need a proper Software Bill of Materials.
When you know the correct composition of your software, you can list all licenses and make sure everything is in line with your own policies.
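
Checking an SBOM against your own policy can start as simply as comparing each package's license with an allow-list. The sketch below is a minimal, hypothetical illustration, not how any of the tools in this post implement policies. It assumes the SBOM has been reduced to a mapping from package URL to SPDX license expression, and it uses a made-up allow-list (`ALLOWED`) and made-up helper names (`is_allowed`, `policy_violations`). Only flat `A OR B` and `A AND B` expressions are evaluated. Anything more complex, such as parentheses, `WITH` exceptions or `NOASSERTION`, is flagged for human review.

```python
from typing import Dict, List, Set

# Hypothetical allow-list of SPDX license identifiers; your own policy will differ.
ALLOWED: Set[str] = {"MIT", "Apache-2.0", "BSD-3-Clause"}


def is_allowed(expression: str, allowed: Set[str]) -> bool:
    """Evaluate a flat SPDX license expression against an allow-list."""
    expression = expression.strip()
    # Nested expressions and license exceptions are out of scope for this sketch:
    # treat them as "not allowed" so that a human reviews them.
    if "(" in expression or " WITH " in expression:
        return False
    if " OR " in expression and " AND " in expression:
        return False
    if " OR " in expression:
        # Disjunction: it is enough if one of the alternatives is acceptable.
        return any(part.strip() in allowed for part in expression.split(" OR "))
    # A single identifier, or a conjunction "A AND B": every license must be acceptable.
    return all(part.strip() in allowed for part in expression.split(" AND "))


def policy_violations(sbom: Dict[str, str], allowed: Set[str]) -> List[str]:
    """Return the package URLs whose license expression fails the policy."""
    return sorted(purl for purl, expr in sbom.items() if not is_allowed(expr, allowed))


if __name__ == "__main__":
    # Made-up SBOM: package URL -> concluded license expression.
    sbom = {
        "pkg:npm/example-a@1.0.0": "MIT",
        "pkg:npm/example-b@2.0.0": "MIT OR GPL-2.0-only",
        "pkg:npm/example-c@3.0.0": "GPL-2.0-only",
        "pkg:maven/org.example/example-d@1.2": "NOASSERTION",
    }
    for purl in policy_violations(sbom, ALLOWED):
        print("needs review:", purl)
```

The check fails closed on anything it cannot evaluate. An unknown or complex expression therefore ends up with a human instead of silently passing.
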

[Snyk](https://snyk.io/), [WhiteSource](https://www.whitesourcesoftware.com/open-source-scan-lp) and [Black Duck](https://www.synopsys.com/software-integrity/security-testing/software-composition-analysis.html) are commercial tools that can create an SBOM, but there are also FOSS alternatives such as [OSS Review Toolkit](https://github.com/oss-review-toolkit/ort).

In almost all cases, license information is curated by people, because detecting licenses automatically is very hard. If a README states that a project is definitely **NOT LGPL**, a tool based on a plain text scan will still mark it as LGPL. :) Tools need better [NLP](https://en.wikipedia.org/wiki/Natural_language_processing) support.

All these tools keep some kind of knowledge base with this information, and we want to find where these knowledge bases differ.

## Tools

In this setup we use the following tools:
- [Black Duck](https://www.synopsys.com/software-integrity/security-testing/software-composition-analysis.html)
- [SPDX-builder](https://github.com/philips-software/spdx-builder)
- [BOM-Base](https://github.com/philips-software/bom-base)
- [ScanCode](https://github.com/nexB/scancode-toolkit)
- [Bompare](https://github.com/philips-labs/bompare)

### Black Duck
Black Duck is a tool created by Synopsys. Its scanner runs in a build pipeline and can analyse many types of projects. The scanner sends the package tree to a server, which uses a knowledge base to add information to the packages, such as vulnerabilities, outdated versions and license information.
You can define policies in Black Duck so developers are notified when, for example, unwanted licenses are used or critical vulnerabilities are found in the project. Black Duck also offers various reports and an API! :) We developers love APIs!

### SPDX-builder

**SPDX**

[SPDX](https://spdx.dev/) is an open standard for communicating software bill of materials information.
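
Both sides of this experiment end up as SPDX files. The comparison therefore comes down to matching packages between the two files and comparing their license fields. The sketch below is a much-simplified, hypothetical illustration of that idea. It is not how Bompare or SPDX-builder are implemented. It reads only the `PackageName`, `PackageVersion`, `PackageLicenseConcluded` and `ExternalRef ... purl` tags of the SPDX 2.x tag-value format, keys packages by purl, and compares licenses as plain strings. The two embedded documents and the `example-*` packages are made up and trimmed down: real SPDX files carry more mandatory tags, such as `SPDXID` and `PackageDownloadLocation`.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Package:
    name: str
    version: str = ""
    purl: Optional[str] = None
    concluded: str = "NOASSERTION"

    def key(self) -> str:
        # Prefer the package URL; fall back to name@version if no purl is given.
        return self.purl or f"{self.name}@{self.version}"


def parse_tag_value(text: str) -> Dict[str, Package]:
    """Collect the packages of an SPDX 2.x tag-value document, keyed by purl."""
    packages: List[Package] = []
    in_text_block = False
    for raw in text.splitlines():
        line = raw.strip()
        if in_text_block:  # skip the body of a multi-line <text>...</text> value
            in_text_block = "</text>" not in line
            continue
        if ":" not in line:
            continue
        tag, value = (part.strip() for part in line.split(":", 1))
        if value.startswith("<text>") and "</text>" not in value:
            in_text_block = True
            continue
        if tag == "PackageName":
            packages.append(Package(name=value))
        elif not packages:
            continue  # document-level tags before the first package
        elif tag == "PackageVersion":
            packages[-1].version = value
        elif tag == "PackageLicenseConcluded":
            packages[-1].concluded = value
        elif tag == "ExternalRef":
            # Format: "<category> <type> <locator>", e.g. "PACKAGE-MANAGER purl pkg:npm/x@1"
            parts = value.split()
            if len(parts) == 3 and parts[1] == "purl":
                packages[-1].purl = parts[2]
    return {pkg.key(): pkg for pkg in packages}


def compare(left: Dict[str, Package], right: Dict[str, Package]) -> List[Tuple[str, str, str]]:
    """Return (key, left license, right license) for every package that differs."""
    diffs = []
    for key in sorted(left.keys() | right.keys()):
        lic_left = left[key].concluded if key in left else "<missing>"
        lic_right = right[key].concluded if key in right else "<missing>"
        if lic_left != lic_right:
            diffs.append((key, lic_left, lic_right))
    return diffs


# Made-up documents standing in for the Black Duck and the external-sources SPDX files.
BLACK_DUCK_SPDX = """\
SPDXVersion: SPDX-2.2

PackageName: example-a
PackageVersion: 1.0.0
PackageLicenseConcluded: MIT
ExternalRef: PACKAGE-MANAGER purl pkg:npm/example-a@1.0.0

PackageName: example-b
PackageVersion: 2.0.0
PackageLicenseConcluded: GPL-2.0-only
PackageLicenseComments: <text>Curated
by hand.</text>
ExternalRef: PACKAGE-MANAGER purl pkg:npm/example-b@2.0.0
"""

EXTERNAL_SPDX = """\
SPDXVersion: SPDX-2.2

PackageName: example-a
PackageVersion: 1.0.0
PackageLicenseConcluded: MIT
ExternalRef: PACKAGE-MANAGER purl pkg:npm/example-a@1.0.0

PackageName: example-b
PackageVersion: 2.0.0
PackageLicenseConcluded: MIT
ExternalRef: PACKAGE-MANAGER purl pkg:npm/example-b@2.0.0

PackageName: example-c
PackageVersion: 0.1.0
PackageLicenseConcluded: Apache-2.0
ExternalRef: PACKAGE-MANAGER purl pkg:pypi/example-c@0.1.0
"""

if __name__ == "__main__":
    diffs = compare(parse_tag_value(BLACK_DUCK_SPDX), parse_tag_value(EXTERNAL_SPDX))
    for key, lic_left, lic_right in diffs:
        print(f"{key}: Black Duck={lic_left} external={lic_right}")
```

Licenses are compared as plain strings, so `MIT OR Apache-2.0` and `Apache-2.0 OR MIT` would be reported as different. A real comparison would normalize the SPDX license expressions first.
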

SPDX is also an ISO standard ([ISO/IEC 5962:2021](https://www.iso.org/standard/81870.html)). Not all tools support the SPDX format, which is why Philips Research created SPDX-builder, a tool that generates SPDX output from various input sources. It has been used in the analysis of many existing solutions.

**Input**

Input can be taken from:
1. the output of [OSS Review Toolkit](https://github.com/oss-review-toolkit/ort)
1. the REST API of Black Duck
1. the "tree" output of many build environments

**Additional data sources**

SPDX-builder can also use other tools to enrich the package information. For example, it can query BOM-Base to add license information to packages. This is what we use in this setup.

**Output**

SPDX-builder can produce an SPDX file, but it also has a special mode that produces a "tree" output in which all package names are normalized to [package URLs](https://github.com/package-url/purl-spec) (purls).

### BOM-Base
BOM-Base is a knowledge database written by Philips Research. Given a package URL, it tries to find as much information about the package as possible. Depending on the package type, it kicks off a number of harvesters. It looks the package up in [ClearlyDefined](https://clearlydefined.io) and retrieves the information found there.

For npm, PyPI, Maven and NuGet packages, it also queries the package manager repositories for the package information and source code. When source code is found, it triggers ScanCode to do a full scan of the code.

When all harvesters are done, the data is stored in the database, and a simple UI lets you view the results.

The API can also be used to curate the database.

### ScanCode
ScanCode discovers license information in the source code of a package. It looks in all kinds of places, such as license files and source file headers, and tries to determine which license is used.

### Bompare
Bompare is a tool (built by Philips Research) that compares two SPDX files and shows the differences.

## Setup

This is the setup of our experiment.

!!
Image goes here !!

We're using several projects that are already present in Black Duck, and we rely on Black Duck for a correct Software Bill of Materials. In this experiment we don't validate the correctness of the SBOM itself; we only validate the license information.

### Black Duck SPDX file
At the time of this experiment, Black Duck did not have a proper way to generate an SPDX file, so we use SPDX-builder to create one from Black Duck.
This SPDX file contains the package information with the licenses copied one-to-one from Black Duck.

### External sources SPDX file
We also use SPDX-builder to create a tree output from Black Duck, in which all component names are converted into package URLs (purls).

We then feed this purl tree into SPDX-builder to create a second SPDX file.
Since the tree only contains purls, we need another source for the license information. We use BOM-Base for this.

When you request information about a package, BOM-Base looks in its database and returns the information if it is present. If not, it returns only the purl and triggers harvesting in the background. Harvesting runs asynchronously because scanning, for example, 1000 JavaScript packages takes a while, and you don't want your build pipeline to wait for that.
So when you run the tests against a freshly installed BOM-Base, you have to run them twice: once to trigger the harvesting, and once more after BOM-Base has finished harvesting the data.

### Curation
In BOM-Base we have to manually curate some of the license information and some packages.

!!! more info about the curations !!!

### Compare
Now we can compare the two SPDX files and see the differences.

### Aggregate results
We aggregate the results of many production projects to do a proper analysis of the packages.

!!! more info about aggregating data !!!

## Conclusions