License detection
This page summarizes the first version of the implementation and integration of the QMSTR build graphs into FASTEN's toolchain.
Developed by project partner Endocode AG, Quartermaster (QMSTR) is an Open Source license compliance solution that aims to establish industry standards regarding the documentation of Open Source license information across the supply chain. The command-line tool integrates into the build system to learn about the software product, its sources, and dependencies and then performs an analysis of the gathered information. Its goal is to reduce risk and friction in the reuse of Open Source code. Through its bidirectional connection, QMSTR's role is to check license compliance: it collects information about the dependencies from the FASTEN call graphs and then reports license and compliance information back to the FASTEN Knowledge Base.
The first step for this process is the generation of the concrete build graph that consists of information about all the generated artifacts that will be distributed together with the necessary source code and dependency information.
While building the graph isn’t trivial, constructing and analyzing it is vital for complex projects: it improves the accuracy of the license and compliance analysis, since the only files that matter for that analysis are the ones shipped within the package.
QMSTR was born as a command-line tool to be launched locally.
This, however, did not align with the ultimate purpose of integrating FASTEN in CI/CD pipelines (§§ 4.1 4.3.2, D6.3).
QMSTR has been integrated into FASTEN's toolchain as a dependency and can be launched through its dedicated plugin.
For the first version of D4.1, we concentrate on showcasing the build graph of any Maven project.
To make this integration happen, QMSTR moved to the cloud: the build graph is being built in the cloud while building the Maven project. All these tasks are being performed by different containers. A fully-distributed multi-pod architecture is currently under development.
The FASTEN QMSTR plugin is triggered by the FASTEN server; however, it can also be launched as a standalone plugin for debugging purposes (step-by-step guide, video).
QMSTR/FASTEN integration: second version (analysis)
The License and Compliance plugin now also analyzes Maven projects using scancode.
As a result of this phase, our graph database is augmented with license and compliance information.
The left part of the graph consists of the usual build graph, having, in this case, a single (Java) package node in green as the central node. License and compliance information is on the right, having the analyzer node in pink right in the middle.
This was part of deliverable 4.2 "Detection of license obligations and metadata and application to the call graphs".
QMSTR/FASTEN integration: third version (report)
As a result of the analysis phase, the License and Compliance plugin produces a Kafka message having this format. More specifically, a custom QMSTR reporter interrogates the internal graph database to fetch license information, formats the result into a message having the previous format, and sends it back to Kafka.
This was part of deliverable 4.3 "Implementation of a license compliance and compatibility solver operating on the call graphs - Version 1".
QMSTR is a modular application composed of three main phases: build, analysis, and report.
To achieve this task, we developed a custom reporter (phase 3 module) to fetch data of interest and format it according to this format. More specifically, this version returns licenses and SHA-1 hashes of *.java and *.class files.
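As an illustration of the hashing part only, the following minimal sketch computes SHA-1 hashes for all *.java and *.class files under a directory. It is not the QMSTR reporter itself: the function names `sha1_of_file` and `hash_java_artifacts` are hypothetical, and the returned dictionary is an assumed shape, not the message format referenced above.

```python
import hashlib
from pathlib import Path
from typing import Dict


def sha1_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA-1 digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_java_artifacts(root: Path) -> Dict[str, str]:
    """Map each *.java and *.class file under root to its SHA-1 hash.

    Keys are paths relative to root (an assumption made for this sketch).
    """
    hashes = {}
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix in {".java", ".class"}:
            hashes[path.relative_to(root).as_posix()] = sha1_of_file(path)
    return hashes
```

Calling `hash_java_artifacts(Path("my-project"))` on a checked-out project would return one entry per source or class file; files with other extensions are ignored.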
So far, QMSTR has been using classic gRPC calls to orchestrate its different modules, but to achieve better maintainability, robustness, and (horizontal) scalability, it is progressively moving towards a message broker-based solution. The custom reporter waits for a RabbitMQ message before starting its execution: this ensures that reporting starts only once the build and analysis phases are over.
From Quartermaster to the license detector & feeder plugins
FASTEN’s original plan was to detect licenses with Quartermaster, the open-source license compliance tool developed by Endocode AG. As a first step, Quartermaster builds the software project in order to extrapolate build information and store it in a so-called “build graph”. It then scans the entire project looking for license text in all files and augments the build graph accordingly. As the last step, Quartermaster queries the build graph so that only those licenses that actually end up in the final package can be considered for a compliance check.
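The last step, restricting the compliance check to licenses that end up in the final package, amounts to a reachability query on the build graph. The sketch below illustrates the idea on a toy graph; the node names, the dictionary-based graph representation, and the function `shipped_licenses` are all hypothetical and do not reflect Quartermaster's actual graph database schema.

```python
from collections import deque

# Hypothetical build graph: each node maps to the nodes it was derived from.
BUILD_GRAPH = {
    "app.jar": ["Main.class", "Util.class"],
    "Main.class": ["Main.java"],
    "Util.class": ["Util.java"],
    "UtilTest.class": ["UtilTest.java"],  # built, but not part of the package
}

# Hypothetical scan results: the license found in each source file.
SCANNED_LICENSES = {
    "Main.java": "Apache-2.0",
    "Util.java": "MIT",
    "UtilTest.java": "GPL-3.0-only",
}


def shipped_licenses(graph, licenses, package):
    """Collect the licenses of every file reachable from the package node."""
    seen = {package}
    queue = deque([package])
    found = set()
    while queue:
        node = queue.popleft()
        if node in licenses:
            found.add(licenses[node])
        for dependency in graph.get(node, []):
            if dependency not in seen:
                seen.add(dependency)
                queue.append(dependency)
    return found
```

In this toy example, the license of `UtilTest.java` is found by the scan but excluded from the result, because that file is never reachable from `app.jar`.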
Apart from the technical difficulty of readapting a distributed batch job like Quartermaster into a self-contained plugin to be run in a streaming application (a problem arising from an intrinsic architectural incompatibility), letting Quartermaster build the software project violates the separation of concerns principle. That is to say, FASTEN’s OPAL plugin already generates call graphs. A license detector plugin should only detect licenses and augment the call graph with such findings. OPAL should be the only plugin responsible for the creation of call graphs, not the one intended to detect licenses.
Accordingly, a new, streamlined plugin simply called “license detector plugin” only takes care of running a license scanner and reporting these findings back to Kafka; there is no need to build projects anymore. The “license feeder” plugin will subsequently consume that message and augment the call graph.
Pull Request #301 contains the source code of the two previously-mentioned plugins.
Its description lists the steps that have been necessary for the two plugins to accomplish a successful license detection, as well as their progress.
First, the license detector plugin consumes a Kafka record belonging to a joint topic that combines:
• fasten.RepoCloner.out, meaning that the repository has been cloned, and
• fasten.MetadataDBJavaExtension.out, issued as soon as the call graph has been stored into the database.
The detector then proceeds to scan the entire project, looking for license text inside files. Those findings are formatted into a new Kafka record, fasten.LicenseDetector.out. Licenses are detected both at the file and at the package level. For the latter category, the detector scans the main pom.xml file. If the developer hasn’t specified any license in the pom.xml file and the repository is hosted on GitHub, the detector contacts the GitHub API to retrieve the so-called “outbound license”. The license feeder will then consume the fasten.LicenseDetector.out record and insert license findings into the call graph, only for those files that are present in the database.
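The package-level detection and the feeder's filtering step can be sketched as follows. This is a simplified illustration, not the plugins' source code: the function names are hypothetical, the GitHub API lookup is replaced by an injected callable (`fetch_outbound_license`) so that no network access is needed, and the sketch assumes a pom.xml that declares the standard Maven POM 4.0.0 namespace.

```python
import xml.etree.ElementTree as ET

# Standard Maven POM namespace; POMs without it would not match (assumption).
POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}


def licenses_from_pom(pom_xml):
    """Return the license names declared under <licenses> in a pom.xml."""
    root = ET.fromstring(pom_xml)
    names = root.findall("m:licenses/m:license/m:name", POM_NS)
    return [n.text.strip() for n in names if n.text and n.text.strip()]


def package_licenses(pom_xml, fetch_outbound_license):
    """Prefer licenses declared in pom.xml; otherwise use the fallback.

    fetch_outbound_license is a hypothetical callable standing in for the
    GitHub API lookup; it returns a license name or None.
    """
    declared = licenses_from_pom(pom_xml)
    if declared:
        return declared
    fallback = fetch_outbound_license()
    return [fallback] if fallback else []


def feed(file_licenses, files_in_database):
    """Keep license findings only for files already present in the database."""
    return {
        path: license_name
        for path, license_name in file_licenses.items()
        if path in files_in_database
    }
```

Injecting the fallback as a callable keeps the pom.xml logic testable in isolation; a real implementation would replace it with a call to the repository hosting service.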