Pipeline Validation
Structural validation of pipeline files (including templates) will enable the pipeline to provide clearer error messages about file format errors and logical expectations. Performing the validation will also allow the pipeline to fail early for certain types of errors instead of failing at runtime or, worse, continuing with undefined results.
We aim to replace:
java.lang.NullPointerException
[Pipeline] echo
[org.jenkinsci.plugins.docker.workflow.ImageNameTokens.<init>(ImageNameTokens.java:47), sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method), sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62), sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45), java.lang.reflect.Constructor.newInstance(Constructor.java:423),
...
with something more like:
"message": "PIPELINE VALIDATION FAILED",
"errors": [
{
"path": {
"path": "pipeline.yml"
},
"messages": [
"#/steps/1: required key [image] not found",
]
},
In this design, we are focused on structural validation only - i.e. the type of validation typically enforced by a schema.
We will define a JSON Schema for the pipeline.yml, step template, and config template files. Much of this work has already been done (see Schema Definitions below).
The validation will be performed in the pipeline code, after each file is read but before it is internally parsed.
Although the pipeline will fail before step execution starts if validation errors are encountered, it will attempt to continue as much parsing/retrieval/validation as possible in order to return as many errors as possible in a single run.
The exact format may change, but we'll aim to provide the file name, along with all validation errors, e.g.:
File: pipeline.yml
#/pipeline/steps/0: required key [name] not found
File: https___github.com/tmobile/_scm_poet_poet-pipeline-templates.git_master/env.yml
#/pipeline/environment/SETTINGS: expected type: String, found: JSONArray
At pipeline start, we don't have enough information to fully validate the pipeline.
With templates, the pipeline may include additional steps and configuration, each of which may include other steps and configuration, and so on. Each include may be local or remote; each is fully encapsulated and may require its own set of repositories or secrets, which are unknown ahead of time.
This requires many iterative validation passes -- validate the main file, then parse and retrieve any included files. For each included file, validate and then parse and retrieve any includes, and so on.
Finally, we may wish to validate the final merged pipeline file once all includes have been processed.
To perform the validation efficiently, it will be tied closely to the existing file-reading code.
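As a rough illustration of how those iterative passes could hang together, here is a hypothetical sketch; every helper name in it (validateYaml, parseYaml, findIncludes, retrieveInclude, schemaTextFor) is illustrative rather than the pipeline's actual API, and validateYaml is sketched further down under the library notes:

```groovy
// Hypothetical sketch only: helper names do not reflect the real pipeline code.
def allErrors = [:]            // file name -> list of schema violation messages
def processFile
processFile = { String name, String text, String schemaText ->
    def errors = validateYaml(text, schemaText)   // schema check right after reading
    if (errors) {
        allErrors[name] = errors                  // keep collecting; don't stop yet
    }
    def parsed = parseYaml(text)                  // internal parsing happens after validation
    findIncludes(parsed).each { inc ->
        // each include may be local or remote and may need its own repo or secret
        processFile(inc.name, retrieveInclude(inc), schemaTextFor(inc))
    }
}

processFile('pipeline.yml', readFile('pipeline.yml'), schemaTextFor('pipeline'))
if (allErrors) {
    allErrors.each { file, msgs -> echo "File: ${file}\n" + msgs.join('\n') }
    error 'PIPELINE VALIDATION FAILED'            // fail before any step executes
}
```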
Note that we are defining our schemas as YAML rather than JSON (an illustrative fragment follows the list below):
- pipeline-schema-defs.yml: shared definitions
- pipeline-schema.yml: schema for the main pipeline.yml
- pipeline-step-include-schema.yml: step include schema
- pipeline-config-include-schema.yml: config include schema
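For illustration only, a JSON Schema written in YAML might look like the hypothetical fragment below; it is not the actual pipeline-schema.yml, and simply reuses the name/image keys from the error examples above:

```yaml
# Hypothetical fragment, not the real pipeline-schema.yml.
"$schema": "http://json-schema.org/draft-04/schema#"
type: object
required: [pipeline]
properties:
  pipeline:
    type: object
    properties:
      steps:
        type: array
        items:
          type: object
          required: [name, image]   # missing keys produce errors like the ones shown above
```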
The existing schemas focus on type information. They will be extended to include specific limits, for example (sketched after this list):
- limit the number of items in lists to a reasonable maximum
- limit string lengths to a reasonable maximum
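In JSON Schema terms these limits correspond to the maxItems and maxLength keywords. A hypothetical fragment, with placeholder values (the actual limits are still to be decided):

```yaml
# Hypothetical limits; the numbers are placeholders, not agreed values.
steps:
  type: array
  maxItems: 100            # cap the number of list items
  items:
    type: object
    properties:
      name:
        type: string
        maxLength: 256     # cap string lengths
```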
Since validation is not backward compatible, we'll add the ability to opt out of validation with a pipeline option, validation: false, to avoid disrupting existing pipelines, e.g.:
wf.start(agent_label: "Linux", validation: false)
We'll use the json-schema library to perform validation. Our files and schemas will need to be converted to JSON as part of validation. This conversion is straightforward, but it provides another source of error: for example, if the input is not valid YAML to begin with, we'll have more basic structural errors to report instead of schema validation errors.
The core validation methods will have to be marked @com.cloudbees.groovy.cps.NonCPS, as the json-schema library classes are not Serializable.
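A minimal sketch of such a method follows, assuming the org.everit.json.schema implementation (whose violation messages match the format shown earlier) and SnakeYAML for the YAML-to-JSON conversion; the method name and signature are illustrative only:

```groovy
import org.everit.json.schema.Schema
import org.everit.json.schema.ValidationException
import org.everit.json.schema.loader.SchemaLoader
import org.json.JSONObject
import org.yaml.snakeyaml.Yaml

// Validate a YAML document against a schema (itself written in YAML) and return
// all violation messages instead of throwing, so the caller can keep collecting
// errors across files. NonCPS because the schema/JSON classes are not Serializable.
@com.cloudbees.groovy.cps.NonCPS
List<String> validateYaml(String documentText, String schemaText) {
    Yaml yaml = new Yaml()
    JSONObject schemaJson = new JSONObject(yaml.load(schemaText) as Map)
    JSONObject documentJson = new JSONObject(yaml.load(documentText) as Map)
    Schema schema = SchemaLoader.load(schemaJson)
    try {
        schema.validate(documentJson)
        return []
    } catch (ValidationException e) {
        return e.getAllMessages()
    }
}
```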
Our existing schemas are part of the poet-pipeline repository, under src/resources, so they can quickly be retrieved at runtime using libraryResource.
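For illustration, the runtime retrieval could be as simple as the snippet below; the variable names are hypothetical, and the file names are those from the schema list above:

```groovy
// Schemas are bundled with the shared library, so no network fetch is needed.
def pipelineSchemaText = libraryResource('pipeline-schema.yml')
def stepSchemaText     = libraryResource('pipeline-step-include-schema.yml')
def configSchemaText   = libraryResource('pipeline-config-include-schema.yml')
```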
The schemas will still be referenced/linked, or even downloaded and included, as part of the Wiki.
We had considered performing the validation externally, in a container.
Because the full file retrieval is somewhat complicated with includes, it's difficult to perform validation without also understanding how to retrieve and merge the remote files.
We had considered moving the file retrieval and merging into this proposed validation container. This is still complicated due to credential handling. Only the pipeline knows how to retrieve and provide credentials from Jenkins. We don't know ahead of time what credentials are required: as new includes are downloaded and processed, we may discover that we need new credentials to proceed.
Still, it would be possible to achieve this separation if we used an iterative approach:
- The pipeline hands off the initial pipeline.yml to the validation container, along with any initial credentials.
- The validation container performs as much downloading, merging, and validation as it can with the credentials it has been given.
- If it gets stuck, it returns a "partial" result, which includes the current state and the additional credentials needed.
- The pipeline sees the file is not complete and runs the validation container again, providing the current state and the additional credentials.
- We loop until the final merged file is complete.
This seemed like a somewhat complex mechanism, and it's not possible to know ahead of time how many iterative runs will be required.
It's possible we may want to revisit this in the future, but we felt it was best to use a simpler, in-line approach for structural validation.