Pipeline Validation

Structural validation of pipeline files (including templates) will enable the pipeline to provide clearer error messages about file format errors and logical expectations. Performing the validation will also allow the pipeline to fail early for certain types of errors, instead of failing at runtime or, worse, continuing with undefined results.

We aim to replace:

java.lang.NullPointerException
[Pipeline] echo
[org.jenkinsci.plugins.docker.workflow.ImageNameTokens.<init>(ImageNameTokens.java:47), sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method), sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62), sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45), java.lang.reflect.Constructor.newInstance(Constructor.java:423), 
...

with something more like:

 "message": "PIPELINE VALIDATION FAILED",
  "errors": [
    {
      "path": {
          "path": "pipeline.yml"
      },
      "messages": [
        "#/steps/1: required key [image] not found",
      ]
    },

In this design, we are focused on structural validation only - i.e. the type of validation typically enforced by a schema.

Summary of Approach

We will define a JSON Schema for the pipeline.yml, step template, and config template files. Much of this work has already been done (see Existing Schema Definitions below).

The validation will be performed in the pipeline code, after each file is read but before it is internally parsed.

Although the pipeline will fail before step execution starts if validation errors are encountered, it will continue parsing, retrieving, and validating as far as it can, so that a single run reports as many errors as possible.

Expected error output

The exact format may change, but we'll aim to provide the file name, along with all validation errors, e.g.:

File: pipeline.yml
#/pipeline/steps/0: required key [name] not found

File: https___github.com/tmobile/_scm_poet_poet-pipeline-templates.git_master/env.yml
#/pipeline/environment/SETTINGS: expected type: String, found: JSONArray

Iterative Validation

At pipeline start, we don't have enough information to fully validate the pipeline.

With templates, the pipeline may include additional steps and configuration, each of which may include other steps and configuration, and so on. Each include may be local or remote; each is fully encapsulated and may require its own set of repositories or secrets, which are unknown ahead of time.

This requires many iterative validation passes -- validate the main file, then parse and retrieve any included files. For each included file, validate and then parse and retrieve any includes, and so on.

Finally, we may wish to validate the final merged pipeline file once all includes have been processed.

In order to efficiently perform the validation, it will be tied closely to the existing reading code.
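As a rough sketch, the loop might look like the following Groovy, where validateAgainstSchema, parseIncludes, and retrieveFile are hypothetical placeholders for the existing reading/validation code (readFile and error are standard pipeline steps):

// Sketch only -- validateAgainstSchema, parseIncludes, and retrieveFile
// are illustrative placeholders, not the real pipeline API.
List<String> validateRecursively(String fileName, String content) {
    // Validate this file, accumulating errors rather than throwing,
    // so a single run can report as many problems as possible.
    List<String> errors = validateAgainstSchema(fileName, content)

    // Parse the file to discover includes, then retrieve and
    // validate each included file in turn.
    parseIncludes(content).each { include ->
        errors.addAll(validateRecursively(include.name, retrieveFile(include)))
    }
    return errors
}

def errors = validateRecursively('pipeline.yml', readFile('pipeline.yml'))
if (errors) {
    error "PIPELINE VALIDATION FAILED:\n${errors.join('\n')}"
}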

Existing Schema Definitions

Note that we define our schemas in YAML rather than JSON.

Additional Schema Changes

The existing schemas focus on type information. They will be extended to include specific limits (see the sketch after this list):

  • limit number of items in lists to some reasonable number
  • limit string lengths to some reasonable number
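For illustration, a hypothetical fragment of a YAML-form schema with such limits (the property names and values are examples only, not the actual schema):

steps:
  type: array
  maxItems: 50          # limit number of items in lists
  items:
    type: object
    properties:
      name:
        type: string
        maxLength: 256  # limit string lengths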

Opting out of validation

Since validation is not backward compatible, we'll add the ability to opt out via a pipeline option, validation: false, to avoid disrupting existing pipelines, e.g.:

wf.start(agent_label: "Linux", validation: false)

Implementation

Schema validation

We'll use the json-schema library to perform validation. Our files and schemas will need to be converted to JSON as part of validation. This conversion is straightforward, but it introduces another source of errors -- for example, if the input is not valid YAML to begin with, we'll have more basic structural errors to report instead of schema validation errors.

The core validation methods will have to be marked @com.cloudbees.groovy.cps.NonCPS, as the json-schema library classes are not Serializable.
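As a minimal sketch, assuming the everit-org json-schema library (whose messages match the examples above) and SnakeYAML for the YAML-to-JSON conversion -- method and variable names here are illustrative:

import org.everit.json.schema.Schema
import org.everit.json.schema.ValidationException
import org.everit.json.schema.loader.SchemaLoader
import org.json.JSONObject
import org.yaml.snakeyaml.Yaml

// NonCPS because the json-schema classes are not Serializable.
@com.cloudbees.groovy.cps.NonCPS
List<String> validateYaml(String schemaYaml, String fileYaml) {
    // Convert schema and input from YAML to JSON; invalid YAML will
    // throw here, giving the more basic structural error noted above.
    def yaml = new Yaml()
    def schemaJson = new JSONObject(yaml.load(schemaYaml) as Map)
    def fileJson = new JSONObject(yaml.load(fileYaml) as Map)

    Schema schema = SchemaLoader.load(schemaJson)
    try {
        schema.validate(fileJson)
        return []
    } catch (ValidationException e) {
        // All messages, e.g. "#/steps/1: required key [image] not found"
        return e.allMessages
    }
}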

Schema location

Our existing schemas are part of the poet-pipeline repository, under src/resources, so they can quickly be retrieved at runtime using libraryResource.
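For example (the resource path below is illustrative):

def schemaText = libraryResource 'schemas/pipeline-schema.yml'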

They may still be referenced/linked, or even downloaded and included, as part of the Wiki.

Alternatives Considered

Validation as a Container

We had considered performing the validation externally as a container.

Because the full file retrieval is somewhat complicated with includes, it's difficult to perform validation without also understanding how to retrieve and merge the remote files.

We had considered moving the file retrieval and merging into this proposed validation container. This is still complicated due to credential handling: only the pipeline knows how to retrieve and provide credentials from Jenkins, and we don't know ahead of time what credentials are required -- as new includes are downloaded and processed, we may discover that we need new credentials to proceed.

Still, it would be possible to achieve this separation if we used an iterative approach --

  • The pipeline hands off the initial pipeline.yml to the validation container with any initial credentials.
  • The validation container performs as much downloading, merging, and validation as it can with its given credentials.
    • If it gets stuck, it provides a "partial" result, which includes the current state and the additional credentials needed.
    • The pipeline sees the file is not complete and runs the validation container again, providing the current state and additional credentials.
    • We loop until the final file is complete.

This seemed like a somewhat complex mechanism, and it's not possible to know ahead of time how many iterative runs will be required.

It's possible we may want to revisit this in the future, but we felt it was best to use a simpler, in-line approach for structural validation.