Ozone CI with GitHub Actions

The Ozone project uses GitHub Actions (GA) for its CI system. GA is organized into workflows, which are groups of jobs combined to accomplish a CI task, each defined in a single YAML file. The Ozone workflow YAML files are in the repository's .github/workflows directory.

Workflows

build-branch Workflow

This is the most important workflow. It runs the tests that verify the latest commits.

It is triggered each time a pull request is created or synchronized (i.e., when the remote branch is pushed to). It is also scheduled on the master branch twice a day, at 00:30 and 12:30. (Those are the runs here that are marked "scheduled" and have no branch label.)

The build-branch workflow is divided into a number of different jobs, most of which run in parallel. Each job is described below.

Some of the jobs are defined using GA's "build matrix" feature, which lets you define similar jobs with a single job definition. Any differences are specified by a list of values for a specific key. For example, the "compile" job uses the matrix feature to generate builds with different versions of Java. There, the matrix is specified by the "java" key, which has a list of values describing which version of Java to use (8 or 11).

The jobs currently using the "build matrix" feature are "compile", "basic", "acceptance" and "integration". These jobs also use GA's fail-fast flag to cancel the other jobs in the same matrix if one fails. For example, in the "compile" job, if the Java 8 build fails, the Java 11 build is cancelled because of this flag, but jobs outside the "compile" matrix are unaffected.

While the fail-fast flag only works within a matrix job, the "Cancelling" workflow (described below) works across jobs.

build-info job

The build-info job script runs before the others and determines which of the other jobs are to be run. If the workflow was triggered by some event other than a PR, then all jobs/tests are run. They are also all run if the PR has a label containing the string "full tests needed".

Otherwise, build-info first generates a list of the files changed by the PR. It matches that list against a series of regexes, each of which is associated with a different job, and sets the corresponding flag for each match. Those boolean flags are used later in the run to decide whether the corresponding job should be run.

For example, patterns like the following are used to determine whether the Kubernetes flag should be set:

    local pattern_array=(
        "^hadoop-ozone/dev-support/checks/kubernetes.sh"
        "^hadoop-ozone/dist/src/main/k8s"
    )
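
The selection logic described above can be sketched in bash roughly as follows. This is a minimal illustration, not the actual build-info script: the function names needs_all_tests and kubernetes_needed, the "pull_request" event name check, and the way labels and changed files are passed as arguments are all assumptions made for the example. Only the two patterns and the "full tests needed" string come from the text above.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeeds when every job should run.
# $1 = name of the triggering event, remaining args = labels on the PR.
needs_all_tests() {
    local event="$1"
    shift
    # Any trigger other than a PR runs everything.
    if [[ "$event" != "pull_request" ]]; then
        return 0
    fi
    local label
    for label in "$@"; do
        # The label only has to contain the string, not equal it.
        if [[ "$label" == *"full tests needed"* ]]; then
            return 0
        fi
    done
    return 1
}

# Hypothetical helper: succeeds when any changed file (one per argument)
# matches one of the Kubernetes patterns quoted above.
kubernetes_needed() {
    local pattern_array=(
        "^hadoop-ozone/dev-support/checks/kubernetes.sh"
        "^hadoop-ozone/dist/src/main/k8s"
    )
    local file pattern
    for file in "$@"; do
        for pattern in "${pattern_array[@]}"; do
            # The pattern is left unquoted so bash treats it as a regex.
            if [[ "$file" =~ $pattern ]]; then
                return 0
            fi
        done
    done
    return 1
}
```

A caller would first check needs_all_tests and, only if that fails, consult per-job helpers such as kubernetes_needed with the PR's changed files to set each flag.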

compile job

Builds the Java 8 and 11 versions of the jars, and saves the Java 8 version for some of the subsequent jobs.

basic job

Runs a subset of the following subjobs, depending on what was selected by build-info:

  • author: Verifies that none of the Java files contain the @author annotation
  • bats: Checks bash scripts (using the Bash Automated Testing System)
  • checkstyle: Runs the 'mvn checkstyle' plugin to confirm that the Java source abides by Ozone coding conventions
  • docs: Builds the website with Hugo
  • findbugs: Runs SpotBugs static analysis on bytecode
  • rat (release audit tool): Confirms that source files include licenses
  • unit: Runs 'mvn test' for all non-integration tests

dependency job

Confirms that the list of jars included in the current build matches the expected ones defined here.

If they don't match, the job describes how to update the expected list to include the changes, provided they are intentional. Unintentional changes should be removed.
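
The kind of comparison this job performs can be pictured with a small bash sketch. It is only an illustration under stated assumptions: the function name compare_jar_lists and the one-jar-name-per-line file format are hypothetical, and the real check may work differently.

```shell
#!/usr/bin/env bash

# Hypothetical helper: compares an expected jar list with the list produced
# by the current build. Both files hold one jar name per line.
# Prints unexpected jars prefixed with "+" and missing jars prefixed with "-";
# succeeds only when the two lists contain the same names.
compare_jar_lists() {
    local expected="$1" actual="$2"
    local added removed
    # comm requires sorted input: -13 keeps lines found only in the second
    # file, -23 keeps lines found only in the first.
    added=$(comm -13 <(sort "$expected") <(sort "$actual"))
    removed=$(comm -23 <(sort "$expected") <(sort "$actual"))
    if [[ -z "$added" && -z "$removed" ]]; then
        return 0
    fi
    if [[ -n "$added" ]]; then
        sed 's/^/+ /' <<< "$added"
    fi
    if [[ -n "$removed" ]]; then
        sed 's/^/- /' <<< "$removed"
    fi
    return 1
}
```

Sorting both inputs first means the order of entries in either file does not matter, only the set of names.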

acceptance job

Runs smoketests using Robot Framework and a real Docker Compose cluster. There are three iterations, "secure", "unsecure", and "misc", each running in parallel as a different matrix config.

kubernetes job

Runs the Kubernetes (k8s) tests.

integration job

Runs 'mvn test' for all integration/minicluster tests, split into multiple subjobs by a matrix config.

coverage job

Merges the coverage data from the following jobs that were run earlier:

  • acceptance
  • basic
  • integration

Cancelling Workflow

This workflow is triggered each time a build-branch workflow is triggered. It reduces GA usage by cancelling PR workflows that have a job failure. Specifically, it checks all PR workflows running at the time it is invoked and cancels any that have a failed job at that time. PR workflows with jobs that fail later are caught by a subsequent run of the "Cancelling" workflow.

Note that it checks the status of all PR workflows running at that time, not just the one that triggered it.
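
The selection step can be sketched in bash as below. This is a simplified model, not the workflow's actual implementation: the function name runs_to_cancel and the "<run-id> <job-conclusion>" input format are hypothetical, chosen only to show that a run is picked as soon as any one of its jobs has failed.

```shell
#!/usr/bin/env bash

# Hypothetical helper: reads lines of the form "<run-id> <job-conclusion>"
# on stdin, one line per job of each running PR workflow, and prints the id
# of every run that has at least one failed job (each id printed once).
runs_to_cancel() {
    local run_id conclusion
    while read -r run_id conclusion; do
        if [[ "$conclusion" == "failure" ]]; then
            echo "$run_id"
        fi
    done | sort -u
}
```

Each printed id would then be handed to whatever mechanism cancels a run. Runs whose jobs have all succeeded or are still pending are left alone until a later invocation sees a failure.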

close-prs Workflow

This workflow is scheduled each night at midnight. It closes PRs that have not been updated in the last 21 days, while letting the author know they are free to reopen.
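
The staleness test amounts to a simple age comparison, sketched here in bash. The function name is_stale, the STALE_DAYS variable, and the use of epoch seconds for both timestamps are assumptions for illustration; only the 21-day threshold comes from the text above.

```shell
#!/usr/bin/env bash

# Threshold taken from the description above.
STALE_DAYS=21

# Hypothetical helper: succeeds when a PR counts as stale.
# $1 = time of the PR's last update, $2 = current time (both epoch seconds).
is_stale() {
    local updated_at="$1" now="$2"
    # Compare in seconds to avoid rounding from integer division by days.
    (( now - updated_at >= STALE_DAYS * 86400 ))
}
```

For example, is_stale "$last_update" "$(date +%s)" would succeed for a PR whose last update is at least 21 days old, where last_update is a placeholder for that PR's update time in epoch seconds.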

comment-commands Workflow

This workflow is triggered each time a comment on a PR is added or edited. It checks whether the body of the comment begins with one of the following strings and, if so, invokes the corresponding command.

  • /close : Close pending pull request (with a message saying the author is free to reopen)
  • /help : Show all the available comment commands
  • /label : Add new label to the issue: /label "label"
  • /pending : Add a REQUESTED_CHANGE type review to mark issue non-mergeable: /pending "reason"
  • /ready : Dismiss all the blocking reviews
  • /retest : Provide help on how to trigger new CI build
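
The dispatch on the start of the comment body can be sketched in bash as follows. The function name comment_command is hypothetical, and the sketch treats the first whitespace-delimited word as the command, which is slightly stricter than a plain "begins with" match; the real workflow may differ.

```shell
#!/usr/bin/env bash

# Hypothetical helper: prints the name of the command that a comment body
# starts with (without the leading slash), or fails if there is none.
comment_command() {
    local body="$1"
    # Keep everything before the first whitespace character.
    local first_word="${body%%[[:space:]]*}"
    case "$first_word" in
        /close|/help|/label|/pending|/ready|/retest)
            echo "${first_word#/}"
            ;;
        *)
            return 1
            ;;
    esac
}
```

With this sketch, a comment such as /label "bug" resolves to the label command, while a comment that merely mentions /close somewhere in the middle is ignored.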

Old/Deprecated Workflows

The following workflows no longer run but still exist on the actions page for historical reasons:

Note that the deprecated build-branch has the same name as the current build-branch. (They can be distinguished by the URL.)

Tips

  • When a build of the Ozone master branch fails, its artifacts are stored here.
  • To trigger rerunning the tests, push a commit like this to your PR: git commit --allow-empty -m 'trigger new CI check'
  • This wiki contains tips on running tests locally.
  • This wiki contains tips on special handling of the CI system, such as "Executing one test multiple times", or "ssh'ing in to the CI machine while the tests are running".