This template presents users with a base configuration for a GitLab CI/CD pipeline. In this section, the guide aims to provide readers with some basic understanding of the pipeline defined in the configuration file `.gitlab-ci.yml`.

That being said, readers would certainly benefit from reading up on introductory CI/CD concepts as introduced by GitLab's Docs.

The defined pipeline assumes a GitHub flow which only relies on feature branches and a `main` (default) branch.
With reference to the diagram above, we have the following pointers:
- Feature branches (`git checkout -b <NAME_OF_BRANCH>`) are created to introduce changes to the source.
- Once a feature branch is ready, a merge request is created for it to be merged into `main`.
- Changes pushed to `main` are pulled to the feature branch itself on a consistent basis. This allows the feature branch to possess the latest changes pushed by other developers through their own feature branches. In the example above, commits from the `main` branch following a merge of the `add-hidden-layer` branch are pulled into the `change-training-image` branch while that branch still expects further changes.
- `git pull` can be used to pull and sync these changes. However, it's recommended that developers make use of `git fetch` and `git log` to observe incoming changes first rather than pulling in changes in an indiscriminate manner (a short sketch of this cycle is given after these pointers).
- While commits can be pushed directly to the `main` branch, it's recommended that they are kept minimal, at least for GitHub flow (other workflows might not heed such practices).

As we move along, we should be able to relate parts of the flow described above with the stages defined by the default GitLab CI pipeline.
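As a quick recap of the commands mentioned in the pointers above, a typical feature branch cycle might look like the following sketch (the branch and remote names here are examples only):

```bash
# Start a feature branch off the latest default branch (names are illustrative).
git checkout main
git pull
git checkout -b change-training-image

# ...commit changes, then inspect what has landed on main before syncing.
git fetch origin main
git log HEAD..origin/main --oneline

# Pull the reviewed changes into the feature branch and push it for a merge request.
git pull origin main
git push -u origin change-training-image
```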
Before we can make use of the GitLab CI pipeline, we would have to define the following variable(s) beforehand:

- `HARBOR_ROBOT_CREDS_JSON`: A JSON-formatted value that contains encoded credentials for a robot account on Harbor. This is to allow the pipeline to interact with the Harbor server. See the next section on how to generate this value/file.
- `GCP_SERVICE_ACCOUNT_KEY`: A JSON-formatted value that contains encoded credentials for a service account on your GCP project. This is to allow the pipeline to interact with the Google Artifact Registry. See here on how to generate this file (a sketch is also provided after this list).
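For the GCP case, one possible way to generate such a key file, assuming a suitable service account already exists (the account email below is a placeholder), is through the gcloud CLI:

```bash
# Placeholder service account email; replace with the one used by your project.
gcloud iam service-accounts keys create gcp-sa-key.json \
    --iam-account="ci-pipeline@<GCP_PROJECT_ID>.iam.gserviceaccount.com"
```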
To define CI/CD variables for a project (repository), follow the steps listed here.
{%- if cookiecutter.platform == 'onprem' %}
The environment variable `HARBOR_ROBOT_CREDS_JSON` needs to be a `File` type.
{%- elif cookiecutter.platform == 'gcp' %}
The environment variable `GCP_SERVICE_ACCOUNT_KEY` needs to be a `File` type.
{%- endif %}
{%- if cookiecutter.platform == 'gcp' %}
{%- endif %}
{%- if cookiecutter.platform == 'onprem' %}
The variable `HARBOR_ROBOT_CREDS_JSON` will be used to populate the files `/kaniko/.docker/config.json` and `/root/.docker/config.json` for `kaniko` and `crane` to authenticate themselves before communicating with AI Singapore's Harbor registry. You may create the JSON file like so:
```bash
echo -n <HARBOR_USERNAME>:<HARBOR_PASSWORD> | base64
```
```powershell
$cred = "<HARBOR_USERNAME>:<HARBOR_PASSWORD>"
$bytes = [System.Text.Encoding]::ASCII.GetBytes($cred)
$base64 = [Convert]::ToBase64String($bytes)
echo $base64
```
Using the output from above, copy and paste the following content into a CI/CD environment variable of type `File` (under `Settings` -> `CI/CD` -> `Variables` -> `Add variable`):
```json
{
    "auths": {
        "registry.aisingapore.net": {
            "auth": "<ENCODED_OUTPUT_HERE>"
        }
    }
}
```
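Alternatively, if you prefer not to use the web UI, the same variable can be created through GitLab's project-level CI/CD variables API. This is only a sketch; the host, project ID, and access token below are placeholders that would need to be substituted:

```bash
# Sketch only: create a File-type CI/CD variable through the GitLab API.
curl --request POST \
    --header "PRIVATE-TOKEN: <ACCESS_TOKEN>" \
    "https://<GITLAB_HOST>/api/v4/projects/<PROJECT_ID>/variables" \
    --form "key=HARBOR_ROBOT_CREDS_JSON" \
    --form "variable_type=file" \
    --form "value=<CONTENTS_OF_THE_JSON_ABOVE>"
```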
{%- endif %}
In the default pipeline, we have 3 stages defined:

- `test`: For every push to certain branches, the source code residing in `src` will be tested.
- `deploy-docs`: This stage is for the purpose of deploying a static site through GitLab Pages. More on this stage is covered in "Documentation".
- `build`: Assuming the automated tests are passed, the pipeline will build Docker images, making use of the latest source.

These stages are defined and listed like so:
```yaml
...
stages:
  - test
  - deploy-docs
  - build
...
```
The jobs for each of the stages are executed using Docker images defined by users. For this, we have to specify in the pipeline the tag associated with the GitLab Runner that has the Docker executor.
{%- if cookiecutter.platform == 'onprem' %}
The `on-prem` tag calls for runners within our on-premise infrastructure so that on-premise services can be accessed within our pipelines.
{%- elif cookiecutter.platform == 'gcp' %}
The `gcp` tag calls for runners on our GCP infrastructure so that GCP services can be used within our pipelines.
{%- endif %}
The `./conda` folder generated from creating the Conda environment is then cached to be reused by other jobs, saving the time of rebuilding the environment in every job that requires it. The `$CI_COMMIT_REF_SLUG` key refers to the branch name modified to be code-friendly. In this case, it is used as a namespace to store all the files that are cached within this branch.
```yaml
default:
  tags:
{%- if cookiecutter.platform == 'onprem' %}
    - on-prem
{%- elif cookiecutter.platform == 'gcp' %}
    - gcp
{%- endif %}
...
```
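The cache configuration itself is not shown in the snippet above. A minimal sketch of what such a cache block might look like, given the description of `./conda` and `$CI_COMMIT_REF_SLUG` (the exact placement and keys in the template may differ), is:

```yaml
# Sketch only: cache the Conda environment folder, namespaced by branch.
default:
  cache:
    key: ${CI_COMMIT_REF_SLUG}
    paths:
      - ./conda
```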
Let's look at the job defined for the `test` stage first:
```yaml
...
test:conda-build:
  stage: test
  image:
    name: continuumio/miniconda3:23.10.0-1
  script:
    - conda env create -f {{cookiecutter.repo_name}}-conda-env.yaml -p ./conda/{{cookiecutter.repo_name}}
  rules:
    - if: $CI_MERGE_REQUEST_IID
      changes:
        - {{cookiecutter.repo_name}}-conda-env.yaml
    - if: $CI_PIPELINE_SOURCE == "push"
      changes:
        - {{cookiecutter.repo_name}}-conda-env.yaml
    - if: $CI_PIPELINE_SOURCE == "web"
      changes:
        - {{cookiecutter.repo_name}}-conda-env.yaml
    - if: $CI_PIPELINE_SOURCE == "web" && $BUILD_CONDA
    - if: $CI_COMMIT_TAG
      when: never
  needs: []
...
```
First of all, this `test:conda-build` job will only execute on the condition that the defined `rules` are met. In this case, the job will only execute for the following cases:

- When a pipeline is triggered by an open merge request, a push, or manually through the web UI, and changes to the file `{{cookiecutter.repo_name}}-conda-env.yaml` are detected. This is to prevent automated tests from running for pushes made to feature branches with merge requests when no changes have been made to files for which tests are relevant. Otherwise, tests will run in a redundant manner, slowing down the feedback loop.
- If a tag is pushed (`git push <remote> <tag_name>`), the job will not run.

The job does not have any jobs that it needs to wait for, thus the `needs` section is populated with `[]`.
The next job in the `test` stage is as follows:
```yaml
...
test:pylint-pytest:
  stage: test
  image:
    name: continuumio/miniconda3:23.10.0-1
  before_script:
    - source activate ./conda/{{cookiecutter.repo_name}}
    - pip install -r dev-requirements.txt
  script:
    - pylint src --fail-under=7.0 --ignore=tests --disable=W1202
    - pytest src/tests --junitxml=./rspec.xml
  rules:
    - if: $CI_MERGE_REQUEST_IID
      changes:
        - src/**/*
        - conf/**/*
    - if: $CI_PIPELINE_SOURCE == "push"
    - if: $CI_PIPELINE_SOURCE == "web"
    - if: $CI_COMMIT_TAG
      when: never
  artifacts:
    paths:
      - rspec.xml
    reports:
      junit: rspec.xml
  needs:
    - job: test:conda-build
      optional: true
...
```
In this case with the `test:pylint-pytest` job, the job will only execute for the following cases:

- When a pipeline is triggered by an open merge request and changes to files residing in `src` or `conf` are detected.
- When a pipeline is triggered by a push or manually through the web UI.
- If a tag is pushed (`git push <remote> <tag_name>`), the job will not run.

The job would wait for `test:conda-build` to be completed first before it can be executed. The `optional: true` option is set so that this job would still run even if `test:conda-build` doesn't, since the environment has already been cached for use in this job.
The job defined above fails under any of the following conditions:

- The `pylint` score of the source code falls below the fail threshold of 7.0.
- Any of the tests under `src/tests` fail.

The job would have to succeed before moving on to the `build` stage. Otherwise, no Docker images will be built. This is so that source code that fails tests would never be packaged.
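To shorten the feedback loop, the same checks can be run locally before pushing. A sketch, assuming the Conda environment and `dev-requirements.txt` have been set up as in the job above:

```bash
# Run the same linting and tests locally before pushing.
conda activate ./conda/{{cookiecutter.repo_name}}
pip install -r dev-requirements.txt
pylint src --fail-under=7.0 --ignore=tests --disable=W1202
pytest src/tests
```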
The job would generate a `rspec.xml` file as an artifact so that you can read the test results in the GitLab UI. More information about this can be found here.
The template has thus far introduced a couple of Docker images relevant for the team. The tags for all the Docker images are listed below:

- `{{cookiecutter.registry_project_path}}/data-prep`
- `{{cookiecutter.registry_project_path}}/model-training`

The `build` stage aims at automating the building of these Docker images in a parallel manner. Let's look at a snippet for a single job that builds a Docker image:
```yaml
...
build:data-prep-image:
  stage: build
  image:
    name: gcr.io/kaniko-project/executor:debug
    entrypoint: [""]
{%- if cookiecutter.platform == 'gcp' %}
  variables:
    GOOGLE_APPLICATION_CREDENTIALS: /kaniko/.docker/config.json
{%- endif %}
  before_script:
{%- if cookiecutter.platform == 'onprem' %}
    - "[[ -z ${HARBOR_ROBOT_CREDS_JSON} ]] && echo 'HARBOR_ROBOT_CREDS_JSON variable is not set.' && exit 1"
{%- elif cookiecutter.platform == 'gcp' %}
    - "[[ -z ${GCP_SERVICE_ACCOUNT_KEY} ]] && echo 'GCP_SERVICE_ACCOUNT_KEY variable is not set.' && exit 1"
{%- endif %}
  script:
    - mkdir -p /kaniko/.docker
{%- if cookiecutter.platform == 'onprem' %}
    - cat $HARBOR_ROBOT_CREDS_JSON > /kaniko/.docker/config.json
{%- elif cookiecutter.platform == 'gcp' %}
    - cat $GCP_SERVICE_ACCOUNT_KEY > /kaniko/.docker/config.json
{%- endif %}
    - >-
      /kaniko/executor
      --context "${CI_PROJECT_DIR}"
      --dockerfile "${CI_PROJECT_DIR}/docker/{{cookiecutter.repo_name}}-cpu.Dockerfile"
      --destination "{{cookiecutter.registry_project_path}}/data-prep:${CI_COMMIT_SHORT_SHA}"
  rules:
    - if: $CI_MERGE_REQUEST_IID
      changes:
        - docker/{{cookiecutter.repo_name}}-cpu.Dockerfile
        - src/**/*
        - conf/**/*
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
    - if: $CI_PIPELINE_SOURCE == "web" && $BUILD_ALL
    - if: $CI_PIPELINE_SOURCE == "web" && $BUILD_DATAPREP
  needs:
    - job: test:pylint-pytest
      optional: true
...
```
Note

You would have noticed that the jobs for building images utilise the command `/kaniko/executor` as opposed to `docker build`, which most users would be more familiar with. This is due to the usage of `kaniko` within a runner with a Docker executor. Using Docker within Docker (Docker-in-Docker) requires privileged mode, which poses several security concerns. Hence, the image `gcr.io/kaniko-project/executor:debug` is being used for all `build` jobs related to the building of Docker images. That being said, the flags used for `kaniko` correspond well with the flags usually used for `docker` commands.
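For comparison only, a rough local `docker` equivalent of the `/kaniko/executor` invocation above might look like the following sketch (this is not part of the pipeline, and the tag here is simply whatever your current short commit hash happens to be):

```bash
# Rough local equivalent of the kaniko invocation above; not used in CI.
docker build \
    -f docker/{{cookiecutter.repo_name}}-cpu.Dockerfile \
    -t {{cookiecutter.registry_project_path}}/data-prep:$(git rev-parse --short HEAD) \
    .
docker push {{cookiecutter.registry_project_path}}/data-prep:$(git rev-parse --short HEAD)
```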
{%- if cookiecutter.platform == 'onprem' %}
{%- set jsonfile = 'HARBOR_ROBOT_CREDS_JSON' -%}
{%- elif cookiecutter.platform == 'gcp' %}
{%- set jsonfile = 'GCP_SERVICE_ACCOUNT_KEY' -%}
{%- endif %}

Before the job runs its main script, it will check whether `{{jsonfile}}` has been set in the CI/CD variables. Otherwise, it will prematurely stop the job with an error, preventing the job from running any further and freeing up the CI worker faster to work on other jobs in the organisation.
Just like with the `test` jobs, each of the jobs under `build` will execute under certain conditions:

- If an open merge request exists, the pipeline checks for changes made to files within `src`, `conf`, `scripts`, or the relevant Dockerfile itself. If there are changes, the job will be executed. An opened merge request is detected through the predefined variable `CI_MERGE_REQUEST_IID`.
- If a push is made to the default branch (`CI_DEFAULT_BRANCH`) of the repo, which in most cases within our organisation would be `main`, the job would execute as well. Recalling the `test` stage, any pushes to the repo would trigger the automated tests and linting. If a push to the `main` branch passes the tests, all Docker images will be built, regardless of whether changes have been made to files relevant to the Docker images to be built themselves.
- If the pipeline is triggered manually through the web UI and the `BUILD_ALL` or `BUILD_DATAPREP` (or `BUILD_MODEL` for the model training image) variable has been set. It can be set to any value, but we can set it to `true` by default.

The jobs in the `build` stage require the `test:pylint-pytest` job to be successful; otherwise, they would not run.
Images built through the pipeline will be tagged with the commit hashes associated with the commits that triggered them. This is seen through the usage of the predefined variable `CI_COMMIT_SHORT_SHA`.
As mentioned, pushes to the default branch would trigger builds for Docker images, and they would be tagged with the commit hash. However, such commit hashes aren't the best way to tag "finalised" Docker images, so the usage of tags would be more appropriate here. Hence, the job defined below would only trigger if a tag is pushed to the default branch, and only the default branch. The tag pushed (`git push <remote> <tag>`) to the default branch on the remote would have the runner retag the Docker images that exist on the registry with the tag that is being pushed. The relevant images to be retagged are originally tagged with the short commit hash obtained from the commit that was pushed to the default branch before this.
```yaml
...
build:retag-images:
  stage: build
  image:
{%- if cookiecutter.platform == 'onprem' %}
    name: gcr.io/go-containerregistry/crane:debug
    entrypoint: [""]
{%- elif cookiecutter.platform == 'gcp' %}
    name: google/cloud-sdk:debian_component_based
  variables:
    GOOGLE_APPLICATION_CREDENTIALS: /gcp-sa.json
{%- endif %}
  before_script:
{%- if cookiecutter.platform == 'onprem' %}
    - "[[ -z ${HARBOR_ROBOT_CREDS_JSON} ]] && echo 'HARBOR_ROBOT_CREDS_JSON variable is not set.' && exit 1"
{%- elif cookiecutter.platform == 'gcp' %}
    - "[[ -z ${GCP_SERVICE_ACCOUNT_KEY} ]] && echo 'GCP_SERVICE_ACCOUNT_KEY variable is not set.' && exit 1"
{%- endif %}
  script:
{%- if cookiecutter.platform == 'onprem' %}
    - cat $HARBOR_ROBOT_CREDS_JSON > /root/.docker/config.json
    - crane tag {{cookiecutter.registry_project_path}}/data-prep:${CI_COMMIT_SHORT_SHA} ${CI_COMMIT_TAG}
    - crane tag {{cookiecutter.registry_project_path}}/model-training:${CI_COMMIT_SHORT_SHA} ${CI_COMMIT_TAG}
{%- elif cookiecutter.platform == 'gcp' %}
    - cat $GCP_SERVICE_ACCOUNT_KEY > /gcp-sa.json
    - gcloud container images add-tag "{{cookiecutter.registry_project_path}}/data-prep:${CI_COMMIT_SHORT_SHA}" "{{cookiecutter.registry_project_path}}/data-prep:${CI_COMMIT_TAG}"
    - gcloud container images add-tag "{{cookiecutter.registry_project_path}}/model-training:${CI_COMMIT_SHORT_SHA}" "{{cookiecutter.registry_project_path}}/model-training:${CI_COMMIT_TAG}"
{%- endif %}
  rules:
    - if: $CI_COMMIT_TAG && $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
  needs:
    - job: build:data-prep-image
      optional: true
    - job: build:model-training-image
      optional: true
...
```
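To trigger this retagging job, one would create and push a tag on the default branch, for example like so (the tag name `v0.1.0` here is purely illustrative):

```bash
# Tag the latest commit on the default branch and push the tag to the remote.
git checkout main
git pull
git tag -a v0.1.0 -m "Release v0.1.0"
git push origin v0.1.0
```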
The stages and jobs defined in this default pipeline are rudimentary at best, as there is much more that could be done with GitLab CI. Nonetheless, whatever has been shared thus far is hopefully enough for one to get started with CI/CD.
The boilerplate packages generated by the template are populated with some NumPy-formatted docstrings. With these, we can observe how documentation can be automatically generated using Sphinx, with the aid of the Napoleon extension. Let's build the HTML asset for the documentation:
```bash
# From the root folder
conda activate {{cookiecutter.repo_name}}
sphinx-apidoc -f -o docs src
sphinx-build -b html docs public
```
Open the file `public/index.html` with your browser and you will be presented with a static site similar to the one shown below:
Browse through the site and inspect the documentation that was automatically generated through Sphinx.
Documentation generated through Sphinx can be served on GitLab Pages through GitLab CI/CD. With this template, a default CI job has been defined in `.gitlab-ci.yml` to serve the Sphinx documentation when pushes are made to the `main` branch:
```yaml
...
pages:
  stage: deploy-docs
  image:
    name: continuumio/miniconda3:23.10.0-1
  before_script:
    - source activate ./conda/{{cookiecutter.repo_name}}
    - pip install -r docs-requirements.txt
  script:
    - sphinx-apidoc -f -o docs src
    - sphinx-build -b html docs public
  artifacts:
    paths:
      - public
  only:
    - main
  needs:
    - test:conda-build
...
```
The documentation page is viewable through the following convention: `<NAMESPACE>.gitlab.aisingapore.net/<PROJECT_NAME>` or `<NAMESPACE>.gitlab.aisingapore.net/<GROUP>/<PROJECT_NAME>`.
{"use strict";/*!
+ * escape-html
+ * Copyright(c) 2012-2013 TJ Holowaychuk
+ * Copyright(c) 2015 Andreas Lubbe
+ * Copyright(c) 2015 Tiancheng "Timothy" Gu
+ * MIT Licensed
+ */var Wa=/["'&<>]/;Vn.exports=Ua;function Ua(e){var t=""+e,r=Wa.exec(t);if(!r)return t;var o,n="",i=0,s=0;for(i=r.index;i