Dockerfile Guide
We require the following minimum for a PhenoMeNal container image:
- From and Maintainer tags
- Versioning
- Relevant scripts must be executable
- Testing features
- Development done on the develop branch, and releases on the master branch. Only these two branches are built, and they are built on every push, so push only once you have tested locally that the container builds and works.
These are all explained in detail below.
FROM ubuntu:16.04
MAINTAINER PhenoMeNal-H2020 Project ( [email protected] )
- If possible, try to use ubuntu:16.04 as the base image. If this doesn't work, use what works. Alpine images are interesting to try!
- Set the maintainer as advised and add your email to that Google group, so that if someone contacts us regarding the container, you can answer.
- For R containers, use the newest release version of container-registry.phenomenal-h2020.eu/phnmnl/rbase. As of this writing, that would be v3.4.1-1xenial0_cv0.2.12. If your package is sufficiently simple, you could also try artemklevtsov/r-alpine:3.3.1.
We adhere to the BioContainers metadata specification for Labels, so you need to include the following labels in addition to the version labels specified later:
LABEL software="mtbls-factor-vis"
LABEL base.image="artemklevtsov/r-alpine:3.3.1"
LABEL description="An R-based depiction for factors and their values in MetaboLights studies"
LABEL website="https://github.com/phnmnl/container-mtbls-factors-viz"
LABEL documentation="https://github.com/phnmnl/container-mtbls-factors-viz"
LABEL license="https://github.com/phnmnl/container-mtbls-factors-viz"
LABEL tags="Metabolomics"
We require that the Dockerfile contains the following labels which set the tool and container version (which is used to tag the image):
LABEL software.version="0.4.28"
LABEL version="0.1"
LABEL software="your-tool-name"
The numbers above are, of course, just an example. The version label refers to the container version and should follow semantic versioning. You manage only the major and minor numbers (the first two); the CI manages the patch number. The software.version label refers to the tool's version: copy it as it appears, but replace any spaces with an underscore (_). The CI server uses these labels to set the tag of the container image once it is pushed to our Docker registry.
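As an illustration of how the label values could combine into an image tag, here is a minimal shell sketch. The tag layout v<software.version>_cv<version>.<patch> is only inferred from the rbase tag quoted earlier (v3.4.1-1xenial0_cv0.2.12); the function name compose_tag and the way the patch number is passed in are assumptions, not the CI server's actual code.

```shell
#!/bin/sh
# Hypothetical helper: builds an image tag from the Dockerfile label values.
# Assumed layout (inferred, not confirmed): v<software.version>_cv<version>.<patch>
compose_tag() {
    software_version=$1   # value of LABEL software.version
    container_version=$2  # value of LABEL version (major.minor only)
    patch=$3              # patch number, which the CI manages

    # The guide asks for spaces in the tool version to be replaced with "_".
    software_version=$(printf '%s' "$software_version" | tr ' ' '_')

    printf 'v%s_cv%s.%s\n' "$software_version" "$container_version" "$patch"
}
```

For example, `compose_tag "0.4.28" "0.1" 7` prints `v0.4.28_cv0.1.7`.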
When should each version change? That is the million dollar question. The software.version label is the easy one: the minute you point to a new version of the software you are containerising, change it to match. For the version of the container itself, the guideline rests on a short definition of what we understand here by an API change:
An API change is any modification that alters the way a wrapper (like the one for Galaxy) needs to call the tool or process its outputs. Changing any of the following is an API change:
- command name
- the number of arguments
- output file format(s)
- input file format(s)
- conditionality between arguments (for example, one argument requiring another).
So is anything else that changes the way you invoke the tool. Any such change introduces a backward incompatibility with whatever wrappers are using the tool. Think twice before introducing one, and if you can avoid it at reasonable cost, do so.
Having said that:
- For very minor changes that don't change the API contract, you don't need to do anything. The CI will on its own update the patch number (which you don't control as a developer).
- If you are making a change to the container that is not small but still doesn't change the API, such as changing the base image, changing required libraries, or making the image smaller (that is so cool to do!), change the minor version number, as these changes are backwards compatible with the wrappers.
- If you are making changes that break the API, bump the major version up and set the minor version to 0. For instance, if you are on version="0.3" you would go to version="1.0". Again, avoid API changes if possible.
Be mindful that changing the version of the tool being containerised might itself introduce API changes. Please test for this before committing to your development branch.
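The bump rules above can be sketched as a small shell function. This is only an illustration: the name bump_version and the change kinds api, minor and patch are made up for this example, and nothing here is part of the CI.

```shell
#!/bin/sh
# Hypothetical helper: prints the next value for LABEL version (major.minor).
# $1 = current version, e.g. "0.3"; $2 = kind of change (api, minor or patch).
bump_version() {
    major=${1%%.*}   # text before the first dot
    minor=${1#*.}    # text after the first dot

    case $2 in
        # API-breaking change: bump major, reset minor to 0.
        api)   printf '%s.0\n' "$((major + 1))" ;;
        # Larger but wrapper-compatible change: bump minor.
        minor) printf '%s.%s\n' "$major" "$((minor + 1))" ;;
        # Very minor change: leave the label alone, the CI bumps the patch.
        patch) printf '%s.%s\n' "$major" "$minor" ;;
        *)     echo "unknown change kind: $2" >&2; return 1 ;;
    esac
}
```

With this sketch, `bump_version 0.3 api` prints `1.0` and `bump_version 0.3 minor` prints `0.4`, matching the example in the list above.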
If the main functionality of the container is based on a script (like a Python, Perl or R script), make sure that:
- The script is in the PATH defined in the image.
- The script is executable.
- The script has an appropriate shebang (e.g. #!/bin/bash).
This means that the script can be executed through its name, regardless of the working directory where the instruction is generated. This is necessary for the correct execution of jobs by Galaxy in Kubernetes.
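One way to verify these three conditions inside a running container is a small check like the one below. It is a sketch only: check_script is a hypothetical helper name, and the shebang test merely confirms that the first line starts with #!, not that the interpreter exists.

```shell
#!/bin/sh
# Hypothetical check: can the named script be called by name from anywhere?
# $1 = script name, without any directory part.
check_script() {
    name=$1

    # 1. The script must be found through PATH.
    path=$(command -v "$name") || { echo "$name: not in PATH" >&2; return 1; }

    # 2. Defensive check: the file must be executable.
    [ -x "$path" ] || { echo "$path: not executable" >&2; return 1; }

    # 3. The first line must be a shebang.
    first_line=$(head -n 1 "$path")
    case $first_line in
        '#!'*) ;;
        *) echo "$path: missing shebang" >&2; return 1 ;;
    esac
}
```

A call such as `check_script wrapper.py` returns 0 only when all three conditions hold, so it could serve as a line in a lightweight test.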
For the proper testing of the container in the CI before being pushed to the registry, you need to provide at least the following two files in the base directory of the repo:
- test_cmds.txt for lightweight testing, where you make sure that executables are in place or run other simple checks. Each line is executed independently on the CI, so don't write complete bash scripts here. This file is not added to the Docker image; it remains outside. Please make sure that the file has no empty lines, as they might break the tests.
- runTest1.sh for heavyweight testing using real data sets. An example file can be found here. In this file you install whatever software is needed to fetch data (such as wget) and whatever is needed to run a test (if anything), run the main tool with the downloaded data, and then check that the output files are exactly what you expect, contain something that you expect, or at least were created. This file needs to be added to the image's path, be executable and have an appropriate shebang (e.g. #!/bin/bash). It should aim to call the tool as any wrapper would, bearing in mind that it is invoked "inside" the container by the container orchestrator during tests.
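To try a test_cmds.txt file locally before pushing, something like the following sketch may help. It is a rough stand-in, not the CI's implementation: the function name run_test_cmds is hypothetical, and it runs each line in the current environment, whereas the CI runs them against the container.

```shell
#!/bin/sh
# Hypothetical local stand-in for the CI's lightweight test step.
# $1 = path to test_cmds.txt
run_test_cmds() {
    file=$1

    # The guide warns that empty lines may break the tests, so refuse them.
    if grep -q '^[[:space:]]*$' "$file"; then
        echo "$file: contains empty lines" >&2
        return 1
    fi

    # Run every line as its own command, so no state carries over between lines.
    # The "|| [ -n ... ]" part also handles a last line without a trailing newline.
    while IFS= read -r cmd || [ -n "$cmd" ]; do
        sh -c "$cmd" </dev/null || { echo "FAILED: $cmd" >&2; return 1; }
    done < "$file"
}
```

Because each line gets its own `sh -c`, a `cd` or variable assignment on one line has no effect on the next, which mirrors the "each line is executed independently" rule.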
You should do development on a branch called develop on GitHub. Only when we are close to a release, or when it is clear to the developer that the container is ripe for release, should a merge to master be done (or, even better, a git flow release). One way to handle this is to use the git flow branching pattern, most easily through a client that supports git flow (like GitKraken, Atlassian SourceTree, or the command-line gitflow, among others). For a comparison on how git flow makes your life easier, see this link.
Besides reading the Docker best practices for writing a Dockerfile, we recommend the following practices:
- Combine multiple RUNs
- Don't install "recommended" packages
- Clean apt-get caches and temporary files
- Don't keep build tools in the image
- Python scripts should be installable with pip
- R scripts should be installable
- Don't upgrade the base image
And, when installing from a git repository:
- Use shallow git clones
- Point to a specific git commit/release
Read the following subsections for more details on these points.
Each RUN statement in a Dockerfile creates and commits a new layer to the image, and once the layer is committed, you can no longer delete its files from the image; deletions in subsequent RUNs will only hide the files. Files and packages that are required only when building the image (i.e., build-time dependencies) should be removed in the same RUN statement that created them. This approach avoids having those files and packages add useless weight to your image.
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        git \
        libcurl4-openssl-dev \
        libssl-dev \
        r-base \
        r-base-dev && \
    echo 'options("repos"="http://cran.rstudio.com", download.file.method = "libcurl")' >> /etc/R/Rprofile.site && \
    R -e "install.packages(c('doSNOW','plotrix','devtools','getopt','optparse'))" && \
    R -e "library(devtools); install_github('jianlianggao/batman/batman',ref='c02ac5cf9206373d2dde1b8e12548964f8379627'); remove.packages('devtools')" && \
    apt-get purge -y \
        git \
        libcurl4-openssl-dev \
        libssl-dev \
        r-base-dev && \
    apt-get -y clean && apt-get -y autoremove && \
    rm -rf /var/lib/apt/lists/* /var/lib/{cache,log}/ /tmp/* /var/tmp/*
While this reduces readability, it also massively reduces the size of the resulting image. In this example, we need git and r-base-dev (and their dependencies) to install a package, but not to run it later on. By installing (apt-get install ...) and removing (apt-get purge ...) in the same RUN statement, the image won't waste space on these packages and their dependencies.
By default, apt-get pulls in a lot of "recommended" packages that are not strictly necessary. Avoid installing them by passing the command-line option --no-install-recommends to apt-get, as in the example above.
The installation process for packages and other software often leaves behind plenty of temporary files. Append these lines to your installation RUN statement to remove them:
&& apt-get autoremove -y && apt-get clean \
&& rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
If the installation of your package requires build tools and packages that are not required at run time, make sure you delete them! Some example packages: curl, wget, gcc, make, build-essential, python-pip, and the list could go on.
Here is an example where we use curl to install a script and then remove curl with apt-get purge -y curl before concluding the RUN statement and committing the layer:
RUN apt-get -y update \
    && apt-get -y install --no-install-recommends curl \
    && curl https://raw.githubusercontent.com/.../wrapper.py -o /usr/local/bin/wrapper.py \
    && chmod a+x /usr/local/bin/wrapper.py \
    && apt-get purge -y curl \
    && apt-get autoremove -y && apt-get clean \
    && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
By removing curl, which is a small package, we save 15 MB in the final image. Packages such as gcc have very large footprints!
Making your set of Python scripts pip-installable increases the chances that others will use your Python code. It also lets pip handle all the dependencies and simplifies installing the package inside a Docker container. There are plenty of guides on how to make your scripts pip-installable; here is one. It also makes your scripts executable and available in the path.
Your package does not need to be available through the PyPI repository (though it is better if it is). As long as it complies with the expected structure, it can still be installed from your git repo with pip:
pip install -e git+https://github.com/<your-user>/<your-tool-repo>.git#egg=<your-tool-name>
To ease the installation of your R code inside the Docker container, your R objects or set of scripts should be made available as an R package. Instructions on how to package them can be found here (please note that even though the site advertises a book, it includes all the content needed to do this). This won't make your main R script executable though, so you still need to make sure that it is, as advised above.
To reduce image size, clone the repository without its history (you're not going to need it since you won't be developing on that checkout). To do this, pass these options to git clone: --depth 1 --single-branch --branch <name of the branch you need>.
Here's a full example for our current Galaxy runtime:
RUN git clone --depth 1 --single-branch --branch feature/allfeats https://github.com/phnmnl/galaxy.git
WORKDIR galaxy
RUN git checkout feature/allfeats
When a container installs its tool from a git repo, and you're happy with the development state of the tool (or it is a well-established tool), try to point the Dockerfile to a defined commit or release of the tool. This can be done like this:
In R:
R -e "library(devtools); install_github('jianlianggao/batman/batman',ref='c02ac5cf9206373d2dde1b8e12548964f8379627'); remove.packages('devtools')" && \
in which case we are pointing to a defined commit.
Getting particular files:
ENV WRAPPER_REVISION aebde21cd2c21a09f138abb48bea19325b91d304
RUN apt-get -y update && apt-get -y install --no-install-recommends curl zip && \
curl https://raw.githubusercontent.com/ISA-tools/mzml2isa-galaxy/$WRAPPER_REVISION/galaxy/mzml2isa/wrapper.py -o /usr/local/bin/wrapper.py && \
curl https://raw.githubusercontent.com/ISA-tools/mzml2isa-galaxy/$WRAPPER_REVISION/galaxy/mzml2isa/pub_role.loc -o /usr/local/bin/pub_role.loc && \
curl https://raw.githubusercontent.com/ISA-tools/mzml2isa-galaxy/$WRAPPER_REVISION/galaxy/mzml2isa/pub_status.loc -o /usr/local/bin/pub_status.loc && \
chmod a+x /usr/local/bin/wrapper.py && \
apt-get purge -y curl && \
apt-get autoremove -y && apt-get clean && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
- Do not upgrade the base image; that is the job of the image's maintainer. Don't run apt-get upgrade or apt-get dist-upgrade. Not upgrading the base image is a Docker best practice.
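A quick way to catch an accidental upgrade before pushing is to search the Dockerfile for it. The following is a minimal sketch, with check_no_upgrade as a hypothetical helper name; it is a plain text search, so it will also flag a matching command inside a comment.

```shell
#!/bin/sh
# Hypothetical lint: fail if a Dockerfile upgrades the base image.
# $1 = path to the Dockerfile
check_no_upgrade() {
    # Matches "apt-get upgrade" and "apt-get dist-upgrade", allowing
    # option flags such as -y between apt-get and the subcommand.
    if grep -En 'apt-get([[:space:]]+-[^[:space:]]+)*[[:space:]]+(dist-)?upgrade' "$1"; then
        echo "$1: do not upgrade the base image" >&2
        return 1
    fi
}
```

Running `check_no_upgrade Dockerfile` prints the offending lines with their line numbers and returns a non-zero status when an upgrade command is found.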
Funded by the EC Horizon 2020 programme, grant agreement number 654241