We aim to make the outputs of linuxkit build
reproducible, i.e. the
build artefacts should be bit-by-bit identical copies if invoked with
the same inputs and run with the same version of the linuxkit
command. See this
document on why this
matters.
Note, we do not (yet) aim to make linuxkit pkg build
builds
reproducible.
Currently, the following output formats provide reproducible builds:
tar
(Tested as part of the CI)tar-kernel-initrd
docker
kernel+initrd
(Tested as part of the CI)
In general, linuxkit build
lends itself for reproducible
builds. LinuxKit packages, used during linuxkit build
, are (signed)
docker images. Packages are tagged with the content hash of the source
code (and optionally release version) and are typically only updated
if the source of the package changed (in which case the tag
changes). For all intents and purposes, when pulled by tag, the
contents of a packages should be bit-by-bit identical. Alternatively,
the digest of the package, in which case, the pulled image will always
be the same.
The first phase of the linuxkit build
mostly untars and retars the
images of the packages to produce an tar file of the root filesystem.
This then serves as input for other output formats. During this first
phase, there are a number of things to watch out for to generate
reproducible builds:
- Timestamps of generated files. The
docker export
command, as well aslinuxkit build
itself, creates a small number of files. TheModTime
for these files needs to be clamped to a fixed date (otherwise the current time is used). Use thedefaultModTime
variable to set theModTime
of created files to a specific time. - Generated JSON files.
linuxkit build
generates a number of JSON files by marshalling Gostruct
variables. Examples are the OCI specificationconfig.json
andruntime.json
files for containers. The default Gojson.Marshal()
function seems to do a reasonable good job in generating reproducible output from internal structures, including for JSON objects. However, duringlinuxkit build
some of the OCI runtime spec fields are generated/modified and care must be taken to ensure consistent ordering. For JSON arrays (Go slices) it is best to sort them before Marshalling them.
Reproducible builds for the first phase of linuxkit build
can be
tested using -output tar
and comparing the output of subsequent
builds with tools like diff
or the excellent
diffoscope
.
The second phase of linuxkit build
converts the intermediary tar
format into the desired output format. Making this phase reproducible
depends on the tools used to generate the output.
Builds, which produce ISO formats should probably be converted to use
go-diskfs
before attempting
to make them reproducible.
For ideas on how to make the builds for other output formats reproducible, see this page.