ARROW-11641: [CI] Use docker buildkit's inline cache to reuse build cache across different hosts

Docker BuildKit supports cache inlining, which bundles the cache manifest into the produced Docker image. This avoids cache misses on different hosts and makes a pushed image far more reusable as a build cache source, which speeds up builds.

This PR passes the `BUILDKIT_INLINE_CACHE` build argument by default. It also adds support for using the `docker buildx build` command instead of `docker build`, which gives fine-grained control over the build cache. Sadly, cache manifests are not widely supported among the popular Docker registries, so the inline cache with plain `docker build` stays the default and buildx is opt-in.
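As a rough illustration of the cross-host reuse described above, the sketch below assembles the `docker build` invocation that both the publishing host and a later, different host could run. The function names, image name, and Dockerfile path are hypothetical; only the `DOCKER_BUILDKIT` variable, the `BUILDKIT_INLINE_CACHE` build argument, and the standard `--cache-from`, `-f`, `-t` flags are taken from this change and the Docker CLI.

```python
def inline_cache_build(image, dockerfile, context="."):
    """Return (env, argv) for a build that embeds cache metadata in the image.

    Hypothetical helper: it only assembles the command, it does not run it.
    """
    # The inline cache is a BuildKit feature, so BuildKit must be enabled.
    env = {"DOCKER_BUILDKIT": "1"}
    argv = [
        "docker", "build",
        # Ask BuildKit to write the cache metadata into the image itself.
        "--build-arg", "BUILDKIT_INLINE_CACHE=1",
        # Use the previously pushed image as a cache source.
        "--cache-from", image,
        "-f", dockerfile,
        "-t", image,
        context,
    ]
    return env, argv


def push(image):
    """Return the argv that publishes the image (and its inline cache)."""
    return ["docker", "push", image]


# Example with a made-up image name and Dockerfile path.
env, argv = inline_cache_build("example/repo:ubuntu-cpp", "ci/ubuntu-cpp.dockerfile")
print(env, " ".join(argv))
print(" ".join(push("example/repo:ubuntu-cpp")))
```

A host that has never built the image can run the same build command; the expectation is that layers matching the pushed image's inline cache are reused instead of rebuilt.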

Closes apache#9499 from kszucs/buildkit-inline-cache

Authored-by: Krisztián Szűcs <[email protected]>
Signed-off-by: Krisztián Szűcs <[email protected]>
kszucs committed Feb 16, 2021
1 parent 96dbeec commit 96ff3b4
Showing 17 changed files with 121 additions and 47 deletions.
6 changes: 6 additions & 0 deletions .env
@@ -27,6 +27,12 @@
# the cache plugin functional
DOCKER_VOLUME_PREFIX=

# turn on inline build cache, this is a docker buildx feature documented
# at https://github.com/docker/buildx#--cache-tonametypetypekeyvalue
COMPOSE_DOCKER_CLI_BUILD=1
DOCKER_BUILDKIT=1
BUILDKIT_INLINE_CACHE=1

ULIMIT_CORE=-1
REPO=apache/arrow-dev
ARCH=amd64
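The variables added to `.env` are plain `KEY=VALUE` pairs. A minimal sketch of reading such a file and applying the same `BUILDKIT_INLINE_CACHE == '1'` check that `docker.py` performs in this commit is shown here; `parse_env` is a hypothetical helper written for illustration and is not Archery's actual loader.

```python
def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


# A fragment mirroring the variables touched by this change.
DOTENV = """
DOCKER_VOLUME_PREFIX=

# turn on inline build cache
COMPOSE_DOCKER_CLI_BUILD=1
DOCKER_BUILDKIT=1
BUILDKIT_INLINE_CACHE=1
"""

env = parse_env(DOTENV)

# Same condition as the build code: only the literal string '1' enables it.
extra_args = []
if env.get("BUILDKIT_INLINE_CACHE") == "1":
    extra_args.extend(["--build-arg", "BUILDKIT_INLINE_CACHE=1"])

print(extra_args)
```

Because the check compares against the string `'1'`, setting `BUILDKIT_INLINE_CACHE=0` or removing the line in `.env` would leave the build argument out.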
4 changes: 1 additions & 3 deletions .github/workflows/cpp.yml
@@ -38,10 +38,8 @@ on:
- 'format/Flight.proto'

env:
DOCKER_BUILDKIT: 0
DOCKER_VOLUME_PREFIX: ".docker/"
COMPOSE_DOCKER_CLI_BUILD: 1
ARROW_ENABLE_TIMING_TESTS: OFF
DOCKER_VOLUME_PREFIX: ".docker/"
ARCHERY_DOCKER_USER: ${{ secrets.DOCKERHUB_USER }}
ARCHERY_DOCKER_PASSWORD: ${{ secrets.DOCKERHUB_TOKEN }}

4 changes: 1 additions & 3 deletions .github/workflows/cpp_cron.yml
@@ -30,10 +30,8 @@ on:
0 */12 * * *
env:
DOCKER_BUILDKIT: 0
DOCKER_VOLUME_PREFIX: ".docker/"
COMPOSE_DOCKER_CLI_BUILD: 1
ARROW_ENABLE_TIMING_TESTS: OFF
DOCKER_VOLUME_PREFIX: ".docker/"
ARCHERY_DOCKER_USER: ${{ secrets.DOCKERHUB_USER }}
ARCHERY_DOCKER_PASSWORD: ${{ secrets.DOCKERHUB_TOKEN }}

2 changes: 0 additions & 2 deletions .github/workflows/dev.yml
@@ -23,8 +23,6 @@ on:
pull_request:

env:
DOCKER_BUILDKIT: 0
COMPOSE_DOCKER_CLI_BUILD: 1
ARCHERY_DOCKER_USER: ${{ secrets.DOCKERHUB_USER }}
ARCHERY_DOCKER_PASSWORD: ${{ secrets.DOCKERHUB_TOKEN }}

2 changes: 0 additions & 2 deletions .github/workflows/go.yml
@@ -33,8 +33,6 @@ on:
- 'go/**'

env:
DOCKER_BUILDKIT: 0
COMPOSE_DOCKER_CLI_BUILD: 1
ARCHERY_DOCKER_USER: ${{ secrets.DOCKERHUB_USER }}
ARCHERY_DOCKER_PASSWORD: ${{ secrets.DOCKERHUB_TOKEN }}

2 changes: 0 additions & 2 deletions .github/workflows/integration.yml
@@ -44,9 +44,7 @@ on:
- 'rust/**'

env:
DOCKER_BUILDKIT: 0
DOCKER_VOLUME_PREFIX: ".docker/"
COMPOSE_DOCKER_CLI_BUILD: 1
ARCHERY_DOCKER_USER: ${{ secrets.DOCKERHUB_USER }}
ARCHERY_DOCKER_PASSWORD: ${{ secrets.DOCKERHUB_TOKEN }}

2 changes: 0 additions & 2 deletions .github/workflows/java.yml
@@ -36,9 +36,7 @@ on:
- 'java/**'

env:
DOCKER_BUILDKIT: 0
DOCKER_VOLUME_PREFIX: ".docker/"
COMPOSE_DOCKER_CLI_BUILD: 1
ARCHERY_DOCKER_USER: ${{ secrets.DOCKERHUB_USER }}
ARCHERY_DOCKER_PASSWORD: ${{ secrets.DOCKERHUB_TOKEN }}

2 changes: 0 additions & 2 deletions .github/workflows/java_jni.yml
@@ -36,9 +36,7 @@ on:
- 'java/**'

env:
DOCKER_BUILDKIT: 0
DOCKER_VOLUME_PREFIX: ".docker/"
COMPOSE_DOCKER_CLI_BUILD: 1
ARCHERY_DOCKER_USER: ${{ secrets.DOCKERHUB_USER }}
ARCHERY_DOCKER_PASSWORD: ${{ secrets.DOCKERHUB_TOKEN }}

2 changes: 0 additions & 2 deletions .github/workflows/js.yml
@@ -32,8 +32,6 @@ on:
- 'js/**'

env:
DOCKER_BUILDKIT: 0
COMPOSE_DOCKER_CLI_BUILD: 1
ARCHERY_DOCKER_USER: ${{ secrets.DOCKERHUB_USER }}
ARCHERY_DOCKER_PASSWORD: ${{ secrets.DOCKERHUB_TOKEN }}

2 changes: 0 additions & 2 deletions .github/workflows/python.yml
@@ -32,9 +32,7 @@ on:
- 'python/**'

env:
DOCKER_BUILDKIT: 0
DOCKER_VOLUME_PREFIX: ".docker/"
COMPOSE_DOCKER_CLI_BUILD: 1
ARCHERY_DOCKER_USER: ${{ secrets.DOCKERHUB_USER }}
ARCHERY_DOCKER_PASSWORD: ${{ secrets.DOCKERHUB_TOKEN }}

2 changes: 0 additions & 2 deletions .github/workflows/python_cron.yml
@@ -29,9 +29,7 @@ on:
0 */12 * * *
env:
DOCKER_BUILDKIT: 0
DOCKER_VOLUME_PREFIX: ".docker/"
COMPOSE_DOCKER_CLI_BUILD: 1
ARCHERY_DOCKER_USER: ${{ secrets.DOCKERHUB_USER }}
ARCHERY_DOCKER_PASSWORD: ${{ secrets.DOCKERHUB_TOKEN }}

2 changes: 0 additions & 2 deletions .github/workflows/r.yml
@@ -40,9 +40,7 @@ on:
- 'r/**'

env:
DOCKER_BUILDKIT: 0
DOCKER_VOLUME_PREFIX: ".docker/"
COMPOSE_DOCKER_CLI_BUILD: 1
ARCHERY_DOCKER_USER: ${{ secrets.DOCKERHUB_USER }}
ARCHERY_DOCKER_PASSWORD: ${{ secrets.DOCKERHUB_TOKEN }}

2 changes: 0 additions & 2 deletions .github/workflows/ruby.yml
@@ -44,9 +44,7 @@ on:
- 'ruby/**'

env:
DOCKER_BUILDKIT: 0
DOCKER_VOLUME_PREFIX: ".docker/"
COMPOSE_DOCKER_CLI_BUILD: 1
ARCHERY_DOCKER_USER: ${{ secrets.DOCKERHUB_USER }}
ARCHERY_DOCKER_PASSWORD: ${{ secrets.DOCKERHUB_TOKEN }}

25 changes: 21 additions & 4 deletions dev/archery/archery/cli.py
@@ -767,6 +767,12 @@ def docker_compose(obj, src, dry_run):
envvar='ARCHERY_USE_DOCKER_CLI',
help="Use docker CLI directly for building instead of calling "
"docker-compose. This may help to reuse cached layers.")
@click.option('--using-docker-buildx', default=False, is_flag=True,
envvar='ARCHERY_USE_DOCKER_BUILDX',
help="Use buildx with docker CLI directly for building instead "
"of calling docker-compose or the plain docker build "
"command. This option makes the build cache reusable "
"across hosts.")
@click.option('--use-cache/--no-cache', default=True,
help="Whether to use cache when building the image and its "
"ancestor images")
@@ -776,21 +782,23 @@ def docker_compose(obj, src, dry_run):
"image and its ancestors use --no-cache option.")
@click.pass_obj
def docker_compose_build(obj, image, *, force_pull, using_docker_cli,
use_cache, use_leaf_cache):
using_docker_buildx, use_cache, use_leaf_cache):
"""
Execute docker-compose builds.
"""
from .docker import UndefinedImage

compose = obj['compose']

using_docker_cli |= using_docker_buildx
try:
if force_pull:
compose.pull(image, pull_leaf=use_leaf_cache,
using_docker=using_docker_cli)
compose.build(image, use_cache=use_cache,
use_leaf_cache=use_leaf_cache,
using_docker=using_docker_cli)
using_docker=using_docker_cli,
using_buildx=using_docker_buildx)
except UndefinedImage as e:
raise click.ClickException(
"There is no service/image defined in docker-compose.yml with "
@@ -817,6 +825,12 @@ def docker_compose_build(obj, image, *, force_pull, using_docker_cli,
envvar='ARCHERY_USE_DOCKER_CLI',
help="Use docker CLI directly for building instead of calling "
"docker-compose. This may help to reuse cached layers.")
@click.option('--using-docker-buildx', default=False, is_flag=True,
envvar='ARCHERY_USE_DOCKER_BUILDX',
help="Use buildx with docker CLI directly for building instead "
"of calling docker-compose or the plain docker build "
"command. This option makes the build cache reusable "
"across hosts.")
@click.option('--use-cache/--no-cache', default=True,
help="Whether to use cache when building the image and its "
"ancestor images")
@@ -828,7 +842,8 @@ def docker_compose_build(obj, image, *, force_pull, using_docker_cli,
help="Set volume within the container")
@click.pass_obj
def docker_compose_run(obj, image, command, *, env, user, force_pull,
force_build, build_only, using_docker_cli, use_cache,
force_build, build_only, using_docker_cli,
using_docker_buildx, use_cache,
use_leaf_cache, volume):
"""Execute docker-compose builds.
@@ -863,6 +878,7 @@ def docker_compose_run(obj, image, command, *, env, user, force_pull,
from .docker import UndefinedImage

compose = obj['compose']
using_docker_cli |= using_docker_buildx

env = dict(kv.split('=', 1) for kv in env)
try:
@@ -872,7 +888,8 @@ def docker_compose_run(obj, image, command, *, env, user, force_pull,
if force_build:
compose.build(image, use_cache=use_cache,
use_leaf_cache=use_leaf_cache,
using_docker=using_docker_cli)
using_docker=using_docker_cli,
using_buildx=using_docker_buildx)
if build_only:
return
compose.run(
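The `using_docker_cli |= using_docker_buildx` lines in `cli.py` make the buildx flag imply the plain docker CLI path, so pulls also go through the CLI when buildx is requested. A small sketch of that flag interaction follows. It deliberately swaps in the standard library's `argparse` for `click` to stay dependency-free, and the environment variable handling (only the string `"1"` counts as enabled) is a simplifying assumption, not click's actual parsing.

```python
import argparse


def parse_build_flags(argv, environ=None):
    """Parse the two build-related flags; environ supplies the defaults."""
    environ = environ or {}
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--using-docker-cli", action="store_true",
        default=environ.get("ARCHERY_USE_DOCKER_CLI") == "1")
    parser.add_argument(
        "--using-docker-buildx", action="store_true",
        default=environ.get("ARCHERY_USE_DOCKER_BUILDX") == "1")
    ns = parser.parse_args(argv)
    # buildx is invoked through the docker CLI, so it implies the CLI path.
    ns.using_docker_cli |= ns.using_docker_buildx
    return ns


flags = parse_build_flags(["--using-docker-buildx"])
print(flags.using_docker_cli, flags.using_docker_buildx)
```

With neither flag nor environment variable set, both values stay `False` and the build falls back to docker-compose.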
58 changes: 46 additions & 12 deletions dev/archery/archery/docker.py
@@ -17,7 +17,6 @@

import os
import re
import shlex
import subprocess
from io import StringIO

@@ -43,6 +42,12 @@ def flatten(node, parents=None):
raise TypeError(node)


def _sanitize_command(cmd):
if isinstance(cmd, list):
cmd = " ".join(cmd)
return re.sub(r"\s+", " ", cmd)


class UndefinedImage(Exception):
pass

@@ -224,9 +229,13 @@ def _pull(service):
_pull(service)

def build(self, service_name, use_cache=True, use_leaf_cache=True,
using_docker=False):
using_docker=False, using_buildx=False):
def _build(service, use_cache):
args = ['build']
if 'build' not in service:
# nothing to do
return

args = []
cache_from = list(service.get('build', {}).get('cache_from', []))
if use_cache:
for image in cache_from:
@@ -240,10 +249,36 @@ def _build(service, use_cache):
else:
args.append('--no-cache')

if using_docker:
if 'build' not in service:
# nothing to do
return
# turn on inline build cache, this is a docker buildx feature
# used to bundle the image build cache to the pushed image manifest
# so the build cache can be reused across hosts, documented at
# https://github.com/docker/buildx#--cache-tonametypetypekeyvalue
if self.config.env.get('BUILDKIT_INLINE_CACHE') == '1':
args.extend(['--build-arg', 'BUILDKIT_INLINE_CACHE=1'])

if using_buildx:
for k, v in service['build'].get('args', {}).items():
args.extend(['--build-arg', '{}={}'.format(k, v)])

if use_cache:
cache_ref = '{}-cache'.format(service['image'])
cache_from = 'type=registry,ref={}'.format(cache_ref)
cache_to = (
'type=registry,ref={},mode=max'.format(cache_ref)
)
args.extend([
'--cache-from', cache_from,
'--cache-to', cache_to,
])

args.extend([
'--output', 'type=docker',
'-f', service['build']['dockerfile'],
'-t', service['image'],
service['build'].get('context', '.')
])
self._execute_docker("buildx", "build", *args)
elif using_docker:
# better for caching
for k, v in service['build'].get('args', {}).items():
args.extend(['--build-arg', '{}={}'.format(k, v)])
@@ -254,9 +289,9 @@ def _build(service, use_cache):
'-t', service['image'],
service['build'].get('context', '.')
])
self._execute_docker(*args)
self._execute_docker("build", *args)
else:
self._execute_compose(*args, service['name'])
self._execute_compose("build", *args, service['name'])

service = self.config.get(service_name)
# build ancestor services
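To make the buildx branch above easier to follow, here is a standalone sketch that assembles the same argument list from a service definition. It is a simplification under stated assumptions: `buildx_args` and the example service dictionary are hypothetical, the inline-cache switch is a plain parameter rather than a lookup in the compose environment, and the `--cache-from <image>` / `--no-cache` handling from the part of `_build` not shown in this diff is omitted.

```python
def buildx_args(service, use_cache=True, inline_cache=True):
    """Assemble a `docker buildx build` command for one compose service."""
    build = service["build"]
    args = []

    if inline_cache:
        args.extend(["--build-arg", "BUILDKIT_INLINE_CACHE=1"])

    # Forward the compose build args as --build-arg KEY=VALUE pairs.
    for key, value in build.get("args", {}).items():
        args.extend(["--build-arg", "{}={}".format(key, value)])

    if use_cache:
        # The cache manifest lives next to the image under a "-cache" suffix.
        cache_ref = "{}-cache".format(service["image"])
        args.extend([
            "--cache-from", "type=registry,ref={}".format(cache_ref),
            "--cache-to", "type=registry,ref={},mode=max".format(cache_ref),
        ])

    args.extend([
        "--output", "type=docker",  # load the result into the local daemon
        "-f", build["dockerfile"],
        "-t", service["image"],
        build.get("context", "."),
    ])
    return ["docker", "buildx", "build"] + args


# Made-up service entry in the shape of a docker-compose.yml service.
service = {
    "image": "example/repo:ubuntu-cpp",
    "build": {
        "context": ".",
        "dockerfile": "ci/ubuntu-cpp.dockerfile",
        "args": {"arch": "amd64"},
    },
}
print(" ".join(buildx_args(service)))
```

The separate `-cache` reference is what requires registry support for cache manifests, which is the limitation the commit message mentions.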
@@ -313,10 +348,9 @@ def run(self, service_name, command=None, *, env=None, volumes=None,
args.append(command)
else:
# replace whitespaces from the preformatted compose command
cmd = shlex.split(service.get('command', ''))
cmd = [re.sub(r"\s+", " ", token) for token in cmd]
cmd = _sanitize_command(service.get('command', ''))
if cmd:
args.extend(cmd)
args.append(cmd)

# execute as a plain docker cli command
self._execute_docker('run', '--rm', *args)
24 changes: 20 additions & 4 deletions dev/archery/archery/tests/test_cli.py
@@ -35,7 +35,11 @@ def test_docker_run_with_custom_command(run, build, pull):
"ubuntu-cpp", pull_leaf=True, using_docker=False
)
build.assert_called_once_with(
"ubuntu-cpp", use_cache=True, use_leaf_cache=True, using_docker=False
"ubuntu-cpp",
use_cache=True,
use_leaf_cache=True,
using_docker=False,
using_buildx=False
)
run.assert_called_once_with(
"ubuntu-cpp",
@@ -73,7 +77,11 @@ def test_docker_run_options(run, build, pull):
"ubuntu-cpp", pull_leaf=True, using_docker=False
)
build.assert_called_once_with(
"ubuntu-cpp", use_cache=True, use_leaf_cache=True, using_docker=False
"ubuntu-cpp",
use_cache=True,
use_leaf_cache=True,
using_docker=False,
using_buildx=False
)
run.assert_called_once_with(
"ubuntu-cpp",
@@ -113,7 +121,11 @@ def test_docker_run_only_pulling_and_building(build, pull):
"ubuntu-cpp", pull_leaf=True, using_docker=False
)
build.assert_called_once_with(
"ubuntu-cpp", use_cache=True, use_leaf_cache=True, using_docker=False
"ubuntu-cpp",
use_cache=True,
use_leaf_cache=True,
using_docker=False,
using_buildx=False
)
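The test updates in this file are needed because `assert_called_once_with` compares the full set of keyword arguments, so the old expectation no longer matches once the CLI passes `using_buildx`. A minimal sketch with a bare `unittest.mock.Mock` standing in for the patched `build` method (an assumption; the real tests patch the compose object) shows the behavior.

```python
from unittest import mock

# Stand-in for the patched build method used by the CLI tests.
build = mock.Mock()

# What the CLI now does: it always forwards using_buildx.
build("ubuntu-cpp", use_cache=True, use_leaf_cache=True,
      using_docker=False, using_buildx=False)

# The updated expectation matches the call exactly.
build.assert_called_once_with(
    "ubuntu-cpp",
    use_cache=True,
    use_leaf_cache=True,
    using_docker=False,
    using_buildx=False,
)

# The pre-change expectation lacks using_buildx and is rejected.
try:
    build.assert_called_once_with(
        "ubuntu-cpp", use_cache=True, use_leaf_cache=True, using_docker=False
    )
except AssertionError:
    old_expectation_failed = True
else:
    old_expectation_failed = False

print(old_expectation_failed)
```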


@@ -134,7 +146,11 @@ def test_docker_run_without_build_cache(run, build):
result = CliRunner().invoke(archery, args)
assert result.exit_code == 0
build.assert_called_once_with(
"ubuntu-cpp", use_cache=False, use_leaf_cache=False, using_docker=False
"ubuntu-cpp",
use_cache=False,
use_leaf_cache=False,
using_docker=False,
using_buildx=False
)
run.assert_called_once_with(
"ubuntu-cpp",