Refactor Docker and Dev Container setup using Buildkit #4392

Draft
wants to merge 66 commits into
base: main
Changes from 65 commits
Commits
063ab99
Comment out old CI for migration
ruffsl Jun 3, 2024
a4dff33
Simplify dockerignore by inverting matching set
ruffsl Jun 3, 2024
b070400
Refactor Dockerfile using Buildkit and bake files
ruffsl Jun 3, 2024
7a09d15
Change COLCON_HOME in Dockerfile for now
ruffsl Jun 3, 2024
cdbe165
Refactor Dev Container using Buildkit and bake files
ruffsl Jun 3, 2024
acec643
Refactor Action Workflows using Buildkit and bake files
ruffsl Jun 4, 2024
b0dd131
Comment out CI trigger while WIP
ruffsl Jun 4, 2024
35d28c7
Rollback RunsOn cache action to use GitHub's cache
ruffsl Jun 4, 2024
e78549e
Roll back RunsOn runner tags
ruffsl Jun 4, 2024
e9c2769
Roll back AWS ECR changes to use GHCR
ruffsl Jun 4, 2024
44c45d4
Update retention to use sanitized tag names
ruffsl Jun 4, 2024
f27f3a2
Add debugger and releaser bake targets
ruffsl Jun 4, 2024
37887b9
Clean ECR comments
ruffsl Jun 4, 2024
dc16919
Remove debugging command for test workflow
ruffsl Jun 4, 2024
b4ff763
Move docker stuff into .docker path
ruffsl Jun 8, 2024
2b0bae0
Update docker paths
ruffsl Jun 8, 2024
667a8ff
Mount user home as volume
ruffsl Jun 8, 2024
18bf8a4
vcs import from underlay.repos file
ruffsl Jun 13, 2024
ba682a1
Simplify given cache should only be saved
ruffsl Jul 6, 2024
a6497ad
Use separate cache step to save via if always
ruffsl Jul 6, 2024
c1a9aab
Update docker bake action from v4 to v5
ruffsl Jul 6, 2024
76110c0
Fix FromAsCasing warnings
ruffsl Jul 6, 2024
88e34e1
Merge remote-tracking branch 'origin/main' into buildkit
ruffsl Jul 6, 2024
a2652cd
Set ccache key via septate step
ruffsl Jul 6, 2024
cee6ac0
Pass ccache_cache_key between jobs
ruffsl Jul 6, 2024
3df9fb0
Restore ccache for build prod image jobs
ruffsl Jul 6, 2024
248b33c
Update path to Dockerfile
ruffsl Jul 6, 2024
5736762
Uncomment main caller workflow
ruffsl Jul 6, 2024
df9842a
Trigger on any change to the .docker path
ruffsl Jul 6, 2024
eb26b41
Change workflow action to current branch to test CI
ruffsl Jul 6, 2024
894126c
Fix typo to use GITHUB_TOKEN
ruffsl Jul 6, 2024
aef79ad
Omit OCI configuration used for AWS ECR
ruffsl Jul 6, 2024
5d8c00f
Image images using org + repo name
ruffsl Jul 6, 2024
ae8a274
Use gha for buildkit cache backend instead of S3
ruffsl Jul 6, 2024
de79d04
Bake underlay source into base image
ruffsl Jul 6, 2024
45f00f2
Revert "Bake underlay source into base image"
ruffsl Jul 6, 2024
140d1b3
Add nav2_minimal_turtlebot_simulation as submodules
ruffsl Jul 6, 2024
27e8ceb
Remove clone step using vcstool
ruffsl Jul 6, 2024
54f57aa
Fix OVERLAY_WS ENV to match Dockerfile
ruffsl Jul 7, 2024
c09f746
Try building prod image regardless of test results
ruffsl Jul 7, 2024
5121a30
Enable docker-outside-docker
ruffsl Jul 9, 2024
2c8bcc3
Reorder mounts
ruffsl Jul 9, 2024
65c0194
Enable gh CLI
ruffsl Jul 9, 2024
f22b3cb
Add svg extension to view Dockerfile graphs
ruffsl Jul 9, 2024
50b8f7c
Rename nav2 ws volume to be more descriptive
ruffsl Jul 9, 2024
bab8da7
Postfix home by use
ruffsl Jul 9, 2024
7af09f7
Add bind volume to user home
ruffsl Jul 9, 2024
922b91d
Comment out home bind mount by default
ruffsl Jul 9, 2024
ea6f72d
Add mark comments for readability
ruffsl Jul 9, 2024
93d7a92
Sort ENVs
ruffsl Jul 9, 2024
0de7df3
Explicitly set SSH_AUTH_SOCK for devcontainer CLI
ruffsl Jul 9, 2024
94faf5b
Alway return 1
ruffsl Jul 9, 2024
00fdbc8
Formatting
ruffsl Jul 9, 2024
39ca865
Install GUI tools into dever stage
ruffsl Jul 9, 2024
6126f1f
Use always() in if condition
ruffsl Jul 10, 2024
75b849b
Add shim to build dever stage from debugger image
ruffsl Jul 10, 2024
090be7f
Fix src folder to use full repo name
ruffsl Jul 10, 2024
fd84091
Remove --symlink-install from default script
ruffsl Jul 10, 2024
23fe6ec
Simplify by removing unnecessary unset
ruffsl Jul 10, 2024
17b9464
Correct comment
ruffsl Jul 10, 2024
3fd241a
Add default gitconfig to recurse over submodules
ruffsl Jul 10, 2024
e3501fe
Add readme with quick start guide
ruffsl Jul 10, 2024
1081d47
Add alias to source underlay workspace
ruffsl Jul 10, 2024
c695baa
Extend docs on build locally or pulling remotely
ruffsl Jul 10, 2024
0314c34
Fix typo
ruffsl Jul 10, 2024
73e8ed8
Update .devcontainer/README.md
ruffsl Dec 19, 2024
70 changes: 35 additions & 35 deletions .circleci/config.yml
@@ -536,38 +536,38 @@ _parameters:

workflows:
version: 2
build_and_test:
Member

Should all the circle stuff just be removed?

Member Author

I think I'll migrate the CircleCI workflow too, so that we can still have both while we iron things out. But I think I'll have the GitHub Actions checkout job trigger CircleCI for now, instead of CircleCI being directly triggered by GitHub events, so that CircleCI can use the same base image as the rest of the GitHub workflows.

jobs:
- release_build:
<<: *release_parameters
- system_build:
requires:
- release_build
- release_test:
requires:
- system_build
cache_test: true
nightly:
jobs:
- release_build:
<<: *release_parameters
- system_build:
requires:
- release_build
- release_test:
requires:
- system_build
<<: *release_parameters
matrix:
alias: release_test
parameters:
rmw:
- rmw_cyclonedds_cpp
- rmw_fastrtps_cpp
triggers:
- schedule:
cron: "0 13 * * *"
filters:
branches:
only:
- main
# build_and_test:
# jobs:
# - release_build:
# <<: *release_parameters
# - system_build:
# requires:
# - release_build
# - release_test:
# requires:
# - system_build
# cache_test: true
# nightly:
# jobs:
# - release_build:
# <<: *release_parameters
# - system_build:
# requires:
# - release_build
# - release_test:
# requires:
# - system_build
# <<: *release_parameters
# matrix:
# alias: release_test
# parameters:
# rmw:
# - rmw_cyclonedds_cpp
# - rmw_fastrtps_cpp
# triggers:
# - schedule:
# cron: "0 13 * * *"
# filters:
# branches:
# only:
# - main
149 changes: 149 additions & 0 deletions .devcontainer/README.md
@@ -0,0 +1,149 @@
# Dev Containers

This folder contains the necessary files to build and run development containers for the Navigation2 project. The containers are based on the ROS 2 Rolling distribution and include all necessary dependencies to build and run the Navigation2 stack.

## Quick Start

To get started, follow the instructions below.

### Prerequisites

First, ensure you're using a version of Docker Engine recent enough to support [BuildKit](https://docs.docker.com/build/buildkit/). If you plan on running robot simulations locally, Hardware Acceleration for sensor raytracing and 3D rendering is also recommended. While other compatible devcontainer tools may be used, Visual Studio Code is recommended for simplicity.

#### System Software
- [Docker Engine](https://docs.docker.com/engine/install/)
  - https://get.docker.com - simple universal install script
  - [Linux post-installation](https://docs.docker.com/engine/install/linux-postinstall/) - manage Docker as a non-root user
- [Git LFS](https://git-lfs.github.com/) - optional for managing large assets
  - Use for version controlling media such as figures
- [NVIDIA Container Toolkit](https://github.com/NVIDIA/nvidia-container-toolkit) - optional for enabling Hardware Acceleration
  - [Installing the Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html) - only necessary on the host running Docker Engine

#### Development Tools
- [Visual Studio Code](https://code.visualstudio.com/) - alternative to Dev Containers CLI
  - [Remote Development](https://marketplace.visualstudio.com/items?itemName=ms-vscode-remote.vscode-remote-extensionpack) - via SSH, Tunnels, Containers, WSL
    - [Dev Containers extension](https://marketplace.visualstudio.com/items?itemName=ms-vscode-remote.remote-containers) - specific to just Containers
  - [Docker extension](https://marketplace.visualstudio.com/items?itemName=ms-azuretools.vscode-docker) - for introspecting Docker daemon
  - [Using SSH keys](https://code.visualstudio.com/remote/advancedcontainers/sharing-git-credentials#_using-ssh-keys) - sharing Git credentials with container
- [Dev Container CLI](https://github.com/devcontainers/cli) - alternative to VSCode
  - [Installation via NPM](https://github.com/devcontainers/cli?tab=readme-ov-file#npm-install) - for custom install setup
  - [Installation via VSCode](https://code.visualstudio.com/docs/devcontainers/devcontainer-cli) - for simple install setup
    - Note: the CLI installed via VSCode is wrapped but buggy; installing via NPM is recommended for now
    - https://github.com/devcontainers/cli/issues/799
- [GitHub CLI](https://cli.github.com/) - optional for interacting with CI Workflows
  - [Installation](https://github.com/cli/cli#installation) - specifically [for Linux](https://github.com/cli/cli/blob/trunk/docs/install_linux.md)
  - [Configuration](https://cli.github.com/manual/) - login authentication and setup

### Environment Setup

Once you've set up your SSH keys for GitHub and account credentials for the CLI, you can add your private SSH key to the SSH agent if you'd like to use your local credentials for Git operations inside the dev container:

```shell
# Add your SSH key to the SSH agent
ssh-add ~/.ssh/<github-key>
```

You may also configure Docker to use those credentials when interacting with the GitHub Container Registry (GHCR). This setup is optional, but can help avoid rate limiting when pulling images from a public IP address shared by many users. To do so, pipe a token from the GitHub CLI into `docker login`:

```shell
# Login to GitHub Container Registry
gh auth token | docker login ghcr.io --username <github-username> --password-stdin
```

### Submodule Setup

Next, recursively clone this repository and included submodules.

```shell
# Clone the repository and submodules
git clone --recurse-submodules -j8 \
  git@github.com:ros-navigation/navigation2.git

# Change into the repository directory
cd navigation2

# Configure the local git include path
git config --local include.path ../.gitconfig
```

### Images Setup

To create the base image for the container, one can either build the image locally, or pull CI image layers cached remotely. If you need to modify the base image during the development process, such as adding or removing dependencies, you can build images locally before opening any PRs. If you want to evaluate modifications to the base image during the review process, you could then also pull pre-built CI images from an opened PR.

#### Building Locally

Building locally leverages various caching strategies to speed up subsequent docker rebuilds, such as caching apt package downloads, incremental build artifacts, and multi-stage optimizations for image layer reuse. You can either let the dev container's initialization lifecycle script build the image for you, or you can pre-build the image manually:

```shell
# Bake the dever image tag as a test
docker buildx bake dever

# Run container from image as a test
docker run -it --rm nav2:dever bash
```

#### Pulling Remotely

Alternatively, you can pull the CI image layers from the GitHub Container Registry (GHCR) to bootstrap the dev container. Whether this is faster than building locally depends on your network connection, but it is particularly useful for downloading the pre-built colcon workspaces that CI bakes into the debugger stage, avoiding the need to build the image layers or re-compile a PR locally. Simply set the `DEV_FROM_STAGE` environment variable to shortcut the build process to a given reference image, as shown in the commented lines of the initialization script:

```shell
REFERENCE_IMAGE=ghcr.io/ros-navigation/navigation2:main-debugger
docker pull $REFERENCE_IMAGE
export DEV_FROM_STAGE=$REFERENCE_IMAGE
```

> Note: you may want to clean or create new named volumes as defined in the dev container config, since named volumes are only initialized from the first container they are attached to. I.e. to switch between different PRs, you'll want to avoid using a named volume that was seeded from a different/older docker image.
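
As a rough sketch of that cleanup, assuming the workspace volumes keep the `nav2-ws-` prefix used in `devcontainer.json`, stale volumes could be filtered out of Docker's volume list as below. The helper name `select_nav2_ws_volumes` is hypothetical and not part of this repository:

```shell
# Hypothetical helper: read volume names on stdin and keep only those that
# start with the `nav2-ws-` prefix used for the colcon workspace volume.
select_nav2_ws_volumes() {
  grep '^nav2-ws-' || true
}

# Example usage (requires Docker; remove the dev container first, as volumes
# still referenced by a container cannot be removed):
#   docker volume ls --format '{{.Name}}' | select_nav2_ws_volumes
#   docker volume ls --format '{{.Name}}' | select_nav2_ws_volumes | xargs -r docker volume rm
```

Listing the matches before piping them into `docker volume rm` is a sensible precaution, given that removing a volume discards any build artifacts cached inside it.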

### Launching Development Containers

Note: using Dev Containers from a remote host is also possible:

- [Open a folder on a remote SSH host in a container](https://code.visualstudio.com/docs/devcontainers/containers#_open-a-folder-on-a-remote-ssh-host-in-a-container)
- [Open a folder on a remote Tunnel host in a container](https://code.visualstudio.com/docs/devcontainers/containers#_open-a-folder-on-a-remote-tunnel-host-in-a-container)

#### Visual Studio Code
Finally, open VSCode and use the Dev Containers extension:

```shell
code .
# Press Ctrl+Shift+P to open the Command Palette
# Type and select `Dev Containers: Reopen in Container`
```

#### Dev Containers CLI
Alternatively, use the CLI to bring up and exec into the Dev Container:

```shell
devcontainer up --workspace-folder .
devcontainer exec --workspace-folder . bash
```
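
Since the dev container config bind-mounts the path in `SSH_AUTH_SOCK`, it may help to confirm the agent socket exists before bringing the container up. The following is a minimal sketch of such a pre-flight check; the function name `check_ssh_auth_sock` is hypothetical:

```shell
# Hypothetical pre-flight check: succeed only if the given path (defaulting
# to $SSH_AUTH_SOCK) is non-empty and points at an existing socket.
check_ssh_auth_sock() {
  local sock="${1-${SSH_AUTH_SOCK-}}"
  if [ -z "$sock" ]; then
    echo "SSH_AUTH_SOCK is not set; start an SSH agent first" >&2
    return 1
  fi
  if [ ! -S "$sock" ]; then
    echo "SSH_AUTH_SOCK does not point to a socket: $sock" >&2
    return 1
  fi
}

# Example usage:
#   check_ssh_auth_sock && devcontainer up --workspace-folder .
```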

### Verifying Development Containers

To verify the dev container is set up correctly, i.e. that hardware acceleration and display forwarding are working as expected, you can run simulation examples to check:

```shell
# Included alias to source overlay workspace
sows
# Launch simulation example with GUIs enabled
ros2 launch nav2_bringup tb4_simulation_launch.py headless:=False
```
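
If no GUI windows appear, a quick first check is whether `DISPLAY` is set and the X11 socket directory is mounted inside the container, as both are configured in `devcontainer.json`. The sketch below assumes the default `/tmp/.X11-unix` mount target; the helper name `check_display_forwarding` is hypothetical:

```shell
# Hypothetical helper: report whether DISPLAY is set and the X11 socket
# directory is present. Both arguments default to the container's values.
check_display_forwarding() {
  local display="${1-${DISPLAY-}}"
  local x11_dir="${2-/tmp/.X11-unix}"
  [ -n "$display" ] || { echo "DISPLAY is empty" >&2; return 1; }
  [ -d "$x11_dir" ] || { echo "missing $x11_dir" >&2; return 1; }
  echo "DISPLAY=$display, X11 sockets in $x11_dir"
}

# Example usage inside the dev container:
#   check_display_forwarding
```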

## Further Reading and Concepts

Afterwards, you may want to familiarize yourself further with the following topics:

- Git Submodules
  - https://git-scm.com/book/en/Git-Tools-Submodules
  - https://git-scm.com/docs/git-submodule
- Docker
  - Multi-stage
    - https://docs.docker.com/build/building/multi-stage/
  - BuildKit
    - https://docs.docker.com/build/buildkit/
  - Bake
    - https://docs.docker.com/build/bake/
- Development Containers
  - https://navigation.ros.org/development_guides/devcontainer_docs/index.html
  - https://containers.dev/
  - https://code.visualstudio.com/docs/devcontainers/containers
88 changes: 73 additions & 15 deletions .devcontainer/devcontainer.json
@@ -1,44 +1,100 @@
{
"name": "Nav2",
"build": {
"dockerfile": "../Dockerfile",
"context": "..",
"target": "dever",
"cacheFrom": "ghcr.io/ros-navigation/navigation2:main"
},
"initializeCommand": ".devcontainer/initialize-command.sh dever", // Bakes to tag nav2:devcontainer
"image": "nav2:devcontainer",
"runArgs": [
"--name=nav2"
// "--cap-add=SYS_PTRACE", // enable debugging, e.g. gdb
// "--ipc=host", // shared memory transport with host, e.g. rviz GUIs
// "--network=host", // network access to host interfaces, e.g. eth0
// "--pid=host", // DDS discovery with host, without --network=host
// "--privileged", // device access to host peripherals, e.g. USB
// "--security-opt=seccomp=unconfined", // enable debugging, e.g. gdb
// "--device=/dev/dri", // enable Intel integrated graphics
// "--ulimit", "nofile=1024:4096", // increase file descriptor limit for valgrind
//
"--runtime=nvidia", // enable NVIDIA Container Toolkit
Member

What happens if no NV GPU exists?

Member Author

Then the user should comment out this device option and use the appropriate option for their local hardware, like --device=/dev/dri for Intel integrated graphics. Nvidia is just enabled by default as it's so common in robotics and AI development on Linux (my own bias). We could leave all hardware acceleration options commented out by default instead; that would just be a minor inconvenience to me.

"--env=NVIDIA_VISIBLE_DEVICES=all", // enable GPUs with env as --gpus doesn't parse nicely
"--env=NVIDIA_DRIVER_CAPABILITIES=all", // enable all capabilities, including `graphics`
],
"workspaceFolder": "/opt/overlay_ws/src/navigation2",
"workspaceFolder": "/opt/nav2_ws/src/navigation2",
"workspaceMount": "source=${localWorkspaceFolder},target=${containerWorkspaceFolder},type=bind",
"onCreateCommand": ".devcontainer/on-create-command.sh",
"updateContentCommand": ".devcontainer/update-content-command.sh",
"postCreateCommand": ".devcontainer/post-create-command.sh",
"remoteEnv": {
"CCACHE_DIR": "/opt/nav2_ws/.ccache",
// Explicitly set DISPLAY for NVIDIA Container Toolkit
"DISPLAY": "${localEnv:DISPLAY}",
"OVERLAY_MIXINS": "release ccache lld",
"CCACHE_DIR": "/tmp/.ccache"
// Explicitly set SSH_AUTH_SOCK for devcontainer CLI
"SSH_AUTH_SOCK": "${localEnv:SSH_AUTH_SOCK}",
},
"remoteUser": "ubuntu",
"mounts": [
// ################################################################################
// # MARK: Cache mounts - for development
// ################################################################################
{
// Cache apt downloads
"source": "apt-cache",
"target": "/var/cache/apt",
"type": "volume"
},
{
// Cache ccache caches
"source": "ccache",
"target": "/opt/nav2_ws/.ccache",
"type": "volume"
},
{
"source": "ccache-${devcontainerId}",
"target": "/tmp/.ccache",
// Cache colcon workspace
"source": "nav2-ws-${devcontainerId}",
"target": "/opt/nav2_ws",
"type": "volume"
},
// ################################################################################
// # MARK: Personal mounts - for convenience
// ################################################################################
{
"source": "overlay-${devcontainerId}",
"target": "/opt/overlay_ws",
// Mount home dotfiles
"source": "nav2-home-${localEnv:USER}",
"target": "/home/ubuntu",
"type": "volume"
},
// {
// // Mount home nav2
// "source": "${localEnv:HOME}/nav2/",
// "target": "/home/ubuntu/nav2",
// "type": "bind"
// },
// ################################################################################
// # MARK: Socket mounts - for tooling
// ################################################################################
{
// Mount docker socket
"source": "/var/run/docker.sock",
"target": "/var/run/docker-host.sock",
"type": "bind"
},
{
// Explicitly mount X11 socket for NVIDIA Container Toolkit
// as setting NVIDIA_DRIVER_CAPABILITIES to include `graphics`
// interferes with VSCode's default X11 forwarding behavior
"source": "/tmp/.X11-unix",
"target": "/tmp/.X11-unix",
"type": "bind"
},
{
// Explicitly mount SSH socket for devcontainer CLI
"source": "${localEnv:SSH_AUTH_SOCK}",
"target": "${localEnv:SSH_AUTH_SOCK}",
"type": "bind"
}
],
"features": {
// "ghcr.io/devcontainers/features/desktop-lite:1": {},
"ghcr.io/devcontainers/features/github-cli:1": {}
"ghcr.io/devcontainers/features/docker-outside-of-docker:1": {},
"ghcr.io/devcontainers/features/github-cli:1": {},
},
"customizations": {
"codespaces": {
@@ -53,9 +109,11 @@
"eamodio.gitlens",
"esbenp.prettier-vscode",
"GitHub.copilot",
"hashicorp.hcl",
"ms-azuretools.vscode-docker",
"ms-iot.vscode-ros",
"streetsidesoftware.code-spell-checker",
"twxs.cmake"
"vitaliymaz.vscode-svg-previewer",
]
}
}
26 changes: 26 additions & 0 deletions .devcontainer/initialize-command.sh
@@ -0,0 +1,26 @@
#!/bin/bash

# Immediately catch all errors
set -eo pipefail

# Uncomment for debugging
# set -x
# env

# Use first argument as target name
target=$1

################################################################################
# MARK: Pull image - download image from CI and GHCR for local dev container
# REFERENCE_IMAGE=ghcr.io/ros-navigation/navigation2:main-debugger
# docker pull $REFERENCE_IMAGE
# export DEV_FROM_STAGE=$REFERENCE_IMAGE
Member Author

To create a dev container using the CI images from this PR, including a pre-built colcon workspace, simply uncomment the lines above and change the following tagname to match the current branch before using dev container tooling to create the container.

-REFERENCE_IMAGE=ghcr.io/ros-navigation/navigation2:main-debugger
+REFERENCE_IMAGE=ghcr.io/ros-navigation/navigation2:buildkit-debugger
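
To avoid editing the tag by hand, the reference could also be derived from the branch name. The following is only a sketch: the sanitization rule (replacing characters outside `[A-Za-z0-9_.-]` with `-`) is an assumption rather than the rule the CI workflows necessarily apply, and the helper name `reference_image_for_branch` is hypothetical:

```shell
# Hypothetical helper: build the debugger image reference for a branch.
# Assumption: tags are the branch name with characters outside
# [A-Za-z0-9_.-] replaced by '-', followed by the '-debugger' suffix.
reference_image_for_branch() {
  local branch="$1"
  local tag
  tag=$(printf '%s' "$branch" | tr -c 'A-Za-z0-9_.-' '-')
  echo "ghcr.io/ros-navigation/navigation2:${tag}-debugger"
}

# Example usage:
#   REFERENCE_IMAGE=$(reference_image_for_branch "$(git rev-parse --abbrev-ref HEAD)")
#   docker pull "$REFERENCE_IMAGE"
#   export DEV_FROM_STAGE=$REFERENCE_IMAGE
```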

Member Author

For more information on getting started, check the included .devcontainer/README.md for details.

################################################################################

# Bake the target and export locally to static tag
docker buildx bake --load \
--file docker-bake.hcl \
--set $target.tags=nav2:devcontainer \
$target

# mkdir -p $HOME/nav2