Add a workflow to build torchtitan-ubuntu-20.04-clang12 Docker image for CI #338

Merged: 6 commits, May 16, 2024

@huydhn (Contributor) commented May 16, 2024

Adopted from PyTorch, this workflow prepares the Docker image `torchtitan-ubuntu-20.04-clang12` for CI.

`torchtitan-ubuntu-20.04-clang12` can then be used as the input for `docker-image`.

@huydhn huydhn requested a review from wconstab May 16, 2024 02:03
@facebook-github-bot added the `CLA Signed` label May 16, 2024
@wconstab (Contributor) commented:

Looking at the CI results from this PR, it spent 6 minutes in the 'calculate docker image' step, and the logs show it building the image. I guess that's automated so that a new image is built whenever a change in the build scripts is detected (e.g., if requirements.txt gets updated), and otherwise it hits the cache and skips the build?

@wconstab (Contributor) commented:

Re-running the job now to observe how it behaves on the second run.

@wconstab (Contributor) commented:

On rerun, I see the 'docker build' step drop from 6 min to 1 sec, so that's great!

But the docker pull step is taking over 3 min; yesterday I was seeing roughly 1m30s for the PyTorch base image you sent me. I wonder if this is related to the Docker cache: should we expect the pull step to get faster once the same runner is used a second time and its cache is warm? If so, I think this will be alright.

@huydhn (Contributor, Author) commented May 16, 2024

> Looking at the CI results from this PR, it looks like it spent 6 minutes in 'calculate docker image' step. If I look inside there it looks like it's building the image. I guess that's automated such that it would build a new image automatically if it detected a change in the build scripts (e.g. if requirements.txt got updated), but then it hits the cache and skips the build time normally?

Yup, you're right.

  • The step builds a new image only when needed; otherwise, it downloads the matching existing image. Let me retry the job now, and the cached image should be used instead (you've already done that).
  • The Docker image download time also varies with the runner's local Docker cache, so it's a bit hard to say. I pulled up a random CUDA job on PyTorch that uses pytorch-linux-focal-cuda12.1-cudnn8-py3-gcc9 (https://github.com/pytorch/pytorch/actions/runs/9067872180/job/24927331103?pr=125921#step:8:1), and it shows 4m49s. The image here is smaller, so I expect it to download faster in the long run.
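The build-or-pull behavior described above can be sketched in a few lines of Python. This is a hypothetical illustration, assuming the image tag is a content hash of the `.ci/docker` directory and that the set of already-published tags is known; `docker_dir_hash` and `plan_image` are made-up names, and the real workflow may derive its tag differently.

```python
import hashlib
import tempfile
from pathlib import Path


def docker_dir_hash(docker_dir: Path) -> str:
    """Content hash of every file under docker_dir (hypothetical tagging scheme)."""
    digest = hashlib.sha256()
    # Sort so the hash does not depend on filesystem iteration order.
    for path in sorted(p for p in docker_dir.rglob("*") if p.is_file()):
        name = path.relative_to(docker_dir).as_posix().encode()
        data = path.read_bytes()
        # Length-prefix the relative path and the contents so renames and edits both change the hash.
        digest.update(len(name).to_bytes(8, "big"))
        digest.update(name)
        digest.update(len(data).to_bytes(8, "big"))
        digest.update(data)
    return digest.hexdigest()[:40]


def plan_image(image_name: str, docker_dir: Path, existing_tags: set[str]) -> tuple[str, str]:
    """Return the image reference and whether CI should 'pull' it or 'build' it."""
    tag = docker_dir_hash(docker_dir)
    action = "pull" if tag in existing_tags else "build"
    return f"{image_name}:{tag}", action


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        docker_dir = Path(d)
        (docker_dir / "requirements.txt").write_text("torch\n")
        ref, action = plan_image("torchtitan-ubuntu-20.04-clang12", docker_dir, set())
        print(action)  # no image with this tag exists yet, so: build
        tag = ref.split(":", 1)[1]
        print(plan_image("torchtitan-ubuntu-20.04-clang12", docker_dir, {tag})[1])  # pull
```

Because the tag changes whenever any file under `.ci/docker` changes, editing `requirements.txt` triggers a rebuild, while unrelated PRs reuse the existing image and only pay the pull cost discussed in this thread.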

@wconstab wconstab merged commit 847189d into main May 16, 2024
5 checks passed
@wconstab wconstab deleted the add-docker-builds-workflow branch May 16, 2024 20:58
tianyu-l pushed a commit that referenced this pull request May 28, 2024
…for CI (#338)

Adopted from PyTorch, this workflow prepares the Docker image
`torchtitan-ubuntu-20.04-clang12` for CI.

* Based on
https://hub.docker.com/layers/nvidia/cuda/12.1.0-cudnn8-runtime-ubuntu20.04/images/sha256-35d5a8eb50ad37fe707a7611a4e20414c5bd2f168adca0cf1700fe2d58411759
to include NVIDIA dependencies.
* Install `dev-requirements.txt` and `requirements.txt`. I had to move
these files from the top level to the `.ci/docker` directory and create
symlinks to them at the top level, because the Docker build process only
sees `.ci/docker` (the build context). This is also why PyTorch keeps its
CI requirements files there.
* Install clang or gcc
* Install conda (with Python 3.11)

`torchtitan-ubuntu-20.04-clang12` can then be used as the input for
`docker-image`.
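The requirements layout described in this commit message can be checked with a small script. Docker can only read files inside the build context, so the real files must live under `.ci/docker`. This is a hypothetical helper (`misplaced_requirements` is not part of torchtitan), assuming the symlinks sit at the repository root and point into `.ci/docker`; it only inspects the filesystem and does not invoke Docker. It needs Python 3.9+ for `Path.is_relative_to`.

```python
import os
import tempfile
from pathlib import Path


def misplaced_requirements(repo_root: Path, context_dir: Path, names: list[str]) -> list[str]:
    """Return names whose top-level entry is not a symlink to a file inside context_dir."""
    context = context_dir.resolve()
    bad = []
    for name in names:
        entry = repo_root / name
        # A regular file at the top level lies outside the `.ci/docker` build context.
        if not entry.is_symlink():
            bad.append(name)
            continue
        target = entry.resolve()
        if not (target.is_file() and target.is_relative_to(context)):
            bad.append(name)
    return bad


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        root = Path(d)
        ctx = root / ".ci" / "docker"
        ctx.mkdir(parents=True)
        for name in ("requirements.txt", "dev-requirements.txt"):
            (ctx / name).write_text("")
            # Relative link target, resolved from the repository root.
            os.symlink(Path(".ci/docker") / name, root / name)
        print(misplaced_requirements(root, ctx, ["requirements.txt", "dev-requirements.txt"]))
```

An empty list means every listed requirements file is a top-level symlink into the build context, which is the layout the commit describes.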
tianyu-l pushed a commit to tianyu-l/torchtitan_intern24 that referenced this pull request Aug 16, 2024
tianyu-l pushed a commit that referenced this pull request Aug 16, 2024
philippguevorguian pushed a commit to YerevaNN/YNNtitan that referenced this pull request Aug 17, 2024
Labels: CLA Signed

3 participants