Add GPU runner for linux-aarch64 #289

Merged
merged 10 commits into from
Dec 13, 2024
7 changes: 7 additions & 0 deletions .github/actions/test/action.yml
@@ -14,6 +14,13 @@ runs:
  shell: bash --noprofile --norc -xeuo pipefail {0}
  run: nvidia-smi

+ # The cache action needs this
+ - name: Install zstd
+ shell: bash --noprofile --norc -xeuo pipefail {0}
+ run: |
+ apt update
+ apt install zstd
+
  - name: Download bindings build artifacts
  uses: actions/download-artifact@v4
  with:
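A more defensive variant of the zstd step could look like the sketch below. It is not part of this PR. It assumes a Debian/Ubuntu image running as root, skips the install when `zstd` is already on `PATH`, and uses `apt-get` with `-y` so the step never waits on a confirmation prompt.

```yaml
# Sketch only; assumes a Debian/Ubuntu container running as root.
- name: Install zstd
  shell: bash --noprofile --norc -xeuo pipefail {0}
  run: |
    # Skip the install when zstd is already available.
    if ! command -v zstd >/dev/null 2>&1; then
      apt-get update
      # -y answers prompts automatically; DEBIAN_FRONTEND suppresses debconf dialogs.
      DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends zstd
    fi
```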
12 changes: 7 additions & 5 deletions .github/workflows/gh-build-and-test.yml
@@ -76,17 +76,19 @@ jobs:
  test:
  # TODO: improve the name once a separate test matrix is defined
  name: Test (CUDA ${{ inputs.cuda-version }})
- # TODO: enable testing once linux-aarch64 & win-64 GPU runners are up
+ # TODO: enable testing once win-64 GPU runners are up
  if: ${{ (github.repository_owner == 'nvidia') &&
- startsWith(inputs.host-platform, 'linux-x64') }}
+ startsWith(inputs.host-platform, 'linux') }}
  permissions:
  id-token: write # This is required for configure-aws-credentials
  contents: read # This is required for actions/checkout
- runs-on: ${{ (inputs.host-platform == 'linux-x64' && 'linux-amd64-gpu-v100-latest-1') }}
- # TODO: use a different (nvidia?) container, or just run on bare image
+ runs-on: ${{ (inputs.host-platform == 'linux-x64' && 'linux-amd64-gpu-v100-latest-1') ||
+ (inputs.host-platform == 'linux-aarch64' && 'linux-arm64-gpu-a100-latest-1') }}
+ # Our self-hosted runners require a container
+ # TODO: use a different (nvidia?) container
  container:
  options: -u root --security-opt seccomp=unconfined --privileged --shm-size 16g
- image: condaforge/miniforge3:latest
+ image: ubuntu:22.04
Comment on lines -89 to +91
Member Author

@leofang, Dec 13, 2024

setup-python is being very restrictive, as it purposely locks in to Ubuntu:

(And for some reason, the miniforge3 container can still work w/ setup-python on linux-64 but not on linux-aarch64...)

Collaborator

@jakirkham, Dec 13, 2024

We added ARM support earlier this year: conda-incubator/setup-miniconda#331

So when you have a moment, could you please raise the Miniforge GHA-specific issue?

Edit: Or perhaps this is as simple as switching to conda-incubator/setup-miniconda.
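For context, a switch to `conda-incubator/setup-miniconda` might look roughly like the sketch below. This is an illustration under assumptions, not something proposed verbatim in the thread: the Python version and environment name are hypothetical, and whether the action works inside the container these self-hosted runners require is untested here.

```yaml
# Hypothetical steps; not part of this PR.
- name: Set up Miniforge
  uses: conda-incubator/setup-miniconda@v3
  with:
    miniforge-version: latest
    python-version: "3.12"        # assumed version, for illustration only
    activate-environment: test    # hypothetical environment name

- name: Check the interpreter
  # A login shell is needed so the activated environment is picked up.
  shell: bash -el {0}
  run: python --version
```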

Member Author

If the question is "why don't you use conda to set up a Python environment?", the answer is that it's on our TODO list: #280. Contributions are more than welcome 😉

Collaborator

Just saw the issue flagged above and tried to help convert this into an upstream issue or provide a path forward

  env:
  NVIDIA_VISIBLE_DEVICES: ${{ env.NVIDIA_VISIBLE_DEVICES }}
  needs:
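The runner selection in this hunk relies on the `cond && value || ...` idiom of GitHub Actions expressions. The sketch below isolates that idiom in a minimal reusable workflow. The workflow shape (a single `host-platform` input and one step) is an assumption made for illustration; the runner labels and container image are the ones from the diff.

```yaml
# Sketch only: a stripped-down reusable workflow, not the actual file from this PR.
on:
  workflow_call:
    inputs:
      host-platform:
        type: string
        required: true

jobs:
  test:
    # Guard first: for a non-Linux host, neither branch below matches and the
    # expression evaluates to false, which is not a usable runner label.
    if: ${{ startsWith(inputs.host-platform, 'linux') }}
    # `cond && 'label'` yields the label when cond is true;
    # `||` then picks the first branch that produced a label.
    runs-on: ${{ (inputs.host-platform == 'linux-x64' && 'linux-amd64-gpu-v100-latest-1') ||
                 (inputs.host-platform == 'linux-aarch64' && 'linux-arm64-gpu-a100-latest-1') }}
    container:
      image: ubuntu:22.04
    steps:
      - run: nvidia-smi
```

The `if:` guard and the `runs-on:` expression need to stay in sync: any platform the guard lets through should have a matching branch in the expression.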