Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better way to run AMD CI with different flavors (#26634) #29

Merged
merged 1 commit into from
Nov 1, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
25 changes: 25 additions & 0 deletions .github/workflows/self-push-amd-mi210-caller.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,25 @@
name: Self-hosted runner (AMD mi210 CI caller)

on:
workflow_run:
workflows: ["Self-hosted runner (push-caller)"]
branches: ["main"]
types: [completed]
push:
branches:
- run_amd_push_ci_caller*
paths:
- "src/**"
- "tests/**"
- ".github/**"
- "templates/**"
- "utils/**"

jobs:
run_amd_ci:
name: AMD mi210
if: (cancelled() != true) && ((github.event_name != 'schedule') || ((github.event_name == 'push') && startsWith(github.ref_name, 'run_amd_push_ci_caller')))
uses: ./.github/workflows/self-push-amd.yml
with:
gpu_flavor: mi210
secrets: inherit
25 changes: 25 additions & 0 deletions .github/workflows/self-push-amd-mi250-caller.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,25 @@
name: Self-hosted runner (AMD mi250 CI caller)

on:
workflow_run:
workflows: ["Self-hosted runner (push-caller)"]
branches: ["main"]
types: [completed]
push:
branches:
- run_amd_push_ci_caller*
paths:
- "src/**"
- "tests/**"
- ".github/**"
- "templates/**"
- "utils/**"

jobs:
run_amd_ci:
name: AMD mi250
if: (cancelled() != true) && ((github.event_name != 'schedule') || ((github.event_name == 'push') && startsWith(github.ref_name, 'run_amd_push_ci_caller')))
uses: ./.github/workflows/self-push-amd.yml
with:
gpu_flavor: mi250
secrets: inherit
31 changes: 9 additions & 22 deletions .github/workflows/self-push-amd.yml
Original file line number Diff line number Diff line change
@@ -1,21 +1,11 @@
name: Self-hosted runner AMD GPU (push)

on:
workflow_run:
workflows: ["Self-hosted runner (push-caller)"]
branches: ["main"]
types: [completed]
push:
branches:
- ci_*
- ci-*
paths:
- "src/**"
- "tests/**"
- ".github/**"
- "templates/**"
- "utils/**"
repository_dispatch:
workflow_call:
inputs:
gpu_flavor:
required: true
type: string

env:
HF_HOME: /mnt/cache
Expand Down Expand Up @@ -45,8 +35,7 @@ jobs:
strategy:
matrix:
machine_type: [single-gpu, multi-gpu]
gpu_flavor: [mi210]
runs-on: [self-hosted, docker-gpu, amd-gpu, '${{ matrix.machine_type }}', '${{ matrix.gpu_flavor }}']
runs-on: [self-hosted, docker-gpu, amd-gpu, '${{ matrix.machine_type }}', '${{ inputs.gpu_flavor }}']
container:
image: huggingface/transformers-pytorch-amd-gpu-push-ci # <--- We test only for PyTorch for now
options: --device /dev/kfd --device /dev/dri --env HIP_VISIBLE_DEVICES --env ROCR_VISIBLE_DEVICES --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
Expand All @@ -65,8 +54,7 @@ jobs:
strategy:
matrix:
machine_type: [single-gpu, multi-gpu]
gpu_flavor: [mi210]
runs-on: [self-hosted, docker-gpu, amd-gpu, '${{ matrix.machine_type }}', '${{ matrix.gpu_flavor }}']
runs-on: [self-hosted, docker-gpu, amd-gpu, '${{ matrix.machine_type }}', '${{ inputs.gpu_flavor }}']
container:
image: huggingface/transformers-pytorch-amd-gpu-push-ci # <--- We test only for PyTorch for now
options: --device /dev/kfd --device /dev/dri --env HIP_VISIBLE_DEVICES --env ROCR_VISIBLE_DEVICES --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
Expand Down Expand Up @@ -164,8 +152,7 @@ jobs:
matrix:
folders: ${{ fromJson(needs.setup_gpu.outputs.matrix) }}
machine_type: [single-gpu, multi-gpu]
gpu_flavor: [mi210]
runs-on: [self-hosted, docker-gpu, amd-gpu, '${{ matrix.machine_type }}', '${{ matrix.gpu_flavor }}']
runs-on: [self-hosted, docker-gpu, amd-gpu, '${{ matrix.machine_type }}', '${{ inputs.gpu_flavor }}']
container:
image: huggingface/transformers-pytorch-amd-gpu-push-ci # <--- We test only for PyTorch for now
options: --device /dev/kfd --device /dev/dri --env HIP_VISIBLE_DEVICES --env ROCR_VISIBLE_DEVICES --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
Expand Down Expand Up @@ -321,7 +308,7 @@ jobs:
CI_SLACK_CHANNEL_DUMMY_TESTS: ${{ secrets.CI_SLACK_CHANNEL_DUMMY_TESTS }}
CI_SLACK_REPORT_CHANNEL_ID: ${{ secrets.CI_SLACK_CHANNEL_ID_AMD }}
ACCESS_REPO_INFO_TOKEN: ${{ secrets.ACCESS_REPO_INFO_TOKEN }}
CI_EVENT: push
CI_EVENT: Push CI (AMD) - ${{ inputs.gpu_flavor }}
CI_TITLE_PUSH: ${{ github.event.head_commit.message }}
CI_TITLE_WORKFLOW_RUN: ${{ github.event.workflow_run.head_commit.message }}
CI_SHA: ${{ env.CI_SHA }}
Expand Down
7 changes: 5 additions & 2 deletions utils/notification_service.py
Original file line number Diff line number Diff line change
Expand Up @@ -897,6 +897,9 @@ def prepare_reports(title, header, reports, to_truncate=True):
job_name_prefix = f"{framework} {version}"
elif ci_event.startswith("Nightly CI"):
job_name_prefix = "Nightly CI"
elif ci_event.startswith("Push CI (AMD) - "):
flavor = ci_event.replace("Push CI (AMD) - ", "")
job_name_prefix = f"AMD {flavor}"

for model in model_results.keys():
for artifact_path in available_artifacts[f"run_all_tests_gpu_{model}_test_reports"].paths:
Expand Down Expand Up @@ -962,7 +965,7 @@ def prepare_reports(title, header, reports, to_truncate=True):
"Torch CUDA extension tests": "run_tests_torch_cuda_extensions_gpu_test_reports",
}

if ci_event in ["push", "Nightly CI"] or ci_event.startswith("Past CI"):
if ci_event in ["push", "Nightly CI"] or ci_event.startswith("Past CI") or ci_event.startswith("Push CI (AMD)"):
del additional_files["Examples directory"]
del additional_files["PyTorch pipelines"]
del additional_files["TensorFlow pipelines"]
Expand Down Expand Up @@ -1027,6 +1030,6 @@ def prepare_reports(title, header, reports, to_truncate=True):
message = Message(title, ci_title, model_results, additional_results, selected_warnings=selected_warnings)

# send report only if there is any failure (for push CI)
if message.n_failures or ci_event != "push":
if message.n_failures or (ci_event != "push" and not ci_event.startswith("Push CI (AMD)")):
message.post()
message.post_reply()
Loading