[Nightly] Enhance XPU test workflows #1723

Draft: wants to merge 44 commits into base: main

Commits
c4a5df7
remove activate oneapi in microbenchmark to use pypi packages
Jun 3, 2025
1649d91
fix windows nightly issue
Jun 4, 2025
6e1eaa0
ut timeout
Jun 4, 2025
e88956d
deps installation
Jun 4, 2025
4653aa8
Merge branch 'main' into mengfeil/enhance-test-workflow
mengfei25 Jun 6, 2025
0bee6bf
modify tst env
Jun 6, 2025
9c62e16
modify
Jun 8, 2025
7148fb4
update
Jun 8, 2025
5e36331
test dev1 on e2e node
Jun 9, 2025
b2cf7dc
split env to common script
Jun 9, 2025
0f83a45
update
Jun 9, 2025
4df0df6
update
Jun 9, 2025
9526c42
Cleanup workspace
Jun 9, 2025
145314f
update
Jun 9, 2025
1fb7cf4
update
Jun 9, 2025
011d1f4
clean /tmp, ~/.cache and ~/.triton
Jun 9, 2025
855b7e9
clean /tmp, ~/.cache and ~/.triton
Jun 10, 2025
14bb034
update
Jun 10, 2025
f2ded38
update
Jun 10, 2025
8057d64
modify
Jun 11, 2025
563b597
modify
Jun 11, 2025
e823753
Merge branch 'main' into mengfeil/enhance-test-workflow
mengfei25 Jun 11, 2025
a6059fc
conda init
Jun 12, 2025
ec54a2d
update
Jun 12, 2025
0b7b890
skip SC1090
Jun 12, 2025
21dc4ee
remove update_lkg
Jun 12, 2025
2fd4de4
deps
Jun 12, 2025
8df670c
deps
Jun 12, 2025
27964a1
update
Jun 12, 2025
dac7180
update
Jun 12, 2025
ac69582
deps
Jun 12, 2025
bee9e44
update
Jun 12, 2025
dddf97e
update
Jun 12, 2025
c96b98e
pytest deps
Jun 13, 2025
d8dfd22
PR
Jun 13, 2025
c2c5193
update
Jun 13, 2025
e6a54bd
tmp permission
Jun 13, 2025
693d0ba
Merge branch 'main' into mengfeil/enhance-test-workflow
mengfei25 Jun 17, 2025
cc16f78
update
Jun 17, 2025
fa26db5
remove clean workspace
Jun 18, 2025
9398857
Merge branch 'main' into mengfeil/enhance-test-workflow
mengfei25 Jun 18, 2025
60da1e9
Clean diskspace
Jun 18, 2025
6cdb741
update
Jun 18, 2025
40bc0e5
Merge branch 'main' into mengfeil/enhance-test-workflow
mengfei25 Jun 19, 2025
23 changes: 23 additions & 0 deletions .github/actions/diskspace-clean/action.yml
@@ -0,0 +1,23 @@
name: Cleans up diskspace

description: Cleans up diskspace

runs:
  using: composite
  steps:
    - name: Cleans up diskspace
      shell: bash
      run: |
        set -xe -o pipefail
        # Clean workspace
        rm -rf ${{ github.workspace }}/* || sudo rm -rf ${{ github.workspace }}/*
        # Clean cache
        rm -rf /tmp/ || sudo rm -rf /tmp/
        mkdir -m 777 /tmp || sudo mkdir -m 777 /tmp
        rm -rf ~/.torch || sudo rm -rf ~/.torch
        rm -rf ~/.triton || sudo rm -rf ~/.triton
        rm -rf ~/.cache || sudo rm -rf ~/.cache
        rm -rf ~/.conda || sudo rm -rf ~/.conda
        # Clean docker
        docker stop $(docker ps -aq) || true
        docker system prune -af
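
The cleanup step repeats the same "try without sudo, then retry with sudo" pattern for every path. A minimal sketch of how that pattern could be factored into one helper follows. The function name `remove_path` is hypothetical and not part of the PR, and the demo only touches a throwaway temp directory so that running it deletes nothing real.

```shell
#!/bin/bash
# Hypothetical helper (not in the PR): try a plain removal first,
# then retry with sudo only if the unprivileged attempt fails.
remove_path() {
    local target
    for target in "$@"; do
        rm -rf -- "$target" 2>/dev/null || sudo rm -rf -- "$target"
    done
}

# Demo on a throwaway directory, so nothing real is deleted.
scratch="$(mktemp -d)"
touch "$scratch/stale.log"
remove_path "$scratch"
if [ -e "$scratch" ]; then
    echo "still present: $scratch"
else
    echo "removed: $scratch"
fi
```

With such a helper, the cache cleanup could shrink to a single call listing all paths, at the cost of one more function to maintain.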
13 changes: 0 additions & 13 deletions .github/actions/inductor-xpu-e2e-test/action.yml
@@ -52,14 +52,6 @@ runs:
run: |
source activate e2e_ci
if [[ ${{ inputs.suite }} == *"torchbench"* ]]; then
- if [ "${{ inputs.pytorch }}" != "nightly_wheel" ]; then
- cd ../ && rm -rf audio && git clone --single-branch -b main https://github.com/pytorch/audio.git
- cd audio && git checkout $TORCHAUDIO_COMMIT_ID
- python setup.py bdist_wheel && pip uninstall torchaudio -y && pip install dist/*.whl
- cd ../ && rm -rf vision && git clone --single-branch -b main https://github.com/pytorch/vision.git
- cd vision && git checkout $TORCHVISION_COMMIT_ID
- python setup.py bdist_wheel && pip uninstall torchvision -y && pip install dist/*.whl
- fi
cd ../ && python -c "import torch, torchvision, torchaudio"
rm -rf benchmark && git clone https://github.com/pytorch/benchmark.git
cd benchmark && git checkout $TORCHBENCH_COMMIT_ID
@@ -80,11 +72,6 @@ runs:
pip install --force-reinstall git+https://github.com/huggingface/transformers@${TRANSFORMERS_VERSION}
fi
if [[ ${{ inputs.suite }} == *"timm_models"* ]]; then
- if [ "${{ inputs.pytorch }}" != "nightly_wheel" ]; then
- cd ../ && rm -rf vision && git clone --single-branch -b main https://github.com/pytorch/vision.git
- cd vision && git checkout $TORCHVISION_COMMIT_ID
- python setup.py bdist_wheel && pip uninstall torchvision -y && pip install dist/*.whl
- fi
# install timm without dependencies
pip install --no-deps git+https://github.com/huggingface/pytorch-image-models@$TIMM_COMMIT_ID
# install timm dependencies without torch and torchvision
57 changes: 57 additions & 0 deletions .github/actions/testenv-setup/action.yml
@@ -0,0 +1,57 @@
name: Set up test environment

description: Sets up the test environment

inputs:
  suite:
    required: true
    type: string
    default: 'huggingface'
    description: Dynamo benchmarks test suite. huggingface,timm_models,torchbench. Delimiter is comma
  env_prepare:
    required: false
    description: If set to any value, will prepare suite test env
  dt:
    required: true
    type: string
    default: 'float32'
    description: Data precision of the test. float32,bfloat16,float16,amp_bf16,amp_fp16. Delimiter is comma
  mode:
    required: true
    type: string
    default: 'inference'
    description: inference,training. Delimiter is comma
  scenario:
    required: true
    type: string
    default: 'accuracy'
    description: accuracy,performance. Delimiter is comma
  cards:
    required: false
    type: string
    default: 'all'
    description: Which cards can be used in the test
  hf_token:
    required: false
    description: HUGGING_FACE_HUB_TOKEN for torchbench test
  pytorch:
    required: false
    type: string
    default: 'main'
    description: PyTorch branch/commit
  driver:
    required: false
    type: string
    default: 'lts'
    description: Driver lts/rolling

runs:
  using: composite
  defaults:
    run:
      shell: bash -xe -o pipefail
  steps:
    - name: Prepare PyTorch
      shell: bash
      run: |
        echo
15 changes: 14 additions & 1 deletion .github/scripts/build.sh
@@ -44,7 +44,7 @@ cp -r ${WORKSPACE}/torch-xpu-ops third_party/torch-xpu-ops

# Pre Build
cd ${WORKSPACE}/pytorch
- python -m pip install requests
+ python -m pip install requests cmake ninja
python third_party/torch-xpu-ops/.github/scripts/apply_torch_pr.py
git submodule sync && git submodule update --init --recursive
python -m pip install -r requirements.txt
@@ -85,6 +85,17 @@ rm -rf ./tmp
bash third_party/torch-xpu-ops/.github/scripts/rpath.sh ${WORKSPACE}/pytorch/dist/torch*.whl
python -m pip install --force-reinstall tmp/torch*.whl

+ # Build torchvision torchaudio

Review comment (Contributor):

Please use an environment variable or a parameter to control this behavior. It should be off by default and enabled only when the variable or parameter is set to on.

The same applies to the triton build. For the pinned triton commit, we can run make triton directly under the pytorch root dir. For a customized triton commit, we can either build it ourselves or reuse the script at https://github.com/pytorch/pytorch/blob/main/.github/scripts/build_triton_wheel.py (see the usage at https://github.com/chuanqi129/pytorch/blob/fix_triton_version_split/.github/workflows/build-triton-wheel.yml#L158-L160). Before this step, we need to replace the content of the pinned triton xpu commit file with the customized commit.

cc: @RUIJIEZHONG66166
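
The opt-in switch requested in this comment could look roughly like the sketch below. The variable name `BUILD_VISION_AUDIO` and the helper `is_enabled` are assumptions made for illustration; the PR does not define either. Any value other than an explicit "on"-like setting leaves the build disabled, which gives the off-by-default behavior the reviewer asks for.

```shell
#!/bin/bash
# BUILD_VISION_AUDIO is an assumed variable name, not defined by the PR.
# Anything other than an explicit "on"-like value leaves the build disabled.
is_enabled() {
    case "${1,,}" in
        on|1|true|yes) return 0 ;;
        *) return 1 ;;
    esac
}

if is_enabled "${BUILD_VISION_AUDIO:-off}"; then
    echo "building torchvision and torchaudio"
else
    echo "skipping torchvision and torchaudio build"
fi
```

The same guard could wrap a triton build step, using a second variable.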

+ unset PYTORCH_VERSION
+ TORCHVISION_COMMIT=$(cat .github/ci_commit_pins/vision.txt)
+ rm -rf xpu-vision && git clone https://github.com/pytorch/vision.git xpu-vision
+ cd xpu-vision && git checkout ${TORCHVISION_COMMIT}
+ python setup.py bdist_wheel && cd ..
+ TORCHAUDIO_COMMIT=$(cat .github/ci_commit_pins/audio.txt)
+ rm -rf xpu-audio && git clone https://github.com/pytorch/audio.git xpu-audio
+ cd xpu-audio && git checkout ${TORCHAUDIO_COMMIT}
+ python setup.py bdist_wheel && cd ..

# Verify
cd ${WORKSPACE}
python ${WORKSPACE}/pytorch/torch/utils/collect_env.py
@@ -95,6 +106,8 @@ xpu_is_compiled="$(python -c 'import torch; print(torch.xpu._is_compiled())')"
# Save wheel
if [ "${xpu_is_compiled,,}" == "true" ];then
cp ${WORKSPACE}/pytorch/tmp/torch*.whl ${WORKSPACE}
+ cp ${WORKSPACE}/pytorch/xpu-vision/dist/torchvision*.whl ${WORKSPACE}
+ cp ${WORKSPACE}/pytorch/xpu-audio/dist/torchaudio*.whl ${WORKSPACE}
else
echo "Build got failed!"
exit 1
2 changes: 1 addition & 1 deletion .github/scripts/lintrunner.sh
@@ -24,7 +24,7 @@ if ! command -v lintrunner &> /dev/null; then
fi

# Ignoring errors in one specific run
- export SHELLCHECK_OPTS="-e SC2154 -e SC2086 -e SC1091 -e SC2046"
+ export SHELLCHECK_OPTS="-e SC2154 -e SC2086 -e SC1091 -e SC2046 -e SC1090"

# This has already been cached in the docker image
lintrunner init 2> /dev/null
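
SC1090 is the ShellCheck warning raised when a script sources a file whose path is not a constant. Excluding it through `SHELLCHECK_OPTS` silences it for every script in the run. A narrower alternative, sketched below, is a per-line ShellCheck directive placed directly above the dynamic `source`. The temp file and the `DEMO_LOADED` variable are illustrative stand-ins, not code from the PR.

```shell
#!/bin/bash
# Stand-in for a script whose path is only known at runtime
# (the temp file and DEMO_LOADED are illustrative, not from the PR).
runtime_script="$(mktemp)"
echo 'DEMO_LOADED=yes' > "$runtime_script"

# The directive below suppresses SC1090 for the next command only.
# shellcheck disable=SC1090
. "$runtime_script"

echo "DEMO_LOADED=${DEMO_LOADED}"
rm -f "$runtime_script"
```

Whether the global exclusion or the inline directive is preferable depends on how many dynamic `source` calls the repository has; this is a trade-off, not a recommendation from the PR.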
87 changes: 87 additions & 0 deletions .github/scripts/setup_test_env.sh
@@ -0,0 +1,87 @@
#!/bin/bash

set -xe -o pipefail

Review comment (Contributor):

I feel this script is somewhat over-designed. I have added some suggestions.
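
One candidate for simplification is the argument loop in this script, which rewrites each argument with `sed` and passes the result to `eval`. A minimal sketch of an eval-free alternative follows, assuming arguments arrive as `--KEY=VALUE` or `KEY=VALUE`. The function name `parse_args` is hypothetical, and the sample values are for illustration only.

```shell
#!/bin/bash
# Hypothetical replacement for the eval-based loop (not part of the PR).
parse_args() {
    local arg
    for arg in "$@"; do
        # Accept KEY=VALUE with any number of leading dashes.
        if [[ "$arg" =~ ^-*([A-Za-z_][A-Za-z0-9_]*)=(.*)$ ]]; then
            export "${BASH_REMATCH[1]}=${BASH_REMATCH[2]}"
        else
            echo "ignoring unrecognized argument: $arg" >&2
        fi
    done
}

# Sample values for illustration only.
parse_args --PYTHON_VERSION=3.11 "--CONDA_ENV=my env"
echo "PYTHON_VERSION=${PYTHON_VERSION} CONDA_ENV=${CONDA_ENV}"
```

Because the value is never re-parsed by the shell, spaces and quotes inside it are passed through unchanged.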

export GIT_PAGER=cat

# Init params
WORKSPACE=$(realpath ${WORKSPACE:-"/tmp"})
CONDA_ENV=${CONDA_ENV:-"xpu-test"}
PYTHON_VERSION=${PYTHON_VERSION:-"3.10"}
PYTORCH_REPO=${PYTORCH_REPO:-"https://github.com/pytorch/pytorch.git"}
PYTORCH_VERSION=${PYTORCH_VERSION:-"main"}
for var; do
eval "export $(echo ${var@Q} |sed "s/^'-*//g;s/=/='/")"
done

# Python env via conda
. "$(conda info -e |awk '{if($1=="base"){printf("%s/etc/profile.d/conda.sh", $NF)}}')"
conda create python=${PYTHON_VERSION} -y -n ${CONDA_ENV}
conda activate ${CONDA_ENV}
conda info -e
which python && python -V && conda list
python -m pip install requests pandas scipy psutil

# Prepare pytorch
if [ "${PYTORCH_VERSION}" == "release" ];then
python -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/xpu
elif [ "${PYTORCH_VERSION}" == "test" ];then
python -m pip install torch torchvision torchaudio --pre --index-url https://download.pytorch.org/whl/test/xpu
elif [ "${PYTORCH_VERSION}" == "nightly" ];then
python -m pip install torch torchvision torchaudio --pre --index-url https://download.pytorch.org/whl/nightly/xpu
else
python -m pip install ${WORKSPACE}/torch*.whl

Review comment (Contributor):

Will this step also install torchvision and torchaudio?

Reply (Contributor Author):

Yes
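
The answer follows from the glob itself: `torch*.whl` matches every wheel whose file name starts with `torch`, which includes the `torchvision` and `torchaudio` wheels that build.sh copies into `${WORKSPACE}`. A small sketch with made-up file names (the version strings are placeholders, not real build outputs):

```shell
#!/bin/bash
# Placeholder wheel names in a throwaway directory; only the prefixes matter.
wheel_dir="$(mktemp -d)"
touch "$wheel_dir/torch-0.0.0-demo.whl" \
      "$wheel_dir/torchvision-0.0.0-demo.whl" \
      "$wheel_dir/torchaudio-0.0.0-demo.whl"

# Same pattern as the install line: every name starting with "torch" matches.
matched=("$wheel_dir"/torch*.whl)
printf '%s\n' "${matched[@]##*/}"
rm -rf "$wheel_dir"
```

The flip side is that any other `torch`-prefixed wheel left in the workspace would be installed as well.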

fi
TORCH_COMMIT="$(python -c 'import torch; print(torch.version.git_version)')"
rm -rf ./pytorch
git clone ${PYTORCH_REPO} pytorch
cd pytorch
git checkout ${TORCH_COMMIT}
git remote -v && git branch && git show -s

# Prepare torch-xpu-ops
rm -rf third_party/torch-xpu-ops
if [ "${PYTORCH_VERSION}" != "main" ];then
TORCH_XPU_OPS_COMMIT=$(cat third_party/xpu.txt)
git clone https://github.com/intel/torch-xpu-ops.git third_party/torch-xpu-ops
cd third_party/torch-xpu-ops
git checkout ${TORCH_XPU_OPS_COMMIT}
else
cp -r ${WORKSPACE} third_party/torch-xpu-ops
cd third_party/torch-xpu-ops
fi
git remote -v && git branch && git show -s
cd ../..
if [ "${GITHUB_EVENT_NAME}" == "pull_request" ];then
python third_party/torch-xpu-ops/.github/scripts/apply_torch_pr.py -e https://github.com/pytorch/pytorch/pull/152940
else
python third_party/torch-xpu-ops/.github/scripts/apply_torch_pr.py
fi

# Install triton
if [ "${TRITON_VERSION}" == "pinned" ];then
TRITON_VERSION="$(cat .ci/docker/ci_commit_pins/triton-xpu.txt)"
fi
if [ -n "${TRITON_VERSION}" ];then
TRITON_REPO="https://github.com/intel/intel-xpu-backend-for-triton"
python -m pip uninstall -y pytorch-triton-xpu
python -m pip install "git+${TRITON_REPO}@${TRITON_VERSION}#subdirectory=python"
fi

# Install requirements
python -m pip install -r .ci/docker/requirements-ci.txt
python -m pip install -U pytest pytest-timeout

# Collect env infos
cd ..
python -c "import torch; print(torch.__config__.show())"
python -c "import torch; print(torch.__config__.parallel_info())"
python -c "import torch; print(torch.xpu.device_count())"
python -c "import triton; print(triton.__version__)"
python pytorch/torch/utils/collect_env.py

# Clean cache
rm -rf /tmp/ || sudo rm -rf /tmp/
mkdir -m 777 /tmp || sudo mkdir -m 777 /tmp
rm -rf ~/.torch || sudo rm -rf ~/.torch
rm -rf ~/.triton || sudo rm -rf ~/.triton
rm -rf ~/.cache || sudo rm -rf ~/.cache
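
The PyTorch installation branch in this script maps three channel names to wheel index URLs and treats every other value as "install the locally built wheel". That mapping can be sketched as a lookup function. The name `pytorch_index_url` is hypothetical; the URLs are the ones used in the script.

```shell
#!/bin/bash
# Hypothetical helper: maps a PYTORCH_VERSION channel to its wheel index URL.
pytorch_index_url() {
    case "$1" in
        release) echo "https://download.pytorch.org/whl/xpu" ;;
        test)    echo "https://download.pytorch.org/whl/test/xpu" ;;
        nightly) echo "https://download.pytorch.org/whl/nightly/xpu" ;;
        *)       return 1 ;;  # not a channel: caller falls back to local wheels
    esac
}

for channel in release test nightly main; do
    if url="$(pytorch_index_url "$channel")"; then
        echo "$channel -> $url"
    else
        echo "$channel -> local wheel from \${WORKSPACE}"
    fi
done
```

In the script, the `test` and `nightly` channels additionally pass `--pre` to pip, which a caller of this helper would still need to add.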
5 changes: 0 additions & 5 deletions .github/workflows/_linux_build.yml
@@ -28,11 +28,6 @@ on:
type: string
default: 'linux.idc.xpu'
description: Runner label
- update_lkg:
- required: false
- type: string
- default: 'false'
- description: Whether update LKG torch version to issue #1280
outputs:
torch_commit_id:
description: The commit id of the torch build