Skip to content

[CI] Enable matrix test for UT #1677

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 59 commits into
base: main
Choose a base branch
from
Open

Conversation

RUIJIEZHONG66166
Copy link
Contributor

Enable matrix test method to accelerate the UT test.

@RUIJIEZHONG66166 RUIJIEZHONG66166 force-pushed the ruijie/enable_ut_matrix branch from d206dfa to 5491aa6 Compare May 16, 2025 08:45
rm -rf $(dirname ${CONDA_EXE})/../envs/xpu_op_${ZE_AFFINITY_MASK}
conda create -n xpu_op_${ZE_AFFINITY_MASK} python=${{ inputs.python }} cmake ninja -y
source activate xpu_op_${ZE_AFFINITY_MASK}
conda remove --all -y -n $CONDA_ENV_NAME || \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Command on a previous line, conda clean -ay is a big problem if you have a few runners running on the same system. This effectively means that you can damage environment you create/work in one runner from another and get all sorts of fancy random failures.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Thank you~~

- name: Create unique workspace
run: |
# Create unique conda env for each UT test
echo "CONDA_ENV_NAME=xpu_op_${ZE_AFFINITY_MASK}_${{ matrix.test.name }}" >> $GITHUB_ENV
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not truly unique actually. Meaning that if there are 2 of these workloads running in parallel, for example for different PRs, these will start to fail. To address this you can reuse approach from #1282. It will work only with Linux though. For Windows you will need something else:

Suggested change
echo "CONDA_ENV_NAME=xpu_op_${ZE_AFFINITY_MASK}_${{ matrix.test.name }}" >> $GITHUB_ENV
random=$(head /dev/urandom | tr -dc A-Za-z0-9_ | head -c ${1:-5} | xargs)
echo "CONDA_ENV_NAME=xpu_op_${ZE_AFFINITY_MASK}_${{ matrix.test.name }}_${random}" >> $GITHUB_ENV

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Thank you so much for your help~~

fail-fast: false
matrix:
test:
- name: 'op_regression'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add runner as a config of the matrix

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

timeout: 8000
additional_steps: |
pip install pytest pytest-timeout
- name: 'op_regression_dev1'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So that this part can dispatch to multi-cards runner type

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Dispatched to pytorch-06.

command_script: |
export PYTORCH_TEST_WITH_SLOW=1
export PYTORCH_TESTING_DEVICE_ONLY_FOR="xpu"
for xpu_case in build/bin/*{xpu,sycl}*; do
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's remove this part binary tests, because we split pytorch build as a standalone job, and we don't artifact other binaries together.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

fi
done
test_cmd="python test/run_test.py --include "
for test in $(ls test/inductor | grep test); do test_cmd="${test_cmd} inductor/$test"; done
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please consider how to further parallel when the inductor ut enabled in this part. Can we leverage shard mechanism in stock pytorch directly?

rm -rf $(dirname ${CONDA_EXE})/../envs/xpu_op_${ZE_AFFINITY_MASK}
conda create -n xpu_op_${ZE_AFFINITY_MASK} python=${{ inputs.python }} cmake ninja -y
source activate xpu_op_${ZE_AFFINITY_MASK}
which conda
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider to use setup-python https://github.com/actions/setup-python instead of conda
cc @mengfei25

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants