-
Notifications
You must be signed in to change notification settings - Fork 42
[CI] Enable matrix test for UT #1677
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
d206dfa
to
5491aa6
Compare
rm -rf $(dirname ${CONDA_EXE})/../envs/xpu_op_${ZE_AFFINITY_MASK} | ||
conda create -n xpu_op_${ZE_AFFINITY_MASK} python=${{ inputs.python }} cmake ninja -y | ||
source activate xpu_op_${ZE_AFFINITY_MASK} | ||
conda remove --all -y -n $CONDA_ENV_NAME || \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Command on a previous line, conda clean -ay
is a big problem if you have a few runners running on the same system. This effectively means that you can damage environment you create/work in one runner from another and get all sorts of fancy random failures.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. Thank you~~
.github/workflows/_linux_ut.yml
Outdated
- name: Create unique workspace | ||
run: | | ||
# Create unique conda env for each UT test | ||
echo "CONDA_ENV_NAME=xpu_op_${ZE_AFFINITY_MASK}_${{ matrix.test.name }}" >> $GITHUB_ENV |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's not truly unique actually. Meaning that if there are 2 of these workloads running in parallel, for example for different PRs, these will start to fail. To address this you can reuse approach from #1282. It will work only with Linux though. For Windows you will need something else:
echo "CONDA_ENV_NAME=xpu_op_${ZE_AFFINITY_MASK}_${{ matrix.test.name }}" >> $GITHUB_ENV | |
random=$(head /dev/urandom | tr -dc A-Za-z0-9_ | head -c ${1:-5} | xargs) | |
echo "CONDA_ENV_NAME=xpu_op_${ZE_AFFINITY_MASK}_${{ matrix.test.name }}_${random}" >> $GITHUB_ENV |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. Thank you so much for your help~~
fail-fast: false | ||
matrix: | ||
test: | ||
- name: 'op_regression' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add runner as a config of the matrix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
timeout: 8000 | ||
additional_steps: | | ||
pip install pytest pytest-timeout | ||
- name: 'op_regression_dev1' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So that this part can dispatch to multi-cards runner type
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Dispatched to pytorch-06.
.github/workflows/_linux_ut.yml
Outdated
command_script: | | ||
export PYTORCH_TEST_WITH_SLOW=1 | ||
export PYTORCH_TESTING_DEVICE_ONLY_FOR="xpu" | ||
for xpu_case in build/bin/*{xpu,sycl}*; do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's remove this part binary tests, because we split pytorch build as a standalone job, and we don't artifact other binaries together.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
fi | ||
done | ||
test_cmd="python test/run_test.py --include " | ||
for test in $(ls test/inductor | grep test); do test_cmd="${test_cmd} inductor/$test"; done |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please consider how to further parallel when the inductor ut enabled in this part. Can we leverage shard mechanism in stock pytorch directly?
rm -rf $(dirname ${CONDA_EXE})/../envs/xpu_op_${ZE_AFFINITY_MASK} | ||
conda create -n xpu_op_${ZE_AFFINITY_MASK} python=${{ inputs.python }} cmake ninja -y | ||
source activate xpu_op_${ZE_AFFINITY_MASK} | ||
which conda |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider to use setup-python https://github.com/actions/setup-python instead of conda
cc @mengfei25
Enable matrix test method to accelerate the UT test.