[CI] Enable matrix test for UT #1677

RUIJIEZHONG66166 · 2025-05-16T08:44:51Z

Enable a matrix test strategy to accelerate the UT runs.

dvrogozh · 2025-05-21T14:24:43Z

.github/workflows/_linux_ut.yml

-                rm -rf $(dirname ${CONDA_EXE})/../envs/xpu_op_${ZE_AFFINITY_MASK}
-          conda create -n xpu_op_${ZE_AFFINITY_MASK} python=${{ inputs.python }} cmake ninja -y
-          source activate xpu_op_${ZE_AFFINITY_MASK}
+          conda remove --all -y -n $CONDA_ENV_NAME || \


The command on the previous line, conda clean -ay, is a big problem if several runners run on the same system. It effectively means that one runner can damage an environment that another runner created or is working in, which leads to all sorts of fancy random failures.

RUIJIEZHONG66166 · 2025-05-22

Got it. Thank you~~
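A minimal sketch of the per-job alternative, assuming CONDA_ENV_NAME is already unique per job. The helpers recreate_job_env and remove_job_env are hypothetical names; they only ever create or remove the job's own named environment, so nothing that other runners on the host depend on is touched.

```shell
#!/usr/bin/env bash
# Sketch only: recreate_job_env and remove_job_env are hypothetical helpers.
# They replace a host-wide `conda clean -ay` with operations scoped to one
# named environment, so parallel runners on the same machine do not break
# each other's environments.

recreate_job_env() {
  local env_name="$1" python_version="$2"
  # Drop a stale env with the same name, if any; a missing env is not an error.
  conda remove --all -y -n "${env_name}" >/dev/null 2>&1 || true
  # Create a fresh env with the same packages as the original workflow step.
  conda create -y -n "${env_name}" "python=${python_version}" cmake ninja
}

remove_job_env() {
  local env_name="$1"
  # Best-effort cleanup at the end of the job; never fail the job on it.
  conda remove --all -y -n "${env_name}" || true
}
```

In the workflow, recreate_job_env would run in the setup step and remove_job_env in a final step guarded by `if: always()`, so the environment is cleaned up even when tests fail. A shell `trap` is not a substitute here, because each GitHub Actions step runs in its own shell and the trap would fire at the end of the setup step.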

dvrogozh · 2025-05-21T14:28:25Z

.github/workflows/_linux_ut.yml

+      - name: Create unique workspace
+        run: |
+          # Create unique conda env for each UT test
+          echo "CONDA_ENV_NAME=xpu_op_${ZE_AFFINITY_MASK}_${{ matrix.test.name }}" >> $GITHUB_ENV


That's not truly unique, actually: if two of these workloads run in parallel, for example for different PRs, they will start to fail. To address this, you can reuse the approach from #1282. It only works on Linux, though; for Windows you will need something else:

Suggested change

-          echo "CONDA_ENV_NAME=xpu_op_${ZE_AFFINITY_MASK}_${{ matrix.test.name }}" >> $GITHUB_ENV
+          random=$(head /dev/urandom | tr -dc A-Za-z0-9_ | head -c ${1:-5} | xargs)
+          echo "CONDA_ENV_NAME=xpu_op_${ZE_AFFINITY_MASK}_${{ matrix.test.name }}_${random}" >> $GITHUB_ENV

RUIJIEZHONG66166 · 2025-05-22

Got it. Thank you so much for your help~~
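The random-suffix idea can be wrapped in a small helper. This is a sketch, not the code from #1282: make_env_name is a hypothetical name, and it reads a fixed-size chunk of /dev/urandom instead of `head /dev/urandom`, so, like the suggestion, it assumes a Linux runner.

```shell
#!/usr/bin/env bash
# Sketch: build a conda env name that is unique per matrix job *and* per run,
# so two workflows on the same host (e.g. for different PRs) cannot collide.
# make_env_name is a hypothetical helper; requires /dev/urandom (Linux).

make_env_name() {
  local mask="$1" test_name="$2" len="${3:-5}"
  local suffix
  # Read a fixed number of random bytes, keep only [A-Za-z0-9_], and cut to
  # `len` characters. LC_ALL=C makes tr treat the input as raw bytes.
  suffix=$(head -c 512 /dev/urandom | LC_ALL=C tr -dc 'A-Za-z0-9_' | head -c "${len}")
  echo "xpu_op_${mask}_${test_name}_${suffix}"
}
```

In a run step this would be used as `echo "CONDA_ENV_NAME=$(make_env_name "${ZE_AFFINITY_MASK}" "${{ matrix.test.name }}")" >> "$GITHUB_ENV"`. A deterministic alternative would be a suffix built from the built-in GITHUB_RUN_ID and GITHUB_RUN_ATTEMPT variables instead of random characters.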

chuanqi129 · 2025-06-03T06:57:43Z

.github/workflows/_linux_ut.yml

+      fail-fast: false
+      matrix:
+        test:
+          - name: 'op_regression'


Let's add the runner as a config option of the matrix,

chuanqi129 · 2025-06-03T06:58:39Z

.github/workflows/_linux_ut.yml

+            timeout: 8000
+            additional_steps: |
+              pip install pytest pytest-timeout
+          - name: 'op_regression_dev1'


so that this part can be dispatched to the multi-card runner type.

RUIJIEZHONG66166 · 2025-06-11

Done. Dispatched to pytorch-06.

chuanqi129 · 2025-06-03T07:01:09Z

.github/workflows/_linux_ut.yml

+            command_script: |
+              export PYTORCH_TEST_WITH_SLOW=1
+              export PYTORCH_TESTING_DEVICE_ONLY_FOR="xpu"
+              for xpu_case in build/bin/*{xpu,sycl}*; do


Let's remove the binary tests in this part: we split the PyTorch build into a standalone job, and we don't upload the other binaries as artifacts along with it.

chuanqi129 · 2025-06-03T07:02:19Z

.github/workflows/_linux_ut.yml

+                fi
+              done
+              test_cmd="python test/run_test.py --include "
+              for test in $(ls test/inductor | grep test); do test_cmd="${test_cmd} inductor/$test"; done


Please consider how to parallelize this part further once the inductor UTs are enabled. Can we leverage the shard mechanism in stock PyTorch directly?
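One possible answer, sketched under assumptions: stock PyTorch's test/run_test.py is assumed to accept a `--shard <which> <total>` option (1-based) that splits the selected tests into shards, so each matrix job could run one shard of the inductor UTs. build_inductor_cmd is a hypothetical helper, the shard numbers would come from matrix fields this PR does not define yet, and passing module-style names without `.py` is also an assumption; check `python test/run_test.py --help` in the pinned PyTorch version.

```shell
#!/usr/bin/env bash
# Sketch: build the inductor UT command for one shard of the matrix.
# build_inductor_cmd is hypothetical; the --shard option of run_test.py is
# assumed, not verified against the pinned PyTorch version.

build_inductor_cmd() {
  local test_dir="$1" shard_id="$2" num_shards="$3"
  local cmd="python test/run_test.py --include"
  local f
  # Glob instead of `ls | grep test`: only picks test_*.py files and is
  # safe for unusual file names.
  for f in "${test_dir}"/test_*.py; do
    [ -e "$f" ] || continue   # no match: the glob pattern stays literal
    f=$(basename "$f" .py)    # assumed: run_test.py takes names without .py
    cmd+=" inductor/${f}"
  done
  # Shards are 1-based: --shard <which> <total>.
  cmd+=" --shard ${shard_id} ${num_shards}"
  echo "$cmd"
}
```

Each matrix job would then run `$(build_inductor_cmd test/inductor "${SHARD_ID}" "${NUM_SHARDS}")`, the same way the original test_cmd string is executed; SHARD_ID and NUM_SHARDS are hypothetical variables fed from the matrix.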

chuanqi129 · 2025-06-03T07:05:45Z

.github/workflows/_linux_ut.yml

-                rm -rf $(dirname ${CONDA_EXE})/../envs/xpu_op_${ZE_AFFINITY_MASK}
-          conda create -n xpu_op_${ZE_AFFINITY_MASK} python=${{ inputs.python }} cmake ninja -y
-          source activate xpu_op_${ZE_AFFINITY_MASK}
+          which conda


Consider using setup-python https://github.com/actions/setup-python instead of conda.
cc @mengfei25

[CI] Enable matirx test for UT

5491aa6

RUIJIEZHONG66166 force-pushed the ruijie/enable_ut_matrix branch from d206dfa to 5491aa6 May 16, 2025 08:45

RUIJIEZHONG66166 added 13 commits May 16, 2025 02:04

fix the each issue

933a25a

update to json and item

068295d

fix the test matrix commands issue

20ec4db

fix the env prepare issue

71c2713

fix the if function issue

ec2bbd5

fix the parameter issue

d7f2082

test the unique test path

af5af09

fix the path issue

88b24a0

fix the strategy configuration issue

750fb01

fix the mkdir issue

348928a

fix the conda parallel conflict

84838b4

fix the path issue

aae77fe

fix the copy issue

8d9551e

dvrogozh reviewed May 21, 2025


RUIJIEZHONG66166 and others added 14 commits May 21, 2025 18:35

set the true unique conda env

603173d

remove json set

612ec73

set the env and fix the triton install

f76bc27

fix the keys not support issue

e308480

fix the dynamic para

02aeea5

fix the env set issue

15b0e3c

Merge branch 'main' into ruijie/enable_ut_matrix

73e5b47

fix the if function issue

15db778

update triton install

1bd1dcf

update result check

fce8ada

remove the disable pip set

0e9490f

fix the triton build

c94cc7c

test file only for triton

87b1459

fix the log path

5667c75

RUIJIEZHONG66166 and others added 10 commits May 30, 2025 01:23

fix the triton path set

7decafb

Merge branch 'main' into ruijie/enable_ut_matrix

3dbb50d

Merge branch 'main' into ruijie/enable_ut_matrix

de316c9

fix the outputs define issue

a94fcc5

fix the outputs define issue

1d6a06e

enable ut check for each matrix job

b9137b8

fix the ut check matrix job

3070279

fix the ut check matrix job

cf45fa9

fix the ut check matrix job

3b48b64

change the download path

b37c224

RUIJIEZHONG66166 requested a review from chuanqi129 June 9, 2025 10:29

chuanqi129 reviewed Jun 11, 2025


RUIJIEZHONG66166 added 18 commits June 11, 2025 00:46

add runner in matrix config

a134c8f

separate triton build and install

69c4f71

align the triton install

eb94af2

remove redundant path

ed06ad4

remove redundant path

989f993

fix the triton install path

54f60e6

fix the triton build

e94eead

fix the gcc issue

6ad5b8b

test triton build

4af1c6e

remove gcc set

7832d45

install python package

af00351

install python package

a8cd5d6

add python install

63ea304

set gcc11 for triton install

41ba99b

set gcc13 for triton build

d297b25

fix the can find ld

a437b11

fix the can find ld

ca69178

fix the can find ld

354a064
