
Conversation

@yucai-intel (Contributor) commented Nov 11, 2025

The Problem Solved
The assertion in the test_transformer_encoder_layer test for nn.TransformerEncoderLayer failed when the test ran on an XPU device.
#2015
Root cause: the test checks the Transformer's behavior when the key padding mask masks the entire input (sequence length = 1, mask = [[True]]).
On XPU devices the MHA module lacks the dedicated fast-path optimization and falls back to the non-fast-path execution, where the computed result is a finite (non-NaN) value: the mathematically robust result of X plus an attention output of 0.
However, by default the test entered the fast-path assertion branch in XPU / non-crossref mode, which expects a NaN result. The actual finite result did not satisfy the expected NaN assertion, so the test failed.
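
For intuition, here is a minimal sketch of the failing scenario; the layer sizes, shapes, and the "xpu" device string are illustrative assumptions, not values copied from the test:

```python
import torch
import torch.nn as nn

device = "xpu"  # assumes a PyTorch build with XPU support

# Illustrative layer configuration; the test's actual hyperparameters may differ.
layer = nn.TransformerEncoderLayer(
    d_model=4, nhead=2, dim_feedforward=16, batch_first=True
).to(device).eval()

x = torch.randn(1, 1, 4, device=device)        # batch 1, sequence length 1
mask = torch.tensor([[True]], device=device)   # key padding mask hides the only token

with torch.no_grad():
    out = layer(x, src_key_padding_mask=mask)

# On the non-fast-path fallback taken on XPU, this prints False: the output
# stays finite because the fully masked attention contributes 0.
print(torch.isnan(out).any().item())
```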

Why This Solution
To validate the core semantics of TransformerEncoderLayer on XPU, we take the numerically most robust non-fast-path logic as the reference. Non-fast-path semantics: when the input is fully masked, the attention output is 0, and the result after LayerNorm(X) is a finite (non-NaN) value.
Solution: forcing the test into the non-fast-path branch makes its assertion match the actual non-NaN result on the XPU device, ensuring the unit test (UT) passes.
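
For a sense of where the conflicting NaN expectation comes from: if the scores of a fully masked row are set to -inf and pushed straight through softmax, the whole row becomes NaN, whereas the non-fast-path semantics described above treat the fully masked attention output as 0 and keep the result finite. A tiny illustration of the softmax behavior (not the layer's actual code path):

```python
import torch

# One query attending to a single key that the padding mask removes.
scores = torch.full((1, 1), float("-inf"))
print(torch.softmax(scores, dim=-1))  # tensor([[nan]]) -- 0/0 after exp-normalization
```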

The Implementation Method
The fix manually sets TEST_WITH_CROSSREF = 1 within the affected test function in the relevant test file.
This forces the test method to follow the non-fast-path branch, switching its assertion from the incorrect self.assertTrue(np.isnan(result).all()) to the correct self.assertTrue(not np.isnan(result).any()), thereby fixing the test.
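
Paraphrased, the two assertion branches the flag selects between look roughly like the sketch below; the helper name and the single boolean parameter are illustrative (the real test also factors in training mode, batch_first, and the device), with TEST_WITH_CROSSREF = 1 corresponding to use_non_fast_path=True:

```python
import numpy as np

def check_fully_masked_output(result: np.ndarray, use_non_fast_path: bool) -> None:
    if use_non_fast_path:
        # non-fast path: fully masked attention contributes 0 -> finite output
        assert not np.isnan(result).any()
    else:
        # fast path: the fully masked row is expected to come out as NaN
        assert np.isnan(result).all()
```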

TIPS:
Because PYTORCH_TEST_WITH_CROSSREF can affect test semantics and performance globally, this fix applies the override in the smallest possible scope, avoiding unintended performance degradation or logic changes in other unit tests.
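
For contrast, a rough sketch of the two scopes; it assumes the flag comes from torch.testing._internal.common_utils (as the import in the diff excerpt below suggests) and the shell command is only a placeholder:

```python
# Global override (NOT what this PR does): exporting the environment variable
# switches every test that reads the flag to crossref / non-fast-path
# semantics, e.g.
#   PYTORCH_TEST_WITH_CROSSREF=1 python <test runner>
#
# File-local override (what this PR does, shown in the diff excerpt below):
# rebind the already-imported name only in the XPU test module, so other
# suites keep their default fast-path assertions.
from torch.testing._internal.common_utils import TEST_WITH_CROSSREF  # noqa: F401

TEST_WITH_CROSSREF = 1  # noqa: F811
```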

@yucai-intel yucai-intel changed the title Update test_nn_xpu.py Fix transformerEncoderLayer Full Mask UT Failure on XPU Nov 11, 2025
@intel intel deleted a comment from github-actions bot Nov 12, 2025
@yucai-intel yucai-intel changed the title Fix transformerEncoderLayer Full Mask UT Failure on XPU Fix TransformerEncoderLayer Full Mask UT Failure on XPU Nov 12, 2025
TEST_WITH_CROSSREF,
)

TEST_WITH_CROSSREF = 1 # noqa: F811
Contributor


I noticed that CUDA passes these cases with TEST_WITH_CROSSREF = 0, and the fast path works with CUDA. We need to check the related XPU kernel.

Contributor Author


I ran this test on an A100. CUDA took the non-fast path, and the XPU output was consistent with CUDA.

Contributor Author


On XPU:
res--------- [[[ 2.213 0.1344 -0.721 0.1926]]]
exp--------- False
On CUDA:
test_transformerencoderlayer---------
Non Fast Paths---------
res--------- [[[ 2.213 0.1344 -0.721 0.1926]]]
exp--------- False
