Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add WaveGetLane* support for Metal and WGSL #6371

Merged
merged 26 commits into from
Feb 28, 2025

Conversation

fairywreath
Copy link
Contributor

Closes #6208, related to #6210.

Adds support for WaveGetLane* intrinsics for Metal and WGSL through the use of GLSL flavored in/out global vars.

These intrinsics require built-in values to be passed through the entry point parameters, but the intrinsics themselves are "global" functions. The implementation reuses the already existing code that handles glsl gl_* global builtin variables and adds them to the entry point parameter list. translateGLSLGlobalVar is renamed because it also handles non-glsl code.

csyonghe
csyonghe previously approved these changes Feb 16, 2025
Copy link
Collaborator

@csyonghe csyonghe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

@@ -100,7 +100,7 @@
#include "slang-ir-strip-default-construct.h"
#include "slang-ir-strip-legalization-insts.h"
#include "slang-ir-synthesize-active-mask.h"
#include "slang-ir-translate-glsl-global-var.h"
#include "slang-ir-translate-in-out-global-var.h"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can name it translate-global-varying-var.

@@ -3055,6 +3055,11 @@ void CLikeSourceEmitter::defaultEmitInstExpr(IRInst* inst, const EmitOpInfo& inO
emitOperand(as<IRGlobalValueRef>(inst)->getOperand(0), getInfo(EmitOp::General));
break;
}
case kIROp_RequireWGSLExtension:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just use a single RequireTargetExtension for all targets?

@csyonghe csyonghe added the pr: non-breaking PRs without breaking changes label Feb 16, 2025
csyonghe
csyonghe previously approved these changes Feb 19, 2025
@fairywreath
Copy link
Contributor Author

Not sure why the Metal glsl variant test fails while the HLSL variant passes - the produced code is similar and they both compile fine using the Metal compiler. I do not have an Apple device to test, hopefully my most recent change fixes it.

@fairywreath
Copy link
Contributor Author

The glsl variant test still failed on Metal and I am not sure why. I loosened up the check requirements for the Metal case and added another test for the glsl syntax that is a copy of the hlsl syntax test(that passes just fine in Metal), so the new test should definitely pass.

@csyonghe
Copy link
Collaborator

Hmm, if it is a hardware problem, then the HLSL syntax should also fail?

Can you compare the metal shader output from both HLSL and GLSL syntax and see if there are any differences?

@fairywreath
Copy link
Contributor Author

fairywreath commented Feb 20, 2025

Yea somehow the new GLSL syntax test(that is a 1-1 copy of the passing HLSL syntax test) is failing. Here is a diff of the outputs of the HLSL and GLSL syntax. They are very similar(the difference in output type is a test case typo).

@csyonghe
Copy link
Collaborator

I noticed that the HLSL case, the compiler is able to fold the access to wave index directly as a use of _builtinWaveLaneIndex_0 parameter, where in the GLSL case, the parameter is first written into the kernel context and then loaded from the context.

Which should be fine, but I wonder if that's the reason that caused the wrong value.

@csyonghe
Copy link
Collaborator

Another suspicious thing is:

struct KernelContext_0
{
    uint device* outputBuffer_0;
    uint3 _S1;
    uint _S2;
    uint _S3;
};

note that _S2 is the wave index, and it is placed right after a uint3. It is unlikely, but I wonder if there are bugs that causes an assignment to _S1 overwrites _S2 due to mishandling of alignment/padding etc. in the metal driver.

@csyonghe
Copy link
Collaborator

We are currently generating:

#line 16
    (&kernelContext_0)->_S3 = _builtinWaveLaneCount_0;

#line 16
    (&kernelContext_0)->_S2 = _builtinWaveLaneIndex_0;

#line 16
    (&kernelContext_0)->_S1 = gl_GlobalInvocationID_0;

and this is in reverse order of the fields, might worth clean this up and make the assignment in the consistent order of the fields.

@fairywreath
Copy link
Contributor Author

Thanks for the help. I have made changes so that the GLSL and HLSL outputs are exactly the same (diff). Can you run the test again?

@fairywreath
Copy link
Contributor Author

It is surprising that the GLSL syntax variant still fails. Here is a diff of the hlsl and glsl tests and diff of the Metal output. The produced Metal code is exactly the same.

@csyonghe
Copy link
Collaborator

One test is using COMPARE_COMPUTE, and the other is using COMPARE_COMPUTE_EX, can we make both of them the same?

@csyonghe
Copy link
Collaborator

This is needing more debug. Let's disable the glsl test and get this PR merged first. We can leave it as future work.

@fairywreath fairywreath requested a review from csyonghe February 25, 2025 05:30


// TODO: There are some issues with the Metal backend when using glsl-style syntax, test is disabled for now.
//DISABLE_TEST(compute):COMPARE_COMPUTE_EX(filecheck-buffer=BUF):-metal -compute -entry computeMain -allow-glsl
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of disable test here, let’s run the test, but add the test name to expected-failure.txt so we can always use that file to track tests that we still need to work on.

csyonghe
csyonghe previously approved these changes Feb 25, 2025
@fairywreath fairywreath requested a review from csyonghe February 25, 2025 17:50
csyonghe
csyonghe previously approved these changes Feb 26, 2025
@fairywreath
Copy link
Contributor Author

fairywreath commented Feb 26, 2025

Whoops I put the expected to fail tests in the wrong file, I have corrected them.

@fairywreath fairywreath requested a review from csyonghe February 26, 2025 14:51
@fairywreath
Copy link
Contributor Author

fairywreath commented Feb 27, 2025

There were some merge conflicts - I corrected them

@csyonghe csyonghe merged commit 66984eb into shader-slang:master Feb 28, 2025
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr: non-breaking PRs without breaking changes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support shader subgroup/wave intrinsics for WGSL
2 participants