
[GPU][Codegen] Allowing mfma for narrow problem config sizes #19615

Merged (2 commits) on Jan 8, 2025

Conversation

@jerryyin (Member) commented Jan 6, 2025

The motivation for this PR is convolution performance for resnet50 configs. With this PR (and a few pending ones), conv performance with the igemm pipeline gets a decent speedup in situations where a standalone dimension size is smaller than the intrinsic size. (Take dispatch 69 as an example: the selected tile m:7, n:512, k:4608 would be rejected from mfma because the m tile is smaller than the intrinsic size of 16.) This happens because we were previously too defensive about when to use the intrinsic: even in situations where alignment is not required, we still enforced that mfma be picked only when the m/n/k tiles are all at least as large as the intrinsic size.

With @nirvedhmeshram's #19271 and #19484, padding is allowed in the tile-and-fuse matmul and igemm tile-and-fuse pipelines, so it is no longer necessary to be as conservative as before. I am therefore removing the conditional check that blocks mfma from being picked up.

This will impact a few pipelines that use canTargetIntrinsic():

  • LLVMGPUPadAndVectorDistribute will allow narrow m/n/k dimension sizes for batch matmul
  • In iree-codegen-rocdl-configuration-pipeline, narrow m/n/k dimension sizes will be allowed for matmul (instead of falling back to warp reduction)
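To illustrate the kind of guard being relaxed, here is a minimal sketch. The function name, struct, and `canPad` flag are hypothetical stand-ins, not IREE's actual `canTargetIntrinsic()` signature; the point is that once a pipeline can pad, dimensions smaller than the intrinsic shape no longer need to disqualify mfma:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mfma intrinsic shape (e.g. a 16x16x16 MFMA variant).
struct IntrinsicShape {
  int64_t m, n, k;
};

// Sketch of the relaxed check: previously every dimension had to be at
// least the intrinsic size; with padding-capable pipelines, narrow
// dimensions are allowed through because they can be padded up.
bool canUseMfma(int64_t m, int64_t n, int64_t k,
                const IntrinsicShape &intrinsic, bool canPad) {
  if (canPad)
    return true;  // padding makes any problem size alignable
  return m >= intrinsic.m && n >= intrinsic.n && k >= intrinsic.k;
}
```

With the dispatch-69 tile sizes from the description (m:7, n:512, k:4608) and a 16x16x16 intrinsic, the old-style check rejects mfma because m = 7 < 16, while the padding-aware path accepts it.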

@jerryyin (Member, Author) commented Jan 7, 2025

I experimented with resnet50 again and can confirm that there is no difference in full model performance between:

  • completely dropping the size check
  • a size check of 4

@jerryyin jerryyin force-pushed the users/zyin/remove-intrinsic-size-limitation branch from 22102d7 to f2d5d51 Compare January 7, 2025 21:37
This makes sure tiny gemms can still be lowered through the warp reduction
pipeline.

Signed-off-by: jerryyin <[email protected]>
@qedawkins (Contributor) left a comment


Cool, LGTM! Also, I would pay close attention to the regression tests for SDXL and llama on the CI. There is some wiggle room in those tests, so they might pass despite small regressions.

@jerryyin jerryyin merged commit c75b686 into main Jan 8, 2025
36 checks passed
@jerryyin jerryyin deleted the users/zyin/remove-intrinsic-size-limitation branch January 8, 2025 18:37