
Update heuristic for Cutlass BF16 Grouped GEMM #4138


Closed
cthi wants to merge 2 commits from the export-D74836650 branch

Conversation


@cthi commented May 16, 2025

Summary:
X-link: https://github.com/facebookresearch/FBGEMM/pull/1220

This diff updates the heuristic used for Cutlass BF16 grouped GEMM, improving performance for some important shapes.

Differential Revision: D74836650
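
For context on what such a heuristic looks like: the selection logic maps a grouped-GEMM problem shape (total M across groups, N, K) to one of the pre-instantiated CUTLASS kernel configurations. The sketch below is hypothetical, with made-up thresholds, tile shapes, and names; it illustrates the general structure only and is not the FBGEMM code.

```cpp
// Hypothetical sketch only, not the FBGEMM implementation: it shows the
// general shape of a heuristic that maps a grouped-GEMM problem
// (total M across groups, N, K) to one of several pre-built CUTLASS configs.
// The thresholds and tile shapes below are made up for illustration.
#include <cstdint>
#include <iostream>
#include <string>

// Simplified description of one kernel instantiation.
struct KernelConfig {
  int tile_m;
  int tile_n;
  int tile_k;
  bool cooperative;  // cooperative vs. pingpong scheduling
  std::string name;
};

// Toy heuristic: small problems get small tiles, large problems get large
// cooperative tiles; real heuristics are tuned from benchmark sweeps.
KernelConfig select_bf16_grouped_gemm_config(int64_t total_m, int64_t n, int64_t k) {
  if (total_m <= 64 && n <= 2048) {
    return {64, 64, 128, false, "64x64x128_pingpong"};
  }
  if (total_m <= 512 || (n <= 1024 && k <= 1024)) {
    return {128, 128, 128, false, "128x128x128_pingpong"};
  }
  return {128, 256, 64, true, "128x256x64_cooperative"};
}

int main() {
  KernelConfig cfg =
      select_bf16_grouped_gemm_config(/*total_m=*/2048, /*n=*/4096, /*k=*/4096);
  std::cout << "selected kernel: " << cfg.name << "\n";
  return 0;
}
```

In practice the thresholds would come from benchmark sweeps over the shapes that matter for the target workloads, which is what "improving performance for some important shapes" refers to.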


netlify bot commented May 16, 2025

Deploy Preview for pytorch-fbgemm-docs ready!

🔨 Latest commit: 02d22bd
🔍 Latest deploy log: https://app.netlify.com/projects/pytorch-fbgemm-docs/deploys/682cdb026c39e40008beb89d
😎 Deploy Preview: https://deploy-preview-4138--pytorch-fbgemm-docs.netlify.app

@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D74836650

@cthi force-pushed the export-D74836650 branch from 05a77f3 to 70162d4 on May 19, 2025 13:30
cthi pushed a commit to cthi/FBGEMM-1 that referenced this pull request May 19, 2025
Summary:

X-link: facebookresearch/FBGEMM#1220

This diff updates the heuristic used for Cutlass BF16 grouped gemm, improving performance in some important shapes.

Reviewed By: jianyuh

Differential Revision: D74836650
@cthi force-pushed the export-D74836650 branch from 70162d4 to e346870 on May 19, 2025 13:31
@cthi force-pushed the export-D74836650 branch from e346870 to da597d0 on May 19, 2025 13:34
@cthi force-pushed the export-D74836650 branch from da597d0 to 32b5184 on May 19, 2025 13:42
Summary:
We plan to make some changes to the heuristics; as a first step, refactor a bit
to parallelize kernel compilation, as is done for FP8 rowwise.

Differential Revision: D74760416
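
As background on the refactor mentioned above: a common way to parallelize kernel compilation is to give each kernel instantiation its own entry point, defined in its own translation unit in the real build and only declared in a shared header, so the instantiations can be compiled concurrently while the heuristic dispatches through a plain function pointer. The sketch below is hypothetical; the names, signatures, and stand-in bodies are not FBGEMM's actual API.

```cpp
// Hypothetical sketch of the dispatch pattern that enables parallel kernel
// compilation: each instantiation gets its own entry point (in a real build,
// each defined in a separate .cu file and only declared in a shared header),
// and the heuristic picks one through a plain function pointer.
#include <cstdint>
#include <iostream>

struct GroupedGemmArgs {
  int64_t total_m;
  int64_t n;
  int64_t k;
  // ... pointers to per-group operands would go here in a real interface
};

// Stand-in entry points; in practice each would launch a distinct CUTLASS
// instantiation and live in its own translation unit.
void run_64x64x128(const GroupedGemmArgs&) { std::cout << "64x64x128\n"; }
void run_128x128x128(const GroupedGemmArgs&) { std::cout << "128x128x128\n"; }
void run_128x256x64(const GroupedGemmArgs&) { std::cout << "128x256x64\n"; }

using KernelFn = void (*)(const GroupedGemmArgs&);

// The heuristic only chooses an entry point; it never needs to see the
// CUTLASS templates, so compiling the dispatch code stays cheap.
KernelFn select_kernel(const GroupedGemmArgs& args) {
  if (args.total_m <= 64) return run_64x64x128;
  if (args.total_m <= 512) return run_128x128x128;
  return run_128x256x64;
}

int main() {
  GroupedGemmArgs args{2048, 4096, 4096};
  select_kernel(args)(args);
  return 0;
}
```
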
@cthi force-pushed the export-D74836650 branch from 32b5184 to 5ae4d8e on May 20, 2025 19:30
@cthi force-pushed the export-D74836650 branch from 5ae4d8e to 959f3f9 on May 20, 2025 19:32
@cthi force-pushed the export-D74836650 branch from 959f3f9 to 34e7d42 on May 20, 2025 19:35

@facebook-github-bot
Contributor

This pull request has been merged in 841c22c.

q10 added a commit to q10/FBGEMM that referenced this pull request May 22, 2025
- Disable GenAI builds against CUDA 11.8, since it is no longer possible to support GenAI builds against CUDA 11.8.0 as of pytorch#4138
facebook-github-bot pushed a commit that referenced this pull request May 22, 2025
Summary:
X-link: facebookresearch/FBGEMM#1255

- Disable GenAI builds against CUDA 11.8, since it is no longer possible to support GenAI builds against CUDA 11.8.0 as of #4138

Pull Request resolved: #4173

Reviewed By: jiawenliu64

Differential Revision: D75229752

Pulled By: q10

fbshipit-source-id: e9626799d371ee2671f9062df1933d3caea65087