
Update heuristic for Cutlass BF16 Grouped GEMM #4138


Closed
cthi wants to merge 2 commits from the export-D74836650 branch

Conversation


@cthi commented May 16, 2025

Summary:
X-link: https://github.com/facebookresearch/FBGEMM/pull/1220

This diff updates the heuristic used for Cutlass BF16 grouped GEMM, improving performance for some important shapes.

Differential Revision: D74836650
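
For context on what such a heuristic looks like: the selection logic maps a grouped-GEMM problem shape (total M across groups, N, K) to one of the pre-instantiated CUTLASS kernel configurations. The sketch below is hypothetical, with made-up thresholds, tile shapes, and names; it illustrates the general structure only and is not the FBGEMM code.

```cpp
// Hypothetical sketch only, not the FBGEMM implementation: it shows the
// general shape of a heuristic that maps a grouped-GEMM problem
// (total M across groups, N, K) to one of several pre-built CUTLASS configs.
// The thresholds and tile shapes below are made up for illustration.
#include <cstdint>
#include <iostream>
#include <string>

// Simplified description of one kernel instantiation.
struct KernelConfig {
  int tile_m;
  int tile_n;
  int tile_k;
  bool cooperative;  // cooperative vs. pingpong scheduling
  std::string name;
};

// Toy heuristic: small problems get small tiles, large problems get large
// cooperative tiles; real heuristics are tuned from benchmark sweeps.
KernelConfig select_bf16_grouped_gemm_config(int64_t total_m, int64_t n, int64_t k) {
  if (total_m <= 64 && n <= 2048) {
    return {64, 64, 128, false, "64x64x128_pingpong"};
  }
  if (total_m <= 512 || (n <= 1024 && k <= 1024)) {
    return {128, 128, 128, false, "128x128x128_pingpong"};
  }
  return {128, 256, 64, true, "128x256x64_cooperative"};
}

int main() {
  KernelConfig cfg =
      select_bf16_grouped_gemm_config(/*total_m=*/2048, /*n=*/4096, /*k=*/4096);
  std::cout << "selected kernel: " << cfg.name << "\n";
  return 0;
}
```

In practice the thresholds would come from benchmark sweeps over the shapes that matter for the target workloads, which is what "improving performance for some important shapes" refers to.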


netlify bot commented May 16, 2025

Deploy Preview for pytorch-fbgemm-docs ready!

🔨 Latest commit: 02d22bd
🔍 Latest deploy log: https://app.netlify.com/projects/pytorch-fbgemm-docs/deploys/682cdb026c39e40008beb89d
😎 Deploy Preview: https://deploy-preview-4138--pytorch-fbgemm-docs.netlify.app

@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D74836650

@cthi force-pushed the export-D74836650 branch from 05a77f3 to 70162d4 on May 19, 2025 13:30
cthi pushed a commit to cthi/FBGEMM-1 that referenced this pull request May 19, 2025
Summary:

X-link: facebookresearch/FBGEMM#1220

This diff updates the heuristic used for Cutlass BF16 grouped gemm, improving performance in some important shapes.

Reviewed By: jianyuh

Differential Revision: D74836650
@cthi force-pushed the export-D74836650 branch from 70162d4 to e346870 on May 19, 2025 13:31
@cthi force-pushed the export-D74836650 branch from e346870 to da597d0 on May 19, 2025 13:34
@cthi force-pushed the export-D74836650 branch from da597d0 to 32b5184 on May 19, 2025 13:42
Summary:
We plan to make some changes to the heuristics; as a first step, refactor a bit
to parallelize kernel compilation, as is done for FP8 rowwise.

Differential Revision: D74760416
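
As background on the refactor mentioned above: a common way to parallelize kernel compilation is to give each kernel instantiation its own entry point, defined in its own translation unit in the real build and only declared in a shared header, so the instantiations can be compiled concurrently while the heuristic dispatches through a plain function pointer. The sketch below is hypothetical; the names, signatures, and stand-in bodies are not FBGEMM's actual API.

```cpp
// Hypothetical sketch of the dispatch pattern that enables parallel kernel
// compilation: each instantiation gets its own entry point (in a real build,
// each defined in a separate .cu file and only declared in a shared header),
// and the heuristic picks one through a plain function pointer.
#include <cstdint>
#include <iostream>

struct GroupedGemmArgs {
  int64_t total_m;
  int64_t n;
  int64_t k;
  // ... pointers to per-group operands would go here in a real interface
};

// Stand-in entry points; in practice each would launch a distinct CUTLASS
// instantiation and live in its own translation unit.
void run_64x64x128(const GroupedGemmArgs&) { std::cout << "64x64x128\n"; }
void run_128x128x128(const GroupedGemmArgs&) { std::cout << "128x128x128\n"; }
void run_128x256x64(const GroupedGemmArgs&) { std::cout << "128x256x64\n"; }

using KernelFn = void (*)(const GroupedGemmArgs&);

// The heuristic only chooses an entry point; it never needs to see the
// CUTLASS templates, so compiling the dispatch code stays cheap.
KernelFn select_kernel(const GroupedGemmArgs& args) {
  if (args.total_m <= 64) return run_64x64x128;
  if (args.total_m <= 512) return run_128x128x128;
  return run_128x256x64;
}

int main() {
  GroupedGemmArgs args{2048, 4096, 4096};
  select_kernel(args)(args);
  return 0;
}
```
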
@cthi force-pushed the export-D74836650 branch from 32b5184 to 5ae4d8e on May 20, 2025 19:30
@cthi force-pushed the export-D74836650 branch from 5ae4d8e to 959f3f9 on May 20, 2025 19:32
@cthi force-pushed the export-D74836650 branch from 959f3f9 to 34e7d42 on May 20, 2025 19:35

@facebook-github-bot
Contributor

This pull request has been merged in 841c22c.

q10 added a commit to q10/FBGEMM that referenced this pull request May 22, 2025
- Disable GenAI builds against CUDA 11.8, since it is no longer possible to support GenAI builds against CUDA 11.8.0 as of pytorch#4138
facebook-github-bot pushed a commit that referenced this pull request May 22, 2025
Summary:
X-link: facebookresearch/FBGEMM#1255

- Disable GenAI builds against CUDA 11.8, since it is no longer possible to support GenAI builds against CUDA 11.8.0 as of #4138

Pull Request resolved: #4173

Reviewed By: jiawenliu64

Differential Revision: D75229752

Pulled By: q10

fbshipit-source-id: e9626799d371ee2671f9062df1933d3caea65087