Batch matmul fast path in MHAWithCache #449

rohan-varma · 2023-08-17T00:45:35Z

Summary: In self-attention, the query, key, and value are all computed from the same input, so the Q, K, and V input projection matrices can be combined and applied with a single matmul instead of three. This PR adds that optimization to MHAWithCache.
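As an illustration only (this is not the code from the PR), the idea can be sketched without any framework: concatenating the three projection weights column-wise and doing one matmul gives the same Q, K, V as three separate matmuls. The helper names (`matmul`, `hconcat`, `split_columns`) and the tiny integer matrices are assumptions made for this sketch; in PyTorch the same thing is usually done with a single `nn.Linear` of width `3 * dim` whose output is split into three chunks.

```python
# Framework-free sketch: fusing the Q, K, V projections into one matmul.
# All names and shapes are illustrative and not taken from the PR.

def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def hconcat(*mats):
    """Concatenate matrices with equal row counts along the column axis."""
    return [sum(rows, []) for rows in zip(*mats)]

def split_columns(mat, parts):
    """Split a matrix into `parts` equal-width column blocks."""
    width = len(mat[0]) // parts
    return [[row[i * width:(i + 1) * width] for row in mat] for i in range(parts)]

# Input: 2 tokens, embedding dim 3 (integers keep the comparison exact).
x = [[1, 2, 3],
     [4, 5, 6]]

# Separate projection weights, each (dim_in=3) x (dim_out=2).
w_q = [[1, 0], [0, 1], [1, 1]]
w_k = [[2, 1], [1, 2], [0, 1]]
w_v = [[0, 3], [1, 0], [2, 2]]

# Slow path: three matmuls.
q_ref, k_ref, v_ref = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)

# Fast path: one matmul against the concatenated (3 x 6) weight, then split.
w_qkv = hconcat(w_q, w_k, w_v)
q, k, v = split_columns(matmul(x, w_qkv), 3)

assert (q, k, v) == (q_ref, k_ref, v_ref)
```

The fused path only applies when Q, K, and V share the same input, which is why it is a self-attention fast path rather than a general one.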

Differential Revision: D48418780

facebook-github-bot · 2023-08-17T00:46:02Z

This pull request was exported from Phabricator. Differential Revision: D48418780

facebook-github-bot · 2023-08-17T00:50:55Z

This pull request was exported from Phabricator. Differential Revision: D48418780

Summary:
Pull Request resolved: facebookresearch#449

When doing self attention, an optimization is to combine the Q, K, V input projection matrices and do a single matmul, instead of 3. Adding this optimization in MHAWithCache.

Differential Revision: D48418780

fbshipit-source-id: e8001eb870e827b05146221bb66f82939deae0c6

facebook-github-bot · 2023-08-17T07:45:38Z

This pull request was exported from Phabricator. Differential Revision: D48418780

Summary:
Pull Request resolved: facebookresearch#449

When doing self attention, an optimization is to combine the Q, K, V input projection matrices and do a single matmul, instead of 3. Adding this optimization in MHAWithCache.

Differential Revision: D48418780

fbshipit-source-id: 0501341832910bf90a7ea1cc902b98f0760548ab

codecov-commenter · 2023-08-17T07:52:10Z

Codecov Report

Patch coverage: 77.77% and project coverage change: -0.01% ⚠️

Comparison is base (951a452) 69.11% compared to head (a2e0a70) 69.11%.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #449      +/-   ##
==========================================
- Coverage   69.11%   69.11%   -0.01%     
==========================================
  Files         170      170              
  Lines       11524    11530       +6     
==========================================
+ Hits         7965     7969       +4     
- Misses       3559     3561       +2

| Files Changed | Coverage Δ | |
|---|---|---|
| ...hmultimodal/modules/layers/multi_head_attention.py | `96.82% <77.77%> (-3.18%)` | ⬇️ |


facebook-github-bot · 2023-08-18T17:20:24Z

This pull request was exported from Phabricator. Differential Revision: D48418780

…th (facebookresearch#449)

Summary:
Pull Request resolved: facebookresearch#449

When doing self attention, an optimization is to combine the Q, K, V input projection matrices and do a single matmul, instead of 3. Adding this optimization for MHA with cache in a new module `MultiHeadSelfAttentionWithCache`.

Note: we are primarily using a new module to avoid breaking checkpoint BC with respect to `MultiHeadAttentionWithCache`. In the future, we should consolidate these MHA implementations.

Differential Revision: D48418780

fbshipit-source-id: 5ad930ff27a4b131f8ff1f097a4c9e1548efb587

…th (facebookresearch#449)

Summary:
Pull Request resolved: facebookresearch#449

When doing self attention, an optimization is to combine the Q, K, V input projection matrices and do a single matmul, instead of 3. Adding this optimization for MHA with cache in a new module `MultiHeadSelfAttentionWithCache`.

Note: we are primarily using a new module to avoid breaking checkpoint BC with respect to `MultiHeadAttentionWithCache`. In the future, we should consolidate these MHA implementations.

Differential Revision: D48418780

fbshipit-source-id: eb0691e9d3a4bf729cfd7ca3293585c7d0108403
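To make the checkpoint BC concern concrete, here is a hedged sketch of why fusing the projections changes what a checkpoint must contain. The parameter key names (`q_proj.weight`, `k_proj.weight`, `v_proj.weight`, `in_proj.weight`, `output_proj.weight`) are hypothetical and may not match either module; biases are left out for brevity. Weights are assumed to be stored in (out_features, in_features) layout, as `nn.Linear` does, so fusing stacks rows.

```python
# Hypothetical key names; the real parameter names in either module may differ.
OLD_KEYS = ("q_proj.weight", "k_proj.weight", "v_proj.weight")
FUSED_KEY = "in_proj.weight"

def fuse_qkv_state_dict(old_state):
    """Return a copy of `old_state` with the three projection weights fused.

    Weights are lists of rows in (out_features, in_features) layout, so the
    fused weight is the rows of Q, then K, then V.
    """
    new_state = {k: v for k, v in old_state.items() if k not in OLD_KEYS}
    new_state[FUSED_KEY] = [list(row) for key in OLD_KEYS for row in old_state[key]]
    return new_state

# A toy "checkpoint" saved from a module with separate projections.
old_checkpoint = {
    "q_proj.weight": [[1, 0], [0, 1]],
    "k_proj.weight": [[2, 0], [0, 2]],
    "v_proj.weight": [[3, 0], [0, 3]],
    "output_proj.weight": [[1, 1], [1, 1]],
}

new_checkpoint = fuse_qkv_state_dict(old_checkpoint)

# The old keys no longer exist, so an unconverted checkpoint would not load
# into a fused module without a conversion step like the one above.
missing = [k for k in OLD_KEYS if k not in new_checkpoint]
```

Keeping the fused layout in a separate module means existing checkpoints for the unfused module keep loading unchanged, at the cost of having two implementations to consolidate later.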

facebook-github-bot · 2023-08-18T17:25:23Z

This pull request was exported from Phabricator. Differential Revision: D48418780

facebook-github-bot · 2023-08-18T17:29:29Z

This pull request was exported from Phabricator. Differential Revision: D48418780

…th (facebookresearch#449)

Summary:
Pull Request resolved: facebookresearch#449

When doing self attention, an optimization is to combine the Q, K, V input projection matrices and do a single matmul, instead of 3. Adding this optimization for MHA with cache in a new module `MultiHeadSelfAttentionWithCache`.

Note: we are primarily using a new module to avoid breaking checkpoint BC with respect to `MultiHeadAttentionWithCache`. In the future, we should consolidate these MHA implementations.

Differential Revision: D48418780

fbshipit-source-id: 0b20fb807527109a9a3ad419805e47e0f9ba2c74

…th (facebookresearch#449)

Summary:
Pull Request resolved: facebookresearch#449

When doing self attention, an optimization is to combine the Q, K, V input projection matrices and do a single matmul, instead of 3. Adding this optimization for MHA with cache in a new module `MultiHeadSelfAttentionWithCache`.

Note: we are primarily using a new module to avoid breaking checkpoint BC with respect to `MultiHeadAttentionWithCache`. In the future, we should consolidate these MHA implementations.

Differential Revision: D48418780

fbshipit-source-id: 58f00205af26d39f778853c7aa50d560e024b9f8

facebook-github-bot · 2023-08-18T17:35:06Z

This pull request was exported from Phabricator. Differential Revision: D48418780

meta-cla · 2025-08-30T02:06:50Z

Hi @rohan-varma!

Thank you for your pull request.

We require contributors to sign our Contributor License Agreement, and yours needs attention.

You currently have a record in our system, but the CLA is no longer valid, and will need to be resubmitted.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g., your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at [email protected]. Thanks!

facebook-github-bot added the CLA Signed label Aug 17, 2023

facebook-github-bot added the fb-exported label Aug 17, 2023

rohan-varma force-pushed the export-D48418780 branch from e1233cc to dfd2ec6 on August 17, 2023 00:51

rohan-varma force-pushed the export-D48418780 branch from dfd2ec6 to a2e0a70 on August 17, 2023 07:45

rohan-varma force-pushed the export-D48418780 branch from a2e0a70 to 919dc03 on August 18, 2023 17:20

rohan-varma force-pushed the export-D48418780 branch from 919dc03 to 6d67dae on August 18, 2023 17:25

rohan-varma force-pushed the export-D48418780 branch from 6d67dae to 173699e on August 18, 2023 17:29

rohan-varma force-pushed the export-D48418780 branch from 173699e to d459f16 on August 18, 2023 17:35
