Allow for GroupedQueryAttention #12

Open
wants to merge 1 commit into
base: main

Conversation

@PMMon PMMon commented Sep 14, 2024

Minor modifications to factorized_dest_nodes so that models with GroupedQueryAttention (e.g. google/gemma-2-2b) can be loaded. Importantly, in these models the number of query heads does not correspond to the number of key and value heads.
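
To illustrate the head-count mismatch, here is a minimal sketch of how query heads can share key/value heads under grouped-query attention. It is not the code changed in this PR: the helper name `kv_head_for_query_head`, the head counts (8 query heads, 4 key/value heads), and the contiguous grouping of query heads are all assumptions chosen for illustration.

```python
def kv_head_for_query_head(q_head: int, n_q_heads: int, n_kv_heads: int) -> int:
    """Return the key/value head index shared by a given query head.

    Hypothetical helper: assumes query heads are grouped contiguously,
    so consecutive query heads read from the same key/value head.
    """
    if n_q_heads % n_kv_heads != 0:
        raise ValueError("n_q_heads must be a multiple of n_kv_heads")
    if not 0 <= q_head < n_q_heads:
        raise IndexError("q_head out of range")
    group_size = n_q_heads // n_kv_heads  # query heads per key/value head
    return q_head // group_size


# Illustrative head counts (assumed, not read from any model config).
n_q_heads, n_kv_heads = 8, 4
mapping = [kv_head_for_query_head(q, n_q_heads, n_kv_heads) for q in range(n_q_heads)]
print(mapping)  # [0, 0, 1, 1, 2, 2, 3, 3]
```

With these numbers there are 8 destination slots for queries but only 4 each for keys and values, so any code that sizes key and value destinations by the query head count would presumably need to use the key/value head count instead.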

Owner

@UFO-101 UFO-101 left a comment

Hi. Thanks for this PR!

It looks great, and correct as far as I can tell. It's a simple enough change that I don't think tests are required. But have you used it enough locally to be fairly confident it hasn't introduced any bugs?

2 participants