attention mechanism toggle added #2384

Open
Aaryanverma wants to merge 1 commit into base: main

Conversation

Aaryanverma

Added an option to toggle the attention mechanism while loading the model, in case users do not want to use flash attention (or similar) on older GPUs.

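For context, here is a purely hypothetical sketch of the kind of load-time switch the description refers to. The `LoadOptions` type, the `use_flash_attention` parameter, and the loader below are illustrative stand-ins only; they are not names from the TensorRT-LLM codebase and not the actual change made in this PR.

```python
# Hypothetical illustration of a load-time attention toggle; none of these names
# come from the TensorRT-LLM codebase or from this PR's diff.
from dataclasses import dataclass
from typing import Optional


@dataclass
class LoadOptions:
    # False selects a plain (unfused) attention path, useful on older GPUs
    # that lack flash-attention support.
    use_flash_attention: bool = True


def load_model(checkpoint_dir: str, options: Optional[LoadOptions] = None) -> None:
    options = options or LoadOptions()
    backend = "flash attention" if options.use_flash_attention else "standard attention"
    print(f"Loading {checkpoint_dir} with {backend}")
    # ... real model construction would branch on options.use_flash_attention here ...


load_model("./ckpt", LoadOptions(use_flash_attention=False))
```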
@hello-11 added the triaged and functionality issue labels on Oct 30, 2024
@nv-guomingz
Collaborator

Hi @Aaryanverma, thanks for your contribution to the TRT-LLM project.

If you want to disable flash attention, I think another workaround is to set context_fmha to false when building the engine.
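For readers landing here later, below is a minimal sketch of the build-time workaround described above: disabling fused (flash-style) context attention when the engine is built rather than when the model is loaded. The BuildConfig / plugin_config.context_fmha attribute names and the trtllm-build flag spelling reflect TensorRT-LLM releases from around the time of this PR and may differ in your version, so treat them as assumptions to verify against your installed release.

```python
# Sketch only: disable context FMHA (fused/flash attention for the context phase)
# at engine-build time. Assumes a TensorRT-LLM release where PluginConfig exposes
# `context_fmha` as a boolean; other releases may use a different name or an enum.
import tensorrt_llm
from tensorrt_llm import BuildConfig

build_config = BuildConfig()
build_config.plugin_config.context_fmha = False  # fall back to unfused attention kernels

# `model` would be a tensorrt_llm model created from a converted checkpoint, e.g.
# tensorrt_llm.models.LLaMAForCausalLM.from_checkpoint("./ckpt"); then build as usual:
# engine = tensorrt_llm.build(model, build_config)

# Rough CLI equivalent (flag spelling varies by release):
#   trtllm-build --checkpoint_dir ./ckpt --output_dir ./engine --context_fmha disable
```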

github-actions (bot)

PR has not received an update in over 14 days. Adding stale label.

@github-actions (bot) added the stale label on Jan 28, 2025