LlamaForCausalLM.forward() got an unexpected keyword argument 'use_flash_attention' #760
Comments
Please share your script and the command to run it here; that makes investigation much easier and you'll get a solution much faster.
I'm not calling that function in my script. I was following the example at https://github.com/huggingface/optimum-habana/blob/main/examples/language-modeling/run_lora_clm.py to enable flash attention. Here is my train script:
Thanks. What is the command you use to run this script?
Here is the command:
I cannot reproduce it; it works on my side.
Thanks for your support. For some reason, I'm able to run the script without any issues now. I have another question: does this flash attention have the same effect as the official implementation? I was able to run full-parameter training of a 634M-parameter Llama model with a per-device batch size of 10 on an NVIDIA A100 80 GB, but here I can only go up to a per-device batch size of 8.
Are you using Gaudi1 or Gaudi2?
I'm using Gaudi2 |
Can you share the logs of your run, please?
System Info
Information

Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction
I'm trying to enable flash attention by setting model.generation_config.use_flash_attention = True, but I'm getting the error below.
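For reference, a minimal sketch of the setting described above, assuming optimum-habana's Gaudi adaptation is applied first; it is not the full training script, and the model name is only a placeholder, not taken from this report:

```python
# Minimal sketch (assumption, not the original script): enabling the
# Habana flash-attention flag on a Llama model with optimum-habana.
from transformers import AutoModelForCausalLM
from optimum.habana.transformers.modeling_utils import adapt_transformers_to_gaudi

# Swap in the Gaudi-optimized model implementations. Without this step the
# stock transformers LlamaForCausalLM.forward() does not accept the extra
# use_flash_attention keyword, which matches the error reported in this issue.
adapt_transformers_to_gaudi()

# Placeholder model name for illustration only.
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

# Flag intended to be read by optimum-habana's Gaudi attention code paths.
model.generation_config.use_flash_attention = True
```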
Expected behavior
Expected to be able to use flash attention without any issues.