Hi Haotian,
Thank you for your incredible work on this project.
I am encountering an issue during inference. When I use the non-LoRA weights for inference on ScienceQA, the speed is approximately 1 second per sample. However, when I switch to the LoRA fine-tuned model, the inference time drastically increases to over 40 seconds per sample.
Here is the command I am using for fine-tuning (trained on 1 V100 with lora_r=4, bf16=False, tf32=False):
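(The actual command is in the screenshot below; roughly, the settings above correspond to a PEFT-style LoRA configuration like the following sketch. Paths and the values not listed above, such as `lora_alpha`, dropout, and target modules, are assumptions, not taken from my setup.)

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Hypothetical base checkpoint path; the real one is in the screenshot below.
base = AutoModelForCausalLM.from_pretrained("path/to/base-checkpoint")

# Roughly the LoRA settings used above: rank 4, no bf16/tf32 on the V100.
lora_cfg = LoraConfig(
    r=4,
    lora_alpha=16,                         # assumed, not from my run
    lora_dropout=0.05,                     # assumed
    target_modules=["q_proj", "v_proj"],   # assumed attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()
```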
Here is the command I am using for inference:
Could you please help me understand why there is such a significant difference in inference speed between the two models?
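(One possibility I am aware of, included here for context: if the LoRA adapter is loaded on top of the base model but never merged, every forward pass computes the extra low-rank adapter matmuls in addition to the base weights, which can be much slower than running merged weights. A minimal sketch of merging with PEFT, with hypothetical paths, in case that is the relevant knob:)

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Hypothetical paths; the real ones are in the commands/screenshots in this issue.
base = AutoModelForCausalLM.from_pretrained("path/to/base-model")
model = PeftModel.from_pretrained(base, "path/to/lora-adapter")

# Fold the low-rank deltas into the base weights so inference runs a single
# dense matmul per layer instead of base + adapter matmuls.
merged = model.merge_and_unload()
merged.save_pretrained("path/to/merged-model")
```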
Thank you!
Screenshots:
adapter_config.json:
config.json: