llama finetune.py throws pytorch tensor datatype error with 4 bit quantization #675
Comments
Hi @AAndersn thanks for reporting. I was not able to repro this so far, but I will give it another try later today. You're right about the `int4`; this is a leftover from a back-and-forth while we created the PR for QLoRA. Would you be interested in creating a PR to fix this?
@mreso Happy to make a PR to update the docs. I'll also try rolling back to an older version of PyTorch and update this issue tomorrow to see if that fixes it.
The problem appears to be an issue with …

I rebuilt my environment today with llama-recipes 0.0.4 and transformers 4.45.0 and am able to run this snippet successfully:

```python
import torch
from transformers import BitsAndBytesConfig, AutoModel

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_storage=torch.bfloat16,
)

model = AutoModel.from_pretrained(
    "meta-llama/Meta-Llama-3.1-8B",
    quantization_config=bnb_config,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
```

However, if I copy and paste this exact snippet into finetuning.py, the AutoModel call fails with the same message.
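As a quick sanity check (not part of the original snippet, just a suggested diagnostic): listing the distinct parameter dtypes after loading shows whether the packed 4-bit weights actually picked up the bfloat16 storage dtype. FSDP flattens parameters into units that must share a single dtype, so a stray float32 or uint8 parameter here is a plausible source of the mismatch error.

```python
# Diagnostic sketch: count parameter dtypes after the 4-bit load above.
# With bnb_4bit_quant_storage=torch.bfloat16, the packed 4-bit weights
# should report torch.bfloat16; the bitsandbytes default storage dtype
# is uint8, which FSDP cannot flatten together with bf16 parameters.
from collections import Counter

dtype_counts = Counter(str(p.dtype) for p in model.parameters())
print(dtype_counts)
```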
Hi @AAndersn! Thanks for reporting this. I am wondering if changing …
@wukaixingxp - Thank you so much! Changing … If that works, I will install the pytest suite and then update #681 to include this fix.
I tried your command with …
pip reported a conflict with … I was able to run 8B with 4-bit quantization with …
@wukaixingxp - I see you have made that update in https://github.com/meta-llama/llama-recipes/blob/main/src/llama_recipes/finetuning.py#L139, so I will close this issue as fixed by #686. Thanks so much for your help!
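For readers landing on this issue later: the linked change isn't reproduced here, but the general requirement when combining FSDP with QLoRA is that `bnb_4bit_quant_storage`, `bnb_4bit_compute_dtype`, and the `torch_dtype` passed to `from_pretrained` agree, so that quantized and non-quantized parameters can be flattened into the same FSDP units. A minimal sketch of that alignment, using the transformers/bitsandbytes parameter names, is below; it is an illustration of the principle, not the actual diff in #686.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Illustrative only: keep the three dtypes below identical so FSDP can
# wrap 4-bit (QLoRA) and full-precision parameters together.
compute_dtype = torch.bfloat16

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=compute_dtype,   # dtype used for matmuls
    bnb_4bit_quant_storage=compute_dtype,   # dtype the packed 4-bit weights are stored in
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3.1-8B",         # model name taken from the snippet above
    quantization_config=bnb_config,
    torch_dtype=compute_dtype,              # dtype of the non-quantized modules
)
```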
System Info
PyTorch 2.4.0, CUDA 12.1, CentOS HPC cluster with 7x H100 GPUs
Information
🐛 Describe the bug
Error logs
This error message is then repeated by each separate GPU process, followed by …
If the command is run without the `FSDP_CPU_RAM_EFFICIENT_LOADING=1 ACCELERATE_USE_FSDP=1` header, then it throws a different error: …

Expected behavior
This call and dataset work fine for llama3.1-8B without quantization, but fail with 4-bit quantization. The `int4` parameter specified in https://github.com/meta-llama/llama-recipes/blob/main/recipes/quickstart/finetuning/multigpu_finetuning.md#with-fsdp--qlora does not exist.
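For context on the naming point (this block is an added illustration, not taken from the recipe): the 4-bit path in transformers is configured through `BitsAndBytesConfig`, where the quantization type values are "nf4" and "fp4"; there is no "int4" option, which is consistent with the documented value in the linked command not existing.

```python
from transformers import BitsAndBytesConfig

# The 4-bit knob in bitsandbytes is a quant *type*, not "int4".
# Valid values for bnb_4bit_quant_type are "nf4" (normal float 4) and "fp4".
nf4_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4")
fp4_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="fp4")
```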