
unsloth 4bit models do not load in vLLM - says missing adapter path or name #688

Open
jonberliner opened this issue Jun 24, 2024 · 9 comments
Labels: currently fixing (Am fixing now!)

Comments

@jonberliner

When I try to load an unsloth 4bit model with
llm = LLM("unsloth/mistral-7b-instruct-v0.3-bnb-4bit", dtype="half"),
I get the error
Cannot find any of ['adapter_name_or_path'] in the model's quantization config.

This is true for all Llama 3 and Gemma models as well. As far as I know, there are no LoRA adapters attached to these models. Please let me know how to proceed with loading them.
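For reference, a minimal repro sketch of that call (it assumes only that vLLM is installed; the error text is copied from the report above):

```python
# Minimal repro sketch: loading the pre-quantized bnb-4bit repo directly in vLLM.
from vllm import LLM

llm = LLM("unsloth/mistral-7b-instruct-v0.3-bnb-4bit", dtype="half")
# Fails with: Cannot find any of ['adapter_name_or_path'] in the model's quantization config.
```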

@jonberliner changed the title from "unsloth base models do not load - says missing adapter path or name" to "unsloth base models do not load in vLLM - says missing adapter path or name" on Jun 24, 2024
@jonberliner changed the title from "unsloth base models do not load in vLLM - says missing adapter path or name" to "unsloth 4bit models do not load in vLLM - says missing adapter path or name" on Jun 24, 2024
@hruday-markonda

I am also getting this error, hope a fix comes soon.

@danielhanchen
Contributor

Apologies for the late reply! My bro and I relocated to SF, so I just got back to GitHub issues!

On vLLM, you can't use the bnb-4bit variants - you need to use model.save_pretrained_merged and save to 16-bit for inference, i.e. only full 16-bit models work with vLLM.
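For clarity, a rough sketch of that workflow; it assumes the save_pretrained_merged API from the Unsloth docs, and "merged_16bit_model" is just a placeholder output directory:

```python
# Sketch only: merge fine-tuned weights to full 16-bit with Unsloth, then serve them with vLLM.
from unsloth import FastLanguageModel
from vllm import LLM

# Load (and fine-tune) as usual with Unsloth.
model, tokenizer = FastLanguageModel.from_pretrained(
    "unsloth/mistral-7b-instruct-v0.3-bnb-4bit",
    load_in_4bit=True,
)
# ... training steps omitted ...

# Save merged full 16-bit weights ("merged_16bit_model" is a placeholder path).
model.save_pretrained_merged("merged_16bit_model", tokenizer, save_method="merged_16bit")

# vLLM then loads the merged 16-bit folder, not the bnb-4bit repo.
llm = LLM("merged_16bit_model", dtype="half")
```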

@nole69

nole69 commented Jul 1, 2024

> Apologies for the late reply! My bro and I relocated to SF, so I just got back to GitHub issues!
>
> On vLLM, you can't use the bnb-4bit variants - you need to use model.save_pretrained_merged and save to 16-bit for inference, i.e. only full 16-bit models work with vLLM.

Since vLLM v0.5.0 has been released, vLLM now supports bnb quantization. Would it be possible for models fine-tuned and quantized with Unsloth to be served with vLLM given the new release?

@danielhanchen
Contributor

@nole69 Are you referring to vllm-project/vllm#4776? I think that covers only QLoRA adapters, not full bnb models. You could try exporting the LoRA adapters and then using vLLM, I guess.
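A rough sketch of that approach, assuming vLLM's LoRA support (enable_lora / LoRARequest); the base-model name, adapter path, and adapter name below are placeholders:

```python
# Sketch only: save just the LoRA adapter, then attach it to a 16-bit base model in vLLM.
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# `model` and `tokenizer` are assumed to come from an Unsloth fine-tuning run;
# save only the adapter weights ("lora_adapter" is a placeholder path).
model.save_pretrained("lora_adapter")
tokenizer.save_pretrained("lora_adapter")

# Serve the unquantized 16-bit base model with LoRA enabled.
llm = LLM("mistralai/Mistral-7B-Instruct-v0.3", dtype="half", enable_lora=True)
outputs = llm.generate(
    ["Hello!"],
    SamplingParams(max_tokens=64),
    lora_request=LoRARequest("my_adapter", 1, "lora_adapter"),
)
```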

@odulcy-mindee

@danielhanchen Indeed, I think full bnb models will be supported after vllm-project/vllm#5753 is merged

@fengyunflya

I also ran into this problem, and I used QLoRA. How do I fix it?

@danielhanchen
Contributor

@fengyunflya Sorry, vLLM currently doesn't load bitsandbytes models - I'll try to add some code to export directly for vLLM.

@danielhanchen added the "currently fixing" (Am fixing now!) label on Jul 26, 2024
@YorickdeJong

Would also love to have this feature; I'm currently hitting the same problem with bnb-4bit models not being able to load in vLLM.

@odulcy-mindee

bnb models now work on vLLM's main branch when enforce_eager=True, but enforce_eager=False is not supported yet.
See vllm-project/vllm#7294
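A sketch of what that looks like, assuming vLLM's bitsandbytes support on main (the exact flags may vary between versions):

```python
# Sketch only: load a pre-quantized bitsandbytes checkpoint with eager mode forced on.
from vllm import LLM

llm = LLM(
    "unsloth/mistral-7b-instruct-v0.3-bnb-4bit",
    quantization="bitsandbytes",
    load_format="bitsandbytes",
    enforce_eager=True,  # per the linked issue, CUDA graph mode (enforce_eager=False) is not supported yet
)
```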
