Replies: 2 comments 9 replies
-
See: https://github.com/vllm-project/vllm/blob/main/examples/lora_with_quantization_inference.py
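For reference, a minimal offline-inference sketch in the spirit of that example, assuming an already-quantized base model; the model ID, adapter name, and adapter path below are placeholders, not values from this thread:

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Load a pre-quantized base model (AWQ here) with LoRA support enabled.
# The model ID and the adapter path are placeholders.
llm = LLM(
    model="TheBloke/Llama-2-7B-Chat-AWQ",
    quantization="awq",
    enable_lora=True,
)

sampling = SamplingParams(temperature=0.0, max_tokens=64)

# The adapter is attached per request via LoRARequest(name, id, path).
outputs = llm.generate(
    ["Give me a short introduction to vLLM."],
    sampling,
    lora_request=LoRARequest("my_adapter", 1, "/path/to/lora_adapter"),
)

for out in outputs:
    print(out.outputs[0].text)
```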
9 replies
-
@tarukumar Feel free to open an issue at https://github.com/bd-iaas-us/vllm/issues and assign it to me. In that issue, please let us know why the existing Quantization + LoRA solution in vLLM does not suffice, as well as some models that need this feature. Thanks!
0 replies
-
What I have observed is that when I try to deploy the model using qlora_adapter_name_or_path for the QLoRA adapter, it fails to deploy with the error raised in https://github.com/vllm-project/vllm/blob/main/vllm/engine/arg_utils.py#L899-L911. To deploy a QLoRA adapter, should I use the --lora-modules or the adapter-cache parameter? What is the best approach here?
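For what it's worth, the route the replies above point at is the generic Quantization + LoRA path: start the OpenAI-compatible server with --enable-lora and register the adapter via --lora-modules name=/path/to/adapter, or do the equivalent in the offline API. A rough sketch under those assumptions (the model ID, adapter name, and adapter path are placeholders, and whether bitsandbytes quantization combined with LoRA is supported depends on the vLLM version):

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Sketch: load the base model with bitsandbytes (QLoRA-style) quantization
# and attach the adapter per request, rather than via a QLoRA-specific flag.
# The model ID and adapter path are placeholders.
llm = LLM(
    model="huggyllama/llama-7b",
    quantization="bitsandbytes",
    load_format="bitsandbytes",
    enable_lora=True,
)

outputs = llm.generate(
    ["Hello, my name is"],
    SamplingParams(temperature=0.0, max_tokens=32),
    lora_request=LoRARequest("qlora_adapter", 1, "/path/to/qlora_adapter"),
)
print(outputs[0].outputs[0].text)
```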