
[Feature] support multiple lora adaptor #695

Closed
comeby opened this issue Nov 16, 2023 · 5 comments · Fixed by #894

comeby commented Nov 16, 2023

Motivation

S-LoRA: Serving Thousands of Concurrent LoRA Adapters [paper]
The paper claims that “S-LoRA can improve the throughput by up to 4 times and increase the number of served adapters by several orders of magnitude.”

Supporting multiple LoRA adapters could be crucial for cost-effective LoRA model serving.

Will you support this feature?

Related resources

https://github.com/S-LoRA/S-LoRA

Additional context

No response
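
As an illustration of the request, here is a rough sketch of how multi-adapter serving could look through lmdeploy's pipeline API. The adapter names and paths are placeholders, and the per-request adapter_name routing is an assumption about the eventual interface rather than a confirmed API.

from lmdeploy import pipeline, GenerationConfig, PytorchEngineConfig

# hypothetical adapter names mapped to unmerged LoRA adapter paths
backend_config = PytorchEngineConfig(
    adapters=dict(lora_a='path/to/adapter_a', lora_b='path/to/adapter_b'))
pipe = pipeline('path/to/base_model', backend_config=backend_config)

# each request selects an adapter by name (assumed routing parameter)
resp_a = pipe(['prompt for task A'], gen_config=GenerationConfig(adapter_name='lora_a'))
resp_b = pipe(['prompt for task B'], gen_config=GenerationConfig(adapter_name='lora_b'))
print(resp_a, resp_b)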

lvhan028 (Collaborator) commented

Hi @comeby,
Thanks for your interest in lmdeploy. We'll investigate it.

grimoire (Collaborator) commented Dec 6, 2023

Hi @comeby,
I am working on supporting S-LoRA on the pytorch-poc branch. Where can I get multiple unmerged adapters?

grimoire linked a pull request on Dec 27, 2023 that will close this issue

Cloopen-ReLiNK commented Jan 30, 2024

Inference results differ widely between the following two strategies:
(1) Merge the adapter, then load the newly saved model file: this gives the expected accuracy.
(2) Load the adapter through S-LoRA: the accuracy is much lower.

# import paths may vary with the lmdeploy version
from lmdeploy import GenerationConfig, PytorchEngineConfig
from lmdeploy.pytorch.engine import Engine

# strategy (2): load the base model with the unmerged adapter through the pytorch engine
adapters = {'default': 'xx/lora_test'}
engine_config = PytorchEngineConfig(adapters=adapters)
model_path = 'xxx/chatglm2-6b'

engine = Engine.from_pretrained(model_path,
                                engine_config=engine_config,
                                trust_remote_code=True)
gen_config = GenerationConfig(top_p=0.8,
                              top_k=40,
                              temperature=0,
                              max_new_tokens=1024)
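
For comparison, strategy (1) can be reproduced offline with peft before loading the merged checkpoint. This is a minimal sketch that reuses the placeholder paths above and assumes the adapter was trained with peft so that merge_and_unload applies:

import torch
from peft import PeftModel
from transformers import AutoModel, AutoTokenizer

base_path = 'xxx/chatglm2-6b'            # placeholder base model path from above
adapter_path = 'xx/lora_test'            # placeholder adapter path from above
merged_path = 'xxx/chatglm2-6b-merged'   # hypothetical output directory

# load the base model and attach the unmerged LoRA adapter
base = AutoModel.from_pretrained(base_path, trust_remote_code=True,
                                 torch_dtype=torch.float16)
model = PeftModel.from_pretrained(base, adapter_path)

# fold the LoRA weights into the base weights and save a standalone model
merged = model.merge_and_unload()
merged.save_pretrained(merged_path)
AutoTokenizer.from_pretrained(base_path, trust_remote_code=True).save_pretrained(merged_path)

The merged checkpoint can then be passed to the engine as a plain model_path without the adapters argument, which matches the setup that produced the expected accuracy.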

Cloopen-ReLiNK commented

https://modelscope.cn/models/walker31350430a/test_lora/files
Here are the adapter weights, along with test data (test.json) and prediction code (test.py).
@grimoire


Cloopen-ReLiNK commented Jan 30, 2024

Solved, see #1042.
But why is inference with the pytorch engine slower than predicting directly with transformers?
