Closed
Description
Motivation
S-LoRA: Serving Thousands of Concurrent LoRA Adapters [paper]
The paper claims that “S-LoRA can improve the throughput by up to 4 times and increase the number of served adapters by several orders of magnitude.”
Supporting multiple LoRA adapters could be transformative for cost-effective LoRA model serving.
Will you support this feature?
Related resources
https://github.com/S-LoRA/S-LoRA
Additional context
No response