diff --git a/llm/lorax/README.md b/llm/lorax/README.md
index cd9ffb171f7..2fe548c92a8 100644
--- a/llm/lorax/README.md
+++ b/llm/lorax/README.md
@@ -4,7 +4,7 @@

- LoRAX
+ LoRAX

[LoRAX](https://github.com/predibase/lorax) (LoRA eXchange) is a framework that allows users to serve thousands of fine-tuned LLMs on a single GPU, dramatically reducing the cost of serving without compromising throughput or latency. It works by dynamically loading multiple fine-tuned "adapters" (LoRAs, etc.) on top of a single base model at runtime. Concurrent requests for different adapters can be processed together in a single batch, allowing LoRAX to maintain near-linear throughput scaling as the number of adapters increases.
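
To make the batching idea above concrete, here is a minimal conceptual sketch in plain Python. It is not LoRAX's implementation or API: the names `BASE_W`, `ADAPTERS`, `matvec`, and `forward_batch` are hypothetical, and the tiny matrices are made up for illustration. It only shows how requests in one batch can share the base weights while each applies its own low-rank update.

```python
# Conceptual sketch only: plain-Python LoRA math, not LoRAX's actual kernels.

def matvec(x, M):
    """Multiply row vector x (length n) by matrix M (n x m); returns length m."""
    return [sum(x[i] * M[i][j] for i in range(len(x))) for j in range(len(M[0]))]

# One shared base weight (2 x 2), held once for every request.
BASE_W = [[1.0, 0.0],
          [0.0, 1.0]]

# Hypothetical adapters: each is a low-rank pair (A: 2 x 1, B: 1 x 2).
ADAPTERS = {
    "adapter-a": ([[1.0], [0.0]], [[0.0, 2.0]]),
    "adapter-b": ([[0.0], [1.0]], [[3.0, 0.0]]),
}

def forward_batch(batch):
    """batch: list of (input_vector, adapter_id or None) pairs."""
    outputs = []
    for x, adapter_id in batch:
        y = matvec(x, BASE_W)                # shared base-model path
        if adapter_id is not None:
            A, B = ADAPTERS[adapter_id]      # per-request adapter lookup
            delta = matvec(matvec(x, A), B)  # low-rank update: x @ A @ B
            y = [yi + di for yi, di in zip(y, delta)]
        outputs.append(y)
    return outputs

# Three requests in one batch: two different adapters and the bare base model.
batch = [([1.0, 1.0], "adapter-a"), ([1.0, 1.0], "adapter-b"), ([1.0, 1.0], None)]
results = forward_batch(batch)
print(results)
```

The sketch loops over requests one at a time for readability. A real serving system would presumably run the base-model computation for the whole batch at once and fuse the per-adapter updates on the GPU.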