2.3.33 Satellite: OptiLLM
Handle: `optillm`
URL: http://localhost:34301

`optillm` is an OpenAI API-compatible optimizing inference proxy that implements several state-of-the-art techniques to improve the accuracy and performance of LLMs.
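Since the proxy speaks the OpenAI API, any OpenAI-compatible client can call it directly. A minimal sketch, assuming the standard `/v1/chat/completions` route and a hypothetical `llama3.1:8b` model served by the backend; per the upstream `optillm` docs, an approach can also be selected per request by prefixing the model name (for example, `moa-`):

```bash
# Hypothetical model name; the moa- prefix asks optillm to apply
# the Mixture-of-Agents approach for this request
curl http://localhost:34301/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "moa-llama3.1:8b",
    "messages": [{"role": "user", "content": "What is 2 + 2?"}]
  }'
```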
```bash
# [Optional] Pre-build the image
harbor build optillm

# Start the service [--tail is optional, to show logs]
harbor up optillm --tail
```
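To verify the endpoint is reachable, you can query the model listing route and follow the logs. A quick sketch; the `/v1/models` path is an assumption based on the OpenAI API convention:

```bash
# List the models visible through the proxy
curl http://localhost:34301/v1/models

# Follow the service logs
harbor logs optillm
```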
See the troubleshooting guide if you encounter any issues.
`optillm` serves the underlying models without any modifications, so upstream services (for example, `webui`) cannot distinguish them from the original models when connected to both `optillm` and the original inference backend. Additionally, its streaming implementation is not compatible with all frontends; see the compatibility section.
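When debugging such issues, a direct non-streaming request helps isolate whether a problem lies in the streaming path. A sketch, assuming the standard OpenAI-compatible `stream` field and a hypothetical model name:

```bash
# "stream": false bypasses the streaming code path entirely
curl http://localhost:34301/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.1:8b",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": false
  }'
```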
`optillm` is pre-configured (but not tested) to connect to the following services: `airllm`, `aphrodite`, `boost` (yes, you can chain optimizing proxies), `dify`, `ktransformers`, `litellm`, `llamacpp`, `mistralrs`, `nexa`, `ollama`, `omnichain`, `pipelines`, `sglang`, `tabbyapi`, `vllm`. See the example below for pairing it with one of these backends.
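For instance, to put `optillm` in front of `ollama`, start both services together; Harbor wires up the pre-configured connection automatically. A minimal sketch:

```bash
# Start the backend and the proxy in one command
harbor up ollama optillm
```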
See the official parameters reference. To set them, see Harbor's environment configuration guide.
```bash
# Example: see the OPTILLM_APPROACH env variable value
harbor env optillm OPTILLM_APPROACH

# Example: set the OPTILLM_APPROACH env variable
harbor env optillm OPTILLM_APPROACH mcts
```
Additionally, the following options can be set via `harbor config`:
```bash
# The port on the host where the OptiLLM endpoint will be available
OPTILLM_HOST_PORT    34301

# The path to the workspace directory on the host machine
# (relative to $(harbor home), but can be global as well)
OPTILLM_WORKSPACE    ./optillm/data
```
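These options can be read and changed with the `harbor config` CLI; a sketch, assuming Harbor's usual mapping of the `OPTILLM_HOST_PORT` option to the `optillm.host.port` config key:

```bash
# Check the current host port
harbor config get optillm.host.port

# Move the endpoint to a different port (applies on the next `harbor up`)
harbor config set optillm.host.port 34302
```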