docs(ai): add external containers docs (#687)
* docs(ai): add external container manager docs

This commit adds documentation for the external container manager
feature now that it has been thoroughly tested and some issues have been
fixed.

---------

Co-authored-by: 0xb79 <[email protected]>
rickstaa and ad-astra-video authored Nov 14, 2024
1 parent 40e3f58 commit c8166a7
53 changes: 51 additions & 2 deletions ai/orchestrators/models-config.mdx
@@ -40,7 +40,10 @@ currently **recommended** models and their respective prices.
{
"pipeline": "audio-to-text",
"model_id": "openai/whisper-large-v3",
"price_per_unit": 12882811
"price_per_unit": 12882811,
"url": "<CONTAINER_URL>:<PORT>",
"token": "<OPTIONAL_BEARER_TOKEN>",
"capacity": 1
},
{
"pipeline": "segment-anything-2",
@@ -65,7 +68,7 @@ currently **recommended** models and their respective prices.
"model_id": "parler-tts/parler-tts-large-v1",
"price_per_unit": 11,
"pixels_per_unit": 1e2,
"currency": "USD",
"currency": "USD"
}
]
```
@@ -93,6 +96,18 @@ currently **recommended** models and their respective prices.
<ParamField path="optimization_flags" type="object">
Optional flags to enhance performance (details below).
</ParamField>
<ParamField path="url" type="string" optional="true">
Optional URL and port where the model container or custom container manager software is running.
[See External Containers](#external-containers)
</ParamField>
<ParamField path="token" type="string">
Optional token required to interact with the model container or custom container manager software.
[See External Containers](#external-containers)
</ParamField>
<ParamField path="capacity" type="integer">
Optional capacity of the model, i.e. the maximum number of inference requests the model can handle concurrently. Defaults to `1`.
[See External Containers](#external-containers)
</ParamField>
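
Putting these fields together, a sketch of a model entry configured for an external container (the URL, token, and capacity values below are placeholders, not recommendations):

```json
{
  "pipeline": "audio-to-text",
  "model_id": "openai/whisper-large-v3",
  "price_per_unit": 12882811,
  "warm": true,
  "url": "https://my-container-manager.example.com:8000",
  "token": "my-secret-bearer-token",
  "capacity": 2
}
```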

### Optimization Flags

@@ -134,3 +149,37 @@ are available:
loss**. The speedup becomes more pronounced as the number of inference steps
increases. Cannot be used simultaneously with `SFAST`.
</ParamField>

### External Containers

<Warning>
This feature is intended for advanced users. Incorrect setup can lead to a
lower orchestrator score and reduced fees. If external containers are used,
it is the Orchestrator's responsibility to ensure the correct container with
the correct endpoints is running behind the specified `url`.
</Warning>

An external container can be anything from a single extra model container stacked on top of the managed model containers to an auto-scaling GPU cluster behind a load balancer. Orchestrators can use external containers to extend the set of models served, or to fully replace the managed model containers that the AI Worker otherwise starts and stops via the [Docker client Go library](https://pkg.go.dev/github.com/docker/docker/client) based on the models specified at startup.

External containers are enabled by specifying the `url`, `capacity`, and `token` fields in the model configuration. The only requirement is that the specified `url` responds to the AI Worker exactly as a managed container would (including HTTP error codes). As long as the container management software acts as a pass-through to the model container, any tooling can be used to manage the runner container lifecycle based on request volume, including [Kubernetes](https://kubernetes.io/), [Podman](https://podman.io/), [Docker Swarm](https://docs.docker.com/engine/swarm/), [Nomad](https://www.nomadproject.io/), or custom scripts.

- The `url` is used at AI Worker startup to confirm that a model container is running, via the `/health` endpoint. After startup, inference requests are forwarded to the `url` just as they are to managed containers.
- The `capacity` should be set to the maximum number of requests that can be processed concurrently for the pipeline/model ID (default is `1`). If auto-scaling containers, take care that startup time is fast when setting `warm: true`, because slow response times will negatively impact your selection by Gateways for future requests.
- The `token` field secures the model container `url` against unauthorized access; using it is strongly suggested if the containers are exposed to external networks.

We welcome feedback to improve this feature, so please reach out to us if you have suggestions for a better experience running external containers.
