[8.15] Removes adaptive allocations feature description from conceptual docs #2766

Merged 1 commit on Aug 2, 2024

16 changes: 2 additions & 14 deletions docs/en/stack/ml/nlp/ml-nlp-deploy-models.asciidoc
@@ -186,10 +186,7 @@ NOTE: Since eland uses APIs to deploy the models, you cannot see the models in
 When you deploy the model, its allocations are distributed across available {ml}
 nodes. Model allocations are independent units of work for NLP tasks. To
 influence model performance, you can configure the number of allocations and the
-number of threads used by each allocation of your deployment. Alternatively, you
-can enable <<nlp-model-adaptive-allocations>> to automatically create and remove
-model allocations based on the current workload of the model (you still need to
-manually set the number of threads).
+number of threads used by each allocation of your deployment.
 
 IMPORTANT: If your deployed trained model has only one allocation, it's likely
 that you will experience downtime in the service your trained model performs.
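
Note: the surviving text keeps only manual tuning of allocations and threads. For reviewers who want the concrete call this refers to, a minimal sketch against the start trained model deployment API; the model ID my-model and the numeric values are placeholders, not part of this PR:

// my-model is a placeholder; 2 allocations with 4 threads each
POST _ml/trained_models/my-model/deployment/_start?number_of_allocations=2&threads_per_allocation=4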
@@ -214,16 +211,7 @@ You can view the allocation status in {kib} or by using the
 {ref}/get-trained-models-stats.html[get trained model stats API]. If you want to
 change the number of allocations, you can use the
 {ref}/update-trained-model-deployment.html[update trained model deployment API] after
-the allocation status is `started`. You can also enable
-<<nlp-model-adaptive-allocations>> to automatically create and remove model
-allocations based on the current workload of the model.
-
-[discrete]
-[[nlp-model-adaptive-allocations]]
-=== Adaptive allocations
-
-include::ml-nlp-shared.asciidoc[tag=ml-nlp-adaptive-allocations]
-
+the allocation status is `started`.
 
 [discrete]
 [[infer-request-queues]]
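
Note: the retained sentence says the number of allocations can be changed once the allocation status is `started`. A minimal sketch of that update call, assuming a placeholder model ID my-model:

// my-model is a placeholder deployment ID
POST _ml/trained_models/my-model/deployment/_update
{
  "number_of_allocations": 4
}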
6 changes: 0 additions & 6 deletions docs/en/stack/ml/nlp/ml-nlp-e5.asciidoc
@@ -269,12 +269,6 @@ Once it's uploaded to {es}, the model will have the ID specified by
 underscores `__`.
 --
 
-[discrete]
-[[e5-adaptive-allocations]]
-== Adaptive allocations
-
-include::ml-nlp-shared.asciidoc[tag=ml-nlp-adaptive-allocations]
-
 
 [discrete]
 [[terms-of-use-e5]]
7 changes: 0 additions & 7 deletions docs/en/stack/ml/nlp/ml-nlp-elser.asciidoc
@@ -433,13 +433,6 @@ per document to ingest.
 To learn more about ELSER performance, refer to the <<elser-benchmarks>>.
 
 
-[discrete]
-[[elser-adaptive-allocations]]
-== Adaptive allocations
-
-include::ml-nlp-shared.asciidoc[tag=ml-nlp-adaptive-allocations]
-
-
 [discrete]
 [[further-readings]]
 == Further reading
19 changes: 0 additions & 19 deletions docs/en/stack/ml/nlp/ml-nlp-shared.asciidoc
@@ -1,22 +1,3 @@
-tag::ml-nlp-adaptive-allocations[]
-The numbers of threads and allocations you can set manually for a model remain constant even when not all the available resources are fully used or when the load on the model requires more resources.
-Instead of setting the number of allocations manually, you can enable adaptive allocations to set the number of allocations based on the load on the process. This can help you to manage performance and cost more easily.
-When adaptive allocations are enabled, the number of allocations of the model is set automatically based on the current load.
-When the load is high, a new model allocation is automatically created.
-When the load is low, a model allocation is automatically removed.
-
-You can enable adaptive allocations by using:
-
-* the Create inference endpoint API for {ref}/infer-service-elser.html[ELSER], {ref}/infer-service-elasticsearch.html[E5 and models uploaded through Eland] that are used as {infer} services.
-* the {ref}/start-trained-model-deployment.html[start trained model deployment] or {ref}/update-trained-model-deployment.html[update trained model deployment] APIs for trained models that are deployed on {ml} nodes.
-
-If the new allocations fit on the current {ml} nodes, they are immediately started.
-If more resource capacity is needed for creating new model allocations, then your {ml} node will be scaled up if {ml} autoscaling is enabled to provide enough resources for the new allocation.
-The number of model allocations cannot be scaled down to less than 1.
-And they cannot be scaled up to more than 32 allocations, unless you explicitly set the maximum number of allocations to more.
-Adaptive allocations must be set up independently for each deployment and {infer} endpoint.
-end::ml-nlp-adaptive-allocations[]
-
 tag::nlp-eland-clone-docker-build[]
 You can use the {eland-docs}[Eland client] to install the {nlp} model. Use the prebuilt
 Docker image to run the Eland install model commands. Pull the latest image with:
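
Note: the removed ml-nlp-adaptive-allocations tag pointed at two ways to enable adaptive allocations. As a sketch only, assuming the 8.15-era request shapes (the IDs my-elser-endpoint and my-model and all numeric limits are placeholders; verify the adaptive_allocations fields against the linked API docs rather than this note):

// ELSER used as an {infer} service; the endpoint ID is a placeholder
PUT _inference/sparse_embedding/my-elser-endpoint
{
  "service": "elser",
  "service_settings": {
    "adaptive_allocations": {
      "enabled": true,
      "min_number_of_allocations": 1,
      "max_number_of_allocations": 4
    },
    "num_threads": 1
  }
}

// trained model deployed on {ml} nodes; passing adaptive_allocations in the
// start request body is assumed from the removed text, not shown in this PR
POST _ml/trained_models/my-model/deployment/_start
{
  "adaptive_allocations": {
    "enabled": true,
    "min_number_of_allocations": 1,
    "max_number_of_allocations": 8
  }
}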