diff --git a/docs/en/stack/ml/nlp/ml-nlp-deploy-models.asciidoc b/docs/en/stack/ml/nlp/ml-nlp-deploy-models.asciidoc
index 9a68a4894..8abd484a4 100644
--- a/docs/en/stack/ml/nlp/ml-nlp-deploy-models.asciidoc
+++ b/docs/en/stack/ml/nlp/ml-nlp-deploy-models.asciidoc
@@ -186,10 +186,7 @@ NOTE: Since eland uses APIs to deploy the models, you cannot see the models in
 When you deploy the model, its allocations are distributed across available {ml}
 nodes. Model allocations are independent units of work for NLP tasks. To
 influence model performance, you can configure the number of allocations and the
-number of threads used by each allocation of your deployment. Alternatively, you
-can enable <<nlp-model-adaptive-allocations>> to automatically create and remove
-model allocations based on the current workload of the model (you still need to
-manually set the number of threads).
+number of threads used by each allocation of your deployment.
 
 IMPORTANT: If your deployed trained model has only one allocation, it's likely
 that you will experience downtime in the service your trained model performs.
@@ -214,16 +211,7 @@ You can view the allocation status in {kib} or by using the
 {ref}/get-trained-models-stats.html[get trained model stats API]. If you want to
 change the number of allocations, you can use the
 {ref}/update-trained-model-deployment.html[update trained model stats API] after
-the allocation status is `started`. You can also enable
-<<nlp-model-adaptive-allocations>> to automatically create and remove model
-allocations based on the current workload of the model.
-
-[discrete]
-[[nlp-model-adaptive-allocations]]
-=== Adaptive allocations
-
-include::ml-nlp-shared.asciidoc[tag=ml-nlp-adaptive-allocations]
-
+the allocation status is `started`.
 
 [discrete]
 [[infer-request-queues]]
diff --git a/docs/en/stack/ml/nlp/ml-nlp-e5.asciidoc b/docs/en/stack/ml/nlp/ml-nlp-e5.asciidoc
index 328732f6a..f1550f93a 100644
--- a/docs/en/stack/ml/nlp/ml-nlp-e5.asciidoc
+++ b/docs/en/stack/ml/nlp/ml-nlp-e5.asciidoc
@@ -269,12 +269,6 @@ Once it's uploaded to {es}, the model will have the ID specified by
 underscores `__`.
 
 --
-[discrete]
-[[e5-adaptive-allocations]]
-== Adaptive allocations
-
-include::ml-nlp-shared.asciidoc[tag=ml-nlp-adaptive-allocations]
-
 
 [discrete]
 [[terms-of-use-e5]]
diff --git a/docs/en/stack/ml/nlp/ml-nlp-elser.asciidoc b/docs/en/stack/ml/nlp/ml-nlp-elser.asciidoc
index 2b70d25b8..cf5c3022b 100644
--- a/docs/en/stack/ml/nlp/ml-nlp-elser.asciidoc
+++ b/docs/en/stack/ml/nlp/ml-nlp-elser.asciidoc
@@ -433,13 +433,6 @@ per document to ingest. To learn more about ELSER performance, refer to the
 <<performance>>.
 
 
-[discrete]
-[[elser-adaptive-allocations]]
-== Adaptive allocations
-
-include::ml-nlp-shared.asciidoc[tag=ml-nlp-adaptive-allocations]
-
-
 [discrete]
 [[further-readings]]
 == Further reading
diff --git a/docs/en/stack/ml/nlp/ml-nlp-shared.asciidoc b/docs/en/stack/ml/nlp/ml-nlp-shared.asciidoc
index 1e3948536..0568cda26 100644
--- a/docs/en/stack/ml/nlp/ml-nlp-shared.asciidoc
+++ b/docs/en/stack/ml/nlp/ml-nlp-shared.asciidoc
@@ -1,22 +1,3 @@
-tag::ml-nlp-adaptive-allocations[]
-The numbers of threads and allocations you can set manually for a model remain constant even when not all the available resources are fully used or when the load on the model requires more resources.
-Instead of setting the number of allocations manually, you can enable adaptive allocations to set the number of allocations based on the load on the process. This can help you to manage performance and cost more easily.
-When adaptive allocations are enabled, the number of allocations of the model is set automatically based on the current load.
-When the load is high, a new model allocation is automatically created.
-When the load is low, a model allocation is automatically removed.
-
-You can enable adaptive allocations by using:
-
-* the Create inference endpoint API for {ref}/infer-service-elser.html[ELSER], {ref}/infer-service-elasticsearch.html[E5 and models uploaded through Eland] that are used as {infer} services.
-* the {ref}/start-trained-model-deployment.html[start trained model deployment] or {ref}/update-trained-model-deployment.html[update trained model deployment] APIs for trained models that are deployed on {ml} nodes.
-
-If the new allocations fit on the current {ml} nodes, they are immediately started.
-If more resource capacity is needed for creating new model allocations, then your {ml} node will be scaled up if {ml} autoscaling is enabled to provide enough resources for the new allocation.
-The number of model allocations cannot be scaled down to less than 1.
-And they cannot be scaled up to more than 32 allocations, unless you explicitly set the maximum number of allocations to more.
-Adaptive allocations must be set up independently for each deployment and {infer} endpoint.
-end::ml-nlp-adaptive-allocations[]
-
 
 tag::nlp-eland-clone-docker-build[]
 You can use the {eland-docs}[Eland client] to install the {nlp} model. Use the prebuilt Docker image to run the Eland install model commands. Pull the latest image with:
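
With the adaptive allocations material removed, the pages above document only manual sizing. For quick reference, here is a minimal sketch of that manual flow against the start and update trained model deployment APIs that the remaining text links to. The model ID `.elser_model_2` and the allocation and thread counts are illustrative placeholders, not values taken from these docs:

[source,console]
----
// Start a deployment with an explicit number of allocations and
// threads per allocation (both values illustrative).
POST _ml/trained_models/.elser_model_2/deployment/_start?number_of_allocations=2&threads_per_allocation=4
----

Once the allocation status is `started`, the allocation count can be changed in place with the update trained model deployment API. Note that `threads_per_allocation` is fixed when the deployment starts; only the number of allocations can be updated afterwards:

[source,console]
----
// Scale the running deployment from 2 to 4 allocations (illustrative).
POST _ml/trained_models/.elser_model_2/deployment/_update
{
  "number_of_allocations": 4
}
----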