Reasoning
The endpoints in the reindexing module (_reindex, _update_by_query, _delete_by_query) can take minutes to return a response, depending on the request. This can cause the RestClient to hit a timeout if the configured socket timeout is too restrictive. In some environments, however, configuring the client's socket timeout is not enough. For example, the AWS OpenSearch Service sits behind a load balancer whose timeout is not configurable: https://repost.aws/knowledge-center/opensearch-http-504-gateway-timeout
Consider a migration that sends an expensive _update_by_query request which does not yield a response before the load balancer hits its timeout. The client receives a 504 response and marks the migration as failed, even though the cluster is still processing the request. On subsequent runs, ElasticSearch Evolution sends the request again, which can now fail due to version conflicts because the initial request has not yet completed, and this in turn fails the migration again. As a result, the migration can never be executed successfully if the request can't be processed fast enough.
This can be somewhat mitigated by adding the wait_for_completion=false parameter to the HTTP request, which causes an immediate response containing a task id while the request is processed asynchronously. However, this does not remove the possibility of conflicts when subsequent migrations update documents in the same index before the asynchronous request has finished processing.
A solution would be to poll the _tasks endpoint for the given task until it completes successfully, or to fail the migration if it completes with errors. The proposal is to add the possibility for ElasticSearch Evolution to do this automatically.
Proposal
If the feature is enabled and a migration's request targets one of the reindexing module endpoints (_update_by_query, _delete_by_query, _reindex) with the wait_for_completion=false parameter present, ElasticSearch Evolution polls the _tasks/<task_id> endpoint for the returned task id at a configurable interval. Polling continues until the endpoint returns a response indicating that the task has completed, or until a configurable timeout is reached. This could be configured with, for example, the following configuration properties:
spring.elasticsearch.evolution.await-task-completion which is false by default and can be used to enable this feature.
spring.elasticsearch.evolution.task-poll-interval which defines the poll interval in ms when polling the _tasks endpoint.
spring.elasticsearch.evolution.task-timeout which defines the timeout period in ms after which polling of the _tasks endpoint stops and the migration is considered failed.
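The interplay of the poll interval and the timeout can be sketched as a small loop. This is only an illustration under stated assumptions, not an existing ElasticSearch Evolution API: the class `TaskPoller`, the enum `PollResult`, and the `pollOnce` supplier (standing in for one call to `_tasks/<task_id>`) are hypothetical names.

```java
import java.time.Duration;
import java.util.function.Supplier;

// Hypothetical helper, not part of ElasticSearch Evolution.
final class TaskPoller {

    /** Result of a single poll of the _tasks endpoint (assumed type). */
    enum PollResult { RUNNING, SUCCEEDED, FAILED }

    /**
     * Polls until the task is no longer RUNNING or the timeout has elapsed.
     * Returns true only if the task succeeded within the timeout.
     * The task is always polled at least once.
     */
    static boolean awaitTask(Supplier<PollResult> pollOnce,
                             Duration pollInterval,
                             Duration timeout) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            PollResult result = pollOnce.get();
            if (result == PollResult.SUCCEEDED) {
                return true;
            }
            if (result == PollResult.FAILED) {
                return false;
            }
            // Overflow-safe check whether the deadline has passed.
            if (System.nanoTime() - deadline >= 0) {
                return false; // timeout reached: migration counts as failed
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return false;
            }
        }
    }
}
```

Passing the poll as a `Supplier` keeps the loop independent of the HTTP client, so the interval and timeout handling can be tested without a cluster.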
The _tasks endpoint does not state explicitly whether a task has completed successfully, so the following checks can be applied to its response:
the completed field is true; if not, wait until the next poll interval
the error field does not exist (it is added to the response if, for example, a Painless script that does not compile was provided); if it exists, mark the migration as failed
the response.failures field is empty (it could contain, for example, conflict exceptions); if it is not empty, mark the migration as failed
if the response satisfies all three conditions above, mark the migration as successful
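The checks above can be sketched as a single function over the parsed JSON body of the `_tasks/<task_id>` response. This is a minimal sketch: the class `TaskOutcome`, its `Status` enum, and the choice to represent the parsed body as a `Map<String, Object>` are assumptions made for illustration; only the field names `completed`, `error`, and `response.failures` come from the description above.

```java
import java.util.Collection;
import java.util.Map;

// Hypothetical helper, not part of ElasticSearch Evolution.
final class TaskOutcome {

    enum Status { RUNNING, SUCCEEDED, FAILED }

    /** Evaluates the parsed JSON body of a GET _tasks/<task_id> response. */
    static Status evaluate(Map<String, Object> task) {
        // 1. Not completed yet: keep polling.
        if (!Boolean.TRUE.equals(task.get("completed"))) {
            return Status.RUNNING;
        }
        // 2. A top-level "error" field means the task itself failed.
        if (task.containsKey("error")) {
            return Status.FAILED;
        }
        // 3. A non-empty "response.failures" list means some documents failed.
        Object response = task.get("response");
        if (response instanceof Map) {
            Object failures = ((Map<?, ?>) response).get("failures");
            if (failures instanceof Collection && !((Collection<?>) failures).isEmpty()) {
                return Status.FAILED;
            }
        }
        // All three conditions hold.
        return Status.SUCCEEDED;
    }
}
```

A missing `response` or `failures` field is treated as "no failures" here, which is a design choice of this sketch rather than something the proposal specifies.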
This is also why the feature is limited to the reindexing endpoints: the structure of the response field depends on the task action, so not all endpoints that support the wait_for_completion parameter produce tasks with the same structure. Some endpoints don't even return a task id when the wait_for_completion parameter is given. For example, the _tasks API itself waits for the matching tasks to complete before returning a response if this parameter is set to true.
To support this functionality for existing reindexing migrations that have already been applied to some environments but will be applied to others in the future, an additional configuration property can be added (spring.elasticsearch.evolution.use-tasks-by-default, false by default). When enabled, ElasticSearch Evolution automatically adds the wait_for_completion=false parameter to reindexing migrations when executing them, unless the migration sets the parameter explicitly (in which case its value, whether true or false, is left untouched). This removes the need to manually update the valid checksums in the history index if someone wants to use this feature for already existing migrations.
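A minimal sketch of how the parameter could be added at execution time, leaving the migration file (and therefore its checksum) unchanged. The class `ReindexRequestPaths` and its methods are hypothetical names for illustration, and the sketch assumes the request target is available as a plain path-plus-query string.

```java
import java.util.Set;

// Hypothetical helper, not part of ElasticSearch Evolution.
final class ReindexRequestPaths {

    private static final Set<String> REINDEX_ENDPOINTS =
            Set.of("_reindex", "_update_by_query", "_delete_by_query");

    /** True if any path segment is one of the reindexing module endpoints. */
    static boolean targetsReindexEndpoint(String pathAndQuery) {
        String path = pathAndQuery.split("\\?", 2)[0];
        for (String segment : path.split("/")) {
            if (REINDEX_ENDPOINTS.contains(segment)) {
                return true;
            }
        }
        return false;
    }

    /** True if wait_for_completion is set explicitly, with any value. */
    static boolean hasWaitForCompletion(String pathAndQuery) {
        int queryStart = pathAndQuery.indexOf('?');
        if (queryStart < 0) {
            return false;
        }
        for (String pair : pathAndQuery.substring(queryStart + 1).split("&")) {
            if (pair.split("=", 2)[0].equals("wait_for_completion")) {
                return true;
            }
        }
        return false;
    }

    /** Appends wait_for_completion=false only to reindexing requests that do not set it. */
    static String withAsyncDefault(String pathAndQuery) {
        if (!targetsReindexEndpoint(pathAndQuery) || hasWaitForCompletion(pathAndQuery)) {
            return pathAndQuery;
        }
        String separator = pathAndQuery.contains("?") ? "&" : "?";
        return pathAndQuery + separator + "wait_for_completion=false";
    }
}
```

For example, `withAsyncDefault("/my-index/_update_by_query")` yields `/my-index/_update_by_query?wait_for_completion=false`, while a request that already carries `wait_for_completion=true` is returned unchanged.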
If you guys accept the general idea and the implementation proposal, I would be happy to provide a PR for this :)