[Proposal] Tiering Status API Model and Design #14989
Comments
Here are some more example use cases of the APIs:
Get All Ongoing Tierings:
Get All Failed Hot To Warm Tierings:
Get Shard Details for a Specific Index Tiering:
Thanks @e-emoto for sharing the proposal. It looks good overall. Just a few minor comments:
Thanks for the comments @harishbhakuni
We discussed it and decided that saying
We decided to use
We're trying to keep the not
The
I think this is a good suggestion, I'll update the name.
Thanks for your response @dblock
I'll check what we're doing for tiering states and try to make it consistent with that
I think this is a good point, we can change it to
We're using
I'll review the other fields too.
This sounds confusing to me. Let's see if I am on the same page here. According to this proposal the However, in _cat API the Perhaps the short sentence is just missing more context information thus I am confused :-)
One more detail, the proposal discusses two cases of a node receiving the REST API request: a) the node is a cluster manager, or b) the node is a data node. I think it is just a detail but if the receiving node is neither of these, for example if it is just a
Is your feature request related to a problem? Please describe
The Tiering Status API will list in-progress and failed index tierings. Since the Tiering project is still under development, the API should be extensible to cover new cases such as dedicated and non-dedicated warm node clusters. The design explanations here focus on the hot to warm case, but take future uses of the API into consideration.
Describe the solution you'd like
API Models:
The API will use a source and target as input to filter which tierings are shown. It will validate that both inputs are valid tiers, and then use them to find any tierings that match the described type. The API should still work if only one of the source or target is given, and will find any tierings with that input, allowing for more flexible queries. In the default case, if no source or target is given as input, the status API should return all in-progress or failed tierings for the specified indices, regardless of the tiering change. There will be two APIs for status: a GET API and a _cat API.
GET API:
API Request:
GET /<indexNameOrPattern>/_tiering?source=hot&target=warm
GET /<indexNameOrPattern>/_tiering?state=active
GET /<indexNameOrPattern>/_tiering?detailed=true/false
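As an illustration of how a client could assemble the request forms above, here is a minimal Python sketch. The function name `build_tiering_status_path` and the client-side validation are assumptions made for this example and are not part of the proposal; only the path shape and the `source`, `target`, `state`, and `detailed` parameter names and values come from the proposal.

```python
from typing import List, Optional, Tuple
from urllib.parse import urlencode

# Tier and state values listed in the proposal.
VALID_TIERS = {"hot", "warm"}
VALID_STATES = {"active", "failed"}


def build_tiering_status_path(
    index: str,
    source: Optional[str] = None,
    target: Optional[str] = None,
    state: Optional[str] = None,
    detailed: Optional[bool] = None,
) -> str:
    """Build a GET path for the proposed status API (hypothetical helper)."""
    # Reject unknown tiers early, mirroring the validation the proposal describes.
    for name, tier in (("source", source), ("target", target)):
        if tier is not None and tier not in VALID_TIERS:
            raise ValueError(f"invalid {name} tier: {tier!r}")
    if state is not None and state not in VALID_STATES:
        raise ValueError(f"invalid state: {state!r}")

    # Only parameters that were actually supplied end up in the query string.
    params: List[Tuple[str, str]] = []
    if source is not None:
        params.append(("source", source))
    if target is not None:
        params.append(("target", target))
    if state is not None:
        params.append(("state", state))
    if detailed is not None:
        params.append(("detailed", "true" if detailed else "false"))

    path = f"/{index}/_tiering"
    return f"{path}?{urlencode(params)}" if params else path
```

For example, `build_tiering_status_path("_all", source="hot", target="warm", state="failed")` would produce a path asking for all failed hot to warm tierings, assuming the parameters can be combined in one request.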
The GET API would have a few parameters. The index name in the path will be required, but can support using _all or * to get migrations from all indices that match the parameters. The API will also support comma separated index names.
API Parameters:
source = hot / warm (optional, no default)
target = hot / warm (optional, no default)
The values for the source and target parameters are the tiers, with source being the tier the index started in and target being the tier it is moving to.
state = failed / active (optional, no default)
The values of the state parameter represent the state of the tiering. failed indicates that the tiering has failed and active means the tiering process is in progress.
detailed = true / false (default false)
The detailed parameter determines whether the GET API response should include details like the shard relocation status and tiering start time.
local = true / false (optional, default false)
The local parameter determines where the request retrieves information from. If true, the information comes from a data node; if false, it comes from the master node.
API Response:
Success:
Failure:
_cat API:
API Request:
/_cat/tiering?source=hot&target=warm
/_cat/tiering?state=active
The _cat API would have some of the same parameters as GET, but would also have additional parameters for formatting and filtering the response columns.
API Parameters:
source = hot / warm (optional, no default)
target = hot / warm (optional, no default)
The values for the source and target parameters are the tiers, with source being the tier the index started in and target being the tier it is moving to.
state = failed / active (optional, no default)
The values of the state parameter represent the state of the tiering. failed indicates that the tiering has failed and active means the tiering process is in progress.
index = index1,index2,... (optional, default _all)
The index parameter is a comma separated list of index names used to filter the responses.
h = index,source,target,status,start_time,failure_time,duration,shards_total,shards_successful,shards_active,shards_failed (optional, no default)
The h parameter is only for the _cat API, and it would be used to filter which columns are shown in the response. If this parameter is not passed to the API call, then it will show just the index, source, target, state, and duration columns by default.
index - The index name
source - The tier the index starts in
target - The tier the index is moving to
state - The current state of the tiering
start_time - The timestamp of when the tiering started
duration - The duration of the tiering, if it failed
shards_total - The total number of shards
shards_successful - The number of shards that succeeded tiering
shards_active - The number of shards where tiering is still ongoing
shards_failed - The number of shards that failed tiering
v = true / false (optional, default false)
If the v parameter is true, the response will include the column labels as the first row of the response.
s = index,source,target,state,start_time,... (optional, no default)
The s parameter is a comma separated list of column names used to sort the rows in the response.
API Response:
Success:
Failure:
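Returning to the _cat formatting parameters described above, the following Python sketch shows one way the h, v, and s parameters could shape the output. It is not a proposed response format: the function `render_cat`, the use of a single space as the column separator, and the `-` placeholder for missing values are all assumptions made for illustration. Only the default column set and the meaning of h, v, and s come from the proposal.

```python
from typing import Dict, List, Optional

# Default columns named in the proposal when h is not supplied.
DEFAULT_COLUMNS = ["index", "source", "target", "state", "duration"]


def render_cat(
    rows: List[Dict[str, object]],
    h: Optional[str] = None,
    v: bool = False,
    s: Optional[str] = None,
) -> str:
    """Render tiering rows as plain text (hypothetical helper, no column padding)."""
    # h selects which columns appear; fall back to the proposal's defaults.
    columns = h.split(",") if h else DEFAULT_COLUMNS

    # s sorts rows by the listed columns, compared as strings for simplicity.
    if s:
        sort_keys = s.split(",")
        rows = sorted(rows, key=lambda r: tuple(str(r.get(k, "")) for k in sort_keys))

    lines = []
    # v adds the column labels as the first row.
    if v:
        lines.append(" ".join(columns))
    for row in rows:
        lines.append(" ".join(str(row.get(c, "-")) for c in columns))
    return "\n".join(lines)
```

A real _cat implementation would also align columns; that is left out here to keep the sketch short.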
Design: Get Tiering Metadata from Cluster State
Since both the status GET and _cat APIs contain mostly the same information but just present it in different formats with slightly different ways for the customer to interact with them, they can both evaluate the status and retrieve the information using the same design.
In this design, the tiering service would store some tiering metadata in the cluster state, and then when the status API is called it would use the tiering metadata to create its response. The migration status is stored in the index settings by the tiering service, while other information like the tiering start time is stored in the index metadata. The status API can use this information from the index settings and metadata to evaluate the tiering status when it is called. Since this information is in the cluster state, it would be relatively fast for the status API to access it. Also, because the cluster state is available from the master node and data nodes, the status API would be able to be called on either type of node.
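The lookup described above, combined with the source/target/state filtering from the API model, can be sketched in Python as follows. This is a simplified model and not OpenSearch code: the cluster state is represented as a plain dictionary, and the key names (`tiering.state`, `tiering.source`, `tiering.target`, `tiering.start_time`) are hypothetical, since the proposal does not name the actual settings or metadata fields.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

VALID_TIERS = {"hot", "warm"}

# Hypothetical key names, chosen only for this sketch.
STATE_KEY = "tiering.state"
SOURCE_KEY = "tiering.source"
TARGET_KEY = "tiering.target"
START_TIME_KEY = "tiering.start_time"


@dataclass
class TieringStatus:
    index: str
    source: str
    target: str
    state: str
    start_time: Optional[int]


def read_tierings(cluster_state: Dict[str, dict]) -> List[TieringStatus]:
    """Collect a status entry for every index that carries tiering settings."""
    statuses = []
    for index, entry in cluster_state.items():
        settings = entry.get("settings", {})
        if STATE_KEY not in settings:
            continue  # index has no ongoing or failed tiering
        metadata = entry.get("metadata", {})
        statuses.append(TieringStatus(
            index=index,
            source=settings[SOURCE_KEY],
            target=settings[TARGET_KEY],
            state=settings[STATE_KEY],          # migration status lives in index settings
            start_time=metadata.get(START_TIME_KEY),  # start time lives in index metadata
        ))
    return statuses


def filter_tierings(
    statuses: List[TieringStatus],
    source: Optional[str] = None,
    target: Optional[str] = None,
    state: Optional[str] = None,
) -> List[TieringStatus]:
    """Apply optional filters; a filter that is None matches everything."""
    for tier in (source, target):
        if tier is not None and tier not in VALID_TIERS:
            raise ValueError(f"invalid tier: {tier!r}")
    return [
        t for t in statuses
        if (source is None or t.source == source)
        and (target is None or t.target == target)
        and (state is None or t.state == state)
    ]
```

With no filters, `filter_tierings` returns every in-progress or failed tiering, which matches the default case described in the API model.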
In the dedicated warm node setup, we could also use the cluster state to check the shard status and determine the tiering progress. However, for the non-dedicated warm node setup, we would need to find another way to check the tiering progress. We could do so by communicating with other nodes through the transport layer to use a service on the data nodes that checks if shards are complete, in-progress, or failed when the status API is called. Then we could use that shard information to fill out details in the verbose response.
Another option that was considered for shard relocation status in the non-dedicated setup was storing the shard level data locality in the tiering metadata. However, this would require frequent cluster state updates to refresh the values of these fields. This would be very costly when accounting for all the shards across all indices that have ongoing tiering.
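However the per-shard status is obtained, it has to be folded into the shard counts that the detailed response and the _cat columns expose. A minimal Python sketch of that aggregation is shown below. The state labels `complete`, `in_progress`, and `failed` and the function `summarize_shards` are assumptions made for this example; the output keys reuse the shards_total, shards_successful, shards_active, and shards_failed column names from the _cat API.

```python
from collections import Counter
from typing import Dict, Iterable

# Hypothetical per-shard state labels, mapped to the proposal's column names.
STATE_TO_COLUMN = {
    "complete": "shards_successful",
    "in_progress": "shards_active",
    "failed": "shards_failed",
}


def summarize_shards(shard_states: Iterable[str]) -> Dict[str, int]:
    """Fold per-shard tiering states into the counts shown in the response."""
    counts = Counter(shard_states)
    unknown = set(counts) - set(STATE_TO_COLUMN)
    if unknown:
        raise ValueError(f"unknown shard states: {sorted(unknown)}")

    summary = {"shards_total": sum(counts.values())}
    for state, column in STATE_TO_COLUMN.items():
        summary[column] = counts[state]  # Counter returns 0 for absent states
    return summary
```

Computing these counts at request time avoids storing them in the cluster state, which is the cost concern raised for the alternative above.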
Order of Operations:
TIERED_REMOTE_INDEX is enabled
detailed:
Design: Get Tiering Metadata from Cluster State
Pros:
Cons:
Related component
Search:Remote Search
Describe alternatives you've considered
No response
Additional context
Related issues:
#14640
#14679
#13294