Pagination support for ListRules #6299

Merged
merged 3 commits into from
Nov 6, 2024

Conversation

Contributor

@rajagopalanand rajagopalanand commented Nov 1, 2024

What this PR does:

This PR adds pagination support to the List Rules API, similar to the Prometheus pagination feature for List Rules.

  • getShardedRules() calls getLocalRules()
  • getLocalRules() sorts the results (rules on the local ruler) by token
  • getLocalRules() then compares the input token to the calculated token for each rule group to generate the next page by calling generatePage
  • Finally, getShardedRules() merges the results from all getLocalRules() calls, deduplicates them, sorts them using the token as the key, and generates the next page
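The per-ruler steps above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `group` type and `localPage` function are hypothetical names, the token follows the SHA-1 of "namespace;group" quoted later in this review, and it assumes that groups at or before the supplied token are skipped and that the returned token is the token of the last group on the page.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
	"sort"
)

// group is a hypothetical stand-in for a rule group; only the fields used for the token are kept.
type group struct {
	Namespace string
	Name      string
}

// token hashes "namespace;group", following GetRuleGroupNextToken as quoted later in this review.
func token(g group) string {
	sum := sha1.Sum([]byte(g.Namespace + ";" + g.Name))
	return hex.EncodeToString(sum[:])
}

// localPage sketches the per-ruler step: sort by token, skip every group at or
// before nextToken, then cut the remainder to limit. The second return value is
// empty when no further page exists (an assumption of this sketch).
func localPage(groups []group, nextToken string, limit int) ([]group, string) {
	sorted := append([]group(nil), groups...)
	sort.Slice(sorted, func(i, j int) bool { return token(sorted[i]) < token(sorted[j]) })

	remaining := make([]group, 0, len(sorted))
	for _, g := range sorted {
		if nextToken != "" && nextToken >= token(g) {
			continue // already served on an earlier page
		}
		remaining = append(remaining, g)
	}
	if limit <= 0 || len(remaining) <= limit {
		return remaining, ""
	}
	page := remaining[:limit]
	return page, token(page[limit-1])
}

func main() {
	groups := []group{{"ns", "a"}, {"ns", "b"}, {"ns", "c"}, {"ns", "d"}, {"ns", "e"}}
	next := ""
	for {
		page, tok := localPage(groups, next, 2)
		fmt.Println(len(page), "groups, next token:", tok)
		if tok == "" {
			break
		}
		next = tok
	}
}
```

Because the order comes from the hash rather than from the names, pages are stable across requests but not alphabetical.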

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@rajagopalanand rajagopalanand force-pushed the paginated-list-rules branch 5 times, most recently from f51bdef to e710de2 Compare November 3, 2024 20:32
Signed-off-by: Anand Rajagopal <[email protected]>
@rajagopalanand rajagopalanand marked this pull request as ready for review November 3, 2024 21:11
@dosubot dosubot bot added the component/rules Bits & bobs todo with rules and alerts: the ruler, config service etc. label Nov 3, 2024
Contributor

@yeya24 yeya24 left a comment

Thanks, this looks good.

}
if filter.NextToken != "" {
	addQueryParams(urlValues, "group_next_token", filter.NextToken)
}
Contributor

Nit: What happens if we just send group_next_token=""? It should still be a non-paginated request, right?
Maybe we can just remove the condition check here.

Contributor

@rapphil rapphil Nov 4, 2024

To keep consistency with upstream, we should fail if group_next_token was provided but group_limit was not.

https://github.com/prometheus/prometheus/blob/main/web/api/v1/api.go#L1609
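The rule discussed here (reject a request that carries group_next_token without group_limit) could look roughly like the sketch below. `parsePagination` is a hypothetical helper, not the Cortex or Prometheus implementation, and the requirement that group_limit be a positive integer is an assumption of the sketch.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// parsePagination is a hypothetical helper that validates the raw values of the
// group_limit and group_next_token query parameters.
// A limit of 0 with an empty token means the request is not paginated.
func parsePagination(rawLimit, nextToken string) (int, string, error) {
	if rawLimit == "" {
		if nextToken != "" {
			return 0, "", errors.New("group_limit must be set when group_next_token is provided")
		}
		return 0, "", nil
	}
	limit, err := strconv.Atoi(rawLimit)
	if err != nil || limit <= 0 {
		// Assumption of this sketch: only positive integers are accepted.
		return 0, "", fmt.Errorf("group_limit must be a positive integer, got %q", rawLimit)
	}
	return limit, nextToken, nil
}

func main() {
	cases := [][2]string{{"", ""}, {"20", ""}, {"", "abc"}, {"20", "abc"}}
	for _, c := range cases {
		limit, token, err := parsePagination(c[0], c[1])
		fmt.Printf("limit=%q token=%q -> %d %q %v\n", c[0], c[1], limit, token, err)
	}
}
```

With validation done once at the API layer, a client helper can send the token unconditionally, which is the simplification suggested in the comment above.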

Contributor

Never mind, I see the validation in the API.

}

-func (r *Ruler) getLocalRules(userID string, rulesRequest RulesRequest, includeBackups bool) ([]*GroupStateDesc, error) {
+func (r *Ruler) getLocalRules(userID string, rulesRequest RulesRequest, includeBackups bool) (RulesResponse, error) {
Contributor

Nit: Why don't we return a pointer here, since a pointer is what GetRules asks for?

@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Nov 4, 2024
	MaxRuleGroup: 20,
},
resultCheckFn: func(t assert.TestingT, resultGroups []*ruler.RuleGroup, token string, iteration int) {
	assert.Len(t, resultGroups, 20, "Expected %d rules but got %d", 20, len(resultGroups))
Contributor

Nit: doesn't the error message from the assertion already tell us the expected versus the actual value?

}{
	"List Rule Groups - Equal number of rule groups per page": {
		filter: e2ecortex.RuleFilter{
			MaxRuleGroup: 20,
Contributor

Can we add a test that uses the next token?
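A test that exercises the next token would typically follow it until the server stops returning one. The sketch below shows that loop in isolation; `pageFetcher`, `collectAll`, and `fakeServer` are hypothetical names, and the fake server uses the last returned name as its token purely for illustration (the real token is a hash).

```go
package main

import "fmt"

// pageFetcher is a hypothetical stand-in for a client call that lists rule groups
// with a limit and a next token. It returns the page and the token for the next page.
type pageFetcher func(nextToken string) (groups []string, next string, err error)

// collectAll follows the next token until the server returns an empty one.
func collectAll(fetch pageFetcher) ([]string, int, error) {
	var all []string
	pages := 0
	token := ""
	for {
		groups, next, err := fetch(token)
		if err != nil {
			return nil, pages, err
		}
		all = append(all, groups...)
		pages++
		if next == "" {
			return all, pages, nil
		}
		token = next
	}
}

// fakeServer pages over a fixed, already ordered list of names.
func fakeServer(names []string, limit int) pageFetcher {
	return func(nextToken string) ([]string, string, error) {
		start := 0
		for i, n := range names {
			if nextToken != "" && n == nextToken {
				start = i + 1
				break
			}
		}
		end := start + limit
		if end >= len(names) {
			return names[start:], "", nil // last page: no token
		}
		return names[start:end], names[end-1], nil
	}
}

func main() {
	names := []string{"g1", "g2", "g3", "g4", "g5"}
	all, pages, err := collectAll(fakeServer(names, 2))
	fmt.Println(all, pages, err)
}
```

An integration test built this way can assert both the page count and that the union of all pages equals the full set of groups, with no duplicates.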


if maxRuleGroups > 0 {
	//Need to sort here before we truncate
	sort.Sort(GroupStateDescs(groups))
Contributor

I wonder if we should always sort the result.

We are already sorting the results in the prometheus list rules:

sort.Slice(groups, func(i, j int) bool {

So we could sort just once here instead for all the cases. This will reduce modality and simplify the code a bit. WDYT @yeya24 ?

Contributor

Also, where is the next token from the previous request being used here to generate the response starting from the rule group that we truncated?

Contributor Author

> I wonder if we should always sort the result.
>
> We are already sorting the results in the prometheus list rules:
>
> sort.Slice(groups, func(i, j int) bool {
>
> So we could sort just once here instead for all the cases. This will reduce modality and simplify the code a bit. WDYT @yeya24 ?

Without pagination, the list can be quite large and we would sort it twice. That is why I did not want to sort for non-paginated requests.

Contributor Author

> Also, where is the next token from the previous request being used here to generate the response starting from the rule group that we truncated?

Each response from getLocalRules has been filtered using the next token

Contributor

@rapphil rapphil Nov 5, 2024

> Each response from getLocalRules has been filtered using the next token

but what about the case you are using sharding?

My understanding is that in this case the ruler that receives the request will look into the ring for the other rulers that contain the rules and fire requests to each to collect from them. This ruler will then consolidate the list of rules in a single list before returning the response. I was expecting to see the pagination happen at this point because we need the complete list of rules to paginate properly.

My understanding is that it is not possible to create a distributed pagination algorithm because of the way we spread the rules across rulers. This is especially important if the ring is reorganized during requests.

Contributor Author

> > Each response from getLocalRules has been filtered using the next token
>
> but what about the case you are using sharding?

getShardedRules() calls getLocalRules(), so the results from each ruler are already filtered using the next token.

Contributor

OK, we discussed offline, and there was a misunderstanding on my part. I stand corrected.

Since we are truncating a sorted list, where the sorting is by sha(namespace + group), it is safe to use >= as the filter when we fan out the requests to get the rules from other rulers. This is like a distributed filter operation.

Having said that, it would not hurt to add a bit of description to the PR detailing how the implementation works.
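The "distributed filter" argument can be checked with a small sketch: every shard filters and cuts its own sorted list, and the coordinator merges, deduplicates, sorts, and cuts again. This is an illustration under stated assumptions, not the PR's code: tokens are plain strings standing in for the hashes, `filterAndCut` and `shardedPage` are hypothetical names, and each shard returns up to limit+1 entries so the coordinator can tell whether another page exists.

```go
package main

import (
	"fmt"
	"sort"
)

// filterAndCut is the per-shard step: keep tokens strictly after nextToken,
// sort them, and keep at most max entries.
func filterAndCut(tokens []string, nextToken string, max int) []string {
	out := make([]string, 0, len(tokens))
	for _, t := range tokens {
		if nextToken != "" && nextToken >= t {
			continue
		}
		out = append(out, t)
	}
	sort.Strings(out)
	if max > 0 && len(out) > max {
		out = out[:max]
	}
	return out
}

// shardedPage merges the per-shard results, drops duplicates (the same group may
// be reported by several shards), and applies the cut again on the merged list.
func shardedPage(shards [][]string, nextToken string, limit int) ([]string, string) {
	seen := map[string]bool{}
	var merged []string
	for _, shard := range shards {
		// Assumption of this sketch: ask each shard for one extra entry.
		for _, t := range filterAndCut(shard, nextToken, limit+1) {
			if !seen[t] {
				seen[t] = true
				merged = append(merged, t)
			}
		}
	}
	sort.Strings(merged)
	if limit > 0 && len(merged) > limit {
		return merged[:limit], merged[limit-1]
	}
	return merged, ""
}

func main() {
	shards := [][]string{{"c", "a", "e", "b"}, {"f", "b", "d", "e"}, {"g", "a"}}
	next := ""
	for {
		page, tok := shardedPage(shards, next, 3)
		fmt.Println(page, tok)
		if tok == "" {
			break
		}
		next = tok
	}
}
```

The merge is safe because any entry among the first limit+1 of the global order is also among the first limit+1 of the shard that holds it, so no shard can hide an entry that belongs on the current page.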

Comment on lines 16 to 22
func GetRuleGroupNextToken(namespace string, group string) string {
	h := sha1.New()
	h.Write([]byte(namespace + ";" + group))
	return hex.EncodeToString(h.Sum(nil))
}

func TruncateGroups(groups []*GroupStateDesc, maxRuleGroups int) ([]*GroupStateDesc, string) {
Contributor

do these functions need to be public?

Contributor

I'm talking only about TruncateGroups and GetRuleGroupNextToken

Contributor Author

TruncateGroups does not need to be public. I will fix it. GetRuleGroupNextToken needs to be public since it is used in integration testing

Contributor

I would actually move all the content of this file close to where it is used, since it is only used in one place. I don't feel strongly about this, but I think it would be simpler to understand.

Contributor

Another suggestion: instead of truncateGroups, we could name this function generatePage, createPage, makePage, or similar, to make it clear that it generates the page response with the token, etc. I think this function deserves some comments describing how the pagination works and stating that it expects a sorted list of groups. Just my two cents.

resultingGroupDescs := make([]*GroupStateDesc, 0, len(combinedRuleStateDescs))
for _, group := range combinedRuleStateDescs {
	groupID := GetRuleGroupNextToken(group.Group.Namespace, group.Group.Name)
	if len(rulesRequest.NextToken) > 0 && rulesRequest.NextToken >= groupID {
Contributor

There is no meaning in comparing two sha1() hashes using >=. We should use == only.

Contributor

Discussed offline: this is OK because we are assuming that combinedRuleStateDescs is sorted by sha.

Contributor

Please add a small comment here detailing why it is OK to compare groupIDs.
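One way to see why the ordered comparison is meaningful: hex-encoded SHA-1 tokens are lowercase strings of equal length, so string comparison gives a total order, and on a list sorted by that order the groups already served form a prefix. The sketch below makes the prefix explicit with a binary search; `resumeIndex` is a hypothetical helper, and the two-character tokens are shortened stand-ins for real 40-character tokens.

```go
package main

import (
	"fmt"
	"sort"
)

// resumeIndex returns the index of the first token strictly greater than nextToken.
// sortedTokens must be sorted ascending; everything before the returned index is
// treated as already served. An empty nextToken means "start from the beginning".
func resumeIndex(sortedTokens []string, nextToken string) int {
	if nextToken == "" {
		return 0
	}
	return sort.Search(len(sortedTokens), func(i int) bool {
		return sortedTokens[i] > nextToken
	})
}

func main() {
	// Shortened stand-ins for 40-character SHA-1 hex tokens.
	tokens := []string{"1a", "3c", "7f", "b2"}
	fmt.Println(resumeIndex(tokens, "3c")) // 2: resume right after the matching group
	fmt.Println(resumeIndex(tokens, "50")) // 2: the token's group is gone, resume at the next larger one
}
```

The second call shows a case an equality-only match could not handle: if the group behind the token is deleted between requests, there is nothing equal to find, while the ordered comparison still resumes at the right place.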

"encoding/hex"
)

type GroupStateDescs []*GroupStateDesc
Contributor

@rapphil rapphil Nov 5, 2024

Should this be called PaginedGroupStates or something like that, since it is only used in the pagination context?

Also, I don't think this type needs to be public, right?

Contributor Author

It now needs to be public because it is used in the integration tests.

Signed-off-by: Anand Rajagopal <[email protected]>
@@ -5,21 +5,24 @@ import (
"encoding/hex"
)

-type GroupStateDescs []*GroupStateDesc
+type PaginedGroupStates []*GroupStateDesc
Contributor

typo? PaginedGroupStates -> PaginatedGroupStates

Signed-off-by: Anand Rajagopal <[email protected]>
Contributor

@yeya24 yeya24 left a comment

Thanks

@yeya24 yeya24 merged commit 9acd3f7 into cortexproject:master Nov 6, 2024
16 checks passed
Labels
component/rules Bits & bobs todo with rules and alerts: the ruler, config service etc. lgtm This PR has been approved by a maintainer size/XL
3 participants