Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Store Gateway: Cache expanded postings for expensive matcher #8023

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

yeya24
Copy link
Contributor

@yeya24 yeya24 commented Dec 26, 2024

  • I added CHANGELOG entry for this change.
  • Change is not relevant to the end user.

Store Gateway Index Cache caches each posting individually. For example, pod="prometheus-0" is one cache key and pod="prometheus-1" can be another cache key. This is kind of problematic for those high cardinality labels for example, host, node or pod.

When there is a query matcher such as pod!="" , Store Gateway tries to get all values of label pod and fetches each value from the remote cache. This is ok for small set up. However, we have 100+ vertically sharded blocks per day where each block has close to 1M values for pod label. When a query tries to fetch 30 days of data with matcher pod!="", this ended up fetching 3B cache keys from remote cache. It is even worse when there is cache miss for postings, Store Gateway tries to backfill remote cache with postings data fetched from object store. This can be another 3B SET requests to the cache.

Lazy posting is a way to solve high cardinality matchers such as pod!="" and we have several attempts to improve that. However, the root cause of the problem is that Index Cache stores each label (posting) separately and it is not doing well against high cardinality labels.

Changes

This PR made several changes:

  • For each posting group (most of the cases this is a single label matcher in your query), if it matches more than 1 key, fetch expanded postings from cache for this group. For example, pod!="" usually matches more than 1 pod so it tries to fetch expanded postings matching pod!="".
  • If cache miss, Store Gateway fallbacks to fetch individual postings for pod from cache and object storage.
  • If Store Gateway downloads a posting from object store, if the posting's corresponding posting group matches more than 1 key, Store Gateway doesn't backfill remote cache, instead it tries to backfill the whole expanded posting for the whole posting group
  • For each posting group which fetches only a single key such as __name__="metric_name", it works the same as of today.

Basically we change from caching individual postings to caching the whole matcher.

now:
pod=A => [1,2,3,4]
pod=B => [5,6,7,8]
pod=C => [9, 10]

new:
pod!="" => [1,2,3,4,5,6,7,8,9,10]

Index Cache interface FetchExpandedPostings is updated to support multiple matchers. The previous one is designed to fetch only one set of matchers.

today: FetchExpandedPostings(ctx context.Context, blockID ulid.ULID, matchers []*labels.Matcher, tenant string) ([]byte, bool)

new: FetchExpandedPostings(ctx context.Context, blockID ulid.ULID, matchers [][]*labels.Matcher, tenant string) [][]byte

Verification

Tests updated.

@yeya24 yeya24 changed the title Store Gateway: Cache expanded postings for each matcher Store Gateway: Cache expanded postings for expensive matcher Dec 26, 2024
@yeya24 yeya24 force-pushed the expanded-posting-posting-group branch 3 times, most recently from c898baa to ff747fa Compare December 27, 2024 23:58
@yeya24 yeya24 force-pushed the expanded-posting-posting-group branch from ff747fa to 08d1160 Compare December 28, 2024 21:24
if !pg.lazy {
// If posting group has more than 1 key to fetch, fetch expanded postings first.
// This helps for matcher such as !="", =~".+" to avoid fetching too many keys from cache.
if len(pg.addKeys) > 1 || len(pg.removeKeys) > 1 {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can make the condition > a certain number instead of > 1

continue
}
// Cache miss, have additional keys to fetch.
keysLength += len(postingGroups[pgIdx].addKeys) + len(postingGroups[pgIdx].removeKeys)
Copy link
Contributor Author

@yeya24 yeya24 Dec 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case expanded posting cache miss, we still try to fetch each posting individually from index cache and then object store.

I wonder if we should just skip cache and fetch from object store directly. It makes no sense to fetch postings from cache if we don't store cache there

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant