Bloom Gateway: Implement chunk filtering using workers that multiplex requests #11181
Conversation
Trivy scan found the following vulnerabilities:
}

// convertToShortRefs converts a v1.ChunkRefs into []*logproto.ShortRef
// TODO(chaudum): Avoid conversion by transferring v1.ChunkRefs in gRPC request.
Are you planning to do this in a follow-up PR, or did you forget about it?
Should be simple to do by using gogoproto's nullable=false and customtype=v1.ChunkRefs.
Here's a similar example:
loki/pkg/logproto/logproto.proto
Lines 329 to 332 in b49b3ce
repeated LabelPair labels = 1 [
  (gogoproto.nullable) = false,
  (gogoproto.customtype) = "LabelAdapter"
];
And the generated code:
loki/pkg/logproto/logproto.pb.go
Line 1879 in b49b3ce
Labels []LabelAdapter `protobuf:"bytes,1,rep,name=labels,proto3,customtype=LabelAdapter" json:"labels"`
Alternatively, as long as you use nullable=false to avoid having a slice of pointers, and v1.ChunkRefs has the same mem layout as logproto.ShortRef, you can do a cast like:
Lines 49 to 51 in aedf6df
func FromLabelAdaptersToLabels(ls []LabelAdapter) labels.Labels {
	return *(*labels.Labels)(unsafe.Pointer(&ls))
}
> Are you planning to do this in a follow-up PR, or did you forget about it?
I planned to do this once the datastructures are fully settled. The request format may still change.
sounds good to me 👍. We can also just specify protos in the v1 package directly and import them elsewhere so we don't have to do special casting -- just use them directly
pkg/bloomgateway/bloomgateway.go
Outdated
hasNext := it.Next()
for _, bq := range bqs {
	requests = requests[:0]
	for hasNext && it.At().Fp <= bq.MaxFp {
nit: I think this would read better as:
for it.Next() && it.At().Fp <= bq.MaxFp {
...
}
It would read better, but it would not work the way you think: when you call it.Next(), the iterator proceeds to the next item. However, if the condition it.At().Fp <= bq.MaxFp does not match, the loop is exited. Then the loop is started for the next bq again, and first it.Next() is called, proceeding to the next item and therefore skipping one item.
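For illustration, a sketch of why hasNext is carried across iterations (the loop body here is assumed, not quoted from the PR): the element that fails the bounds check for one bq must still be considered for the next bq, so the iterator is only advanced from inside the inner loop.

```go
hasNext := it.Next()
for _, bq := range bqs {
	requests = requests[:0]
	// Start from the item that terminated the previous inner loop instead of
	// advancing past it with another it.Next() call.
	for hasNext && it.At().Fp <= bq.MaxFp {
		requests = append(requests, it.At())
		hasNext = it.Next()
	}
	// ... process requests for this bq ...
}
```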
Oh I see. Thanks for the clarification. I'd probably add a comment explaining that.
This is looking really good.
I need to refactor the bloom querying to return the list of chunks which can be removed rather than the ones which need to be queried in storage. The reasoning for this is that we can merge a removal list of chunks across bloom blocks by unioning them, but we can't do the opposite.
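A toy illustration of why removal lists compose (hypothetical ChunkKey type, not the PR's code): any chunk that some block can prove absent is safe to drop, so per-block removal lists simply union together, whereas per-block "still needs querying" lists cannot be merged without consulting every block again.

```go
// ChunkKey is a hypothetical comparable identity for a chunk.
type ChunkKey struct {
	From, Through int64
	Checksum      uint32
}

// unionRemovals merges the removal lists produced by several bloom blocks.
func unionRemovals(lists ...[]ChunkKey) map[ChunkKey]struct{} {
	removed := make(map[ChunkKey]struct{})
	for _, list := range lists {
		for _, c := range list {
			removed[c] = struct{}{}
		}
	}
	return removed
}
```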
pkg/bloomgateway/bloomgateway.go
Outdated
for _, ref := range req.Refs {
	if ref.Tenant != tenantID {
		return nil, errors.Wrapf(errInvalidTenant, "expected chunk refs from tenant %s, got tenant %s", tenantID, ref.Tenant)
	}
// Sort ShortRefs by From time in ascending order
WDYT about reusing logproto.ChunkRef instead? It's more verbose, but also more consistent. Ultimately, I'd like to refactor everything to use a more idiomatic & efficient repr like you've included in GroupedChunkRefs, but I think consistency makes more sense right now.
We should also assume these chunkrefs are already sorted -- no need to sort them again here. The index gateway should take care of that (I think it does already since the index is laid out in this order as well)
> We should also assume these chunkrefs are already sorted -- no need to sort them again here. The index gateway should take care of that (I think it does already since the index is laid out in this order as well)
You're right, they are sorted already.
	ChunkRefs: req.Refs,
}, nil
}
I don't think we should loop over req.Refs to ensure the tenant matches our expectation here. This is costly in terms of CPU cycles and we should ensure it beforehand.
👍 Removed the assertion
pkg/bloomgateway/multiplexing.go
Outdated
	return false
}

currIter, ok := it.heap.Iter().(*SliceIterWithIndex[*logproto.GroupedChunkRefs])
nit: if you include the index in the type your iterator iterates over, you won't need to cast. Something like
type IndexedVal[T any] struct {
idx int
val T
}
func NewIterWithIndex[T any](iter Iterator[T], idx int) Iterator[IndexedVal[T]] {...etc}
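A hedged completion of that sketch (the IndexedVal and NewIterWithIndex names come from the comment above; the body, and the assumption that Iterator[T] only requires Next() bool and At() T, are mine):

```go
// iterWithIndex wraps an inner iterator and tags every element with idx, so
// consumers never need a type assertion to recover the origin index.
type iterWithIndex[T any] struct {
	inner Iterator[T]
	idx   int
	cur   IndexedVal[T]
}

func NewIterWithIndex[T any](inner Iterator[T], idx int) Iterator[IndexedVal[T]] {
	return &iterWithIndex[T]{inner: inner, idx: idx}
}

func (it *iterWithIndex[T]) Next() bool {
	if !it.inner.Next() {
		return false
	}
	it.cur = IndexedVal[T]{idx: it.idx, val: it.inner.At()}
	return true
}

func (it *iterWithIndex[T]) At() IndexedVal[T] { return it.cur }
```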
pkg/bloomgateway/util.go
Outdated
refs := make([]*logproto.GroupedChunkRefs, 0, len(r.Refs))
for i := range r.Refs {
	groupedChunkRefs := &logproto.GroupedChunkRefs{
todo: avoid all the O(n) casting. Can be done later 👍
On second thought, as it stands, you'd have to do this O(n) work m times, where m equals the number of day-buckets you're looking for. Instead, you could build an iterator over the underlying chunks which filters out chunks which aren't in the day in question. Basically, build multiple "views" over the same list of chunks depending on which day-bucket you care about. WDYT? It'd avoid all the O(n) casting you're doing below and feels conceptually simple.
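One possible shape for such a view (hypothetical types, not the PR's code): an iterator over a shared chunk slice that only yields chunks overlapping a given day bucket, so no per-day copy or conversion is needed.

```go
type chunkRef struct{ From, Through int64 } // hypothetical: chunk time bounds in Unix seconds

// dayView is a filtered "view" over a shared chunk slice: it yields only the
// chunks that overlap [dayStart, dayEnd) without copying the slice.
type dayView struct {
	chunks           []chunkRef
	dayStart, dayEnd int64
	i                int
}

func (v *dayView) Next() bool {
	for v.i < len(v.chunks) {
		c := v.chunks[v.i]
		v.i++
		if c.Through >= v.dayStart && c.From < v.dayEnd {
			return true
		}
	}
	return false
}

func (v *dayView) At() chunkRef { return v.chunks[v.i-1] }
```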
pkg/bloomgateway/worker.go
Outdated
requests := make([]v1.Request, 0, 128)
fingerprints := make([]uint64, 0, 1024)

for ctx.Err() == nil {
This should be a for/select on ctx.Done(), because otherwise it'll tight-loop your CPU. We want to halt execution waiting on a channel so the Go scheduler can hand that CPU back to wherever more work is waiting to be done in process.
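A minimal sketch of the suggested shape (the task channel and process function are placeholders, not the PR's actual API):

```go
for {
	select {
	case <-ctx.Done():
		// Context cancelled or deadline exceeded: stop the worker without
		// spinning on ctx.Err() in a busy loop.
		return ctx.Err()
	case tasks := <-taskCh: // placeholder channel delivering dequeued tasks
		process(tasks) // placeholder for the actual task processing
	}
}
```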
pkg/bloomgateway/worker.go
Outdated
it := newTaskMergeIterator(tasks...)

fingerprints = fingerprints[:0]
nit: I don't think we need another list of all the fingerprints here. We could instead build a function which finds the intersecting blocks for a list of fingerprints from the original requests. Simplified below:

func OverlappingBlocksForRequests(reqs [][]model.Fingerprint, blocks []Block) []Block

We could binary search over the fp lists, comparing them to blocks, rather than iterating over every fp directly (which can easily be n=millions).
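A hedged sketch of that helper (the Block type here is a bounds-only stand-in, and it assumes each request's fingerprint list is sorted ascending):

```go
import (
	"sort"

	"github.com/prometheus/common/model"
)

// Block is a bounds-only stand-in for a bloom block reference.
type Block struct{ MinFp, MaxFp model.Fingerprint }

// OverlappingBlocksForRequests keeps every block for which at least one request
// contains a fingerprint within the block's [MinFp, MaxFp] range, using one
// binary search per (block, request) pair instead of scanning every fingerprint.
func OverlappingBlocksForRequests(reqs [][]model.Fingerprint, blocks []Block) []Block {
	var overlapping []Block
	for _, b := range blocks {
		for _, fps := range reqs {
			// fps is assumed to be sorted ascending.
			i := sort.Search(len(fps), func(i int) bool { return fps[i] >= b.MinFp })
			if i < len(fps) && fps[i] <= b.MaxFp {
				overlapping = append(overlapping, b)
				break
			}
		}
	}
	return overlapping
}
```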
I've gone ahead and put together something like this here: #11237
You'd still need the list of fingerprints to calculate the overlapping blocks. Or am I missing something?
pkg/bloomgateway/worker.go
Outdated
// fingerprints are already sorted. we can skip duplicates by checking
// if the next is greater than the previous
fp := uint64(it.At().Fp)
if len(fingerprints) > 0 && fp <= fingerprints[len(fingerprints)-1] {
Have you seen NewDedupingIter? Might be a bit heavyweight here, but I've used it to wrap a heap-iter before to handle items with the same keys.
Yes, I saw it. I thought it was too much of an overhead.
pkg/bloomgateway/worker.go
Outdated
	continue
}

hasNext := it.Next()
Can use the view iter idea described earlier for this.
pkg/queue/queue.go
Outdated
defer cancel()

var idx QueueIndex
items := make([]Request, 0, maxItems)
todo: pool
if len(responses) == requestCount {
	for _, o := range responses {
		// we must not remove items from req.Refs as long as the worker may iterate over them
		g.removeNotMatchingChunks(req, o)
nit: we could skip the call to this method if Removals.Len() is 0.
lgtm
Looking good
		return false
	}
}
it.cache = it.transform(it.iter.At())
Why do we need a cache here at all? It seems like At() can just call it.transform(it.iter.At())
I'd like to avoid the it.transform(it.iter.At()) function call every time it.At() is called. Depending on the transform function, it could be expensive to do so.
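For illustration, the pattern being defended (the generic names are mine, and it assumes an Iterator[A] with Next() and At()): the transform runs once per Next(), so repeated At() calls stay cheap.

```go
type transformIter[A, B any] struct {
	iter      Iterator[A]
	transform func(A) B
	cache     B
}

func (it *transformIter[A, B]) Next() bool {
	if !it.iter.Next() {
		return false
	}
	// Transform exactly once per element; At() then just returns the cache.
	it.cache = it.transform(it.iter.At())
	return true
}

func (it *transformIter[A, B]) At() B { return it.cache }
```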
func convertToSearches(filters []*logproto.LineFilterExpression) [][]byte {
	searches := make([][]byte, 0, len(filters))
	for _, f := range filters {
		searches = append(searches, []byte(f.Match))
This should only work when the match type is =. It's good to have a conversion function here like you've done, because it allows us to add future optimizations like |~"ab(c|d)" -> |="abc" or |= "abd"
Right, I haven't thought about other operators than =. Gonna add a TODO comment.
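A hedged sketch of what that TODO might guard (the operator field and the value denoting "=" are assumptions about logproto.LineFilterExpression, not verified against the proto):

```go
func convertToSearches(filters []*logproto.LineFilterExpression) [][]byte {
	searches := make([][]byte, 0, len(filters))
	for _, f := range filters {
		// TODO: support other operators; only exact-match (`|=`) filters can be
		// turned into plain byte searches against the blooms.
		if f.Operator != 0 { // assumption: 0 denotes the "=" operator
			continue
		}
		searches = append(searches, []byte(f.Match))
	}
	return searches
}
```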
pkg/bloomgateway/worker.go
Outdated
}

boundedRefs := partitionFingerprintRange(tasks, blockRefs)
blockRefs = blockRefs[0:]
- blockRefs = blockRefs[0:]
+ blockRefs = blockRefs[:0]
Good catch! 🙈
pkg/bloomgateway/worker.go
Outdated
it := newTaskMergeIterator(day, boundedRefs[i].tasks...)
requests = requests[:0]
for it.Next() {
	requests = append(requests, it.At().Request)
What's the advantage of collecting these into a slice rather than building an iterator over the underlying iterator?
There is none.
pkg/queue/queue.go
Outdated
items := q.pool.Get(maxItems)
defer func() {
	q.pool.Put(items)
This looks like it puts the items back into the pool when it returns them to the caller as well, creating a mutability bug waiting to happen.
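A minimal sketch (abbreviated types and signatures, not the exact queue API) of the caller-owned pattern the PR adopts per the commit note further down: the queue no longer puts the slice back into the pool itself; the caller releases it when done.

```go
func (q *RequestQueue) DequeueMany(maxItems int) []Request {
	items := q.pool.Get(maxItems)
	// ... fill items from the queue ...
	return items // ownership transfers to the caller
}

// ReleaseRequests returns a slice obtained from DequeueMany back to the pool.
func (q *RequestQueue) ReleaseRequests(items []Request) {
	q.pool.Put(items)
}

// Caller side:
//   items := q.DequeueMany(n)
//   defer q.ReleaseRequests(items)
```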
pkg/queue/util.go
Outdated
// BufferPool uses a bucket pool and wraps the Get() and Put() functions for
// simpler access.
type BufferPool[T any] struct {
super-nit: Maybe just SlicePool[T] would be a better name?
pkg/storage/bloom/v1/merge.go
Outdated
- cur := mbq.itrs[0]
- if ok := cur.Next(); !ok {
+ curr := mbq.itrs[0]
+ if ok := curr.Next(); !ok {
don't like my naming? 😭
😂
This happened accidentally when I reverted a temporary change.
@@ -29,6 +28,14 @@ const (
	fileNamePartDelimiter = "-"
)

type BoundsCheck uint8
I'd love to keep this in the v1 lib so I can use it there as well (can't import this package there)
Putting the returned slice of requests back to the pool but also returning them to the caller could lead to a mutability bug. Now the caller of DequeueMany() is responsible for returning the request slice back to the pool of the queue by calling ReleaseRequests(). Signed-off-by: Christian Haudum <[email protected]>
… requests (grafana#11181)
This change adds an internal request queue to the bloom gateway. Instead of executing every single request individually, which involves resolving bloom blocks, downloading them if needed, and executing the chunk filtering, requests are now enqueued to the internal, per-tenant queue. The queue implements the same shuffle sharding mechanism as the queue in the query scheduler component. Workers then dequeue a batch of requests for a single tenant and multiplex them into a single processing task for each day. This has the big advantage that the chunks of multiple requests can be processed in a single sequential scan through a set of bloom blocks, without needing to skip back and forth within the binary stream of the block.
---------
Signed-off-by: Christian Haudum <[email protected]>
What this PR does / why we need it:
This PR adds the worker implementation in the bloom gateways. The workers pull multiple items from the queue, multiplex them, and execute the chunk matching on the resolved bloom blocks.
The multiplexing is used to minimise the overhead of seeking and skipping through bloom blocks when matching chunks.
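A hedged, highly abbreviated sketch of that flow (all names are illustrative, not the PR's exact API): one dequeued batch for a tenant is multiplexed into per-day tasks so each bloom block is scanned sequentially only once.

```go
func runWorker(ctx context.Context) {
	for {
		reqs, err := dequeueBatchForTenant(ctx) // blocks until work arrives or ctx is cancelled
		if err != nil {
			return
		}
		for day, tasks := range groupRequestsByDay(reqs) {
			blocks := resolveBlocksForDay(day, tasks)
			// A single sequential pass over each block filters the chunks of
			// all multiplexed requests at once.
			filterChunks(blocks, tasks)
		}
	}
}
```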
Todo
Checklist
- Reviewed the CONTRIBUTING.md guide (required)
- CHANGELOG.md updated
- If the change is worth mentioning in the release notes, add the add-to-release-notes label
- Changes that require user attention or interaction to upgrade are documented in docs/sources/setup/upgrade/_index.md
- For Helm chart changes, bump the Helm chart version in production/helm/loki/Chart.yaml and update production/helm/loki/CHANGELOG.md and production/helm/loki/README.md. Example PR
- If the change is deprecating or removing a configuration option, update the deprecated-config.yaml and deleted-config.yaml files respectively in the tools/deprecated-config-checker directory. Example PR