
Support multiple producers in redis streams #2581

Open
ganeshvanahalli wants to merge 11 commits into master
Conversation

ganeshvanahalli (Contributor):

This PR allows our redis stream pub-sub implementation to support multiple producers per stream.

Motivation for this change: previously, only one producer per stream was effectively supported. Although two producers were "technically" allowed to connect to the same redis stream, they couldn't make meaningful use of it: if both producers requested a response for the same request, the consumers would treat those requests as unique and solve the same request twice, wasting resources.

Previously, each consumer maintained a heartbeat key in redis and set it periodically to confirm it was active. Producers used these keys to determine which messages were stuck in the PEL (pending entries list: messages that are claimed by a consumer but not yet acknowledged) and to reinsert them into the stream. This PR simplifies that whole process by using XAUTOCLAIM, which automatically lets any consumer claim a message that is in the PEL and has been idle for a minimum amount of time.

Testing Done

Test cases covering the different scenarios (multiple producers, consumer failures, reclaiming) have been added to exercise this design.

Resolves NIT-2685

@cla-bot cla-bot bot added the s Automatically added by the CLA bot if the creator of a PR is registered as having signed the CLA. label Aug 15, 2024
@ganeshvanahalli ganeshvanahalli marked this pull request as ready for review August 15, 2024 12:49
eljobe
eljobe previously approved these changes Aug 20, 2024
pubsub/consumer.go (outdated review thread, resolved)
magicxyyz (Contributor) left a comment:

Great PR! Added a few comments: some suggested improvements, some error handling, and a note on the stream trimming (which might need a more complex solution).

Resolved review threads:
  • pubsub/consumer.go (outdated)
  • pubsub/consumer.go (outdated)
  • pubsub/producer.go (outdated)
  • validator/validation_entry.go
  • pubsub/consumer.go (outdated)
  • pubsub/consumer.go (outdated)
eljobe
eljobe previously approved these changes Aug 29, 2024
eljobe (Member) left a comment:

LGTM

Consumer: c.id,
MinIdle: c.cfg.IdletimeToAutoclaim, // Minimum idle time for messages to claim (in milliseconds)
Stream: c.redisStream,
Start: decrementMsgIdByOne(pendingMsgs[idx].ID),
Contributor:

why do we need to decrement the ID?

ganeshvanahalli (Contributor, author) replied:

I've tested this locally and can confirm that we definitely need to decrement the start by 1, because for some reason XAUTOCLAIM is non-inclusive of start. This is odd and contrary to its docs (https://redis.io/docs/latest/commands/xautoclaim/):

    Internally, the command begins scanning the consumer group's Pending Entries List (PEL) from <start> and filters out entries having an idle time less than or equal to min-idle-time.
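For reference, the decrement helper named in the diff has to respect the `<ms>-<seq>` format of redis stream IDs, including the boundary where the sequence part is 0. A minimal self-contained sketch (the PR's actual implementation may differ):

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// decrementMsgIdByOne returns the stream ID immediately preceding a redis
// stream ID of the form "<ms>-<seq>", so that passing it as Start to
// XAUTOCLAIM (observed to be exclusive of Start) still covers the entry.
func decrementMsgIdByOne(id string) string {
	parts := strings.Split(id, "-")
	if len(parts) != 2 {
		return id // unexpected format; pass through unchanged
	}
	ms, errMs := strconv.ParseUint(parts[0], 10, 64)
	seq, errSeq := strconv.ParseUint(parts[1], 10, 64)
	if errMs != nil || errSeq != nil {
		return id
	}
	if seq > 0 {
		return fmt.Sprintf("%d-%d", ms, seq-1)
	}
	if ms > 0 {
		// seq was 0: wrap to the highest sequence of the previous millisecond
		return fmt.Sprintf("%d-%d", ms-1, uint64(math.MaxUint64))
	}
	return "0-0" // already the smallest valid ID
}

func main() {
	fmt.Println(decrementMsgIdByOne("1724000000000-5")) // 1724000000000-4
	fmt.Println(decrementMsgIdByOne("1724000000000-0")) // 1723999999999-18446744073709551615
}
```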

}).Result()
if err != nil {
log.Error("error from xautoclaim", "err", err)
}
magicxyyz (Contributor) commented on Aug 29, 2024:

It would be good to either:

  • add a comment saying that if we fail to auto-claim, we don't retry and just try to consume a new message
    (e.g. other client(s) front-ran us to the messages within (scan-start PEL index, scan-start PEL index + 10], and it is possible that there are other timed-out messages).
    This option might not be optimal: combined with the random choice of timed-out pending entries, there is a risk of starving one of the timed-out entries. For that starvation to happen, consumer failures would need to be quite frequent (compared to the time required to process a single message), so that XPENDING always returns more than one entry. That might be fine for the redis validation use case, but might bite us in the future when pubsub is used for some other component.

  • or check here if len(messages) == 0 and exit early with some defined error (something like EAGAIN) that we can check on the Consume caller side. That way the caller could retry after some delay, the same as when there are no new messages:

    s.StopWaiter.CallIteratively(func(ctx context.Context) time.Duration {
        req, ackNotifier, err := c.Consume(ctx)
        if err != nil {
            log.Error("Consuming request", "error", err)
            return 0
        }
        if req == nil {
            // There's nothing in the queue.
            return time.Second
        }

This option guarantees that retries are handled first and that new messages are consumed only after all failed ones have been retried.

ganeshvanahalli (Contributor, author) replied on Sep 4, 2024:

I am for option 1 rather than option 2, because option 2 would be similar to trying to XAUTOCLAIM every message from XPENDING. The reason we went with a random pick is to strike a balance between choosing idle PEL entries and new messages, with first preference given to idle PEL entries.

Retrying idle PEL entries again and again, until success or until XPENDING is exhausted, might have the unintended consequence of starving new messages. The number of idle PEL entries should in any case be small and shouldn't regularly increase; it would only lead to starvation if the rate of addition of entries to the idle PEL exceeds the rate of removal (which equals N/[avg processing time], where N is the number of workers).

if !errors.Is(err, redis.Nil) {
log.Error("Error from XpendingExt in getting PEL for auto claim", "err", err, "penindlen", len(pendingMsgs))
}
} else if len(pendingMsgs) > 0 {
Contributor:

Maybe for a future PR, but we might want to filter pendingMsgs by pendingMsg.RetryCount: log an error and skip if we find a message that was retried too many times. That would prevent a situation where stream throughput is degraded because some specific messages constantly trigger consumer crashes.
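The suggested filter could look roughly like this. The struct mirrors the relevant fields of go-redis's XPendingExt result, and maxMsgRetries is a hypothetical cap (a real implementation would likely make it configurable):

```go
package main

import "fmt"

// pendingEntry mirrors the fields of go-redis's redis.XPendingExt that matter
// here; RetryCount is the delivery counter redis tracks per PEL entry.
type pendingEntry struct {
	ID         string
	RetryCount int64
}

// maxMsgRetries is a hypothetical cap on deliveries before a message is
// treated as poisonous and skipped.
const maxMsgRetries = 5

// filterRetryable splits pending entries into ones still worth claiming and
// ones delivered too many times, so a poison message cannot degrade stream
// throughput by repeatedly crashing consumers. The caller would log and skip
// the dropped entries.
func filterRetryable(pending []pendingEntry) (kept, dropped []pendingEntry) {
	for _, p := range pending {
		if p.RetryCount > maxMsgRetries {
			dropped = append(dropped, p)
			continue
		}
		kept = append(kept, p)
	}
	return kept, dropped
}

func main() {
	pending := []pendingEntry{{"1-0", 2}, {"1-1", 9}, {"2-0", 1}}
	kept, dropped := filterRetryable(pending)
	fmt.Println(len(kept), len(dropped)) // 2 1
}
```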

trimmed, trimErr = p.client.XTrimMinID(ctx, p.redisStream, minId).Result()
} else {
trimmed, trimErr = p.client.XTrimMaxLen(ctx, p.redisStream, 0).Result()
// XDEL on the consumer side already deletes acked messages (marks them as deleted) but doesn't reclaim the memory; XTRIM helps reclaim this memory under normal conditions
Contributor:

Depending on how much we prioritize freeing memory early vs. reducing the computation required, I think XTRIM might not be needed here: we could save some additional complexity and some processing on the redis server side.

As per the XDEL docs:

Eventually if all the entries in a macro-node are marked as deleted, the whole node is destroyed and the memory reclaimed.

ganeshvanahalli (Contributor, author) replied:

I think we should prioritize freeing memory early, even though XDEL would eventually free it anyway sometime after all the elements in a macro-node are marked as deleted. The main reason: if one or a few consumers fail to delete their acked entries, memory will grow indefinitely, since acked entries don't show up in the PEL, and there is really no way for us to detect this other than watching the logs on the consumer side.
Besides, I think having a deletion mechanism on both the producer and the consumer side is a bit safer. I do agree with your point that we don't have to run XTRIM as regularly as CheckResultInterval; maybe a 5*CheckResultInterval duration sounds reasonable?

pubsub/consumer.go (resolved review thread)
3 participants