Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add experimental alternative fetch strategies #970

Draft
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

erikvanoosten
Copy link
Collaborator

@erikvanoosten erikvanoosten commented Jul 5, 2023

  1. ManyPartitionsQueueSizeBasedFetchStrategy, a variation on the default QueueSizeBasedFetchStrategy which limits total memory usage.
  2. PredictiveFetchStrategy an improved predictive fetching strategy (compared to the predictive strategy from zio-kafka 2.3.x) which uses history to calculate the average number of polls the stream needed to process, and uses that to estimate when the stream needs more data.

To do:

  • Add unit tests

1. `ManyPartitionsQueueSizeBasedFetchStrategy`, a variation on the default `QueueSizeBasedFetchStrategy` which limits total memory usage.
2. `PredictiveFetchStrategy` an improved predictive fetching strategy (compared to the predictive strategy from zio-kafka 2.3.x) which uses history to calculate the average number of polls the stream needed to process, and uses that to estimate when the stream needs more data.
@erikvanoosten
Copy link
Collaborator Author

ManyPartitionsQueueSizeBasedFetchStrategy has been split of to #1281.


import scala.collection.mutable

/**
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Number of polls is quite a discrete measurement, what do you think of converting this into a running average (like exponentially weighed moving average) of the number of records dequeued each poll?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That will work against people that use something like Grafana. First you need to collect and sum the raw counters from each instance of your service. Then, and only then, you can calculate a running average, an integral, or whatever other operation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants