Calculate size based flush threshold per topic #14765
base: master
Conversation
Can you elaborate on why we want to track size info separately for each topic? If the data is ingested into the same table, it should have the same schema and, in most cases, a similar data distribution. I'd guess most users still want to track a per-table threshold.
If this is indeed needed, can we make it configurable and add a config flag to enable it?
Codecov Report: All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

@@             Coverage Diff              @@
##             master   #14765      +/-   ##
============================================
+ Coverage     61.75%   63.67%    +1.92%
- Complexity      207     1483     +1276
============================================
  Files          2436     2716      +280
  Lines        133233   152231    +18998
  Branches      20636    23530     +2894
============================================
+ Hits          82274    96937    +14663
- Misses        44911    47993     +3082
- Partials       6048     7301     +1253
For our multi-stream ingestion use case, data rarely has the same schema or a similar data distribution; for us, topics are logically separate datasets. The current implementation led to many index build failures (e.g. forward index size limits, too many MV values). Beyond build failures, we also saw wild swings in segment sizes as many segments from one topic flushed, and then many segments from another topic flushed.

A concrete example from observability: we have service A and service B emitting to topics A and B, both ingesting into a single table. Generally they do not emit the same shape or volume of logs, which makes table-level segment size estimates inaccurate. Even if they did emit the same shape/size, deployments that change the log fingerprints cannot always happen in sync, so if the threshold were computed at the table level and service A was upgraded before B, there would be a period where segment build failures are likely. Another case: we turn on debug-level logging temporarily for A, and the same issues happen.
What's the downside of leaving this per topic by default? If we make the above assumption, that data should have the same schema and a similar data distribution, then computing the threshold per topic should not be measurably different from computing it per table. If the assumption doesn't hold, then we get better behavior by default. I feel that an extra config flag adds complexity without much benefit.
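For illustration, the per-topic computation being discussed can be sketched roughly as follows. This is a minimal sketch with hypothetical class and method names, not the actual Pinot implementation:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of per-topic flush threshold tracking; the names
// here are illustrative and do not match Pinot's actual classes.
public class PerTopicFlushThresholdTracker {
  // One bytes-per-row estimate per topic instead of a single table-level
  // value, so topics with different row shapes converge independently.
  private final Map<String, Double> _topicToBytesPerRow = new ConcurrentHashMap<>();

  // Record the observed bytes-per-row for a completed segment of a topic.
  public void update(String topicName, long segmentSizeBytes, long numRows) {
    double bytesPerRow = (double) segmentSizeBytes / numRows;
    // Smooth with an exponential moving average so one outlier segment
    // does not swing the threshold.
    _topicToBytesPerRow.merge(topicName, bytesPerRow,
        (prev, cur) -> 0.9 * prev + 0.1 * cur);
  }

  // Derive the row threshold for the next segment of this topic.
  public long computeRowsThreshold(String topicName, long desiredSegmentSizeBytes,
      long defaultRowsThreshold) {
    Double bytesPerRow = _topicToBytesPerRow.get(topicName);
    if (bytesPerRow == null || bytesPerRow <= 0) {
      return defaultRowsThreshold; // no history yet for this topic
    }
    return Math.max(1L, (long) (desiredSegmentSizeBytes / bytesPerRow));
  }
}
```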
In most cases there is only one topic per table (single-stream ingestion), so I don't want to make the size based threshold updater per topic by default. It would also break backward compatibility for metrics.
@Jackie-Jiang Do you still have concerns here? This PR looks good to me, even without extensive testing in production.
Currently it breaks existing metrics, which is a severe backward compatibility issue.
We do want a knob to switch between per-topic and per-table tracking for backward compatibility. If you really want to track each topic when there are multiple streams, you can make that the default when the value is not explicitly set.
The main concern is the metric name. We cannot rely on metric system behavior; for the single-stream case, we should keep the behavior the same as before.
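A minimal sketch of the knob and default being suggested here, assuming a hypothetical config key (not an actual Pinot config):

```java
import java.util.Map;

// Hypothetical config knob; the key name is illustrative only.
public final class FlushThresholdConfig {
  private static final String PER_TOPIC_KEY = "realtime.segment.flush.threshold.perTopic";

  public static boolean usePerTopicThreshold(Map<String, String> streamConfig, int numStreams) {
    String explicit = streamConfig.get(PER_TOPIC_KEY);
    if (explicit != null) {
      return Boolean.parseBoolean(explicit); // explicit setting always wins
    }
    // Default: per-topic only for multi-stream tables, so single-stream
    // tables keep the existing per-table behavior and metric names.
    return numStreams > 1;
  }
}
```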
I've updated the metrics to emit a backwards compatible set, PTAL. For a controller metric, I feel it's cleaner to emit another gauge than to maintain an additional config that decides which metric to emit. It also makes migrating any alerting between metrics simpler.
With multi-stream ingestion added, it makes sense to calculate segment flush thresholds per topic rather than at the table level. This patch has been running internally; behavior was validated by checking logs for the calculated thresholds.
The metrics should be backwards compatible (at least for our metrics system they are; I'm not familiar with all of them). For example, the gauge essentially changes from
numRowsThreshold.tableName
to
numRowsThreshold.tableName.topicName
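In other words, both the legacy table-level gauge and the new per-topic gauge are emitted side by side. A minimal sketch of that idea, where setGauge() is a stand-in for the real controller metrics API, not Pinot's actual method:

```java
// Hypothetical sketch of backwards compatible gauge emission; GaugeSink
// and setGauge() are placeholders for the actual metrics interface.
public class ThresholdMetrics {
  interface GaugeSink {
    void setGauge(String name, long value);
  }

  private final GaugeSink _sink;

  public ThresholdMetrics(GaugeSink sink) {
    _sink = sink;
  }

  public void emit(String tableName, String topicName, long rowsThreshold) {
    // Keep the legacy table-level gauge so existing dashboards and
    // alerts continue to work unchanged.
    _sink.setGauge("numRowsThreshold." + tableName, rowsThreshold);
    // Additionally emit the new per-topic gauge.
    _sink.setGauge("numRowsThreshold." + tableName + "." + topicName, rowsThreshold);
  }
}
```

Emitting both gauges avoids a config that switches metric names, at the cost of one extra time series per table, and lets alerting migrate to the per-topic gauge gradually.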