Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[timeseries] Part-3.1: Add Support for Partial Aggregate and Complex Intermediate Type #14631

Merged
merged 4 commits into from
Dec 17, 2024

Conversation

ankitsultana
Copy link
Contributor

@ankitsultana ankitsultana commented Dec 11, 2024

Adds support for partial aggregates and complex types in TimeSeries. This is required now because we will be soon adding support for data transfer from the server to the broker, and certain aggregation functions require storing data in complex types until the final aggregate can be run.

Examples of these functions are rate, irate, etc. in PromQL.

Summary of Changes

  • Adds isPartial parameter to AggInfo.
  • In plan fragmenter, the leaf node will now have an AggInfo with isPartial=true, and the TimeSeriesExchangeNode will have the same but with isPartial=false.
  • If the TimeSeries contains byte[][] values, then the TimeSeriesBlockSerde will Hex encode them and send them as String[] to the broker. This is inefficient but should be good enough for now.

How to Build Complex Series Builders

If you want to build a complex series builder, where complex refers to the fact that the intermediate aggregate will need to be stored in a non-double type, you can do the following:

  • In your BaseTimeSeriesBuilder, in the build call, generate the TimeSeries with byte[][] values if the aggregate was partial.
  • Update your mergeAlignedSeries for such series builders to handle byte[][] values.

Test Plan

Added unit tests to cover testing for TimeSeries serde. Also ran quickstart to verify E2E integration works as before.

@ankitsultana ankitsultana added the timeseries-engine Tracking tag for generic time-series engine work label Dec 11, 2024
@codecov-commenter
Copy link

codecov-commenter commented Dec 11, 2024

Codecov Report

Attention: Patch coverage is 84.05797% with 11 lines in your changes missing coverage. Please review.

Project coverage is 63.97%. Comparing base (59551e4) to head (fc83cd8).
Report is 1463 commits behind head on master.

Files with missing lines Patch % Lines
...runtime/timeseries/serde/TimeSeriesBlockSerde.java 92.15% 2 Missing and 2 partials ⚠️
...e/pinot/tsdb/planner/TimeSeriesPlanFragmenter.java 50.00% 2 Missing and 1 partial ⚠️
...c/main/java/org/apache/pinot/tsdb/spi/AggInfo.java 60.00% 2 Missing ⚠️
...common/response/PinotBrokerTimeSeriesResponse.java 0.00% 1 Missing ⚠️
...he/pinot/tsdb/spi/plan/LeafTimeSeriesPlanNode.java 0.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #14631      +/-   ##
============================================
+ Coverage     61.75%   63.97%   +2.22%     
- Complexity      207     1600    +1393     
============================================
  Files          2436     2696     +260     
  Lines        133233   148442   +15209     
  Branches      20636    22751    +2115     
============================================
+ Hits          82274    94971   +12697     
- Misses        44911    46502    +1591     
- Partials       6048     6969     +921     
Flag Coverage Δ
custom-integration1 100.00% <ø> (+99.99%) ⬆️
integration 100.00% <ø> (+99.99%) ⬆️
integration1 100.00% <ø> (+99.99%) ⬆️
integration2 0.00% <ø> (ø)
java-11 63.96% <84.05%> (+2.25%) ⬆️
java-21 63.86% <84.05%> (+2.24%) ⬆️
skip-bytebuffers-false 63.97% <84.05%> (+2.22%) ⬆️
skip-bytebuffers-true 63.84% <84.05%> (+36.11%) ⬆️
temurin 63.97% <84.05%> (+2.22%) ⬆️
unittests 63.97% <84.05%> (+2.22%) ⬆️
unittests1 56.18% <76.81%> (+9.29%) ⬆️
unittests2 34.51% <13.04%> (+6.78%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@ankitsultana ankitsultana marked this pull request as ready for review December 12, 2024 02:20
} else {
Preconditions.checkState(!currentAggInfo.getIsPartial(),
"Leaf node in the logical plan should not have partial agg");
context._fragments.add(leafNode.withAggInfo(currentAggInfo.withPartialAggregation()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If currentAggInfo isPartial is false, then shouldn't we use currentAggInfo.withFullAggregation here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The leaf node created by the users should have agg-mode=full, but the leaf node generated by Pinot should have partial aggregation since the TimeSeriesExchangeNode will have the full aggregation.

There's a cleanup pending here: since users are returning Logical plan, we shouldn't allow them to choose between partial or full aggregation. But implementation wise it's a bit tricky.. but I should add a TODO for this (I thought I already did).

@ankitsultana ankitsultana changed the title [timeseries] Add Support for Partial Aggregate and Complex Intermediate Type [timeseries] Part-3.1: Add Support for Partial Aggregate and Complex Intermediate Type Dec 13, 2024
@@ -102,8 +104,15 @@ public static List<BaseTimeSeriesPlanNode> getFragments(BaseTimeSeriesPlanNode r
private static BaseTimeSeriesPlanNode fragmentRecursively(BaseTimeSeriesPlanNode planNode, Context context) {
if (planNode instanceof LeafTimeSeriesPlanNode) {
LeafTimeSeriesPlanNode leafNode = (LeafTimeSeriesPlanNode) planNode;
context._fragments.add(leafNode.withInputs(Collections.emptyList()));
return new TimeSeriesExchangeNode(planNode.getId(), Collections.emptyList(), leafNode.getAggInfo());
AggInfo currentAggInfo = leafNode.getAggInfo();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do Users have to specify or set isPartial flag while creating LeafPlanNode?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have to specify, but you should always set it to false. There's a TODO in AggInfo regarding this. Ideally logical plan shouldn't be aware of this since this is a physical detail. I'll fix it as part of MSE integration.

* TODO(timeseries): Ideally we should remove this from the logical plan completely.
* </p>
*/
private final boolean _isPartial;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the _isPartial flag Users are expected to run SeriesBuilderoperations. For example if _isPartial = true merge series based on function which would produce final result for leaf phase.
Am I correct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup. Note that you should set isPartial to false in the logical plan that you return. Pinot will automatically create a partial AggInfo when the execution is split between the brokers and servers.

@raghavyadav01
Copy link
Contributor

@ankitsultana One more change we discussed was passing raw timestamps along with values to SeriesBuilder for each time bucket . I do not see that change are you planning to do it in next iteration?

@ankitsultana
Copy link
Contributor Author

@ankitsultana One more change we discussed was passing raw timestamps along with values to SeriesBuilder for each time bucket . I do not see that change are you planning to do it in next iteration?

Yup that'll be in a follow-up. I'll wrap up the broker reduce support first.

@tibrewalpratik17 tibrewalpratik17 merged commit b64bd81 into apache:master Dec 17, 2024
21 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
timeseries-engine Tracking tag for generic time-series engine work
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants