[FLINK-36881][table] Introduce GroupTableAggFunction in GroupTableAggregate with Async State API #25789

Open: wants to merge 1 commit into base: master


Au-Miner

What is the purpose of the change

Introduce an async state version of GroupTableAggFunction, AsyncStateGroupTableAggFunction, for GroupTableAggregate, built on the async state API.

Brief change log

  • Introduce AsyncStateGroupTableAggFunction for GroupTableAggregate, using the async state API.
  • Add ITs and harness tests.

Verifying this change

This change is verified by existing tests and by new tests added to TableAggregateITCase and TableAggregateHarnessTest.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? yes
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

@flinkbot
Collaborator

flinkbot commented Dec 11, 2024

CI report:

Bot commands: the @flinkbot bot supports the following commands:
  • @flinkbot run azure: re-run the last Azure build

final boolean enableAsyncState = AggregateUtil.enableAsyncState(config, aggInfoList);

final OneInputStreamOperator<RowData, RowData> operator;
if (!enableAsyncState) {
Contributor

nit: if (enableAsyncState)

Author

I have changed it in the PR.
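The nit asks for the positive condition to be tested first, so the async branch reads first and there is no negation to parse. A minimal sketch of that shape, where `Operator`, `SyncOperator`, `AsyncOperator`, and `createOperator` are hypothetical stand-ins rather than the planner's real classes:

```java
public class OperatorSelection {

    /** Hypothetical stand-in for the operator interface. */
    interface Operator {
        String name();
    }

    /** Hypothetical operator backed by the sync state API. */
    static final class SyncOperator implements Operator {
        public String name() { return "sync"; }
    }

    /** Hypothetical operator backed by the async state API. */
    static final class AsyncOperator implements Operator {
        public String name() { return "async"; }
    }

    /** Tests the positive flag first, as the review nit suggests. */
    static Operator createOperator(boolean enableAsyncState) {
        final Operator operator;
        if (enableAsyncState) {
            operator = new AsyncOperator();
        } else {
            operator = new SyncOperator();
        }
        return operator;
    }

    public static void main(String[] args) {
        System.out.println(createOperator(true).name());  // prints "async"
        System.out.println(createOperator(false).name()); // prints "sync"
    }
}
```

Behaviour is unchanged by the swap; only the reading order of the two branches differs.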

config, planner.getFlinkContext().getClassLoader()),
planner.createRelBuilder(),
JavaScalaConversionUtil.toScala(inputRowType.getChildren()),
// TODO: heap state backend do not copy key currently,
Contributor

I am not sure of the impact of these TODOs.

It would be worth tracking these TODOs with Jira issues and including the issue numbers in the code.

Author

I will track these in Jira and follow up in the PR.
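The TODO is cut off here, so its full scope is not visible. As general background only, the plain-Java sketch below (not Flink code) shows the aliasing hazard that a note like "heap state backend do not copy key" points at: a heap-based store that keeps a reference to a key object the caller later reuses and mutates. `MutableKey`, `countKeys`, and `hasKey` are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class KeyCopyDemo {

    /** Hypothetical mutable key that the caller reuses across records. */
    static final class MutableKey {
        String value;

        MutableKey(String value) { this.value = value; }

        MutableKey copy() { return new MutableKey(value); }

        @Override
        public boolean equals(Object o) {
            return o instanceof MutableKey && ((MutableKey) o).value.equals(value);
        }

        @Override
        public int hashCode() { return value.hashCode(); }
    }

    /** Counts records per key in a heap map, optionally copying the key before storing it. */
    static Map<MutableKey, Integer> countKeys(String[] inputs, boolean copyKey) {
        Map<MutableKey, Integer> store = new HashMap<>();
        MutableKey reused = new MutableKey(""); // one object reused for every record
        for (String in : inputs) {
            reused.value = in;
            MutableKey key = copyKey ? reused.copy() : reused;
            store.merge(key, 1, Integer::sum);
        }
        return store;
    }

    /** Returns true if any stored key currently holds the given value. */
    static boolean hasKey(Map<MutableKey, Integer> store, String value) {
        for (MutableKey k : store.keySet()) {
            if (k.value.equals(value)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] inputs = {"a", "b"};
        // Without copying, every stored key is the same object, which now holds "b".
        System.out.println(hasKey(countKeys(inputs, false), "a")); // false
        // With copying, each record keeps its own key object.
        System.out.println(hasKey(countKeys(inputs, true), "a"));  // true
    }
}
```

Copying the key before storing it avoids the problem at the cost of one extra allocation per record.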

import static org.apache.flink.table.runtime.util.StateConfigUtil.createTtlConfig;

/** Aggregate Function used for the groupby (without window) table aggregate in async state. */
public class AsyncStateGroupTableAggFunction
Contributor

I am wondering whether the async version could extend the sync version and reuse its variables and logic, to keep the two implementations in line and reduce the chance of them diverging.

Author

Currently, the sync and async implementations use different state objects. Following the approach of commit bccace0, the design is to add a separate set of corresponding classes and to reuse as much of the logic as possible in a helper.

Contributor

I am curious why sync and async use different state objects; this does not seem right. Can we align them, or otherwise, how can we be sure that the behaviour remains the same?

Author

The async version of the state objects is still under development, and many operators, such as MiniBatchGroupAggFunction, cannot use them well yet, so we need to wait for the async state framework to mature.
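The split the author describes keeps one set of state-facing classes per state API and moves the shared per-record logic into a helper. A minimal sketch of that split, assuming a simple running-sum aggregate; `SumHelper`, `SyncSumFunction`, `AsyncSumFunction`, and the use of `CompletableFuture` to model async state access are all hypothetical and not Flink's actual state interfaces:

```java
import java.util.concurrent.CompletableFuture;

public class HelperSplitDemo {

    /** Shared per-record logic; it does not know how state is read or written. */
    static final class SumHelper {
        /** Returns the new accumulator after applying one input value. */
        long apply(Long previous, long input) {
            long acc = previous == null ? 0L : previous; // null means first record for this key
            return acc + input;
        }
    }

    /** Sync variant: state is read and written directly. */
    static final class SyncSumFunction {
        private final SumHelper helper = new SumHelper();
        private Long state; // stand-in for a synchronous value state

        long processElement(long input) {
            state = helper.apply(state, input);
            return state;
        }
    }

    /** Async variant: state access returns futures; the same helper runs in the callback. */
    static final class AsyncSumFunction {
        private final SumHelper helper = new SumHelper();
        private Long state; // stand-in for an asynchronous value state

        private CompletableFuture<Long> asyncValue() {
            return CompletableFuture.completedFuture(state);
        }

        private CompletableFuture<Void> asyncUpdate(long value) {
            state = value;
            return CompletableFuture.completedFuture(null);
        }

        CompletableFuture<Long> processElement(long input) {
            return asyncValue()
                    .thenApply(prev -> helper.apply(prev, input))
                    .thenCompose(next -> asyncUpdate(next).thenApply(ignored -> next));
        }
    }

    public static void main(String[] args) {
        SyncSumFunction sync = new SyncSumFunction();
        AsyncSumFunction async = new AsyncSumFunction();
        for (long v : new long[] {3, 4}) {
            sync.processElement(v);
            async.processElement(v).join();
        }
        System.out.println(sync.processElement(5));         // 12
        System.out.println(async.processElement(5).join()); // 12
    }
}
```

Because both variants delegate the arithmetic to the same helper, a behavioural change lands in one place, which is one way to address the divergence concern raised above while each variant keeps its own state API.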

import static org.apache.flink.table.runtime.util.StateConfigUtil.createTtlConfig;

/** Aggregate Function used for the groupby (without window) table aggregate in async state. */
public class AsyncStateGroupTableAggFunction
Contributor

Needs a unit test.

Author

Au-Miner commented Dec 13, 2024

The aggregate function requires a GeneratedTableAggsHandleFunction, which does not seem easy to create in a unit test, and the other agg functions do not have similar unit tests either. Therefore, I have placed the tests in TableAggregateITCase and TableAggregateHarnessTest.

Contributor

Reviewed by Chi on 12/12/24. Asked the submitter questions.

import static org.apache.flink.table.data.util.RowDataUtil.isAccumulateMsg;

/** A helper to do the logic of group table agg. */
public abstract class GroupTableAggHelper {
Contributor

Needs unit tests.

Author

Same as for AsyncStateGroupTableAggFunction: see my reply on that class.

this.function = function;
}

public RowData processElement(
Contributor

davidradl commented Dec 12, 2024

It would be great to have Javadoc describing the intent of this method.

Author

Thanks for the review. I have added the relevant comments.
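As an illustration of the kind of intent a Javadoc on `processElement` might capture, here is a simplified, self-contained sketch of one group table aggregate step: load or create the accumulator, optionally retract the previously emitted result, accumulate or retract the input depending on its message kind, then emit the new result or clear state when no rows remain. `Kind`, `CountingHelper`, the local `isAccumulateMsg`, and the string-encoded output are hypothetical simplifications of a "row count" aggregate; they do not reproduce the helper in this PR.

```java
import java.util.ArrayList;
import java.util.List;

public class GroupTableAggSketch {

    /** Hypothetical changelog message kinds. */
    enum Kind { INSERT, UPDATE_BEFORE, UPDATE_AFTER, DELETE }

    /** Accumulate messages add a row; all other kinds retract one. */
    static boolean isAccumulateMsg(Kind kind) {
        return kind == Kind.INSERT || kind == Kind.UPDATE_AFTER;
    }

    /** Per-key helper for a hypothetical "row count" table aggregate. */
    static final class CountingHelper {
        private final boolean generateUpdateBefore;
        private Long accState; // stand-in for the per-key accumulator state

        CountingHelper(boolean generateUpdateBefore) {
            this.generateUpdateBefore = generateUpdateBefore;
        }

        /**
         * Applies one input message to the accumulator of the current key and emits
         * the resulting changes.
         *
         * <p>If a previous result exists and update-before messages are enabled, the
         * old result is retracted first. The input is then accumulated or retracted
         * according to its kind. If rows remain, the new result is emitted and the
         * accumulator is stored; otherwise the state is cleared.
         *
         * @return the messages emitted for this input, in order
         */
        List<String> processElement(Kind kind) {
            List<String> out = new ArrayList<>();
            boolean firstRow = accState == null;
            long acc = firstRow ? 0L : accState;
            if (!firstRow && generateUpdateBefore) {
                out.add("-" + acc); // retract the previously emitted result
            }
            acc += isAccumulateMsg(kind) ? 1 : -1;
            if (acc != 0) {
                out.add("+" + acc); // emit the new result
                accState = acc;
            } else {
                accState = null; // no rows left for this key: clear state
            }
            return out;
        }
    }

    public static void main(String[] args) {
        CountingHelper helper = new CountingHelper(true);
        System.out.println(helper.processElement(Kind.INSERT)); // [+1]
        System.out.println(helper.processElement(Kind.INSERT)); // [-1, +2]
        System.out.println(helper.processElement(Kind.DELETE)); // [-2, +1]
        System.out.println(helper.processElement(Kind.DELETE)); // [-1]
    }
}
```

Because the helper only touches state through one field here, the same sequence of emitted messages can be asserted directly without any generated code, which is one possible route to the unit tests requested above.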


3 participants