[FLINK-36625] add lineage helper class for connector integration #25712

Open
wants to merge 1 commit into
base: master

HuangZhenQiu
Contributor

What is the purpose of the change

Add helper classes to the streaming Java API for lineage integration in connectors.

Brief change log

  • Add the TypeDatasetFacet class and a provider interface to make it easier to extract type information from connectors.
  • Add LineageUtil to make it easy to construct lineage-related POJOs.

Verifying this change

This change added tests and can be verified as follows:

  • Added LineageUtilsTest to cover public functions

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (no)
  • The serializers: (no)
  • The runtime per-record code paths (performance sensitive): (no)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no)
  • The S3 file system connector: (no)

Documentation

  • Does this pull request introduce a new feature? (no)
  • If yes, how is the feature documented? (not applicable)

@HuangZhenQiu HuangZhenQiu force-pushed the FLINK-36625-lineage-helper branch from 8171f17 to 87538a2 Compare November 30, 2024 06:30
@flinkbot
Collaborator

flinkbot commented Nov 30, 2024

CI report:

Bot commands
The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

@PublicEvolving
public interface TypeDatasetFacet extends LineageDatasetFacet {

TypeInformation getTypeInformation();
Contributor

@davidradl davidradl Dec 2, 2024


NIT: can we add @Nonnull and @PublicEvolving to this method? I see examples in the Flink codebase where @PublicEvolving is on the methods as well as on the class.

If it can be null, it should be annotated with @Nullable.

* Returns a type dataset facet or `Optional.empty` in case an implementing class is not able to
* resolve type.
*/
Optional<TypeDatasetFacet> getTypeDatasetFacet();
Contributor


I see this is a new interface, but it is not referenced in the fix. What is the thinking here?

Contributor Author


This interface is mainly for connectors in which the return type is provided by an internal class rather than by the source/sink directly.
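
To illustrate the case described in this reply, here is a minimal sketch. It is not code from this PR. Every type is a simplified stand-in defined locally so the example compiles without Flink; only the `getTypeDatasetFacet()` signature is taken from the diff under review. The interface name `TypeDatasetFacetProvider`, the classes `RecordDeserializer` and `ExampleSource`, and the use of a `String` in place of `TypeInformation` are assumptions made for illustration.

```java
import java.util.Optional;

public class ProviderSketch {

    /** Stand-in for the facet in the diff; a String replaces TypeInformation here. */
    interface TypeDatasetFacet {
        String getTypeInformation();
    }

    /** Stand-in for the provider interface (name assumed); the method mirrors the diff. */
    interface TypeDatasetFacetProvider {
        Optional<TypeDatasetFacet> getTypeDatasetFacet();
    }

    /** Hypothetical internal class of a connector that knows the produced type. */
    static class RecordDeserializer implements TypeDatasetFacetProvider {
        private final String producedType; // null when the type cannot be resolved

        RecordDeserializer(String producedType) {
            this.producedType = producedType;
        }

        @Override
        public Optional<TypeDatasetFacet> getTypeDatasetFacet() {
            if (producedType == null) {
                return Optional.empty();
            }
            TypeDatasetFacet facet = () -> producedType;
            return Optional.of(facet);
        }
    }

    /** Hypothetical source: it does not know the type itself, so it asks its internal component. */
    static class ExampleSource {
        private final Object deserializer;

        ExampleSource(Object deserializer) {
            this.deserializer = deserializer;
        }

        Optional<TypeDatasetFacet> resolveTypeFacet() {
            if (deserializer instanceof TypeDatasetFacetProvider) {
                return ((TypeDatasetFacetProvider) deserializer).getTypeDatasetFacet();
            }
            return Optional.empty(); // the internal class offers no type information
        }
    }

    public static void main(String[] args) {
        ExampleSource typed = new ExampleSource(new RecordDeserializer("String"));
        ExampleSource untyped = new ExampleSource(new Object());
        System.out.println(
                typed.resolveTypeFacet().map(TypeDatasetFacet::getTypeInformation).orElse("unknown"));
        System.out.println(
                untyped.resolveTypeFacet().map(TypeDatasetFacet::getTypeInformation).orElse("unknown"));
    }
}
```

In this sketch the source only checks whether its internal component implements the provider interface, so components that cannot resolve a type need no changes.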

Contributor


Thanks for the explanation. It would be good to understand when a connector author should use this interface. Could we add documentation on when this interface and the utility classes could or should be used?

return datasetOf(name, namespace, Collections.singletonList(typeDatasetFacet));
}

public static LineageDataset datasetOf(
Contributor


I wonder if these datasetOf factory methods could be put in DefaultLineageDataset or in a class that creates DefaultLineageDataset, maybe a DefaultLineageDatasetProvider.

Contributor Author


It is mainly to make LineageDataset creation easier and to simplify the code in each connector.
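
As a rough sketch of the kind of simplification meant here, the following shows how overloaded `datasetOf` helpers can hide the boilerplate of building a dataset. It is not the PR's implementation. The two interfaces are simplified local stand-ins so that the example compiles without Flink, and the dataset name, namespace, and facet name used in `main` are made up for illustration. Only the delegation through `Collections.singletonList(...)` mirrors the line shown in the diff.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DatasetOfSketch {

    /** Simplified stand-in for a lineage facet: identified by its name only. */
    interface LineageDatasetFacet {
        String name();
    }

    /** Simplified stand-in for a lineage dataset. */
    interface LineageDataset {
        String name();

        String namespace();

        Map<String, LineageDatasetFacet> facets();
    }

    /** General helper: indexes the facets by name and wraps everything in a dataset. */
    static LineageDataset datasetOf(
            String name, String namespace, List<LineageDatasetFacet> facets) {
        Map<String, LineageDatasetFacet> byName = new LinkedHashMap<>();
        for (LineageDatasetFacet facet : facets) {
            byName.put(facet.name(), facet);
        }
        Map<String, LineageDatasetFacet> readOnly = Collections.unmodifiableMap(byName);
        return new LineageDataset() {
            @Override
            public String name() {
                return name;
            }

            @Override
            public String namespace() {
                return namespace;
            }

            @Override
            public Map<String, LineageDatasetFacet> facets() {
                return readOnly;
            }
        };
    }

    /** Convenience overload for the single-facet case, delegating as in the diff. */
    static LineageDataset datasetOf(String name, String namespace, LineageDatasetFacet facet) {
        return datasetOf(name, namespace, Collections.singletonList(facet));
    }

    public static void main(String[] args) {
        LineageDatasetFacet typeFacet = () -> "type"; // hypothetical facet name
        LineageDataset dataset = datasetOf("orders", "kafka://example-broker:9092", typeFacet);
        System.out.println(dataset.namespace() + "/" + dataset.name() + " " + dataset.facets().keySet());
    }
}
```

With helpers like these, a connector needs a single call to describe a dataset instead of defining its own `LineageDataset` implementation.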

@HuangZhenQiu HuangZhenQiu force-pushed the FLINK-36625-lineage-helper branch from edbdb1d to 7771506 Compare December 4, 2024 06:03
@davidradl
Contributor

Reviewed by Chi on 05/12/24. Asked submitter questions. Requests documentation be updated on how connector authors should / could use this helper class.

@HuangZhenQiu
Contributor Author

@davidradl
Given that we have existing interfaces that haven't been documented, I plan to add an end-to-end native lineage page for this Jira: https://issues.apache.org/jira/browse/FLINK-35745. May I do it in a follow-up PR?

@HuangZhenQiu
Contributor Author

After this PR is merged, I will update PR #25762.

3 participants