[#6779] feat(core): Support lineage framework in Gravitino #6782

Open
wants to merge 11 commits into
base: main
from


FANNG1
Contributor

@FANNG1 FANNG1 commented Mar 28, 2025

What changes were proposed in this pull request?

Support a lineage framework in Gravitino. The lineage endpoint and the lineage sink manager will be proposed in separate PRs.

Total workflow draft PR: #6723

The main workflow:

  1. The Gravitino server creates a lineage service, which manages the lineage source and the lineage sinks.
  2. The lineage source implementation receives lineage run events and dispatches them to the lineage service.
  3. The lineage service processes each run event and dispatches it to the lineage sink manager.
  4. The lineage sink manager manages the life cycle of the lineage sinks and dispatches run events to them.
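
The four steps above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `RunEvent`, `LineageSink`, `InMemoryLineageSource`, `LineageFlowDemo`, and every method name are hypothetical stand-ins, and the real classes may have different signatures.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a lineage run event; the real event type is not shown here.
final class RunEvent {
  final String jobName;

  RunEvent(String jobName) {
    this.jobName = jobName;
  }
}

// Hypothetical sink contract: a sink consumes the run events handed to it.
interface LineageSink {
  void sink(RunEvent event);
}

// Step 4: owns the sinks and fans each run event out to all of them.
final class LineageSinkManager {
  private final List<LineageSink> sinks = new ArrayList<>();

  void register(LineageSink sink) {
    sinks.add(sink);
  }

  void dispatch(RunEvent event) {
    for (LineageSink sink : sinks) {
      sink.sink(event);
    }
  }
}

// Steps 1 and 3: created by the server; forwards processed events to the sink manager.
final class LineageService {
  private final LineageSinkManager sinkManager;

  LineageService(LineageSinkManager sinkManager) {
    this.sinkManager = sinkManager;
  }

  void dispatchLineageEvent(RunEvent event) {
    // Any processing of the event would happen here before it is dispatched.
    sinkManager.dispatch(event);
  }
}

// Step 2: a source receives run events and passes them to the service.
final class InMemoryLineageSource {
  private final LineageService service;

  InMemoryLineageSource(LineageService service) {
    this.service = service;
  }

  void onRunEvent(RunEvent event) {
    service.dispatchLineageEvent(event);
  }
}

final class LineageFlowDemo {
  public static void main(String[] args) {
    LineageSinkManager manager = new LineageSinkManager();
    manager.register(event -> System.out.println("sink received: " + event.jobName));
    LineageService service = new LineageService(manager);
    new InMemoryLineageSource(service).onRunEvent(new RunEvent("spark-job"));
  }
}
```

Keeping the source, service, and sink manager as separate pieces means a new sink can be registered without touching the code that receives events.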

Why are the changes needed?

Fix: #6779

Does this PR introduce any user-facing change?

no

How was this patch tested?

Set up a Spark and Marquez environment and tested the workflow.

@FANNG1 FANNG1 force-pushed the lineage-framework branch from 4c8e15f to 2276f0f Compare March 28, 2025 10:52
@FANNG1 FANNG1 marked this pull request as draft March 29, 2025 01:29
@FANNG1 FANNG1 changed the title [SIP] feat(core): Support lineage framework in Gravitino [#6779] feat(core): Support lineage framework in Gravitino Mar 31, 2025
@FANNG1 FANNG1 force-pushed the lineage-framework branch from 0ec623e to 30767fa Compare March 31, 2025 03:35
@FANNG1 FANNG1 force-pushed the lineage-framework branch from 30767fa to ef9cb53 Compare March 31, 2025 03:42
@FANNG1 FANNG1 marked this pull request as ready for review March 31, 2025 03:42
@FANNG1
Contributor Author

FANNG1 commented Mar 31, 2025

@jerryshao @yuqi1129 @mchades @diqiu50 @jerqi @xunliu PTAL, thx.

@@ -36,7 +36,13 @@ dependencies {
implementation(libs.commons.collections4)
implementation(libs.guava)
implementation(libs.h2db)
implementation(libs.jackson.datatype.jdk8)
Contributor Author

The Jackson package is introduced by LineageLogSinker to serialize run events to strings. This adds extra dependencies for all catalogs and may introduce Jackson package conflicts. Do you think it is necessary to move LineageLogSinker to the server package? @jerryshao


@Override
public Set<String> getRESTPackages() {
return ImmutableSet.of();
Contributor Author

I will add the real package location in another PR.

@jerryshao
Contributor

jerryshao commented Mar 31, 2025

I was thinking about whether we should move the lineage-related framework and implementations to a separate module. The main reason is that lineage is only indirectly related to the core metadata module, and Gravitino can work without lineage, so lineage is more like an add-on module. Would it be better to separate it from Gravitino core? What do you think?

@FANNG1
Contributor Author

FANNG1 commented Apr 1, 2025

I was thinking about whether we should move the lineage-related framework and implementations to a separate module. The main reason is that lineage is only indirectly related to the core metadata module, and Gravitino can work without lineage, so lineage is more like an add-on module. Would it be better to separate it from Gravitino core? What do you think?

It seems reasonable to add a new module. I'll refactor the code to add a lineage module.

@FANNG1
Contributor Author

FANNG1 commented Apr 1, 2025

@jerryshao, I added a new lineage module to hold the lineage-related code and moved LineageService from GravitinoEnv, which is in the core module, to GravitinoServer. Please help review again.

List<String> sinks = sinks();

Map<String, String> config = getAllConfig();
Map m = new HashMap(config);
Contributor

Please add the generic type parameters here.

Contributor Author

done
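
For illustration, a typed copy of the config map might look like the sketch below. `TypedConfigCopy` and `copyOf` are hypothetical names, and this is not necessarily how the PR was updated.

```java
import java.util.HashMap;
import java.util.Map;

final class TypedConfigCopy {
  // Returns a mutable, fully typed copy instead of a raw Map/HashMap,
  // so the compiler can check key and value types.
  static Map<String, String> copyOf(Map<String, String> config) {
    return new HashMap<>(config);
  }
}
```

With the diamond operator the type arguments are inferred from the declared type, so the copy stays type-safe without repeating `<String, String>`.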

public class ClassUtils {
public static <T> T loadClass(String className) {
try {
return (T) Class.forName(className).getDeclaredConstructor().newInstance();
Contributor

Does this method aim to load all classes, or is it just for loading LineageSource classes?

Contributor Author

The method is not bound to lineage classes; other packages could use this util to load classes too.
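
Since the utility is meant to be general purpose, one possible variant takes the expected type as a parameter so the cast is checked rather than unchecked. This is only a sketch: `TypedClassLoading` and `newInstance` are hypothetical names and not part of the PR.

```java
final class TypedClassLoading {
  // Loads className, verifies it is a subtype of expectedType, and
  // instantiates it through its no-arg constructor.
  static <T> T newInstance(String className, Class<T> expectedType) {
    try {
      // asSubclass throws ClassCastException before any instance is created
      // if the loaded class is not assignable to expectedType.
      Class<? extends T> clazz = Class.forName(className).asSubclass(expectedType);
      return clazz.getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException e) {
      throw new IllegalArgumentException("Failed to instantiate " + className, e);
    }
  }
}
```

A caller could write, for example, `TypedClassLoading.newInstance("java.util.ArrayList", java.util.List.class)`; a class of the wrong type fails at load time instead of at a later use site.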

@@ -38,6 +38,9 @@ dependencies {
implementation(libs.jackson.datatype.jsr310)
implementation(libs.jackson.databind)
implementation(libs.metrics.jersey2)
implementation(libs.openlineage.java) {
isTransitive = false
Contributor

What is the configuration used for?

Contributor Author

@FANNG1 FANNG1 Apr 1, 2025

It is similar to excluding everything: the library's own dependency packages are not included in Gravitino.


public void initialize(List<String> sinks, Map<String, String> LineageConfigs) {}

public boolean isHighWaterMark() {
Contributor

What's the meaning of highWaterMark here? Could you please add more comments here?

Contributor Author

updated
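
The thread does not spell out what the high watermark means in this PR. A common reading, assumed here, is that a bounded event queue reports when it has filled past a threshold so that producers can slow down or reject new events. The sketch below illustrates that reading only; `WatermarkedQueue`, its constructor parameters, and the threshold rule are hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

final class WatermarkedQueue<E> {
  private final BlockingQueue<E> queue;
  private final int highWaterMark;

  // ratio is the fraction of capacity at which the queue counts as "high",
  // for example 0.9 for 90% full (an assumed rule, not taken from the PR).
  WatermarkedQueue(int capacity, double ratio) {
    this.queue = new ArrayBlockingQueue<>(capacity);
    this.highWaterMark = (int) (capacity * ratio);
  }

  // True once the number of queued events reaches the threshold.
  boolean isHighWaterMark() {
    return queue.size() >= highWaterMark;
  }

  // Non-blocking insert; returns false when the queue is full.
  boolean offer(E event) {
    return queue.offer(event);
  }

  // Non-blocking removal; returns null when the queue is empty.
  E poll() {
    return queue.poll();
  }
}
```

Under this assumption, a source could check `isHighWaterMark()` before accepting a run event and return a "busy" response instead of blocking.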

@mchades mchades closed this Apr 2, 2025
@mchades mchades reopened this Apr 2, 2025
Development

Successfully merging this pull request may close these issues.

[Subtask] Support Lineage framework for Gravitino
4 participants