-
Notifications
You must be signed in to change notification settings - Fork 420
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[#6779] feat(core): Support lineage framework in Gravitino #6782
base: main
Are you sure you want to change the base?
Conversation
4c8e15f
to
2276f0f
Compare
0ec623e
to
30767fa
Compare
30767fa
to
ef9cb53
Compare
core/build.gradle.kts
Outdated
@@ -36,7 +36,13 @@ dependencies { | |||
implementation(libs.commons.collections4) | |||
implementation(libs.guava) | |||
implementation(libs.h2db) | |||
implementation(libs.jackson.datatype.jdk8) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
jackson
package is introduced by LineageLogSinker
to deserializate run event to string, this will add extra dependences for all catalog and may introduce some package conflict for jackson
. Do you think is it necessary to move LineageLogSinker
to server package? @jerryshao
|
||
@Override | ||
public Set<String> getRESTPackages() { | ||
return ImmutableSet.of(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will add real package location in another PR.
I was think that if we should move the lineage related framework and implementations to the separated module, the main reason is that lineage is indirectly related to core metadata module, and Gravitino can be worked w/o lineage, so lineage is more like a addon module, is it better to separate them to the Gravitino core, what do you think? |
seems reasonable to add a new module , I'll do a refactor to adding |
@jerryshao , add an new |
List<String> sinks = sinks(); | ||
|
||
Map<String, String> config = getAllConfig(); | ||
Map m = new HashMap(config); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please add generic parameter here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
public class ClassUtils { | ||
public static <T> T loadClass(String className) { | ||
try { | ||
return (T) Class.forName(className).getDeclaredConstructor().newInstance(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this method aim to load all classes, or is it just for loading LineageSource
classes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the method is not bond to lineage classes, the other package could use this util to load class too.
server/build.gradle.kts
Outdated
@@ -38,6 +38,9 @@ dependencies { | |||
implementation(libs.jackson.datatype.jsr310) | |||
implementation(libs.jackson.databind) | |||
implementation(libs.metrics.jersey2) | |||
implementation(libs.openlineage.java) { | |||
isTransitive = false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the configuration used for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
similar to exclude all
, not include the dependences package to Gravitino
|
||
public void initialize(List<String> sinks, Map<String, String> LineageConfigs) {} | ||
|
||
public boolean isHighWaterMark() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the meaning of highWatherMark
here? Could you please leave more comments here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
What changes were proposed in this pull request?
Support lineage framework in Gravitino, lineage endpoint and lineage sink manager will be proposed in separate PR.
Total workflow draft PR: #6723
The main work flow:
Why are the changes needed?
Fix: #6779
Does this PR introduce any user-facing change?
no
How was this patch tested?
setup Spark&Marquez environment and test the work flow.