Spark 4.0: Add variant round trip test for Spark #14276
base: main
Conversation
CI will pass once #14261 is in.
Force-pushed from f43ded7 to 6e4873a
cc @aihuaxu @amogh-jahagirdar @singhpk234 Could you please take a look when you have a moment? Thanks!
spark/v4.0/spark/src/test/java/org/apache/iceberg/spark/sql/TestSparkVariantRead.java
LGTM, Thanks @huaxingao !
vv1 = new Variant(((VariantVal) v1row1).getValue(), ((VariantVal) v1row1).getMetadata());
vv2 = new Variant(((VariantVal) v1row2).getValue(), ((VariantVal) v1row2).getMetadata());
} else {
  fail("Expected Variant/VariantVal but got: " + (v1row1 == null ? "null" : v1row1.getClass()));
The Assertions#fail method supports a format template, so it would be better to use it directly instead of concatenating strings at the call site.
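For illustration, a minimal pure-Java sketch of the two message styles. The assumption here is that the `fail` in question is AssertJ's `Assertions.fail(String, Object...)`, which formats its message with `String.format`; `FailMessageDemo` and its `message` helper are hypothetical names, not code from the PR.

```java
public class FailMessageDemo {
    // Template-style message the reviewer suggests, e.g.
    // fail("Expected Variant/VariantVal but got: %s", ...).
    // Mirrors the null-guarding ternary used in the original test.
    static String message(Object v1row1) {
        return String.format(
            "Expected Variant/VariantVal but got: %s",
            v1row1 == null ? "null" : v1row1.getClass());
    }

    public static void main(String[] args) {
        // Produces the same text as the concatenation in the original test:
        System.out.println(message(null));
        System.out.println(message("not a variant"));
    }
}
```

With a format template, the call site collapses to a single `fail("Expected Variant/VariantVal but got: %s", ...)` instead of building the string by hand.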
import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.ValueSource;

public class TestSparkVariantRead extends TestBase {
Why do we include "Read" in the test class name? It looks like there are some write operations too.
There is already a TestSparkVariants, but it serves a different test purpose. Even though there are write operations, this test mainly exercises the read path.
Object v1row2 = directRows.get(1).get(1);
Variant vv1;
Variant vv2;
if (v1row1 instanceof Variant) {
Why do we handle both Variant and VariantVal here? In Spark, wouldn't it always be VariantVal, since the value comes from Spark?
You are right. This should only be VariantVal.
import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.ValueSource;

public class TestSparkVariantRead extends TestBase {
It seems we are covering querying the variant as a whole column. Variant extraction such as v1:k::string is not part of this PR, correct?
Right, currently this only tests querying the variant as a whole column. I will add more tests as a follow-up.
Adding a variant round-trip test for Spark, covering projection and filtering. Column pruning and filtering currently apply to the whole variant column; once Spark pushes Variant into the DSv2 scan, we should be able to do column pruning and filtering on shredded variants as well.
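The "round trip" here comes down to checking that the variant read back from the table carries the same value and metadata bytes as the variant that was written. A minimal pure-Java sketch of that comparison, assuming the shapes involved: `VariantLike` is a hypothetical stand-in for Spark's VariantVal and the Variant class used in the test, both of which expose value and metadata byte arrays.

```java
import java.util.Arrays;

public class VariantRoundTripSketch {
    // Hypothetical stand-in for Spark's VariantVal / the test's Variant class,
    // which both carry a value byte array and a metadata byte array.
    record VariantLike(byte[] value, byte[] metadata) {}

    // Round-trip equality as the test effectively performs it: compare the raw
    // value and metadata bytes of the written variant vs. the variant read back.
    static boolean sameVariant(VariantLike written, VariantLike readBack) {
        return Arrays.equals(written.value(), readBack.value())
            && Arrays.equals(written.metadata(), readBack.metadata());
    }

    public static void main(String[] args) {
        VariantLike written = new VariantLike(new byte[] {1, 2}, new byte[] {3});
        VariantLike readBack = new VariantLike(new byte[] {1, 2}, new byte[] {3});
        // Identical bytes => the round trip preserved the variant.
        System.out.println(sameVariant(written, readBack));
    }
}
```

Comparing the raw bytes rather than a parsed representation keeps the check independent of how the engine later shreds or re-encodes the variant.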