Spark 4.0: Add variant round trip test for Spark #14276
base: main
Conversation
CI will pass once #14261 is in.
Force-pushed from f43ded7 to 6e4873a
cc @aihuaxu @amogh-jahagirdar @singhpk234 Could you please take a look when you have a moment? Thanks!
spark/v4.0/spark/src/test/java/org/apache/iceberg/spark/sql/TestSparkVariantRead.java
LGTM, Thanks @huaxingao !
vv1 = new Variant(((VariantVal) v1row1).getValue(), ((VariantVal) v1row1).getMetadata());
vv2 = new Variant(((VariantVal) v1row2).getValue(), ((VariantVal) v1row2).getMetadata());
} else {
  fail("Expected Variant/VariantVal but got: " + (v1row1 == null ? "null" : v1row1.getClass()));
The Assertions#fail method supports a format template, so it would be better to use it directly instead of concatenating strings at the call site.
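For illustration, a minimal pure-Java sketch of the two message styles. The assumption here is that the `fail` in question is AssertJ's `Assertions.fail(String, Object...)`, which formats its message with `String.format`; `FailMessageDemo` and its `message` helper are hypothetical names, not code from the PR.

```java
public class FailMessageDemo {
    // Template-style message the reviewer suggests, e.g.
    // fail("Expected Variant/VariantVal but got: %s", ...).
    // Mirrors the null-guarding ternary used in the original test.
    static String message(Object v1row1) {
        return String.format(
            "Expected Variant/VariantVal but got: %s",
            v1row1 == null ? "null" : v1row1.getClass());
    }

    public static void main(String[] args) {
        // Produces the same text as the concatenation in the original test:
        System.out.println(message(null));
        System.out.println(message("not a variant"));
    }
}
```

With a format template, the call site collapses to a single `fail("Expected Variant/VariantVal but got: %s", ...)` instead of building the string by hand.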
import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.ValueSource;

public class TestSparkVariantRead extends TestBase {
Why do we include "Read" in the test class name? It looks like there are some write operations too.
There is already a TestSparkVariants, but it serves a different test purpose. Even though there are write operations, this test mainly exercises the read path.
Object v1row2 = directRows.get(1).get(1);
Variant vv1;
Variant vv2;
if (v1row1 instanceof Variant) {
Why do we handle both Variant and VariantVal here? In Spark, wouldn't it always be VariantVal, since the value comes from Spark?
You are right. This should only be VariantVal.
import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.ValueSource;

public class TestSparkVariantRead extends TestBase {
It seems we are covering querying the variant as a whole column. Variant extraction such as v1:k::string is not part of this PR, correct?
Right, currently this only tests querying the variant as a whole column. I will add more tests as a follow-up.
Adding a variant round-trip test for Spark, covering projection and filtering. Column pruning and filtering currently apply to the whole variant column; once Spark pushes Variant into the DSv2 scan, we should be able to do column pruning and filtering on shredded variants as well.
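The "round trip" here comes down to checking that the variant read back from the table carries the same value and metadata bytes as the variant that was written. A minimal pure-Java sketch of that comparison, assuming the shapes involved: `VariantLike` is a hypothetical stand-in for Spark's VariantVal and the Variant class used in the test, both of which expose value and metadata byte arrays.

```java
import java.util.Arrays;

public class VariantRoundTripSketch {
    // Hypothetical stand-in for Spark's VariantVal / the test's Variant class,
    // which both carry a value byte array and a metadata byte array.
    record VariantLike(byte[] value, byte[] metadata) {}

    // Round-trip equality as the test effectively performs it: compare the raw
    // value and metadata bytes of the written variant vs. the variant read back.
    static boolean sameVariant(VariantLike written, VariantLike readBack) {
        return Arrays.equals(written.value(), readBack.value())
            && Arrays.equals(written.metadata(), readBack.metadata());
    }

    public static void main(String[] args) {
        VariantLike written = new VariantLike(new byte[] {1, 2}, new byte[] {3});
        VariantLike readBack = new VariantLike(new byte[] {1, 2}, new byte[] {3});
        // Identical bytes => the round trip preserved the variant.
        System.out.println(sameVariant(written, readBack));
    }
}
```

Comparing the raw bytes rather than a parsed representation keeps the check independent of how the engine later shreds or re-encodes the variant.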