huaxingao
Contributor

@huaxingao huaxingao commented Oct 7, 2025

Adds a variant round-trip test for Spark, covering projection and filtering. For now, column pruning and filtering operate on the whole variant column. Once the Spark change that pushes Variant into the DSv2 scan is in, we should be able to do column pruning and filtering on shredded variants.

@github-actions github-actions bot added the spark label Oct 7, 2025
@huaxingao
Contributor Author

CI will pass once #14261 is in.

@huaxingao
Contributor Author

cc @aihuaxu @amogh-jahagirdar @singhpk234 Could you please take a look when you have a moment? Thanks!

Contributor

@singhpk234 singhpk234 left a comment

LGTM, thanks @huaxingao!

vv1 = new Variant(((VariantVal) v1row1).getValue(), ((VariantVal) v1row1).getMetadata());
vv2 = new Variant(((VariantVal) v1row2).getValue(), ((VariantVal) v1row2).getMetadata());
} else {
fail("Expected Variant/VariantVal but got: " + (v1row1 == null ? "null" : v1row1.getClass()));
Contributor

The Assertions#fail method supports a string template, so it would be better to pass the template and its arguments directly instead of concatenating strings on the caller side.
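
As a rough illustration of the suggestion above: the sketch below assumes the `fail` in question takes a format template plus arguments (the shape of AssertJ's `Assertions.fail(String, Object...)`). To keep the example runnable without a test library on the classpath, it defines a local stand-in `fail` helper. Both `fail` and `describe` here are hypothetical helpers for illustration, not code from the PR.

```java
final class FailTemplateSketch {

  // Hypothetical stand-in mimicking a template-based fail(String, Object...).
  // In the real test this would be the assertion library's method, not this helper.
  static void fail(String template, Object... args) {
    throw new AssertionError(String.format(template, args));
  }

  // Null-safe description of the runtime type, replacing the inline ternary.
  static String describe(Object value) {
    return value == null ? "null" : value.getClass().getName();
  }

  // Triggers the failure and returns its message so the result can be inspected.
  static String failureMessageFor(Object value) {
    try {
      // Template and argument are passed separately; no caller-side concatenation.
      fail("Expected Variant/VariantVal but got: %s", describe(value));
      return null; // not reached, because fail always throws
    } catch (AssertionError e) {
      return e.getMessage();
    }
  }

  public static void main(String[] args) {
    System.out.println(failureMessageFor(null));
    System.out.println(failureMessageFor("not a variant"));
  }
}
```

The message text is unchanged from the concatenated version apart from using the class name rather than `Class#toString`; only the formatting moves from the caller into the assertion call.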

import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.ValueSource;

public class TestSparkVariantRead extends TestBase {
Contributor

Why do we include "Read" in the test class name? It looks like there are some write operations too.

Contributor Author

There is already a TestSparkVariants class, but it serves a different purpose. Even though this test includes write operations, it mainly exercises the read path.

Object v1row2 = directRows.get(1).get(1);
Variant vv1;
Variant vv2;
if (v1row1 instanceof Variant) {
Contributor

Why do we handle both Variant and VariantVal here? Wouldn't the value always be a VariantVal, since it comes from Spark?

Contributor Author

You are right. This should only be VariantVal.

import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.ValueSource;

public class TestSparkVariantRead extends TestBase {
Contributor

It seems we only cover querying the variant as a whole column. Variant extraction such as v1:k::string is not part of this PR, correct?

Contributor Author

Right, currently this only tests querying the variant as a whole column. I will add more tests as a follow-up.

4 participants