Skip to content

Conversation

aihuaxu
Copy link
Contributor

@aihuaxu aihuaxu commented Oct 6, 2025

Handle NPE for VariantLogicalType in TypeWithSchemaVisitor

Description:
This change adds null-safety checks when detecting and processing variant types in the TypeWithSchemaVisitor.visit() method. Previously, the code could throw a NullPointerException when the Parquet schema contains variant columns and iType could null, e.g., the variant column(s) are getting pruned. This is to add the safety check for null and also add test coverage with variant pruning test.

@aihuaxu
Copy link
Contributor Author

aihuaxu commented Oct 6, 2025

@amogh-jahagirdar and @Fokko Can you help take a look? Thanks.

@aihuaxu aihuaxu requested a review from huaxingao October 6, 2025 18:13
}

@Test
public void testVariant() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did the test cover the null scenario?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both variant fields are OPTIONAL and the expected pruned schema keeps it OPTIONAL, so seems to me that null scenario is covered.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stevenzwu null scenario you mean the null value for the variant? This issue is related to the schema, not the data.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally we would additionally have an engine round trip test which can also more easily demonstrate the specific case where this issue surfaces, I see @huaxingao took this up in #14276.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually in my separate PR I have a roundtrip test and notice this issue. Huaxin or I can work on the test.

@aihuaxu aihuaxu requested a review from stevenzwu October 7, 2025 03:55
Copy link
Contributor

@amogh-jahagirdar amogh-jahagirdar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks reasonable and right to me, just had a minor comment on the test

}

@Test
public void testVariant() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally we would additionally have an engine round trip test which can also more easily demonstrate the specific case where this issue surfaces, I see @huaxingao took this up in #14276.

Comment on lines 297 to 306
Types.buildGroup(Type.Repetition.OPTIONAL)
.as(LogicalTypeAnnotation.variantType((byte) 1))
.addField(
Types.primitive(PrimitiveTypeName.BINARY, Type.Repetition.REQUIRED)
.named("metadata"))
.addField(
Types.primitive(PrimitiveTypeName.BINARY, Type.Repetition.REQUIRED)
.named("value"))
.id(3)
.named("variant_2"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: This test is a bit hard to parse due to all the fields that need to be setup for a variant, could we abstract that behind a buildVariant helper

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I separated into helper method.

|| (iType != null && iType.isVariantType())) {
// when Parquet has a VARIANT logical type, use it here
return visitVariant(iType.asVariantType(), group, visitor);
return visitVariant(iType != null ? iType.asVariantType() : null, group, visitor);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this NPE issue would happen when we prune out a variant column that we know we don't need to read; and in that case the iceberg type passed to the visitor would expectedly be null. Makes sense, it's really no different than the other cases when pruning.

Copy link
Contributor

@amogh-jahagirdar amogh-jahagirdar Oct 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well sorry, I mean more generally this applies any case the visitor is visiting a type where there's no companion iceberg type for the parquet type. The issue probably primarily manifests during pruning though

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. It's general for any visitors and we would hit NPE without Iceberg type, but more for pruning.

|| (iType != null && iType.isVariantType())) {
// when Parquet has a VARIANT logical type, use it here
return visitVariant(iType.asVariantType(), group, visitor);
return visitVariant(iType != null ? iType.asVariantType() : null, group, visitor);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. It's general for any visitors and we would hit NPE without Iceberg type, but more for pruning.

}

@Test
public void testVariant() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually in my separate PR I have a roundtrip test and notice this issue. Huaxin or I can work on the test.

assertThat(actual).as("Pruned schema should be matched").isEqualTo(expected);
}

private static Type buildVariantType(int id, String name) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I separate into a helper method here.

@amogh-jahagirdar
Copy link
Contributor

Will go ahead and merge, thank you @aihuaxu and thanks @stevenzwu @huaxingao @singhpk234 for reviewing

@amogh-jahagirdar amogh-jahagirdar merged commit 9be7c18 into apache:main Oct 9, 2025
42 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants