-
Notifications
You must be signed in to change notification settings - Fork 2.8k
Handle NPE for VariantLogicalType in TypeWithSchemaVisitor #14261
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@amogh-jahagirdar and @Fokko Can you help take a look? Thanks. |
parquet/src/test/java/org/apache/iceberg/parquet/TestPruneColumns.java
Outdated
Show resolved
Hide resolved
} | ||
|
||
@Test | ||
public void testVariant() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did the test cover the null scenario?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both variant fields are OPTIONAL and the expected pruned schema keeps it OPTIONAL, so seems to me that null scenario is covered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stevenzwu null scenario you mean the null value for the variant? This issue is related to the schema, not the data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally we would additionally have an engine round trip test which can also more easily demonstrate the specific case where this issue surfaces, I see @huaxingao took this up in #14276.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually in my separate PR I have a roundtrip test and notice this issue. Huaxin or I can work on the test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks reasonable and right to me, just had a minor comment on the test
} | ||
|
||
@Test | ||
public void testVariant() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally we would additionally have an engine round trip test which can also more easily demonstrate the specific case where this issue surfaces, I see @huaxingao took this up in #14276.
Types.buildGroup(Type.Repetition.OPTIONAL) | ||
.as(LogicalTypeAnnotation.variantType((byte) 1)) | ||
.addField( | ||
Types.primitive(PrimitiveTypeName.BINARY, Type.Repetition.REQUIRED) | ||
.named("metadata")) | ||
.addField( | ||
Types.primitive(PrimitiveTypeName.BINARY, Type.Repetition.REQUIRED) | ||
.named("value")) | ||
.id(3) | ||
.named("variant_2")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: This test is a bit hard to parse due to all the fields that need to be setup for a variant, could we abstract that behind a buildVariant helper
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I separated into helper method.
|| (iType != null && iType.isVariantType())) { | ||
// when Parquet has a VARIANT logical type, use it here | ||
return visitVariant(iType.asVariantType(), group, visitor); | ||
return visitVariant(iType != null ? iType.asVariantType() : null, group, visitor); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this NPE issue would happen when we prune out a variant column that we know we don't need to read; and in that case the iceberg type passed to the visitor would expectedly be null. Makes sense, it's really no different than the other cases when pruning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well sorry, I mean more generally this applies any case the visitor is visiting a type where there's no companion iceberg type for the parquet type. The issue probably primarily manifests during pruning though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right. It's general for any visitors and we would hit NPE without Iceberg type, but more for pruning.
|| (iType != null && iType.isVariantType())) { | ||
// when Parquet has a VARIANT logical type, use it here | ||
return visitVariant(iType.asVariantType(), group, visitor); | ||
return visitVariant(iType != null ? iType.asVariantType() : null, group, visitor); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right. It's general for any visitors and we would hit NPE without Iceberg type, but more for pruning.
} | ||
|
||
@Test | ||
public void testVariant() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually in my separate PR I have a roundtrip test and notice this issue. Huaxin or I can work on the test.
assertThat(actual).as("Pruned schema should be matched").isEqualTo(expected); | ||
} | ||
|
||
private static Type buildVariantType(int id, String name) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I separate into a helper method here.
Will go ahead and merge, thank you @aihuaxu and thanks @stevenzwu @huaxingao @singhpk234 for reviewing |
Handle NPE for VariantLogicalType in TypeWithSchemaVisitor
Description:
This change adds null-safety checks when detecting and processing variant types in the TypeWithSchemaVisitor.visit() method. Previously, the code could throw a NullPointerException when the Parquet schema contains variant columns and iType could null, e.g., the variant column(s) are getting pruned. This is to add the safety check for null and also add test coverage with variant pruning test.