-
Notifications
You must be signed in to change notification settings - Fork 28.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-50463][SQL][3.5] Fix ConstantColumnVector
with Columnar to Row conversion
#49131
Conversation
…nversion ### What changes were proposed in this pull request? apache@800faf0 frees column vector resources between batches in columnar to row conversion. However, like `WritableColumnVector`, `ConstantColumnVector` should not free resources between batches because the same data is used across batches ### Why are the changes needed? Without this change, ConstantColumnVectors with string values, for example, will fail if used with column->row conversion. For instance, reading a parquet table partitioned by a string column with multiple batches. ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? added UT that failed before and now passes ### Was this patch authored or co-authored using generative AI tooling? no Closes apache#49021 from richardc-db/col_to_row_const_col_vec_fix. Authored-by: Richard Chen <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, LGTM. Thank you, @LuciferYang .
cc @richardc-db and @viirya , too |
also cc @pan3793 |
FYI: SPARK-50235 causes Iceberg test failure (JVM crash), and this follow-up PR does not help, I may not be able to provide conclusions within a short time because I'm not familiar with the "native" world. See more details at apache/iceberg#11731 @viirya could you please give some guidance? |
Thank you for the reporting, @pan3793 . |
Thank you for re-triggerring the failed test pipelines. |
Does Iceberg implement ColumnVector interface and have it own vector implementations which are similar to writable column vector or constant column vector? |
Iceberg has an implementation similar to |
Thanks. So looks like it is a vector that allows writing into its values although it doesn't inherit the writable interface. Then it can overwrite |
…ow conversion ### What changes were proposed in this pull request? 800faf0 frees column vector resources between batches in columnar to row conversion. However, like `WritableColumnVector`, `ConstantColumnVector` should not free resources between batches because the same data is used across batches ### Why are the changes needed? Without this change, ConstantColumnVectors with string values, for example, will fail if used with column->row conversion. For instance, reading a parquet table partitioned by a string column with multiple batches. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? added UT that failed before and now passes ### Was this patch authored or co-authored using generative AI tooling? no Closes #49131 from LuciferYang/SPARK-50463-3.5. Authored-by: Richard Chen <[email protected]> Signed-off-by: yangjie01 <[email protected]>
Merged into branch-3.5 for Apache Spark 3.5.4 RC2. Thanks @dongjoon-hyun @viirya and @pan3793 |
Thank you, @LuciferYang and all! |
The method is newly added just in branch-3.5, not released yet. |
What changes were proposed in this pull request?
800faf0 frees column vector resources between batches in columnar to row conversion. However, like
WritableColumnVector
,ConstantColumnVector
should not free resources between batches because the same data is used across batchesWhy are the changes needed?
Without this change, ConstantColumnVectors with string values, for example, will fail if used with column->row conversion. For instance, reading a parquet table partitioned by a string column with multiple batches.
Does this PR introduce any user-facing change?
No
How was this patch tested?
added UT that failed before and now passes
Was this patch authored or co-authored using generative AI tooling?
no