GH-39456: [Go][Parquet] Arrow DATE64 Type Coerced to Parquet DATE Logical Type #39460
Merged
… logical type instead of DATE (32-bit)
zeroshade
approved these changes
Jan 9, 2024
After merging your PR, Conbench analyzed the 3 benchmarking runs that have been run so far on merge-commit eade938. There were no benchmark performance regressions. 🎉 The full Conbench report has more details.
This was referenced Jan 15, 2024
clayburn
pushed a commit
to clayburn/arrow
that referenced
this pull request
Jan 23, 2024
…TE Logical Type (apache#39460)

### Rationale for this change
Closes: apache#39456

### What changes are included in this PR?
Update physical and logical type mapping from Arrow to Parquet for DATE64 type

### Are these changes tested?
Yes,
- Update expected schema mapping in existing test
- Tests asserting new behavior
  - Arrow DATE64 will roundtrip -> Parquet -> Arrow as DATE32
  - Arrow DATE64 _not aligned_ to exact date boundary will truncate to milliseconds at boundary of greatest full day on Parquet roundtrip

### Are there any user-facing changes?
Yes, users of `pqarrow.FileWriter` will produce Parquet files containing `DATE` logical type instead of `TIMESTAMP[ms]` when writing Arrow data containing DATE64 field(s). The proposed implementation truncates `int64` values to be divisible by 86400000 rather than validating that this is already the case, as some implementations do. I am happy to add this validation if it would be preferred, but the truncating behavior will likely break fewer existing users.

I'm not sure whether this is technically considered a breaking change to a public API and if/how it should be communicated. Any direction regarding this would be appreciated.

* Closes: apache#39456

Authored-by: Joel Lubinitsky <[email protected]>
Signed-off-by: Matt Topol <[email protected]>
dgreiss
pushed a commit
to dgreiss/arrow
that referenced
this pull request
Feb 19, 2024
…TE Logical Type (apache#39460)
zanmato1984
pushed a commit
to zanmato1984/arrow
that referenced
this pull request
Feb 28, 2024
…TE Logical Type (apache#39460)
### Rationale for this change
Closes: #39456

### What changes are included in this PR?
Update the physical and logical type mapping from Arrow to Parquet for the DATE64 type.

### Are these changes tested?
Yes,
- Update expected schema mapping in existing test
- Tests asserting new behavior
  - Arrow DATE64 will roundtrip -> Parquet -> Arrow as DATE32
  - Arrow DATE64 _not aligned_ to exact date boundary will truncate to milliseconds at boundary of greatest full day on Parquet roundtrip

### Are there any user-facing changes?
Yes, users of `pqarrow.FileWriter` will produce Parquet files containing the `DATE` logical type instead of `TIMESTAMP[ms]` when writing Arrow data containing DATE64 field(s). The proposed implementation truncates `int64` values to be divisible by 86400000 rather than validating that this is already the case, as some implementations do. I am happy to add this validation if it would be preferred, but the truncating behavior will likely break fewer existing users.

I'm not sure whether this is technically considered a breaking change to a public API and if/how it should be communicated. Any direction regarding this would be appreciated.
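
To make the trade-off between truncating and validating concrete, here is a minimal standalone sketch in plain Go. It is not the code from this PR: the names `millisPerDay`, `truncateDate64`, `date64ToDays`, and `validateDate64` are hypothetical, and no Arrow or Parquet packages are used. It relies on Go's integer division, which truncates toward zero. Whether the actual implementation treats negative (pre-1970) values the same way is an assumption here.

```go
package main

import "fmt"

// millisPerDay is the number of milliseconds in one day (the 86400000
// mentioned in the PR description).
const millisPerDay int64 = 86400000

// truncateDate64 drops any partial-day remainder so the result is
// divisible by millisPerDay. Hypothetical helper, for illustration only.
func truncateDate64(ms int64) int64 {
	return ms - ms%millisPerDay
}

// date64ToDays converts milliseconds since the epoch to whole days, the
// unit used by a 32-bit date. Go integer division truncates toward zero;
// how negative values should be handled is an assumption of this sketch.
func date64ToDays(ms int64) int32 {
	return int32(ms / millisPerDay)
}

// validateDate64 is the stricter alternative: it rejects values that are
// not already aligned to a day boundary instead of silently truncating.
func validateDate64(ms int64) error {
	if ms%millisPerDay != 0 {
		return fmt.Errorf("DATE64 value %d is not a multiple of %d", ms, millisPerDay)
	}
	return nil
}

func main() {
	aligned := 19731 * millisPerDay // a whole number of days
	unaligned := aligned + 3600000  // one hour into the same day

	fmt.Println(truncateDate64(unaligned) == aligned) // true
	fmt.Println(date64ToDays(unaligned))              // 19731
	fmt.Println(validateDate64(unaligned) != nil)     // true
}
```

With truncation, an unaligned value is written as the same day as its aligned neighbour, so existing writers keep working. With validation, the same value would surface as an error at write time.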