-
Notifications
You must be signed in to change notification settings - Fork 825
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Parquet: clear metadata and project fields of ParquetRecordBatchStream::schema #5135
Merged
tustvold
merged 8 commits into
apache:master
from
Jefffrey:clear_parquet_record_batch_stream_schema_metadata
Dec 5, 2023
Merged
Changes from 6 commits
Commits
Show all changes
8 commits
Select commit
Hold shift + click to select a range
84be336
Parquet: clear metadata of ParquetRecordBatchStream::schema
Jefffrey fd662ad
Revert "Parquet: clear metadata of ParquetRecordBatchStream::schema"
Jefffrey ef9601e
Document expected behaviour
Jefffrey 8e6489d
Revert "Document expected behaviour"
Jefffrey 91f3780
Reapply "Parquet: clear metadata of ParquetRecordBatchStream::schema"
Jefffrey af4f2d1
ParquetRecordBatchStream should strip schema metadata and respect pro…
Jefffrey 1d96883
Merge branch 'master' into clear_parquet_record_batch_stream_schema_m…
Jefffrey 7ea1e1b
Fix projection of nested fields
Jefffrey File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't quite correct in the case of a projection of a nested schema, its possible this isn't actually determined until the ArrayReader is constructed...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although https://docs.rs/parquet/latest/parquet/arrow/fn.parquet_to_arrow_field_levels.html
takes a ProjectionMask and should apply it already?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was a bit worried about this, as I couldn't find a straightforward way that the schema was constructed from ParquetField + ProjectionMask, as it seems done in the ArrayReader construction indeed.
Edit: wasn't aware of https://docs.rs/parquet/latest/parquet/arrow/fn.parquet_to_arrow_field_levels.html, will check it out 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll have a quick play, stand by