When a Parquet file is sorted, we can define an index consisting of the boundary values of the pages of the sort columns, together with the offset and length of each of those pages in the file.
The goal is to optimize lookup and range-scan queries by using this index to read only the pages that contain data matching the filter.
We'd require the pages to be aligned across columns.
[~marcelk] will add a link to the Google Doc to discuss the spec.
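To make the idea concrete, here is a minimal sketch of how such an index could be used for lookups and range scans. It is not the proposed spec: `PageEntry` and `pages_for_range` are hypothetical names, the values are assumed to be integers sorted in ascending order, and the offsets and lengths are made up for illustration.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass
from typing import List


@dataclass
class PageEntry:
    """One index entry for a page of the sort column (hypothetical layout)."""
    min_value: int  # lower boundary value of the page
    max_value: int  # upper boundary value of the page
    offset: int     # byte offset of the page in the file
    length: int     # byte length of the page


def pages_for_range(index: List[PageEntry], lo: int, hi: int) -> List[PageEntry]:
    """Return the pages that may contain values in [lo, hi].

    Assumes the file is sorted ascending, so both the page minimums and
    the page maximums are non-decreasing and can be binary searched.
    """
    if lo > hi:
        return []
    maxes = [p.max_value for p in index]
    mins = [p.min_value for p in index]
    first = bisect_left(maxes, lo)   # first page whose max is >= lo
    last = bisect_right(mins, hi)    # one past the last page whose min is <= hi
    return index[first:last]


# Illustrative index for a sorted column; the numbers are invented.
index = [
    PageEntry(min_value=1, max_value=10, offset=4, length=100),
    PageEntry(min_value=10, max_value=25, offset=104, length=120),
    PageEntry(min_value=26, max_value=40, offset=224, length=90),
    PageEntry(min_value=41, max_value=41, offset=314, length=30),
]

point_lookup = pages_for_range(index, 10, 10)  # value 10 spans two pages
range_scan = pages_for_range(index, 27, 39)    # falls inside a single page
no_match = pages_for_range(index, 50, 60)      # beyond the last page
```

A reader would then fetch only the byte ranges given by `offset` and `length` of the returned entries. If pages are aligned across columns, as required above, the positions of the selected pages could presumably be reused to pick the matching pages of the other columns.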
I am working on it. I have already linked the related JIRA to this one: PARQUET-1201. Please feel free to add any questions to that JIRA if you think they are suitable for public discussion, or send an email directly to me.
Reporter: Julien Le Dem / @julienledem
Assignee: Marcel Kinard
Related issues:
PRs and other links:
Note: This issue was originally created as PARQUET-922. Please see the migration documentation for further details.