Add index pages to the format to support efficient page skipping #324

Closed
asfimport opened this issue Mar 24, 2017 · 6 comments

@asfimport

asfimport commented Mar 24, 2017

When a Parquet file is sorted, we can define an index consisting of the boundary values for the pages of the sort columns, as well as the offsets and lengths of those pages in the file.
The goal is to optimize lookup and range-scan queries by using this index to read only the pages that contain data matching the filter.
We'd require the pages to be aligned across columns.
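
To make the idea concrete, here is a minimal sketch of such an index and a range lookup over it. It only illustrates the proposal as described above and is not the layout from the spec; `PageEntry`, `matching_pages`, `byte_ranges`, and the sample data are hypothetical names and values. It assumes integer values, an ascending sort on the indexed column, and page `i` covering the same rows in every column (the alignment requirement).

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class PageEntry:
    """One index entry: boundary values plus the page's location in the file."""
    min_value: int
    max_value: int
    offset: int  # byte offset of the page in the file
    length: int  # byte length of the page


def matching_pages(sorted_index: List[PageEntry], lo: int, hi: int) -> range:
    """Return the ordinals of pages that may hold values in [lo, hi].

    Assumes entries are in file order and the column is sorted ascending,
    so min_value and max_value are both non-decreasing across pages.
    """
    # First page whose max is >= lo; earlier pages end before the range starts.
    first = bisect_left([p.max_value for p in sorted_index], lo)
    # One past the last page whose min is <= hi; later pages start after the range.
    last = bisect_right([p.min_value for p in sorted_index], hi)
    return range(first, max(first, last))


def byte_ranges(
    column_indexes: Dict[str, List[PageEntry]], sort_column: str, lo: int, hi: int
) -> Dict[str, List[Tuple[int, int]]]:
    """Map each column to the (offset, length) pairs that need to be read.

    Relies on pages being aligned across columns: the page ordinals selected
    on the sort column are reused unchanged for every other column.
    """
    pages = matching_pages(column_indexes[sort_column], lo, hi)
    return {
        name: [(entries[i].offset, entries[i].length) for i in pages]
        for name, entries in column_indexes.items()
    }


# Made-up index for a file sorted on "id" with an unsorted "price" column.
index = {
    "id": [
        PageEntry(1, 10, 0, 100),
        PageEntry(11, 20, 100, 120),
        PageEntry(21, 30, 220, 90),
        PageEntry(31, 40, 310, 110),
    ],
    "price": [
        PageEntry(5, 99, 1300, 200),
        PageEntry(1, 80, 1500, 200),
        PageEntry(3, 70, 1700, 210),
        PageEntry(2, 95, 1910, 190),
    ],
}

if __name__ == "__main__":
    # Filter: 12 <= id <= 25. Only the second and third pages can match.
    print(byte_ranges(index, "id", 12, 25))
```

The binary search works only because the sort makes the page boundaries monotonic; for an unsorted column the same entries could still be scanned linearly, but pages could not be skipped with two bisections.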

[~marcelk] will add a link to the google doc to discuss the spec

Reporter: Julien Le Dem / @julienledem
Assignee: Marcel Kinard

Note: This issue was originally created as PARQUET-922. Please see the migration documentation for further details.

@asfimport

Ryan Blue / @rdblue:
Merged format PR #72. Thanks for getting this pushed through @lekv!

@asfimport

Zoltan Ivanfi / @zivanfi:
I was looking for a JIRA for the actual implementation in parquet-mr, but couldn't find it. Does such a JIRA already exist?

@asfimport

legend:
Hi @zivanfi

I have the same question. Are you working on the parquet-mr implementation of index pages?

@asfimport

Gabor Szadovszky / @gszadovszky:
Hi [~legend],

I am working on it. I have already linked the related JIRA to this one: PARQUET-1201. Please feel free to add any questions to that JIRA if you think they are of public interest, or send an email directly to me.

@asfimport

legend:
@gszadovszky, Great!(y)
