Add index pages to the format to support efficient page skipping #324

asfimport · 2017-03-24T18:45:28Z

When a Parquet file is sorted, we can define an index consisting of the boundary values of the pages of the sorted-on columns, as well as the offsets and lengths of those pages in the file.
The goal is to optimize lookup and range-scan queries by using this index to read only the pages containing data that matches the filter.
We'd require the pages to be aligned across columns.
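To make the idea concrete, here is a minimal sketch of how such an index could be used to pick the pages for a range scan. It assumes an integer column sorted in ascending order and a hypothetical per-page entry holding the page's min/max values plus its byte offset and length; the names `PageEntry` and `pages_for_range` are illustrative only and are not part of the Parquet spec or any Parquet library.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PageEntry:
    """Hypothetical index entry for one data page of a sorted column."""
    min_value: int  # smallest value stored in the page
    max_value: int  # largest value stored in the page
    offset: int     # byte offset of the page in the file
    length: int     # length of the page in bytes


def pages_for_range(index: List[PageEntry], lo: int, hi: int) -> List[PageEntry]:
    """Return the pages that may contain values in [lo, hi].

    Assumes the column is sorted ascending, so min_value and max_value
    are both non-decreasing across consecutive pages.
    """
    if lo > hi:
        return []
    # First page whose max_value >= lo; every earlier page lies entirely below lo.
    max_values = [p.max_value for p in index]
    start = bisect_left(max_values, lo)
    selected = []
    for page in index[start:]:
        if page.min_value > hi:
            break  # this page and all later pages lie entirely above hi
        selected.append(page)
    return selected


index = [
    PageEntry(1, 99, 4, 1000),
    PageEntry(100, 199, 1004, 980),
    PageEntry(200, 299, 1984, 1010),
]
# Range scan: only the byte ranges of the matching pages need to be read.
print([(p.offset, p.length) for p in pages_for_range(index, 150, 210)])
# prints [(1004, 980), (1984, 1010)]
# Point lookup is the special case lo == hi.
print([p.offset for p in pages_for_range(index, 42, 42)])
# prints [4]
```

Because the column is sorted, a binary search finds the first candidate page and the scan stops at the first page whose minimum exceeds the upper bound. Since the proposal requires pages to be aligned across columns, the selected page positions could then be reused to read the matching pages of the other columns.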

[~marcelk] will add a link to the Google Doc for discussing the spec.

Reporter: Julien Le Dem / @julienledem
Assignee: Marcel Kinard

Related issues:

Release Parquet format 2.4.0 (blocks)
Don't write page level statistics in Parquet files in anticipation of page indexes (is required by)
Write page index in Parquet files (is required by)
Column indexes (is depended upon by)
Write index page in parquet file (is depended upon by)

PRs and other links:

Note: This issue was originally created as PARQUET-922. Please see the migration documentation for further details.


asfimport · 2017-04-03T22:39:59Z

Lars Volker / @lekv:
The design doc for this feature can be found here: https://docs.google.com/document/d/1sBACp8Lbutuj1Zxdowvsrlm8ku4BFxf8U_Do5K2wSO4/edit?usp=sharing

asfimport · 2017-10-16T23:49:39Z

Ryan Blue / @rdblue:
Merged format PR #72. Thanks for getting this pushed through @lekv!

asfimport · 2018-01-19T12:32:27Z

Zoltan Ivanfi / @zivanfi:
I was looking for a JIRA for the actual implementation in parquet-mr, but couldn't find it. Does such a JIRA already exist?

asfimport · 2018-02-08T07:10:55Z

legend:
Hi @zivanfi,

I have the same question. Are you working on the parquet-mr implementation of the index pages?

asfimport · 2018-02-08T07:17:45Z

Gabor Szadovszky / @gszadovszky:
Hi [~legend],

I am working on it. I have already linked the related JIRA to this one: PARQUET-1201. Please feel free to add any questions to that JIRA if they are of public interest, or send an email directly to me.

asfimport · 2018-02-08T07:34:26Z

legend:
@gszadovszky, great!

asfimport closed this as completed Oct 16, 2017

This was referenced Jun 23, 2024:

Release Parquet format 2.4.0 #274 (closed)
Column indexes apache/parquet-java#2123 (closed)
Write index page in parquet file apache/parquet-java#2127 (closed)
