Write index page in parquet file #2127

Closed
asfimport opened this issue Feb 11, 2018 · 4 comments

Comments

@asfimport
Collaborator

asfimport commented Feb 11, 2018

PARQUET-922 has been resolved, and parquet-format 2.4.0 supports index pages. Once PARQUET-1201 has been resolved, we need to write index pages in Parquet files.

Reporter: legend

Note: This issue was originally created as PARQUET-1207. Please see the migration documentation for further details.

@asfimport
Collaborator Author

Gabor Szadovszky / @gszadovszky:
Hi [~legend],

When I created PARQUET-1201, my intention was to track the whole implementation phase of this feature. The PR I created for PARQUET-1201 was just a small first step in parquet-format, required for the parquet-mr implementation.
The parquet-mr implementation is still ongoing under PARQUET-1201, but it is not yet at a stage where I would open a PR. If you are interested in the early implementation, you can check it here.
Hopefully, I will be able to open the first PR for the parquet-mr implementation next week.

@asfimport
Collaborator Author

legend:
Hi @gszadovszky.

Sorry, I think you have only provided the API so far.

The reason I care about index pages is that this feature helps query performance. I want to use the feature as soon as possible and test its performance against files without index pages. You can assign the issue to yourself; I will continue to follow the feature.

By the way, do you have any guidance or FAQs on Parquet performance tuning? Please email me if you have any. Thanks for your work and help.

@asfimport
Collaborator Author

Gabor Szadovszky / @gszadovszky:
Hi [~legend],

The first phase, which I referenced and am currently working on, concentrates on the write path, so it will not bring any performance gains yet. It already requires fairly large code modifications, so I've decided to split the implementation. I will implement the read path as well, but I cannot commit to any deadlines.
I don't have any guidance. Components like Hive and Impala have their own default values, and personally I have not tested the different values from a performance point of view. I would suggest sending an email to [email protected] with your typical data so that someone with more experience can reply with some hints about the proper parameters.
If you agree, I will close this issue as a duplicate of PARQUET-1201.

@asfimport
Collaborator Author

legend:
Thanks, @gszadovszky, for your reply and advice. I will close the issue as a duplicate.
