PARQUET-535: Make writeAllFields more efficient #324
Thanks for the patch, @garlicccbulb! I'll look into this and run some tests. Could you create a Parquet JIRA issue for this and add it to this PR's summary?
Thanks @proflin, I created the JIRA in the PARQUET queue: https://issues.apache.org/jira/browse/PARQUET-535. Let me know if there is anything else to do.
Can someone take a quick look at this?
This looks good to me.
@lw-lin @lukasnalezenec: any more comments?
6038af3 to 582af43
582af43 to 7fb20af
@julienledem @lw-lin @lukasnalezenec Sorry, this one has been delayed for a while. I updated the PR to work with the new semantics for protobuf extensions. In a nutshell, there isn't much to do for extensions, due to the following facts.
However, we can still benefit from the improvement from
@julienledem @lw-lin @lukasnalezenec Any thoughts?
Based on my testing, this improves write performance by roughly 1/4 to 1/3. It makes a huge difference when dealing with really large files (several GB).
In the original implementation, a significant amount of time was spent creating unnecessary Java objects.
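To illustrate the kind of per-record allocation described above, here is a minimal sketch in plain Java. It is not the patch itself and does not use the protobuf or Parquet APIs: `Row`, `FieldSink`, `writeViaMap`, and `writeDirect` are hypothetical stand-ins, and the assumption that the slower path builds an intermediate map for every record is made purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WriteAllFieldsSketch {

    /** Hypothetical record: parallel arrays of field names and values; null means "unset". */
    static final class Row {
        final String[] names;
        final Object[] values;

        Row(String[] names, Object[] values) {
            this.names = names;
            this.values = values;
        }
    }

    /** Hypothetical sink standing in for whatever consumes the fields. */
    interface FieldSink {
        void write(int index, String name, Object value);
    }

    // Allocation-heavy variant: builds a sorted map of the set fields for every
    // record (boxed keys plus one map entry per field), then iterates over it.
    static void writeViaMap(Row row, FieldSink sink) {
        Map<Integer, Object> all = new TreeMap<>();
        for (int i = 0; i < row.values.length; i++) {
            if (row.values[i] != null) {
                all.put(i, row.values[i]);
            }
        }
        for (Map.Entry<Integer, Object> e : all.entrySet()) {
            sink.write(e.getKey(), row.names[e.getKey()], e.getValue());
        }
    }

    // Leaner variant: walks the fields by index and skips unset ones,
    // without creating any intermediate collection.
    static void writeDirect(Row row, FieldSink sink) {
        for (int i = 0; i < row.values.length; i++) {
            Object value = row.values[i];
            if (value != null) {
                sink.write(i, row.names[i], value);
            }
        }
    }

    // Helper that records what a variant wrote, so the two can be compared.
    static List<String> collect(Row row, boolean direct) {
        List<String> out = new ArrayList<>();
        FieldSink sink = (index, name, value) -> out.add(index + ":" + name + "=" + value);
        if (direct) {
            writeDirect(row, sink);
        } else {
            writeViaMap(row, sink);
        }
        return out;
    }

    public static void main(String[] args) {
        Row row = new Row(new String[] {"id", "name", "score"},
                          new Object[] {7L, null, 1.5});
        // Both variants emit the same fields in the same order.
        System.out.println(collect(row, false));
        System.out.println(collect(row, true));
    }
}
```

Both variants write identical output; the difference is only in how many short-lived objects are created per record, which is where the cost would add up on multi-GB files.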