The parquet-hadoop library does not support row-group sizes smaller than 100 (PARQUET-409).
Until this is resolved by the Parquet project, we should add a patch (or a reference to a pull request) plus build instructions, to make it easier for our users to generate Parquet files with row groups smaller than 100.
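To illustrate why such a floor can appear, here is a toy model in Python. It is not parquet-hadoop code. It assumes that the limit counts rows, and that the writer only inspects the buffered size once a minimum number of records has accumulated. `simulate_row_groups`, `MIN_RECORDS_BETWEEN_CHECKS` and all parameters are hypothetical names made up for this sketch.

```python
# Toy model only: NOT the parquet-hadoop implementation.
# Assumption: the writer looks at the buffered size only after a minimum
# number of records has been buffered since the last check or flush.
MIN_RECORDS_BETWEEN_CHECKS = 100  # hypothetical constant for this sketch


def simulate_row_groups(num_records, record_size, target_group_bytes,
                        min_check=MIN_RECORDS_BETWEEN_CHECKS):
    """Return the row count of each row group the toy writer would emit."""
    groups = []
    buffered = 0            # records buffered in the current row group
    next_check = min_check  # buffered count at which size is next inspected
    for _ in range(num_records):
        buffered += 1
        if buffered >= next_check:
            if buffered * record_size >= target_group_bytes:
                # Target size reached (or exceeded): flush the row group.
                groups.append(buffered)
                buffered = 0
                next_check = min_check
            else:
                # Not big enough yet: wait for another batch of records.
                next_check = buffered + min_check
    if buffered:
        groups.append(buffered)  # final, possibly short, row group
    return groups


# Target of 10 records per group (10 * 1000 bytes), 250 records in total.
with_floor = simulate_row_groups(250, 1000, 10_000)
without_floor = simulate_row_groups(250, 1000, 10_000, min_check=1)
print(with_floor)          # groups cannot close before the first check
print(len(without_floor))  # checking every record honours the target
```

In this model, a requested row-group size that would hold only 10 records is ignored until the first check at 100 records. A patch that lowers or exposes the check interval would remove that floor in the model.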
selitvin changed the title from "Add a parquet we use to control row-group sizes better" to "Commit a parquet-mr patch that enables writing out row-group sizes smaller than 100" on Aug 27, 2018
In my experience, it's typically not a good idea to have Parquet stores with small row groups. Doing so violates a number of assumptions about the structure of a Parquet store and makes you "fight" the Parquet library implementation a lot. In some scenarios this manifests as poor performance and a large memory footprint.