Is there any plan to support MultiIndex DataFrames in Parquet I/O in the future? #223
Comments
@Roger-Liang Thanks for the feature request. This is definitely something we can look into. Could you share more about your use case and why this is important to you? Example code that you would like to see supported would also be appreciated.
Hi @ehsantn, thanks for the quick response! I frequently work with MultiIndex DataFrames where one of the levels represents dates, and I rely on the pyarrow engine to handle Parquet I/O with partitioning based on the Date column. Partitioning by date is essential for efficiently managing and querying large time-series datasets. Currently, my workflow requires resetting the MultiIndex to promote the Date level to a column so that I can partition the data when writing. When reading the data back, I need to manually reconstruct the MultiIndex. This workaround not only adds extra code but also increases the risk of errors, especially as the complexity and size of the data grow. Below is an example of my current approach:
Native support for MultiIndex DataFrames in Parquet I/O would greatly simplify this process by preserving the full hierarchical index automatically, even when partitioning by Date. This would streamline my workflow, reduce the risk of mistakes when rebuilding the index, and remove the overhead of manual index management. Looking forward to your thoughts on this!
Thank you @Roger-Liang for the detailed example! We will look into it and prioritize soon.
Thank you @ehsantn!
As the title describes.