Appending to parquet files #4150
-
This is definitely technically feasible, and has come up before (#557). There are some challenges around how to handle other metadata, such as page index information and bloom filters. I'll see if I can write up how this could be implemented.
-
I think the current way to handle this is not to update the Parquet file in place, but instead to write a new Parquet file and merge the files at query time, which many query engines support. Then, once enough small files accumulate, rewrite them all into a single new file. I can see that appending would be helpful for some use cases, though the metadata for the entire file would probably need to be rewritten each time.
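To make the "metadata needs to be rewritten" point concrete, here is a minimal sketch, using only the Rust standard library, of how one might locate the footer of an existing file. It relies on the layout described in the parquet-format README: the file ends with the file metadata, followed by a 4-byte little-endian metadata length, followed by the `PAR1` magic. The function name `footer_start` is hypothetical and not part of arrow-rs, and the sketch assumes an unencrypted footer. An append would presumably start writing new row groups at the returned offset and then write a fresh footer covering both old and new row groups.

```rust
/// Magic bytes that open and close a Parquet file with an unencrypted footer.
const MAGIC: &[u8] = b"PAR1";

/// Hypothetical helper (not an arrow-rs API): returns the byte offset at which
/// the footer metadata begins, or `None` if the bytes do not look like a
/// Parquet file. Everything before this offset is row group data.
fn footer_start(file: &[u8]) -> Option<usize> {
    // Smallest plausible file: leading magic + 4-byte length + trailing magic.
    if file.len() < 12 || !file.starts_with(MAGIC) || !file.ends_with(MAGIC) {
        return None;
    }
    // The 4-byte little-endian metadata length sits just before the trailing magic.
    let len_pos = file.len() - 8;
    let meta_len = u32::from_le_bytes([
        file[len_pos],
        file[len_pos + 1],
        file[len_pos + 2],
        file[len_pos + 3],
    ]) as usize;
    // The metadata must fit between the leading magic and the length field.
    let start = len_pos.checked_sub(meta_len)?;
    if start < MAGIC.len() {
        return None;
    }
    Some(start)
}
```

This only finds where the old footer lives. Decoding the Thrift-encoded metadata, and dealing with the page index and bloom filter data mentioned above, is the harder part and is not shown here.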
-
Hi
Does arrow-rs support appending data to an existing Parquet file?
If I'm understanding https://github.com/apache/parquet-format#file-format correctly, it should theoretically be possible to implement append fairly efficiently:
Am I missing something in the above description? Is that already possible?
If append is not possible, the alternative is to read and decode the existing Parquet file and then re-encode the entire file. I would like to avoid this, though, and reuse the existing row groups.
Thanks.