Support reading and writing custom IPC metadata #17560

Closed
nameexhaustion opened this issue Jul 11, 2024 · 5 comments · Fixed by #20066
Labels
A-io Area: reading and writing data accepted Ready for implementation enhancement New feature or an improvement of an existing feature P-goal Priority: aligns with long-term Polars goals

Comments

@nameexhaustion
Collaborator

nameexhaustion commented Jul 11, 2024

Description

The IpcReader should have a function that returns custom metadata stored in the file, and the IpcWriter should have an option to set custom metadata to be written.

@nameexhaustion nameexhaustion added enhancement New feature or an improvement of an existing feature accepted Ready for implementation A-io Area: reading and writing data P-goal Priority: aligns with long-term Polars goals labels Jul 11, 2024
@github-project-automation github-project-automation bot moved this to Ready in Backlog Jul 11, 2024
@nameexhaustion nameexhaustion self-assigned this Jul 13, 2024
@nameexhaustion nameexhaustion changed the title from "Explore writing custom IPC metadata" to "Support reading and writing custom IPC metadata" Oct 28, 2024
@nameexhaustion nameexhaustion removed their assignment Oct 28, 2024
@PrettyWood

@nameexhaustion @alexander-beedie @ritchie46 Following up on #18527 (comment), we really need to keep that metadata.
We would also love to be able to use the latest Polars release 😅
@lukapeschke and I are ready to work on this if needed. We just need your insights on the desired API.

@nameexhaustion
Collaborator Author

Hi @PrettyWood

Thanks for the offer to help! We haven't had the time to look at this yet due to how busy we've been.

For some background, it was removed from the arrow schema struct because we internally didn't want to carry custom metadata around on it. This is still the case - but we'd be happy for there to be functions on the IpcReader and IpcWriter structs themselves that allow for reading / writing custom metadata.

For the implementation, I would start with making something similar to this -

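// Sketch only: the types are Polars-internal and the bodies are left unimplemented.
impl IpcReader {
    /// Returns the custom metadata stored in the file.
    fn custom_metadata(&self) -> &Arc<PlHashMap<PlSmallStr, PlSmallStr>> {
        todo!()
    }
}

impl IpcWriter {
    /// Sets the custom metadata to be written to the file.
    fn set_custom_metadata(&mut self, custom_metadata: Arc<PlHashMap<PlSmallStr, PlSmallStr>>) {
        todo!()
    }
}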
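
To make the shape of the signatures above concrete, here is a minimal, self-contained sketch. It is not the Polars implementation: `HashMap<String, String>` stands in for `PlHashMap<PlSmallStr, PlSmallStr>`, the struct fields are assumptions, and the `finish` method is a hypothetical shortcut that hands the metadata straight to a reader instead of serializing it to a file.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Stand-in for Polars' internal `PlHashMap<PlSmallStr, PlSmallStr>`;
// plain std types keep this sketch self-contained.
type CustomMetadata = HashMap<String, String>;

/// Hypothetical writer: holds the metadata until the file is finished.
#[derive(Default)]
struct IpcWriter {
    custom_metadata: Option<Arc<CustomMetadata>>,
}

impl IpcWriter {
    /// Stores the metadata to be written; a later call replaces an earlier one.
    fn set_custom_metadata(&mut self, custom_metadata: Arc<CustomMetadata>) {
        self.custom_metadata = Some(custom_metadata);
    }

    /// Hypothetical shortcut: pretend the file was written and reopened.
    /// A real writer would serialize the key-value pairs into the file.
    fn finish(self) -> IpcReader {
        IpcReader {
            custom_metadata: self.custom_metadata.unwrap_or_default(),
        }
    }
}

/// Hypothetical reader: exposes the metadata without copying it.
struct IpcReader {
    custom_metadata: Arc<CustomMetadata>,
}

impl IpcReader {
    fn custom_metadata(&self) -> &Arc<CustomMetadata> {
        &self.custom_metadata
    }
}

fn main() {
    let mut metadata = CustomMetadata::new();
    metadata.insert("source".to_string(), "sensor-a".to_string());

    let mut writer = IpcWriter::default();
    writer.set_custom_metadata(Arc::new(metadata));

    let reader = writer.finish();
    println!("{:?}", reader.custom_metadata().get("source"));
}
```

Passing and returning an `Arc` means callers can share one metadata map between several writers or keep it after the reader is dropped, at the cost of a reference-count bump rather than a deep copy.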

It could also be helpful to look at some of the deleted code in https://github.com/pola-rs/polars/pull/18527/files.

I believe IPC supports message-level metadata, but for the initial PR we should restrict support to writing file-level custom metadata only.
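
As a rough illustration of the file-level versus message-level distinction: in the Arrow IPC format, as far as I understand it, both the file footer and each message can carry a list of key-value pairs. The sketch below models only that idea with simplified, hypothetical `Footer` and `Message` structs and a hypothetical `write_file` helper; it populates the footer and leaves every message's metadata empty, which is the restriction suggested for the initial PR.

```rust
use std::collections::BTreeMap;

/// Simplified stand-in for a key-value pair in the IPC format.
#[derive(Debug, Clone, PartialEq)]
struct KeyValue {
    key: String,
    value: String,
}

/// Simplified file footer: written once per file.
#[derive(Debug, Default)]
struct Footer {
    custom_metadata: Vec<KeyValue>,
}

/// Simplified message header: written once per record batch.
#[derive(Debug, Default)]
struct Message {
    custom_metadata: Vec<KeyValue>,
}

/// Converts a map into the list form; a `BTreeMap` gives a stable key order.
fn to_key_values(metadata: &BTreeMap<String, String>) -> Vec<KeyValue> {
    metadata
        .iter()
        .map(|(k, v)| KeyValue { key: k.clone(), value: v.clone() })
        .collect()
}

/// Hypothetical helper: attaches custom metadata to the footer only.
fn write_file(metadata: &BTreeMap<String, String>, n_batches: usize) -> (Vec<Message>, Footer) {
    // Message-level metadata is deliberately left empty.
    let messages = (0..n_batches).map(|_| Message::default()).collect();
    let footer = Footer { custom_metadata: to_key_values(metadata) };
    (messages, footer)
}

fn main() {
    let mut metadata = BTreeMap::new();
    metadata.insert("owner".to_string(), "team-a".to_string());

    let (messages, footer) = write_file(&metadata, 2);
    println!("{} messages, footer metadata: {:?}", messages.len(), footer.custom_metadata);
}
```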

@lukapeschke
Contributor

Hi @nameexhaustion, thanks for the hints! I've started working on this, and I noticed that the async IPC stream sink was gone as well since #19223.

We were using it, and would like to keep using an async writer. Is this something you'd be willing to accept if we worked on it?

@lukapeschke
Contributor

@nameexhaustion would something like #20066 work for you? If yes, I'll add a few tests.

@nameexhaustion
Collaborator Author

> Hi @nameexhaustion, thanks for the hints! I've started working on this, and I noticed that the async IPC stream sink was gone as well since #19223.
>
> We were using it, and would like to keep using an async writer. Is this something you'd be willing to accept if we worked on it?

Hello,

I won't be able to give you an answer for this one. From what I can tell, it was removed due to not being worth the maintenance burden as we were not using it. I think it would be better to consult with Ritchie for this matter.

Projects
Status: Done

3 participants