Replace FileMetadata in parquet writer with in memory representation. #1004

Open
liurenjie1024 opened this issue Feb 25, 2025 · 1 comment
Labels
good first issue Good for newcomers

Comments

@liurenjie1024
Contributor

Currently, the FileMetadata used in the parquet writer is the type auto-generated from the Parquet Thrift definition. We should use the in-memory metadata representation instead for in-memory operations.

@jonathanc-n
Contributor

@liurenjie1024 I think the current problem is that ArrowFileReader (the reader) returns ParquetMetadata, while AsyncFileWriter (the writer) returns the Thrift definition. One solution I considered is adding a conversion from Thrift -> ParquetMetadata, but that seems like an unnecessary step. I think it is fine to keep both functions, so that the parquet writer can convert to a data file from either of the two metadata types without a conversion step in between.
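
To make the "keep both functions" option concrete, here is a rough sketch. All types and function names below (`ThriftFileMetadata`, `InMemoryMetadata`, `DataFileSummary`, `data_file_from_thrift`, `data_file_from_in_memory`) are hypothetical stand-ins, not the real `parquet` or `iceberg` types, and the fields are heavily simplified. It only illustrates two entry points sharing one helper, so neither metadata type has to be converted into the other first.

```rust
// Hypothetical stand-ins for illustration only; these are NOT the real
// `parquet` crate or iceberg-rust types.

/// Stand-in for the Thrift-generated metadata returned by the writer.
pub struct ThriftFileMetadata {
    pub num_rows: i64,
    pub row_group_byte_sizes: Vec<i64>,
}

/// Stand-in for the in-memory metadata returned by the reader.
pub struct InMemoryMetadata {
    pub num_rows: i64,
    pub row_group_byte_sizes: Vec<i64>,
}

/// Heavily simplified stand-in for the data file entry the writer produces.
#[derive(Debug, PartialEq)]
pub struct DataFileSummary {
    pub record_count: u64,
    pub total_row_group_bytes: u64,
}

/// Shared logic: works on plain fields, so it does not care which
/// metadata type they came from. Negative values are clamped to zero.
fn summarize(num_rows: i64, row_group_byte_sizes: &[i64]) -> DataFileSummary {
    let total: i64 = row_group_byte_sizes.iter().sum();
    DataFileSummary {
        record_count: num_rows.max(0) as u64,
        total_row_group_bytes: total.max(0) as u64,
    }
}

/// Entry point for the writer path (Thrift-style metadata).
pub fn data_file_from_thrift(meta: &ThriftFileMetadata) -> DataFileSummary {
    summarize(meta.num_rows, &meta.row_group_byte_sizes)
}

/// Entry point for the reader path (in-memory metadata).
pub fn data_file_from_in_memory(meta: &InMemoryMetadata) -> DataFileSummary {
    summarize(meta.num_rows, &meta.row_group_byte_sizes)
}
```

The trade-off is two thin public functions instead of one, in exchange for skipping the Thrift -> in-memory conversion on the writer path.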

2 participants