feat: support serialize/deserialize DataFile into avro bytes #797
I changed this PR to add an interface for serializing and deserializing a DataFile to and from Avro bytes. The idea comes from #774 (comment), and I think it is a good starting point for #774: it gives users an interface to serialize and deserialize a DataFile. Later, we can discuss whether DataFile itself should be serializable. Essentially, that would mean storing more information in the DataFile, so that callers no longer need to pass it (e.g. the partition type) as parameters to the interface.
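To illustrate why the interface takes the partition type as a parameter, here is a minimal toy sketch. It is not the iceberg-rust API and not the real Avro wire format; `FieldKind`, `Value`, `encode_partition`, and `decode_partition` are hypothetical names invented for this example. It only mirrors one property of Avro's binary encoding: the bytes carry no type tags, so the reader has to be told the schema.

```rust
// Hypothetical toy model: not the iceberg-rust API and not the Avro format.

/// The kind of one partition field (stand-in for a real partition type).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum FieldKind {
    Int,
    Long,
}

/// One partition value.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Int(i32),
    Long(i64),
}

/// Encode values as raw little-endian bytes, with no type tags or names.
pub fn encode_partition(values: &[Value]) -> Vec<u8> {
    let mut out = Vec::new();
    for value in values {
        match value {
            Value::Int(x) => out.extend_from_slice(&x.to_le_bytes()),
            Value::Long(x) => out.extend_from_slice(&x.to_le_bytes()),
        }
    }
    out
}

/// Decode bytes back into values. The caller must supply the partition type,
/// because the bytes alone do not say where one field ends and the next begins.
/// Returns None if the bytes are too short or have trailing data.
pub fn decode_partition(bytes: &[u8], partition_type: &[FieldKind]) -> Option<Vec<Value>> {
    let mut pos = 0;
    let mut out = Vec::with_capacity(partition_type.len());
    for kind in partition_type {
        match kind {
            FieldKind::Int => {
                let mut chunk = [0u8; 4];
                chunk.copy_from_slice(bytes.get(pos..pos + 4)?);
                out.push(Value::Int(i32::from_le_bytes(chunk)));
                pos += 4;
            }
            FieldKind::Long => {
                let mut chunk = [0u8; 8];
                chunk.copy_from_slice(bytes.get(pos..pos + 8)?);
                out.push(Value::Long(i64::from_le_bytes(chunk)));
                pos += 8;
            }
        }
    }
    if pos == bytes.len() {
        Some(out)
    } else {
        None
    }
}
```

In this toy model, decoding with the wrong partition type either fails or silently yields different values, which is the motivation for either passing the type explicitly or storing it alongside the data.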
Thanks @ZENOTME for this PR. I left some suggestions to improve API consistency; otherwise LGTM.
crates/iceberg/src/spec/manifest.rs
Outdated
@@ -656,6 +656,38 @@ mod _const_schema {
})
};

fn data_file_fields_v2(partition_type: StructType) -> Vec<NestedFieldRef> {
Suggested change:
- fn data_file_fields_v2(partition_type: StructType) -> Vec<NestedFieldRef> {
+ fn data_file_fields_v2(partition_type: &StructType) -> Vec<NestedFieldRef> {
crates/iceberg/src/spec/manifest.rs
Outdated
]
}

pub(super) fn data_file_schema_v2(partition_type: StructType) -> Result<AvroSchema, Error> {
Suggested change:
- pub(super) fn data_file_schema_v2(partition_type: StructType) -> Result<AvroSchema, Error> {
+ pub(super) fn data_file_schema_v2(partition_type: &StructType) -> Result<AvroSchema, Error> {
crates/iceberg/src/spec/manifest.rs
Outdated
)),
];
let schema = Schema::builder().with_fields(fields).build()?;
schema_to_avro_schema("manifest_entry", &schema)
}

fn data_file_fields_v1(partition_type: StructType) -> Vec<NestedFieldRef> {
Suggested change:
- fn data_file_fields_v1(partition_type: StructType) -> Vec<NestedFieldRef> {
+ fn data_file_fields_v1(partition_type: &StructType) -> Vec<NestedFieldRef> {
Only a small nit.
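For context on the review suggestions above (taking `&StructType` instead of `StructType`), here is a minimal sketch of the trade-off. The `StructType` below is a hypothetical stand-in defined only for this example, not the real iceberg-rust type, and `field_count_owned` / `field_count_borrowed` are made-up helper names.

```rust
// Hypothetical stand-in type; the real StructType in iceberg-rust is richer.
#[derive(Debug, Clone, PartialEq)]
pub struct StructType {
    pub field_names: Vec<String>,
}

/// By-value parameter: the caller gives up ownership, or must clone first.
pub fn field_count_owned(partition_type: StructType) -> usize {
    partition_type.field_names.len()
}

/// By-reference parameter: the caller keeps ownership and nothing is cloned.
pub fn field_count_borrowed(partition_type: &StructType) -> usize {
    partition_type.field_names.len()
}

/// Calls both variants on the same value and returns the results together
/// with the original value, which is still usable afterwards.
pub fn demo() -> (usize, usize, StructType) {
    let partition_type = StructType {
        field_names: vec!["day".to_string(), "bucket".to_string()],
    };
    // The owned variant forces a clone if we want to keep using the value.
    let owned = field_count_owned(partition_type.clone());
    // The borrowed variant only needs a shared reference.
    let borrowed = field_count_borrowed(&partition_type);
    (owned, borrowed, partition_type)
}
```

When a function only reads its argument, borrowing is usually the more flexible signature, since callers that hold a reference do not have to clone.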
Thank you @ZENOTME for working on this!
This PR exposes _serde::DataFile so that users can serialize and deserialize data files. Related issue: #774