-
Notifications
You must be signed in to change notification settings - Fork 38
feat: add support for avro to arrow data conversion #124
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
} | ||
const auto& avro_array = avro_datum.value<::avro::GenericArray>(); | ||
|
||
auto* list_builder = internal::checked_cast<::arrow::ListBuilder*>(array_builder); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any chance list_builder
could be a nullptr, should we add a check for that?
If the user provides the same schema for ToArrowSchema and Project, it shouldn't be an issue. However, should we return an error if the user make mistakes? Not so sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so. This is not supposed to be called by users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, that should be fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like underlying typeof ::arrow::ArrayBuilder
is owned by user, and would have ub when type is wrong. But it's ok to me here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are all internal functions and array builders are owned by the AvroReader.
/// \param avro_datum The Avro data to append | ||
/// \param projection Schema projection from `projected_schema` to `avro_node` | ||
/// \param projected_schema The projected schema | ||
/// \param array_builder The Arrow array builder to append to (must be a struct builder) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So it's a undefined behavior to pass-in a different type builder? Can the argument here just a StructBuilder?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That requires additional cast before using this function.
} | ||
const auto& avro_array = avro_datum.value<::avro::GenericArray>(); | ||
|
||
auto* list_builder = internal::checked_cast<::arrow::ListBuilder*>(array_builder); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like underlying typeof ::arrow::ArrayBuilder
is owned by user, and would have ub when type is wrong. But it's ok to me here
auto* key_builder = map_builder->key_builder(); | ||
auto* item_builder = map_builder->item_builder(); | ||
|
||
const auto& record_node = avro_node->leafAt(0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should check leafCount == 1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
::avro::AVRO_ARRAY
type should guarantee this so I don't think we need to over-protect it.
This PR is complete. Please take a look @Fokko |
Preliminary avro datum to arrow array data conversion support: