-
Notifications
You must be signed in to change notification settings - Fork 824
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add IPC FileDecoder #5249
Add IPC FileDecoder #5249
Conversation
/// let trailer_start = buffer.len() - 10; | ||
/// let footer_len = read_footer_length(buffer[trailer_start..].try_into().unwrap()).unwrap(); | ||
/// let footer = root_as_footer(&buffer[trailer_start - footer_len..trailer_start]).unwrap(); | ||
/// | ||
/// let back = fb_to_schema(footer.schema().unwrap()); | ||
/// assert_eq!(&back, schema.as_ref()); | ||
/// | ||
/// let mut decoder = FileDecoder::new(schema, footer.version()); | ||
/// | ||
/// // Read dictionaries | ||
/// for block in footer.dictionaries().iter().flatten() { | ||
/// let block_len = block.bodyLength() as usize + block.metaDataLength() as usize; | ||
/// let data = buffer.slice_with_length(block.offset() as _, block_len); | ||
/// decoder.read_dictionary(&block, &data).unwrap(); | ||
/// } | ||
/// | ||
/// // Read record batch | ||
/// let batches = footer.recordBatches().unwrap(); | ||
/// assert_eq!(batches.len(), 1); // Only wrote a single batch | ||
/// | ||
/// let block = batches.get(0); | ||
/// let block_len = block.bodyLength() as usize + block.metaDataLength() as usize; | ||
/// let data = buffer.slice_with_length(block.offset() as _, block_len); | ||
/// let back = decoder.read_record_batch(block, &data).unwrap().unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is still somewhat verbose, but I couldn't see an easy way to reduce this further that didn't end up in knots with self-referential structs (as flatbuffers borrow data)
pub fn read_record_batch( | ||
&self, | ||
block: &Block, | ||
buf: &Buffer, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is perhaps worth noting that this interface won't allow pushing down column projection to IO, I think this is a bridge we cross when we add support for this.
|
||
/// Read the dictionary with the given block and data buffer | ||
pub fn read_dictionary(&mut self, block: &Block, buf: &Buffer) -> Result<(), ArrowError> { | ||
let message = self.read_message(buf)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not bit deal but seems we didn't check for metadata version for DictionaryBatch
before. Maybe it was missed before.
Co-authored-by: Liang-Chi Hsieh <[email protected]>
😆 -- love it! |
Which issue does this PR close?
Closes #5153
Closes #5165
Closes #5252
Part of #1207
Rationale for this change
In a similar vein to arrow-csv and arrow-json this extracts a low-level, push-based interface for decoding files. This lends itself to use-cases that need more control over how IO is performed, for example:
What changes are included in this PR?
Are there any user-facing changes?