Improve docs for sending flight dictionaries #1

Merged
49 changes: 26 additions & 23 deletions arrow-flight/src/encode.rs
@@ -76,7 +76,7 @@ pub struct FlightDataEncoderBuilder {
schema: Option<SchemaRef>,
/// Optional flight descriptor, if known before data.
descriptor: Option<FlightDescriptor>,
/// Determines how [`DictionaryArray`]s are encoded for transport.
/// Determines how `DictionaryArray`s are encoded for transport.
/// See [`DictionaryHandling`] for more information.
dictionary_handling: DictionaryHandling,
}
@@ -209,7 +209,7 @@ pub struct FlightDataEncoder {
done: bool,
/// cleared after the first FlightData message is sent
descriptor: Option<FlightDescriptor>,
/// Determines how [`DictionaryArray`]s are encoded for transport.
/// Determines how `DictionaryArray`s are encoded for transport.
/// See [`DictionaryHandling`] for more information.
dictionary_handling: DictionaryHandling,
}
@@ -353,16 +353,31 @@ impl Stream for FlightDataEncoder {
}

/// Defines how a [`FlightDataEncoder`] encodes [`DictionaryArray`]s
///
/// [`DictionaryArray`]: arrow_array::DictionaryArray
#[derive(Debug, PartialEq)]
pub enum DictionaryHandling {
/// Expands to the underlying type (default). This likely sends more data over the network
/// but requires less memory (dictionaries are not tracked) and is more compatible
/// with other arrow flight client implementations that may not support `DictionaryEncoding`
/// see [`hydrate_dictionary`] for more details.
/// Expands to the underlying type (default). This likely sends more data
/// over the network but requires less memory (dictionaries are not tracked)
/// and is more compatible with other arrow flight client implementations
/// that may not support `DictionaryEncoding`.
///
/// An IPC response, streaming or otherwise, defines its schema up front
/// which defines the mapping from dictionary IDs. It then sends these
/// dictionaries over the wire.
///
/// This requires identifying the different dictionaries in use, assigning
/// them IDs, and sending new dictionaries, delta or otherwise, when needed.
///
/// See also:
/// * <https://github.com/apache/arrow-rs/issues/1206>
Hydrate,
/// Send dictionary FlightData with every RecordBatch that contains a [`DictionaryArray`].
/// See [`Self::Hydrate`] for more tradeoffs. No attempt is made to skip sending the same (logical)
/// dictionary values twice.
/// Send dictionary FlightData with every RecordBatch that contains a
/// [`DictionaryArray`]. See [`Self::Hydrate`] for more tradeoffs. No
/// attempt is made to skip sending the same (logical) dictionary values
/// twice.
///
/// [`DictionaryArray`]: arrow_array::DictionaryArray
Resend,
}
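
To make the tradeoff between the two variants concrete, here is a minimal, self-contained sketch. It does not use the arrow crates: `ToyDictionary`, `ToyHandling`, `ToyMessage`, `hydrate`, and `encode_batch` are all hypothetical names invented for illustration, and the real encoder emits `FlightData` messages rather than these toy values.

```rust
/// Toy stand-in for a dictionary-encoded string column.
/// Hypothetical type; arrow's `DictionaryArray` is far more general.
#[derive(Debug, Clone, PartialEq)]
struct ToyDictionary {
    /// One index into `values` per row.
    keys: Vec<usize>,
    /// The distinct values the keys refer to.
    values: Vec<String>,
}

/// Hypothetical mirror of the two variants documented above.
enum ToyHandling {
    Hydrate,
    Resend,
}

/// What this toy model puts on the wire for one batch.
#[derive(Debug, PartialEq)]
enum ToyMessage {
    /// Fully expanded column: one value per row.
    Plain(Vec<String>),
    /// The dictionary values themselves.
    Dictionary(Vec<String>),
    /// The per-row keys that index into the dictionary.
    Keys(Vec<usize>),
}

/// Expands the column to its underlying type.
/// Panics if a key is out of range, which a real implementation must handle.
fn hydrate(col: &ToyDictionary) -> Vec<String> {
    col.keys.iter().map(|&k| col.values[k].clone()).collect()
}

/// Encodes one batch under the chosen handling.
fn encode_batch(col: &ToyDictionary, handling: &ToyHandling) -> Vec<ToyMessage> {
    match handling {
        // One message, but every repeated value is written out again.
        ToyHandling::Hydrate => vec![ToyMessage::Plain(hydrate(col))],
        // Two messages per batch; the dictionary is sent even if unchanged.
        ToyHandling::Resend => vec![
            ToyMessage::Dictionary(col.values.clone()),
            ToyMessage::Keys(col.keys.clone()),
        ],
    }
}
```

In this model, hydrating keeps no state between batches and produces a plain column any client can read, while resending keeps the compact keys at the cost of repeating the dictionary with every batch.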

@@ -493,20 +508,8 @@ fn prepare_batch_for_flight(
)?)
}

/// Hydrates a dictionary to its underlying type
///
/// An IPC response, streaming or otherwise, defines its schema up front
/// which defines the mapping from dictionary IDs. It then sends these
/// dictionaries over the wire.
///
/// This requires identifying the different dictionaries in use, assigning
/// them IDs, and sending new dictionaries, delta or otherwise, when needed
///
/// See also:
/// * <https://github.com/apache/arrow-rs/issues/1206>
///
/// For now we just hydrate the dictionaries to their underlying type. If send_dictionaries
/// is true, dictionaries are sent with every batch which is not as optimal as described above,
/// Hydrates a dictionary to its underlying type if `send_dictionaries` is false. If `send_dictionaries`
/// is true, dictionaries are sent with every batch, which is not as optimal as described in [`DictionaryHandling::Hydrate`] above,
/// but does enable sending `DictionaryArray`s via Flight.
fn hydrate_dictionary(array: &ArrayRef, send_dictionaries: bool) -> Result<ArrayRef> {
let arr = match array.data_type() {
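
The doc comments describe a more optimal scheme that this code does not implement: assign each dictionary an ID and send new dictionaries, "delta or otherwise", only when needed. A rough sketch of that bookkeeping might look like the following. `ToyDictTracker`, `ToyAction`, and `plan` are hypothetical names for illustration and are not part of arrow-flight; the real IPC format has its own rules for dictionary batches.

```rust
use std::collections::HashMap;

/// Hypothetical tracker: remembers the values last sent for each dictionary ID.
#[derive(Default)]
struct ToyDictTracker {
    last_sent: HashMap<i64, Vec<String>>,
}

/// What the sender should do for a dictionary before sending a batch.
#[derive(Debug, PartialEq)]
enum ToyAction {
    /// The receiver already holds exactly these values.
    Skip,
    /// The new values only extend the old ones; send just the appended tail.
    Delta(Vec<String>),
    /// Send the whole dictionary (first use, or the values changed).
    Replace(Vec<String>),
}

impl ToyDictTracker {
    /// Decides what to send for dictionary `id` and records the new state.
    fn plan(&mut self, id: i64, values: &[String]) -> ToyAction {
        let action = match self.last_sent.get(&id) {
            // Identical to what was sent before: nothing to do.
            Some(prev) if prev.as_slice() == values => ToyAction::Skip,
            // Old values are a strict prefix of the new ones: send a delta.
            Some(prev) if values.len() > prev.len() && values.starts_with(prev) => {
                ToyAction::Delta(values[prev.len()..].to_vec())
            }
            // Unknown ID or incompatible change: send everything.
            _ => ToyAction::Replace(values.to_vec()),
        };
        if action != ToyAction::Skip {
            self.last_sent.insert(id, values.to_vec());
        }
        action
    }
}
```

The cost of this approach is the memory held in `last_sent`, which is the tracking overhead that hydrating avoids.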