Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support setting key field in MapBuilder #7101

Merged
merged 8 commits into from
Feb 12, 2025
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
83 changes: 75 additions & 8 deletions arrow-array/src/builder/map_builder.rs
Original file line number Diff line number Diff line change
Expand Up @@ -61,6 +61,7 @@ pub struct MapBuilder<K: ArrayBuilder, V: ArrayBuilder> {
field_names: MapFieldNames,
key_builder: K,
value_builder: V,
key_field: Option<FieldRef>,
value_field: Option<FieldRef>,
}

Expand Down Expand Up @@ -107,13 +108,32 @@ impl<K: ArrayBuilder, V: ArrayBuilder> MapBuilder<K, V> {
field_names: field_names.unwrap_or_default(),
key_builder,
value_builder,
key_field: None,
value_field: None,
}
}

/// Override the field passed to [`MapBuilder::new`]
///
/// By default a nullable field is created with the name `values`
/// By default, a nullable field is created with the name `keys`
///
/// This function panics if the given field is nullable as map keys are not
/// allowed to be null
///
/// Note: [`Self::finish`] and [`Self::finish_cloned`] will panic if the
/// field's data type does not match that of `K`
pub fn with_keys_field(self, field: impl Into<FieldRef>) -> Self {
let field: FieldRef = field.into();
assert!(!field.is_nullable(), "Key field must not be nullable");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this return Err instead? Given it's user input

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

I'm thinking now this check might be unnecessary because finish asserts here that there are no null-keys anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I've removed checking for nullability of the key field. I didn't love that MapBuilder::with_keys_field returns a Result while MapBuilder::with_values_field or GenericListBuilder::with_field do not. We still ensure that we protect against nulls with the existing assertion in finish.

Happy to go back to returning Result. Not feeling strongly about this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we still need this check; the check in finish() only checks for null values, but the datatype of the key field itself should always be non-nullable. An array could have no null values but still a datatype with nullable = true I believe.

I can see how it's awkward for with_keys_field to be the only one returning Result, but it makes sense to me since it has a pre-condition to guard against, and we should take advantage of Rust's type system where possible instead of documenting panics and hoping users read the documentation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Updated

Copy link
Contributor

@tustvold tustvold Feb 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given a nullable map field is simply out of spec, I personally don't think panicking is wrong and tbh for this sort of invariant it is arguably more correct...

I don't fell strongly but panicking would be more consistent with the rest of the builders

Self {
key_field: Some(field),
..self
}
}

/// Override the field passed to [`MapBuilder::new`]
///
/// By default, a nullable field is created with the name `values`
///
/// Note: [`Self::finish`] and [`Self::finish_cloned`] will panic if the
/// field's data type does not match that of `V`
Expand Down Expand Up @@ -194,11 +214,14 @@ impl<K: ArrayBuilder, V: ArrayBuilder> MapBuilder<K, V> {
keys_arr.null_count()
);

let keys_field = Arc::new(Field::new(
self.field_names.key.as_str(),
keys_arr.data_type().clone(),
false, // always non-nullable
));
let keys_field = match &self.key_field {
Some(f) => f.clone(),
None => Arc::new(Field::new(
self.field_names.key.as_str(),
keys_arr.data_type().clone(),
false, // always non-nullable
)),
};
let values_field = match &self.value_field {
Some(f) => f.clone(),
None => Arc::new(Field::new(
Expand Down Expand Up @@ -262,10 +285,10 @@ impl<K: ArrayBuilder, V: ArrayBuilder> ArrayBuilder for MapBuilder<K, V> {

#[cfg(test)]
mod tests {
use super::*;
use crate::builder::{make_builder, Int32Builder, StringBuilder};
use crate::{Int32Array, StringArray};

use super::*;
use std::collections::HashMap;

#[test]
#[should_panic(expected = "Keys array must have no null values, found 1 null value(s)")]
Expand Down Expand Up @@ -377,4 +400,48 @@ mod tests {
)
);
}

#[test]
fn test_with_keys_field() {
let mut key_metadata = HashMap::new();
key_metadata.insert("foo".to_string(), "bar".to_string());
let key_field = Arc::new(
Field::new("keys", DataType::Int32, false).with_metadata(key_metadata.clone()),
);
let mut builder = MapBuilder::new(None, Int32Builder::new(), Int32Builder::new())
.with_keys_field(key_field.clone());
builder.keys().append_value(1);
builder.values().append_value(2);
builder.append(true).unwrap();
let map = builder.finish();

assert_eq!(map.len(), 1);
assert_eq!(
map.data_type(),
&DataType::Map(
Arc::new(Field::new(
"entries",
DataType::Struct(
vec![
Arc::new(
Field::new("keys", DataType::Int32, false)
.with_metadata(key_metadata)
),
Arc::new(Field::new("values", DataType::Int32, true))
]
.into()
),
false,
)),
false
)
);
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for sanity could we get a test that it panics if you try to specify a keys field with the wrong datatype as well

#[test]
#[should_panic(expected = "Key field must not be nullable")]
fn test_with_nullable_keys_field() {
let mut builder = MapBuilder::new(None, Int32Builder::new(), Int32Builder::new())
.with_keys_field(Arc::new(Field::new("keys", DataType::Int32, true)));
}
}
1 change: 1 addition & 0 deletions arrow-array/src/builder/struct_builder.rs
Original file line number Diff line number Diff line change
Expand Up @@ -296,6 +296,7 @@ pub fn make_builder(datatype: &DataType, capacity: usize) -> Box<dyn ArrayBuilde
value_builder,
capacity,
)
.with_keys_field(fields[0].clone())
.with_values_field(fields[1].clone()),
)
}
Expand Down
Loading