Refine documentation to Array::is_null #4838

Merged on Sep 20, 2023 (7 commits)
40 changes: 22 additions & 18 deletions arrow-array/src/array/mod.rs
@@ -173,52 +173,56 @@ pub trait Array: std::fmt::Debug + Send + Sync {
/// ```
fn offset(&self) -> usize;

/// Returns the null buffer of this array if any
/// Returns the null buffer of this array if any.
///
/// Note: some arrays can encode their nullability in their children, for example,
/// The null buffer encodes the "physical" nulls of an array.
/// However, some arrays can also encode nullability in their children, for example,
/// [`DictionaryArray::values`] values or [`RunArray::values`], or without a null buffer,
/// such as [`NullArray`]. Use [`Array::logical_nulls`] to obtain a computed mask encoding this
/// such as [`NullArray`]. To determine if each element of such an array is logically null,
/// you can use the slower [`Array::logical_nulls`] to obtain a computed mask.
fn nulls(&self) -> Option<&NullBuffer>;

/// Returns the logical null buffer of this array if any
/// Returns a potentially computed [`NullBuffer`] that represents the logical null values of this array, if any.
///
/// In most cases this will be the same as [`Array::nulls`], except for:
///
/// * DictionaryArray where [`DictionaryArray::values`] contains nulls
/// * RunArray where [`RunArray::values`] contains nulls
/// * NullArray where all indices are nulls
/// * [`DictionaryArray`] where [`DictionaryArray::values`] contains nulls
/// * [`RunArray`] where [`RunArray::values`] contains nulls
/// * [`NullArray`] where all indices are nulls
///
/// In these cases a logical [`NullBuffer`] will be computed, encoding the logical nullability
/// of these arrays, beyond what is encoded in [`Array::nulls`]
fn logical_nulls(&self) -> Option<NullBuffer> {
self.nulls().cloned()
}

/// Returns whether the element at `index` is null.
/// When using this function on a slice, the index is relative to the slice.
/// Returns whether the element at `index` is null according to [`Array::nulls`]
///
/// Note: this method returns the physical nullability, i.e. that encoded in [`Array::nulls`]
/// see [`Array::logical_nulls`] for logical nullability
/// Note: For performance reasons, this method returns nullability solely as determined by the
/// null buffer. This difference can lead to surprising results, for example, [`NullArray::is_null`] always
/// returns `false` as the array lacks a null buffer. Similarly [`DictionaryArray`] and [`RunArray`] may
/// encode nullability in their children. See [`Self::logical_nulls`] for more information.
///
/// # Example:
///
/// ```
/// use arrow_array::{Array, Int32Array};
/// use arrow_array::{Array, Int32Array, NullArray};
///
/// let array = Int32Array::from(vec![Some(1), None]);
///
/// assert_eq!(array.is_null(0), false);
/// assert_eq!(array.is_null(1), true);
///
/// // NullArrays do not have a null buffer, and therefore always
/// // return false for is_null.
/// let array = NullArray::new(1);
/// assert_eq!(array.is_null(0), false);
/// ```
fn is_null(&self, index: usize) -> bool {
Contributor

I've said this before, but I think from a user PoV, having `is_null` and `is_logical_null` is confusing as hell. Which NULL is `is_null`?! Yeah, historically this is the physical null, but do most users really care about the physical representation? I would argue that at least this method should be called `is_physical_null` to force users to think about what kind of null they want, instead of tricking them into using the wrong implicit default for their use case.
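
To make the physical/logical distinction in this comment concrete, here is a minimal std-only sketch. `ToyDictionary` and both method names are hypothetical stand-ins for illustration, not the arrow-rs types or API. It assumes a dictionary-style layout in which a `None` key is a physical null and a key pointing at a `None` value is only a logical null.

```rust
/// Toy stand-in for a dictionary-encoded array (NOT the arrow-rs type).
/// `keys[i]` indexes into `values`; a `None` key models a physical null.
struct ToyDictionary {
    keys: Vec<Option<usize>>,
    values: Vec<Option<&'static str>>,
}

impl ToyDictionary {
    /// Physical nullability: consults only the array's own validity (the keys).
    fn is_physical_null(&self, index: usize) -> bool {
        self.keys[index].is_none()
    }

    /// Logical nullability: also follows the key into `values`.
    fn is_logical_null(&self, index: usize) -> bool {
        match self.keys[index] {
            None => true,
            Some(key) => self.values[key].is_none(),
        }
    }
}
```

With `keys = [Some(0), Some(1), None]` and `values = [Some("a"), None]`, index 1 is not physically null but is logically null, which is the surprise the comment is worried about.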

Contributor

@tustvold, Sep 19, 2023

Are you suggesting that `is_null` should always return logical nullability? What about for `RunArray`, where this would have O(log(n)) complexity? What about `null_count`? The only consistent thing I can see is to only ever return physical nullability...
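
The cost concern can be illustrated with a small std-only sketch. `ToyRunArray` and its methods are hypothetical, not the arrow-rs `RunArray` API. It assumes run ends are stored as cumulative exclusive end offsets and that the run array has no validity buffer of its own, so a logical null check needs a binary search over the run ends.

```rust
/// Toy run-end encoded array (NOT the arrow-rs type).
/// `run_ends[r]` is the exclusive logical end offset of run `r`,
/// and `values[r]` is the value repeated throughout that run.
struct ToyRunArray {
    run_ends: Vec<usize>,
    values: Vec<Option<i32>>,
}

impl ToyRunArray {
    /// Logical length: the end offset of the last run.
    fn len(&self) -> usize {
        self.run_ends.last().copied().unwrap_or(0)
    }

    /// Assumption of this sketch: no validity buffer on the run array itself,
    /// so the physical null count is always zero.
    fn physical_null_count(&self) -> usize {
        0
    }

    /// O(log n): binary search for the run that contains `index`.
    fn is_logical_null(&self, index: usize) -> bool {
        assert!(index < self.len(), "index out of bounds");
        let run = self.run_ends.partition_point(|&end| end <= index);
        self.values[run].is_none()
    }

    /// O(number of runs): sum the lengths of the runs whose value is null.
    fn logical_null_count(&self) -> usize {
        let mut start = 0;
        let mut count = 0;
        for (&end, value) in self.run_ends.iter().zip(&self.values) {
            if value.is_none() {
                count += end - start;
            }
            start = end;
        }
        count
    }
}
```

In this model the physical check is constant time, while the logical check and the logical null count both have to walk or search the runs, which is the asymmetry the comment points at.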

Contributor

I suggest that `is_null` should be renamed to `is_physical_null` (potentially with a soft deprecation period) so that users don't accidentally pick the wrong method.

You make a good point regarding `null_count`. My argument would be: rename that one as well, to `physical_null_count`. Then it's clear which semantics you're referring to.
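
One way such a rename could look, sketched on a hypothetical `ToyArray` trait rather than the real `Array` trait: the new name carries the implementation, and the old name stays as a deprecated forwarding method. Using the `#[deprecated]` attribute is an assumption here; the comment does not say which deprecation mechanism is intended.

```rust
/// Hypothetical trait for illustration only; not the arrow-rs `Array` trait.
trait ToyArray {
    /// Validity flags (`true` = valid), if the array has any.
    fn validity(&self) -> Option<&[bool]>;

    /// New, explicit name: nullability as recorded in the validity flags only.
    fn is_physical_null(&self, index: usize) -> bool {
        self.validity().map(|v| !v[index]).unwrap_or_default()
    }

    /// Old name kept during the deprecation period; forwards to the new one.
    #[deprecated(note = "use `is_physical_null`, or a logical check if that is what you need")]
    fn is_null(&self, index: usize) -> bool {
        self.is_physical_null(index)
    }
}

/// Minimal implementor used to exercise the trait.
struct ToyInts {
    validity: Option<Vec<bool>>,
}

impl ToyArray for ToyInts {
    fn validity(&self) -> Option<&[bool]> {
        self.validity.as_deref()
    }
}
```

Callers of the old name keep compiling but get a compiler warning that points them at the explicit name.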

Contributor Author

Filed #4840 to track

self.nulls().map(|n| n.is_null(index)).unwrap_or_default()
}

/// Returns whether the element at `index` is not null.
/// When using this function on a slice, the index is relative to the slice.
///
/// Note: this method returns the physical nullability, i.e. that encoded in [`Array::nulls`]
/// see [`Array::logical_nulls`] for logical nullability
/// Returns whether the element at `index` is *not* null, the
/// opposite of [`Self::is_null`].
///
/// # Example:
///