Refine documentation to Array::is_null #4838

Merged: 7 commits, Sep 20, 2023

@alamb alamb commented Sep 19, 2023

Which issue does this PR close?

Closes #4835

Rationale for this change

The fact that NullArray::is_null() returns false is consistent with the technical definition of physical/logical nulls, but it is also deeply confusing to a casual user, as explained in #4835.

I don't think the implications of logical vs. physical nullability are well understood by the arrow user community (and, to be honest, I am not sure they should be in most cases).

Thus, helping them find the right API for what they want to do would be helpful.

What changes are included in this PR?

  1. Update doc comments to explicitly mention the NullArray case (the one where I think it is the most deeply confusing, even though this does potentially apply to DictionaryArray and RunArray)
  2. Add an Array::is_logical_null that returns the logical nullability (mostly as a way to document the behavior and save downstream crates from having to handle NullArray specially)
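
The physical vs. logical distinction discussed in this description can be illustrated with a small standalone model. This is a sketch only: `ToyArray` and its methods are hypothetical names invented for illustration and are not the arrow-rs API. It assumes a "physical" null is one recorded in a validity mask, while a "logical" null is whatever a reader of the data would treat as null.

```rust
// Toy model, NOT the arrow-rs API: every name here is illustrative.

/// A simplified array: either primitive values with an optional validity
/// mask, or an all-null array that stores only its length.
pub enum ToyArray {
    Primitive { values: Vec<i32>, validity: Option<Vec<bool>> },
    Null { len: usize },
}

impl ToyArray {
    /// Number of slots in the array.
    pub fn len(&self) -> usize {
        match self {
            ToyArray::Primitive { values, .. } => values.len(),
            ToyArray::Null { len } => *len,
        }
    }

    /// Physical null: consults only the validity mask, if one exists.
    pub fn is_null(&self, index: usize) -> bool {
        match self {
            ToyArray::Primitive { validity: Some(mask), .. } => !mask[index],
            // No mask is stored, so nothing is physically null.
            ToyArray::Primitive { validity: None, .. } => false,
            ToyArray::Null { .. } => false,
        }
    }

    /// Logical null: every in-bounds slot of the all-null variant is null.
    pub fn is_logical_null(&self, index: usize) -> bool {
        match self {
            ToyArray::Null { len } => index < *len,
            _ => self.is_null(index),
        }
    }
}
```

In this model, the all-null variant reports `is_null(0) == false` yet `is_logical_null(0) == true`, which mirrors the surprise reported in #4835.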

Are there any user-facing changes?

New function and improved documentation

Questions

  1. What are your opinions on Array::is_logical_null?
  2. Should I add an equivalent of Array::is_logical_valid to mirror Array::is_valid?

@github-actions github-actions bot added the arrow Changes to the arrow crate label Sep 19, 2023
@alamb alamb marked this pull request as ready for review September 19, 2023 11:53
/// let array = NullArray::new(1);
/// assert_eq!(array.is_logical_null(0), true);
/// ```
fn is_logical_null(&self, index: usize) -> bool {

@tustvold tustvold Sep 19, 2023


I think at the very least we should provide an efficient implementation of this, instead of computing logical_nulls which could be very expensive.

In general, I am really not a fan of adding this method; it is a fairly major potential performance footgun.
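
To make the cost concern concrete, here is a rough sketch using a hypothetical dictionary-like type (`ToyDictionary` is an invented name, not an arrow-rs type). It assumes a slot is logically null when its key is missing or when the value the key points at is null. The naive per-index check materializes the whole mask on every call, while the direct check only touches one key and one value.

```rust
// Toy model, NOT the arrow-rs API: every name here is illustrative.

/// Dictionary-like array: each slot holds an optional key into `values`.
pub struct ToyDictionary {
    pub keys: Vec<Option<usize>>,
    pub values: Vec<Option<&'static str>>,
}

impl ToyDictionary {
    /// Builds the full logical null mask: O(n) time and an allocation.
    pub fn logical_nulls(&self) -> Vec<bool> {
        self.keys
            .iter()
            .map(|key| match key {
                None => true,
                Some(k) => self.values[*k].is_none(),
            })
            .collect()
    }

    /// Naive per-index check: recomputes the whole mask on every call.
    pub fn is_logical_null_naive(&self, index: usize) -> bool {
        self.logical_nulls()[index]
    }

    /// Direct per-index check: O(1), no allocation.
    pub fn is_logical_null(&self, index: usize) -> bool {
        match self.keys[index] {
            None => true,
            Some(k) => self.values[k].is_none(),
        }
    }
}
```

Calling the naive version inside a loop over all indices turns a linear scan into quadratic work, which is the kind of footgun the comment warns about.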

Contributor Author


Removed

///
/// // NullArrays do not have a validity mask
/// let array = NullArray::new(1);
/// assert_eq!(array.is_null(0), false);
/// ```
fn is_null(&self, index: usize) -> bool {
Contributor


I've said this before but I think from a user PoV, having is_null and is_logical_null is confusing as hell. Which NULL is is_null?! Yeah, historically this is the physical null but do most users really care about the physical repr.? I would argue that at least this method should be called is_physical_null to force users to think about what kind of null they want, instead of tricking them into using the wrong implicit default for their use case.


@tustvold tustvold Sep 19, 2023


Are you suggesting that is_null should always return logical nullability? What about for RunArray where this would have O(log(n)) complexity? What about null_count? The only consistent thing I can see is to only ever return physical nullability...
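
The O(log(n)) figure for run-encoded data can be sketched as follows. This is a standalone model with hypothetical names (`ToyRunArray` is not the arrow-rs `RunArray`). It assumes each run stores an exclusive end offset and that nullability is recorded once per run, so finding the run that covers a logical index requires a binary search over the run ends.

```rust
// Toy model, NOT the arrow-rs API: every name here is illustrative.

/// Run-end encoded array: run `i` covers logical indices
/// `run_ends[i - 1]..run_ends[i]` (with an implicit start of 0).
pub struct ToyRunArray {
    /// Exclusive, strictly increasing end offset of each run.
    pub run_ends: Vec<usize>,
    /// One entry per run: `true` means the run's value is null.
    pub run_is_null: Vec<bool>,
}

impl ToyRunArray {
    /// Logical length: the end offset of the last run.
    pub fn len(&self) -> usize {
        self.run_ends.last().copied().unwrap_or(0)
    }

    /// Logical nullability of one slot, found by binary search: O(log n)
    /// in the number of runs.
    pub fn is_logical_null(&self, index: usize) -> bool {
        assert!(index < self.len(), "index out of bounds");
        // Position of the first run whose end is greater than `index`.
        let run = self.run_ends.partition_point(|&end| end <= index);
        self.run_is_null[run]
    }
}
```

A mask-based check, by contrast, is a single bit lookup, which is why a uniform "logical" `is_null` would not have uniform cost across array types.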

Contributor


I suggest that is_null should be renamed to is_physical_null (potentially w/ a soft deprecation period) so that users do not accidentally pick the wrong method.

You make a good point regarding null_count. My argument would be: rename that one as well, to physical_null_count. Then it's clear which semantics you're referring to.
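
One way such a rename with a soft deprecation period could look in Rust is sketched here. The trait and type names (`NullCheck`, `Masked`, `legacy_caller`) are hypothetical and chosen only for illustration; nothing here reflects what arrow-rs actually did. The sketch relies on the standard `#[deprecated]` attribute, which makes existing callers emit a warning rather than fail to compile.

```rust
// Toy model, NOT the arrow-rs API: every name here is illustrative.

pub trait NullCheck {
    /// Explicitly named: nullability according to the validity mask only.
    fn is_physical_null(&self, index: usize) -> bool;

    /// Old name, kept during the deprecation period and forwarding to
    /// the explicitly named method.
    #[deprecated(note = "ambiguous name; use `is_physical_null` instead")]
    fn is_null(&self, index: usize) -> bool {
        self.is_physical_null(index)
    }
}

/// Minimal implementor backed by a validity mask (`true` = valid).
pub struct Masked {
    pub validity: Vec<bool>,
}

impl NullCheck for Masked {
    fn is_physical_null(&self, index: usize) -> bool {
        !self.validity[index]
    }
}

/// Stands in for existing downstream code that still uses the old name.
/// Without the `allow`, this call would compile with a deprecation warning.
#[allow(deprecated)]
pub fn legacy_caller(array: &dyn NullCheck, index: usize) -> bool {
    array.is_null(index)
}
```

The default method keeps implementors from having to write the forwarding code themselves, so only the new name needs an implementation.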

Contributor Author


Filed #4840 to track

@alamb alamb commented Sep 19, 2023

@tustvold and I talked about this earlier today and what I suggest is:

  1. I will remove the code changes, and update the documentation
  2. I will file a follow-on ticket where we can discuss what, if anything, to do about physical and logical nulls

@alamb alamb changed the title from "Add documentation and Array::is_logical_null" to "Refine documentation to Array::is_null" Sep 19, 2023

@tustvold tustvold left a comment


Mostly just some minor copy alterations, although one of the comments was incorrect

arrow-array/src/array/mod.rs: 4 outdated review comments, resolved
@alamb alamb merged commit f9cd26f into apache:master Sep 20, 2023
@alamb alamb deleted the alamb/null branch September 20, 2023 21:04
ryanaston pushed a commit to segmentio/arrow-rs that referenced this pull request Nov 6, 2023
* Add documentation and Array::is_logical_null

* Remove code change, refine comments

* fix docs

* Apply suggestions from code review

Co-authored-by: Raphael Taylor-Davies

* Fix link formatting

---------

Co-authored-by: Raphael Taylor-Davies
Development

Successfully merging this pull request may close these issues.

NullArray::is_null() returns false incorrectly
3 participants