Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Loosen nullability restrictions added in #3205 (#3226) #3244

Merged
merged 6 commits into from
Dec 5, 2022

Conversation

tustvold
Copy link
Contributor

@tustvold tustvold commented Nov 30, 2022

Which issue does this PR close?

Closes #3226

Rationale for this change

#3205 added restrictions to prevent nulls in non-nullable children of a StructArray. Whilst on the face of it this sounds sensible, the issue is that nulls in StructArray and FixedSizeListArray both take up space in their children. This leads to the question of "what value should the child have for a null in its parent". In the case of DictionaryArray there isn't a valid placeholder value that can be used, the value at a null index is undefined.

This PR works around this by loosening the restriction to allow a null in a non-nullable child array provided it corresponds to a null in the parent. This ensures that the children are valid, and can be safely interpreted outside the context of their parent, whilst also ensuring that the semantics of nullability are preserved.

What changes are included in this PR?

Are there any user-facing changes?

@github-actions github-actions bot added the arrow Changes to the arrow crate label Nov 30, 2022
@tustvold tustvold force-pushed the loosen-nullability-restriction branch from 2caa0a4 to dff7cba Compare November 30, 2022 11:28
@tustvold tustvold requested a review from alamb December 3, 2022 09:37
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM -- cc @Jefffrey

arrow-array/src/array/mod.rs Show resolved Hide resolved
arrow-array/src/array/string_array.rs Show resolved Hide resolved
@@ -568,7 +562,9 @@ mod tests {
}

#[test]
#[should_panic(expected = "non-nullable field cannot have null values")]
#[should_panic(
expected = "non-nullable child of type Int32 contains nulls not present in parent Struct"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Comment on lines 1036 to 1055
if !field.is_nullable() {
match nulls {
Some(nulls) => {
let element_len = *len as usize;
let mut buffer =
MutableBuffer::new_null(element_len * self.len);

for i in 0..self.len {
if !bit_util::get_bit(nulls.as_ref(), self.offset + i) {
continue;
}
for j in 0..element_len {
bit_util::set_bit(
buffer.as_mut(),
i * element_len + j,
)
}
}
let mask = buffer.into();
self.validate_non_nullable(Some(&mask), 0, child)?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think some comments about this mask (what it is and why it is built) might be helpful

None => return match data.null_count {
0 => Ok(()),
_ => Err(ArrowError::InvalidArgumentError(format!(
"non-nullable child of type {} contains nulls not present in parent {}",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest making these two messages different so it is easier to find what error is being hit by grep if someone hits the error

Copy link
Contributor

@Jefffrey Jefffrey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this, looks like there was some nuance I missed in regards to nested types & nullability 👍

@tustvold
Copy link
Contributor Author

tustvold commented Dec 5, 2022

Thanks for this, looks like there was some nuance I missed in regards to nested types & nullability

Not at all, the ecosystem as a whole does not seem to handle this in a consistent manner. It certainly isn't documented 😆

@tustvold tustvold merged commit b155461 into apache:master Dec 5, 2022
@ursabot
Copy link

ursabot commented Dec 5, 2022

Benchmark runs are scheduled for baseline = 06e1111 and contender = b155461. b155461 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

new_null_array Panics creating StructArray with non-nullable fields
4 participants