-
Notifications
You must be signed in to change notification settings - Fork 832
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Loosen nullability restrictions added in #3205 (#3226) #3244
Loosen nullability restrictions added in #3205 (#3226) #3244
Conversation
2caa0a4
to
dff7cba
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM -- cc @Jefffrey
@@ -568,7 +562,9 @@ mod tests { | |||
} | |||
|
|||
#[test] | |||
#[should_panic(expected = "non-nullable field cannot have null values")] | |||
#[should_panic( | |||
expected = "non-nullable child of type Int32 contains nulls not present in parent Struct" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
arrow-data/src/data.rs
Outdated
if !field.is_nullable() { | ||
match nulls { | ||
Some(nulls) => { | ||
let element_len = *len as usize; | ||
let mut buffer = | ||
MutableBuffer::new_null(element_len * self.len); | ||
|
||
for i in 0..self.len { | ||
if !bit_util::get_bit(nulls.as_ref(), self.offset + i) { | ||
continue; | ||
} | ||
for j in 0..element_len { | ||
bit_util::set_bit( | ||
buffer.as_mut(), | ||
i * element_len + j, | ||
) | ||
} | ||
} | ||
let mask = buffer.into(); | ||
self.validate_non_nullable(Some(&mask), 0, child)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think some comments about this mask (what it is and why it is built) might be helpful
None => return match data.null_count { | ||
0 => Ok(()), | ||
_ => Err(ArrowError::InvalidArgumentError(format!( | ||
"non-nullable child of type {} contains nulls not present in parent {}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest making these two messages different so it is easier to find what error is being hit by grep
if someone hits the error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this, looks like there was some nuance I missed in regards to nested types & nullability 👍
Not at all, the ecosystem as a whole does not seem to handle this in a consistent manner. It certainly isn't documented 😆 |
Benchmark runs are scheduled for baseline = 06e1111 and contender = b155461. b155461 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Which issue does this PR close?
Closes #3226
Rationale for this change
#3205 added restrictions to prevent nulls in non-nullable children of a StructArray. Whilst on the face of it this sounds sensible, the issue is that nulls in StructArray and FixedSizeListArray both take up space in their children. This leads to the question of "what value should the child have for a null in its parent". In the case of DictionaryArray there isn't a valid placeholder value that can be used, the value at a null index is undefined.
This PR works around this by loosening the restriction to allow a null in a non-nullable child array provided it corresponds to a null in the parent. This ensures that the children are valid, and can be safely interpreted outside the context of their parent, whilst also ensuring that the semantics of nullability are preserved.
What changes are included in this PR?
Are there any user-facing changes?