-
Notifications
You must be signed in to change notification settings - Fork 839
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add data type field to NullArray
#5173
Comments
I'm not sure this will work, aside from this diverging from the arrow standard, DataType must correspond to the appropriate array variant. Perhaps you might consider using RunArray instead? |
BTW, I also find the behavior of /// \brief Return true if value at index is null. Does not boundscheck
bool IsNull(int64_t i) const { return !IsValid(i); }
/// \brief Return true if value at index is valid (not null). Does not
/// boundscheck
bool IsValid(int64_t i) const {
if (null_bitmap_data_ != NULLPTR) {
return bit_util::GetBit(null_bitmap_data_, i + data_->offset);
}
// Dispatching with a few conditionals like this makes IsNull more
// efficient for how it is used in practice. Making IsNull virtual
// would add a vtable lookup to every call and prevent inlining +
// a potential inner-branch removal.
if (type_id() == Type::SPARSE_UNION) {
return !internal::IsNullSparseUnion(*data_, i);
}
if (type_id() == Type::DENSE_UNION) {
return !internal::IsNullDenseUnion(*data_, i);
}
if (type_id() == Type::RUN_END_ENCODED) {
return !internal::IsNullRunEndEncoded(*data_, i);
}
return data_->null_count != data_->length;
} Hence I come across another question: should we strive to maintain consistent behavior of common APIs across various implementations? |
See #4840
Where possible yes, but keeping the APIs coherent across the crate is more important. Some divergence is to be expected |
Thanks for replying. I find what confuses me is how to understand the But when From the spec I can't figure out which way accord with the
|
NullArray is a bit of an odd one, but it is for the case where there is no value type known or possible, e.g. a SQL NULL literal, and it is therefore necessary to provide a container with no value type. There may be other use-cases, but that is the major one. The notion to then include a type on a container that is by design untyped is a little surprising to me |
My opinion is that the behavior described on #4835 / #4840 may be pedantically correct but is incredibly confusing from a user perspective
The idea of saving memory for null only arrays sounds like a very reasonable usecase to me and it is my understanding of one of the main uses of the Representing this idea via the existing I wonder if you could use a I realize the support for RunEndEncoded is relatively sparse at the moment, but maybe this could be a reason to improve it 🤔 |
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
NullArray::data_type()
returnsNull
type unconditionally. This limits its use case toNull
s only array. I'm trying to leverageNullArray
in some scenarios where I know all the values of an array areNull
(but typed). This can save lots of memory.Describe the solution you'd like
Add a
data_type
field toNullArray
, something likeI've referenced the CPP
NullArray
implementation, which inherits a completeArrayData
fromArray
class. We don't need to do the same inarrow-rs
but can consider adding useful fields likelen
anddata_type
separately.Describe alternatives you've considered
Additional context
The text was updated successfully, but these errors were encountered: