-
Notifications
You must be signed in to change notification settings - Fork 160
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Bug fix] Append() in Nullable columns causes nulls_ array to be incorrectly appended to if nested column Append() fails #376
base: master
Are you sure you want to change the base?
Conversation
If Append() of the nested column fails, the nulls column will be incremented despite nothing being appended. It will also be marked as not null, causing an error to be thrown if this new "fake" value in the column is accessed later. this fixes that by moving the appending of the nulls markers to the end of the Append() operation of the nested column
if (value.has_value()) { | ||
typed_nested_data_->Append(std::move(*value)); | ||
} else { | ||
typed_nested_data_->Append(typename ValueType::value_type{}); | ||
} | ||
// Must be called after Append to nested column because it might fail | ||
ColumnNullable::Append(!value.has_value()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may also fail, so perhaps we can make ColumnNullable::Append(!value.has_value());
the first operation, wrap second part in try-catch (or with some RAII wrapper), and rollback last value in nulls_
in case of error. (perhaps by introducing protected method ColumnNullable::PopBackNull()
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't the wanted behavior for Append()
to fail if the nested Append()
fails too? For example, if we append a 10-character string to ColumnFixedString(5)
, it will throw a validation error:
string.cpp, line 38:
if (str.size() > string_size_) {
throw ValidationError("Expected string of length not greater than "
+ std::to_string(string_size_) + " bytes, received "
+ std::to_string(str.size()) + " bytes.");
}
So, my assumption was that Nullable(ColumnFixedString(5))
should behave the same way as the nested column and also throw an error instead of catching the error and failing silently. By moving ColumnNullable::Append(!value.has_value());
to the end, we ensure that we throw an error early, matching the behavior of the nested column without causing unintended side effects.
However, I may have misunderstood the intended behavior. Could you please clarify further?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, it needs to fail, but also the behavior must be consistent. I.e. nulls_->Size()
must be the same as typed_nested_data_->Size()
, which is not true with current implementation (and your proposed change).
So I just propose the way to rollback changes of nulls_
, since it is the easiest one to do, if there is any exception.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please provide a specific example of how it leads to inconsistencies? I can't seem to think of one. This is my understanding:
We have 3 cases:
1- value is not null and nested Append()
does not fail:
- we call
typed_nested_data_->Append(std::move(*value));
which does not failtyped_nested_data_
size = N+1,nulls_
size = N- we call
ColumnNullable::Append(!value.has_value());
typed_nested_data_
size = N+1,nulls_
size = N+1- we have consistent sizes
2- value is not null and nested Append
fails:
- we call
typed_nested_data_->Append(std::move(*value));
which fails- an error is thrown so we exit the function
typed_nested_data_
size = N,nulls_
size = N- we have consistent sizes
3- value is null:
- we call
typed_nested_data_->Append(typename ValueType::value_type{});
which cannot fail- we call
ColumnNullable::Append(!value.has_value());
typed_nested_data_
size = N+1,nulls_
size = N+1- we have consistent sizes
I have also tested those 3 cases and it passed them:
const auto col = new ColumnNullableT<ColumnFixedString>(5);
col->Append("hello");
CHECK(col->Size() == 1);
CHECK(col->typed_nested_data_->Size() == 1);
CHECK_THROWS(col->Append("world!"));
CHECK(col->Size() == 1);
CHECK(col->typed_nested_data_->Size() == 1);
col->Append(std::nullopt);
CHECK(col->Size() == 2);
CHECK(col->typed_nested_data_->Size() == 2);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, but what if ColumnNullable::Append(!value.has_value());
fails?
In that case we have typed_nested_data_->Size() == ColumnNullable::Size() + 1
and hence the ColumnNullableT
instance is in inconsistent state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some changes required
Hi @moaazassali thank you for contribution! Please fix the issue mentioned in the review, and then we can merge it. |
If
Append()
of the nested column fails, thenulls_
column will be incremented despite nothing being appended.This causes the following issues:
1- The
Size()
reported by theNullable<T>
column will be wrong since it depends on the size ofnulls_
.2- If you try to access this new "fake" element of the
Nullable<T>
column, an error will be thrown becausefalse
would be appended tonulls_
indicating that a valid non-null element exists. WhenAt()
or[]
is used to access it, the actual nested column does not have this value, which causes the error to be thrown.3- This is especially the case for columns like
FixedString
which could throw.This pull request resolves that by moving the appending of
nulls_
to the end of theAppend()
operation of the nested column. If the nested columnAppend()
does fail, then theAppend()
tonulls_
is never called.