-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix Window operator with NaN as the frame bound #11293
Conversation
This pull request was exported from Phabricator. Differential Revision: D64510519 |
✅ Deploy Preview for meta-velox canceled.
|
ae5ba10
to
aaace8e
Compare
) Summary: NaN could appear as the frame bound of k-range frames in window operations. When NaN appear in either of the frame bounds, the window operation should return NULL at this row. This diff fixes facebookincubator#11213. Differential Revision: D64510519
This pull request was exported from Phabricator. Differential Revision: D64510519 |
) Summary: NaN could appear as the frame bound of k-range frames in window operations. When NaN appear in either of the frame bounds, the window operation should return NULL at this row. This diff fixes facebookincubator#11213. Differential Revision: D64510519
aaace8e
to
b249ee4
Compare
This pull request was exported from Phabricator. Differential Revision: D64510519 |
) Summary: NaN could appear as the frame bound of k-range frames in window operations. When NaN appear in either of the frame bounds, the window operation should return NULL at this row. This diff fixes facebookincubator#11213. Differential Revision: D64510519
b249ee4
to
06c9e58
Compare
This pull request was exported from Phabricator. Differential Revision: D64510519 |
@kagamiori : Quick question : This logic should be applicable for k row frames as well. If yes, please modify https://github.com/facebookincubator/velox/blob/main/velox/exec/Window.cpp#L347 as well. |
Hi @aditi-pandit, NaN shouldn't be an issue to k rows frames because rows frames require the frame bounds to be integral while NaN is floating-point. Let me know if I misunderstand your question. |
@kagamiori : Makes sense about k row frame. Though, we should add a validation in WindowNode for rejecting NaN value for k range constant frame as well https://github.com/facebookincubator/velox/blob/main/velox/core/PlanNode.cpp#L1412 |
velox/exec/WindowPartition.cpp
Outdated
@@ -383,6 +384,20 @@ void WindowPartition::updateKRangeFrameBounds( | |||
orderByRowColumn.nullByte(), | |||
orderByRowColumn.nullMask())); | |||
|
|||
// Mark the frame invalid if the frame bound is NaN. | |||
auto frameType = data_->columnTypes()[mappedFrameColumn]; | |||
if (frameType->isReal()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could avoid this check on a per row basis. They can be computed outside the loop or the outer function could pass to the inner function if it should do a float or double NaN check. You can even go as far as templatizing this function with boolean parameters for doing floating/double NaN check, and then you can avoid the impact of the conditionals at runtime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kagamiori thanks for the fix % minors!
velox/exec/RowContainer.h
Outdated
typename T, | ||
std::enable_if_t<std::is_floating_point_v<T>, int32_t> = 0> | ||
bool isNanAt(const char* row, int32_t columnIndex) { | ||
auto column = columnAt(columnIndex); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: const auto
velox/exec/WindowPartition.h
Outdated
@@ -158,14 +158,18 @@ class WindowPartition { | |||
/// @param numRows number of rows to compute buffer for. | |||
/// @param rawPeerStarts buffer of peer row values for each row. If the frame | |||
/// column is null, then its peer row value is the frame boundary. | |||
/// @param rawFrameBounds SelectivityVector to keep track of valid frames. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the comment correct? validFrames?
velox/exec/WindowPartition.cpp
Outdated
@@ -383,6 +384,20 @@ void WindowPartition::updateKRangeFrameBounds( | |||
orderByRowColumn.nullByte(), | |||
orderByRowColumn.nullMask())); | |||
|
|||
// Mark the frame invalid if the frame bound is NaN. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we directly call this method and return false for non floating data type?
@kagamiori : I realized later that k range frames (even if constant) always have range frame column. So we don't get the constant k value in WindowNode PlanNode frame. So it won't be possible for us to reject this case in Velox. We should error it in the co-ordinator. The best at runtime is to return NULL values for the entire result. Ideally we shouldn't have such an inconsistent implementation. But we will need to do more design work if we want to do this validation. |
Hi @aditi-pandit, I see. Just to ensure we're on the same page, so for k range frames with either constant or column bounds, Presto coordinator is going to translate the frame bound into the form acceptable by Velox, so we'll let coordinator error out when the frame bound contains NaN. Is this right? |
@kagamiori : Yes. Let me elaborate to reduce confusion. We have 2 error situations with range frames and Nan :
Its possible that all column values in the range frame column are NaN and so all window values returned at NULL as well, but that is expected behavior. To me that is a point of inconsistency between the RANGE frame behavior between the constant and column bound as something that looks similar has different results. We should try to avoid it in general. But the only option to fix it would be to pass the original frame value to Velox. Definitely something we can do with some design work. |
Hi @aditi-pandit, thank you for the reply! Could you help me understand this?
I thought that for ==> Update: I've just realized...is it that Presto co-ordinator wouldn't process column bound of k range frames, but would add project node before the window node in the query plan to do that process in workers? |
@kagamiori :: Yes. Presto co-ordinator adds project node. It doesn't look at the column values. |
06c9e58
to
4acb017
Compare
) Summary: NaN could appear as the frame bound of k-range frames in window operations. When NaN appear in either of the frame bounds, the window operation should return NULL at this row. This diff fixes facebookincubator#11213. Differential Revision: D64510519
This pull request was exported from Phabricator. Differential Revision: D64510519 |
velox/exec/WindowPartition.cpp
Outdated
@@ -368,6 +369,23 @@ void WindowPartition::updateKRangeFrameBounds( | |||
// reordered as per inputMapping_. | |||
RowColumn frameRowColumn = columns_[frameColumn]; | |||
RowColumn orderByRowColumn = data_->columnAt(orderByColumn); | |||
|
|||
// Mark the frame invalid if the frame bound is NaN. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kagamiori : We should avoid 2 loops over all the rows. That is expensive.
The idea was to determine which NaN check (float or double or none) once. We could abstract a function for the loop over the rows and pass the Nan check as a template parameter to it. Using the compile time parameter we can pick up the right NaN check within the original loop.
Something like (this is just for pseudocode, not perfect code)
for (auto i = 0; i < numRows; i++) {
auto currentRow = startRow + i;
auto* partitionRow = partition_[currentRow];
// The user is expected to set the frame column equal to NULL when the
// ORDER BY value is NULL and not in any other case. Validate this
// assumption.
VELOX_DCHECK_EQ(
RowContainer::isNullAt(
partitionRow, frameRowColumn.nullByte(), frameRowColumn.nullMask()),
RowContainer::isNullAt(
partitionRow,
orderByRowColumn.nullByte(),
orderByRowColumn.nullMask()));
if constexpr (floatNan) {
if (data_->isNanAt<float>(partition_[i], mappedFrameColumn)) {
validFrames.setValid(i, false);
continue;
}
} else if constexpr (doubleNan) {
if (data_->isNanAt<double>(partition_[i], mappedFrameColumn)) {
validFrames.setValid(i, false);
continue;
}
}
// If the frame is NULL or 0 preceding or 0 following then the current row
// has same values for order by and frame column. In that case
// the bound matches the peer row for this row.
if (data_->compare(
partitionRow,
partitionRow,
orderByColumn,
mappedFrameColumn,
flags) == 0) {
rawFrameBounds[i] = rawPeerBounds[i];
} else {
// If the search is for a preceding bound then rows between
// [0, currentRow] are examined. For following bounds, rows between
// [currentRow, numRows()) are checked.
....
}
) Summary: NaN could appear as the frame bound of k-range frames in window operations. When NaN appear in either of the frame bounds, the window operation should return NULL at this row. This diff fixes facebookincubator#11213. Differential Revision: D64510519
27eac93
to
f5f5280
Compare
This pull request was exported from Phabricator. Differential Revision: D64510519 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the update!
rawPeerBuffer, | ||
rawFrameBounds, | ||
validFrames); | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use VELOX_DYNAMIC_SCALAR_TYPE_DISPATCH_ALL here instead of passing void?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @xiaoxmeng, I intentionally attempted to avoid using VELOX_DYNAMIC_SCALAR_TYPE_DISPATCH_ALL. This will cause the code instantiation for every type, but we don't really differentiate the code for non-floating-point types.
rawPeerBuffer, | ||
rawFrameBounds, | ||
validFrames); | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use VELOX_DYNAMIC_SCALAR_TYPE_DISPATCH_ALL here instead of passing void?
rawPeerBuffer, | ||
rawFrameBounds, | ||
validFrames); | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use VELOX_DYNAMIC_SCALAR_TYPE_DISPATCH_ALL here instead of passing void for non-float or double type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kagamiori thanks for the offline explanation!
for (auto i = 0; i < numRows; i++) { | ||
auto currentRow = startRow + i; | ||
auto* partitionRow = partition_[currentRow]; | ||
|
||
// Mark the frame invalid if the frame bound is NaN. | ||
if constexpr (std::is_floating_point_v<T>) { | ||
if (data_->isNanAt<T>(partitionRow, mappedFrameRowColumn) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a comment to describe this edge case as chat offline? thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider to add a unit test to cover the new row container API as well as have a test to cover both frame bound and order by columns are nan case? thanks!
) Summary: NaN could appear as the frame bound of k-range frames in window operations. When NaN appear in either of the frame bounds, the window operation should return NULL at this row. This diff fixes facebookincubator#11213. Reviewed By: xiaoxmeng Differential Revision: D64510519
f5f5280
to
84c41ed
Compare
This pull request was exported from Phabricator. Differential Revision: D64510519 |
) Summary: NaN could appear as the frame bound of k-range frames in window operations. When NaN appear in either of the frame bounds, the window operation should return NULL at this row. This diff fixes facebookincubator#11213. Reviewed By: xiaoxmeng Differential Revision: D64510519
84c41ed
to
a368dc8
Compare
This pull request was exported from Phabricator. Differential Revision: D64510519 |
Hi @aditi-pandit, I found that if the order-by value is NaN, it would cause the frame bounds to be NaN too, even if the original frame offsets are not NaN. E.g.,
In this case, Presto makes the frame to always contain only the peer rows of the current row, even if the frame is logically empty, e.g., range between 10.0 preceding and 20.0 preceding. However, the Velox window operator cannot tell whether NaN in frame bounds comes from the frame offset or the order-by value and will return NULL after this fix. Would it make sense to disallow NaN in order-by column for window operations as well (given that logically empty frames are not handled correctly)? If not, the window operator will need to know the original frame offsets specified in the SQL queries. |
Hi Wei, I'm a bit confused by your question. You are saying that generally "range between 10.0 preceding and 20.0 preceding" is an empty frame, but Presto return results like (1, 3). I see that other systems like DuckDB are also returning similar results
So likely they are expected results. Now coming to Velox: Your fix is also doing:
So it catches when the frame is NaN but the orderBy is not NaN then mark the frame as empty.. else it uses searchFrameValue NaN in orderBy which should always align to the ends of the partition. Does that not happen right now ? Does it give a value beyond the partition start or end causing the empty frames check to catch those bounds ? |
Hi @aditi-pandit, yeah, let me clarify. I added the condition Thinking about it twice, I think we can return the same result as Presto for now. And when we reject the queries with NaN frame offsets, we can handle this situation together. |
@kagamiori : In addition to this PR, we also need a Presto side change to reject frames with NaN as a frame bound. This Velox PR is only for cases where NaN appears in the frame column at certain individual rows. But yeah, we cannot distinguish if NaN in frame is coming from ORDER BY or the NaN frame bound. See previous discussion #11293 (comment) |
@kagamiori : I see. I think the new condition works well. As far as being able to distinguish if NaN in frame column is from order by or frame column, Velox is not setup that way. We will have to explicitly reject NaN frames in Presto with a separate fix. The inconsistency leads to issues like #11293 (comment) |
@kagamiori : Needn't be in this same PR (though better for it), but we should add documentation for this NaN behavior in the section on k range frames in the window function developer documentation at https://facebookincubator.github.io/velox/develop/window.html#k-range-frames |
) Summary: NaN could appear as the frame bound of k-range frames in window operations. When NaN appear in either of the frame bounds, the window operation should return NULL at this row. This diff fixes facebookincubator#11213. Reviewed By: xiaoxmeng Differential Revision: D64510519
a368dc8
to
1fe00b6
Compare
) Summary: NaN could appear as the frame bound of k-range frames in window operations. When NaN appear in either of the frame bounds, the window operation should return NULL at this row. This diff fixes facebookincubator#11213. Reviewed By: xiaoxmeng Differential Revision: D64510519
1fe00b6
to
9f1b807
Compare
This pull request was exported from Phabricator. Differential Revision: D64510519 |
1 similar comment
This pull request was exported from Phabricator. Differential Revision: D64510519 |
@aditi-pandit, Thanks! I'll update the documentation in a separate PR. |
This pull request has been merged in cce529e. |
Conbench analyzed the 1 benchmark run on commit There were no benchmark performance regressions. 🎉 The full Conbench report has more details. |
Summary:
NaN could appear as the frame bound of k-range frames in window
operations. When NaN appear in either of the frame bounds, the window
operation should return NULL at this row.
This diff fixes #11213.
Differential Revision: D64510519