-
Notifications
You must be signed in to change notification settings - Fork 920
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement HOST_UDF
aggregation for reduction and groupby
#17249
Conversation
Signed-off-by: Nghia Truong <[email protected]>
Signed-off-by: Nghia Truong <[email protected]>
Signed-off-by: Nghia Truong <[email protected]>
Signed-off-by: Nghia Truong <[email protected]>
Signed-off-by: Nghia Truong <[email protected]>
Signed-off-by: Nghia Truong <[email protected]>
Signed-off-by: Nghia Truong <[email protected]>
Signed-off-by: Nghia Truong <[email protected]>
Signed-off-by: Nghia Truong <[email protected]>
Signed-off-by: Nghia Truong <[email protected]>
Signed-off-by: Nghia Truong <[email protected]>
Signed-off-by: Nghia Truong <[email protected]>
Signed-off-by: Nghia Truong <[email protected]>
Signed-off-by: Nghia Truong <[email protected]>
Signed-off-by: Nghia Truong <[email protected]>
MERGE_TDIGEST(32), // This can take a delta argument for accuracy level | ||
HISTOGRAM(33), | ||
MERGE_HISTOGRAM(34); | ||
HOST_UDF(26), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Append HOST_UDF
to the tail? then do not touch other aggregation values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately HOST_UDF
is classified as "UDF" which should be grouped together with other UDF kinds. That's why I have to insert it in the middle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The majority of the complexity in host_udf_base
arises from its handling of both reduction and groupby operations. I mainly went through examples and tests and found this feature quite powerful overall. However, the design supporting both reduction and groupby could use some refinement. For now, I recommend focusing exclusively on groupby in the current work, which will help define the simplest possible interface for this new aggregation. Once that's established, we can extend the feature to include reduction. This approach will also help refine the scope of the PR, making it easier to review. @ttnghia What do you think?
rmm::cuda_stream_view stream; | ||
rmm::device_async_resource_ref mr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A larger scope of changes is required since the given stream and mr need to be used for all the factories like make_lists_column
, etc.
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
This will be closed and replaced by multiple smaller PRs for easier review: |
cpp/tests/CMakeLists.txt
Outdated
@@ -122,44 +122,7 @@ ConfigureTest(TIMESTAMPS_TEST wrappers/timestamps_test.cu) | |||
# * groupby tests --------------------------------------------------------------------------------- | |||
ConfigureTest( | |||
GROUPBY_TEST | |||
groupby/argmin_tests.cpp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't intend for this change to be checked in, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh this was temporarily modified for local tests, and should be reverted now in the new PR.
This implements
HOST_UDF
aggregation, allowing to execute a host-side user-defined function (UDF) through libcudf aggregation framework.cudf::host_udf_base
). The interface provides the ability to fully interact with libcudf aggregation framework.Closes #16633.
Usage
cudf::host_udf_base
and implement the required virtual functions declared in that base struct. For example:HOST_UDF
aggregation which is constructed from an instance of the functor defined above. For example: