[FEA] Allow to run groupby/reduction with externally derived aggregations #16633

Open
ttnghia opened this issue Aug 21, 2024 · 1 comment · May be fixed by #17645 or #17249

ttnghia commented Aug 21, 2024

This idea arose after many attempts to add new aggregations to the libcudf framework to accommodate specific use cases outside of cudf. Most of the time, however, the application (the Spark plugin) wants very specific behaviors that libcudf cannot accommodate. For example, for the M2/MERGE_M2 aggregations, we want to output more than one column (the main M2 values as well as their intermediate values) for reuse elsewhere.
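
To make the M2/MERGE_M2 example concrete, here is a minimal host-side sketch of why the intermediate values matter. It assumes the intermediate state per group is a (count, mean, M2) triple and uses the standard pairwise merge formula for sums of squared deviations. The names `m2_state`, `compute_m2`, and `merge_m2` are hypothetical and are not libcudf APIs.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical per-group state: the main M2 value plus the intermediate
// values (count and mean) that a later merge step needs.
struct m2_state {
  std::size_t count{0};
  double mean{0.0};
  double m2{0.0};  // sum of squared deviations from the mean
};

// Single-pass (Welford) computation of the state for one partition.
m2_state compute_m2(std::vector<double> const& values)
{
  m2_state s;
  for (double v : values) {
    ++s.count;
    double const delta = v - s.mean;
    s.mean += delta / static_cast<double>(s.count);
    s.m2 += delta * (v - s.mean);
  }
  return s;
}

// Merge two partial states. This only works because count and mean were
// kept alongside M2 instead of being discarded.
m2_state merge_m2(m2_state const& a, m2_state const& b)
{
  if (a.count == 0) { return b; }
  if (b.count == 0) { return a; }
  double const na    = static_cast<double>(a.count);
  double const nb    = static_cast<double>(b.count);
  double const n     = na + nb;
  double const delta = b.mean - a.mean;
  m2_state out;
  out.count = a.count + b.count;
  out.mean  = a.mean + delta * nb / n;
  out.m2    = a.m2 + b.m2 + delta * delta * na * nb / n;
  return out;
}
```

Merging the states of two partitions gives the same result as computing the state over their concatenation, which is why an aggregation that returns only the M2 column is not enough for the reuse case described above.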

I would like to refactor the groupby/reduction framework so that it can run aggregations defined outside of libcudf. Downstream applications could then implement any new, customized aggregations they want and call libcudf code on them. The external aggregations just need to be implemented as classes derived from cudf base classes (cudf::groupby_aggregation, for example).
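
A rough sketch of the kind of extension point this proposal describes, written as plain host C++ rather than against libcudf. Everything here is hypothetical (`lib_sketch`, `app_sketch`, `groupby_aggregation_base`, `run_groupby`); the only point is that the library side dispatches through a virtual interface, so the derived class can live in the downstream application and can return more than one output column.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

namespace lib_sketch {  // hypothetical stand-in for the library side

// Extension point: called once per group, returns one value per output column.
struct groupby_aggregation_base {
  virtual ~groupby_aggregation_base() = default;
  virtual std::size_t num_outputs() const = 0;
  virtual std::vector<double> aggregate(std::vector<double> const& group_values) const = 0;
};

using result_t = std::map<int, std::vector<double>>;  // key -> output row

// The framework groups the values and knows nothing about the concrete aggregation.
result_t run_groupby(std::vector<int> const& keys,
                     std::vector<double> const& values,
                     groupby_aggregation_base const& agg)
{
  assert(keys.size() == values.size());
  std::map<int, std::vector<double>> groups;
  for (std::size_t i = 0; i < keys.size(); ++i) {
    groups[keys[i]].push_back(values[i]);
  }
  result_t out;
  for (auto const& [key, vals] : groups) {
    auto row = agg.aggregate(vals);
    assert(row.size() == agg.num_outputs());
    out.emplace(key, std::move(row));
  }
  return out;
}

}  // namespace lib_sketch

namespace app_sketch {  // hypothetical stand-in for the downstream application

// A custom aggregation with two output columns: sum and count.
struct sum_and_count : lib_sketch::groupby_aggregation_base {
  std::size_t num_outputs() const override { return 2; }
  std::vector<double> aggregate(std::vector<double> const& group_values) const override
  {
    double sum = 0.0;
    for (double v : group_values) { sum += v; }
    return {sum, static_cast<double>(group_values.size())};
  }
};

}  // namespace app_sketch
```

In this sketch, adding another aggregation means adding another derived class in `app_sketch`; nothing in `lib_sketch` has to change.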

Allowing extension like this would be very beneficial in the long term: any downstream application could accommodate its own needs and maximize performance gains. It would also help reduce maintenance effort in the libcudf repository.

@ttnghia ttnghia added the feature request New feature or request label Aug 21, 2024
@davidwendt

We actually have a precedent for custom UDF aggregations:

/**
 * @brief Factory to create an aggregation base on UDF for PTX or CUDA
 *
 * @param[in] type: either udf_type::PTX or udf_type::CUDA
 * @param[in] user_defined_aggregator A string containing the aggregator code
 * @param[in] output_type expected output type
 *
 * @return An aggregation containing a user-defined aggregator string
 */
template <typename Base = aggregation>
std::unique_ptr<Base> make_udf_aggregation(udf_type type,
                                           std::string const& user_defined_aggregator,
                                           data_type output_type);

Example usage in rolling here:
auto cuda_udf_agg = cudf::make_udf_aggregation<cudf::rolling_aggregation>(
  cudf::udf_type::CUDA, cuda_func, cudf::data_type{cudf::type_id::INT64});

@ttnghia ttnghia self-assigned this Dec 13, 2024
rapids-bot bot pushed a commit that referenced this issue Dec 20, 2024
This implements the `HOST_UDF` aggregation, which allows executing a host-side user-defined function (UDF) through the libcudf aggregation framework.
 * A host-side function can be any independent function running on the host machine. It may or may not call device kernels, depending on its implementation.
 * Such a user-defined function must follow the interface provided by libcudf (`cudf::host_udf_base`). The interface provides the ability to fully interact with the libcudf aggregation framework.
 * Since it is implemented on the user application side, it has a very high degree of freedom to perform arbitrary operations to satisfy the user's needs.

Partially contributes to #16633.

---
Usage
 1. Define a functor deriving from `cudf::host_udf_base` and implement the required virtual functions declared in that base struct. For example:
```
struct my_aggregation : cudf::host_udf_base {
   ...
};
```
 2. Create an instance of libcudf `HOST_UDF` aggregation which is constructed from an instance of the functor defined above. For example:
```
auto agg = cudf::make_host_udf_aggregation<cudf::groupby_aggregation>(
    std::make_unique<my_aggregation>());
```
 3. Perform the aggregation operation using the created instance.
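
The three steps above can be mimicked end to end with stand-in types. This is only a sketch of the ownership and dispatch pattern: `host_udf_sketch`, `host_udf_aggregation_sketch`, `make_host_udf_aggregation_sketch`, and `reduce_sketch` are hypothetical names, and the virtual functions that the real `cudf::host_udf_base` requires are not reproduced here.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for the UDF base interface (NOT cudf::host_udf_base).
struct host_udf_sketch {
  virtual ~host_udf_sketch() = default;
  virtual double operator()(std::vector<double> const& input) const = 0;
};

// Step 1: a functor deriving from the base; here it computes max - min.
struct my_range : host_udf_sketch {
  double operator()(std::vector<double> const& input) const override
  {
    if (input.empty()) { return 0.0; }
    auto const [lo, hi] = std::minmax_element(input.begin(), input.end());
    return *hi - *lo;
  }
};

// Aggregation object that takes ownership of the functor.
class host_udf_aggregation_sketch {
 public:
  explicit host_udf_aggregation_sketch(std::unique_ptr<host_udf_sketch> udf)
    : udf_{std::move(udf)}
  {
  }
  host_udf_sketch const& udf() const { return *udf_; }

 private:
  std::unique_ptr<host_udf_sketch> udf_;
};

// Step 2: factory that wraps the functor in an aggregation object.
std::unique_ptr<host_udf_aggregation_sketch> make_host_udf_aggregation_sketch(
  std::unique_ptr<host_udf_sketch> udf)
{
  return std::make_unique<host_udf_aggregation_sketch>(std::move(udf));
}

// Step 3: the framework's entry point calls back into the user's functor.
double reduce_sketch(std::vector<double> const& input,
                     host_udf_aggregation_sketch const& agg)
{
  return agg.udf()(input);
}
```

For example, `reduce_sketch({3.0, 9.0, 4.0}, *make_host_udf_aggregation_sketch(std::make_unique<my_range>()))` runs the user-side `my_range` functor through the framework-side entry point.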

Authors:
  - Nghia Truong (https://github.com/ttnghia)

Approvers:
  - Yunsong Wang (https://github.com/PointKernel)
  - Chong Gao (https://github.com/res-life)
  - Vyas Ramasubramani (https://github.com/vyasr)
  - David Wendt (https://github.com/davidwendt)

URL: #17592