Add support for Hyper Log Log Plus Plus (HLL++) #11638
base: branch-24.12
d42d80a to 1945192
}
}

case class GpuHLL(childExpr: Expression, relativeSD: Double)
Let's call it by its full name, GpuHyperLogLogPlusPlus, to better reflect the CPU version.
Done
ReductionAggregation.HLL(numRegistersPerSketch), DType.STRUCT)
override lazy val groupByAggregate: GroupByAggregation =
GroupByAggregation.HLL(numRegistersPerSketch)
override val name: String = "CudfHLL"
Not sure if "PlusPlus" is necessary.
- override val name: String = "CudfHLL"
+ override val name: String = "CudfHyperLogLogPlusPlus"
Done.
0a4939f to eb00c2b
Signed-off-by: Chong Gao <[email protected]>
Description
Spark approx_count_distinct (description link)
Spark accepts one column (which can be a nested column) and a double literal, relativeSD.
Currently only TypeSig.cpuAtomics types are supported; nested types will be supported next.
The build is blocked because it depends on JNI/cuDF PRs.
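For readers unfamiliar with how relativeSD relates to the sketch size, here is a minimal sketch modeled on the precision formula used by Spark's CPU-side HyperLogLogPlusPlus helper. The object HllPrecision and its method names are hypothetical and are not part of this PR; whether numRegistersPerSketch in this PR is derived in exactly this way is an assumption.

```scala
// Hypothetical helper, not code from this PR: maps relativeSD to HLL++ precision
// following the formula in Spark's CPU-side HyperLogLogPlusPlus helper.
object HllPrecision {
  // Precision p is the number of hash bits used to select a register.
  def precision(relativeSD: Double): Int = {
    // This positivity check is an added guard for the sketch.
    require(relativeSD > 0.0, "relativeSD must be positive")
    val p = math.ceil(2.0 * math.log(1.106 / relativeSD) / math.log(2.0)).toInt
    // HLL++ needs at least 4 addressing bits, so very large errors are rejected.
    require(p >= 4, "HLL++ requires at least 4 bits for addressing; use a lower relativeSD")
    p
  }

  // The sketch holds m = 2^p registers.
  def numRegisters(relativeSD: Double): Int = 1 << precision(relativeSD)

  def main(args: Array[String]): Unit = {
    val sd = 0.05 // default relativeSD of approx_count_distinct in Spark
    println(s"relativeSD=$sd precision=${precision(sd)} registers=${numRegisters(sd)}")
  }
}
```

With the default relativeSD of 0.05 this gives a precision of 9 and 512 registers; a tighter relativeSD of 0.01 gives a precision of 14. A smaller relativeSD therefore means a larger sketch per group.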
TODO
Perf test
correctness
Please look at the following result.
GPU results for groups 0 through 8 are identical to the CPU results.
The GPU result for group 9 does not match; this is probably a bug in boundary checking/handling.