sortperm with dims #2308
    return extraneous_block(vals[1], dims)
end

# methods are defined for Val{1} because using view has 2x speed penalty for 1D arrays
We could probably speed up view by returning a CuDeviceArray when possible, just like we do with CuArray.
Are you suggesting a change here or is this an idea for a separate change in the repo?
A separate change.
Nice! So we only use quicksort now for
That's right. However, some preliminary benchmarking suggests bitonic sort is faster than partialsort for comparable input. Maybe we want to use bitonic for everything? This would also let us add methods for
I'd be in favor, also because I haven't implemented the new version of dynamic parallelism yet.
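For reference, bitonic sort is a fixed compare-exchange network: within each pass every swap is independent of the others, so it maps onto one GPU thread per element and needs no recursion or dynamic parallelism (unlike quicksort). The following is a minimal sequential CPU sketch in plain Julia, assuming a power-of-two input length; bitonic_sort! is a hypothetical name and this is not the CUDA.jl kernel.

```julia
# Hypothetical helper, not CUDA.jl's implementation: sequential bitonic
# sorting network, ascending order. Assumes length(v) is a power of two.
function bitonic_sort!(v::AbstractVector)
    n = length(v)
    n <= 1 && return v
    ispow2(n) || throw(ArgumentError("length must be a power of two"))
    k = 2
    while k <= n                 # size of the bitonic sequences being merged
        j = k ÷ 2
        while j >= 1             # compare distance within the current merge
            # All compare-exchanges in this (k, j) pass are independent,
            # so a GPU version could assign one thread per index i.
            for i in 0:n-1
                l = xor(i, j)    # partner index (0-based)
                if l > i
                    ascending = (i & k) == 0
                    a, b = v[i+1], v[l+1]
                    if (a > b) == ascending
                        v[i+1], v[l+1] = b, a
                    end
                end
            end
            j = j ÷ 2
        end
        k = k * 2
    end
    return v
end

bitonic_sort!([3, 7, 4, 8, 6, 2, 1, 5])   # [1, 2, 3, 4, 5, 6, 7, 8]
```

Each (k, j) pass would correspond to one kernel launch or synchronized step on the GPU. The network performs O(n log² n) comparisons versus quicksort's expected O(n log n), trading extra work for data-independent control flow.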
Branch updated from 54a5ff9 to 10f430e.
LGTM, thanks!
Some of the kernel launches added here exceed device resources, e.g., during this CI run: https://buildkite.com/julialang/cuda-dot-jl/builds/4942#018ee5fc-7926-4b0a-a855-11039bda2b72
Adds sortperm with dims kwarg (and bitonicsort with dims) to solve #2061
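To illustrate what the dims keyword means, here is the behavior of Base Julia's sortperm (available with dims since Julia 1.9) on a CPU matrix. It is assumed here that the CuArray method added by this PR mirrors these semantics, e.g. sortperm(cu(A); dims = 1); that GPU call is not shown because it needs a device.

```julia
# Base Julia (1.9+) sortperm with dims, on the CPU.
A = [8 7; 5 6]

# Permutation along each column; entries are linear indices into A.
P1 = sortperm(A; dims = 1)   # [2 4; 1 3]

# Permutation along each row.
P2 = sortperm(A; dims = 2)   # [3 1; 2 4]

# Indexing A with the permutation reproduces sort along the same dimension.
@assert A[P1] == sort(A; dims = 1)
@assert A[P2] == sort(A; dims = 2)
```

Returning linear rather than per-slice indices is what makes A[P] == sort(A; dims) hold directly, without reconstructing Cartesian indices.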