Priority Queue #105
base: dev
Conversation
Can one of the admins verify this patch?
include/cuco/priority_queue.cuh
namespace cuco {

/*
 * @brief A GPU-accelerated priority queue of key-value pairs
Is there a reason for this to be hardcoded for key-value pairs? Can't it be for any trivially copyable type T? E.g., with std::priority_queue I could have a std::priority_queue<int> or a std::priority_queue<std::pair<int,int>>.
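For context, a minimal host-side illustration of the flexibility being asked about, using only std::priority_queue from the standard library (nothing from this PR):

// std::priority_queue is templated on any element type, not just key-value pairs.
#include <queue>
#include <utility>

int main()
{
  std::priority_queue<int> ints;                   // plain keys
  ints.push(3);
  ints.push(1);

  std::priority_queue<std::pair<int, int>> pairs;  // key-value pairs
  pairs.push({3, 42});
  pairs.push({1, 7});

  return 0;
}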
Update docs now that this has been updated.
I reviewed the top-level header at this point and gave some thoughts/questions on how to make this a little more generic.
ok to test
@PointKernel Thanks for your comments! I believe that I have addressed or responded to them all. Please let me know what you think and what other comments you might have.
Another round of review.
Thanks @andrewbriand for your effort and persistence on this PR! We are almost there.
~priority_queue();

class device_mutable_view {
 public:
Suggested change:

 public:
  using value_type = T;
Should I also replace references to T with value_type in device_mutable_view?
That will be great!
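For illustration, a minimal sketch of what that replacement could look like; the member and data-member names below mirror ones quoted elsewhere in this review, but the exact signatures are assumptions, not the PR's actual declarations:

// Sketch only: value_type replaces the raw template parameter T in the
// member declarations of device_mutable_view.
#include <cstddef>

template <typename T, typename Compare>
class device_mutable_view {
 public:
  using value_type = T;

  // Hypothetical member; the real push signature may differ.
  template <typename CG, typename InputIt>
  __device__ void push(CG const& g, InputIt first, InputIt last);

 private:
  value_type* d_heap_;    // was: T* d_heap_;
  std::size_t node_size_;
  Compare compare_;
};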
detail::push_kernel<<<num_blocks, block_size, get_shmem_size(block_size), stream>>>(
  first,
  last - first,
  d_heap_,
  d_size_,
  node_size_,
  d_locks_,
  d_p_buffer_size_,
  lowest_level_start_,
  compare_);
Suggested change:

auto view = get_device_mutable_view();
detail::push_kernel<<<num_blocks, block_size, get_shmem_size(block_size), stream>>>(
  first, num_elements, view);
This is a great example showing the power of "view". Accordingly, the push_kernel would look like:

template <typename OutputIt, typename viewT>
__global__ void push_kernel(OutputIt elements,
                            std::size_t const num_elements,
                            viewT view)
{
  using T = typename viewT::value_type;
  ...
}

If you want, push_n_kernel instead of push_kernel would be a more descriptive name in this case.
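To make that sketch concrete, here is one way the elided body could look, assuming device_mutable_view gains a block-scoped push() member; that member and its parameters are hypothetical:

// Sketch only: view.push() stands in for the node-by-node insertion
// previously done by the free functions.
#include <cooperative_groups.h>

#include <cstddef>
#include <type_traits>

template <typename InputIt, typename viewT>
__global__ void push_n_kernel(InputIt elements,
                              std::size_t const num_elements,
                              viewT view)
{
  using T = typename viewT::value_type;
  static_assert(std::is_trivially_copyable<T>::value,
                "Element type must be trivially copyable");

  extern __shared__ char shmem[];
  auto const block = cooperative_groups::this_thread_block();

  // Forward the whole range; splitting it into node_size chunks happens
  // inside the view in this sketch.
  view.push(block, elements, num_elements, shmem);
}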
detail::push_single_node(g,
                         first + i * node_size_,
                         d_heap_,
                         d_size_,
                         node_size_,
                         d_locks_,
                         lowest_level_start_,
                         shmem,
                         compare_);
push_single_node, push_partial_node, and related utilities should be member functions of device_mutable_view.
The same goes for pop_single_node and pop_partial_node.
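A hedged sketch of the suggested layout; the parameter lists below are assumptions based on the arguments quoted above, not the PR's actual signatures:

// Sketch only: the node-level helpers become members of device_mutable_view
// so they can read the queue state directly.
#include <cstddef>

template <typename T, typename Compare>
class device_mutable_view {
 public:
  using value_type = T;

  // Push one full node of node_size_ elements starting at first.
  template <typename CG, typename InputIt>
  __device__ void push_single_node(CG const& g, InputIt first, void* shmem);

  // Push fewer than node_size_ elements into the partial buffer.
  template <typename CG, typename InputIt>
  __device__ void push_partial_node(CG const& g, InputIt first,
                                    std::size_t num_elements, void* shmem);

  // Pop counterparts, as noted in the comment above.
  template <typename CG, typename OutputIt>
  __device__ void pop_single_node(CG const& g, OutputIt out, void* shmem);

  template <typename CG, typename OutputIt>
  __device__ void pop_partial_node(CG const& g, OutputIt out,
                                   std::size_t num_elements, void* shmem);

 private:
  value_type* d_heap_;
  int* d_size_;
  std::size_t node_size_;
  int* d_locks_;
  std::size_t* d_p_buffer_size_;
  int lowest_level_start_;
  Compare compare_;
};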
/*
 * @brief Return the amount of temporary storage required for operations
 * on the queue with a cooperative group size of block_size
 *
 * @param block_size Size of the cooperative groups to calculate storage for
 * @return The amount of temporary storage required in bytes
 */
__device__ int get_shmem_size(int block_size) const
{
  int intersection_bytes = 2 * (block_size + 1) * sizeof(int);
  int node_bytes         = node_size_ * sizeof(T);
  return intersection_bytes + 2 * node_bytes;
}
This seems never used
 * @param shmem The shared memory layout for this cooperative group
 * @param compare Comparison operator ordering the elements in the heap
 */
template <typename InputIt, typename T, typename Compare, typename CG>
OutputIt instead of InputIt.
 * @param lowest_level_start The first index of the heap's lowest layer
 * @param compare Comparison operator ordering the elements in the heap
 */
template <typename OutputIt, typename T, typename Compare>
Suggested change:

template <typename InputIt, typename viewT>
T* heap,
int* size,
std::size_t node_size,
int* locks,
std::size_t* p_buffer_size,
int lowest_level_start,
Compare compare)
Suggested change:

viewT view)
The kernel implementation can also be simplified with view.
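The push side is sketched above; for the pop path, an analogous hedged sketch (view.pop() is a hypothetical member, and all heap state such as pointers, locks, partial-buffer size, and the comparator is assumed to live inside the view):

// Sketch only: the kernel just forwards the output range to the view.
#include <cooperative_groups.h>

#include <cstddef>

template <typename OutputIt, typename viewT>
__global__ void pop_kernel(OutputIt out,
                           std::size_t const num_elements,
                           viewT view)
{
  extern __shared__ char shmem[];
  auto const block = cooperative_groups::this_thread_block();

  view.pop(block, out, num_elements, shmem);  // hypothetical member
}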
Co-authored-by: Yunsong Wang <[email protected]>
@andrewbriand Can you please also merge with the latest dev branch?
Adds a GPU-accelerated priority queue.

Allows for multiple concurrent insertions as well as multiple concurrent deletions.

The implementation of the priority queue is based on https://arxiv.org/pdf/1906.06504.pdf.

The queue supports two operations:

push: Add elements into the queue
pop: Remove the element(s) with the lowest (when Max == false) or highest (when Max == true) keys

The priority queue supports bulk host-side operations and more fine-grained device-side operations.

The host-side bulk operations push and pop allow an arbitrary number of elements to be pushed to or popped from the queue.

The device-side operations allow a cooperative group to push or pop some number of elements less than or equal to node_size. These device-side operations are invoked with a trivially copyable device view, device_mutable_view, which can be obtained with the host function get_mutable_device_view and passed to the device.

Current limitations:

TODO: Port tests to Catch2 and benchmarks to google benchmark
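To tie the description together, a rough end-to-end usage sketch: push, pop, get_mutable_device_view, device_mutable_view, and node_size come from the description above, while the constructor arguments, the device-side member signature, and the shared-memory sizing are assumptions.

// Sketch only: not the PR's exact API.
#include <cooperative_groups.h>

#include <thrust/device_vector.h>

#include <cuco/priority_queue.cuh>

#include <cstddef>

// Fine-grained device-side use: one cooperative group (here, a thread block)
// pushes up to node_size elements through a trivially copyable view.
template <typename View, typename InputIt>
__global__ void block_push(View view, InputIt first, std::size_t n)
{
  extern __shared__ char shmem[];
  auto const block = cooperative_groups::this_thread_block();
  view.push(block, first, n, shmem);  // hypothetical member signature
}

int main()
{
  thrust::device_vector<int> keys(1000);
  // ... fill keys ...

  cuco::priority_queue<int> queue{keys.size()};  // capacity argument assumed

  // Host-side bulk operations on an arbitrary number of elements.
  queue.push(keys.begin(), keys.end());
  queue.pop(keys.begin(), keys.begin() + 100);

  // Device-side use: obtain the view on the host and pass it to a kernel.
  auto view = queue.get_mutable_device_view();
  // block_push<<<1, 256, required_shmem_bytes>>>(view, keys.data().get(), 256);
  // (required_shmem_bytes: the temporary storage the view needs; the sizing
  //  helper is omitted from this sketch.)

  return 0;
}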