-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dump v #200
base: master
Are you sure you want to change the base?
Dump v #200
Conversation
Documentation previewhttps://nvidia-merlin.github.io/HierarchicalKV/review/pr-200 |
template <class K, class V, class S, | ||
template <typename, typename> class PredFunctor, | ||
int TILE_SIZE> | ||
__global__ void dump_kernel_v2(const Table<K, V, S>* __restrict table, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No call to this kernel?
include/merlin/core_kernels.cuh
Outdated
int dim = table->dim; | ||
auto g = cg::tiled_partition<TILE_SIZE>(cg::this_thread_block()); | ||
|
||
__shared__ block_acc; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
block_acc is not used.
Hi, @jiashuy This PR is under development yet. I'll fix the problems ASAP. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
/blossom-ci |
2 similar comments
/blossom-ci |
/blossom-ci |
cudaEventCreate(&start); | ||
cudaEventCreate(&stop); | ||
cudaEventRecord(start); | ||
table->export_batch_if<ExportIfPredFunctor>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are total three kernel templates, and has each kernel been tested? If not is it necessary to test each kernel.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've tested them seperately, but not added them into the tests case. Since it's an internal option not for public API.
…memory wavefronts
Here is the costs in microseconds of
dump_kernel
anddump_kernel_v2
on both pinned host or device output on 2^24 capacity table with half of the contents are exported. The table values are stored on pure GPU buckets.capacity: 2^24, and the table is full when running the export_batch_if
num exported: 8388771
dim: 64
A100 + AMD
H20 + Intel