
Dynamic growth of slab pool for concurrent_map #22

Open
wants to merge 61 commits into base: master

Commits (61)
5b20e97
updated SlabAlloc to place super blocks on separate arrays
Jun 11, 2020
7da2ae7
buildBulk now reports which keys were successfully inserted
Jun 11, 2020
e5257c0
buildBulk function can detect if allocation pool needs to be resized
Jun 12, 2020
a314c19
Added support for resizing dynamic allocation pool
Jun 12, 2020
49161a2
removed prints
Jun 12, 2020
ff5ba41
added pool growth support for unique bulk insert and temporarily remo…
Jun 15, 2020
44ab2f5
added support for preemptive pool growth
Jun 16, 2020
48b21df
added preemptive resizing for non unique build kernel
Jun 17, 2020
1978da6
decreased superblock size
Nicolas-Iskos Sep 27, 2020
01c972b
updated gitmodules
Nicolas-Iskos Dec 28, 2020
c3c9bcc
removed explicit checks for need to resize
Nicolas-Iskos Dec 28, 2020
c1f8e7b
removed dead code
Nicolas-Iskos Dec 29, 2020
a4f6ba0
resizing parameter tuning
Nicolas-Iskos Dec 29, 2020
b29b988
added cub
Nicolas-Iskos Dec 30, 2020
e3c6463
added block reduce key counting for buildBulkWithUniqueKeys
Nicolas-Iskos Dec 31, 2020
36ee554
bug fixes in insert unique call
Nicolas-Iskos Dec 31, 2020
610af7b
removed print
Nicolas-Iskos Dec 31, 2020
8b006f9
37.5 split
Nicolas-Iskos Jan 1, 2021
c1fb642
resize at 70% lf
Nicolas-Iskos Jan 1, 2021
d2d2926
prints
Nicolas-Iskos Jan 1, 2021
de70e25
changes to SlabAlloc
Nicolas-Iskos Jan 1, 2021
b090ece
prints
Nicolas-Iskos Jan 1, 2021
5769e20
capacity calculation correction
Nicolas-Iskos Jan 1, 2021
90909ee
removed prints
Nicolas-Iskos Jan 1, 2021
855db99
87.5% split
Nicolas-Iskos Jan 5, 2021
b82f0aa
37.5% split
Nicolas-Iskos Jan 6, 2021
df66adc
prints
Nicolas-Iskos Jan 6, 2021
12d4d9c
bug fix in insertUnique device function
Nicolas-Iskos Jan 6, 2021
1f08535
bug fix in build
Nicolas-Iskos Jan 7, 2021
7c51175
bug fix in insert unique function
Nicolas-Iskos Jan 7, 2021
d1a116c
changes to SlabAlloc
Nicolas-Iskos Jan 7, 2021
854b1f9
resize at 70%
Nicolas-Iskos Jan 7, 2021
34afe70
87.5 split
Nicolas-Iskos Jan 7, 2021
57c0133
resize at 50%
Nicolas-Iskos Jan 7, 2021
79eccec
prints
Nicolas-Iskos Jan 7, 2021
8314c45
resize at 60% occupancy
Nicolas-Iskos Jan 7, 2021
3a73666
resize at 65% lf
Nicolas-Iskos Jan 7, 2021
31d7a0f
prints
Nicolas-Iskos Jan 8, 2021
8f7b436
prints
Nicolas-Iskos Jan 8, 2021
42d2568
prints
Nicolas-Iskos Jan 8, 2021
1911889
prints
Nicolas-Iskos Jan 8, 2021
c2db909
resize at 90%
Nicolas-Iskos Jan 8, 2021
c1df972
changes to SlabAlloc
Nicolas-Iskos Jan 8, 2021
32558d2
changes to SlabAlloc
Nicolas-Iskos Jan 8, 2021
936ae45
SlabAlloc
Nicolas-Iskos Jan 8, 2021
7053d9c
fixes to SlabAlloc
Nicolas-Iskos Jan 8, 2021
b6ae38e
resize at 75
Nicolas-Iskos Jan 9, 2021
ba0b0c3
prints
Nicolas-Iskos Jan 9, 2021
946aa03
changes to SlabAlloc
Nicolas-Iskos Jan 9, 2021
a217079
resize 60
Nicolas-Iskos Jan 9, 2021
f5f6f00
resize at 55%
Nicolas-Iskos Jan 16, 2021
d9f39f9
resize 60
Nicolas-Iskos Apr 2, 2021
6a57ca8
resize 90
Nicolas-Iskos Apr 2, 2021
38b5a03
SlabAlloc changes
Nicolas-Iskos Apr 2, 2021
b24b998
number of mem blocks and super blocks configurable from user code
Nicolas-Iskos Apr 18, 2021
10c0074
user configurable threshold resizing lf
Nicolas-Iskos Apr 18, 2021
148f1b8
minor changes
Nicolas-Iskos Apr 18, 2021
e701dbf
SlabAlloc changes
Nicolas-Iskos Apr 18, 2021
b44ccde
style fixes
Nicolas-Iskos Apr 19, 2021
c956e08
style fixes
Nicolas-Iskos Apr 19, 2021
cd2319d
Update README.md
Nicolas-Iskos Apr 20, 2021
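Several of the commits above tune the pool's capacity calculation and the resize load factor. As a rough illustration of the arithmetic those commits are adjusting, here is a host-side sketch; all names and constants below are assumptions for illustration, not SlabHash's actual API.

```cpp
#include <cstdint>

// Hypothetical model of the slab pool's capacity. Each super block holds
// 2^log_num_mem_blocks memory blocks; each memory block holds a fixed number
// of 32-wide slabs, and each slab stores a fixed number of key-value pairs
// (the remaining lanes hold the next-slab pointer). Both constants are
// assumptions, not values taken from the library.
constexpr uint64_t kSlabsPerMemBlock = 32;  // assumption
constexpr uint64_t kPairsPerSlab = 15;      // assumption: 15 pairs + pointer lane

// Total slabs available: one base slab per bucket plus every slab in the
// allocator pool (mirrors total_num_slabs_ = num_buckets + pool size).
uint64_t total_num_slabs(uint32_t num_buckets,
                         uint32_t log_num_mem_blocks,
                         uint32_t num_super_blocks) {
  const uint64_t mem_blocks = uint64_t{1} << log_num_mem_blocks;
  return num_buckets +
         uint64_t{num_super_blocks} * mem_blocks * kSlabsPerMemBlock;
}

// Load factor: fraction of key slots in use across all slabs.
double load_factor(uint64_t num_keys, uint64_t num_slabs) {
  return static_cast<double>(num_keys) /
         (static_cast<double>(num_slabs) * kPairsPerSlab);
}
```

Under this model, making `log_num_mem_blocks` and `num_super_blocks` user-configurable (commit b24b998) directly controls the initial pool capacity, and the "resize at N%" commits move the load-factor threshold at which more super blocks are allocated.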
5 changes: 4 additions & 1 deletion .gitmodules
@@ -1,9 +1,12 @@
 [submodule "SlabAlloc"]
 	path = SlabAlloc
-	url = https://github.com/owensgroup/SlabAlloc
+	url = https://github.com/Nicolas-Iskos/SlabAlloc
 [submodule "ThirdParty/rapidjson"]
 	path = ThirdParty/rapidjson
 	url = https://github.com/Tencent/rapidjson
 [submodule "ThirdParty/googletest"]
 	path = ThirdParty/googletest
 	url = https://github.com/google/googletest
+[submodule "cub"]
+	path = cub
+	url = https://github.com/NVIDIA/cub.git
3 changes: 2 additions & 1 deletion CMakeLists.txt
@@ -6,7 +6,7 @@ find_package(CUDA 8.0 REQUIRED)
 option(CMAKE_VERBOSE_MAKEFILE ON)
 option(DGTEST, "DGTEST" ON)
 
-set(CUDA_NVCC_FLAGS -std=c++11)
+set(CUDA_NVCC_FLAGS -std=c++14)
 set (CMAKE_CXX_STANDARD 11)
 
 if (CUDA_VERBOSE_PTXAS)
@@ -83,6 +83,7 @@ if(SLABHASH_GENCODE_SM75)
 endif(SLABHASH_GENCODE_SM75)
 
 include_directories(SlabAlloc/src)
+include_directories(cub)
 include_directories(src src/concurrent)
 include_directories(ThirdParty/rapidjson/include)
 include_directories(ThirdParty/googletest/googletest)
6 changes: 4 additions & 2 deletions README.md
@@ -1,5 +1,7 @@
 # SlabHash
-A warp-oriented dynamic hash table for GPUs
+A warp-oriented dynamic hash table for GPUs.
+
+This fork of SlabHash is modified to allow for dynamic growth of the slab pool, allowing the slab hash to increase its total memory footprint as keys are inserted during run time.
 
 ## Publication:
 This library is based on the original slab hash paper, initially proposed in the following IPDPS'18 paper:
@@ -489,4 +491,4 @@ init lf final lf num buckets init build rate(M/s) concurre
 0.59 0.58 80660 501.725 639.850
 0.64 0.62 69906 493.702 601.822
 
-```
+```
2 changes: 1 addition & 1 deletion SlabAlloc
1 change: 1 addition & 0 deletions cub
Submodule cub added at c3be9a
75 changes: 55 additions & 20 deletions src/concurrent_map/cmap_class.cuh
@@ -22,21 +22,26 @@
* used at runtime. This class does not own the allocated memory on the gpu
* (i.e., d_table_)
*/
template <typename KeyT, typename ValueT>
class GpuSlabHashContext<KeyT, ValueT, SlabHashTypeT::ConcurrentMap> {
template <typename KeyT, typename ValueT, uint32_t log_num_mem_blocks, uint32_t num_super_blocks>
class GpuSlabHashContext<KeyT, ValueT, SlabHashTypeT::ConcurrentMap,
log_num_mem_blocks, num_super_blocks> {
public:
// fixed known parameters:
static constexpr uint32_t PRIME_DIVISOR_ = 4294967291u;
static constexpr uint32_t WARP_WIDTH_ = 32;

#pragma hd_warning_disable
__host__ __device__ GpuSlabHashContext()
: num_buckets_(0), hash_x_(0), hash_y_(0), d_table_(nullptr) {}
: num_buckets_(0), total_num_slabs_(0), total_num_keys_(0),
hash_x_(0), hash_y_(0), d_table_(nullptr) {}

#pragma hd_warning_disable
__host__ __device__ GpuSlabHashContext(
GpuSlabHashContext<KeyT, ValueT, SlabHashTypeT::ConcurrentMap>& rhs) {
GpuSlabHashContext<KeyT, ValueT, SlabHashTypeT::ConcurrentMap,
log_num_mem_blocks, num_super_blocks>& rhs) {
num_buckets_ = rhs.getNumBuckets();
total_num_slabs_ = rhs.getTotalNumSlabs();
total_num_keys_ = rhs.getTotalNumKeys();
hash_x_ = rhs.getHashX();
hash_y_ = rhs.getHashY();
d_table_ = rhs.getDeviceTablePointer();
@@ -58,16 +63,30 @@ class GpuSlabHashContext<KeyT, ValueT, SlabHashTypeT::ConcurrentMap> {
const uint32_t hash_x,
const uint32_t hash_y,
int8_t* d_table,
AllocatorContextT* allocator_ctx) {
SlabAllocLightContext<log_num_mem_blocks,
num_super_blocks, 1>* allocator_ctx) {
num_buckets_ = num_buckets;
hash_x_ = hash_x;
hash_y_ = hash_y;
d_table_ =
reinterpret_cast<typename ConcurrentMapT<KeyT, ValueT>::SlabTypeT*>(d_table);
global_allocator_ctx_ = *allocator_ctx;
total_num_slabs_ = num_buckets + global_allocator_ctx_.getNumSlabsInPool();
total_num_keys_ = 0;
}

__device__ __host__ __forceinline__ AllocatorContextT& getAllocatorContext() {
__host__ void updateAllocatorContext(SlabAllocLightContext<log_num_mem_blocks,
num_super_blocks, 1>* allocator_ctx) {
global_allocator_ctx_ = *allocator_ctx;
total_num_slabs_ = num_buckets_ + global_allocator_ctx_.getNumSlabsInPool();
}

__host__ void updateTotalNumKeys(uint32_t keysAdded) {
total_num_keys_ += keysAdded;
}

__device__ __host__ __forceinline__ SlabAllocLightContext<log_num_mem_blocks,
num_super_blocks, 1>& getAllocatorContext() {
return global_allocator_ctx_;
}

@@ -76,6 +95,8 @@ class GpuSlabHashContext<KeyT, ValueT, SlabHashTypeT::ConcurrentMap> {
return d_table_;
}

__device__ __host__ __forceinline__ uint32_t getTotalNumSlabs() {return total_num_slabs_; }
__device__ __host__ __forceinline__ uint32_t getTotalNumKeys() {return total_num_keys_; }
__device__ __host__ __forceinline__ uint32_t getNumBuckets() { return num_buckets_; }
__device__ __host__ __forceinline__ uint32_t getHashX() { return hash_x_; }
__device__ __host__ __forceinline__ uint32_t getHashY() { return hash_y_; }
@@ -86,22 +107,25 @@ class GpuSlabHashContext<KeyT, ValueT, SlabHashTypeT::ConcurrentMap> {

// threads in a warp cooperate with each other to insert key-value pairs
// into the slab hash
__device__ __forceinline__ void insertPair(bool& to_be_inserted,
__device__ __forceinline__ void insertPair(/*bool& mySuccess,*/
bool& to_be_inserted,
const uint32_t& laneId,
const KeyT& myKey,
const ValueT& myValue,
const uint32_t bucket_id,
AllocatorContextT& local_allocator_context);
SlabAllocLightContext<log_num_mem_blocks,
num_super_blocks, 1>& local_allocator_context);

// threads in a warp cooperate with each other to insert a unique key (and its value)
// into the slab hash
__device__ __forceinline__ bool insertPairUnique(
int& mySuccess,
bool& to_be_inserted,
const uint32_t& laneId,
const KeyT& myKey,
const ValueT& myValue,
const uint32_t bucket_id,
AllocatorContextT& local_allocator_context);
SlabAllocLightContext<log_num_mem_blocks, num_super_blocks, 1>& local_allocator_context);

// threads in a warp cooperate with each other to search for keys
// if found, it returns the corresponding value, else SEARCH_NOT_FOUND
@@ -154,7 +178,8 @@ class GpuSlabHashContext<KeyT, ValueT, SlabHashTypeT::ConcurrentMap> {
}

__device__ __forceinline__ SlabAllocAddressT
allocateSlab(AllocatorContextT& local_allocator_ctx, const uint32_t& laneId) {
allocateSlab(SlabAllocLightContext<log_num_mem_blocks, num_super_blocks, 1>&
local_allocator_ctx, const uint32_t& laneId) {
return local_allocator_ctx.warpAllocate(laneId);
}

@@ -165,18 +190,20 @@ class GpuSlabHashContext<KeyT, ValueT, SlabHashTypeT::ConcurrentMap> {

// === members:
uint32_t num_buckets_;
uint32_t total_num_slabs_;
uint32_t total_num_keys_;
uint32_t hash_x_;
uint32_t hash_y_;
typename ConcurrentMapT<KeyT, ValueT>::SlabTypeT* d_table_;
// a copy of dynamic allocator's context to be used on the GPU
AllocatorContextT global_allocator_ctx_;
SlabAllocLightContext<log_num_mem_blocks, num_super_blocks, 1> global_allocator_ctx_;
};

/*
* This class owns the allocated memory for the hash table
*/
template <typename KeyT, typename ValueT>
class GpuSlabHash<KeyT, ValueT, SlabHashTypeT::ConcurrentMap> {
template <typename KeyT, typename ValueT, uint32_t log_num_mem_blocks, uint32_t num_super_blocks>
class GpuSlabHash<KeyT, ValueT, SlabHashTypeT::ConcurrentMap, log_num_mem_blocks, num_super_blocks> {
private:
// fixed known parameters:
static constexpr uint32_t BLOCKSIZE_ = 128;
@@ -198,24 +225,29 @@ class GpuSlabHash<KeyT, ValueT, SlabHashTypeT::ConcurrentMap> {

// slab hash context, contains everything that a GPU application needs to be
// able to use this data structure
GpuSlabHashContext<KeyT, ValueT, SlabHashTypeT::ConcurrentMap> gpu_context_;
GpuSlabHashContext<KeyT, ValueT, SlabHashTypeT::ConcurrentMap,
log_num_mem_blocks, num_super_blocks> gpu_context_;

// const pointer to an allocator that all instances of slab hash are going to
// use. The allocator itself is not owned by this class
DynamicAllocatorT* dynamic_allocator_;
SlabAllocLight<log_num_mem_blocks, num_super_blocks, 1> *dynamic_allocator_;
uint32_t device_idx_;

float thresh_lf_;

public:
GpuSlabHash(const uint32_t num_buckets,
DynamicAllocatorT* dynamic_allocator,
SlabAllocLight<log_num_mem_blocks, num_super_blocks, 1>* dynamic_allocator,
uint32_t device_idx,
const time_t seed = 0,
const bool identity_hash = false)
const bool identity_hash = false,
float thresh_lf = 0.60)
: num_buckets_(num_buckets)
, d_table_(nullptr)
, slab_unit_size_(0)
, dynamic_allocator_(dynamic_allocator)
, device_idx_(device_idx) {
, device_idx_(device_idx)
, thresh_lf_(thresh_lf) {
assert(dynamic_allocator && "No proper dynamic allocator attached to the slab hash.");
assert(sizeof(typename ConcurrentMapT<KeyT, ValueT>::SlabTypeT) ==
(WARP_WIDTH_ * sizeof(uint32_t)) &&
@@ -227,7 +259,8 @@ class GpuSlabHash<KeyT, ValueT, SlabHashTypeT::ConcurrentMap> {
CHECK_CUDA_ERROR(cudaSetDevice(device_idx_));

slab_unit_size_ =
GpuSlabHashContext<KeyT, ValueT, SlabHashTypeT::ConcurrentMap>::getSlabUnitSize();
GpuSlabHashContext<KeyT, ValueT, SlabHashTypeT::ConcurrentMap,
log_num_mem_blocks, num_super_blocks>::getSlabUnitSize();

// allocating initial buckets:
CHECK_CUDA_ERROR(cudaMalloc((void**)&d_table_, slab_unit_size_ * num_buckets_));
@@ -258,7 +291,9 @@ class GpuSlabHash<KeyT, ValueT, SlabHashTypeT::ConcurrentMap> {
// returns some debug information about the slab hash
std::string to_string();
double computeLoadFactor(int flag);


void resize();
uint32_t checkForPreemptiveResize(uint32_t keysAdded);
void buildBulk(KeyT* d_key, ValueT* d_value, uint32_t num_keys);
void buildBulkWithUniqueKeys(KeyT* d_key, ValueT* d_value, uint32_t num_keys);
void searchIndividual(KeyT* d_query, ValueT* d_result, uint32_t num_queries);
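The new `checkForPreemptiveResize` declaration and the `thresh_lf_` member suggest that, before a bulk build, the table grows the pool by enough super blocks to keep the projected load factor under the configured threshold. A minimal host-side sketch of that decision follows; the struct, names, and formula are assumptions inferred from the diff, not the PR's actual device code.

```cpp
#include <cstdint>

// Sketch of a preemptive-resize check (assumed, not the PR's implementation).
// Capacities are counted in keys rather than slabs to keep the model simple.
struct PoolState {
  uint64_t capacity_keys;         // keys the current pool can hold
  uint64_t total_num_keys;        // keys already inserted
  uint64_t keys_per_super_block;  // capacity added by one extra super block
};

// Returns how many additional super blocks are needed so that, after
// inserting keys_added more keys, the load factor stays below thresh_lf.
uint32_t super_blocks_needed(const PoolState& s, uint64_t keys_added,
                             double thresh_lf) {
  const uint64_t projected = s.total_num_keys + keys_added;
  uint64_t capacity = s.capacity_keys;
  uint32_t extra = 0;
  // Grow until the projected load factor drops below the threshold.
  while (static_cast<double>(projected) >
         thresh_lf * static_cast<double>(capacity)) {
    capacity += s.keys_per_super_block;
    ++extra;
  }
  return extra;
}
```

For example, with a pool holding 1000 keys, 500 already inserted, and a 0.60 threshold, a batch of 400 keys would trigger growth (projected load factor 0.90), while a batch of 50 would not. This mirrors the "resize at N%" experiments in the commit history, where the threshold was swept between 50% and 90%.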