This library provides data structures to ease programming in CUDA (version 12 or higher). For a tutorial and further information, please read this manual.
Quick example of how to transfer a `std::vector` on the CPU to a `battery::vector` on the GPU (notice that you don't need any manual memory allocation or deallocation):

```cpp
#include <vector>
#include "battery/vector.hpp"
#include "battery/unique_ptr.hpp"
#include "battery/allocator.hpp"

using mvector = battery::vector<int, battery::managed_allocator>;

__global__ void kernel(mvector* v_ptr) {
  mvector& v = *v_ptr;
  // ... Compute on `v` in parallel.
}

int main(int argc, char** argv) {
  std::vector<int> v(10000, 42);
  // Transfer from CPU vector to GPU vector.
  auto gpu_v = battery::make_unique<mvector, battery::managed_allocator>(v);
  kernel<<<256, 256>>>(gpu_v.get());
  CUDAEX(cudaDeviceSynchronize());
  // Transfer the new data back to the initial vector.
  for(std::size_t i = 0; i < v.size(); ++i) {
    v[i] = (*gpu_v)[i];
  }
  return 0;
}
```
- How to transfer data from the CPU to the GPU?
- How to create a CMake project for a CUDA project?
- How to allocate a vector shared by all threads of a block inside a kernel?
- How to allocate a vector shared by all blocks inside a kernel?
- CUDA runtime error: an illegal memory access was encountered
- How to allocate a vector in shared memory?
- Namespace: `battery::*`.
- The documentation is not exhaustive (which is why we provide a link to the standard C++ STL documentation), but we document most of the main differences and the features without a standard counterpart.
- The table below is a quick reference to the most useful features, but it is not exhaustive.
- The structures provided here are not thread-safe; ensuring thread safety is the responsibility of the user of this library.
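
To illustrate the last point, here is a minimal sketch of what handling thread safety yourself can look like. It uses plain standard C++ on the CPU (`std::thread`, `std::atomic`, `std::vector`) as a stand-in, not battery types or a CUDA kernel, and the helper name `count_even` is hypothetical. Each thread only reads the shared vector, walks it with a stride (similar in spirit to a grid-stride loop), and publishes its partial result through a single atomic update.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper (not part of battery): counts the even values of `data`
// using `num_threads` CPU threads. Assumes num_threads >= 1.
int count_even(const std::vector<int>& data, std::size_t num_threads) {
  std::atomic<int> total(0);  // the only location written by several threads
  std::vector<std::thread> workers;
  for (std::size_t t = 0; t < num_threads; ++t) {
    workers.emplace_back([&data, &total, t, num_threads]() {
      int local = 0;  // private to this thread, no synchronization needed
      for (std::size_t i = t; i < data.size(); i += num_threads) {
        if (data[i] % 2 == 0) {
          ++local;
        }
      }
      total.fetch_add(local);  // one atomic update per thread
    });
  }
  for (auto& w : workers) {
    w.join();
  }
  return total.load();
}
```

Concurrent reads of a container that nobody modifies are safe; it is concurrent writes to the same location that need an atomic or another form of synchronization.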
| Category | Main features | | | |
|---|---|---|---|---|
| Allocator | standard_allocator | global_allocator | managed_allocator | pool_allocator |
| Pointers | shared_ptr (std) | make_shared (std) | allocate_shared (std) | |
| | unique_ptr (std) | make_unique (std) | make_unique_block | make_unique_grid |
| Containers | vector (std) | string (std) | dynamic_bitset | |
| | tuple | variant (std) | bitset (std) | |
| Utility | CUDA | INLINE | CUDAE | CUDAEX |
| | limits | ru_cast | rd_cast | |
| | popcount (std) | countl_zero (std) | countl_one (std) | countr_zero (std) |
| | countr_one (std) | signum | ipow | |
| | add_up | add_down | sub_up | sub_down |
| | mul_up | mul_down | div_up | div_down |
| Memory | local_memory | read_only_memory | atomic_memory | |
| | atomic_scoped_memory | atomic_memory_block | atomic_memory_grid | |
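
The `_up` and `_down` suffixes in the Utility rows suggest operations rounded toward positive and negative infinity, respectively. As a hedged illustration of that idea for integer division only, the sketch below uses hypothetical names (`div_up_sketch`, `div_down_sketch`) in plain C++. It is an assumption about the intended semantics, not the battery implementation or its signatures.

```cpp
// Assumed semantics (not taken from the battery sources):
//   div_down_sketch(x, y) == floor(x / y), div_up_sketch(x, y) == ceil(x / y).
// C++ integer division truncates toward zero, so the quotient is adjusted
// only when the division is inexact. Assumes y != 0 and no overflow.

constexpr int div_down_sketch(int x, int y) {
  // Truncation rounds up when the exact result is negative: subtract 1.
  return x / y - ((x % y != 0 && ((x < 0) != (y < 0))) ? 1 : 0);
}

constexpr int div_up_sketch(int x, int y) {
  // Truncation rounds down when the exact result is positive: add 1.
  return x / y + ((x % y != 0 && ((x < 0) == (y < 0))) ? 1 : 0);
}

static_assert(div_down_sketch(7, 2) == 3, "floor(3.5) is 3");
static_assert(div_up_sketch(7, 2) == 4, "ceil(3.5) is 4");
static_assert(div_down_sketch(-7, 2) == -4, "floor(-3.5) is -4");
static_assert(div_up_sketch(-7, 2) == -3, "ceil(-3.5) is -3");
```

Having both directions available is what makes it possible to compute sound lower and upper bounds, for example when an interval's two endpoints must be rounded outward.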