Inefficient memory access patterns in CUDA kernels #12

dimitrivlachos · 2024-10-24T15:49:37Z

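Currently, all CUDA kernels (including the new kernels from #1) read neighbouring pixels directly from global memory in their convolution operations. Because adjacent threads work on overlapping neighbourhoods, the same pixels are requested from global memory many times over, which is inefficient and slows the kernels down.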

We should modify these kernels to stage neighbouring pixels in memory that is better suited to spatially local access, reducing global memory traffic and latency and improving overall performance.

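Candidates include shared memory, texture memory or surface memory, to name a few. Note that CUDA "local memory" is thread-private but still backed by device memory, so it is unlikely to help here on its own.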
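As a rough illustration of the shared memory option, here is a minimal sketch of a tiled neighbourhood sum. It is not taken from the existing kernels: the kernel name `box_sum_shared`, the helper `max_error_vs_cpu`, the `TILE` and `RADIUS` values, the zero padding at the image border and the use of `float` pixels are all assumptions made for the example. Each block loads its tile plus a halo into shared memory once, then every thread sums its neighbourhood from the tile instead of going back to global memory.

```cuda
#include <cuda_runtime.h>
#include <cmath>
#include <vector>

constexpr int TILE = 16;                   // threads per block edge (assumed)
constexpr int RADIUS = 3;                  // neighbourhood half-width (assumed)
constexpr int PADDED = TILE + 2 * RADIUS;  // tile edge including the halo

// Hypothetical kernel: sums the (2*RADIUS+1)^2 neighbourhood of each pixel,
// treating pixels outside the image as zero.
__global__ void box_sum_shared(const float* in, float* out, int width, int height) {
    __shared__ float tile[PADDED][PADDED];

    // Top-left corner of the padded tile in image coordinates.
    const int x0 = blockIdx.x * TILE - RADIUS;
    const int y0 = blockIdx.y * TILE - RADIUS;

    // Cooperative load: threads stride over the padded tile so the halo is
    // filled too. Each pixel is read from global memory once per block.
    for (int ty = threadIdx.y; ty < PADDED; ty += TILE) {
        for (int tx = threadIdx.x; tx < PADDED; tx += TILE) {
            const int gx = x0 + tx, gy = y0 + ty;
            const bool inside = gx >= 0 && gx < width && gy >= 0 && gy < height;
            tile[ty][tx] = inside ? in[gy * width + gx] : 0.0f;
        }
    }
    __syncthreads();  // the whole tile must be loaded before anyone reads it

    const int x = blockIdx.x * TILE + threadIdx.x;
    const int y = blockIdx.y * TILE + threadIdx.y;
    if (x >= width || y >= height) return;

    // All neighbourhood reads now hit shared memory.
    float sum = 0.0f;
    for (int dy = 0; dy <= 2 * RADIUS; ++dy)
        for (int dx = 0; dx <= 2 * RADIUS; ++dx)
            sum += tile[threadIdx.y + dy][threadIdx.x + dx];
    out[y * width + x] = sum;
}

// Hypothetical helper: runs the kernel on a synthetic image and returns the
// largest absolute difference from a plain CPU reference.
// CUDA error checking is omitted for brevity.
float max_error_vs_cpu(int width, int height) {
    std::vector<float> h_in(width * height), h_out(width * height);
    for (int i = 0; i < width * height; ++i) h_in[i] = float(i % 7);

    float *d_in = nullptr, *d_out = nullptr;
    const size_t bytes = h_in.size() * sizeof(float);
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);

    const dim3 block(TILE, TILE);
    const dim3 grid((width + TILE - 1) / TILE, (height + TILE - 1) / TILE);
    box_sum_shared<<<grid, block>>>(d_in, d_out, width, height);
    cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);

    float worst = 0.0f;
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x) {
            float ref = 0.0f;
            for (int dy = -RADIUS; dy <= RADIUS; ++dy)
                for (int dx = -RADIUS; dx <= RADIUS; ++dx) {
                    const int gx = x + dx, gy = y + dy;
                    if (gx >= 0 && gx < width && gy >= 0 && gy < height)
                        ref += h_in[gy * width + gx];
                }
            worst = std::fmax(worst, std::fabs(ref - h_out[y * width + x]));
        }
    return worst;
}
```

Calling `max_error_vs_cpu(100, 75)` from a small `main` exercises partial tiles at the right and bottom edges. Whether this pattern actually beats the current kernels would need profiling, since the L1/L2 caches already absorb some of the repeated global reads.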

Implement extended dispersion spotfinding

Implement a GPU-based version of the extended dispersion spotfinding algorithm. This builds on regular dispersion by making two passes: the first pass detects candidate spots, and the second pass excludes them from the background calculation, which allows fainter spots to be detected.

Extended dispersion spotfinding is unavoidably slower than regular dispersion because it requires two passes. However, the performance gained through massively parallel processing on the GPU should make this a viable option, when needed, even for fast feedback.

- Create several CUDA kernels to perform the extended dispersion spotfinding algorithm (`threshold.cu`, `erosion.cu`).
- Refactor the dispersion kernel to share code with extended dispersion.
- Move common code to `cuda_common.hpp`.
- Create basic test script for extended dispersion spotfinding.
- Add an `--algorithm` argument to `spotfinder.cc` along with the necessary code to parse it, allowing for algorithm selection at runtime.
- Add new files to the CMakeLists.txt file to include them in the build.

See also: #12, #13, #14

dimitrivlachos added the enhancement (New feature or request) label on Oct 24, 2024
