Currently, all CUDA kernels (including the new kernels from #1) read neighbouring pixels directly from global memory in their convolution operations. This approach is inefficient and slows down performance.
We should modify these kernels to use memory with better spatial locality for these neighbourhood accesses, thereby reducing global memory latency and improving overall performance.
This could include local memory, shared memory, texture memory or surface memory, to name a few.
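As a minimal sketch of one of these options, the hypothetical kernel below tiles the image into shared memory with a one-pixel halo, so the neighbourhood reads inside the convolution loop hit shared memory instead of global memory. The kernel name, block dimensions, 3×3 window and plain row-major layout are illustrative assumptions, not the existing kernel interface.

```cuda
// Hypothetical 3x3 local-sum kernel illustrating shared-memory tiling.
// BLOCK_W/BLOCK_H and the row-major layout are assumptions for this sketch.
#define RADIUS 1
#define BLOCK_W 32
#define BLOCK_H 8

__global__ void local_sum_shared(const float *image, float *out,
                                 int width, int height) {
    // Shared tile with a RADIUS-pixel halo on every side.
    __shared__ float tile[BLOCK_H + 2 * RADIUS][BLOCK_W + 2 * RADIUS];

    int gx = blockIdx.x * BLOCK_W + threadIdx.x;
    int gy = blockIdx.y * BLOCK_H + threadIdx.y;

    // Cooperatively load the tile (including the halo) from global memory.
    for (int ty = threadIdx.y; ty < BLOCK_H + 2 * RADIUS; ty += BLOCK_H) {
        for (int tx = threadIdx.x; tx < BLOCK_W + 2 * RADIUS; tx += BLOCK_W) {
            int sx = (int)(blockIdx.x * BLOCK_W) + tx - RADIUS;
            int sy = (int)(blockIdx.y * BLOCK_H) + ty - RADIUS;
            // Clamp to the image edges so the halo reads stay in bounds.
            sx = min(max(sx, 0), width - 1);
            sy = min(max(sy, 0), height - 1);
            tile[ty][tx] = image[sy * width + sx];
        }
    }
    __syncthreads();

    if (gx >= width || gy >= height) return;

    // Neighbourhood reads now come from shared memory, not global memory.
    float sum = 0.0f;
    for (int dy = -RADIUS; dy <= RADIUS; ++dy)
        for (int dx = -RADIUS; dx <= RADIUS; ++dx)
            sum += tile[threadIdx.y + RADIUS + dy][threadIdx.x + RADIUS + dx];
    out[gy * width + gx] = sum;
}
```

Texture memory is a lower-effort alternative: it provides 2D-local caching and hardware address clamping without the explicit tile-loading code, at the cost of binding the image to a texture object first.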
Implement extended dispersion spotfinding
Implement a GPU-based version of the extended dispersion spotfinding
algorithm. This builds on regular dispersion by making two passes.
This allows for the detection of fainter spots by using the first pass
to detect candidate spots and exclude them from the background
calculation in the second pass.
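As a rough, self-contained illustration of what the second pass does (not the actual code in `threshold.cu`), the kernel sketched below recomputes the local background statistics with the first-pass candidate pixels masked out and then re-thresholds against that cleaner background. The 7×7 window, the kernel signature and the simplified mean-plus-n-sigma test are assumptions made for the sketch, not the implemented dispersion criterion.

```cuda
#include <cstdint>

// Hypothetical second-pass kernel: background statistics are accumulated with
// pass-1 candidates excluded, then the pixel is re-tested against them.
#define RADIUS 3  // assumed 7x7 local window

__global__ void extended_dispersion_pass2(const float *image,
                                          const uint8_t *candidate,  // 1 = flagged in pass 1
                                          uint8_t *strong,
                                          int width, int height,
                                          float sigma_strong) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    // Accumulate background statistics over the window, skipping candidates.
    float sum = 0.0f, sum_sq = 0.0f;
    int n = 0;
    for (int dy = -RADIUS; dy <= RADIUS; ++dy) {
        for (int dx = -RADIUS; dx <= RADIUS; ++dx) {
            int nx = x + dx, ny = y + dy;
            if (nx < 0 || nx >= width || ny < 0 || ny >= height) continue;
            if (candidate[ny * width + nx]) continue;  // exclude pass-1 spots
            float v = image[ny * width + nx];
            sum += v;
            sum_sq += v * v;
            ++n;
        }
    }
    if (n < 2) {
        strong[y * width + x] = 0;
        return;
    }

    // Flag the pixel if it sits well above the cleaned-up background mean
    // (a simplified stand-in for the full dispersion test).
    float mean = sum / n;
    float var = (sum_sq - n * mean * mean) / (n - 1);
    float sigma = sqrtf(fmaxf(var, 0.0f));
    strong[y * width + x] = (image[y * width + x] > mean + sigma_strong * sigma) ? 1 : 0;
}
```

In practice the first pass would be the existing dispersion threshold, and the candidate mask may be grown or cleaned up (e.g. via `erosion.cu`) before being used here.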
Extended dispersion spotfinding is unavoidably slower than regular
dispersion because it requires two passes. However, the
performance gained through massively parallel processing on the GPU
should make this a viable option, when needed, even for fast feedback.
Create several CUDA kernels to perform the extended dispersion
spotfinding algorithm (`threshold.cu`, `erosion.cu`).
Refactor the dispersion kernel to share code with extended dispersion.
Move common code to `cuda_common.hpp`.
Create basic test script for extended dispersion spotfinding.
Add an `--algorithm` argument to `spotfinder.cc`, along with the
necessary code to parse it, allowing algorithm selection at runtime
(a sketch follows below).
Add new files to the CMakeLists.txt file to include them in the build.
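One possible shape for the runtime selection added to `spotfinder.cc` is sketched below; the `--algorithm` flag comes from this change, but the accepted values, the enum and the hand-rolled parsing are illustrative assumptions.

```cpp
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical algorithm selection; value names and the default are assumptions.
enum class Algorithm { Dispersion, DispersionExtended };

Algorithm parse_algorithm(int argc, char **argv) {
    for (int i = 1; i + 1 < argc; ++i) {
        if (std::strcmp(argv[i], "--algorithm") == 0) {
            std::string value = argv[i + 1];
            if (value == "dispersion") return Algorithm::Dispersion;
            if (value == "dispersion_extended") return Algorithm::DispersionExtended;
            std::fprintf(stderr, "Unknown algorithm: %s\n", value.c_str());
            std::exit(1);
        }
    }
    return Algorithm::Dispersion;  // default: keep the existing behaviour
}
```

The selected value would then determine which set of threshold kernels is launched for each image.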
See also: #12, #13, #14