Skip to content

upsj/gpu_selection

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

11 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
This library implements a bucket-based selection algorithm on GPUs

More details can be found in

* T. Ribizel and H. Anzt, "Approximate and Exact Selection on GPUs," 2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), Rio de Janeiro, Brazil, 2019, pp. 471-478.
doi: 10.1109/IPDPSW.2019.00088
* T. Ribizel, H. Anzt, "Parallel selection on GPUs," Parallel Computing, Volume 91, 2020, doi: 10.1016/j.parco.2019.102588

It uses Catch2 as a test framework and the CUB library as a reference implementation for sorting.

The tests can be run by simply executing `app/unittests`, the benchmarks can be run by executing `app/benchmark` with one of the following parameters

[full]             The full benchmark for exact single and multiple selection and the individual kernels (sample, count, reduce, filter)
[full-multionly]   The full benchmark for multiple selection only
[approx]           The full benchmark for approximate selection with shared-memory atomics
[approx-g]         The full benchmark for approximate selection with global-memory atomics
[multi]            The full benchmark for multiple selection with different numbers of ranks
[test]             A small benchmark that only executes a single benchmark with small input size

The output of these tests is the following:
On stdout, they print error messages in case the algorithm execution produces invalid results. For the approx tests, additionally the exact and approximate rank are being output in CSV format.
On stderr, they print the individual timings of the kernels in CSV format for different input sizes given by the first CSV field. Runtime breakdowns are listed within parentheses ().

`app/benchmark-sort` contains a benchmark for the CUB radix sort implementation as a performance baseline for the multiple selection.

Structure of the project

include/cpu_reference.hpp     - Reference implementations for testing
include/verification.hpp      - Validation functions for testing
include/cuda_definitions.cuh  - Type definitions and hardware limits
include/cuda_error.cuh        - Wrapper for CUDA error handling
include/cuda_memory.cuh       - Wrapper for CUDA memory allocations
include/cuda_timer.cuh        - Wrapper for CUDA timing measurements
include/kernel_config.cuh     - Configuration struct for kernel templates
include/launcher_fwd.cuh      - Forward-declarations of launcher and kernel templates

lib/generated/*               - Explicit template instantiations to parallelize compilation
lib/cpu_reference.cpp         - Reference implementations for testing
lib/verification.cpp          - Validation functions for testing
lib/qs_launchers.cuh          - Wrappers for quickselect kernels
lib/qs_recursion.cuh          - Kernels for quickselect single-selection
lib/qs_recursion_multi.cuh    - Kernels for quickselect multi-selection
lib/qs_reduce.cuh             - Kernels for reducing quickselect partial sums
lib/qs_scan.cuh               - Kernels for quickselect bipartitioning
lib/ssss_build_searchtree.cuh - Kernels for sampleselect sampling
lib/ssss_collect.cuh          - Kernels for sampleselect single-selection filtering
lib/ssss_collect_multi.cuh    - Kernels for sampleselect multi-selection filtering
lib/ssss_count.cuh            - Kernels for sampleselect counting
lib/ssss_launchers.cuh        - Wrappers for sampleselect kernels
lib/ssss_merged.cuh           - Kernels for multiple simultaneous sampleselects
lib/ssss_merged_memory.cuh    - Auxiliary data structure for sampleselect multi-selection
lib/ssss_recursion.cuh        - Kernels for sampleselect single-selection
lib/ssss_recursion_multi.cuh  - Kernels for sampleselect multi-selection
lib/ssss_reduce.cuh           - Kernels for reducing sampleselect partial sums
lib/utils_basecase.cuh        - Kernels for recursion basecase
lib/utils_bytestorage.cuh     - Auxiliary functions for reading/writing unaligned bytes
lib/utils_mask.cuh            - Auxiliary functions for bitmasks
lib/utils_prefixsum.cuh       - Auxiliary functions for tree-based partial sums
lib/utils_sampling.cuh        - Auxiliary functions for sampling
lib/utils_search.cuh          - Auxiliary functions for binary and warp-ary searches
lib/utils_sort.cuh            - Auxiliary functions for bitonic sorting
lib/utils_warpaggr.cuh        - Auxiliary functions for warp-aggregation
lib/utils_work.cuh            - Auxiliary functions for work-distribution
lib/utils.cuh                 - Auxiliary wrappers for basic operations

About

Parallel selection on GPUs

Topics

Resources

License

GPL-3.0 and 2 other licenses found

Licenses found

GPL-3.0
LICENSE
BSL-1.0
LICENSE-catch2
BSD-3-Clause
LICENSE-cub

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages