Metal support in VkFFT
-This update adds an Apple Metal backend to VkFFT (VKFFT_BACKEND 5)
-Metal backend performs similarly to the other backends (tested on M1 Pro 8c SoC)
-Metal backend passes every VkFFT test that the OpenCL backend passes (tested on M1 Pro 8c SoC)
-Current limitations of the Metal backend: no double precision, no saving/loading binaries, forced 256 max threads, C++ bindings only, incomplete error handling.
-Bugfixes: Rader uint LUT offset not working in some cases, Mult Rader coalescing with <1024 threads, DCT-III reordering index issues with OpenCL on Intel/Apple GPUs.
-Slightly improved coalescing logic for Nvidia GPUs
-Added precision plots
DTolm authored Oct 6, 2022
1 parent ba7001c commit f5e1009
Showing 1 changed file with 3 additions and 3 deletions.
6 changes: 3 additions & 3 deletions README.md
@@ -82,8 +82,8 @@ The test configuration below takes multiple 1D FFTs of all lengths from the rang
![alt text](https://github.com/DTolm/VkFFT/blob/master/benchmark_plot/fp64_cuda_a100.png?raw=true)
![alt text](https://github.com/DTolm/VkFFT/blob/master/benchmark_plot/fp64_hip_mi250.png?raw=true)
## Precision comparison of cuFFT/VkFFT/FFTW
-![alt text](https://github.com/DTolm/VkFFT/blob/master/benchmark_plot/FP64_precision.png?raw=true)
-![alt text](https://github.com/DTolm/VkFFT/blob/master/benchmark_plot/FP32_precision.png?raw=true)
+![alt text](https://github.com/DTolm/VkFFT/blob/master/precision_results/FP64_precision.png?raw=true)
+![alt text](https://github.com/DTolm/VkFFT/blob/master/precision_results/FP32_precision.png?raw=true)

Above, VkFFT precision is verified by comparing its results with the FP128 version of FFTW. We test all FFT lengths in the [2, 100000] range. We perform tests in single and double precision on random input data from the [-1, 1] range.
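
The verification idea described above (run the transform in the precision under test, run it again in a higher precision, and compare) can be illustrated with a small stand-alone sketch. This is not VkFFT's test code. It makes several assumptions: a naive O(n²) DFT stands in for the FFT, Python's FP64 arithmetic stands in for the FP128 FFTW reference, FP32 is emulated by rounding through `struct`, the input is taken to be complex, and the relative L2 error is used as the metric (the text above does not say which metric the plots use). All function names are hypothetical.

```python
import cmath
import math
import random
import struct


def to_fp32(x: float) -> float:
    """Round a Python float (FP64) to the nearest FP32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def dft_reference(data):
    """Naive DFT in FP64; an assumed stand-in for the higher-precision reference."""
    n = len(data)
    return [
        sum(x * cmath.exp(-2j * math.pi * ((j * k) % n) / n) for j, x in enumerate(data))
        for k in range(n)
    ]


def dft_fp32(data):
    """Same naive DFT, with twiddles and every intermediate result rounded to FP32."""
    n = len(data)
    out = []
    for k in range(n):
        re = im = 0.0
        for j, (xr, xi) in enumerate(data):
            angle = -2.0 * math.pi * ((j * k) % n) / n
            wr, wi = to_fp32(math.cos(angle)), to_fp32(math.sin(angle))
            # Complex multiply-accumulate, rounding after each operation.
            re = to_fp32(re + to_fp32(to_fp32(xr * wr) - to_fp32(xi * wi)))
            im = to_fp32(im + to_fp32(to_fp32(xr * wi) + to_fp32(xi * wr)))
        out.append(complex(re, im))
    return out


def relative_l2_error(test, ref):
    """Relative L2 distance between two spectra (metric chosen for this sketch)."""
    num = sum(abs(t - r) ** 2 for t, r in zip(test, ref))
    den = sum(abs(r) ** 2 for r in ref)
    return math.sqrt(num / den)


def precision_for_length(n, seed=0):
    """Error of the FP32 transform for one length, on random input from [-1, 1]."""
    rng = random.Random(seed)
    data = [(to_fp32(rng.uniform(-1, 1)), to_fp32(rng.uniform(-1, 1))) for _ in range(n)]
    ref = dft_reference([complex(r, i) for r, i in data])
    return relative_l2_error(dft_fp32(data), ref)


if __name__ == "__main__":
    # A few small lengths only; the quadratic DFT is far too slow for a full sweep.
    for n in (2, 3, 16, 100):
        print(n, precision_for_length(n))
```

Sweeping `precision_for_length` over a range of lengths and plotting the result against `n` would give a curve of the same kind as the plots above, although the absolute values will differ from those obtained with a real FFT and an FP128 reference.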

@@ -93,4 +93,4 @@ For FP32, twiddle factors can be calculated on-the-fly in FP32 or precomputed in

## Contact information
The initial version of VkFFT is developed by Tolmachev Dmitrii\
-E-mail 1: <[email protected]>
+E-mail 1: <[email protected]>
