Metal support in VkFFT

- This update adds an Apple Metal backend to VkFFT (VKFFT_BACKEND 5)
- The Metal backend performs similarly to the other backends (tested on M1 Pro 8c SoC)
- The Metal backend passes all VkFFT tests that the OpenCL backend passes (tested on M1 Pro 8c SoC)
- Current limitations of the Metal backend: no double precision, no saving/loading of binaries, forced maximum of 256 threads, C++ bindings only, incomplete error handling
- Bugfixes: Rader uint LUT offset not working in some cases, Mult Rader coalescing with <1024 threads, DCT-III reordering index issues with OpenCL on Intel/Apple GPUs
- Slightly improved coalescing logic for Nvidia GPUs
- Added precision plots
DTolm · Oct 6, 2022 · f5e1009
1 parent ba7001c
Showing 1 changed file with 3 additions and 3 deletions.
diff --git a/README.md b/README.md
@@ -82,8 +82,8 @@ The test configuration below takes multiple 1D FFTs of all lengths from the rang
 ![alt text](https://github.com/DTolm/VkFFT/blob/master/benchmark_plot/fp64_cuda_a100.png?raw=true)
 ![alt text](https://github.com/DTolm/VkFFT/blob/master/benchmark_plot/fp64_hip_mi250.png?raw=true)
 ## Precision comparison of cuFFT/VkFFT/FFTW
-![alt text](https://github.com/DTolm/VkFFT/blob/master/benchmark_plot/FP64_precision.png?raw=true)
-![alt text](https://github.com/DTolm/VkFFT/blob/master/benchmark_plot/FP32_precision.png?raw=true)
+![alt text](https://github.com/DTolm/VkFFT/blob/master/precision_results/FP64_precision.png?raw=true)
+![alt text](https://github.com/DTolm/VkFFT/blob/master/precision_results/FP32_precision.png?raw=true)
 
 Above, VkFFT precision is verified by comparing its results with FP128 version of FFTW. We test all FFT lengths from the [2, 100000] range. We perform tests in single and double precision on random input data from [-1;1] range.
 
@@ -93,4 +93,4 @@ For FP32, twiddle factors can be calculated on-the-fly in FP32 or precomputed in
 
 ## Contact information
 The initial version of VkFFT is developed by Tolmachev Dmitrii\
-E-mail 1: <[email protected]>
+E-mail 1: <[email protected]>