Multi-upload performance improvements + bugfixes
-Improved multi-upload FFT algorithm performance in double precision on HPC GPUs
-Fixed double precision sincos computation. The LUT can now be disabled: useLUT is switched to int64_t, where -1 disables the LUT, 0 leaves the decision automatic, and 1 forces it. The LUT can also be disabled for the 4-step algorithm rotation only, via useLUT_4step
-Optimized swapTo3Stage4Step and switched it to take a direct numeric value instead of a power of 2
-Bugfixes: fixed unintended FP64 usage in FP32 when the number ending was not printed in kernels (important), fixed incorrect writing with registerBoost, fixed #93
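
The tri-state useLUT values listed above can be illustrated with a small sketch. This is not VkFFT's code: `resolveUseLUT` and its `autoChoice` parameter are hypothetical names, and the heuristic behind the automatic decision is not shown in this commit, so the caller supplies it here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of VkFFT: maps the tri-state useLUT value
// described in the commit message to a yes/no decision.
//   -1 -> LUT disabled
//    0 -> automatic decision (supplied by the caller in this sketch)
//    1 -> LUT forced on
// autoChoice stands in for whatever heuristic the library applies.
inline bool resolveUseLUT(std::int64_t useLUT, bool autoChoice) {
    // Only the three values named in the commit message are handled;
    // the behavior for other values is not stated there.
    assert(useLUT >= -1 && useLUT <= 1);
    if (useLUT < 0) return false;  // -1: never use the LUT
    if (useLUT > 0) return true;   //  1: always use the LUT
    return autoChoice;             //  0: defer to the automatic decision
}
```

A signed type is what makes the -1 "disabled" state representable, which is presumably why the commit switches useLUT to int64_t.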
DTolm committed Oct 25, 2022
1 parent 3e5ad2a commit e1c5886
Showing 2 changed files with 363 additions and 183 deletions.
2 changes: 1 addition & 1 deletion Vulkan_FFT.cpp
@@ -533,7 +533,7 @@ int main(int argc, char* argv[])
version_decomposed[0] = version / 10000;
version_decomposed[1] = (version - version_decomposed[0] * 10000) / 100;
version_decomposed[2] = (version - version_decomposed[0] * 10000 - version_decomposed[1] * 100);
-printf("VkFFT v%d.%d.%d (06-10-2022). Author: Tolmachev Dmitrii\n", version_decomposed[0], version_decomposed[1], version_decomposed[2]);
+printf("VkFFT v%d.%d.%d (25-10-2022). Author: Tolmachev Dmitrii\n", version_decomposed[0], version_decomposed[1], version_decomposed[2]);
#if (VKFFT_BACKEND==0)
printf("Vulkan backend\n");
#elif (VKFFT_BACKEND==1)
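
The context lines of this hunk split a single integer version into three fields using base-100 packing. As a minimal standalone sketch of that arithmetic (the function name `decomposeVersion` and the sample value 10203 are illustrative assumptions, not taken from the repository):

```cpp
#include <array>
#include <cassert>

// Hypothetical standalone version of the arithmetic in the hunk above:
// the packed integer is assumed to be major * 10000 + minor * 100 + patch.
inline std::array<int, 3> decomposeVersion(int version) {
    assert(version >= 0);  // negative packed versions are not meaningful here
    std::array<int, 3> parts{};
    parts[0] = version / 10000;                              // major
    parts[1] = (version - parts[0] * 10000) / 100;           // minor
    parts[2] = version - parts[0] * 10000 - parts[1] * 100;  // patch
    return parts;
}
```

Under that assumption, a packed value of 10203 decomposes to 1, 2, 3, which the printf format in the hunk would print as v1.2.3.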
