Commit
Multi-upload performance improvements + bugfixes
- Improved performance of the multi-upload FFT algorithm in double precision on HPC GPUs.
- Fixed the double precision sincos computation. It is now possible to disable the LUT: useLUT is switched to int64_t, where -1 disables the LUT, 0 leaves the decision to auto, and 1 forces it. The LUT can also be disabled for the 4-step algorithm rotation only, via useLUT_4step.
- Optimized swapTo3Stage4Step and switched it from a power of 2 to a direct number value.
- Bugfixes: fixed FP64 being used in FP32 when the number ending was not printed in kernels (important), fixed incorrect writing with registerBoost, fixed #93.
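The useLUT convention described above (-1 disables, 0 auto, 1 forces) can be illustrated with a minimal sketch. This is not the library's code: the LutOptions struct and the resolve_lut helper are hypothetical, the "auto" choice is simply passed in by the caller, and treating useLUT_4step with the same three values is an assumption, since the message only says it controls the 4-step rotation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two options named in the commit message.
   This is NOT the library's configuration struct. */
typedef struct {
    int64_t useLUT;       /* -1: disable LUT, 0: auto decision, 1: force LUT */
    int64_t useLUT_4step; /* assumption: same convention, 4-step rotation only */
} LutOptions;

/* Resolve a tri-state option to a yes/no decision.
   auto_choice is whatever the "auto" path would pick; how that is
   decided is not described in the commit, so the caller supplies it. */
static bool resolve_lut(int64_t option, bool auto_choice) {
    if (option < 0) return false; /* explicitly disabled */
    if (option > 0) return true;  /* explicitly forced */
    return auto_choice;           /* 0: defer to the automatic decision */
}

/* Decide separately for the general case and for the 4-step rotation.
   Keeping the two options independent is an assumption of this sketch. */
static bool lut_general(const LutOptions *o, bool auto_choice) {
    return resolve_lut(o->useLUT, auto_choice);
}

static bool lut_4step_rotation(const LutOptions *o, bool auto_choice) {
    return resolve_lut(o->useLUT_4step, auto_choice);
}
```

A signed type is what makes the -1 value possible; with an unsigned option only "auto" and "force" could be expressed.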