Added support for more transform directions #1903
Conversation
... removed allocations and added support for
Thanks! I'm not really familiar with FFTs though, so maybe @stevengj (or @ali-ramadhan or @btmit from #119) could give this a quick review.
So you only support FFT-ing the leading dimensions? That is certainly the low-hanging fruit, since you then just have a single loop of contiguous FFTs (which could conceivably be done in parallel), though it's not super general.
This deviates from the AbstractFFTs API. Why not just follow the standard API and accept a
Not quite. FFTs with up to a total of three transform directions are supported. These can have at most one gap of any number of non-transform directions. E.g. in 4D (XYZT) the supported dimension combinations are X, Y, Z, T, XY, XZ, XT, YZ, ZT, XYZ, XYT, YZT. Only YT is not supported.
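For illustration, the rule above could be expressed as a small predicate. This is a hypothetical sketch, not code from this PR, and the leading-dimension condition is inferred from the enumeration (YT is the only listed unsupported case):

```julia
# Hypothetical helper (not part of the PR): one reading of the rule above,
# consistent with the 4D enumeration -- a gap of non-transform directions
# is only allowed when the region includes the leading dimension.
function region_supported(region)
    dims = sort(collect(region))
    ngaps = count(i -> dims[i+1] - dims[i] > 1, 1:length(dims)-1)
    return ngaps == 0 || (ngaps == 1 && dims[1] == 1)
end

region_supported((1, 3))     # XZ in 4D: one gap, includes leading dim -> true
region_supported((2, 4))     # YT in 4D: one gap, no leading dim -> false
region_supported((2, 3, 4))  # YZT: contiguous -> true
```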
Indeed, this is the core idea here. Yet the strides already support a single gap of N dimensions, which is also exploited. CUDA is able to run the batched transforms asynchronously and possibly utilizes all kernels even if the transform size is smaller than the number of CUDA kernels.
See above. Many more cases are supported. The few unsupported cases do throw an error. The general case is currently not implemented, since this would need a complete revamp of the plan structure, which is more of a high-hanging fruit: an encapsulating plan storing multiple CuFFT plans and transform orders. Also, for those cases it may actually be better and faster for the users to use
@stevengj, you are describing here what was already the case when work started on the improvements to support more combinations of dimensions, most notably any individual dimension of ND-arrays. Yes, there is a small difference in the interface compared to the standard API, as the plan dimensions do not need to correspond with respect to trailing dimensions. This is in accordance with broadcasting rules, where you can also multiply a .* b with b having more trailing dimensions. I agree that it would be nicer to also throw an error, to be in 100% agreement with the standard interface regarding the application of plans, though one could argue that it would be good to allow the standard interface to apply plans to trailing dimensions automatically, without needing to change (and store) another plan. The trouble for the CuArray implementation is that it somewhat exploits the trailing-dimension broadcasting ability. If wanted, this can probably be fixed by throwing an error while keeping the internal workings, but it may need one more dimension property in the struct of the plan.
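For reference, the trailing-dimension broadcasting behaviour referred to here looks like this in base Julia:

```julia
a = rand(10, 10)
b = rand(10, 10, 5)

# `a` is implicitly treated as size (10, 10, 1) and broadcast
# along the trailing third dimension of `b`.
c = a .* b
size(c)  # (10, 10, 5)
```

The analogy in the comment above is that a plan for the first dimensions could similarly be applied repeatedly along any extra trailing dimensions of the data.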
The code is changed now such that plans always have to agree in dimensions and sizes with the data they are applied to. This should be in agreement with the AbstractFFTs interface. The support for transform directions has not been changed and almost all directions should be working now. @stevengj Is this OK now?
Shouldn't it throw an error if
I think it does julia> q = fft(CuArray(rand(10,10,10,10)), (2,4));
ERROR: ArgumentError: batch regions must be sequential
Stacktrace:
[1] cufftMakePlan(xtype::CUDA.CUFFT.cufftType_t, xdims::NTuple{4, Int64}, region::Tuple{Int64, Int64})
@ CUDA.CUFFT C:\Users\pi96doc\Documents\Programming\Julia\Forks\CUDA.jl\lib\cufft\wrappers.jl:127
julia> q = fft(rand(10,10,10,10), (2,4));
julia> q = fft(CuArray(rand(10,10,10,10)), (2,4));
ERROR: ArgumentError: batch regions must be sequential
Stacktrace:
[1] cufftMakePlan(xtype::CUDA.CUFFT.cufftType_t, xdims::NTuple{4, Int64}, region::Tuple{Int64, Int64})
@ CUDA.CUFFT C:\Users\pi96doc\Documents\Programming\Julia\Forks\CUDA.jl\lib\cufft\wrappers.jl:127
julia> p = plan_fft(CuArray(rand(10,10,10,10)), (1,3))
CUFFT d.p. complex forward plan for 10×10×10×10 CuArray of ComplexF64
julia> q = p * CuArray(rand(10,10,10,10));
julia> q = p * CuArray(rand(10,10,10,10,10));
ERROR: ArgumentError: CuFFT plan applied to wrong-size input
Stacktrace:
[1] assert_applicable(p::CUDA.CUFFT.cCuFFTPlan{ComplexF64, -1, false, 4}, X::CuArray{ComplexF64, 5, CUDA.Mem.DeviceBuffer})
@ CUDA.CUFFT C:\Users\pi96doc\Documents\Programming\Julia\Forks\CUDA.jl\lib\cufft\fft.jl:305 |
The text of the error message could perhaps be improved upon, though? I kept the case and text from before for now.
I guess so, unless @stevengj has more comments. CI failures are related though, as I noted above. |
@stevengj does not seem to have more comments, so can we merge? Would be great, as then we could finally use these new transform directions in other toolboxes. |
This has not changed, so no, this cannot be merged yet. The CI failures that this PR causes need to be fixed first.
What is the best way to do this? Last time, I was merging the new version into my pull request, and I remember that you wrote that this was a bad idea and caused you lots of work. Should I just copy the current version of the cufft.jl file and try to merge it by hand? |
There are two things that need to happen:
Thanks. Do I get this right:
I think you're overcomplicating things.

Anyway, I've rebased the PR for now, as that seemed faster :-)
Heck. I think we were now both working on it at the same time. I did the rebase and pushed. |
Yeah, you overwrote my changes. Yours seem fine too though, so let's just wait for CI to finish now. |
I used these instructions: https://medium.com/@topspinj/how-to-git-rebase-into-a-forked-repo-c9f05e821c8a |
Let's hope. My local checks seemed to have some problems though:

```
Pkg.test("CUDA"; test_args=["cufft", "--quickfail"])
     Testing CUDA
┌ Warning: Could not use exact versions of packages in manifest, re-resolving
└ @ Pkg.Operations C:\Users\pi96doc\AppData\Local\Programs\Julia-1.9.2\share\julia\stdlib\v1.9\Pkg\src\Operations.jl:1809
ERROR: Unsatisfiable requirements detected for package GPUCompiler [61eb1bfa]:
 GPUCompiler [61eb1bfa] log:
 ├─possible versions are: 0.1.0-0.22.0 or uninstalled
 └─restricted to versions 0.23 by CUDA [052768ef] — no versions left
   └─CUDA [052768ef] log:
     ├─possible versions are: 4.4.0 or uninstalled
     └─CUDA [052768ef] is fixed to version 4.4.0
```
Make sure you use the Manifest committed in the CUDA.jl repository, as we're currently (temporarily) depending on an unreleased version of GPUCompiler. |
It failed. Strange, this type piracy problem seems not directly connected to the fft stuff.
Aqua 0.7 was just released, it probably contains additional checks. |
Can I do something about it, or leave it up to you? |
I tried it with the

```
Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
Exception: EXCEPTION_ACCESS_VIOLATION at 0x6be45c40 -- nvtxGlobals_v3 at C:\Users\pi96doc\.julia\artifacts\b4eeaf094ffb6aacf1b20ee5d2ac9aa1818fc732\bin\libnvToolsExt.dll (unknown line)
in expression starting at C:\Users\pi96doc\Nextcloud-Uni\Julia\Forks\CUDA.jl\test\setup.jl:1
nvtxGlobals_v3 at C:\Users\pi96doc\.julia\artifacts\b4eeaf094ffb6aacf1b20ee5d2ac9aa1818fc732\bin\libnvToolsExt.dll (unknown line)
Allocations: 5080540 (Pool: 5079127; Big: 1413); GC: 8
ERROR: Package CUDA errored during testing
Stacktrace:
 [1] pkgerror(msg::String)
   @ Pkg.Types C:\Users\pi96doc\AppData\Local\Programs\Julia-1.9.2\share\julia\stdlib\v1.9\Pkg\src\Types.jl:69
 [2] test(ctx::Pkg.Types.Context, pkgs::Vector{Pkg.Types.PackageSpec}; coverage::Bool, julia_args::Cmd, test_args::Cmd, test_fn::Nothing, force_latest_compatible_version::Bool, allow_earlier_backwards_compatible_versions::Bool, allow_reresolve::Bool)
   @ Pkg.Operations C:\Users\pi96doc\AppData\Local\Programs\Julia-1.9.2\share\julia\stdlib\v1.9\Pkg\src\Operations.jl:2021
 [3] test
   @ C:\Users\pi96doc\AppData\Local\Programs\Julia-1.9.2\share\julia\stdlib\v1.9\Pkg\src\Operations.jl:1902 [inlined]
```

Yet there also seem to be some disagreements with the
Ah, that's bad. Does just importing CUDA.jl (but using that manifest) fail with the same error? |
Bad news: I freshly cloned the merged repo and just typing

```
(CUDATest) pkg> st
Status `C:\Users\pi96doc\Nextcloud-Uni\Julia\Development\TestBeds\CUDATest\Project.toml`
  [052768ef] CUDA v4.4.0 `https://github.com/JuliaGPU/CUDA.jl.git#master`
  [7a1cc6ca] FFTW v1.7.1
```

Any ideas?
I can reproduce. I'll look into it. |
Great! Now it works also on my system and also the new transform directions are supported. Thank you! |
This is achieved by allowing FFT plans to have fewer dimensions than the data they are applied to. The trailing dimensions are treated as non-transform directions and the transforms are executed sequentially. (Maybe @inbounds should be added?)
This should allow most use cases to work now. Especially note that single dimensions are now supported, which adds flexibility.
Only rare cases, such as dimensions (2,4) of a 4-dimensional array, are currently still not supported, but the user can execute these transforms sequentially.
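Such a sequential fallback is just a composition of the partial transforms, since FFTs along independent dimensions commute. A sketch on the CPU with the FFTW package (the same identity holds on CuArrays wherever both sides are supported):

```julia
using FFTW

x = rand(ComplexF64, 8, 8, 8, 8)

# An FFT over the region (2, 4) factors into two sequential
# single-dimension FFTs along dims 2 and 4.
y = fft(fft(x, 2), 4)

y ≈ fft(x, (2, 4))  # true
```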