Support Cholesky factorization of CuSparseMatrixCSR #1855
Comments
That's incorrect; `cholesky` on dense inputs works fine:

```julia
julia> using CUDA, LinearAlgebra

julia> CUDA.allowscalar(false)

julia> A = CUDA.rand(10, 10);

julia> A = A*A' + I;

julia> cholesky(A)
Cholesky{Float32, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}
U factor:
10×10 UpperTriangular{Float32, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}:
2.07792 1.04945 1.31206 1.20788 1.16905 1.25365 1.19039 0.792913 1.1288 1.07848
⋅ 1.794 0.943456 0.393756 0.655787 0.66578 0.537175 0.213479 0.737071 0.155549
⋅ ⋅ 1.36309 0.394237 0.370576 0.603764 0.570361 0.414203 0.541527 0.262899
⋅ ⋅ ⋅ 1.38618 -0.0552797 0.39563 0.190204 0.00359829 0.141378 0.322178
⋅ ⋅ ⋅ ⋅ 1.53461 0.564346 0.130046 0.114851 0.20222 0.332031
⋅ ⋅ ⋅ ⋅ ⋅ 1.54097 0.301906 0.191423 0.368203 0.00617357
⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 1.21205 0.350676 0.424222 0.0375677
⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 1.32711 0.19541 -0.0467459
⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 1.36569 -0.0209019
⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 1.1683
```

What CUSOLVER/CUSPARSE API calls do you expect this to use?
There is a wrapper for solving a sparse system with a Cholesky factorization here: CUDA.jl/lib/cusolver/sparse.jl, lines 102 to 135 at 5c51766.
It seems that this solver does what is being requested here. Also, according to the CUDA documentation here, there is no such thing as a standalone sparse Cholesky factorization, only a combined factorize-and-solve.
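For reference, calling that wrapper would look roughly like this; the argument order (A, b, x, tolerance, reordering flag, index base) is my reading of the wrapper's signature, so treat it as a sketch:

```julia
using CUDA, CUDA.CUSPARSE, CUDA.CUSOLVER, LinearAlgebra, SparseArrays

# build an SPD sparse matrix on the CPU and move it over
A  = sprand(Float64, 100, 100, 0.05)
A  = A * A' + 10I
dA = CuSparseMatrixCSR(A)
db = CUDA.rand(Float64, 100)
dx = CUDA.zeros(Float64, 100)

# factorizes and solves in a single call; the factorization itself is not
# returned, which is the limitation discussed in the rest of this thread
CUSOLVER.csrlsvchol!(dA, db, dx, 1e-10, one(Cint), 'O')
```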
Thanks, but this is not useful in my case. I'm not sure why, but NVIDIA removed the ability to keep your factorizations. The move is good for simplicity when you're only solving one system, but not in many applications such as optimization problems or conjugate gradient with a preconditioner; almost anything where you're iterating. Back when I wrote in C++, I just used older CUDA versions and they worked fine for me; the newer ones did not have what I needed. It seems (perhaps) that CUDA.jl followed the same move by NVIDIA. I am solving a system repeatedly with the same matrix, so I need to keep the factorization around.
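To spell out why a retained factorization matters: in preconditioned conjugate gradient the preconditioner solve happens once per iteration, so an API that only offers factorize-and-solve-in-one-call forces a full refactorization every iteration. A schematic PCG loop (any factorization object `F` supporting `ldiv!` would do):

```julia
using LinearAlgebra

# Schematic preconditioned CG: A is SPD, F is a *retained* factorization of
# the preconditioner, applied via ldiv! once per iteration.
function pcg(A, b, F; tol=1e-8, maxiter=1000)
    x = zero(b)
    r = copy(b)                  # r = b - A*x with x = 0
    z = similar(b); ldiv!(z, F, r)
    p = copy(z)
    rz = dot(r, z)
    for _ in 1:maxiter
        Ap = A * p
        α  = rz / dot(p, Ap)
        x .+= α .* p
        r .-= α .* Ap
        norm(r) < tol && break
        ldiv!(z, F, r)           # the only per-iteration use of F
        rz′ = rz
        rz  = dot(r, z)
        β   = rz / rz′
        p  .= z .+ β .* p
    end
    x
end
```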
This may or may not be useful, but for a matrix `M`, the low-level cuSOLVER API lets you keep the factorization:

```c
// Symbolic factorization - happens once
csrcholInfo_t info = NULL;
cusolverSpCreateCsrcholInfo(&info);
cusolverSpXcsrcholAnalysis(handle_cusolver, M->n, nnzM, descrM, rows, cols, info);

size_t internalDataInBytes, workspaceInBytes;
cusolverSpDcsrcholBufferInfo(handle_cusolver, M->n, nnzM, descrM, vals, rows,
                             cols, info, &internalDataInBytes, &workspaceInBytes);
void* buffer_gpu = NULL;
cudaMalloc(&buffer_gpu, sizeof(char) * workspaceInBytes);

// Numerical factorization - happens for every different `Ax=b` system,
// i.e. for every different `M`
cusolverSpDcsrcholFactor(handle_cusolver, M->n, nnzM, descrM, vals, rows, cols,
                         info, buffer_gpu);
double tol = 1e-14;   // pivot tolerance
int singularity = 0;  // receives the index of the first zero pivot, if any
cusolverSpDcsrcholZeroPivot(handle_cusolver, info, tol, &singularity);

// Solve that happens every conjugate gradient iteration
// (documented argument order: b before x)
cusolverSpDcsrcholSolve(handle_cusolver, M->n, b, x, info, buffer_gpu);
```
Actually, an incomplete factorization would probably suffice for my purposes. Although, still, I would argue that a full factorization should be doable: in many cases a full Cholesky factorization stays sparse.
@maleadt would you be willing to develop this functionality together so it can be part of the library? I can take care of the linear algebra; I just don't have experience with developing Julia backend functions (I do have experience with CUDA, though).
I'm totally unfamiliar with what you're trying to accomplish, so I can only offer high-level guidance. Since you know which API calls you want to perform: what we normally do first is build a slightly more generic wrapper around these APIs, e.g. CUDA.jl/lib/cusolver/sparse.jl, lines 101 to 135 at b3c6be4.
Then we use these wrappers to write methods that integrate with interfaces/stdlibs/packages like LinearAlgebra.jl, e.g. CUDA.jl/lib/cusolver/linalg.jl, lines 481 to 485 at b3c6be4.
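To make those two layers concrete for the csrchol calls above, here is a sketch of what that could look like. Everything below is assumed rather than existing CUDA.jl API: the struct and function names are placeholders, the ccalls go straight at the libraries (a real wrapper would reuse CUDA.jl's cached handles and checked calls), and error codes are ignored for brevity.

```julia
using CUDA, CUDA.CUSPARSE, LinearAlgebra, SparseArrays

# Layer 1: a generic wrapper over cuSOLVER's low-level csrchol API.
# `CsrCholFactorization` retains everything needed for later solves.
mutable struct CsrCholFactorization
    handle::Ptr{Cvoid}        # cusolverSpHandle_t
    descr::Ptr{Cvoid}         # cusparseMatDescr_t
    n::Cint
    info::Ptr{Cvoid}          # csrcholInfo_t
    buffer::CuVector{UInt8}   # workspace, must stay alive across solves
end

function csrchol(A::CuSparseMatrixCSR{Float64})
    n, nnzA = Cint(size(A, 1)), Cint(nnz(A))

    handle = Ref{Ptr{Cvoid}}(C_NULL)
    ccall((:cusolverSpCreate, "libcusolver"), Cint, (Ptr{Ptr{Cvoid}},), handle)

    descr = Ref{Ptr{Cvoid}}(C_NULL)
    ccall((:cusparseCreateMatDescr, "libcusparse"), Cint, (Ptr{Ptr{Cvoid}},), descr)
    # CuSparseMatrixCSR carries one-based indices
    ccall((:cusparseSetMatIndexBase, "libcusparse"), Cint,
          (Ptr{Cvoid}, Cint), descr[], 1)

    info = Ref{Ptr{Cvoid}}(C_NULL)
    ccall((:cusolverSpCreateCsrcholInfo, "libcusolver"), Cint,
          (Ptr{Ptr{Cvoid}},), info)

    # symbolic analysis, workspace query, then numeric factorization
    ccall((:cusolverSpXcsrcholAnalysis, "libcusolver"), Cint,
          (Ptr{Cvoid}, Cint, Cint, Ptr{Cvoid}, CuPtr{Cint}, CuPtr{Cint}, Ptr{Cvoid}),
          handle[], n, nnzA, descr[], A.rowPtr, A.colVal, info[])
    internal, workspace = Ref{Csize_t}(0), Ref{Csize_t}(0)
    ccall((:cusolverSpDcsrcholBufferInfo, "libcusolver"), Cint,
          (Ptr{Cvoid}, Cint, Cint, Ptr{Cvoid}, CuPtr{Cdouble}, CuPtr{Cint},
           CuPtr{Cint}, Ptr{Cvoid}, Ptr{Csize_t}, Ptr{Csize_t}),
          handle[], n, nnzA, descr[], A.nzVal, A.rowPtr, A.colVal, info[],
          internal, workspace)
    buffer = CuVector{UInt8}(undef, workspace[])
    ccall((:cusolverSpDcsrcholFactor, "libcusolver"), Cint,
          (Ptr{Cvoid}, Cint, Cint, Ptr{Cvoid}, CuPtr{Cdouble}, CuPtr{Cint},
           CuPtr{Cint}, Ptr{Cvoid}, CuPtr{UInt8}),
          handle[], n, nnzA, descr[], A.nzVal, A.rowPtr, A.colVal, info[], buffer)

    CsrCholFactorization(handle[], descr[], n, info[], buffer)
end

# Layer 2: hook into LinearAlgebra so `\` keeps everything on the GPU.
function LinearAlgebra.ldiv!(x::CuVector{Float64}, F::CsrCholFactorization,
                             b::CuVector{Float64})
    ccall((:cusolverSpDcsrcholSolve, "libcusolver"), Cint,
          (Ptr{Cvoid}, Cint, CuPtr{Cdouble}, CuPtr{Cdouble}, Ptr{Cvoid}, CuPtr{UInt8}),
          F.handle, F.n, b, x, F.info, F.buffer)
    x
end
Base.:\(F::CsrCholFactorization, b::CuVector{Float64}) = ldiv!(similar(b), F, b)
```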
@shakedregev Using the low-level csrchol routines might work, but it seems likely they won't be supported going forward, since the CUDA component of CHOLMOD from SuiteSparse seems to be the officially supported sparse Cholesky library for CUDA. I guess this would require depending on SuiteSparse. @maleadt I would like to contribute, but I imagine this particular direction would be more involved because of the dependency on SuiteSparse. Would you be open to contributions in this direction, and to offering some high-level guidance?
Sure, happy to offer some guidance. I'd think that this functionality would need to be part of SuiteSparse.jl, ideally as a package extension that gets loaded when CUDA.jl is available. We've already built some artifacts (https://github.com/JuliaPackaging/Yggdrasil/tree/master/S/SuiteSparse/SuiteSparse_GPU%405), but they aren't used.
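For orientation, the package-extension mechanics would look something like this; the module name and the body are placeholders, since the actual hook into CHOLMOD's GPU path is exactly what would need to be worked out:

```julia
# ext/SparseArraysCUDAExt.jl (hypothetical), loaded automatically once both
# packages are present, given a Project.toml containing:
#   [weakdeps]
#   CUDA = "052768ef-5323-5732-b1bb-66c8b64840ba"
#   [extensions]
#   SparseArraysCUDAExt = "CUDA"
module SparseArraysCUDAExt

using SparseArrays, CUDA

function __init__()
    # placeholder: point CHOLMOD at a GPU-enabled SuiteSparse build (e.g. the
    # SuiteSparse_GPU artifacts above) and enable its useGPU option
end

end
```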
@shakedregev
@amontoison - Thank you! |
Can this be closed then, or is there more to this issue? |
CHOLMOD support for CUDA is very likely to be implemented as a package extension to the standard library / SuiteSparse (JuliaSparse/SparseArrays.jl#443). So I think there is nothing more to this issue, and it can be closed as completed.
Describe the bug
When you factorize a GPU matrix, the output is on the CPU. I believe
https://github.com/JuliaGPU/CUDA.jl/blob/6d2e9ab2f2a1bb1df7791bb30178e0fe956940a3/lib/cusolver/linalg.jl#L484 is the only mention of Cholesky in the entire package, and it looks like it's coded to work this way (just copy to the CPU and factorize there). This is not desired behavior, as it makes doing the operation on the GPU pointless.
To reproduce
The Minimal Working Example (MWE) for this bug:
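A reproducer along these lines (sketch; the matrix is illustrative, and exact dispatch may vary by CUDA.jl version):

```julia
using CUDA, CUDA.CUSPARSE, LinearAlgebra, SparseArrays

CUDA.allowscalar(false)

A  = sprand(Float64, 100, 100, 0.05)
A  = A * A' + 10I               # SPD, so Cholesky applies
dA = CuSparseMatrixCSR(A)
b  = rand(100); db = CuVector(b)

cholA = cholesky(dA)            # reported: the result lives on the CPU
x = cholA \ db                  # and the solve falls back to (or fails on) the CPU
@show typeof(cholA) typeof(x)
```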
Output
Manifest.toml
Expected behavior
I expect the factorization to stay on the GPU and the solution to also be on the GPU. Meaning
typeof(x)==typeof(b)
andtypeof(cholA)
should be something GPU related.Version info
Details on Julia:
Details on CUDA: