Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Returning Tuple of Sparse Matrices is Broken #2570

Open
rbassett3 opened this issue Nov 29, 2024 · 7 comments
Open

Returning Tuple of Sparse Matrices is Broken #2570

rbassett3 opened this issue Nov 29, 2024 · 7 comments
Labels
bug Something isn't working

Comments

@rbassett3
Copy link

rbassett3 commented Nov 29, 2024

Describe the bug

I've been experiencing a subtle bug in CUDA.jl related to tuples of sparse matrices which have 0 in one of their dimensions. I was finally able to condense it down to the MWE below.

To reproduce

The Minimal Working Example (MWE) for this bug:

julia> using CUDA, CUDA.CUSPARSE, SparseArrays
julia> CuSparseMatrixCSR{Float32}(sparse([], [], Array{Float32}([]))), CuArray{Float32}([1.0, 1.0, 1.0])

(Error showing value of type Tuple{CuSparseMatrixCSR{Float32, Int32}, CuArray{Float32, 1, CUDA.DeviceMemory}}:
ERROR: ArgumentError: 1 == colptr[1] != 1

Expected behavior

Using SparseArrays, we get:

julia> SparseMatrixCSC{Float32}(sparse([], [], Array{Float32}([]))), Array{Float32}([1.0, 1.0, 1.0])
(sparse(Int64[], Int64[], Float32[], 0, 0), Float32[1.0, 1.0, 1.0])

Using AMDGPU, we get:

julia> ROCSparseMatrixCSR{Float32}(sparse([], [], Array{Float32}([]))), ROCArray{Float32}([1.0, 1.0, 1.0])
(sparse(Int32[], Int32[], Float32[], 0, 0), Float32[1.0, 1.0, 1.0])

This seems like the correct behavior, and is what I was expecting.

Version info

Details on Julia:

Julia Version 1.11.1
Commit 8f5b7ca12ad (2024-10-16 10:53 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 64 × AMD Ryzen Threadripper PRO 5975WX 32-Cores
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, znver3)
Threads: 1 default, 0 interactive, 1 GC (on 64 virtual cores)
Environment:

Details on CUDA:

julia> CUDA.versioninfo()
CUDA runtime 12.6, artifact installation
CUDA driver 12.5
NVIDIA driver 555.42.6

CUDA libraries: 
- CUBLAS: 12.6.4
- CURAND: 10.3.7
- CUFFT: 11.3.0
- CUSOLVER: 11.7.1
- CUSPARSE: 12.5.4
- CUPTI: 2024.3.2 (API 24.0.0)
- NVML: 12.0.0+555.42.6

Julia packages: 
- CUDA: 5.5.2
- CUDA_Driver_jll: 0.10.4+0
- CUDA_Runtime_jll: 0.15.5+0

Toolchain:
- Julia: 1.11.1
- LLVM: 16.0.6

1 device:
  0: NVIDIA TITAN V (sm_70, 11.432 GiB / 12.000 GiB available)

@rbassett3 rbassett3 added the bug Something isn't working label Nov 29, 2024
@amontoison
Copy link
Member

amontoison commented Nov 29, 2024

For a sparse matrix A of size m x n, the vector A.colptr should be of length n+1 (CSC format) and the vector A.rowptr should be of length m+1 (CSR format).
We can't have an empty vector A.*ptr (length 1 in the worst case)
On CPU, you can check that we have the same restriction when we create a sparse matrix with spzeros of size 0 x 0.

@rbassett3
Copy link
Author

@amontoison: Of course you are correct, but that's not the issue here. Recall that the sparse constructor in the MWE wants input in COO format, i.e. I, J, and V such that A[I[k], J[k]] = V[k]. So there isn't even an opportunity for the user to input a colptr of the wrong length. All the vectors provided have length nnz(A).

In summary, the conversion of COO -> CuSparseCSR leaves the user with an incorrectly sized colptr when nnz(A) = 0. The other packages I mentioned (AMDGPU and SparseArrays) handle this case correctly.

@amontoison
Copy link
Member

Sorry, I missed the call to sparse in the constructor of CuSparseMatrixCSC.
It should not be hard to fix, I will have a look.

@amontoison
Copy link
Member

amontoison commented Dec 2, 2024

Ok, I isolated the issue:

A = SparseMatrixCSC(CuSparseMatrixCSR{Float32}(sparse([], [], Array{Float32}([]))))
ERROR: ArgumentError: 1 == colptr[1] != 1

@amontoison
Copy link
Member

amontoison commented Dec 2, 2024

@rbassett3 The issue is in the CUDA routines, if we have an empty sparse matrix with one-based indexing, we need A.colptr = [1] or A.rowptr[1] to ensure that A.colptr[end] == nnz(A) + 1 or A.rowptr[end] == nnz(A) + 1 .
We have the same behaviour than zero-based indexing...
NVIDIA forgot these corner cases.

I fixed the issue in #2575 but we should open a ticket on NVIDIA's website.

@rbassett3
Copy link
Author

@amontoison You are the expert. I've been trying to wrap my head around this issue since i submitted it but am not having any luck. Here are some more examples I've constructed feeling out the corner cases.

  • CuSparseMatrixCSR{Float32}(sparse([], [], Array{Float32}([]), 0, 1)), CuArray{Float32}([1.0, 1.0, 1.0]) <- Doesn't work.
  • CuSparseMatrixCSR{Float32}(sparse([], [], Array{Float32}([]), 1, 0)), CuArray{Float32}([1.0, 1.0, 1.0]) <- Doesn't work. If either dimension is zero it causes an error.
  • CuSparseMatrixCSR{Float32}(sparse([], [], Array{Float32}([]), 1, 1)), CuArray{Float32}([1.0, 1.0, 1.0]) <- Works. Indicating it's a zero dimension issue
  • CuSparseMatrixCSC{Float32}(sparse([], [], Array{Float32}([]), 0, 0)), CuArray{Float32}([1.0, 1.0, 1.0]) <- Works. Indicating it's with CSR and not CSC

The really strange thing to me is that all the examples above work when the second argument (the CuArray) is removed. I.e. both

  • CuSparseMatrixCSR{Float32}(sparse([], [], Array{Float32}([]), 0, 1))
  • CuSparseMatrixCSR{Float32}(sparse([], [], Array{Float32}([]), 1, 0))
    work. My understanding is that this is because the tuple calls the function show and the single matrix return calls print? I think show converts to COO before displaying and print does not, but again I'm just familiarizing myself with this code base.

@amontoison
Copy link
Member

amontoison commented Dec 2, 2024

I tested your corner cases and it works with my local branch (#2575).

julia> CuSparseMatrixCSR{Float32}(sparse([], [], Array{Float32}([]), 0, 1)), CuArray{Float32}([1.0, 1.0, 1.0])

(sparse(Int32[], Int32[], Float32[], 0, 1), Float32[1.0, 1.0, 1.0])

julia> 

julia> CuSparseMatrixCSR{Float32}(sparse([], [], Array{Float32}([]), 1, 0)), CuArray{Float32}([1.0, 1.0, 1.0])
(sparse(Int32[], Int32[], Float32[], 1, 0), Float32[1.0, 1.0, 1.0])

julia> CuSparseMatrixCSR{Float32}(sparse([], [], Array{Float32}([]), 1, 1)), CuArray{Float32}([1.0, 1.0, 1.0])
(sparse(Int32[], Int32[], Float32[], 1, 1), Float32[1.0, 1.0, 1.0])

julia> CuSparseMatrixCSC{Float32}(sparse([], [], Array{Float32}([]), 0, 0)), CuArray{Float32}([1.0, 1.0, 1.0])
(sparse(Int32[], Int32[], Float32[], 0, 0), Float32[1.0, 1.0, 1.0])

We convert back to SparseMatrixCSC when we call show. The code is here:
https://github.com/JuliaGPU/CUDA.jl/blob/master/lib/cusparse/array.jl#L597-L644

I think you are right and print is called instead of show when only the sparse matrix is displayed.
We need to ensure that empty matrices on GPU can always by converted to CSC format on CPU.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants