Returning Tuple of Sparse Matrices is Broken #2570

rbassett3 · 2024-11-29T04:17:25Z

Describe the bug

I've been experiencing a subtle bug in CUDA.jl related to tuples of sparse matrices which have 0 in one of their dimensions. I was finally able to condense it down to the MWE below.

To reproduce

The Minimal Working Example (MWE) for this bug:

julia> using CUDA, CUDA.CUSPARSE, SparseArrays
julia> CuSparseMatrixCSR{Float32}(sparse([], [], Array{Float32}([]))), CuArray{Float32}([1.0, 1.0, 1.0])

(Error showing value of type Tuple{CuSparseMatrixCSR{Float32, Int32}, CuArray{Float32, 1, CUDA.DeviceMemory}}:
ERROR: ArgumentError: 1 == colptr[1] != 1

Expected behavior

Using SparseArrays, we get:

julia> SparseMatrixCSC{Float32}(sparse([], [], Array{Float32}([]))), Array{Float32}([1.0, 1.0, 1.0])
(sparse(Int64[], Int64[], Float32[], 0, 0), Float32[1.0, 1.0, 1.0])

Using AMDGPU, we get:

julia> ROCSparseMatrixCSR{Float32}(sparse([], [], Array{Float32}([]))), ROCArray{Float32}([1.0, 1.0, 1.0])
(sparse(Int32[], Int32[], Float32[], 0, 0), Float32[1.0, 1.0, 1.0])

This seems like the correct behavior, and is what I was expecting.

Version info

Details on Julia:

Julia Version 1.11.1
Commit 8f5b7ca12ad (2024-10-16 10:53 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 64 × AMD Ryzen Threadripper PRO 5975WX 32-Cores
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, znver3)
Threads: 1 default, 0 interactive, 1 GC (on 64 virtual cores)
Environment:

Details on CUDA:

julia> CUDA.versioninfo()
CUDA runtime 12.6, artifact installation
CUDA driver 12.5
NVIDIA driver 555.42.6

CUDA libraries: 
- CUBLAS: 12.6.4
- CURAND: 10.3.7
- CUFFT: 11.3.0
- CUSOLVER: 11.7.1
- CUSPARSE: 12.5.4
- CUPTI: 2024.3.2 (API 24.0.0)
- NVML: 12.0.0+555.42.6

Julia packages: 
- CUDA: 5.5.2
- CUDA_Driver_jll: 0.10.4+0
- CUDA_Runtime_jll: 0.15.5+0

Toolchain:
- Julia: 1.11.1
- LLVM: 16.0.6

1 device:
  0: NVIDIA TITAN V (sm_70, 11.432 GiB / 12.000 GiB available)

The text was updated successfully, but these errors were encountered:

amontoison · 2024-11-29T14:43:51Z

For a sparse matrix A of size m x n, the vector A.colptr should be of length n+1 (CSC format) and the vector A.rowptr should be of length m+1 (CSR format).
We can't have an empty vector A.*ptr (length 1 in the worst case)
On CPU, you can check that we have the same restriction when we create a sparse matrix with spzeros of size 0 x 0.

rbassett3 · 2024-11-29T21:42:48Z

@amontoison: Of course you are correct, but that's not the issue here. Recall that the sparse constructor in the MWE wants input in COO format, i.e. I, J, and V such that A[I[k], J[k]] = V[k]. So there isn't even an opportunity for the user to input a colptr of the wrong length. All the vectors provided have length nnz(A).

In summary, the conversion of COO -> CuSparseCSR leaves the user with an incorrectly sized colptr when nnz(A) = 0. The other packages I mentioned (AMDGPU and SparseArrays) handle this case correctly.

amontoison · 2024-11-29T22:23:37Z

Sorry, I missed the call to sparse in the constructor of CuSparseMatrixCSC.
It should not be hard to fix, I will have a look.

amontoison · 2024-12-02T17:15:46Z

Ok, I isolated the issue:

A = SparseMatrixCSC(CuSparseMatrixCSR{Float32}(sparse([], [], Array{Float32}([]))))
ERROR: ArgumentError: 1 == colptr[1] != 1

amontoison · 2024-12-02T18:32:13Z

@rbassett3 The issue is in the CUDA routines, if we have an empty sparse matrix with one-based indexing, we need A.colptr = [1] or A.rowptr[1] to ensure that A.colptr[end] == nnz(A) + 1 or A.rowptr[end] == nnz(A) + 1 .
We have the same behaviour than zero-based indexing...
NVIDIA forgot these corner cases.

I fixed the issue in #2575 but we should open a ticket on NVIDIA's website.

rbassett3 · 2024-12-02T19:05:51Z

@amontoison You are the expert. I've been trying to wrap my head around this issue since i submitted it but am not having any luck. Here are some more examples I've constructed feeling out the corner cases.

CuSparseMatrixCSR{Float32}(sparse([], [], Array{Float32}([]), 0, 1)), CuArray{Float32}([1.0, 1.0, 1.0]) <- Doesn't work.
CuSparseMatrixCSR{Float32}(sparse([], [], Array{Float32}([]), 1, 0)), CuArray{Float32}([1.0, 1.0, 1.0]) <- Doesn't work. If either dimension is zero it causes an error.
CuSparseMatrixCSR{Float32}(sparse([], [], Array{Float32}([]), 1, 1)), CuArray{Float32}([1.0, 1.0, 1.0]) <- Works. Indicating it's a zero dimension issue
CuSparseMatrixCSC{Float32}(sparse([], [], Array{Float32}([]), 0, 0)), CuArray{Float32}([1.0, 1.0, 1.0]) <- Works. Indicating it's with CSR and not CSC

The really strange thing to me is that all the examples above work when the second argument (the CuArray) is removed. I.e. both

CuSparseMatrixCSR{Float32}(sparse([], [], Array{Float32}([]), 0, 1))
CuSparseMatrixCSR{Float32}(sparse([], [], Array{Float32}([]), 1, 0))
work. My understanding is that this is because the tuple calls the function show and the single matrix return calls print? I think show converts to COO before displaying and print does not, but again I'm just familiarizing myself with this code base.

amontoison · 2024-12-02T19:26:04Z

I tested your corner cases and it works with my local branch (#2575).

julia> CuSparseMatrixCSR{Float32}(sparse([], [], Array{Float32}([]), 0, 1)), CuArray{Float32}([1.0, 1.0, 1.0])

(sparse(Int32[], Int32[], Float32[], 0, 1), Float32[1.0, 1.0, 1.0])

julia> 

julia> CuSparseMatrixCSR{Float32}(sparse([], [], Array{Float32}([]), 1, 0)), CuArray{Float32}([1.0, 1.0, 1.0])
(sparse(Int32[], Int32[], Float32[], 1, 0), Float32[1.0, 1.0, 1.0])

julia> CuSparseMatrixCSR{Float32}(sparse([], [], Array{Float32}([]), 1, 1)), CuArray{Float32}([1.0, 1.0, 1.0])
(sparse(Int32[], Int32[], Float32[], 1, 1), Float32[1.0, 1.0, 1.0])

julia> CuSparseMatrixCSC{Float32}(sparse([], [], Array{Float32}([]), 0, 0)), CuArray{Float32}([1.0, 1.0, 1.0])
(sparse(Int32[], Int32[], Float32[], 0, 0), Float32[1.0, 1.0, 1.0])

We convert back to SparseMatrixCSC when we call show. The code is here:
https://github.com/JuliaGPU/CUDA.jl/blob/master/lib/cusparse/array.jl#L597-L644

I think you are right and print is called instead of show when only the sparse matrix is displayed.
We need to ensure that empty matrices on GPU can always by converted to CSC format on CPU.

rbassett3 added the bug Something isn't working label Nov 29, 2024

amontoison mentioned this issue Dec 2, 2024

[CUSPARSE] Fix constructor of sparse empty matrices #2575

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Returning Tuple of Sparse Matrices is Broken #2570

Returning Tuple of Sparse Matrices is Broken #2570

rbassett3 commented Nov 29, 2024 •

edited

Loading

amontoison commented Nov 29, 2024 •

edited

Loading

rbassett3 commented Nov 29, 2024

amontoison commented Nov 29, 2024

amontoison commented Dec 2, 2024 •

edited

Loading

amontoison commented Dec 2, 2024 •

edited

Loading

rbassett3 commented Dec 2, 2024

amontoison commented Dec 2, 2024 •

edited

Loading

Returning Tuple of Sparse Matrices is Broken #2570

Returning Tuple of Sparse Matrices is Broken #2570

Comments

rbassett3 commented Nov 29, 2024 • edited Loading

amontoison commented Nov 29, 2024 • edited Loading

rbassett3 commented Nov 29, 2024

amontoison commented Nov 29, 2024

amontoison commented Dec 2, 2024 • edited Loading

amontoison commented Dec 2, 2024 • edited Loading

rbassett3 commented Dec 2, 2024

amontoison commented Dec 2, 2024 • edited Loading

rbassett3 commented Nov 29, 2024 •

edited

Loading

amontoison commented Nov 29, 2024 •

edited

Loading

amontoison commented Dec 2, 2024 •

edited

Loading

amontoison commented Dec 2, 2024 •

edited

Loading

amontoison commented Dec 2, 2024 •

edited

Loading