An implementation/port of  Merrill and Garland's Merge-based Parallel Sparse Matrix-Vector Multiplication (10.1109/SC.2016.57)  paper in 
the  
    
        
        Julia Programming Language
    
     
ParallelMergeCSR allows you to perform multithreaded CSC formatted sparse Matrix multiplication against dense vectors and matrices as long as the sparse Matrix has had a transpose or adjoint operation applied to it via LinearAlgebra, built-in to Julia Base. The reason for this is the original algorithm was restricted to CSR formatted sparse Matrices but by taking the transpose of a CSC matrix you've created a CSR representation of the same matrix.
ParallelMergeCSR only has one function intended for use: mul!, which is used for both Sparse Matrix - Dense Vector and Sparse Matrix - Dense Matrix multiplication. The function is not exported by default to avoid conflict with LinearAlgebra.mul! and must be invoked through its fully qualified name ParallelMergeCSR.mul!.
ParallelMergeCSR takes advantage of the Polyester.jl package for multithreading support. Unfortunately, Polyester has known issues with working on Mac OS and ParallelMergeCSR should be used on Linux platforms only.
You can install ParallelMergeCSR through the Julia package manager interface (just hit ] when you're in the Julia prompt) by typing the following:
add ParallelMergeCSR
Start Julia with the desired number of threads by launching it with the -t/--threads argument:
julia --threads <number_of_threads>
or setting JULIA_NUM_THREADS in your environment and then running Julia.
You can confirm Julia is using the specified number of threads via:
julia> Threads.nthreads()You can then use ParallelMergeCSR in a similar fashion to the example below:
julia> using ParallelMergeCSR, SparseArrays
# create a 20x20 transposed CSC-formatted Sparse matrix with a 30% chance of values appearing
julia> A = transpose(sprand(20, 20, 0.3));
# dense vector (can be a matrix too)
julia> B = rand(size(A, 2));
# output
julia> C = rand(size(A, 2));
# coefficients
julia> α = -1.0; β = 2.9;
# perform the operation C = ABα + Cβ, mutating C to store the answer
julia> ParallelMergeCSR.mul!(C, A, B, α, β)