Complex dot product performance #165

Open
coezmaden opened this issue Jan 20, 2021 · 0 comments

Hi. While trying to implement a dot product between a real CuArray and a complex StructArray of CuArrays, I ran into these two problems:

  • Complex dot product with StructArrays of CuArrays results in scalar indexing (no out-of-the-box support).
  • Custom implementations on StructArrays are considerably slower than their plain CuArray counterparts.
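
For the first problem, the mixed complex/real and StructArray calls presumably fall back to LinearAlgebra's generic element-by-element `dot`, which is what triggers scalar indexing on the GPU. As a minimal sketch (not benchmarked here), the same quantity can be written as one map-and-reduce over both arrays. `mapreduce_dot` is a hypothetical helper name, and whether `mapreduce` runs as a GPU kernel for `CuArray` or StructArray inputs is an assumption; the example only checks the result against `dot` on CPU arrays.

```julia
using LinearAlgebra

# Hypothetical helper (not part of LinearAlgebra or CUDA.jl): a dot product for
# possibly mixed element types, written as a single map-and-reduce.
# conj on the first argument follows the LinearAlgebra.dot convention.
function mapreduce_dot(x::AbstractVector, y::AbstractVector)
    length(x) == length(y) || throw(DimensionMismatch("x and y must have the same length"))
    T = typeof(conj(zero(eltype(x))) * zero(eltype(y)))  # result type, e.g. ComplexF32 for ComplexF32/Float32 inputs
    return mapreduce((a, b) -> conj(a) * b, +, x, y; init = zero(T))
end

x = ComplexF64[1 + 2im, 3 - 1im]  # complex vector
r = [2.0, 5.0]                    # real vector
@assert mapreduce_dot(x, r) == dot(x, r)  # both give 17.0 + 1.0im
```

On the CPU, Base implements multi-array `mapreduce` as a `map` followed by a `reduce`, so this sketch allocates an intermediate array; it is meant to show the formulation, not a tuned implementation.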

This issue has already been reported in JuliaGPU/CUDA.jl#667.

As a quick recap, the tables below summarize the crux of the problem. Several combinations of input types were tested with arrays of length 10000; timings in italics involve scalar indexing.

| CuArray | real · real | cplx · cplx | real · cplx | `dot_parts(cplx, real)` | `dot_padded(cplx, real)` |
|---|---|---|---|---|---|
| `@btime` w/ sync | 90.900 μs | 91.000 μs | *362.500 ms* | 135.900 μs | 90.299 μs |

| StructArray | real · real | cplx · cplx | real · cplx | `dot_struct_cplx_cplx` | `dot_struct_cplx_real` |
|---|---|---|---|---|---|
| `@btime` w/ sync | N/A | *750.751 ms* | *567.323 ms* | 228.099 μs | 118.199 μs |
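
The custom functions in these tables rely on expanding a complex dot product into real dot products of the real and imaginary parts. Since `LinearAlgebra.dot` conjugates its first argument, writing $x = a + ib$ and $y = c + id$ with real vectors $a, b, c, d$, and $r$ for a real vector, the expansions are:

```math
\begin{aligned}
\operatorname{dot}(x, y) &= \sum_k \overline{x_k}\, y_k
  = \left(a^\top c + b^\top d\right) + i\left(a^\top d - b^\top c\right), \\
\operatorname{dot}(x, r) &= \sum_k \overline{x_k}\, r_k = a^\top r - i\, b^\top r .
\end{aligned}
```

The complex-complex case therefore needs four real dot products and the complex-real case two, which matches the number of `dot` calls in `dot_struct_cplx_cplx` and in `dot_parts`/`dot_struct_cplx_real`.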

MWE:

```julia
using CUDA, StructArrays, LinearAlgebra, BenchmarkTools

# Initialize the vectors
N = 10000
real_vector = CUDA.ones(Float32, N)
cplx_vector = CUDA.ones(ComplexF32, N)
CUDA.allowscalar(true)  # Allow scalar indexing for proper initialization of the StructArray
cplx_struct = StructArray(cplx_vector)
CUDA.allowscalar(false) # Turn it off again once we're done

## Custom dot product functions
# Note: LinearAlgebra.dot conjugates its first argument, dot(x, y) = sum(conj.(x) .* y),
# so the imaginary part of the first argument enters with a minus sign.

# Perform the dot product by taking the complex vector apart
function dot_parts(cplx_vector::CuArray, real_vector::CuArray)
    complex(dot(real(cplx_vector), real_vector), -dot(imag(cplx_vector), real_vector))
end

# Transform the real vector into a complex one by padding its imaginary part with zeros in order to invoke dotc
function dot_padded(cplx_vector::CuArray, real_vector::CuArray)
    dot(cplx_vector, complex.(real_vector, CUDA.zeros(eltype(real_vector), length(real_vector))))
end

# Perform the complex dot product by splitting both StructArrays into real and imaginary parts
function dot_struct_cplx_cplx(cplx_struct_1::StructArray, cplx_struct_2::StructArray)
    complex(
        dot(cplx_struct_1.re, cplx_struct_2.re) + dot(cplx_struct_1.im, cplx_struct_2.im),
        dot(cplx_struct_1.re, cplx_struct_2.im) - dot(cplx_struct_1.im, cplx_struct_2.re)
    )
end

# Perform a complex-real dot product by splitting the StructArray into real and imaginary parts
function dot_struct_cplx_real(cplx_struct::StructArray, real_vector::CuArray)
    complex(dot(cplx_struct.re, real_vector), -dot(cplx_struct.im, real_vector))
end

## Benchmarks

@btime CUDA.@sync dot($real_vector, $real_vector)
@btime CUDA.@sync dot($cplx_vector, $cplx_vector)
@btime CUDA.@sync dot($cplx_vector, $real_vector) # scalar indexing
@btime CUDA.@sync dot_parts($cplx_vector, $real_vector)
@btime CUDA.@sync dot_padded($cplx_vector, $real_vector)
@btime CUDA.@sync dot($cplx_struct, $cplx_struct) # scalar indexing
@btime CUDA.@sync dot_struct_cplx_cplx($cplx_struct, $cplx_struct)
@btime CUDA.@sync dot($cplx_struct, $real_vector) # scalar indexing
@btime CUDA.@sync dot_struct_cplx_real($cplx_struct, $real_vector)
```
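
Because the MWE uses all-ones inputs (zero imaginary part), sign errors in the split formulas would not show up in its results. A minimal CPU-only sketch can check the conventions against `LinearAlgebra.dot`, assuming plain `Vector`s of real and imaginary parts stand in for the StructArray's `.re`/`.im` fields; `split_dot_cplx_real` and `split_dot_cplx_cplx` are hypothetical names mirroring `dot_struct_cplx_real` and `dot_struct_cplx_cplx`.

```julia
using LinearAlgebra

# Hypothetical CPU stand-ins for dot_struct_cplx_real / dot_struct_cplx_cplx:
# plain real vectors play the role of the StructArray's .re and .im fields.

# dot(x, r) for x = xre + i*xim and real r (dot conjugates x)
split_dot_cplx_real(xre, xim, r) = complex(dot(xre, r), -dot(xim, r))

# dot(x, y) for x = xre + i*xim and y = yre + i*yim (dot conjugates x)
split_dot_cplx_cplx(xre, xim, yre, yim) =
    complex(dot(xre, yre) + dot(xim, yim), dot(xre, yim) - dot(xim, yre))

x = ComplexF64[1 + 2im, 3 - 1im, -2 + 0.5im]
y = ComplexF64[2 - 1im, 0 + 4im, 1 + 1im]
r = [2.0, 5.0, -1.0]

# Both split versions agree with LinearAlgebra.dot on ordinary complex vectors
@assert split_dot_cplx_real(real(x), imag(x), r) ≈ dot(x, r)
@assert split_dot_cplx_cplx(real(x), imag(x), real(y), imag(y)) ≈ dot(x, y)

# Flipping the signs gives the unconjugated sum(x .* y) instead of dot(x, y)
unconj = complex(dot(real(x), real(y)) - dot(imag(x), imag(y)),
                 dot(real(x), imag(y)) + dot(imag(x), real(y)))
@assert unconj ≈ sum(x .* y)
@assert !(unconj ≈ dot(x, y))
```

Using inputs with nonzero imaginary parts, as here, is what makes the conjugation visible; with the all-ones vectors from the MWE both sign conventions return the same value.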
