Complex dot product performance #165

coezmaden · 2021-01-20T10:26:33Z

Hi. While trying to implement a dot product between a real CuArray and a complex StructArray of CuArrays I stumbled upon these two problems:

1. A complex dot product on StructArrays of CuArrays falls back to scalar indexing (there is no out-of-the-box support).
2. Custom implementations for StructArrays are considerably slower than the equivalent operations on plain CuArrays.

This issue has already been raised in JuliaGPU/CUDA.jl#667.

As a quick recap, the tables below capture the crux of the problem. Several combinations of input types were tested with arrays of length 10000; timings marked with * involve scalar indexing.

CuArray	real · real	cplx · cplx	real · cplx	dot_parts(cplx, real)	dot_padded(cplx, real)
btime w/ sync	90.900 μs	91.000 μs	362.500 ms*	135.900 μs	90.299 μs

StructArray	real · real	cplx · cplx	real · cplx	dot_struct_cplx_cplx	dot_struct_cplx_real
btime w/ sync	N/A	750.751 ms*	567.323 ms*	228.099 μs	118.199 μs

MWE:

using CUDA, StructArrays, LinearAlgebra, BenchmarkTools

# Initialize the vectors
N = 10000
real_vector = CUDA.ones(Float32, N)
cplx_vector = CUDA.ones(ComplexF32, N)
CUDA.allowscalar(true) # Allow scalar indexing for proper initialization of the StructArray
cplx_struct = StructArray(cplx_vector) 
CUDA.allowscalar(false) # Turn it off once we're done

## Custom dot product functions

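# Perform the dot product by splitting the complex vector into its real and imaginary parts
# Note: no conjugation is applied here, whereas `dot` conjugates its first argument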
function dot_parts(cplx_vector::CuArray, real_vector::CuArray)
    complex.(dot(real(cplx_vector), real_vector), dot(imag(cplx_vector), real_vector))
end

# Transform a real vector into a complex one by padding its imaginary part with zeros in order to invoke dotc
function dot_padded(cplx_vector::CuArray, real_vector::CuArray)
    dot(cplx_vector, complex.(real_vector, CUDA.zeros(length(real_vector))))
end

# Perform the complex dot product by splitting each StructArray into real and imaginary parts
# Note: no conjugation is applied here, whereas `dot` conjugates its first argument
function dot_struct_cplx_cplx(cplx_struct_1::StructArray, cplx_struct_2::StructArray)
    complex.(
        dot(cplx_struct_1.re, cplx_struct_2.re) - dot(cplx_struct_1.im, cplx_struct_2.im),
        dot(cplx_struct_1.re, cplx_struct_2.im) + dot(cplx_struct_1.im, cplx_struct_2.re)
    )
end

# Perform the complex-real dot product by splitting the StructArray into real and imaginary parts
# Note: no conjugation is applied here, whereas `dot` conjugates its first argument
function dot_struct_cplx_real(cplx_struct::StructArray, real_vector::CuArray)
    complex.(dot(cplx_struct.re, real_vector), dot(cplx_struct.im, real_vector))
end

## Benchmarks

@btime CUDA.@sync dot($real_vector,$real_vector)
@btime CUDA.@sync dot($cplx_vector,$cplx_vector)
@btime CUDA.@sync dot($cplx_vector,$real_vector) # scalar indexing
@btime CUDA.@sync dot_parts($cplx_vector,$real_vector)
@btime CUDA.@sync dot_padded($cplx_vector,$real_vector)
@btime CUDA.@sync dot($cplx_struct,$cplx_struct) # scalar indexing
@btime CUDA.@sync dot_struct_cplx_cplx($cplx_struct,$cplx_struct)
@btime CUDA.@sync dot($cplx_struct,$real_vector) # scalar indexing
@btime CUDA.@sync dot_struct_cplx_real($cplx_struct,$real_vector)
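One thing worth checking when comparing these custom functions against `dot`: `LinearAlgebra.dot(x, y)` conjugates its first argument, i.e. it computes `sum(conj(x[i]) * y[i])`. With the all-ones test vectors the imaginary parts are zero, so the difference does not show up in the MWE. Below is a minimal CPU-only sketch of a split-component version that follows the conjugating convention. The helper name `dot_split` and the test values are made up for illustration and are not part of the original code; the same four real-valued `dot` calls should presumably also work on the `.re` and `.im` CuArray components of a StructArray, but that is an assumption and was not benchmarked here.

```julia
using LinearAlgebra

# Hypothetical helper (not from the original post): conjugating dot product
# built from separate real/imaginary component arrays, so that only
# real-valued dot products are evaluated.
# conj(a) * b = (a_re*b_re + a_im*b_im) + im * (a_re*b_im - a_im*b_re)
function dot_split(a_re, a_im, b_re, b_im)
    complex(dot(a_re, b_re) + dot(a_im, b_im),
            dot(a_re, b_im) - dot(a_im, b_re))
end

# Small CPU arrays with nonzero imaginary parts, so a sign error would be visible
a = ComplexF64[1 + 2im, 3 - 1im, -2 + 0.5im]
b = ComplexF64[0.5 - 1im, 2 + 2im, 1 + 1im]

# Agrees with the built-in conjugating dot product
@assert dot_split(real(a), imag(a), real(b), imag(b)) ≈ dot(a, b)
```

For the complex-real case the imaginary part of the second argument is zero, so the expression reduces to `complex(dot(a_re, b), -dot(a_im, b))`.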
