Native GPU support #65

Open: wants to merge 52 commits into master

MilesCranmer (Member) commented Feb 3, 2024

This PR adds native GPU support: a single CUDA kernel that evaluates an expression directly on the GPU!

It also allows multiple trees to be evaluated at once, which can save time in the CUDA kernel.
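The batching idea above can be illustrated with a minimal CPU sketch. This is an assumption-laden illustration, not the code in `ext/DynamicExpressionsCUDAExt.jl`. It assumes that each tree is flattened into one shared postorder buffer, so children always precede their parent, and that each hypothetical GPU thread evaluates one (tree, row) pair by sweeping its tree's slice of that buffer. Python stands in for CUDA, and `Node`, `OPS`, `flatten`, `thread_body`, and `eval_batch` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical operator table (not the library's operator enum).
OPS = {
    "+": lambda a, b: a + b,
    "*": lambda a, b: a * b,
    "cos": math.cos,
}


@dataclass
class Node:
    op: Optional[str] = None        # operator name, or None for a leaf
    children: Tuple["Node", ...] = ()
    feature: Optional[int] = None   # leaf: column index into a row of X
    const: float = 0.0              # leaf: constant (used when feature is None)


def flatten(trees: List[Node]):
    """Concatenate every tree into one postorder buffer.

    Tree t occupies buffer[starts[t] : roots[t] + 1], and children always
    precede their parent, so a single forward sweep evaluates a tree.
    """
    buffer, starts, roots = [], [], []

    def visit(node: Node) -> int:
        kids = [visit(c) for c in node.children]
        if node.op is not None:
            buffer.append(("op", node.op, kids))
        elif node.feature is not None:
            buffer.append(("feature", node.feature, kids))
        else:
            buffer.append(("const", node.const, kids))
        return len(buffer) - 1

    for tree in trees:
        starts.append(len(buffer))
        roots.append(visit(tree))
    return buffer, starts, roots


def thread_body(buffer, starts, roots, X, t: int, row: int) -> float:
    """Work of one hypothetical GPU thread: evaluate tree t on one row."""
    s, r = starts[t], roots[t]
    scratch = [0.0] * (r - s + 1)  # per-thread scratch for this tree's nodes
    for i in range(s, r + 1):
        kind, payload, kids = buffer[i]
        if kind == "const":
            scratch[i - s] = payload
        elif kind == "feature":
            scratch[i - s] = X[row][payload]
        else:
            scratch[i - s] = OPS[payload](*(scratch[k - s] for k in kids))
    return scratch[r - s]


def eval_batch(trees: List[Node], X: List[List[float]]) -> List[List[float]]:
    """Emulate one launch over the whole (tree, row) grid, serially."""
    buffer, starts, roots = flatten(trees)
    return [
        [thread_body(buffer, starts, roots, X, t, row) for row in range(len(X))]
        for t in range(len(trees))
    ]


if __name__ == "__main__":
    x0, x1 = Node(feature=0), Node(feature=1)
    tree_a = Node("+", (Node("*", (x0, Node(const=2.0))), Node("cos", (x1,))))
    tree_b = Node("+", (x0, x1))
    print(eval_batch([tree_a, tree_b], [[1.0, 0.0], [3.0, 0.0]]))
    # prints [[3.0, 7.0], [1.0, 3.0]]
```

Flattening in postorder means a thread needs no recursion or explicit stack, only a forward sweep with scratch space. Concatenating all trees into one buffer is what lets a single launch cover a whole batch. The actual extension may lay out its buffers and assign threads differently.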


TODO:

  • See whether CUDA.@captured helps at all
    • Nope...
  • Explore whether manually manipulating CUDA streams will help at all
  • See whether I need to use @sync anywhere
  • Consider adding Optim support now or later

github-actions bot (Contributor) commented Feb 3, 2024

Benchmark Results

| Benchmark | master | 3da1d38... | master / 3da1d38b1da79b... |
|---|---|---|---|
| eval/ComplexF32/evaluation | 7.25 ± 0.53 ms | 7.22 ± 0.54 ms | 1 |
| eval/ComplexF64/evaluation | 10.7 ± 0.72 ms | 10.5 ± 0.75 ms | 1.01 |
| eval/Float32/derivative | 11.6 ± 0.84 ms | 11.2 ± 0.75 ms | 1.03 |
| eval/Float32/derivative_turbo | 11.5 ± 0.85 ms | 11.2 ± 0.84 ms | 1.03 |
| eval/Float32/evaluation | 2.7 ± 0.26 ms | 2.7 ± 0.24 ms | 1 |
| eval/Float32/evaluation_bumper | 0.573 ± 0.016 ms | 0.567 ± 0.015 ms | 1.01 |
| eval/Float32/evaluation_turbo | 0.531 ± 0.03 ms | 0.522 ± 0.029 ms | 1.02 |
| eval/Float32/evaluation_turbo_bumper | 0.569 ± 0.016 ms | 0.567 ± 0.015 ms | 1 |
| eval/Float64/derivative | 15.9 ± 0.63 ms | 14.5 ± 0.57 ms | 1.1 |
| eval/Float64/derivative_turbo | 15.8 ± 0.71 ms | 14.2 ± 0.63 ms | 1.11 |
| eval/Float64/evaluation | 3.13 ± 0.32 ms | 3.11 ± 0.28 ms | 1.01 |
| eval/Float64/evaluation_bumper | 1.18 ± 0.044 ms | 1.17 ± 0.043 ms | 1.01 |
| eval/Float64/evaluation_turbo | 1.02 ± 0.069 ms | 0.994 ± 0.067 ms | 1.02 |
| eval/Float64/evaluation_turbo_bumper | 1.18 ± 0.044 ms | 1.18 ± 0.045 ms | 1.01 |
| utils/combine_operators/break_sharing | 0.0389 ± 0.001 ms | 0.0391 ± 0.00045 ms | 0.995 |
| utils/convert/break_sharing | 27.9 ± 2.8 μs | 26.5 ± 2.2 μs | 1.05 |
| utils/convert/preserve_sharing | 0.0987 ± 0.0037 ms | 0.0971 ± 0.0035 ms | 1.02 |
| utils/copy/break_sharing | 28.3 ± 2.2 μs | 27.2 ± 2.1 μs | 1.04 |
| utils/copy/preserve_sharing | 0.0988 ± 0.0034 ms | 0.0967 ± 0.0036 ms | 1.02 |
| utils/count_constant_nodes/break_sharing | 8.64 ± 0.18 μs | 9.07 ± 0.16 μs | 0.952 |
| utils/count_constant_nodes/preserve_sharing | 0.0893 ± 0.0031 ms | 0.0853 ± 0.0035 ms | 1.05 |
| utils/count_depth/break_sharing | 14.3 ± 0.37 μs | 9.52 ± 0.2 μs | 1.5 |
| utils/count_nodes/break_sharing | 9.17 ± 0.42 μs | 8.22 ± 0.21 μs | 1.12 |
| utils/count_nodes/preserve_sharing | 0.0856 ± 0.0029 ms | 0.0849 ± 0.0032 ms | 1.01 |
| utils/get_set_constants!/break_sharing | 0.0343 ± 0.0021 ms | 0.0332 ± 0.0021 ms | 1.04 |
| utils/get_set_constants!/preserve_sharing | 0.175 ± 0.005 ms | 0.175 ± 0.0053 ms | 1 |
| utils/get_set_constants_parametric | 0.0433 ± 0.002 ms | 0.0439 ± 0.0018 ms | 0.988 |
| utils/has_constants/break_sharing | 4.28 ± 0.12 μs | 4.1 ± 0.13 μs | 1.04 |
| utils/has_operators/break_sharing | 2.25 ± 0.052 μs | 2.03 ± 0.044 μs | 1.11 |
| utils/hash/break_sharing | 24 ± 0.75 μs | 22.8 ± 0.6 μs | 1.05 |
| utils/hash/preserve_sharing | 0.0981 ± 0.0031 ms | 0.0964 ± 0.0032 ms | 1.02 |
| utils/index_constant_nodes/break_sharing | 25.7 ± 1.1 μs | 25 ± 0.87 μs | 1.03 |
| utils/index_constant_nodes/preserve_sharing | 0.0992 ± 0.0032 ms | 0.0978 ± 0.0036 ms | 1.01 |
| utils/is_constant/break_sharing | 3.89 ± 0.13 μs | 4.42 ± 0.11 μs | 0.881 |
| utils/simplify_tree/break_sharing | 0.169 ± 0.003 ms | 0.166 ± 0.003 ms | 1.02 |
| utils/simplify_tree/preserve_sharing | 0.225 ± 0.0043 ms | 0.215 ± 0.0046 ms | 1.04 |
| utils/string_tree/break_sharing | 0.451 ± 0.013 ms | 0.451 ± 0.014 ms | 1 |
| utils/string_tree/preserve_sharing | 0.547 ± 0.016 ms | 0.559 ± 0.018 ms | 0.979 |
| time_to_load | 0.229 ± 0.0053 s | 0.226 ± 0.004 s | 1.01 |

coveralls commented Feb 25, 2024

Pull Request Test Coverage Report for Build 8042273246

Details

  • 135 of 137 (98.54%) changed or added relevant lines in 3 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.3%) to 94.965%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| ext/DynamicExpressionsCUDAExt.jl | 78 | 80 | 97.5% |

Totals
Change from base Build 7996123220: 0.3%
Covered Lines: 1754
Relevant Lines: 1847

💛 - Coveralls

coveralls commented Dec 16, 2024

Pull Request Test Coverage Report for Build 12348903122

Details

  • 129 of 132 (97.73%) changed or added relevant lines in 3 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.1%) to 95.637%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| ext/DynamicExpressionsCUDAExt.jl | 72 | 75 | 96.0% |

Totals
Change from base Build 12322890369: 0.1%
Covered Lines: 2674
Relevant Lines: 2796

