Enzyme: fix propagation of runtime activity #534

Merged
merged 1 commit into from
Oct 4, 2024

Conversation

wsmoses
Collaborator

@wsmoses wsmoses commented Sep 29, 2024

Required for Chmy when differentiating a kernel that has already been differentiated.

Contributor

Benchmark Results

| Benchmark | main | e679d50... | main/e679d50b98b05b... |
|---|---|---|---|
| saxpy/default/Float16/1024 | 2.78 ± 0.2 μs | 2.79 ± 0.19 μs | 0.994 |
| saxpy/default/Float16/1048576 | 2.08 ± 0.0064 ms | 2.08 ± 0.0046 ms | 1 |
| saxpy/default/Float16/16384 | 0.0328 ± 0.00014 ms | 0.0328 ± 0.00014 ms | 1 |
| saxpy/default/Float16/2048 | 5.21 ± 0.041 μs | 5.22 ± 0.038 μs | 0.997 |
| saxpy/default/Float16/256 | 1 ± 0.11 μs | 0.963 ± 0.074 μs | 1.04 |
| saxpy/default/Float16/262144 | 0.524 ± 0.0095 ms | 0.524 ± 0.0095 ms | 1 |
| saxpy/default/Float16/32768 | 0.065 ± 0.00017 ms | 0.0651 ± 0.00018 ms | 1 |
| saxpy/default/Float16/4096 | 10.1 ± 0.05 μs | 10.1 ± 0.06 μs | 0.998 |
| saxpy/default/Float16/512 | 1.57 ± 0.16 μs | 1.57 ± 0.05 μs | 0.994 |
| saxpy/default/Float16/64 | 0.622 ± 0.016 μs | 0.618 ± 0.016 μs | 1.01 |
| saxpy/default/Float16/65536 | 0.129 ± 0.00033 ms | 0.129 ± 0.00034 ms | 1 |
| saxpy/default/Float32/1024 | 1.02 ± 0.012 μs | 1.03 ± 0.016 μs | 0.993 |
| saxpy/default/Float32/1048576 | 0.963 ± 0.0075 ms | 0.966 ± 0.0081 ms | 0.997 |
| saxpy/default/Float32/16384 | 15.4 ± 0.12 μs | 15.5 ± 0.13 μs | 0.997 |
| saxpy/default/Float32/2048 | 1.71 ± 0.019 μs | 1.74 ± 0.024 μs | 0.986 |
| saxpy/default/Float32/256 | 0.532 ± 0.12 μs | 0.53 ± 0.13 μs | 1 |
| saxpy/default/Float32/262144 | 0.238 ± 0.0093 ms | 0.238 ± 0.0098 ms | 0.998 |
| saxpy/default/Float32/32768 | 30.3 ± 0.15 μs | 30.4 ± 0.18 μs | 0.999 |
| saxpy/default/Float32/4096 | 3.03 ± 0.024 μs | 3.02 ± 0.025 μs | 1 |
| saxpy/default/Float32/512 | 0.695 ± 0.11 μs | 0.696 ± 0.12 μs | 1 |
| saxpy/default/Float32/64 | 0.427 ± 0.0058 μs | 0.414 ± 0.0048 μs | 1.03 |
| saxpy/default/Float32/65536 | 0.0602 ± 0.00039 ms | 0.0601 ± 0.00028 ms | 1 |
| saxpy/default/Float64/1024 | 1.06 ± 0.018 μs | 1.06 ± 0.02 μs | 1 |
| saxpy/default/Float64/1048576 | 1.02 ± 0.028 ms | 1.02 ± 0.024 ms | 0.997 |
| saxpy/default/Float64/16384 | 15.8 ± 0.21 μs | 15.8 ± 0.28 μs | 0.997 |
| saxpy/default/Float64/2048 | 1.75 ± 0.023 μs | 1.76 ± 0.029 μs | 0.995 |
| saxpy/default/Float64/256 | 0.526 ± 0.0087 μs | 0.515 ± 0.012 μs | 1.02 |
| saxpy/default/Float64/262144 | 0.243 ± 0.01 ms | 0.243 ± 0.01 ms | 0.999 |
| saxpy/default/Float64/32768 | 30.9 ± 0.61 μs | 31 ± 0.43 μs | 0.997 |
| saxpy/default/Float64/4096 | 3.06 ± 0.045 μs | 3.05 ± 0.041 μs | 1 |
| saxpy/default/Float64/512 | 0.7 ± 0.11 μs | 0.7 ± 0.11 μs | 0.999 |
| saxpy/default/Float64/64 | 0.397 ± 0.0066 μs | 0.393 ± 0.0081 μs | 1.01 |
| saxpy/default/Float64/65536 | 0.0626 ± 0.0022 ms | 0.0613 ± 0.00077 ms | 1.02 |
| saxpy/static workgroup=(1024,)/Float16/1024 | 2.12 ± 0.21 μs | 2.11 ± 0.21 μs | 1 |
| saxpy/static workgroup=(1024,)/Float16/1048576 | 0.16 ± 0.0098 ms | 0.17 ± 0.017 ms | 0.945 |
| saxpy/static workgroup=(1024,)/Float16/16384 | 4.36 ± 0.24 μs | 4.35 ± 0.22 μs | 1 |
| saxpy/static workgroup=(1024,)/Float16/2048 | 2.15 ± 0.23 μs | 2.13 ± 0.22 μs | 1.01 |
| saxpy/static workgroup=(1024,)/Float16/256 | 2.66 ± 0.037 μs | 2.63 ± 0.037 μs | 1.01 |
| saxpy/static workgroup=(1024,)/Float16/262144 | 0.043 ± 0.0026 ms | 0.0437 ± 0.0037 ms | 0.985 |
| saxpy/static workgroup=(1024,)/Float16/32768 | 6.6 ± 0.28 μs | 6.62 ± 0.25 μs | 0.996 |
| saxpy/static workgroup=(1024,)/Float16/4096 | 2.43 ± 0.034 μs | 2.44 ± 0.033 μs | 0.998 |
| saxpy/static workgroup=(1024,)/Float16/512 | 3.18 ± 0.061 μs | 3.14 ± 0.086 μs | 1.01 |
| saxpy/static workgroup=(1024,)/Float16/64 | 2.27 ± 0.021 μs | 2.24 ± 0.022 μs | 1.01 |
| saxpy/static workgroup=(1024,)/Float16/65536 | 12.7 ± 0.67 μs | 12.5 ± 0.44 μs | 1.02 |
| saxpy/static workgroup=(1024,)/Float32/1024 | 1.97 ± 0.026 μs | 1.96 ± 0.025 μs | 1 |
| saxpy/static workgroup=(1024,)/Float32/1048576 | 0.268 ± 0.025 ms | 0.264 ± 0.03 ms | 1.02 |
| saxpy/static workgroup=(1024,)/Float32/16384 | 4.12 ± 0.93 μs | 4.12 ± 0.9 μs | 1 |
| saxpy/static workgroup=(1024,)/Float32/2048 | 2.31 ± 0.22 μs | 2.28 ± 0.22 μs | 1.01 |
| saxpy/static workgroup=(1024,)/Float32/256 | 2.65 ± 0.42 μs | 2.65 ± 0.45 μs | 1 |
| saxpy/static workgroup=(1024,)/Float32/262144 | 0.0654 ± 0.0059 ms | 0.0594 ± 0.0083 ms | 1.1 |
| saxpy/static workgroup=(1024,)/Float32/32768 | 7.08 ± 0.38 μs | 7.16 ± 0.57 μs | 0.988 |
| saxpy/static workgroup=(1024,)/Float32/4096 | 2.59 ± 0.2 μs | 2.57 ± 0.2 μs | 1.01 |
| saxpy/static workgroup=(1024,)/Float32/512 | 2.48 ± 0.23 μs | 2.47 ± 0.22 μs | 1.01 |
| saxpy/static workgroup=(1024,)/Float32/64 | 2.43 ± 0.051 μs | 2.42 ± 0.052 μs | 1 |
| saxpy/static workgroup=(1024,)/Float32/65536 | 16.8 ± 1.4 μs | 16.5 ± 2.8 μs | 1.02 |
| saxpy/static workgroup=(1024,)/Float64/1024 | 2.06 ± 0.028 μs | 2.06 ± 0.029 μs | 1 |
| saxpy/static workgroup=(1024,)/Float64/1048576 | 0.594 ± 0.065 ms | 0.578 ± 0.071 ms | 1.03 |
| saxpy/static workgroup=(1024,)/Float64/16384 | 7.06 ± 1.3 μs | 7.2 ± 1.4 μs | 0.981 |
| saxpy/static workgroup=(1024,)/Float64/2048 | 2.54 ± 0.26 μs | 2.54 ± 0.25 μs | 1 |
| saxpy/static workgroup=(1024,)/Float64/256 | 2.41 ± 0.055 μs | 2.41 ± 0.055 μs | 0.998 |
| saxpy/static workgroup=(1024,)/Float64/262144 | 0.108 ± 0.013 ms | 0.127 ± 0.013 ms | 0.849 |
| saxpy/static workgroup=(1024,)/Float64/32768 | 16.9 ± 1.7 μs | 16.5 ± 2.4 μs | 1.02 |
| saxpy/static workgroup=(1024,)/Float64/4096 | 3.02 ± 0.35 μs | 2.99 ± 0.35 μs | 1.01 |
| saxpy/static workgroup=(1024,)/Float64/512 | 2.39 ± 0.041 μs | 2.41 ± 0.046 μs | 0.994 |
| saxpy/static workgroup=(1024,)/Float64/64 | 2.37 ± 0.077 μs | 2.38 ± 0.082 μs | 0.995 |
| saxpy/static workgroup=(1024,)/Float64/65536 | 0.0318 ± 0.0045 ms | 0.0337 ± 0.0041 ms | 0.944 |
| time_to_load | 0.313 ± 0.0023 s | 0.316 ± 0.0022 s | 0.993 |

Benchmark Plots

A plot of the benchmark results has been uploaded as an artifact to the workflow run for this PR.
Go to "Actions"->"Benchmark a pull request"->[the most recent run]->"Artifacts" (at the bottom).

@vchuravy
Member

We have to figure out the compat issue. Currently there is no real reason for KA to stop supporting Julia 1.6 besides the Enzyme extension.

@wsmoses
Collaborator Author

wsmoses commented Sep 30, 2024

I think the compat issue was introduced earlier (presumably by me in the Enzyme 0.13 PR). We should definitely fix it, but I presume this diff doesn't change anything on that front.

@vchuravy vchuravy merged commit 27ded01 into main Oct 4, 2024
17 of 36 checks passed
@vchuravy vchuravy deleted the rta branch October 4, 2024 14:32