[Feature] CROSSQ compatibility with compile #2554

Merged
merged 43 commits on Dec 14, 2024


[ghstack-poisoned]

pytorch-bot bot commented Nov 12, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/2554

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 3 New Failures, 19 Unrelated Failures

As of commit 4097d4c with merge base e3c3047:

NEW FAILURES - The following jobs have failed:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

vmoens added a commit that referenced this pull request Nov 12, 2024
ghstack-source-id: 223f1c7d4ffbd2086655391083875022035da567
Pull Request resolved: #2554
facebook-github-bot added the CLA Signed label Nov 12, 2024
vmoens added a commit that referenced this pull request Nov 12, 2024
ghstack-source-id: 2bf32209bad331003f840cb4ac1f5b17b993c1ae
Pull Request resolved: #2554
vmoens added a commit that referenced this pull request Nov 12, 2024
ghstack-source-id: f2bff5f750f91f91ae303461001e63dad253e336
Pull Request resolved: #2554
vmoens added the enhancement and performance labels Nov 12, 2024
vmoens added a commit that referenced this pull request Nov 12, 2024
ghstack-source-id: 5f9e72fe8bb64a2c55647b9927ce6b35d2634c04
Pull Request resolved: #2554
vmoens added a commit that referenced this pull request Nov 12, 2024
ghstack-source-id: 23ca95187a3e365e0babc37cd8ec2bcb9fb788ee
Pull Request resolved: #2554
vmoens added a commit that referenced this pull request Nov 12, 2024
ghstack-source-id: f2060cb644945bb823b68becc5f437ebe5949c86
Pull Request resolved: #2554
vmoens added a commit that referenced this pull request Nov 12, 2024
ghstack-source-id: c948c4ee1925148c95bd8faee07d2e2d2b86d069
Pull Request resolved: #2554
vmoens added a commit that referenced this pull request Nov 12, 2024
ghstack-source-id: bd27858d0bd8b1c426ce3c65c9ddbf1d4b2b295c
Pull Request resolved: #2554

github-actions bot commented Dec 13, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 149. Improved: $\large\color{#35bf28}18$. Worsened: $\large\color{#d91a1a}6$.

Name | Max | Mean | Ops | Ops on Repo HEAD | Change
test_simple 0.4337s 0.4259s 2.3478 Ops/s 2.2837 Ops/s $\color{#35bf28}+2.81\%$
test_transformed 0.6067s 0.6052s 1.6523 Ops/s 1.6232 Ops/s $\color{#35bf28}+1.80\%$
test_serial 1.3483s 1.3443s 0.7439 Ops/s 0.7343 Ops/s $\color{#35bf28}+1.31\%$
test_parallel 1.2932s 1.2851s 0.7781 Ops/s 0.7646 Ops/s $\color{#35bf28}+1.78\%$
test_step_mdp_speed[True-True-True-True-True] 0.2525ms 30.4009μs 32.8937 KOps/s 33.6226 KOps/s $\color{#d91a1a}-2.17\%$
test_step_mdp_speed[True-True-True-True-False] 44.6640μs 17.5807μs 56.8807 KOps/s 57.2343 KOps/s $\color{#d91a1a}-0.62\%$
test_step_mdp_speed[True-True-True-False-True] 47.2990μs 16.7177μs 59.8167 KOps/s 59.8560 KOps/s $\color{#d91a1a}-0.07\%$
test_step_mdp_speed[True-True-True-False-False] 49.1230μs 10.0919μs 99.0893 KOps/s 100.9279 KOps/s $\color{#d91a1a}-1.82\%$
test_step_mdp_speed[True-True-False-True-True] 0.7008ms 32.2164μs 31.0401 KOps/s 31.6259 KOps/s $\color{#d91a1a}-1.85\%$
test_step_mdp_speed[True-True-False-True-False] 47.4780μs 19.5523μs 51.1448 KOps/s 51.8067 KOps/s $\color{#d91a1a}-1.28\%$
test_step_mdp_speed[True-True-False-False-True] 78.8710μs 18.8181μs 53.1404 KOps/s 53.7177 KOps/s $\color{#d91a1a}-1.07\%$
test_step_mdp_speed[True-True-False-False-False] 39.0940μs 11.9160μs 83.9207 KOps/s 85.1856 KOps/s $\color{#d91a1a}-1.48\%$
test_step_mdp_speed[True-False-True-True-True] 76.4830μs 34.5890μs 28.9109 KOps/s 29.6810 KOps/s $\color{#d91a1a}-2.59\%$
test_step_mdp_speed[True-False-True-True-False] 58.4400μs 21.4654μs 46.5865 KOps/s 46.8733 KOps/s $\color{#d91a1a}-0.61\%$
test_step_mdp_speed[True-False-True-False-True] 54.8130μs 18.7915μs 53.2156 KOps/s 54.0957 KOps/s $\color{#d91a1a}-1.63\%$
test_step_mdp_speed[True-False-True-False-False] 52.7290μs 11.8384μs 84.4712 KOps/s 85.1939 KOps/s $\color{#d91a1a}-0.85\%$
test_step_mdp_speed[True-False-False-True-True] 70.3920μs 35.9188μs 27.8406 KOps/s 28.0168 KOps/s $\color{#d91a1a}-0.63\%$
test_step_mdp_speed[True-False-False-True-False] 62.4570μs 23.1765μs 43.1471 KOps/s 43.5368 KOps/s $\color{#d91a1a}-0.90\%$
test_step_mdp_speed[True-False-False-False-True] 97.6000μs 20.5054μs 48.7676 KOps/s 49.1222 KOps/s $\color{#d91a1a}-0.72\%$
test_step_mdp_speed[True-False-False-False-False] 62.6900μs 13.5409μs 73.8502 KOps/s 73.5540 KOps/s $\color{#35bf28}+0.40\%$
test_step_mdp_speed[False-True-True-True-True] 0.1217ms 34.3219μs 29.1359 KOps/s 29.7845 KOps/s $\color{#d91a1a}-2.18\%$
test_step_mdp_speed[False-True-True-True-False] 52.1680μs 21.4589μs 46.6007 KOps/s 46.6950 KOps/s $\color{#d91a1a}-0.20\%$
test_step_mdp_speed[False-True-True-False-True] 60.1820μs 21.4559μs 46.6072 KOps/s 47.6328 KOps/s $\color{#d91a1a}-2.15\%$
test_step_mdp_speed[False-True-True-False-False] 39.8350μs 13.0129μs 76.8468 KOps/s 76.8013 KOps/s $\color{#35bf28}+0.06\%$
test_step_mdp_speed[False-True-False-True-True] 93.4880μs 36.2511μs 27.5853 KOps/s 28.2195 KOps/s $\color{#d91a1a}-2.25\%$
test_step_mdp_speed[False-True-False-True-False] 0.5088ms 23.2850μs 42.9462 KOps/s 43.6503 KOps/s $\color{#d91a1a}-1.61\%$
test_step_mdp_speed[False-True-False-False-True] 2.6013ms 23.3901μs 42.7531 KOps/s 43.5759 KOps/s $\color{#d91a1a}-1.89\%$
test_step_mdp_speed[False-True-False-False-False] 49.1220μs 14.7675μs 67.7163 KOps/s 67.9979 KOps/s $\color{#d91a1a}-0.41\%$
test_step_mdp_speed[False-False-True-True-True] 72.5360μs 37.9412μs 26.3565 KOps/s 27.0783 KOps/s $\color{#d91a1a}-2.67\%$
test_step_mdp_speed[False-False-True-True-False] 0.2860ms 26.6261μs 37.5571 KOps/s 40.2941 KOps/s $\textbf{\color{#d91a1a}-6.79\%}$
test_step_mdp_speed[False-False-True-False-True] 72.7260μs 23.2435μs 43.0228 KOps/s 43.5925 KOps/s $\color{#d91a1a}-1.31\%$
test_step_mdp_speed[False-False-True-False-False] 63.8600μs 14.5888μs 68.5458 KOps/s 68.1220 KOps/s $\color{#35bf28}+0.62\%$
test_step_mdp_speed[False-False-False-True-True] 0.1041ms 39.2106μs 25.5033 KOps/s 25.9447 KOps/s $\color{#d91a1a}-1.70\%$
test_step_mdp_speed[False-False-False-True-False] 88.1120μs 26.5649μs 37.6436 KOps/s 38.2975 KOps/s $\color{#d91a1a}-1.71\%$
test_step_mdp_speed[False-False-False-False-True] 65.2220μs 24.6925μs 40.4982 KOps/s 41.5976 KOps/s $\color{#d91a1a}-2.64\%$
test_step_mdp_speed[False-False-False-False-False] 44.8740μs 16.3349μs 61.2188 KOps/s 61.2323 KOps/s $\color{#d91a1a}-0.02\%$
test_values[generalized_advantage_estimate-True-True] 12.8746ms 9.8349ms 101.6788 Ops/s 105.1970 Ops/s $\color{#d91a1a}-3.34\%$
test_values[vec_generalized_advantage_estimate-True-True] 36.7663ms 33.4173ms 29.9246 Ops/s 29.9970 Ops/s $\color{#d91a1a}-0.24\%$
test_values[td0_return_estimate-False-False] 0.2351ms 0.1951ms 5.1266 KOps/s 5.6339 KOps/s $\textbf{\color{#d91a1a}-9.00\%}$
test_values[td1_return_estimate-False-False] 26.5699ms 24.0456ms 41.5877 Ops/s 41.4115 Ops/s $\color{#35bf28}+0.43\%$
test_values[vec_td1_return_estimate-False-False] 35.3315ms 33.4590ms 29.8873 Ops/s 29.8571 Ops/s $\color{#35bf28}+0.10\%$
test_values[td_lambda_return_estimate-True-False] 37.3030ms 34.5551ms 28.9393 Ops/s 28.5867 Ops/s $\color{#35bf28}+1.23\%$
test_values[vec_td_lambda_return_estimate-True-False] 36.5204ms 33.4190ms 29.9231 Ops/s 29.9112 Ops/s $\color{#35bf28}+0.04\%$
test_gae_speed[generalized_advantage_estimate-False-1-512] 8.4903ms 8.3632ms 119.5708 Ops/s 119.2695 Ops/s $\color{#35bf28}+0.25\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 2.2213ms 1.9595ms 510.3226 Ops/s 540.6720 Ops/s $\textbf{\color{#d91a1a}-5.61\%}$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.4930ms 0.3597ms 2.7799 KOps/s 2.8052 KOps/s $\color{#d91a1a}-0.90\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 41.9641ms 40.2819ms 24.8251 Ops/s 22.9512 Ops/s $\textbf{\color{#35bf28}+8.16\%}$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 4.0706ms 3.0330ms 329.7029 Ops/s 321.5080 Ops/s $\color{#35bf28}+2.55\%$
test_dqn_speed[False-None] 5.8816ms 1.3614ms 734.5518 Ops/s 705.3775 Ops/s $\color{#35bf28}+4.14\%$
test_dqn_speed[False-backward] 1.9195ms 1.8282ms 546.9830 Ops/s 537.1512 Ops/s $\color{#35bf28}+1.83\%$
test_dqn_speed[True-None] 0.6969ms 0.4610ms 2.1694 KOps/s 2.0998 KOps/s $\color{#35bf28}+3.32\%$
test_dqn_speed[True-backward] 1.0320ms 0.8821ms 1.1337 KOps/s 1.0635 KOps/s $\textbf{\color{#35bf28}+6.60\%}$
test_dqn_speed[reduce-overhead-None] 0.7585ms 0.4616ms 2.1663 KOps/s 2.1113 KOps/s $\color{#35bf28}+2.60\%$
test_dqn_speed[reduce-overhead-backward] 0.9326ms 0.8830ms 1.1324 KOps/s 1.1207 KOps/s $\color{#35bf28}+1.04\%$
test_ddpg_speed[False-None] 4.4887ms 2.8363ms 352.5773 Ops/s 350.3643 Ops/s $\color{#35bf28}+0.63\%$
test_ddpg_speed[False-backward] 4.1201ms 3.9762ms 251.4991 Ops/s 246.2713 Ops/s $\color{#35bf28}+2.12\%$
test_ddpg_speed[True-None] 1.3587ms 0.9862ms 1.0140 KOps/s 964.0202 Ops/s $\textbf{\color{#35bf28}+5.19\%}$
test_ddpg_speed[True-backward] 1.9474ms 1.8656ms 536.0282 Ops/s 517.9822 Ops/s $\color{#35bf28}+3.48\%$
test_ddpg_speed[reduce-overhead-None] 1.3300ms 0.9851ms 1.0151 KOps/s 952.1633 Ops/s $\textbf{\color{#35bf28}+6.61\%}$
test_ddpg_speed[reduce-overhead-backward] 2.0381ms 1.8780ms 532.4690 Ops/s 521.9410 Ops/s $\color{#35bf28}+2.02\%$
test_sac_speed[False-None] 8.4287ms 7.8414ms 127.5284 Ops/s 123.6678 Ops/s $\color{#35bf28}+3.12\%$
test_sac_speed[False-backward] 10.9925ms 10.6000ms 94.3395 Ops/s 92.2933 Ops/s $\color{#35bf28}+2.22\%$
test_sac_speed[True-None] 2.2710ms 1.8199ms 549.4700 Ops/s 528.8859 Ops/s $\color{#35bf28}+3.89\%$
test_sac_speed[True-backward] 3.7264ms 3.5304ms 283.2527 Ops/s 282.0210 Ops/s $\color{#35bf28}+0.44\%$
test_sac_speed[reduce-overhead-None] 2.0771ms 1.8107ms 552.2663 Ops/s 531.4599 Ops/s $\color{#35bf28}+3.91\%$
test_sac_speed[reduce-overhead-backward] 3.8077ms 3.5247ms 283.7091 Ops/s 282.8451 Ops/s $\color{#35bf28}+0.31\%$
test_redq_speed[False-None] 14.4981ms 12.6346ms 79.1479 Ops/s 76.4023 Ops/s $\color{#35bf28}+3.59\%$
test_redq_speed[False-backward] 24.3209ms 21.9578ms 45.5420 Ops/s 43.3316 Ops/s $\textbf{\color{#35bf28}+5.10\%}$
test_redq_speed[True-None] 5.5026ms 4.5322ms 220.6414 Ops/s 211.8905 Ops/s $\color{#35bf28}+4.13\%$
test_redq_speed[True-backward] 12.2903ms 11.9789ms 83.4798 Ops/s 83.2785 Ops/s $\color{#35bf28}+0.24\%$
test_redq_speed[reduce-overhead-None] 5.1208ms 4.5055ms 221.9493 Ops/s 209.7288 Ops/s $\textbf{\color{#35bf28}+5.83\%}$
test_redq_speed[reduce-overhead-backward] 13.3697ms 11.8760ms 84.2038 Ops/s 82.1415 Ops/s $\color{#35bf28}+2.51\%$
test_redq_deprec_speed[False-None] 13.6041ms 12.6274ms 79.1929 Ops/s 75.7387 Ops/s $\color{#35bf28}+4.56\%$
test_redq_deprec_speed[False-backward] 21.1380ms 18.3852ms 54.3915 Ops/s 52.5548 Ops/s $\color{#35bf28}+3.49\%$
test_redq_deprec_speed[True-None] 4.2110ms 3.5477ms 281.8751 Ops/s 273.8397 Ops/s $\color{#35bf28}+2.93\%$
test_redq_deprec_speed[True-backward] 8.5375ms 7.9964ms 125.0566 Ops/s 123.9487 Ops/s $\color{#35bf28}+0.89\%$
test_redq_deprec_speed[reduce-overhead-None] 4.1937ms 3.5559ms 281.2239 Ops/s 273.8788 Ops/s $\color{#35bf28}+2.68\%$
test_redq_deprec_speed[reduce-overhead-backward] 8.2958ms 7.9983ms 125.0261 Ops/s 123.6359 Ops/s $\color{#35bf28}+1.12\%$
test_td3_speed[False-None] 8.2490ms 7.8319ms 127.6832 Ops/s 123.4462 Ops/s $\color{#35bf28}+3.43\%$
test_td3_speed[False-backward] 10.6173ms 10.2304ms 97.7482 Ops/s 96.1763 Ops/s $\color{#35bf28}+1.63\%$
test_td3_speed[True-None] 1.8768ms 1.6947ms 590.0790 Ops/s 559.5773 Ops/s $\textbf{\color{#35bf28}+5.45\%}$
test_td3_speed[True-backward] 3.3553ms 3.2990ms 303.1224 Ops/s 300.2305 Ops/s $\color{#35bf28}+0.96\%$
test_td3_speed[reduce-overhead-None] 1.8128ms 1.6915ms 591.1933 Ops/s 561.6766 Ops/s $\textbf{\color{#35bf28}+5.26\%}$
test_td3_speed[reduce-overhead-backward] 3.4157ms 3.3163ms 301.5453 Ops/s 298.5176 Ops/s $\color{#35bf28}+1.01\%$
test_cql_speed[False-None] 38.8132ms 35.7669ms 27.9588 Ops/s 27.2202 Ops/s $\color{#35bf28}+2.71\%$
test_cql_speed[False-backward] 49.3344ms 46.2618ms 21.6161 Ops/s 21.4983 Ops/s $\color{#35bf28}+0.55\%$
test_cql_speed[True-None] 16.6340ms 15.4960ms 64.5328 Ops/s 63.1865 Ops/s $\color{#35bf28}+2.13\%$
test_cql_speed[True-backward] 23.1229ms 22.3131ms 44.8168 Ops/s 44.7723 Ops/s $\color{#35bf28}+0.10\%$
test_cql_speed[reduce-overhead-None] 15.8714ms 15.4276ms 64.8190 Ops/s 61.7076 Ops/s $\textbf{\color{#35bf28}+5.04\%}$
test_cql_speed[reduce-overhead-backward] 23.1799ms 22.3138ms 44.8152 Ops/s 45.0426 Ops/s $\color{#d91a1a}-0.50\%$
test_a2c_speed[False-None] 8.0117ms 7.1361ms 140.1331 Ops/s 138.6721 Ops/s $\color{#35bf28}+1.05\%$
test_a2c_speed[False-backward] 15.6702ms 14.2028ms 70.4085 Ops/s 70.8536 Ops/s $\color{#d91a1a}-0.63\%$
test_a2c_speed[True-None] 4.9437ms 4.1888ms 238.7334 Ops/s 230.2351 Ops/s $\color{#35bf28}+3.69\%$
test_a2c_speed[True-backward] 11.3561ms 10.6015ms 94.3265 Ops/s 93.3242 Ops/s $\color{#35bf28}+1.07\%$
test_a2c_speed[reduce-overhead-None] 4.7174ms 4.2078ms 237.6560 Ops/s 229.9070 Ops/s $\color{#35bf28}+3.37\%$
test_a2c_speed[reduce-overhead-backward] 11.3941ms 10.9692ms 91.1645 Ops/s 93.8109 Ops/s $\color{#d91a1a}-2.82\%$
test_ppo_speed[False-None] 8.0367ms 7.5139ms 133.0875 Ops/s 132.6579 Ops/s $\color{#35bf28}+0.32\%$
test_ppo_speed[False-backward] 15.2588ms 14.7467ms 67.8118 Ops/s 67.2954 Ops/s $\color{#35bf28}+0.77\%$
test_ppo_speed[True-None] 3.9878ms 3.6743ms 272.1589 Ops/s 262.5272 Ops/s $\color{#35bf28}+3.67\%$
test_ppo_speed[True-backward] 10.5418ms 9.5283ms 104.9500 Ops/s 104.5547 Ops/s $\color{#35bf28}+0.38\%$
test_ppo_speed[reduce-overhead-None] 4.4003ms 3.6512ms 273.8796 Ops/s 260.6202 Ops/s $\textbf{\color{#35bf28}+5.09\%}$
test_ppo_speed[reduce-overhead-backward] 9.8904ms 9.4572ms 105.7396 Ops/s 104.1605 Ops/s $\color{#35bf28}+1.52\%$
test_reinforce_speed[False-None] 7.8398ms 6.4963ms 153.9340 Ops/s 152.2633 Ops/s $\color{#35bf28}+1.10\%$
test_reinforce_speed[False-backward] 11.0026ms 9.7604ms 102.4549 Ops/s 101.2804 Ops/s $\color{#35bf28}+1.16\%$
test_reinforce_speed[True-None] 3.2073ms 2.6101ms 383.1341 Ops/s 361.6470 Ops/s $\textbf{\color{#35bf28}+5.94\%}$
test_reinforce_speed[True-backward] 9.2331ms 8.5207ms 117.3614 Ops/s 116.4739 Ops/s $\color{#35bf28}+0.76\%$
test_reinforce_speed[reduce-overhead-None] 3.1095ms 2.6222ms 381.3527 Ops/s 362.3793 Ops/s $\textbf{\color{#35bf28}+5.24\%}$
test_reinforce_speed[reduce-overhead-backward] 9.3402ms 8.4833ms 117.8782 Ops/s 115.8745 Ops/s $\color{#35bf28}+1.73\%$
test_iql_speed[False-None] 33.8853ms 31.8863ms 31.3615 Ops/s 30.7601 Ops/s $\color{#35bf28}+1.96\%$
test_iql_speed[False-backward] 53.1637ms 45.2816ms 22.0840 Ops/s 21.4574 Ops/s $\color{#35bf28}+2.92\%$
test_iql_speed[True-None] 11.2131ms 10.4792ms 95.4274 Ops/s 88.4864 Ops/s $\textbf{\color{#35bf28}+7.84\%}$
test_iql_speed[True-backward] 22.2086ms 21.4084ms 46.7106 Ops/s 44.9644 Ops/s $\color{#35bf28}+3.88\%$
test_iql_speed[reduce-overhead-None] 11.8648ms 10.5363ms 94.9102 Ops/s 89.3061 Ops/s $\textbf{\color{#35bf28}+6.28\%}$
test_iql_speed[reduce-overhead-backward] 23.5878ms 21.8541ms 45.7580 Ops/s 46.4079 Ops/s $\color{#d91a1a}-1.40\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 5.4216ms 5.0217ms 199.1354 Ops/s 199.9268 Ops/s $\color{#d91a1a}-0.40\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.8694ms 0.5131ms 1.9489 KOps/s 1.9447 KOps/s $\color{#35bf28}+0.21\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.8294ms 0.4881ms 2.0490 KOps/s 2.0478 KOps/s $\color{#35bf28}+0.06\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.7684ms 4.8654ms 205.5310 Ops/s 209.1624 Ops/s $\color{#d91a1a}-1.74\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 0.3711s 0.7917ms 1.2632 KOps/s 1.9880 KOps/s $\textbf{\color{#d91a1a}-36.46\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.8048ms 0.4888ms 2.0460 KOps/s 2.0769 KOps/s $\color{#d91a1a}-1.49\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 2.5143ms 1.6320ms 612.7446 Ops/s 599.4978 Ops/s $\color{#35bf28}+2.21\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 2.4644ms 1.5826ms 631.8852 Ops/s 627.5665 Ops/s $\color{#35bf28}+0.69\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.8407ms 4.9063ms 203.8213 Ops/s 200.6578 Ops/s $\color{#35bf28}+1.58\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1.4427ms 0.6345ms 1.5760 KOps/s 1.5323 KOps/s $\color{#35bf28}+2.85\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 1.0535ms 0.6184ms 1.6171 KOps/s 1.6110 KOps/s $\color{#35bf28}+0.38\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 4.9639ms 4.7628ms 209.9589 Ops/s 204.8208 Ops/s $\color{#35bf28}+2.51\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 2.2017ms 0.5160ms 1.9380 KOps/s 1.9179 KOps/s $\color{#35bf28}+1.05\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.8035ms 0.4881ms 2.0487 KOps/s 2.0470 KOps/s $\color{#35bf28}+0.08\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 5.1377ms 4.7679ms 209.7381 Ops/s 207.8877 Ops/s $\color{#35bf28}+0.89\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 2.0187ms 0.4961ms 2.0158 KOps/s 1.9735 KOps/s $\color{#35bf28}+2.14\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.7842ms 0.4789ms 2.0879 KOps/s 2.1080 KOps/s $\color{#d91a1a}-0.95\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 5.3962ms 4.8460ms 206.3543 Ops/s 202.0537 Ops/s $\color{#35bf28}+2.13\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1.2479ms 0.6779ms 1.4752 KOps/s 1.5520 KOps/s $\color{#d91a1a}-4.94\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.9256ms 0.6216ms 1.6089 KOps/s 1.5931 KOps/s $\color{#35bf28}+0.99\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 0.4049s 12.2119ms 81.8874 Ops/s 232.7908 Ops/s $\textbf{\color{#d91a1a}-64.82\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 8.8838ms 2.3406ms 427.2338 Ops/s 418.8316 Ops/s $\color{#35bf28}+2.01\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 6.5855ms 1.3457ms 743.0822 Ops/s 774.2786 Ops/s $\color{#d91a1a}-4.03\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 5.5002ms 4.1365ms 241.7505 Ops/s 246.3200 Ops/s $\color{#d91a1a}-1.86\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 4.9245ms 2.2511ms 444.2310 Ops/s 411.7735 Ops/s $\textbf{\color{#35bf28}+7.88\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 7.6089ms 1.4364ms 696.1644 Ops/s 719.6078 Ops/s $\color{#d91a1a}-3.26\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 0.3703s 11.7437ms 85.1518 Ops/s 236.9826 Ops/s $\textbf{\color{#d91a1a}-64.07\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 7.2817ms 2.3720ms 421.5773 Ops/s 418.9173 Ops/s $\color{#35bf28}+0.63\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 1.8494ms 1.2924ms 773.7296 Ops/s 690.3143 Ops/s $\textbf{\color{#35bf28}+12.08\%}$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 11.6056ms 10.8688ms 92.0065 Ops/s 86.0440 Ops/s $\textbf{\color{#35bf28}+6.93\%}$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 15.9401ms 14.8539ms 67.3224 Ops/s 66.2257 Ops/s $\color{#35bf28}+1.66\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 19.8921ms 19.4828ms 51.3272 Ops/s 47.9250 Ops/s $\textbf{\color{#35bf28}+7.10\%}$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 17.1325ms 15.2186ms 65.7092 Ops/s 64.4901 Ops/s $\color{#35bf28}+1.89\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 21.3551ms 19.4525ms 51.4072 Ops/s 48.9736 Ops/s $\color{#35bf28}+4.97\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 17.3356ms 16.2907ms 61.3848 Ops/s 60.5817 Ops/s $\color{#35bf28}+1.33\%$
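
For readers checking the numbers, the columns above appear to be related as follows: `Ops` is roughly the reciprocal of `Mean`, and `Change` is the relative difference between `Ops` and `Ops on Repo HEAD`. The bolded rows in the CPU table are exactly those with a change of at least 5% in either direction, and their counts (18 and 6) match the "Improved" and "Worsened" totals, so a 5% threshold is assumed here. This is a minimal sketch inferred from the table itself, not the benchmark bot's actual code; all function names are hypothetical.

```python
import re

# Multipliers for the throughput units that appear in the table.
_OPS_SCALE = {"Ops/s": 1.0, "KOps/s": 1e3}
# Multipliers for the timing units used in the Max and Mean columns.
_TIME_SCALE = {"s": 1.0, "ms": 1e-3, "μs": 1e-6}


def parse_ops(cell: str) -> float:
    """Convert a cell such as '1.0508 KOps/s' to operations per second."""
    value, unit = cell.split()
    return float(value) * _OPS_SCALE[unit]


def parse_seconds(cell: str) -> float:
    """Convert a cell such as '1.3614ms' to seconds."""
    match = re.fullmatch(r"([0-9.]+)\s*(s|ms|μs)", cell.strip())
    if match is None:
        raise ValueError(f"unrecognised timing cell: {cell!r}")
    return float(match.group(1)) * _TIME_SCALE[match.group(2)]


def ops_from_mean(mean_cell: str) -> float:
    """Throughput implied by the mean runtime (one operation per run)."""
    return 1.0 / parse_seconds(mean_cell)


def relative_change(ops_cell: str, head_cell: str) -> float:
    """Percentage change of this PR's throughput relative to repo HEAD."""
    new, base = parse_ops(ops_cell), parse_ops(head_cell)
    return (new - base) / base * 100.0


def classify(change_pct: float, threshold: float = 5.0) -> str:
    """Label a row; the 5% threshold is an assumption inferred from the bold rows."""
    if change_pct >= threshold:
        return "improved"
    if change_pct <= -threshold:
        return "worsened"
    return "unchanged"


# test_simple (CPU): 2.3478 Ops/s against 2.2837 Ops/s on HEAD
print(round(relative_change("2.3478 Ops/s", "2.2837 Ops/s"), 2))  # 2.81

# test_rb_populate[...ListStorage-RandomSampler-400] (CPU)
print(classify(relative_change("81.8874 Ops/s", "232.7908 Ops/s")))  # worsened
```

Because the table prints rounded values, recomputing a row from its displayed cells can differ from the listed change in the last digit.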


github-actions bot commented Dec 13, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 149. Improved: $\large\color{#35bf28}16$. Worsened: $\large\color{#d91a1a}11$.

Name | Max | Mean | Ops | Ops on Repo HEAD | Change
test_simple 0.7587s 0.7523s 1.3293 Ops/s 1.3203 Ops/s $\color{#35bf28}+0.68\%$
test_transformed 1.0080s 1.0063s 0.9937 Ops/s 0.9841 Ops/s $\color{#35bf28}+0.97\%$
test_serial 2.1755s 2.1687s 0.4611 Ops/s 0.4624 Ops/s $\color{#d91a1a}-0.28\%$
test_parallel 2.0207s 1.9934s 0.5017 Ops/s 0.5159 Ops/s $\color{#d91a1a}-2.76\%$
test_step_mdp_speed[True-True-True-True-True] 0.1790ms 40.3887μs 24.7594 KOps/s 25.5226 KOps/s $\color{#d91a1a}-2.99\%$
test_step_mdp_speed[True-True-True-True-False] 51.6810μs 23.0477μs 43.3882 KOps/s 43.4379 KOps/s $\color{#d91a1a}-0.11\%$
test_step_mdp_speed[True-True-True-False-True] 47.1000μs 21.9800μs 45.4958 KOps/s 45.9450 KOps/s $\color{#d91a1a}-0.98\%$
test_step_mdp_speed[True-True-True-False-False] 42.3510μs 12.8620μs 77.7485 KOps/s 78.5317 KOps/s $\color{#d91a1a}-1.00\%$
test_step_mdp_speed[True-True-False-True-True] 81.2710μs 42.1576μs 23.7205 KOps/s 23.4992 KOps/s $\color{#35bf28}+0.94\%$
test_step_mdp_speed[True-True-False-True-False] 55.9010μs 24.6919μs 40.4992 KOps/s 40.1605 KOps/s $\color{#35bf28}+0.84\%$
test_step_mdp_speed[True-True-False-False-True] 56.6610μs 23.9005μs 41.8402 KOps/s 40.9375 KOps/s $\color{#35bf28}+2.21\%$
test_step_mdp_speed[True-True-False-False-False] 49.8810μs 14.9361μs 66.9518 KOps/s 66.8183 KOps/s $\color{#35bf28}+0.20\%$
test_step_mdp_speed[True-False-True-True-True] 71.2610μs 44.3926μs 22.5263 KOps/s 22.3666 KOps/s $\color{#35bf28}+0.71\%$
test_step_mdp_speed[True-False-True-True-False] 52.6210μs 27.1861μs 36.7835 KOps/s 36.8108 KOps/s $\color{#d91a1a}-0.07\%$
test_step_mdp_speed[True-False-True-False-True] 54.1610μs 23.7332μs 42.1351 KOps/s 41.0579 KOps/s $\color{#35bf28}+2.62\%$
test_step_mdp_speed[True-False-True-False-False] 36.8500μs 14.9370μs 66.9478 KOps/s 67.0128 KOps/s $\color{#d91a1a}-0.10\%$
test_step_mdp_speed[True-False-False-True-True] 75.6210μs 46.2288μs 21.6315 KOps/s 21.4144 KOps/s $\color{#35bf28}+1.01\%$
test_step_mdp_speed[True-False-False-True-False] 59.9900μs 29.6352μs 33.7436 KOps/s 34.1761 KOps/s $\color{#d91a1a}-1.27\%$
test_step_mdp_speed[True-False-False-False-True] 68.6210μs 26.0150μs 38.4394 KOps/s 38.0420 KOps/s $\color{#35bf28}+1.04\%$
test_step_mdp_speed[True-False-False-False-False] 45.7810μs 17.0368μs 58.6965 KOps/s 58.4227 KOps/s $\color{#35bf28}+0.47\%$
test_step_mdp_speed[False-True-True-True-True] 79.7710μs 44.1883μs 22.6304 KOps/s 22.4351 KOps/s $\color{#35bf28}+0.87\%$
test_step_mdp_speed[False-True-True-True-False] 52.8010μs 27.2903μs 36.6430 KOps/s 36.7749 KOps/s $\color{#d91a1a}-0.36\%$
test_step_mdp_speed[False-True-True-False-True] 60.7210μs 28.1946μs 35.4678 KOps/s 35.6835 KOps/s $\color{#d91a1a}-0.60\%$
test_step_mdp_speed[False-True-True-False-False] 54.3710μs 16.6605μs 60.0222 KOps/s 60.3722 KOps/s $\color{#d91a1a}-0.58\%$
test_step_mdp_speed[False-True-False-True-True] 76.2210μs 46.3382μs 21.5804 KOps/s 20.8374 KOps/s $\color{#35bf28}+3.57\%$
test_step_mdp_speed[False-True-False-True-False] 64.4210μs 29.7012μs 33.6687 KOps/s 33.6925 KOps/s $\color{#d91a1a}-0.07\%$
test_step_mdp_speed[False-True-False-False-True] 3.1827ms 30.4522μs 32.8384 KOps/s 33.5654 KOps/s $\color{#d91a1a}-2.17\%$
test_step_mdp_speed[False-True-False-False-False] 59.6110μs 18.8480μs 53.0559 KOps/s 53.6938 KOps/s $\color{#d91a1a}-1.19\%$
test_step_mdp_speed[False-False-True-True-True] 78.6310μs 49.0647μs 20.3812 KOps/s 20.7693 KOps/s $\color{#d91a1a}-1.87\%$
test_step_mdp_speed[False-False-True-True-False] 58.0700μs 31.8140μs 31.4327 KOps/s 32.1424 KOps/s $\color{#d91a1a}-2.21\%$
test_step_mdp_speed[False-False-True-False-True] 62.7110μs 29.7607μs 33.6014 KOps/s 33.7346 KOps/s $\color{#d91a1a}-0.40\%$
test_step_mdp_speed[False-False-True-False-False] 49.2010μs 19.0154μs 52.5890 KOps/s 53.2429 KOps/s $\color{#d91a1a}-1.23\%$
test_step_mdp_speed[False-False-False-True-True] 87.7220μs 49.9854μs 20.0058 KOps/s 19.7114 KOps/s $\color{#35bf28}+1.49\%$
test_step_mdp_speed[False-False-False-True-False] 62.8510μs 33.7150μs 29.6604 KOps/s 29.8981 KOps/s $\color{#d91a1a}-0.80\%$
test_step_mdp_speed[False-False-False-False-True] 70.6810μs 31.8498μs 31.3974 KOps/s 31.8203 KOps/s $\color{#d91a1a}-1.33\%$
test_step_mdp_speed[False-False-False-False-False] 0.1237ms 20.1593μs 49.6048 KOps/s 49.0252 KOps/s $\color{#35bf28}+1.18\%$
test_values[generalized_advantage_estimate-True-True] 25.9452ms 25.4218ms 39.3363 Ops/s 37.5646 Ops/s $\color{#35bf28}+4.72\%$
test_values[vec_generalized_advantage_estimate-True-True] 95.7589ms 2.8190ms 354.7390 Ops/s 366.9100 Ops/s $\color{#d91a1a}-3.32\%$
test_values[td0_return_estimate-False-False] 0.1063ms 81.6249μs 12.2512 KOps/s 12.1656 KOps/s $\color{#35bf28}+0.70\%$
test_values[td1_return_estimate-False-False] 60.1652ms 57.6856ms 17.3354 Ops/s 16.6732 Ops/s $\color{#35bf28}+3.97\%$
test_values[vec_td1_return_estimate-False-False] 1.3296ms 1.1134ms 898.1340 Ops/s 913.2627 Ops/s $\color{#d91a1a}-1.66\%$
test_values[td_lambda_return_estimate-True-False] 95.5651ms 93.5506ms 10.6894 Ops/s 10.4476 Ops/s $\color{#35bf28}+2.31\%$
test_values[vec_td_lambda_return_estimate-True-False] 1.2372ms 1.0847ms 921.9424 Ops/s 910.4203 Ops/s $\color{#35bf28}+1.27\%$
test_gae_speed[generalized_advantage_estimate-False-1-512] 26.4731ms 25.1219ms 39.8060 Ops/s 39.3962 Ops/s $\color{#35bf28}+1.04\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 1.0580ms 0.7644ms 1.3081 KOps/s 1.2985 KOps/s $\color{#35bf28}+0.74\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.7722ms 0.6783ms 1.4742 KOps/s 1.4620 KOps/s $\color{#35bf28}+0.84\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 1.5316ms 1.4893ms 671.4626 Ops/s 669.0949 Ops/s $\color{#35bf28}+0.35\%$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 0.7294ms 0.6925ms 1.4439 KOps/s 1.4362 KOps/s $\color{#35bf28}+0.54\%$
test_dqn_speed[False-None] 7.0048ms 1.5331ms 652.2839 Ops/s 654.9320 Ops/s $\color{#d91a1a}-0.40\%$
test_dqn_speed[False-backward] 2.3522ms 2.1373ms 467.8828 Ops/s 467.5568 Ops/s $\color{#35bf28}+0.07\%$
test_dqn_speed[True-None] 0.9459ms 0.5301ms 1.8864 KOps/s 1.8440 KOps/s $\color{#35bf28}+2.30\%$
test_dqn_speed[True-backward] 1.1541ms 1.0854ms 921.2823 Ops/s 824.5154 Ops/s $\textbf{\color{#35bf28}+11.74\%}$
test_dqn_speed[reduce-overhead-None] 0.9791ms 0.5468ms 1.8289 KOps/s 1.7923 KOps/s $\color{#35bf28}+2.04\%$
test_dqn_speed[reduce-overhead-backward] 1.0113ms 0.9516ms 1.0508 KOps/s 923.6206 Ops/s $\textbf{\color{#35bf28}+13.77\%}$
test_ddpg_speed[False-None] 3.2492ms 2.8351ms 352.7186 Ops/s 345.9491 Ops/s $\color{#35bf28}+1.96\%$
test_ddpg_speed[False-backward] 4.5573ms 4.1108ms 243.2602 Ops/s 232.6394 Ops/s $\color{#35bf28}+4.57\%$
test_ddpg_speed[True-None] 1.2181ms 1.0866ms 920.3108 Ops/s 920.2962 Ops/s $+0.00\%$
test_ddpg_speed[True-backward] 2.2059ms 2.1500ms 465.1166 Ops/s 457.7874 Ops/s $\color{#35bf28}+1.60\%$
test_ddpg_speed[reduce-overhead-None] 1.2573ms 1.0964ms 912.0932 Ops/s 894.0213 Ops/s $\color{#35bf28}+2.02\%$
test_ddpg_speed[reduce-overhead-backward] 1.7223ms 1.6477ms 606.9179 Ops/s 595.2274 Ops/s $\color{#35bf28}+1.96\%$
test_sac_speed[False-None] 8.6210ms 8.1928ms 122.0589 Ops/s 121.1488 Ops/s $\color{#35bf28}+0.75\%$
test_sac_speed[False-backward] 11.3625ms 11.1281ms 89.8624 Ops/s 88.6559 Ops/s $\color{#35bf28}+1.36\%$
test_sac_speed[True-None] 2.0425ms 1.5278ms 654.5450 Ops/s 642.6798 Ops/s $\color{#35bf28}+1.85\%$
test_sac_speed[True-backward] 3.3479ms 3.2504ms 307.6562 Ops/s 303.5506 Ops/s $\color{#35bf28}+1.35\%$
test_sac_speed[reduce-overhead-None] 23.3397ms 12.7220ms 78.6038 Ops/s 80.3375 Ops/s $\color{#d91a1a}-2.16\%$
test_sac_speed[reduce-overhead-backward] 1.4992ms 1.3513ms 740.0440 Ops/s 651.4199 Ops/s $\textbf{\color{#35bf28}+13.60\%}$
test_redq_speed[False-None] 8.2905ms 7.5165ms 133.0411 Ops/s 131.8210 Ops/s $\color{#35bf28}+0.93\%$
test_redq_speed[False-backward] 12.1533ms 11.4261ms 87.5188 Ops/s 84.1528 Ops/s $\color{#35bf28}+4.00\%$
test_redq_speed[True-None] 2.5003ms 1.9778ms 505.6210 Ops/s 487.6573 Ops/s $\color{#35bf28}+3.68\%$
test_redq_speed[True-backward] 4.1410ms 3.6958ms 270.5749 Ops/s 257.7664 Ops/s $\color{#35bf28}+4.97\%$
test_redq_speed[reduce-overhead-None] 2.1694ms 2.0114ms 497.1766 Ops/s 491.9121 Ops/s $\color{#35bf28}+1.07\%$
test_redq_speed[reduce-overhead-backward] 4.0773ms 3.6637ms 272.9479 Ops/s 266.2029 Ops/s $\color{#35bf28}+2.53\%$
test_redq_deprec_speed[False-None] 9.7190ms 9.0148ms 110.9290 Ops/s 107.8063 Ops/s $\color{#35bf28}+2.90\%$
test_redq_deprec_speed[False-backward] 12.7067ms 12.0805ms 82.7783 Ops/s 80.1780 Ops/s $\color{#35bf28}+3.24\%$
test_redq_deprec_speed[True-None] 3.0325ms 2.3280ms 429.5456 Ops/s 405.7798 Ops/s $\textbf{\color{#35bf28}+5.86\%}$
test_redq_deprec_speed[True-backward] 4.8212ms 4.0007ms 249.9586 Ops/s 236.0896 Ops/s $\textbf{\color{#35bf28}+5.87\%}$
test_redq_deprec_speed[reduce-overhead-None] 2.7816ms 2.3298ms 429.2175 Ops/s 421.6147 Ops/s $\color{#35bf28}+1.80\%$
test_redq_deprec_speed[reduce-overhead-backward] 4.3967ms 4.0357ms 247.7909 Ops/s 245.5629 Ops/s $\color{#35bf28}+0.91\%$
test_td3_speed[False-None] 8.0656ms 7.8896ms 126.7490 Ops/s 124.3337 Ops/s $\color{#35bf28}+1.94\%$
test_td3_speed[False-backward] 10.8730ms 10.2435ms 97.6226 Ops/s 95.3539 Ops/s $\color{#35bf28}+2.38\%$
test_td3_speed[True-None] 1.6437ms 1.5581ms 641.7993 Ops/s 631.1539 Ops/s $\color{#35bf28}+1.69\%$
test_td3_speed[True-backward] 3.2612ms 3.1238ms 320.1231 Ops/s 316.9758 Ops/s $\color{#35bf28}+0.99\%$
test_td3_speed[reduce-overhead-None] 83.1313ms 25.6729ms 38.9516 Ops/s 37.0633 Ops/s $\textbf{\color{#35bf28}+5.09\%}$
test_td3_speed[reduce-overhead-backward] 1.6975ms 1.3093ms 763.7407 Ops/s 674.1733 Ops/s $\textbf{\color{#35bf28}+13.29\%}$
test_cql_speed[False-None] 17.5617ms 16.9101ms 59.1362 Ops/s 58.4278 Ops/s $\color{#35bf28}+1.21\%$
test_cql_speed[False-backward] 22.4276ms 21.9156ms 45.6296 Ops/s 44.0102 Ops/s $\color{#35bf28}+3.68\%$
test_cql_speed[True-None] 3.3905ms 2.9345ms 340.7772 Ops/s 338.6616 Ops/s $\color{#35bf28}+0.62\%$
test_cql_speed[True-backward] 5.7648ms 5.3312ms 187.5740 Ops/s 187.8231 Ops/s $\color{#d91a1a}-0.13\%$
test_cql_speed[reduce-overhead-None] 21.8775ms 13.3136ms 75.1112 Ops/s 75.5508 Ops/s $\color{#d91a1a}-0.58\%$
test_cql_speed[reduce-overhead-backward] 1.8283ms 1.7042ms 586.7795 Ops/s 654.2772 Ops/s $\textbf{\color{#d91a1a}-10.32\%}$
test_a2c_speed[False-None] 3.4782ms 3.2031ms 312.2012 Ops/s 305.1170 Ops/s $\color{#35bf28}+2.32\%$
test_a2c_speed[False-backward] 6.8432ms 6.4681ms 154.6058 Ops/s 158.8410 Ops/s $\color{#d91a1a}-2.67\%$
test_a2c_speed[True-None] 1.0653ms 1.0017ms 998.2974 Ops/s 981.8624 Ops/s $\color{#35bf28}+1.67\%$
test_a2c_speed[True-backward] 3.2781ms 2.7860ms 358.9370 Ops/s 378.3326 Ops/s $\textbf{\color{#d91a1a}-5.13\%}$
test_a2c_speed[reduce-overhead-None] 21.9900ms 11.7478ms 85.1222 Ops/s 86.2605 Ops/s $\color{#d91a1a}-1.32\%$
test_a2c_speed[reduce-overhead-backward] 1.2142ms 1.1254ms 888.6002 Ops/s 1.0027 KOps/s $\textbf{\color{#d91a1a}-11.38\%}$
test_ppo_speed[False-None] 3.8138ms 3.7082ms 269.6759 Ops/s 266.0279 Ops/s $\color{#35bf28}+1.37\%$
test_ppo_speed[False-backward] 7.6378ms 7.2317ms 138.2795 Ops/s 142.6594 Ops/s $\color{#d91a1a}-3.07\%$
test_ppo_speed[True-None] 1.1016ms 0.9512ms 1.0513 KOps/s 1.0388 KOps/s $\color{#35bf28}+1.20\%$
test_ppo_speed[True-backward] 2.8988ms 2.7172ms 368.0278 Ops/s 386.2461 Ops/s $\color{#d91a1a}-4.72\%$
test_ppo_speed[reduce-overhead-None] 0.5719ms 0.5026ms 1.9896 KOps/s 1.9043 KOps/s $\color{#35bf28}+4.48\%$
test_ppo_speed[reduce-overhead-backward] 1.2311ms 1.1138ms 897.8463 Ops/s 1.0043 KOps/s $\textbf{\color{#d91a1a}-10.60\%}$
test_reinforce_speed[False-None] 2.3693ms 2.2614ms 442.1943 Ops/s 433.2842 Ops/s $\color{#35bf28}+2.06\%$
test_reinforce_speed[False-backward] 4.1997ms 3.5262ms 283.5940 Ops/s 296.2403 Ops/s $\color{#d91a1a}-4.27\%$
test_reinforce_speed[True-None] 0.8919ms 0.8274ms 1.2087 KOps/s 1.1892 KOps/s $\color{#35bf28}+1.64\%$
test_reinforce_speed[True-backward] 2.7136ms 2.5745ms 388.4261 Ops/s 405.8189 Ops/s $\color{#d91a1a}-4.29\%$
test_reinforce_speed[reduce-overhead-None] 22.8295ms 11.9136ms 83.9374 Ops/s 85.9185 Ops/s $\color{#d91a1a}-2.31\%$
test_reinforce_speed[reduce-overhead-backward] 1.3095ms 1.1783ms 848.6951 Ops/s 937.1067 Ops/s $\textbf{\color{#d91a1a}-9.43\%}$
test_iql_speed[False-None] 11.1872ms 9.5482ms 104.7321 Ops/s 106.9931 Ops/s $\color{#d91a1a}-2.11\%$
test_iql_speed[False-backward] 13.9753ms 13.4931ms 74.1119 Ops/s 75.9106 Ops/s $\color{#d91a1a}-2.37\%$
test_iql_speed[True-None] 1.9441ms 1.7628ms 567.2883 Ops/s 575.3671 Ops/s $\color{#d91a1a}-1.40\%$
test_iql_speed[True-backward] 4.8402ms 4.4454ms 224.9499 Ops/s 236.2949 Ops/s $\color{#d91a1a}-4.80\%$
test_iql_speed[reduce-overhead-None] 15.5004ms 9.1849ms 108.8738 Ops/s 88.5062 Ops/s $\textbf{\color{#35bf28}+23.01\%}$
test_iql_speed[reduce-overhead-backward] 1.7270ms 1.6080ms 621.8867 Ops/s 697.1118 Ops/s $\textbf{\color{#d91a1a}-10.79\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 8.0337ms 6.4569ms 154.8724 Ops/s 151.1496 Ops/s $\color{#35bf28}+2.46\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.5790ms 0.3123ms 3.2018 KOps/s 2.7919 KOps/s $\textbf{\color{#35bf28}+14.68\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.5246ms 0.2935ms 3.4070 KOps/s 3.3421 KOps/s $\color{#35bf28}+1.94\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.4154ms 6.1904ms 161.5417 Ops/s 158.5236 Ops/s $\color{#35bf28}+1.90\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 2.2359ms 0.2872ms 3.4817 KOps/s 3.6917 KOps/s $\textbf{\color{#d91a1a}-5.69\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.5977ms 0.2901ms 3.4469 KOps/s 3.4469 KOps/s $-0.00\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 1.5258ms 1.3208ms 757.0914 Ops/s 765.3835 Ops/s $\color{#d91a1a}-1.08\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 1.4586ms 1.2339ms 810.4472 Ops/s 795.9494 Ops/s $\color{#35bf28}+1.82\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.4469ms 6.3492ms 157.5011 Ops/s 153.7703 Ops/s $\color{#35bf28}+2.43\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1.9723ms 0.4372ms 2.2871 KOps/s 2.0541 KOps/s $\textbf{\color{#35bf28}+11.34\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.7596ms 0.4428ms 2.2585 KOps/s 2.1513 KOps/s $\color{#35bf28}+4.98\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.5218ms 6.1986ms 161.3266 Ops/s 158.5828 Ops/s $\color{#35bf28}+1.73\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.7915ms 0.3467ms 2.8845 KOps/s 3.0872 KOps/s $\textbf{\color{#d91a1a}-6.57\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.7264ms 0.3381ms 2.9576 KOps/s 3.7328 KOps/s $\textbf{\color{#d91a1a}-20.77\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.3814ms 6.1215ms 163.3585 Ops/s 158.6538 Ops/s $\color{#35bf28}+2.97\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.7630ms 0.3600ms 2.7775 KOps/s 2.7728 KOps/s $\color{#35bf28}+0.17\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.5207ms 0.2424ms 4.1246 KOps/s 2.7035 KOps/s $\textbf{\color{#35bf28}+52.56\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.5429ms 6.3440ms 157.6300 Ops/s 154.2960 Ops/s $\color{#35bf28}+2.16\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 2.0446ms 0.4205ms 2.3784 KOps/s 1.9195 KOps/s $\textbf{\color{#35bf28}+23.91\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.5927ms 0.3915ms 2.5542 KOps/s 2.2101 KOps/s $\textbf{\color{#35bf28}+15.57\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 6.9307ms 5.2977ms 188.7597 Ops/s 186.6370 Ops/s $\color{#35bf28}+1.14\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 10.6997ms 2.0792ms 480.9573 Ops/s 440.9258 Ops/s $\textbf{\color{#35bf28}+9.08\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 6.0072ms 1.2200ms 819.6667 Ops/s 781.9367 Ops/s $\color{#35bf28}+4.83\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 7.1369ms 5.3516ms 186.8615 Ops/s 189.0266 Ops/s $\color{#d91a1a}-1.15\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 5.4512ms 1.9852ms 503.7187 Ops/s 491.4945 Ops/s $\color{#35bf28}+2.49\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 10.3094ms 1.3047ms 766.4746 Ops/s 816.3016 Ops/s $\textbf{\color{#d91a1a}-6.10\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 0.5200s 15.8761ms 62.9878 Ops/s 32.1068 Ops/s $\textbf{\color{#35bf28}+96.18\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 9.2136ms 2.1179ms 472.1547 Ops/s 526.6434 Ops/s $\textbf{\color{#d91a1a}-10.35\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 2.3966ms 1.1630ms 859.8752 Ops/s 700.3528 Ops/s $\textbf{\color{#35bf28}+22.78\%}$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 13.3929ms 13.0795ms 76.4556 Ops/s 74.5987 Ops/s $\color{#35bf28}+2.49\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 19.4941ms 17.3309ms 57.7003 Ops/s 55.9462 Ops/s $\color{#35bf28}+3.14\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 18.1073ms 17.4905ms 57.1738 Ops/s 54.9612 Ops/s $\color{#35bf28}+4.03\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 19.7925ms 17.5711ms 56.9116 Ops/s 56.8260 Ops/s $\color{#35bf28}+0.15\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 17.6518ms 17.3879ms 57.5112 Ops/s 56.0174 Ops/s $\color{#35bf28}+2.67\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 20.4824ms 18.9167ms 52.8633 Ops/s 52.4650 Ops/s $\color{#35bf28}+0.76\%$
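
For anyone reading the table above, the percentage in the last column appears to be the relative difference between the two throughput columns. The sketch below reproduces it for two rows, assuming the first throughput value is the PR's and the second is the baseline (the column headers are not shown in this part of the thread, so that mapping is an assumption). The helpers `to_ops` and `percent_change` are hypothetical names used only for illustration and are not part of the benchmark tooling.

```python
# Hypothetical helpers: reproduce the "change" column from the two throughput columns.

# Unit scale factors for the throughput strings used in the table.
_SCALE = {"Ops/s": 1.0, "KOps/s": 1e3, "MOps/s": 1e6}


def to_ops(value: float, unit: str) -> float:
    """Convert a (value, unit) pair such as (1.0027, "KOps/s") to plain Ops/s."""
    return value * _SCALE[unit]


def percent_change(ops_new: float, ops_base: float) -> float:
    """Relative throughput change of the new run versus the baseline, in percent."""
    return (ops_new - ops_base) / ops_base * 100.0


# Values copied from two rows of the table (assumed order: PR first, baseline second).
rows = {
    "test_sac_speed[reduce-overhead-backward]": (
        to_ops(740.0440, "Ops/s"),
        to_ops(651.4199, "Ops/s"),
    ),
    "test_a2c_speed[reduce-overhead-backward]": (
        to_ops(888.6002, "Ops/s"),
        to_ops(1.0027, "KOps/s"),
    ),
}

for name, (new, base) in rows.items():
    print(f"{name}: {percent_change(new, base):+.2f}%")
```

Under that assumption the two rows come out at +13.60% and -11.38%, matching the values reported in the table. Because the reported throughputs are themselves rounded, other rows may differ in the last digit.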

[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
@vmoens vmoens merged commit 4097d4c into gh/vmoens/37/base Dec 14, 2024
3 checks passed
vmoens added a commit that referenced this pull request Dec 14, 2024
ghstack-source-id: 98a2b30e8f6a1b0bc583a9f3c51adc2634eb8028
Pull Request resolved: #2554
@vmoens vmoens deleted the gh/vmoens/37/head branch December 14, 2024 00:03