[Performance] Avoid cloning trajs in SliceSampler #2671

Merged (2 commits) on Dec 20, 2024

[ghstack-poisoned]

pytorch-bot bot commented Dec 19, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/2671

Note: Links to docs will display an error until the docs builds have been completed.

❌ 9 New Failures, 6 Unrelated Failures

As of commit 5aacaea with merge base 133d709:

NEW FAILURES - The following jobs have failed:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

vmoens added a commit that referenced this pull request Dec 19, 2024
ghstack-source-id: ac4a85a7dba5b045af980bfafaf1da95fb2c6198
Pull Request resolved: #2671
@facebook-github-bot added the CLA Signed label Dec 19, 2024

github-actions bot commented Dec 19, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 149. Improved: $\large\color{#35bf28}8$. Worsened: $\large\color{#d91a1a}2$.

Detailed results (columns: Name, Max, Mean, Ops, Ops on Repo HEAD, Change):
test_simple 0.4275s 0.4251s 2.3523 Ops/s 2.2431 Ops/s $\color{#35bf28}+4.87\%$
test_transformed 0.6030s 0.6004s 1.6656 Ops/s 1.6323 Ops/s $\color{#35bf28}+2.04\%$
test_serial 1.3587s 1.3523s 0.7395 Ops/s 0.7384 Ops/s $\color{#35bf28}+0.15\%$
test_parallel 1.2936s 1.2029s 0.8314 Ops/s 0.8193 Ops/s $\color{#35bf28}+1.47\%$
test_step_mdp_speed[True-True-True-True-True] 0.2378ms 30.5244μs 32.7607 KOps/s 32.3722 KOps/s $\color{#35bf28}+1.20\%$
test_step_mdp_speed[True-True-True-True-False] 59.9530μs 17.8934μs 55.8866 KOps/s 55.4773 KOps/s $\color{#35bf28}+0.74\%$
test_step_mdp_speed[True-True-True-False-True] 68.3180μs 17.4405μs 57.3379 KOps/s 57.7134 KOps/s $\color{#d91a1a}-0.65\%$
test_step_mdp_speed[True-True-True-False-False] 46.8280μs 10.2340μs 97.7135 KOps/s 97.7619 KOps/s $\color{#d91a1a}-0.05\%$
test_step_mdp_speed[True-True-False-True-True] 86.2310μs 33.0947μs 30.2163 KOps/s 30.4463 KOps/s $\color{#d91a1a}-0.76\%$
test_step_mdp_speed[True-True-False-True-False] 67.8770μs 19.7898μs 50.5312 KOps/s 49.9752 KOps/s $\color{#35bf28}+1.11\%$
test_step_mdp_speed[True-True-False-False-True] 64.8210μs 19.0743μs 52.4266 KOps/s 51.4760 KOps/s $\color{#35bf28}+1.85\%$
test_step_mdp_speed[True-True-False-False-False] 66.3540μs 12.0378μs 83.0717 KOps/s 82.6284 KOps/s $\color{#35bf28}+0.54\%$
test_step_mdp_speed[True-False-True-True-True] 97.0420μs 35.0065μs 28.5661 KOps/s 28.6256 KOps/s $\color{#d91a1a}-0.21\%$
test_step_mdp_speed[True-False-True-True-False] 74.5490μs 22.1658μs 45.1146 KOps/s 45.2184 KOps/s $\color{#d91a1a}-0.23\%$
test_step_mdp_speed[True-False-True-False-True] 57.6190μs 19.1029μs 52.3481 KOps/s 51.6075 KOps/s $\color{#35bf28}+1.44\%$
test_step_mdp_speed[True-False-True-False-False] 62.9380μs 11.9927μs 83.3843 KOps/s 81.8577 KOps/s $\color{#35bf28}+1.86\%$
test_step_mdp_speed[True-False-False-True-True] 78.6480μs 36.1670μs 27.6495 KOps/s 27.1968 KOps/s $\color{#35bf28}+1.66\%$
test_step_mdp_speed[True-False-False-True-False] 68.5580μs 24.0887μs 41.5132 KOps/s 41.5310 KOps/s $\color{#d91a1a}-0.04\%$
test_step_mdp_speed[True-False-False-False-True] 71.6550μs 21.2190μs 47.1277 KOps/s 47.2926 KOps/s $\color{#d91a1a}-0.35\%$
test_step_mdp_speed[True-False-False-False-False] 67.8280μs 13.9908μs 71.4755 KOps/s 71.2059 KOps/s $\color{#35bf28}+0.38\%$
test_step_mdp_speed[False-True-True-True-True] 92.2730μs 34.9308μs 28.6280 KOps/s 28.6873 KOps/s $\color{#d91a1a}-0.21\%$
test_step_mdp_speed[False-True-True-True-False] 72.1850μs 22.0011μs 45.4522 KOps/s 45.3859 KOps/s $\color{#35bf28}+0.15\%$
test_step_mdp_speed[False-True-True-False-True] 74.1390μs 22.1443μs 45.1583 KOps/s 44.8134 KOps/s $\color{#35bf28}+0.77\%$
test_step_mdp_speed[False-True-True-False-False] 51.9070μs 13.4240μs 74.4936 KOps/s 73.6986 KOps/s $\color{#35bf28}+1.08\%$
test_step_mdp_speed[False-True-False-True-True] 76.3340μs 36.6652μs 27.2738 KOps/s 27.3590 KOps/s $\color{#d91a1a}-0.31\%$
test_step_mdp_speed[False-True-False-True-False] 60.9250μs 23.7364μs 42.1294 KOps/s 41.7944 KOps/s $\color{#35bf28}+0.80\%$
test_step_mdp_speed[False-True-False-False-True] 2.5666ms 24.3354μs 41.0925 KOps/s 41.2156 KOps/s $\color{#d91a1a}-0.30\%$
test_step_mdp_speed[False-True-False-False-False] 52.4690μs 15.3119μs 65.3085 KOps/s 65.1329 KOps/s $\color{#35bf28}+0.27\%$
test_step_mdp_speed[False-False-True-True-True] 82.2140μs 38.8315μs 25.7523 KOps/s 26.0107 KOps/s $\color{#d91a1a}-0.99\%$
test_step_mdp_speed[False-False-True-True-False] 73.9880μs 25.5720μs 39.1052 KOps/s 38.6654 KOps/s $\color{#35bf28}+1.14\%$
test_step_mdp_speed[False-False-True-False-True] 59.2410μs 23.6863μs 42.2184 KOps/s 41.6081 KOps/s $\color{#35bf28}+1.47\%$
test_step_mdp_speed[False-False-True-False-False] 76.9940μs 15.3372μs 65.2010 KOps/s 65.1060 KOps/s $\color{#35bf28}+0.15\%$
test_step_mdp_speed[False-False-False-True-True] 84.8700μs 40.0516μs 24.9678 KOps/s 24.9353 KOps/s $\color{#35bf28}+0.13\%$
test_step_mdp_speed[False-False-False-True-False] 69.4500μs 27.0883μs 36.9162 KOps/s 36.5203 KOps/s $\color{#35bf28}+1.08\%$
test_step_mdp_speed[False-False-False-False-True] 81.7540μs 25.3737μs 39.4109 KOps/s 39.0707 KOps/s $\color{#35bf28}+0.87\%$
test_step_mdp_speed[False-False-False-False-False] 57.4080μs 17.1118μs 58.4392 KOps/s 57.3596 KOps/s $\color{#35bf28}+1.88\%$
test_values[generalized_advantage_estimate-True-True] 12.9119ms 10.2774ms 97.3011 Ops/s 87.7587 Ops/s $\textbf{\color{#35bf28}+10.87\%}$
test_values[vec_generalized_advantage_estimate-True-True] 35.7096ms 33.3163ms 30.0153 Ops/s 29.9742 Ops/s $\color{#35bf28}+0.14\%$
test_values[td0_return_estimate-False-False] 0.2383ms 0.1790ms 5.5864 KOps/s 5.6615 KOps/s $\color{#d91a1a}-1.33\%$
test_values[td1_return_estimate-False-False] 26.2181ms 24.9301ms 40.1122 Ops/s 41.1003 Ops/s $\color{#d91a1a}-2.40\%$
test_values[vec_td1_return_estimate-False-False] 35.4861ms 33.4097ms 29.9314 Ops/s 29.8083 Ops/s $\color{#35bf28}+0.41\%$
test_values[td_lambda_return_estimate-True-False] 39.2742ms 35.7471ms 27.9743 Ops/s 29.1180 Ops/s $\color{#d91a1a}-3.93\%$
test_values[vec_td_lambda_return_estimate-True-False] 41.9530ms 33.8914ms 29.5060 Ops/s 29.8334 Ops/s $\color{#d91a1a}-1.10\%$
test_gae_speed[generalized_advantage_estimate-False-1-512] 11.4477ms 8.6858ms 115.1303 Ops/s 118.8719 Ops/s $\color{#d91a1a}-3.15\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 2.1878ms 1.7373ms 575.6057 Ops/s 576.7529 Ops/s $\color{#d91a1a}-0.20\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.5685ms 0.3610ms 2.7698 KOps/s 2.8549 KOps/s $\color{#d91a1a}-2.98\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 38.7481ms 36.9136ms 27.0903 Ops/s 27.6992 Ops/s $\color{#d91a1a}-2.20\%$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 3.8534ms 3.0514ms 327.7173 Ops/s 325.3416 Ops/s $\color{#35bf28}+0.73\%$
test_dqn_speed[False-None] 6.0226ms 1.4001ms 714.2463 Ops/s 720.9351 Ops/s $\color{#d91a1a}-0.93\%$
test_dqn_speed[False-backward] 1.9595ms 1.9044ms 525.1110 Ops/s 535.4878 Ops/s $\color{#d91a1a}-1.94\%$
test_dqn_speed[True-None] 0.6009ms 0.4922ms 2.0318 KOps/s 2.0385 KOps/s $\color{#d91a1a}-0.32\%$
test_dqn_speed[True-backward] 0.9674ms 0.9124ms 1.0960 KOps/s 1.0934 KOps/s $\color{#35bf28}+0.24\%$
test_dqn_speed[reduce-overhead-None] 0.7397ms 0.5026ms 1.9898 KOps/s 2.0436 KOps/s $\color{#d91a1a}-2.63\%$
test_dqn_speed[reduce-overhead-backward] 0.9891ms 0.9126ms 1.0958 KOps/s 1.0691 KOps/s $\color{#35bf28}+2.50\%$
test_ddpg_speed[False-None] 3.3187ms 2.9155ms 342.9933 Ops/s 346.2080 Ops/s $\color{#d91a1a}-0.93\%$
test_ddpg_speed[False-backward] 4.3485ms 4.0825ms 244.9465 Ops/s 245.3348 Ops/s $\color{#d91a1a}-0.16\%$
test_ddpg_speed[True-None] 1.7572ms 1.0433ms 958.5206 Ops/s 962.3625 Ops/s $\color{#d91a1a}-0.40\%$
test_ddpg_speed[True-backward] 2.1011ms 1.9335ms 517.1855 Ops/s 515.1833 Ops/s $\color{#35bf28}+0.39\%$
test_ddpg_speed[reduce-overhead-None] 1.1786ms 1.0262ms 974.4314 Ops/s 965.0983 Ops/s $\color{#35bf28}+0.97\%$
test_ddpg_speed[reduce-overhead-backward] 2.0480ms 1.9257ms 519.3006 Ops/s 517.4396 Ops/s $\color{#35bf28}+0.36\%$
test_sac_speed[False-None] 9.8706ms 8.0754ms 123.8329 Ops/s 120.9644 Ops/s $\color{#35bf28}+2.37\%$
test_sac_speed[False-backward] 11.0694ms 10.7562ms 92.9693 Ops/s 91.5656 Ops/s $\color{#35bf28}+1.53\%$
test_sac_speed[True-None] 2.2792ms 1.8504ms 540.4263 Ops/s 533.7038 Ops/s $\color{#35bf28}+1.26\%$
test_sac_speed[True-backward] 3.6699ms 3.5520ms 281.5302 Ops/s 278.6269 Ops/s $\color{#35bf28}+1.04\%$
test_sac_speed[reduce-overhead-None] 2.8342ms 1.8578ms 538.2684 Ops/s 539.7612 Ops/s $\color{#d91a1a}-0.28\%$
test_sac_speed[reduce-overhead-backward] 3.7234ms 3.5550ms 281.2927 Ops/s 283.6139 Ops/s $\color{#d91a1a}-0.82\%$
test_redq_speed[False-None] 14.8693ms 13.0922ms 76.3811 Ops/s 76.3298 Ops/s $\color{#35bf28}+0.07\%$
test_redq_speed[False-backward] 24.3774ms 22.5742ms 44.2983 Ops/s 44.5136 Ops/s $\color{#d91a1a}-0.48\%$
test_redq_speed[True-None] 6.1382ms 4.7589ms 210.1317 Ops/s 212.4556 Ops/s $\color{#d91a1a}-1.09\%$
test_redq_speed[True-backward] 13.2253ms 12.3215ms 81.1591 Ops/s 81.6224 Ops/s $\color{#d91a1a}-0.57\%$
test_redq_speed[reduce-overhead-None] 5.8207ms 4.9042ms 203.9074 Ops/s 193.7625 Ops/s $\textbf{\color{#35bf28}+5.24\%}$
test_redq_speed[reduce-overhead-backward] 12.9392ms 12.2895ms 81.3703 Ops/s 81.0381 Ops/s $\color{#35bf28}+0.41\%$
test_redq_deprec_speed[False-None] 14.5976ms 12.8979ms 77.5320 Ops/s 73.3352 Ops/s $\textbf{\color{#35bf28}+5.72\%}$
test_redq_deprec_speed[False-backward] 20.8375ms 18.7135ms 53.4372 Ops/s 52.4628 Ops/s $\color{#35bf28}+1.86\%$
test_redq_deprec_speed[True-None] 4.3327ms 3.6242ms 275.9235 Ops/s 273.9435 Ops/s $\color{#35bf28}+0.72\%$
test_redq_deprec_speed[True-backward] 8.6795ms 8.1054ms 123.3752 Ops/s 118.7533 Ops/s $\color{#35bf28}+3.89\%$
test_redq_deprec_speed[reduce-overhead-None] 4.4611ms 3.6567ms 273.4683 Ops/s 271.9700 Ops/s $\color{#35bf28}+0.55\%$
test_redq_deprec_speed[reduce-overhead-backward] 8.9075ms 8.1979ms 121.9830 Ops/s 122.2332 Ops/s $\color{#d91a1a}-0.20\%$
test_td3_speed[False-None] 8.4970ms 8.0820ms 123.7325 Ops/s 122.0957 Ops/s $\color{#35bf28}+1.34\%$
test_td3_speed[False-backward] 12.9701ms 10.4910ms 95.3197 Ops/s 94.8695 Ops/s $\color{#35bf28}+0.47\%$
test_td3_speed[True-None] 1.8992ms 1.7485ms 571.9039 Ops/s 573.0053 Ops/s $\color{#d91a1a}-0.19\%$
test_td3_speed[True-backward] 3.5611ms 3.3586ms 297.7433 Ops/s 299.0896 Ops/s $\color{#d91a1a}-0.45\%$
test_td3_speed[reduce-overhead-None] 2.1443ms 1.7446ms 573.1896 Ops/s 572.1323 Ops/s $\color{#35bf28}+0.18\%$
test_td3_speed[reduce-overhead-backward] 3.6547ms 3.3643ms 297.2363 Ops/s 300.8807 Ops/s $\color{#d91a1a}-1.21\%$
test_cql_speed[False-None] 40.7359ms 37.3013ms 26.8087 Ops/s 26.4881 Ops/s $\color{#35bf28}+1.21\%$
test_cql_speed[False-backward] 49.1145ms 46.7499ms 21.3904 Ops/s 20.7476 Ops/s $\color{#35bf28}+3.10\%$
test_cql_speed[True-None] 17.1438ms 15.7081ms 63.6615 Ops/s 63.4803 Ops/s $\color{#35bf28}+0.29\%$
test_cql_speed[True-backward] 23.8837ms 22.6538ms 44.1427 Ops/s 44.2065 Ops/s $\color{#d91a1a}-0.14\%$
test_cql_speed[reduce-overhead-None] 16.3468ms 15.7358ms 63.5492 Ops/s 63.3154 Ops/s $\color{#35bf28}+0.37\%$
test_cql_speed[reduce-overhead-backward] 24.0924ms 22.6024ms 44.2430 Ops/s 44.1331 Ops/s $\color{#35bf28}+0.25\%$
test_a2c_speed[False-None] 8.0646ms 7.1768ms 139.3372 Ops/s 136.6104 Ops/s $\color{#35bf28}+2.00\%$
test_a2c_speed[False-backward] 16.2115ms 14.1793ms 70.5254 Ops/s 68.4915 Ops/s $\color{#35bf28}+2.97\%$
test_a2c_speed[True-None] 4.8092ms 4.2129ms 237.3684 Ops/s 236.9947 Ops/s $\color{#35bf28}+0.16\%$
test_a2c_speed[True-backward] 11.2571ms 10.7839ms 92.7310 Ops/s 92.3156 Ops/s $\color{#35bf28}+0.45\%$
test_a2c_speed[reduce-overhead-None] 5.6515ms 4.2430ms 235.6829 Ops/s 233.9251 Ops/s $\color{#35bf28}+0.75\%$
test_a2c_speed[reduce-overhead-backward] 11.7773ms 10.7484ms 93.0375 Ops/s 92.8806 Ops/s $\color{#35bf28}+0.17\%$
test_ppo_speed[False-None] 8.8122ms 7.4427ms 134.3592 Ops/s 130.9638 Ops/s $\color{#35bf28}+2.59\%$
test_ppo_speed[False-backward] 16.0515ms 14.7818ms 67.6509 Ops/s 65.9606 Ops/s $\color{#35bf28}+2.56\%$
test_ppo_speed[True-None] 4.3341ms 3.7085ms 269.6507 Ops/s 265.8500 Ops/s $\color{#35bf28}+1.43\%$
test_ppo_speed[True-backward] 11.1251ms 9.5612ms 104.5891 Ops/s 103.2012 Ops/s $\color{#35bf28}+1.34\%$
test_ppo_speed[reduce-overhead-None] 4.0753ms 3.7068ms 269.7730 Ops/s 266.2221 Ops/s $\color{#35bf28}+1.33\%$
test_ppo_speed[reduce-overhead-backward] 10.6878ms 9.7708ms 102.3461 Ops/s 100.6993 Ops/s $\color{#35bf28}+1.64\%$
test_reinforce_speed[False-None] 9.0122ms 6.5800ms 151.9754 Ops/s 147.7965 Ops/s $\color{#35bf28}+2.83\%$
test_reinforce_speed[False-backward] 10.2876ms 9.8213ms 101.8191 Ops/s 96.2266 Ops/s $\textbf{\color{#35bf28}+5.81\%}$
test_reinforce_speed[True-None] 3.2270ms 2.6540ms 376.7832 Ops/s 369.2083 Ops/s $\color{#35bf28}+2.05\%$
test_reinforce_speed[True-backward] 9.6533ms 9.0148ms 110.9282 Ops/s 115.3579 Ops/s $\color{#d91a1a}-3.84\%$
test_reinforce_speed[reduce-overhead-None] 3.1378ms 2.6494ms 377.4425 Ops/s 362.7910 Ops/s $\color{#35bf28}+4.04\%$
test_reinforce_speed[reduce-overhead-backward] 9.0814ms 8.6698ms 115.3427 Ops/s 110.9605 Ops/s $\color{#35bf28}+3.95\%$
test_iql_speed[False-None] 34.5360ms 32.4962ms 30.7728 Ops/s 29.6088 Ops/s $\color{#35bf28}+3.93\%$
test_iql_speed[False-backward] 46.7018ms 45.6206ms 21.9199 Ops/s 21.7089 Ops/s $\color{#35bf28}+0.97\%$
test_iql_speed[True-None] 12.0035ms 10.9148ms 91.6188 Ops/s 91.1586 Ops/s $\color{#35bf28}+0.50\%$
test_iql_speed[True-backward] 22.5986ms 21.7817ms 45.9101 Ops/s 45.4399 Ops/s $\color{#35bf28}+1.03\%$
test_iql_speed[reduce-overhead-None] 11.7875ms 10.7323ms 93.1770 Ops/s 90.7323 Ops/s $\color{#35bf28}+2.69\%$
test_iql_speed[reduce-overhead-backward] 22.9059ms 21.7095ms 46.0627 Ops/s 45.1354 Ops/s $\color{#35bf28}+2.05\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 5.3480ms 4.9599ms 201.6177 Ops/s 195.6866 Ops/s $\color{#35bf28}+3.03\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.7904ms 0.5197ms 1.9241 KOps/s 1.8683 KOps/s $\color{#35bf28}+2.99\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.8351ms 0.4981ms 2.0077 KOps/s 1.9611 KOps/s $\color{#35bf28}+2.37\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 5.1182ms 4.7725ms 209.5318 Ops/s 205.7391 Ops/s $\color{#35bf28}+1.84\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 0.3508s 0.7736ms 1.2927 KOps/s 1.9581 KOps/s $\textbf{\color{#d91a1a}-33.99\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.7728ms 0.4847ms 2.0630 KOps/s 2.0098 KOps/s $\color{#35bf28}+2.64\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 2.5424ms 1.6613ms 601.9539 Ops/s 605.8149 Ops/s $\color{#d91a1a}-0.64\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 2.3709ms 1.5823ms 631.9723 Ops/s 624.0564 Ops/s $\color{#35bf28}+1.27\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 5.4426ms 5.0557ms 197.7954 Ops/s 202.2180 Ops/s $\color{#d91a1a}-2.19\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1.3656ms 0.6480ms 1.5433 KOps/s 1.5147 KOps/s $\color{#35bf28}+1.89\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 1.0684ms 0.6316ms 1.5832 KOps/s 1.5540 KOps/s $\color{#35bf28}+1.88\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 5.5845ms 4.8002ms 208.3232 Ops/s 189.5877 Ops/s $\textbf{\color{#35bf28}+9.88\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 1.1790ms 0.5260ms 1.9011 KOps/s 1.8394 KOps/s $\color{#35bf28}+3.35\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.7638ms 0.4959ms 2.0164 KOps/s 1.9528 KOps/s $\color{#35bf28}+3.26\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 5.1244ms 4.7404ms 210.9518 Ops/s 206.6502 Ops/s $\color{#35bf28}+2.08\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.0692ms 0.5077ms 1.9698 KOps/s 1.9709 KOps/s $\color{#d91a1a}-0.06\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.8464ms 0.4860ms 2.0576 KOps/s 2.0298 KOps/s $\color{#35bf28}+1.37\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 5.3131ms 4.9418ms 202.3556 Ops/s 202.8774 Ops/s $\color{#d91a1a}-0.26\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 0.8648ms 0.6462ms 1.5474 KOps/s 1.5051 KOps/s $\color{#35bf28}+2.82\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 7.8987ms 0.6408ms 1.5606 KOps/s 1.5941 KOps/s $\color{#d91a1a}-2.10\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 0.4188s 12.6253ms 79.2062 Ops/s 36.3259 Ops/s $\textbf{\color{#35bf28}+118.04\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 3.4884ms 2.2111ms 452.2573 Ops/s 425.0984 Ops/s $\textbf{\color{#35bf28}+6.39\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 7.3557ms 1.4177ms 705.3530 Ops/s 702.1887 Ops/s $\color{#35bf28}+0.45\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 7.7703ms 4.3639ms 229.1521 Ops/s 224.6397 Ops/s $\color{#35bf28}+2.01\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 6.7524ms 2.3000ms 434.7810 Ops/s 425.5568 Ops/s $\color{#35bf28}+2.17\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 5.6477ms 1.3559ms 737.5275 Ops/s 707.5613 Ops/s $\color{#35bf28}+4.24\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 0.4128s 12.7133ms 78.6579 Ops/s 219.9908 Ops/s $\textbf{\color{#d91a1a}-64.24\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 8.9406ms 2.5310ms 395.0941 Ops/s 400.1146 Ops/s $\color{#d91a1a}-1.25\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 4.6959ms 1.4772ms 676.9744 Ops/s 657.9164 Ops/s $\color{#35bf28}+2.90\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 13.3878ms 13.0764ms 76.4734 Ops/s 72.3825 Ops/s $\textbf{\color{#35bf28}+5.65\%}$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 16.2742ms 14.9665ms 66.8161 Ops/s 67.7160 Ops/s $\color{#d91a1a}-1.33\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 23.5288ms 21.8594ms 45.7469 Ops/s 44.5370 Ops/s $\color{#35bf28}+2.72\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 16.8144ms 15.1890ms 65.8369 Ops/s 66.4732 Ops/s $\color{#d91a1a}-0.96\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 22.1108ms 21.7137ms 46.0540 Ops/s 45.5761 Ops/s $\color{#35bf28}+1.05\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 17.7449ms 16.4397ms 60.8282 Ops/s 61.1576 Ops/s $\color{#d91a1a}-0.54\%$
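
The relationship between the columns above can be sketched in a few lines of Python. This is a reading of the table, not the benchmark bot's actual code: the formulas (Ops as the reciprocal of Mean, Change as the relative difference between Ops and Ops on Repo HEAD) are inferred from the reported values, and the 5% threshold used to label a row as improved or worsened is an assumption that appears consistent with the 8 improved and 2 worsened rows in the summary. The function names are hypothetical.

```python
# Sketch only: formulas inferred from the table values, not taken from
# the benchmark bot's source. Function names are hypothetical.


def ops_per_second(mean_seconds: float) -> float:
    """Throughput implied by a mean runtime given in seconds."""
    return 1.0 / mean_seconds


def relative_change(ops_pr: float, ops_head: float) -> float:
    """Percent change of the PR throughput relative to repo HEAD."""
    return (ops_pr - ops_head) / ops_head * 100.0


def classify(change_pct: float, threshold_pct: float = 5.0) -> str:
    """Label a row. The 5% threshold is an assumption, not a documented value."""
    if change_pct > threshold_pct:
        return "improved"
    if change_pct < -threshold_pct:
        return "worsened"
    return "within noise"


# Values copied from the CPU table: (Ops, Ops on Repo HEAD).
simple_change = relative_change(2.3523, 2.2431)      # test_simple
populate_change = relative_change(79.2062, 36.3259)  # test_rb_populate[...ListStorage-RandomSampler-400]

for name, change in [("test_simple", simple_change),
                     ("test_rb_populate", populate_change)]:
    print(f"{name}: {change:+.2f}% ({classify(change)})")
```

Under these assumptions, a row such as `test_simple` (+4.87%) falls below the threshold and is not counted, while the large swings in the `test_rb_populate` rows are. Several of the flagged rows also have a Max far above their Mean (for example 0.4188s against 12.6253ms), which suggests a single slow iteration rather than a sustained change.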

github-actions bot commented Dec 19, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 149. Improved: $\large\color{#35bf28}21$. Worsened: $\large\color{#d91a1a}4$.

Detailed results (columns: Name, Max, Mean, Ops, Ops on Repo HEAD, Change):
test_simple 0.7204s 0.7153s 1.3980 Ops/s 1.3650 Ops/s $\color{#35bf28}+2.42\%$
test_transformed 0.9680s 0.9656s 1.0356 Ops/s 1.0346 Ops/s $\color{#35bf28}+0.10\%$
test_serial 2.2256s 2.1464s 0.4659 Ops/s 0.4721 Ops/s $\color{#d91a1a}-1.32\%$
test_parallel 1.9008s 1.8380s 0.5441 Ops/s 0.5404 Ops/s $\color{#35bf28}+0.69\%$
test_step_mdp_speed[True-True-True-True-True] 0.1359ms 39.5500μs 25.2844 KOps/s 24.6614 KOps/s $\color{#35bf28}+2.53\%$
test_step_mdp_speed[True-True-True-True-False] 46.5410μs 23.5303μs 42.4984 KOps/s 42.2716 KOps/s $\color{#35bf28}+0.54\%$
test_step_mdp_speed[True-True-True-False-True] 49.9310μs 22.5004μs 44.4436 KOps/s 43.9896 KOps/s $\color{#35bf28}+1.03\%$
test_step_mdp_speed[True-True-True-False-False] 38.8110μs 13.1470μs 76.0630 KOps/s 76.7314 KOps/s $\color{#d91a1a}-0.87\%$
test_step_mdp_speed[True-True-False-True-True] 71.2810μs 43.0591μs 23.2239 KOps/s 23.5547 KOps/s $\color{#d91a1a}-1.40\%$
test_step_mdp_speed[True-True-False-True-False] 53.3010μs 26.0566μs 38.3780 KOps/s 38.8345 KOps/s $\color{#d91a1a}-1.18\%$
test_step_mdp_speed[True-True-False-False-True] 51.7600μs 24.8731μs 40.2041 KOps/s 40.4904 KOps/s $\color{#d91a1a}-0.71\%$
test_step_mdp_speed[True-True-False-False-False] 38.6400μs 15.5091μs 64.4782 KOps/s 64.0313 KOps/s $\color{#35bf28}+0.70\%$
test_step_mdp_speed[True-False-True-True-True] 74.5310μs 46.1078μs 21.6883 KOps/s 22.3109 KOps/s $\color{#d91a1a}-2.79\%$
test_step_mdp_speed[True-False-True-True-False] 56.2800μs 28.3144μs 35.3177 KOps/s 35.4162 KOps/s $\color{#d91a1a}-0.28\%$
test_step_mdp_speed[True-False-True-False-True] 55.4010μs 25.2311μs 39.6337 KOps/s 40.3694 KOps/s $\color{#d91a1a}-1.82\%$
test_step_mdp_speed[True-False-True-False-False] 38.5610μs 15.6245μs 64.0022 KOps/s 64.3195 KOps/s $\color{#d91a1a}-0.49\%$
test_step_mdp_speed[True-False-False-True-True] 81.6020μs 47.4340μs 21.0819 KOps/s 21.1026 KOps/s $\color{#d91a1a}-0.10\%$
test_step_mdp_speed[True-False-False-True-False] 56.7610μs 30.9183μs 32.3433 KOps/s 32.7220 KOps/s $\color{#d91a1a}-1.16\%$
test_step_mdp_speed[True-False-False-False-True] 54.2400μs 26.7745μs 37.3489 KOps/s 36.6599 KOps/s $\color{#35bf28}+1.88\%$
test_step_mdp_speed[True-False-False-False-False] 51.0610μs 17.7266μs 56.4125 KOps/s 55.8751 KOps/s $\color{#35bf28}+0.96\%$
test_step_mdp_speed[False-True-True-True-True] 75.8610μs 44.5620μs 22.4407 KOps/s 22.1533 KOps/s $\color{#35bf28}+1.30\%$
test_step_mdp_speed[False-True-True-True-False] 54.3010μs 28.3028μs 35.3322 KOps/s 35.0172 KOps/s $\color{#35bf28}+0.90\%$
test_step_mdp_speed[False-True-True-False-True] 63.7810μs 28.6374μs 34.9194 KOps/s 35.7780 KOps/s $\color{#d91a1a}-2.40\%$
test_step_mdp_speed[False-True-True-False-False] 39.0610μs 16.9935μs 58.8460 KOps/s 57.8430 KOps/s $\color{#35bf28}+1.73\%$
test_step_mdp_speed[False-True-False-True-True] 74.3210μs 47.2635μs 21.1580 KOps/s 21.0577 KOps/s $\color{#35bf28}+0.48\%$
test_step_mdp_speed[False-True-False-True-False] 55.4210μs 30.7212μs 32.5509 KOps/s 33.0310 KOps/s $\color{#d91a1a}-1.45\%$
test_step_mdp_speed[False-True-False-False-True] 3.2956ms 31.4119μs 31.8351 KOps/s 31.7152 KOps/s $\color{#35bf28}+0.38\%$
test_step_mdp_speed[False-True-False-False-False] 52.3310μs 19.7001μs 50.7613 KOps/s 51.0227 KOps/s $\color{#d91a1a}-0.51\%$
test_step_mdp_speed[False-False-True-True-True] 79.1310μs 50.0773μs 19.9691 KOps/s 20.2050 KOps/s $\color{#d91a1a}-1.17\%$
test_step_mdp_speed[False-False-True-True-False] 0.1039ms 32.9069μs 30.3888 KOps/s 30.1512 KOps/s $\color{#35bf28}+0.79\%$
test_step_mdp_speed[False-False-True-False-True] 61.1210μs 30.4589μs 32.8311 KOps/s 33.0265 KOps/s $\color{#d91a1a}-0.59\%$
test_step_mdp_speed[False-False-True-False-False] 48.0610μs 19.5371μs 51.1846 KOps/s 51.6710 KOps/s $\color{#d91a1a}-0.94\%$
test_step_mdp_speed[False-False-False-True-True] 81.6010μs 52.5774μs 19.0196 KOps/s 19.4794 KOps/s $\color{#d91a1a}-2.36\%$
test_step_mdp_speed[False-False-False-True-False] 56.8110μs 35.5294μs 28.1457 KOps/s 28.0819 KOps/s $\color{#35bf28}+0.23\%$
test_step_mdp_speed[False-False-False-False-True] 99.4720μs 32.1558μs 31.0986 KOps/s 30.3965 KOps/s $\color{#35bf28}+2.31\%$
test_step_mdp_speed[False-False-False-False-False] 52.2410μs 21.7736μs 45.9271 KOps/s 46.3628 KOps/s $\color{#d91a1a}-0.94\%$
test_values[generalized_advantage_estimate-True-True] 25.6265ms 25.1566ms 39.7510 Ops/s 40.1940 Ops/s $\color{#d91a1a}-1.10\%$
test_values[vec_generalized_advantage_estimate-True-True] 95.6561ms 2.8163ms 355.0768 Ops/s 326.8331 Ops/s $\textbf{\color{#35bf28}+8.64\%}$
test_values[td0_return_estimate-False-False] 0.1027ms 79.5654μs 12.5683 KOps/s 12.6503 KOps/s $\color{#d91a1a}-0.65\%$
test_values[td1_return_estimate-False-False] 56.3263ms 55.9095ms 17.8861 Ops/s 18.0735 Ops/s $\color{#d91a1a}-1.04\%$
test_values[vec_td1_return_estimate-False-False] 1.2414ms 1.0844ms 922.1839 Ops/s 916.0955 Ops/s $\color{#35bf28}+0.66\%$
test_values[td_lambda_return_estimate-True-False] 89.1666ms 88.5946ms 11.2874 Ops/s 11.4022 Ops/s $\color{#d91a1a}-1.01\%$
test_values[vec_td_lambda_return_estimate-True-False] 1.2551ms 1.0796ms 926.2286 Ops/s 920.1526 Ops/s $\color{#35bf28}+0.66\%$
test_gae_speed[generalized_advantage_estimate-False-1-512] 25.0926ms 24.8522ms 40.2379 Ops/s 40.6156 Ops/s $\color{#d91a1a}-0.93\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 1.0383ms 0.7587ms 1.3180 KOps/s 1.3187 KOps/s $\color{#d91a1a}-0.05\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.7725ms 0.6766ms 1.4781 KOps/s 1.4865 KOps/s $\color{#d91a1a}-0.57\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 1.5205ms 1.4823ms 674.6329 Ops/s 674.1034 Ops/s $\color{#35bf28}+0.08\%$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 0.7329ms 0.6900ms 1.4492 KOps/s 1.4534 KOps/s $\color{#d91a1a}-0.29\%$
test_dqn_speed[False-None] 7.0220ms 1.5418ms 648.5844 Ops/s 659.9869 Ops/s $\color{#d91a1a}-1.73\%$
test_dqn_speed[False-backward] 2.2065ms 2.1471ms 465.7379 Ops/s 472.0659 Ops/s $\color{#d91a1a}-1.34\%$
test_dqn_speed[True-None] 0.6947ms 0.5583ms 1.7910 KOps/s 1.7582 KOps/s $\color{#35bf28}+1.87\%$
test_dqn_speed[True-backward] 1.1972ms 1.1191ms 893.5925 Ops/s 801.0365 Ops/s $\textbf{\color{#35bf28}+11.55\%}$
test_dqn_speed[reduce-overhead-None] 0.7291ms 0.5908ms 1.6927 KOps/s 1.7314 KOps/s $\color{#d91a1a}-2.23\%$
test_dqn_speed[reduce-overhead-backward] 1.0920ms 0.9951ms 1.0049 KOps/s 902.3235 Ops/s $\textbf{\color{#35bf28}+11.37\%}$
test_ddpg_speed[False-None] 3.1974ms 2.8651ms 349.0305 Ops/s 348.1060 Ops/s $\color{#35bf28}+0.27\%$
test_ddpg_speed[False-backward] 4.4193ms 4.1452ms 241.2434 Ops/s 234.3938 Ops/s $\color{#35bf28}+2.92\%$
test_ddpg_speed[True-None] 1.3396ms 1.1634ms 859.5639 Ops/s 895.0286 Ops/s $\color{#d91a1a}-3.96\%$
test_ddpg_speed[True-backward] 2.2804ms 2.1961ms 455.3619 Ops/s 423.7484 Ops/s $\textbf{\color{#35bf28}+7.46\%}$
test_ddpg_speed[reduce-overhead-None] 1.2131ms 1.1199ms 892.9064 Ops/s 884.7614 Ops/s $\color{#35bf28}+0.92\%$
test_ddpg_speed[reduce-overhead-backward] 1.7638ms 1.6673ms 599.7713 Ops/s 546.3993 Ops/s $\textbf{\color{#35bf28}+9.77\%}$
test_sac_speed[False-None] 8.5032ms 8.1039ms 123.3978 Ops/s 123.4993 Ops/s $\color{#d91a1a}-0.08\%$
test_sac_speed[False-backward] 11.5522ms 11.0572ms 90.4391 Ops/s 88.9497 Ops/s $\color{#35bf28}+1.67\%$
test_sac_speed[True-None] 1.7700ms 1.6077ms 622.0027 Ops/s 635.8281 Ops/s $\color{#d91a1a}-2.17\%$
test_sac_speed[True-backward] 3.4404ms 3.3154ms 301.6220 Ops/s 289.3620 Ops/s $\color{#35bf28}+4.24\%$
test_sac_speed[reduce-overhead-None] 24.4070ms 13.0948ms 76.3660 Ops/s 78.7802 Ops/s $\color{#d91a1a}-3.06\%$
test_sac_speed[reduce-overhead-backward] 1.4371ms 1.3601ms 735.2599 Ops/s 677.8571 Ops/s $\textbf{\color{#35bf28}+8.47\%}$
test_redq_speed[False-None] 8.2920ms 7.5853ms 131.8338 Ops/s 131.5386 Ops/s $\color{#35bf28}+0.22\%$
test_redq_speed[False-backward] 12.2122ms 11.4008ms 87.7129 Ops/s 85.6055 Ops/s $\color{#35bf28}+2.46\%$
test_redq_speed[True-None] 2.1756ms 2.0859ms 479.4020 Ops/s 492.8820 Ops/s $\color{#d91a1a}-2.73\%$
test_redq_speed[True-backward] 4.2628ms 3.7617ms 265.8359 Ops/s 253.9039 Ops/s $\color{#35bf28}+4.70\%$
test_redq_speed[reduce-overhead-None] 2.6295ms 2.0723ms 482.5567 Ops/s 482.7941 Ops/s $\color{#d91a1a}-0.05\%$
test_redq_speed[reduce-overhead-backward] 3.7998ms 3.6911ms 270.9215 Ops/s 257.4294 Ops/s $\textbf{\color{#35bf28}+5.24\%}$
test_redq_deprec_speed[False-None] 9.7373ms 9.1176ms 109.6778 Ops/s 108.2970 Ops/s $\color{#35bf28}+1.28\%$
test_redq_deprec_speed[False-backward] 12.5867ms 12.0914ms 82.7031 Ops/s 79.7516 Ops/s $\color{#35bf28}+3.70\%$
test_redq_deprec_speed[True-None] 2.5861ms 2.3794ms 420.2655 Ops/s 416.6426 Ops/s $\color{#35bf28}+0.87\%$
test_redq_deprec_speed[True-backward] 4.1946ms 4.0815ms 245.0089 Ops/s 245.3489 Ops/s $\color{#d91a1a}-0.14\%$
test_redq_deprec_speed[reduce-overhead-None] 2.7505ms 2.3737ms 421.2894 Ops/s 409.0274 Ops/s $\color{#35bf28}+3.00\%$
test_redq_deprec_speed[reduce-overhead-backward] 4.2492ms 4.0573ms 246.4716 Ops/s 235.4414 Ops/s $\color{#35bf28}+4.68\%$
test_td3_speed[False-None] 8.1121ms 8.0150ms 124.7665 Ops/s 124.9125 Ops/s $\color{#d91a1a}-0.12\%$
test_td3_speed[False-backward] 10.9069ms 10.3901ms 96.2457 Ops/s 94.6500 Ops/s $\color{#35bf28}+1.69\%$
test_td3_speed[True-None] 1.7125ms 1.6517ms 605.4367 Ops/s 621.5897 Ops/s $\color{#d91a1a}-2.60\%$
test_td3_speed[True-backward] 3.6255ms 3.2073ms 311.7894 Ops/s 314.0921 Ops/s $\color{#d91a1a}-0.73\%$
test_td3_speed[reduce-overhead-None] 84.0304ms 27.1563ms 36.8239 Ops/s 35.6666 Ops/s $\color{#35bf28}+3.24\%$
test_td3_speed[reduce-overhead-backward] 1.3871ms 1.3320ms 750.7608 Ops/s 664.9206 Ops/s $\textbf{\color{#35bf28}+12.91\%}$
test_cql_speed[False-None] 17.4758ms 16.9616ms 58.9567 Ops/s 58.6299 Ops/s $\color{#35bf28}+0.56\%$
test_cql_speed[False-backward] 22.6122ms 22.1031ms 45.2426 Ops/s 44.3475 Ops/s $\color{#35bf28}+2.02\%$
test_cql_speed[True-None] 3.1066ms 3.0060ms 332.6665 Ops/s 334.8641 Ops/s $\color{#d91a1a}-0.66\%$
test_cql_speed[True-backward] 5.4159ms 5.1631ms 193.6834 Ops/s 186.5551 Ops/s $\color{#35bf28}+3.82\%$
test_cql_speed[reduce-overhead-None] 22.2032ms 13.5939ms 73.5624 Ops/s 75.0635 Ops/s $\color{#d91a1a}-2.00\%$
test_cql_speed[reduce-overhead-backward] 1.7604ms 1.7165ms 582.5774 Ops/s 578.3405 Ops/s $\color{#35bf28}+0.73\%$
test_a2c_speed[False-None] 3.4142ms 3.2444ms 308.2197 Ops/s 308.2063 Ops/s $+0.00\%$
test_a2c_speed[False-backward] 6.8101ms 6.3660ms 157.0846 Ops/s 155.8330 Ops/s $\color{#35bf28}+0.80\%$
test_a2c_speed[True-None] 1.1627ms 1.0385ms 962.9228 Ops/s 962.9247 Ops/s $-0.00\%$
test_a2c_speed[True-backward] 2.7680ms 2.6659ms 375.1133 Ops/s 354.8684 Ops/s $\textbf{\color{#35bf28}+5.70\%}$
test_a2c_speed[reduce-overhead-None] 22.0878ms 11.9273ms 83.8412 Ops/s 85.8618 Ops/s $\color{#d91a1a}-2.35\%$
test_a2c_speed[reduce-overhead-backward] 1.0278ms 0.9940ms 1.0061 KOps/s 900.8978 Ops/s $\textbf{\color{#35bf28}+11.68\%}$
test_ppo_speed[False-None] 3.9121ms 3.7117ms 269.4177 Ops/s 273.0996 Ops/s $\color{#d91a1a}-1.35\%$
test_ppo_speed[False-backward] 7.2054ms 6.8156ms 146.7217 Ops/s 142.2588 Ops/s $\color{#35bf28}+3.14\%$
test_ppo_speed[True-None] 1.1103ms 0.9812ms 1.0191 KOps/s 1.0273 KOps/s $\color{#d91a1a}-0.80\%$
test_ppo_speed[True-backward] 2.6463ms 2.5945ms 385.4366 Ops/s 384.5387 Ops/s $\color{#35bf28}+0.23\%$
test_ppo_speed[reduce-overhead-None] 0.7258ms 0.5351ms 1.8688 KOps/s 1.8219 KOps/s $\color{#35bf28}+2.57\%$
test_ppo_speed[reduce-overhead-backward] 1.0291ms 0.9880ms 1.0122 KOps/s 979.8980 Ops/s $\color{#35bf28}+3.29\%$
test_reinforce_speed[False-None] 2.3710ms 2.2816ms 438.2973 Ops/s 437.9882 Ops/s $\color{#35bf28}+0.07\%$
test_reinforce_speed[False-backward] 3.3517ms 3.2751ms 305.3326 Ops/s 303.5277 Ops/s $\color{#35bf28}+0.59\%$
test_reinforce_speed[True-None] 1.1616ms 0.8537ms 1.1714 KOps/s 1.1552 KOps/s $\color{#35bf28}+1.40\%$
test_reinforce_speed[True-backward] 2.5471ms 2.4458ms 408.8633 Ops/s 404.4490 Ops/s $\color{#35bf28}+1.09\%$
test_reinforce_speed[reduce-overhead-None] 22.4762ms 11.9606ms 83.6080 Ops/s 85.2572 Ops/s $\color{#d91a1a}-1.93\%$
test_reinforce_speed[reduce-overhead-backward] 1.2552ms 1.2023ms 831.7716 Ops/s 918.0502 Ops/s $\textbf{\color{#d91a1a}-9.40\%}$
test_iql_speed[False-None] 10.1210ms 9.4302ms 106.0422 Ops/s 107.6794 Ops/s $\color{#d91a1a}-1.52\%$
test_iql_speed[False-backward] 14.2190ms 13.4247ms 74.4893 Ops/s 77.0696 Ops/s $\color{#d91a1a}-3.35\%$
test_iql_speed[True-None] 1.8898ms 1.7988ms 555.9388 Ops/s 563.0198 Ops/s $\color{#d91a1a}-1.26\%$
test_iql_speed[True-backward] 4.6003ms 4.4932ms 222.5575 Ops/s 223.6344 Ops/s $\color{#d91a1a}-0.48\%$
test_iql_speed[reduce-overhead-None] 21.4952ms 11.8912ms 84.0955 Ops/s 86.5708 Ops/s $\color{#d91a1a}-2.86\%$
test_iql_speed[reduce-overhead-backward] 1.7025ms 1.6353ms 611.5057 Ops/s 621.7511 Ops/s $\color{#d91a1a}-1.65\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 8.0646ms 6.4684ms 154.5971 Ops/s 152.4504 Ops/s $\color{#35bf28}+1.41\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.4926ms 0.2786ms 3.5899 KOps/s 3.5789 KOps/s $\color{#35bf28}+0.31\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.4485ms 0.2576ms 3.8827 KOps/s 3.8618 KOps/s $\color{#35bf28}+0.54\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.4546ms 6.2219ms 160.7216 Ops/s 159.7403 Ops/s $\color{#35bf28}+0.61\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 0.9371ms 0.3704ms 2.6999 KOps/s 3.4461 KOps/s $\textbf{\color{#d91a1a}-21.65\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.5124ms 0.3102ms 3.2236 KOps/s 3.3768 KOps/s $\color{#d91a1a}-4.54\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 1.6025ms 1.4110ms 708.7341 Ops/s 769.2062 Ops/s $\textbf{\color{#d91a1a}-7.86\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 1.3657ms 1.1811ms 846.6531 Ops/s 845.3437 Ops/s $\color{#35bf28}+0.15\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.5884ms 6.4247ms 155.6487 Ops/s 155.9014 Ops/s $\color{#d91a1a}-0.16\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 2.1847ms 0.4378ms 2.2843 KOps/s 2.1028 KOps/s $\textbf{\color{#35bf28}+8.63\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.8751ms 0.4466ms 2.2394 KOps/s 2.3129 KOps/s $\color{#d91a1a}-3.18\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 13.1050ms 6.5412ms 152.8761 Ops/s 159.3635 Ops/s $\color{#d91a1a}-4.07\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 1.0891ms 0.2924ms 3.4199 KOps/s 3.0434 KOps/s $\textbf{\color{#35bf28}+12.37\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.4486ms 0.2556ms 3.9125 KOps/s 3.5508 KOps/s $\textbf{\color{#35bf28}+10.19\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.4775ms 6.2471ms 160.0739 Ops/s 159.9576 Ops/s $\color{#35bf28}+0.07\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 0.6675ms 0.2929ms 3.4141 KOps/s 2.8303 KOps/s $\textbf{\color{#35bf28}+20.63\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.4679ms 0.2789ms 3.5858 KOps/s 2.9384 KOps/s $\textbf{\color{#35bf28}+22.03\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.5306ms 6.4206ms 155.7482 Ops/s 156.2878 Ops/s $\color{#d91a1a}-0.35\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1.9983ms 0.4144ms 2.4130 KOps/s 2.1267 KOps/s $\textbf{\color{#35bf28}+13.46\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.6570ms 0.3914ms 2.5549 KOps/s 2.3158 KOps/s $\textbf{\color{#35bf28}+10.33\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 7.1883ms 5.5088ms 181.5270 Ops/s 180.7788 Ops/s $\color{#35bf28}+0.41\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 10.2904ms 2.0775ms 481.3587 Ops/s 502.0075 Ops/s $\color{#d91a1a}-4.11\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 3.3843ms 1.1531ms 867.2419 Ops/s 813.8866 Ops/s $\textbf{\color{#35bf28}+6.56\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 9.3877ms 5.5930ms 178.7951 Ops/s 182.9731 Ops/s $\color{#d91a1a}-2.28\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 7.7076ms 2.0303ms 492.5298 Ops/s 434.9196 Ops/s $\textbf{\color{#35bf28}+13.25\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 8.8011ms 1.2365ms 808.7461 Ops/s 845.8689 Ops/s $\color{#d91a1a}-4.39\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 0.5278s 16.1781ms 61.8121 Ops/s 32.3305 Ops/s $\textbf{\color{#35bf28}+91.19\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 9.7897ms 2.1727ms 460.2520 Ops/s 430.9643 Ops/s $\textbf{\color{#35bf28}+6.80\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 8.0066ms 1.3782ms 725.6041 Ops/s 786.0691 Ops/s $\textbf{\color{#d91a1a}-7.69\%}$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 15.6889ms 15.5245ms 64.4142 Ops/s 63.2500 Ops/s $\color{#35bf28}+1.84\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 22.6845ms 18.0765ms 55.3204 Ops/s 56.6949 Ops/s $\color{#d91a1a}-2.42\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 20.1652ms 19.9504ms 50.1243 Ops/s 48.2210 Ops/s $\color{#35bf28}+3.95\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 20.2509ms 18.0792ms 55.3122 Ops/s 55.4247 Ops/s $\color{#d91a1a}-0.20\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 20.6629ms 19.8896ms 50.2776 Ops/s 48.5685 Ops/s $\color{#35bf28}+3.52\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 20.8893ms 19.2243ms 52.0176 Ops/s 51.8597 Ops/s $\color{#35bf28}+0.30\%$
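
For readers checking the Change column: the reported percentages appear consistent with the relative difference between the "Ops" and "Ops on Repo HEAD" columns. The sketch below is only an illustration of that arithmetic, not the benchmark bot's actual code; the helper name `relative_change` is hypothetical, and the KOps/s values are converted to Ops/s by hand.

```python
def relative_change(ops_pr: float, ops_head: float) -> float:
    """Percent change in throughput of the PR relative to the repo HEAD."""
    return (ops_pr / ops_head - 1.0) * 100.0


# (Ops on this PR, Ops on repo HEAD), both in Ops/s, copied from rows above.
rows = {
    "test_td3_speed[reduce-overhead-backward]": (750.7608, 664.9206),
    "test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400]": (61.8121, 32.3305),
    # 2.6999 KOps/s and 3.4461 KOps/s converted to Ops/s.
    "test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000]": (2699.9, 3446.1),
}

for name, (ops_pr, ops_head) in rows.items():
    print(f"{name}: {relative_change(ops_pr, ops_head):+.2f}%")
```

Under this reading, a positive value means the PR ran more operations per second than the merge base. Swings in either direction on unrelated benchmarks are probably best treated as run-to-run noise until reproduced.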

@vmoens vmoens added the performance Performance issue or suggestion for improvement label Dec 20, 2024
[ghstack-poisoned]
@vmoens vmoens merged commit 5aacaea into gh/vmoens/61/base Dec 20, 2024
64 of 79 checks passed
vmoens added a commit that referenced this pull request Dec 20, 2024
ghstack-source-id: 2e133fcea716b202694cfa84df3f6e4ba3507bbc
Pull Request resolved: #2671
@vmoens vmoens deleted the gh/vmoens/61/head branch December 20, 2024 10:27