[Benchmark] Add benchmark for compiled ReplayBuffer.extend/sample
#2514
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/2514
Note: Links to docs will display an error until the docs builds have completed.
❌ 2 New Failures, 5 Unrelated Failures
As of commit 163b4a9 with merge base 0f29c7e:
NEW FAILURES - The following jobs have failed:
BROKEN TRUNK - The following jobs failed but were present on the merge base:
👉 Rebase onto the `viable/strict` branch to avoid these failures
This comment was automatically generated by Dr. CI and updates every 15 minutes.
ghstack-source-id: d4562697e2c1a8392cf5bdcadb50f8b7b6939e41
Pull Request resolved: #2514
Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
---|---|---|---|---|---|
test_simple | 0.7247s | 0.7240s | 1.3813 Ops/s | 1.3727 Ops/s | |
test_transformed | 1.0575s | 0.9806s | 1.0198 Ops/s | 1.0359 Ops/s | |
test_serial | 2.1948s | 2.1173s | 0.4723 Ops/s | 0.4750 Ops/s | |
test_parallel | 2.0456s | 1.9947s | 0.5013 Ops/s | 0.4953 Ops/s | |
test_step_mdp_speed[True-True-True-True-True] | 0.1402ms | 37.3060μs | 26.8054 KOps/s | 27.4671 KOps/s | |
test_step_mdp_speed[True-True-True-True-False] | 48.0110μs | 22.3893μs | 44.6641 KOps/s | 46.4162 KOps/s | |
test_step_mdp_speed[True-True-True-False-True] | 55.7010μs | 20.4659μs | 48.8616 KOps/s | 51.3450 KOps/s | |
test_step_mdp_speed[True-True-True-False-False] | 40.1700μs | 12.0857μs | 82.7421 KOps/s | 85.3896 KOps/s | |
test_step_mdp_speed[True-True-False-True-True] | 79.1020μs | 39.5953μs | 25.2555 KOps/s | 25.2010 KOps/s | |
test_step_mdp_speed[True-True-False-True-False] | 51.8710μs | 24.4135μs | 40.9609 KOps/s | 42.6227 KOps/s | |
test_step_mdp_speed[True-True-False-False-True] | 49.6510μs | 22.4099μs | 44.6231 KOps/s | 45.5995 KOps/s | |
test_step_mdp_speed[True-True-False-False-False] | 46.1410μs | 14.4897μs | 69.0145 KOps/s | 71.9102 KOps/s | |
test_step_mdp_speed[True-False-True-True-True] | 75.8410μs | 42.4008μs | 23.5845 KOps/s | 24.1734 KOps/s | |
test_step_mdp_speed[True-False-True-True-False] | 56.6410μs | 27.1576μs | 36.8221 KOps/s | 38.3525 KOps/s | |
test_step_mdp_speed[True-False-True-False-True] | 41.7200μs | 22.7198μs | 44.0144 KOps/s | 45.6571 KOps/s | |
test_step_mdp_speed[True-False-True-False-False] | 63.8520μs | 14.3096μs | 69.8831 KOps/s | 70.7012 KOps/s | |
test_step_mdp_speed[True-False-False-True-True] | 70.1320μs | 44.6390μs | 22.4019 KOps/s | 22.9234 KOps/s | |
test_step_mdp_speed[True-False-False-True-False] | 59.3610μs | 29.6376μs | 33.7409 KOps/s | 34.8623 KOps/s | |
test_step_mdp_speed[True-False-False-False-True] | 51.5710μs | 25.5423μs | 39.1507 KOps/s | 40.9970 KOps/s | |
test_step_mdp_speed[True-False-False-False-False] | 43.5210μs | 17.0035μs | 58.8114 KOps/s | 60.1328 KOps/s | |
test_step_mdp_speed[False-True-True-True-True] | 70.3910μs | 42.3111μs | 23.6344 KOps/s | 24.0845 KOps/s | |
test_step_mdp_speed[False-True-True-True-False] | 0.1124ms | 27.2736μs | 36.6655 KOps/s | 38.2646 KOps/s | |
test_step_mdp_speed[False-True-True-False-True] | 59.3910μs | 26.6351μs | 37.5445 KOps/s | 36.3315 KOps/s | |
test_step_mdp_speed[False-True-True-False-False] | 43.9210μs | 16.8166μs | 59.4651 KOps/s | 59.5847 KOps/s | |
test_step_mdp_speed[False-True-False-True-True] | 79.9520μs | 44.5922μs | 22.4254 KOps/s | 22.6234 KOps/s | |
test_step_mdp_speed[False-True-False-True-False] | 60.3610μs | 29.0832μs | 34.3841 KOps/s | 34.7079 KOps/s | |
test_step_mdp_speed[False-True-False-False-True] | 3.3194ms | 29.2343μs | 34.2063 KOps/s | 33.9409 KOps/s | |
test_step_mdp_speed[False-True-False-False-False] | 81.9610μs | 18.9552μs | 52.7560 KOps/s | 52.6142 KOps/s | |
test_step_mdp_speed[False-False-True-True-True] | 89.7820μs | 46.8545μs | 21.3427 KOps/s | 21.2063 KOps/s | |
test_step_mdp_speed[False-False-True-True-False] | 60.4620μs | 31.8429μs | 31.4042 KOps/s | 31.3104 KOps/s | |
test_step_mdp_speed[False-False-True-False-True] | 61.2610μs | 29.6077μs | 33.7750 KOps/s | 33.3798 KOps/s | |
test_step_mdp_speed[False-False-True-False-False] | 54.4910μs | 19.4096μs | 51.5209 KOps/s | 52.5213 KOps/s | |
test_step_mdp_speed[False-False-False-True-True] | 90.4420μs | 49.4698μs | 20.2143 KOps/s | 20.6294 KOps/s | |
test_step_mdp_speed[False-False-False-True-False] | 61.6010μs | 34.0769μs | 29.3454 KOps/s | 29.5927 KOps/s | |
test_step_mdp_speed[False-False-False-False-True] | 70.7810μs | 31.2198μs | 32.0310 KOps/s | 32.4987 KOps/s | |
test_step_mdp_speed[False-False-False-False-False] | 54.7010μs | 21.7524μs | 45.9719 KOps/s | 47.0103 KOps/s | |
test_values[generalized_advantage_estimate-True-True] | 25.0647ms | 24.6525ms | 40.5639 Ops/s | 40.4933 Ops/s | |
test_values[vec_generalized_advantage_estimate-True-True] | 0.1005s | 2.8998ms | 344.8467 Ops/s | 327.2836 Ops/s | |
test_values[td0_return_estimate-False-False] | 89.4820μs | 66.7092μs | 14.9904 KOps/s | 15.0094 KOps/s | |
test_values[td1_return_estimate-False-False] | 55.4044ms | 55.1335ms | 18.1378 Ops/s | 18.3790 Ops/s | |
test_values[vec_td1_return_estimate-False-False] | 1.2786ms | 1.0738ms | 931.2530 Ops/s | 930.1485 Ops/s | |
test_values[td_lambda_return_estimate-True-False] | 87.5751ms | 87.0271ms | 11.4907 Ops/s | 11.5489 Ops/s | |
test_values[vec_td_lambda_return_estimate-True-False] | 1.3109ms | 1.0708ms | 933.8865 Ops/s | 934.4878 Ops/s | |
test_gae_speed[generalized_advantage_estimate-False-1-512] | 24.5043ms | 24.2367ms | 41.2598 Ops/s | 41.1772 Ops/s | |
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] | 1.0262ms | 0.7420ms | 1.3478 KOps/s | 1.3162 KOps/s | |
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] | 0.7505ms | 0.6593ms | 1.5168 KOps/s | 1.5100 KOps/s | |
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] | 1.5095ms | 1.4662ms | 682.0284 Ops/s | 681.1020 Ops/s | |
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] | 0.7999ms | 0.6751ms | 1.4814 KOps/s | 1.4776 KOps/s | |
test_dqn_speed[False-None] | 1.4279ms | 1.2902ms | 775.0899 Ops/s | 664.9738 Ops/s | |
test_dqn_speed[False-backward] | 1.9084ms | 1.8272ms | 547.2888 Ops/s | 534.7718 Ops/s | |
test_dqn_speed[True-None] | 1.1353ms | 0.5652ms | 1.7693 KOps/s | 1.7575 KOps/s | |
test_dqn_speed[True-backward] | 1.0520ms | 1.0046ms | 995.4476 Ops/s | 969.1008 Ops/s | |
test_dqn_speed[reduce-overhead-None] | 0.8686ms | 0.5627ms | 1.7772 KOps/s | 1.8066 KOps/s | |
test_dqn_speed[reduce-overhead-backward] | 1.0569ms | 1.0037ms | 996.3025 Ops/s | 966.9975 Ops/s | |
test_ddpg_speed[False-None] | 2.9753ms | 2.6614ms | 375.7418 Ops/s | 372.4022 Ops/s | |
test_ddpg_speed[False-backward] | 4.0437ms | 3.9160ms | 255.3656 Ops/s | 252.6653 Ops/s | |
test_ddpg_speed[True-None] | 1.5983ms | 1.2376ms | 807.9930 Ops/s | 809.8698 Ops/s | |
test_ddpg_speed[True-backward] | 2.2646ms | 2.2095ms | 452.5898 Ops/s | 443.4394 Ops/s | |
test_ddpg_speed[reduce-overhead-None] | 1.4864ms | 1.2507ms | 799.5811 Ops/s | 792.4085 Ops/s | |
test_ddpg_speed[reduce-overhead-backward] | 2.2829ms | 2.2201ms | 450.4347 Ops/s | 449.9720 Ops/s | |
test_sac_speed[False-None] | 8.4696ms | 7.5000ms | 133.3335 Ops/s | 130.8433 Ops/s | |
test_sac_speed[False-backward] | 11.2840ms | 10.8089ms | 92.5167 Ops/s | 92.1935 Ops/s | |
test_sac_speed[True-None] | 2.4229ms | 2.0273ms | 493.2718 Ops/s | 489.7019 Ops/s | |
test_sac_speed[True-backward] | 4.1301ms | 4.0016ms | 249.9003 Ops/s | 237.3357 Ops/s | |
test_sac_speed[reduce-overhead-None] | 2.2656ms | 2.0467ms | 488.5914 Ops/s | 491.5316 Ops/s | |
test_sac_speed[reduce-overhead-backward] | 4.1939ms | 3.9980ms | 250.1250 Ops/s | 252.0157 Ops/s | |
test_redq_speed[False-None] | 14.6527ms | 10.1475ms | 98.5465 Ops/s | 102.2321 Ops/s | |
test_redq_speed[False-backward] | 17.8514ms | 17.0725ms | 58.5738 Ops/s | 40.5369 Ops/s | |
test_redq_speed[True-None] | 4.0246ms | 3.6573ms | 273.4292 Ops/s | 277.3685 Ops/s | |
test_redq_speed[True-backward] | 8.8831ms | 8.6645ms | 115.4136 Ops/s | 117.2609 Ops/s | |
test_redq_speed[reduce-overhead-None] | 3.8370ms | 3.5500ms | 281.6889 Ops/s | 275.6458 Ops/s | |
test_redq_speed[reduce-overhead-backward] | 8.9146ms | 8.5917ms | 116.3918 Ops/s | 116.6084 Ops/s | |
test_redq_deprec_speed[False-None] | 12.1523ms | 10.4558ms | 95.6406 Ops/s | 95.3588 Ops/s | |
test_redq_deprec_speed[False-backward] | 16.0275ms | 15.4281ms | 64.8168 Ops/s | 65.6461 Ops/s | |
test_redq_deprec_speed[True-None] | 3.5739ms | 3.2286ms | 309.7335 Ops/s | 307.5825 Ops/s | |
test_redq_deprec_speed[True-backward] | 7.3944ms | 7.1557ms | 139.7480 Ops/s | 138.2697 Ops/s | |
test_redq_deprec_speed[reduce-overhead-None] | 3.5831ms | 3.2161ms | 310.9342 Ops/s | 312.1585 Ops/s | |
test_redq_deprec_speed[reduce-overhead-backward] | 7.3844ms | 7.1707ms | 139.4559 Ops/s | 135.5173 Ops/s | |
test_td3_speed[False-None] | 7.4924ms | 7.4419ms | 134.3735 Ops/s | 131.7094 Ops/s | |
test_td3_speed[False-backward] | 10.4929ms | 10.3039ms | 97.0510 Ops/s | 94.6241 Ops/s | |
test_td3_speed[True-None] | 1.9429ms | 1.9085ms | 523.9648 Ops/s | 511.7892 Ops/s | |
test_td3_speed[True-backward] | 3.8283ms | 3.7245ms | 268.4920 Ops/s | 250.4049 Ops/s | |
test_td3_speed[reduce-overhead-None] | 1.9321ms | 1.9049ms | 524.9566 Ops/s | 514.6173 Ops/s | |
test_td3_speed[reduce-overhead-backward] | 3.8187ms | 3.7160ms | 269.1040 Ops/s | 271.7981 Ops/s | |
test_cql_speed[False-None] | 28.3456ms | 24.9959ms | 40.0066 Ops/s | 40.3531 Ops/s | |
test_cql_speed[False-backward] | 39.3962ms | 35.0683ms | 28.5158 Ops/s | 29.6124 Ops/s | |
test_cql_speed[True-None] | 11.2509ms | 10.9828ms | 91.0514 Ops/s | 91.3287 Ops/s | |
test_cql_speed[True-backward] | 17.2658ms | 16.7836ms | 59.5820 Ops/s | 58.6804 Ops/s | |
test_cql_speed[reduce-overhead-None] | 11.3718ms | 10.9529ms | 91.2998 Ops/s | 90.9005 Ops/s | |
test_cql_speed[reduce-overhead-backward] | 20.1524ms | 17.4028ms | 57.4621 Ops/s | 58.8264 Ops/s | |
test_a2c_speed[False-None] | 5.5766ms | 5.3118ms | 188.2591 Ops/s | 184.9553 Ops/s | |
test_a2c_speed[False-backward] | 12.0810ms | 11.7288ms | 85.2602 Ops/s | 84.5344 Ops/s | |
test_a2c_speed[True-None] | 3.4025ms | 3.0478ms | 328.1057 Ops/s | 318.9109 Ops/s | |
test_a2c_speed[True-backward] | 8.8598ms | 8.5875ms | 116.4483 Ops/s | 117.3641 Ops/s | |
test_a2c_speed[reduce-overhead-None] | 3.2055ms | 3.0303ms | 330.0013 Ops/s | 323.7980 Ops/s | |
test_a2c_speed[reduce-overhead-backward] | 8.7837ms | 8.5077ms | 117.5401 Ops/s | 117.6216 Ops/s | |
test_ppo_speed[False-None] | 7.1790ms | 5.7691ms | 173.3358 Ops/s | 176.6455 Ops/s | |
test_ppo_speed[False-backward] | 12.7603ms | 12.3683ms | 80.8520 Ops/s | 82.4711 Ops/s | |
test_ppo_speed[True-None] | 3.7449ms | 3.4612ms | 288.9200 Ops/s | 288.7630 Ops/s | |
test_ppo_speed[True-backward] | 8.6944ms | 8.3749ms | 119.4044 Ops/s | 121.4437 Ops/s | |
test_ppo_speed[reduce-overhead-None] | 3.8403ms | 3.4615ms | 288.8888 Ops/s | 291.1117 Ops/s | |
test_ppo_speed[reduce-overhead-backward] | 8.5206ms | 8.3118ms | 120.3112 Ops/s | 119.5477 Ops/s | |
test_reinforce_speed[False-None] | 4.6561ms | 4.4026ms | 227.1406 Ops/s | 229.8540 Ops/s | |
test_reinforce_speed[False-backward] | 7.5601ms | 7.3207ms | 136.5985 Ops/s | 138.7638 Ops/s | |
test_reinforce_speed[True-None] | 2.6480ms | 2.2321ms | 448.0174 Ops/s | 434.6713 Ops/s | |
test_reinforce_speed[True-backward] | 7.4554ms | 7.1969ms | 138.9489 Ops/s | 139.5256 Ops/s | |
test_reinforce_speed[reduce-overhead-None] | 2.7940ms | 2.2424ms | 445.9444 Ops/s | 441.2806 Ops/s | |
test_reinforce_speed[reduce-overhead-backward] | 7.3432ms | 7.0986ms | 140.8729 Ops/s | 139.3393 Ops/s | |
test_iql_speed[False-None] | 20.2910ms | 19.2475ms | 51.9548 Ops/s | 49.8796 Ops/s | |
test_iql_speed[False-backward] | 31.0354ms | 29.9305ms | 33.4107 Ops/s | 32.6096 Ops/s | |
test_iql_speed[True-None] | 7.2765ms | 6.7690ms | 147.7326 Ops/s | 144.8366 Ops/s | |
test_iql_speed[True-backward] | 16.0485ms | 15.5279ms | 64.4000 Ops/s | 61.8607 Ops/s | |
test_iql_speed[reduce-overhead-None] | 7.4591ms | 6.7972ms | 147.1190 Ops/s | 146.9741 Ops/s | |
test_iql_speed[reduce-overhead-backward] | 16.0029ms | 15.6602ms | 63.8561 Ops/s | 63.6490 Ops/s | |
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 6.4926ms | 6.2138ms | 160.9334 Ops/s | 163.1347 Ops/s | |
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 1.0323ms | 0.2815ms | 3.5524 KOps/s | 4.2542 KOps/s | |
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.6512ms | 0.2861ms | 3.4957 KOps/s | 4.4918 KOps/s | |
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 6.2351ms | 5.9609ms | 167.7589 Ops/s | 169.0786 Ops/s | |
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 1.9915ms | 0.2716ms | 3.6823 KOps/s | 2.9688 KOps/s | |
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.4488ms | 0.2274ms | 4.3969 KOps/s | 3.5313 KOps/s | |
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] | 1.5803ms | 1.2038ms | 830.6957 Ops/s | 715.8863 Ops/s | |
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] | 1.3404ms | 1.1586ms | 863.1044 Ops/s | 740.5887 Ops/s | |
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 6.2818ms | 6.1443ms | 162.7516 Ops/s | 165.0999 Ops/s | |
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 1.1837ms | 0.4163ms | 2.4021 KOps/s | 2.2818 KOps/s | |
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.7004ms | 0.4112ms | 2.4316 KOps/s | 2.4038 KOps/s | |
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 6.1830ms | 5.9673ms | 167.5794 Ops/s | 167.3422 Ops/s | |
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 1.9106ms | 0.2724ms | 3.6708 KOps/s | 3.2721 KOps/s | |
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 1.3634ms | 0.3187ms | 3.1376 KOps/s | 4.6383 KOps/s | |
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 7.5525ms | 5.9585ms | 167.8269 Ops/s | 171.4990 Ops/s | |
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 1.3803ms | 0.2350ms | 4.2552 KOps/s | 3.3630 KOps/s | |
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.5145ms | 0.2426ms | 4.1227 KOps/s | 4.7574 KOps/s | |
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 6.4144ms | 6.1171ms | 163.4760 Ops/s | 165.6992 Ops/s | |
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 0.9920ms | 0.4936ms | 2.0258 KOps/s | 2.3811 KOps/s | |
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.7407ms | 0.4777ms | 2.0932 KOps/s | 2.7683 KOps/s | |
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] | 7.0488ms | 5.2628ms | 190.0142 Ops/s | 35.3554 Ops/s | |
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] | 10.8026ms | 2.0389ms | 490.4699 Ops/s | 496.4853 Ops/s | |
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] | 2.0543ms | 1.0585ms | 944.7509 Ops/s | 845.0143 Ops/s | |
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] | 0.4240s | 13.7213ms | 72.8792 Ops/s | 186.7988 Ops/s | |
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] | 7.4681ms | 1.9767ms | 505.8949 Ops/s | 489.3915 Ops/s | |
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] | 8.2266ms | 1.1978ms | 834.8823 Ops/s | 947.7880 Ops/s | |
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] | 8.4493ms | 5.5099ms | 181.4930 Ops/s | 180.8643 Ops/s | |
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] | 9.2125ms | 2.1594ms | 463.0904 Ops/s | 473.5613 Ops/s | |
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] | 7.8778ms | 1.3357ms | 748.6959 Ops/s | 711.6548 Ops/s | |
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000-100-True] | 45.0436ms | 43.0096ms | 23.2506 Ops/s | 22.4890 Ops/s | |
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000-100-False] | 10.4728ms | 9.9044ms | 100.9653 Ops/s | 99.9544 Ops/s | |
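The Change column is blank in this capture of the bot comment. As a rough sketch, assuming that Ops is the reciprocal of Mean and that Change is the relative difference between Ops and Ops on Repo HEAD, the two new `test_rb_extend_sample` rows can be checked in plain Python. The helper names (`parse_time`, `parse_rate`, `parse_row`, `relative_change`) are hypothetical, and reading the trailing `True`/`False` parameter as "compiled or not" is an assumption based on the PR title, not something the table states.

```python
"""Sketch: derive the blank "Change" column from benchmark table rows.

Assumptions (not taken from the benchmark bot's source): Ops == 1 / Mean,
and Change == (Ops - Ops on Repo HEAD) / Ops on Repo HEAD, in percent.
"""

# Time suffixes used in the table, converted to seconds. Both the Greek mu
# (U+03BC) and the micro sign (U+00B5) are accepted. Order matters: the
# plain "s" suffix must be tried last because "ms" and "μs" also end in "s".
TIME_UNITS = {"ms": 1e-3, "\u03bcs": 1e-6, "\u00b5s": 1e-6, "s": 1.0}
# Throughput units used in the table, converted to operations per second.
RATE_UNITS = {"Ops/s": 1.0, "KOps/s": 1e3}


def parse_time(text: str) -> float:
    """Convert a cell such as '43.0096ms' or '0.7240s' to seconds."""
    for suffix, scale in TIME_UNITS.items():
        if text.endswith(suffix):
            return float(text[: -len(suffix)]) * scale
    raise ValueError(f"unrecognised time cell: {text!r}")


def parse_rate(text: str) -> float:
    """Convert a cell such as '1.3478 KOps/s' to operations per second."""
    value, unit = text.split()
    return float(value) * RATE_UNITS[unit]


def parse_row(row: str) -> dict:
    """Split one 'Name | Max | Mean | Ops | Ops on Repo HEAD | Change |' row."""
    cells = [cell.strip() for cell in row.split("|")]
    name, max_t, mean_t, ops, ops_head = cells[:5]
    return {
        "name": name,
        "max_s": parse_time(max_t),
        "mean_s": parse_time(mean_t),
        "ops": parse_rate(ops),
        "ops_head": parse_rate(ops_head),
    }


def relative_change(row: dict) -> float:
    """Throughput change versus repo HEAD, in percent (positive = faster)."""
    return (row["ops"] - row["ops_head"]) / row["ops_head"] * 100.0


# The two rows added by this PR, copied from the table. Treating the trailing
# True/False parameter as "compiled" vs "eager" is an assumption.
compiled = parse_row(
    "test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000-100-True]"
    " | 45.0436ms | 43.0096ms | 23.2506 Ops/s | 22.4890 Ops/s | |"
)
eager = parse_row(
    "test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000-100-False]"
    " | 10.4728ms | 9.9044ms | 100.9653 Ops/s | 99.9544 Ops/s | |"
)

for row in (compiled, eager):
    # If Ops is the reciprocal of the mean round time, these should match.
    print(f"{row['name']}: 1/mean = {1 / row['mean_s']:.4f} Ops/s, "
          f"change vs HEAD = {relative_change(row):+.2f}%")

# How many times more rounds per second the False variant completes.
print(f"False / True throughput: {eager['ops'] / compiled['ops']:.2f}x")
```

On these two rows, 1/Mean reproduces the Ops column, the change versus HEAD comes out at roughly +3.4% (`True`) and +1.0% (`False`), and the `False` variant completes about 4.3 times as many rounds per second as the `True` variant. The comment does not say whether compilation or recompilation overhead is included in the `True` timings.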
lgtm! thanks!
Part of #2501
Stack from ghstack (oldest at bottom):
[Benchmark] Add benchmark for compiled ReplayBuffer.extend/sample #2514