[Performance] Faster slice sampler #2031
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/2031
Note: Links to docs will display an error until the docs builds have been completed.
❌ 3 New Failures, 20 Unrelated Failures
As of commit 4eb145b with merge base 660d827:
NEW FAILURES - The following jobs have failed:
FLAKY - The following jobs failed but were likely due to flakiness present on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
---|---|---|---|---|---|
test_single | 62.2475ms | 54.7210ms | 18.2745 Ops/s | 17.8861 Ops/s | |
test_sync | 30.4470ms | 29.1686ms | 34.2834 Ops/s | 34.4402 Ops/s | |
test_async | 52.1722ms | 26.8124ms | 37.2962 Ops/s | 36.5925 Ops/s | |
test_simple | 0.3999s | 0.3389s | 2.9511 Ops/s | 2.9356 Ops/s | |
test_transformed | 0.5266s | 0.4756s | 2.1025 Ops/s | 2.1166 Ops/s | |
test_serial | 1.2536s | 1.2009s | 0.8327 Ops/s | 0.8302 Ops/s | |
test_parallel | 1.0770s | 1.0242s | 0.9763 Ops/s | 0.9725 Ops/s | |
test_step_mdp_speed[True-True-True-True-True] | 0.1539ms | 21.1090μs | 47.3731 KOps/s | 47.0284 KOps/s | |
test_step_mdp_speed[True-True-True-True-False] | 39.3930μs | 13.1737μs | 75.9087 KOps/s | 78.2412 KOps/s | |
test_step_mdp_speed[True-True-True-False-True] | 34.7440μs | 12.4711μs | 80.1855 KOps/s | 80.6377 KOps/s | |
test_step_mdp_speed[True-True-True-False-False] | 29.3540μs | 7.6922μs | 130.0022 KOps/s | 134.9625 KOps/s | |
test_step_mdp_speed[True-True-False-True-True] | 58.0770μs | 22.5685μs | 44.3096 KOps/s | 44.3384 KOps/s | |
test_step_mdp_speed[True-True-False-True-False] | 56.4350μs | 14.2736μs | 70.0596 KOps/s | 71.5608 KOps/s | |
test_step_mdp_speed[True-True-False-False-True] | 39.0520μs | 13.6201μs | 73.4209 KOps/s | 73.5524 KOps/s | |
test_step_mdp_speed[True-True-False-False-False] | 35.5260μs | 8.8072μs | 113.5437 KOps/s | 115.9136 KOps/s | |
test_step_mdp_speed[True-False-True-True-True] | 68.0760μs | 23.8625μs | 41.9068 KOps/s | 41.4831 KOps/s | |
test_step_mdp_speed[True-False-True-True-False] | 39.9640μs | 15.7575μs | 63.4619 KOps/s | 64.6992 KOps/s | |
test_step_mdp_speed[True-False-True-False-True] | 83.3820μs | 13.8522μs | 72.1907 KOps/s | 73.3301 KOps/s | |
test_step_mdp_speed[True-False-True-False-False] | 33.0720μs | 8.7303μs | 114.5431 KOps/s | 112.7358 KOps/s | |
test_step_mdp_speed[True-False-False-True-True] | 51.3560μs | 24.7853μs | 40.3465 KOps/s | 39.7711 KOps/s | |
test_step_mdp_speed[True-False-False-True-False] | 42.4190μs | 16.8142μs | 59.4736 KOps/s | 61.2857 KOps/s | |
test_step_mdp_speed[True-False-False-False-True] | 44.7630μs | 14.7085μs | 67.9879 KOps/s | 66.8635 KOps/s | |
test_step_mdp_speed[True-False-False-False-False] | 0.1117ms | 10.3019μs | 97.0695 KOps/s | 102.7007 KOps/s | |
test_step_mdp_speed[False-True-True-True-True] | 60.2830μs | 23.9268μs | 41.7942 KOps/s | 41.7524 KOps/s | |
test_step_mdp_speed[False-True-True-True-False] | 41.9580μs | 15.8159μs | 63.2274 KOps/s | 64.8198 KOps/s | |
test_step_mdp_speed[False-True-True-False-True] | 58.1890μs | 15.9554μs | 62.6746 KOps/s | 62.7421 KOps/s | |
test_step_mdp_speed[False-True-True-False-False] | 39.6340μs | 9.9925μs | 100.0752 KOps/s | 101.4280 KOps/s | |
test_step_mdp_speed[False-True-False-True-True] | 47.6790μs | 25.6084μs | 39.0497 KOps/s | 39.5810 KOps/s | |
test_step_mdp_speed[False-True-False-True-False] | 36.3670μs | 17.1092μs | 58.4482 KOps/s | 61.2718 KOps/s | |
test_step_mdp_speed[False-True-False-False-True] | 37.9210μs | 17.0988μs | 58.4837 KOps/s | 59.0860 KOps/s | |
test_step_mdp_speed[False-True-False-False-False] | 89.8370μs | 11.3255μs | 88.2964 KOps/s | 91.3664 KOps/s | |
test_step_mdp_speed[False-False-True-True-True] | 0.1011ms | 26.2544μs | 38.0888 KOps/s | 38.0274 KOps/s | |
test_step_mdp_speed[False-False-True-True-False] | 50.6640μs | 18.2495μs | 54.7959 KOps/s | 56.4244 KOps/s | |
test_step_mdp_speed[False-False-True-False-True] | 46.3460μs | 17.0292μs | 58.7228 KOps/s | 58.6099 KOps/s | |
test_step_mdp_speed[False-False-True-False-False] | 41.0770μs | 11.2506μs | 88.8838 KOps/s | 90.6026 KOps/s | |
test_step_mdp_speed[False-False-False-True-True] | 61.3740μs | 27.5870μs | 36.2490 KOps/s | 36.4507 KOps/s | |
test_step_mdp_speed[False-False-False-True-False] | 85.1390μs | 19.5623μs | 51.1188 KOps/s | 53.1690 KOps/s | |
test_step_mdp_speed[False-False-False-False-True] | 56.4550μs | 18.0391μs | 55.4350 KOps/s | 56.2428 KOps/s | |
test_step_mdp_speed[False-False-False-False-False] | 57.7910μs | 12.1922μs | 82.0196 KOps/s | 83.2132 KOps/s | |
test_values[generalized_advantage_estimate-True-True] | 10.4406ms | 9.5631ms | 104.5683 Ops/s | 108.6454 Ops/s | |
test_values[vec_generalized_advantage_estimate-True-True] | 39.9798ms | 35.7996ms | 27.9333 Ops/s | 28.5532 Ops/s | |
test_values[td0_return_estimate-False-False] | 0.2272ms | 0.1766ms | 5.6627 KOps/s | 5.7591 KOps/s | |
test_values[td1_return_estimate-False-False] | 23.2642ms | 22.8395ms | 43.7838 Ops/s | 43.6765 Ops/s | |
test_values[vec_td1_return_estimate-False-False] | 37.4280ms | 35.8148ms | 27.9214 Ops/s | 28.6381 Ops/s | |
test_values[td_lambda_return_estimate-True-False] | 37.0803ms | 33.6072ms | 29.7556 Ops/s | 30.4478 Ops/s | |
test_values[vec_td_lambda_return_estimate-True-False] | 37.1078ms | 35.6954ms | 28.0148 Ops/s | 28.4505 Ops/s | |
test_gae_speed[generalized_advantage_estimate-False-1-512] | 8.2732ms | 8.1605ms | 122.5417 Ops/s | 123.1707 Ops/s | |
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] | 2.4584ms | 1.9294ms | 518.2865 Ops/s | 555.4366 Ops/s | |
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] | 0.5957ms | 0.3507ms | 2.8514 KOps/s | 2.8455 KOps/s | |
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] | 41.4098ms | 39.1982ms | 25.5114 Ops/s | 22.0761 Ops/s | |
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] | 3.5532ms | 3.0319ms | 329.8239 Ops/s | 331.9050 Ops/s | |
test_dqn_speed | 6.9681ms | 1.3551ms | 737.9322 Ops/s | 695.0187 Ops/s | |
test_ddpg_speed | 2.9820ms | 2.6830ms | 372.7182 Ops/s | 378.0220 Ops/s | |
test_sac_speed | 9.7313ms | 8.2612ms | 121.0474 Ops/s | 123.5812 Ops/s | |
test_redq_speed | 14.4897ms | 13.2607ms | 75.4106 Ops/s | 77.9603 Ops/s | |
test_redq_deprec_speed | 14.9728ms | 13.3221ms | 75.0634 Ops/s | 77.9220 Ops/s | |
test_td3_speed | 16.1090ms | 8.2068ms | 121.8503 Ops/s | 124.0594 Ops/s | |
test_cql_speed | 37.3579ms | 36.1080ms | 27.6947 Ops/s | 27.8215 Ops/s | |
test_a2c_speed | 8.0638ms | 7.3666ms | 135.7482 Ops/s | 137.3737 Ops/s | |
test_ppo_speed | 9.1612ms | 7.6782ms | 130.2393 Ops/s | 133.1884 Ops/s | |
test_reinforce_speed | 7.3033ms | 6.5535ms | 152.5897 Ops/s | 154.9115 Ops/s | |
test_iql_speed | 33.2971ms | 32.3145ms | 30.9458 Ops/s | 30.9785 Ops/s | |
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 2.5375ms | 2.2631ms | 441.8754 Ops/s | 479.0684 Ops/s | |
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 97.7832ms | 0.5770ms | 1.7332 KOps/s | 2.0291 KOps/s | |
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.6711ms | 0.4747ms | 2.1067 KOps/s | 2.1287 KOps/s | |
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 3.3964ms | 2.3289ms | 429.3933 Ops/s | 482.0713 Ops/s | |
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 1.1012ms | 0.4927ms | 2.0296 KOps/s | 2.0597 KOps/s | |
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.6407ms | 0.4687ms | 2.1335 KOps/s | 2.1795 KOps/s | |
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] | 1.7901ms | 1.2096ms | 826.7491 Ops/s | 774.5112 Ops/s | |
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] | 1.6407ms | 1.1417ms | 875.9153 Ops/s | 817.8853 Ops/s | |
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 3.4375ms | 2.3867ms | 418.9900 Ops/s | 452.2994 Ops/s | |
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 1.0785ms | 0.6149ms | 1.6263 KOps/s | 1.6486 KOps/s | |
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.8939ms | 0.5882ms | 1.7002 KOps/s | 1.7172 KOps/s | |
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 3.4560ms | 2.2994ms | 434.9000 Ops/s | 483.7164 Ops/s | |
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 0.6569ms | 0.5022ms | 1.9911 KOps/s | 2.0223 KOps/s | |
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 3.9302ms | 0.4814ms | 2.0775 KOps/s | 2.1249 KOps/s | |
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 3.4917ms | 2.3635ms | 423.1000 Ops/s | 474.0430 Ops/s | |
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 0.6125ms | 0.4918ms | 2.0333 KOps/s | 2.0453 KOps/s | |
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.6826ms | 0.4712ms | 2.1223 KOps/s | 2.1749 KOps/s | |
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 3.4840ms | 2.3940ms | 417.7068 Ops/s | 456.8850 Ops/s | |
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 1.1367ms | 0.6168ms | 1.6212 KOps/s | 1.6450 KOps/s | |
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.7855ms | 0.5879ms | 1.7009 KOps/s | 1.7209 KOps/s | |
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] | 0.1104s | 7.6689ms | 130.3973 Ops/s | 134.7378 Ops/s | |
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] | 14.3786ms | 11.9633ms | 83.5891 Ops/s | 83.8722 Ops/s | |
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] | 1.6076ms | 1.0607ms | 942.8066 Ops/s | 958.8463 Ops/s | |
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] | 0.1010s | 5.5350ms | 180.6675 Ops/s | 185.3464 Ops/s | |
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] | 14.2861ms | 11.8917ms | 84.0921 Ops/s | 68.6599 Ops/s | |
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] | 3.9687ms | 1.1515ms | 868.4556 Ops/s | 964.6981 Ops/s | |
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] | 0.1068s | 8.0143ms | 124.7775 Ops/s | 173.6853 Ops/s | |
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] | 15.0415ms | 12.2902ms | 81.3657 Ops/s | 81.1138 Ops/s | |
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] | 4.1109ms | 1.4292ms | 699.6817 Ops/s | 744.2709 Ops/s |

Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
---|---|---|---|---|---|
test_single | 0.1079s | 0.1043s | 9.5893 Ops/s | 9.1300 Ops/s | |
test_sync | 91.7501ms | 88.0483ms | 11.3574 Ops/s | 10.7516 Ops/s | |
test_async | 0.1819s | 90.5767ms | 11.0404 Ops/s | 11.1805 Ops/s | |
test_single_pixels | 0.1134s | 0.1126s | 8.8794 Ops/s | 8.9232 Ops/s | |
test_sync_pixels | 76.0009ms | 68.0311ms | 14.6992 Ops/s | 14.9715 Ops/s | |
test_async_pixels | 0.1007s | 62.8028ms | 15.9229 Ops/s | 15.6392 Ops/s | |
test_simple | 0.7485s | 0.6780s | 1.4748 Ops/s | 1.4887 Ops/s | |
test_transformed | 0.9745s | 0.8911s | 1.1223 Ops/s | 1.1274 Ops/s | |
test_serial | 2.1693s | 2.1182s | 0.4721 Ops/s | 0.4799 Ops/s | |
test_parallel | 1.8928s | 1.8162s | 0.5506 Ops/s | 0.5510 Ops/s | |
test_step_mdp_speed[True-True-True-True-True] | 84.7010μs | 33.5420μs | 29.8134 KOps/s | 28.7615 KOps/s | |
test_step_mdp_speed[True-True-True-True-False] | 45.2010μs | 19.5138μs | 51.2458 KOps/s | 50.3475 KOps/s | |
test_step_mdp_speed[True-True-True-False-True] | 33.3100μs | 18.4865μs | 54.0936 KOps/s | 52.8192 KOps/s | |
test_step_mdp_speed[True-True-True-False-False] | 27.3700μs | 11.2500μs | 88.8886 KOps/s | 88.3643 KOps/s | |
test_step_mdp_speed[True-True-False-True-True] | 95.0710μs | 34.6286μs | 28.8778 KOps/s | 28.2402 KOps/s | |
test_step_mdp_speed[True-True-False-True-False] | 40.2300μs | 21.2517μs | 47.0550 KOps/s | 45.8746 KOps/s | |
test_step_mdp_speed[True-True-False-False-True] | 48.3900μs | 20.2790μs | 49.3121 KOps/s | 48.9478 KOps/s | |
test_step_mdp_speed[True-True-False-False-False] | 31.1510μs | 13.2582μs | 75.4249 KOps/s | 75.5476 KOps/s | |
test_step_mdp_speed[True-False-True-True-True] | 60.2510μs | 36.8132μs | 27.1642 KOps/s | 26.6369 KOps/s | |
test_step_mdp_speed[True-False-True-True-False] | 90.7920μs | 23.3909μs | 42.7517 KOps/s | 42.2603 KOps/s | |
test_step_mdp_speed[True-False-True-False-True] | 36.9910μs | 20.1754μs | 49.5652 KOps/s | 48.8036 KOps/s | |
test_step_mdp_speed[True-False-True-False-False] | 30.6100μs | 13.1347μs | 76.1342 KOps/s | 75.1771 KOps/s | |
test_step_mdp_speed[True-False-False-True-True] | 71.2310μs | 37.8947μs | 26.3889 KOps/s | 25.6757 KOps/s | |
test_step_mdp_speed[True-False-False-True-False] | 61.1910μs | 24.8488μs | 40.2434 KOps/s | 39.5736 KOps/s | |
test_step_mdp_speed[True-False-False-False-True] | 39.1610μs | 21.9996μs | 45.4554 KOps/s | 45.1817 KOps/s | |
test_step_mdp_speed[True-False-False-False-False] | 29.9510μs | 14.7778μs | 67.6691 KOps/s | 66.7648 KOps/s | |
test_step_mdp_speed[False-True-True-True-True] | 55.9110μs | 35.9942μs | 27.7823 KOps/s | 26.7138 KOps/s | |
test_step_mdp_speed[False-True-True-True-False] | 47.4600μs | 23.2114μs | 43.0822 KOps/s | 42.3457 KOps/s | |
test_step_mdp_speed[False-True-True-False-True] | 50.0700μs | 24.1731μs | 41.3683 KOps/s | 41.4149 KOps/s | |
test_step_mdp_speed[False-True-True-False-False] | 31.9600μs | 14.7400μs | 67.8427 KOps/s | 67.9126 KOps/s | |
test_step_mdp_speed[False-True-False-True-True] | 63.1210μs | 38.8398μs | 25.7468 KOps/s | 25.0651 KOps/s | |
test_step_mdp_speed[False-True-False-True-False] | 55.2310μs | 25.0496μs | 39.9208 KOps/s | 39.6872 KOps/s | |
test_step_mdp_speed[False-True-False-False-True] | 49.2910μs | 25.9779μs | 38.4943 KOps/s | 38.5669 KOps/s | |
test_step_mdp_speed[False-True-False-False-False] | 45.4210μs | 16.4559μs | 60.7685 KOps/s | 59.9086 KOps/s | |
test_step_mdp_speed[False-False-True-True-True] | 57.6310μs | 40.3140μs | 24.8053 KOps/s | 24.3771 KOps/s | |
test_step_mdp_speed[False-False-True-True-False] | 43.2510μs | 27.1628μs | 36.8151 KOps/s | 36.5516 KOps/s | |
test_step_mdp_speed[False-False-True-False-True] | 93.5210μs | 25.9010μs | 38.6085 KOps/s | 38.2611 KOps/s | |
test_step_mdp_speed[False-False-True-False-False] | 35.4010μs | 16.4781μs | 60.6868 KOps/s | 60.0247 KOps/s | |
test_step_mdp_speed[False-False-False-True-True] | 68.1510μs | 41.6003μs | 24.0383 KOps/s | 23.7740 KOps/s | |
test_step_mdp_speed[False-False-False-True-False] | 64.0400μs | 28.9196μs | 34.5786 KOps/s | 34.7468 KOps/s | |
test_step_mdp_speed[False-False-False-False-True] | 49.1110μs | 27.6703μs | 36.1399 KOps/s | 36.3637 KOps/s | |
test_step_mdp_speed[False-False-False-False-False] | 37.0200μs | 18.2675μs | 54.7420 KOps/s | 54.7225 KOps/s | |
test_values[generalized_advantage_estimate-True-True] | 27.0439ms | 24.9036ms | 40.1548 Ops/s | 41.0822 Ops/s | |
test_values[vec_generalized_advantage_estimate-True-True] | 82.3481ms | 3.2221ms | 310.3539 Ops/s | 307.2974 Ops/s | |
test_values[td0_return_estimate-False-False] | 93.3020μs | 66.4387μs | 15.0515 KOps/s | 15.3372 KOps/s | |
test_values[td1_return_estimate-False-False] | 55.6854ms | 55.4132ms | 18.0463 Ops/s | 18.2754 Ops/s | |
test_values[vec_td1_return_estimate-False-False] | 2.1001ms | 1.7779ms | 562.4638 Ops/s | 566.4542 Ops/s | |
test_values[td_lambda_return_estimate-True-False] | 96.0303ms | 88.8608ms | 11.2536 Ops/s | 11.7808 Ops/s | |
test_values[vec_td_lambda_return_estimate-True-False] | 2.1137ms | 1.7766ms | 562.8645 Ops/s | 566.8676 Ops/s | |
test_gae_speed[generalized_advantage_estimate-False-1-512] | 24.4621ms | 24.0664ms | 41.5517 Ops/s | 42.2189 Ops/s | |
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] | 0.9009ms | 0.7153ms | 1.3981 KOps/s | 1.4179 KOps/s | |
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] | 0.7307ms | 0.6627ms | 1.5090 KOps/s | 1.5309 KOps/s | |
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] | 1.4907ms | 1.4632ms | 683.4380 Ops/s | 685.2496 Ops/s | |
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] | 0.9576ms | 0.6853ms | 1.4592 KOps/s | 1.4826 KOps/s | |
test_dqn_speed | 1.8528ms | 1.4465ms | 691.3075 Ops/s | 669.9737 Ops/s | |
test_ddpg_speed | 3.1767ms | 2.7655ms | 361.6029 Ops/s | 363.6794 Ops/s | |
test_sac_speed | 8.5929ms | 8.1659ms | 122.4604 Ops/s | 123.5544 Ops/s | |
test_redq_speed | 11.7540ms | 10.5931ms | 94.4007 Ops/s | 95.3757 Ops/s | |
test_redq_deprec_speed | 11.8680ms | 11.3237ms | 88.3100 Ops/s | 89.1693 Ops/s | |
test_td3_speed | 8.1904ms | 8.0882ms | 123.6370 Ops/s | 124.2931 Ops/s | |
test_cql_speed | 26.4281ms | 25.6721ms | 38.9528 Ops/s | 38.9657 Ops/s | |
test_a2c_speed | 5.7440ms | 5.5435ms | 180.3910 Ops/s | 177.6930 Ops/s | |
test_ppo_speed | 6.0444ms | 5.8701ms | 170.3555 Ops/s | 167.1658 Ops/s | |
test_reinforce_speed | 5.3566ms | 4.5343ms | 220.5424 Ops/s | 220.4312 Ops/s | |
test_iql_speed | 0.1142s | 21.3877ms | 46.7558 Ops/s | 50.5884 Ops/s | |
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 3.1128ms | 2.9146ms | 343.1051 Ops/s | 343.7329 Ops/s | |
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 1.3394ms | 0.5445ms | 1.8365 KOps/s | 1.8217 KOps/s | |
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.7320ms | 0.5235ms | 1.9104 KOps/s | 1.9290 KOps/s | |
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 3.1484ms | 2.9177ms | 342.7352 Ops/s | 345.2346 Ops/s | |
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 1.7446ms | 0.5440ms | 1.8381 KOps/s | 1.5346 KOps/s | |
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.7037ms | 0.5148ms | 1.9427 KOps/s | 1.9564 KOps/s | |
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] | 1.6565ms | 1.4761ms | 677.4494 Ops/s | 653.5166 Ops/s | |
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] | 1.4921ms | 1.3946ms | 717.0398 Ops/s | 680.5898 Ops/s | |
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 3.1308ms | 3.0387ms | 329.0838 Ops/s | 327.9275 Ops/s | |
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 1.2790ms | 0.6697ms | 1.4933 KOps/s | 1.4843 KOps/s | |
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.8662ms | 0.6497ms | 1.5391 KOps/s | 1.5194 KOps/s | |
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 3.0013ms | 2.9008ms | 344.7321 Ops/s | 344.9155 Ops/s | |
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 0.6692ms | 0.5451ms | 1.8345 KOps/s | 1.8281 KOps/s | |
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 4.4281ms | 0.5253ms | 1.9036 KOps/s | 1.9020 KOps/s | |
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 3.1169ms | 2.9407ms | 340.0538 Ops/s | 339.1315 Ops/s | |
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 0.6577ms | 0.5371ms | 1.8619 KOps/s | 1.8614 KOps/s | |
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.7231ms | 0.5127ms | 1.9504 KOps/s | 1.9455 KOps/s | |
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 3.1592ms | 3.0294ms | 330.1005 Ops/s | 329.0955 Ops/s | |
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 0.8353ms | 0.6691ms | 1.4945 KOps/s | 1.4871 KOps/s | |
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 4.6174ms | 0.6576ms | 1.5207 KOps/s | 1.5364 KOps/s | |
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] | 0.1330s | 7.3881ms | 135.3525 Ops/s | 136.0825 Ops/s | |
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] | 18.8000ms | 15.1991ms | 65.7934 Ops/s | 65.7658 Ops/s | |
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] | 2.4625ms | 1.1079ms | 902.5690 Ops/s | 932.0249 Ops/s | |
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] | 0.1158s | 6.9998ms | 142.8608 Ops/s | 141.2281 Ops/s | |
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] | 17.4517ms | 15.0994ms | 66.2278 Ops/s | 57.4777 Ops/s | |
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] | 2.2616ms | 1.1164ms | 895.7304 Ops/s | 928.5206 Ops/s | |
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] | 0.1193s | 9.6955ms | 103.1401 Ops/s | 135.4568 Ops/s | |
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] | 17.9278ms | 15.4757ms | 64.6174 Ops/s | 63.8866 Ops/s | |
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] | 2.7688ms | 1.4653ms | 682.4717 Ops/s | 632.8394 Ops/s |
I was looking into pre-computing the trajectory indices at write time. The bottleneck, if we're not caching values, is the nonzero call (rl/torchrl/data/replay_buffers/samplers.py, line 839 in 660d827).
This nonzero is called on the
If we want to "cache" whatever we can between two updates, we should update the result of nonzero to avoid calling nonzero over the whole
import torch
torch.manual_seed(2)
batch = 4
time = 10
ends = torch.zeros(batch, time, dtype=torch.bool).bernoulli_(0.2)
nz = ends.nonzero()
print("original non zero", nz)
ends_slice = torch.zeros(batch, 4, dtype=torch.bool).bernoulli_(0.2)
ends2 = ends.clone()
ends2[:, 2:6] = ends_slice
nz2 = ends2.nonzero()
nz_slice = ends_slice.nonzero()
nz_slice[:, 1] += 2
print("non zero from the slice", nz_slice)
print("updated non zero")
print(nz2)
That will give you
So the operation of updating the first nonzero result to get the second, given the update we made, is O(T), and if implemented in Python it will basically amount to scanning through the whole set of non-zero end signals and creating a new tensor out of it. In short: it will be expensive and tedious. I don't think this is worth anyone's time, so I'm a bit skeptical that we can get this working. So for now I will not be looking at writing the start/end of trajectories at write time, since I can't see a viable implementation. The one I outlined above will be slower than just calling nonzero() on the whole thing. But there is an intermediate solution if you're doing more than one
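To make the cost of that incremental update concrete, here is a minimal sketch of it. It is only an illustration, not code from this PR: `update_nonzero` and its parameters are hypothetical names, and the random inputs use `torch.rand(...) < 0.2` rather than the snippet's `bernoulli_`. Note that it still has to mask and re-sort every cached index, which is consistent with the point that it is unlikely to beat a plain `nonzero()`.

```python
import torch


def update_nonzero(nz, nz_slice, start, stop, n_cols):
    """Patch a cached `nonzero()` result after columns [start, stop) were overwritten.

    nz:       (N, 2) indices from `ends.nonzero()` before the write.
    nz_slice: (M, 2) indices from `ends_slice.nonzero()`, columns relative to the slice.
    n_cols:   size of the time dimension of the full tensor.
    """
    # Drop stale entries that fall inside the overwritten window.
    keep = (nz[:, 1] < start) | (nz[:, 1] >= stop)
    # Shift slice-local column indices into the full tensor's coordinates.
    shifted = nz_slice.clone()
    shifted[:, 1] += start
    merged = torch.cat([nz[keep], shifted], dim=0)
    # Restore the row-major ordering that `nonzero()` returns.
    order = torch.argsort(merged[:, 0] * n_cols + merged[:, 1])
    return merged[order]


torch.manual_seed(2)
batch, n_steps = 4, 10
ends = torch.rand(batch, n_steps) < 0.2
nz = ends.nonzero()

# Overwrite columns 2..5 with fresh end-of-trajectory flags.
ends_slice = torch.rand(batch, 4) < 0.2
ends2 = ends.clone()
ends2[:, 2:6] = ends_slice

updated = update_nonzero(nz, ends_slice.nonzero(), start=2, stop=6, n_cols=n_steps)
print(torch.equal(updated, ends2.nonzero()))
```

The mask, concatenation and sort each touch all cached indices, so the work grows with the number of stored end signals rather than with the size of the write.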
Some benchmarks:
import time

import torch
import tqdm
from tensordict import TensorDict
from torchrl.data import ReplayBuffer, SliceSampler, LazyTensorStorage

for compile in [True, False]:
    for cached in [True, False]:
        rb = ReplayBuffer(storage=LazyTensorStorage(1_000_000),
                          sampler=SliceSampler(num_slices=16, traj_key="traj_idx", compile=compile, cache_values=cached),
                          batch_size=256)
        # 1M steps split into trajectories of 100 steps, written in chunks of 1000
        tds = TensorDict({
            "traj_idx": torch.arange(1_000_000) // 100,
            "x": torch.randn(1_000_000),
            ("next", "y"): torch.randn(1_000_000),
        }, [1_000_000]).split(1000)

        def iter_over_tds():
            # cycle over the chunks indefinitely
            while True:
                yield from tds

        iterator = iter_over_tds()
        # warm-up: one write and one sample before timing starts
        rb.extend(next(iterator))
        rb.sample()
        n_samples = 20
        t0 = time.time()
        for i, data in tqdm.tqdm(enumerate(iterator), total=5000, desc=f"compile={compile}, cache={cached}"):
            # one write followed by n_samples reads
            rb.extend(data)
            for j in range(n_samples):
                rb.sample()
            if i == 5000:
                break
        print(f"compile={compile}, cache={cached}, time={time.time() - t0: 4.4f}")
Results:
So we see a clear and very impressive gain from caching (compile=True => 625%, compile=False => 2000%) and some gain from compile too (cache=True => 110%, cache=False => 388%).
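The reason caching pays off in a loop with one write followed by many samples can be sketched with a toy, library-free stand-in. This is not the `SliceSampler` implementation: `CachedBoundaries` is a hypothetical class, plain Python lists replace tensors, and it assumes the cache is invalidated on every write, which matches the "between two updates" description above but may differ from the actual internals.

```python
class CachedBoundaries:
    """Toy buffer that recomputes trajectory-end indices only after a write."""

    def __init__(self):
        self.traj_ids = []
        self._ends = None  # cached result; None means "stale"
        self.n_scans = 0   # number of full scans actually performed

    def extend(self, new_ids):
        self.traj_ids.extend(new_ids)
        self._ends = None  # any write invalidates the cache

    def ends(self):
        if self._ends is None:
            self.n_scans += 1
            ids = self.traj_ids
            # An index ends a trajectory if the next id differs or it is the last item.
            self._ends = [
                i for i in range(len(ids))
                if i + 1 == len(ids) or ids[i] != ids[i + 1]
            ]
        return self._ends


buf = CachedBoundaries()
buf.extend([0, 0, 0, 1, 1])
for _ in range(20):  # 20 reads, but only one scan
    buf.ends()
buf.extend([1, 2, 2])
for _ in range(20):  # the write forces exactly one more scan
    buf.ends()
print(buf.ends(), buf.n_scans)
```

With 20 reads per write, the full scan runs once per write instead of once per read, which is the pattern the benchmark above exercises with `n_samples = 20`.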
(cherry picked from commit cd540bf)
TODO:
cc @ahmed-touati @Cadene