Fix: with SAC, a new training batch should be sampled for each gradient step #208

YiboDi · 2024-10-08T12:07:19Z

DDPG/TD3/SAC for robotics tasks #65
Increasing the number of parallel simulations accelerates the replay buffer data refresh rate in SAC and other off-policy algorithms. However, to fully leverage this increased data collection, the model should be updated more frequently. This can be achieved by increasing self._gradient_steps.
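To make the trade-off above concrete, here is a rough back-of-the-envelope sketch. It assumes one update phase per vectorized environment step, with each step adding one transition per parallel environment. The helper name `updates_per_transition` is hypothetical and not part of the library.

```python
def updates_per_transition(num_envs: int, gradient_steps: int) -> float:
    """Gradient updates performed per newly collected transition.

    Assumption: each vectorized environment step adds `num_envs` transitions
    to the replay buffer and is followed by `gradient_steps` updates.
    """
    if num_envs <= 0 or gradient_steps <= 0:
        raise ValueError("num_envs and gradient_steps must be positive")
    return gradient_steps / num_envs


# One environment, one update per step: every transition gets one update.
single_env = updates_per_transition(num_envs=1, gradient_steps=1)

# 64 parallel environments with the same setting: far fewer updates per transition.
many_envs = updates_per_transition(num_envs=64, gradient_steps=1)

# Scaling gradient steps with the number of environments restores the ratio.
rebalanced = updates_per_transition(num_envs=64, gradient_steps=64)
```

Under these assumptions the ratio falls from 1 to 1/64 when going from 1 to 64 environments, and raising the number of gradient steps to 64 brings it back to 1.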

Issue:

- Current Behavior: The current implementation reuses the same training batch across multiple gradient steps, which can lead to overfitting and inefficient use of the new data collected from parallel simulations.

- Expected Behavior: According to the SAC algorithm implementation, a new training batch should be sampled for each gradient step to ensure diverse and fresh experiences are used for updates.

Fix Implemented:

Modified the SAC training loop to sample a new training batch from the replay buffer for each gradient step.
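The difference between the two behaviors can be sketched as follows. This is a minimal illustration and not the code changed in this PR: `ReplayBuffer`, `update_reusing_batch`, `update_resampling`, and `count_samples` are hypothetical names, and the gradient step is replaced by a callback that only records the batch it receives.

```python
import random


class ReplayBuffer:
    """Hypothetical minimal buffer used only for illustration."""

    def __init__(self, seed=0):
        self._data = []
        self._rng = random.Random(seed)
        self.sample_calls = 0  # how many times a batch was drawn

    def add(self, transition):
        self._data.append(transition)

    def sample(self, batch_size):
        self.sample_calls += 1
        return self._rng.sample(self._data, batch_size)


def update_reusing_batch(buffer, gradient_steps, batch_size, gradient_step):
    # Behavior described under "Current Behavior":
    # the batch is sampled once and reused for every gradient step.
    batch = buffer.sample(batch_size)
    for _ in range(gradient_steps):
        gradient_step(batch)


def update_resampling(buffer, gradient_steps, batch_size, gradient_step):
    # Behavior described under "Expected Behavior":
    # sampling happens inside the loop, once per gradient step.
    for _ in range(gradient_steps):
        batch = buffer.sample(batch_size)
        gradient_step(batch)


def count_samples(update_fn, gradient_steps=4, batch_size=8):
    """Return (number of sample() calls, number of gradient steps run)."""
    buffer = ReplayBuffer()
    for i in range(100):
        buffer.add(i)  # integers stand in for transitions
    seen_batches = []
    update_fn(buffer, gradient_steps, batch_size, seen_batches.append)
    return buffer.sample_calls, len(seen_batches)
```

With four gradient steps, `count_samples(update_reusing_batch)` draws a single batch, while `count_samples(update_resampling)` draws four, one per step.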

Fix: with SAC, a new training batch should be sampled for each gradient_step

13eaef8

Toni-SM changed the base branch from main to develop November 2, 2024 19:11

Toni-SM changed the title ~~Fix: with SAC, a new training batch should be sampled for each gradie…~~ Fix: with SAC, a new training batch should be sampled for each gradient step Nov 2, 2024

Apply format

a9cdd06

Toni-SM merged commit 5fce807 into Toni-SM:develop Nov 3, 2024
1 check passed
