This is a PARL-based reproduction of the Soft Actor-Critic (SAC) deep reinforcement learning algorithm. It reaches the same level of performance as reported in the paper on the MuJoCo benchmarks.
It includes the following approaches:
- DDPG Style with Stochastic Policy
- Maximum Entropy
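
The two ideas above can be illustrated with a minimal, standard-library-only sketch. It is not the PARL implementation: the function names `sample_squashed_gaussian` and `soft_q_target`, the tanh squashing, and the default values of `gamma` and `alpha` are assumptions chosen for illustration. The first function draws an action from a stochastic (Gaussian) policy and returns its log-probability; the second adds the entropy bonus `-alpha * log_prob` to the usual bootstrapped Q target.

```python
import math
import random


def sample_squashed_gaussian(mean, log_std, rng):
    """Sample one action from a tanh-squashed diagonal Gaussian policy.

    mean, log_std: lists of floats, one entry per action dimension
    (hypothetical policy-network outputs). Returns (action, log_prob).
    """
    action = []
    log_prob = 0.0
    for mu, ls in zip(mean, log_std):
        noise = rng.gauss(0.0, 1.0)
        a = math.tanh(mu + math.exp(ls) * noise)
        # Log-density of the Gaussian sample before squashing.
        log_prob += -0.5 * (noise * noise + math.log(2.0 * math.pi)) - ls
        # Change-of-variables correction for the tanh squashing.
        log_prob -= math.log(1.0 - a * a + 1e-6)
        action.append(a)
    return action, log_prob


def soft_q_target(reward, done, next_q, next_log_prob, gamma=0.99, alpha=0.2):
    """Entropy-regularized target: r + gamma * (1 - done) * (Q' - alpha * log pi)."""
    return reward + gamma * (1.0 - done) * (next_q - alpha * next_log_prob)
```

A larger `alpha` weights the entropy term more heavily, which pushes the policy toward more exploratory behaviour.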
SAC was introduced in the paper "Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor".
See the MuJoCo documentation to learn more about the MuJoCo environments.
- python3.5+
- paddlepaddle>=1.6.1
- parl
- gym
- mujoco-py>=1.50.1.0
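
Before training, it can help to confirm that the packages above are importable. The following is a small sketch rather than part of the repository; the helper name `check_dependencies` is hypothetical, and the import names (`paddle` for paddlepaddle, `mujoco_py` for mujoco-py) are the usual module names for those packages. It checks presence only, not versions.

```python
import importlib.util


def check_dependencies(modules=("paddle", "parl", "gym", "mujoco_py")):
    """Return a dict mapping each top-level module name to True if it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


def missing_dependencies(modules=("paddle", "parl", "gym", "mujoco_py")):
    """Return the module names that could not be found, in the order given."""
    status = check_dependencies(modules)
    return [name for name in modules if not status[name]]
```

Calling `missing_dependencies()` returns an empty list when everything is installed.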
# Train an agent on the HalfCheetah-v2 environment
python train.py
# Train on a different environment
# python train.py --env [ENV_NAME]
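
To train on several environments in sequence, the command above can be wrapped in a short script. This is a sketch under stated assumptions: the helpers `build_train_command` and `train_all` are hypothetical, and the only command-line flag used is the `--env` option shown above.

```python
import subprocess
import sys


def build_train_command(env_name=None, script="train.py"):
    """Build the argument list for one training run.

    With env_name=None the script is run without --env, so it uses its own default.
    """
    cmd = [sys.executable, script]
    if env_name is not None:
        cmd += ["--env", env_name]
    return cmd


def train_all(env_names, script="train.py"):
    """Run training once per environment, stopping at the first failure."""
    for name in env_names:
        subprocess.run(build_train_command(name, script), check=True)
```

For example, `train_all(["HalfCheetah-v2", "Hopper-v2"])` would run the two trainings one after the other; the environment ids here are only examples of Gym MuJoCo tasks.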