Each experiment is run with 3 seeds and trained for 3M environment steps. SAC uses the same hyperparameters as described in the original paper. The command for each environment is listed below.
coach -p Mujoco_SAC -lvl inverted_pendulum
coach -p Mujoco_SAC -lvl hopper
coach -p Mujoco_SAC -lvl half_cheetah
coach -p Mujoco_SAC -lvl walker2d
coach -p Mujoco_SAC -lvl humanoid
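
Since each experiment uses 3 seeds, the full benchmark amounts to 15 runs. The sketch below only prints one command per level and seed rather than launching anything. It rests on two assumptions not stated above: the seed values `0 1 2` are illustrative, and `--seed` is assumed to be the Coach option for fixing the random seed (check `coach --help` for your installed version). The function name `print_sac_commands` is hypothetical.

```shell
#!/bin/sh
# Levels taken from the commands listed above.
levels="inverted_pendulum hopper half_cheetah walker2d humanoid"
# Assumption: illustrative seed values; the text only says 3 seeds are used.
seeds="0 1 2"

# Print (do not execute) one coach command per level/seed pair.
# Assumption: --seed is the Coach CLI option for setting the random seed.
print_sac_commands() {
    for lvl in $levels; do
        for seed in $seeds; do
            echo "coach -p Mujoco_SAC -lvl $lvl --seed $seed"
        done
    done
}

print_sac_commands
```

Once the printed commands look right for your setup, the output can be piped to `sh` to run them one after another.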