- rename classes such as trainers and agents for easier understanding
- support the dm_control suite with visual observations
- update environments with a max step limit
- support action repeat and reward delay
- update installation by fixing the versions of dependent libraries (gym, mujoco, etc.)
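
To illustrate what a max step limit and action repeat mean in practice, here is a minimal sketch of an environment wrapper. It is not this project's implementation: `CountingEnv` and `RepeatLimitWrapper` are hypothetical names, the `step` method is assumed to return the classic `(obs, reward, done, info)` tuple, and the step limit is assumed to count underlying environment steps rather than agent decisions.

```python
class CountingEnv:
    """Hypothetical toy environment: reward 1.0 per step, never ends on its own."""

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        return self.t, 1.0, False, {}


class RepeatLimitWrapper:
    """Repeats each action `action_repeat` times and ends the episode
    after `max_steps` underlying environment steps (assumed semantics)."""

    def __init__(self, env, action_repeat=1, max_steps=None):
        self.env = env
        self.action_repeat = action_repeat
        self.max_steps = max_steps
        self._elapsed = 0

    def reset(self):
        self._elapsed = 0
        return self.env.reset()

    def step(self, action):
        total_reward = 0.0
        for _ in range(self.action_repeat):
            obs, reward, done, info = self.env.step(action)
            total_reward += reward  # sum rewards over the repeated steps
            self._elapsed += 1
            if self.max_steps is not None and self._elapsed >= self.max_steps:
                done = True
                info = dict(info, truncated=True)  # mark limit-induced ending
            if done:
                break
        return obs, total_reward, done, info


env = RepeatLimitWrapper(CountingEnv(), action_repeat=4, max_steps=10)
env.reset()
first = env.step(0)
second = env.step(0)
third = env.step(0)  # cut short: only 2 of the 4 repeats fit under the limit
```

With these settings, the third call stops after two underlying steps because the 10-step limit is reached, so its summed reward is smaller than that of the first two calls.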