Solving the car racing problem in OpenAI Gym using Proximal Policy Optimization (PPO). The environment is backed by a real physics engine, so realistic racing behaviours, such as drifting, can emerge.
To run the code, you need OpenAI Gym and Visdom installed.
Every action is repeated for 8 frames. To capture velocity information, the state is defined as 4 adjacent frames, with shape (4, 96, 96). A two-head FCN represents the actor and the critic respectively; for each action, the actor outputs α and β as the parameters of a Beta distribution.
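As a rough sketch of the idea above (not the repository's actual model: the layer sizes, the softplus + 1 parameterisation, and the 3-dimensional action space are assumptions), a two-head network maps a stacked (4, 96, 96) state to per-action Beta parameters plus a state value, and actions are sampled from the resulting Beta distributions:

```python
import numpy as np

rng = np.random.default_rng(0)

STATE_SHAPE = (4, 96, 96)  # 4 stacked frames for velocity information
N_ACTIONS = 3              # assumption: e.g. steering, gas, brake
HIDDEN = 128

# Toy two-head network: a shared trunk, then an actor head and a critic head.
W_shared = rng.normal(0, 0.01, (int(np.prod(STATE_SHAPE)), HIDDEN))
W_actor = rng.normal(0, 0.01, (HIDDEN, 2 * N_ACTIONS))  # alpha and beta per action
W_critic = rng.normal(0, 0.01, (HIDDEN, 1))

def softplus(x):
    return np.log1p(np.exp(x))

def forward(state):
    """Return per-action (alpha, beta) and the state value."""
    h = np.maximum(0.0, state.reshape(-1) @ W_shared)  # shared trunk (ReLU)
    ab = softplus(h @ W_actor) + 1.0                   # alpha, beta > 1 keeps each Beta unimodal
    alpha, beta = ab[:N_ACTIONS], ab[N_ACTIONS:]
    value = float(h @ W_critic)                        # critic head: scalar state value
    return alpha, beta, value

state = rng.random(STATE_SHAPE).astype(np.float32)
alpha, beta, value = forward(state)
action = rng.beta(alpha, beta)  # each component lies in (0, 1); rescale to the env's action range
```

The Beta distribution has bounded support, which avoids the clipping bias that a Gaussian policy suffers on bounded action spaces; the repeated action would then be applied for 8 consecutive frames.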
Start a Visdom server with python -m visdom.server; it serves http://localhost:8097/ by default.
To train the agent, run python train.py --render --vis, or python train.py --render to train without Visdom.
To test, run python test.py --render.


