Solving Gymnasium's Car Racing with Reinforcement Learning

Repository containing code and notebooks exploring how to solve Gymnasium's Car Racing environment through reinforcement learning.

Soft Actor-Critic (SAC)

Deep Q-Network (DQN)

Proximal Policy Optimization (PPO)
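
The notebooks pair Gymnasium's Car Racing environment with Stable Baselines3. As a minimal, illustrative sketch of that setup (not the repository's exact configuration: the environment id is CarRacing-v3 on Gymnasium 1.0+ and CarRacing-v2 on earlier releases, and the hyperparameters and checkpoint name here are placeholders):

```python
# Minimal PPO setup for Car Racing (assumes gymnasium and stable-baselines3).
import gymnasium as gym
from stable_baselines3 import PPO

env = gym.make("CarRacing-v3", continuous=True)  # continuous steering/throttle
model = PPO("CnnPolicy", env, ent_coef=0.01, verbose=1)  # nonzero ent_coef; see Training Notes
model.learn(total_timesteps=100_000)
model.save("ppo_car_racing")  # hypothetical checkpoint name, reused below
```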

Results

Hardware: Google Colab (NVIDIA T4 GPU)

| Model | Discrete Actions | Average Reward | Training Time (h:mm:ss) | Total Training Steps |
| ----- | ---------------- | -------------- | ----------------------- | -------------------- |
| PPO   | No               | 887.84         | 5:33:03                 | 751,614              |
| SAC   | No               | 610.67         | 6:29:16                 | 333,116              |
| DQN   | Yes              | 897.77         | 5:41:22                 | 750,000              |
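
Average-reward figures like those above are conventionally measured by rolling the trained policy out over a fixed number of evaluation episodes. A sketch using Stable Baselines3's evaluate_policy, loading the hypothetical checkpoint from the sketch above:

```python
# Sketch: measuring mean episode reward for a trained agent.
import gymnasium as gym
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy

env = gym.make("CarRacing-v3", continuous=True)
model = PPO.load("ppo_car_racing", env=env)  # hypothetical checkpoint name
mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=20)
print(f"Average reward: {mean_reward:.2f} +/- {std_reward:.2f}")
```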

Training Notes

  • Set ent_coef for PPO to a nonzero value to encourage exploration of other actions; Stable Baselines3 defaults it to 0.0 (as in the PPO sketch above).
  • Do not set eval_freq too low: evaluation interrupts training, and overly frequent interruptions can destabilize learning. Keep it at 10,000 steps or higher, as in the callback sketch after this list.
  • buffer_size defaults to 1,000,000, which requires significant memory for DQN and SAC. Set it to a more practical value (e.g., 200,000) when using the original observation space; the sketch below does this.
  • Set the gray_scale flag in the notebooks to True to let DQN and SAC run without Google Colab's High-RAM option (buffer size <= 150,000). This converts the observation space from (96 x 96 x 3) RGB images to (84 x 84) grayscale images, as in the wrapper sketch below.
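
A sketch of the eval_freq and buffer_size suggestions above (all values are illustrative, not the notebooks' exact settings):

```python
# Sketch: smaller replay buffer plus a conservative evaluation cadence.
import gymnasium as gym
from stable_baselines3 import DQN
from stable_baselines3.common.callbacks import EvalCallback

env = gym.make("CarRacing-v3", continuous=False)  # discrete actions for DQN
eval_env = gym.make("CarRacing-v3", continuous=False)

model = DQN("CnnPolicy", env, buffer_size=200_000, verbose=1)  # default is 1,000,000
callback = EvalCallback(eval_env, eval_freq=10_000, n_eval_episodes=5)
model.learn(total_timesteps=750_000, callback=callback)
```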

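And a sketch of the observation-space reduction behind the gray_scale flag, using Gymnasium's built-in wrappers (on pre-1.0 Gymnasium the wrapper is spelled GrayScaleObservation and the env id is CarRacing-v2; keep_dim=True keeps the channel axis that Stable Baselines3's CnnPolicy expects):

```python
# Sketch: shrink (96 x 96 x 3) RGB frames to (84 x 84) grayscale so the
# DQN/SAC replay buffer fits in standard Colab RAM.
import gymnasium as gym
from gymnasium.wrappers import GrayscaleObservation, ResizeObservation

env = gym.make("CarRacing-v3", continuous=False)
env = GrayscaleObservation(env, keep_dim=True)  # (96, 96, 3) -> (96, 96, 1)
env = ResizeObservation(env, (84, 84))          # (96, 96, 1) -> (84, 84, 1)
```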
Finding Theta Blog Posts
