This project demonstrates reinforcement learning (RL) applied to the Super Mario Bros game using the Proximal Policy Optimization (PPO) algorithm. The game environment undergoes two preprocessing steps to prepare the state inputs for the RL agent.
- Grayscale Conversion: The original RGB game frames are converted to grayscale, which reduces the input from three channels to one and lowers the computational load on the learning algorithm.
- Frame Stacking: Multiple consecutive frames are stacked together. This technique adds a temporal dimension to the inputs, allowing the agent to infer the trajectory and velocity of Mario and his enemies.
These preprocessing techniques are essential for efficient training of the RL agent, enabling it to perform better by understanding the dynamics of the game environment.
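Below is a minimal sketch of this preprocessing pipeline, assuming the standard `gym_super_mario_bros` environment together with `nes_py`'s `JoypadSpace`, gym's `GrayScaleObservation`, and stable-baselines3's `VecFrameStack` (compatible library versions are assumed; this is not necessarily the project's exact code):

```python
import gym_super_mario_bros
from nes_py.wrappers import JoypadSpace
from gym_super_mario_bros.actions import SIMPLE_MOVEMENT
from gym.wrappers import GrayScaleObservation
from stable_baselines3.common.vec_env import DummyVecEnv, VecFrameStack

# Base Super Mario Bros environment with a simplified action set.
env = gym_super_mario_bros.make('SuperMarioBros-v0')
env = JoypadSpace(env, SIMPLE_MOVEMENT)

# Grayscale conversion: collapse RGB frames to one channel,
# keeping the channel dimension so the observation stays 3D.
env = GrayScaleObservation(env, keep_dim=True)

# Frame stacking: wrap in a vectorized env and stack 4 consecutive
# frames so the agent can infer velocity and direction of motion.
env = DummyVecEnv([lambda: env])
env = VecFrameStack(env, 4, channels_order='last')
```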
- Clone the repository.
```bash
git clone [email protected]:Sukruthi-C/RL-for-Super-Mario.git
```
- Navigate into the repository.
```bash
cd RL-for-Super-Mario
```
- Install the dependencies. Note that PyTorch's package on PyPI is named `torch`, not `pytorch`.
```bash
pip install gym_super_mario_bros
pip install "stable-baselines3[extra]"
pip install matplotlib
pip install torch
```
- Run all the code blocks in the Jupyter notebook.
The model was trained twice: for 1,000,000 iterations with a learning rate of 1e-7 and a CNN policy, and for 2,000,000 iterations with a learning rate of 1e-6 and an MLP policy, both with a batch size of 64 and 512 steps per update. The run with the lower learning rate (1e-7, CNN policy) performed better. However, the model needs more training than this: as the GIFs attached above show, Mario cannot yet pass the first level. Due to GPU limitations, further training remains future work.
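For reference, here is a sketch of how PPO from stable-baselines3 might be configured with the better-performing hyperparameters above (CNN policy, learning rate 1e-7, batch size 64, 512 steps per update). The `env` variable is the preprocessed environment from the earlier sketch, the iteration count is interpreted as total timesteps, and the save path is illustrative; this is not the project's verbatim training code:

```python
from stable_baselines3 import PPO

# env: the grayscale, frame-stacked environment built in the
# preprocessing sketch above.
model = PPO(
    'CnnPolicy',
    env,
    verbose=1,
    learning_rate=1e-7,  # the lower of the two rates tried; it performed better
    n_steps=512,         # rollout length per policy update
    batch_size=64,
)

# "1,000,000 iterations" from the write-up, interpreted here as total
# timesteps (assumption).
model.learn(total_timesteps=1_000_000)
model.save('ppo_mario')  # illustrative save path
```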