Contributors: Shubham Jain, Michael Lee, Jeremy Meyer
With this project we aim to explore the applications of genetic algorithms and reinforcement learning to learning how to play games. Our ultimate goal is to develop a network capable of playing PacMan by using game data, such as the player's position, wall positions, and directional vectors towards ghosts, fruits, and orbs, to learn an optimal movement strategy. Our first steps towards this goal are to use a genetic algorithm to optimize the weights of an artificial neural network for MNIST digit classification, and then to develop a convolutional neural network that learns to play Pong through reinforcement learning. We will then combine what we've learned from these smaller experiments to implement a network that plays PacMan.
We used Keras to develop a simple artificial neural network, but instead of compiling it with a typical backpropagation/gradient-descent optimizer, we apply our own method of weight optimization. A high-level overview of our pipeline is as follows:
1. Construct an initial population of networks with randomly initialized weights and biases.
2. Use the training data to make predictions with each network in the population (no gradient-based training takes place).
3. Rate each individual's fitness using its training accuracy.
4. Select the top individuals of the population to form a mating pool.
5. Randomly select parents from the mating pool and perform crossover to produce the sets of weights and biases for the next generation of networks.
6. For each new network, randomly select weights and biases to mutate.
7. Repeat steps 2-6 for a large number of generations.
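To make the pipeline concrete, here is a minimal end-to-end sketch in Python, assuming tf.keras and NumPy. The helper names (`build_network`, `fitness`, `uniform_crossover`, `mutate`), the activation functions, and the mutation settings are our own illustrative assumptions, not a literal excerpt of the project code.

```python
import random
import numpy as np
from tensorflow import keras

# Values reported later (population 100, mating pool 15, 200 generations)
# are used where known; the mutation settings are illustrative assumptions.
NUM_GENERATIONS = 200
POPULATION_SIZE = 100
SELECTION_SIZE = 15
MUTATION_RATE = 0.05
MUTATION_WEIGHT_RANGE = 0.5

(x_train, y_train), _ = keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0

def build_network():
    # Matches the Dense 784 -> 32 -> 10 summary shown below;
    # the activations are assumptions, since the summary omits them.
    return keras.Sequential([
        keras.layers.Input(shape=(784,)),
        keras.layers.Dense(784, activation="relu"),
        keras.layers.Dense(32, activation="relu"),
        keras.layers.Dense(10, activation="softmax"),
    ])

def fitness(model):
    # Fitness is plain training accuracy; no gradients are ever computed.
    preds = np.argmax(model.predict(x_train, verbose=0), axis=1)
    return float(np.mean(preds == y_train))

def uniform_crossover(parent_a, parent_b):
    # Per entry, inherit from either parent with probability 0.5.
    child = []
    for wa, wb in zip(parent_a.get_weights(), parent_b.get_weights()):
        mask = np.random.rand(*wa.shape) < 0.5
        child.append(np.where(mask, wa, wb))
    return child

def mutate(weights):
    # Additive mutation: nudge a random subset of entries.
    out = []
    for w in weights:
        mask = np.random.rand(*w.shape) < MUTATION_RATE
        noise = np.random.uniform(-MUTATION_WEIGHT_RANGE,
                                  MUTATION_WEIGHT_RANGE, size=w.shape)
        out.append(w + mask * noise)
    return out

population = [build_network() for _ in range(POPULATION_SIZE)]
for generation in range(NUM_GENERATIONS):          # step 7
    scores = [fitness(m) for m in population]      # steps 2-3
    ranked = sorted(zip(scores, population),
                    key=lambda p: p[0], reverse=True)
    mating_pool = [m for _, m in ranked[:SELECTION_SIZE]]  # step 4
    next_weights = []
    for _ in range(POPULATION_SIZE):               # steps 5-6
        a, b = random.sample(mating_pool, 2)
        next_weights.append(mutate(uniform_crossover(a, b)))
    for model, w in zip(population, next_weights):
        model.set_weights(w)
```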
We have the following hyperparameters:
- `NUM_GENERATIONS`: the number of generations to continue this process
- `POPULATION_SIZE`: the number of networks in each population
- `SELECTION_SIZE`: the size of the mating pool
- `MUTATION_RATE`: the probability that a weight is mutated during crossover
- `MUTATION_WEIGHT_RANGE`: the range of values that a weight could be modified by
- `MUTATION_BIAS_RANGE`: the range of values that a bias could be modified by
- Crossover method: either uniform crossover or single-point crossover
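To illustrate the crossover choice, the snippet below contrasts the two variants on a single flattened weight vector; representing a genome as one flat array is an assumption made here for brevity.

```python
import numpy as np

def uniform_crossover(genome_a, genome_b):
    # Each gene independently comes from either parent with probability 0.5.
    mask = np.random.rand(genome_a.size) < 0.5
    return np.where(mask, genome_a, genome_b)

def single_point_crossover(genome_a, genome_b):
    # Everything before a random cut point comes from parent A,
    # everything after it from parent B.
    point = np.random.randint(1, genome_a.size)
    return np.concatenate([genome_a[:point], genome_b[point:]])
```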
We used the following network architecture for each individual in the population:
Model: "sequential_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_0 (Dense) (None, 784) 615440
_________________________________________________________________
dense_1 (Dense) (None, 32) 25120
_________________________________________________________________
dense_2 (Dense) (None, 10) 330
=================================================================
Total params: 640,890
Trainable params: 640,890
Non-trainable params: 0
_________________________________________________________________
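Because fitness only needs the forward pass, these networks are never compiled with an optimizer. One way to treat the model's 640,890 parameters as a single genome is a flatten/rebuild round-trip, sketched below under the assumption of tf.keras; the project may equally well operate on the per-layer weight arrays directly.

```python
import numpy as np
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Input(shape=(784,)),
    keras.layers.Dense(784, activation="relu"),  # activations assumed
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(10, activation="softmax"),
])

# get_weights() returns a list of NumPy arrays alternating kernel and
# bias for each Dense layer; flattening them yields one gene vector.
layer_weights = model.get_weights()
genome = np.concatenate([w.ravel() for w in layer_weights])
assert genome.size == 640_890  # matches "Total params" above

# After crossover and mutation, reshape the flat genome back.
rebuilt, offset = [], 0
for w in layer_weights:
    rebuilt.append(genome[offset:offset + w.size].reshape(w.shape))
    offset += w.size
model.set_weights(rebuilt)
```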
This learning curve is the result of training with a population size of 100, a mating pool size of 15, uniform crossover, and random additive mutation over 200 generations.
We performed the same process with a convolutional neural network, where each individual has the following architecture:
Model: "sequential_2"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_2 (Conv2D) (None, 28, 28, 8) 40
_________________________________________________________________
conv2d_3 (Conv2D) (None, 28, 28, 16) 528
_________________________________________________________________
flatten_1 (Flatten) (None, 12544) 0
_________________________________________________________________
dense (Dense) (None, 10) 125450
=================================================================
Total params: 126,018
Trainable params: 126,018
Non-trainable params: 0
_________________________________________________________________
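A sketch that reproduces this summary, again assuming tf.keras; the 2x2 kernels, "same" padding, and activations are inferred from the parameter counts and output shapes rather than stated in the summary.

```python
from tensorflow import keras

cnn = keras.Sequential([
    keras.layers.Input(shape=(28, 28, 1)),
    keras.layers.Conv2D(8, kernel_size=2, padding="same",
                        activation="relu"),   # 2*2*1*8 + 8  = 40 params
    keras.layers.Conv2D(16, kernel_size=2, padding="same",
                        activation="relu"),   # 2*2*8*16 + 16 = 528 params
    keras.layers.Flatten(),                   # 28*28*16 = 12,544 units
    keras.layers.Dense(10, activation="softmax"),  # 125,450 params
])
cnn.summary()
```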