Phase 2: Model Experimentation
Introduction

As the team approached this problem with limited knowledge of reinforcement learning, we turned to scholarly research to inform our choice of models for experimentation. We first implemented NEAT (NeuroEvolution of Augmenting Topologies) as our flagship model because it performs well on a CPU. Once the team gained GPU processing capabilities, we transitioned to a more robust Deep Q-Learning Agent. These agents were tuned in parallel to identify the best model for our situation, and we ultimately moved forward with development of the Deep Q-Learning Agent. Deep Q-Learning uses a neural network to estimate a Q-value, the expected reward, for every available action in a given input state in a single forward pass, yielding a policy that maps each state to the action with the greatest expected reward.
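To make the idea concrete, the sketch below shows a minimal Q-network of the kind described above. The state size, number of actions, and layer widths are illustrative assumptions, not the exact architecture used in this project.

```python
import torch
import torch.nn as nn

STATE_SIZE = 12   # assumed: number of features describing a game state
NUM_ACTIONS = 4   # assumed: number of discrete actions available to the agent

class QNetwork(nn.Module):
    """Maps an input state to one Q-value (expected reward) per action."""
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(STATE_SIZE, 64),
            nn.ReLU(),
            nn.Linear(64, 64),
            nn.ReLU(),
            nn.Linear(64, NUM_ACTIONS),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        # One forward pass produces Q-values for all actions at once.
        return self.layers(state)

# The greedy policy simply picks the action with the highest Q-value.
q_net = QNetwork()
state = torch.rand(1, STATE_SIZE)           # a single example state
action = q_net(state).argmax(dim=1).item()  # index of the best action
```

During training an exploration strategy such as epsilon-greedy would occasionally override this greedy choice so the agent continues to discover new states.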
However, we found that the agent benefited from seeing certain states more often than others in order to develop a policy that could overcome the trickiest obstacles, such as the dreaded loop. To address this, we developed a replay agent that supports training of the Deep Q Agent. This helper agent lets the user control the starting point of the Deep Q Agent's training episodes, allowing the agent to focus on obstacles that would otherwise prove insurmountable during standard training.
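The following is a minimal sketch of that replay-style helper: instead of always starting episodes from the beginning, training can restart from a saved checkpoint near a difficult obstacle. The environment methods (`reset`, `restore_state`) and the checkpoint format are assumptions for illustration, not the project's actual API.

```python
import random

class ReplayStartHelper:
    """Biases episode starts toward user-chosen hard states (assumed interface)."""
    def __init__(self, env, checkpoints, hard_start_prob=0.7):
        self.env = env
        self.checkpoints = checkpoints        # saved states near hard obstacles
        self.hard_start_prob = hard_start_prob

    def begin_episode(self):
        """Start a new episode, usually from a saved hard state."""
        if self.checkpoints and random.random() < self.hard_start_prob:
            start = random.choice(self.checkpoints)
            return self.env.restore_state(start)  # assumed environment method
        return self.env.reset()                   # otherwise start from scratch
```

In practice, a user would capture a state just before an obstacle like the loop, register it as a checkpoint, and let the Deep Q Agent train repeatedly from that point until its policy reliably clears the obstacle.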