Asynchronous server for collecting offline rollouts in a reinforcement learning setting
- Externally, PyTorch models of agent policy functions are trained using PPO
- Model weights are sent by clients and cached on the server
- Each model version plays multiple matches against all other models
- Rollouts of these matches are collected and returned to the clients
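
The flow in the list above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the project's actual code: the `RolloutServer` class, its `submit`, `play_match`, and `collect` methods, the `matches_per_pair` parameter, and the rollout dictionary layout are all hypothetical names chosen for the example. Weights are treated as opaque bytes, and the match itself is a placeholder rather than a real game.

```python
import asyncio
import random
from dataclasses import dataclass, field


@dataclass
class RolloutServer:
    """Toy stand-in for the server: caches weights and schedules matches."""

    matches_per_pair: int = 2                      # hypothetical setting
    models: dict = field(default_factory=dict)     # version -> cached weights
    rollouts: dict = field(default_factory=dict)   # version -> pending rollouts

    async def submit(self, version, weights):
        """Cache a model version, then play it against every other cached model."""
        self.models[version] = weights
        self.rollouts.setdefault(version, [])
        opponents = [v for v in self.models if v != version]
        tasks = [
            self.play_match(version, opponent)
            for opponent in opponents
            for _ in range(self.matches_per_pair)
        ]
        # Run all matches concurrently and file each rollout under both players.
        for rollout in await asyncio.gather(*tasks):
            self.rollouts[rollout["a"]].append(rollout)
            self.rollouts[rollout["b"]].append(rollout)

    async def play_match(self, a, b):
        """Placeholder match; a real one would step the game with both policies."""
        await asyncio.sleep(0)  # yield to the event loop, as a real match would
        return {"a": a, "b": b, "steps": [], "winner": random.choice([a, b])}

    def collect(self, version):
        """Return and clear the rollouts gathered so far for one model version."""
        out, self.rollouts[version] = self.rollouts[version], []
        return out


async def main():
    server = RolloutServer(matches_per_pair=2)
    # Assumed: clients send serialized weights; bytes stand in for them here.
    for version in ("v1", "v2", "v3"):
        await server.submit(version, weights=f"weights-{version}".encode())
    return server


if __name__ == "__main__":
    asyncio.run(main())
```

In this sketch each newly submitted version is matched only against versions already in the cache, so earlier versions accumulate rollouts as later ones arrive. A client would call `collect` with its own version to fetch whatever has been gathered since its last request.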
A fruitbots clone is used as the game environment in this engine