Selfplay league runner with flexible configuration and algo presets #58
This is very much WIP.
For the league runner to work, two entrypoints must be provided: `train` and `evaluate`. Both take as arguments the path to the saved agent and the path to the saved opponent; a minimal sketch of this interface is below.
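A sketch of what the two entrypoints might look like — the names `train` and `evaluate` come from the description above, while the signatures, return convention, and docstrings are assumptions for illustration:

```python
def train(agent_path: str, opponent_path: str) -> None:
    """Load the agent from agent_path, train it against the (frozen)
    opponent from opponent_path, and save the updated agent back.
    The body is omitted: this is exactly the part the league runner
    does not need to know about."""
    ...


def evaluate(agent_path: str, opponent_path: str) -> float:
    """Play evaluation matches between the two saved agents and
    return the agent's win rate (assumed return convention)."""
    ...
```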
League configuration (a YAML file) gives the ability to control:

The runner keeps track of win rates in a payoff table and of MMR by running Bayesian updates on TrueSkill ratings. The win-rate and MMR information can be used to decide which opponent to play next; a sketch of this bookkeeping follows.
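A minimal sketch of that bookkeeping using the `trueskill` package (`Rating` and `rate_1vs1` are its actual API); the payoff-table layout and method names are illustrative, not this PR's implementation:

```python
from collections import defaultdict
import trueskill


class LeagueStats:
    """Tracks pairwise win rates (payoff table) and per-player TrueSkill MMR."""

    def __init__(self):
        # payoff[a][b] = [wins of a vs b, games of a vs b]
        self.payoff = defaultdict(lambda: defaultdict(lambda: [0, 0]))
        self.ratings = defaultdict(trueskill.Rating)

    def record(self, agent: str, opponent: str, agent_won: bool) -> None:
        wins_games = self.payoff[agent][opponent]
        wins_games[0] += int(agent_won)
        wins_games[1] += 1
        # Bayesian TrueSkill update: winner's rating goes first.
        if agent_won:
            self.ratings[agent], self.ratings[opponent] = trueskill.rate_1vs1(
                self.ratings[agent], self.ratings[opponent]
            )
        else:
            self.ratings[opponent], self.ratings[agent] = trueskill.rate_1vs1(
                self.ratings[opponent], self.ratings[agent]
            )

    def winrate(self, agent: str, opponent: str) -> float:
        wins, games = self.payoff[agent][opponent]
        return wins / games if games else 0.5
```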
As an example, 2 presets are implemented:
The league supports resume from checkpoint.
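A hedged sketch of what checkpoint/resume could look like for the `LeagueStats` sketch above; the JSON layout and file handling are assumptions, not the PR's actual checkpoint format:

```python
import json
import pathlib

import trueskill


def save_checkpoint(stats: "LeagueStats", path: str) -> None:
    # Persist payoff counts and rating parameters so a run can resume.
    state = {
        "payoff": {a: dict(row) for a, row in stats.payoff.items()},
        "ratings": {p: [r.mu, r.sigma] for p, r in stats.ratings.items()},
    }
    pathlib.Path(path).write_text(json.dumps(state))


def load_checkpoint(path: str) -> "LeagueStats":
    state = json.loads(pathlib.Path(path).read_text())
    stats = LeagueStats()
    for a, row in state["payoff"].items():
        for b, wins_games in row.items():
            stats.payoff[a][b] = wins_games
    for p, (mu, sigma) in state["ratings"].items():
        stats.ratings[p] = trueskill.Rating(mu=mu, sigma=sigma)
    return stats
```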
There are still a lot of issues, including
It's actually pretty hard to iterate on the league runner with the MicroRTS env, since it takes a long time to get any training done. I'm mostly iterating on SlimeVolley and some other PettingZoo envs. I'm also thinking about shipping the league runner as a separate package that could be used as a library and/or a CLI tool (from an implementation perspective, it's completely independent of the training details and of the env being used). WDYT @vwxyzjn?
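For what it's worth, a sketch of how the CLI-tool idea could stay decoupled from training code — the runner only ever touches the two entrypoints. The config key `entrypoints_module` and the argparse interface here are hypothetical:

```python
import argparse
import importlib

import yaml


def main() -> None:
    parser = argparse.ArgumentParser(description="Run a selfplay league")
    parser.add_argument("config", help="Path to the league YAML config")
    args = parser.parse_args()

    with open(args.config) as f:
        config = yaml.safe_load(f)

    # Everything env/algo-specific lives behind the user-supplied
    # train/evaluate entrypoints, imported by (assumed) dotted path.
    module = importlib.import_module(config["entrypoints_module"])
    train, evaluate = module.train, module.evaluate
    ...  # league loop: pick opponent, call train/evaluate, update stats


if __name__ == "__main__":
    main()
```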