Selfplay league runner with flexible configuration and algo presets #58

Draft
wants to merge 1 commit into master
Conversation

@kachayev (Contributor) commented Feb 6, 2022

[Screenshot: Screen Shot 2022-02-05 at 10 57 54 AM]

This is very much WIP.
$ poetry run python run_league.py --config-file league_alphastar.yaml

or

$ poetry run python run_league.py --config-file league_openfive.yaml

For the league runner to work, it needs to be provided with 2 entrypoints: train and evaluate. Both take the paths to a saved agent and a saved opponent as arguments.
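Roughly, the entrypoint interface looks like the following sketch (function names and signatures here are illustrative only, not the exact interface in this PR):

    # Illustrative sketch of the two entrypoints the league runner calls.
    # Names and signatures are hypothetical; the actual interface may differ.
    def train(agent_path: str, opponent_path: str) -> str:
        """Continue training the agent checkpoint against the given opponent
        checkpoint and return the path of the newly saved agent."""
        ...

    def evaluate(agent_path: str, opponent_path: str, num_episodes: int = 10) -> float:
        """Play num_episodes games between the two checkpoints and
        return the agent's winrate."""
        ...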

The league configuration (a YAML file) makes it possible to control:

  • population structure
  • matchmaking algo (pick the next opponent)
  • archival and evaluation scheduling
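For illustration, a config in this spirit could look something like the sketch below (the field names are hypothetical and do not reflect the actual schema of league_alphastar.yaml):

    # Hypothetical league config sketch; field names are illustrative only.
    population:
      main_agents: 1
      main_exploiters: 1
      league_exploiters: 2
    matchmaking:
      algo: pfsp              # how the next opponent is picked
      weighting: squared
    schedule:
      archive_every_steps: 2000000
      evaluate_every_steps: 500000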

The runner keeps track of winrates in a payoff table and of MMR by running Bayesian updates on TrueSkill. Winrate and MMR information can be used when deciding on the next opponent.
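As a minimal sketch of that bookkeeping (the dict-based payoff table below is illustrative; only trueskill.Rating and trueskill.rate_1vs1 are actual library calls):

    # Sketch of payoff-table + TrueSkill bookkeeping after a single game.
    from collections import defaultdict
    import trueskill

    ratings = defaultdict(trueskill.Rating)      # player name -> MMR (mu, sigma)
    payoff = defaultdict(lambda: [0, 0])         # (home, away) -> [wins, games]

    def record_game(home: str, away: str, home_won: bool) -> None:
        wins, games = payoff[(home, away)]
        payoff[(home, away)] = [wins + int(home_won), games + 1]
        winner, loser = (home, away) if home_won else (away, home)
        # Bayesian update of both players' TrueSkill ratings.
        ratings[winner], ratings[loser] = trueskill.rate_1vs1(ratings[winner], ratings[loser])

    def winrate(home: str, away: str) -> float:
        wins, games = payoff[(home, away)]
        return wins / games if games else 0.5    # prior for unseen pairings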

As an example, 2 presets are implemented:

  • AlphaStar
  • OpenFive (not sure whether sampling based on MMR is required); note that the implementation turned out to be almost trivial on top of the league API (I mainly used this as confirmation that the API is flexible enough)
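For the matchmaking side, here is a minimal PFSP-style sketch (prioritized fictitious self-play, as described in the AlphaStar paper) of how a preset could pick the next opponent from payoff winrates; it is an illustration, not the preset code in this PR:

    # Illustrative PFSP-style opponent sampling from estimated win probabilities.
    import random
    from typing import Callable, Sequence

    def pfsp_weight(win_prob: float, p: float = 2.0) -> float:
        # "Squared" weighting: prioritize opponents the learner still loses to.
        return (1.0 - win_prob) ** p

    def pick_opponent(learner: str, candidates: Sequence[str],
                      win_prob: Callable[[str, str], float]) -> str:
        weights = [pfsp_weight(win_prob(learner, opp)) for opp in candidates]
        if sum(weights) == 0:                    # learner beats everyone: fall back to uniform
            return random.choice(list(candidates))
        return random.choices(list(candidates), weights=weights, k=1)[0]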

The league supports resuming from a checkpoint.
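As a sketch of what resuming could look like (file layout and field names here are hypothetical, not the checkpoint format used in this PR):

    # Hypothetical sketch of saving/restoring league state for resume.
    import json
    import trueskill

    def save_league_state(path: str, payoff: dict, ratings: dict) -> None:
        state = {
            "payoff": {f"{home}|{away}": record for (home, away), record in payoff.items()},
            "ratings": {name: [r.mu, r.sigma] for name, r in ratings.items()},
        }
        with open(path, "w") as f:
            json.dump(state, f)

    def load_league_state(path: str):
        with open(path) as f:
            state = json.load(f)
        payoff = {tuple(key.split("|")): record for key, record in state["payoff"].items()}
        ratings = {name: trueskill.Rating(mu=mu, sigma=sigma)
                   for name, (mu, sigma) in state["ratings"].items()}
        return payoff, ratings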

There are still a lot of open issues, including:

  • proper setup for tensorboard writer/w&b
  • open questions around seeding
  • better logger for master process
  • and more

It's actually pretty hard to iterate on the league runner with the MicroRTS env, as it takes a long time to get any training done. I'm mostly iterating on SlimeVolley and some other PettingZoo envs. I'm also thinking about having the league runner as a separate package that could be used as a library and/or a CLI tool (from an implementation perspective, it's completely independent of the details of the training or the env that is used). WDYT @vwxyzjn?

@vwxyzjn (Collaborator) commented Feb 7, 2022

As an example, 2 presets are implemented: AlphaStar, OpenFive (not sure whether sampling based on MMR is required); note that the implementation turned out to be almost trivial on top of the league API (I mainly used this as confirmation that the API is flexible enough)

Nice! This is really cool!

I'm also thinking about having the league runner as a separate package that could be used as a library and/or a CLI tool (from an implementation perspective, it's completely independent of the details of the training or the env that is used). WDYT @vwxyzjn?

This makes sense! Lots of projects could benefit from this :)

It's actually pretty hard to iterate on the league runner with the MicroRTS env, as it takes a long time to get any training done. I'm mostly iterating on SlimeVolley and some other PettingZoo envs.

I will look further into this and evaluate.

@vwxyzjn (Collaborator) commented Feb 8, 2022

Looked further into this. Do we have a sense of how "fast" the training is? If we run poetry run python run_league.py --config-file league_alphastar.yaml for 24 hours, what's the TrueSkill of the best agent, using our league.py to evaluate?

@kachayev (Contributor, Author) commented Feb 8, 2022

To answer this question I need to have a GPU 😀 Based on the schedule of other experiments, I will be able to run it tomorrow or the day after tomorrow.

@kachayev (Contributor, Author) commented Feb 9, 2022

Oh, BTW, I found another detail I have to flesh out first: right now evaluation only works against other agents; PvE games are not supported. It's quite easy to cover, so it shouldn't take long.

@vwxyzjn (Collaborator) commented Feb 9, 2022

What is PvE?

@kachayev (Contributor, Author) commented Feb 9, 2022

Sorry :) It stands for "player vs. environment" (i.e., against a built-in bot), as opposed to PvP, "player vs. player".

@vwxyzjn (Collaborator) commented Feb 9, 2022

Oh, this makes sense. It would be useful to cover! That said, hopefully, we can also train really strong agents without the help of human-engineered bots :)
