
enabling self play #241

Open
drblallo opened this issue Mar 26, 2024 · 4 comments
Comments

@drblallo commented Mar 26, 2024

Hi,
I tried out this project and it is one of the few that actually works off the shelf, thank you for your work.
Is there a way to enable self-play when training an agent? My use case is to use DreamerV3 as an alternative to algorithms such as MuZero for training agents on board games.
I have looked around the repo, but this feature does not seem to be trivially available out of the box.

@belerico (Member)

Hi @drblallo, thank you for your kind words!
You're right, self-play is not supported right now. Do you have any specific references on self-play that we could look at?

@drblallo (Author)

From what I gather, pretty much everyone implements it the same way: the environment has a function that tells you which player is the current one, and the rewards are a vector with an element for each player. For example, OpenSpiel from Google DeepMind implements it as https://github.com/google-deepmind/open_spiel/blob/master/open_spiel/python/examples/tic_tac_toe_qlearner.py#L118

player_id = time_step.observations["current_player"]  # gets the current player
agent_output = agents[player_id].step(time_step)  # asks the agent assigned to that player what actions to perform
time_step = env.step([agent_output.action])  # performs the action

As far as I know, there is no known math to do anything fancier than this, except things like minimax search, but those are AlphaGo-style algorithms that do not make much sense for algorithms like Dreamer. So the whole thing should just require having an array of agents instead of one.
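
To make this concrete, the full loop in the linked OpenSpiel example boils down to roughly the following (a sketch using OpenSpiel's rl_environment and tabular Q-learning agents, just to illustrate the pattern, not sheeprl code):

from open_spiel.python import rl_environment
from open_spiel.python.algorithms import tabular_qlearner

env = rl_environment.Environment("tic_tac_toe")
num_players = 2  # tic-tac-toe has two players
num_actions = env.action_spec()["num_actions"]

# one agent per player; any per-player policy would do here
agents = [
    tabular_qlearner.QLearner(player_id=idx, num_actions=num_actions)
    for idx in range(num_players)
]

time_step = env.reset()
while not time_step.last():
    player_id = time_step.observations["current_player"]  # whose turn it is
    agent_output = agents[player_id].step(time_step)  # that player's agent picks an action
    time_step = env.step([agent_output.action])  # advance the environment

# let every agent observe the terminal step so it can learn from the final rewards
for agent in agents:
    agent.step(time_step)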

In principle, I am willing to implement this myself, if it is expected to be a circumscribed effort.

@geranim0 commented Apr 5, 2024

Hi @drblallo

I need this right now too. Before I start working on it and adapting what's done in cli.py::eval_algorithm to my wrapper, I just wanted to ping to see if you've already done that work.

Basically, the interface I'm looking for is something like this:

agent = load(checkpoint_path, config, seed)
action = agent.act(obs_space.sample())
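
For concreteness, here is a rough placeholder of the kind of wrapper I mean (the checkpoint layout and the load/act names are my own assumptions, not an existing sheeprl API):

import torch

class CheckpointAgent:
    """Thin wrapper exposing a single act() call for a trained actor."""

    def __init__(self, actor):
        self.actor = actor
        self.actor.eval()

    @torch.no_grad()
    def act(self, obs):
        obs = torch.as_tensor(obs, dtype=torch.float32).unsqueeze(0)  # add a batch dim
        return self.actor(obs)

def load(checkpoint_path, actor):
    # using "actor" as the state-dict key is an assumption about the checkpoint layout
    state = torch.load(checkpoint_path, map_location="cpu")
    actor.load_state_dict(state["actor"])
    return CheckpointAgent(actor)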

Thanks

@belerico (Member) commented May 6, 2024

Hi @drblallo and @geranim0! What you could do to enable self-play is:

  • Create a new agent (inheriting from an already defined one, like Dreamer-V3 for example) in a new folder and adapt it so that it instantiates as many agents as the number of players you need, i.e. call build_agent N times and save those agents in a dict or list (a minimal sketch follows this list)
  • Create a wrapper for open_spiel so that the environment can be used directly in sheeprl
  • Interact with the environment as specified by open_spiel
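
Something like this for the first point (the build_agent arguments below are simplified, since each algorithm defines its own signature):

# Sketch only: one agent per player, built with the algorithm's own builder.
# fabric, cfg, observation_space, action_space and num_players are placeholders
# for whatever the chosen algorithm's build_agent actually needs.
agents = {
    player_id: build_agent(fabric, cfg, observation_space, action_space)
    for player_id in range(num_players)
}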

This works as long as the observations, actions, rewards, and everything else saved in the same rollout or replay buffer have a shape of [sequence_length, num_envs, ...]. Also bear in mind that in every algorithm we add a leading 1 to everything we save in the replay buffer, because we assume that the vectorized environment returns something of shape [num_envs, ...]. One thing you could do is save arrays of shape [seq_len, num_envs, num_players, ...] into the buffer, or create a replay buffer for every player independently and sample accordingly.
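
For example, with the per-player-buffer option, each stored step keeps the usual leading dimensions and the buffers stay independent (a sketch with plain numpy arrays and lists, not sheeprl's actual ReplayBuffer class):

import numpy as np

num_players = 2

# one independent buffer per player; obs/action/reward are assumed to arrive
# with a leading num_envs dimension, as returned by the vectorized environment
buffers = {player_id: [] for player_id in range(num_players)}

def store(player_id, obs, action, reward):
    # each entry gets the extra leading 1, so concatenating along axis 0
    # yields [seq_len, num_envs, ...] per player
    buffers[player_id].append({
        "observations": obs[np.newaxis, ...],
        "actions": action[np.newaxis, ...],
        "rewards": reward[np.newaxis, ...],
    })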

This could be linked to #278
