enabling self play #241

drblallo · 2024-03-26T18:49:37Z

hi,
i tried out this project and it is one of the few that actually works off the shelf, thank you for your work.
Is there a way to enable self play when training an agent? My usecase is to use DreamerV3 as a alternative to algorithms such as muzero to train agents for boardgames.
I have looked around the repo but this feature does not seem trivially available out of the box.

belerico · 2024-03-27T16:50:06Z

Hi @drblallo, thank you for your words!
You're right, self-play is not supported right now. Do you have any specific references about self-play that we can look upon?

drblallo · 2024-03-27T20:42:56Z

from what i gather, pretty much everyone just implements it as "the environment has a function that tells you which is the current player, and the rewards are a vector with a element for each player", for example openspiel from google implements it as https://github.com/google-deepmind/open_spiel/blob/master/open_spiel/python/examples/tic_tac_toe_qlearner.py#L118

player_id = time_step.observations["current_player"] #gets the current player
agent_output = agents[player_id].step(time_step) #asks the agent assigned to that player what actions to perform
time_step = env.step([agent_output.action]) #performs the action

as far as i know there is no know math to do something fancier than this, except stuff like minmax, but those are alphago style algorithms which do not make much sense for algorithms like dreamer, so the whole thing should just require to have a array of agents instead of one.

In principle i am willing to implement this myself, if it is expected to be a circumscribed effort.

geranim0 · 2024-04-05T22:55:01Z

Hi @drblallo

Need this right now too. Before I start working on it and adapt what's done in cli.py::eval_algorithm to my wrapper, just wanted to ping to see if you've done that work already.

Basically the interface I'm looking for is something like that

agent = load(checkpoint_path, config, seed)
action = agent.act(obs_space.sample())

Thanks

belerico · 2024-05-06T08:44:40Z

Hi @drblallo and @geranim0! The one thing that you could do to enable self-play is:

Create a new agent (inheriting from an already defined one, like Dreamer-V3 for example) in a new folder and adapt it so that it instantiates multiple agent as the number of player you need, i.e. calling the build_agent N times and save those agents in a dict or list
Create a wrapper to open-spiel so that the environment can be directly used in sheeprl
Interact with the environment as specified by open-spiel

This goes as far as the observations, actions, rewards and everything that could be saved in the same rollout or replay buffer has a dimension of [sequence_length, num_envs, ...]. Also bear in mind that in every algorithm we add a leading 1 in everything we are going to save in the replay buffer because we suppose that the vectorized environment returns something of shape [num_envs, ...]. One thing that you could do is to save into the buffer arrays of shape [seq_len, num_envs, players_num, ...] or create a replay buffer for every player independently and sample accordingly.

This could be linked to #278

belerico mentioned this issue May 6, 2024

ReplayBuffer storing actions size mismatch during env reset #278

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

enabling self play #241

enabling self play #241

drblallo commented Mar 26, 2024 •

edited

Loading

belerico commented Mar 27, 2024

drblallo commented Mar 27, 2024

geranim0 commented Apr 5, 2024 •

edited

Loading

belerico commented May 6, 2024

enabling self play #241

enabling self play #241

Comments

drblallo commented Mar 26, 2024 • edited Loading

belerico commented Mar 27, 2024

drblallo commented Mar 27, 2024

geranim0 commented Apr 5, 2024 • edited Loading

belerico commented May 6, 2024

drblallo commented Mar 26, 2024 •

edited

Loading

geranim0 commented Apr 5, 2024 •

edited

Loading