[OSPP-Week2] Check the `kuhn_nfsp` experiment and read papers #342

peterchen96 · 2021-07-06T14:13:40Z

peterchen96
Jul 6, 2021
Collaborator

Hi, I am a student in the OSPP project "Implement Multi-Agent Reinforcement Learning Algorithms in Julia". This summer, my main task is to implement some multi-agent RL algorithms (such as NFSP and PSRO) in RLZoo and add the related experiments in RLExperiment. Below are my current progress and this week's plan.

Current Progress

I have a rough implementation of the Neural Fictitious Self-Play (NFSP) algorithm and have tested it on the Kuhn Poker game (KuhnPokerEnv). See my project for more details.

This Week's Plan

Use nash_conv to evaluate the experiment:

I need to add a prob function to QBasedPolicy for certain stages. For now, nash_conv is not available for QBasedPolicy.
Organize the current code and add it to the corresponding directories in RL.jl.
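To illustrate the kind of prob function mentioned above, here is a minimal sketch of how action probabilities could be derived from Q-values under an epsilon-greedy explorer. The name `epsilon_greedy_probs` and its arguments are hypothetical and are not part of ReinforcementLearning.jl; the real QBasedPolicy may need a different interface.

```julia
# Hypothetical helper, not part of ReinforcementLearning.jl.
# Turns a vector of Q-values into the action distribution of an
# epsilon-greedy policy, optionally restricted to legal actions.
function epsilon_greedy_probs(q::AbstractVector{<:Real}, epsilon::Real,
                              legal::AbstractVector{Bool} = trues(length(q)))
    n_legal = count(legal)
    n_legal > 0 || throw(ArgumentError("at least one action must be legal"))
    probs = zeros(Float64, length(q))
    # Exploration mass is spread uniformly over the legal actions.
    probs[legal] .= epsilon / n_legal
    # The greedy action (best legal Q-value) receives the remaining mass.
    masked_q = [legal[i] ? float(q[i]) : -Inf for i in eachindex(q)]
    probs[argmax(masked_q)] += 1 - epsilon
    return probs
end

# Example call: three actions, the second one illegal.
p = epsilon_greedy_probs([1.0, 3.0, 2.0], 0.3, [true, false, true])
```

An evaluation routine that needs a full distribution over actions could consume a vector like `p` instead of a single sampled action.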

findmyway · 2021-07-06T15:48:05Z

findmyway
Jul 6, 2021
Maintainer

Also, as we mentioned before, the Julia version of KuhnPokerEnv may need to implement a clone method in order to calculate nash_conv.

1 reply

peterchen96 Jul 7, 2021
Collaborator Author

There is already a copy method for AbstractEnv. Adding Base.:(==) and Base.hash methods for AbstractEnv may be enough, just as is done for OpenSpielEnv.
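As a rough illustration of that idea, the sketch below defines copy, `==` and hash for a toy environment so that copied states compare equal and can be used as dictionary keys. `ToyCardEnv` and its fields are made up for this example; the real KuhnPokerEnv and AbstractEnv definitions differ.

```julia
# Hypothetical toy environment; the real KuhnPokerEnv has different fields.
mutable struct ToyCardEnv
    cards::Vector{Symbol}
    actions::Vector{Symbol}
end

# Copy the mutable fields so the copy can evolve independently.
Base.copy(env::ToyCardEnv) = ToyCardEnv(copy(env.cards), copy(env.actions))

# Two environments are equal when their full state matches.
Base.:(==)(a::ToyCardEnv, b::ToyCardEnv) =
    a.cards == b.cards && a.actions == b.actions

# hash must agree with ==, so it is built from the same fields.
Base.hash(env::ToyCardEnv, h::UInt) =
    hash(env.actions, hash(env.cards, hash(:ToyCardEnv, h)))

env = ToyCardEnv([:J, :Q], [:pass])
snapshot = copy(env)
cache = Dict(snapshot => 0.5)   # a value cached for this exact state
push!(env.actions, :bet)        # advance the original only
```

After the last line, `snapshot` still represents the earlier state, so a lookup with an equal state finds the cached value while the advanced `env` does not.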

peterchen96 · 2021-07-12T03:30:55Z

peterchen96
Jul 12, 2021
Collaborator Author

For now, nash_conv can be used in the kuhn_nfsp experiment, although its result looks bad.

So this week, I will do the following:

Check the kuhn_nfsp process to correct the result of the experiment. For more details about the experiment, see https://github.com/peterchen96/nfsp_demo.
Submit the NFSP algorithm and the kuhn_nfsp experiment to ReinforcementLearning.jl step by step.
Read survey papers on cooperative multi-agent reinforcement learning, such as https://arxiv.org/abs/1908.03963.
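For readers unfamiliar with the metric: NashConv sums, over all players, how much each player could gain by switching to a best response while the others keep their current strategies. It is zero exactly at a Nash equilibrium. The sketch below computes it for a two-player zero-sum matrix game only. `matrix_nash_conv` is a hypothetical helper written for this illustration, not the nash_conv used in the experiment, which works on an extensive-form game.

```julia
using LinearAlgebra

# Hypothetical normal-form helper, for illustration only.
# A[i, j] is the payoff to player 1 when player 1 plays row i and
# player 2 plays column j; player 2 receives -A[i, j] (zero-sum).
function matrix_nash_conv(A::AbstractMatrix{<:Real},
                          x::AbstractVector{<:Real},
                          y::AbstractVector{<:Real})
    value1 = dot(x, A * y)        # player 1's expected payoff under (x, y)
    best1 = maximum(A * y)        # player 1's best pure response against y
    best2 = maximum(-(A' * x))    # player 2's best pure response against x
    # Sum of both players' gains from deviating to a best response.
    return (best1 - value1) + (best2 - (-value1))
end

# Matching pennies: the uniform strategy profile is the equilibrium.
pennies = [1 -1; -1 1]
at_equilibrium = matrix_nash_conv(pennies, [0.5, 0.5], [0.5, 0.5])
exploitable = matrix_nash_conv(pennies, [1.0, 0.0], [0.5, 0.5])
```

A learned policy pair whose value stays far from zero, as in `exploitable`, is still exploitable by at least one player.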

3 replies

findmyway Jul 12, 2021
Maintainer

You'd better create a draft PR so that I can offer suggestions early; otherwise, things may get out of control very quickly.

findmyway Jul 12, 2021
Maintainer

And I can also help to diagnose the possible problems.

peterchen96 Jul 12, 2021
Collaborator Author

Ok, I'll create a PR later today.

peterchen96 · 2021-07-12T13:55:35Z

peterchen96
Jul 12, 2021
Collaborator Author

Task list for adding the Neural Fictitious Self-Play (NFSP) algorithm and the kuhn_nfsp experiment

NFSPAgent's structure:

rl_agent:

Agent(
    policy = QBasedPolicy(
        learner = DQNLearner,
        explorer = EpsilonGreedyExplorer,
    ),
    trajectory = CircularArraySARTTrajectory
)

where rl_agent (using DQN as an example) searches for the best response during self-play.

sl_agent:

Agent(
    policy = BehaviorCloningPolicy,
    trajectory = ReservoirTrajectory,
)

where sl_agent learns, for each state, the best-response action chosen by rl_agent's policy.
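The sl_agent above stores its data in a reservoir. As background, here is a minimal sketch of reservoir sampling, which keeps a fixed-size uniform sample of everything seen so far. `ReservoirBuffer` and `add!` are hypothetical names for this illustration and do not reproduce the ReservoirTrajectory type.

```julia
using Random

# Hypothetical minimal buffer, not the ReservoirTrajectory type itself.
mutable struct ReservoirBuffer{T}
    capacity::Int
    items::Vector{T}
    seen::Int          # total number of items offered so far
end

ReservoirBuffer{T}(capacity::Int) where {T} = ReservoirBuffer{T}(capacity, T[], 0)

function add!(buf::ReservoirBuffer{T}, item::T;
              rng::AbstractRNG = Random.default_rng()) where {T}
    buf.seen += 1
    if length(buf.items) < buf.capacity
        push!(buf.items, item)          # still filling up
    else
        j = rand(rng, 1:buf.seen)       # uniform over everything seen
        if j <= buf.capacity
            buf.items[j] = item         # replace with probability capacity / seen
        end
    end
    return buf
end

buf = ReservoirBuffer{Int}(10)
rng = MersenneTwister(1)
for i in 1:1000
    add!(buf, i; rng = rng)
end
```

Unlike a circular buffer, which keeps only the most recent entries, this keeps old and new experience with equal probability, which suits a learner that should approximate an average over the whole training history.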

Task list

~~add AverageLearner in RLZoo~~ use BehaviorCloningPolicy as sl_agent for now
add nfsp and its related operations
add kuhn_nfsp experiment
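For context on how the two agents interact, the NFSP paper has each player choose at the start of an episode between its best-response policy (with probability eta, the anticipatory parameter) and its average policy, and it stores (state, action) pairs for supervised learning only while acting with the best response. The sketch below shows just that bookkeeping. `choose_mode` and `record!` are hypothetical names, and the eta value is arbitrary; this is not the implementation discussed in this thread.

```julia
using Random

# Hypothetical sketch of NFSP's mode choice, made once per episode.
# `eta` is the anticipatory parameter; the value used below is arbitrary.
function choose_mode(rng::AbstractRNG, eta::Real)
    return rand(rng) < eta ? :best_response : :average_policy
end

# Only best-response behaviour is stored as supervised data
# for the average policy.
function record!(sl_memory::Vector{Tuple{S,A}}, mode::Symbol,
                 state::S, action::A) where {S,A}
    mode == :best_response && push!(sl_memory, (state, action))
    return sl_memory
end

rng = MersenneTwister(0)
memory = Tuple{Int,Int}[]
mode = choose_mode(rng, 0.1)
record!(memory, mode, 1, 2)   # stored only if mode is :best_response
```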

New progress is in #402.

0 replies
