Tencent Kaiwu Arena

This is the final project for CS3316 (Reinforcement Learning) at Shanghai Jiao Tong University.

Team Members

Yongshan Chen: Guided the research process, proposed the improvements to the PPO algorithm, and implemented the PSRO algorithm.
Lai Jiang: Wrote the Abstract, Introduction, and Conclusion of the paper; proposed and polished the paper's structure; proposed candidate experiments.
Yuhao Wang: Checked the feasibility of the Truly PPO mechanism and implemented the PPO-RB part of the code. Wrote the Truly PPO parts of the Introduction, Related Work, Methods, and Conclusion sections.
Linhao Zhong: Ran the experiments, tuned the parameters, and evaluated the model. Wrote Sections 4.2 and 4.3 and part of the Introduction.
Binglin Zhou: Ran the experiments, tuned the parameters, and analyzed the evaluation results. Wrote Sections 4.1 and 4.4.
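For readers unfamiliar with PPO-RB, the snippet below is a minimal sketch of the rollback surrogate as described in the Truly PPO paper (Wang et al., 2019). It is not this repository's implementation. The function names are hypothetical. The values epsilon=0.2 and alpha=0.3 are illustrative assumptions, not the settings used in this project. Inside the clipping range [1 - epsilon, 1 + epsilon], the rollback function leaves the probability ratio unchanged. Outside that range, it applies a slope of -alpha, which actively pushes the ratio back toward the range instead of only flattening the gradient as standard PPO clipping does.

```python
def rollback_ratio(ratio: float, epsilon: float = 0.2, alpha: float = 0.3) -> float:
    """Rollback function F_RB: identity inside [1-eps, 1+eps], slope -alpha outside.

    epsilon and alpha are illustrative defaults (assumption), not project settings.
    """
    if ratio <= 1.0 - epsilon:
        return -alpha * ratio + (1.0 + alpha) * (1.0 - epsilon)
    if ratio >= 1.0 + epsilon:
        return -alpha * ratio + (1.0 + alpha) * (1.0 + epsilon)
    return ratio


def ppo_rb_surrogate(ratio: float, advantage: float,
                     epsilon: float = 0.2, alpha: float = 0.3) -> float:
    """Per-sample PPO-RB surrogate to be maximized: min(r * A, F_RB(r) * A)."""
    return min(ratio * advantage,
               rollback_ratio(ratio, epsilon, alpha) * advantage)


def ppo_rb_loss(ratios, advantages, epsilon=0.2, alpha=0.3):
    """Mean negative surrogate over a batch, for use with a gradient-descent optimizer."""
    terms = [ppo_rb_surrogate(r, a, epsilon, alpha)
             for r, a in zip(ratios, advantages)]
    return -sum(terms) / len(terms)


if __name__ == "__main__":
    # With r = 1.5 and A = 1, standard clipping would give 1.2; rollback gives a smaller value.
    print(ppo_rb_surrogate(1.5, 1.0))
```

Compared with the clipped PPO objective, the only change is the rollback function F_RB. The min(...) structure stays the same, so the surrogate still reduces to r * A whenever the ratio lies inside the trust region.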

First, upload the code to the Kaiwu Arena platform. Then run the experiment with the following command:

python3 train_test.py

We would like to thank the course instructor, Prof. Weinan Zhang, TA Xialin He and Kaiwu Arena for providing the platform for this project.

Repository contents:
algorithm
conf
environment
framework
sample_processor
state_action_reward
thirdparty/model_pool_go
tools
.gitignore
README.md
train_test.py