Skip to content
/ CFVFP Public

This repository is the accompanying code for the paper CFVFP. This paper presents a new algorithm for solving incomplete information games - CFVFP, which has better results than MCCFR in some games. The paper has been accepted by NeurIPS 2024.

Notifications You must be signed in to change notification settings

Zealoter/CFVFP

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

MCCFVFP

OverView

This is the CFVFP method corresponding to the paper titled "Accelerating Nash Equilibrium Convergence in Monte Carlo Settings Through Counterfactual Value Based Fictitious Play". This paper has been accepted by NeurIPS 2024.

This method is employed to address large-scale incomplete information game problems. The core concept of this approach is to use Q values instead of regret values in CFR. In the next iteration, directly utilize the strategy with the max Q-value action as the current strategy. This not only sidesteps a great deal of cumbersome calculations related to regret values but also can prevent the selection of dominated strategies, thereby accelerating the convergence of game algorithms.

For the experiments, we have adopted common benchmark experimental environments such as Kuhn and Leduc. Additionally, our algorithm has been successfully applied in Texas Hold'em. https://gtoking.com is a commercial solver based on MCCFVFP. If necessary, you can try it out.

Instructions

game_name = 'Leduc'
is_show_policy = False
prior_state_num = 3
y_pot = 3
z_len = 3

The above code is used to set the experimental configuration, where 'game_name' determines the type of game to be trained. This can be referenced from the code below or from the 'Game_Sampling'.

The 'prior_state_num' is used to set the scale of the game, and there are several ways to understand it: For 'Kuhn', 'Leduc', 'KuhnNPot', 'Leduc3Pot', and 'Leduc5Pot', it can be understood as the number of cards set in the game. For 'Goofspiel', it can be understood as the number of cards in each player’s hand. For 'PAM', it can be understood as the step length to terminate the game.

The 'y_pot''z_len' is only for when the 'game_name' is 'KuhnNPot'.

train_mode = 'fix_itr'
log_interval_mode = 'itr'
log_mode = 'exponential'

The 'train_mode' is set to the mode used for training: For "train_mode = 'fix_itr'"it means fix number of training rounds. For "train_mode = 'node_touched'" it means fix number of nodes passed during training. For "train_mode = 'tran_time'"it means fix training time.

The 'log_interval_mode' is set to the mode used for recording results.

The 'log_mode' is set to the interval for recording results: For'exponential' records in exponential form. Fpr 'normal' records in arithmetic form.

total_train_constraint = 1000000
log_interval = 1.5
nun_of_train_repetitions = 3
n_jobs = 1  

The setting of 'total_train_constraint' depends on the selected 'train_mode': For "train_mode = 'tran_time'" then the 'total_train_constraint=100' means that each method is trained for 100s. For "train_mode = 'node_touched'" means that each method is trained to pass a set number of nodes.

The 'log_interval' is used to set the recording of training results.

The 'nun_of_train_repetitions' is used to set how many times a training is repeated.

The 'n_jobs' is for parallel training, generally 1 for your own compute.

After setting the above parameters, you can start running the experiment you want.

About

This repository is the accompanying code for the paper CFVFP. This paper presents a new algorithm for solving incomplete information games - CFVFP, which has better results than MCCFR in some games. The paper has been accepted by NeurIPS 2024.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages