This is the official repository of the ICLR 2025 paper Strength Estimation and Human-Like Strength Adjustment in Games.
If you use this work for research, please consider citing our paper as follows:
title={Strength Estimation and Human-Like Strength Adjustment in Games},
author={Chen, Chun-Jung and Shih, Chung-Chin and Wu, Ti-Rong},
booktitle={International Conference on Learning Representations},
This repository is built upon MiniZero, with chess support added from LeelaChessZero. The following instructions reproduce the main experiments in the paper.
The program requires a Linux operating system with a container runtime installed and at least one NVIDIA GPU.
Clone this repository with the required submodules:
git clone --recursive git@github.com:rlglab/strength-estimator.git
cd strength-estimator
Enter the container to build the required executables:
# start the container
# run the below commands to build programs inside the container
./scripts/ go # for Go
./scripts/ chess # for chess
This section describes how to download and preprocess the game records used in the paper. You are welcome to use game records from other sources as long as they follow the format described below.
Visit the Fox Weiqi website and download the FoxWeiqi application. Install it, log in to your account, select the games you want to download, and save them as .sgf files.
# create a directory named training_sgf_go
mkdir training_sgf_go
# arrange .sgf files in the following structure:
├── 3-5k.sgf
├── 1-2k.sgf
│ ...
└── 9d.sgf
Each game record must be on one line in Smart Game Format and contain the tags BR and WR in English to indicate the ranks of both players (e.g., if you download the game records from FoxWeiqi, you first need to convert "級" to "k" and "段" to "d"). To reproduce the main experiment in the paper, you should prepare game records for 11 ranks (3-5k, 1-2k, 1d, 2d, ..., 9d) in the directory.
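The rank conversion and the one-line requirement described above can be sketched in Python as follows. This is a minimal sketch, not part of the repository: the helper names `normalize_ranks` and `to_single_line` are hypothetical, and it assumes ranks appear in the form `BR[9段]` or `WR[3級]`. If your files spell ranks differently, the pattern needs adjusting.

```python
import re

# Assumption: ranks look like BR[9段] or WR[3級]; other spellings are not handled.
RANK_TAG = re.compile(r"\b(BR|WR)\[(\d+)(級|段)\]")
SUFFIX = {"級": "k", "段": "d"}


def normalize_ranks(sgf: str) -> str:
    """Rewrite the BR/WR rank suffixes from 級/段 to k/d."""
    return RANK_TAG.sub(
        lambda m: f"{m.group(1)}[{m.group(2)}{SUFFIX[m.group(3)]}]", sgf
    )


def to_single_line(sgf: str) -> str:
    """Collapse a multi-line SGF record onto one line."""
    return "".join(line.strip() for line in sgf.splitlines())


record = "(;GM[1]SZ[19]\nBR[9段]WR[3級]\n;B[pd];W[dp])"
one_line = to_single_line(normalize_ranks(record))
print(one_line)
```

Note that `to_single_line` also joins line breaks inside comment properties, which is acceptable only if you do not need the comments preserved verbatim.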
Visit Lichess and download the required .pgn.zst files.
# create a directory named download_chess_game with subfolders for 2023 and 2024 games
mkdir download_chess_game
mkdir download_chess_game/database2023
mkdir download_chess_game/database2024
# extract the .pgn.zst files and arrange the extracted .pgn files in the following structure:
├── database2024/
│   ├── lichess_db_standard_rated_2024-01.pgn
│   └── lichess_db_standard_rated_2024-02.pgn
└── database2023/
    ├── lichess_db_standard_rated_2023-09.pgn
    ├── ...
    └── lichess_db_standard_rated_2023-12.pgn
# run the script to preprocess game records
# after running the above command, you will obtain the following directory structure in the current directory
├── 1000_1200.txt
│ ...
├── 2400_2600.txt
├── 1000_1200.txt
│ ...
├── 2400_2600.txt
├── 1000_1200.txt
│ ...
└── 2400_2600.txt
To reproduce the main experiment in the paper, you should prepare game records for 8 ranks (Elo 1000-1199, 1200-1399, ..., and 2400-2599).
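The mapping from an Elo rating to one of the 8 ranks can be illustrated with a short Python sketch. The function `elo_bucket` is hypothetical and only mirrors the 200-point ranges and the `1000_1200`-style file names shown above. How the repository's preprocessing script treats games whose two players fall into different ranges is not specified here, so treat the script itself as authoritative.

```python
from typing import Optional

# Bounds and bin width taken from the ranges listed above.
LOW, HIGH, WIDTH = 1000, 2600, 200


def elo_bucket(elo: int) -> Optional[str]:
    """Return a label such as '1000_1200' for ratings 1000-1199, or None if out of range."""
    if not LOW <= elo < HIGH:
        return None
    start = LOW + (elo - LOW) // WIDTH * WIDTH
    return f"{start}_{start + WIDTH}"


# Enumerate every label the function can produce.
labels = sorted({elo_bucket(e) for e in range(LOW, HIGH)})
print(labels)
```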
To reproduce the strength estimator models SL, SE, and SE_∞, run the following commands:
# For Go
./scripts/ go cfg/sl_go.cfg # Train SL
./scripts/ go cfg/se_go.cfg # Train SE
./scripts/ go cfg/se_infty_go.cfg # Train SE_∞
# For chess
./scripts/ chess cfg/sl_chess.cfg # Train SL
./scripts/ chess cfg/se_chess.cfg # Train SE
./scripts/ chess cfg/se_infty_chess.cfg # Train SE_∞
You will obtain a folder with the following structure:
# The following is an example for training SE_∞ in Go
├── go_19x19_bt_b32_r12_p7_20bx256-2a0d91.cfg # configuration file
├── model/ # model snapshots
│ ├── weight_iter_*.pkl # include training step, parameters, optimizer, sc
│ └── weight_iter_*.pt # model parameters only (use for testing)
└── op.log # the optimization log
You are welcome to adjust training parameters in configuration files.
learner_batch_size=2688 # total training positions per training step; must equal bt_num_batch_size * bt_num_rank_per_batch * bt_num_position_per_rank
learner_learning_rate=0.01 # initial learning rate
nn_num_blocks=20 # number of residual blocks in the model
bt_num_batch_size=32 # number of training batches per training step
bt_num_rank_per_batch=12 # number of ranks chosen in one training batch
bt_num_position_per_rank=7 # number of positions chosen for each rank in one training batch
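Because learner_batch_size must equal the product of the three bt_* values (here 32 * 12 * 7 = 2688), it can be worth checking a configuration before training. The following Python sketch shows one way to do so. The helpers `parse_config` and `batch_size_is_consistent` are hypothetical, and the parser assumes plain `key=value` lines with optional trailing `#` comments, as in the excerpt above.

```python
CONFIG_TEXT = """
learner_batch_size=2688 # total training positions per step
bt_num_batch_size=32
bt_num_rank_per_batch=12
bt_num_position_per_rank=7
"""


def parse_config(text: str) -> dict:
    """Parse key=value lines, ignoring blank lines and trailing # comments."""
    values = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        key, value = line.split("=", 1)
        values[key.strip()] = value.strip()
    return values


def batch_size_is_consistent(cfg: dict) -> bool:
    """Check learner_batch_size against the product of the three bt_* settings."""
    expected = (
        int(cfg["bt_num_batch_size"])
        * int(cfg["bt_num_rank_per_batch"])
        * int(cfg["bt_num_position_per_rank"])
    )
    return int(cfg["learner_batch_size"]) == expected


cfg = parse_config(CONFIG_TEXT)
print(batch_size_is_consistent(cfg))
```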
To reproduce the experiments in Section 4.4, adjust the training_sgf_go/ directory to contain only game records of 1D and 9D for the 2-rank dataset, or 1D, 5D, and 9D for the 3-rank dataset. Then, modify the following configurations in cfg/se_infty_go.cfg and train with the same training command:
# For 2-rank dataset
# For 3-rank dataset
# training command
./scripts/ go cfg/se_infty_go.cfg
To reproduce the paper result, collect at least 100 games of each rank for candidate, i.e. total 100
# create two directories for candidate and query dataset
mkdir candidate_sgf_go
mkdir query_sgf_go
# arrange .sgf files in the same structure as for training:
mv [your_go_game_candidate_dataset.sgf] candidate_sgf_go/
mv [your_go_game_query_dataset.sgf] query_sgf_go/
As with the training data, each game record must be on one line in Smart Game Format and contain the tags BR and WR in English to indicate the ranks of both players (e.g., if you download the game records from FoxWeiqi, you first need to convert "級" to "k" and "段" to "d"). To reproduce the main experiment in the paper, you should prepare game records for 11 ranks (3-5k, 1-2k, 1d, 2d, ..., 9d) in each directory.
You can skip this step if you prepared the game records with the preprocessing script described above.
Run the following commands:
# For Go (in subsection 4.2)
./build/go/strength_go -conf_file cfg/se_go.cfg -mode evaluator # SE
./build/go/strength_go -conf_file cfg/se_infty_go.cfg -mode evaluator # SE_∞
./build/go/strength_go -conf_file cfg/sl_sum_go.cfg -mode evaluator # SL_sum
./build/go/strength_go -conf_file cfg/sl_vote_go.cfg -mode evaluator # SL_vote
# For Go (in subsection 4.4)
./build/go/strength_go -conf_file cfg/se_infty_go_2_rank.cfg -mode evaluator # 2 rank
./build/go/strength_go -conf_file cfg/se_infty_go_3_rank.cfg -mode evaluator # 3 rank
# For chess (in subsection 4.5)
./build/chess/strength_chess -conf_file cfg/se_chess.cfg -mode evaluator # SE
./build/chess/strength_chess -conf_file cfg/se_infty_chess.cfg -mode evaluator # SE_∞
./build/chess/strength_chess -conf_file cfg/sl_sum_chess.cfg -mode evaluator # SL_sum
./build/chess/strength_chess -conf_file cfg/sl_vote_chess.cfg -mode evaluator # SL_vote
After running the above commands, the program will output a table like the following:
      -1     0      1      2      3      4      5      6      7      8      9      all
1     0.525  0.365  0.485  0.255  0.22   0.225  0.38   0.385  0.365  0.235  0.66   0.372727
2     0.545  0.46   0.58   0.505  0.365  0.37   0.34   0.415  0.445  0.39   0.69   0.464091
100   0.915  1      1      1      0.995  0.965  1      0.995  1      0.945  1      0.983182
- Row: Each row represents the results obtained when querying a specific number of games.
- Column:
  - -1 to 9: The evaluation results for different ranks (-1 corresponds to $r_{11}$, 0 to $r_{10}$, ..., 9 to $r_1$).
  - all: The average evaluation score across all ranks.
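In the sample rows shown above, the all column matches the unweighted mean of the 11 per-rank columns. The Python sketch below checks this for the first sample row. The helper `split_row` is hypothetical and assumes whitespace-separated fields, with the number of queried games first and the all value last.

```python
ROW_1 = "1 0.525 0.365 0.485 0.255 0.22 0.225 0.38 0.385 0.365 0.235 0.66 0.372727"


def split_row(line: str):
    """Split a result row into (games queried, per-rank scores, reported average)."""
    fields = line.split()
    games = int(fields[0])
    per_rank = [float(x) for x in fields[1:-1]]
    overall = float(fields[-1])
    return games, per_rank, overall


games, per_rank, overall = split_row(ROW_1)
mean = sum(per_rank) / len(per_rank)

# The reported value is rounded to six decimals, so compare with a tolerance.
matches = abs(mean - overall) < 1e-5
print(games, len(per_rank), matches)
```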
If you want to use a specific model snapshot, set nn_file_name in the configuration file to the model path, e.g., nn_file_name=./go_19x19_bt_b32_r12_p7_20bx256-2a0d91/model/weight_iter_*.pt
To adjust to a desired rank, set candidate_sgf_dir in the configuration file accordingly.
# adjust to 4d in Go
# adjust to 1400-1599 elo rating in chess
To adjust to a desired rank with cfg/sa_go_mcts.cfg, set actor_select_action_softmax_temperature in that configuration file.
# adjust to 4d in Go
To evaluate the move accuracy for a specific rank, adjust the testing_sgf_dir setting in the configuration file.
# evaluate the move accuracy of 4d in Go
# evaluate the move accuracy of elo 1400-1599 in chess
Commands for reproducing experiments:
# For Go (in subsection 4.3)
./build/go/strength_go -conf_file cfg/se_go_mcts.cfg -mode mcts_acc # SE-MCTS
./build/go/strength_go -conf_file cfg/se_infty_go_mcts.cfg -mode mcts_acc # SE∞-MCTS
./build/go/strength_go -conf_file cfg/sa_go_mcts.cfg -mode mcts_acc # SA-MCTS
# For chess (in subsection 4.5)
./build/chess/strength_chess -conf_file cfg/se_chess_mcts.cfg -mode mcts_acc # SE-MCTS
./build/chess/strength_chess -conf_file cfg/se_infty_chess_mcts.cfg -mode mcts_acc # SE∞-MCTS
./build/chess/strength_chess -conf_file cfg/sa_chess_mcts.cfg -mode mcts_acc # SA-MCTS
The program will output accuracy results for different simulation counts and various $z$ values:
simulation: 1, mcts accuracy: 4% (2/50), ssa_-2 accuracy: 4% (2/50), ssa_-1 accuracy: 4% (2/50), ...
simulation: 2, mcts accuracy: 4% (2/50), ssa_-2 accuracy: 18% (9/50), ssa_-1 accuracy: 32% (16/50), ...
simulation: 3, mcts accuracy: 4% (2/50), ssa_-2 accuracy: 24% (12/50), ssa_-1 accuracy: 18% (9/50), ...
- Simulation: The number of simulations used in MCTS.
- MCTS Accuracy: The accuracy of the search-based result without applying SA-MCTS. The accuracy will be $\texttt{SE-MCTS}$ accuracy when using the $\texttt{SE}$ model, and it will be MCTS accuracy when using SA-MCTS.
- SSA Accuracy: The accuracy when using SA-MCTS with different $z$ values.
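For further analysis, the output lines can be converted into structured data. The following Python sketch parses lines in the format shown above. The function `parse_accuracy_line` is hypothetical, and the regular expression assumes each entry looks exactly like `name accuracy: N% (hits/total)`.

```python
import re

# One entry looks like "ssa_-2 accuracy: 18% (9/50)".
ENTRY = re.compile(r"(\S+) accuracy: [\d.]+% \((\d+)/(\d+)\)")
SIMULATION = re.compile(r"simulation: (\d+)")


def parse_accuracy_line(line: str):
    """Return (simulation count, {name: (hits, total)}) for one output line."""
    sim = SIMULATION.match(line)
    if sim is None:
        raise ValueError(f"not an accuracy line: {line!r}")
    results = {
        name: (int(hits), int(total)) for name, hits, total in ENTRY.findall(line)
    }
    return int(sim.group(1)), results


line = (
    "simulation: 2, mcts accuracy: 4% (2/50), "
    "ssa_-2 accuracy: 18% (9/50), ssa_-1 accuracy: 32% (16/50), ..."
)
count, results = parse_accuracy_line(line)
print(count, results)
```

Keeping the raw hit and total counts, rather than the printed percentage, avoids rounding error when results are aggregated across runs.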