Available algorithms

(no marker): thoroughly tested. In many cases, we verified the implementation against known values and/or reproduced results from papers.

~: implemented but lightly tested.

X: known problems; please see the GitHub issues.

Algorithms	Category	Reference	Status
Information Set Monte Carlo Tree Search (IS-MCTS)	Search	Cowling et al. '12	~
Minimax (and Alpha-Beta) Search	Search	Wikipedia1, Wikipedia2, Knuth and Moore '75
Monte Carlo Tree Search	Search	Wikipedia, UCT paper, Coulom '06, Cowling et al. survey
Lemke-Howson (via `nashpy`)	Opt.	Wikipedia, Shoham & Leyton-Brown '09
ADIDAS	Opt.	Gemp et al '22	~
Sequence-form linear programming	Opt.	Koller, Megiddo, and von Stengel '94, Shoham & Leyton-Brown '09
Counterfactual Regret Minimization (CFR)	Tabular	Zinkevich et al '08, Neller & Lanctot '13
CFR against a best responder (CFR-BR)	Tabular	Johanson et al '12
Exploitability / Best response	Tabular	Shoham & Leyton-Brown '09
External sampling Monte Carlo CFR	Tabular	Lanctot et al. '09, Lanctot '13
Fixed Strategy Iteration CFR (FSICFR)	Tabular	Neller & Hnath '11	~
Mean-field Fictitious Play for MFG	Tabular	Perrin et al. '20	~
Online Mirror Descent for MFG	Tabular	Perolat et al. '21	~
Outcome sampling Monte Carlo CFR	Tabular	Lanctot et al. '09, Lanctot '13
Q-learning	Tabular	Sutton & Barto '18
SARSA	Tabular	Sutton & Barto '18
Policy Iteration	Tabular	Sutton & Barto '18
Restricted Nash Response (RNR)	Tabular	Johanson et al '08	~
Value Iteration	Tabular	Sutton & Barto '18
Advantage Actor-Critic (A2C)	RL	Mnih et al. '16
Deep Q-networks (DQN)	RL	Mnih et al. '15
Ephemeral Value Adjustments (EVA)	RL	Hansen et al. '18	~
AlphaZero (C++/LibTorch)	MARL	Silver et al. '18
AlphaZero (Python/TF)	MARL	Silver et al. '18
Deep CFR	MARL	Brown et al. '18
Exploitability Descent (ED)	MARL	Lockhart et al. '19
(Extensive-form) Fictitious Play (XFP)	MARL	Heinrich, Lanctot, & Silver '15
Neural Fictitious Self-Play (NFSP)	MARL	Heinrich & Silver '16
Neural Replicator Dynamics (NeuRD)	MARL	Omidshafiei, Hennes, Morrill, et al. '19	X
Regret Policy Gradients (RPG, RMPG)	MARL	Srinivasan, Lanctot, et al. '18
Policy-Space Response Oracles (PSRO)	MARL	Lanctot et al. '17
Q-based ("all-actions") Policy Gradient (QPG)	MARL	Srinivasan, Lanctot, et al. '18
Regression CFR (RCFR)	MARL	Waugh et al. '15, Morrill '16
Rectified Nash Response (PSRO_rn)	MARL	Balduzzi et al. '19	~
α-Rank	Eval. / Viz.	Omidshafiei et al. '19, arXiv
Replicator / Evolutionary Dynamics	Eval. / Viz.	Hofbauer & Sigmund '98, Sandholm '10
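
To illustrate what checking an algorithm against a known value can look like, here is a minimal, self-contained sketch. It is not taken from this library: the function names (`regret_matching_strategy`, `nash_conv`, `solve`) and the 2x2 payoff matrix are assumptions made for the example. It runs regret matching (the update rule at the core of CFR) in self-play on a small zero-sum matrix game, then measures how exploitable the resulting average strategies are.

```python
# Hypothetical 2x2 zero-sum game; entries are the row player's payoffs.
PAYOFFS = [[2.0, -1.0],
           [-1.0, 1.0]]


def regret_matching_strategy(regrets):
    """Play in proportion to positive regret; fall back to uniform."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1.0 / len(regrets)] * len(regrets)


def action_values(payoffs, row, col):
    """Expected payoff of each pure action, from each player's own view."""
    row_values = [sum(a * q for a, q in zip(r, col)) for r in payoffs]
    col_values = [-sum(payoffs[i][j] * row[i] for i in range(len(row)))
                  for j in range(len(col))]
    return row_values, col_values


def nash_conv(payoffs, row, col):
    """Sum of both players' best-response gains (0 at an equilibrium)."""
    row_values, col_values = action_values(payoffs, row, col)
    # In a zero-sum game the current-value terms cancel, leaving the
    # two best-response values.
    return max(row_values) + max(col_values)


def solve(payoffs, iterations):
    """Regret matching in self-play; returns the average strategies."""
    n_rows, n_cols = len(payoffs), len(payoffs[0])
    row_regret, col_regret = [0.0] * n_rows, [0.0] * n_cols
    row_sum, col_sum = [0.0] * n_rows, [0.0] * n_cols
    for _ in range(iterations):
        row = regret_matching_strategy(row_regret)
        col = regret_matching_strategy(col_regret)
        row_values, col_values = action_values(payoffs, row, col)
        row_ev = sum(p * v for p, v in zip(row, row_values))
        col_ev = sum(q * v for q, v in zip(col, col_values))
        for i in range(n_rows):
            row_regret[i] += row_values[i] - row_ev  # regret for not playing i
            row_sum[i] += row[i]
        for j in range(n_cols):
            col_regret[j] += col_values[j] - col_ev
            col_sum[j] += col[j]
    return ([s / iterations for s in row_sum],
            [s / iterations for s in col_sum])
```

For this particular matrix the equilibrium can be worked out by hand: both players mix (0.4, 0.6) and the game value is 0.2. Calling `nash_conv(PAYOFFS, *solve(PAYOFFS, 10000))` should therefore return a value close to zero, and `nash_conv(PAYOFFS, [0.4, 0.6], [0.4, 0.6])` should be zero up to floating-point error.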
