(blank): thoroughly tested. In many cases, we verified against known values and/or reproduced results from papers.
~: implemented but lightly tested.
X: known problems; please see GitHub issues.
Algorithms | Category | Reference | Status |
---|---|---|---|
Information Set Monte Carlo Tree Search (IS-MCTS) | Search | Cowling et al. '12 | ~ |
Minimax (and Alpha-Beta) Search | Search | Wikipedia1, Wikipedia2, Knuth and Moore '75 | |
Monte Carlo Tree Search | Search | Wikipedia, UCT paper, Coulom '06, Cowling et al. survey | |
Lemke-Howson (via nashpy) | Opt. | Wikipedia, Shoham & Leyton-Brown '09 | |
ADIDAS | Opt. | Gemp et al. '22 | ~ |
Sequence-form linear programming | Opt. | Koller, Megiddo, and von Stengel '94, Shoham & Leyton-Brown '09 | |
Counterfactual Regret Minimization (CFR) | Tabular | Zinkevich et al. '08, Neller & Lanctot '13 | |
CFR against a best responder (CFR-BR) | Tabular | Johanson et al. '12 | |
Exploitability / Best response | Tabular | Shoham & Leyton-Brown '09 | |
External sampling Monte Carlo CFR | Tabular | Lanctot et al. '09, Lanctot '13 | |
Fixed Strategy Iteration CFR (FSICFR) | Tabular | Neller & Hnath '11 | ~ |
Mean-field Fictitious Play for MFG | Tabular | Perrin et al. '20 | ~ |
Online Mirror Descent for MFG | Tabular | Perolat et al. '21 | ~ |
Outcome sampling Monte Carlo CFR | Tabular | Lanctot et al. '09, Lanctot '13 | |
Q-learning | Tabular | Sutton & Barto '18 | |
SARSA | Tabular | Sutton & Barto '18 | |
Policy Iteration | Tabular | Sutton & Barto '18 | |
Restricted Nash Response (RNR) | Tabular | Johanson et al. '08 | ~ |
Value Iteration | Tabular | Sutton & Barto '18 | |
Advantage Actor-Critic (A2C) | RL | Mnih et al. '16 | |
Deep Q-networks (DQN) | RL | Mnih et al. '15 | |
Ephemeral Value Adjustments (EVA) | RL | Hansen et al. '18 | ~ |
AlphaZero (C++/LibTorch) | MARL | Silver et al. '18 | |
AlphaZero (Python/TF) | MARL | Silver et al. '18 | |
Deep CFR | MARL | Brown et al. '18 | |
Exploitability Descent (ED) | MARL | Lockhart et al. '19 | |
(Extensive-form) Fictitious Play (XFP) | MARL | Heinrich, Lanctot, & Silver '15 | |
Neural Fictitious Self-Play (NFSP) | MARL | Heinrich & Silver '16 | |
Neural Replicator Dynamics (NeuRD) | MARL | Omidshafiei, Hennes, Morrill, et al. '19 | X |
Regret Policy Gradients (RPG, RMPG) | MARL | Srinivasan, Lanctot, et al. '18 | |
Policy-Space Response Oracles (PSRO) | MARL | Lanctot et al. '17 | |
Q-based ("all-actions") Policy Gradient (QPG) | MARL | Srinivasan, Lanctot, et al. '18 | |
Regression CFR (RCFR) | MARL | Waugh et al. '15, Morrill '16 | |
Rectified Nash Response (PSRO_rn) | MARL | Balduzzi et al. '19 | ~ |
α-Rank | Eval. / Viz. | Omidshafiei et al. '19, arXiv | |
Replicator / Evolutionary Dynamics | Eval. / Viz. | Hofbauer & Sigmund '98, Sandholm '10 | |
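
To give a concrete feel for one entry in the table, here is a minimal, self-contained sketch of single-population replicator dynamics on rock-paper-scissors. It is an illustration only and does not use or mirror this library's implementation; the function name `replicator_step`, the payoff matrix `RPS`, and the Euler step size `dt` are assumptions chosen for the example.

```python
def replicator_step(x, payoff, dt=0.01):
    """One explicit Euler step of single-population replicator dynamics.

    x:      list of strategy shares (non-negative, summing to 1).
    payoff: square matrix; payoff[i][j] is the payoff of strategy i vs. j.
    dt:     step size (hypothetical default, chosen for illustration).
    """
    n = len(x)
    # Fitness of each pure strategy against the current population mix.
    fitness = [sum(payoff[i][j] * x[j] for j in range(n)) for i in range(n)]
    # Population-average fitness.
    avg = sum(x[i] * fitness[i] for i in range(n))
    # dx_i/dt = x_i * (f_i - avg), discretized with a forward Euler step.
    new_x = [x[i] + dt * x[i] * (fitness[i] - avg) for i in range(n)]
    # Renormalize to guard against floating-point drift off the simplex.
    total = sum(new_x)
    return [v / total for v in new_x]


# Rock-paper-scissors payoffs for the row strategy (illustrative).
RPS = [
    [0, -1, 1],   # rock
    [1, 0, -1],   # paper
    [-1, 1, 0],   # scissors
]

if __name__ == "__main__":
    x = [0.5, 0.3, 0.2]
    for _ in range(5):
        x = replicator_step(x, RPS)
    print(x)
```

Starting from a rock-heavy population, the share of paper grows on the first step, while the uniform mix is a fixed point of the update.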