An experiment in building an in-game win probability model for tennis matches. Uses XGBoost.
[Figures: women's model; men's model]
Data is from the Match Charting Project.
Run on the command line:
$ R --no-save < tennis-win-probability.R
Data cleanup tasks to do:
- Records are numbered by point in Pts (approximately 100 per match)
- Set1 and Set2 are the sets won by player 1 and player 2
- Gm1 and Gm2 are the same for games
- The model uses points, games, and sets
- The identity of the player serving the ball is not currently included in the model
- Add estimated points (EPA) for potentially even greater accuracy
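
As a rough illustration of the notes above, here is a minimal Python sketch (not the repository's R script) of how one match's point records could be turned into feature rows and labels for a gradient-boosted classifier such as XGBoost. The field names Pts, Set1, Set2, Gm1, and Gm2 come from the data description; the `PointRecord` class, the `build_rows` helper, and the `player1_won` label are hypothetical names introduced only for this example, and Pts is treated as the point number as described above.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PointRecord:
    """One row of match data (hypothetical container for this sketch)."""
    pts: int   # Pts: point number within the match
    set1: int  # Set1: sets won so far by player 1
    set2: int  # Set2: sets won so far by player 2
    gm1: int   # Gm1: games won so far by player 1
    gm2: int   # Gm2: games won so far by player 2


def build_rows(records: List[PointRecord],
               player1_won: bool) -> Tuple[List[List[int]], List[int]]:
    """Return (features, labels) for one match.

    Every point in the match gets the same label: 1 if player 1 went on
    to win the match, else 0. The server's identity is deliberately left
    out, matching the note that it is not currently in the model.
    """
    label = 1 if player1_won else 0
    features = [[r.pts, r.set1, r.set2, r.gm1, r.gm2] for r in records]
    labels = [label] * len(records)
    return features, labels


if __name__ == "__main__":
    match = [PointRecord(1, 0, 0, 0, 0), PointRecord(2, 0, 0, 1, 0)]
    X, y = build_rows(match, player1_won=True)
    print(X, y)
```

Rows built this way could then be passed to any binary classifier; adding a server indicator would amount to one more column in each feature row.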