diff --git a/NL_eredivisie_2014_2019.R b/NL_eredivisie_2014_2019.R
index 8f7f6d1..3d49084 100644
--- a/NL_eredivisie_2014_2019.R
+++ b/NL_eredivisie_2014_2019.R
@@ -17,7 +17,7 @@ col_name <- function(name, ...) {
   paste0(name, "[", paste(..., sep=",") , "]")
 }

-## ----raw_data, cache = 2, cache.extra = file.info(list.files(path = "test", pattern = ".csv", full.names = TRUE))----
+## ----raw_data, cache = 2, dependson = "scan_data_dir"--------------------
 from_year <- 2014
 to_year <- 2019
 source(paste0("functions/Import_Data_Eredivisie.R"))
diff --git a/NL_eredivisie_2014_2019.html b/NL_eredivisie_2014_2019.html
index 1e8bd18..38dff0d 100644
--- a/NL_eredivisie_2014_2019.html
+++ b/NL_eredivisie_2014_2019.html
@@ -180,7 +180,7 @@

Bayesian estimation of a Poisson model for Dutch football matches odds

Piet Stam

-

July 14th, 2019

+

July 20th, 2019

@@ -342,18 +342,18 @@

Model estimation

ms3 <- as.matrix(s3)

The following graphs show the trace plots and probability distributions of the team mean, team sigma and season sigma parameters, respectively.

plot(s3[, "group_skill"])
-

+

plot(s3[, "group_sigma"])
-

+

plot(s3[, "season_sigma"])
-

+

We can also calculate the default home advantage by looking at the difference exp(home_baseline) - exp(away_baseline). The next graph shows that there is a home advantage of more than 0.4 goals, on average, and it differs significantly from zero.

plotPost(exp(ms3[,col_name("home_baseline",to_year-from_year)]) - exp(ms3[,col_name("away_baseline",to_year-from_year)]), compVal = 0, xlab = "Home advantage in number of goals")
-

+

##                                        mean    median      mode hdiMass
-## Home advantage in number of goals 0.4288232 0.4248015 0.4162575    0.95
+## Home advantage in number of goals 0.4283805 0.4249969 0.4252382    0.95
 ##                                      hdiLow   hdiHigh compVal pcGTcompVal
-## Home advantage in number of goals 0.2880692 0.6003385       0           1
+## Home advantage in number of goals 0.2777361 0.5844823       0           1
 ##                                   ROPElow ROPEhigh pcInROPE
 ## Home advantage in number of goals      NA       NA       NA

Return to table of contents

@@ -408,24 +408,24 @@

Model validation

hist(m3_pred[ , "mode_home_goal"], breaks= (-1:max(m3_pred[ , "mode_home_goal"])) + 0.5, xlim=c(-0.5, 10),
     main = "Distribution of predicted most \nprobable score by a home team in\na match",
     xlab = "")
-

+

For almost all games the single most likely number of goals is one. Actually, if you know nothing about an Eredivisie game, betting on one goal for the home team is the best bet 78% of the time.

Let's instead look at the distribution of the predicted mean number of home goals in each game.

hist(m3_pred[ , "mean_home_goal"], breaks= (-1:max(m3_pred[ , "mean_home_goal"])) + 0.5, xlim=c(-0.5, 10),
     main = "Distribution of predicted mean \n score by a home team in a match",
     xlab = "")
-

+

For most games the expected number of goals is 2. That is, even if your safest bet is one goal you would expect to see around two goals.

The distributions of the mode and the mean number of goals don’t look remotely like the actual number of goals. Such a resemblance was not to be expected; we would, however, expect the distribution of randomized goals (where for each match the number of goals has been randomly drawn from that match’s predicted home goal distribution) to look similar to the actual number of home goals. Looking at the histogram below, this seems to be the case.
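
As a rough sketch of how these three summaries relate, assume we have posterior draws of the home scoring rate for a single match. The `lambda_draws` below are made up for illustration and are not taken from the fitted model; the scalar names only mirror the column names of `m3_pred`. Drawing one Poisson count per posterior draw gives that match's predicted home goal distribution, from which the mode, the mean and a single random draw follow.

```r
set.seed(1)

# Hypothetical posterior draws of the home scoring rate for one match
# (an assumption for illustration; the real draws come from the MCMC sample).
lambda_draws <- rgamma(5000, shape = 40, rate = 25)

# One simulated goal count per posterior draw: the predicted goal distribution.
goal_draws <- rpois(length(lambda_draws), lambda_draws)

# Most probable score, expected score, and one randomly drawn score.
goal_table <- table(goal_draws)
mode_home_goal <- as.numeric(names(goal_table)[which.max(goal_table)])
mean_home_goal <- mean(goal_draws)
rand_home_goal <- sample(goal_draws, 1)
```

Only the random draw carries the full spread of the predicted distribution, which is why its histogram over all matches can resemble the actual goal counts while the histograms of the mode and the mean cannot.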

hist(m3_pred[ , "rand_home_goal"], breaks= (-1:max(m3_pred[ , "rand_home_goal"])) + 0.5, xlim=c(-0.5, 10),
     main = "Distribution of randomly drawn \n score by a home team in a match",
     xlab = "")
-

+

We can also look at how well the model predicts the data. This should probably be done using cross validation, but as the number of effective parameters is much smaller than the number of data points, a direct comparison should at least give an estimated prediction accuracy in the right ballpark.

mean(eredivisie$HomeGoals == m3_pred[ , "mode_home_goal"], na.rm=T)
-
## [1] 0.3150232
+
## [1] 0.3141534
mean((eredivisie$HomeGoals - m3_pred[ , "mean_home_goal"])^2, na.rm=T)
-
## [1] 1.509597
+
## [1] 1.509738

So on average the model predicts the correct number of home goals 31% of the time and guesses the average number of goals with a mean squared error of 1.51. Now we’ll look at the actual and predicted match outcomes. The graph below shows the match outcomes in the data with 1 being a home win, 0 being a draw and -1 being a win for the away team.

hist(eredivisie$MatchResult, breaks= (-2:1) + 0.5, xlim=c(-1.5, 1.5), ylim=c(0, 1000), main = "Actual match results",
     xlab = "")
@@ -433,14 +433,14 @@

Model validation

Now looking at the most probable outcomes of the matches according to the model.

hist(m3_pred[ , "match_result"], breaks= (-2:1) + 0.5, xlim=c(-1.5, 1.5), ylim=c(0, 1000), main = "Predicted match results",
     xlab = "")
-

+

For almost all matches the safest bet is to bet on the home team. While draws are not uncommon, a draw is never the safest bet.

As in the case with the number of home goals, the randomized match outcomes have a distribution similar to the actual match outcomes:

hist(m3_pred[ , "rand_match_result"], breaks= (-2:1) + 0.5, xlim=c(-1.5, 1.5), ylim=c(0, 1000), main = "Randomized match results",
     xlab = "")
-

+

mean(eredivisie$MatchResult == m3_pred[ , "match_result"], na.rm=T)
-
## [1] 0.563865
+
## [1] 0.5661376

The model predicts the correct match outcome (i.e. home team wins / a draw / away team wins) 57% of the time. Pretty good!

Return to table of contents

@@ -458,15 +458,15 @@

The ranking of the teams

team_skill <- team_skill[,order(colMeans(team_skill), decreasing=T)]
old_par <- par(mar=c(2,0.7,0.7,0.7), xaxs='i')
caterplot(team_skill, labels.loc="above", val.lim=c(0.7, 3.8))
-

+

par(old_par)

Two teams are clearly ahead of the rest: Ajax and PSV. Let's look at the credible difference between these two teams. Ajax is a better team than PSV with a probability of 74%, i.e. the odds in favor of Ajax are 74% / 26% ≈ 2.8, or roughly 3 to 1. So, on average, PSV only wins about one out of four games that they play against Ajax.
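
A minimal sketch of the probability-to-odds conversion used above. The helper `prob_to_odds` and the four example draws are hypothetical and only illustrate the arithmetic; in the analysis itself the probability is the share of posterior draws of the skill difference that lie above zero.

```r
# Convert a probability into odds in favor.
prob_to_odds <- function(p) p / (1 - p)

# Probability quoted in the text above.
p_ajax <- 0.74
odds_ajax <- prob_to_odds(p_ajax)   # about 2.8, i.e. close to 3 to 1

# With posterior draws, the probability is the share of draws above zero.
# These four values are made up purely to illustrate the calculation.
skill_diff <- c(-0.2, 0.1, 0.3, 0.5)
p_better <- mean(skill_diff > 0)
odds_better <- prob_to_odds(p_better)
```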

plotPost(team_skill[, "Ajax"] - team_skill[, "PSV Eindhoven"], compVal = 0, xlab = "<- PSV     vs     Ajax ->")
-

-
##                                mean    median      mode hdiMass     hdiLow
-## <- PSV     vs     Ajax -> 0.1616095 0.1535391 0.1467396    0.95 -0.3586123
+

+
##                                mean    median     mode hdiMass     hdiLow
+## <- PSV     vs     Ajax -> 0.1637752 0.1529581 0.139265    0.95 -0.3411534
 ##                             hdiHigh compVal pcGTcompVal ROPElow ROPEhigh
-## <- PSV     vs     Ajax -> 0.6842824       0      0.7312      NA       NA
+## <- PSV     vs     Ajax -> 0.6727083       0   0.7360667      NA       NA
 ##                           pcInROPE
 ## <- PSV     vs     Ajax ->       NA

Return to table of contents

@@ -487,7 +487,7 @@

Predicting the future

rownames(eredivisie_forecast) <- NULL
print(xtable(eredivisie_forecast, align="cccccccccc"), type="html")
- +
@@ -569,10 +569,10 @@

Predicting the future

PSV Eindhoven
-1.30 +1.20 -1.90 +1.80 1.00 @@ -674,7 +674,7 @@

Predicting the future

2.00
-0.00 +1.00 Heerenveen @@ -866,7 +866,7 @@

Predicting the future

0.00
-2.00 +3.00 Ajax @@ -895,7 +895,7 @@

Predicting the future

1.20
-2.00 +1.00 1.00 @@ -924,7 +924,7 @@

Predicting the future

1.40
-1.70 +1.60 1.00 @@ -959,7 +959,7 @@

Predicting the future

0.70
-2.00 +3.00 0.00 @@ -1052,10 +1052,10 @@

Predicting the future

1.00
-2.30 +2.20 -1.00 +0.00 2.00 @@ -1084,7 +1084,7 @@

Predicting the future

1.30
-1.70 +1.80 1.00 @@ -1107,7 +1107,7 @@

Predicting the future

rownames(eredivisie_sim) <- NULL print(xtable(eredivisie_sim, align="cccccccc"), type="html") - + @@ -1177,13 +1177,13 @@

Predicting the future

PSV Eindhoven @@ -1203,7 +1203,7 @@

Predicting the future

For Sittard @@ -1281,10 +1281,10 @@

Predicting the future

Graafschap @@ -1333,10 +1333,10 @@

Predicting the future

VVV Venlo @@ -1385,10 +1385,10 @@

Predicting the future

Willem II @@ -1437,7 +1437,7 @@

Predicting the future

Heerenveen @@ -1489,13 +1489,13 @@

Predicting the future

Heracles @@ -1515,10 +1515,10 @@

Predicting the future

Groningen @@ -1593,7 +1593,7 @@

Predicting the future

Vitesse
@@ -1151,13 +1151,13 @@

Predicting the future

Utrecht
-1.00 +2.00 1.00 -Draw +Ajax
-0.00 +2.00 -2.00 +1.00 -PSV Eindhoven +AZ Alkmaar
-3.00 +2.00 1.00 @@ -1229,10 +1229,10 @@

Predicting the future

Den Haag
-4.00 +3.00 -1.00 +2.00 Feyenoord @@ -1258,10 +1258,10 @@

Predicting the future

3.00
-2.00 +4.00 -Heerenveen +NAC Breda
-4.00 +3.00 -3.00 +1.00 Vitesse @@ -1307,13 +1307,13 @@

Predicting the future

FC Emmen
-2.00 +4.00 2.00 -Draw +Willem II
-4.00 +5.00 -2.00 +0.00 Zwolle @@ -1359,13 +1359,13 @@

Predicting the future

Excelsior
-4.00 +1.00 -0.00 +1.00 -Heracles +Draw
-1.00 +0.00 -1.00 +0.00 Draw @@ -1411,13 +1411,13 @@

Predicting the future

Ajax
-0.00 +3.00 -1.00 +2.00 -Ajax +Graafschap
-5.00 +3.00 1.00 @@ -1463,13 +1463,13 @@

Predicting the future

Zwolle
-2.00 +0.00 -1.00 +3.00 -NAC Breda +Zwolle
-2.00 +1.00 -2.00 +0.00 -Draw +PSV Eindhoven
-0.00 +1.00 -2.00 +3.00 Groningen @@ -1567,13 +1567,13 @@

Predicting the future

Feyenoord
-0.00 +2.00 -3.00 +1.00 -Feyenoord +For Sittard
-3.00 +1.00 0.00 @@ -1637,7 +1637,7 @@

Betting on the match outcome

The fourth graph shows the probability distribution of a PSV win (‘away_win’), a draw (‘equal’) and an AZ win (‘home_win’). This graph underlines that a PSV win is a likely scenario: it has a probability of more than 50%. That the balance tips in favor of PSV is then probably due to the one-goal difference, which the third graph shows to be highly probable. Note, however, that the probability that PSV will not turn out as the match winner (i.e. a draw or a loss) is still almost 50%.

old_par <- par(mfrow = c(2, 2), mar=rep(2.2, 4))
 plot_goals(home_goals, away_goals)
-

+

par(old_par)

On May 10th, just before the start of competition round 33, the betting site William Hill offered the following payouts (that is, how much I would get back if my bet was successful) for betting on the outcome of this game, after 288 bets had been placed

@@ -1659,7 +1659,7 @@

Betting on the match outcome

Using my simulated distribution of the number of goals I can calculate the predicted payouts of the model. It appears that the payouts of the model are very close to the payouts that William Hill offers.

1 / c(AZ =  mean(home_goals > away_goals), Draw = mean(home_goals == away_goals), PSV = mean(home_goals < away_goals))
##       AZ     Draw      PSV 
-## 3.928759 4.332756 1.943005
+## 3.839263 4.393673 1.953379

The most likely result is 1 - 1 with a predicted payout of 9.70, which can be compared to the William Hill payout of 7.50 for this bet. Thus, William Hill thinks that a 1 - 1 draw is even likelier than our model predicts. If we want to earn some extra money, we should bet on a 1 - 0 win for AZ, as the William Hill payout is 19 and our model predicts 17.50.
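
The comparison above can be sketched as follows, assuming the model's probabilities are correct. The helpers `implied_payout` and `expected_return` are hypothetical names introduced here for illustration; the payouts are the ones quoted in the text.

```r
# Fair decimal payout implied by a probability: a bet that wins with
# probability p breaks even at a payout of 1 / p per unit staked.
implied_payout <- function(p) 1 / p

# Expected return per unit staked when the bookmaker pays `bookie_payout`
# and the model's fair payout is `model_payout`.
expected_return <- function(model_payout, bookie_payout) {
  p_model <- 1 / model_payout
  p_model * bookie_payout
}

# Payouts quoted in the text above.
draw_1_1 <- expected_return(model_payout = 9.70, bookie_payout = 7.50)
az_1_0   <- expected_return(model_payout = 17.50, bookie_payout = 19)
c(draw_1_1 = draw_1_1, az_1_0 = az_1_0)
```

An expected return above 1 means the model rates the bet as favorable, below 1 as unfavorable. This holds only to the extent that the model's probabilities are closer to the truth than the bookmaker's.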

Return to table of contents

@@ -1732,7 +1732,7 @@

Betting on the correct score

goals_payout <- round(goals_payout, 1)
print(xtable(goals_payout, align="cccccccc"), type="html")
- +
@@ -1790,25 +1790,25 @@

Betting on the correct score

AZ Alkmaar - 1 @@ -1816,25 +1816,25 @@

Betting on the correct score

AZ Alkmaar - 2 @@ -1842,22 +1842,22 @@

Betting on the correct score

AZ Alkmaar - 3 @@ -1894,25 +1894,25 @@

Betting on the correct score

AZ Alkmaar - 5 @@ -1920,25 +1920,25 @@

Betting on the correct score

AZ Alkmaar - 6
@@ -1764,25 +1764,25 @@

Betting on the correct score

AZ Alkmaar - 0
-20.70 +21.00 -12.20 +11.80 -13.00 +13.20 -21.20 +19.80 -45.20 +43.60 -120.00 +122.00 -306.10 +468.80
-18.30 +17.30 -9.50 +9.60 -10.40 +10.50 -16.90 +16.70 -36.30 +39.50 -94.30 +116.30 -238.10 +288.50
-29.60 +27.00 -15.90 +15.60 -16.30 +17.10 -27.10 +29.10 -54.50 +54.30 -168.50 +147.10 -625.00 +517.20
-67.00 +71.10 -36.10 +36.00 -38.80 +40.10 -70.80 +69.80 -153.10 +170.50 -441.20 +384.60 1363.60 @@ -1868,25 +1868,25 @@

Betting on the correct score

AZ Alkmaar - 4
-208.30 +220.60 112.80 -135.10 +124.00 214.30 -454.50 +468.80 1363.60 -3750.00 +3000.00
-937.50 +1153.80 -428.60 +483.90 -625.00 +394.70 -714.30 +882.40 -1875.00 +1500.00 -7500.00 +3000.00 -15000.00 +7500.00
-3750.00 +3000.00 -2142.90 +3750.00 -7500.00 +2142.90 -3000.00 +5000.00 -7500.00 +Inf -Inf +15000.00 -Inf +15000.00
diff --git a/results/eredivisie.RData b/results/eredivisie.RData
deleted file mode 100644
index a446bc1..0000000
Binary files a/results/eredivisie.RData and /dev/null differ
diff --git a/results/hist_pred_match_result-1.png b/results/hist_pred_match_result-1.png
index b11d727..975f283 100644
Binary files a/results/hist_pred_match_result-1.png and b/results/hist_pred_match_result-1.png differ
diff --git a/results/hist_rand_match_result-1.png b/results/hist_rand_match_result-1.png
index 9579eb5..e518bf5 100644
Binary files a/results/hist_rand_match_result-1.png and b/results/hist_rand_match_result-1.png differ
diff --git a/results/mean_home_goal-1.png b/results/mean_home_goal-1.png
index e50bcc1..37a63f0 100644
Binary files a/results/mean_home_goal-1.png and b/results/mean_home_goal-1.png differ
diff --git a/results/mode_home_goal-1.png b/results/mode_home_goal-1.png
index 7dafec0..725e183 100644
Binary files a/results/mode_home_goal-1.png and b/results/mode_home_goal-1.png differ
diff --git a/results/mu_sigma_params-1.png b/results/mu_sigma_params-1.png
index 46e1e75..9a0773c 100644
Binary files a/results/mu_sigma_params-1.png and b/results/mu_sigma_params-1.png differ
diff --git a/results/mu_sigma_params-2.png b/results/mu_sigma_params-2.png
index bd30279..cde9b03 100644
Binary files a/results/mu_sigma_params-2.png and b/results/mu_sigma_params-2.png differ
diff --git a/results/mu_sigma_params-3.png b/results/mu_sigma_params-3.png
index 1b0f03a..e26e070 100644
Binary files a/results/mu_sigma_params-3.png and b/results/mu_sigma_params-3.png differ
diff --git a/results/overall_home_advantage-1.png b/results/overall_home_advantage-1.png
index 9b243f5..2d7a049 100644
Binary files a/results/overall_home_advantage-1.png and b/results/overall_home_advantage-1.png differ
diff --git a/results/plot_goals-1.png b/results/plot_goals-1.png
index 23c4d93..bb12ec5 100644
Binary files a/results/plot_goals-1.png and b/results/plot_goals-1.png differ
diff --git a/results/rand_home_goal-1.png b/results/rand_home_goal-1.png
index 6cae891..07cc2c9 100644
Binary files a/results/rand_home_goal-1.png and b/results/rand_home_goal-1.png differ
diff --git a/results/team_skill-1.png b/results/team_skill-1.png
index 6e5269d..f48a14c 100644
Binary files a/results/team_skill-1.png and b/results/team_skill-1.png differ
diff --git a/results/team_skill_PSV_Ajax-1.png b/results/team_skill_PSV_Ajax-1.png
index 41d7f56..967e85e 100644
Binary files a/results/team_skill_PSV_Ajax-1.png and b/results/team_skill_PSV_Ajax-1.png differ