diff --git a/NL_eredivisie_2014_2019.R b/NL_eredivisie_2014_2019.R
index 8f7f6d1..3d49084 100644
--- a/NL_eredivisie_2014_2019.R
+++ b/NL_eredivisie_2014_2019.R
@@ -17,7 +17,7 @@ col_name <- function(name, ...) {
 paste0(name, "[", paste(..., sep=",") , "]")
 }
 
-## ----raw_data, cache = 2, cache.extra = file.info(list.files(path = "test", pattern = ".csv", full.names = TRUE))----
+## ----raw_data, cache = 2, dependson = "scan_data_dir"--------------------
 from_year <- 2014
 to_year <- 2019
 source(paste0("functions/Import_Data_Eredivisie.R"))
diff --git a/NL_eredivisie_2014_2019.html b/NL_eredivisie_2014_2019.html
index 1e8bd18..38dff0d 100644
--- a/NL_eredivisie_2014_2019.html
+++ b/NL_eredivisie_2014_2019.html
@@ -180,7 +180,7 @@
The following graphs show the trace plots and probability distributions of the team mean, team sigma and season sigma parameters, respectively.
plot(s3[, "group_skill"])
-
+
plot(s3[, "group_sigma"])
-
+
plot(s3[, "season_sigma"])
-
+
We can also calculate the default home advantage as the difference exp(home_baseline) - exp(away_baseline). The next graph shows that there is a home advantage of more than 0.4 goals, on average, and that it credibly differs from zero: the whole 95% HDI lies above zero.
plotPost(exp(ms3[,col_name("home_baseline",to_year-from_year)]) - exp(ms3[,col_name("away_baseline",to_year-from_year)]), compVal = 0, xlab = "Home advantage in number of goals")
-
+
## mean median mode hdiMass
-## Home advantage in number of goals 0.4288232 0.4248015 0.4162575 0.95
+## Home advantage in number of goals 0.4283805 0.4249969 0.4252382 0.95
## hdiLow hdiHigh compVal pcGTcompVal
-## Home advantage in number of goals 0.2880692 0.6003385 0 1
+## Home advantage in number of goals 0.2777361 0.5844823 0 1
## ROPElow ROPEhigh pcInROPE
## Home advantage in number of goals NA NA NA
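The summary table above comes from plotPost(), which appears to be the helper from Kruschke's DBDA utilities. As a minimal sketch of what its main columns mean, the base-R code below computes the mean, median, a 95% highest density interval (the narrowest interval containing 95% of the draws) and the share of draws above compVal. The helpers hdi_from_draws() and summarise_draws() are hypothetical, and the home and away rate draws are simulated stand-ins, not the model's posterior.

```r
# Hypothetical helpers mimicking the main columns of the plotPost() summary.

# 95% HDI: the narrowest interval spanning cred_mass of the sorted draws
hdi_from_draws <- function(draws, cred_mass = 0.95) {
  sorted <- sort(draws)
  n <- length(sorted)
  n_in <- ceiling(cred_mass * n)        # draws that must lie inside
  lows <- sorted[1:(n - n_in + 1)]      # candidate lower ends
  highs <- sorted[n_in:n]               # matching upper ends
  best <- which.min(highs - lows)       # narrowest candidate
  c(hdiLow = lows[best], hdiHigh = highs[best])
}

summarise_draws <- function(draws, comp_val = 0, cred_mass = 0.95) {
  c(mean = mean(draws),
    median = median(draws),
    hdi_from_draws(draws, cred_mass),
    pcGTcompVal = mean(draws > comp_val))  # share of draws above comp_val
}

# Simulated stand-ins for exp(home_baseline) and exp(away_baseline) draws
set.seed(1)
home_rate <- exp(rnorm(4000, mean = log(1.6), sd = 0.05))
away_rate <- exp(rnorm(4000, mean = log(1.2), sd = 0.05))
home_advantage <- home_rate - away_rate
round(summarise_draws(home_advantage), 3)
```

Taking the difference draw by draw, as here and in the plotPost() call above, keeps the correlation between the two baselines, which a difference of separately summarised means would lose.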
@@ -408,24 +408,24 @@ hist(m3_pred[ , "mode_home_goal"], breaks= (-1:max(m3_pred[ , "mode_home_goal"])) + 0.5, xlim=c(-0.5, 10),
main = "Distribution of predicted most \nprobable score by a home team in\na match",
xlab = "")
-
+
For almost all games the single most likely number of goals is one. In fact, if you know nothing about an Eredivisie game, betting on one goal for the home team is the best bet 78% of the time.
Let's instead look at the distribution of the predicted mean number of home goals in each game.
hist(m3_pred[ , "mean_home_goal"], breaks= (-1:max(m3_pred[ , "mean_home_goal"])) + 0.5, xlim=c(-0.5, 10),
main = "Distribution of predicted mean \n score by a home team in a match",
xlab = "")
-
+
For most games the expected number of goals is about 2. That is, even if your safest bet is one goal, you should expect to see around two goals on average.
The distributions of the mode and the mean number of goals don’t look remotely like the distribution of the actual number of goals. This is to be expected. We would, however, expect the distribution of randomized goals (where, for each match, the number of goals is drawn at random from that match’s predicted home-goal distribution) to look similar to that of the actual number of home goals. The histogram below suggests that this is the case.
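Why can the safest bet be one goal while about two goals are expected? A minimal sketch, assuming a match's home goals follow a Poisson distribution with rate lambda (the exp() of the baselines suggests a log link, but that is an assumption here): the mean of a Poisson is lambda, while its most probable value is floor(lambda). Any rate between 1 and 2 therefore has mode 1, and with the unit-width histogram bins centred on integers used above, a mean of, say, 1.6 lands in the "2" bar. The rates below are hypothetical.

```r
# Hypothetical home scoring rates; home goals assumed Poisson(lambda)
lambdas <- c(1.2, 1.6, 1.9)
goals <- 0:15

# Most probable number of goals: the value with the largest probability mass
mode_goals <- sapply(lambdas, function(l) goals[which.max(dpois(goals, l))])
# Expected number of goals (equals lambda for a Poisson, up to truncation)
mean_goals <- sapply(lambdas, function(l) sum(goals * dpois(goals, l)))

# mean_bin is the integer-centred histogram bar the mean falls into
data.frame(lambda = lambdas, mode = mode_goals,
           mean = round(mean_goals, 2), mean_bin = round(mean_goals))
```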
hist(m3_pred[ , "rand_home_goal"], breaks= (-1:max(m3_pred[ , "rand_home_goal"])) + 0.5, xlim=c(-0.5, 10),
main = "Distribution of randomly drawn \n score by a home team in a match",
xlab = "")
-
+
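A minimal sketch of how such randomized goals can be generated, assuming a Poisson likelihood: for each match, pick one posterior draw of that match's home scoring rate and then draw a goal count from a Poisson with that rate. The matrix lambda_draws is a hypothetical, simulated stand-in for the model's posterior draws; how m3_pred[ , "rand_home_goal"] was actually produced is not shown here.

```r
# lambda_draws: hypothetical posterior draws of each match's home scoring rate
# (rows = MCMC draws, columns = matches), simulated here for illustration
set.seed(42)
n_draws <- 1000
n_matches <- 300
lambda_draws <- matrix(rgamma(n_draws * n_matches, shape = 30, rate = 20),
                       nrow = n_draws, ncol = n_matches)

# For each match: pick one posterior draw of its rate ...
draw_idx <- sample.int(n_draws, n_matches, replace = TRUE)
rate <- lambda_draws[cbind(draw_idx, seq_len(n_matches))]
# ... then draw a goal count from that match's predictive distribution
rand_home_goal <- rpois(n_matches, rate)

# Unlike the per-match modes, the random draws spread over 0, 1, 2, 3, ... goals
table(rand_home_goal)
```

Because each draw includes the Poisson noise of a single match, the randomized counts are what should be compared with the observed counts, not the modes or means.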
We can also look at how well the model predicts the data. This should ideally be done using cross-validation, but as the number of effective parameters is much smaller than the number of data points, a direct in-sample comparison should at least give a prediction accuracy in the right ballpark.
mean(eredivisie$HomeGoals == m3_pred[ , "mode_home_goal"], na.rm=T)
-## [1] 0.3150232
+## [1] 0.3141534
mean((eredivisie$HomeGoals - m3_pred[ , "mean_home_goal"])^2, na.rm=T)
-## [1] 1.509597
+## [1] 1.509738
So on average the model predicts the correct number of home goals 31% of the time, and its predicted mean number of home goals has a mean squared error of 1.51. Now we’ll look at the actual and predicted match outcomes. The graph below shows the match outcomes in the data, with 1 being a home win, 0 a draw and -1 a win for the away team.
hist(eredivisie$MatchResult, breaks= (-2:1) + 0.5, xlim=c(-1.5, 1.5), ylim=c(0, 1000), main = "Actual match results",
xlab = "")
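A minimal sketch of the 1 / 0 / -1 outcome coding, assuming it is the sign of the goal difference; home_goals and away_goals are hypothetical example vectors, and whether eredivisie$MatchResult was built this way is an assumption. The last lines show why the breaks (-2:1) + 0.5 give each outcome its own bar.

```r
# Hypothetical scores; NA stands for a match that has not been played yet
home_goals <- c(2, 1, 0, 3, NA)
away_goals <- c(0, 1, 3, 1, 2)

# 1 = home win, 0 = draw, -1 = away win (NA stays NA)
match_result <- sign(home_goals - away_goals)
match_result  # gives 1, 0, -1, 1, NA

# Breaks -1.5, -0.5, 0.5, 1.5: unit-width bins centred on -1, 0 and 1
(-2:1) + 0.5
table(match_result, useNA = "ifany")
```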
@@ -433,14 +433,14 @@ Now looking at the most probable outcomes of the matches according to the model.
hist(m3_pred[ , "match_result"], breaks= (-2:1) + 0.5, xlim=c(-1.5, 1.5), ylim=c(0, 1000), main = "Predicted match results",
xlab = "")
-
+
For almost all matches the safest bet is on the home team. While draws are not uncommon, they are never the safest bet.
As with the number of home goals, the randomized match outcomes have a distribution similar to that of the actual match outcomes:
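A minimal sketch of why a draw rarely wins out, assuming home and away goals are independent Poisson counts with fixed, hypothetical rates (the model's actual predictions average over posterior draws instead). The draw probability is the diagonal of the joint score table; home and away wins are the two triangles. In this sketch, even with equal rates of 1.3 the draw probability (about 0.26) is below that of either win (about 0.37 each).

```r
# P(away win), P(draw), P(home win) for independent Poisson scores
outcome_probs <- function(lambda_home, lambda_away, max_goals = 20) {
  goals <- 0:max_goals
  # joint[h + 1, a + 1] = P(home scores h, away scores a)
  joint <- outer(dpois(goals, lambda_home), dpois(goals, lambda_away))
  c(away_win = sum(joint[upper.tri(joint)]),  # a > h
    draw     = sum(diag(joint)),              # a == h
    home_win = sum(joint[lower.tri(joint)]))  # h > a
}

p <- outcome_probs(1.6, 1.1)   # hypothetical home and away rates
round(p, 3)
names(which.max(p))            # most probable outcome: "home_win"

p_equal <- outcome_probs(1.3, 1.3)
round(p_equal, 3)
```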
hist(m3_pred[ , "rand_match_result"], breaks= (-2:1) + 0.5, xlim=c(-1.5, 1.5), ylim=c(0, 1000), main = "Randomized match results",
xlab = "")
-
+
mean(eredivisie$MatchResult == m3_pred[ , "match_result"], na.rm=T)
-## [1] 0.563865
+## [1] 0.5661376
The model predicts the correct match outcome (i.e. home team wins / a draw / away team wins) 57% of the time. Pretty good!
@@ -458,15 +458,15 @@
par(old_par)
Two teams are clearly ahead of the rest: Ajax and PSV. Let's look at the credible difference between these two teams. Ajax is a better team than PSV with a probability of 74%, i.e. the odds in favor of Ajax being the stronger team are 74% / 26% ≈ 2.8, or roughly three to one. Note that this is a probability about the teams' skill parameters, not the probability that Ajax wins a given match against PSV.
plotPost(team_skill[, "Ajax"] - team_skill[, "PSV Eindhoven"], compVal = 0, xlab = "<- PSV vs Ajax ->")
-
-## mean median mode hdiMass hdiLow
-## <- PSV vs Ajax -> 0.1616095 0.1535391 0.1467396 0.95 -0.3586123
+
+## mean median mode hdiMass hdiLow
+## <- PSV vs Ajax -> 0.1637752 0.1529581 0.139265 0.95 -0.3411534
## hdiHigh compVal pcGTcompVal ROPElow ROPEhigh
-## <- PSV vs Ajax -> 0.6842824 0 0.7312 NA NA
+## <- PSV vs Ajax -> 0.6727083 0 0.7360667 NA NA
## pcInROPE
## <- PSV vs Ajax -> NA
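A minimal sketch of turning posterior draws of a skill difference into a probability and odds: the probability is the share of draws above zero (the pcGTcompVal column) and the odds are p / (1 - p). The helper prob_and_odds() and its toy input are hypothetical; in the analysis the input would be team_skill[, "Ajax"] - team_skill[, "PSV Eindhoven"].

```r
# Share of draws favouring the first team, and the corresponding odds
prob_and_odds <- function(skill_diff, comp_val = 0) {
  p <- mean(skill_diff > comp_val)
  c(prob = p, odds = p / (1 - p))
}

prob_and_odds(c(-0.2, 0.1, 0.3, 0.5))  # prob 0.75, odds 3

# Odds implied by the reported pcGTcompVal of 0.7360667 (about 2.8)
0.7360667 / (1 - 0.7360667)
```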
@@ -487,7 +487,7 @@ Predicting the future
rownames(eredivisie_forecast) <- NULL
print(xtable(eredivisie_forecast, align="cccccccccc"), type="html")
-
+
@@ -569,10 +569,10 @@ Predicting the future
PSV Eindhoven |
-1.30
+1.20
|
-1.90
+1.80
|
1.00
@@ -674,7 +674,7 @@ Predicting the future
2.00 |
-0.00
+1.00
|
Heerenveen
@@ -866,7 +866,7 @@ Predicting the future
0.00 |
-2.00
+3.00
|
Ajax
@@ -895,7 +895,7 @@ Predicting the future
1.20 |
-2.00
+1.00
|
1.00
@@ -924,7 +924,7 @@ Predicting the future
1.40 |
-1.70
+1.60
|
1.00
@@ -959,7 +959,7 @@ Predicting the future
0.70 |
-2.00
+3.00
|
0.00
@@ -1052,10 +1052,10 @@ Predicting the future
1.00 |
-2.30
+2.20
|
-1.00
+0.00
|
2.00
@@ -1084,7 +1084,7 @@ Predicting the future
1.30 |
-1.70
+1.80
|
1.00
@@ -1107,7 +1107,7 @@ Predicting the future
rownames(eredivisie_sim) <- NULL
print(xtable(eredivisie_sim, align="cccccccc"), type="html")
-
+