---
title: "Quality checks"
---
Because we are extracting this data from PDFs, I used this notebook to troubleshoot and make sure everything came through correctly.
## Fixed issues
- In the [Cleaning roster](01-cleaning-roster.qmd) notebook, the `roster_type` value "SUPPLEMENTAL SPOT 31" was changed to just "SUPPLEMENTAL SPOT" for consistency. The roster has the "31" in all cases, but it seems odd to keep here. I can change it later if needed.
- In `others` there were initially some missing players. I ended up piecing everything together in [Cleaning other](01-cleaning-other.qmd).
- Also in `others`, I had to rework how players were assigned different types and notes, because in the original list a player can be listed more than once with different designations. I had to collapse those into one row per player.

There aren't any known issues right now, but I'm keeping this notebook around.
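The collapsing of repeated players mentioned in the list above was done in the cleaning notebook, not here. As a rough illustration only, it might look something like the sketch below. The rows are made up, the `note` column and the `_toy` object names are assumptions, and the real cleaning code may work differently.

```{r}
library(tidyverse)

# Toy stand-in for the raw "others" list: one player appears twice
# with different designations (all rows here are made up).
others_raw_toy <- tribble(
  ~club_short, ~name,      ~type_u22, ~type_inj, ~note,
  "ATX",       "Player A", TRUE,      FALSE,     "note one",
  "ATX",       "Player A", FALSE,     TRUE,      "note two",
  "CLB",       "Player B", FALSE,     TRUE,      NA
)

# Collapse to one row per player: a flag is TRUE if any row had it,
# and the non-missing notes are pasted together.
others_collapsed_toy <- others_raw_toy |>
  group_by(club_short, name) |>
  summarise(
    type_u22 = any(type_u22),
    type_inj = any(type_inj),
    note = paste(na.omit(note), collapse = "; "),
    .groups = "drop"
  )

others_collapsed_toy
```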
## Setup
```{r}
#| label: setup
#| message: false
#| warning: false
library(tidyverse)
library(janitor)
```
## Import the rosters
```{r}
rosters <- read_rds("data-processed/rosters.rds")
rosters |> glimpse()
```
Do I have all the teams?
```{r}
rosters |>
  count(club_short)
```
There are [29 teams](https://en.wikipedia.org/wiki/Major_League_Soccer) in MLS as of May 2, 2024, so the count above should have 29 rows.
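Rather than counting rows by eye, the team count could be asserted in code. This is a minimal sketch on toy data; `check_team_count()` and `rosters_toy` are hypothetical names, not part of the project.

```{r}
library(tidyverse)

# Hypothetical helper: stop with an error if the number of distinct
# clubs differs from the expected count (29 as of May 2, 2024).
check_team_count <- function(df, expected = 29) {
  n_teams <- n_distinct(df$club_short)
  stopifnot(n_teams == expected)
  invisible(n_teams)
}

# Toy data with three made-up rows across two clubs.
rosters_toy <- tibble(club_short = c("PHI", "PHI", "ATX"))
check_team_count(rosters_toy, expected = 2)
```

With the real data, the call would be `check_team_count(rosters)`, which would halt rendering if a team went missing.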
Let's spot-check a team.
```{r}
rosters |> filter(club_short == "PHI")
```
## Others
We look at the others file here.
```{r}
others <- read_rds("data-processed/others.rds")
others |> glimpse()
```
### Missing teams
At one point we were missing teams. Here we make sure there are 29.
```{r}
others |>
  count(club_short)
```
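If the count comes up short, it helps to know which clubs are missing. One way to find out is to compare the club codes against a reference table such as the rosters. This is a sketch on toy data; `missing_clubs()` and the `_toy` objects are hypothetical.

```{r}
library(tidyverse)

# Hypothetical helper: clubs present in `reference` but absent from `df`.
missing_clubs <- function(df, reference) {
  setdiff(unique(reference$club_short), unique(df$club_short))
}

# Made-up club lists: "CLB" is missing from the second table.
rosters_toy <- tibble(club_short = c("ATX", "CLB", "PHI"))
others_toy <- tibble(club_short = c("ATX", "PHI"))

missing_clubs(others_toy, rosters_toy)
```

With the real data, this would be `missing_clubs(others, rosters)`, and an empty result would mean no club was dropped.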
### Checking season-ending list
I added this check because injured players were missing at one point. Here I confirm they are present.
```{r}
others |>
  filter(type_inj == TRUE)
```
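One caveat: `filter()` drops rows where the condition is `NA`, so a player whose `type_inj` flag is missing would not show up above. A quick tally of every value makes that visible. This sketch uses made-up rows, and I have not checked whether the real column contains any `NA` values.

```{r}
library(tidyverse)

# Toy data: one flag is missing (all rows are made up).
others_toy <- tibble(
  name = c("Player A", "Player B", "Player C"),
  type_inj = c(TRUE, FALSE, NA)
)

# Tally every value, including NA, so missing flags are visible.
inj_tally <- others_toy |> count(type_inj)
inj_tally
```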
## Profiles check
The last check of everything, using the profiles file.
```{r}
profiles <- read_rds("data-processed/profiles.rds")
profiles |> filter(club_short == "CLB")
```
An example for the index page:
```{r}
profiles |>
  filter(club_short == "ATX") |>
  head(2) |>
  glimpse()
```
How many teams are there?
```{r}
profiles |>
  count(club_short)
```
## Stray header?
If you sort by roster designation, you'll see a stray `|` by a U22 designated player at the top. Otherwise, the sort shows DPs first in a list you can then secondarily sort by team, which is muy helpful.
```{r}
# Use TRUE rather than T, since T is an ordinary variable that can be reassigned.
profiles |>
  filter(type_u22 == TRUE, name == "Maximiliano David Ayala")
```
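To hunt for the stray `|` without knowing which column holds it, one option is to scan every character column for a literal pipe. This is a sketch on toy data; the `designation` column name and the rows are assumptions, not taken from the real `profiles` table.

```{r}
library(tidyverse)

# Toy data: one designation carries a stray "|" (rows are made up;
# `designation` is a hypothetical column name).
profiles_toy <- tibble(
  name = c("Player A", "Player B"),
  designation = c("| U22 Initiative", "Designated Player")
)

# Keep rows where any character column contains a literal pipe.
# fixed("|") matches the character itself instead of regex alternation.
stray_rows <- profiles_toy |>
  filter(if_any(where(is.character), \(x) str_detect(x, fixed("|"))))

stray_rows
```

Swapping `profiles_toy` for `profiles` should list any affected rows in the real data.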