This data comes from the StatsBomb free and open data. It covers football competitions, matches, players, and events, and it is stored in a highly nested JSON format, which makes it a great dataset for practicing data engineering skills.
data/
├── competitions.json
│
├── matches/
│   └── <competition_id>/
│       └── <season_id>.json
│
├── lineups/
│   └── <match_id>.json
│
└── events/
    └── <match_id>.json
- competitions.json
  - This file contains basic data about the competitions and their seasons.
- matches/<competition_id>/<season_id>.json
  - This file contains basic data about all the matches in that season. Note that some of this data is truncated because we're using the free version of the data.
- lineups/<match_id>.json
  - This file contains the lineups for that match id.
  - It contains data about all the players that actually played in the match.
  - It also shows their positions, and whether they moved from one position to another during the match.
- events/<match_id>.json
  - This file contains the events for that match id.
  - This is the bread and butter of this dataset: it contains each pass, each tackle, and every other event that happened in the match, and each event lists its related events.
  - (PS: this may be my undoing because it's a LOT of data (JK), but let's see where this goes)
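To make the folder layout above concrete, here is a minimal sketch of path helpers that mirror it. The helper names and the `DATA_ROOT` location are hypothetical (they are not part of the dataset or of this project), and the sketch assumes every file is plain UTF-8 JSON.

```python
import json
from pathlib import Path

# Assumption: the raw data lives in a local "data" folder laid out as shown above.
DATA_ROOT = Path("data")


def matches_path(competition_id, season_id, root=DATA_ROOT):
    """Path of the matches file for one season of one competition."""
    return Path(root) / "matches" / str(competition_id) / f"{season_id}.json"


def lineups_path(match_id, root=DATA_ROOT):
    """Path of the lineups file for one match."""
    return Path(root) / "lineups" / f"{match_id}.json"


def events_path(match_id, root=DATA_ROOT):
    """Path of the events file for one match."""
    return Path(root) / "events" / f"{match_id}.json"


def load_json(path):
    """Read one raw file and return the parsed JSON."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

For example, `load_json(events_path(match_id))` would read the events of a single match, given a `match_id` taken from one of the matches files.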
database
├── competitions (collection)
├── matches (collection)
├── lineups (collection)
└── events (collection)
competitions
-> file is uploaded as is
matches
-> each document in each file is given a match_id field (based on the name of the file itself in the Raw Data)
lineups
-> each document in each file is given a match_id field (based on the name of the file itself in the Raw Data)
events
-> each document in each file is given a match_id field (based on the name of the file itself in the Raw Data)
PS: the _id field is automatically generated by MongoDB
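The tagging step described above could look roughly like the sketch below. It is not the project's actual loader: the function name `tag_with_match_id` is hypothetical, and it assumes that each lineups or events file holds a JSON array of documents and that the file name (without `.json`) is a numeric match id.

```python
import json
from pathlib import Path


def tag_with_match_id(path):
    """Load one raw file and add a match_id field to every document in it."""
    path = Path(path)
    # Assumption: the file is named <match_id>.json and the id is numeric.
    match_id = int(path.stem)
    with open(path, encoding="utf-8") as f:
        # Assumption: the file contains a JSON array of documents.
        documents = json.load(f)
    # Build new dicts so the parsed documents are not mutated in place.
    return [{**doc, "match_id": match_id} for doc in documents]
```

The returned list can then be handed to the MongoDB driver, for instance pymongo's `insert_many` on the target collection. No `_id` is set here, since MongoDB generates it on insert.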
Please refer to relational database design.dbml or to the ERD diagram for the database design.
You can click on the image to go to the interactive version of the ERD.