Welcome to my Soccer Data Analysis project! This repository dives deep into football statistics to uncover intriguing insights about player performance, market trends, and league dynamics. Explore how data shapes the game and tells compelling stories about the beautiful sport.
- Data Source: Dataset from Transfermarkt, a soccer statistics platform. Includes information on players, their clubs, market values, goals, and more.
- Motivation: Driven by a passion for soccer, this project provides data-driven insights to enthusiasts and analysts alike.
- Dataset Coverage:
- 30,328 players
- 1,515,723 appearances
- Statistics include player heights, ages, market values, cards, and minutes played.
The peak ages for soccer players are 27–32, aligning with prime performance years.
- Average market value: €1.6 million
- Players valued over €100 million are predominantly younger (around 22 years old), reflecting the growing emphasis on potential.
The distribution of player positions reveals that goalkeepers are the least represented, while midfielders dominate.
European countries dominate the player pool, with France leading, followed by England and Spain.
- The English Premier League has the highest average player market value.
- La Liga records the most yellow cards per game, which may reflect stricter officiating.
- Manchester City holds the highest squad market value.
- Chelsea has the youngest squad among the most valuable clubs.
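
League-level figures like the ones above can be computed with a grouped aggregation. The following is a minimal sketch on toy data, not the project's actual code: the column names (`league`, `market_value_eur`, `yellow_cards`, `game_id`) and all values are hypothetical placeholders, and the real dataset's schema may differ.

```python
import pandas as pd

# Toy data with hypothetical column names and made-up values.
players = pd.DataFrame({
    "player_id": [1, 2, 3, 4],
    "league": ["A", "A", "B", "B"],
    "market_value_eur": [10_000_000, 30_000_000, 5_000_000, 15_000_000],
})
appearances = pd.DataFrame({
    "game_id": [100, 100, 101, 200, 200],
    "league": ["A", "A", "A", "B", "B"],
    "yellow_cards": [1, 0, 1, 1, 2],
})

# Average player market value per league, highest first.
avg_value = (
    players.groupby("league")["market_value_eur"]
    .mean()
    .sort_values(ascending=False)
)

# Yellow cards per game: total cards divided by the number of distinct games.
per_league = appearances.groupby("league").agg(
    yellow_cards=("yellow_cards", "sum"),
    games=("game_id", "nunique"),
)
per_league["yellow_per_game"] = per_league["yellow_cards"] / per_league["games"]

print(avg_value)
print(per_league)
```

Dividing by distinct games rather than by appearance rows matters here, because each game contributes one row per player who appeared in it.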
- Libraries Used: `pandas`, `seaborn`, `matplotlib`, `numpy`
- Data cleaning, imputation, and visualization techniques were employed to derive insights.
- Players table linked to Appearances table (one-to-many relationship).
- Missing values in critical columns like `foot` and `contract_expiration_date` were imputed using probabilistic techniques.
- Removed columns with more than 50% missing data, e.g., `agent_name`.
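
The cleaning steps above can be sketched roughly as follows. This README does not specify which probabilistic technique was used, so the sketch assumes one common option: sampling each missing value from the observed values in proportion to their frequency. The helper names (`drop_sparse_columns`, `impute_from_observed`) and the toy rows are hypothetical; only the column names `foot` and `agent_name` come from the text.

```python
import numpy as np
import pandas as pd


def drop_sparse_columns(df: pd.DataFrame, threshold: float = 0.5) -> pd.DataFrame:
    """Drop columns whose share of missing values exceeds `threshold`."""
    missing_share = df.isna().mean()
    return df.loc[:, missing_share <= threshold].copy()


def impute_from_observed(series: pd.Series, rng: np.random.Generator) -> pd.Series:
    """Fill missing entries by sampling observed values by their frequency.

    This is an assumed interpretation of "probabilistic imputation".
    """
    mask = series.isna()
    observed = series.dropna()
    if observed.empty or not mask.any():
        return series.copy()
    probs = observed.value_counts(normalize=True)
    draws = rng.choice(probs.index.to_numpy(), size=mask.sum(), p=probs.to_numpy())
    filled = series.copy()
    filled[mask] = draws
    return filled


# Toy tables with made-up rows.
players = pd.DataFrame({
    "player_id": [1, 2, 3, 4],
    "foot": ["right", None, "left", "right"],
    "agent_name": [None, None, None, "Agent A"],  # 75% missing
})
appearances = pd.DataFrame({
    "player_id": [1, 1, 2, 3, 3, 3],
    "minutes_played": [90, 45, 90, 30, 90, 60],
})

rng = np.random.default_rng(0)  # fixed seed for reproducibility
players = drop_sparse_columns(players)  # removes agent_name
players["foot"] = impute_from_observed(players["foot"], rng)

# One player row may match many appearance rows; `validate` raises
# if player_id is not unique on the players side.
merged = players.merge(
    appearances, on="player_id", how="left", validate="one_to_many"
)
print(merged)
```

Sampling from the observed distribution keeps the column's overall proportions roughly intact, whereas filling every gap with the most frequent value would inflate that value's share.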
- Machine Learning: Build predictive models for player market values using historical data and gameplay metrics.
- Expanded Metrics: Include advanced statistics like passing accuracy, defensive actions, and goal contributions.
- Interactive Dashboards: Develop a Streamlit-based application for real-time soccer analytics.
- Lack of historical market value data limited trend analysis.
- Absence of detailed gameplay metrics restricted advanced model building.
- Dataset: Transfermarkt
- Additional Analysis: Kaggle's Player Scores Dataset
This repository is a living project, and contributions are welcome! Feel free to fork, create pull requests, or open issues for discussions.