A complete football (soccer) datalake covering 41,000+ players from Transfermarkt. It includes player profiles, performance statistics, market values, transfer histories, injury records, national team data, and teammate relationships.
- 🎯 Total Players: 41,000+ professional football players
- ⚽ Total Teams: 1,400+ clubs worldwide
- 🌍 Geographic Scope: Global coverage of all major leagues
- 📈 Data Categories: 10 comprehensive data categories
Check out a sample of the dataset to get started.
datalake/transfermarkt/raw/
├── player_profiles/
├── player_performances/
├── player_market_values/
├── player_transfer_histories/
├── player_injury_histories/
├── player_national_team_performances/
└── player_teammates_played_with/
datalake/transfermarkt/raw/
├── teams_details/
├── teams_competitions_seasons/
└── teams_children/
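As a starting point, the folder layout above can be walked with a few lines of standard-library Python. This is a minimal sketch, not part of the project's tooling: it assumes each table folder holds one or more CSV files with a header row, and that the script runs from the root of a local clone. The helper name `iter_rows` is hypothetical.

```python
import csv
from pathlib import Path

# Folder names are taken from the layout above; the root is relative to a local clone.
RAW_ROOT = Path("datalake/transfermarkt/raw")

PLAYER_TABLES = [
    "player_profiles",
    "player_performances",
    "player_market_values",
    "player_transfer_histories",
    "player_injury_histories",
    "player_national_team_performances",
    "player_teammates_played_with",
]
TEAM_TABLES = ["teams_details", "teams_competitions_seasons", "teams_children"]


def iter_rows(table, root=RAW_ROOT):
    """Yield every row of a table folder as a dict.

    Assumption: the folder contains *.csv files with a header row.
    A missing or empty folder simply yields nothing.
    """
    for csv_path in sorted((Path(root) / table).glob("*.csv")):
        with csv_path.open(newline="", encoding="utf-8") as handle:
            yield from csv.DictReader(handle)


if __name__ == "__main__":
    # Print a row count per table as a quick sanity check of the local copy.
    for table in PLAYER_TABLES + TEAM_TABLES:
        count = sum(1 for _ in iter_rows(table))
        print(f"{table}: {count} rows")
```

Rows come back as dictionaries of strings, so numeric and date columns still need to be parsed before analysis.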
- 40,700+ Profiles
- 765,000+ Performance Records
- 427,000+ Market Values
- 280,000+ Transfers
- 78,000+ Injuries
- 62,000+ National Team Appearances
- 681,000+ Teammate Relationships
- 1,300+ Teams
- 1,300+ Competition Records
- 4,900+ Club Relationships
erDiagram
PLAYER_PROFILES {
varchar player_id PK
varchar player_slug
varchar player_name
varchar player_image_url
varchar date_of_birth_url
date date_of_birth
varchar place_of_birth_country
varchar place_of_birth
varchar height
varchar citizenship_country
varchar citizenship
varchar position
varchar foot
varchar player_agent_url
varchar player_agent
varchar current_club_id FK
varchar current_club_url
date joined
date contract_expires
varchar social_media_url
varchar social_media
varchar player_main_position
varchar player_sub_position
}
PLAYER_MARKET_VALUES {
varchar player_id FK
bigint date_unix PK
int value
}
PLAYER_TRANSFER_HISTORIES {
varchar transfer_id PK
varchar player_id FK
varchar season
date date
varchar date_unformatted
varchar from_team_id FK
varchar from_team_url
varchar from_team_name
varchar to_team_id FK
varchar to_team_url
varchar to_team_name
int value_at_transfer
varchar transfer_fee
}
PLAYER_PERFORMANCES {
varchar player_id FK
varchar season
varchar competition_id FK
varchar competition_url
varchar competition_name
varchar team_id FK
varchar team_url
varchar team_name
int nb_in_group
int nb_on_pitch
int goals
int own_goals
int assists
int subed_in
int subed_out
int yellow_cards
int second_yellow_cards
int direct_red_cards
int penalty_goals
int minutes_played
int goals_conceded
int clean_sheets
}
PLAYER_TEAMMATES_PLAYED_WITH {
varchar player_id FK
varchar teammate_id FK
varchar player_with_url
varchar player_with_name
float ppg_played_with
int joint_goal_participation
int minutes_played_with
}
PLAYER_INJURY_HISTORIES {
varchar player_id FK
varchar season
varchar injury_reason
date from_date PK
date end_date
int days_missed
int games_missed
}
PLAYER_NATIONAL_TEAM_PERFORMANCES {
varchar player_id FK
varchar team_id FK
varchar team_url
varchar team_name
date first_game_date PK
int matches
int goals
}
TEAMS_DETAILS {
varchar club_id PK
varchar club_slug
varchar club_name
varchar logo_url
varchar country_name
varchar season_id
varchar competition_id FK
varchar competition_slug
varchar competition_name
varchar club_division
varchar source_url
}
TEAMS_CHILDREN {
varchar parent_team_id FK
varchar parent_team_name
varchar child_team_id FK
varchar child_team_name
}
TEAMS_COMPETITIONS_SEASONS {
varchar team_id FK
varchar team_name
varchar season_id
varchar competition_id FK
varchar competition_name
varchar club_division
}
COMPETITIONS {
varchar competition_id PK
varchar competition_slug
varchar competition_name
}
%% RELATIONSHIPS
PLAYER_PROFILES ||--o{ PLAYER_MARKET_VALUES : "has values"
PLAYER_PROFILES ||--o{ PLAYER_TRANSFER_HISTORIES : "has transfers"
PLAYER_PROFILES ||--o{ PLAYER_PERFORMANCES : "has performances"
PLAYER_PROFILES ||--o{ PLAYER_TEAMMATES_PLAYED_WITH : "played with"
PLAYER_PROFILES ||--o{ PLAYER_INJURY_HISTORIES : "has injuries"
PLAYER_PROFILES ||--o{ PLAYER_NATIONAL_TEAM_PERFORMANCES : "national team"
TEAMS_DETAILS ||--o{ TEAMS_CHILDREN : "parent/child"
TEAMS_DETAILS ||--o{ TEAMS_COMPETITIONS_SEASONS : "plays in"
COMPETITIONS ||--o{ TEAMS_DETAILS : "competition includes teams"
PLAYER_TRANSFER_HISTORIES }o--|| TEAMS_DETAILS : "from/to team"
PLAYER_PERFORMANCES }o--|| TEAMS_DETAILS : "performance for team"
PLAYER_PERFORMANCES }o--|| COMPETITIONS : "performance in comp"
PLAYER_NATIONAL_TEAM_PERFORMANCES }o--|| TEAMS_DETAILS : "national team"
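The relationships in the diagram can be followed in code by joining on `player_id`. The sketch below links PLAYER_PROFILES to PLAYER_MARKET_VALUES and keeps the most recent value per player. It works on rows already loaded as dictionaries. Two assumptions are not confirmed by the schema: that `date_unix` is in seconds (not milliseconds) since the Unix epoch, and that `value` parses as an integer. The function name `latest_market_values` is hypothetical.

```python
from datetime import datetime, timezone


def latest_market_values(profiles, market_values):
    """Return {player_id: {"player_name", "date", "value"}} for the newest value per player.

    Assumptions: date_unix is in seconds since the epoch, and value is an
    integer-like string. Rows with an empty date or value are skipped.
    """
    names = {row["player_id"]: row["player_name"] for row in profiles}
    latest = {}
    for row in market_values:
        player_id = row["player_id"]
        if player_id not in names or not row["date_unix"] or not row["value"]:
            continue  # no matching profile, or incomplete record
        timestamp = int(row["date_unix"])
        if player_id not in latest or timestamp > latest[player_id][0]:
            latest[player_id] = (timestamp, int(row["value"]))
    return {
        player_id: {
            "player_name": names[player_id],
            "date": datetime.fromtimestamp(timestamp, tz=timezone.utc).date(),
            "value": value,
        }
        for player_id, (timestamp, value) in latest.items()
    }
```

The result is keyed by `player_id` instead of `player_name`, because names are not guaranteed to be unique across players.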
- ✅ Deduplication: Content hashing prevents duplicate data
- ✅ Incremental Updates: Only changed data is reprocessed
- ✅ Error Tracking: Failed URLs logged for monitoring
- ✅ Unicode Support: Proper handling of international characters
- ✅ Timestamp Tracking: All records include update timestamps
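To illustrate what content hashing for deduplication can look like, here is a minimal sketch. It is not the project's actual pipeline code, which is not shown in this document. It assumes records are plain dictionaries, and the names `content_hash` and `deduplicate` are hypothetical.

```python
import hashlib
import json


def content_hash(record):
    """Hash a record's content, independent of key order.

    ensure_ascii=False keeps international characters as-is before UTF-8 encoding.
    """
    canonical = json.dumps(
        record, sort_keys=True, ensure_ascii=False, separators=(",", ":")
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def deduplicate(records, seen=None):
    """Return records whose content hash is not already in `seen`.

    Passing the `seen` set from an earlier run means only new or changed
    records come back, which is one way to model incremental updates.
    """
    seen = set() if seen is None else seen
    unique = []
    for record in records:
        digest = content_hash(record)
        if digest not in seen:
            seen.add(digest)
            unique.append(record)
    return unique
```

Sorting keys before hashing means two records with the same fields in a different order count as duplicates, while any change to a field value produces a new hash.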
Most datasets give you a filtered, pre-processed view.
Working with raw football data lets you explore everything—from cleaning and organizing to deep analysis—giving you the opportunity to learn by doing.
- 🎯 Explore Freely – Investigate the data your way and discover patterns on your own
- 🔬 Develop Analytical Skills – Create your own metrics, KPIs, and ways of interpreting the game
- 🤖 Experiment with Machine Learning – Train models on raw features to understand player performance, tactics, and trends
- 📊 Spot Hidden Insights – Learn to uncover trends that pre-processed datasets might hide
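As one example of a self-made metric, the sketch below computes goal contributions (goals plus assists) per 90 minutes from PLAYER_PERFORMANCES rows. The column names come from the schema above. The 900-minute threshold and the function name `goal_contributions_per_90` are illustrative choices, and treating unparseable cells as zero is an assumption about how gaps appear in the raw files.

```python
from collections import defaultdict


def _to_int(value):
    """Parse a count that may be an int, a numeric string, or an empty cell.

    Anything that does not parse as an integer is treated as 0 (an assumption).
    """
    try:
        return int(value)
    except (TypeError, ValueError):
        return 0


def goal_contributions_per_90(performance_rows, min_minutes=900):
    """Return {player_id: (goals + assists) per 90 minutes}, summed over all rows.

    Players below `min_minutes` in total are left out to avoid noisy rates.
    """
    totals = defaultdict(lambda: [0, 0])  # player_id -> [contributions, minutes]
    for row in performance_rows:
        entry = totals[row["player_id"]]
        entry[0] += _to_int(row.get("goals")) + _to_int(row.get("assists"))
        entry[1] += _to_int(row.get("minutes_played"))
    return {
        player_id: 90 * contributions / minutes
        for player_id, (contributions, minutes) in totals.items()
        if minutes >= max(min_minutes, 1)  # also guards against division by zero
    }
```

Because each row covers one player, season, competition, and team, the rows can be filtered first (for example, to a single season) to change the scope of the metric.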
| Raw Data Aspect | How You Can Learn |
|---|---|
| 🏗️ Build Your Own Pipeline | Gain hands-on experience cleaning, structuring, and preparing large datasets |
| 🔍 Deep Data Exploration | Practice exploratory data analysis (EDA), spot anomalies, and discover patterns |
| ⚡ Efficient Data Handling | Learn to query, filter, and transform large datasets effectively |
| 🎨 Visual Storytelling | Create your own charts and visualizations to communicate insights clearly |
| 🔗 Combine Sources | Merge data from matches, players, and events to see the bigger picture and draw richer conclusions |
| 📚 Learn Through Iteration | Test different approaches, refine your methods, and see the impact of your analysis in real time |
Help maintain and expand this valuable football dataset:
Your sponsorship helps with:
- 🚀 Regular Data Updates: Keep the dataset current
- 🌍 Expanded Coverage: Add more leagues and competitions
- 🔧 Infrastructure Costs: Server and storage maintenance
- 📊 Data Quality: Enhanced validation and processing
I’m always excited to collaborate on innovative football data projects. If you’ve got an idea, let’s make it happen together!
- GitHub: @salimt
- LinkedIn: salimt
- Issues: Feel free to use GitHub Issues if you’ve got dataset-specific questions.
If you find this project useful, don’t forget to drop a star ⭐ on GitHub—it really helps others discover it too!
Contributions to the Nodeball Football Datalake are most welcome! If you want to contribute new fields, data improvements, or processing enhancements to this dataset, the instructions are quite simple:
- Fork the repo
- Set up your local environment
- Analyze the datalake structure in the datalake/ directory
- Start modifying data processing or creating new data extraction scripts
- If it's all looking good, create a pull request with your changes 🚀
- 🐛 Data Quality: Report inconsistencies or missing data
- 🔧 Processing Scripts: Improve data extraction and validation
- 📊 New Data Categories: Add new types of football data
- 🧹 Data Cleaning: Help with validation and normalization
- 📝 Documentation: Improve dataset documentation
football-data
soccer-dataset
transfermarkt-data
player-statistics
football-analytics
soccer-analytics
sports-data
football-research
player-performance
transfer-market
football-database
soccer-database
sports-dataset
football-datalake
soccer-datalake
Built with ⚽ by salimt | Last Updated: August 2025
"Complete football datalake - no player left behind."