Complete football/soccer datalake with 41000+ players from Transfermarkt. Includes player profiles, performance statistics, market values, transfer histories, injury records, national team data, and teammate relationships.

salimt/football-datasets

⚽ Most Comprehensive Transfermarkt Dataset

Comprehensive Football/Soccer Dataset - 41,000+ Players


📊 Dataset Coverage

  • 🎯 Total Players: 41,000+ professional football players
  • ⚽ Total Teams: 1400+ clubs worldwide
  • 🌍 Geographic Scope: Global coverage of all major leagues
  • 📈 Data Categories: 10 comprehensive data categories

🗂️ Complete Datalake Structure (all CSV files)

Example Data

Check out a sample of the dataset to get started.

Player Data Categories (7 categories)

datalake/transfermarkt/raw/
├── player_profiles/               
├── player_performances/          
├── player_market_values/         
├── player_transfer_histories/          
├── player_injury_histories/       
├── player_national_team_performances/ 
└── player_teammates_played_with/  

Team Data Categories (3 categories)

datalake/transfermarkt/raw/
├── teams_details/                 
├── teams_competitions_seasons/    
└── teams_children/                
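
As a starting point for exploring the layout above, here is a minimal sketch that lists the CSV files in each of the ten category folders. The folder names come from the two trees; the function name `list_category_files` is hypothetical, and the file names inside each folder are not documented here, so the sketch assumes each folder simply holds one or more `*.csv` files.

```python
from pathlib import Path

# The ten category folders named in the two trees above.
CATEGORIES = [
    "player_profiles",
    "player_performances",
    "player_market_values",
    "player_transfer_histories",
    "player_injury_histories",
    "player_national_team_performances",
    "player_teammates_played_with",
    "teams_details",
    "teams_competitions_seasons",
    "teams_children",
]


def list_category_files(raw_root):
    """Map each category folder to the sorted CSV files found inside it."""
    root = Path(raw_root)
    found = {}
    for category in CATEGORIES:
        folder = root / category
        # Assumption: each folder holds plain *.csv files at its top level.
        found[category] = sorted(folder.glob("*.csv")) if folder.is_dir() else []
    return found


if __name__ == "__main__":
    for category, files in list_category_files("datalake/transfermarkt/raw").items():
        print(f"{category}: {len(files)} CSV file(s)")
```

Folders that are missing locally are reported with zero files rather than raising, which makes it easy to see whether a partial download is complete.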

What You Get (2.16M+ Records!) 🔥

Player Intelligence (7 massive datasets)

  • 40,700+ Profiles
  • 765,000+ Performance Records
  • 427,000+ Market Values
  • 280,000+ Transfers
  • 78,000+ Injuries
  • 62,000+ National Team Appearances
  • 681,000+ Teammate Relationships

Club Data (3 datasets)

  • 1,300+ Teams
  • 1,300+ Competition Records
  • 4,900+ Club Relationships

🏗️ Complete Data Schema & Entity Relationships

erDiagram

    PLAYER_PROFILES {
        varchar player_id PK
        varchar player_slug
        varchar player_name
        varchar player_image_url
        varchar date_of_birth_url
        date    date_of_birth
        varchar place_of_birth_country
        varchar place_of_birth
        varchar height
        varchar citizenship_country
        varchar citizenship
        varchar position
        varchar foot
        varchar player_agent_url
        varchar player_agent
        varchar current_club_id FK
        varchar current_club_url
        date    joined
        date    contract_expires
        varchar social_media_url
        varchar social_media
        varchar player_main_position
        varchar player_sub_position
    }

    PLAYER_MARKET_VALUES {
        varchar player_id FK
        bigint  date_unix PK
        int     value
    }

    PLAYER_TRANSFER_HISTORIES {
        varchar transfer_id PK
        varchar player_id FK
        varchar season
        date    date
        varchar date_unformatted
        varchar from_team_id FK
        varchar from_team_url
        varchar from_team_name
        varchar to_team_id FK
        varchar to_team_url
        varchar to_team_name
        int     value_at_transfer
        varchar transfer_fee
    }

    PLAYER_PERFORMANCES {
        varchar player_id FK
        varchar season
        varchar competition_id FK
        varchar competition_url
        varchar competition_name
        varchar team_id FK
        varchar team_url
        varchar team_name
        int     nb_in_group
        int     nb_on_pitch
        int     goals
        int     own_goals
        int     assists
        int     subed_in
        int     subed_out
        int     yellow_cards
        int     second_yellow_cards
        int     direct_red_cards
        int     penalty_goals
        int     minutes_played
        int     goals_conceded
        int     clean_sheets
    }

    PLAYER_TEAMMATES_PLAYED_WITH {
        varchar player_id FK
        varchar teammate_id FK
        varchar player_with_url
        varchar player_with_name
        float   ppg_played_with
        int     joint_goal_participation
        int     minutes_played_with
    }

    PLAYER_INJURY_HISTORIES {
        varchar player_id FK
        varchar season
        varchar injury_reason
        date    from_date PK
        date    end_date
        int     days_missed
        int     games_missed
    }

    PLAYER_NATIONAL_TEAM_PERFORMANCES {
        varchar player_id FK
        varchar team_id FK
        varchar team_url
        varchar team_name
        date    first_game_date PK
        int     matches
        int     goals
    }

    TEAMS_DETAILS {
        varchar club_id PK
        varchar club_slug
        varchar club_name
        varchar logo_url
        varchar country_name
        varchar season_id
        varchar competition_id FK
        varchar competition_slug
        varchar competition_name
        varchar club_division
        varchar source_url
    }

    TEAMS_CHILDREN {
        varchar parent_team_id FK
        varchar parent_team_name
        varchar child_team_id FK
        varchar child_team_name
    }

    TEAMS_COMPETITIONS_SEASONS {
        varchar team_id FK
        varchar team_name
        varchar season_id
        varchar competition_id FK
        varchar competition_name
        varchar club_division
    }

    COMPETITIONS {
        varchar competition_id PK
        varchar competition_slug
        varchar competition_name
    }

    %% RELATIONSHIPS
    PLAYER_PROFILES ||--o{ PLAYER_MARKET_VALUES : "has values"
    PLAYER_PROFILES ||--o{ PLAYER_TRANSFER_HISTORIES : "has transfers"
    PLAYER_PROFILES ||--o{ PLAYER_PERFORMANCES : "has performances"
    PLAYER_PROFILES ||--o{ PLAYER_TEAMMATES_PLAYED_WITH : "played with"
    PLAYER_PROFILES ||--o{ PLAYER_INJURY_HISTORIES : "has injuries"
    PLAYER_PROFILES ||--o{ PLAYER_NATIONAL_TEAM_PERFORMANCES : "national team"

    TEAMS_DETAILS ||--o{ TEAMS_CHILDREN : "parent/child"
    TEAMS_DETAILS ||--o{ TEAMS_COMPETITIONS_SEASONS : "plays in"
    COMPETITIONS ||--o{ TEAMS_DETAILS : "competition includes teams"

    PLAYER_TRANSFER_HISTORIES }o--|| TEAMS_DETAILS : "from/to team"
    PLAYER_PERFORMANCES }o--|| TEAMS_DETAILS : "performance for team"
    PLAYER_PERFORMANCES }o--|| COMPETITIONS : "performance in comp"
    PLAYER_NATIONAL_TEAM_PERFORMANCES }o--|| TEAMS_DETAILS : "national team"
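
To illustrate how the `player_id` key in the diagram links tables, the sketch below joins PLAYER_PROFILES to PLAYER_MARKET_VALUES and keeps each player's most recent valuation. The column names are taken from the schema. The helper name `latest_market_values`, the sample rows, and the reading of `date_unix` as seconds since the epoch (UTC) are assumptions, so check them against the actual files before relying on the result.

```python
import csv
import io
from datetime import datetime, timezone


def latest_market_values(profiles_csv, values_csv):
    """Return {player_name: (date, value)} for each player's newest valuation."""
    names = {row["player_id"]: row["player_name"]
             for row in csv.DictReader(profiles_csv)}
    latest = {}  # player_id -> (date_unix, value)
    for row in csv.DictReader(values_csv):
        # Skip rows with blank cells instead of guessing a value.
        if not row["date_unix"] or not row["value"]:
            continue
        pid, ts = row["player_id"], int(row["date_unix"])
        if pid in names and (pid not in latest or ts > latest[pid][0]):
            latest[pid] = (ts, int(row["value"]))
    # Assumption: date_unix counts seconds since 1970-01-01 UTC.
    return {
        names[pid]: (datetime.fromtimestamp(ts, tz=timezone.utc).date(), value)
        for pid, (ts, value) in latest.items()
    }


# Made-up rows for illustration only; real files have many more columns.
profiles = io.StringIO("player_id,player_name\n1,Player A\n2,Player B\n")
values = io.StringIO(
    "player_id,date_unix,value\n"
    "1,1577836800,1000000\n"
    "1,1609459200,2500000\n"
    "2,1609459200,500000\n"
)
result = latest_market_values(profiles, values)
print(result["Player A"])
```

The same pattern (build a lookup from the table holding the primary key, then stream the larger table) applies to the other `player_id` relationships in the diagram.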

📋 Data Quality

Data Quality Features

  • Deduplication: Content hashing prevents duplicate data
  • Incremental Updates: Only changed data is reprocessed
  • Error Tracking: Failed URLs logged for monitoring
  • Unicode Support: Proper handling of international characters
  • Timestamp Tracking: All records include update timestamps
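
The processing code behind these features is not shown in this README, so the following is only a sketch of how content-hash deduplication could work, not the project's implementation. It serializes each record canonically (sorted keys, UTF-8) and uses the SHA-256 digest as a fingerprint. The function names and sample rows are hypothetical.

```python
import hashlib
import json


def content_hash(record):
    """Stable SHA-256 digest of a record, independent of key order."""
    canonical = json.dumps(record, sort_keys=True, ensure_ascii=False,
                           separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def drop_duplicates(records):
    """Keep the first occurrence of each distinct record, preserving order."""
    seen = set()
    unique = []
    for record in records:
        digest = content_hash(record)
        if digest not in seen:
            seen.add(digest)
            unique.append(record)
    return unique


# Made-up rows: the second repeats the first with a different key order,
# and the third carries non-ASCII text to exercise the UTF-8 encoding.
rows = [
    {"player_id": "1", "injury_reason": "Knee injury", "days_missed": "30"},
    {"days_missed": "30", "player_id": "1", "injury_reason": "Knee injury"},
    {"player_id": "2", "injury_reason": "Bänderriss", "days_missed": "12"},
]
unique_rows = drop_duplicates(rows)
print(len(unique_rows))
```

Storing the digest next to each record would also support incremental updates: a record only needs reprocessing when its newly computed digest differs from the stored one.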

🎁 Why Raw Data? Because Freedom Matters!

🔓 Dive Into Raw Football Data

Most datasets give you a filtered, pre-processed view.
Working with raw football data lets you handle every step yourself, from cleaning and organizing to deep analysis, so you learn by doing.

💡 Learn Through Practice

  • 🎯 Explore Freely – Investigate the data your way and discover patterns on your own
  • 🔬 Develop Analytical Skills – Create your own metrics, KPIs, and ways of interpreting the game
  • 🤖 Experiment with Machine Learning – Train models on raw features to understand player performance, tactics, and trends
  • 📊 Spot Hidden Insights – Learn to uncover trends that pre-processed datasets might hide
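
As one example of a self-made metric, the sketch below computes goals per 90 minutes from the `goals` and `minutes_played` columns of PLAYER_PERFORMANCES. The function name, the `min_minutes` threshold, the treatment of blank cells as zero, and the sample rows (including the season format) are assumptions made for illustration.

```python
import csv
import io
from collections import defaultdict


def goals_per_90(performances_csv, min_minutes=900):
    """Return {player_id: goals per 90 minutes} across all seasons."""
    totals = defaultdict(lambda: [0, 0])  # player_id -> [goals, minutes]
    for row in csv.DictReader(performances_csv):
        # Assumption: a blank cell means zero for that row.
        totals[row["player_id"]][0] += int(row["goals"] or 0)
        totals[row["player_id"]][1] += int(row["minutes_played"] or 0)
    return {
        pid: round(goals * 90 / minutes, 2)
        for pid, (goals, minutes) in totals.items()
        # Drop small samples, and never divide by zero minutes.
        if minutes > 0 and minutes >= min_minutes
    }


# Made-up rows for illustration only.
sample = io.StringIO(
    "player_id,season,goals,minutes_played\n"
    "1,20/21,10,1800\n"
    "1,21/22,5,900\n"
    "2,21/22,1,90\n"
)
rates = goals_per_90(sample)
print(rates)
```

The minimum-minutes filter matters: without it, a player with one goal in a single 90-minute appearance would rank above most regular starters.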

🚀 Self-Learning Opportunities with Raw Data

| Raw Data Aspect | How You Can Learn |
| --- | --- |
| 🏗️ Build Your Own Pipeline | Gain hands-on experience cleaning, structuring, and preparing large datasets |
| 🔍 Deep Data Exploration | Practice exploratory data analysis (EDA), spot anomalies, and discover patterns |
| ⚡ Efficient Data Handling | Learn to query, filter, and transform large datasets effectively |
| 🎨 Visual Storytelling | Create your own charts and visualizations to communicate insights clearly |
| 🔗 Combine Sources | Merge data from matches, players, and events to see the bigger picture and draw richer conclusions |
| 📚 Learn Through Iteration | Test different approaches, refine your methods, and see the impact of your analysis in real time |
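
To make the "spot anomalies" idea concrete, here is a small sketch that scans PLAYER_INJURY_HISTORIES rows for internal inconsistencies. The column names come from the schema. The ISO `YYYY-MM-DD` date format, the specific checks, and the sample rows are assumptions, so treat flagged rows as candidates for inspection rather than confirmed errors.

```python
import csv
import io
from datetime import date


def find_injury_anomalies(injuries_csv):
    """Return a list of (line_number, problem) for suspicious rows."""
    problems = []
    # Line 1 is the header, so data rows start at line 2
    # (approximate if any field contains embedded newlines).
    for line_no, row in enumerate(csv.DictReader(injuries_csv), start=2):
        try:
            # Assumption: dates are stored as ISO YYYY-MM-DD strings.
            start = date.fromisoformat(row["from_date"])
            end = date.fromisoformat(row["end_date"])
        except ValueError:
            problems.append((line_no, "missing or unparseable date"))
            continue
        span_days = (end - start).days + 1
        if end < start:
            problems.append((line_no, "end_date before from_date"))
        elif row["days_missed"].isdigit() and int(row["days_missed"]) > span_days:
            problems.append((line_no, "days_missed exceeds date span"))
    return problems


# Made-up rows for illustration only.
sample = io.StringIO(
    "player_id,season,injury_reason,from_date,end_date,days_missed,games_missed\n"
    "1,21/22,Knee injury,2021-09-01,2021-09-30,30,4\n"
    "1,22/23,Ankle injury,2022-03-10,2022-03-01,9,1\n"
    "2,22/23,Flu,2022-11-05,,3,0\n"
)
issues = find_injury_anomalies(sample)
print(issues)
```

A blank `end_date` may simply mean an ongoing injury, which is why the message says "missing or unparseable" instead of calling the row wrong.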

💝 Support This Project

💖 Sponsor the Datalake

Help maintain and expand this valuable football dataset:

GitHub Sponsors

Your sponsorship helps with:

  • 🚀 Regular Data Updates: Keep the dataset current
  • 🌍 Expanded Coverage: Add more leagues and competitions
  • 🔧 Infrastructure Costs: Server and storage maintenance
  • 📊 Data Quality: Enhanced validation and processing

🤝 Get In Touch

💡 Working on a Cool Project?

I’m always excited to collaborate on innovative football data projects. If you’ve got an idea, let’s make it happen together!

📬 Contact Me

  • GitHub: @salimt
  • LinkedIn: salimt
  • Issues: Feel free to use GitHub Issues if you’ve got dataset-specific questions.

🌟 Star the Repo

If you find this project useful, don’t forget to drop a star ⭐ on GitHub—it really helps others discover it too!

GitHub stars


👨‍💻 Contributing

Contributions to the Nodeball Football Datalake are most welcome! If you want to contribute new fields, data improvements, or processing enhancements to this dataset, the instructions are quite simple:

🎯 How to Contribute

  1. Fork the repo
  2. Set up your local environment
  3. Analyze the datalake structure in the datalake/ directory
  4. Start modifying data processing or creating new data extraction scripts
  5. If it's all looking good, create a pull request with your changes 🚀

📋 Contribution Areas

  • 🐛 Data Quality: Report inconsistencies or missing data
  • 🔧 Processing Scripts: Improve data extraction and validation
  • 📊 New Data Categories: Add new types of football data
  • 🧹 Data Cleaning: Help with validation and normalization
  • 📝 Documentation: Improve dataset documentation

football-data soccer-dataset transfermarkt-data player-statistics football-analytics soccer-analytics sports-data football-research player-performance transfer-market football-database soccer-database sports-dataset football-datalake soccer-datalake


Built with ⚽ by salimt | Last Updated: August 2025

"Complete football datalake - no player left behind."
