Data Extraction #12

Open
5 of 10 tasks
jeffjohannsen opened this issue Nov 22, 2024 · 0 comments


Data Extraction Scripts

Requirements:

  1. Scripts for regularly scheduled updates of data in near real-time.
  2. Scripts for the complete ingestion of historical data to fill the initial database, including transformations as needed.
  3. Relevant Database Structure Updates
  4. Bulk Load of Historical Data with Data Quality Debugging
  5. Airflow DAGs Created and Tested
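
For the historical ingestion in requirements 2 and 4, one possible approach is to split the full date range into fixed-size windows so that each window can be loaded, checked, and retried on its own. The sketch below is only an illustration under that assumption: the function name `date_chunks` and the `chunk_days` parameter are hypothetical and not part of any existing script in this project.

```python
from datetime import date, timedelta
from typing import Iterator, Tuple


def date_chunks(
    start: date, end: date, chunk_days: int = 30
) -> Iterator[Tuple[date, date]]:
    """Yield inclusive (chunk_start, chunk_end) windows covering start..end.

    Hypothetical helper: each window could be handed to one backfill task
    so a failed window can be re-run without repeating the whole load.
    """
    if chunk_days < 1:
        raise ValueError("chunk_days must be at least 1")
    current = start
    while current <= end:
        # The last window is clipped so it never runs past `end`.
        chunk_end = min(current + timedelta(days=chunk_days - 1), end)
        yield current, chunk_end
        current = chunk_end + timedelta(days=1)


if __name__ == "__main__":
    for window_start, window_end in date_chunks(date(2024, 1, 1), date(2024, 1, 10), 4):
        print(window_start, window_end)
```

The same windows could also serve the scheduled updates in requirement 1 by requesting a single short window ending today, though that design choice is not stated in the issue.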

Data Categories

  • Teams - Develop scripts to automate data ingestion for current and historical team data. Ensure proper mapping to existing database schema and maintain data integrity.
    - Use the current Teams table; no updates necessary.

  • Players - Build automated workflows for importing player metadata, including rosters and historical player information. Scripts should handle updates and structure mapping effectively.
    - All Players
    - Individual Player Info

  • Games - Implement scripts to automate the collection and insertion of game schedules, results, and statuses. Ensure integration with existing game tables while maintaining data consistency.
    - Current Schedule Code
    - Stats Scoreboard
    - Live Scoreboard

  • Betting - Write scripts to automate the ingestion of betting odds and results. Ensure proper integration with related game and team data in the database.
    - The Odds API
    - Covers.com scraping code from the NBA Betting project; Databall

  • Injuries - Create automation for loading injury data, including real-time injury reports and historical injury records. Update schemas as required.
    - ESPN NBA Injuries
    - Historical Injury Data
    - Pro Sports Transactions

  • Play-by-Play (PBP) - Develop automated scripts for importing play-by-play data, including timestamped events and descriptions. Ensure comprehensive and efficient handling of large data sets.
    - Current Live PBP Option
    - Live PBP
    - PBP V3

  • PlayerBox - Build scripts for loading detailed player performance statistics. Automate updates to the database to ensure compatibility with both historical and real-time data sources.
    - Player Game Logs

  • TeamBox - Write scripts to extract and load team performance statistics for games. Ensure all relevant metrics are integrated with existing data structures.
    - Team Game Logs

  • GameStates - Develop scripts to handle play-by-play and/or box data ingestion that supports game state tracking goals. Ensure proper integration for state monitoring.

  • WinProbability - Implement scripts for sourcing win probability data, with integration for comparison to the custom model. Include both live data and historical data sources.
    - Win Probability PBP
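
Several categories above, Games in particular, call for repeated loads that must stay consistent with existing tables. A common way to get there is an idempotent upsert keyed on a stable identifier, so that re-running a scheduled load updates a game's status and score instead of duplicating the row. The sketch below uses SQLite from the Python standard library purely for illustration. The `games` table, its columns, and the `upsert_games` function are assumptions and do not reflect the project's actual schema or database engine. The `ON CONFLICT ... DO UPDATE` clause requires SQLite 3.24 or newer.

```python
import sqlite3
from typing import Iterable, Mapping

# Hypothetical schema, for illustration only.
CREATE_GAMES = """
CREATE TABLE IF NOT EXISTS games (
    game_id    TEXT PRIMARY KEY,
    home_team  TEXT NOT NULL,
    away_team  TEXT NOT NULL,
    status     TEXT NOT NULL,
    home_score INTEGER,
    away_score INTEGER
)
"""

# Insert new games; for games already present, refresh only mutable fields.
UPSERT_GAME = """
INSERT INTO games (game_id, home_team, away_team, status, home_score, away_score)
VALUES (:game_id, :home_team, :away_team, :status, :home_score, :away_score)
ON CONFLICT(game_id) DO UPDATE SET
    status     = excluded.status,
    home_score = excluded.home_score,
    away_score = excluded.away_score
"""


def upsert_games(conn: sqlite3.Connection, games: Iterable[Mapping]) -> None:
    """Insert or update game rows so repeated loads do not create duplicates."""
    conn.execute(CREATE_GAMES)
    with conn:  # commits on success, rolls back if any row fails
        conn.executemany(UPSERT_GAME, list(games))


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    scheduled = {"game_id": "G1", "home_team": "BOS", "away_team": "NYK",
                 "status": "Scheduled", "home_score": None, "away_score": None}
    final = {**scheduled, "status": "Final", "home_score": 110, "away_score": 102}
    upsert_games(conn, [scheduled])
    upsert_games(conn, [final])  # second load updates the same row
    print(conn.execute("SELECT * FROM games").fetchall())
```

Keying on `game_id` means the historical bulk load and the near-real-time updates can share one write path, which is one way to keep the two consistent.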


Other Sources:
