Author: Lam T. Nguyen
Date: Winter 2020
License: MIT
This project analyzes NBA game data to identify statistical patterns and performance indicators that contribute to team success in the modern era.
It combines data scraping, data preprocessing, and machine learning regression to explore which metrics most strongly correlate with winning outcomes.
The dataset is sourced from Basketball Reference, a widely used repository for historical and current basketball statistics.
- Collect and clean raw NBA game and player data
- Engineer features to represent team performance across seasons
- Apply linear regression models to identify the most predictive statistics
- Visualize correlations between team metrics and win percentages
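
As a rough illustration of the regression objective, the sketch below fits an ordinary least squares model on standardized features and ranks them by coefficient magnitude. It uses only `numpy`; the function name `rank_features`, the column names, and the synthetic data are assumptions made for this example and are not taken from the project's code or dataset.

```python
import numpy as np


def rank_features(X, y, names):
    """Fit OLS on z-scored features; return (name, coef) sorted by |coef|."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Standardize each column so coefficients are comparable across units.
    Xz = (X - X.mean(axis=0)) / X.std(axis=0)
    # Prepend an intercept column, then solve the least squares problem.
    A = np.column_stack([np.ones(len(Xz)), Xz])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    pairs = [(n, float(c)) for n, c in zip(names, coef[1:])]
    return sorted(pairs, key=lambda p: abs(p[1]), reverse=True)


# Synthetic, made-up team-season data (NOT real NBA statistics).
rng = np.random.default_rng(0)
n = 30
efg = rng.normal(0.52, 0.02, n)   # hypothetical shooting efficiency
tov = rng.normal(14.0, 1.0, n)    # hypothetical turnovers per game
reb = rng.normal(44.0, 2.0, n)    # hypothetical rebounds per game
win_pct = 0.5 + 4.0 * (efg - 0.52) - 0.03 * (tov - 14.0) + rng.normal(0, 0.01, n)

names = ["efg_pct", "tov_per_game", "reb_per_game"]
ranked = rank_features(np.column_stack([efg, tov, reb]), win_pct, names)
for name, coef in ranked:
    print(f"{name:>14}: {coef:+.4f}")
```

Standardizing first matters here: without it, a statistic measured in large units (rebounds) and one measured as a fraction (shooting efficiency) would produce coefficients that cannot be compared directly. The same fit could be obtained with scikit-learn's `LinearRegression` on the standardized columns.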
- Data Mining: Automated retrieval and structuring of basketball game data
- Feature Engineering: Deriving team performance metrics (e.g., turnovers, rebounds, shooting efficiency)
- Machine Learning: Training regression models to predict win likelihood
- Visualization: Using `matplotlib` to display team trends and statistical relationships
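
To make the feature engineering step concrete, here is a minimal sketch that derives per-team metrics from season totals. The `TeamSeason` class, its field names, and the sample numbers are hypothetical and chosen only for illustration. The effective field goal percentage formula, (FG + 0.5 × 3P) / FGA, is the standard definition; whether the project uses this exact measure of shooting efficiency is an assumption.

```python
from dataclasses import dataclass


@dataclass
class TeamSeason:
    """Season totals for one team (hypothetical schema)."""
    team: str
    wins: int
    losses: int
    fg: int    # field goals made
    fg3: int   # three-pointers made
    fga: int   # field goal attempts
    tov: int   # turnovers


def engineer_features(s: TeamSeason) -> dict:
    """Derive win percentage and simple rate metrics from raw totals."""
    games = s.wins + s.losses
    return {
        "team": s.team,
        "win_pct": s.wins / games,
        # Effective FG% gives three-pointers 1.5x the weight of two-pointers.
        "efg_pct": (s.fg + 0.5 * s.fg3) / s.fga,
        "tov_per_game": s.tov / games,
    }


# Made-up totals for a placeholder team, not real data.
sample = TeamSeason("AAA", wins=50, losses=32, fg=3300, fg3=1000, fga=7200, tov=1148)
features = engineer_features(sample)
print(features)
```

Rows like `features` could then be written to the CSV intermediate files and passed on to the regression step.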
| Category | Tools |
|---|---|
| Language | Python 3.8+ |
| Data Source | nba_api, web scraping from Basketball Reference |
| ML & Analytics | scikit-learn, numpy, pandas |
| Visualization | matplotlib, mpl_toolkits |
| Data Storage | CSV-based intermediate files |
```bash
# Clone repo
git clone https://github.com/<your-username>/nba-data-analytics.git
cd nba-data-analytics

# Create environment (optional)
python -m venv venv
source venv/bin/activate  # on Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```