- Project Overview
- Project Requirements
- Project Proposal
- Machine Learning Overview
- Future Improvements
- Tech Stack
- Languages
- Libraries
- Dataset
- IDE
- Local Environment Setup Instructions
- Command Line Interface (CLI) Version
- Graphical User Interface (GUI) Version (in progress)
Design and develop a fully functional data product (application) addressing your identified
business problem or organizational need.
The deliverables include the application and
a written report, also located in this repository. The report contains a Letter of Transmittal to Commissioner Goodell,
a project proposal plan, and a post-implementation report.
Data Methods – provide one descriptive method that discerns relationships and characteristics of the past data in at least three forms of visualization. Also provide one non-descriptive method from which a decision or trend could be inferred. The descriptive method should be in the domain of cluster or association analysis; the non-descriptive method could include a pruning algorithm, discriminant analysis, regression analysis (linear, logistic), Bayesian methods, neural networks, or support vector machines.
Datasets – The use of dataset(s) is a critical element and involves the gathering and measuring of information on targeted variables in a systematic fashion. This could be student collected (Please consider IRB ramifications.) or publicly accessible such as websites (e.g. Kaggle.com), governmental (e.g. Department of Labor), or software related (e.g. GitHub.com).
Analytics – Using the given data, your application needs to enable decisions to be formulated or support for given trends to be provided.
Data Cleaning – if applicable, create a function that makes the data usable before the application actually consumes it. This includes feature engineering, parsing, cleaning, and wrangling the datasets.
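A cleaning function for this project might look like the sketch below. The column names (`qb_cap`, `offense_cap`, `defense_cap`, `total_cap`) are illustrative assumptions, not the real dataset's schema:

```python
import pandas as pd

def clean_cap_data(df: pd.DataFrame) -> pd.DataFrame:
    """Convert raw salary strings into numeric percentage-of-cap features.

    Column names here are hypothetical; adjust to the actual CSV headers.
    """
    df = df.copy()
    # Strip currency formatting like "$12,345,678" and coerce to float.
    for col in ("qb_cap", "offense_cap", "defense_cap", "total_cap"):
        df[col] = (
            df[col].astype(str)
                   .str.replace(r"[$,]", "", regex=True)
                   .astype(float)
        )
    # Derive the percentage-of-cap features the model trains on.
    df["qb_pct"] = df["qb_cap"] / df["total_cap"] * 100
    df["off_pct"] = df["offense_cap"] / df["total_cap"] * 100
    df["def_pct"] = df["defense_cap"] / df["total_cap"] * 100
    # Drop rows with missing values so the model never sees NaNs.
    return df.dropna()

sample = pd.DataFrame({
    "qb_cap": ["$30,000,000"],
    "offense_cap": ["$100,000,000"],
    "defense_cap": ["$90,000,000"],
    "total_cap": ["$208,200,000"],
})
cleaned = clean_cap_data(sample)
```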
Data Visualization – You need at least three real-time (e.g. using the GUI/dashboard) formats to visualize the data in a graphic format. Look at things like charting, mapping, color theory, plots, diagrams, or other methods (tables must include heat mapping).
Real-Time Queries – As part of your GUI, enable users to access and manipulate data in real time, including data maintenance. This does not deal with data "freshness" but with the query response time being in seconds.
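Since SQLite appears in the Tech Stack, a real-time query with data maintenance could be sketched as below. The table name, columns, and values are illustrative, not the project's actual schema:

```python
import sqlite3

# In-memory database standing in for the real salary-cap store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cap (team TEXT, year INTEGER, qb_pct REAL)")
conn.executemany(
    "INSERT INTO cap VALUES (?, ?, ?)",
    [("KC", 2022, 16.8), ("PHI", 2022, 2.1), ("KC", 2021, 17.5)],
)

# A real-time query: parameterized, with a sub-second response.
rows = conn.execute(
    "SELECT year, qb_pct FROM cap WHERE team = ? ORDER BY year", ("KC",)
).fetchall()

# Data maintenance: correcting a stored value in place.
conn.execute(
    "UPDATE cap SET qb_pct = ? WHERE team = ? AND year = ?", (17.0, "KC", 2021)
)
conn.commit()
```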
Adaptive Element – if appropriate for the business need, provide the implementation of machine-learning methods and algorithms to enable the application to improve with experience.
Outcome Accuracy – provide functionalities that evaluate the accuracy of the information/outcomes given by the application. What are the parameters for valid output data and how will those be checked by the application?
Dashboard – include a user-friendly, functional dashboard that enables the query and display of the data, as well as other functionality described in this section. This could be stand-alone, CLI, Web-based, or a mobile application interface.
My application aims to utilize machine learning to assist the NFL in predicting the playoff likelihood of any NFL team based on how they allocate their salaries by position. The dataset consists of salary cap data from 2013-2022.
The ML model chosen is a Random Forest Classification model, which falls under the supervised learning branch of machine learning. The features used to build the model were the percentage of the cap allocated to the QB position, the percentage allocated to the offense as a whole, and the percentage allocated to the defense as a whole. I split the data into training and testing subsets (70/30 split) and fit it to a RandomForestClassifier imported from scikit-learn. The model then made predictions on random samples of the testing data, and the results were used to compute the accuracy score, classification report, and confusion matrix.
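The pipeline described above can be sketched as follows. The feature values and playoff labels here are synthetic stand-ins, since the real cleaned 2013-2022 dataset is not reproduced in this README:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix)
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the cap dataset: ~32 teams x 10 seasons.
rng = np.random.default_rng(42)
n = 320
X = pd.DataFrame({
    "qb_pct": rng.uniform(2, 18, n),    # % of cap spent on the QB
    "off_pct": rng.uniform(35, 60, n),  # % of cap spent on offense
    "def_pct": rng.uniform(35, 60, n),  # % of cap spent on defense
})
y = rng.integers(0, 2, n)  # 1 = made playoffs, 0 = missed

# 70/30 train/test split, as described above.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

# Evaluate on the held-out testing subset.
preds = model.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, preds):.3f}")
print(classification_report(y_test, preds))
print(confusion_matrix(y_test, preds))
```

With random labels the accuracy hovers near chance; on the real cap data the same calls report the genuine model performance.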
The main improvement I want to make is increasing the accuracy of the application, likely by introducing more data
and tuning the ML model's hyperparameters.
Other improvements include:
- Switching to a regression model
- Developing the program into a web application (frontend UI and backend API)
- Introducing ensemble methods
- Adding more methods to monitor reliability
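The regression switch could look something like this minimal sketch, swapping RandomForestClassifier for scikit-learn's RandomForestRegressor so the target becomes a continuous playoff-likelihood score rather than a yes/no label (the data here is synthetic):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Same hypothetical cap-percentage features, but the target is now a
# continuous playoff-likelihood score in [0, 1] instead of a binary label.
rng = np.random.default_rng(7)
X = rng.uniform(0, 60, size=(320, 3))  # qb_pct, off_pct, def_pct
y = rng.uniform(0, 1, size=320)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=7
)
reg = RandomForestRegressor(random_state=7).fit(X_train, y_train)
scores = reg.predict(X_test)  # each prediction is a likelihood-style score
```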
Languages: Python, SQL (database)
Libraries: Pandas, Scikit-learn, Matplotlib, NumPy, Seaborn, SQLite
Dataset: NFL Salary Cap Spending 2013-2022 (link)
IDE: PyCharm 2023.1.12 (Community)
These instructions assume that Git is installed on your computer and that you have a basic knowledge of Git and terminal navigation.
1. Clone the repository to your local machine.
git clone <repository SSH URL>
2. Open in your chosen IDE. I recommend PyCharm since that is what was used to develop this program.
3. Install 'pip' if you don't already have it.
4. Navigate to the project directory and run the following command:
pip install scikit-learn matplotlib numpy seaborn pandas