Wine Quality Prediction Pipeline

Overview

This project builds and runs a feature pipeline on Modal, trains models on the feature data, and serves predictions through an inference pipeline with a Gradio UI hosted on HuggingFace Spaces. The main objective is to predict the quality of a wine from a set of its features.

The project leverages the HopsWorks feature store for efficient data management.

Dataset

The dataset used in this project is the Wine Quality Dataset, which can be found here. It consists of various attributes that describe the wine's properties and a target quality score.

Project Structure

wine-eda-and-backfill.ipynb:
- Conducts data cleaning, exploratory data analysis (EDA), and feature selection.
- Uploads relevant data to the HopsWorks feature store.
wine-feature-pipline-daily.py:
- A script designed to run on Modal, generating 10 new data points daily and adding them to the feature store.
wine-training-pipeline.ipynb:
- Handles the training process using linear regression and decision tree models.
- Evaluates the performance of these models and uploads them to the HopsWorks model registry.
wine-batch-inference-pipeline.py:
- A second script that runs on Modal and fetches the most recently added batch of data from the feature store.
- Measures and stores the performance of the models on this batch data using the dataset_api on HopsWorks.
huggingface-space-wine/app.py:
- Contains a Gradio interface for making wine quality predictions using sliders to adjust feature values.
- Displays the classification accuracy, confusion matrix, test MSE, and prediction comparisons of the two models on the batch data.

HuggingFace Space link

Check out the app on its HuggingFace Space here.

Setup and Installation

Install the dependencies with the following command, then replace the api_key_value passed to the HopsWorks login in the scripts and notebooks with your own API key.

pip install -r requirements.txt
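
As an alternative to pasting the key into each file, the key can be read from the environment. The sketch below is only an illustration: the helper name get_hopsworks_api_key and the variable name HOPSWORKS_API_KEY are assumptions made here, not something the project defines. The commented usage relies on the hopsworks package's login function and its api_key_value parameter.

```python
import os


def get_hopsworks_api_key(env_var: str = "HOPSWORKS_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    The variable name is an assumption for this sketch; the project only
    says to replace api_key_value with your own key.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your HopsWorks API key."
        )
    return key


# Hypothetical usage inside a pipeline script (requires the hopsworks package):
#   import hopsworks
#   project = hopsworks.login(api_key_value=get_hopsworks_api_key())
```

Keeping the key out of the source files also makes it less likely to be committed by accident.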

Usage

Perform EDA and backfilling

jupyter notebook wine-eda-and-backfill.ipynb

Run feature pipeline on Modal

python wine-feature-pipline-daily.py
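
The daily feature pipeline generates 10 new data points per run. A minimal sketch of that generation step might look like the following. It is not the project's actual script: the feature names, the value ranges, and the uniform sampling are all assumptions chosen for illustration, and the upload to the feature store is left out.

```python
import random

# Hypothetical feature ranges, chosen only for illustration. The real pipeline
# would derive its features and ranges from the cleaned dataset.
FEATURE_RANGES = {
    "volatile_acidity": (0.1, 1.2),
    "chlorides": (0.01, 0.3),
    "density": (0.987, 1.004),
    "alcohol": (8.0, 14.5),
}
QUALITY_RANGE = (3, 9)  # assumed inclusive range of integer quality scores


def generate_wine_batch(n=10, seed=None):
    """Return n synthetic wine rows as dicts of feature name to value."""
    rng = random.Random(seed)
    batch = []
    for _ in range(n):
        # Sample every feature uniformly within its assumed range.
        row = {
            name: round(rng.uniform(low, high), 4)
            for name, (low, high) in FEATURE_RANGES.items()
        }
        row["quality"] = rng.randint(*QUALITY_RANGE)
        batch.append(row)
    return batch


if __name__ == "__main__":
    for row in generate_wine_batch(n=10, seed=42):
        print(row)
```

Passing a seed makes a run reproducible, which helps when debugging; a scheduled daily run would normally leave it unset so that each day produces different rows.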

Train models

jupyter notebook wine-training-pipeline.ipynb

Execute batch inference pipeline on Modal

python wine-batch-inference-pipeline.py
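
The batch pipeline measures model performance on the newest data, and the app reports test MSE, classification accuracy, and a confusion matrix. The sketch below shows one way such metrics could be computed from a model's predictions. It assumes that a continuous prediction is turned into a class by rounding to the nearest integer score between 0 and 10; the project may do this differently, and all function names here are hypothetical.

```python
import math
from collections import Counter


def mean_squared_error(y_true, y_pred):
    """Average squared difference between true scores and raw predictions."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def to_quality_class(prediction, lowest=0, highest=10):
    """Round a prediction half-up and clamp it to the assumed score range."""
    return max(lowest, min(highest, math.floor(prediction + 0.5)))


def classification_accuracy(y_true, y_pred):
    """Share of rows whose rounded prediction equals the true score."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    hits = sum(t == to_quality_class(p) for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


def confusion_counts(y_true, y_pred):
    """Count (true score, predicted class) pairs, a sparse confusion matrix."""
    return Counter((t, to_quality_class(p)) for t, p in zip(y_true, y_pred))


if __name__ == "__main__":
    true_scores = [5, 6, 7, 5]
    predictions = [5.2, 6.6, 6.9, 4.4]  # made-up model outputs
    print("MSE:", mean_squared_error(true_scores, predictions))
    print("Accuracy:", classification_accuracy(true_scores, predictions))
    print("Confusion:", dict(confusion_counts(true_scores, predictions)))
```

Running the same functions on the predictions of both the linear regression and the decision tree model gives directly comparable numbers for the two.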

Run the Gradio interface (HuggingFace Spaces)

python huggingface-space-wine/app.py

Repository: HillSeahWQ/scalable-machine-learning