This project aims to build and run a feature pipeline using Modal, train a model on the feature data, and then build an inference pipeline complete with a Gradio UI hosted on HuggingFace Spaces. The main objective is to predict the quality of wine using a set of features.
The project leverages the HopsWorks feature store for efficient data management.
The dataset used in this project is the Wine Quality Dataset, which can be found here. It consists of various attributes that describe the wine's properties and a target quality score.
-
wine-eda-and-backfilling.ipynb
:- Conducts data cleaning, exploratory data analysis (EDA), and feature selection.
- Uploads relevant data to the HopsWorks feature store.
-
wine-feature-pipeline-daily.py
:- A script designed to run on Modal, generating 10 new data points daily and adding them to the feature store.
-
wine-training-pipeline.ipynb
:- Handles the training process using linear regression and decision tree models.
- Evaluates the performance of these models and uploads them to the HopsWorks model registry.
-
wine-batch-inference-pipeline.py
:- Another script for Modal, fetching batch data from the feature store (the most recently added data).
- Measures and stores the performance of the models on this batch data using the dataset_api on HopsWorks.
-
huggingface-space-wine/app.py
:- Contains a Gradio interface for making wine quality predictions using sliders to adjust feature values.
- Displays the classification accuracy, confusion matrix, test MSE, and prediction comparisons of the two models on the batch data.
Check out the app on huggingface space here
Run the following bash script and replace the HopsWorks login api_key_value with your own api key
pip install -r requirements.txt
jupyter notebook wine-eda-and-backfilling.ipynb
python wine-feature-pipeline-daily.py
jupyter notebook wine-training-pipeline.ipynb
python wine-batch-inference-pipeline.py
python huggingface-space-wine/app.py