Photo by Roman Kraft on Unsplash
This project builds a Flask application that predicts the math score of students. It is an end-to-end machine learning portfolio project that covers exploratory analysis, model development, and model training. The project is implemented with MLOps and CI/CD pipelines. The MLOps workflow consists of Data Ingestion, Data Transformation, Model Trainer, Model Evaluation, and Model Deployment.
- Authors
- Table of Contents
- Problem Statement
- Tech Stack
- Data source
- Quick glance at the results
- Lessons learned and recommendation
- Limitation and what can be improved
- Run Locally
- Explore the notebook
- Contribution
- License
This app predicts a student's math score from the other features in the dataset. The app receives the student's details and uses the trained model to predict the result.
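The flow described above (receive details, pass them to the model, return a prediction) can be sketched roughly as follows. This is an illustration only: the `StudentDetails` class, its field names, the `predict_math_score` helper, and the `MeanOfScoresModel` stand-in are all hypothetical and are not taken from the project's code. It also assumes that the reading and writing scores are among the "other features".

```python
from dataclasses import dataclass, asdict


@dataclass
class StudentDetails:
    """Hypothetical container for the details the app receives."""
    gender: str
    race_ethnicity: str
    parental_level_of_education: str
    lunch: str
    test_preparation_course: str
    reading_score: float   # assumed to be an input feature
    writing_score: float   # assumed to be an input feature


class MeanOfScoresModel:
    """Hypothetical stand-in for the trained model, NOT the project's model.

    It only mimics the usual predict(rows) -> list of predictions shape.
    """

    def predict(self, rows):
        return [(r["reading_score"] + r["writing_score"]) / 2 for r in rows]


def predict_math_score(details, model):
    """Turn the received details into one feature row and ask the model."""
    row = asdict(details)
    return model.predict([row])[0]


# Illustrative input values only.
student = StudentDetails(
    gender="female",
    race_ethnicity="group B",
    parental_level_of_education="bachelor's degree",
    lunch="standard",
    test_preparation_course="none",
    reading_score=72,
    writing_score=74,
)
predicted = predict_math_score(student, MeanOfScoresModel())
print(predicted)
```

In a real application the stand-in would be replaced by the trained model loaded from disk, and the details would typically come from a submitted form.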
- pandas
- numpy
- seaborn
- matplotlib
- scikit-learn
- catboost
- xgboost
- dill
- Flask
Kaggle link: https://www.kaggle.com/datasets/spscientist/students-performance-in-exams?datasetId=74977
- The dataset contains the gender, ethnicity, parental level of education, lunch, and test preparation course of the students, along with their scores in various subjects.
- The data consists of 8 columns and 1000 rows.
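As a rough sketch of how the dataset layout described above could be checked before training, the snippet below reads a CSV, verifies that the expected columns are present, and separates the target from the features. The column names are an assumption based on the Kaggle file and should be checked against the actual download. The helper names and the two sample rows are illustrative only.

```python
import csv
import io

# Assumed column names (8 in total); verify against the downloaded file.
EXPECTED_COLUMNS = [
    "gender",
    "race/ethnicity",
    "parental level of education",
    "lunch",
    "test preparation course",
    "math score",
    "reading score",
    "writing score",
]
TARGET = "math score"


def load_students(csv_file):
    """Read rows as dicts and fail early if an expected column is missing."""
    reader = csv.DictReader(csv_file)
    missing = set(EXPECTED_COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return list(reader)


def split_features_target(rows):
    """Separate the math score (target) from the remaining 7 columns."""
    features = [{k: v for k, v in row.items() if k != TARGET} for row in rows]
    target = [float(row[TARGET]) for row in rows]
    return features, target


# Two illustrative rows standing in for the real 1000-row file.
sample_csv = io.StringIO(
    "gender,race/ethnicity,parental level of education,lunch,"
    "test preparation course,math score,reading score,writing score\n"
    "female,group B,bachelor's degree,standard,none,72,72,74\n"
    "male,group C,some college,free/reduced,completed,69,90,88\n"
)
rows = load_students(sample_csv)
features, target = split_features_target(rows)
print(len(rows), len(features[0]), target)
```

With the real file, `open(path, newline="")` would replace the in-memory sample.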
Final Results of the Model
Plots of Actual Score and Predicted Score
- The final model used is: Linear Regression
- Metric used: R² score
- Why choose R² as the metric: The objective of this problem is to predict the math score accurately, so the R² score is used. The R² score, also known as the coefficient of determination, is a popular metric for evaluating regression models. It describes how well the model's predicted values approximate the actual data points.
R² score is used when we care about how much of the variance in the target variable is captured by the model.
Conclusion: In our case, since we want the predictions to explain as much of the variance in the math score as possible, we use the R² score as our metric.
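To make the metric concrete, here is a minimal from-scratch sketch of the R² computation, using the standard definition R² = 1 - SS_res / SS_tot. The function name and the sample scores are illustrative and are not taken from the project, which presumably relies on a library implementation instead.

```python
def coefficient_of_determination(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot for two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error left over after the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are equal")
    return 1 - ss_res / ss_tot


# Illustrative scores only, not results from the project.
actual = [50, 60, 70, 80]
predicted = [52, 58, 71, 79]
score = coefficient_of_determination(actual, predicted)
print(round(score, 4))  # 0.98
```

A score of 1 means the predictions match the actual values exactly, while a score of 0 means the model does no better than always predicting the mean.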
During our analysis we found the following:
- Student performance is related to lunch, race, and parental level of education.
- Females lead in pass percentage and are also the top scorers.
- Student performance is not strongly related to the test preparation course.
- Finishing the preparation course is beneficial.
- Accuracy of the Linear Regression model is 88.04. Since there is not much difference between Ridge and Linear Regression, Linear Regression was chosen as the model. Accuracy could be improved further by fine-tuning the model.
- The application can be deployed on AWS or Azure. Deployment was tried on both AWS and Azure but is not shown in the project. A Docker setup was also tried.
- The current dataset does not have many features. The project could be improved by using a dataset with more features.
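The comparison between Ridge and Linear Regression mentioned above can be illustrated with a small sketch. It uses NumPy closed-form solutions on synthetic data, so the numbers it produces say nothing about the project's actual results. The function names, the `alpha` value, and the generated data are all assumptions made for illustration.

```python
import numpy as np


def fit_linear(X, y, alpha=0.0):
    """Closed-form least squares; alpha > 0 adds a ridge (L2) penalty."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend intercept column
    penalty = alpha * np.eye(Xb.shape[1])
    penalty[0, 0] = 0.0  # leave the intercept unpenalised
    return np.linalg.solve(Xb.T @ Xb + penalty, Xb.T @ y)


def predict(weights, X):
    return weights[0] + X @ weights[1:]


def r2(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Synthetic data: three numeric features with a linear signal plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 60 + X @ np.array([5.0, -3.0, 2.0]) + rng.normal(scale=2.0, size=200)

# Simple hold-out split: first 150 rows for training, last 50 for testing.
X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

r2_linear = r2(y_test, predict(fit_linear(X_train, y_train), X_test))
r2_ridge = r2(y_test, predict(fit_linear(X_train, y_train, alpha=1.0), X_test))
print(f"linear R2: {r2_linear:.4f}, ridge R2: {r2_ridge:.4f}")
```

When the two scores are nearly identical, as tends to happen with few features and enough rows, preferring the simpler unpenalised model is a reasonable choice.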
Initialize git
git init
Clone the project
git clone https://github.com/samithcsachi/Student_Performance_Predictor.git
Open Anaconda Prompt, change to the project directory, and then open VS Code by typing code .
cd E:/Student_Performance_Predictor
Open the command prompt in VS Code and create the conda environment
conda create -p venv python==3.13 -y
Activate the conda environment
conda activate venv/
List all the packages installed
conda list
Install the packages listed in the requirements file
pip install -r requirements.txt
Run the application
python application.py
GitHub : https://github.com/samithcsachi/Student_Performance_Predictor
Pull requests are welcome! For major changes, please open an issue first to discuss what you would like to change or contribute.
MIT License
Copyright (c) 2025 Samith Chimminiyan
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
If you have any questions, suggestions, or collaborations in data science, feel free to reach out:
- 📧 Email: [email protected]
- 🔗 LinkedIn: www.linkedin.com/in/samithchimminiyan
- 🌐 Website: www.samithc.com