This project aims to predict the compressive strength of concrete, a material used in civil engineering. We train a regression model on a dataset from the UCI Machine Learning Repository and then build a very simple client interface with Streamlit.
In this step, we perform a descriptive analysis to explore and understand each variable in the data. The figures below show the distribution of each feature and the correlations between features.
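As a rough illustration of this kind of descriptive analysis, the sketch below computes summary statistics and a Pearson correlation matrix with pandas. The data is synthetic and the column names (`cement`, `water`, `age`, `strength`) are placeholders chosen for this example, not the project's actual schema or results.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data; the real project uses the UCI concrete dataset.
# Column names and coefficients here are assumptions for illustration only.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "cement": rng.uniform(100, 540, n),
    "water": rng.uniform(120, 250, n),
    "age": rng.integers(1, 366, n),
})
df["strength"] = (
    25 + 0.08 * df["cement"] - 0.1 * df["water"]
    + 6 * np.log(df["age"]) + rng.normal(0, 3, n)
)

# Per-variable summary statistics (count, mean, std, quartiles, min, max).
summary = df.describe()

# Pairwise Pearson correlations between all variables.
corr = df.corr()

print(summary.round(2))
print(corr.round(2))
```

Histograms and a correlation heatmap can be drawn from `df` and `corr` with any plotting library.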
Before model building, we remove outliers using the Isolation Forest algorithm with 5% contamination, then rescale the data with RobustScaler so that the rescaled features are not sensitive to any remaining undetected outliers. After building several regression models and tuning their hyperparameters, the best model achieves an R² score of 90%. It is an ensemble method, the gradient boosting regressor, and its learning curves are shown below.
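The preprocessing and modelling steps described above can be sketched with scikit-learn as follows. This is a minimal sketch, not the project's actual code: the data is synthetic, the train/test split and random seeds are assumptions, and the gradient boosting regressor uses default hyperparameters rather than the tuned ones.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, IsolationForest
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import RobustScaler

# Synthetic features and target (assumption; stands in for the real dataset).
rng = np.random.default_rng(42)
X = rng.uniform(0, 1, size=(500, 4))
y = (30 * X[:, 0] - 15 * X[:, 1]
     + 10 * np.log1p(10 * X[:, 2]) + rng.normal(0, 1.0, 500))

# Step 1: flag roughly 5% of the rows as outliers and drop them.
# fit_predict returns +1 for inliers and -1 for outliers.
iso = IsolationForest(contamination=0.05, random_state=42)
mask = iso.fit_predict(X) == 1
X_clean, y_clean = X[mask], y[mask]

# Hold out a test set (the 80/20 split is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X_clean, y_clean, test_size=0.2, random_state=42
)

# Step 2: RobustScaler centers on the median and scales by the IQR,
# so remaining extreme values have limited influence on the scaling.
# Step 3: fit a gradient boosting regressor with default hyperparameters.
model = make_pipeline(RobustScaler(), GradientBoostingRegressor(random_state=42))
model.fit(X_train, y_train)

score = r2_score(y_test, model.predict(X_test))
print(f"Rows kept after outlier removal: {len(X_clean)} of {len(X)}")
print(f"Test R^2 on synthetic data: {score:.3f}")
```

Wrapping the scaler and the regressor in one pipeline means the scaler is fitted on the training data only and reused unchanged at prediction time, which also makes the model easier to serialize for the app.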
You can run the Streamlit app by changing the current directory to apps and then running the app.py script:
- $ cd apps/
- $ streamlit run app.py