This Data Explorer is phase 2 of a larger project where I develop an ML pipeline for property price prediction. See also:
- univariate EDA (Jupyter notebook)
- multivariate EDA (Jupyter notebook)
I completed this project in one week in February 2024 in Ghent (Belgium), during the AI Bootcamp at BeCode.
Its main goals were to practice:
- exploratory data analysis (EDA)
- data cleaning & preparation
- data visualisation
- data storytelling & presentation
The most challenging part of this project for me was coming to terms with just how iterative EDA is. Many times I had to go back to step 1 after discovering something in step 15. To check out my exploratory work, open any of the three EDA Jupyter notebooks.
If I have time to return to this, I'd like to:
- use a clearer structure: split the investigation into structure, quality & content, and within each section look at categorical and numerical variables separately and in a fixed order (e.g. ordinal then nominal, discrete then continuous)
- make a second version of the exploration where I try fancier libraries and dataviz tools (PyGWalker, Bokeh, Streamlit, Dash)
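
As a rough idea of what the structure / quality / content split could look like, here is a minimal sketch using pandas. The toy data, the column names and the helper functions `split_columns` and `explore` are all hypothetical and are not taken from the notebooks. It only separates categorical from numerical columns by dtype and does not attempt the finer ordering (ordinal vs nominal, discrete vs continuous).

```python
import pandas as pd

# Hypothetical toy data: the column names are illustrative,
# not the real schema of the project's dataset.
df = pd.DataFrame(
    {
        "property_type": ["house", "apartment", "house", None],
        "bedrooms": [3, 2, 4, 1],
        "living_area": [120.5, 75.0, None, 40.0],
        "price": [350000.0, 220000.0, 480000.0, 150000.0],
    }
)


def split_columns(frame):
    """Split column names into categorical and numerical groups by dtype."""
    numerical = frame.select_dtypes(include="number").columns.tolist()
    categorical = frame.select_dtypes(exclude="number").columns.tolist()
    return categorical, numerical


def explore(frame):
    """Collect a small report in three parts: structure, quality, content."""
    categorical, numerical = split_columns(frame)
    report = {}

    # 1. Structure: what does the table look like?
    report["structure"] = {
        "shape": frame.shape,
        "categorical": categorical,
        "numerical": numerical,
    }

    # 2. Quality: missing values per column and fully duplicated rows.
    report["quality"] = {
        "missing": frame.isna().sum().to_dict(),
        "duplicates": int(frame.duplicated().sum()),
    }

    # 3. Content: categorical columns first, then numerical ones.
    report["content"] = {
        "categorical": {
            col: frame[col].value_counts().to_dict() for col in categorical
        },
        "numerical": frame[numerical]
        .describe()
        .loc[["min", "50%", "max"]]
        .to_dict(),
    }
    return report


report = explore(df)
print(report["structure"])
print(report["quality"])
```

Keeping the three parts in one dictionary makes it easy to go back to an earlier step after a later discovery, since the whole report can simply be rebuilt.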
All my code is currently:
- heavily docstringed
- heavily commented
- sometimes typed (with type hints)
This is to help me learn and to make my sessions with our training coach more efficient.
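
To give an idea of that style, here is a small illustration. The function `price_per_square_metre` is a hypothetical example written for this README and is not taken from the project's code.

```python
def price_per_square_metre(price: float, living_area: float) -> float:
    """Return the price per square metre of a property.

    Args:
        price: Asking price in euros.
        living_area: Living area in square metres; must be positive.

    Returns:
        The price divided by the living area.

    Raises:
        ValueError: If living_area is zero or negative.
    """
    # Guard against division by zero and nonsensical negative areas.
    if living_area <= 0:
        raise ValueError("living_area must be positive")

    # Plain division: the result is expressed in euros per square metre.
    return price / living_area


print(price_per_square_metre(300000.0, 120.0))
```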
Connect with me on LinkedIn 🤍