According to Wikipedia:
In statistics, exploratory data analysis (EDA) is an approach to analyzing data sets to summarize their main characteristics, often with visual methods. A statistical model can be used or not, but primarily EDA is for seeing what the data can tell us beyond the formal modeling or hypothesis testing task.
How should EDA be performed? This is a question that everyone is keen to have answered, and the answer is that it depends on the data set you are working with. There is no single method or fixed set of methods for performing EDA, but in this project you will see some common methods and plots that are used in the EDA process.
Advantages of EDA
- It gives us valuable insights about the data
- It helps with feature selection and dimensionality reduction (e.g. using PCA)
- Visualization is an effective way of detecting outliers
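As a minimal sketch of the outlier point, the 1.5 × IQR rule that a typical boxplot uses to draw its whiskers can be computed directly with pandas. The values and the name `price` below are made up for illustration and are not taken from the House Price Prediction dataset.

```python
import pandas as pd

# Hypothetical toy values; "price" is an illustrative name, not a real column.
prices = pd.Series([120, 135, 150, 160, 175, 180, 950], name="price")

# Quartiles and interquartile range (IQR).
q1 = prices.quantile(0.25)
q3 = prices.quantile(0.75)
iqr = q3 - q1

# Points beyond 1.5 * IQR from the quartiles are flagged as outliers,
# which is the same convention a standard boxplot uses for its whiskers.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
outliers = prices[(prices < lower) | (prices > upper)]

print(f"bounds: [{lower}, {upper}]")
print(outliers)
```

On a boxplot of the same values, the flagged point would appear as an isolated marker beyond the upper whisker.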
We will perform exploratory data analysis on the House Price Prediction dataset.
Many Python libraries are available for this, such as pandas, NumPy, matplotlib, and seaborn. With their help we can analyze the data and bring out helpful insights. I will be using Jupyter Notebook along with these libraries. Some of the key steps in EDA are:
- Identifying the features
- Basics of EDA
- Handling Missing values
- Detecting Outliers
- Handling Outliers
- Histogram
- Correlation Heatmap
- Scatterplot
- Boxplot
- Feature Engineering
These are some of the steps involved in exploratory data analysis. They are general steps that you should follow whenever you perform EDA.
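To make a few of the steps listed above concrete, here is a rough sketch using pandas on a tiny hand-made table. The column names (`area`, `bedrooms`, `price`) and all values are assumptions for illustration only; the real dataset's columns may differ. Median imputation is just one possible way to handle missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the house price data (columns and values are assumed).
df = pd.DataFrame({
    "area": [1000.0, 1500.0, np.nan, 2000.0, 2500.0],
    "bedrooms": [2, 3, 3, 4, 4],
    "price": [100.0, 150.0, 170.0, 200.0, 250.0],
})

# Identifying the features and basics of EDA: size, types, summary statistics.
print(df.shape)
print(df.dtypes)
print(df.describe())

# Handling missing values: count them per column, then fill with the median.
missing_counts = df.isna().sum()
print(missing_counts)
df["area"] = df["area"].fillna(df["area"].median())

# Correlation: the matrix a correlation heatmap would display.
corr = df.corr()
print(corr["price"].sort_values(ascending=False))

# Feature engineering: derive a new column from existing ones.
df["price_per_area"] = df["price"] / df["area"]
print(df.head())
```

The plotting steps (histogram, heatmap, scatterplot, boxplot) are left out here; in a notebook they would normally be drawn with matplotlib or seaborn from the same `df` and `corr` objects.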