Introduction to Exploratory Data Analysis in Python

By Satyam Gadekar (07Satyam) | LinkedIn | GitHub | Twitter


What is Exploratory Data Analysis?

According to Wikipedia:

In statistics, exploratory data analysis (EDA) is an approach to analyzing data sets to summarize their main characteristics, often with visual methods. A statistical model can be used or not, but primarily EDA is for seeing what the data can tell us beyond the formal modeling or hypothesis testing task.

How to perform Exploratory Data Analysis?

Everyone is keen to know the answer to this question, and the answer is that it depends on the data set you are working with. There is no single method for performing EDA, but this project walks through some common methods and plots used in the EDA process.

Advantages of EDA

  1. It gives us valuable insights about the data
  2. It helps with feature selection and dimensionality reduction (e.g., using PCA)
  3. Visualization is an effective way of detecting outliers
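
To make the PCA point above more concrete, here is a minimal sketch using only NumPy. Strictly speaking, PCA builds new combined features rather than picking among the original ones, but the explained-variance ratios it produces can hint that some features are redundant. The data below is synthetic and the variable names are assumptions made for illustration; none of it comes from the house price dataset.

```python
import numpy as np

# Synthetic data (an assumption for illustration): 200 rows, 3 features.
# The third feature is almost a copy of the first, so it adds little information.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
x3 = x1 + 0.05 * rng.normal(size=200)
X = np.column_stack([x1, x2, x3])

# Standardize each column so that no feature dominates because of its scale.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# PCA via singular value decomposition of the standardized data.
# The squared singular values are proportional to the variance along each component.
_, singular_values, components = np.linalg.svd(X_std, full_matrices=False)
explained_variance_ratio = singular_values**2 / np.sum(singular_values**2)

print(explained_variance_ratio)
```

If the last ratio is close to zero, as it should be here, two components capture nearly all of the variance, which suggests that one of the three original features is redundant.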

Dataset Introduction

We will perform exploratory data analysis on the House Price Prediction dataset.

EDA in Python

Python offers many libraries for this work, such as pandas, NumPy, matplotlib, and seaborn, which help us analyze the data and draw out useful insights. I will be using Jupyter Notebook along with these libraries. Some of the key steps in EDA are:

  • Identifying the features
  • Basics of EDA
  • Handling Missing values
  • Detecting Outliers
  • Handling Outliers
  • Histogram
  • Correlation Heatmap
  • Scatterplot
  • Boxplot
  • Feature Engineering
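
Several of the steps above can be sketched in a few lines of pandas. This is a rough illustration under stated assumptions, not the notebook's actual code: the small DataFrame stands in for the house price data, and the column names `area`, `bedrooms`, and `price`, the helper `iqr_outlier_mask`, and the derived `price_per_area` feature are all hypothetical.

```python
import numpy as np
import pandas as pd

# Toy data standing in for the house price dataset.
# The column names are hypothetical, not taken from the real file.
df = pd.DataFrame({
    "area": [1200.0, 1500.0, np.nan, 1800.0, 2100.0, 1650.0, 9000.0, 1400.0],
    "bedrooms": [2, 3, 3, 4, 4, 3, 3, 2],
    "price": [150.0, 200.0, 210.0, 260.0, 300.0, 230.0, 320.0, 180.0],
})

# Identifying the features and basics of EDA: shape, types, summary statistics.
print(df.shape)
print(df.dtypes)
print(df.describe())

# Handling missing values: count them per column, then fill gaps with the column median.
missing_counts = df.isna().sum()
df_filled = df.fillna(df.median(numeric_only=True))


def iqr_outlier_mask(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Flag values more than k * IQR outside the quartiles (hypothetical helper)."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


# Detecting outliers with the 1.5 * IQR rule, the same rule a boxplot's whiskers use by default.
outliers = iqr_outlier_mask(df_filled["area"])

# Handling outliers: dropping the flagged rows is one option among several.
df_clean = df_filled[~outliers]

# Correlation matrix: the numbers a correlation heatmap would display.
corr = df_clean.corr()
print(corr)

# Feature engineering: derive a new column from existing ones.
df_features = df_clean.assign(price_per_area=df_clean["price"] / df_clean["area"])
print(df_features.head())
```

The plotting steps are left out to keep the sketch free of display dependencies. In a notebook, `df_clean.hist()`, `seaborn.heatmap(corr)`, `df_clean.plot.scatter(x="area", y="price")`, and `df_clean.boxplot()` would be reasonable starting points for the histogram, heatmap, scatterplot, and boxplot.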

Endnotes

These are some of the general steps involved in exploratory data analysis, and they are a good starting point whenever you perform EDA.