Module 3: Exploratory Data Analysis

Overview

This module works with the cleaned dataset produced in the previous module. We examine how the values are distributed, check for outliers, and measure the correlation between numerical columns. The results inform how the data is refined before more advanced analysis.

Objectives

  • Plot distribution curves and histograms.
  • Find the median and outliers of particular columns.
  • Compute the interquartile range (IQR).
  • Determine the upper and lower bounds used to flag outliers.
  • Identify correlations between numerical columns.
  • Create a new dataframe with the refined data.
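
The IQR and bounds objectives can be written out as a short formula. This is a sketch of the common convention (Tukey's fences), not necessarily the exact rule the notebook applies; the multiplier k is an assumption, with k = 1.5 being the usual default.

```latex
% Q_1 and Q_3 are the first and third quartiles of a column.
% k is an assumed multiplier (commonly 1.5); the notebook may use another value.
\[
\mathrm{IQR} = Q_3 - Q_1, \qquad
\text{lower bound} = Q_1 - k \cdot \mathrm{IQR}, \qquad
\text{upper bound} = Q_3 + k \cdot \mathrm{IQR}
\]
```

Values below the lower bound or above the upper bound are treated as outliers under this convention.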

Contents

Exploratory Data Analysis Notebook

  • Notebook: 03-Exploratory_Data_Analysis.ipynb
    • Description: This notebook includes all steps for exploratory data analysis:
      • Distribution Analysis: Plotting distribution curves and histograms to understand data distribution.
      • Outlier Detection: Identifying outliers using statistical methods and visualizations.
      • IQR Computation: Computing the interquartile range to measure the spread of the data.
      • Bounds Identification: Finding the upper and lower bounds beyond which values are treated as outliers.
      • Correlation Analysis: Identifying correlations between numerical columns to understand relationships.
      • Dataframe Creation: Creating a new dataframe with the refined data after handling outliers and other anomalies.
    • Output: Insights and refined dataset for further analysis.
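
The IQR, bounds, and dataframe-creation steps listed above could look roughly like the following. This is a minimal sketch assuming pandas and the conventional 1.5 × IQR rule; the helper names `iqr_bounds` and `drop_outliers`, the column name `value`, and the sample data are hypothetical and do not come from the notebook.

```python
from __future__ import annotations

import pandas as pd


def iqr_bounds(series: pd.Series, k: float = 1.5) -> tuple[float, float]:
    """Return (lower, upper) bounds from the interquartile range.

    k = 1.5 is the conventional multiplier (an assumption here).
    """
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_outliers(df: pd.DataFrame, column: str, k: float = 1.5) -> pd.DataFrame:
    """Return a new dataframe keeping only rows whose value lies within the bounds."""
    lower, upper = iqr_bounds(df[column], k)
    mask = df[column].between(lower, upper)  # inclusive on both ends
    return df.loc[mask].reset_index(drop=True)


# Hypothetical sample data: one value (100) sits far from the rest.
df = pd.DataFrame({"value": [10, 12, 11, 13, 12, 11, 100]})

median = df["value"].median()
lower, upper = iqr_bounds(df["value"])
refined = drop_outliers(df, "value")

print(f"median={median}, lower={lower}, upper={upper}")
print(refined)
```

Returning a new dataframe rather than filtering in place keeps the original data available for comparison, which matches the "new dataframe with the refined data" step.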

Key Points

  • Exploratory Data Analysis (EDA) helps in understanding the data distribution, identifying outliers, and determining correlations.
  • Techniques include plotting distributions, detecting outliers, computing IQR, finding bounds, and analyzing correlations.
  • EDA is crucial for gaining insights and preparing data for advanced analysis.
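
For the correlation technique mentioned above, one possible approach is a pairwise Pearson correlation matrix restricted to the numerical columns. This is a sketch assuming pandas; the column names and values are made up for illustration and the notebook may compute correlations differently.

```python
import pandas as pd

# Hypothetical data: y rises with x, z falls as x rises, label is non-numeric.
df = pd.DataFrame(
    {
        "x": [1, 2, 3, 4, 5],
        "y": [2, 4, 6, 8, 10],
        "z": [5, 4, 3, 2, 1],
        "label": ["a", "b", "c", "d", "e"],
    }
)

# Keep only numerical columns, then compute pairwise Pearson correlations.
numeric = df.select_dtypes(include="number")
corr = numeric.corr()

print(corr.round(2))
```

Selecting the numeric columns first avoids errors or warnings from text columns and makes it explicit which columns enter the matrix.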

Summary

Module 3 covers the core exploratory data analysis techniques applied to the dataset: analyzing distributions, identifying outliers, computing the IQR, finding bounds, and analyzing correlations. The single notebook walks through each step and produces the insights and refined dataset used for further analysis.