Module 2: Data Wrangling

Overview

This module cleans the dataset in three steps: identifying and removing duplicate rows, finding and imputing missing values, and normalizing the data. The goal is a dataset that is ready for further analysis.

Objectives

  • Identify duplicate rows in the dataset.
  • Remove duplicate rows from the dataset.
  • Identify and impute missing values in the dataset.
  • Normalize the data using existing columns.
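
The first two objectives can be sketched as follows. This is a minimal illustration that assumes the notebook works with pandas; the toy DataFrame and its column names (respondent_id, country) are hypothetical and are not taken from the course dataset.

```python
import pandas as pd

# Hypothetical toy data; the real dataset and its columns are not shown in this README.
df = pd.DataFrame(
    {
        "respondent_id": [1, 2, 2, 3],
        "country": ["US", "IN", "IN", "DE"],
    }
)

# Flag rows that exactly repeat an earlier row (the first occurrence is not flagged).
duplicate_mask = df.duplicated()
n_duplicates = int(duplicate_mask.sum())
print(f"Duplicate rows: {n_duplicates}")

# Keep the first occurrence of each row and renumber the index.
deduped = df.drop_duplicates().reset_index(drop=True)
print(deduped)
```

If only some columns define a duplicate (for example, an ID column), both `duplicated` and `drop_duplicates` accept a `subset` argument that restricts the comparison to those columns.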

Contents

Data Wrangling Notebook

  • Notebook: 02-Data_Wrangling.ipynb
    • Description: This notebook includes all steps for data wrangling:
      • Finding Duplicates: Identifies duplicate rows in the dataset.
      • Removing Duplicates: Removes duplicate rows from the dataset.
      • Finding Missing Values: Finds missing values in the dataset.
      • Imputing Missing Values: Imputes (replaces) missing values in the dataset.
      • Normalizing Data: Normalizes data using existing columns.
    • Output: Cleaned and normalized dataset.
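
The Finding Missing Values and Imputing Missing Values steps might look like the sketch below. It assumes pandas, and the columns (hours_per_week, employment) and the choice of median and most-frequent-value imputation are illustrative assumptions; the notebook may use different columns or strategies.

```python
import pandas as pd

# Hypothetical toy data with one missing value in each column.
df = pd.DataFrame(
    {
        "hours_per_week": [40.0, None, 35.0, 45.0],
        "employment": ["full-time", "part-time", None, "full-time"],
    }
)

# Count missing values per column.
missing_counts = df.isna().sum()
print(missing_counts)

imputed = df.copy()

# Numeric column: fill gaps with the median of the observed values (an assumed strategy).
median_hours = imputed["hours_per_week"].median()
imputed["hours_per_week"] = imputed["hours_per_week"].fillna(median_hours)

# Categorical column: fill gaps with the most frequent observed value (an assumed strategy).
most_common = imputed["employment"].mode().iloc[0]
imputed["employment"] = imputed["employment"].fillna(most_common)

print(imputed)
```

The median is less sensitive to outliers than the mean, which is why it is often preferred for skewed numeric columns; for roughly symmetric data either choice gives similar results.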

Key Points

  • Data wrangling involves cleaning the dataset to make it ready for analysis.
  • Techniques include identifying and removing duplicates, finding and imputing missing values, and normalizing data.
  • Cleaned and normalized data is essential for accurate analysis.
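
This README does not say which normalization the notebook applies. One plausible reading of "normalizing data using existing columns" is deriving a comparable column from columns already in the dataset, optionally followed by rescaling. The sketch below assumes pandas and uses hypothetical columns (pay_amount, pay_frequency) and a hypothetical conversion table to put pay reported at different frequencies on an annual basis, then applies min-max scaling.

```python
import pandas as pd

# Hypothetical toy data: pay reported at different frequencies.
df = pd.DataFrame(
    {
        "pay_amount": [4000.0, 1000.0, 60000.0],
        "pay_frequency": ["monthly", "weekly", "yearly"],
    }
)

# Assumed conversion factors from each reporting frequency to one year.
PERIODS_PER_YEAR = {"weekly": 52, "monthly": 12, "yearly": 1}

# Derive a comparable annual figure from the two existing columns.
df["annual_pay"] = df["pay_amount"] * df["pay_frequency"].map(PERIODS_PER_YEAR)

# Optional min-max scaling of the derived column to the range [0, 1].
pay_min = df["annual_pay"].min()
pay_max = df["annual_pay"].max()
df["annual_pay_scaled"] = (df["annual_pay"] - pay_min) / (pay_max - pay_min)

print(df)
```

Min-max scaling divides by the range, so it is undefined when every value in the column is identical; a real pipeline would need to guard against that case.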

Summary

Module 2 covers the data wrangling steps used to clean the dataset: identifying and removing duplicates, finding and imputing missing values, and normalizing data. A single notebook, 02-Data_Wrangling.ipynb, walks through each step and produces a cleaned and normalized dataset that is ready for further analysis.