Module 2: Data Wrangling

Overview

In this module, we clean the dataset in three steps: identifying and removing duplicate rows, finding and imputing missing values, and normalizing the data using existing columns. The goal is to prepare the data for further analysis.

Objectives

- Identify duplicate rows in the dataset.
- Remove duplicate rows from the dataset.
- Identify and impute missing values in the dataset.
- Normalize the data using existing columns.
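
The first two objectives can be illustrated with a minimal pandas sketch. This README does not state which library the notebook uses or what the dataset looks like, so pandas, the toy DataFrame, and its column names (`respondent`, `country`) are assumptions made purely for illustration.

```python
import pandas as pd

# Hypothetical toy data; the real dataset and its columns are not described in this README.
df = pd.DataFrame({
    "respondent": [1, 2, 2, 3, 3],
    "country": ["US", "IN", "IN", "DE", "DE"],
})

# duplicated() flags each row that repeats an earlier row;
# the first occurrence of a row is not flagged.
duplicate_mask = df.duplicated()
n_duplicates = int(duplicate_mask.sum())

# drop_duplicates() keeps the first occurrence of each row by default.
deduped = df.drop_duplicates().reset_index(drop=True)
```

Comparing `len(df)` with `len(deduped)` is a quick way to confirm how many rows were removed. If only some columns define a duplicate, both methods accept a `subset` argument.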

Notebook: 02-Data_Wrangling.ipynb
- Description: This notebook includes all steps for data wrangling:
  - Finding Duplicates: Identifies duplicate rows in the dataset.
  - Removing Duplicates: Removes duplicate rows from the dataset.
  - Finding Missing Values: Finds missing values in the dataset.
  - Imputing Missing Values: Imputes (replaces) missing values in the dataset.
  - Normalizing Data: Normalizes data using existing columns.
- Output: Cleaned and normalized dataset.
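
The missing-value steps listed above might look roughly like the following. This README does not say which imputation strategy the notebook applies, so the median (numeric column) and most frequent value (categorical column) used here are assumptions, as are the toy data and the column names `hours_per_week` and `employment`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column names are illustrative only.
df = pd.DataFrame({
    "hours_per_week": [40.0, np.nan, 35.0, 45.0],
    "employment": ["full-time", "part-time", None, "full-time"],
})

# Finding missing values: count them per column.
missing_counts = df.isna().sum()

# Imputing missing values (assumed strategy, not confirmed by this README):
# median for the numeric column, most frequent value for the categorical one.
imputed = df.copy()
imputed["hours_per_week"] = imputed["hours_per_week"].fillna(
    imputed["hours_per_week"].median()
)
imputed["employment"] = imputed["employment"].fillna(
    imputed["employment"].mode()[0]
)
```

The median is less sensitive to outliers than the mean, which is why it is a common default for skewed numeric columns. The right choice depends on the data.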

Key Points

- Data wrangling involves cleaning the dataset to make it ready for analysis.
- Techniques include identifying and removing duplicates, finding and imputing missing values, and normalizing data.
- Cleaned and normalized data is essential for accurate analysis.
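
"Normalizing data using existing columns" is not defined further in this README. One plausible reading is deriving a comparable column from two existing ones. The sketch below assumes that reading. The columns `pay` and `pay_period`, the derived `annual_pay`, and the period mapping are all hypothetical.

```python
import pandas as pd

# Hypothetical toy data: an amount and the period it is paid over.
df = pd.DataFrame({
    "pay": [5000.0, 60000.0, 1200.0],
    "pay_period": ["monthly", "yearly", "weekly"],
})

# Assumed conversion factors to put every row on a yearly basis.
PERIODS_PER_YEAR = {"weekly": 52, "monthly": 12, "yearly": 1}

# Derive a normalized column from the two existing columns.
df["annual_pay"] = df["pay"] * df["pay_period"].map(PERIODS_PER_YEAR)
```

Values of `pay_period` that are absent from the mapping become `NaN` after `map`, so missing or unexpected categories surface in `annual_pay` rather than being silently converted. If the notebook instead rescales values into a fixed range, min-max scaling of a single column would be the analogous step.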

Summary

Module 2 covers the data wrangling techniques used to clean the dataset: identifying and removing duplicates, finding and imputing missing values, and normalizing data. A single notebook, 02-Data_Wrangling.ipynb, walks through each step and produces a cleaned and normalized dataset that is ready for further analysis.
