This repository contains a Python script that analyzes Netflix data with the Pandas, NumPy, Matplotlib, Seaborn, and Plotly libraries. The analysis covers missing values, the distribution of content types (Movies vs. TV Shows), ratings, release years, top directors, genres, cast, and title durations. The script performs the following steps:
- Loading the dataset using Pandas.
- Displaying the first and last few rows of the dataset.
- Checking basic information about the dataset like shape, columns, and data types.
- Identifying and visualizing missing values in the dataset.
- Handling missing values in specific columns: "rating", "duration", "date_added", and "country".
- Visualizing the distribution of Movies vs TV Shows.
- Analyzing the distribution of ratings and the most common rating categories.
- Examining the distribution of content by country.
- Exploring the dataset's release years and identifying the years with the highest number of releases.
- Investigating the top directors and genres in the dataset.
- Replacing missing values in the "country", "cast", and "director" columns.
- Creating visualizations for the top actors based on the number of titles.
- Separating the dataset into Movies and TV Shows.
- Analyzing and cleaning the "duration" column for both Movies and TV Shows.
- Identifying the shortest and longest movies.
- Analyzing the duration and ratings of TV Shows, including those with the highest number of seasons.
- Extracting and visualizing the addition of content over the months and years.
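
A few of the steps above can be sketched as follows. This is a minimal illustration, not the repository's script: it uses a tiny made-up DataFrame in place of the real dataset, the `type` and `title` column names are assumptions, and filling gaps with `"Unknown"` is only one possible strategy.

```python
import pandas as pd

# Tiny made-up sample standing in for the real dataset. The "type" and
# "title" column names are assumptions; the other columns are named above.
df = pd.DataFrame(
    {
        "title": ["Title A", "Title B", "Title C", "Title D"],
        "type": ["Movie", "TV Show", "Movie", "TV Show"],
        "director": ["Director X", None, "Director Y", None],
        "country": ["United States", None, "India", "Japan"],
        "rating": ["PG-13", "TV-MA", None, "TV-14"],
        "duration": ["90 min", "2 Seasons", "125 min", "1 Season"],
        "date_added": ["September 25, 2021", "January 1, 2020", None, "March 15, 2019"],
    }
)

# Count missing values per column before any cleaning.
missing_counts = df.isnull().sum()

# Replace missing text values with a placeholder label (one possible strategy).
for col in ["country", "director", "rating"]:
    df[col] = df[col].fillna("Unknown")

# Separate the dataset into Movies and TV Shows.
movies = df[df["type"] == "Movie"].copy()
shows = df[df["type"] == "TV Show"].copy()

# "duration" mixes units ("90 min" vs "2 Seasons"), so pull out the
# leading number separately for each subset.
movies["minutes"] = movies["duration"].str.extract(r"(\d+)", expand=False).astype(int)
shows["seasons"] = shows["duration"].str.extract(r"(\d+)", expand=False).astype(int)

# Shortest and longest movies by runtime.
shortest_movie = movies.loc[movies["minutes"].idxmin(), "title"]
longest_movie = movies.loc[movies["minutes"].idxmax(), "title"]

# Parse "date_added" and count titles added per year; unparseable or
# missing dates become NaT and are dropped by value_counts().
added = pd.to_datetime(df["date_added"], format="%B %d, %Y", errors="coerce")
titles_per_year = added.dt.year.value_counts().sort_index()
```

Splitting by type before parsing `duration` keeps minutes and season counts from ending up in the same numeric column.
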
- Install the required libraries using `pip install pandas numpy matplotlib seaborn plotly`.
- Clone the repository.
- Run the provided Python script in your local environment.
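
Before running the script, it can help to confirm that the required libraries are importable. The helper below is a hypothetical convenience, not part of the repository; `check_libraries` and `REQUIRED` are names introduced here for illustration.

```python
import importlib

# Libraries listed in the install step above.
REQUIRED = ["pandas", "numpy", "matplotlib", "seaborn", "plotly"]


def check_libraries(names):
    """Return a dict mapping each module name to its version string,
    "unknown" if it has no __version__, or None if it cannot be imported."""
    status = {}
    for name in names:
        try:
            module = importlib.import_module(name)
            status[name] = getattr(module, "__version__", "unknown")
        except ImportError:
            status[name] = None
    return status


if __name__ == "__main__":
    for name, version in check_libraries(REQUIRED).items():
        print(f"{name}: {version if version else 'NOT INSTALLED'}")
```
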
Feel free to explore, modify, and enhance the analysis based on your preferences and requirements. 🚀