Netflix-Movies-and-TV-Shows-Clustering

AlmaBetter Capstone Project

Objectives:

Conduct Exploratory Data Analysis.

Understand what type of content is available in different countries.

Check if Netflix is increasingly focusing on TV rather than movies in recent years.

Cluster similar content by matching text-based features.

Methods used:

Descriptive Statistics.

Data Visualization.

Machine Learning.

Libraries utilized:

NumPy and Pandas - For dataset cleaning and analysis.

Matplotlib, Plotly and Seaborn - For Data Visualization.

scikit-learn and NLTK - For machine learning and clustering.

Project Overview

Netflix is an American subscription streaming service and production company. It was founded in 1997 by Reed Hastings and Marc Randolph in Scotts Valley, California.

It offers a library of films and television series through distribution deals as well as its own productions, known as Netflix Originals.

Our objective is to conduct an Exploratory Data Analysis to understand what content is available in different countries and whether Netflix has been increasingly focusing on TV rather than movies in recent years. We then use these insights to cluster similar content by matching text-based features.

After loading the data, we start by inspecting the first and last five rows to understand the dataset. This is followed by feature engineering to extract new variables from the datetime variable date_added.
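The feature-engineering step above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: only the column name date_added comes from this README, while the toy rows, the date format string, and the derived column names year_added and month_added are assumptions.

```python
import pandas as pd

# Toy rows standing in for the real dataset (assumed values).
# Only the column name `date_added` is taken from the README.
df = pd.DataFrame({
    "title": ["Show A", "Movie B", "Show C"],
    "date_added": ["August 14, 2020", " September 1, 2019", None],
})

# Strip stray whitespace, then parse the strings as dates.
# Missing or unparseable values become NaT instead of raising an error.
df["date_added"] = pd.to_datetime(
    df["date_added"].str.strip(), format="%B %d, %Y", errors="coerce"
)

# Derive new variables from the datetime column (hypothetical names).
df["year_added"] = df["date_added"].dt.year
df["month_added"] = df["date_added"].dt.month

print(df.head())
```

Rows with a missing date keep NaN in the derived columns, so they can be dropped or imputed before the analysis.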

This cleaned data is then used to conduct EDA in order to understand it better and identify the underlying trends.
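One way to check the TV-versus-movies question during EDA is to compute the share of each content type per year. The sketch below uses made-up rows; the column names type and year_added are assumptions and may differ from those in the notebook.

```python
import pandas as pd

# Made-up rows for illustration; `type` and `year_added` are assumed column names.
df = pd.DataFrame({
    "type": ["Movie", "Movie", "TV Show", "Movie", "TV Show", "TV Show"],
    "year_added": [2018, 2018, 2018, 2019, 2019, 2019],
})

# Count titles per year and content type.
counts = pd.crosstab(df["year_added"], df["type"])

# Convert counts to a per-year share so that years of different size are comparable.
share = counts.div(counts.sum(axis=1), axis=0)

print(share)
```

A rising TV Show share across years would support the hypothesis that the catalogue is shifting towards TV.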

Once we have the required insights from the EDA, we pre-process the text data by removing punctuation and stop words. The filtered text is then passed through a TF-IDF vectorizer, because text-based clustering requires the input to be represented as numeric vectors.
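A minimal sketch of this pre-processing and vectorization step is shown below. It is an illustration under assumptions: the sample descriptions are invented, the helper remove_punctuation is hypothetical, and scikit-learn's built-in English stop-word list is used in place of NLTK's so that the example runs without downloading a corpus.

```python
import string

from sklearn.feature_extraction.text import TfidfVectorizer

# Invented descriptions standing in for the dataset's text features.
docs = [
    "A detective hunts a killer in the city.",
    "Bakers compete in a baking contest!",
    "The detective returns to the city, again.",
]


def remove_punctuation(text):
    """Hypothetical helper: lowercase the text and drop ASCII punctuation."""
    return text.translate(str.maketrans("", "", string.punctuation)).lower()


cleaned = [remove_punctuation(d) for d in docs]

# Swapped-in technique: scikit-learn's stop-word list instead of NLTK's.
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(cleaned)

# One row per document, one column per remaining term.
print(X.shape)
```

The result X is a sparse matrix; each row is the TF-IDF vector of one title.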

The clusters are then built using the Agglomerative clustering algorithm. The optimal number of clusters, determined by inspecting the dendrogram, came out to be 12.
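The clustering step could look roughly like the sketch below. It uses two synthetic blobs instead of the real TF-IDF vectors, so the number of clusters here is 2 rather than the 12 reported above; the Ward linkage is also an assumption, since this README does not state which linkage was used.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage
from sklearn.cluster import AgglomerativeClustering

# Two well-separated synthetic blobs standing in for dense TF-IDF vectors.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 0.1, size=(5, 3)),
    rng.normal(5.0, 0.1, size=(5, 3)),
])

# Build the merge hierarchy (Ward linkage is an assumption).
Z = linkage(X, method="ward")

# no_plot=True only computes the tree layout; drop it to draw the
# dendrogram with Matplotlib and look for the largest vertical gap.
tree = dendrogram(Z, no_plot=True)

# Fit the final model with the number of clusters read off the dendrogram.
model = AgglomerativeClustering(n_clusters=2, linkage="ward")
labels = model.fit_predict(X)

print(labels)
```

Note that AgglomerativeClustering expects a dense array, so a sparse TF-IDF matrix would need to be converted first (for example with .toarray()), possibly after dimensionality reduction.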

A content-based recommender system was built using the similarity matrix obtained from cosine similarity. This recommender system makes 10 recommendations to the user based on the type of show they watched.
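The idea behind such a recommender can be sketched as follows. This is not the deployed system: the tiny catalog, the column names title and description, and the function recommend are all hypothetical, chosen only to show how a cosine-similarity matrix yields a ranked list.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical mini-catalog; descriptions are assumed to be pre-cleaned.
catalog = pd.DataFrame({
    "title": ["Crime City", "Night Detective", "Baking Stars", "Cake Masters"],
    "description": [
        "detective hunts killer city crime",
        "detective solves crime night city",
        "bakers compete baking cakes",
        "bakers baking cakes contest",
    ],
})

# Pairwise cosine similarity between the TF-IDF vectors of all titles.
tfidf = TfidfVectorizer().fit_transform(catalog["description"])
similarity = cosine_similarity(tfidf)


def recommend(title, top_n=10):
    """Return up to top_n titles most similar to `title`, excluding itself."""
    idx = catalog.index[catalog["title"] == title][0]
    scores = pd.Series(similarity[idx], index=catalog["title"]).drop(title)
    return scores.sort_values(ascending=False).head(top_n).index.tolist()


print(recommend("Crime City", top_n=2))
```

With the default top_n=10 the function mirrors the 10 recommendations described above; on this four-title catalog it simply returns the three other titles.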
