- Goal: identify which books are most similar to one another based on their content.
- Analyzed the text of every book in the dataset with NLTK and built a distance matrix of pairwise distances between all books using Pandas and Gensim.
- Built a tf-idf model and summarized how similar each book is to a chosen reference book in a bar chart created with Matplotlib.
- Computed clusters from the similarity matrix using SciPy's Ward variance minimization algorithm and visualized which groups of books share similar topics with a dendrogram.
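
The tf-idf and pairwise-similarity steps above can be sketched in plain Python. This is a minimal illustration, not the project's actual code. It uses only the standard library as a stand-in for NLTK and Gensim so the arithmetic is visible. The toy corpus, the whitespace tokenizer, the base-2 logarithm in the idf term, and every function name are assumptions made for the example.

```python
import math
from collections import Counter

# Toy corpus: titles and texts are invented placeholders, not the real dataset.
books = {
    "book_a": "whales sail the sea and the sea is deep",
    "book_b": "ships sail the sea in search of whales",
    "book_c": "plants grow in the garden soil",
}


def tokenize(text):
    # Simplified stand-in for NLTK tokenization: lowercase, split on whitespace.
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one unit-length {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: how many documents contain each term.
    df = Counter(term for tokens in docs for term in set(tokens))
    vectors = []
    for tokens in docs:
        tf = Counter(tokens)
        # Weight = raw term count * log2(N / df); the log base is an assumption.
        vec = {t: count * math.log2(n_docs / df[t]) for t, count in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(u, v):
    # Both vectors are unit length, so the dot product is the cosine similarity.
    return sum(w * v.get(t, 0.0) for t, w in u.items())


titles = list(books)
vectors = tfidf_vectors([tokenize(books[t]) for t in titles])

# Pairwise similarity matrix and the matching distance matrix (1 - similarity).
similarity = [[cosine(u, v) for v in vectors] for u in vectors]
distance = [[1.0 - s for s in row] for row in similarity]

# One row of the similarity matrix is what a per-book bar chart would plot.
for title, score in zip(titles, similarity[0]):
    print(f"{titles[0]} vs {title}: {score:.3f}")
```

A matrix like `distance` is the kind of input that, in condensed form, `scipy.cluster.hierarchy.linkage` accepts with `method="ward"`, and `scipy.cluster.hierarchy.dendrogram` can draw the resulting tree. Ward's method is formally defined for Euclidean distances, so applying it to cosine-based distances is best treated as a pragmatic choice.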