This repo contains scripts that use NLP techniques to obtain insights from a set of comedian transcripts.

Step 1: Scrape the transcript data, apply text cleaning techniques, and create the document-term matrix.
Step 2: Do some exploratory data analysis on the dataset, such as building word clouds and measuring word frequency and profanity, to verify that the data makes sense.
Step 3: Perform sentiment analysis on the transcripts using TextBlob and see how each comedian's sentiment varies over the routine.
Step 4: Perform topic modelling using Latent Dirichlet Allocation and try to come up with the topics each comedian uses most in their comedy.
Step 5: As a fun task, try text generation, using both the Markov chain technique and an RNN to generate similar transcripts.
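
To give a feel for the Markov chain idea in step 5, here is a minimal sketch using only the Python standard library. It is not the code from this repo: the function names `build_chain` and `generate` and the `sample` text are made up for illustration, and a real run would use a cleaned transcript instead.

```python
import random
from collections import defaultdict


def build_chain(text):
    """Map each word to the list of words that follow it in the text."""
    words = text.split()
    chain = defaultdict(list)
    for current_word, next_word in zip(words, words[1:]):
        chain[current_word].append(next_word)
    return dict(chain)


def generate(chain, length=15, seed=None):
    """Walk the chain from a random start word, producing at most `length` words."""
    rng = random.Random(seed)
    word = rng.choice(sorted(chain))
    output = [word]
    for _ in range(length - 1):
        followers = chain.get(word)
        if not followers:  # dead end: this word was never followed by another
            break
        word = rng.choice(followers)
        output.append(word)
    return " ".join(output)


# Hypothetical stand-in for a cleaned transcript.
sample = "i like jokes and i like stories and i tell jokes about stories"
chain = build_chain(sample)
print(generate(chain, length=10, seed=0))
```

Because followers are stored with repeats, a word that often follows another is picked more often, so the output loosely mimics the phrasing of the source text.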