topic_modeling

Notebooks for topic modeling using BOW, Top2Vec and BERTopic

These two notebooks present different approaches to topic modeling. The topic_modeling_bow notebook covers topic modeling with a bag-of-words (BOW) representation and LDA, followed by Top2Vec, a vector-based model that uses Doc2Vec for topic modeling. Results were much better with BOW LDA than with Top2Vec, with an optimal number of 14 topics. I used coherence scores to arrive at this number. The visualization looks as follows: [image: bow_topic_model]
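Choosing the number of topics from coherence scores can be sketched as below. This is a minimal illustration under stated assumptions, not the notebook's code: it builds a bag-of-words corpus from a few made-up documents, scores each candidate topic set with the UMass coherence measure (the README does not say which measure was used; gensim's `CoherenceModel` offers `u_mass`, `c_v` and others), and keeps the number of topics with the highest mean coherence. The candidate topics are hypothetical stand-ins for the top words of LDA models trained with different numbers of topics.

```python
import math
from collections import Counter

def build_bow(docs):
    """Bag-of-words: one term-count Counter per document (whitespace tokenization)."""
    return [Counter(doc.lower().split()) for doc in docs]

def doc_freq(bow, *words):
    """Number of documents that contain every word in `words`."""
    return sum(1 for doc in bow if all(w in doc for w in words))

def umass_coherence(top_words, bow):
    """UMass coherence of one topic, summed over word pairs:
    log((D(w_i, w_j) + 1) / D(w_j)), where w_j is ranked above w_i.
    Some implementations average over pairs instead of summing."""
    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            d_j = doc_freq(bow, top_words[j])
            if d_j == 0:  # skip words absent from the corpus to avoid dividing by zero
                continue
            score += math.log((doc_freq(bow, top_words[i], top_words[j]) + 1) / d_j)
    return score

def best_num_topics(candidates, bow):
    """candidates maps a number of topics k to that model's topics (lists of top words).
    Returns the k with the highest mean coherence, plus all mean scores."""
    means = {
        k: sum(umass_coherence(t, bow) for t in topics) / len(topics)
        for k, topics in candidates.items()
    }
    return max(means, key=means.get), means

# Hypothetical toy corpus and candidate topic sets (stand-ins for LDA outputs).
docs = [
    "brexit vote parliament deal",
    "brexit deal eu parliament",
    "olympics medal athlete gold",
    "athlete gold olympics record",
    "immigration border policy visa",
    "visa policy immigration rules",
]
candidates = {
    2: [["brexit", "olympics", "visa"], ["parliament", "gold", "policy"]],
    3: [["brexit", "parliament", "deal"], ["olympics", "gold", "athlete"],
        ["immigration", "visa", "policy"]],
}
bow = build_bow(docs)
best_k, means = best_num_topics(candidates, bow)
print(best_k, means)
```

With real data, each candidate would come from an LDA model trained with that number of topics, and the k with the best score (14 in this README) would be kept.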

The second notebook uses BERTopic, a BERT-based model, for topic modeling. On its own, the model finds 600+ topics. Most of them overlap, so we reduce the number of topics by merging similar ones, bringing the total down to 20. The extraction quality is much better than with the BOW LDA model: niche topics such as international Brexit, Immigration and Olympics are identified. We should be able to generalize further and reduce the number of topics even more if we wish. The visualization looks as follows: [image: bert_topic_model]
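Merging similar topics to shrink the topic count can be sketched as follows. The README does not show how the merge was done; BERTopic itself exposes topic reduction through its `nr_topics` parameter and `reduce_topics` method, which compare topics via their c-TF-IDF representations. This sketch is a simplified, hypothetical stand-in and not BERTopic's implementation: it greedily merges the pair of topics whose word-count vectors have the highest cosine similarity until a target count remains. The topic ids and word counts are made up.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse word-weight vectors (dicts)."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def merge_similar_topics(topics, target):
    """Greedily merge the most similar pair of topics until `target` remain.

    topics: dict mapping topic id -> {word: count}.
    Returns (merged topics, mapping from each original id to its surviving id).
    """
    topics = {tid: Counter(words) for tid, words in topics.items()}
    mapping = {tid: tid for tid in topics}
    while len(topics) > max(target, 1):
        ids = sorted(topics)
        pairs = [(x, y) for i, x in enumerate(ids) for y in ids[i + 1:]]
        keep, drop = max(pairs, key=lambda p: cosine(topics[p[0]], topics[p[1]]))
        # Fold the dropped topic's word counts into the kept topic.
        topics[keep].update(topics.pop(drop))
        # Redirect every original topic that pointed at the dropped one.
        mapping = {orig: (keep if cur == drop else cur) for orig, cur in mapping.items()}
    return topics, mapping

# Hypothetical per-topic word counts from an over-fragmented topic model.
raw_topics = {
    0: {"brexit": 5, "deal": 3, "eu": 2},
    1: {"brexit": 4, "eu": 3, "vote": 2},
    2: {"olympics": 5, "gold": 3},
    3: {"olympics": 3, "medal": 4},
    4: {"visa": 5, "immigration": 4},
}
merged, mapping = merge_similar_topics(raw_topics, target=3)
print(sorted(merged), mapping)
```

Lowering `target` further corresponds to generalizing the topics beyond 20, at the cost of broader topics. The sketch recomputes every pairwise similarity on each round, which is fine for illustration but slow for hundreds of topics.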

Although the plot appears to show many overlapping topics, zooming in reveals that they are mutually exclusive and very niche. As mentioned above, we can reduce the number of topics, which increases the size of the remaining topics (bubbles).