This repository contains a course project that aims to break down the apparatus of censorship on Twitter in India using topic modelling and other methods of text analysis and natural language processing.
The ultimate objective of this project is to understand whether broader patterns or topics emerge from the tweets censored by the Indian government and, if so, what these topics are.
The project is divided into two broad parts. The first part extracts details of the accounts whose tweets the Indian government reported to Twitter between 2019 and 2021, using data collected by the Lumen database. A detailed explanation of the context of the problem and the method used is given in the PDF titled "Context_process_pt1". The Python code for web scraping and organizing data from the Lumen database can be found in the file "Web_scraping_twitter_account_data".
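As a rough illustration of the "organizing data" step, the sketch below pulls an account handle and tweet ID out of reported tweet URLs and counts reports per account. It is not the code from the repository: the function names `parse_tweet_url` and `count_reported_accounts` are hypothetical, the sample URLs are made up, and it assumes the reported items are standard tweet URLs of the form `twitter.com/<handle>/status/<id>`.

```python
from collections import Counter
from urllib.parse import urlparse


def parse_tweet_url(url):
    """Return (handle, tweet_id) for a tweet URL, or None if it does not match.

    Assumption: reported URLs look like https://twitter.com/<handle>/status/<id>.
    """
    parsed = urlparse(url.strip())
    host = parsed.netloc.lower()
    # Normalize common host prefixes so they compare equal to "twitter.com".
    for prefix in ("www.", "mobile."):
        if host.startswith(prefix):
            host = host[len(prefix):]
    if host != "twitter.com":
        return None
    parts = [p for p in parsed.path.split("/") if p]
    # Expected path shape: /<handle>/status/<tweet_id>
    if len(parts) >= 3 and parts[1] == "status" and parts[2].isdigit():
        # Handles are case-insensitive, so lowercase them before counting.
        return parts[0].lower(), parts[2]
    return None


def count_reported_accounts(urls):
    """Count how many reported tweet URLs belong to each account handle."""
    counts = Counter()
    for url in urls:
        result = parse_tweet_url(url)
        if result is not None:
            counts[result[0]] += 1
    return counts


# Made-up URLs, for illustration only.
sample_urls = [
    "https://twitter.com/example_user/status/1234567890",
    "https://twitter.com/Example_User/status/1234567891",
    "https://mobile.twitter.com/another_user/status/42",
    "https://example.org/not-a-tweet",
]
account_counts = count_reported_accounts(sample_urls)
```

Counting by lowercased handle keeps differently cased spellings of the same account together; URLs that do not match the assumed pattern are skipped rather than guessed at.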
The second part of the project attempts to extract the text of the reported tweets (where available) and then uses text analysis and topic modelling to understand whether certain patterns emerge from the censored content and, if so, what these patterns are. The detailed context for this part of the project can be found in "Context_process_pt2". The Python code for the process can be found in the file "text_analysis_censored_tweets".
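To give a sense of the kind of text preparation that typically precedes topic modelling, here is a minimal sketch that cleans tweet text and counts the most frequent terms. It is an illustration under stated assumptions, not the method used in "text_analysis_censored_tweets": the stopword list, the helper names `tokenize` and `top_terms`, and the sample tweets are all hypothetical, and the tokenizer only keeps Latin-script words.

```python
import re
from collections import Counter

# A tiny, hypothetical stopword list; a real analysis would use a fuller one.
STOPWORDS = {
    "the", "a", "an", "and", "or", "of", "to", "in", "is", "are",
    "for", "on", "this", "that", "with", "by", "it",
}

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
# Limitation: only lowercase Latin letters are kept, so text in other
# scripts (for example Hindi) is dropped by this simple tokenizer.
TOKEN_RE = re.compile(r"[a-z]+")


def tokenize(tweet):
    """Lowercase a tweet, strip URLs and mentions, and drop short or stop words."""
    text = URL_RE.sub(" ", tweet.lower())
    text = MENTION_RE.sub(" ", text)
    return [t for t in TOKEN_RE.findall(text) if t not in STOPWORDS and len(t) > 2]


def top_terms(tweets, n=5):
    """Return the n most frequent tokens across all tweets."""
    counts = Counter()
    for tweet in tweets:
        counts.update(tokenize(tweet))
    return counts.most_common(n)


# Made-up tweets, for illustration only.
sample_tweets = [
    "The protest in the city is growing https://t.co/abc",
    "@someone Farmers join the protest #protest",
    "Internet shutdown reported during protest",
]
terms = top_terms(sample_tweets, n=3)
```

Token lists like these could then be passed to a topic model; simple term counts are only a first look at which words dominate the reported content.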