See analysis.ipynb for the implementation.
Twitter has become perhaps the preeminent platform for political discourse, where people express their views on a wide range of issues. Given the increasing polarization of the political landscape, it is of great interest to understand how people's political views are reflected in their online behavior. Our goal is to develop a model that accurately predicts whether a tweet was posted by a Republican or a Democrat, taking into account both the tweet's text and associated numerical features such as the number of retweets, likes, and followers. We seek to identify the combination of features and models that best captures how the text and these numerical features relate to the political affiliation of a tweet's author. To do so, we scraped 786,000 tweets posted by American members of Congress over a two-year span from 2021 to 2023. After vectorizing the text and tuning a Multinomial Naive Bayes classifier, we classified tweets as originating from a Republican or Democratic member of Congress with 82 percent accuracy and an ROC AUC of 0.84.
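The classification step described above can be illustrated with a minimal, from-scratch sketch of Multinomial Naive Bayes over bag-of-words counts, scored with accuracy and ROC AUC. This is not the code in analysis.ipynb and uses only the Python standard library rather than any particular ML library. The tokenizer, the class name `MultinomialNaiveBayes`, the `accuracy` and `roc_auc` helpers, and the toy tweets are all hypothetical, and the sketch models text only, leaving out the numerical features (retweets, likes, followers).

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a tweet and split it into simple word tokens (an assumed tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


class MultinomialNaiveBayes:
    """Hypothetical minimal Multinomial Naive Bayes over word counts with additive smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing strength; the hyperparameter a search would tune

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        class_docs = Counter(labels)
        # Per-class word occurrence counts: the bag-of-words "vectorized" text.
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        self.log_prior = {c: math.log(class_docs[c] / len(docs)) for c in self.classes}
        return self

    def _joint_log_prob(self, doc):
        # log P(c) + sum over tokens of log P(token | c); words unseen in training are skipped.
        tokens = [t for t in tokenize(doc) if t in self.vocab]
        scores = {}
        for c in self.classes:
            denom = self.totals[c] + self.alpha * len(self.vocab)
            scores[c] = self.log_prior[c] + sum(
                math.log((self.word_counts[c][t] + self.alpha) / denom) for t in tokens
            )
        return scores

    def predict(self, doc):
        scores = self._joint_log_prob(doc)
        return max(scores, key=scores.get)

    def predict_proba(self, doc, positive):
        # Posterior P(positive | doc), normalized with the log-sum-exp trick.
        scores = self._joint_log_prob(doc)
        top = max(scores.values())
        norm = sum(math.exp(s - top) for s in scores.values())
        return math.exp(scores[positive] - top) / norm


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores, positive):
    # AUC as the probability that a random positive outscores a random negative (ties count half).
    pos = [s for y, s in zip(y_true, scores) if y == positive]
    neg = [s for y, s in zip(y_true, scores) if y != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented toy tweets for illustration only (not the scraped dataset).
train_docs = [
    "cut taxes secure the border",
    "border security and lower taxes",
    "protect the second amendment",
    "expand health care access",
    "climate action now",
    "protect voting rights and health care",
]
train_labels = ["Republican"] * 3 + ["Democrat"] * 3
test_docs = ["lower taxes and border security now", "health care and climate action"]
test_labels = ["Republican", "Democrat"]

model = MultinomialNaiveBayes(alpha=1.0).fit(train_docs, train_labels)
preds = [model.predict(d) for d in test_docs]
probs = [model.predict_proba(d, positive="Republican") for d in test_docs]
print(f"accuracy={accuracy(test_labels, preds):.2f}")
print(f"roc_auc={roc_auc(test_labels, probs, 'Republican'):.2f}")
```

The smoothing parameter `alpha` is the main knob a hyperparameter search would typically tune for this model, for example by picking the value with the best cross-validated accuracy or ROC AUC rather than fixing it at 1.0 as this sketch does.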