Python for Text-as-Data: Using Word Embeddings to Assess the Diversity of Election-Related Search Queries
This is the GitHub repository accompanying the handbook article on computational text analysis with Python: [Citation added upon acceptance].
The repo provides a fully self-contained example of how to generate and cluster word embeddings of text data, in this case search queries that people would use on Google to stay informed about upcoming elections. All data and code are based on:
Schwabl, P., Unkel, J., & Haim, M. (2023). Vielfalt bei Google? Vielzahl, Ausgewogenheit und Verschiedenheit wahlbezogener Suchergebnisse. In C. Holtz-Bacha (Ed.), Die (Massen-) Medien im Wahlkampf (pp. 293–316). Springer Fachmedien Wiesbaden. https://doi.org/10.1007/978-3-658-38967-3_11
The repo contains two files:
- cluster_word_embeddings_example.ipynb: The Jupyter notebook with code to reproduce the handbook example.
- survey_queries_cluster.csv: The data needed to run the notebook.
You can run the notebook in Colab here.
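For orientation, the general idea of embedding queries and clustering them can be sketched in a few lines of plain Python. This is not the notebook's code: the three-dimensional `TOY_VECTORS`, the example queries, and the helper names `embed_query` and `kmeans` are all hypothetical and made up for illustration. The sketch assumes that a query is represented by the mean of its word vectors and that a simple k-means with Euclidean distance is used for clustering; the notebook remains the authoritative version of the handbook example.

```python
import math

# Hypothetical toy "embeddings" (values invented for illustration only).
# A real analysis would use pretrained word vectors instead.
TOY_VECTORS = {
    "election":  [0.9, 0.1, 0.0],
    "vote":      [0.8, 0.2, 0.0],
    "poll":      [0.7, 0.3, 0.1],
    "candidate": [0.1, 0.9, 0.0],
    "party":     [0.2, 0.8, 0.1],
    "program":   [0.1, 0.7, 0.2],
}


def embed_query(query, vectors):
    """Represent a query as the mean of the vectors of its known words."""
    words = [w for w in query.lower().split() if w in vectors]
    if not words:
        return None  # no known word, so no embedding
    return [sum(dims) / len(words) for dims in zip(*(vectors[w] for w in words))]


def kmeans(points, k, n_iter=10):
    """Simple k-means (Euclidean distance) with deterministic farthest-first init."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Pick the point that is farthest from all centroids chosen so far.
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(dims) / len(members) for dims in zip(*members)]
    return labels


# Hypothetical example queries (not taken from survey_queries_cluster.csv).
queries = [
    "election poll",
    "candidate program",
    "vote election",
    "party candidate",
    "poll vote",
    "party program",
]
embeddings = [embed_query(q, TOY_VECTORS) for q in queries]
labels = kmeans(embeddings, k=2)

for query, label in zip(queries, labels):
    print(f"cluster {label}: {query}")
```

Averaging word vectors is only one simple way to obtain a query-level embedding, and the number of clusters `k` is set by hand here. For real data, both choices would need to be justified and checked.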
Notes:
- Feel free to adapt the code to your needs!
- We chose a small dataset for the sake of clarity. However, the code and methods can also be applied to bigger datasets.
When using the code, please cite our handbook contribution as follows:
[Citation added upon acceptance]