
Analysis of editorial activities on Wikipedia

About the repo

This repo contains code for a project from my MSc course. It documents what I did as part of the programme and, I hope, demonstrates intermediate coding skills.

Content

Topic

The code analyses editorial activities on Wikipedia (e.g., creating, editing, and reverting articles) to understand the dynamics of human interaction and behaviour on online platforms. The following studies examine online editorial activities, networks, and communities on Wikipedia:

  • Tsvetkova, M., García-Gavilanes, R., Floridi, L., & Yasseri, T. (2017). Even good bots fight: The case of Wikipedia. PLoS ONE, 12(2), e0171774.
  • Gildersleve, P., Lambiotte, R., & Yasseri, T. (2023). Between news and history: identifying networked topics of collective attention on Wikipedia. Journal of Computational Social Science, 6(2), 845-875.

Code structure

  • output.ipynb: A notebook containing the final output of the analysis
  • /module/create_network.py: A module to create network data for the analysis
  • /module/find_revert_back.py: A module to find mutual reverts in Wikipedia articles
  • /module/calculate_similarity.py: A module to calculate the similarity of edit activities
  • /module/visualise.py: A module to visualise the output of the analysis
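
As a rough illustration of what finding mutual reverts can involve, here is a minimal sketch using only the standard library. It is not the code in find_revert_back.py: the function name `find_mutual_reverts` and the input format (a list of `(reverter, reverted)` pairs) are assumptions made for this example, and a pair counts as "mutual" here simply when each editor has reverted the other at least once.

```python
from collections import defaultdict


def find_mutual_reverts(reverts):
    """Return the set of unordered editor pairs who reverted each other.

    `reverts` is an iterable of (reverter, reverted) tuples. This input
    format is an assumption for illustration, not the repo's data layout.
    """
    # Map each reverter to the set of editors they have reverted.
    reverted_by = defaultdict(set)
    for reverter, reverted in reverts:
        if reverter != reverted:  # ignore self-reverts
            reverted_by[reverter].add(reverted)

    mutual = set()
    for reverter, targets in reverted_by.items():
        for target in targets:
            # The pair is mutual if the target has also reverted the reverter.
            if reverter in reverted_by.get(target, ()):
                mutual.add(frozenset((reverter, target)))
    return mutual


# Hypothetical example data: only alice and bob reverted each other.
reverts = [
    ("alice", "bob"),
    ("bob", "alice"),
    ("carol", "alice"),
    ("dave", "dave"),
]
mutual_pairs = find_mutual_reverts(reverts)
print(mutual_pairs)
```

Storing each pair as a `frozenset` means (alice, bob) and (bob, alice) collapse into a single entry, so every mutual pair is reported once.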

Coding environment

I used Poetry to manage the Python environment. The pyproject.toml file contains the dependencies and the Python version used in the project.

Note

  • The code was written to solve specific problems that cannot be shared. The repo also does not contain the data provided by the school, as sharing it is not permitted.
  • To practise writing code, we were only allowed to use simple modules such as pickle, random, and datetime, and the packages numpy, matplotlib, and seaborn. We could NOT use advanced data processing packages such as pandas, networkx, or scikit-learn.
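
To give a feel for working within this restriction, the sketch below builds a small network as a plain dictionary of sets (instead of networkx) and computes a similarity score with numpy (instead of scikit-learn). Everything here is hypothetical: the helper names, the example edges, and the choice of cosine similarity are assumptions for illustration, not the measures or data used in the project.

```python
import numpy as np


def build_adjacency(edges):
    """Build a directed network as a dict mapping each node to its successors."""
    adjacency = {}
    for source, target in edges:
        adjacency.setdefault(source, set()).add(target)
        adjacency.setdefault(target, set())  # make sure sink nodes appear too
    return adjacency


def cosine_similarity(a, b):
    """Cosine similarity of two activity vectors; 0.0 if either is all zeros."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    norm = np.linalg.norm(a) * np.linalg.norm(b)
    if norm == 0.0:
        return 0.0
    return float(np.dot(a, b) / norm)


# Hypothetical edges: (editor who reverted, editor who was reverted).
network = build_adjacency([("alice", "bob"), ("bob", "alice"), ("carol", "alice")])

# Hypothetical edit counts per article for two editors.
similarity = cosine_similarity([3, 0, 1], [6, 0, 2])
print(network)
print(similarity)
```

A dictionary of sets covers the basic operations needed from a graph library here (adding edges, listing neighbours, checking whether an edge exists) while staying within the permitted modules.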