This repo contains the code for a project in my MSc course. It documents what I did as part of the programme and will hopefully demonstrate intermediate coding skills.
The code analyses editorial activities (e.g., creating, editing, and reverting articles) on Wikipedia to understand the dynamics of human interaction and behaviour on online platforms. The following studies are related research on online editorial activities, networks, and communities on Wikipedia.
- Tsvetkova, M., García-Gavilanes, R., Floridi, L., & Yasseri, T. (2017). Even good bots fight: The case of Wikipedia. PLoS ONE, 12(2), e0171774.
- Gildersleve, P., Lambiotte, R., & Yasseri, T. (2023). Between news and history: identifying networked topics of collective attention on Wikipedia. Journal of Computational Social Science, 6(2), 845-875.
- `output.ipynb`: A notebook containing the final output of the analysis
- `/module/create_network.py`: A module to create the network data for the analysis
- `/module/find_revert_back.py`: A module to find mutual reverts in Wikipedia articles
- `/module/calculate_similarity.py`: A module to calculate the similarity of edit activities
- `/module/visualise.py`: A module to visualise the output of the analysis
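To give a feel for what "mutual reverts" means without sharing the actual module, here is a minimal sketch. It is not the code in `/module/find_revert_back.py`. The input format (a list of `(reverter, reverted)` pairs) and the function name `find_mutual_reverts` are assumptions made for illustration only, and the editor names are made up.

```python
def find_mutual_reverts(reverts):
    """Return the sorted, unordered pairs of editors who reverted each other.

    `reverts` is assumed to be an iterable of (reverter, reverted) tuples.
    This input format is hypothetical, not the course data format.
    """
    seen = set()
    for reverter, reverted in reverts:
        # Ignore self-reverts: they are not an interaction between two editors.
        if reverter != reverted:
            seen.add((reverter, reverted))

    # A pair is mutual when the revert exists in both directions.
    mutual = {tuple(sorted(pair)) for pair in seen if (pair[1], pair[0]) in seen}
    return sorted(mutual)


# Made-up example data.
reverts = [
    ("alice", "bob"),
    ("bob", "alice"),
    ("carol", "alice"),
    ("bob", "carol"),
    ("dave", "dave"),
]
print(find_mutual_reverts(reverts))
```

Only `alice` and `bob` revert each other in this sample, so they form the single mutual pair. The sketch uses plain sets and tuples and needs no third-party packages.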
I used Poetry to manage the Python environment. The `pyproject.toml` file contains the dependencies and the Python version used in the project.
- The code was written to solve specific problems whose statements cannot be shared. The repo also does not contain the data provided by the school, as sharing the data is not permitted.
- To practise writing code, we could only use simple modules such as `pickle`, `random`, and `datetime`, and the packages `numpy`, `matplotlib`, and `seaborn` when solving the problems. We could not use advanced data processing packages such as `pandas`, `networkx`, `scikit-learn`, etc.
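As an illustration of working within these restrictions, the sketch below builds a small weighted, directed network with plain dictionaries instead of `networkx` and stores it with `pickle`. It is only a guess at one possible approach, not the code in this repo. The names `build_revert_network` and `out_degree`, and the `(reverter, reverted)` input format, are hypothetical.

```python
import pickle


def build_revert_network(reverts):
    """Build a weighted directed network as a dict of dicts.

    network[a][b] is the number of times editor a reverted editor b.
    `reverts` is assumed to be an iterable of (reverter, reverted) tuples.
    """
    network = {}
    for reverter, reverted in reverts:
        # Register both editors so nodes without outgoing edges still appear.
        network.setdefault(reverter, {})
        network.setdefault(reverted, {})
        network[reverter][reverted] = network[reverter].get(reverted, 0) + 1
    return network


def out_degree(network, node):
    """Number of distinct editors that `node` has reverted."""
    return len(network[node])


# Made-up example data.
sample = [("alice", "bob"), ("alice", "bob"), ("bob", "carol")]
network = build_revert_network(sample)

# Round-trip through pickle, as one would when saving the network to disk.
restored = pickle.loads(pickle.dumps(network))
print(network)
print(out_degree(network, "alice"))
```

A dict of dicts keeps edge lookups and weight updates simple. It can be converted to a `numpy` adjacency matrix later if matrix operations are needed.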