This repository contains annotation data and ML models related to the work "Experts and authorities receive disproportionate attention on Twitter during the COVID-19 crisis".
If you intend to use any of these materials, please make sure to cite the work accordingly:
Gligorić, Kristina, et al. "Experts and authorities receive disproportionate attention on Twitter during the COVID-19 crisis." arXiv preprint arXiv:2008.08364 (2020).
```bibtex
@misc{gligori2020experts,
      title={Experts and authorities receive disproportionate attention on Twitter during the COVID-19 crisis},
      author={Kristina Gligorić and Manoel Horta Ribeiro and Martin Müller and Olesia Altunina and Maxime Peyrard and Marcel Salathé and Giovanni Colavizza and Robert West},
      year={2020},
      eprint={2008.08364},
      archivePrefix={arXiv},
      primaryClass={cs.SI}
}
```
User descriptions have been annotated by type and category. The annotation data can be found in this repository.
The CSV file has the following columns:
Column | Description |
---|---|
user.id | Twitter user ID |
category | Consensus category (collapsed) |
type | Consensus type |
tweeting_lang | Language the user usually tweets in |
bio_lang | Language the bio (user description) is written in |
type_1 | Type annotation by annotator 1 |
type_2 | Type annotation by annotator 2 |
type_3 | Type annotation by annotator 3 |
type_4 | Type annotation by annotator 4 (if available) |
category_1 | Categories (uncollapsed) by annotator 1 |
category_2 | Categories (uncollapsed) by annotator 2 |
category_3 | Categories (uncollapsed) by annotator 3 |
category_4 | Categories (uncollapsed) by annotator 4 (if available) |
The annotations contain the following labels:
- Category (collapsed): "art", "business", "healthcare", "media", "ngo", "other", "political_supporter", "politics", "adult_content", "public_services", "religion", "science", "sports"
- Type: "individual", "institution", "unclear"
Additionally, the following category labels have a more fine-grained (uncollapsed) annotation:
- "media": "Media: News", "Media: Scientific News and Communication", "Media: Other Media"
- "science": "Science: Engineering and Technology", "Science: Life Sciences", "Science: Social Sciences", "Science: Other Sciences"
Please refer to our paper for a more detailed explanation of the individual annotations and of how the annotation was performed.
Based on the annotation data provided above, several models have been trained on the category (collapsed) and type objectives.
The models made available are of two types (BERT-like and FastText). For more information on the training procedure, please refer to the SI of the paper.
The easiest way to use the BERT models is to load the PyTorch checkpoints with the huggingface/transformers library. To install, run:

```bash
pip install transformers
```
Then download a suitable PyTorch checkpoint from the table below, extract it with

```bash
tar -xzf <tar_file>
```

and run:
```python
from transformers import pipeline

path_to_model = './category_bert_multilang_pt/'

# We use the sentiment-analysis pipeline type
# (even though our model is not a sentiment analysis model)
pipe = pipeline('sentiment-analysis', model=path_to_model, tokenizer=path_to_model)

# Feed an example input
pipe('artiste et paintre')
# output:
# [{'label': 'art', 'score': 0.9069588780403137}]
```
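The pipeline also accepts a list of inputs, which is convenient for classifying many user descriptions at once. A minimal, self-contained sketch (the example bios below are made up):

```python
from transformers import pipeline

path_to_model = './category_bert_multilang_pt/'
pipe = pipeline('sentiment-analysis', model=path_to_model, tokenizer=path_to_model)

# A list input yields one {'label', 'score'} dict per description
bios = [
    'virologist at a public university',
    'breaking news and analysis',
    'guitarist and songwriter',
]
for bio, result in zip(bios, pipe(bios)):
    print(f'{bio!r} -> {result["label"]} ({result["score"]:.3f})')
```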
The TF checkpoints are intended to be used with tensorflow/models.
To use the FastText models, run:

```bash
pip install fasttext
```

Then download and extract one of the FastText models and run:
```python
import fasttext

model = fasttext.load_model('./category_fasttext/model.bin')
print(model.predict('virologist'))
# (('__label__science',), array([0.98916745]))
```
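`predict` additionally accepts a `k` argument to return the top-k labels together with their probabilities, which is useful for inspecting borderline bios:

```python
import fasttext

model = fasttext.load_model('./category_fasttext/model.bin')

# Top-3 category predictions with probabilities
labels, probs = model.predict('science journalist and writer', k=3)
for label, prob in zip(labels, probs):
    print(label.replace('__label__', ''), f'{prob:.3f}')
```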
Description | Language | Dataset | Identifier | Size | Download |
---|---|---|---|---|---|
BERT multilingual category (BERT-multilingual cased) | multilang | category | bert-multilang-pt | 630MB | PyTorch / TF |
BERT multilingual type (BERT-multilingual cased) | multilang | type | bert-multilang-pt | 630MB | PyTorch / TF |
BERT English category (BERT-large uncased) | en | category | bert-english-pt | 1.2GB | PyTorch / TF |
BERT English type (BERT-large uncased) | en | type | bert-english-pt | 1.2GB | PyTorch / TF |
FastText English category | en | category | fasttext-english | 200MB | Download |
FastText English type | en | type | fasttext-english | 426MB | Download |
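If you prefer to script the setup, the download-and-extract steps above translate directly to Python. A minimal sketch, with the URL left as a placeholder since the concrete links are in the table above:

```python
import tarfile
import urllib.request

# Placeholder: substitute one of the checkpoint links from the table above
url = '<download-url>'
archive = 'model.tar.gz'

urllib.request.urlretrieve(url, archive)
with tarfile.open(archive, 'r:gz') as tar:
    tar.extractall('.')  # yields a directory such as ./category_bert_multilang_pt/
```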