NELA-GT-2019

This repository contains examples of how to use the NELA-GT-2019 dataset with Python 3.

Download the dataset from here: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/O7FWPO

For more details about this dataset, check the paper: https://arxiv.org/abs/2003.08444

If you use this dataset in your work, please cite us as follows:

@misc{
    gruppi2020nelagt2019,
    title={NELA-GT-2019: A Large Multi-Labelled News Dataset for The Study of Misinformation in News Articles},
    author={Maurício Gruppi and Benjamin D. Horne and Sibel Adalı},
    year={2020},
    eprint={2003.08444},
    archivePrefix={arXiv},
    primaryClass={cs.CY}
}

Data

Metadata
| Dataset name | NELA-GT-2019 |
| --- | --- |
| Formats | SQLite3, JSON |
| No. of articles | 1118821 |
| No. of sources | 261 |
| Collection period | 2019-01-01 to 2019-12-31 |

Fields

Each data point collected corresponds to an article and contains the fields described below.

| Field | Type | Description |
| --- | --- | --- |
| id | string | ID of the article |
| date | string | date of publication (YYYY-MM-DD) |
| source | string | name of the source |
| title | string | article's headline |
| content | string | article's body text |
| author | string | author who signed the article |
| published | string | date time string as provided by source |
| published_utc | integer | unix timestamp of publication |
| collection_utc | integer | unix timestamp of collection date |
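The two `_utc` fields are unix timestamps, so they can be turned into calendar dates with the standard library. The following is a minimal sketch, assuming the timestamps are expressed in seconds (this README does not say so explicitly). The helper name `utc_date` and the sample record are hypothetical.

```python
from datetime import datetime, timezone


def utc_date(timestamp):
    """Convert a unix timestamp (assumed to be in seconds) to a YYYY-MM-DD string in UTC."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc).strftime("%Y-%m-%d")


# Made-up record for illustration only, shaped like the fields listed above.
article = {"id": "example-id", "source": "examplesource", "published_utc": 1546300800}
print(utc_date(article["published_utc"]))  # prints 2019-01-01
```

Using an explicit UTC timezone avoids results that depend on the local timezone of the machine running the script.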

Aggregated labels

We provide aggregated labels based on Media Bias/Fact Check reports, classifying each source as:

  • Reliable - class 0
  • Mixed - class 1
  • Unreliable - class 2

These labels can be found in labels.csv

Note: the labels used in this aggregation were collected from Media Bias/Fact Check on Mar 20, 2020.
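As a rough illustration of how the class numbers above could be attached to source names, here is a sketch that reads a labels file with the standard `csv` module. The column names `source` and `aggregated_label` are assumptions, not taken from this README, so check the header of `labels.csv` and pass the real names if they differ. The function name `load_labels` is hypothetical.

```python
import csv

# Class numbers as listed in this README.
CLASS_NAMES = {0: "Reliable", 1: "Mixed", 2: "Unreliable"}


def load_labels(path, source_col="source", label_col="aggregated_label"):
    """Map each source name to its aggregated class name.

    The default column names are assumptions; adjust them to match the CSV header.
    Rows with an empty label cell are skipped.
    """
    labels = {}
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            value = (row.get(label_col) or "").strip()
            if not value:
                continue  # no aggregated label for this source
            # int(float(...)) accepts both "2" and "2.0"
            labels[row[source_col]] = CLASS_NAMES[int(float(value))]
    return labels
```

Calling `load_labels("labels.csv")` would then return a dictionary from source name to `"Reliable"`, `"Mixed"`, or `"Unreliable"`, provided the assumed column names match.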

Examples

load-sqlite3.py

  • How to load the data from the Sqlite3 database using SQL queries.
    • Loading data from single or multiple sources from the database
    • Loading data from the database into a Pandas dataframe

Usage:

python3 load-sqlite3.py <path-to-database>
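For orientation, loading articles from one or more sources might look roughly like the sketch below. It is not the contents of `load-sqlite3.py`. The table name `newsdata` is an assumption (inspect the database schema to confirm it), and the function name `load_articles` is hypothetical. Only the `source` column from the field list above is relied on.

```python
import sqlite3


def load_articles(conn, sources, table="newsdata"):
    """Return all rows whose source is in `sources`, as a list of dicts.

    `table` is an assumed table name. It is interpolated into the SQL text
    because table names cannot be bound as parameters, so pass trusted values only.
    """
    sources = list(sources)
    if not sources:
        return []
    # One "?" placeholder per source keeps the source names safely parameterized.
    placeholders = ", ".join("?" for _ in sources)
    query = f"SELECT * FROM {table} WHERE source IN ({placeholders})"
    cursor = conn.execute(query, sources)
    columns = [description[0] for description in cursor.description]
    return [dict(zip(columns, row)) for row in cursor]
```

With a connection opened via `sqlite3.connect(<path-to-database>)`, `load_articles(conn, ["somesource"])` returns one dict per article. For a Pandas dataframe, the same query string and parameter list can be passed to `pandas.read_sql_query(query, conn, params=sources)`.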

load-json.py

  • How to load NELA in JSON format with Python 3.
    • Loading a single source's JSON
    • Loading a directory of NELA JSON files - WARNING: this consumes a lot of memory

Usage:

python3 load-json.py <path-to-file>
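Because loading a whole directory at once is memory-hungry, one alternative is to walk the files lazily. The sketch below is not the contents of `load-json.py`. It assumes each source file has a `.json` extension and holds a JSON list of article objects, which this README does not confirm. The function names `load_source` and `iter_articles` are hypothetical.

```python
import json
from pathlib import Path


def load_source(path):
    """Load one source file (assumed to contain a JSON list of article objects)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def iter_articles(directory):
    """Yield articles from every *.json file in `directory`, one file at a time.

    Only one source file is held in memory at once, unlike building a single
    list that contains every article from every source.
    """
    for path in sorted(Path(directory).glob("*.json")):
        yield from load_source(path)
```

A running tally such as `collections.Counter(a["source"] for a in iter_articles(directory))` can then be computed without materializing the full collection.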