Datasets

The datasets in the table below have been processed so far. Each dataset has its own folder containing a Python Jupyter notebook and the data (the dataset name links to the corresponding folder in this repository).

Note: Not every dataset provides data suited to the task of fake news detection. The README in each dataset folder lists tasks, that is, ideas for what can be done with that dataset. Do not limit yourself to those, though; use your imagination and creativity.

Processed datasets table

| Dataset | Records count | Attributes count | Labels | Labeling method | Primary language | Entity |
| --- | --- | --- | --- | --- | --- | --- |
| BanFakeNews | 8,501 + 49,977 | 10 (or 8) | fake (0), authentic (1) | according to source (and probably part of the data and click-bait labeled manually by computer science students) | bn | news article |
| BuzzFeedNews Facebook Facts | 2,282 | 12 | mostly true, no factual content, mixture of true and false, mostly false | manual | en | facebook post |
| CREDBANK | 60 million tweets, grouped into 1049 real-world events | - | Certainly Inaccurate (-2), Probably Inaccurate (-1), Uncertain/Doubtful (0), Probably Accurate (+1), Certainly Accurate (+2) | 30 human annotators for each event | en | tweet and event |
| Deception Detection Fake News | 480 + 500 | - | fake, legit | manual fact-checking, creating fake news manually | en | news article |
| Detecting Rumors Microblogs | 1,101,985 + 3,805,656 posts, grouped into 992 + 4,664 events | - | rumor, non-rumor | according to events (events from fact-checking portal snopes and Sina community management center) | zh | tweet, weibo post |
| EANN-KDD18 | 9,528 | - | rumor, non-rumor | official rumor debunking system of Weibo (reported suspicious posts and examined by a committee of trusted users) | zh | tweet |
| Election Day Tweets | 1,327 | 17 | not fake news, fake news (or 5 categories of fake news) | manual by one expert | en | tweet |
| FakeNewsChallenge | 49,972 | 4 | unrelated, discuss, agree, disagree | manual by experts | en | news article |
| FakeNewsCorpus | 9,408,908 | 16 | fake, satire, bias, conspiracy, state, junksci, hate, clickbait, unreliable, political, reliable | using domain (with usage of OpenSources) | en | news article |
| Fake News detection - Kaggle | 4,009 | 4 | 1 (real), 0 (fake) | unknown | en | news article |
| Fake News - Kaggle | 20,800 | 5 | reliable, unreliable | unknown | en | news article |
| FakeNewsNet | 23,196 | 5 | real, fake | according to fact-checking websites (like politifact.com) | en | news article and tweet |
| Fake News vs Satire | 492 | 6 | fake, satire | manual by researchers (also provided explanation/proof) | en | news article |
| Fakeddit | 1,063,106 | 16 | fake (probably 0) or not (probably 1), or 3-way labeling and 6-way labeling (see appropriate README) | according to subreddit's theme, automated quality checks and manually checked 150 of them for test | en | reddit post |
| FEVER | 185,445 | 5 | refutes, not enough info, supports | manual, multiple levels of labels verification | en | claim |
| GeorgeMcIntire/fake_real_news_dataset | 6,335 | 3 | REAL, FAKE | unknown | en | news article |
| Getting real about Fake News - Kaggle | 12,999 | 20 | bias, conspiracy, hate, satire, state, junksci, fake, bs | using domain (with usage of OpenSources) | en | news article |
| Hack the Fake News | 2,815 + 761 | 6 | fake news (3) or not (1) | manual by students of journalism | bg | news article |
| HoaxDataset | 128 | - | Hoax, Nonhoax | manual by experts | en | wikipedia article |
| LIAR | 10,240 + 1,267 + 1,284 | 14 | barely true counts, false counts, half true counts, mostly true counts, pants on fire counts | according to fact-checking websites (like politifact.com) | en | statement |
| Misinfofinder | 248 | 13 | 1 (misinformative), 0 (non-misinformative) | manual by authors | en | comment post |
| Monant API | - | - | - | several labeling methods | en, sk | news article, discussion post, fact-checking article, claim |
| News Credibility | 6,076 | 9 | fake news, credible news (according to paper) | according to source | bg | news article |
| OpenSources | 833 | 5 | bias, clickbait, conspiracy, fake, hate, junksci, satire, political, reliable, rumor, state, unreliable, blog, satirical | manual by experts (only websites are labeled) | en | news website |
| PHEME | 5,802 | - | rumour, non-rumours | manual by journalists | en | tweet |
| This Just In | 225 + 101 | 2 | fake, real, satire | according to source (and additional filtering) | en | news article |
| WeFEND-AAAI20 | 10,587 + 10,141 | 6 | 1 (fake), 0 (real) | manual by experts, considering title only | zh | news article |
| WSDM - Fake News Classification - Kaggle | 320,552 | 8 | unrelated, agreed, disagreed | probably by experts | en/zh | news article |

Note: The Primary language column contains ISO 639-1 (2-letter) language codes.
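
To pick datasets by language or size, the table cells can be read programmatically. The following is a minimal sketch, not part of this repository: the names `DatasetRow`, `total_records`, `language_codes`, and `datasets_in_language` are hypothetical, and only a handful of rows are copied in by hand. It also assumes that a count written as `a + b` can be summed into one total; the table does not say what the parts are, so check the dataset's own README before relying on that.

```python
from dataclasses import dataclass
from typing import List, Optional

# ISO 639-1 codes that appear in the table, with their English names.
LANGUAGE_NAMES = {
    "bg": "Bulgarian",
    "bn": "Bengali",
    "en": "English",
    "sk": "Slovak",
    "zh": "Chinese",
}


@dataclass
class DatasetRow:  # hypothetical container for three of the table's columns
    name: str
    records: str    # "Records count" cell, kept exactly as written
    languages: str  # "Primary language" cell, e.g. "en", "en, sk", "en/zh"


def total_records(cell: str) -> Optional[int]:
    """Sum a cell such as "10,240 + 1,267 + 1,284".

    Assumption: the parts joined by "+" may be added together.
    Returns None for cells that are not purely numeric ("-", free text).
    """
    try:
        return sum(int(part.strip().replace(",", "")) for part in cell.split("+"))
    except ValueError:
        return None


def language_codes(cell: str) -> List[str]:
    """Split a language cell on "," or "/" into individual codes."""
    return [code.strip() for code in cell.replace("/", ",").split(",")]


def datasets_in_language(rows: List[DatasetRow], code: str) -> List[str]:
    """Names of the datasets whose language cell includes the given code."""
    return [row.name for row in rows if code in language_codes(row.languages)]


# A few rows copied by hand from the table above.
ROWS = [
    DatasetRow("BanFakeNews", "8,501 + 49,977", "bn"),
    DatasetRow("Hack the Fake News", "2,815 + 761", "bg"),
    DatasetRow("LIAR", "10,240 + 1,267 + 1,284", "en"),
    DatasetRow("Monant API", "-", "en, sk"),
    DatasetRow("News Credibility", "6,076", "bg"),
    DatasetRow("WSDM - Fake News Classification - Kaggle", "320,552", "en/zh"),
]

if __name__ == "__main__":
    for row in ROWS:
        names = [LANGUAGE_NAMES[c] for c in language_codes(row.languages)]
        print(row.name, total_records(row.records), names)
```

Cells that mix numbers with text, such as the CREDBANK and Detecting Rumors Microblogs counts, deliberately come back as `None` rather than a guessed number.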