Fake News Detection

Check installation guide to run the code.

Dataset

text: str
embeddings: BERT embeddings of the text
label:
- 0: real
- 1: fake

Results

Zero Shot Classification

Model	Dataset	Accuracy	Precision	Recall	F1
bart-large-mnli	KDD2020	0.6232	0.6699	0.6232	0.5285
bart-large-mnli	GonzaloA/fake_news	0.5461	0.5899	0.5461	0.4210
mdeberta-v3-base-mnli	KDD2020	0.5972	0.5780	0.5972	0.5685
mdeberta-v3-base-mnli	GonzaloA/fake_news	0.6088	0.6164	0.6088	0.5876

Fine-tuning on KDD2020

Model	Accuracy	F1	Loss
bert-base-uncased	0.7835671342685371	0.7807691946073421	0.46070271730422974
distilbert-base-uncased	0.7655310621242485	0.7673580407434882	0.46802181005477905
roberta-base	0.8156312625250501	0.8125283885140466	0.4082072079181671

Fine-tuning on GonzaloA/fake_news

Model	Accuracy	F1	Loss
bert-base-uncased	0.9882961685351731	0.9882994966274701	0.026664618402719498
distilbert-base-uncased	0.986694591597881	0.9866977733534859	0.029809903353452682
roberta-base	0.986694591597881	0.9866872535155707	0.024871505796909332

Few Shot Learning on KDD2020

Samples	Model	Accuracy	F1	Loss
10	bert-base-uncased	0.5751503006012024	0.5691223701021529	0.6813642978668213
10	distilbert-base-uncased	0.5591182364729459	0.5447740312435575	0.6902174949645996
10	roberta-base	0.5751503006012024	0.5691223701021529	0.6813642978668213
100	bert-base-uncased	0.591182364729459	0.4392916815999757	0.6831763982772827
100	distilbert-base-uncased	0.591182364729459	0.4392916815999757	0.677597165107727
100	roberta-base	0.591182364729459	0.4392916815999757	0.6732369661331177

Few Shot Learning on GonzaloA/fake_news

Samples	Model	Accuracy	F1	Loss
10	bert-base-uncased	0.7393125538992239	0.7269233177356373	0.570989727973938
10	distilbert-base-uncased	0.48367623506221513	0.3341564119711251	0.6532924771308899
10	roberta-base	0.46593569052605643	0.2961877101553988	0.6716976761817932
100	bert-base-uncased	0.9392632746088456	0.9391815954667218	0.3166244924068451
100	distilbert-base-uncased	0.9275594431440187	0.9273481720070647	0.5080302953720093
100	roberta-base	0.9700628310952323	0.970072407552122	0.2685811221599579

Model Comparison on Different Dataset

Model Generalization (Cross-dataset)

We inspect the generalization of the models trained on one dataset and tested on another datase, the accuracy is used as the evaluation metric.

Train on GonzaloA/fake_news and test on KDD2020

Model	Train Sample	In-dataset (GonzaloA/fake_news)	cross-dataset KDD2020
bert-base-uncased	full	0.9882961685351731	0.42685370741482964
distilbert-base-uncased	full	0.986694591597881	0.5150300601202404
roberta-base	full	0.986694591597881	0.503006012024048
distilbert-base-uncased	10	0.48367623506221513	0.4088176352705411
distilbert-base-uncased	100	0.9275594431440187	0.46292585170340683

Train on KDD2020 and test on GonzaloA/fake_news

Model	Train Sample	In-dataset (KDD2020)	cross-dataset GonzaloA/fake_news
bert-base-uncased	full	0.7835671342685371	0.37058026364420354
distilbert-base-uncased	full	0.7655310621242485	0.42540347418997165
roberta-base	full	0.8156312625250501	0.7220648022668473
distilbert-base-uncased	10	0.5591182364729459	0.3767401749414808
distilbert-base-uncased	100	0.591182364729459	0.5340643094739436

Graph Neural Network Pipeline

Fetch Text-based dataser from Huggingface
Embed the text using BERT encoder
Create an empty graph - embeddings are nodes (but no edges)
Build edges between nodes based on the cosine similarity of embeddings

KNN Builder: each node is connected to its k-nearest neighbors
ThresholdNN Builder: each node is connected to {quantile_factor} of its neighbors

Train a GNN model on the graph (with only labeled_masked nodes)

Run the GNN pipeline

build the embeddings graph (this should build an empty graph with only embeddings as nodes)

dataset_name currently only supports kdd2020

python build_graph.py --dataset_name $dataset_name

if you have a pre-built graph, you can use the --graph argument to load the graph

python build_graph.py --prebuilt_graph graph.pt

choose the edge_policy from knn or thresholdnn (default is thresholdnn)

python build_graph.py --prebuilt_graph graph.pt --edge_policy thresholdnn --threshold_factor $factor

Train the GNN model

python train_graph.py --graph graph.pt

example:

train_graph.py --graph ./graph/kdd2020/train_full_val_full_test_full_labeled_100_knn_$k.pt

Installation

Create a new conda environment

conda create -n fakenews python=3.12
conda activate fakenews

Install PyTorch (based on your CUDA version) (Official Doc)

conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia

Install PyTorch Geometric (Official Doc)

pip install torch-geometric

Install Additional Libraries for GNN (Based on your torch version)

pip install pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-2.4.0+cu121.html

Install other dependencies

pip install -r requirements.txt

Name		Name	Last commit message	Last commit date
Latest commit History 114 Commits
analyze		analyze
dataset		dataset
finetune_results		finetune_results
graph		graph
model		model
plot		plot
script		script
tsne		tsne
utils		utils
weights		weights
.gitignore		.gitignore
README.md		README.md
analyze_graph.py		analyze_graph.py
build_graph.py		build_graph.py
finetune_lm.py		finetune_lm.py
finetune_lm_res_util.py		finetune_lm_res_util.py
requirements.txt		requirements.txt
train_graph.py		train_graph.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Fake News Detection

Dataset

KDD2020

TFG

GossipCop

PolitiFact

Results

Zero Shot Classification

Fine-tuning on KDD2020

Fine-tuning on GonzaloA/fake_news

Few Shot Learning on KDD2020

Few Shot Learning on GonzaloA/fake_news

Model Comparison on Different Dataset

Model Generalization (Cross-dataset)

Graph Neural Network Pipeline

Run the GNN pipeline

Installation

About

Contributors 3

Languages

LittleFish-Coder/fake-news-detection

Folders and files

Latest commit

History

Repository files navigation

Fake News Detection

Dataset

KDD2020

TFG

GossipCop

PolitiFact

Results

Zero Shot Classification

Fine-tuning on KDD2020

Fine-tuning on GonzaloA/fake_news

Few Shot Learning on KDD2020

Few Shot Learning on GonzaloA/fake_news

Model Comparison on Different Dataset

Model Generalization (Cross-dataset)

Graph Neural Network Pipeline

Run the GNN pipeline

Installation

About

Resources

Stars

Watchers

Forks

Contributors 3

Languages