This repository contains code and data for detecting and classifying propaganda techniques in text using two models: a Multi-N-gram model and a GPT-2-based model.
These two tasks are based on Task 11 of the 2020 International Workshop on Semantic Evaluation (SemEval-2020).
The report for this project can be found in the file `Accompanying Report.pdf`.
The dataset includes 2,994 text spans extracted from news articles, each labeled with one of nine propaganda techniques, and split 80:20 into train and test sets.
- Span Identification: Binary classification to detect the presence of propaganda.
- Technique Classification: Multi-class classification to identify specific propaganda techniques.
- Multi-N-gram Model: Uses N-grams to classify text based on language patterns; a sketch follows this list.
- GPT-2 Model: Feeds GPT-2's contextual representations into a regression head for classification; a second sketch follows.
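A minimal sketch of how an n-gram classifier of this kind could work, assuming per-class unigram counts with add-one (Laplace) smoothing; the class `NGramClassifier` and all names here are illustrative, not the repository's actual API:

```python
import math
from collections import Counter, defaultdict

class NGramClassifier:
    """Illustrative per-class unigram model with add-one smoothing."""

    def __init__(self):
        self.counts = defaultdict(Counter)  # class -> token counts
        self.totals = Counter()             # class -> total token count
        self.vocab = set()

    def fit(self, spans, labels):
        for text, label in zip(spans, labels):
            tokens = text.lower().split()
            self.counts[label].update(tokens)
            self.totals[label] += len(tokens)
            self.vocab.update(tokens)

    def predict(self, text):
        tokens = text.lower().split()
        vocab_size = len(self.vocab)
        best_label, best_score = None, float("-inf")
        for label in self.counts:
            # Sum of log-probabilities under Laplace smoothing
            score = sum(
                math.log((self.counts[label][t] + 1)
                         / (self.totals[label] + vocab_size))
                for t in tokens
            )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


clf = NGramClassifier()
clf.fit(["smear the opponent", "think of the children"],
        ["Smears", "Appeal_to_fear"])
print(clf.predict("smear campaign"))
```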
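One plausible way to wire GPT-2 into a classification head, using the Hugging Face `transformers` library; the mean pooling, dropout rate, and linear head below are assumptions, and the repository's actual architecture may differ:

```python
import torch
import torch.nn as nn
from transformers import GPT2Model, GPT2Tokenizer

class GPT2SpanClassifier(nn.Module):
    """Illustrative GPT-2 encoder with a linear head; details are assumptions."""

    def __init__(self, num_classes: int):
        super().__init__()
        self.gpt2 = GPT2Model.from_pretrained("gpt2")
        self.dropout = nn.Dropout(0.1)  # regularization, as suggested in future work
        self.head = nn.Linear(self.gpt2.config.hidden_size, num_classes)

    def forward(self, input_ids, attention_mask):
        hidden = self.gpt2(input_ids=input_ids,
                           attention_mask=attention_mask).last_hidden_state
        # Mean-pool over non-padding tokens to get one vector per span
        mask = attention_mask.unsqueeze(-1).float()
        pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
        return self.head(self.dropout(pooled))


tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = GPT2SpanClassifier(num_classes=9)  # nine propaganda techniques

batch = tokenizer(["think of the children"], return_tensors="pt",
                  padding=True, truncation=True)
logits = model(batch["input_ids"], batch["attention_mask"])
print(logits.shape)  # torch.Size([1, 9])
```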
| Model | Task 1 (Span Identification) F1 | Task 2 (Technique Classification) F1 |
|---|---|---|
| Unigram Model | 0.672 | 0.368 |
| Bigram Model | 0.639 | 0.163 |
| GPT-2 Model | 0.890 | 0.590 |
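Scores like those above could be computed with scikit-learn's `f1_score`; a minimal sketch with illustrative labels (the averaging mode for Task 2 is an assumption, since the report may use a different one):

```python
from sklearn.metrics import f1_score

# Illustrative labels; the repository's actual evaluation data would replace these
gold_spans = [1, 0, 1, 1]
pred_spans = [1, 0, 0, 1]
gold_techniques = ["Loaded_Language", "Smears", "Doubt"]
pred_techniques = ["Loaded_Language", "Doubt", "Doubt"]

# Task 1 (binary span identification): positive-class F1
print(f1_score(gold_spans, pred_spans))

# Task 2 (multi-class technique classification): micro-averaged F1
print(f1_score(gold_techniques, pred_techniques, average="micro"))
```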
- Multi-N-gram: Enhance smoothing techniques and reduce data sparsity.
- GPT-2: Improve the classification head, add dropout layers, and weight span length more heavily.