
The source code and dataset for comparing NLP techniques used to detect depression from tweets, including preprocessing, model implementations, and evaluation metrics.


BashMocha/Automated-Depression-Detectiom-from-Tweets


Automated Depression Detection from Tweets: a Comparison of NLP Techniques

Official implementation of the IDAP 2024 paper.

Emirhan Balcı*, Esra Saraç

Abstract

This paper aims to classify suicidal ideation, as a symptom of depression, in social media posts by applying the state-of-the-art classification model BERT (Bidirectional Encoder Representations from Transformers) and three traditional machine learning algorithms for binary classification. Since depression is one of the most prevalent psychiatric disorders, we present an experimental analysis of machine learning classifier results as a comparison of novel depression detection techniques. We used undiagnosed user posts from Twitter as our dataset and evaluated the fine-tuned BERT model using hold-out and 10-fold cross-validation. Because the dataset is highly unbalanced, the Support Vector Machine (SVM), Naive Bayes, and Random Forest algorithms were applied to the same dataset both with and without the oversampling method SMOTE (Synthetic Minority Oversampling Technique). The results demonstrate that the traditional machine learning classifiers cannot infer sentiment from data containing varied linguistic cues, such as depression symptoms. In contrast, the state-of-the-art BERT model achieves macro and micro F-measure values of 99.29% and 99.56%, respectively, surpassing the traditional machine learning algorithms on both metrics. As a robust solution for detecting depression from textual data, the BERT model is more trustworthy than the traditional machine learning classifiers at detecting specific cues related to depression and similar mental disorders. This study contributes to natural language processing research by showing the performance difference between BERT and several traditional machine learning algorithms as a generalized approach to classification tasks.
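The abstract compares the traditional classifiers with and without SMOTE. The sketch below is a rough, simplified illustration of the oversampling step, written in plain Python. Each synthetic sample is placed at a random point on the line segment between a randomly chosen minority-class sample and one of its k nearest minority-class neighbours. The function name `smote`, its parameters (`n_synthetic`, `k`, `seed`), and the toy vectors are all hypothetical. The abstract does not state which feature representation or SMOTE settings the study used, so this is not the repository's implementation. In practice, the imbalanced-learn class `imblearn.over_sampling.SMOTE` (used through its `fit_resample` method) is the usual choice.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Simplified SMOTE sketch (hypothetical helper): return `n_synthetic` new vectors.

    Each synthetic vector is x + gap * (nn - x), where x is a randomly chosen
    minority sample, nn is one of x's k nearest minority neighbours
    (Euclidean distance), and gap is drawn uniformly from [0, 1).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    # Precompute the k nearest minority neighbours of every minority sample.
    neighbours = []
    for i, x in enumerate(minority):
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(x, minority[j]))
        neighbours.append(others[:k])

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        nn = minority[rng.choice(neighbours[i])]
        gap = rng.random()
        # Interpolate feature-wise along the segment from x towards nn.
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic


if __name__ == "__main__":
    # Hypothetical 2-D minority-class feature vectors.
    minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for point in smote(minority, n_synthetic=3, k=2):
        print(point)
```

Every synthetic point is an interpolation between two existing minority samples, so it always lies inside their convex hull. One common caveat reflects general good practice and does not describe the paper's setup: oversample only the training split, inside each cross-validation fold. Otherwise, synthetic points interpolated from evaluation samples leak into training.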

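The abstract reports BERT's results as macro and micro F-measure values. On an unbalanced binary dataset these two averages can diverge sharply, which is why reporting both is informative. The sketch below is a minimal illustration and assumes the standard definitions. Macro F1 is the unweighted mean of the per-class F1 scores. Micro F1 is F1 computed over true positives, false positives, and false negatives pooled across classes, matching scikit-learn's `f1_score` with `average="macro"` and `average="micro"`. The function names and toy labels are hypothetical, and the abstract does not say how the repository computes these scores.

```python
def f1_from_counts(tp, fp, fn):
    """F1 = 2*tp / (2*tp + fp + fn); defined as 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def confusion_counts(y_true, y_pred, label):
    """Return (tp, fp, fn) when `label` is treated as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    return tp, fp, fn


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1: every class counts equally."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = [f1_from_counts(*confusion_counts(y_true, y_pred, l)) for l in labels]
    return sum(scores) / len(scores)


def micro_f1(y_true, y_pred):
    """F1 over tp/fp/fn pooled across classes: every sample counts equally."""
    labels = sorted(set(y_true) | set(y_pred))
    tp = fp = fn = 0
    for label in labels:
        c_tp, c_fp, c_fn = confusion_counts(y_true, y_pred, label)
        tp, fp, fn = tp + c_tp, fp + c_fp, fn + c_fn
    return f1_from_counts(tp, fp, fn)


if __name__ == "__main__":
    # Hypothetical, heavily unbalanced toy labels: 90 negatives (0), 10 positives (1).
    y_true = [0] * 90 + [1] * 10
    # A degenerate classifier that never predicts the minority class.
    y_pred = [0] * 100
    print(round(macro_f1(y_true, y_pred), 4))  # 0.4737
    print(round(micro_f1(y_true, y_pred), 4))  # 0.9
```

For single-label classification, micro F1 equals accuracy. In this toy case, a classifier that ignores the minority class therefore still reaches a micro F1 of 0.9, while its macro F1 falls below 0.5.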

Updates

17/10/2024: The paper was published in IEEE Xplore.

19/09/2024: We released the dataset and the source code.

11/09/2024: The paper was accepted at IDAP'24! 🎉

15/08/2024: The paper was submitted to the symposium.

Citation

If you find the dataset or code useful, please cite:

@inproceedings{balci_automated_2024,
	title = {Automated {Depression} {Detection} from {Tweets}: a {Comparison} of {NLP} {Techniques}},
	doi = {10.1109/IDAP64064.2024.10711029},
	booktitle = {2024 8th {International} {Artificial} {Intelligence} and {Data} {Processing} {Symposium} ({IDAP})},
	author = {Balcı, Emirhan and Saraç, Esra},
	year = {2024},
}

License

GNU General Public License v3.0


Feel free to contact us with any questions.
