GitHub - BashMocha/Automated-Depression-Detectiom-from-Tweets: The source code and dataset for comparing NLP techniques used to detect depression from tweets, including preprocessing, model implementations, and evaluation metrics.

Automated Depression Detection from Tweets: a Comparison of NLP Techniques

Official implementation of the IDAP 2024 paper.

Emirhan Balcı*, Esra Saraç

Abstract

This paper aims to classify suicidal ideation as a symptom of depression from social media posts by applying the state-of-the-art classification model BERT (Bidirectional Encoder Representations from Transformers) and three traditional machine learning algorithms for binary classification. Since depression is one of the most prevalent mental health disorders amongst psychiatric disorders, the authors intended to present an experimental analysis of the machine learning classifier results as a comparison of novel depression detection techniques. We utilized undiagnosed user posts from Twitter as our dataset and tested the fine-tuned BERT model by applying hold-out and 10-fold cross-validation techniques. Since the dataset is highly unbalanced, Support Vector Machine (SVM), Naive Bayes, and Random Forest algorithms were employed on the same dataset with and without the oversampling method SMOTE (Synthetic Minority Oversampling Technique). The results demonstrate that traditional machine learning classifiers cannot infer sentiment from data containing various linguistic cues, such as depression symptoms. On the other hand, the state-of-the-art model BERT achieves 99.29% and 99.56% macro and micro-F-measure values, respectively, surpassing traditional machine learning algorithms in terms of these metrics. As a robust solution to depression detection from textual data, the BERT model is more trustworthy than the traditional machine learning classifiers to detect specific cues related to depression and similar mental disorders. This study contributes to the relevant research areas of natural language processing by indicating the performance difference between the BERT model and several traditional machine learning algorithms as a generalized approach for classification tasks.
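The abstract reports both macro and micro F-measure, which diverge on an unbalanced dataset like this one. The following is a minimal sketch of how the two averages differ, not the authors' evaluation code. The function name `f_measures` and the toy labels are hypothetical, chosen only for illustration. In practice a library routine such as scikit-learn's `f1_score` with `average="macro"` or `average="micro"` would normally be used.

```python
from collections import Counter


def f_measures(y_true, y_pred):
    """Return (macro_f1, micro_f1) for single-label classification.

    Hypothetical helper for illustration; not taken from the repository.
    """
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        raise ValueError("y_true and y_pred must not be empty")

    # Per-class true positives, false positives, and false negatives.
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Macro: average the per-class F1 scores, so each class counts equally.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    # Micro: pool the counts over all classes, so each sample counts equally.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro


# Made-up toy labels (1 = depressive, 0 = non-depressive), unbalanced 8:2.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 8 + [1, 0]  # one minority-class tweet is missed

macro, micro = f_measures(y_true, y_pred)
print(f"macro-F1={macro:.4f} micro-F1={micro:.4f}")
# prints: macro-F1=0.8039 micro-F1=0.9000
```

In this toy case a single missed minority-class example lowers the macro score far more than the micro score, because the macro average weights the small class as heavily as the large one. That is why reporting both is informative when the classes are unbalanced.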

Code | Paper | Data

Updates

17/10/2024: The paper is published in IEEE Xplore.

19/09/2024: We release the utilized dataset and the source code.

11/09/2024: The study is accepted by IDAP'24! 🎉

15/08/2024: The paper is submitted to the symposium.

Citation

If you find the dataset or code useful, please cite:

@inproceedings{balci_automated_2024,
	title = {Automated {Depression} {Detection} from {Tweets}: a {Comparison} of {NLP} {Techniques}},
	doi = {10.1109/IDAP64064.2024.10711029},
	booktitle = {2024 8th {International} {Artificial} {Intelligence} and {Data} {Processing} {Symposium} ({IDAP})},
	author = {Balcı, Emirhan and Saraç, Esra},
	year = {2024},
}

License

GNU General Public License v3.0

Feel free to contact us with any questions.

