Hateful speech Hinglish social media Paper

This repository contains the dataset used by the authors in the paper "Role of Artificial Intelligence in Detection of Hateful Speech for Hinglish Data on Social Media". It was collected and labelled by the authors using the approach described in the paper. The data was gathered from various social media platforms, including Instagram, YouTube, Twitter and Reddit, and has been cleaned manually.

The labels of the dataset are as follows:

| | English | Hinglish | Hindi |
|---|---|---|---|
| Non Hate | 0 | 2 | 4 |
| Hate | 1 | 3 | 5 |
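
Each label in the table above encodes two things at once: the language of the text and whether it is hateful. The following is a minimal sketch of how a label could be split back into those two parts. The function name `decode_label` is hypothetical and not part of this repository; it only relies on the numbering shown in the table.

```python
# Languages in the order used by the label table: 0/1, 2/3, 4/5.
LANGUAGES = ("English", "Hinglish", "Hindi")


def decode_label(label):
    """Split a dataset label (0-5) into (language, is_hate).

    Even labels are "Non Hate", odd labels are "Hate"; each consecutive
    pair of labels belongs to one language, as in the table above.
    """
    label = int(label)  # accept labels read from a CSV as strings
    if not 0 <= label <= 5:
        raise ValueError(f"label must be between 0 and 5, got {label}")
    return LANGUAGES[label // 2], label % 2 == 1


for value in range(6):
    language, is_hate = decode_label(value)
    print(value, language, "Hate" if is_hate else "Non Hate")
```

A helper like this can be convenient when collapsing the six classes into a binary hate / non-hate task, or when filtering the data by language.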

The dataset may sometimes show unreadable Unicode characters when opened in Excel. If that happens, we suggest using other tools that support bidirectional Unicode characters, since the data contains instances of Hindi. Recommended tools include Jupyter Notebook, Notepad, etc.
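
As an alternative to opening the file in a spreadsheet, the CSV can be read programmatically with an explicit text encoding. The sketch below uses only the Python standard library. It assumes the file is UTF-8 encoded and makes no assumption about column names, since neither is stated here; `read_rows` is a hypothetical helper, not part of this repository.

```python
import csv
from pathlib import Path


def read_rows(path, encoding="utf-8"):
    """Read a CSV file into a list of rows, decoding it with an explicit encoding.

    The default of UTF-8 is an assumption; pass a different encoding if the
    Hindi text still comes out garbled.
    """
    with open(path, newline="", encoding=encoding) as handle:
        return list(csv.reader(handle))


# File name as listed in this repository; adjust the path to your local copy.
csv_path = Path("Hate_speech_dataset.csv")
if csv_path.exists():
    rows = read_rows(csv_path)
    print(len(rows), "rows read")
    for row in rows[:3]:
        print(row)
```

If the first few rows print with readable Devanagari text, the file itself is intact and the problem lies with how the spreadsheet tool guesses the encoding.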

Please acknowledge the authors if you use any part of the dataset for your research or experiments, and please keep the usage fair and trusted. The shared dataset has been processed to anonymize all usernames from the various platforms, and we hope this anonymity is maintained in any usage. The dataset is meant strictly for research purposes.

The paper can be accessed online at Springer

Cite:

@InProceedings{10.1007/978-981-16-3067-5_8,
author="Srivastava, Ananya
and Hasan, Mohammed
and Yagnik, Bhargav
and Walambe, Rahee
and Kotecha, Ketan",
editor="Choudhary, Ankur
and Agrawal, Arun Prakash
and Logeswaran, Rajasvaran
and Unhelkar, Bhuvan",
title="Role of Artificial Intelligence in Detection of Hateful Speech for Hinglish Data on Social Media",
booktitle="Applications of Artificial Intelligence and Machine Learning",
year="2021",
publisher="Springer Singapore",
address="Singapore",
pages="83--95",
abstract="Social networking platforms provide a conduit to disseminate our ideas, views, and thoughts and proliferate information. This has led to the amalgamation of English with natively spoken languages. Prevalence of Hindi-English code-mixed data (Hinglish) is on the rise with most of the urban population all over the world. Hate speech detection algorithms deployed by most social networking platforms are unable to filter out offensive and abusive content posted in these code-mixed languages. Thus, the worldwide hate speech detection rate of around 44{\%} drops even more considering the content in Indian colloquial languages and slangs. In this paper, we propose a methodology for efficient detection of unstructured code-mix Hinglish language. Fine-tuning-based approaches for Hindi-English code-mixed language are employed by utilizing contextual-based embeddings such as embeddings for language models (ELMo), FLAIR, and transformer-based bidirectional encoder representations from transformers (BERT). Our proposed approach is compared against the pre-existing methods and results are compared for various datasets. Our model outperforms the other methods and frameworks.",
isbn="978-981-16-3067-5"
}
