Commit

Updated Readme.md
manikandan-ravikiran authored Feb 25, 2021
1 parent fa84f41 commit bdec911
Showing 1 changed file with 2 additions and 4 deletions.
6 changes: 2 additions & 4 deletions README.md
@@ -1,8 +1,6 @@
# DOSA
Dravidian Code-Mixed Offensive Span Identification Dataset


Hi, thanks for visiting this Git repository. We will soon update this repo with the dataset files.
Dataset for the paper "DOSA: Dravidian Code-Mixed Offensive Span Identification Dataset", to appear at the First Workshop on Speech and Language Technologies for Dravidian Languages.

This paper presents the Dravidian Offensive Span Identification Dataset (DOSA) for under-resourced Tamil-English and Kannada-English code-mixed text. The dataset addresses the lack of code-mixed datasets with annotated offensive spans by extending the annotations of existing code-mixed offensive language identification datasets. It provides span annotations for Tamil-English and Kannada-English code-mixed comments posted by users on YouTube. Overall, the dataset consists of 4786 Tamil-English comments with 6202 annotated spans and 1097 Kannada-English comments with 1641 annotated spans, each annotated by two different annotators. We also present baseline experimental results on the dataset to encourage research in under-resourced languages, an essential step towards semi-automated content moderation in Dravidian languages.

