Updated Readme.md

manikandan-ravikiran · Feb 25, 2021 · bdec911
1 parent fa84f41
Showing 1 changed file with 2 additions and 4 deletions.
diff --git a/README.md b/README.md
@@ -1,8 +1,6 @@
 # DOSA
-Dravidian Code-Mixed Offensive Span Identification Dataset
-
-
-Hi, Thanks for visiting this git repository. We will be soon updating this repo with the dataset files.
+Dataset for the paper "DOSA: Dravidian Code-Mixed Offensive Span Identification Dataset", to appear at the First Workshop on Speech and Language Technologies for Dravidian Languages.
 
+This paper presents the Dravidian Offensive Span Identification Dataset (DOSA) for under-resourced Tamil-English and Kannada-English code-mixed text. The dataset addresses the lack of code-mixed datasets with annotated offensive spans by extending the annotations of existing code-mixed offensive language identification datasets. It provides span annotations for Tamil-English and Kannada-English code-mixed comments posted by users on YouTube. Overall, the dataset consists of 4786 Tamil-English comments with 6202 annotated spans and 1097 Kannada-English comments with 1641 annotated spans, each annotated by two different annotators. We also present baseline experimental results on the dataset to encourage research in under-resourced languages, an essential step towards semi-automated content moderation in Dravidian languages.