TeQuAD

Recent state-of-the-art models have advanced the Natural Language Processing field; Machine Reading Comprehension (MRC) tasks in particular have improved with the help of datasets such as SQuAD (Stanford Question Answering Dataset). Large, high-quality datasets are essential for measuring progress in MRC for low-resource languages such as Telugu. In this paper, we present TeQuAD, a Telugu Question Answering Dataset of 82k parallel triplets created by translating triplets from SQuAD. We also introduce a few methods for creating such question answering datasets for low-resource languages. We then show that our models outperform the baseline models in both Monolingual and Cross-Lingual Machine Reading Comprehension (CLMRC) setups, with one of them achieving an F1 score of 83% and an Exact Match (EM) score of 61%.
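The F1 and EM figures above are SQuAD-style answer-level metrics. As a hedged illustration, assuming TeQuAD is scored like SQuAD v1.1 (the exact normalization the authors used is not stated in this README), the sketch below computes both: EM checks whether the normalized prediction equals a gold answer, and F1 measures token overlap between them, taking the maximum over all gold answers for a question. One deliberate deviation from the official SQuAD script is assumed: English articles are not stripped, since they do not apply to Telugu.

```python
import string
from collections import Counter
from typing import Dict, List, Tuple


def normalize_answer(text: str) -> str:
    """Lowercase, drop ASCII punctuation, and collapse whitespace.

    Assumption: unlike the official SQuAD v1.1 script, English articles
    (a/an/the) are NOT removed. Lowercasing is a no-op for Telugu script.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(gold))


def f1_score(prediction: str, gold: str) -> float:
    """Harmonic mean of token-level precision and recall (whitespace tokens)."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    # Multiset intersection counts each shared token at most min(count) times.
    num_same = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def squad_scores(
    predictions: Dict[str, str], golds: Dict[str, List[str]]
) -> Tuple[float, float]:
    """Corpus-level (EM, F1) as percentages; max over gold answers per question."""
    em_total = f1_total = 0.0
    for qid, answers in golds.items():
        pred = predictions.get(qid, "")  # a missing prediction scores zero
        em_total += max(exact_match(pred, a) for a in answers)
        f1_total += max(f1_score(pred, a) for a in answers)
    n = len(golds)
    return 100.0 * em_total / n, 100.0 * f1_total / n
```

Because F1 gives partial credit for overlapping tokens while EM does not, F1 is always at least as high as EM on the same predictions, which is consistent with the gap between the two reported scores.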

Find the data here: https://drive.google.com/drive/folders/1b5pfbwANwzcZuq7CrM5jL-jBi1gkhwXw?usp=share_link

Telugu SQuAD correction guidelines: https://docs.google.com/document/d/1dwSe8voWvZ023VXmNP6apDyPQmKoYPP90JWpDelNO7M/edit?usp=sharing

QA evaluation guidelines: https://docs.google.com/document/d/1i8WLxWK5zEu4Wm9FlcGc1CHfNRKnf2INODEE8BJ5Ohc/edit?usp=sharing
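The layout of the released files is not documented in this README. Assuming they mirror the SQuAD v1.1 JSON structure (data, then paragraphs, then qas, then answers), which a translation of SQuAD would most naturally preserve, the sketch below flattens a file into (context, question, answer) triplets. Because translation changes character offsets, it also includes a hypothetical alignment check and a naive re-alignment that searches for the translated answer text in the translated context. The names `Triplet`, `is_aligned`, `realign`, and `load_triplets` are illustrative, and the re-alignment heuristic is not necessarily one of the methods the authors describe.

```python
import json
from typing import Iterator, List, NamedTuple, Optional


class Triplet(NamedTuple):
    qid: str
    context: str
    question: str
    answer_text: str
    answer_start: int  # character offset of the answer inside `context`


def iter_triplets(squad_json: dict) -> Iterator[Triplet]:
    """Yield one triplet per gold answer, assuming SQuAD v1.1 field names."""
    for article in squad_json["data"]:
        for para in article["paragraphs"]:
            context = para["context"]
            for qa in para["qas"]:
                for ans in qa["answers"]:
                    yield Triplet(qa["id"], context, qa["question"],
                                  ans["text"], ans["answer_start"])


def is_aligned(t: Triplet) -> bool:
    """True if the stored offset really points at the answer text."""
    end = t.answer_start + len(t.answer_text)
    return t.context[t.answer_start:end] == t.answer_text


def realign(t: Triplet) -> Optional[Triplet]:
    """Hypothetical fix-up for stale offsets after translation: use the first
    exact occurrence of the answer in the context, or None if it is absent."""
    if is_aligned(t):
        return t
    start = t.context.find(t.answer_text)
    return t._replace(answer_start=start) if start != -1 else None


def load_triplets(path: str) -> List[Triplet]:
    """Read a SQuAD-format JSON file (UTF-8, needed for Telugu text)."""
    with open(path, encoding="utf-8") as f:
        return list(iter_triplets(json.load(f)))
```

For example, `load_triplets("train.json")` (a placeholder filename) would return one triplet per gold answer; any triplet for which `realign` returns `None` has an answer that no longer appears verbatim in its translated context.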

Figure: Architecture of Span Extractor
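The architecture figure itself is not reproduced here. Span extractors for SQuAD-style MRC commonly predict a start score and an end score for every context token and return the highest-scoring valid span. The following is a minimal sketch of that decoding step only, assuming the model already produces per-token start and end logits of equal length; `best_span`, `span_text`, and the default `max_answer_len=30` are illustrative choices, not taken from the TeQuAD code.

```python
from typing import List, Sequence, Tuple


def best_span(
    start_logits: Sequence[float],
    end_logits: Sequence[float],
    max_answer_len: int = 30,
) -> Tuple[int, int, float]:
    """Return (start, end, score) maximizing start_logits[s] + end_logits[e]
    subject to s <= e < s + max_answer_len (end index is inclusive)."""
    if not start_logits or len(start_logits) != len(end_logits):
        raise ValueError("need non-empty start/end logits of equal length")
    best = (0, 0, float("-inf"))
    for s, s_score in enumerate(start_logits):
        # Only consider ends at or after the start, within the length cap.
        for e in range(s, min(s + max_answer_len, len(end_logits))):
            score = s_score + end_logits[e]
            if score > best[2]:
                best = (s, e, score)
    return best


def span_text(tokens: List[str], start: int, end: int) -> str:
    """Join whitespace tokens of the predicted span. A real subword model
    would instead map token offsets back to the original context string."""
    return " ".join(tokens[start:end + 1])
```

The quadratic loop is bounded by `max_answer_len`, so it runs in O(n · max_answer_len) for a context of n tokens; the cap also prevents the model from returning implausibly long answers.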
