gnanaprakash-ravi/Hindi-and-Tamil-Question-Answering-System

Hindi-and-Tamil-Question-Answering-System

This repository contains the code that produced the reported results with XLM-RoBERTa on the chaii dataset.

Dataset:

The chaii dataset is used for fine-tuning, together with the MLQA (MultiLingual Question Answering) and XQuAD (Cross-lingual Question Answering Dataset) datasets. The starting point is an XLM-RoBERTa model that has been pre-trained on SQuAD v2. Each example in the chaii dataset has the following fields:

id: unique id for each example
context: a paragraph based on which the questions have to be answered
question: the question that has to be answered
answer_start: the index from which the answer starts (only in the train set)
answer_text: the answer in string format (only in the train set)
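As a rough illustration of how these fields fit together, the sketch below builds one record and checks that `answer_start` points at `answer_text` inside `context`. It assumes `answer_start` is a character offset into `context`; the record itself, the `ChaiiExample` type, and the `answer_span_matches` helper are invented for this example and are not taken from the repository or the dataset.

```python
from typing import TypedDict


class ChaiiExample(TypedDict):
    """Hypothetical container mirroring the fields listed above."""
    id: str
    context: str
    question: str
    answer_text: str
    answer_start: int


def answer_span_matches(example: ChaiiExample) -> bool:
    """Return True if context[answer_start:...] reproduces answer_text.

    Assumes answer_start is a character offset into context.
    """
    start = example["answer_start"]
    end = start + len(example["answer_text"])
    return example["context"][start:end] == example["answer_text"]


# Made-up record, for illustration only (not a real row of the dataset).
example: ChaiiExample = {
    "id": "example-0",
    "context": "Chennai is the capital of Tamil Nadu.",
    "question": "What is the capital of Tamil Nadu?",
    "answer_text": "Chennai",
    "answer_start": 0,
}

if __name__ == "__main__":
    print(answer_span_matches(example))
```

A check like this can be useful before fine-tuning, since extractive QA training depends on the answer span lining up with the context.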

Models:

The models used are mBERT, mDeBERTa, and XLM-RoBERTa.

Result:

Jaccard scores obtained with each model:

mBERT (pre-trained on SQuAD v1.1, fine-tuned with chaii): 0.55
mDeBERTa: 0.579
mDeBERTa (fine-tuned with MLQA, XQuAD, chaii): 0.59
XLM-RoBERTa (pre-trained on SQuAD v2, fine-tuned with chaii): 0.586
XLM-RoBERTa (pre-trained on SQuAD v2, fine-tuned with MLQA, XQuAD, chaii): 0.616
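For readers unfamiliar with the metric, here is a minimal sketch of a word-level Jaccard score between a predicted answer and the reference answer. It assumes lowercasing and whitespace tokenization, and the handling of two empty strings is a choice made for this example. The function name `jaccard` is illustrative, and this is not necessarily the exact scoring code used to produce the numbers above.

```python
def jaccard(prediction: str, reference: str) -> float:
    """Word-level Jaccard similarity: |A ∩ B| / |A ∪ B|.

    Assumes lowercasing and whitespace tokenization.
    """
    a = set(prediction.lower().split())
    b = set(reference.lower().split())
    if not a and not b:
        # Both strings empty: treated as a perfect match (an assumption).
        return 1.0
    common = a & b
    return len(common) / (len(a) + len(b) - len(common))


if __name__ == "__main__":
    print(jaccard("a b c", "b c d"))
```

Under these assumptions, a score over a test set would be the mean of `jaccard` across all prediction and reference pairs.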
