Investigating_Low_Res_NMT

This project investigates low-resource translation scenarios using Neural Machine Translation (NMT) and Statistical Machine Translation (SMT).

Languages:

Two language pairs are used to study the low-resource scenario: English-Tamil and Hindi-Tamil.

Dataset

The data was retrieved from two open-source websites, OPUS and GroundAI. The English-Tamil corpus consists of 222,367 sentence pairs. The Hindi-Tamil corpus consists of 100,047 sentence pairs.

Tools Used:

OpenNMT Toolkit - for the Neural Machine Translation system
Moses - for the Statistical Machine Translation system
LaTeX editor - for writing the project report

The source code for data preprocessing is in the cleaning_scripts/ directory.
The source code for the SMT system is in the smt_scripts/ directory.
The Neural Machine Translation code for the English-Tamil and Hindi-Tamil language pairs is in nmt_enta_job/ and nmt_hita_job/ respectively.
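
The preprocessing scripts themselves are not reproduced in this README. As a rough illustration of the kind of filtering commonly applied to a parallel corpus before training, the sketch below drops empty pairs, pairs with a very uneven length ratio, and exact duplicates. The function name `clean_parallel` and the `max_ratio` threshold are hypothetical and are not taken from cleaning_scripts/.

```python
def clean_parallel(src_lines, tgt_lines, max_ratio=3.0):
    """Return the (source, target) pairs that survive basic filtering.

    Hypothetical sketch: the name, the threshold, and the filters are
    assumptions, not the logic used in cleaning_scripts/.
    """
    if len(src_lines) != len(tgt_lines):
        raise ValueError("source and target must have the same number of lines")

    seen = set()
    kept = []
    for src, tgt in zip(src_lines, tgt_lines):
        src, tgt = src.strip(), tgt.strip()
        # Drop pairs where either side is empty.
        if not src or not tgt:
            continue
        # Drop pairs whose token counts differ by more than max_ratio.
        n_src, n_tgt = len(src.split()), len(tgt.split())
        if max(n_src, n_tgt) / min(n_src, n_tgt) > max_ratio:
            continue
        # Drop exact duplicate pairs, keeping the first occurrence.
        if (src, tgt) in seen:
            continue
        seen.add((src, tgt))
        kept.append((src, tgt))
    return kept
```

In practice the two inputs would be the lines of the aligned source and target files. Whitespace token counts are only a crude length measure for Tamil and Hindi, so the ratio threshold would need tuning per language pair.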

All code was run on GPU-enabled systems in the Grove cluster, the DCU cluster for the ADAPT Research Centre.

akshairamesh/Lores_DA_Assignment
