in-rolls/indicate

Indicate: Transliterate Indic Languages to English

Transliterations to and from Indian languages are still generally of low quality. One problem is access to data. Another is that there is no standard transliteration. For Hindi-English, we build a novel dataset of names using ESPNcricinfo, which publishes Hindi versions of its English scorecards. We also create a dataset from election affidavits, and we use the Google Dakshina dataset.

Because there is no single standard way to transliterate a name, we provide k-best transliterations.
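
To make "k-best" concrete, here is a toy sketch of ranking complete transliterations by score and keeping the top k. The segmentation of "हिंदी" into two pieces, the candidate spellings, and their probabilities are all made up for illustration; `CANDIDATES` and `k_best` are hypothetical names, and this is not how indicate's model generates its candidates.

```python
import heapq
import itertools
import math

# Hypothetical romanization options for the two pieces of "हिंदी" (हिं + दी),
# with made-up probabilities. Illustrative only.
CANDIDATES = [
    [("hin", 0.7), ("hi", 0.2), ("hen", 0.1)],
    [("di", 0.6), ("dee", 0.3), ("dhi", 0.1)],
]


def k_best(segments, k):
    """Return the k highest-scoring full transliterations.

    Each combination is scored by the sum of log-probabilities of its
    pieces (equivalent to ranking by the product of probabilities).
    """
    scored = []
    for combo in itertools.product(*segments):
        text = "".join(piece for piece, _ in combo)
        log_p = sum(math.log(p) for _, p in combo)
        scored.append((log_p, text))
    return [text for _, text in heapq.nlargest(k, scored)]


print(k_best(CANDIDATES, 3))  # ['hindi', 'hindee', 'hidi']
```

Exhaustive enumeration grows exponentially with the number of segments, so sequence models usually approximate the k best with beam search instead.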

Install

We strongly recommend installing indicate inside a Python virtual environment (see the venv documentation).

pip install indicate
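
After installing, a quick sanity check can confirm that you are inside a virtual environment and that the package is visible. This is a minimal sketch using only the standard library; the helper names `in_virtualenv` and `installed_version` are hypothetical.

```python
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def in_virtualenv() -> bool:
    """True when running inside an environment created with venv."""
    # venv points sys.prefix at the environment directory, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


def installed_version(package: str = "indicate") -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("inside a virtual environment:", in_virtualenv())
    print("indicate version:", installed_version() or "not installed")
```

The prefix comparison only recognizes venv-style environments; other environment managers may need a different check.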

General API

  1. transliterate.hindi2english takes Hindi text and transliterates it into English.

Examples

from indicate import transliterate
english_transliterated = transliterate.hindi2english("हिंदी")
print(english_transliterated)

Output: hindi

Functions

We expose one function, which takes Hindi text and transliterates it into English.

  • transliterate.hindi2english(input)
    • What it does:
      • Converts the given Hindi text into the English (Latin) alphabet
    • Output
      • Returns the transliterated text in English
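
When transliterating many names, for example a column of names from a dataset, it can help to call the model only once per distinct name. The sketch below assumes, as in the example above, that `hindi2english` accepts a single string and returns a string; `transliterate_all` is a hypothetical helper, and the demo uses a dictionary lookup as a stand-in so it runs without the model.

```python
from typing import Callable, Dict, Iterable


def transliterate_all(
    names: Iterable[str], transliterate_fn: Callable[[str], str]
) -> Dict[str, str]:
    """Map each distinct, non-empty name to its transliteration.

    Whitespace is stripped, and each distinct name is passed to
    `transliterate_fn` only once, avoiding repeated model calls.
    """
    results: Dict[str, str] = {}
    for name in names:
        cleaned = name.strip()
        if cleaned and cleaned not in results:
            results[cleaned] = transliterate_fn(cleaned)
    return results


# Usage with indicate (assumes hindi2english takes and returns a str):
#   from indicate import transliterate
#   mapping = transliterate_all(names, transliterate.hindi2english)

if __name__ == "__main__":
    # Hypothetical stand-in for transliterate.hindi2english.
    lookup = {"हिंदी": "hindi", "सचिन": "sachin"}
    print(transliterate_all(["हिंदी", "सचिन", "हिंदी "], lookup.__getitem__))
    # {'हिंदी': 'hindi', 'सचिन': 'sachin'}
```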

Data

The datasets used to train the model:

Evaluation

The model was evaluated on the Google Dakshina test set, where 73.64% of its predictions were exact matches. On the same dataset, Indic-trans produced 63.12% exact matches. The edit distance metrics on the test set are summarized below: 0.0 means an exact match, and the farther a value is from 0.0, the more the predicted text differs from the actual text.

Edit distance metrics of model on Google Dakshina test dataset
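
The README does not say which edit distance it reports or whether it is normalized. The sketch below assumes Levenshtein distance divided by the length of the longer string, so 0.0 is an exact match and larger values mean larger differences. The helper names and the three name pairs are hypothetical toy data, not Dakshina results.

```python
from typing import List, Sequence, Tuple


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn `a` into `b`."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,           # delete ca
                            curr[j - 1] + 1,       # insert cb
                            prev[j - 1] + cost))   # substitute or match
        prev = curr
    return prev[-1]


def evaluate(predictions: Sequence[str],
             references: Sequence[str]) -> Tuple[float, List[float]]:
    """Return the exact-match rate and per-pair normalized edit distances."""
    if len(predictions) != len(references) or not predictions:
        raise ValueError("need two non-empty sequences of equal length")
    pairs = list(zip(predictions, references))
    exact = sum(p == r for p, r in pairs) / len(pairs)
    normalized = [levenshtein(p, r) / max(len(p), len(r), 1) for p, r in pairs]
    return exact, normalized


preds = ["hindi", "rahul", "sachin"]
refs = ["hindi", "raahul", "sachin"]
rate, dists = evaluate(preds, refs)
print(f"exact-match rate: {rate:.2%}")  # exact-match rate: 66.67%
print([round(d, 3) for d in dists])     # [0.0, 0.167, 0.0]
```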

Authors

Rajashekar Chintalapati and Gaurav Sood

Contributor Code of Conduct

The project welcomes contributions from everyone! In fact, it depends on it. To maintain this welcoming atmosphere, and to collaborate in a fun and productive way, we expect contributors to the project to abide by the Contributor Code of Conduct.

License

The package is released under the MIT License.
