Languages are disappearing at an alarming rate: most of the world's 6,500 languages are in danger of extinction, and the linguistic rights of their speakers are at risk with them. Information and Communication Technologies (ICT) play a key role in preserving endangered languages. Among the uses of ICT, natural language processing deserves particular attention, because in this century the lack of such support hinders literacy and keeps a language off the Internet and other electronic media. The first step is to build resources such as speech corpora, monolingual corpora, bilingual corpora, and dictionaries. These resources make it possible to build linguistic tools for natural language processing. Tools such as automatic speech recognition (ASR), neural machine translation (NMT), and text-to-speech (TTS) help break the language barrier and revitalize minority languages. However, it is also important to understand why these languages are in danger of extinction.
In Peru, 48 native languages are still alive, but all of them are in danger of extinction. Experts point out that the replacement process is irreversible unless disruptive policies and tools emerge (Adelaar, 2014). Many computational tools for language processing exist within ICT under the label of Human Language Technologies (HLT). Computational linguistics therefore stands out as a potential tool for revitalizing national languages, since the lack of this support prevents these languages from growing and from being used productively on the Internet (and in any electronic system).
Churana is an open-source repository that aims to bring together academics, independent scholars, organizations, communities, and individuals to revitalize and democratize the native languages of Peru.
| Type | Platforms |
|---|---|
| 💬 General Discussion | Slack Group |
| ✨ How to contribute | GitHub Fork |
| 🙋 Feature Requests & Ideas | GitHub Issue Tracker |
If you know of an available resource that is not on this list, please add it, either through the links above or by submitting a pull request.
- [Org] Hinantin
  Research and NLP software development.
- [Org] Siminchikkunarayku
  Preservation and revitalization of native languages in the Americas using computational linguistics.
- [Parsing] Parsing a Polysynthetic Language
  A lexical-functional grammar of Aymara. (Homola, 2011)
- [Machine Translation] Rule-based machine translation for Aymara
  A machine translation system for Aymara developed with rule-based techniques. (Coler et al., 2014)
- [Corpus] A Quechua-Spanish parallel treebank
  A Quechua-Spanish parallel treebank. (Rios et al., 2008)
- [Corpus] On the Building of the Large Scale Corpus of Southern Qichwa
  A non-annotated corpus of Southern Qichwa (156 hours). (Camacho et al., 2017)
- [Corpus] Dictionnaire électronique français-quechua des verbes pour le TAL
  A French-Quechua dictionary of verbs. (Duran, 2017)
- [Corpus] Siminchik: A Speech Corpus for Preservation of Southern Quechua
  Contains 99 hours of transcribed audio of the Chanca and Collao dialectal varieties. (Cárdenas et al., 2018)
- [Machine Translation] Building NLP Systems for Two Resource-Scarce Indigenous Languages: Mapudungun and Quechua
  Quechua-Spanish machine translation systems developed with rule-based techniques. (Monson et al., 2006)
- [Machine Translation] A basic language technology toolkit for Quechua
  A hybrid machine translation system that can translate Spanish text into Cuzco Quechua. (Rios, 2015)
- [Machine Translation] Neural machine translation with a polysynthetic low resource language
  An NMT system for Southern Quechua developed with several morphological segmentation techniques, including a new one, to decompose the language's suffix-based morphemes. (Ortega et al., 2020)
- [Machine Translation] Traducción automática neuronal para lengua nativa peruana
  An NMT system for Chanca Quechua developed with transformers and deep learning methods, achieving a BLEU score of 39.5. (Huarcaya, 2020)
- [Speech Recognition] Isolated Automatic Speech Recognition of Quechua Numbers using MFCC, DTW and KNN
  An ASR system for isolated Quechua numbers, developed using Mel-Frequency Cepstral Coefficients (MFCC), Dynamic Time Warping (DTW), and K-Nearest Neighbor (KNN). (Chacca et al., 2018)
- [Speech Recognition] Conversor de voz a texto para el idioma quechua usando la herramienta de reconocimiento de voz KALDI y una red neuronal profunda
  An ASR system built with a DNN-HMM, achieving an accuracy of 59.20%. (Aimituma et al., 2019)
- [Speech Recognition] Automatic Speech Recognition of Quechua Language Using HMM Toolkit
  An ASR system built with the Hidden Markov Model Toolkit, achieving a test WER of 12.70. (Zevallos et al., 2019)
- [Spell Checking] Spell checking an agglutinative language: Quechua
  A spell checker using finite-state methods for the agglutinative language Quechua. (Rios, 2011)
- [Syntactic Analyzer] Syntactic Analyzer for Quechua Language
  A syntactic analyzer for Quechua that uses a dynamic programming technique with a context-free grammar. (Lozano et al., 2013)
- [Morphological Analyzer] Morphological Disambiguation and Text Normalization for Southern Quechua Varieties
  A pipeline to normalize Quechua texts through morphological analysis and disambiguation. (Rios et al., 2014)
- [Alignment Techniques] Using Morphemes from Agglutinative Languages like Quechua and Finnish to Aid in Low-Resource Translation
  A novel alignment technique for agglutinative languages like Quechua and Finnish. (Ortega et al., 2018)
- [Word Sense Disambiguation] Towards Cross-Language Word Sense Disambiguation for Quechua
  A cross-language WSD system for Quechua. (Rudnick, 2011)
- [Tools] Allin Qillqay! A Free Online Web Spell Checking Service for Quechua
  The first online spell-checking service for Quechua. (Castro et al., 2014)
- [Corpus] Corpus Creation and Initial SMT Experiments between Spanish and Shipibo-konibo
  The first Spanish-Shipibo parallel corpus. (Galarreta et al., 2017)
- [Corpus] No data to crawl? Monolingual corpus creation from PDF files of truly low-resource languages in Peru
  New monolingual corpora for four indigenous and endangered languages from Peru. (Bustamante et al., 2020)
- [Spell Checking] Spell-Checking based on Syllabification and Character-level Graphs for a Peruvian Agglutinative Language
  A spell checker for Shipibo-Konibo based on syllabification and character-level graphs. (Alva et al., 2017)
- [Morphological Analyzer] A morphological analyzer for Shipibo-Konibo
  A fairly complete morphological analyzer for Shipibo-Konibo. (Cárdenas et al., 2018)
- [WordNet] WordNet-Shp: Towards the Building of a Lexical Database for a Peruvian Minority Language
  An initial WordNet database for a low-resourced and indigenous language in Peru. (Maguino-Valencia et al., 2018)
- [Word-Embeddings] Learning Contextualised Cross-lingual Word Embeddings for Extremely Low-Resource Languages Using Parallel Corpora
  A new approach for learning contextualised cross-lingual word embeddings based only on a small parallel corpus. (Wada et al., 2020)
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.