Language Identifier

A fun nltk-based language identifier. It probabilistically determines whether an input word is English or Spanish. It could be extended with other language models if a corpus for the language is available.

To run it, textes.txt and texten.txt need to be downloaded, as they are the corpora the language models are based on.

To add a new language model, first obtain a corpus written in that language, then use the function create_LM() to create another language model.
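As a rough illustration of what building a language model from a corpus can look like, here is a minimal sketch. It is not the repository's create_LM(): the name build_char_bigram_model, its signature, and the choice of a character-bigram model with add-one smoothing are all assumptions made for this example, and it uses only the standard library rather than nltk.

```python
import math
from collections import Counter


def build_char_bigram_model(text):
    """Hypothetical helper: return a function that scores a word with an
    add-one-smoothed character-bigram log-probability learned from `text`."""
    # Keep letters only, lower-case them, and split into words.
    cleaned = "".join(c.lower() if c.isalpha() else " " for c in text)
    # Pad each word with start (^) and end ($) markers.
    words = ["^" + w + "$" for w in cleaned.split()]

    bigrams = Counter()   # counts of (previous char, next char)
    contexts = Counter()  # counts of each char used as the previous char
    vocab = set()
    for w in words:
        vocab.update(w)
        for a, b in zip(w, w[1:]):
            bigrams[(a, b)] += 1
            contexts[a] += 1
    v = len(vocab) + 1  # one extra slot so unseen characters get some mass

    def log_prob(word):
        padded = "^" + word.lower() + "$"
        return sum(
            math.log((bigrams[(a, b)] + 1) / (contexts[a] + v))
            for a, b in zip(padded, padded[1:])
        )

    return log_prob


# Tiny inline sample; in practice the text would be read from a corpus file.
sample_lm = build_char_bigram_model("the cat sat on the mat the end")
print(sample_lm("the"), sample_lm("zzq"))
```

A word made of character sequences that are frequent in the corpus gets a higher (less negative) score than one made of unseen sequences.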

The output of the function should be the model that assigns the greatest probability to the input word.
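The selection step described above amounts to taking the maximum over the per-model scores. A minimal sketch follows, assuming each model can be called with a word and returns a probability or log-probability. The name identify_language and the toy scoring functions are hypothetical stand-ins, not the repository's code.

```python
def identify_language(word, models):
    """Hypothetical helper: return the label of the model that gives
    `word` the highest score. `models` maps a label to a scoring function."""
    scores = {label: score(word) for label, score in models.items()}
    return max(scores, key=scores.get)


# Toy stand-ins for trained models, used only to make the example runnable.
toy_models = {
    "English": lambda w: -1.0 if "th" in w else -5.0,
    "Spanish": lambda w: -1.0 if "ñ" in w or w.endswith("o") else -5.0,
}

print(identify_language("the", toy_models))
print(identify_language("niño", toy_models))
```

Because the models live in a dictionary, supporting another language only requires adding one more entry; the selection logic stays the same.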

This project is based on work done in my Language Processing 2 class at the University of Copenhagen. The English and Spanish corpora were obtained from Project Gutenberg.

Files in this repository: Lang_Identifier.py, README.md, texten.txt, textes.txt
