L2C4

A Language Lending Itself: Mapping Clusters of Contextually Close Cognates in Indo-European Languages

Paper by Sarthak Rastogi

1.0 Data Ingestion.ipynb: Loading word embeddings into dictionaries and pickling them.
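A minimal sketch of this step. The notebook's source files are not named here, so the sketch assumes plain-text embeddings in the fastText/word2vec `.vec` layout: an optional `<vocab_size> <dim>` header, then one `word v1 v2 ...` line per word. `load_vec`, `pickle_embeddings` and `load_pickled` are hypothetical names.

```python
import pickle


def load_vec(lines):
    """Parse "word v1 v2 ..." lines into a {word: [floats]} dictionary.

    Assumes the fastText/word2vec text layout, whose first line may be a
    "<vocab_size> <dim>" header; that header is skipped.
    """
    embeddings = {}
    for i, line in enumerate(lines):
        parts = line.rstrip().split(" ")
        if i == 0 and len(parts) == 2 and all(p.isdigit() for p in parts):
            continue  # header: vocabulary size and vector dimension
        if len(parts) < 2:
            continue  # blank or malformed line
        word, values = parts[0], parts[1:]
        embeddings[word] = [float(v) for v in values]
    return embeddings


def pickle_embeddings(embeddings, path):
    """Save the dictionary so later notebooks can reload it without re-parsing."""
    with open(path, "wb") as f:
        pickle.dump(embeddings, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_pickled(path):
    """Reload a dictionary written by pickle_embeddings."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

An open text file can be passed straight to `load_vec`, since Python file objects iterate line by line. Pickling trades some disk space for reloads that are much faster than parsing the text file again.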

1.1 Preprocessing.ipynb:

POS tagging to remove proper nouns such as countries, nationalities, and brands
Removing special characters
Lemmatisation
Choosing the most frequently used words
Removing names using a list of common names
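POS tagging and lemmatisation need a trained tagger or lemmatiser (for example from NLTK or spaCy), so the sketch below covers only the other three steps. It is an illustration under stated assumptions, not the notebook's code. "Special characters" are assumed to be ASCII punctuation and digits, with the apostrophe kept. The input word list is assumed to be ordered from most to least frequent, as in many pretrained embedding files. `clean_word` and `filter_vocabulary` are hypothetical names.

```python
import string

# Characters treated as "special" here: ASCII punctuation and digits, keeping
# the apostrophe for forms like "l'eau" (an assumption, not the notebook's list).
# Non-ASCII letters such as Devanagari pass through untouched.
SPECIAL = set(string.punctuation + string.digits) - {"'"}


def clean_word(word):
    """Drop special characters and lowercase the word."""
    return "".join(ch for ch in word if ch not in SPECIAL).lower()


def filter_vocabulary(words, common_names, top_n):
    """Return up to top_n cleaned, de-duplicated words that are not common names.

    Assumes `words` is ordered from most to least frequent, so the first
    survivors are the most frequent ones.
    """
    names = {name.lower() for name in common_names}
    kept, seen = [], set()
    for word in words:
        if len(kept) >= top_n:
            break
        cleaned = clean_word(word)
        if not cleaned or cleaned in names or cleaned in seen:
            continue  # empty after cleaning, a common name, or a duplicate
        seen.add(cleaned)
        kept.append(cleaned)
    return kept
```

With `["Maison", "maison!", "Marie", "eau", "42", "arbre", "ciel"]`, the names list `["marie"]` and `top_n=3`, the result is `["maison", "eau", "arbre"]`.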

2.1 - 2.5 Translation (German, Hindi, French, Tamil, Samskrit): Attempts at translating the vocabularies of the word embeddings. Uploading the embedding keys as a document to Yandex Translate was ultimately found to be more efficient than these attempts.
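The upload to Yandex Translate happens outside Python. The hypothetical helpers below only handle the two ends of that workflow. The first prepares the keys as a one-word-per-line document. The second lines the translated document back up with the original vectors. The sketch assumes the translation keeps one output line per input line, in the same order. A length mismatch is reported as an error instead of being guessed around.

```python
def keys_document(embeddings):
    """One embedding key per line: a plain-text document to upload for translation."""
    return "\n".join(embeddings)


def rekey_with_translations(embeddings, translated_text):
    """Map each translated word to (original word, vector) by line position.

    Assumes the translation service keeps one output line per input line and
    preserves their order.
    """
    keys = list(embeddings)
    lines = [line.strip() for line in translated_text.strip().splitlines()]
    if len(lines) != len(keys):
        raise ValueError(f"expected {len(keys)} translated lines, got {len(lines)}")
    result = {}
    for original, translated in zip(keys, lines):
        # Several source words can share a translation; keep the first one seen.
        result.setdefault(translated.lower(), (original, embeddings[original]))
    return result
```

Keeping the original word next to its vector means any later cognate match can still be traced back to the source-language word.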

3.0 Transliteration.ipynb: Transliterating words in Indian languages from their native scripts to the Roman script. Removing definite (le, la, l' and les) and partitive (du, de la, des, de l', de, d') articles from French words.
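Transliteration from Indic scripts depends on a dedicated transliteration library and is not sketched here. The article-removal step, however, can be written with a single anchored regular expression built from the articles listed above. This is a hedged sketch. It assumes each entry has at most one leading article, and it normalises curly apostrophes first. `strip_french_article` is a hypothetical name.

```python
import re

# Articles from the list above. Longer alternatives come first so that
# "de la" and "de l'" are matched before the bare "de".
ARTICLE = re.compile(
    r"^(?:de la\s+|de l'|les\s+|le\s+|la\s+|l'|du\s+|des\s+|de\s+|d')",
    re.IGNORECASE,
)


def strip_french_article(phrase):
    """Remove one leading definite or partitive article from a French phrase."""
    phrase = phrase.strip().replace("\u2019", "'")  # normalise curly apostrophes
    stripped = ARTICLE.sub("", phrase, count=1)
    return stripped or phrase  # never return an empty string
```

Spaced articles must be followed by whitespace, so words that merely begin with the same letters (such as "lait" or "dent") are left intact.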

4.0 Phonetic Matching.ipynb: Calculating the Double Metaphone encodings of the words and mapping words with similar encodings, i.e., cognates, onto a new embedding space.
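The notebook's exact matching rule and the construction of the new embedding space are not described here. The sketch below therefore makes two labelled assumptions. First, two words "match" when they share at least one encoding. Second, purely for illustration, the joint vector of a matched pair is the concatenation of its two vectors. The encoder is passed in as a function returning a tuple of codes. The PyPI `Metaphone` package's `doublemetaphone(word)`, which returns a (primary, secondary) tuple, should fit that parameter. `match_by_code` and `joint_embedding` are hypothetical names.

```python
from collections import defaultdict


def match_by_code(src_words, tgt_words, encode):
    """Return sorted (source, target) pairs whose phonetic codes overlap.

    `encode(word)` must return an iterable of codes, for example the
    (primary, secondary) tuple of a Double Metaphone implementation.
    Empty codes are ignored so that they never create matches.
    """
    index = defaultdict(set)
    for word in tgt_words:
        for code in encode(word):
            if code:
                index[code].add(word)
    pairs = set()
    for word in src_words:
        for code in encode(word):
            if code:
                pairs.update((word, match) for match in index.get(code, ()))
    return sorted(pairs)


def joint_embedding(pairs, src_emb, tgt_emb):
    """Hypothetical joint space: concatenate the two vectors of each matched pair."""
    return {(s, t): list(src_emb[s]) + list(tgt_emb[t]) for s, t in pairs}
```

Indexing the target words by code avoids comparing every source word with every target word. A crude vowel-stripping stand-in encoder pairs "nose" with "nase" but not "water" with "wasser". Double Metaphone itself handles such spelling variation far better.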

5.0 Clustering Experiments.ipynb: Experimenting with hyperparameter values of various clustering algorithms.

clustering.py: Contains functions for running the above experiments.
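The algorithms and hyperparameters tried in `clustering.py` are not listed here. As one hedged illustration of such an experiment, the dependency-free sketch below sweeps the number of clusters k for k-means and scores each value with the mean silhouette coefficient. It seeds k-means with deterministic farthest-point initialisation, a simplification rather than k-means++, so that results are reproducible. In practice scikit-learn's `KMeans` and `sklearn.metrics.silhouette_score` do the same job much faster.

```python
import math


def farthest_first(points, k):
    """Deterministic seeding: start at the first point, then repeatedly add the
    point farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def nearest(p, centroids):
    """Index of the centroid closest to p (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))


def kmeans(points, k, max_iters=100):
    """Lloyd's k-means; returns one integer cluster label per point."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(max_iters):
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # leave an empty cluster's centroid where it was
                centroids[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient in [-1, 1]; higher means better-separated clusters."""
    clusters = set(labels)
    if len(clusters) < 2:
        return -1.0  # undefined for a single cluster; treated as worst here
    scores = []
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)  # singleton cluster: 0 by convention
            continue
        a = sum(own) / len(own)  # mean distance within the point's own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(math.dist(p, q) for j, q in enumerate(points) if labels[j] == c)
            / labels.count(c)
            for c in clusters
            if c != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)


def sweep(points, k_values):
    """Score every candidate k; return (best_k, {k: silhouette})."""
    scores = {k: silhouette(points, kmeans(points, k)) for k in k_values}
    return max(scores, key=scores.get), scores
```

On three tight, well-separated groups of points, `sweep(points, [2, 3, 4])` picks k = 3. Splitting or merging a real group lowers the silhouette score.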

5.1 Clustering Optimal.ipynb: Clustering the language-pair embeddings using the optimal hyperparameter values found in the previous notebook.

clustering_optimal_algos.py: Contains functions for optimal clustering.
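Whichever algorithm is chosen, its output is one label per embedding. That output still has to be turned back into readable groups of words. A small hypothetical helper for that step is sketched below. It assumes one label per language-pair key and follows scikit-learn's convention of labelling noise points -1 (as DBSCAN does). Clusters smaller than `min_size` are dropped so that only groups of related words remain.

```python
from collections import defaultdict


def clusters_from_labels(words, labels, min_size=2, noise_label=-1):
    """Group words by cluster label, dropping noise and clusters smaller than min_size."""
    groups = defaultdict(list)
    for word, label in zip(words, labels):
        if label != noise_label:
            groups[label].append(word)
    return {label: ws for label, ws in groups.items() if len(ws) >= min_size}
```

Words keep their input order inside each cluster, so the most frequent words appear first when the input is frequency-sorted.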

