Learn a shared embedding space between words in multiple languages.
- A survey of cross-lingual embeddings [http://ruder.io/cross-lingual-embeddings/index.html#fn:24]
- Towards cross-lingual distributed representations without parallel text trained with adversarial autoencoders [https://arxiv.org/pdf/1608.02996.pdf]
- English Wikipedia dump - https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2
- Tutorial on training a word2vec model on English Wikipedia with gensim - http://textminingonline.com/training-word2vec-model-on-english-wikipedia-by-gensim
- Code for loading embeddings and sampling batches of embeddings
- Obtaining pre-trained word vectors, or training them, for English and a second language (Hindi or German)
- Writing and testing the autoencoder GAN model with the loaded word vectors
- Scripts/code hosted on [email protected]:~/suhan/Cross-Lingual-Word-Embeddings
- Dataset hosted on [email protected]:~/suhan/datasets
- Word2vec models hosted on [email protected]:~/suhan/Cross-Lingual-Word-Embeddings/models
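
The embedding loading and batch sampling item in the list above could look roughly like the sketch below. It is not the code hosted in the repository. It assumes the vectors are stored in the plain-text word2vec format (a `count dim` header line followed by one `word v1 ... vd` line per word), and the names `load_text_embeddings` and `sample_batch`, as well as the toy data, are hypothetical. Each language is sampled independently, since the adversarial setup uses no parallel text.

```python
import io
import random


def load_text_embeddings(handle):
    """Parse word2vec text format: a 'count dim' header, then 'word v1 ... vd' lines."""
    header = handle.readline().split()
    count, dim = int(header[0]), int(header[1])
    words, vectors = [], []
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip blank or malformed lines
        words.append(parts[0])
        vectors.append([float(x) for x in parts[1:]])
    if len(words) != count:
        raise ValueError(f"header promised {count} vectors, found {len(words)}")
    return words, vectors


def sample_batch(vectors, batch_size, rng):
    """Draw batch_size embeddings uniformly at random, without replacement."""
    indices = rng.sample(range(len(vectors)), batch_size)
    return [vectors[i] for i in indices]


# Toy data (an assumption) standing in for the real English and German model files.
EN_TOY = "3 2\ncat 0.1 0.2\ndog 0.3 0.4\nhouse 0.5 0.6\n"
DE_TOY = "3 2\nKatze 0.2 0.1\nHund 0.4 0.3\nHaus 0.6 0.5\n"

en_words, en_vecs = load_text_embeddings(io.StringIO(EN_TOY))
de_words, de_vecs = load_text_embeddings(io.StringIO(DE_TOY))

# The two batches are drawn independently; no word alignment is assumed.
rng = random.Random(0)
en_batch = sample_batch(en_vecs, 2, rng)
de_batch = sample_batch(de_vecs, 2, rng)
```

For real model files, pass an open file handle instead of `io.StringIO`. With vocabularies of realistic size, converting the vectors to an array type before sampling would likely be faster than lists of floats.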