guillemram97/wp-hungarian

The code in this repository supports the experiments in the paper "On a Novel Application of Wasserstein-Procrustes for Unsupervised Cross-Lingual Learning".

Running instructions (GPU required)

The script iterative_hungarian.py takes an initialisation matrix W_0 and iteratively refines it.
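
To give a rough idea of what such a refinement involves, here is a minimal sketch of a Wasserstein-Procrustes-style alternation. It is an illustration under stated assumptions, not the repository's implementation: it assumes equally sized source and target sets X and Y (n × d arrays, one word vector per row) and an orthogonal map W. Each iteration finds a one-to-one matching with SciPy's linear_sum_assignment (which solves the same assignment problem as the Hungarian algorithm), then refits W on the matched pairs with orthogonal Procrustes. For orthogonal W, minimising the total squared distance over matchings is equivalent to maximising the total dot product, because the row norms do not depend on the matching. The function names procrustes and refine are hypothetical, and n_iter only loosely plays the role of a refinement count; whether it corresponds to --nrefin is an assumption.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def procrustes(X, Y):
    """Orthogonal W minimising ||X W - Y||_F, where row i of X is paired with row i of Y."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt


def refine(X, Y, W0, n_iter=5):
    """Hypothetical sketch: alternate an exact matching step with a Procrustes step.

    X: (n, d) source vectors, Y: (n, d) target vectors, W0: (d, d) initial map.
    Returns the refined map and cols, where cols[i] is the target row matched
    to source row i. Builds a dense n x n score matrix, so only suits small n.
    """
    if n_iter < 1:
        raise ValueError("n_iter must be at least 1")
    W = W0
    for _ in range(n_iter):
        # Similarity between every mapped source row and every target row.
        scores = (X @ W) @ Y.T
        # Assignment step: permutation maximising the total similarity.
        rows, cols = linear_sum_assignment(scores, maximize=True)
        # Procrustes step: refit the orthogonal map on the matched pairs.
        W = procrustes(X[rows], Y[cols])
    return W, cols
```

Starting from a good W_0 matters here: the matching step only searches locally around the current map, which is why an initialisation such as MUSE, Procrustes or ICP is supplied first.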

The experiments in Section 5.1 can be reproduced as follows (the example uses English-Spanish):

  1. Download the source and target embeddings as follows (change the language code in the URL for other languages; the commands in step 2 expect the files under data/):

    • English fastText Wikipedia embeddings: curl -Lo wiki.en.vec https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.en.vec
    • Spanish fastText Wikipedia embeddings: curl -Lo wiki.es.vec https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.es.vec
  2. Obtain the initialisation matrix W_0 with one of the following methods:

    • MUSE: python unsupervised.py --src_lang en --tgt_lang es --src_emb data/wiki.en.vec --tgt_emb data/wiki.es.vec --n_refinement 5
    • Procrustes: python supervised.py --src_lang en --tgt_lang es --src_emb data/wiki.en.vec --tgt_emb data/wiki.es.vec --n_refinement 5 --dico_train default
    • ICP: python get_data.py, then python run_icp.py, then python eval.py
  3. Run Iterative Hungarian (IH): python iterative_hungarian.py --grows 45000 --write_path AUX --src_path PATH_SRC_EMBEDDINGS --tgt_path PATH_TGT_EMBEDDINGS --w_path PATH_INITIALIZATION_MATRIX --nrefin 5
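
The downloaded .vec files use fastText's text format: a header line with the vocabulary size and the dimension, then one word per line followed by its vector components, with words sorted by frequency. The --grows flag appears to set how many of these rows are used (45000 here, 10000 in Section 5.2); that reading is an inference from the examples, not documented behaviour. A minimal loader under that assumption, with the hypothetical name load_vec, could look like this:

```python
import numpy as np


def load_vec(path, max_rows=None):
    """Hypothetical loader for a fastText .vec text file.

    The first line holds "<vocab_size> <dim>"; each following line is a word
    followed by <dim> space-separated floats. When max_rows is given, only the
    first max_rows well-formed, non-duplicate words are kept.
    """
    words, vectors, seen = [], [], set()
    with open(path, "r", encoding="utf-8", errors="ignore") as f:
        _, dim = map(int, f.readline().split())
        for line in f:
            if max_rows is not None and len(words) >= max_rows:
                break
            # rstrip() drops the newline and the trailing space fastText writes.
            parts = line.rstrip().split(" ")
            word, values = parts[0], parts[1:]
            # Skip malformed lines and repeated words.
            if len(values) != dim or word in seen:
                continue
            seen.add(word)
            words.append(word)
            vectors.append(np.asarray(values, dtype=np.float32))
    return words, np.vstack(vectors)
```

For example, load_vec("data/wiki.en.vec", max_rows=45000) would return the 45000 most frequent English words and a 45000 × 300 float32 matrix, since the fastText Wikipedia vectors are 300-dimensional.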

The experiments in Section 5.2 can be reproduced as follows:

  1. Train the word embeddings with fastText, following the instructions in the paper "Unsupervised Alignment of Embeddings with Wasserstein Procrustes".
  2. Run IH: python iterative_hungarian.py --grows 10000 --write_path AUX --src_path PATH_SRC_EMBEDDINGS --tgt_path PATH_TGT_EMBEDDINGS --w_path PATH_INITIALIZATION_MATRIX
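
The two IH invocations differ only in --grows (45000 vs 10000) and in whether --nrefin 5 is passed; without it, the script's own default applies. To script both runs, a small helper (the hypothetical name ih_command) can assemble the argument list exactly as in the commands above. The paths are the same placeholders, and AUX is kept as the write path from the examples.

```python
import shlex


def ih_command(src_path, tgt_path, w_path, grows, write_path="AUX", nrefin=None):
    """Hypothetical helper: build the iterative_hungarian.py argument list."""
    cmd = [
        "python", "iterative_hungarian.py",
        "--grows", str(grows),
        "--write_path", write_path,
        "--src_path", src_path,
        "--tgt_path", tgt_path,
        "--w_path", w_path,
    ]
    # Section 5.1 passes --nrefin 5; Section 5.2 omits the flag.
    if nrefin is not None:
        cmd += ["--nrefin", str(nrefin)]
    return cmd


# Section 5.1 (English-Spanish) and Section 5.2 settings.
cmd_51 = ih_command("data/wiki.en.vec", "data/wiki.es.vec",
                    "PATH_INITIALIZATION_MATRIX", grows=45000, nrefin=5)
cmd_52 = ih_command("PATH_SRC_EMBEDDINGS", "PATH_TGT_EMBEDDINGS",
                    "PATH_INITIALIZATION_MATRIX", grows=10000)
print(shlex.join(cmd_51))
print(shlex.join(cmd_52))
# Inside a checkout of the repository, pass a list to subprocess.run(cmd, check=True).
```

Keeping the arguments as a list rather than a single string avoids shell quoting problems when real embedding paths contain spaces.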
