All model results and the EstSimLex-999 data set can be accessed here.
Three families of computational models were evaluated in this thesis. Almost all of the code is available in this GitHub repository; the exception is the code for the convolutional autoencoder. The following sections show where the resources used can be found and how the similarity between concept pairs from EstSimLex-999 was calculated.
All the distributional models used can be downloaded online.
- Eleri Aedmaa's word and sense vectors can be downloaded here.
- Estnltk models can be downloaded here.
- Facebook research models can be downloaded here.
To use the sense vectors, SenseGram must be installed.
The similarity metric used is cosine similarity between word and sense vectors.
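For reference, cosine similarity between two vectors can be computed as follows (a minimal NumPy sketch, independent of the SenseGram tooling):

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Parallel vectors give 1.0, orthogonal vectors give 0.0.
print(cosine_similarity([1, 2], [2, 4]))  # ~1.0
print(cosine_similarity([1, 0], [0, 1]))  # 0.0
```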
Two semantic networks were used: Estonian Wordnet and the Estonian Wikipedia bitaxonomy. Wordnet version 2.2 can be downloaded here. The Wikipedia bitaxonomy is not publicly downloadable, but it can be accessed online at MultiWiBi.
Three path-based similarity measures were implemented: path similarity, Leacock & Chodorow similarity, and Wu & Palmer similarity.
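These three measures follow standard formulations over a hypernym hierarchy. The sketch below implements them on a toy taxonomy; the node names and the exact Leacock & Chodorow variant used here (-log((d+1)/(2D)), with d counted in edges and D the maximum depth) are illustrative assumptions, not the thesis code:

```python
import math

# Toy child -> parent taxonomy (root maps to None); a stand-in for the
# hypernym hierarchies of Estonian Wordnet and the Wikipedia bitaxonomy.
PARENT = {
    "entity": None,
    "animal": "entity",
    "plant": "entity",
    "dog": "animal",
    "cat": "animal",
}

def ancestors(node):
    """Path from node up to the root, inclusive."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path

def depth(node):
    return len(ancestors(node)) - 1  # root has depth 0

def lcs(a, b):
    """Lowest common subsumer: the deepest shared ancestor."""
    shared = set(ancestors(a)) & set(ancestors(b))
    return max(shared, key=depth)

def shortest_path(a, b):
    """Edge count of the shortest path between a and b via their LCS."""
    c = lcs(a, b)
    return (depth(a) - depth(c)) + (depth(b) - depth(c))

def path_similarity(a, b):
    return 1.0 / (1.0 + shortest_path(a, b))

def lch_similarity(a, b, max_depth):
    return -math.log((shortest_path(a, b) + 1) / (2.0 * max_depth))

def wup_similarity(a, b):
    c = lcs(a, b)
    return 2.0 * depth(c) / (depth(a) + depth(b))
```

For example, "dog" and "cat" share the subsumer "animal": their path similarity is 1/(1+2) = 1/3 and their Wu & Palmer similarity is 2·1/(2+2) = 0.5.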
The code used for implementing the convolutional autoencoder can be accessed here. It was modified slightly to meet the needs of this work.
Pre-trained ResNet-18 was accessed through the PyTorch subpackage torchvision.models; its documentation can be found here.
Similarity between embeddings was calculated as cosine similarity.