This repository has been archived by the owner on Nov 17, 2020. It is now read-only.

Sentence Similarity Calculator

This repo contains various ways to calculate the similarity between source and target sentences. You can choose which pre-trained model to use, such as ELMo, BERT, or Universal Sentence Encoder (USE).

You can also choose the method used to compute the similarity:

1. Cosine similarity
2. Manhattan distance
3. Euclidean distance
4. Angular distance
5. Inner product
6. TS-SS score
7. Pairwise-cosine similarity
8. Pairwise-cosine similarity + IDF
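
As a rough illustration of items 1 to 6, the sketch below computes each measure on two sentence vectors with NumPy. It is not taken from this repo's code: the function names are hypothetical, angular distance is assumed to mean the angle divided by π, and TS-SS is assumed to follow the triangle-area times sector-area formulation (with the angle offset by 10 degrees) from the original TS-SS paper.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between a and b (1.0 means same direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def manhattan_distance(a, b):
    """Sum of absolute coordinate differences (L1 distance)."""
    return float(np.sum(np.abs(a - b)))


def euclidean_distance(a, b):
    """Straight-line (L2) distance between a and b."""
    return float(np.linalg.norm(a - b))


def angular_distance(a, b):
    """Angle between a and b scaled to [0, 1] (assumed definition)."""
    cos = np.clip(cosine_similarity(a, b), -1.0, 1.0)
    return float(np.arccos(cos) / np.pi)


def inner_product(a, b):
    """Unnormalized dot product of a and b."""
    return float(np.dot(a, b))


def ts_ss(a, b):
    """Triangle area times sector area (assumed TS-SS formulation)."""
    cos = np.clip(cosine_similarity(a, b), -1.0, 1.0)
    theta = np.arccos(cos) + np.radians(10)  # angle offset by 10 degrees
    norm_a, norm_b = np.linalg.norm(a), np.linalg.norm(b)
    triangle = norm_a * norm_b * np.sin(theta) / 2
    magnitude_diff = abs(norm_a - norm_b)
    radius = euclidean_distance(a, b) + magnitude_diff
    sector = np.pi * radius ** 2 * np.degrees(theta) / 360
    return float(triangle * sector)


# Toy vectors standing in for sentence embeddings.
a = np.array([3.0, 4.0])
b = np.array([4.0, 3.0])
for measure in (cosine_similarity, manhattan_distance, euclidean_distance,
                angular_distance, inner_product, ts_ss):
    print(measure.__name__, measure(a, b))
```

Note that cosine similarity and inner product grow with similarity, while the distances and TS-SS shrink toward zero, so scores from different methods are not directly comparable.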

You can experiment with every (number of models) x (number of methods) combination!
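
The two pairwise methods in the list above work on token-level embeddings rather than a single sentence vector. The sketch below assumes they mean greedy token matching in the style of BERTScore, which this README does not spell out. The function names, the smoothed IDF formula, and the toy embeddings are all illustrative assumptions, not this repo's implementation.

```python
import math
from collections import Counter

import numpy as np


def idf_weights(tokenized_corpus):
    """Smoothed inverse document frequency per token (assumed formula)."""
    n_docs = len(tokenized_corpus)
    doc_freq = Counter(tok for sent in tokenized_corpus for tok in set(sent))
    return {tok: math.log((n_docs + 1) / (df + 1)) for tok, df in doc_freq.items()}


def pairwise_cosine(src_emb, tgt_emb, src_weights=None, tgt_weights=None):
    """F1 of greedy token matching over pairwise cosine similarities.

    src_emb: (n_src_tokens, dim) array, tgt_emb: (n_tgt_tokens, dim) array.
    Optional weights (e.g. IDF) make rare tokens count more in the averages.
    """
    # Normalize rows so the matrix product yields cosine similarities.
    src = src_emb / np.linalg.norm(src_emb, axis=1, keepdims=True)
    tgt = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    sim = src @ tgt.T
    # Each token is matched to its most similar token on the other side.
    recall = np.average(sim.max(axis=1), weights=src_weights)
    precision = np.average(sim.max(axis=0), weights=tgt_weights)
    return float(2 * precision * recall / (precision + recall))


# Toy token embeddings; real ones would come from a model such as BERT or ELMo.
src_tokens = ["ate", "apple"]
tgt_tokens = ["ate", "orange"]
src_emb = np.array([[1.0, 0.0], [0.0, 1.0]])
tgt_emb = np.array([[1.0, 0.0], [0.6, 0.8]])

corpus = [["i", "ate", "an", "apple"], ["i", "ate", "an", "orange"]]
idf = idf_weights(corpus)

print(pairwise_cosine(src_emb, tgt_emb))
print(pairwise_cosine(src_emb, tgt_emb,
                      src_weights=[idf[t] + 1.0 for t in src_tokens],
                      tgt_weights=[idf[t] + 1.0 for t in tgt_tokens]))
```

The `+ 1.0` offset in the weighted call is only there to keep the toy weights strictly positive, since `np.average` fails when all weights are zero.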


Installation

  • This project was developed in a conda environment
  • After cloning this repository, you can install all the dependencies listed in requirements.txt by running bash install.sh
conda create -n sensim python=3.7
conda activate sensim
git clone https://github.com/Huffon/sentence-similarity.git
cd sentence-similarity
bash install.sh

Usage

  • To test your own sentences, fill out corpus.txt with one sentence per line, as below:
I ate an apple.
I went to the Apple.
I ate an orange.
...
  • Then, choose the model and method to be used to calculate the similarity between source and target sentences
python sensim.py
    --model    MODEL_NAME  [use, bert, elmo]
    --method   METHOD_NAME [cosine, manhattan, euclidean, inner,
                            ts-ss, angular, pairwise, pairwise-idf]
    --verbose  LOG_OPTION (bool)
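
For readers unfamiliar with this kind of command line, here is a minimal sketch of how the three options above could be parsed with argparse. It is an assumption about the interface, not the actual contents of sensim.py: `build_parser`, `str2bool`, and the `False` default for `--verbose` are hypothetical.

```python
import argparse

# Choices copied from the usage text above.
MODELS = ["use", "bert", "elmo"]
METHODS = ["cosine", "manhattan", "euclidean", "inner",
           "ts-ss", "angular", "pairwise", "pairwise-idf"]


def str2bool(value):
    """Parse a command-line string into a bool (hypothetical helper)."""
    lowered = value.lower()
    if lowered in ("true", "t", "yes", "y", "1"):
        return True
    if lowered in ("false", "f", "no", "n", "0"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser():
    """Build a parser that accepts --model, --method, and --verbose."""
    parser = argparse.ArgumentParser(description="Sentence similarity calculator")
    parser.add_argument("--model", choices=MODELS, required=True,
                        help="pre-trained model used to embed sentences")
    parser.add_argument("--method", choices=METHODS, required=True,
                        help="similarity method applied to the embeddings")
    parser.add_argument("--verbose", type=str2bool, default=False,
                        help="whether to log details (default is an assumption)")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.model, args.method, args.verbose)
```

With an interface like this, an invocation such as `python sensim.py --model bert --method cosine` would select BERT embeddings and cosine similarity.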

Examples

  • This section shows an example result of sentence-similarity
  • There is no silver bullet that computes a perfect similarity between sentences
  • You should run experiments with your own dataset
    • Caution: the TS-SS score might not suit the sentence similarity task, since the method was originally devised to calculate the similarity between long documents
  • Result:

