A simple script to get sentence embeddings from Google's pre-trained models on TensorFlow Hub. The script includes six such models, covering a range of embedding dimensions (20-512) and architectures.
Returns one embedding vector per input sentence.
EMBEDDING_SIZE = n
Input = ["Colorless green ideas sleep furiously.",
         "Noam Chomsky offered this as an example of a grammatically valid, semantically nonsensical sentence."]
Output = array of shape (m, n)  # m = number of sentences (= 2), n = EMBEDDING_SIZE
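A minimal sketch of the shape contract above, using a hypothetical stand-in for the real TF Hub encoder (names and the hashing-based `embed` function are illustrative, not the script's actual implementation):

```python
import numpy as np

EMBEDDING_SIZE = 20  # n; the bundled models range from 20 to 512 dimensions

def embed(sentences):
    # Stand-in encoder: one fixed-size pseudo-random row per sentence,
    # seeded from the sentence text. A real TF Hub model would return
    # learned vectors, but the output shape is the same: (m, n).
    rows = [
        np.random.default_rng(abs(hash(s)) % (2**32)).standard_normal(EMBEDDING_SIZE)
        for s in sentences
    ]
    return np.stack(rows)

sentences = ["Colorless green ideas sleep furiously.",
             "Noam Chomsky offered this as an example of a grammatically valid, "
             "semantically nonsensical sentence."]
vectors = embed(sentences)
print(vectors.shape)  # (2, 20): m sentences by n dimensions
```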
Run pip3 install -r requirements.txt
Run python3 get_embeddings.py
Add the required model's URL from TensorFlow Hub to the model URL dictionary in the script.
- Create a key for the new dictionary value (the URL) in the format "embed_size/modelName_model_url".
- Pass "embed_size/modelName" as EMBEDDING_SIZE.
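The key convention above might look like the following sketch (the dictionary name and the example TF Hub URLs are assumptions for illustration):

```python
# Hypothetical model URL dictionary; each key follows "embed_size/modelName_model_url".
model_urls = {
    "128/nnlm_model_url": "https://tfhub.dev/google/nnlm-en-dim128/2",
    "512/use_model_url": "https://tfhub.dev/google/universal-sentence-encoder/4",
}

# EMBEDDING_SIZE is the "embed_size/modelName" part of the key.
EMBEDDING_SIZE = "512/use"
url = model_urls[EMBEDDING_SIZE + "_model_url"]
print(url)
```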
All the models used (and more) are available on TensorFlow Hub.