Sample implementation of multilingual semantic search with Elasticsearch using NLP embeddings.

rkouye/griot

griot

Try griot, the multilingual quote search engine.

Demo

Architecture

ML models are served with TensorFlow Serving, which provides a REST API to create sentence embeddings.
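As a rough illustration of that REST call, here is a minimal sketch of a client for TensorFlow Serving's predict endpoint. The port 8501 is TensorFlow Serving's default REST port, but the model name `use-multilingual` is an assumption, and whether the model accepts plain strings as instances depends on how it was exported.

```python
import json
import urllib.request

# Assumed values: 8501 is TensorFlow Serving's default REST port, but the
# model name "use-multilingual" is hypothetical.
SERVING_URL = "http://localhost:8501/v1/models/use-multilingual:predict"


def build_predict_payload(texts):
    """Serialize texts into the row-format body of the predict endpoint."""
    return json.dumps({"instances": list(texts)}).encode("utf-8")


def parse_predictions(body):
    """Extract one embedding (a list of floats) per input text."""
    return json.loads(body)["predictions"]


def embed(texts, url=SERVING_URL):
    """POST the texts to TensorFlow Serving and return their embeddings."""
    request = urllib.request.Request(
        url,
        data=build_predict_payload(texts),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return parse_predictions(response.read())
```

With the model server running, `embed(["hello", "bonjour"])` would return two vectors, one per input string.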

A Logstash pipeline embeds the quotes before indexing them in Elasticsearch. This setup can be used in production, as it should automatically index new entries.
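The embed-then-index step can be sketched in Python as a stand-in for the Logstash pipeline, which is not reproduced here. The index name `quotes` and the field names `quote` and `embedding` are assumptions for illustration. The sketch builds a body for the Elasticsearch bulk API and shows a mapping that stores vectors in a `dense_vector` field.

```python
import json

# Hypothetical names: the index "quotes" and the fields "quote" and
# "embedding" are illustrative, not taken from the project.
INDEX_NAME = "quotes"

# A mapping that stores vectors in a dense_vector field. 512 is the output
# size of the Universal Sentence Encoder; adjust it for another model.
INDEX_MAPPING = {
    "mappings": {
        "properties": {
            "quote": {"type": "text"},
            "embedding": {"type": "dense_vector", "dims": 512},
        }
    }
}


def to_bulk_body(quotes, embed):
    """Build a bulk-API body (NDJSON) with one embedding per quote.

    `embed` is any callable mapping a list of strings to a list of vectors,
    for example a client for the model server.
    """
    vectors = embed(quotes)
    lines = []
    for quote, vector in zip(quotes, vectors):
        # Each document takes two lines: an action line, then the source.
        lines.append(json.dumps({"index": {"_index": INDEX_NAME}}))
        lines.append(json.dumps({"quote": quote, "embedding": vector}))
    # The bulk API requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"
```

The resulting string can be sent to the `_bulk` endpoint with the `application/x-ndjson` content type.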

For each search request, the web service embeds the search term, requests a similarity score from Elasticsearch, and displays the most relevant results.
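To make the similarity score concrete, the sketch below ranks documents by cosine similarity in plain Python. It assumes cosine similarity is the metric in use, which this README does not state, and it only illustrates the idea. In the actual flow the scoring happens inside Elasticsearch.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention for zero vectors, which have no direction
    return dot / (norm_a * norm_b)


def rank(query_vector, documents, top_k=3):
    """Return the top_k (score, text) pairs, best match first.

    `documents` is a list of (text, vector) pairs.
    """
    scored = [
        (cosine_similarity(query_vector, vector), text)
        for text, vector in documents
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

Because the embeddings of sentences with similar meaning point in similar directions, a query in one language can rank quotes written in another.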

This can also be combined with simple term matching to filter large datasets, since computing the similarity score for every entry can be expensive.
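One way to combine the two is a `script_score` query whose inner query is a term match, so only matching quotes are scored. The sketch below builds such a search body. The field names `quote` and `embedding` are assumptions, and the `cosineSimilarity` call syntax differs between Elasticsearch 7.x releases, so check it against the version in use.

```python
def build_search_body(query_text, query_vector, size=10):
    """Build a search body that scores only the quotes matching query_text."""
    return {
        "size": size,
        "query": {
            "script_score": {
                # Only documents matching this inner query are scored.
                "query": {"match": {"quote": query_text}},
                "script": {
                    # cosineSimilarity ranges over [-1, 1]; adding 1.0 keeps
                    # the score non-negative, which Elasticsearch requires.
                    "source": (
                        "cosineSimilarity(params.query_vector, 'embedding')"
                        " + 1.0"
                    ),
                    "params": {"query_vector": query_vector},
                },
            }
        },
    }
```

Replacing the inner `match` query with `{"match_all": {}}` scores every quote instead. The term filter trades recall for speed: it only keeps quotes that share a term with the query, so matches in other languages may be dropped.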

For the model, I picked Google's Universal Sentence Encoder because it supports multiple languages, which makes multilingual search possible.

Running locally

Install Docker, then run the following in this directory:

docker-compose up

Todo

  • Add the web app to Docker Compose
  • Add BERT as a second model and allow switching between models to compare their efficiency

Citation