A repo for cool image-text retrieval using OpenAI's CLIP model. For information about how CLIP works, see my article on Medium.
This repo uses the Flickr 8k dataset from Kaggle, which contains a variety of images, each paired with 5 different captions. A few example images and captions can be found in the `sample_data` folder.
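To make the retrieval idea concrete, here is a minimal sketch of scoring an image against candidate captions with CLIP. It uses the Hugging Face `transformers` implementation and a hypothetical image path purely for illustration; the repo's own loading and inference code may differ.

```python
# Minimal sketch of CLIP image-text scoring (illustration only).
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("sample_data/example.jpg")  # hypothetical sample image path
captions = ["a dog running on grass", "a man riding a bicycle"]

inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds the similarity of the image to each caption.
probs = outputs.logits_per_image.softmax(dim=-1)
print(dict(zip(captions, probs[0].tolist())))
```

Retrieval is the same computation at scale: embed all the images once, then rank them by similarity to a query embedding.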
- Install all the dependencies with Poetry using `poetry install`. It is recommended to configure Poetry to use the currently activated Python version with the `poetry config virtualenvs.prefer-active-python true` command.
- The first step is to pre-compute the image embeddings with the `image_text_retrieval/scripts/pre_compute_embeddings.py` script, which can be run with the `pre_compute_embeddings` command. It uses DVC to store the script parameters, which can be changed in the `params.yaml` file. It expects a directory of images (`data/images` by default) and saves out the embeddings and a mapping file as two separate files. A rough sketch of this step is shown after this list.
- To run the backend API, use the `image_text_retrieval_api` command and go to http://0.0.0.0:8000/docs in the browser to get an interactive Swagger UI (an example request is sketched after this list):
- To run the Streamlit UI, run `streamlit run image_text_retrieval/ui/app_ui.py` and it should open in the browser. You should see a page which looks like this:
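As referenced in the pre-compute step above, here is a hedged sketch of what pre-computing and saving the image embeddings could look like. The model choice, output file names, and formats are assumptions; the actual logic lives in `image_text_retrieval/scripts/pre_compute_embeddings.py` and is configured through `params.yaml`.

```python
# Hedged sketch of pre-computing image embeddings; file names and
# formats are assumptions, not the script's real outputs.
import json
from pathlib import Path

import numpy as np
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image_dir = Path("data/images")  # default image directory
image_paths = sorted(image_dir.glob("*.jpg"))

embeddings = []
with torch.no_grad():
    for path in image_paths:
        inputs = processor(images=Image.open(path), return_tensors="pt")
        features = model.get_image_features(**inputs)
        embeddings.append(features.squeeze(0).numpy())

# One file for the embedding matrix, one for the index -> filename
# mapping (hypothetical output names).
np.save("embeddings.npy", np.stack(embeddings))
Path("mapping.json").write_text(
    json.dumps({i: p.name for i, p in enumerate(image_paths)})
)
```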
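For the backend API step, a text-to-image query might look like the following. The endpoint path and payload shape are assumptions, so check the Swagger page at http://0.0.0.0:8000/docs for the real routes.

```python
# Hedged sketch of calling the running API; the route and JSON body
# are hypothetical.
import requests

response = requests.post(
    "http://0.0.0.0:8000/text-to-image",  # hypothetical endpoint
    json={"text": "a dog running on grass"},
)
response.raise_for_status()
print(response.json())
```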