Reverse Image Search

Description 🔮

Search for up to 5 similar images in the Caltech101 or VOC2012 dataset using pre-loaded indexes.

Four ways of searching

Click the examples
Upload your own
Capture from webcam
Paste a public URL

Refining your search

Crop the image
Mirror between the 2 windows to compare while editing
Reuse outputs as inputs

How it works

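A ResNet50 feature extractor turns each image into a 2048-dimensional vector, which is then added to a faiss IVF100,PQ8 index (100 inverted lists, 8 product-quantizer sub-vectors). Updating an index in place is not supported, but new images could be handled by training a new index and then merging on disk.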
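To make the IVF and PQ steps concrete, here is a toy, standard-library-only sketch of the idea. It is not the faiss implementation and not the code in custom_ivfpq.py: the class name ToyIVFPQ and its parameters are hypothetical, centroids are sampled from the training data instead of being fitted with k-means, and raw vectors (not residuals) are encoded. The two constants at the top assume float32 features and faiss's default of 8 bits per PQ code.

```python
import random

# Storage per vector for the configuration named above (2048 dims, PQ8).
RAW_BYTES = 2048 * 4  # assumption: features stored as float32
PQ_BYTES = 8 * 1      # 8 sub-quantizers, assuming 8 bits (1 byte) per code

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(vec, centroids):
    """Index of the centroid closest to vec."""
    return min(range(len(centroids)), key=lambda i: sq_dist(vec, centroids[i]))

def split(vec, m):
    """Cut vec into m contiguous sub-vectors of equal length."""
    step = len(vec) // m
    return [vec[i * step:(i + 1) * step] for i in range(m)]

class ToyIVFPQ:
    """Hypothetical toy index: coarse cells (IVF) plus per-chunk codes (PQ)."""

    def __init__(self, train, nlist, m, ksub, seed=0):
        rng = random.Random(seed)
        self.m = m
        # Assumption: sample centroids instead of running k-means.
        self.coarse = rng.sample(train, nlist)
        subs = [split(v, m) for v in train]
        self.codebooks = [rng.sample([s[j] for s in subs], ksub) for j in range(m)]
        self.lists = [[] for _ in range(nlist)]  # inverted lists of (id, code)

    def encode(self, vec):
        """Replace each sub-vector with the index of its nearest sub-centroid."""
        return [nearest(s, cb) for s, cb in zip(split(vec, self.m), self.codebooks)]

    def add(self, vec_id, vec):
        """Store the PQ code in the inverted list of the nearest coarse cell."""
        self.lists[nearest(vec, self.coarse)].append((vec_id, self.encode(vec)))

    def search(self, query, k, nprobe=1):
        # Rank coarse cells by distance to the query and visit the closest nprobe.
        order = sorted(range(len(self.coarse)),
                       key=lambda i: sq_dist(query, self.coarse[i]))
        # Lookup table: distance from each query sub-vector to every sub-centroid.
        table = [[sq_dist(q, c) for c in cb]
                 for q, cb in zip(split(query, self.m), self.codebooks)]
        hits = []
        for cell in order[:nprobe]:
            for vec_id, code in self.lists[cell]:
                approx = sum(table[j][c] for j, c in enumerate(code))
                hits.append((approx, vec_id))
        return [vec_id for _, vec_id in sorted(hits)[:k]]

# Usage on small random data (16 dims instead of 2048 to keep it fast).
rng = random.Random(1)
data = [[rng.random() for _ in range(16)] for _ in range(200)]
index = ToyIVFPQ(data, nlist=4, m=4, ksub=16)
for i, v in enumerate(data):
    index.add(i, v)
top5 = index.search(data[0], k=5, nprobe=2)
```

Under those assumptions each 8192-byte vector shrinks to an 8-byte code, which is why the returned distances are approximate. With faiss itself, an index of the kind described above can be created with faiss.index_factory(2048, "IVF100,PQ8"), then trained, filled and queried with its train, add and search methods.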

Setup 🔧

Create virtual environment and install requirements

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Download data (required if you want to run gradio and show images)
- chmod u+x data_download.sh to make the script executable
- ./data_download.sh all to download both caltech (1 minute) and voc2012 (6 minutes ⚠️) (all can be substituted with caltech or voc)
Create Indexes (optional since already created)
- python gradio/create_index.py --data caltech (caltech can be substituted with voc)
Run gradio (ensure the features and datasets folders exist at the same level as the gradio folder)
- python gradio/block.py (it will request webcam access; close the browser tab running gradio to turn the webcam off)
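The folder requirement in the last step can be checked before launching the app. The following is a minimal sketch, not part of the repository: the helper name missing_dirs and the REQUIRED_DIRS tuple are hypothetical, and it only checks that the three folders exist, not what they contain.

```python
from pathlib import Path

# Folders the steps above expect to sit side by side in the repository root.
REQUIRED_DIRS = ("gradio", "features", "datasets")

def missing_dirs(root="."):
    """Return the required folder names that are absent under root."""
    root = Path(root)
    return [name for name in REQUIRED_DIRS if not (root / name).is_dir()]

# Usage: run from the repository root before python gradio/block.py.
missing = missing_dirs()
if missing:
    print("Missing folders:", ", ".join(missing))
```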

Folder Structure 📁

.
├── Makefile
├── README.md
├── custom_ivfpq.py
├── custom_ivfpq_faiss.py
├── data_download.sh
├── datasets
│   ├── VOCdevkit
│   └── caltech101
├── features
│   ├── class_ids-caltech101.pickle
│   ├── features-caltech101-resnet-finetuned.pickle
│   ├── features-caltech101-resnet.pickle
│   ├── features-caltech101-resnetscratch.pickle
│   ├── features-voc2012-resnet.pickle
│   ├── filenames-caltech101.pickle
│   └── filenames-voc2012.pickle
├── figures
│   ├── appscreenshot.png
│   ├── hnsw.png
│   ├── ivfdistance.png
│   ├── ivfoverview.png
│   ├── ivfpq.png
│   └── ivfumap.png
├── gradio
│   ├── block.py
│   ├── create_index.py
│   ├── index_ivfpq_caltech.index
│   ├── index_ivfpq_voc.index
│   ├── interface.py
│   └── main.css
├── ivfpq.pptx
├── notebooks
│   ├── feature_extraction.ipynb
│   ├── index_search.ipynb
│   ├── runtime_experiments.ipynb
│   └── visualizations.ipynb
├── requirements.txt
└── test_custom_ivfpq.py

features (366.5MB) and datasets (2GB) are not committed to version control
- features - train your own using feature_extraction.ipynb, then download from colab to local
- datasets - download using data_download.sh or manually

File Content 📚

custom_ivfpq_faiss.py - pure python (except the clustering section) implementation of the IVFPQ paper, with tweaks to the inverted file data structure
- run with python custom_ivfpq_faiss.py, < 2 seconds to train
- Development process in notebooks/index_search.ipynb
- Editable API design in ivfpq.pptx
custom_ivfpq.py - same content as the previous file, except that it uses sklearn KMeans instead of faiss.Kmeans for both coarse and fine quantizers
- run with python custom_ivfpq.py, 40 seconds to fit, 20 seconds to predict all ~9000 caltech101 images
test_custom_ivfpq.py - tests for custom_ivfpq_faiss.py
- run with make pytest
gradio
- block.py - blocks (low level) gradio api, allows complete control of data flow
- interface.py - interface (high level) gradio api, limited control of data flow
- create_index.py - script to create indexes to store in same gradio folder, to be loaded by gradio for search
notebooks
- feature_extraction.ipynb - Convert data to dense features by fine-tuning models, to be indexed for search
  - run in colab for GPU; file paths generated there are at the root (/features, /datasets) to avoid network transfer latency with Google Drive. Remove the leading / when running locally to avoid writing to the filesystem root
- runtime_experiments.ipynb - Experiments on index/search runtime performance of sklearn KNN and Annoy libraries
  - ⚠️ Start notebook with %cd .. so the current directory is the one containing notebooks and open() can access features, datasets
- index_search.ipynb - Experiments on speed, memory and recall tradeoffs of faiss ANN algorithms and custom implementations of IVFPQ (inverted file index with product quantization)
  - ⚠️ Start notebook with %cd .. so the current directory is the one containing notebooks and open() can access features, datasets
- visualizations.ipynb - Federpy visualizations of faiss IndexIVFFlat and hnswlib
  - run in colab because hnswlib cannot be installed locally (ERROR: Could not build wheels for hnswlib which use PEP 517 and cannot be installed directly)
  - indexes and images are hosted on S3, local files do not work
  - Example visualizations
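
The recall side of the tradeoffs explored in index_search.ipynb can be measured in several ways. The sketch below shows one common definition, recall@k, as a rough illustration; the function names are hypothetical and the notebook may compute recall differently.

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the exact top-k neighbours that the approximate search also returned."""
    exact = set(exact_ids[:k])
    if not exact:
        return 0.0
    return len(exact & set(approx_ids[:k])) / len(exact)

def mean_recall(approx_results, exact_results, k):
    """Average recall@k over a batch of queries (one id list per query)."""
    scores = [recall_at_k(a, e, k) for a, e in zip(approx_results, exact_results)]
    return sum(scores) / len(scores)

# Usage: exact ids would come from a brute-force search, approximate ids from an ANN index.
exact_results = [[1, 2, 4], [7, 8, 9]]
approx_results = [[1, 2, 3], [9, 8, 7]]
score = mean_recall(approx_results, exact_results, k=3)
```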
