Clipper: CLIP Embeddings Visualizer for Computer Vision Datasets

Clipper is a web application that allows you to visualize the CLIP embeddings of the images in your computer vision dataset.

[Screenshot: the Clipper interface]

Features

  • 2D visualization of the CLIP embeddings of your image dataset, after dimensionality reduction with UMAP.
  • 2D projection of text queries, to find images that are semantically similar to the query.
  • Semantic search over the images in your dataset using text queries.

Background

CLIP (Contrastive Language-Image Pre-training) is a neural network trained on a large variety of (image, text) pairs using contrastive learning. It maps images and text into a common embedding space, in which semantically similar images and texts are close to each other. Because CLIP is trained on a large dataset of images and text, it generalizes well to new images and text.
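
To make the shared embedding space concrete, here is a minimal sketch of scoring an image against two text queries with CLIP. It assumes the Hugging Face transformers implementation and the openai/clip-vit-base-patch32 checkpoint; Clipper's own scripts may load the model differently.

import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("data/images/example.jpg")  # hypothetical image
texts = ["a photo of a dog", "a photo of a car"]

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    image_emb = model.get_image_features(pixel_values=inputs["pixel_values"])
    text_emb = model.get_text_features(input_ids=inputs["input_ids"],
                                       attention_mask=inputs["attention_mask"])

# Normalize and compare: a higher cosine similarity means a better match.
image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)
text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
print((image_emb @ text_emb.T).squeeze())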

Tech Stack

  • Backend: Python, Flask, ChromaDB, UMAP, CLIP
  • Frontend: Vue.js (Node.js)

Installation

Clone the repository.

git clone https://github.com/nicolafan/clipper.git

Backend

  1. Create a virtual environment.
    python3 -m venv venv
  2. Activate the virtual environment.
    source venv/bin/activate
  3. Install the required Python packages.
    pip install -r requirements.txt

Frontend

  1. Navigate to the frontend directory.
    cd clipper-frontend
  2. Install the required Node.js packages.
    npm install

Usage

Clipper requires a dataset of .jpg images, stored in the data/images directory. If you have metadata attached to the images, you can store it in a JSON file with the following format:

{
    "image_id_1": {
        "metadata_key_1": "metadata_value_1",
        "metadata_key_2": "metadata_value_2",
        ...
    },
    "image_id_2": {
        "metadata_key_1": "metadata_value_1",
        "metadata_key_2": "metadata_value_2",
        ...
    },
    ...
}

where image_id_1, image_id_2, etc. are the filenames of the images without the .jpg extension. The supported metadata types are listed in the ChromaDB documentation. The JSON file must be saved as data/images/metadata.json.
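
For instance, a short sketch that writes such a file (the image IDs and metadata keys here are hypothetical):

import json
from pathlib import Path

# Hypothetical per-image metadata; keys are filenames without the .jpg extension.
metadata = {
    "img_0001": {"label": "cat", "split": "train"},
    "img_0002": {"label": "dog", "split": "val"},
}

Path("data/images").mkdir(parents=True, exist_ok=True)
Path("data/images/metadata.json").write_text(json.dumps(metadata, indent=4))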

Creating the Embeddings

To create the embeddings of the images in your dataset and store them in ChromaDB, run the following command:

python3 clipper/process/clippify.py

The embeddings are created by passing each image through CLIP and taking the output of the model's final layer. They are stored in ChromaDB along with the image metadata. If a GPU is available, the script will use it to speed up the process.
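
Conceptually, the script does something like the following sketch. The ChromaDB path and collection name are assumptions for illustration; see clipper/process/clippify.py for the actual implementation.

import json
from pathlib import Path

import chromadb
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device)
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

metadata = json.loads(Path("data/images/metadata.json").read_text())

client = chromadb.PersistentClient(path="data/chroma")  # assumed storage path
collection = client.get_or_create_collection("images")  # assumed collection name

for path in sorted(Path("data/images").glob("*.jpg")):
    image_id = path.stem
    inputs = processor(images=Image.open(path), return_tensors="pt").to(device)
    with torch.no_grad():
        emb = model.get_image_features(**inputs).squeeze(0).cpu().tolist()
    md = metadata.get(image_id)
    collection.add(ids=[image_id], embeddings=[emb],
                   metadatas=[md] if md else None)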

Reducing the Dimensionality of the Embeddings

To reduce the dimensionality of the embeddings, run the following command:

python3 clipper/process/dimred.py

The script will use UMAP to reduce the dimensionality of the embeddings from 512 to 2 for visualization. The reduced embeddings are stored in ChromaDB.

UMAP was chosen because it allows a custom distance function. The distance used is the cosine distance between embeddings, except that the cosine distance between a text embedding and an image embedding is multiplied by a factor of 2.5, to account for the gap between text and image representations. The factor follows the results presented in the CLIPScore paper.
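
A minimal sketch of such a reduction is shown below. Appending a modality flag to each vector is one way to implement the cross-modal scaling inside a UMAP metric, not necessarily how clipper/process/dimred.py does it.

import numpy as np
import umap
from numba import njit

@njit
def scaled_cosine(u, v):
    # The last element is a modality flag (0.0 = image, 1.0 = text);
    # the rest is the 512-d CLIP embedding.
    a, b = u[:-1], v[:-1]
    dist = 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    if u[-1] != v[-1]:
        dist *= 2.5  # scale cross-modal distances, following CLIPScore
    return dist

# Stand-ins for the stored CLIP embeddings: 99 images and 1 text query.
embeddings = np.random.rand(100, 512).astype(np.float32)
flags = np.concatenate((np.zeros(99), np.ones(1))).astype(np.float32)

reducer = umap.UMAP(n_components=2, metric=scaled_cosine)
coords_2d = reducer.fit_transform(np.hstack((embeddings, flags[:, None])))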

Once the dimensionality reduction is complete, you can start the backend and frontend servers.

Backend

  1. Activate the virtual environment.
    source venv/bin/activate
  2. Enter the API directory.
    cd clipper/api
  3. Run the Flask server.
    flask run

Frontend

  1. Navigate to the frontend directory.
    cd clipper-frontend
  2. Run the Vue.js server.
    npm run serve

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated!

Clipper is still in its early stages, and there are many features that can be added!

Please open an issue if you have any questions or suggestions.

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

References

  • Radford et al., "Learning Transferable Visual Models From Natural Language Supervision" (CLIP), ICML 2021.
  • Hessel et al., "CLIPScore: A Reference-free Evaluation Metric for Image Captioning", EMNLP 2021.

License

MIT