[EXPERIMENTAL]
In a world of compression where the original images are never stored, are latent space representations all you need?
Stack
- Models
  - VQVAE: pretraining (see notebooks)
  - VAE Tiny: madebyollin/taesd
    - model size: 2.4M params
  - Stable Diffusion: Lykon/dreamshaper-8
    - used to generate synthetic data
    - model size: > 1B params
- Flavour
  - 8-bit latent space (see the encoding sketch below)
- Similarity
  - Vision Transformer: facebook/dinov2-base
    - model size: 86.6M params
    - visual feature extractor
  - Cosine similarity over the DINOv2 embeddings
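The core idea is to keep only the TAESD latents, quantized to 8 bits, and reconstruct images on demand. Below is a minimal sketch of that flow, assuming diffusers' AutoencoderTiny wrapper (its scale_latents/unscale_latents helpers and the [-1, 1] image convention); the file name and 512x512 resolution are illustrative and may need adjusting to match the notebooks.

```python
# Sketch: image -> TAESD latent -> 8-bit storage -> reconstruction.
import numpy as np
import torch
from diffusers import AutoencoderTiny
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
taesd = AutoencoderTiny.from_pretrained("madebyollin/taesd").to(device).eval()

# Load an RGB image; diffusers' wrapper uses the [-1, 1] image convention
# (the raw TAESD repo works in [0, 1]), so adjust if needed.
img = Image.open("example.png").convert("RGB").resize((512, 512))
x = torch.from_numpy(np.asarray(img)).permute(2, 0, 1).unsqueeze(0).float().div(255).to(device)
x = x * 2 - 1

with torch.no_grad():
    latents = taesd.encode(x).latents                                  # 1 x 4 x 64 x 64 floats
    latents_u8 = taesd.scale_latents(latents).mul(255).round().byte()  # 8-bit "flavour"

# Later (or on another machine): reconstruct the image from the stored 8-bit latent.
with torch.no_grad():
    restored = taesd.unscale_latents(latents_u8.float().div(255))
    recon = taesd.decode(restored).sample
    recon = (recon / 2 + 0.5).clamp(0, 1)                              # back to [0, 1] for saving
```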
| Memory | n(X) | Q1 (MB) | Q2 (MB) | Q3 (MB) | Σ (MB) |
|---|---|---|---|---|---|
| Originals | 99 | 0.299 | 0.338 | 0.376 | 33.631 |
| Latents | 99 | 0.0127 | 0.0131 | 0.0134 | 1.294 |
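As a rough illustration of how the memory comparison can be reproduced, the totals are just the summed on-disk sizes of the original images versus the serialized 8-bit latents; the folder names and file extensions below are assumptions, not the repo's actual layout.

```python
# Sum the on-disk size of originals vs. stored 8-bit latents (illustrative paths).
from pathlib import Path

def total_mb(folder: str, pattern: str) -> float:
    """Total size in MB of all files matching `pattern` inside `folder`."""
    return sum(p.stat().st_size for p in Path(folder).glob(pattern)) / 1e6

print(f"originals: {total_mb('data/originals', '*.png'):.3f} MB")
print(f"latents:   {total_mb('data/latents', '*.npy'):.3f} MB")
```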
Legend for the elapsed-time table below:
- vd = vector database
- fs = file storage
- r = reconstruction
- (n) = number of elements processed
| Elapsed time (ms) | mean (µ) | median (m) | σ | min | max | single run |
|---|---|---|---|---|---|---|
| Originals, fs (1) | 1.856 | 1.761 | 0.889 | 1.037 | 8.913 | - |
| Originals, fs (99) | - | - | - | - | - | 195.232 |
| Latents, fs (1) | 1.255 | 1.125 | 0.418 | 0.99 | 3.961 | - |
| Latents with r, fs (1) | - | - | - | - | - | 40.559 |
| Latents with r, fs (99) | - | - | - | - | - | 2522.841 |
| Latents as payload, vd (1) | 63.007 | 63.717 | 8.502 | 42.864 | 104.509 | - |
| Latents as payload, vd (99) | - | - | - | - | - | 4189 |
| Search with latents as payload + r, vd (topk=5) | - | - | - | - | - | 235.427 + 425.043 |
| Search with filename as payload + r, vd (topk=5) | - | - | - | - | - | 15.473 + 210.832 |
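The two search rows correspond to a retrieve-then-reconstruct flow: embed the query with DINOv2, search Qdrant with cosine distance, then decode the latents either carried in the point payload or loaded from disk via the filename. Below is a minimal sketch of the payload variant, assuming the qdrant-client Python API; the collection name, payload key, and latent shape are illustrative, not the app's actual schema.

```python
# Sketch: DINOv2 query embedding -> Qdrant top-k search -> decode latents from payload.
import numpy as np
import torch
from diffusers import AutoencoderTiny
from PIL import Image
from qdrant_client import QdrantClient
from transformers import AutoImageProcessor, AutoModel

client = QdrantClient(host="localhost", port=6333)
taesd = AutoencoderTiny.from_pretrained("madebyollin/taesd").eval()

# Query embedding: DINOv2 CLS token (768-d for dinov2-base), matched with cosine distance.
processor = AutoImageProcessor.from_pretrained("facebook/dinov2-base")
dinov2 = AutoModel.from_pretrained("facebook/dinov2-base").eval()
query = Image.open("query.png").convert("RGB")
with torch.no_grad():
    inputs = processor(images=query, return_tensors="pt")
    embedding = dinov2(**inputs).last_hidden_state[:, 0].squeeze(0).numpy()

hits = client.search(
    collection_name="latent_images",   # assumed collection name
    query_vector=embedding.tolist(),
    limit=5,                           # topk=5 as in the table
    with_payload=True,
)

for hit in hits:
    # "Latents as payload": the uint8 latent travels with the point and is decoded locally.
    latent_u8 = np.array(hit.payload["latent_u8"], dtype=np.uint8).reshape(1, 4, 64, 64)
    latents = taesd.unscale_latents(torch.from_numpy(latent_u8).float().div(255))
    with torch.no_grad():
        image = (taesd.decode(latents).sample / 2 + 0.5).clamp(0, 1)
```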
Setup
- Install miniforge
- Create a virtual environment (venv or conda)
- Initialize Qdrant (see the snippet below)
- From the repository root, install the dependencies:
pip install -r requirements.txt
pip install python-dotenv
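For the Qdrant step, a local server can be started with Docker (docker run -p 6333:6333 qdrant/qdrant), and the collection can then be created once with the Python client. The collection name below is an assumption; the 768-dim cosine configuration matches the dinov2-base embeddings used for similarity.

```python
# One-time collection setup (assumed name "latent_images", cosine distance over 768-d vectors).
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

client = QdrantClient(host="localhost", port=6333)
client.recreate_collection(
    collection_name="latent_images",
    vectors_config=VectorParams(size=768, distance=Distance.COSINE),
)
```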
- TensorFlow and PyTorch (CUDA)
conda install -c conda-forge cudatoolkit=11.2 cudnn=8.1.0
pip install tensorflow==2.10
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
- TensorFlow and PyTorch (Apple Silicon / MPS)
conda install -c apple tensorflow-deps
pip install tensorflow-macos==2.10.0 tensorflow-metal==0.6.0
pip install torch torchvision
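As a quick sanity check that the accelerators are visible after either install path (this snippet is not part of the app):

```python
# Verify that TensorFlow and PyTorch can see a GPU (CUDA) or MPS (Apple Silicon, PyTorch >= 1.12).
import tensorflow as tf
import torch

print("TF GPU devices:", tf.config.list_physical_devices("GPU"))
print("CUDA available:", torch.cuda.is_available())
print("MPS available: ", torch.backends.mps.is_available())
```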
- Run the app
python app/main.py
- Run the web app
streamlit run webapp.py