Dataset: Labeled Faces in the Wild (LFW).
Our goal is to learn an embedding space in which similar sample pairs stay close to each other while dissimilar ones are far apart.
SimCLR (Chen et al., 2020) proposed a simple framework for contrastive learning of visual representations. It learns representations by maximizing agreement between differently augmented views of the same sample via a contrastive loss in the latent space.
This category of approaches produces two noised versions of one anchor image and aims to learn representations such that the two augmented samples share the same embedding. The algorithm is as follows:
1. Randomly sample a minibatch of N samples. Each sample is transformed twice by two separate data augmentation operators drawn from the same family, yielding 2N augmented data points. The augmentations used here are:
- random crop;
- random flip;
- random rotation;
- color jitter;
- Gaussian blur.
2. Given one positive pair, the other 2(N − 1) augmented samples in the batch are treated as negative examples.
3. The contrastive learning loss (NT-Xent) is defined using the cosine similarity sim(z_i, z_j) between the normalized embeddings, scaled by a temperature τ.
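The steps above can be sketched as a self-contained NT-Xent loss in PyTorch. This is a minimal illustration, not the repository's code; the function name and the temperature default of 0.5 are assumptions:

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent loss as in SimCLR.

    z1, z2: (N, d) embeddings of the two augmented views of the
    same minibatch; row i of z1 and row i of z2 form a positive pair.
    """
    n = z1.size(0)
    # L2-normalize so that dot products are cosine similarities.
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)  # (2N, d)
    sim = z @ z.t() / temperature                       # (2N, 2N)
    # A sample must never be its own negative.
    sim.fill_diagonal_(float("-inf"))
    # The positive for index i is its other augmented view: i + N (mod 2N).
    targets = torch.cat([torch.arange(n) + n, torch.arange(n)])
    # Cross-entropy over each row implements -log(exp(pos)/sum(exp(all))).
    return F.cross_entropy(sim, targets)
```

Each of the 2N rows is classified against all 2N − 1 other samples, with the matching augmented view as the correct "class".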
Since we have labels for the dataset, we will use a supervised contrastive loss (SupConLoss, Khosla et al., 2020) instead of the self-supervised NT-Xent: every sample in the batch with the same label as the anchor is treated as a positive.
SimCLR needs a large batch size to incorporate enough negative samples to achieve good performance.
All commands should be run from inside the ./ directory.
You can set all the model parameters in the ./source/config.py file:
import torch
ORIGINAL_SIZE = 255 # original image size
IMAGE_SIZE = 64 # augmented image size
BATCH_SIZE = 128
LEARNING_RATE = 0.1
NUM_EPOCH = 50
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
To train the model:
python source/train.py \
--model_path models/model.pth \
--data_path data/
To compare two images:
python source/predict.py \
--model_path models/model.pth \
--image_path_1 images/Aaron_Peirsol_0001.jpg \
--image_path_2 images/Aaron_Peirsol_0002.jpg
The output is the cosine similarity between the two given images.
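Conceptually, this comparison amounts to embedding both images with the trained encoder and taking the cosine similarity of the embeddings. A sketch of that computation (the helper name and direct encoder call are illustrative assumptions, not the actual predict.py code):

```python
import torch
import torch.nn.functional as F

def embedding_similarity(model, img1, img2):
    """Embed two image tensors (C, H, W) with the trained encoder
    and return their cosine similarity in [-1, 1]."""
    model.eval()
    with torch.no_grad():
        z1 = model(img1.unsqueeze(0))  # add batch dimension -> (1, d)
        z2 = model(img2.unsqueeze(0))
    return F.cosine_similarity(z1, z2).item()
```

A similarity close to 1 suggests the two faces belong to the same identity; a threshold for "same person" would have to be tuned on a validation split.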
To visualize the learned embeddings:
python source/visualize.py \
--data_path data/ \
--model_path models/model.pth \
--plot_path images/tsne.jpg \
--k_classes 3
Result: a t-SNE plot of the embeddings for the selected identities, saved to the path given by --plot_path.
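The projection step behind such a plot can be sketched with scikit-learn's t-SNE (assuming scikit-learn is available; the function name and perplexity default are illustrative, not the visualize.py internals):

```python
import numpy as np
from sklearn.manifold import TSNE

def tsne_coords(embeddings, perplexity=5, seed=0):
    """Project (N, d) embeddings to (N, 2) coordinates with t-SNE.

    perplexity must be smaller than the number of samples N.
    """
    return TSNE(
        n_components=2,
        perplexity=perplexity,
        init="random",
        random_state=seed,
    ).fit_transform(np.asarray(embeddings))
```

The resulting 2-D points can then be scatter-plotted with one color per identity; well-trained embeddings should form one tight cluster per person.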