Dataset: Labeled Faces in the Wild (LFW).
Our goal is to learn an embedding space in which similar sample pairs stay close to each other while dissimilar ones are far apart.
SimCLR (Chen et al., 2020) proposed a simple framework for contrastive learning of visual representations. It learns representations by maximizing agreement between differently augmented views of the same sample via a contrastive loss in the latent space.
This category of approaches produces two noised versions of one anchor image and aims to learn representations such that the two augmented samples share the same embedding. The algorithm is as follows:
1. Randomly sample a minibatch of N samples. Each sample is transformed twice by two separate data augmentation operators drawn from the same family, yielding 2N augmented data points. The augmentations used here are:
- random crop;
- random flip;
- random rotation;
- color jitter;
- Gaussian blur.
2. Given one positive pair, the other 2(N − 1) augmented samples in the batch are treated as negative examples.
3. The contrastive learning loss (NT-Xent) is defined using the cosine similarity sim(z_i, z_j) between the normalized embeddings, scaled by a temperature τ.
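The steps above can be sketched as a self-contained NT-Xent loss in PyTorch. This is a minimal illustration, not the repository's code; the function name and the temperature default of 0.5 are assumptions:

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent loss as in SimCLR.

    z1, z2: (N, d) embeddings of the two augmented views of the
    same minibatch; row i of z1 and row i of z2 form a positive pair.
    """
    n = z1.size(0)
    # L2-normalize so that dot products are cosine similarities.
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)  # (2N, d)
    sim = z @ z.t() / temperature                       # (2N, 2N)
    # A sample must never be its own negative.
    sim.fill_diagonal_(float("-inf"))
    # The positive for index i is its other augmented view: i + N (mod 2N).
    targets = torch.cat([torch.arange(n) + n, torch.arange(n)])
    # Cross-entropy over each row implements -log(exp(pos)/sum(exp(all))).
    return F.cross_entropy(sim, targets)
```

Each of the 2N rows is classified against all 2N − 1 other samples, with the matching augmented view as the correct "class".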
Since we have labels for the dataset, we will use a supervised contrastive loss (SupConLoss, Khosla et al., 2020) instead of the self-supervised NT-Xent: every sample in the batch with the same label as the anchor is treated as a positive.
SimCLR needs a large batch size to incorporate enough negative samples to achieve good performance.
All commands should be run from inside the ./ directory.
You can set all the model parameters in the ./source/config.py file:
import torch
ORIGINAL_SIZE = 255 # original image size
IMAGE_SIZE = 64 # augmented image size
BATCH_SIZE = 128
LEARNING_RATE = 0.1
NUM_EPOCH = 50
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
To train the model:
python source/train.py \
--model_path models/model.pth \
--data_path data/
To compare two images:
python source/predict.py \
--model_path models/model.pth \
--image_path_1 images/Aaron_Peirsol_0001.jpg \
--image_path_2 images/Aaron_Peirsol_0002.jpg
The output is the cosine similarity between the two given images.
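Conceptually, this comparison amounts to embedding both images with the trained encoder and taking the cosine similarity of the embeddings. A sketch of that computation (the helper name and direct encoder call are illustrative assumptions, not the actual predict.py code):

```python
import torch
import torch.nn.functional as F

def embedding_similarity(model, img1, img2):
    """Embed two image tensors (C, H, W) with the trained encoder
    and return their cosine similarity in [-1, 1]."""
    model.eval()
    with torch.no_grad():
        z1 = model(img1.unsqueeze(0))  # add batch dimension -> (1, d)
        z2 = model(img2.unsqueeze(0))
    return F.cosine_similarity(z1, z2).item()
```

A similarity close to 1 suggests the two faces belong to the same identity; a threshold for "same person" would have to be tuned on a validation split.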
To visualize the learned embeddings:
python source/visualize.py \
--data_path data/ \
--model_path models/model.pth \
--plot_path images/tsne.jpg \
--k_classes 3
Result: a t-SNE plot of the embeddings for the selected identities, saved to the path given by --plot_path.
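The projection step behind such a plot can be sketched with scikit-learn's t-SNE (assuming scikit-learn is available; the function name and perplexity default are illustrative, not the visualize.py internals):

```python
import numpy as np
from sklearn.manifold import TSNE

def tsne_coords(embeddings, perplexity=5, seed=0):
    """Project (N, d) embeddings to (N, 2) coordinates with t-SNE.

    perplexity must be smaller than the number of samples N.
    """
    return TSNE(
        n_components=2,
        perplexity=perplexity,
        init="random",
        random_state=seed,
    ).fit_transform(np.asarray(embeddings))
```

The resulting 2-D points can then be scatter-plotted with one color per identity; well-trained embeddings should form one tight cluster per person.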