
Self-Supervised Vision Transformer

This is my attempt at creating a self-supervised model. I decided to make it a general-purpose, pretrained model.

The method I went with is Masked Autoencoding (MAE), based on the paper "Masked Autoencoders Are Scalable Vision Learners" (cited below). The general structure is the same as in the MAE paper, but I introduce the mask tokens before the encoder rather than after. To summarize the overall architecture:

  1. Patchify the images.
  2. Mask 75% of the patches, replacing them with learnable mask tokens.
  3. Pass the sequence to the encoder.
  4. Pass the encoder output to the decoder.
  5. Reconstruct the image and optimize with MSE loss (see the code sketch below the figure).
[Figure: overall MAE pipeline]
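
Here is a minimal sketch of steps 1, 2, and 5 in PyTorch. The 75% mask ratio comes from the README; the patch size, function names, and the `encoder`/`decoder` modules referenced in the comments are illustrative assumptions, not the repository's actual code.

```python
import torch
import torch.nn.functional as F

def patchify(imgs, patch_size=8):
    # (B, C, H, W) -> (B, N, p*p*C), where N = (H/p) * (W/p)
    B, C, H, W = imgs.shape
    p = patch_size
    x = imgs.reshape(B, C, H // p, p, W // p, p)
    x = x.permute(0, 2, 4, 3, 5, 1).reshape(B, (H // p) * (W // p), p * p * C)
    return x

def random_mask(patches, mask_ratio=0.75):
    # Returns a boolean mask marking which patches are hidden (True = masked).
    B, N, _ = patches.shape
    num_masked = int(N * mask_ratio)
    noise = torch.rand(B, N, device=patches.device)
    ids = noise.argsort(dim=1)                   # random permutation per image
    mask = torch.zeros(B, N, dtype=torch.bool, device=patches.device)
    mask.scatter_(1, ids[:, :num_masked], True)
    return mask

# Training step (step 5), assuming hypothetical `encoder`/`decoder` modules:
# patches = patchify(imgs)
# mask = random_mask(patches)
# pred = decoder(encoder(patches, mask))
# loss = F.mse_loss(pred, patches)
```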

The encoder architecture is fairly simple. The masked inputs are projected into the embedding space, and learnable mask tokens, selected from a set of learnable tokens (purple patches), are inserted at the masked positions. A positional embedding is also added to each patch (symbolized by the green lines). The result is passed through the transformer and then an MLP head, which outputs a sequence.

[Figure: encoder architecture]
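
A rough sketch of this encoder in PyTorch, under my own assumptions about dimensions and module names; for simplicity it uses a single shared mask token rather than a set of them:

```python
import torch
import torch.nn as nn

class MAEEncoder(nn.Module):
    def __init__(self, patch_dim, embed_dim=256, num_patches=64, depth=6, heads=8):
        super().__init__()
        self.proj = nn.Linear(patch_dim, embed_dim)                   # patch -> embedding
        self.mask_token = nn.Parameter(torch.zeros(1, 1, embed_dim))  # learnable mask token
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches, embed_dim))
        layer = nn.TransformerEncoderLayer(embed_dim, heads, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, depth)
        self.mlp_head = nn.Linear(embed_dim, embed_dim)

    def forward(self, patches, mask):
        x = self.proj(patches)                                        # embed the patches
        # Insert mask tokens *before* the transformer (unlike the MAE paper).
        x = torch.where(mask.unsqueeze(-1), self.mask_token.expand_as(x), x)
        x = x + self.pos_embed                                        # add positional embedding
        return self.mlp_head(self.transformer(x))
```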

The decoder architecture is a copy of the encoder architecture, except that positional encoding and mask tokens are not injected. It's just a transformer with an MLP head.
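
A matching decoder sketch under the same assumptions, with the MLP head projecting back to patch-pixel space:

```python
import torch.nn as nn

class MAEDecoder(nn.Module):
    def __init__(self, embed_dim=256, patch_dim=192, depth=4, heads=8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(embed_dim, heads, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, depth)
        self.mlp_head = nn.Linear(embed_dim, patch_dim)  # back to patch pixels

    def forward(self, x):
        return self.mlp_head(self.transformer(x))
```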

The transformer itself uses the standard transformer architecture (image from Wikipedia: https://en.wikipedia.org/wiki/Vision_transformer).

[Figure: standard transformer architecture, from Wikipedia]
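
For reference, here is one standard transformer block (the pre-norm variant commonly used in ViTs), sketched in PyTorch; the dimensions are placeholders:

```python
import torch.nn as nn

class TransformerBlock(nn.Module):
    def __init__(self, dim, heads, mlp_ratio=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim * mlp_ratio),
            nn.GELU(),
            nn.Linear(dim * mlp_ratio, dim),
        )

    def forward(self, x):
        # Multi-head self-attention with residual connection
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        # Feed-forward MLP with residual connection
        return x + self.mlp(self.norm2(x))
```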

Currently, I am training on Tiny-ImageNet (cited below). Note that because the test images have no annotations, the val images are used for testing instead.

Download: http://cs231n.stanford.edu/tiny-imagenet-200.zip
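
A hedged data-loading sketch using torchvision; the paths and transform are assumptions. Labels aren't needed for the self-supervised pretraining itself, so `ImageFolder` on the train split suffices. The val split keeps its labels in a separate `val_annotations.txt`, so using it for labeled evaluation would require reorganizing `val/images` into class folders or parsing that file.

```python
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.ToTensor(),  # Tiny-ImageNet images are 64x64
])

train_set = datasets.ImageFolder("tiny-imagenet-200/train", transform=transform)
train_loader = DataLoader(train_set, batch_size=256, shuffle=True, num_workers=4)
```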

Citations

He, K., Chen, X., Xie, S., Li, Y., Dollár, P., & Girshick, R. (2021). Masked autoencoders are scalable vision learners. arXiv. https://arxiv.org/abs/2111.06377

Le, Y., & Yang, X. (2015). Tiny ImageNet visual recognition challenge. CS 231N, 7(7), 3.

Stanford CS231n lecture slides (used a variety of them for reference): https://cs231n.stanford.edu/2021/schedule.html
