The notebook contains three parts:
- Building a vision transformer from scratch
- Fine-tuning a pre-trained ViT
- Comparing ViTs and CNNs
In the first part, we train the from-scratch ViT for 20 epochs, using 4 attention heads, 4 transformer layers, and an embedding dimension of 64.
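Putting those numbers together, a compact PyTorch implementation might look like the sketch below. Only the 4 heads, 4 layers, and 64-dimensional embedding come from the notebook; the image size (32), patch size (4), and 10-way classification head are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TinyViT(nn.Module):
    """Minimal ViT: 4 layers, 4 heads, embedding dim 64 (trained for 20 epochs)."""
    def __init__(self, image_size=32, patch_size=4, num_classes=10,
                 embed_dim=64, num_heads=4, num_layers=4):
        super().__init__()
        num_patches = (image_size // patch_size) ** 2
        # A strided conv cuts the image into patches and projects each to embed_dim.
        self.patch_embed = nn.Conv2d(3, embed_dim, kernel_size=patch_size,
                                     stride=patch_size)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, embed_dim))
        layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=num_heads,
                                           dim_feedforward=embed_dim * 4,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x):
        x = self.patch_embed(x)              # (B, D, H/ps, W/ps)
        x = x.flatten(2).transpose(1, 2)     # (B, num_patches, D)
        cls = self.cls_token.expand(x.size(0), -1, -1)
        x = torch.cat([cls, x], dim=1) + self.pos_embed
        x = self.encoder(x)
        return self.head(x[:, 0])            # classify from the [CLS] token

logits = TinyViT()(torch.randn(2, 3, 32, 32))  # smoke test: shape (2, 10)
```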
There is also a built-in ViT that was pre-trained on ImageNet-21k at 224 × 224 resolution; here we fine-tune it for 3 epochs.
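As a sketch of that step, the loop below fine-tunes a Hugging Face ImageNet-21k checkpoint for 3 epochs. The checkpoint name (`google/vit-base-patch16-224-in21k`), the 10-class head, the optimizer, and the toy data are assumptions; only the pre-training corpus, resolution, and epoch count come from the notebook.

```python
import torch
from transformers import ViTForImageClassification

# Assumed checkpoint matching the description (ImageNet-21k, 224 x 224).
model = ViTForImageClassification.from_pretrained(
    "google/vit-base-patch16-224-in21k",
    num_labels=10)                            # assumed number of classes
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# Toy one-batch "loader" so the sketch runs; swap in the real DataLoader.
loader = [(torch.randn(8, 3, 224, 224), torch.randint(0, 10, (8,)))]

model.train()
for epoch in range(3):                        # 3 fine-tuning epochs, as stated
    for pixel_values, labels in loader:
        outputs = model(pixel_values=pixel_values, labels=labels)
        outputs.loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```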
Finally, a ResNet18 is trained on the same dataset for 3 epochs to serve as the CNN baseline in the comparison.
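A minimal version of that baseline might look like the following. The 10-class head, optimizer settings, and toy data are assumptions, and the notebook does not say whether the ResNet18 starts from ImageNet weights, so `weights=None` (training from scratch) is also an assumption.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

# ResNet18 with a replaced classification head; weights=None (from scratch)
# is an assumption, since the notebook does not specify initialization.
model = resnet18(weights=None)
model.fc = nn.Linear(model.fc.in_features, 10)   # assumed 10 classes

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# Toy one-batch "loader" so the sketch runs; substitute the real DataLoader.
loader = [(torch.randn(8, 3, 224, 224), torch.randint(0, 10, (8,)))]

model.train()
for epoch in range(3):                           # 3 epochs, as stated
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```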