Training a ViT on CIFAR-10 dataset and then comparing a pre-trained ViT and CNN

Mehrab-Kalantari/Vision-Transformer

CIFAR-10 Vision Transformer

The notebook contains:

  • Building a vision transformer from scratch
  • Training a pre-trained ViT
  • Comparing ViTs and CNNs

Vision Transformer

In the first part, we train a ViT from scratch for 20 epochs. The model has 4 transformer layers with 4 attention heads each, and the embedding dimension is set to 64.
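As a rough illustration of that configuration, here is a minimal PyTorch sketch of a small ViT with 4 layers, 4 heads, and an embedding dimension of 64. It is not the notebook's code: the class name `TinyViT`, the patch size of 4, the MLP width, the dropout rate, and the use of a class token with learned position embeddings are all assumptions made for the example.

```python
import torch
from torch import nn


class TinyViT(nn.Module):
    """Small ViT for 32x32 images. Hypothetical sketch, not the notebook's model."""

    def __init__(self, image_size=32, patch_size=4, in_channels=3, num_classes=10,
                 embed_dim=64, num_heads=4, num_layers=4, mlp_ratio=2, dropout=0.1):
        super().__init__()
        # patch_size, mlp_ratio and dropout are assumed values; the README does not state them.
        assert image_size % patch_size == 0, "image size must be divisible by patch size"
        num_patches = (image_size // patch_size) ** 2

        # A strided convolution splits the image into patches and projects each one
        # linearly to embed_dim in a single step.
        self.patch_embed = nn.Conv2d(in_channels, embed_dim,
                                     kernel_size=patch_size, stride=patch_size)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, embed_dim))
        nn.init.trunc_normal_(self.pos_embed, std=0.02)

        layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads,
            dim_feedforward=embed_dim * mlp_ratio, dropout=dropout,
            activation="gelu", batch_first=True, norm_first=True,
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.norm = nn.LayerNorm(embed_dim)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x):
        x = self.patch_embed(x)                  # (B, D, H/P, W/P)
        x = x.flatten(2).transpose(1, 2)         # (B, N, D)
        cls = self.cls_token.expand(x.size(0), -1, -1)
        x = torch.cat([cls, x], dim=1) + self.pos_embed
        x = self.encoder(x)
        return self.head(self.norm(x[:, 0]))     # classify from the class token


model = TinyViT()
logits = model(torch.randn(2, 3, 32, 32))
print(logits.shape)  # torch.Size([2, 10])
```

With a patch size of 4, each 32 × 32 CIFAR-10 image becomes 64 patch tokens plus one class token. A larger patch size would shorten the sequence at the cost of spatial detail.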

Pre-trained Vision Transformer

A pre-trained ViT is also available, trained on ImageNet-21k at a resolution of 224 × 224. Here we train this model on CIFAR-10 for 3 epochs.
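Because the pre-trained model expects 224 × 224 inputs while CIFAR-10 images are 32 × 32, the images typically have to be upsampled first. The sketch below shows one way to do that with bilinear interpolation. The helper names `resize_for_vit` and `num_patches` are hypothetical, and the patch size of 16 is an assumption, since the README does not say which checkpoint or preprocessing the notebook uses.

```python
import torch
import torch.nn.functional as F


def resize_for_vit(images: torch.Tensor, size: int = 224) -> torch.Tensor:
    """Upsample a (B, C, H, W) batch to (B, C, size, size) with bilinear interpolation."""
    return F.interpolate(images, size=(size, size), mode="bilinear", align_corners=False)


def num_patches(image_size: int, patch_size: int) -> int:
    """Number of non-overlapping square patches in a square image."""
    return (image_size // patch_size) ** 2


batch = torch.rand(4, 3, 32, 32)      # stand-in for a CIFAR-10 batch
resized = resize_for_vit(batch)
print(resized.shape)                  # torch.Size([4, 3, 224, 224])
print(num_patches(224, 16))           # 196 (patch size 16 is an assumed value)
```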

Pre-trained CNN

A pre-trained ResNet18 is trained on the dataset for 3 epochs.

Results

[Figure: pic]
