A PyTorch implementation of Vision Transformers as described in: An Image Is Worth 16 x 16 Words: Transformers for Image Recognition at Scale
trained on CIFAR-100 dataset
- torch==2.0.1
- pytorch_lightning==1.9.0
- torchmetrics==0.11.4
- torchvision==0.15.2
pip install -r requirements.txt
python main.py
tensorboard --logdir logs
model = LightningVisionTransformer(
image_size = 384,
patch_size = 16,
in_channels = 3,
n_classes = 1000,
embedding_dimension=768,
depth=12,
n_heads=12,
mlp_ratio=4.,
qkv_bias=True,
proj_p=0.,
attn_p=0
)