Start with The Illustrated Transformer. It is the most approachable guide to start with, thanks to its visual aids.
For a deeper level of understanding:
- The Annotated Transformer, which walks through a commented code implementation.
- The Illustrated BERT, a very popular model that now serves as the base for many of the newer models coming out, and its paper (2018).
- The Hugging Face library, which encapsulates a lot of transformer models for you to choose from :).
- The original Vision Transformer and its GitHub repository. It is easier to use the Hugging Face ViT implementation, though (see the sketch after this list).
- The next-generation ViT that serves as a backbone for most vision tasks, the Swin Transformer, and its second version, Swin V2. The second version added some tweaks to make it more scalable and able to handle high-resolution images, and Swin V2 was also pre-trained in a self-supervised way. Both models are available on Hugging Face through the pages Swin V1 and Swin V2.
- Another model that seems to be excellent is BEiT, which is based on BERT. It is also available on Hugging Face.
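If you want to actually run one of these image models, the easiest entry point is the Hugging Face pipeline API. Below is a minimal sketch; the checkpoint ids are commonly published Hub names (my own choice, not taken from the links above), and the image path is just a placeholder.

```python
# Minimal sketch: loading ViT through the Hugging Face pipeline API.
# Swap the checkpoint for a Swin or BEiT one, e.g.
# "microsoft/swin-tiny-patch4-window7-224" or "microsoft/beit-base-patch16-224",
# to try the other backbones mentioned above.
from transformers import pipeline

classifier = pipeline("image-classification", model="google/vit-base-patch16-224")

# Any local path or URL to an image works here; "cat.jpg" is a placeholder.
predictions = classifier("cat.jpg")
for p in predictions:
    print(p["label"], round(p["score"], 3))
```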
Now the ultimate level, the one that will test your limits and your understanding of the previous models: the Video Transformer. I will confess that I am new to Video Transformers, but here are some suggestions. Personally, I have only read about the Video Swin Transformer.
- Video Swin Transformer. The code is in the same repository as Swin V1 and V2, and there is also an implementation of this model in the torchvision library (see the sketch below).
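Here is a minimal sketch of the torchvision implementation, assuming torchvision >= 0.15 (where the Swin3D models landed) and using a random tensor in place of a real clip:

```python
# Minimal sketch of the torchvision Video Swin Transformer (Swin3D),
# assuming torchvision >= 0.15. A random tensor stands in for a real clip.
import torch
from torchvision.models.video import swin3d_t, Swin3D_T_Weights

weights = Swin3D_T_Weights.KINETICS400_V1
model = swin3d_t(weights=weights).eval()

# Fake clip with shape (batch, channels, frames, height, width).
clip = torch.randn(1, 3, 16, 224, 224)

with torch.no_grad():
    logits = model(clip)

top = logits.softmax(-1).argmax(-1).item()
print(weights.meta["categories"][top])  # Kinetics-400 class name
```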
The ones that I do not know yet but want to learn more about:
- The Hugging Face implementation of TimeSformer (sketched at the end of this list).
- The torchvision implementation of MViT, another video transformer for classification and detection.
- Another interesting model is Swin2SR, which performs super-resolution on compressed images and videos.