FoodMini Vision Transformer (ViT) Project

Original ViT paper: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

Original Transformer paper: Attention Is All You Need

Overview

Welcome to the FoodMini Vision Transformer project. This repository documents my replication of the machine learning research paper "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale" (the ViT paper) in PyTorch. The goal is to build a Vision Transformer (ViT) from scratch and reach a test accuracy above 90% on the FoodVision Mini problem.

Background

The Transformer neural network architecture was originally introduced in the machine learning research paper "Attention Is All You Need." It was initially designed for one-dimensional (1D) sequences of text. Just as a convolutional neural network (CNN) is built around convolutions, a Transformer is generally any neural network that uses the attention mechanism as its primary learning layer.
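To make "attention as the primary learning layer" concrete, here is a minimal sketch of scaled dot-product attention, the core operation described in "Attention Is All You Need". It is an illustration only, not the code from this repository: the function name, the tensor sizes, and the choice to use the same tensor as query, key, and value (self-attention without learned projections) are assumptions made for the example.

```python
import math

import torch


def scaled_dot_product_attention(query, key, value):
    """Return the attention output and the attention weights.

    Hypothetical helper for illustration; shapes are (batch, tokens, dim).
    """
    d_k = query.size(-1)
    # Similarity of every query token with every key token, scaled by sqrt(d_k).
    scores = query @ key.transpose(-2, -1) / math.sqrt(d_k)
    # Softmax over the key axis turns the scores into weights that sum to 1.
    weights = torch.softmax(scores, dim=-1)
    # Each output token is a weighted average of the value vectors.
    return weights @ value, weights


torch.manual_seed(0)
tokens = torch.randn(1, 4, 8)  # assumed sizes: 1 sequence, 4 tokens, 8 features
output, weights = scaled_dot_product_attention(tokens, tokens, tokens)
print(output.shape)   # same shape as the input tokens
print(weights.shape)  # one weight per (query token, key token) pair
```

A full Transformer layer would add learned linear projections for the query, key, and value, multiple heads, and a feed-forward block; those are left out here to keep the sketch short.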

Replication

I replicated the ViT paper by implementing the Vision Transformer architecture in PyTorch. ViT has emerged as a state-of-the-art approach to computer vision tasks, and the paper reports strong performance in image recognition at scale.
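The "16x16 words" in the paper title refers to splitting an image into fixed-size patches that are flattened and treated like tokens in a sequence. The sketch below works through that arithmetic. The helper name is hypothetical, and the 224x224 RGB input with 16x16 patches is the ViT paper's standard setup, assumed here rather than taken from this repository.

```python
from typing import Tuple


def patch_sequence_shape(image_size: int, patch_size: int, channels: int = 3) -> Tuple[int, int]:
    """Return (number of patches, flattened patch length) for a square image.

    Hypothetical helper for illustration only.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size
    num_patches = patches_per_side ** 2                # N = (H / P) * (W / P)
    patch_dim = patch_size * patch_size * channels     # P * P * C values per patch
    return num_patches, patch_dim


print(patch_sequence_shape(224, 16))  # (196, 768)
```

Under these assumptions, a single image becomes a sequence of 196 patch vectors of length 768, which is the kind of 1D sequence a Transformer expects.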

Getting Started

To run the Vision Transformer on the FoodVision Mini problem, follow these steps:

Clone the repository:

git clone https://github.com/rkstu/FoodMini-Vision-Transformer.git
cd FoodMini-Vision-Transformer
