Repository for creating a Vision Transformer (ViT) model for solving the FoodMini problem.

rkstu/FoodMini-Vision-Transformer

FoodMini Vision Transformer (ViT) Project

Original ViT paper: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

Original Transformer paper: Attention Is All You Need

Dataset: FoodVision Mini dataset

Overview

Welcome to the FoodMini Vision Transformer project! In this repository, I share my work replicating the machine learning research paper "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale" (the ViT paper) using PyTorch. The primary goal is to build a Vision Transformer (ViT) from scratch and reach a test accuracy above 90% on the FoodVision Mini problem.

Background

The Transformer neural network architecture was originally introduced in the machine learning research paper "Attention Is All You Need." Initially designed for one-dimensional (1D) sequences of text, it uses the attention mechanism as its primary learning layer. Just as a convolutional neural network (CNN) is built around convolutions, a Transformer is generally any neural network built around the attention mechanism.
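To make the attention mechanism concrete, here is a minimal sketch of scaled dot-product attention as defined in the Transformer paper, written in PyTorch. The function name and the toy tensor sizes are illustrative assumptions and are not taken from this repository's code.

```python
import math

import torch


def scaled_dot_product_attention(q, k, v):
    """Compute softmax(q @ k^T / sqrt(d_k)) @ v over the last two dimensions."""
    d_k = q.size(-1)
    # Similarity of every query with every key, scaled to keep values moderate.
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
    # Each row of weights is a distribution over the sequence positions.
    weights = torch.softmax(scores, dim=-1)
    return weights @ v, weights


torch.manual_seed(0)
# Hypothetical toy input: (batch, sequence length, embedding dim).
x = torch.randn(1, 4, 8)
# Self-attention: queries, keys, and values all come from the same sequence.
out, weights = scaled_dot_product_attention(x, x, x)
print(out.shape, weights.shape)
```

In a full Transformer, the queries, keys, and values are first passed through learned linear projections and split across several heads; this sketch leaves those parts out to show only the core operation.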

Replication

This project replicates the Vision Transformer architecture presented in the ViT paper by implementing it with PyTorch. The paper showed that ViT can reach strong performance on image recognition at scale, which has made it a widely used architecture for computer vision tasks.
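The "16x16 words" in the paper's title refers to splitting an image into fixed-size patches and treating each patch as a token. The following is a minimal sketch of that patch-embedding step, assuming the ViT-Base settings from the paper (224x224 RGB input, 16x16 patches, 768-dimensional embeddings). The class name `PatchEmbedding` is hypothetical and may not match the code in this repository.

```python
import torch
from torch import nn


class PatchEmbedding(nn.Module):
    """Turn an image batch into a sequence of flattened patch embeddings."""

    def __init__(self, in_channels=3, patch_size=16, embed_dim=768):
        super().__init__()
        # A convolution whose kernel and stride both equal the patch size
        # applies one linear projection to each non-overlapping patch.
        self.proj = nn.Conv2d(
            in_channels, embed_dim, kernel_size=patch_size, stride=patch_size
        )

    def forward(self, x):
        x = self.proj(x)          # (batch, embed_dim, H / P, W / P)
        x = x.flatten(2)          # (batch, embed_dim, num_patches)
        return x.transpose(1, 2)  # (batch, num_patches, embed_dim)


# Assumed input size: two 224x224 RGB images.
images = torch.randn(2, 3, 224, 224)
patches = PatchEmbedding()(images)
# 224 / 16 = 14 patches per side, so 14 * 14 = 196 patches per image.
print(patches.shape)
```

The full model in the paper also prepends a learnable class token and adds position embeddings before the Transformer encoder; those steps are omitted here for brevity.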

Getting Started

To run the Vision Transformer on the FoodVision Mini problem, follow these steps:

  1. Clone the repository:
git clone https://github.com/rkstu/FoodMini-Vision-Transformer.git
cd FoodMini-Vision-Transformer
