MLP (for MNIST) from Scratch

Overview

The goal of this project was to better understand basic neural networks. Since I am a very hands-on learner, I figured the best way to solidify my understanding was to build one from scratch! I decided to build an MLP (multi-layer perceptron) with two hidden layers to classify the MNIST as JPG dataset. Sadly, no labels were provided for the test set, so I used the training set as the whole dataset and created my own train/val/test splits.

Of course, I am still using the PyTorch library for tensor multiplication and other tensor math, and NumPy for basic square roots, but my goal is for everything to eventually be written from scratch, aside from menial low-level functions like square roots (unless I want to practice numerical approximation!).

So far I have implemented:

  • Simple linear/fully-connected layer
    • Includes weight optimization via gradient descent (switching to batch gradient descent (BGD) or stochastic gradient descent (SGD) would probably lead to greater accuracy); see the sketch after this list
  • ReLU
  • Dropout layer
  • Simple dataloader for MNIST-JPG (using OpenCV for image loading)
    • Includes a split function for creating train/test splits
  • Weight saving and loading
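
To make the from-scratch pieces concrete, here is a minimal sketch of a fully-connected layer with a hand-written gradient-descent update and a ReLU, using PyTorch only for tensor math and autograd (illustrative only, not the exact code in mlp.py):

```python
import torch

class Linear:
    """Minimal fully-connected layer: y = xW + b (illustrative sketch)."""

    def __init__(self, in_features, out_features):
        # Small random weights and zero biases; autograd tracks their gradients,
        # since backpropagation itself is still delegated to PyTorch
        self.W = (torch.randn(in_features, out_features) * 0.01).requires_grad_()
        self.b = torch.zeros(out_features, requires_grad=True)

    def forward(self, x):
        return x @ self.W + self.b

    def step(self, lr):
        # Hand-written gradient descent: move each parameter against its gradient
        with torch.no_grad():
            self.W -= lr * self.W.grad
            self.b -= lr * self.b.grad
            self.W.grad.zero_()
            self.b.grad.zero_()

def relu(x):
    # ReLU: zero out negative activations
    return torch.clamp(x, min=0)
```

A forward pass builds the autograd graph, loss.backward() fills in the .grad fields, and step() applies the plain gradient-descent update.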

I am still using the PyTorch implementation for:

  • Softmax
  • CrossEntropyLoss (this combines nn.LogSoftmax() and nn.NLLLoss() in a single class)
  • Backpropagation (i.e. the brunt of the work)
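
The CrossEntropyLoss equivalence is easy to sanity-check, which should help when I eventually swap in my own implementation (illustrative snippet):

```python
import torch
import torch.nn as nn

logits = torch.randn(4, 10)           # a batch of 4 samples, 10 classes
targets = torch.tensor([3, 7, 0, 9])  # ground-truth class indices

ce = nn.CrossEntropyLoss()(logits, targets)
nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), targets)

print(torch.allclose(ce, nll))        # True: CrossEntropyLoss == LogSoftmax + NLLLoss
```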

You can find my MLP code in mlp.py. As I continue to write more and more of the code from scratch, I expect my model's performance to get worse and worse due to the advantages of using PyTorch (its backend is written in C++ and highly optimized). For anyone who wants to implement an MLP in Python and doesn't care about learning it in depth, I highly recommend just using PyTorch outright. Maybe a future project is to do this from scratch in C!

torch-mlp.py implements the same architecture, just built entirely with PyTorch functions.

Architecture

My architecture is as follows (a PyTorch sketch of the same stack follows the list):

  • Linear(784, 500)
    • ReLU()
    • Dropout()
  • Linear(500, 250)
    • ReLU()
    • Dropout()
  • Linear(250, 10)
  • CrossEntropyLoss()
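
Expressed with PyTorch modules, this is roughly what torch-mlp.py builds (I am assuming the dropout probability of $0.1$ reported in the results below):

```python
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(784, 500),
    nn.ReLU(),
    nn.Dropout(p=0.1),
    nn.Linear(500, 250),
    nn.ReLU(),
    nn.Dropout(p=0.1),
    nn.Linear(250, 10),  # raw logits; CrossEntropyLoss applies log-softmax internally
)
criterion = nn.CrossEntropyLoss()
```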

Results

I achieved a test accuracy of $0.9327$ with a loss of $0.2502$ on the MNIST as JPG dataset after $10$ epochs using gradient descent optimization. I used a learning rate of $0.001$ and a dropout probability of $0.1$. The weights for that model can be downloaded from weights.pt.
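
Loading the saved weights should look roughly like this (a sketch; the exact contents of weights.pt depend on the save format used in mlp.py):

```python
import torch

# The structure of the saved object (e.g. a dict of layer tensors) is an assumption
state = torch.load("weights.pt")
print(type(state))
```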

Using PyTorch functions and PyTorch's SGD optimizer in torch-mlp.py, I was able to achieve an accuracy of $>98\%$. Obviously, I aim to match that with my own implementation.
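
For reference, the training setup in torch-mlp.py presumably looks something like the sketch below; the exact learning rate, epoch count, and dataloader are stand-ins, not the repository's actual code:

```python
import torch
import torch.nn as nn

# Same stack as the Architecture sketch above
model = nn.Sequential(
    nn.Linear(784, 500), nn.ReLU(), nn.Dropout(0.1),
    nn.Linear(500, 250), nn.ReLU(), nn.Dropout(0.1),
    nn.Linear(250, 10),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()

# Dummy batch standing in for the MNIST-JPG dataloader, just to keep this runnable
images, labels = torch.rand(64, 784), torch.randint(0, 10, (64,))

for epoch in range(10):
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```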

Future Goals

  • Batching
  • Implementing Softmax
  • Implementing CrossEntropyLoss
  • Implementing Backpropagation
  • Writing my own matrix multiplication library
  • Trying out cross-validation
  • Trying out hyperparameter optimization

© 2024 Colin Pannikkat
