A Keras implementation of CapsNet in the paper:
Sara Sabour, Nicholas Frosst, Geoffrey E. Hinton. Dynamic Routing Between Capsules. NIPS 2017.
This repository contains code for the experiments in Section 4, which closely simulate those run by the authors in the paper linked above.
Differences with the paper:
- We only report the test error after 50 epochs of training. In the paper, the model was trained for 1250 epochs, according to Figure A.1.
- We only experimented with routing iterations 2 and 3, whereas the paper used 2, 3, 5, 7 and 10.
Please use Keras==2.2.4 with the TensorFlow==1.15.0 backend; otherwise the K.batch_dot function may not work correctly.
Download the training and test datasets from the link below:
http://www.cs.toronto.edu/~tijmen/affNIST/
This code was developed in a Colab notebook and therefore includes callbacks to save the model, weights and results. If you don't want to save these, just comment out the lines corresponding to ModelCheckpoint and CSVLogger.
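The two callbacks can be set up as in the sketch below. The file paths are placeholders, not the ones used in this repository; point them wherever you want the outputs to land (e.g. a mounted Google Drive folder on Colab).

```python
from keras.callbacks import ModelCheckpoint, CSVLogger

# Save the best model seen so far (by validation loss) after each epoch.
# 'capsnet_best.h5' is a hypothetical path -- change it as needed.
checkpoint = ModelCheckpoint('capsnet_best.h5',
                             monitor='val_loss',
                             save_best_only=True,
                             verbose=1)

# Append per-epoch losses and accuracies to a CSV file for later plotting.
csv_logger = CSVLogger('log.csv')

# Then pass both to training, e.g.:
# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           callbacks=[checkpoint, csv_logger])
```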
Step 1. Clone this repository to your local machine.
git clone https://github.com/hrsht-13/Capsule_Network.git
Step 2. Install Keras==2.2.4 with TensorFlow==1.15.0 backend.
pip install tensorflow==1.15.0
pip install keras==2.2.4
To read an idx3-ubyte file as a numpy array:
pip install idx2numpy
We have trained models as described above. Each trained model is saved as an .h5 file, which can be downloaded from the links below:
for routing iteration=2
https://drive.google.com/file/d/1l2oiNhAeWxEKE4MlTwaL3oC3LShNy9g9/view
for routing iteration=3
https://drive.google.com/file/d/1pLiw2Boeedmbx9fe631dgBfkn9W_128c/view
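A downloaded .h5 file can be restored with Keras's load_model. The sketch below shows the save/load round trip with a plain model; note that the CapsNet files above contain custom layers, so restoring them additionally requires passing this repository's capsule classes via custom_objects (the class names in the comment are assumptions).

```python
import numpy as np
from keras.models import Sequential, load_model
from keras.layers import Dense

# Minimal save/load round trip with a plain model. The CapsNet .h5 files
# contain custom layers, so restoring them needs something like
# (class/function names depend on this repository's code):
#   load_model(path, custom_objects={'CapsuleLayer': CapsuleLayer, ...})
model = Sequential()
model.add(Dense(10, activation='softmax', input_shape=(784,)))
model.save('demo.h5')

restored = load_model('demo.h5')
pred = restored.predict(np.zeros((1, 784), dtype='float32'))
```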
The testing data is the same as the validation data. Testing on new data is straightforward: just adapt the code as needed.
CapsNet classification test error on MNIST. Losses and accuracies are shown on the test set, for routing iteration=3 only.
The data for the above graph is saved in the CSV_data folder.
About 285s / epoch on a Google Colab GPU for routing iteration=2.
About 448s / epoch on a Google Colab GPU for routing iteration=3.
Digits on the left are real images from MNIST; digits on the right are the corresponding reconstructed images.
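Such a side-by-side figure can be produced roughly as below. This is a generic matplotlib sketch, not this repository's plotting code: x_real would be the test images and x_recon the decoder's outputs, with random stand-ins used here so the snippet is self-contained.

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')              # headless backend, e.g. for scripts
import matplotlib.pyplot as plt

# Stand-ins: replace with real test images and decoder reconstructions.
x_real = np.random.rand(5, 28, 28)
x_recon = np.random.rand(5, 28, 28)

fig, axes = plt.subplots(5, 2, figsize=(3, 8))
axes[0, 0].set_title('real')
axes[0, 1].set_title('reconstructed')
for i in range(5):
    axes[i, 0].imshow(x_real[i], cmap='gray')
    axes[i, 1].imshow(x_recon[i], cmap='gray')
    axes[i, 0].axis('off')
    axes[i, 1].axis('off')
fig.savefig('real_vs_recon.png')
```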
- With different numbers of routing iterations we get different losses in the initial epochs, but after a sufficient number of epochs the losses converge to the same range.
- While training with routing iterations 2 and 3, we observed that with 2 routings the loss was initially high and dropped drastically after a few initial epochs, whereas with 3 routings the model started with a low loss that decreased slowly.
- In general, more routing iterations increase the network capacity and tend to overfit the training dataset.
- The paper itself is not very detailed; it leaves some open questions about the specifics of the network implementation that remain unanswered to this day, because the authors did not release their code.
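The routing-by-agreement loop discussed above can be sketched in plain NumPy. This is a toy stand-in for intuition, not this repository's Keras layer; the shapes and names are assumptions.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    # Squash non-linearity from the paper: shrinks short vectors toward 0
    # and caps long vectors just below unit length.
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def dynamic_routing(u_hat, num_iters=3):
    # u_hat: prediction vectors, shape (num_input_caps, num_output_caps, dim).
    n_in, n_out, _ = u_hat.shape
    b = np.zeros((n_in, n_out))                  # routing logits
    for _ in range(num_iters):
        e = np.exp(b - b.max(axis=1, keepdims=True))
        c = e / e.sum(axis=1, keepdims=True)     # softmax over output capsules
        s = (c[..., None] * u_hat).sum(axis=0)   # weighted sum per output capsule
        v = squash(s)                            # output capsule vectors
        b = b + (u_hat * v[None]).sum(axis=-1)   # agreement update
    return v

v = dynamic_routing(np.random.randn(6, 10, 16), num_iters=3)
```

With more iterations the coupling coefficients sharpen toward the output capsules that agree most with the predictions, which is one intuition for the capacity/overfitting behaviour noted above.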