This is my (not very successful) attempt to do both detection and classification of numbers in the SVHN dataset using two CNNs.
This project contains 2 parts:
- Use a CNN for bounding box regression to find the top, left, width and height of the bounding box that contains all the digits in a given image
- Use the bounding box from step 1 to extract the part of the image containing only the digits, then use another multi-output CNN to classify the digits in the cropped image.
My original intention was that this would improve accuracy compared to feeding the entire SVHN image into a single CNN and letting it predict all the digits in the image. However, the full pipeline reached only 51% accuracy when all digits must match exactly, with individual digit accuracies of 71%, 65%, 84% and 98% for the 1st, 2nd, 3rd and 4th digit respectively (predictions are limited to a maximum of 4 digits).
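The two kinds of accuracy quoted above can be computed roughly as follows. This is a minimal sketch, not the code used in this project: the function name `sequence_and_digit_accuracy`, the `(n_images, 4)` label layout and the use of class `10` as a "no digit" padding value are all assumptions made for illustration.

```python
import numpy as np

def sequence_and_digit_accuracy(y_true, y_pred):
    """Return (exact-match accuracy, per-position accuracy).

    y_true, y_pred: integer arrays of shape (n_images, 4),
    one column per digit position (assumed layout).
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    matches = (y_true == y_pred)
    # An image counts as correct only if every position matches
    sequence_acc = float(matches.all(axis=1).mean())
    # Accuracy of each digit position taken on its own
    per_digit_acc = matches.mean(axis=0)
    return sequence_acc, per_digit_acc

# Toy labels; 10 is a hypothetical "no digit" padding class
y_true = np.array([[1, 5, 0, 2], [1, 3, 5, 10], [2, 3, 10, 10]])
y_pred = np.array([[1, 5, 2, 2], [1, 3, 5, 10], [2, 3, 10, 10]])
seq_acc, digit_acc = sequence_and_digit_accuracy(y_true, y_pred)
print(seq_acc, digit_acc)
```

The exact-match figure can never exceed the lowest per-digit figure, since one wrong position fails the whole image. If short numbers are padded with a "no digit" class (an assumption here), later positions are mostly padding, which would be consistent with the high 4th-digit accuracy.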
- Get the input image (so far, this has only been tested on images from the SVHN test set)
- Resize to 64x64, convert to greyscale and normalize the image
- Feed the processed image into the detection CNN to get the bounding box
- Re-scale the bounding box to the image's original size
- Crop the bounding box region and resize it to 64x64
- Feed the cropped and resized image to the classification CNN to get the digits
- Convert CNN predictions into an understandable format
- Output digits
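The bounding box and decoding steps above can be sketched in plain NumPy. This is an illustration under stated assumptions rather than the code in `combi_models.py`: it assumes the detection CNN outputs `(left, top, width, height)` in pixels of the 64x64 input, that the resize to 64x64 does not preserve aspect ratio, and that the classification CNN emits one probability vector per digit position with a hypothetical class `10` meaning "no digit". The function names are made up for this example.

```python
import numpy as np

MODEL_SIZE = 64  # side length of the CNN input, from the steps above

def rescale_box(box, orig_w, orig_h, model_size=MODEL_SIZE):
    """Map (left, top, width, height) from the 64x64 frame to the original image."""
    left, top, width, height = box
    sx = orig_w / float(model_size)  # horizontal scale factor
    sy = orig_h / float(model_size)  # vertical scale factor
    return (left * sx, top * sy, width * sx, height * sy)

def crop_box(image, box):
    """Cut the box out of an H x W (x C) array, clamping to the image borders."""
    h, w = image.shape[:2]
    left, top, width, height = box
    x0 = int(max(0, np.floor(left)))
    y0 = int(max(0, np.floor(top)))
    x1 = int(min(w, np.ceil(left + width)))
    y1 = int(min(h, np.ceil(top + height)))
    return image[y0:y1, x0:x1]

def decode_digits(outputs, blank_class=10):
    """Turn per-position probability vectors into a digit string.

    blank_class is an assumed "no digit" label used for padding.
    """
    digits = [int(np.argmax(p)) for p in outputs]
    return "".join(str(d) for d in digits if d != blank_class)

# Example: a 200x100 (width x height) image and a box predicted on the 64x64 copy
image = np.zeros((100, 200, 3))
box_full = rescale_box((16, 8, 32, 48), orig_w=200, orig_h=100)
crop = crop_box(image, box_full)

# Fake classifier output: one-hot vectors for 1, 3, 5 and the blank class
fake_outputs = np.eye(11)[[1, 3, 5, 10]]
number = decode_digits(fake_outputs)
print(box_full, crop.shape, number)
```

The crop would still need to be resized to 64x64 (for example with OpenCV's `cv2.resize`) before it is passed to the classification CNN.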
The bounding boxes in the images below are coordinates predicted by the detection CNN and the number prediction is done by the classification CNN.
Image | Predicted value | Actual value |
---|---|---|
(image) | 1522 | 1502 |
(image) | 135 | 135 |
(image) | 861 | 861 |
(image) | 348 | 348 |
(image) | 114 | 114 |
(image) | 23 | 23 |
The bounding boxes in the images below are coordinates predicted by the detection CNN and the number prediction is done by the classification CNN.
Image | Predicted value | Actual value |
---|---|---|
(image) | 32 | 863 |
(image) | 6 | 7 |
(image) | 8 | 26 |
(image) | 1 | 184 |
(image) | 1410 | 44 |
(image) | 27 | 6 |
- I did not want to use YOLO for such a simple task, but the detection CNN could be improved
- Augmenting the dataset by shifting the actual bounding boxes when training the detection CNN slightly improved accuracy (+5%); more augmentation could be explored
- The same could be done for the classification CNN, but it was not done in this project
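One possible reading of the shift augmentation mentioned above is to translate the image by a few pixels and move the ground-truth box by the same amount. The sketch below shows that interpretation only; it is not taken from `train_digit_detection.py`, and the function name `shift_image_and_box` and the zero-fill behaviour are assumptions.

```python
import numpy as np

def shift_image_and_box(image, box, dx, dy):
    """Translate an image by (dx, dy) pixels and move the box with it.

    Pixels shifted in from outside the image are filled with zeros.
    box is (left, top, width, height) in pixels.
    """
    h, w = image.shape[:2]
    if abs(dx) >= w or abs(dy) >= h:
        raise ValueError("shift must be smaller than the image size")
    shifted = np.zeros_like(image)
    # Source window in the original image, destination window in the output
    src_x0, src_x1 = max(0, -dx), min(w, w - dx)
    src_y0, src_y1 = max(0, -dy), min(h, h - dy)
    dst_x0, dst_x1 = max(0, dx), min(w, w + dx)
    dst_y0, dst_y1 = max(0, dy), min(h, h + dy)
    shifted[dst_y0:dst_y1, dst_x0:dst_x1] = image[src_y0:src_y1, src_x0:src_x1]
    left, top, width, height = box
    return shifted, (left + dx, top + dy, width, height)

# Example: move a tiny 4x4 "image" one pixel to the right
image = np.arange(16).reshape(4, 4)
shifted, new_box = shift_image_and_box(image, (1, 1, 2, 2), dx=1, dy=0)
print(shifted)
print(new_box)
```

The caller would need to choose shifts small enough that the box stays inside the image, otherwise part of the number is cut off while the label still claims it is there.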
construct_datasets.py
Uses the images downloaded from the SVHN dataset website, along with the .mat files describing the bounding boxes, to build a single table each for the test and train sets for easy use in the other files. If you don't want to run this file, download the .h5 files from the Google Drive link below.
train_digit_classification.py
Uses the processed .h5 files in the data folder to train a classification CNN.
train_digit_detection.py
Uses the processed .h5 files in the data folder to train a detection CNN.
combi_models.py
After training both networks, this file uses both networks to implement all the steps described in the pipeline section above.
Weights for both CNNs and .h5 files for the train and test datasets are available at the links below:
CNN Weights: https://drive.google.com/open?id=1vv7vzqzGjjUqjcCZYeX_NaGrqSU1Ami2
Dataset: https://drive.google.com/open?id=1KfVqQHjimQnXdzsCtQurwmTSpMe2mmA7
Python 3.5
All code was run on Amazon EC2 Deep Learning AMI version 7 (ami-139a476c)
I also tested this on my local Windows 10 PC with the following libraries:
- Numpy 1.13.1
- Keras 2.0.5
- Pandas 0.20.3
- OpenCV 3.2.0
- TensorFlow 1.2.1 (with GPU support)