An easy way to convert scanned documents into editable text, classify key-value pairs, and annotate them
Train results »
Download Results and Models »
Live Demo »
Preview »
        Combining CRAFT, Faster R-CNN, Tesseract and a Siamese neural network model to build optical character recognition software, hosted in the Azure cloud here (Note : annotation works only in Firefox). The neural network models are trained with PyTorch on the FUND dataset, and the server runs on an Azure virtual machine using Flask. The frontend website lets users upload a scanned document in .png, .jpg, .jpeg or .pdf format (for PDFs, only the first page is considered), which is then converted into editable text, bounding boxes for each word and sentence, a label for each sentence among 'other', 'question', 'answer' and 'header', and the links between sentences. The website also provides a user-friendly interface for modifying the model predictions through the annotate features, which can also be used on a document from scratch without feeding it to the model and waiting for predictions.
        The annotation interface is built with annotorious.js. After the model returns its result, or after annotating the document manually, the information can be downloaded in a simple .txt format. There is also an option to run the model offline so that multiple images can be fed to it at once, and an option to choose whether the output should be in MTX format or in FUND dataset format.
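        As a rough sketch of the early stages of this pipeline, pdf2image and pytesseract alone can already turn the first page of a scanned PDF into text with word-level bounding boxes; the CRAFT, Faster R-CNN and Siamese stages live in server/ocr_predictor.py and are not reproduced here, and the file name below is a placeholder.
from pdf2image import convert_from_path
import pytesseract

pages = convert_from_path("scanned.pdf", dpi=200)   # needs poppler-utils; "scanned.pdf" is a placeholder
first_page = pages[0]                                # only the first page is used

data = pytesseract.image_to_data(first_page, output_type=pytesseract.Output.DICT)
for word, x, y, w, h in zip(data["text"], data["left"], data["top"], data["width"], data["height"]):
    if word.strip():                                 # skip empty Tesseract entries
        print(word, (x, y, x + w, y + h))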
        I am running the models on an Azure VM because of the requirement for Tesseract and Poppler. I am using a Standard B2s instance (2 vCPUs, 4 GiB memory) with Linux (Ubuntu 18.04) as the operating system. I have added videos and images of accessing the website hosted through the Azure VM, but currently I am unable to keep the VM running all the time because the server is interrupted when the SSH connection is closed (I start the server on the Azure VM through an SSH connection using PuTTY). The same result can still be achieved by following the server installation and startup setup given below. I will keep the server running for as long as possible each day, so the link may only work intermittently.
    Most of the model training is done with PyTorch. I have explained the training steps and the metrics used to analyze the models in the training folder.
You can download all the trained models and public test dataset predictions here.
http://frozenwolf-ocr.westeurope.cloudapp.azure.com:5000/home
https://drive.google.com/drive/folders/1rcIWV1qp_k9rbPBL-IcCa_fp1fHW7auG?usp=sharing
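Once downloaded, the artifacts can be inspected before wiring them into the networks defined in this repo; a minimal sketch, assuming the .pth files store plain state dicts and that the files are placed under saved_models/:
import numpy as np
import torch

embeddings = np.load("saved_models/embs_npa.npy")            # embedding matrix stored as a plain numpy array
print("embedding matrix shape:", embeddings.shape)

checkpoint = torch.load("saved_models/faster_rcnn_sgd.pth", map_location="cpu")
print("checkpoint entries:", list(checkpoint)[:5])            # assuming the file stores a state dict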
Python :
Flask
pickle-mixin
numpy
Pillow
regex
pdf2image
opencv-python
scikit-image
torch
torchvision
pytesseract
Javascript :
bootstrap
annotorious
Linux :
tesseract-ocr
poppler-utils
Note: The libraries installed through this process are targeted at Python 3.6 on Ubuntu. Also, the CPU version of PyTorch is installed in this case to minimize memory usage
pip install -r requirements.txt
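Note that tesseract-ocr and poppler-utils are system packages rather than pip packages (on Ubuntu they are installed through apt); an optional, minimal check that the Python wrappers can actually find them:
import pytesseract
from pdf2image import convert_from_path   # imported only to confirm pdf2image can be loaded

print("Tesseract version:", pytesseract.get_tesseract_version())
# Converting any PDF (e.g. convert_from_path("sample.pdf")) will raise an error if Poppler is missing.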
Note : This is required only if you want to run the .ipynb training notebooks in the training folder
matplotlib
seaborn
nltk
torchinfo
albumentations
Finally, after installing the requirements, clone the repo:
git clone https://github.com/FrozenWolf-Cyber/OCR.git
Project structure :
server:
| app.py
| craft.py
| craft_utils.py
| imgproc.py
| ocr_predictor.py
| refinenet.py
| word_Detection.py
|
+---basenet
| | vgg16_bn.py
| |
| \---__pycache__
| vgg16_bn.cpython-39.pyc
|
+---img_save
| requirements.txt
|
+---saved_models
| craft_mlt_25k.pth
| craft_refiner_CTW1500.pth
| embs_npa.npy
| faster_rcnn_sgd.pth
| siamese_multi_head.pth
| vocab
|
+---static
| \---assets
| +---bootstrap
| | +---css
| | | bootstrap.min.css
| | |
| | \---js
| | bootstrap.min.js
| |
| +---css
| | animated-textbox-1.css
| | animated-textbox.css
| | annotorious.min.css
| | Codeblock.css
| | custom.css
| | custom_annotate.css
| | Drag--Drop-Upload-Form.css
| | Features-Blue.css
| | Footer-Basic.css
| | Navigation-Clean.css
| | PJansari---Horizontal-Stepper.css
| | steps-progressbar.css
| |
| +---fonts
| | ionicons.eot
| | ionicons.min.css
| | ionicons.svg
| | ionicons.ttf
| | ionicons.woff
| | material-icons.min.css
| | MaterialIcons-Regular.eot
| | MaterialIcons-Regular.svg
| | MaterialIcons-Regular.ttf
| | MaterialIcons-Regular.woff
| | MaterialIcons-Regular.woff2
| |
| +---img
| | bg-masthead.jpg
| | bg-showcase-2.jpg
| | bg-showcase-3.jpg
| |
| \---js
| annotate.js
| annotorious.min.js
| annotorious.umd.js.map
| bs-init.js
| navigator.js
| recogito-polyfills.js
| result.js
| upload.js
|
+---status
| requirements.txt
|
+---temp
| requirements.txt
|
+---templates
annotate.html
home.html
result.html
upload.html
upload_annotate.html
To start the server, run app.py inside the server folder:
python app.py
Local-Server-Demo.mp4
To run this program, the minimal installation is enough.
batch_run
| app.py
| craft.py
| craft_utils.py
| demo_batch_run.png
| imgproc.py
| ocr_predictor.py
| predict.py
| refinenet.py
| tree.txt
| word_Detection.py
|
+---basenet
| | vgg16_bn.py
| |
| \---__pycache__
| vgg16_bn.cpython-39.pyc
|
+---img_save
|
+---result
|
+---saved_models
| craft_mlt_25k.pth
| craft_refiner_CTW1500.pth
| embs_npa.npy
| faster_rcnn_sgd.pth
| siamese_multi_head.pth
| vocab
|
+---testing_data
+---documents
| your_pdf1.pdf
| your_pdf2.pdf
| your_pdf3.pdf
\---images
your_image1.png
your_image2.png
Inside the batch_run folder, run:
python predict.py -path <target folder> -MTX <Y/N> -sr <Y/N> -pdf <Y/N>
usage: predict.py [-h] [-path PATH] [-MTX MTX] [-sr SR] [-pdf PDF]
optional arguments:
-h, --help show this help message and exit
-path PATH, --path PATH
Use relative path
-MTX MTX, --MTX MTX Should be <Y> or <N>. If <Y> then the output will be in MTX Hacker Olympics format, if <N>
then the output will be of FUND dataset format
-sr SR, --sr SR Should be <Y> or <N>. If <Y> then the output will be saved in a separate JSON file and the
scores for each label classification and linking will be in a separate file, if <N> then
both will be in the same file
-pdf PDF, --pdf PDF Should be <Y> or <N>. If <Y> then the target folder contains multiple .pdf documents, if <N>
then the folder contains multiple .png,.jpg,.jpeg documents
Example :
python predict.py -path testing_data/images -MTX Y -sr N -pdf N
python predict.py -path testing_data/documents -MTX Y -sr N -pdf Y
Inside the batch_run folder, run:
python evaluate.py -img <Image folder> -anno <Annotations folder> -sr <Y/N>
optional arguments:
-h, --help show this help message and exit
-img IMG_PATH, --img_path IMG_PATH
Use relative path
-anno ANNO_PATH, --anno_path ANNO_PATH
Use relative path
-sr SR, --sr SR Should be <Y> or <N>. If <Y> then the output will be saved in a separate JSON file and the
scores for each label classification and linking will be in a separate file, if <N> then
both will be in the same file
Example :
python evaluate.py -img testing_data/images -anno testing_data/annotations -sr Y
Each prediction and its scores are saved in the result folder as .json files, either together or separately depending on the configuration you selected. In the case of evaluation, an additional metrics.json file is saved; it contains the label and linking accuracy, f_score, precision and recall values for each image separately.
Note : On the website, the model returns results in the FUND dataset format; for MTX evaluation purposes, use batch_run, where you can choose the output format. Annotation works only in Firefox
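For a quick look at the evaluation output, metrics.json can be read with the standard json module; this sketch assumes the file maps each image name to its scores, which follows the description above rather than the exact layout evaluate.py writes:
import json

with open("result/metrics.json") as f:
    metrics = json.load(f)

for image_name, scores in metrics.items():   # assumed layout: image name -> its accuracy/f_score/precision/recall
    print(image_name, scores)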
Azure-Demo_low.mp4
There are options to annotate after receiving model predictions or to start annotating from scratch
You can either drag and drop the images or just select them. The images should be in .png, .jpeg, .jpg or .pdf format
Note: For .pdf files, the first page alone will be considered
After getting the model output, users can continue to modify the bounding box, label, transcription, and linking predictions in the annotation interface, or finish by downloading the result as a .txt file
Using annotorious.js, annotation can now be done much more easily. To modify the words, click one of the corresponding sentences. After finishing the annotation, users can download the final result as a .txt file. Instead of waiting for model predictions, users can also choose to annotate from scratch.