An easy way to convert scanned documents into editable text, classify key-value pairs, and annotate them
Train results »
Download Results and Models »
Live Demo »
Preview »
        Combining CRAFT, Faster R-CNN, Tesseract and a Siamese neural network model to build optical character recognition software, hosted in the Azure cloud here (Note : annotation works only in Firefox). The neural network models are trained with PyTorch on the FUND dataset, and the server runs on an Azure virtual machine using Flask. The frontend website lets users upload a scanned document in .png, .jpg, .jpeg or .pdf format (for PDFs, only the first page is considered), which is then converted into editable text, bounding boxes for each word and sentence, a label for each sentence among 'other', 'question', 'answer' and 'header', and the links between sentences. The website also provides a user-friendly interface for modifying the model predictions through the annotate features, which can also be used on a document from scratch without feeding it to the model and waiting for predictions.
        The annotation interface is built with annotorious.js. After the model returns its result, or after annotating the document manually, the information can be downloaded in a simple .txt format. There is also an option to run the model offline so that multiple images can be fed to it at once, and an option to choose whether the output should be in MTX format or in FUND dataset format.
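        As a rough sketch of the early stages of this pipeline, pdf2image and pytesseract alone can already turn the first page of a scanned PDF into text with word-level bounding boxes; the CRAFT, Faster R-CNN and Siamese stages live in server/ocr_predictor.py and are not reproduced here, and the file name below is a placeholder.
from pdf2image import convert_from_path
import pytesseract

pages = convert_from_path("scanned.pdf", dpi=200)   # needs poppler-utils; "scanned.pdf" is a placeholder
first_page = pages[0]                                # only the first page is used

data = pytesseract.image_to_data(first_page, output_type=pytesseract.Output.DICT)
for word, x, y, w, h in zip(data["text"], data["left"], data["top"], data["width"], data["height"]):
    if word.strip():                                 # skip empty Tesseract entries
        print(word, (x, y, x + w, y + h))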
        I am running the models on an Azure VM because of the requirement for Tesseract and Poppler. I am using a Standard B2s instance (2 vCPUs, 4 GiB memory) with Linux (Ubuntu 18.04) as the operating system. I have added videos and images of accessing the website hosted through the Azure VM, but currently I am unable to keep the VM running all the time because the server is interrupted when the SSH connection is closed (I start the server on the Azure VM through an SSH connection using PuTTY). The same result can still be achieved by following the server installation and startup setup given below. I will keep the server running for as long as possible each day, so the link may only work intermittently.
    Most of the model training is done with PyTorch. I have explained the training steps and the metrics used to analyze the models in the training folder.
You can download all the trained models and public test dataset predictions here.
http://frozenwolf-ocr.westeurope.cloudapp.azure.com:5000/home
https://drive.google.com/drive/folders/1rcIWV1qp_k9rbPBL-IcCa_fp1fHW7auG?usp=sharing
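Once downloaded, the artifacts can be inspected before wiring them into the networks defined in this repo; a minimal sketch, assuming the .pth files store plain state dicts and that the files are placed under saved_models/:
import numpy as np
import torch

embeddings = np.load("saved_models/embs_npa.npy")            # embedding matrix stored as a plain numpy array
print("embedding matrix shape:", embeddings.shape)

checkpoint = torch.load("saved_models/faster_rcnn_sgd.pth", map_location="cpu")
print("checkpoint entries:", list(checkpoint)[:5])            # assuming the file stores a state dict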
Python :
Flask
pickle-mixin
numpy
Pillow
regex
pdf2image
opencv-python
scikit-image
torch
torchvision
pytesseract
Javascript :
bootstrap
annotorious
Linux :
tesseract-ocr
poppler-utils
Note: The libraries installed through this process are targeted at Python 3.6 on Ubuntu. Also, the CPU version of PyTorch is installed in this case to minimize memory usage
pip install -r requirements.txt
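Note that tesseract-ocr and poppler-utils are system packages rather than pip packages (on Ubuntu they are installed through apt); an optional, minimal check that the Python wrappers can actually find them:
import pytesseract
from pdf2image import convert_from_path   # imported only to confirm pdf2image can be loaded

print("Tesseract version:", pytesseract.get_tesseract_version())
# Converting any PDF (e.g. convert_from_path("sample.pdf")) will raise an error if Poppler is missing.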
Note : This is required only if you want to run the .ipynb training notebooks in the training folder
matplotlib
seaborn
nltk
torchinfo
albumentations
Finally, after installing the requirements, clone the repo:
git clone https://github.com/FrozenWolf-Cyber/OCR.git
Project structure :
server:
| app.py
| craft.py
| craft_utils.py
| imgproc.py
| ocr_predictor.py
| refinenet.py
| word_Detection.py
|
+---basenet
| | vgg16_bn.py
| |
| \---__pycache__
| vgg16_bn.cpython-39.pyc
|
+---img_save
| requirements.txt
|
+---saved_models
| craft_mlt_25k.pth
| craft_refiner_CTW1500.pth
| embs_npa.npy
| faster_rcnn_sgd.pth
| siamese_multi_head.pth
| vocab
|
+---static
| \---assets
| +---bootstrap
| | +---css
| | | bootstrap.min.css
| | |
| | \---js
| | bootstrap.min.js
| |
| +---css
| | animated-textbox-1.css
| | animated-textbox.css
| | annotorious.min.css
| | Codeblock.css
| | custom.css
| | custom_annotate.css
| | Drag--Drop-Upload-Form.css
| | Features-Blue.css
| | Footer-Basic.css
| | Navigation-Clean.css
| | PJansari---Horizontal-Stepper.css
| | steps-progressbar.css
| |
| +---fonts
| | ionicons.eot
| | ionicons.min.css
| | ionicons.svg
| | ionicons.ttf
| | ionicons.woff
| | material-icons.min.css
| | MaterialIcons-Regular.eot
| | MaterialIcons-Regular.svg
| | MaterialIcons-Regular.ttf
| | MaterialIcons-Regular.woff
| | MaterialIcons-Regular.woff2
| |
| +---img
| | bg-masthead.jpg
| | bg-showcase-2.jpg
| | bg-showcase-3.jpg
| |
| \---js
| annotate.js
| annotorious.min.js
| annotorious.umd.js.map
| bs-init.js
| navigator.js
| recogito-polyfills.js
| result.js
| upload.js
|
+---status
| requirements.txt
|
+---temp
| requirements.txt
|
+---templates
annotate.html
home.html
result.html
upload.html
upload_annotate.html
To start the server, run app.py inside the server folder:
python app.py
Local-Server-Demo.mp4
To run this program, the minimal installation is enough.
batch_run
| app.py
| craft.py
| craft_utils.py
| demo_batch_run.png
| imgproc.py
| ocr_predictor.py
| predict.py
| refinenet.py
| tree.txt
| word_Detection.py
|
+---basenet
| | vgg16_bn.py
| |
| \---__pycache__
| vgg16_bn.cpython-39.pyc
|
+---img_save
|
+---result
|
+---saved_models
| craft_mlt_25k.pth
| craft_refiner_CTW1500.pth
| embs_npa.npy
| faster_rcnn_sgd.pth
| siamese_multi_head.pth
| vocab
|
+---testing_data
+---documents
| your_pdf1.pdf
| your_pdf2.pdf
| your_pdf3.pdf
\---images
your_image1.png
your_image2.png
Inside the batch_run folder, run:
python predict.py -path <target folder> -MTX <Y/N> -sr <Y/N> -pdf <Y/N>
usage: predict.py [-h] [-path PATH] [-MTX MTX] [-sr SR] [-pdf PDF]
optional arguments:
-h, --help show this help message and exit
-path PATH, --path PATH
Use relative path
-MTX MTX, --MTX MTX Should be <Y> or <N>. If <Y> then the output will be in MTX Hacker Olympics format, if <N>
then the output will be of FUND dataset format
-sr SR, --sr SR Should be <Y> or <N>. If <Y> then the output will be saved in a separate JSON file and the
scores for each label classification and linking will be in a separate file, if <N> then
both will be in the same file
-pdf PDF, --pdf PDF Should be <Y> or <N>. If <Y> then the target folder contains multiple .pdf documents, if <N>
then the folder contains multiple .png,.jpg,.jpeg documents
Example :
python predict.py -path testing_data/images -MTX Y -sr N -pdf N
python predict.py -path testing_data/documents -MTX Y -sr N -pdf Y
Inside the batch_run folder, run:
python evaluate.py -img <Image folder> -anno <Annotations folder> -sr <Y/N>
optional arguments:
-h, --help show this help message and exit
-img IMG_PATH, --img_path IMG_PATH
Use relative path
-anno ANNO_PATH, --anno_path ANNO_PATH
Use relative path
-sr SR, --sr SR Should be <Y> or <N>. If <Y> then the output will be saved in a separate JSON file and the
scores for each label classification and linking will be in a separate file, if <N> then
both will be in the same file
Example :
python evaluate.py -img testing_data/images -anno testing_data/annotations -sr Y
Each prediction and its scores are saved in the result folder as .json files, either together or separately depending on the configuration you selected. In the case of evaluation, an additional metrics.json file is saved; it contains the label and linking accuracy, f_score, precision and recall values for each image separately.
Note : On the website, the model returns results in the FUND dataset format; for MTX evaluation purposes, use batch_run, where you can choose the output format. Annotation works only in Firefox
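For a quick look at the evaluation output, metrics.json can be read with the standard json module; this sketch assumes the file maps each image name to its scores, which follows the description above rather than the exact layout evaluate.py writes:
import json

with open("result/metrics.json") as f:
    metrics = json.load(f)

for image_name, scores in metrics.items():   # assumed layout: image name -> its accuracy/f_score/precision/recall
    print(image_name, scores)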
Azure-Demo_low.mp4
There are options to annotate after receiving model predictions or to start annotating from scratch
You can either drag and drop the images or just select them. The images should be in .png, .jpeg, .jpg or .pdf format
Note: For .pdf files, the first page alone will be considered
After getting the model output, users can continue to modify the bounding box, label, transcription, and linking predictions in the annotation interface, or finish by downloading the result as a .txt file
Using annotorious.js, annotation can now be done much more easily. To modify the words, click one of the corresponding sentences. After finishing the annotation, users can download the final result as a .txt file. Instead of waiting for model predictions, users can also choose to annotate from scratch.