This repository contains a Python-based application that implements photo OCR, a photos-to-PDF converter, a text-to-speech converter, and a speech-to-text converter.
The Multitasker project is a machine learning project written in Python, with a UI built on the PyQt framework. The parts of the app are:
- Photo OCR
Photo OCR is implemented with the SVM classifier from scikit-learn, with OpenCV used for image manipulation. It reaches nearly 80% accuracy on clean, digitally printed text. Prediction accuracy decreases as the noise level increases.
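
The classification step can be illustrated with a minimal sketch. It assumes scikit-learn is installed and uses its bundled handwritten-digits dataset as a stand-in for the app's own character images, which are not part of this README. The function name, kernel, and `gamma` value are illustrative choices, not the project's actual settings.

```python
from sklearn import datasets, svm
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def train_digit_svm(test_size=0.25, seed=0):
    """Train an SVM on 8x8 digit images and return (model, test accuracy)."""
    digits = datasets.load_digits()
    # Flatten each 8x8 image into a 64-element feature vector.
    samples = digits.images.reshape(len(digits.images), -1)
    x_train, x_test, y_train, y_test = train_test_split(
        samples, digits.target, test_size=test_size, random_state=seed
    )
    # Illustrative hyperparameters (assumption, not taken from the project).
    model = svm.SVC(kernel="rbf", gamma=0.001)
    model.fit(x_train, y_train)
    accuracy = accuracy_score(y_test, model.predict(x_test))
    return model, accuracy


if __name__ == "__main__":
    _, acc = train_digit_svm()
    print(f"Held-out accuracy: {acc:.2%}")
```

In the app itself, the feature vectors would presumably come from character images cleaned and segmented with OpenCV rather than from a bundled dataset.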
- PDF Scanner
The PDF scanner takes images from the user in .png, .jpg, .jpeg, and .gif formats and converts them to a PDF file using the Py2Pdf Python library. You can add, remove, and reorder images dynamically. After conversion, you can save the PDF to any directory.
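
The add, remove, and reorder behaviour can be sketched with a small ordered container. `ImageQueue` and its method names are hypothetical and only illustrate the idea; the actual conversion to PDF is left out because the library calls used by the app are not shown in this README.

```python
from pathlib import Path

# File types the PDF scanner accepts, as listed above.
ALLOWED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif"}


class ImageQueue:
    """Ordered list of image paths waiting to be converted (hypothetical helper)."""

    def __init__(self):
        self._paths = []

    def add(self, path):
        """Append an image, rejecting unsupported file types."""
        path = Path(path)
        if path.suffix.lower() not in ALLOWED_EXTENSIONS:
            raise ValueError(f"Unsupported image type: {path.suffix!r}")
        self._paths.append(path)

    def remove(self, index):
        """Remove and return the image at the given position."""
        return self._paths.pop(index)

    def move(self, old_index, new_index):
        """Move an image to a new position, shifting the others."""
        path = self._paths.pop(old_index)
        self._paths.insert(new_index, path)

    def ordered_paths(self):
        """Return a copy of the current page order."""
        return list(self._paths)
```

The list returned by `ordered_paths()` is the page order that would be handed to the PDF writer.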
- Speech to Text
Sometimes you need to write something down quickly, but writing everything by hand is too slow. Speech to Text transcribes your spoken words directly into text, which you can then edit in the built-in editor. It uses the Python SpeechRecognition library to convert the spoken words to editable text.
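
A transcription step of this kind can be sketched with the SpeechRecognition package. This is an assumption about how such a feature might be wired up, not the app's actual code: `transcribe_wav` is a hypothetical helper, it reads from a WAV file instead of a microphone, and it relies on Google's web speech API, which needs network access.

```python
from pathlib import Path


def transcribe_wav(path, language="en-US"):
    """Return the text recognized in a WAV file, or "" if nothing was understood."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"No such audio file: {path}")

    # Imported lazily so the file check above works without the package installed.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(str(path)) as source:
        audio = recognizer.record(source)  # read the whole file into memory
    try:
        # Sends the audio to Google's web speech API; sr.RequestError is
        # raised (and left to the caller) if the service cannot be reached.
        return recognizer.recognize_google(audio, language=language)
    except sr.UnknownValueError:
        return ""  # the speech could not be understood
```

For live dictation, `sr.Microphone()` can serve as the source with `recognizer.listen(source)` in place of `record`; that path additionally requires PyAudio.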
- Text to Speech
This feature converts any editable text document to an audio file that you can play instantly or save for later use. It currently supports three accents: English-US, English-UK, and English-Indian. It uses the Python gTTS module to do the conversion.
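
The three accents can be sketched with gTTS, which selects a regional English voice through its `tld` parameter in recent releases. Whether the app chooses accents this way is an assumption; `ACCENT_TLDS` and `save_speech` are hypothetical names used only for illustration.

```python
# Accent label -> Google Translate top-level domain passed to gTTS.
ACCENT_TLDS = {
    "English-US": "com",       # gTTS default domain
    "English-UK": "co.uk",
    "English-Indian": "co.in",
}


def save_speech(text, accent, out_path):
    """Synthesize `text` in the given accent and write it to an MP3 file."""
    if accent not in ACCENT_TLDS:
        raise ValueError(f"Unsupported accent: {accent!r}")

    # Imported lazily so accent validation works without gTTS installed.
    from gtts import gTTS

    # Requires network access: gTTS calls Google's text-to-speech endpoint.
    tts = gTTS(text=text, lang="en", tld=ACCENT_TLDS[accent])
    tts.save(out_path)
    return out_path
```

A call such as `save_speech("Hello", "English-UK", "hello.mp3")` would write an MP3 file that a media player can then play back.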
The app is designed with the Qt4 framework and has been tested successfully on Windows 10 and Windows 8.1.
If you want to contact me, feel free to ping me here: https://kvikesh800.wixsite.com/learner/contact
Screenshots: Main Window, Photo OCR Window, PDF Scanner Window, Speech To Text Window, Text To Speech Window, About Us Window.