This tool addresses the need for domain-informed specificity when extracting scientific data from extensive online resources. It combines a novel algorithm, Whitespace, with advanced NLP techniques to extract relevant information for the materials science domain. Tested across diverse datasets, it extracted that information accurately and shows promise for broader applications.
- Integrated OCR and Layout Parsing
- Database/Corpus Generation
- Relevancy Check
- Data Extraction
- Multiple PDF uploads
Clone this repository locally using the command git clone https://github.com/d-mittal-21/Reseractor.git
or simply download the ZIP file from here.
Make sure all the packages listed in the requirements.txt
file are installed before continuing with the steps below.
Download the pre-trained model from this link and store it in the models
folder.
Now run the command python main.py
to start the interactive GUI window.
Next, you can follow the Demo Video to get used to the GUI.