NLP-based tool for domain specific conditions mining from PDF research articles

d-mittal-21/Reseractor

Reseractor

An NLP-based tool for domain-specific extraction from research paper PDFs

Overview

Reseractor is designed to bring domain-informed specificity to scientific data extraction from large collections of online research articles. It combines a novel algorithm, Whitespace, with NLP techniques, and has shown significant accuracy in extracting relevant information for the materials science domain. It has been tested across diverse datasets, and the results show promise for broader applications.

Features

  • Integrated OCR and Layout Parsing
  • Database/Corpus Generation
  • Relevancy Check
  • Data Extraction
  • Multiple PDF uploads

Installation

Clone this repository to your local machine with the command git clone https://github.com/d-mittal-21/Reseractor.git or simply download the ZIP file from here

Make sure all the packages listed in the requirements.txt file are installed before continuing with the next steps.

Download the pre-trained model from this link and store it in the models folder.
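Before launching the GUI, it can help to confirm that the two prerequisites above are in place. The following is a minimal sketch and not part of the repository: the helper names missing_packages and model_files are hypothetical, and it assumes that requirements.txt uses plain package specifiers and that the model is stored as one or more files directly inside the models folder.

```python
from importlib import metadata
from pathlib import Path
import re


def missing_packages(requirements_path):
    """Return package names from a requirements file that are not installed.

    Hypothetical helper: assumes plain specifiers such as 'name==1.2'.
    """
    missing = []
    text = Path(requirements_path).read_text(encoding="utf-8")
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or line.startswith("-"):
            continue  # skip blank lines, comments, and pip options (-r, -e)
        # Keep only the distribution name: drop extras, version pins, markers.
        name = re.split(r"[<>=!~;\[\s@]", line, maxsplit=1)[0]
        if not name:
            continue
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


def model_files(models_dir="models"):
    """List file names in the models folder (empty if the folder is absent)."""
    folder = Path(models_dir)
    if not folder.is_dir():
        return []
    return sorted(p.name for p in folder.iterdir() if p.is_file())


if __name__ == "__main__":
    # Run from the repository root, next to requirements.txt and models/.
    if Path("requirements.txt").is_file():
        absent = missing_packages("requirements.txt")
        print("Missing packages:", ", ".join(absent) if absent else "none")
    else:
        print("requirements.txt not found in the current directory")
    found = model_files("models")
    print("Model files:", ", ".join(found) if found else "none found")
```

If any packages are reported missing, installing them with pip install -r requirements.txt is the usual remedy. The check only looks at whether a package is installed, not whether its version satisfies the pin.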

Now run the command python main.py from the repository folder to start the interactive GUI window.

Next, you can follow the Demo Video to get used to the GUI.

Implementation

Demo Video

tool_video.mp4
