Term Document Frequency based Information retrieval WebApp with Django + PostgreSQL

BeTKH/IR_postgres

Information Retrieval WebApp (PostgreSQL + Django)

Problem:

  • Scanning every document for the query terms at search time is slow and does not scale. In addition, information retrieval systems usually need to order query results by their relevance to the query.

Accomplishments:

  • Implemented a responsive information retrieval WebApp (results in under 2 seconds) using PostgreSQL + Django.
  • Represented files and queries as vectors using a term-document matrix (TDM).
  • Retrieved matching documents using a dictionary data structure and displayed the result files in order of relevance.
  • Created a database schema in PostgreSQL with Python's Django models.
  • Set up the database connection and configuration, and populated the database with text files.
  • Pre-processed and cleaned text using the Natural Language Toolkit (NLTK).
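
As a rough illustration of the text-cleaning step, the sketch below lowercases text, keeps alphabetic tokens, and drops stop words. It is a dependency-free stand-in, not the project's code: the project uses NLTK, and the `preprocess` function and the tiny `STOP_WORDS` set here are assumptions made for the example.

```python
import re

# Hypothetical, tiny stop-word list used only for illustration; the project
# itself relies on NLTK for pre-processing and cleaning.
STOP_WORDS = {"a", "an", "and", "the", "is", "of", "to", "in"}


def preprocess(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


print(preprocess("The cat sat in the hat."))
# ['cat', 'sat', 'hat']
```

A real pipeline would likely swap the regular expression and the stop-word set for NLTK's tokenizer and stop-word corpus, and might add stemming or lemmatization.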

View the Jupyter Notebook here.

Video

ir_video.mp4

System Architecture

[Image: system architecture diagram]

Term-Document Matrix

[Image: term-document matrix]
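
A term-document matrix can be held in a dictionary that maps each term to the documents containing it and the number of occurrences. The sketch below shows one possible in-memory layout; the `build_tdm` function, the whitespace tokenization, and the sample file names are assumptions for illustration, and the project's actual PostgreSQL schema may differ.

```python
from collections import Counter, defaultdict


def build_tdm(docs):
    """Map each term to a {document name: count} dictionary."""
    tdm = defaultdict(dict)
    for name, text in docs.items():
        # Count how often each whitespace-separated term occurs in this document.
        for term, count in Counter(text.lower().split()).items():
            tdm[term][name] = count
    return dict(tdm)


# Hypothetical sample documents.
docs = {
    "a.txt": "postgres stores data",
    "b.txt": "django queries postgres postgres",
}
tdm = build_tdm(docs)
print(tdm["postgres"])  # {'a.txt': 1, 'b.txt': 2}
```

Storing only non-zero counts keeps the structure sparse, so looking up a query term touches only the documents that contain it.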

Query processing

[Image: query processing plot]
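
One common way to order results by relevance is to treat the query and each document as term-count vectors and rank documents by cosine similarity. The sketch below assumes that scoring scheme; the README does not state which relevance measure the app uses, and the `cosine` and `rank` functions and the sample documents are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-count vectors stored as Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, docs):
    """Return (name, score) pairs with a non-zero score, most relevant first."""
    q = Counter(query.lower().split())
    scores = {
        name: cosine(q, Counter(text.lower().split()))
        for name, text in docs.items()
    }
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, score) for name, score in ranked if score > 0]


# Hypothetical sample documents.
docs = {
    "a.txt": "postgres stores data",
    "b.txt": "django queries postgres postgres",
    "c.txt": "zipf law plot",
}
results = rank("postgres django", docs)
print([name for name, _ in results])  # ['b.txt', 'a.txt']
```

Here `b.txt` ranks first because it matches both query terms, while `c.txt` shares no terms with the query and is dropped.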

Zipf's Law

[Image: Zipf's law plot]
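
Zipf's law says that a term's frequency in a corpus is roughly inversely proportional to its frequency rank, so a rank-frequency plot on log-log axes is approximately a straight line. The sketch below computes the rank-frequency table that such a plot is drawn from; the `rank_frequency` function and the toy token list are assumptions for illustration and are not taken from the project's notebook.

```python
from collections import Counter


def rank_frequency(tokens):
    """Return (rank, term, frequency) rows, most frequent term first."""
    counts = Counter(tokens).most_common()
    return [(rank, term, freq) for rank, (term, freq) in enumerate(counts, start=1)]


# Toy corpus; a real corpus would contain many thousands of tokens.
tokens = "the the the the of of and".split()
for rank, term, freq in rank_frequency(tokens):
    print(rank, term, freq)
# 1 the 4
# 2 of 2
# 3 and 1
```

On a real corpus, plotting frequency against rank from this table on log-log axes is a quick check of how closely the collection follows the law.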

Web App's UI

[Image: web app user interface]
