- Retrieving documents by scanning every document for the query terms is slow and does not scale. Information retrieval systems also usually need to rank query results by their relevance to the query.
- Implemented a responsive information retrieval web app (responses in under 2 seconds) using PostgreSQL and Django.
- Vector representation of files and query results using a term-document matrix (TDM).
- Retrieval of those documents using a dictionary data structure, with the matching files displayed in order of relevance.
- Created a database schema in PostgreSQL with Python's Django models.
- Database connection, configuration, and population with text files.
- Text pre-processing and cleaning using the Natural Language Toolkit (NLTK).
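
The retrieval steps listed above can be illustrated with a minimal, standard-library-only sketch. It is not the project's actual code: the function names (`tokenize`, `build_index`, `rank`), the sample file names, and the choice of TF-IDF weighting with cosine similarity are all assumptions made for illustration, and the regex tokenizer is only a stand-in for the NLTK cleaning step.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and keep alphabetic tokens (stand-in for NLTK cleaning)."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(docs):
    """Build a dict-based term-document matrix: term -> {doc_id: count}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, count in Counter(tokenize(text)).items():
            index[term][doc_id] = count
    return index


def rank(query, index, n_docs):
    """Return (doc_id, score) pairs sorted by cosine similarity, best first.

    Assumption: terms are weighted by TF-IDF; the source does not say
    which weighting scheme the project uses.
    """
    # Inverse document frequency for every indexed term.
    idf = {t: math.log(n_docs / len(p)) for t, p in index.items()}

    # Euclidean norm of each document's TF-IDF vector.
    sq_norms = defaultdict(float)
    for term, postings in index.items():
        for doc_id, tf in postings.items():
            sq_norms[doc_id] += (tf * idf[term]) ** 2

    # Dot product between the query vector and each candidate document.
    # Only documents sharing at least one query term are ever touched.
    dots = defaultdict(float)
    q_sq_norm = 0.0
    for term, q_tf in Counter(tokenize(query)).items():
        if term not in index:
            continue
        q_weight = q_tf * idf[term]
        q_sq_norm += q_weight ** 2
        for doc_id, tf in index[term].items():
            dots[doc_id] += q_weight * tf * idf[term]

    if q_sq_norm == 0.0:
        return []  # no query term carries any weight

    q_norm = math.sqrt(q_sq_norm)
    scored = [
        (doc_id, dot / (math.sqrt(sq_norms[doc_id]) * q_norm))
        for doc_id, dot in dots.items()
        if dot > 0.0
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical sample corpus, for illustration only.
docs = {
    "a.txt": "cats chase mice",
    "b.txt": "dogs chase cats and cats",
    "c.txt": "birds sing",
}
index = build_index(docs)
for doc_id, score in rank("cats", index, len(docs)):
    print(doc_id, round(score, 3))
```

Because the index maps each term directly to the documents that contain it, a query only visits documents that share a term with it, rather than scanning the whole collection.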
View the Jupyter Notebook here.