NLP Pipeline for Multi-Document Search

Introduction

This repository applies several feature-engineering techniques to retrieve, as accurately as possible, the documents in a corpus that are most relevant to a user's query. The retrieval itself runs in the backend, which lives in a private repository.

Pipeline

The model first pre-processes both the query and the documents into bag-of-words representations using scikit-learn tools.
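As a rough illustration of this step, the sketch below builds bag-of-words vectors with scikit-learn's `CountVectorizer`. The toy corpus, the query, and the choice of lowercasing plus English stop-word removal are assumptions made for the example; the backend's actual pre-processing is not shown in this repository.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy corpus and query, for illustration only.
documents = [
    "The cat sat on the mat.",
    "Dogs and cats are common pets.",
    "Stock markets fell sharply today.",
]
query = "Which pets are cats?"

# Lowercasing and English stop-word removal are assumed choices here.
vectorizer = CountVectorizer(lowercase=True, stop_words="english")

# Learn the vocabulary from the documents only, then reuse it for the query
# so that both end up in the same feature space.
doc_bow = vectorizer.fit_transform(documents)
query_bow = vectorizer.transform([query])

# Column order of the count matrices follows the sorted vocabulary.
vocabulary = sorted(vectorizer.vocabulary_)
```

Fitting on the documents and only transforming the query means that query words absent from the corpus are silently dropped, which is usually what a similarity search wants.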

It then computes the cosine similarity between the query and each document using two feature-engineering algorithms, LSI and TF-IDF. The two similarities are weighted (evenly, for now) to form a combined cosine similarity score.
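A minimal sketch of the combined score might look like the following. It assumes LSI is realised as truncated SVD over the TF-IDF matrix, which is one common way to do it with scikit-learn; the function name `combined_scores`, the number of topics, and the sample texts are hypothetical, and the backend may be implemented differently.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def combined_scores(documents, query, n_topics=2, tfidf_weight=0.5):
    """Return one combined cosine-similarity score per document."""
    vectorizer = TfidfVectorizer(stop_words="english")
    doc_tfidf = vectorizer.fit_transform(documents)
    query_tfidf = vectorizer.transform([query])

    # Cosine similarity in the raw TF-IDF space.
    tfidf_sim = cosine_similarity(query_tfidf, doc_tfidf).ravel()

    # LSI (assumed here): project the TF-IDF vectors onto a few latent topics.
    svd = TruncatedSVD(n_components=n_topics, random_state=0)
    doc_lsi = svd.fit_transform(doc_tfidf)
    query_lsi = svd.transform(query_tfidf)
    lsi_sim = cosine_similarity(query_lsi, doc_lsi).ravel()

    # tfidf_weight=0.5 corresponds to the even weighting described above.
    return tfidf_weight * tfidf_sim + (1.0 - tfidf_weight) * lsi_sim


# Hypothetical usage with a toy corpus.
documents = [
    "The cat sat on the mat.",
    "Dogs and cats are common pets.",
    "Stock markets fell sharply today.",
    "Pets such as cats and dogs need regular care.",
]
scores = combined_scores(documents, "caring for cats and dogs")
ranking = scores.argsort()[::-1]  # document indices, best match first
```

Keeping the weight as a single parameter makes it easy to tune later without touching the rest of the pipeline.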

Afterwards, it applies pointwise ranking, informed by historic data, to rank the scored documents and return a set of best results. This step is a work in progress (WIP).
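Since this step is still in progress, the following is only one possible shape for it. In pointwise ranking, each query-document pair is scored independently by a model trained on past relevance judgements, and documents are then sorted by that score. The sketch assumes the features are the two similarity scores and that the historic labels are binary; the data, the `rank_documents` helper, and the choice of logistic regression are all hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical historic data: one row per (query, document) pair with
# features [tfidf_similarity, lsi_similarity] and a 0/1 relevance label.
historic_features = np.array([
    [0.82, 0.90], [0.75, 0.60], [0.10, 0.35], [0.05, 0.12],
    [0.64, 0.71], [0.20, 0.15], [0.91, 0.88], [0.30, 0.22],
])
historic_labels = np.array([1, 1, 0, 0, 1, 0, 1, 0])

# Pointwise approach: learn to predict relevance for a single pair.
model = LogisticRegression()
model.fit(historic_features, historic_labels)


def rank_documents(candidate_features):
    """Sort candidates by predicted probability of being relevant."""
    relevance = model.predict_proba(candidate_features)[:, 1]
    order = np.argsort(-relevance)  # indices, most relevant first
    return order, relevance


# Hypothetical candidates for a new query.
candidates = np.array([[0.15, 0.20], [0.80, 0.85], [0.45, 0.50]])
order, relevance = rank_documents(candidates)
```

Any regressor or classifier could stand in for the logistic regression; the defining property is that each document is scored on its own rather than compared against the others.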

The website is integrated with OpenAI's Completion API and uses prompt engineering to return the most relevant result from the top-ranked file.
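The actual prompt is not published in this repository, so the template below is purely illustrative. It shows one way the top-ranked file and the user's query could be combined into a single completion prompt; `build_prompt`, its wording, and the `max_chars` limit are assumptions, and the call to OpenAI itself is deliberately left out.

```python
def build_prompt(query, top_document, max_chars=2000):
    """Assemble a completion prompt from the query and the top-ranked file."""
    # Truncate the document so that the prompt stays within a size budget
    # (a character limit is used here as a simple stand-in for a token limit).
    context = top_document[:max_chars]
    return (
        "Answer the question using only the document below. "
        "If the document does not contain the answer, say so.\n\n"
        f"Document:\n{context}\n\n"
        f"Question: {query}\n"
        "Answer:"
    )


# Hypothetical usage.
prompt = build_prompt(
    "How often do cats need care?",
    "Pets such as cats and dogs need regular care.",
)
```

Ending the prompt with "Answer:" nudges a completion-style model to continue with the answer rather than restating the question.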

To-dos

Find the best weights for combining the TF-IDF and LSI cosine similarities
Do more pre-processing of the user's query
Use natural language to shape the returned sentences so that they answer the query
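
For the first to-do, one simple option would be a grid search over the TF-IDF weight, scored by how often the known relevant document comes out on top. The sketch below assumes a small set of labelled examples is available; the data, the `top1_accuracy` metric, and the grid resolution are hypothetical.

```python
def top1_accuracy(weight, examples):
    """Fraction of examples whose relevant document gets the highest score."""
    hits = 0
    for tfidf_sims, lsi_sims, relevant_index in examples:
        combined = [
            weight * t + (1.0 - weight) * l
            for t, l in zip(tfidf_sims, lsi_sims)
        ]
        best = max(range(len(combined)), key=combined.__getitem__)
        hits += int(best == relevant_index)
    return hits / len(examples)


def best_weight(examples, steps=10):
    """Try weights 0.0, 0.1, ..., 1.0 and keep the first one that scores best."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda w: top1_accuracy(w, examples))


# Hypothetical labelled examples:
# (TF-IDF similarities, LSI similarities, index of the relevant document).
examples = [
    ([0.9, 0.2, 0.1], [0.3, 0.65, 0.1], 0),
    ([0.4, 0.5, 0.1], [0.2, 0.9, 0.3], 1),
    ([0.7, 0.6, 0.2], [0.5, 0.4, 0.1], 0),
]
weight = best_weight(examples)
```

With real data, the weight should be chosen on a held-out set rather than on the examples used to evaluate the final system.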

Credits

Practical Natural Language Processing - Sowmya Vajjala et al. (O'Reilly Media)

neozhixuan/internship-nlpfrontend
