CITS5553_G1_Semantic_Word_Search

The primary aim of this UWA Capstone Project is to develop a unified platform that efficiently identifies and retrieves relevant contracts and tender specification documents within the Department of Finance. Using Machine Learning (ML) and Natural Language Processing (NLP) techniques, the project seeks to improve the search process through filters and semantic similarity while keeping the results usable and easy to understand.

Introduction

The Department of Finance, as a central agency, drives critical financial processes and decisions, often collaborating with other government bodies to facilitate the smooth execution of projects, procurements, and services. Finance is responsible for the whole-of-government procurement portal, through which agencies are required to complete their procurement processes for contracts valued at more than $50,000. The key bottleneck in the existing data interpretation process is the considerable manual effort required for ad-hoc reporting, particularly for scenarios such as Parliamentary Questions (PQs). Because the information needed is complex, this process is labor-intensive: it slows decision-making and makes it susceptible to human error.

The solution consists of three components:

The Pre-processor assesses and organizes tender data
The Semantic Crawler enhances search using NLP
The User Interface offers user-friendly query input with filters and provides reliability scores for the results

Together, these components aim to make retrieving tenders more efficient.
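The UI's filters narrow the candidate tenders before (or alongside) semantic ranking. The sketch below is a minimal illustration, not the project's code: it assumes each tender is a record with hypothetical fields tender_id, agency, value and text, and that a filter left as None is simply not applied.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Tender:
    # Hypothetical fields; the real columns produced by the Pre-processor may differ
    tender_id: str
    agency: str
    value: float
    text: str


def apply_filters(tenders: list[Tender],
                  agency: Optional[str] = None,
                  min_value: Optional[float] = None,
                  max_value: Optional[float] = None) -> list[Tender]:
    """Keep only tenders that match every filter supplied (None means 'no filter')."""
    result = []
    for t in tenders:
        # Case-insensitive agency match
        if agency is not None and t.agency.lower() != agency.lower():
            continue
        # Inclusive lower and upper bounds on contract value
        if min_value is not None and t.value < min_value:
            continue
        if max_value is not None and t.value > max_value:
            continue
        result.append(t)
    return result
```

Filtering on structured fields first keeps the more expensive similarity scoring limited to tenders the user could actually want.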

Features

The Preprocessor component:

Uses web scraping to download zipped files
Unzips the downloaded files
Extracts the textual content from PDF and DOCX files
Extracts useful content from the files (part-of-speech (POS) and named-entity recognition (NER) tags)
Prepares a CSV file that contains all the useful data
Generates a vector database from the generated CSV file
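The unzip, text-extraction and CSV steps can be sketched with the Python standard library alone. This is a minimal sketch, not the project's actual Pre-processor: it assumes each downloaded archive is a zip file, relies on the fact that a .docx file is itself a zip containing word/document.xml, and writes one CSV row per document with hypothetical columns "filename" and "text". PDF extraction, POS/NER tagging and the vector database need third-party libraries and are left out.

```python
import csv
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_text(docx_bytes: bytes) -> str:
    """Extract paragraph text from a .docx file (a zip containing word/document.xml)."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as docx:
        root = ET.fromstring(docx.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(f"{W_NS}p"):
        # Join all text runs (<w:t>) that belong to this paragraph
        text = "".join(node.text or "" for node in para.iter(f"{W_NS}t"))
        if text:
            paragraphs.append(text)
    return "\n".join(paragraphs)


def tender_zip_to_csv(zip_bytes: bytes, out) -> int:
    """Write one CSV row per .docx inside a downloaded tender zip; return the row count."""
    # Hypothetical column names, chosen for illustration only
    writer = csv.DictWriter(out, fieldnames=["filename", "text"])
    writer.writeheader()
    rows = 0
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            # Only .docx members are handled; PDFs would need a separate extractor
            if name.lower().endswith(".docx"):
                writer.writerow({"filename": name, "text": docx_text(archive.read(name))})
                rows += 1
    return rows
```

When writing to a real file rather than an in-memory buffer, open it with newline="" so the csv module controls line endings.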

The "www" folder consists of two sub-folders:

Client - It has the user interface (UI) that lets users enter search queries, start a search, and view the matching contracts for easy access and interpretation. Matching contracts are shown in order of their ranking.
It will also provide reporting mechanisms and visualizations that give quicker insight into complex data scenarios and make the search results easier to understand.
The UI communicates with a backend that searches for contracts with semantically similar content and rates them by similarity score.
Server - It has the backend 'Semantic Crawler', which uses various NLP models to detect and report semantically similar contracts.
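Rating contracts by similarity score can be illustrated with cosine similarity between a query vector and each contract's vector. This is a minimal sketch under stated assumptions: the project's NLP models are not specified here, so a bag-of-words term count stands in for a learned embedding, and the function names are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Bag-of-words term counts: a simple stand-in for an NLP model's embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors (0.0 if either is empty)."""
    # Counter returns 0 for missing terms without inserting them
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_contracts(query: str, contracts: dict[str, str],
                   top_k: int = 5) -> list[tuple[str, float]]:
    """Score every contract against the query and return the top_k, best match first."""
    q = embed(query)
    scored = [(cid, cosine(q, embed(text))) for cid, text in contracts.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

For example, ranking the query "cleaning services" against "office cleaning services", "cleaning of hospital wards" and "road construction works" places the office-cleaning contract first and the road contract last. A real system would precompute and store contract embeddings in the vector database instead of recomputing them for every query.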

Prerequisites

Python 3.x
Jupyter Notebook (optional)
Next.js
FastAPI
Tailwind CSS

Installation

All Python libraries can be installed using "pip". After cloning the repository, install the UI dependencies with "pnpm install" (or "npm install").

Starting the Backend and Frontend

To start the backend server, use the following command:
uvicorn main:app --reload

To start the frontend, use the following command:
pnpm run dev

Folders and files

Latest commit: fcb0944 by NicholasChoong, "remove appDir" (Nov 8, 2023); 190 commits in total

Name	Last commit message	Last commit date
Contributions	integration of ui filters	Oct 30, 2023
Documentations	init documentation	Aug 30, 2023
Preprocessing	integraating ui filters	Oct 30, 2023
www	remove appDir	Nov 8, 2023
.gitignore	ignore node_modules	Nov 8, 2023
LICENSE	Create LICENSE	Aug 30, 2023
README.md	Update README.md	Oct 22, 2023
GitHub repository: abhishek-and/CITS5553_G1_Semantic_Word_Search