Our project focuses on search queries related to job hunting. As near-graduates ourselves, we understand how difficult finding a job can be, and we often go online to look for reviews of the role or the company we are applying to. These reviews help us understand the position and the company better, and give a general sense of how current and former staff feel about the role.
Job_reviews.csv → original csv file
Cleaned_data.csv → csv file after performing data cleaning and preprocessing
Train_cleaned_data.csv → split to include only 80% of the cleaned_data.csv after shuffling
Test_cleaned_data.csv → includes the remaining 20% of cleaned_data.csv
Cleandata.ipynb → initial steps to clean up the original csv file and preprocess it
CosineSimilarity.ipynb → ipynb file to run cosine similarity, cosine similarity with POS tagging, relevance feedback, and Average Precision; also includes other methods explored, such as AND, OR, Jaccard Similarity and Boolean Retrieval
Bm25_og.ipynb → original bm25 model code
Bm25_postag.ipynb → bm25 model code with added weights for specific terms such as location, place, position
bm25_ap → ipynb file for bm25 with relevance feedback and Average Precision
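
The shuffled 80/20 split behind Train_cleaned_data.csv and Test_cleaned_data.csv could be sketched as follows. This is a minimal illustration using only the standard library, not the notebook's actual code; the function name `split_rows`, the `seed` parameter, and the `review_id` field are hypothetical.

```python
import random


def split_rows(rows, train_fraction=0.8, seed=0):
    """Shuffle a copy of `rows` and split it into (train, test).

    `seed` is an illustrative assumption so that the split is reproducible.
    """
    shuffled = list(rows)                      # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)      # shuffle before splitting
    cut = int(len(shuffled) * train_fraction)  # index where the test part starts
    return shuffled[:cut], shuffled[cut:]


# Toy stand-in for the rows of the cleaned csv file.
rows = [{"review_id": i} for i in range(10)]
train, test = split_rows(rows)
print(len(train), len(test))
```

With ten rows and the default fraction, eight rows go to the training part and two to the test part.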
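
To illustrate the kind of ranking CosineSimilarity.ipynb performs, here is a small sketch of cosine similarity over TF-IDF vectors. It assumes whitespace tokenization and a smoothed `log(N/df) + 1` weighting; the notebook may tokenize and weight terms differently, and the sample reviews are made up for the example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict) per document, plus the idf table."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log(n / count) + 1.0 for term, count in df.items()}
    vectors = [
        {term: tf * idf[term] for term, tf in Counter(tokens).items()}
        for tokens in tokenized
    ]
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, docs):
    """Return document indices ordered from most to least similar to the query."""
    vectors, idf = tfidf_vectors(docs)
    # Query terms that never occur in the collection are ignored.
    q_counts = Counter(t for t in query.lower().split() if t in idf)
    q_vec = {term: tf * idf[term] for term, tf in q_counts.items()}
    scored = [(cosine(q_vec, vec), i) for i, vec in enumerate(vectors)]
    return [i for _, i in sorted(scored, reverse=True)]


# Made-up reviews, for illustration only.
docs = [
    "great work life balance for software engineer",
    "long hours and poor management",
    "software engineer role with supportive management",
]
ranking = rank("software engineer balance", docs)
print(ranking)
```

The first review matches all three query terms, the third matches two, and the second matches none, so they are ranked in that order.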
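
The difference between Bm25_og.ipynb and Bm25_postag.ipynb, extra weight for specific terms, can be illustrated with a small BM25 scorer that accepts optional per-term weights. This is a sketch under assumptions: it uses the non-negative idf variant `log((N - df + 0.5) / (df + 0.5) + 1)`, default `k1` and `b` values chosen for illustration, and a hypothetical `term_weights` argument. How the notebook selects and weights its terms is not shown here.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75, term_weights=None):
    """Score every document against the query with BM25.

    `term_weights` is a hypothetical mapping from term to multiplier;
    terms not listed keep a weight of 1.0, which gives plain BM25.
    """
    term_weights = term_weights or {}
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    avgdl = sum(len(tokens) for tokens in tokenized) / n
    df = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        length_norm = k1 * (1 - b + b * len(tokens) / avgdl)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            saturation = tf[term] * (k1 + 1) / (tf[term] + length_norm)
            score += term_weights.get(term, 1.0) * idf * saturation
        scores.append(score)
    return scores


# Made-up reviews, for illustration only.
docs = [
    "data analyst position in singapore office",
    "friendly team but low pay",
    "analyst role with remote option",
]
plain = bm25_scores("analyst singapore", docs)
boosted = bm25_scores("analyst singapore", docs, term_weights={"singapore": 2.0})
print(plain)
print(boosted)
```

Boosting a term only changes the scores of documents that contain it, so the second and third reviews score the same in both runs while the first review's score rises.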
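
Average Precision, used in both CosineSimilarity.ipynb and bm25_ap, can be computed from a ranked list and a set of relevant documents. The sketch below assumes the common definition that averages precision at each relevant hit over the total number of relevant documents; the notebooks may normalize differently, and the document ids are invented for the example.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average of precision@k taken at every rank k that holds a relevant document."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / len(relevant)


# Relevant documents 3 and 7 are retrieved at ranks 1 and 3.
ap = average_precision([3, 1, 7, 5], {3, 7})
print(round(ap, 4))
```

Here precision is 1/1 at the first hit and 2/3 at the second, so the average is 5/6.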